comparison man/internals/internals.texi @ 442:abe6d1db359e r21-2-36

Import from CVS: tag r21-2-36
author cvs
date Mon, 13 Aug 2007 11:35:02 +0200
parents 8de8e3f6228a
children 576fb035e263
441:72a7cfa4a488 442:abe6d1db359e
67 67
68 @author Ben Wing 68 @author Ben Wing
69 @author Martin Buchholz 69 @author Martin Buchholz
70 @author Hrvoje Niksic 70 @author Hrvoje Niksic
71 @author Matthias Neubauer 71 @author Matthias Neubauer
72 @author Olivier Galibert
72 @page 73 @page
73 @vskip 0pt plus 1fill 74 @vskip 0pt plus 1fill
74 75
75 @noindent 76 @noindent
76 Copyright @copyright{} 1992 - 1996 Ben Wing. @* 77 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
116 * The XEmacs Object System (Abstractly Speaking):: 117 * The XEmacs Object System (Abstractly Speaking)::
117 * How Lisp Objects Are Represented in C:: 118 * How Lisp Objects Are Represented in C::
118 * Rules When Writing New C Code:: 119 * Rules When Writing New C Code::
119 * A Summary of the Various XEmacs Modules:: 120 * A Summary of the Various XEmacs Modules::
120 * Allocation of Objects in XEmacs Lisp:: 121 * Allocation of Objects in XEmacs Lisp::
122 * Dumping::
121 * Events and the Event Loop:: 123 * Events and the Event Loop::
122 * Evaluation; Stack Frames; Bindings:: 124 * Evaluation; Stack Frames; Bindings::
123 * Symbols and Variables:: 125 * Symbols and Variables::
124 * Buffers and Textual Representation:: 126 * Buffers and Textual Representation::
125 * MULE Character Sets and Encodings:: 127 * MULE Character Sets and Encodings::
132 * Glyphs:: 134 * Glyphs::
133 * Specifiers:: 135 * Specifiers::
134 * Menus:: 136 * Menus::
135 * Subprocesses:: 137 * Subprocesses::
136 * Interface to X Windows:: 138 * Interface to X Windows::
137 * Index:: Index including concepts, functions, variables, 139 * Index::
138 and other terms. 140
139 141 @detailmenu
140 --- The Detailed Node Listing --- 142
141 143 --- The Detailed Node Listing ---
142 Here are other nodes that are inferiors of those already listed,
143 mentioned here so you can get to them in one step:
144 144
145 A History of Emacs 145 A History of Emacs
146 146
147 * Through Version 18:: Unification prevails. 147 * Through Version 18:: Unification prevails.
148 * Lucid Emacs:: One version 19 Emacs. 148 * Lucid Emacs:: One version 19 Emacs.
149 * GNU Emacs 19:: The other version 19 Emacs. 149 * GNU Emacs 19:: The other version 19 Emacs.
150 * GNU Emacs 20:: The other version 20 Emacs.
150 * XEmacs:: The continuation of Lucid Emacs. 151 * XEmacs:: The continuation of Lucid Emacs.
151 152
152 Rules When Writing New C Code 153 Rules When Writing New C Code
153 154
154 * General Coding Rules:: 155 * General Coding Rules::
155 * Writing Lisp Primitives:: 156 * Writing Lisp Primitives::
156 * Adding Global Lisp Variables:: 157 * Adding Global Lisp Variables::
158 * Coding for Mule::
157 * Techniques for XEmacs Developers:: 159 * Techniques for XEmacs Developers::
160
161 Coding for Mule
162
163 * Character-Related Data Types::
164 * Working With Character and Byte Positions::
165 * Conversion to and from External Data::
166 * General Guidelines for Writing Mule-Aware Code::
167 * An Example of Mule-Aware Code::
158 168
159 A Summary of the Various XEmacs Modules 169 A Summary of the Various XEmacs Modules
160 170
161 * Low-Level Modules:: 171 * Low-Level Modules::
162 * Basic Lisp Modules:: 172 * Basic Lisp Modules::
179 * Garbage Collection - Step by Step:: 189 * Garbage Collection - Step by Step::
180 * Integers and Characters:: 190 * Integers and Characters::
181 * Allocation from Frob Blocks:: 191 * Allocation from Frob Blocks::
182 * lrecords:: 192 * lrecords::
183 * Low-level allocation:: 193 * Low-level allocation::
184 * Pure Space::
185 * Cons:: 194 * Cons::
186 * Vector:: 195 * Vector::
187 * Bit Vector:: 196 * Bit Vector::
188 * Symbol:: 197 * Symbol::
189 * Marker:: 198 * Marker::
190 * String:: 199 * String::
191 * Compiled Function:: 200 * Compiled Function::
201
202 Garbage Collection - Step by Step
203
204 * Invocation::
205 * garbage_collect_1::
206 * mark_object::
207 * gc_sweep::
208 * sweep_lcrecords_1::
209 * compact_string_chars::
210 * sweep_strings::
211 * sweep_bit_vectors_1::
212
213 Dumping
214
215 * Overview::
216 * Data descriptions::
217 * Dumping phase::
218 * Reloading phase::
219
220 Dumping phase
221
222 * Object inventory::
223 * Address allocation::
224 * The header::
225 * Data dumping::
226 * Pointers dumping::
192 227
193 Events and the Event Loop 228 Events and the Event Loop
194 229
195 * Introduction to Events:: 230 * Introduction to Events::
196 * Main Loop:: 231 * Main Loop::
226 MULE Character Sets and Encodings 261 MULE Character Sets and Encodings
227 262
228 * Character Sets:: 263 * Character Sets::
229 * Encodings:: 264 * Encodings::
230 * Internal Mule Encodings:: 265 * Internal Mule Encodings::
266 * CCL::
231 267
232 Encodings 268 Encodings
233 269
234 * Japanese EUC (Extended Unix Code):: 270 * Japanese EUC (Extended Unix Code)::
235 * JIS7:: 271 * JIS7::
237 Internal Mule Encodings 273 Internal Mule Encodings
238 274
239 * Internal String Encoding:: 275 * Internal String Encoding::
240 * Internal Character Encoding:: 276 * Internal Character Encoding::
241 277
242 The Lisp Reader and Compiler
243
244 Lstreams 278 Lstreams
279
280 * Creating an Lstream:: Creating an lstream object.
281 * Lstream Types:: Different sorts of things that are streamed.
282 * Lstream Functions:: Functions for working with lstreams.
283 * Lstream Methods:: Creating new lstream types.
245 284
246 Consoles; Devices; Frames; Windows 285 Consoles; Devices; Frames; Windows
247 286
248 * Introduction to Consoles; Devices; Frames; Windows:: 287 * Introduction to Consoles; Devices; Frames; Windows::
249 * Point:: 288 * Point::
250 * Window Hierarchy:: 289 * Window Hierarchy::
290 * The Window Object::
251 291
252 The Redisplay Mechanism 292 The Redisplay Mechanism
253 293
254 * Critical Redisplay Sections:: 294 * Critical Redisplay Sections::
255 * Line Start Cache:: 295 * Line Start Cache::
296 * Redisplay Piece by Piece::
256 297
257 Extents 298 Extents
258 299
259 * Introduction to Extents:: Extents are ranges over text, with properties. 300 * Introduction to Extents:: Extents are ranges over text, with properties.
260 * Extent Ordering:: How extents are ordered internally. 301 * Extent Ordering:: How extents are ordered internally.
261 * Format of the Extent Info:: The extent information in a buffer or string. 302 * Format of the Extent Info:: The extent information in a buffer or string.
262 * Zero-Length Extents:: A weird special case. 303 * Zero-Length Extents:: A weird special case.
263 * Mathematics of Extent Ordering:: A rigorous foundation. 304 * Mathematics of Extent Ordering:: A rigorous foundation.
264 * Extent Fragments:: Cached information useful for redisplay. 305 * Extent Fragments:: Cached information useful for redisplay.
265 306
266 Faces 307 @end detailmenu
267
268 Glyphs
269
270 Specifiers
271
272 Menus
273
274 Subprocesses
275
276 Interface to X Windows
277
278 @end menu 308 @end menu
279 309
280 @node A History of Emacs, XEmacs From the Outside, Top, Top 310 @node A History of Emacs, XEmacs From the Outside, Top, Top
281 @chapter A History of Emacs 311 @chapter A History of Emacs
282 @cindex history of Emacs 312 @cindex history of Emacs
313 * GNU Emacs 19:: The other version 19 Emacs. 343 * GNU Emacs 19:: The other version 19 Emacs.
314 * GNU Emacs 20:: The other version 20 Emacs. 344 * GNU Emacs 20:: The other version 20 Emacs.
315 * XEmacs:: The continuation of Lucid Emacs. 345 * XEmacs:: The continuation of Lucid Emacs.
316 @end menu 346 @end menu
317 347
318 @node Through Version 18 348 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs
319 @section Through Version 18 349 @section Through Version 18
320 @cindex Gosling, James 350 @cindex Gosling, James
321 @cindex Great Usenet Renaming 351 @cindex Great Usenet Renaming
322 352
323 Although the history of the early versions of GNU Emacs is unclear, 353 Although the history of the early versions of GNU Emacs is unclear,
426 version 18.58 released ?????. 456 version 18.58 released ?????.
427 @item 457 @item
428 version 18.59 released October 31, 1992. 458 version 18.59 released October 31, 1992.
429 @end itemize 459 @end itemize
430 460
431 @node Lucid Emacs 461 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs
432 @section Lucid Emacs 462 @section Lucid Emacs
433 @cindex Lucid Emacs 463 @cindex Lucid Emacs
434 @cindex Lucid Inc. 464 @cindex Lucid Inc.
435 @cindex Energize 465 @cindex Energize
436 @cindex Epoch 466 @cindex Epoch
514 version 20.3 (the first stable version of XEmacs 20.x) released November 30, 544 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
515 1997. 545 1997.
516 version 20.4 released February 28, 1998. 546 version 20.4 released February 28, 1998.
517 @end itemize 547 @end itemize
518 548
519 @node GNU Emacs 19 549 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs
520 @section GNU Emacs 19 550 @section GNU Emacs 19
521 @cindex GNU Emacs 19 551 @cindex GNU Emacs 19
522 @cindex FSF Emacs 552 @cindex FSF Emacs
523 553
524 About a year after the initial release of Lucid Emacs, the FSF 554 About a year after the initial release of Lucid Emacs, the FSF
591 worse. Lucid soon began incorporating features from GNU Emacs 19 into 621 worse. Lucid soon began incorporating features from GNU Emacs 19 into
592 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been 622 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
593 working on and using GNU Emacs for a long time (back as far as version 623 working on and using GNU Emacs for a long time (back as far as version
594 16 or 17). 624 16 or 17).
595 625
596 @node GNU Emacs 20 626 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs
597 @section GNU Emacs 20 627 @section GNU Emacs 20
598 @cindex GNU Emacs 20 628 @cindex GNU Emacs 20
599 @cindex FSF Emacs 629 @cindex FSF Emacs
600 630
601 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first 631 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first
610 version 20.2 released September 20, 1997. 640 version 20.2 released September 20, 1997.
611 @item 641 @item
612 version 20.3 released August 19, 1998. 642 version 20.3 released August 19, 1998.
613 @end itemize 643 @end itemize
614 644
615 @node XEmacs 645 @node XEmacs, , GNU Emacs 20, A History of Emacs
616 @section XEmacs 646 @section XEmacs
617 @cindex XEmacs 647 @cindex XEmacs
618 648
619 @cindex Sun Microsystems 649 @cindex Sun Microsystems
620 @cindex University of Illinois 650 @cindex University of Illinois
935 Java, which is inexcusable. 965 Java, which is inexcusable.
936 @end enumerate 966 @end enumerate
937 967
938 Unfortunately, there is no perfect language. Static typing allows a 968 Unfortunately, there is no perfect language. Static typing allows a
939 compiler to catch programmer errors and produce more efficient code, but 969 compiler to catch programmer errors and produce more efficient code, but
940 makes programming more tedious and less fun. For the forseeable future, 970 makes programming more tedious and less fun. For the foreseeable future,
941 an Ideal Editing and Programming Environment (and that is what XEmacs 971 an Ideal Editing and Programming Environment (and that is what XEmacs
942 aspires to) will be programmable in multiple languages: high level ones 972 aspires to) will be programmable in multiple languages: high level ones
943 like Lisp for user customization and prototyping, and lower level ones 973 like Lisp for user customization and prototyping, and lower level ones
944 for infrastructure and industrial strength applications. If I had my 974 for infrastructure and industrial strength applications. If I had my
945 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, 975 way, XEmacs would be friendly towards the Python, Scheme, C++, ML,
1186 @file{loadup.el} tells the C code about this function by setting its 1216 @file{loadup.el} tells the C code about this function by setting its
1187 name as the value of the Lisp variable @code{top-level}. 1217 name as the value of the Lisp variable @code{top-level}.
1188 1218
1189 When the Lisp initialization code is done, the C code enters the event 1219 When the Lisp initialization code is done, the C code enters the event
1190 loop, and stays there for the duration of the XEmacs process. The code 1220 loop, and stays there for the duration of the XEmacs process. The code
1191 for the event loop is contained in @file{keyboard.c}, and is called 1221 for the event loop is contained in @file{cmdloop.c}, and is called
1192 @code{Fcommand_loop_1()}. Note that this event loop could very well be 1222 @code{Fcommand_loop_1()}. Note that this event loop could very well be
1193 written in Lisp, and in fact a Lisp version exists; but apparently, 1223 written in Lisp, and in fact a Lisp version exists; but apparently,
1194 doing this makes XEmacs run noticeably slower. 1224 doing this makes XEmacs run noticeably slower.
1195 1225
1196 Notice how much of the initialization is done in Lisp, not in C. 1226 Notice how much of the initialization is done in Lisp, not in C.
1588 1618
1589 @example 1619 @example
1590 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] 1620 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1591 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] 1621 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1592 1622
1593 <---> ^ <------------------------------------------------------> 1623 <---------------------------------------------------------> <->
1594 tag | a pointer to a structure, or an integer 1624 a pointer to a structure, or an integer tag
1595 | 1625 @end example
1596 mark bit 1626
1597 @end example 1627 A tag of 00 is used for all pointer object types, a tag of 10 is used
1598 1628 for characters, and the other two tags 01 and 11 are joined together to
1599 The tag describes the type of the Lisp object. For integers and chars, 1629 form the integer object type. This representation gives us 31 bit
1600 the lower 28 bits contain the value of the integer or char; for all 1630 integers and 30 bit characters, while pointers are represented directly
1601 others, the lower 28 bits contain a pointer. The mark bit is used 1631 without any bit masking or shifting. This representation, though,
1602 during garbage-collection, and is always 0 when garbage collection is 1632 assumes that pointers to structs are always aligned to multiples of 4,
1603 not happening. (The way that garbage collection works, basically, is that it 1633 so the lower 2 bits are always zero.
1604 loops over all places where Lisp objects could exist---this includes
1605 all global variables in C that contain Lisp objects [including
1606 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all
1607 Lisp variables will get marked], plus various other places---and
1608 recursively scans through the Lisp objects, marking each object it finds
1609 by setting the mark bit. Then it goes through the lists of all objects
1610 allocated, freeing the ones that are not marked and turning off the mark
1611 bit of the ones that are marked.)
1612 1634
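The two-bit tag layout described in the new text can be sketched in a few lines of C. This is only an illustration under stated assumptions: every @code{sketch_} name is hypothetical and none of this is the real @file{lisp.h} code. It assumes 4-byte-aligned structure pointers and an arithmetic right shift for negative values, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Object; not the real XEmacs definition. */
typedef uintptr_t sketch_obj;

/* Tag 00: pointer.  Tag 10: character.  Tags 01 and 11: integer. */
enum { SKETCH_TAG_POINTER = 0, SKETCH_TAG_CHAR = 2 };

/* Integers only need the lowest bit set, so they keep one extra bit
   of payload compared with characters. */
static sketch_obj sketch_make_int(intptr_t n)
{
  return ((uintptr_t) n << 1) | 1u;
}

static int sketch_intp(sketch_obj o) { return (o & 1u) != 0; }

/* Assumes >> on a negative value is an arithmetic shift, so removing
   the tag bit also restores the sign. */
static intptr_t sketch_xint(sketch_obj o)
{
  return (intptr_t) o >> 1;
}

static sketch_obj sketch_make_char(uint32_t c)
{
  return ((uintptr_t) c << 2) | SKETCH_TAG_CHAR;
}

static int sketch_charp(sketch_obj o) { return (o & 3u) == SKETCH_TAG_CHAR; }

static uint32_t sketch_xchar(sketch_obj o) { return (uint32_t) (o >> 2); }

/* Pointers are stored as-is: alignment guarantees the low bits are 00. */
static sketch_obj sketch_wrap_pointer(void *p)
{
  assert(((uintptr_t) p & 3u) == 0);
  return (uintptr_t) p;
}

static int sketch_pointerp(sketch_obj o)
{
  return (o & 3u) == SKETCH_TAG_POINTER;
}

static void *sketch_xpointer(sketch_obj o) { return (void *) o; }
```

With these definitions, @code{sketch_xint (sketch_make_int (-5))} yields -5 on platforms where the shift assumption holds, and a wrapped pointer round-trips without any masking.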
1613 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type 1635 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1614 used for the Lisp object can vary. It can be either a simple type 1636 used for the Lisp object can vary. It can be either a simple type
1615 (@code{long} on the DEC Alpha, @code{int} on other machines) or a 1637 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1616 structure whose fields are bit fields that line up properly (actually, a 1638 structure whose fields are bit fields that line up properly (actually, a
1617 union of structures is used). Generally the simple integral type is 1639 union of structures is used). Generally the simple integral type is
1618 preferable because it ensures that the compiler will actually use a 1640 preferable because it ensures that the compiler will actually use a
1619 machine word to represent the object (some compilers will use more 1641 machine word to represent the object (some compilers will use more
1620 general and less efficient code for unions and structs even if they can 1642 general and less efficient code for unions and structs even if they can
1621 fit in a machine word). The union type, however, has the advantage of 1643 fit in a machine word). The union type, however, has the advantage of
1622 stricter type checking (if you accidentally pass an integer where a Lisp 1644 stricter type checking. If you accidentally pass an integer where a Lisp
1623 object is desired, you get a compile error), and it makes it easier to 1645 object is desired, you get a compile error. The choice of which type
1624 decode Lisp objects when debugging. The choice of which type to use is 1646 to use is determined by the preprocessor constant @code{USE_UNION_TYPE}
1625 determined by the preprocessor constant @code{USE_UNION_TYPE} which is 1647 which is defined via the @code{--use-union-type} option to
1626 defined via the @code{--use-union-type} option to @code{configure}. 1648 @code{configure}.
1627 1649
1628 @cindex record type 1650 Various macros are used to convert between Lisp_Objects and the
1629 1651 corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()},
1630 Note that there are only eight types that the tag can represent, but 1652 @code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or
1631 many more actual types than this. This is handled by having one of the 1653 masking and cast it to the appropriate type. @code{XINT()} needs to be
1632 tag types specify a meta-type called a @dfn{record}; for all such 1654 a bit tricky so that negative numbers are properly sign-extended. Since
1633 objects, the first four bytes of the pointed-to structure indicate what 1655 integers are stored left-shifted, if the right-shift operator does an
1634 the actual type is. 1656 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
1635 1657 than shifting in a zero, so that it mimics a divide-by-two even for
1636 Note also that having 28 bits for pointers and integers restricts a lot 1658 negative numbers) the shift to remove the tag bit is enough. This is
1637 of things to 256 megabytes of memory. (Basically, enough pointers and 1659 the case on all the systems we support.
1638 indices and whatnot get stuffed into Lisp objects that the total amount 1660
1639 of memory used by XEmacs can't grow above 256 megabytes. In older 1661 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
1640 versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for
1641 32 types, which was more than the actual number of types that existed at
1642 the time, and no ``record'' type was necessary. However, this limited
1643 the editor to 64 megabytes total, which some users who edited large
1644 files might conceivably exceed.)
1645
1646 Also, note that there is an implicit assumption here that all pointers
1647 are low enough that the top bits are all zero and can just be chopped
1648 off. On standard machines that allocate memory from the bottom up (and
1649 give each process its own address space), this works fine. Some
1650 machines, however, put the data space somewhere else in memory
1651 (e.g. beginning at 0x80000000). Those machines cope by defining
1652 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
1653 the proper mask. Then, pointers retrieved from Lisp objects are
1654 automatically OR'ed with this value prior to being used.
1655
1656 A corollary of the previous paragraph is that @strong{(pointers to)
1657 stack-allocated structures cannot be put into Lisp objects}. The stack
1658 is generally located near the top of memory; if you put such a pointer
1659 into a Lisp object, it will get its top bits chopped off, and you will
1660 lose.
1661
1662 Actually, there's an alternative representation of a @code{Lisp_Object},
1663 invented by Kyle Jones, that is used when the
1664 @code{--use-minimal-tagbits} option to @code{configure} is used. In
1665 this case the 2 lower bits are used for the tag bits. This
1666 representation assumes that pointers to structs are always aligned to
1667 multiples of 4, so the lower 2 bits are always zero.
1668
1669 @example
1670 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1671 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1672
1673 <---------------------------------------------------------> <->
1674 a pointer to a structure, or an integer tag
1675 @end example
1676
1677 A tag of 00 is used for all pointer object types, a tag of 10 is used
1678 for characters, and the other two tags 01 and 11 are joined together to
1679 form the integer object type. The markbit is moved to part of the
1680 structure being pointed at (integers and chars do not need to be marked,
1681 since no memory is allocated). This representation has these
1682 advantages:
1683
1684 @enumerate
1685 @item
1686 31 bits can be used for Lisp Integers.
1687 @item
1688 @emph{Any} pointer can be represented directly, and no bit masking
1689 operations are necessary.
1690 @end enumerate
1691
1692 The disadvantages are:
1693
1694 @enumerate
1695 @item
1696 An extra level of indirection is needed when accessing the object types
1697 that were not record types. So checking whether a Lisp object is a cons
1698 cell becomes a slower operation.
1699 @item
1700 Mark bits can no longer be stored directly in Lisp objects, so another
1701 place for them must be found. This means that a cons cell requires more
1702 memory than merely room for 2 lisp objects, leading to extra memory use.
1703 @end enumerate
1704
1705 Various macros are used to construct Lisp objects and extract the
1706 components. Macros of the form @code{XINT()}, @code{XCHAR()},
1707 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
1708 field and cast it to the appropriate type. All of the macros that
1709 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
1710 necessary. @code{XINT()} needs to be a bit tricky so that negative
1711 numbers are properly sign-extended: Usually it does this by shifting the
1712 number four bits to the left and then four bits to the right. This
1713 assumes that the right-shift operator does an arithmetic shift (i.e. it
1714 leaves the most-significant bit as-is rather than shifting in a zero, so
1715 that it mimics a divide-by-two even for negative numbers). Not all
1716 machines/compilers do this, and on the ones that don't, a more
1717 complicated definition is selected by defining
1718 @code{EXPLICIT_SIGN_EXTEND}.
1719
1720 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
1721 macros become more complicated---they check the tag bits and/or the 1662 macros become more complicated---they check the tag bits and/or the
1722 type field in the first four bytes of a record type to ensure that the 1663 type field in the first four bytes of a record type to ensure that the
1723 object is really of the correct type. This is great for catching places 1664 object is really of the correct type. This is great for catching places
1724 where an incorrect type is being dereferenced---this typically results 1665 where an incorrect type is being dereferenced---this typically results
1725 in a pointer being dereferenced as the wrong type of structure, with 1666 in a pointer being dereferenced as the wrong type of structure, with
1726 unpredictable (and sometimes not easily traceable) results. 1667 unpredictable (and sometimes not easily traceable) results.
1727 1668
1728 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp 1669 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1729 object. These macros are of the form @code{XSET@var{TYPE} 1670 object. These macros are of the form @code{XSET@var{TYPE}
1730 (@var{lvalue}, @var{result})}, 1671 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather
1731 i.e. they have to be a statement rather than just used in an expression. 1672 than just used in an expression. The reason for this is that standard C
1732 The reason for this is that standard C doesn't let you ``construct'' a 1673 doesn't let you ``construct'' a structure (but GCC does). Granted, this
1733 structure (but GCC does). Granted, this sometimes isn't too convenient; 1674 sometimes isn't too convenient; for the case of integers, at least, you
1734 for the case of integers, at least, you can use the function 1675 can use the function @code{make_int()}, which constructs and
1735 @code{make_int()}, which constructs and @emph{returns} an integer 1676 @emph{returns} an integer Lisp object. Note that the
1736 Lisp object. Note that the @code{XSET@var{TYPE}()} macros are also 1677 @code{XSET@var{TYPE}()} macros are also affected by
1737 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the 1678 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
1738 structure is of the right type in the case of record types, where the 1679 right type in the case of record types, where the type is contained in
1739 type is contained in the structure. 1680 the structure.
1740 1681
1741 The C programmer is responsible for @strong{guaranteeing} that a 1682 The C programmer is responsible for @strong{guaranteeing} that a
1742 Lisp_Object is is the correct type before using the @code{X@var{TYPE}} 1683 Lisp_Object is the correct type before using the @code{X@var{TYPE}}
1743 macros. This is especially important in the case of lists. Use 1684 macros. This is especially important in the case of lists. Use
1744 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, 1685 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1745 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not 1686 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not
1746 Lisp code. On the other hand, if XEmacs has an internal logic error, 1687 Lisp code. On the other hand, if XEmacs has an internal logic error,
1747 it's better to crash immediately, so sprinkle ``unreachable'' 1688 it's better to crash immediately, so sprinkle @code{assert()}s and
1748 @code{abort()}s liberally about the source code. 1689 ``unreachable'' @code{abort()}s liberally about the source code. Where
1690 performance is an issue, use @code{type_checking_assert},
1691 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
1692 nothing unless the corresponding configure error checking flag was
1693 specified.
1749 1694
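The difference between an unchecked accessor such as @code{XCAR} and a checked one such as @code{Fcar()} can be illustrated with a toy object model. This is a sketch, not XEmacs code: the @code{sketch_} names, the struct layout, and the @code{SKETCH_ERROR_CHECK_TYPECHECK} switch are all assumptions made for illustration, and the checked accessor reports a wrong type through a flag where real Lisp code would signal an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object model; not the XEmacs data structures. */
typedef enum { SKETCH_NIL, SKETCH_INT, SKETCH_CONS } sketch_type;

typedef struct sketch_cell {
  sketch_type type;
  long value;
  struct sketch_cell *car;
  struct sketch_cell *cdr;
} sketch_cell;

/* An assertion that compiles to nothing unless error checking is
   requested, in the spirit of type_checking_assert. */
#ifdef SKETCH_ERROR_CHECK_TYPECHECK
# define sketch_type_checking_assert(e) assert(e)
#else
# define sketch_type_checking_assert(e) ((void) 0)
#endif

/* Unchecked accessor: the caller guarantees that O is a cons. */
static sketch_cell *sketch_xcar(sketch_cell *o)
{
  sketch_type_checking_assert(o->type == SKETCH_CONS);
  return o->car;
}

/* Checked accessor for values that may come from Lisp code.
   Sets *WRONG_TYPE and returns NULL when O is neither nil nor a cons. */
static sketch_cell *sketch_checked_car(sketch_cell *o, int *wrong_type)
{
  *wrong_type = 0;
  if (o->type == SKETCH_NIL)
    return o;			/* the car of nil is nil */
  if (o->type == SKETCH_CONS)
    return sketch_xcar(o);	/* type already verified */
  *wrong_type = 1;
  return NULL;
}
```

The checked function validates once and then uses the unchecked accessor, which is the division of labor the paragraph above recommends.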
1750 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top 1695 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1751 @chapter Rules When Writing New C Code 1696 @chapter Rules When Writing New C Code
1752 1697
1753 The XEmacs C Code is extremely complex and intricate, and there are many 1698 The XEmacs C Code is extremely complex and intricate, and there are many
1763 * Adding Global Lisp Variables:: 1708 * Adding Global Lisp Variables::
1764 * Coding for Mule:: 1709 * Coding for Mule::
1765 * Techniques for XEmacs Developers:: 1710 * Techniques for XEmacs Developers::
1766 @end menu 1711 @end menu
1767 1712
1768 @node General Coding Rules 1713 @node General Coding Rules, Writing Lisp Primitives, Rules When Writing New C Code, Rules When Writing New C Code
1769 @section General Coding Rules 1714 @section General Coding Rules
1770 1715
1771 The C code is actually written in a dialect of C called @dfn{Clean C}, 1716 The C code is actually written in a dialect of C called @dfn{Clean C},
1772 meaning that it can be compiled, mostly warning-free, with either a C or 1717 meaning that it can be compiled, mostly warning-free, with either a C or
1773 C++ compiler. Coding in Clean C has several advantages over plain C. 1718 C++ compiler. Coding in Clean C has several advantages over plain C.
1797 must always be included before any other header files (including 1742 must always be included before any other header files (including
1798 system header files) to ensure that certain tricks played by various 1743 system header files) to ensure that certain tricks played by various
1799 @file{s/} and @file{m/} files work out correctly. 1744 @file{s/} and @file{m/} files work out correctly.
1800 1745
1801 When including header files, always use angle brackets, not double 1746 When including header files, always use angle brackets, not double
1802 quotes, except when the file to be included is in the same directory as 1747 quotes, except when the file to be included is always in the same
1803 the including file. If either file is a generated file, then that is 1748 directory as the including file. If either file is a generated file,
1804 not likely to be the case. In order to understand why we have this 1749 then that is not likely to be the case. In order to understand why we
1805 rule, imagine what happens when you do a build in the source directory 1750 have this rule, imagine what happens when you do a build in the source
1806 using @samp{./configure} and another build in another directory using 1751 directory using @samp{./configure} and another build in another
1807 @samp{../work/configure}. There will be two different @file{config.h} 1752 directory using @samp{../work/configure}. There will be two different
1808 files. Which one will be used if you @samp{#include "config.h"}? 1753 @file{config.h} files. Which one will be used if you @samp{#include
1754 "config.h"}?
1809 1755
1810 @strong{All global and static variables that are to be modifiable must 1756 @strong{All global and static variables that are to be modifiable must
1811 be declared uninitialized.} This means that you may not use the 1757 be declared uninitialized.} This means that you may not use the
1812 ``declare with initializer'' form for these variables, such as @code{int 1758 ``declare with initializer'' form for these variables, such as @code{int
1813 some_variable = 0;}. The reason for this has to do with some kludges 1759 some_variable = 0;}. The reason for this has to do with some kludges
1814 done during the dumping process: If possible, the initialized data 1760 done during the dumping process: If possible, the initialized data
1815 segment is re-mapped so that it becomes part of the (unmodifiable) code 1761 segment is re-mapped so that it becomes part of the (unmodifiable) code
1816 segment in the dumped executable. This allows this memory to be shared 1762 segment in the dumped executable. This allows this memory to be shared
1817 among multiple running XEmacs processes. XEmacs is careful to place as 1763 among multiple running XEmacs processes. XEmacs is careful to place as
1818 much constant data as possible into initialized variables (in 1764 much constant data as possible into initialized variables during the
1819 particular, into what's called the @dfn{pure space}---see below) during 1765 @file{temacs} phase.
1820 the @file{temacs} phase.
1821 1766
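As a minimal sketch of the rule about uninitialized variables, a module might declare its modifiable state without initializers and assign the starting values in an explicit initialization function. The module and all names below are hypothetical; where XEmacs actually calls such functions is not shown here.

```c
/* Hypothetical module; every name here is illustrative only. */

/* Modifiable state: declared WITHOUT an initializer. */
static int sketch_request_count;
static const char *sketch_mode_name;

/* Constant data may keep its initializer. */
static const int sketch_max_requests = 16;

/* Starting values are assigned at run time instead. */
void sketch_vars_init(void)
{
  sketch_request_count = 0;
  sketch_mode_name = "idle";
}

/* Count a request, saturating at the limit; returns the new count. */
int sketch_note_request(void)
{
  if (sketch_request_count < sketch_max_requests)
    sketch_request_count++;
  return sketch_request_count;
}

const char *sketch_current_mode(void)
{
  return sketch_mode_name;
}
```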
1822 @cindex copy-on-write 1767 @cindex copy-on-write
1823 @strong{Please note:} This kludge only works on a few systems nowadays, 1768 @strong{Please note:} This kludge only works on a few systems nowadays,
1824 and is rapidly becoming irrelevant because most modern operating systems 1769 and is rapidly becoming irrelevant because most modern operating systems
1825 provide @dfn{copy-on-write} semantics. All data is initially shared 1770 provide @dfn{copy-on-write} semantics. All data is initially shared
1849 1794
1850 The C source code makes heavy use of C preprocessor macros. One popular 1795 The C source code makes heavy use of C preprocessor macros. One popular
1851 macro style is: 1796 macro style is:
1852 1797
1853 @example 1798 @example
1854 #define FOO(var, value) do @{ \ 1799 #define FOO(var, value) do @{ \
1855 Lisp_Object FOO_value = (value); \ 1800 Lisp_Object FOO_value = (value); \
1856 ... /* compute using FOO_value */ \ 1801 ... /* compute using FOO_value */ \
1857 (var) = bar; \ 1802 (var) = bar; \
1858 @} while (0) 1803 @} while (0)
1859 @end example 1804 @end example
1878 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of 1823 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of
1879 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and 1824 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
1880 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some 1825 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some
1881 predicate. 1826 predicate.
1882 1827
1883 @node Writing Lisp Primitives 1828 @node Writing Lisp Primitives, Adding Global Lisp Variables, General Coding Rules, Rules When Writing New C Code
1884 @section Writing Lisp Primitives 1829 @section Writing Lisp Primitives
1885 1830
1886 Lisp primitives are Lisp functions implemented in C. The details of 1831 Lisp primitives are Lisp functions implemented in C. The details of
1887 interfacing the C function so that Lisp can call it are handled by a few 1832 interfacing the C function so that Lisp can call it are handled by a few
1888 C macros. The only way to really understand how to write new C code is 1833 C macros. The only way to really understand how to write new C code is
2122 2067
2123 @file{eval.c} is a very good file to look through for examples; 2068 @file{eval.c} is a very good file to look through for examples;
2124 @file{lisp.h} contains the definitions for important macros and 2069 @file{lisp.h} contains the definitions for important macros and
2125 functions. 2070 functions.
2126 2071
2127 @node Adding Global Lisp Variables 2072 @node Adding Global Lisp Variables, Coding for Mule, Writing Lisp Primitives, Rules When Writing New C Code
2128 @section Adding Global Lisp Variables 2073 @section Adding Global Lisp Variables
2129 2074
2130 Global variables whose names begin with @samp{Q} are constants whose 2075 Global variables whose names begin with @samp{Q} are constants whose
2131 value is a symbol of a particular name. The name of the variable should 2076 value is a symbol of a particular name. The name of the variable should
2132 be derived from the name of the symbol using the same rules as for Lisp 2077 be derived from the name of the symbol using the same rules as for Lisp
2184 garbage-collection mechanism won't know that the object in this variable 2129 garbage-collection mechanism won't know that the object in this variable
2185 is in use, and will happily collect it and reuse its storage for another 2130 is in use, and will happily collect it and reuse its storage for another
2186 Lisp object, and you will be the one who's unhappy when you can't figure 2131 Lisp object, and you will be the one who's unhappy when you can't figure
2187 out how your variable got overwritten. 2132 out how your variable got overwritten.
2188 2133
2189 @node Coding for Mule 2134 @node Coding for Mule, Techniques for XEmacs Developers, Adding Global Lisp Variables, Rules When Writing New C Code
2190 @section Coding for Mule 2135 @section Coding for Mule
2191 @cindex Coding for Mule 2136 @cindex Coding for Mule
2192 2137
2193 Although Mule support is not compiled by default in XEmacs, many people 2138 Although Mule support is not compiled by default in XEmacs, many people
2194 are using it, and we consider it crucial that new code works correctly 2139 are using it, and we consider it crucial that new code works correctly
2207 * Conversion to and from External Data:: 2152 * Conversion to and from External Data::
2208 * General Guidelines for Writing Mule-Aware Code:: 2153 * General Guidelines for Writing Mule-Aware Code::
2209 * An Example of Mule-Aware Code:: 2154 * An Example of Mule-Aware Code::
2210 @end menu 2155 @end menu
2211 2156
2212 @node Character-Related Data Types 2157 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule
2213 @subsection Character-Related Data Types 2158 @subsection Character-Related Data Types
2214 2159
2215 First, let's review the basic character-related datatypes used by 2160 First, let's review the basic character-related datatypes used by
2216 XEmacs. Note that the separate @code{typedef}s are not mandatory in the 2161 XEmacs. Note that the separate @code{typedef}s are not mandatory in the
2217 current implementation (all of them boil down to @code{unsigned char} or 2162 current implementation (all of them boil down to @code{unsigned char} or
2234 @item Bufbyte 2179 @item Bufbyte
2235 @cindex Bufbyte 2180 @cindex Bufbyte
2236 The data representing the text in a buffer or string is logically a set 2181 The data representing the text in a buffer or string is logically a set
2237 of @code{Bufbyte}s. 2182 of @code{Bufbyte}s.
2238 2183
2239 XEmacs does not work with character formats all the time; when reading 2184 XEmacs does not work with the same character formats all the time; when
2240 characters from the outside, it decodes them to an internal format, and 2185 reading characters from the outside, it decodes them to an internal
2241 likewise encodes them when writing. @code{Bufbyte} (in fact 2186 format, and likewise encodes them when writing. @code{Bufbyte} (in fact
2242 @code{unsigned char}) is the basic unit of the XEmacs internal buffer and 2187 @code{unsigned char}) is the basic unit of the XEmacs internal buffer and
2243 string format. 2188 string format. A @code{Bufbyte *} is the type that points at text
2189 encoded in the variable-width internal encoding.
2244 2190
2245 One character can correspond to one or more @code{Bufbyte}s. In the 2191 One character can correspond to one or more @code{Bufbyte}s. In the
2246 current implementation, an ASCII character is represented by the same 2192 current Mule implementation, an ASCII character is represented by the
2247 @code{Bufbyte}, and extended characters are represented by a sequence of 2193 same @code{Bufbyte}, and other characters are represented by a sequence
2248 @code{Bufbyte}s. 2194 of two or more @code{Bufbyte}s.
2249 2195
2250 Without Mule support, a @code{Bufbyte} is equivalent to an 2196 Without Mule support, there are exactly 256 characters, implicitly
2251 @code{Emchar}. 2197 Latin-1, and each character is represented using one @code{Bufbyte}, and
2198 there is a one-to-one correspondence between @code{Bufbyte}s and
2199 @code{Emchar}s.
2252 2200
2253 @item Bufpos 2201 @item Bufpos
2254 @itemx Charcount 2202 @itemx Charcount
2255 @cindex Bufpos 2203 @cindex Bufpos
2256 @cindex Charcount 2204 @cindex Charcount
2257 A @code{Bufpos} represents a character position in a buffer or string. 2205 A @code{Bufpos} represents a character position in a buffer or string.
2258 A @code{Charcount} represents a number (count) of characters. 2206 A @code{Charcount} represents a number (count) of characters.
2259 Logically, subtracting two @code{Bufpos} values yields a 2207 Logically, subtracting two @code{Bufpos} values yields a
2260 @code{Charcount} value. Although all of these are @code{typedef}ed to 2208 @code{Charcount} value. Although all of these are @code{typedef}ed to
2261 @code{int}, we use them in preference to @code{int} to make it clear 2209 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
2262 what sort of position is being used. 2210 it clear what sort of position is being used.
2263 2211
2264 @code{Bufpos} and @code{Charcount} values are the only ones that are 2212 @code{Bufpos} and @code{Charcount} values are the only ones that are
2265 ever visible to Lisp. 2213 ever visible to Lisp.
2266 2214
2267 @item Bytind 2215 @item Bytind
2268 @itemx Bytecount 2216 @itemx Bytecount
2269 @cindex Bytind 2217 @cindex Bytind
2270 @cindex Bytecount 2218 @cindex Bytecount
2271 A @code{Bytind} represents a byte position in a buffer or string. A 2219 A @code{Bytind} represents a byte position in a buffer or string. A
2272 @code{Bytecount} represents the distance between two positions in bytes. 2220 @code{Bytecount} represents the distance between two positions, in bytes.
2273 The relationship between @code{Bytind} and @code{Bytecount} is the same 2221 The relationship between @code{Bytind} and @code{Bytecount} is the same
2274 as the relationship between @code{Bufpos} and @code{Charcount}. 2222 as the relationship between @code{Bufpos} and @code{Charcount}.
2275 2223
2276 @item Extbyte 2224 @item Extbyte
2277 @itemx Extcount 2225 @itemx Extcount
2281 which are equivalent to @code{unsigned char}. Obviously, an 2229 which are equivalent to @code{unsigned char}. Obviously, an
2282 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes 2230 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes
2283 and Extcounts are not all that frequent in XEmacs code. 2231 and Extcounts are not all that frequent in XEmacs code.
2284 @end table 2232 @end table
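
To make the distinction between character quantities and byte quantities concrete, here is a minimal standalone sketch. It is not XEmacs code: the typedefs only mirror the names used above, and the toy encoding rule (a byte below 0xA0 begins a character, a byte of 0xA0 or above continues one) is an assumption made for illustration. The helpers @code{toy_bytecount_to_charcount} and @code{toy_charcount_to_bytecount} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; the real typedefs live in the XEmacs headers. */
typedef unsigned char Bufbyte;
typedef long Charcount;   /* a number of characters */
typedef long Bytecount;   /* a number of bytes */

/* ASSUMPTION (toy rule): a byte below 0xA0 starts a character,
   a byte of 0xA0 or above continues the previous one. */
static int
toy_byte_starts_char_p (Bufbyte b)
{
  return b < 0xA0;
}

/* Count the characters in the first LEN bytes at PTR. */
static Charcount
toy_bytecount_to_charcount (const Bufbyte *ptr, Bytecount len)
{
  Charcount count = 0;
  Bytecount i;

  assert (len >= 0);
  for (i = 0; i < len; i++)
    if (toy_byte_starts_char_p (ptr[i]))
      count++;
  return count;
}

/* Return how many bytes the first CC characters at PTR occupy,
   given that the text is LEN bytes long. */
static Bytecount
toy_charcount_to_bytecount (const Bufbyte *ptr, Bytecount len, Charcount cc)
{
  Bytecount i = 0;

  assert (cc >= 0);
  while (cc > 0 && i < len)
    {
      i++;                                   /* consume the leading byte */
      while (i < len && !toy_byte_starts_char_p (ptr[i]))
        i++;                                 /* consume continuation bytes */
      cc--;
    }
  return i;
}
```

For a four-byte text holding one ASCII character, one two-byte character, and another ASCII character, the character count is 3 while the byte count is 4. This is why the two kinds of quantity get separate type names even though they share an underlying integer type.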
2285 2233
2286 @node Working With Character and Byte Positions 2234 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule
2287 @subsection Working With Character and Byte Positions 2235 @subsection Working With Character and Byte Positions
2288 2236
2289 Now that we have defined the basic character-related types, we can look 2237 Now that we have defined the basic character-related types, we can look
2290 at the macros and functions designed for work with them and for 2238 at the macros and functions designed for work with them and for
2291 conversion between them. Most of these macros are defined in 2239 conversion between them. Most of these macros are defined in
2294 learn about them. 2242 learn about them.
2295 2243
2296 @table @code 2244 @table @code
2297 @item MAX_EMCHAR_LEN 2245 @item MAX_EMCHAR_LEN
2298 @cindex MAX_EMCHAR_LEN 2246 @cindex MAX_EMCHAR_LEN
2299 This preprocessor constant is the maximum number of buffer bytes per 2247 This preprocessor constant is the maximum number of buffer bytes to
2300 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful 2248 represent an Emacs character in the variable width internal encoding.
2301 when allocating temporary strings to keep a known number of characters. 2249 It is useful when allocating temporary strings to keep a known number of
2302 For instance: 2250 characters. For instance:
2303 2251
2304 @example 2252 @example
2305 @group 2253 @group
2306 @{ 2254 @{
2307 Charcount cclen; 2255 Charcount cclen;
2405 @example 2353 @example
2406 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); 2354 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2407 @end example 2355 @end example
2408 @end table 2356 @end table
2409 2357
2410 @node Conversion to and from External Data 2358 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule
2411 @subsection Conversion to and from External Data 2359 @subsection Conversion to and from External Data
2412 2360
2413 When an external function, such as a C library function, returns a 2361 When an external function, such as a C library function, returns a
2414 @code{char} pointer, you should almost never treat it as @code{Bufbyte}. 2362 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
2415 This is because these returned strings may contain 8bit characters which 2363 This is because these returned strings may contain 8bit characters which
2418 always convert it to an appropriate external encoding, lest the internal 2366 always convert it to an appropriate external encoding, lest the internal
2419 stuff (such as the infamous \201 characters) leak out. 2367 stuff (such as the infamous \201 characters) leak out.
2420 2368
2421 The interface to conversion between the internal and external 2369 The interface to conversion between the internal and external
2422 representations of text consists of the numerous conversion macros defined in 2370 representations of text consists of the numerous conversion macros defined in
2423 @file{buffer.h}. Before looking at them, we'll look at the external 2371 @file{buffer.h}. There used to be a fixed set of external formats
2424 formats supported by these macros. 2372 supported by these macros, but now any coding system can be used with
2425 2373 these macros. The coding system alias mechanism is used to create the
2426 Currently meaningful formats are @code{FORMAT_BINARY}, 2374 following logical coding systems, which replace the fixed external
2427 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. Here 2375 formats. The (dontusethis-set-symbol-value-handler) mechanism was
2428 is a description of these. 2376 enhanced to make this possible (more work on that is needed, such as
2377 removing the @code{dontusethis-} prefix).
2429 2378
2430 @table @code 2379 @table @code
2431 @item FORMAT_BINARY 2380 @item Qbinary
2432 Binary format. This is the simplest format and is what we use in the 2381 This is the simplest format and is what we use in the absence of a more
2433 absence of a more appropriate format. This converts according to the 2382 appropriate format. This converts according to the @code{binary} coding
2434 @code{binary} coding system: 2383 system:
2435 2384
2436 @enumerate a 2385 @enumerate a
2437 @item 2386 @item
2438 On input, bytes 0--255 are converted into characters 0--255. 2387 On input, bytes 0--255 are converted into (implicitly Latin-1)
2388 characters 0--255. A non-Mule XEmacs doesn't really know about
2389 different character sets and the fonts to display them, so the bytes can
2390 be treated as text in different 1-byte encodings by simply setting the
2391 appropriate fonts. So in a sense, non-Mule XEmacs is a multi-lingual
2392 editor if, for example, different fonts are used to display text in
2393 different buffers, faces, or windows. The specifier mechanism gives the
2394 user complete control over this kind of behavior.
2439 @item 2395 @item
2440 On output, characters 0--255 are converted into bytes 0--255 and other 2396 On output, characters 0--255 are converted into bytes 0--255 and other
2441 characters are converted into `X'. 2397 characters are converted into `~'.
2442 @end enumerate 2398 @end enumerate
2443 2399
2444 @item FORMAT_FILENAME 2400 @item Qfile_name
2445 Format used for filenames. In the original Mule, this is user-definable 2401 Format used for filenames. This is user-definable via either the
2446 with the @code{pathname-coding-system} variable. For the moment, we 2402 @code{file-name-coding-system} or @code{pathname-coding-system} (now
2447 just use the @code{binary} coding system. 2403 obsolete) variables.
2448 2404
2449 @item FORMAT_OS 2405 @item Qnative
2450 Format used for the external Unix environment---@code{argv[]}, stuff 2406 Format used for the external Unix environment---@code{argv[]}, stuff
2451 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. 2407 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2452 2408 Currently this is the same as Qfile_name. The two should be
2453 Perhaps should be the same as FORMAT_FILENAME. 2409 distinguished for clarity and possible future separation.
2454 2410
2455 @item FORMAT_CTEXT 2411 @item Qctext
2456 Compound--text format. This is the standard X format used for data 2412 Compound--text format. This is the standard X11 format used for data
2457 stored in properties, selections, and the like. This is an 8-bit 2413 stored in properties, selections, and the like. This is an 8-bit
2458 no-lock-shift ISO2022 coding system. 2414 no-lock-shift ISO2022 coding system. This is a real coding system,
2415 unlike Qfile_name, which is user-definable.
2459 @end table 2416 @end table
2460 2417
2461 The macros to convert between these formats and the internal format, and 2418 There are two fundamental macros to convert between external and
2462 vice versa, follow. 2419 internal format.
2420
2421 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
2422 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
2423 each of these receives are a source type, a source, a sink type, a sink,
2424 and a coding system (or a symbol naming a coding system).
2425
2426 A typical call looks like
2427 @example
2428 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
2429 @end example
2430
2431 which means that the contents of the lisp string @code{str} are written
2432 to a malloc'ed memory area which will be pointed to by @code{ptr}, after
2433 the function returns. The conversion will be done using the
2434 @code{file-name} coding system, which will be controlled by the user
2435 indirectly by setting or binding the variable
2436 @code{file-name-coding-system}.
2437
2438 Some sources and sinks require two C variables to specify. We use some
2439 preprocessor magic to allow different source and sink types, and even
2440 different numbers of arguments to specify different types of sources and
2441 sinks.
2442
2443 So we can have a call that looks like
2444 @example
2445 TO_INTERNAL_FORMAT (DATA, (ptr, len),
2446 MALLOC, (ptr, len),
2447 coding_system);
2448 @end example
2449
2450 The parenthesized argument pairs are required to make the preprocessor
2451 magic work.
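
One way such parenthesized pairs can be taken apart is sketched below. This is a standalone illustration and not the actual definitions in @file{buffer.h}: every @code{TOY_} name and the function @code{toy_total_length} are hypothetical. The source type is pasted into the name of a helper macro, and placing a macro name directly in front of a @code{(ptr, len)} argument turns the pair into an ordinary two-argument macro call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Selectors applied to a parenthesized pair: TOY_PAIR_LEN (p, n)
   is an ordinary macro call that yields n. */
#define TOY_PAIR_PTR(ptr, len) (ptr)
#define TOY_PAIR_LEN(ptr, len) (len)

/* Per-source-type helpers.  A DATA source is a (ptr, len) pair;
   a C_STRING source is a single pointer. */
#define TOY_SOURCE_DATA_PTR(src)      TOY_PAIR_PTR src
#define TOY_SOURCE_DATA_LEN(src)      ((size_t) (TOY_PAIR_LEN src))
#define TOY_SOURCE_C_STRING_PTR(src)  (src)
#define TOY_SOURCE_C_STRING_LEN(src)  (strlen (src) + 1)

/* Dispatch on the source type by token pasting. */
#define TOY_SOURCE_PTR(type, src)  TOY_SOURCE_##type##_PTR (src)
#define TOY_SOURCE_LEN(type, src)  TOY_SOURCE_##type##_LEN (src)

/* Example use: total number of bytes described by one DATA source
   and one C_STRING source. */
static size_t
toy_total_length (const char *data, size_t data_len, const char *c_string)
{
  assert (data != NULL && c_string != NULL);
  return TOY_SOURCE_LEN (DATA, (data, data_len))
    + TOY_SOURCE_LEN (C_STRING, c_string);
}
```

The inner parentheses protect the comma, so the pair travels through the outer macro as a single argument and is only split once it reaches the selector.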
2452
2453 Here are the different source and sink types:
2463 2454
2464 @table @code 2455 @table @code
2465 @item GET_CHARPTR_INT_DATA_ALLOCA 2456 @item @code{DATA, (ptr, len),}
2466 @itemx GET_CHARPTR_EXT_DATA_ALLOCA 2457 input data is a fixed buffer of size @var{len} at address @var{ptr}
2467 These two are the most basic conversion macros. 2458 @item @code{ALLOCA, (ptr, len),}
2468 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal 2459 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr}
2469 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way 2460 @item @code{MALLOC, (ptr, len),}
2470 around. The arguments each of these receives are @var{ptr} (pointer to 2461 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr}
2471 the text in external format), @var{len} (length of texts in bytes), 2462 @item @code{C_STRING_ALLOCA, ptr,}
2472 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which 2463 equivalent to @code{ALLOCA (ptr, len_ignored)} on output.
2473 new text should be copied), and @var{len_out} (lvalue which will be 2464 @item @code{C_STRING_MALLOC, ptr,}
2474 assigned the length of the internal text in bytes). The resulting text 2465 equivalent to @code{MALLOC (ptr, len_ignored)} on output
2475 is stored to a stack-allocated buffer. If the text doesn't need 2466 @item @code{C_STRING, ptr,}
2476 changing, these macros will do nothing, except for setting 2467 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
2477 @var{len_out}. 2468 @item @code{LISP_STRING, string,}
2478 2469 input or output is a Lisp_Object of type string
2479 The macros above take many arguments which makes them unwieldy. For 2470 @item @code{LISP_BUFFER, buffer,}
2480 this reason, a number of convenience macros are defined with obvious 2471 output is written to @code{(point)} in lisp buffer @var{buffer}
2481 functionality, but accepting less arguments. The general rule is that 2472 @item @code{LISP_LSTREAM, lstream,}
2482 macros with @samp{INT} in their name convert text to internal Emacs 2473 input or output is a Lisp_Object of type lstream
2483 representation, whereas the @samp{EXT} macros convert to external 2474 @item @code{LISP_OPAQUE, object,}
2484 representation. 2475 input or output is a Lisp_Object of type opaque
2485
2486 @item GET_C_CHARPTR_INT_DATA_ALLOCA
2487 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
2488 As their names imply, these macros work on C char pointers, which are
2489 zero-terminated, and thus do not need @var{len} or @var{len_out}
2490 parameters.
2491
2492 @item GET_STRING_EXT_DATA_ALLOCA
2493 @itemx GET_C_STRING_EXT_DATA_ALLOCA
2494 These two macros convert a Lisp string into an external representation.
2495 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
2496 stores its output to a generic string, providing @var{len_out}, the
2497 length of the resulting external string. On the other hand,
2498 @code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
2499 satisfied with output string being zero-terminated.
2500
2501 Note that for Lisp strings only one conversion direction makes sense.
2502
2503 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
2504 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
2505 @itemx GET_STRING_BINARY_DATA_ALLOCA
2506 @itemx GET_C_STRING_BINARY_DATA_ALLOCA
2507 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
2508 @itemx ...
2509 These macros convert internal text to a specific external
2510 representation, with the external format being encoded into the name of
2511 the macro. Note that the @code{GET_STRING_...} and
2512 @code{GET_C_STRING...} macros lack the @samp{EXT} tag, because they
2513 only make sense in that direction.
2514
2515 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
2516 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
2517 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
2518 @itemx ...
2519 These macros convert external text of a specific format to its internal
2520 representation, with the external format being incoded into the name of
2521 the macro.
2522 @end table 2476 @end table
2523 2477
2524 @node General Guidelines for Writing Mule-Aware Code 2478 Often, the data is being converted to a '\0'-byte-terminated string,
2479 which is the format required by many external system C APIs. For these
2480 purposes, a source type of @code{C_STRING} or a sink type of
2481 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
2482 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
2483 using (ptr, len) pairs.
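
The difference between the two conventions shows up as soon as the data contains an embedded '\0' byte. The following standalone sketch uses only the standard C library; @code{toy_count_byte}, @code{toy_data} and @code{toy_data_len} are hypothetical names, not XEmacs identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count occurrences of BYTE in the LEN bytes at PTR.  Because the
   length is passed explicitly, embedded '\0' bytes are ordinary data. */
static size_t
toy_count_byte (const char *ptr, size_t len, char byte)
{
  size_t i, count = 0;

  assert (ptr != NULL || len == 0);
  for (i = 0; i < len; i++)
    if (ptr[i] == byte)
      count++;
  return count;
}

/* Five bytes of data with a '\0' in the middle, plus the terminating
   '\0' that the string literal appends. */
static const char toy_data[] = "ab\0cd";
static const size_t toy_data_len = sizeof (toy_data) - 1;
```

Here @code{strlen (toy_data)} stops at the embedded '\0' and reports 2, while the @code{(toy_data, toy_data_len)} pair still covers all 5 bytes.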
2484
2485 The sinks to be specified must be lvalues, unless they are the lisp
2486 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
2487
2488 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
2489 resulting text is stored in a stack-allocated buffer, which is
2490 automatically freed on returning from the function. However, the sink
2491 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
2492 memory. The caller is responsible for freeing this memory using
2493 @code{xfree()}.
2494
2495 Note that it doesn't make sense for @code{LISP_STRING} to be a source
2496 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
2497 You'll get an assertion failure if you try.
2498
2499
2500 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
2525 @subsection General Guidelines for Writing Mule-Aware Code 2501 @subsection General Guidelines for Writing Mule-Aware Code
2526 2502
2527 This section contains some general guidance on how to write Mule-aware 2503 This section contains some general guidance on how to write Mule-aware
2528 code, as well as some pitfalls you should avoid. 2504 code, as well as some pitfalls you should avoid.
2529 2505
2546 It is extremely important to always convert external data, because 2522 It is extremely important to always convert external data, because
2547 XEmacs can crash if unexpected 8bit sequences are copied to its internal 2523 XEmacs can crash if unexpected 8bit sequences are copied to its internal
2548 buffers literally. 2524 buffers literally.
2549 2525
2550 This means that when a system function, such as @code{readdir}, returns 2526 This means that when a system function, such as @code{readdir}, returns
2551 a string, you need to convert it using one of the conversion macros 2527 a string, you may need to convert it using one of the conversion macros
2552 described in the previous chapter, before passing it further to Lisp. 2528 described in the previous chapter, before passing it further to Lisp.
2553 In the case of @code{readdir}, you would use the 2529
2554 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro. 2530 Actually, most of the basic system functions that accept '\0'-terminated
2531 string arguments, like @code{stat()} and @code{open()}, have been
2532 @strong{encapsulated} so that they @code{always} do internal to
2533 external conversion themselves. This means you must pass internally
2534 encoded data, typically the @code{XSTRING_DATA} of a Lisp_String, to
2535 these functions. This is actually a design bug, since it unexpectedly
2536 changes the semantics of the system functions. A better design would be
2537 to provide separate versions of these system functions that accepted
2538 Lisp_Objects which were lisp strings in place of their current
2539 @code{char *} arguments.
2540
2541 @example
2542 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
2543 @end example
2555 2544
2556 Also note that many internal functions, such as @code{make_string}, 2545 Also note that many internal functions, such as @code{make_string},
2557 accept Bufbytes, which removes the need for them to convert the data 2546 accept Bufbytes, which removes the need for them to convert the data
2558 they receive. This increases efficiency because that way external data 2547 they receive. This increases efficiency because that way external data
2559 needs to be decoded only once, when it is read. After that, it is 2548 needs to be decoded only once, when it is read. After that, it is
2560 passed around in internal format. 2549 passed around in internal format.
2561 @end table 2550 @end table
2562 2551
2563 @node An Example of Mule-Aware Code 2552 @node An Example of Mule-Aware Code, , General Guidelines for Writing Mule-Aware Code, Coding for Mule
2564 @subsection An Example of Mule-Aware Code 2553 @subsection An Example of Mule-Aware Code
2565 2554
2566 As an example of Mule-aware code, we shall will analyze the 2555 As an example of Mule-aware code, we will analyze the @code{string}
2567 @code{string} function, which conses up a Lisp string from the character 2556 function, which conses up a Lisp string from the character arguments it
2568 arguments it receives. Here is the definition, pasted from 2557 receives. Here is the definition, pasted from @code{alloc.c}:
2569 @code{alloc.c}:
2570 2558
2571 @example 2559 @example
2572 @group 2560 @group
2573 DEFUN ("string", Fstring, 0, MANY, 0, /* 2561 DEFUN ("string", Fstring, 0, MANY, 0, /*
2574 Concatenate all the argument characters and make the result a string. 2562 Concatenate all the argument characters and make the result a string.
2609 over the XEmacs code. For starters, I recommend 2597 over the XEmacs code. For starters, I recommend
2610 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have 2598 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
2611 understood this section of the manual and studied the examples, you can 2599 understood this section of the manual and studied the examples, you can
2612 proceed writing new Mule-aware code. 2600 proceed writing new Mule-aware code.
2613 2601
2614 @node Techniques for XEmacs Developers 2602 @node Techniques for XEmacs Developers, , Coding for Mule, Rules When Writing New C Code
2615 @section Techniques for XEmacs Developers 2603 @section Techniques for XEmacs Developers
2616 2604
2605 To make a purified XEmacs, do: @code{make puremacs}.
2617 To make a quantified XEmacs, do: @code{make quantmacs}. 2606 To make a quantified XEmacs, do: @code{make quantmacs}.
2618 2607
2619 You simply can't dump Quantified and Purified images. Run the image 2608 You simply can't dump Quantified and Purified images (unless using the
2620 like so: @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}. 2609 portable dumper). Purify gets confused when XEmacs frees memory in one
2610 process that was allocated in a @emph{different} process on a different
2611 machine! Run it like so:
2612 @example
2613 temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
2614 @end example
2621 2615
2622 Before you go through the trouble, are you compiling with all 2616 Before you go through the trouble, are you compiling with all
2623 debugging and error-checking off? If not try that first. Be warned 2617 debugging and error-checking off? If not, try that first. Be warned
2624 that while Quantify is directly responsible for quite a few 2618 that while Quantify is directly responsible for quite a few
2625 optimizations which have been made to XEmacs, doing a run which 2619 optimizations which have been made to XEmacs, doing a run which
2626 generates results which can be acted upon is not necessarily a trivial 2620 generates results which can be acted upon is not necessarily a trivial
2627 task. 2621 task.
2628 2622
2657 2651
2658 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function 2652 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function
2659 calls in elisp are especially expensive. Iterating over a long list is 2653 calls in elisp are especially expensive. Iterating over a long list is
2660 going to be 30 times faster implemented in C than in Elisp. 2654 going to be 30 times faster implemented in C than in Elisp.
2661 2655
2656 Heavily used small code fragments need to be fast. The traditional way
2657 to implement such code fragments in C is with macros. But macros in C
2658 are known to be broken.
2659
2660 Macro arguments that are repeatedly evaluated may suffer from repeated
2661 side effects or suboptimal performance.
2662
2663 Variable names used in macros may collide with caller's variables,
2664 causing (at least) unwanted compiler warnings.
2665
2666 In order to solve these problems, and maintain statement semantics, one
2667 should use the @code{do @{ ... @} while (0)} trick while trying to
2668 reference macro arguments exactly once using local variables.
2669
2670 Let's take a look at this poor macro definition:
2671
2672 @example
2673 #define MARK_OBJECT(obj) \
2674 if (!marked_p (obj)) mark_object (obj), did_mark = 1
2675 @end example
2676
2677 This macro evaluates its argument twice, and also fails if used like this:
2678 @example
2679 if (flag) MARK_OBJECT (obj); else do_something();
2680 @end example
2681
2682 A much better definition is
2683
2684 @example
2685 #define MARK_OBJECT(obj) do @{ \
2686 Lisp_Object mo_obj = (obj); \
2687 if (!marked_p (mo_obj)) \
2688 @{ \
2689 mark_object (mo_obj); \
2690 did_mark = 1; \
2691 @} \
2692 @} while (0)
2693 @end example
2694
2695 Notice the elimination of double evaluation by using the local variable
2696 with the obscure name. Writing safe and efficient macros requires great
2697 care. The one problem with macros that cannot be portably worked around
2698 is, since a C block has no value, a macro used as an expression rather
2699 than a statement cannot use the techniques just described to avoid
2700 multiple evaluation.
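
The effect of double evaluation can be demonstrated with a small standalone sketch. The names below (@code{TOY_MARK_POOR}, @code{TOY_MARK_BETTER}, @code{toy_marked}, @code{toy_next}) are hypothetical, and plain @code{int} slots stand in for Lisp objects.

```c
#include <assert.h>

#define TOY_SLOTS 8

static int toy_marked[TOY_SLOTS];   /* 1 if the slot has been marked */
static int toy_did_mark;            /* set when any slot gets marked */
static int toy_calls;               /* how often toy_next() has run */

/* An argument expression with a side effect: each call returns the
   next slot index. */
static int
toy_next (void)
{
  assert (toy_calls < TOY_SLOTS);
  return toy_calls++;
}

/* Poor style: SLOT is evaluated twice when the slot is unmarked. */
#define TOY_MARK_POOR(slot) \
  if (!toy_marked[slot]) toy_marked[slot] = 1, toy_did_mark = 1

/* Better style: SLOT is evaluated exactly once, and the macro behaves
   like a single statement. */
#define TOY_MARK_BETTER(slot) do {      \
  int tmb_slot = (slot);                \
  if (!toy_marked[tmb_slot])            \
    {                                   \
      toy_marked[tmb_slot] = 1;         \
      toy_did_mark = 1;                 \
    }                                   \
} while (0)
```

Starting from a zeroed state, @code{TOY_MARK_POOR (toy_next ());} runs @code{toy_next} twice, so it tests slot 0 but marks slot 1. @code{TOY_MARK_BETTER (toy_next ());} runs it once and marks the slot it tested.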
2701
2702 In most cases where a macro has function semantics, an inline function
2703 is a better implementation technique. Modern compiler optimizers tend
2704 to inline functions even if they have no @code{inline} keyword, and
2705 configure magic ensures that the @code{inline} keyword can be safely
2706 used as an additional compiler hint. Inline functions used in a single
2707 .c file are easy. The function must already be defined to be
2708 @code{static}. Just add another @code{inline} keyword to the
2709 definition.
2710
2711 @example
2712 inline static int
2713 heavily_used_small_function (int arg)
2714 @{
2715 ...
2716 @}
2717 @end example
2718
2719 Inline functions in header files are trickier, because we would like to
2720 make the following optimization if the function is @emph{not} inlined
2721 (for example, because we're compiling for debugging). We would like the
2722 function to be defined externally exactly once, and each calling
2723 translation unit would create an external reference to the function,
2724 instead of including a definition of the inline function in the object
2725 code of every translation unit that uses it. This optimization is
2726 currently only available for gcc. But you don't have to worry about the
2727 trickiness; just define your inline functions in header files using this
2728 pattern:
2729
2730 @example
2731 INLINE_HEADER int
2732 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
2733 INLINE_HEADER int
2734 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
2735 @{
2736 ...
2737 @}
2738 @end example
2739
2740 The declaration right before the definition is to prevent warnings when
2741 compiling with @code{gcc -Wmissing-declarations}. I consider issuing
2742 this warning for inline functions a gcc bug, but the gcc maintainers disagree.
2743
2744 Every header which contains inline functions, either directly by using
2745 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
2746 be added to @file{inline.c}'s includes to make the optimization
2747 described above work. (Optimization note: if all INLINE_HEADER
2748 functions are in fact inlined in all translation units, then the linker
2749 can just discard @code{inline.o}, since it contains only unreferenced code).
2750
2662 To get started debugging XEmacs, take a look at the @file{.gdbinit} and 2751 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
2663 @file{.dbxrc} files in the @file{src} directory. 2752 @file{.dbxrc} files in the @file{src} directory. See the section in the
2664 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,, 2753 XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
2665 xemacs-faq, XEmacs FAQ}.
2666 2754
2667 After making source code changes, run @code{make check} to ensure that 2755 After making source code changes, run @code{make check} to ensure that
2668 you haven't introduced any regressions. If you're feeling ambitious, 2756 you haven't introduced any regressions. If you want to make XEmacs more
2669 you can try to improve the test suite in @file{tests/automated}. 2757 reliable, please improve the test suite in @file{tests/automated}.
2758
2759 Did you make sure you didn't introduce any new compiler warnings?
2760
2761 Before submitting a patch, please try compiling at least once with
2762
2763 @example
2764 configure --with-mule --with-union-type --error-checking=all
2765 @end example
2670 2766
2671 Here are things to know when you create a new source file: 2767 Here are things to know when you create a new source file:
2672 2768
2673 @itemize @bullet 2769 @itemize @bullet
2674 @item 2770 @item
2677 2773
2678 @item 2774 @item
2679 Generated header files should be included using the @code{#include <...>} syntax, 2775 Generated header files should be included using the @code{#include <...>} syntax,
2680 not the @code{#include "..."} syntax. The generated headers are: 2776 not the @code{#include "..."} syntax. The generated headers are:
2681 2777
2682 @file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h} 2778 @file{config.h sheap-adjust.h paths.h Emacs.ad.h}
2683 2779
2684 The basic rule is that you should assume builds using @code{--srcdir} 2780 The basic rule is that you should assume builds using @code{--srcdir}
2685 and the @code{#include <...>} syntax needs to be used when the 2781 and the @code{#include <...>} syntax needs to be used when the
2686 to-be-included generated file is in a potentially different directory 2782 to-be-included generated file is in a potentially different directory
2687 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."} 2783 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."}
2691 @item 2787 @item
2692 Header files should @emph{not} include @code{<config.h>} and 2788 Header files should @emph{not} include @code{<config.h>} and
2693 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that 2789 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that
2694 use it to do so. 2790 use it to do so.
2695 2791
2696 @item
2697 If the header uses @code{INLINE}, either directly or through
2698 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
2699 includes.
2700
2701 @item
2702 Try compiling at least once with
2703
2704 @example
2705 gcc --with-mule --with-union-type --error-checking=all
2706 @end example
2707
2708 @item
2709 Did I mention that you should run the test suite?
2710 @example
2711 make check
2712 @end example
2713 @end itemize 2792 @end itemize
2714 2793
2794 Here is a checklist of things to do when creating a new lisp object type
2795 named @var{foo}:
2796
2797 @enumerate
2798 @item
2799 create @var{foo}.h
2800 @item
2801 create @var{foo}.c
2802 @item
2803 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
2804 @item
2805 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
2806 @item
2807 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
2808 @item
2809 add definitions of macros like @code{CHECK_@var{FOO}} and
2810 @code{@var{FOO}P} to @file{@var{foo}.h}
2811 @item
2812 add the new type index to @code{enum lrecord_type}
2813 @item
2814 add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
2815 @item
2816 add an @code{INIT_LRECORD_IMPLEMENTATION} call to @code{syms_of_@var{foo}}
2817 @end enumerate
2715 2818
2716 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top 2819 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2717 @chapter A Summary of the Various XEmacs Modules 2820 @chapter A Summary of the Various XEmacs Modules
2718 2821
2719 This is accurate as of XEmacs 20.0. 2822 This is accurate as of XEmacs 20.0.
2731 * Modules for Interfacing with the Operating System:: 2834 * Modules for Interfacing with the Operating System::
2732 * Modules for Interfacing with X Windows:: 2835 * Modules for Interfacing with X Windows::
2733 * Modules for Internationalization:: 2836 * Modules for Internationalization::
2734 @end menu 2837 @end menu
2735 2838
2736 @node Low-Level Modules 2839 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, A Summary of the Various XEmacs Modules
2737 @section Low-Level Modules 2840 @section Low-Level Modules
2738 2841
2739 @example 2842 @example
2740 config.h 2843 config.h
2741 @end example 2844 @end example
2805 chosen by @file{configure}. 2908 chosen by @file{configure}.
2806 2909
2807 2910
2808 2911
2809 @example 2912 @example
2810 crt0.c 2913 ecrt0.c
2811 lastfile.c 2914 lastfile.c
2812 pre-crt0.c 2915 pre-crt0.c
2813 @end example 2916 @end example
2814 2917
2815 These modules are used in conjunction with the dump mechanism. On some 2918 These modules are used in conjunction with the dump mechanism. On some
2940 provided by the @samp{--error-check-*} configuration options. 3043 provided by the @samp{--error-check-*} configuration options.
2941 3044
2942 3045
2943 3046
2944 @example 3047 @example
2945 prefix-args.c
2946 @end example
2947
2948 This is actually the source for a small, self-contained program
2949 used during building.
2950
2951
2952 @example
2953 universe.h 3048 universe.h
2954 @end example 3049 @end example
2955 3050
2956 This is not currently used. 3051 This is not currently used.
2957 3052
2958 3053
2959 3054
2960 @node Basic Lisp Modules 3055 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, A Summary of the Various XEmacs Modules
2961 @section Basic Lisp Modules 3056 @section Basic Lisp Modules
2962 3057
2963 @example 3058 @example
2964 emacsfns.h
2965 lisp-disunion.h 3059 lisp-disunion.h
2966 lisp-union.h 3060 lisp-union.h
2967 lisp.h 3061 lisp.h
2968 lrecord.h 3062 lrecord.h
2969 symsinit.h 3063 symsinit.h
3008 3102
3009 3103
3010 3104
3011 @example 3105 @example
3012 alloc.c 3106 alloc.c
3013 pure.c
3014 puresize.h
3015 @end example 3107 @end example
3016 3108
3017 The large module @file{alloc.c} implements all of the basic allocation and 3109 The large module @file{alloc.c} implements all of the basic allocation and
3018 garbage collection for Lisp objects. The most commonly used Lisp 3110 garbage collection for Lisp objects. The most commonly used Lisp
3019 objects are allocated in chunks, similar to the Blocktype data type 3111 objects are allocated in chunks, similar to the Blocktype data type
3035 code, adding a new subtype within a subsystem will in general not 3127 code, adding a new subtype within a subsystem will in general not
3036 require changes to the generic subsystem code or affect any of the other 3128 require changes to the generic subsystem code or affect any of the other
3037 subtypes in the subsystem; this provides a great deal of robustness to 3129 subtypes in the subsystem; this provides a great deal of robustness to
3038 the XEmacs code. 3130 the XEmacs code.
3039 3131
3040 @cindex pure space
3041 @file{pure.c} contains the declaration of the @dfn{purespace} array.
3042 Pure space is a hack used to place some constant Lisp data into the code
3043 segment of the XEmacs executable, even though the data needs to be
3044 initialized through function calls. (See above in section VIII for more
3045 info about this.) During startup, certain sorts of data is
3046 automatically copied into pure space, and other data is copied manually
3047 in some of the basic Lisp files by calling the function @code{purecopy},
3048 which copies the object if possible (this only works in temacs, of
3049 course) and returns the new object. In particular, while temacs is
3050 executing, the Lisp reader automatically copies all compiled-function
3051 objects that it reads into pure space. Since compiled-function objects
3052 are large, are never modified, and typically comprise the majority of
3053 the contents of a compiled-Lisp file, this works well. While XEmacs is
3054 running, any attempt to modify an object that resides in pure space
3055 causes an error. Objects in pure space are never garbage collected --
3056 almost all of the time, they're intended to be permanent, and in any
3057 case you can't write into pure space to set the mark bits.
3058
3059 @file{puresize.h} contains the declaration of the size of the pure space
3060 array. This depends on the optional features that are compiled in, any
3061 extra purespace requested by the user at compile time, and certain other
3062 factors (e.g. 64-bit machines need more pure space because their Lisp
3063 objects are larger). The smallest size that suffices should be used, so
3064 that there's no wasted space. If there's not enough pure space, you
3065 will get an error during the build process, specifying how much more
3066 pure space is needed.
3067
3068
3069 3132
3070 @example 3133 @example
3071 eval.c 3134 eval.c
3072 backtrace.h 3135 backtrace.h
3073 @end example 3136 @end example
3161 structures. Note that the byte-code @emph{compiler} is written in Lisp. 3224 structures. Note that the byte-code @emph{compiler} is written in Lisp.
3162 3225
3163 3226
3164 3227
3165 3228
3166 @node Modules for Standard Editing Operations 3229 @node Modules for Standard Editing Operations, Editor-Level Control Flow Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules
3167 @section Modules for Standard Editing Operations 3230 @section Modules for Standard Editing Operations
3168 3231
3169 @example 3232 @example
3170 buffer.c 3233 buffer.c
3171 buffer.h 3234 buffer.h
3331 This module implements the undo mechanism for tracking buffer changes. 3394 This module implements the undo mechanism for tracking buffer changes.
3332 Most of this could be implemented in Lisp. 3395 Most of this could be implemented in Lisp.
3333 3396
3334 3397
3335 3398
3336 @node Editor-Level Control Flow Modules 3399 @node Editor-Level Control Flow Modules, Modules for the Basic Displayable Lisp Objects, Modules for Standard Editing Operations, A Summary of the Various XEmacs Modules
3337 @section Editor-Level Control Flow Modules 3400 @section Editor-Level Control Flow Modules
3338 3401
3339 @example 3402 @example
3340 event-Xt.c 3403 event-Xt.c
3404 event-msw.c
3341 event-stream.c 3405 event-stream.c
3342 event-tty.c 3406 event-tty.c
3407 events-mod.h
3408 gpmevent.c
3409 gpmevent.h
3343 events.c 3410 events.c
3344 events.h 3411 events.h
3345 @end example 3412 @end example
3346 3413
3347 These implement the handling of events (user input and other system 3414 These implement the handling of events (user input and other system
3392 relevant keymaps.) 3459 relevant keymaps.)
3393 3460
3394 3461
3395 3462
3396 @example 3463 @example
3397 keyboard.c 3464 cmdloop.c
3398 @end example 3465 @end example
3399 3466
3400 @file{keyboard.c} contains functions that implement the actual editor 3467 @file{cmdloop.c} contains functions that implement the actual editor
3401 command loop---i.e. the event loop that cyclically retrieves and 3468 command loop---i.e. the event loop that cyclically retrieves and
3402 dispatches events. This code is also rather tricky, just like 3469 dispatches events. This code is also rather tricky, just like
3403 @file{event-stream.c}. 3470 @file{event-stream.c}.
3404 3471
3405 3472
3429 bootstrapping implementations early in temacs, before the echo-area Lisp 3496 bootstrapping implementations early in temacs, before the echo-area Lisp
3430 code is loaded). 3497 code is loaded).
3431 3498
3432 3499
3433 3500
3434 @node Modules for the Basic Displayable Lisp Objects 3501 @node Modules for the Basic Displayable Lisp Objects, Modules for other Display-Related Lisp Objects, Editor-Level Control Flow Modules, A Summary of the Various XEmacs Modules
3435 @section Modules for the Basic Displayable Lisp Objects 3502 @section Modules for the Basic Displayable Lisp Objects
3436 3503
3437 @example 3504 @example
3438 device-ns.h 3505 console-msw.c
3439 device-stream.c 3506 console-msw.h
3440 device-stream.h 3507 console-stream.c
3508 console-stream.h
3509 console-tty.c
3510 console-tty.h
3511 console-x.c
3512 console-x.h
3513 console.c
3514 console.h
3515 @end example
3516
3517 These modules implement the @dfn{console} Lisp object type. A console
3518 contains multiple display devices, but only one keyboard and mouse.
3519 Most of the time, a console will contain exactly one device.
3520
3521 Consoles are the top of a lisp object inclusion hierarchy. Consoles
3522 contain devices, which contain frames, which contain windows.
3523
3524
3525
3526 @example
3527 device-msw.c
3441 device-tty.c 3528 device-tty.c
3442 device-tty.h
3443 device-x.c 3529 device-x.c
3444 device-x.h
3445 device.c 3530 device.c
3446 device.h 3531 device.h
3447 @end example 3532 @end example
3448 3533
3449 These modules implement the @dfn{device} Lisp object type. This 3534 These modules implement the @dfn{device} Lisp object type. This
3460 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do. 3545 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
3461 3546
3462 3547
3463 3548
3464 @example 3549 @example
3465 frame-ns.h 3550 frame-msw.c
3466 frame-tty.c 3551 frame-tty.c
3467 frame-x.c 3552 frame-x.c
3468 frame-x.h
3469 frame.c 3553 frame.c
3470 frame.h 3554 frame.h
3471 @end example 3555 @end example
3472 3556
3473 Each device contains one or more frames in which objects (e.g. text) are 3557 Each device contains one or more frames in which objects (e.g. text) are
3503 is part of the redisplay mechanism or the code for particular object 3587 is part of the redisplay mechanism or the code for particular object
3504 types such as scrollbars. 3588 types such as scrollbars.
3505 3589
3506 3590
3507 3591
3508 @node Modules for other Display-Related Lisp Objects 3592 @node Modules for other Display-Related Lisp Objects, Modules for the Redisplay Mechanism, Modules for the Basic Displayable Lisp Objects, A Summary of the Various XEmacs Modules
3509 @section Modules for other Display-Related Lisp Objects 3593 @section Modules for other Display-Related Lisp Objects
3510 3594
3511 @example 3595 @example
3512 faces.c 3596 faces.c
3513 faces.h 3597 faces.h
3515 3599
3516 3600
3517 3601
3518 @example 3602 @example
3519 bitmaps.h 3603 bitmaps.h
3520 glyphs-ns.h 3604 glyphs-eimage.c
3605 glyphs-msw.c
3606 glyphs-msw.h
3607 glyphs-widget.c
3521 glyphs-x.c 3608 glyphs-x.c
3522 glyphs-x.h 3609 glyphs-x.h
3523 glyphs.c 3610 glyphs.c
3524 glyphs.h 3611 glyphs.h
3525 @end example 3612 @end example
3526 3613
3527 3614
3528 3615
3529 @example 3616 @example
3530 objects-ns.h 3617 objects-msw.c
3618 objects-msw.h
3531 objects-tty.c 3619 objects-tty.c
3532 objects-tty.h 3620 objects-tty.h
3533 objects-x.c 3621 objects-x.c
3534 objects-x.h 3622 objects-x.h
3535 objects.c 3623 objects.c
3537 @end example 3625 @end example
3538 3626
3539 3627
3540 3628
3541 @example 3629 @example
3630 menubar-msw.c
3631 menubar-msw.h
3542 menubar-x.c 3632 menubar-x.c
3543 menubar.c 3633 menubar.c
3544 @end example 3634 menubar.h
3545 3635 @end example
3546 3636
3547 3637
3548 @example 3638
3639 @example
3640 scrollbar-msw.c
3641 scrollbar-msw.h
3549 scrollbar-x.c 3642 scrollbar-x.c
3550 scrollbar-x.h 3643 scrollbar-x.h
3551 scrollbar.c 3644 scrollbar.c
3552 scrollbar.h 3645 scrollbar.h
3553 @end example 3646 @end example
3554 3647
3555 3648
3556 3649
3557 @example 3650 @example
3651 toolbar-msw.c
3558 toolbar-x.c 3652 toolbar-x.c
3559 toolbar.c 3653 toolbar.c
3560 toolbar.h 3654 toolbar.h
3561 @end example 3655 @end example
3562 3656
3579 gif_lib.h 3673 gif_lib.h
3580 gifalloc.c 3674 gifalloc.c
3581 @end example 3675 @end example
3582 3676
3583 These modules decode GIF-format image files, for use with glyphs. 3677 These modules decode GIF-format image files, for use with glyphs.
3584 3678 These files were removed due to Unisys patent infringement concerns.
3585 3679
3586 3680
3587 @node Modules for the Redisplay Mechanism 3681
3682 @node Modules for the Redisplay Mechanism, Modules for Interfacing with the File System, Modules for other Display-Related Lisp Objects, A Summary of the Various XEmacs Modules
3588 @section Modules for the Redisplay Mechanism 3683 @section Modules for the Redisplay Mechanism
3589 3684
3590 @example 3685 @example
3591 redisplay-output.c 3686 redisplay-output.c
3687 redisplay-msw.c
3592 redisplay-tty.c 3688 redisplay-tty.c
3593 redisplay-x.c 3689 redisplay-x.c
3594 redisplay.c 3690 redisplay.c
3595 redisplay.h 3691 redisplay.h
3596 @end example 3692 @end example
3654 These files provide some miscellaneous TTY-output functions and should 3750 These files provide some miscellaneous TTY-output functions and should
3655 probably be merged into @file{redisplay-tty.c}. 3751 probably be merged into @file{redisplay-tty.c}.
3656 3752
3657 3753
3658 3754
3659 @node Modules for Interfacing with the File System 3755 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for the Redisplay Mechanism, A Summary of the Various XEmacs Modules
3660 @section Modules for Interfacing with the File System 3756 @section Modules for Interfacing with the File System
3661 3757
3662 @example 3758 @example
3663 lstream.c 3759 lstream.c
3664 lstream.h 3760 lstream.h
3681 streams and C++ I/O streams. 3777 streams and C++ I/O streams.
3682 3778
3683 Similar to other subsystems in XEmacs, lstreams are separated into 3779 Similar to other subsystems in XEmacs, lstreams are separated into
3684 generic functions and a set of methods for the different types of 3780 generic functions and a set of methods for the different types of
3685 lstreams. @file{lstream.c} provides implementations of many different 3781 lstreams. @file{lstream.c} provides implementations of many different
3686 types of streams; others are provided, e.g., in @file{mule-coding.c}. 3782 types of streams; others are provided, e.g., in @file{file-coding.c}.
3687 3783
3688 3784
3689 3785
3690 @example 3786 @example
3691 fileio.c 3787 fileio.c
3755 for expanding symbolic links, on systems that don't implement it or have 3851 for expanding symbolic links, on systems that don't implement it or have
3756 a broken implementation. 3852 a broken implementation.
3757 3853
3758 3854
3759 3855
3760 @node Modules for Other Aspects of the Lisp Interpreter and Object System 3856 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, A Summary of the Various XEmacs Modules
3761 @section Modules for Other Aspects of the Lisp Interpreter and Object System 3857 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3762 3858
3763 @example 3859 @example
3764 elhash.c 3860 elhash.c
3765 elhash.h 3861 elhash.h
3917 various security applications on the Internet. 4013 various security applications on the Internet.
3918 4014
3919 4015
3920 4016
3921 4017
3922 @node Modules for Interfacing with the Operating System 4018 @node Modules for Interfacing with the Operating System, Modules for Interfacing with X Windows, Modules for Other Aspects of the Lisp Interpreter and Object System, A Summary of the Various XEmacs Modules
3923 @section Modules for Interfacing with the Operating System 4019 @section Modules for Interfacing with the Operating System
3924 4020
3925 @example 4021 @example
3926 callproc.c 4022 callproc.c
3927 process.c 4023 process.c
4146 This module provides some terminal-control code necessary on versions of 4242 This module provides some terminal-control code necessary on versions of
4147 AIX prior to 4.1. 4243 AIX prior to 4.1.
4148 4244
4149 4245
4150 4246
4151 @example 4247 @node Modules for Interfacing with X Windows, Modules for Internationalization, Modules for Interfacing with the Operating System, A Summary of the Various XEmacs Modules
4152 msdos.c
4153 msdos.h
4154 @end example
4155
4156 These modules are used for MS-DOS support, which does not work in
4157 XEmacs.
4158
4159
4160
4161 @node Modules for Interfacing with X Windows
4162 @section Modules for Interfacing with X Windows 4248 @section Modules for Interfacing with X Windows
4163 4249
4164 @example 4250 @example
4165 Emacs.ad.h 4251 Emacs.ad.h
4166 @end example 4252 @end example
4223 needs to be rewritten. 4309 needs to be rewritten.
4224 4310
4225 4311
4226 4312
4227 @example 4313 @example
4228 xselect.c 4314 select-msw.c
4315 select-x.c
4316 select.c
4317 select.h
4229 @end example 4318 @end example
4230 4319
4231 @cindex selections 4320 @cindex selections
4232 This module provides an interface to the X Window System's concept of 4321 This module provides an interface to the X Window System's concept of
4233 @dfn{selections}, the standard way for X applications to communicate 4322 @dfn{selections}, the standard way for X applications to communicate
4298 4387
4299 Don't touch this code; something is liable to break if you do. 4388 Don't touch this code; something is liable to break if you do.
4300 4389
4301 4390
4302 4391
4303 @node Modules for Internationalization 4392 @node Modules for Internationalization, , Modules for Interfacing with X Windows, A Summary of the Various XEmacs Modules
4304 @section Modules for Internationalization 4393 @section Modules for Internationalization
4305 4394
4306 @example 4395 @example
4307 mule-canna.c 4396 mule-canna.c
4308 mule-ccl.c 4397 mule-ccl.c
4309 mule-charset.c 4398 mule-charset.c
4310 mule-charset.h 4399 mule-charset.h
4311 mule-coding.c 4400 file-coding.c
4312 mule-coding.h 4401 file-coding.h
4313 mule-mcpath.c 4402 mule-mcpath.c
4314 mule-mcpath.h 4403 mule-mcpath.h
4315 mule-wnnfns.c 4404 mule-wnnfns.c
4316 mule.c 4405 mule.c
4317 @end example 4406 @end example
4319 These files implement the MULE (Asian-language) support. Note that MULE 4408 These files implement the MULE (Asian-language) support. Note that MULE
4320 actually provides a general interface for all sorts of languages, not 4409 actually provides a general interface for all sorts of languages, not
4321 just Asian languages (although they are generally the most complicated 4410 just Asian languages (although they are generally the most complicated
4322 to support). This code is still in beta. 4411 to support). This code is still in beta.
4323 4412
4324 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the 4413 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
4325 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset} 4414 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset}
4326 Lisp object type, which encapsulates a character set (an ordered one- or 4415 Lisp object type, which encapsulates a character set (an ordered one- or
4327 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese 4416 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4328 Kanji). 4417 Kanji).
4329 4418
4330 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object 4419 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
4331 type, which encapsulates a method of converting between different 4420 type, which encapsulates a method of converting between different
4332 encodings. An encoding is a representation of a stream of characters, 4421 encodings. An encoding is a representation of a stream of characters,
4333 possibly from multiple character sets, using a stream of bytes or words, 4422 possibly from multiple character sets, using a stream of bytes or words,
4334 and defines (e.g.) which escape sequences are used to specify particular 4423 and defines (e.g.) which escape sequences are used to specify particular
4335 character sets, how the indices for a character are converted into bytes 4424 character sets, how the indices for a character are converted into bytes
4375 Asian-language support, and is not currently used. 4464 Asian-language support, and is not currently used.
4376 4465
4377 4466
4378 4467
4379 4468
4380 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top 4469 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
4381 @chapter Allocation of Objects in XEmacs Lisp 4470 @chapter Allocation of Objects in XEmacs Lisp
4382 4471
4383 @menu 4472 @menu
4384 * Introduction to Allocation:: 4473 * Introduction to Allocation::
4385 * Garbage Collection:: 4474 * Garbage Collection::
4387 * Garbage Collection - Step by Step:: 4476 * Garbage Collection - Step by Step::
4388 * Integers and Characters:: 4477 * Integers and Characters::
4389 * Allocation from Frob Blocks:: 4478 * Allocation from Frob Blocks::
4390 * lrecords:: 4479 * lrecords::
4391 * Low-level allocation:: 4480 * Low-level allocation::
4392 * Pure Space::
4393 * Cons:: 4481 * Cons::
4394 * Vector:: 4482 * Vector::
4395 * Bit Vector:: 4483 * Bit Vector::
4396 * Symbol:: 4484 * Symbol::
4397 * Marker:: 4485 * Marker::
4398 * String:: 4486 * String::
4399 * Compiled Function:: 4487 * Compiled Function::
4400 @end menu 4488 @end menu
4401 4489
4402 @node Introduction to Allocation 4490 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp
4403 @section Introduction to Allocation 4491 @section Introduction to Allocation
4404 4492
4405 Emacs Lisp, like all Lisps, has garbage collection. This means that 4493 Emacs Lisp, like all Lisps, has garbage collection. This means that
4406 the programmer never has to explicitly free (destroy) an object; it 4494 the programmer never has to explicitly free (destroy) an object; it
4407 happens automatically when the object becomes inaccessible. Most 4495 happens automatically when the object becomes inaccessible. Most
4418 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. 4506 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4419 Some Lisp objects, especially those that are primarily used internally, 4507 Some Lisp objects, especially those that are primarily used internally,
4420 have no corresponding Lisp primitives. Every Lisp object, though, 4508 have no corresponding Lisp primitives. Every Lisp object, though,
4421 has at least one C primitive for creating it. 4509 has at least one C primitive for creating it.
4422 4510
4423 Recall from section (VII) that a Lisp object, as stored in a 32-bit 4511 Recall from section (VII) that a Lisp object, as stored in a 32-bit or
4424 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that 4512 64-bit word, has a few tag bits, and a ``value'' that occupies the
4425 occupies the remainder of the bits. We can separate the different 4513 remainder of the bits. We can separate the different Lisp object types
4426 Lisp object types into four broad categories: 4514 into three broad categories:
4427 4515
4428 @itemize @bullet 4516 @itemize @bullet
4429 @item 4517 @item
4430 (a) Those for whom the value directly represents the contents of the 4518 (a) Those for whom the value directly represents the contents of the
4431 Lisp object. Only two types are in this category: integers and 4519 Lisp object. Only two types are in this category: integers and
4432 characters. No special allocation or garbage collection is necessary 4520 characters. No special allocation or garbage collection is necessary
4433 for such objects. Lisp objects of these types do not need to be 4521 for such objects. Lisp objects of these types do not need to be
4434 @code{GCPRO}ed. 4522 @code{GCPRO}ed.
4435 @end itemize 4523 @end itemize
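
As a rough illustration of category (a), the sketch below packs an integer or a character directly into a tagged word. The layout is an assumption made for this example only (two low tag bits, invented tag values, and @code{toy_} names); the real XEmacs layout is defined in its own headers and may differ.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: the low two bits of a word
   hold the tag and the remaining bits hold the value. */
typedef intptr_t toy_object;

#define TOY_TAG_BITS 2
#define TOY_TAG_MASK ((1 << TOY_TAG_BITS) - 1)

enum toy_tag { TOY_TAG_RECORD = 0, TOY_TAG_INT = 1, TOY_TAG_CHAR = 2 };

/* Pack an integer directly into the word; nothing is allocated. */
static toy_object
toy_make_int (intptr_t n)
{
  return n * (1 << TOY_TAG_BITS) + TOY_TAG_INT;
}

static toy_object
toy_make_char (unsigned char c)
{
  return (intptr_t) c * (1 << TOY_TAG_BITS) + TOY_TAG_CHAR;
}

static enum toy_tag
toy_tag_of (toy_object obj)
{
  return (enum toy_tag) ((uintptr_t) obj & TOY_TAG_MASK);
}

/* Recover the value.  Subtracting the tag first makes the division
   exact, so this also works for negative integers. */
static intptr_t
toy_int_value (toy_object obj)
{
  return (obj - TOY_TAG_INT) / (1 << TOY_TAG_BITS);
}

static unsigned char
toy_char_value (toy_object obj)
{
  return (unsigned char) ((obj - TOY_TAG_CHAR) / (1 << TOY_TAG_BITS));
}

/* Only pointer-valued objects concern the collector in this model. */
static int
toy_needs_gc (toy_object obj)
{
  return toy_tag_of (obj) == TOY_TAG_RECORD;
}
```

Because the whole object lives in the word itself, copying the word copies the object, and there is nothing for the garbage collector to trace or protect.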
4436 4524
4437 In the remaining three categories, the value is a pointer to a
4438 structure.
4439
4440 @itemize @bullet
4441 @item
4442 @cindex frob block
4443 (b) Those for whom the tag directly specifies the type. Recall that
4444 there are only three tag bits; this means that at most five types can be
4445 specified this way. The most commonly-used types are stored in this
4446 format; this includes conses, strings, vectors, and sometimes symbols.
4447 With the exception of vectors, objects in this category are allocated in
4448 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
4449 individual objects. This saves a lot on malloc overhead, since there
4450 are typically quite a lot of these objects around, and the objects are
4451 small. (A cons, for example, occupies 8 bytes on 32-bit machines---4
4452 bytes for each of the two objects it contains.) Vectors are individually
4453 @code{malloc()}ed since they are of variable size. (It would be
4454 possible, and desirable, to allocate vectors of certain small sizes out
4455 of frob blocks, but it isn't currently done.) Strings are handled
4456 specially: Each string is allocated in two parts, a fixed size structure
4457 containing a length and a data pointer, and the actual data of the
4458 string. The former structure is allocated in frob blocks as usual, and
4459 the latter data is stored in @dfn{string chars blocks} and is relocated
4460 during garbage collection to eliminate holes.
4461 @end itemize
4462
4463 In the remaining two categories, the type is stored in the object 4525 In the remaining two categories, the type is stored in the object
4464 itself. The tag for all such objects is the generic @dfn{lrecord} 4526 itself. The tag for all such objects is the generic @dfn{lrecord}
4465 (Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines) 4527 (Lisp_Type_Record) tag. The first bytes of the object's structure are an
4466 of the object's structure are a pointer to a structure that describes 4528 integer (actually a char) characterising the object's type and some
4467 the object's type, which includes method pointers and a pointer to a 4529 flags, in particular the mark bit used for garbage collection. A
4468 string naming the type. Note that it's possible to save some space by 4530 structure describing the type is accessible through the
4469 using a one- or two-byte tag, rather than a four- or eight-byte pointer 4531 lrecord_implementation_table indexed with said integer. This structure
4470 to store the type, but it's not clear it's worth making the change. 4532 includes the method pointers and a pointer to a string naming the type.
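
The header layout described in the new text can be pictured with a small C sketch. This is a simplified model, not the actual XEmacs declarations: every name prefixed with @code{toy_} or @code{TOY_} is hypothetical, and only the idea (a small integer in the header indexing a table of per-type descriptions) is taken from the paragraph above.

```c
#include <stddef.h>

/* Hypothetical, simplified model of an lrecord header.  The real
   structures in XEmacs differ; only the indexing idea is illustrated. */
struct toy_lrecord_header {
    unsigned char type;   /* small integer identifying the object's type */
    unsigned char mark;   /* flag used by the garbage collector */
};

/* Per-type description: a name and (as one example) a method pointer. */
struct toy_lrecord_implementation {
    const char *name;
    void (*marker)(struct toy_lrecord_header *);
};

static void toy_mark_self(struct toy_lrecord_header *h)
{
    h->mark = 1;
}

enum { TOY_TYPE_CONS, TOY_TYPE_FLOAT, TOY_TYPE_COUNT };

/* Stand-in for the implementation table, indexed by the header's type. */
static const struct toy_lrecord_implementation
toy_implementation_table[TOY_TYPE_COUNT] = {
    { "cons",  toy_mark_self },
    { "float", toy_mark_self },
};

/* Look up the type description for any object that starts with a header. */
static const struct toy_lrecord_implementation *
toy_implementation(const struct toy_lrecord_header *h)
{
    return &toy_implementation_table[h->type];
}

/* An example object: the header occupies the first bytes of the structure. */
struct toy_float {
    struct toy_lrecord_header header;
    double value;
};
```

Because the type is a one-byte index rather than a full pointer, the header stays small, and the mark flag can live in the object itself instead of in the pointers that refer to it.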
4471 4533
4472 @itemize @bullet 4534 @itemize @bullet
4473 @item 4535 @item
4474 (c) Those lrecords that are allocated in frob blocks (see above). This 4536 (b) Those lrecords that are allocated in frob blocks (see above). This
4475 includes the objects that are most common and relatively small, and 4537 includes the objects that are most common and relatively small, and
4476 includes floats, compiled functions, symbols (when not in category (b)), 4538 includes conses, strings, subrs, floats, compiled functions, symbols,
4477 extents, events, and markers. With the cleanup of frob blocks done in 4539 extents, events, and markers. With the cleanup of frob blocks done in
4478 19.12, it's not terribly hard to add more objects to this category, but 4540 19.12, it's not terribly hard to add more objects to this category, but
4479 it's a bit trickier than adding an object type to type (d) (esp. if the 4541 it's a bit trickier than adding an object type to type (c) (esp. if the
4480 object needs a finalization method), and is not likely to save much 4542 object needs a finalization method), and is not likely to save much
4481 space unless the object is small and there are many of them. (In fact, 4543 space unless the object is small and there are many of them. (In fact,
4482 if there are very few of them, it might actually waste space.) 4544 if there are very few of them, it might actually waste space.)
4483 @item 4545 @item
4484 (d) Those lrecords that are individually @code{malloc()}ed. These are 4546 (c) Those lrecords that are individually @code{malloc()}ed. These are
4485 called @dfn{lcrecords}. All other types are in this category. Adding a 4547 called @dfn{lcrecords}. All other types are in this category. Adding a
4486 new type to this category is comparatively easy, and all types added 4548 new type to this category is comparatively easy, and all types added
4487 since 19.8 (when the current allocation scheme was devised, by Richard 4549 since 19.8 (when the current allocation scheme was devised, by Richard
4488 Mlynarik), with the exception of the character type, have been in this 4550 Mlynarik), with the exception of the character type, have been in this
4489 category. 4551 category.
4490 @end itemize 4552 @end itemize
4491 4553
4492 Note that bit vectors are a bit of a special case. They are 4554 Note that bit vectors are a bit of a special case. They are
4493 simple lrecords as in category (c), but are individually @code{malloc()}ed 4555 simple lrecords as in category (b), but are individually @code{malloc()}ed
4494 like vectors. You can basically view them as exactly like vectors 4556 like vectors. You can basically view them as exactly like vectors
4495 except that their type is stored in lrecord fashion rather than 4557 except that their type is stored in lrecord fashion rather than
4496 in directly-tagged fashion. 4558 in directly-tagged fashion.
4497 4559
4498 Note that FSF Emacs redesigned their object system in 19.29 to follow 4560
4499 a similar scheme. However, given RMS's expressed dislike for data 4561 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp
4500 abstraction, the FSF scheme is not nearly as clean or as easy to
4501 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
4502 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
4503 @code{Lisp_Vectorlike} is also used for vectors.)
4504
4505 @node Garbage Collection
4506 @section Garbage Collection 4562 @section Garbage Collection
4507 @cindex garbage collection 4563 @cindex garbage collection
4508 4564
4509 @cindex mark and sweep 4565 @cindex mark and sweep
4510 Garbage collection is simple in theory but tricky to implement. 4566 Garbage collection is simple in theory but tricky to implement.
4518 that ``all of memory'' means all currently allocated objects. 4574 that ``all of memory'' means all currently allocated objects.
4519 Traversing all these objects means traversing all frob blocks, 4575 Traversing all these objects means traversing all frob blocks,
4520 all vectors (which are chained in one big list), and all 4576 all vectors (which are chained in one big list), and all
4521 lcrecords (which are likewise chained). 4577 lcrecords (which are likewise chained).
4522 4578
4523 Note that, when an object is marked, the mark has to occur 4579 Garbage collection can be invoked explicitly by calling
4524 inside of the object's structure, rather than in the 32-bit 4580 @code{garbage-collect} but is also called automatically by @code{eval},
4525 @code{Lisp_Object} holding the object's pointer; i.e. you can't just 4581 once a certain amount of memory has been allocated since the last
4526 set the pointer's mark bit. This is because there may be many 4582 garbage collection (according to @code{gc-cons-threshold}).
4527 pointers to the same object. This means that the method of 4583
4528 marking an object can differ depending on the type. The 4584
4529 different marking methods are approximately as follows: 4585 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp
4530
4531 @enumerate
4532 @item
4533 For conses, the mark bit of the car is set.
4534 @item
4535 For strings, the mark bit of the string's plist is set.
4536 @item
4537 For symbols when not lrecords, the mark bit of the
4538 symbol's plist is set.
4539 @item
4540 For vectors, the length is negated after adding 1.
4541 @item
4542 For lrecords, the pointer to the structure describing
4543 the type is changed (see below).
4544 @item
4545 Integers and characters do not need to be marked, since
4546 no allocation occurs for them.
4547 @end enumerate
4548
4549 The details of this are in the @code{mark_object()} function.
4550
4551 Note that any code that operates during garbage collection has
4552 to be especially careful because of the fact that some objects
4553 may be marked and as such may not look like they normally do.
4554 In particular:
4555
4556 @itemize @bullet
4557 Some object pointers may have their mark bit set. This will make
4558 @code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with
4559 this.
4560 @item
4561 Even if you clear the mark bit, @code{FOOBARP()} will still fail
4562 for lrecords because the implementation pointer has been
4563 changed (see below). @code{GC_FOOBARP()} will correctly deal with
4564 this.
4565 @item
4566 Vectors have their size field munged, so anything that
4567 looks at this field will fail.
4568 @item
4569 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
4570 pointers with their mark bit set, because the logical shift operations
4571 that remove the tag also remove the mark bit.
4572 @end itemize
4573
4574 Finally, note that garbage collection can be invoked explicitly
4575 by calling @code{garbage-collect} but is also called automatically
4576 by @code{eval}, once a certain amount of memory has been allocated
4577 since the last garbage collection (according to @code{gc-cons-threshold}).
4578
4579 @node GCPROing
4580 @section @code{GCPRO}ing 4586 @section @code{GCPRO}ing
4581 4587
4582 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs 4588 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4583 internals. The basic idea is that whenever garbage collection 4589 internals. The basic idea is that whenever garbage collection
4584 occurs, all in-use objects must be reachable somehow or 4590 occurs, all in-use objects must be reachable somehow or
4585 other from one of the roots of accessibility. The roots 4591 other from one of the roots of accessibility. The roots
4586 of accessibility are: 4592 of accessibility are:
4587 4593
4588 @enumerate 4594 @enumerate
4589 @item 4595 @item
4590 All objects that have been @code{staticpro()}d. This is used for 4596 All objects that have been @code{staticpro()}d or
4591 any global C variables that hold Lisp objects. A call to 4597 @code{staticpro_nodump()}ed. This is used for any global C variables
4592 @code{staticpro()} happens implicitly as a result of any symbols 4598 that hold Lisp objects. A call to @code{staticpro()} happens implicitly
4593 declared with @code{defsymbol()} and any variables declared with 4599 as a result of any symbols declared with @code{defsymbol()} and any
4594 @code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()} 4600 variables declared with @code{DEFVAR_FOO()}. You need to explicitly
4595 (in the @code{vars_of_foo()} method of a module) for other global 4601 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
4596 C variables holding Lisp objects. (This typically includes 4602 for other global C variables holding Lisp objects. (This typically
4597 internal lists and such things.) 4603 includes internal lists and such things.) Use
 4604 @code{staticpro_nodump()} only in the rare cases when you do not want
 4605 the pointed-to variable to be saved at dump time but would rather recompute
 4606 it at startup.
4598 4607
4599 Note that @code{obarray} is one of the @code{staticpro()}d things. 4608 Note that @code{obarray} is one of the @code{staticpro()}d things.
4600 Therefore, all functions and variables get marked through this. 4609 Therefore, all functions and variables get marked through this.
4601 @item 4610 @item
4602 Any shadowed bindings that are sitting on the @code{specpdl} stack. 4611 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4727 anything that looks like a reference to an object as a reference. This 4736 anything that looks like a reference to an object as a reference. This
4728 will result in a few objects not getting collected when they should, but 4737 will result in a few objects not getting collected when they should, but
4729 it obviates the need for @code{GCPRO}ing, and allows garbage collection 4738 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4730 to happen at any point at all, such as during object allocation. 4739 to happen at any point at all, such as during object allocation.
4731 4740
4732 @node Garbage Collection - Step by Step 4741 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp
4733 @section Garbage Collection - Step by Step 4742 @section Garbage Collection - Step by Step
4734 @cindex garbage collection step by step 4743 @cindex garbage collection step by step
4735 4744
4736 @menu 4745 @menu
4737 * Invocation:: 4746 * Invocation::
4742 * compact_string_chars:: 4751 * compact_string_chars::
4743 * sweep_strings:: 4752 * sweep_strings::
4744 * sweep_bit_vectors_1:: 4753 * sweep_bit_vectors_1::
4745 @end menu 4754 @end menu
4746 4755
4747 @node Invocation 4756 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step
4748 @subsection Invocation 4757 @subsection Invocation
4749 @cindex garbage collection, invocation 4758 @cindex garbage collection, invocation
4750 4759
4751 The first thing that anyone should know about garbage collection is: 4760 The first thing that anyone should know about garbage collection is:
4752 when and how the garbage collector is invoked. One might think that this 4761 when and how the garbage collector is invoked. One might think that this
4753 could happen every time new memory is allocated, e.g. new objects are 4762 could happen every time new memory is allocated, e.g. new objects are
4754 created, but this is @emph{not} the case. Instead, we have the following 4763 created, but this is @emph{not} the case. Instead, we have the following
4755 situation: 4764 situation:
4756 4765
4757 The entry point of any process of garbage collection is an invocation 4766 The entry point of any process of garbage collection is an invocation
4758 of the function @code{garbage_collect_1} in file @code{alloc.c}. The 4767 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
4759 invocation can occur @emph{explicitly} by calling the function 4768 invocation can occur @emph{explicitly} by calling the function
4760 @code{Fgarbage_collect} (in addition this function provides information 4769 @code{Fgarbage_collect} (in addition this function provides information
4761 about the freed memory), or can occur @emph{implicitly} in four different 4770 about the freed memory), or can occur @emph{implicitly} in four different
4762 situations: 4771 situations:
4763 @enumerate 4772 @enumerate
4764 @item 4773 @item
4765 In function @code{main_1} in file @code{emacs.c}. This function is called 4774 In function @code{main_1} in file @code{emacs.c}. This function is called
4766 at each startup of xemacs. The garbage collection is invoked after all 4775 at each startup of xemacs. The garbage collection is invoked after all
4767 initial creations are completed, but only if a special internal error 4776 initial creations are completed, but only if a special internal error
4768 checking-constant @code{ERROR_CHECK_GC} is defined. 4777 checking-constant @code{ERROR_CHECK_GC} is defined.
4769 @item 4778 @item
4770 In function @code{disksave_object_finalization} in file 4779 In function @code{disksave_object_finalization} in file
4771 @code{alloc.c}. The only purpose of this function is to clear the 4780 @code{alloc.c}. The only purpose of this function is to clear the
4772 objects from memory which need not be stored with xemacs when we dump out 4781 objects from memory which need not be stored with xemacs when we dump out
4773 an executable. This is only done by @code{Fdump_emacs} or by 4782 an executable. This is only done by @code{Fdump_emacs} or by
4774 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The 4783 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
4775 actual clearing is accomplished by making these objects unreachable and 4784 actual clearing is accomplished by making these objects unreachable and
4776 starting a garbage collection. The function is only used while building 4785 starting a garbage collection. The function is only used while building
4777 xemacs. 4786 xemacs.
4791 @code{Feval}. 4800 @code{Feval}.
4792 @end enumerate 4801 @end enumerate
4793 4802
4794 The upshot is that garbage collection can basically occur everywhere 4803 The upshot is that garbage collection can basically occur everywhere
4795 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or 4804 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or
4796 through another function. Since calls to these two functions are 4805 through another function. Since calls to these two functions are hidden
4797 hidden in various other functions, many calls to 4806 in various other functions, many calls to @code{garbage_collect_1} are
4798 @code{garabge_collect_1} are not obviously foreseeable, and therefore 4807 not obviously foreseeable, and therefore unexpected. Instances where
4799 unexpected. Instances where they are used that are worth remembering are 4808 they are used that are worth remembering are various elisp commands, as
4800 various elisp commands, as for example @code{or}, 4809 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
4801 @code{and}, @code{if}, @code{cond}, @code{while}, @code{setq}, etc., 4810 @code{setq}, etc., miscellaneous @code{gui_item_...} functions,
4802 miscellaneous @code{gui_item_...} functions, everything related to 4811 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
4803 @code{eval} (@code{Feval_buffer}, @code{call0}, ...) and inside 4812 ...) and inside @code{Fsignal}. The latter is used to handle signals, as
4804 @code{Fsignal}. The latter is used to handle signals, as for example the 4813 for example the ones raised by every @code{QUIT}-macro triggered after
4805 ones raised by every @code{QUIT}-macro triggered after pressing Ctrl-g. 4814 pressing Ctrl-g.
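
The invocation rule above (allocation itself never collects; the check happens when evaluation is entered) can be modelled in a few lines of C. This is a toy sketch under stated assumptions: the names prefixed with @code{toy_} are hypothetical, the threshold value is arbitrary, and the real checks in @code{Feval} and @code{Ffuncall} involve more conditions than shown here.

```c
#include <stddef.h>

/* Toy model of the implicit trigger.  The counter and threshold mirror
   the roles of consing_since_gc and gc-cons-threshold in the text, but
   the names and values here are illustrative only. */
static size_t toy_consing_since_gc = 0;
static size_t toy_gc_cons_threshold = 1000;
static int toy_gc_runs = 0;

static void toy_garbage_collect(void)
{
    toy_gc_runs++;
    toy_consing_since_gc = 0;   /* counter restarts after a collection */
}

/* Allocation only updates the counter; it never collects by itself. */
static void toy_allocate(size_t bytes)
{
    toy_consing_since_gc += bytes;
}

/* Entering the evaluator is where the threshold is checked. */
static void toy_eval(void)
{
    if (toy_consing_since_gc > toy_gc_cons_threshold)
        toy_garbage_collect();
    /* ... evaluation of the form would happen here ... */
}
```

The practical consequence is the one stated above: any C code that calls into the evaluator, directly or indirectly, has to assume that a collection may run during that call.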
4806 4815
4807 @node garbage_collect_1 4816 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step
4808 @subsection @code{garbage_collect_1} 4817 @subsection @code{garbage_collect_1}
4809 @cindex @code{garbage_collect_1} 4818 @cindex @code{garbage_collect_1}
4810 4819
4811 We can now describe exactly what happens after the invocation takes 4820 We can now describe exactly what happens after the invocation takes
4812 place. 4821 place.
4813 @enumerate 4822 @enumerate
4814 @item 4823 @item
4815 There are several cases in which the garbage collector is left immediately: 4824 There are several cases in which the garbage collector is left immediately:
4816 when we are already garbage collecting (@code{gc_in_progress}), when 4825 when we are already garbage collecting (@code{gc_in_progress}), when
4817 the garbage collection is somehow forbidden 4826 the garbage collection is somehow forbidden
4818 (@code{gc_currently_forbidden}), when we are currently displaying something 4827 (@code{gc_currently_forbidden}), when we are currently displaying something
4819 (@code{in_display}) or when we are preparing for the armageddon of the 4828 (@code{in_display}) or when we are preparing for the armageddon of the
4820 whole system (@code{preparing_for_armageddon}). 4829 whole system (@code{preparing_for_armageddon}).
4821 @item 4830 @item
4822 Next the correct frame in which to put 4831 Next the correct frame in which to put
4823 all the output occurring during garbage collecting is determined. In 4832 all the output occurring during garbage collecting is determined. In
4824 order to be able to restore the old display's state after displaying the 4833 order to be able to restore the old display's state after displaying the
4825 message, some data about the current cursor position has to be 4834 message, some data about the current cursor position has to be
4826 saved. The variables @code{pre_gc_curser} and @code{cursor_changed} take 4835 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
4827 care of that. 4836 care of that.
4828 @item 4837 @item
4829 The state of @code{gc_currently_forbidden} must be restored after 4838 The state of @code{gc_currently_forbidden} must be restored after
4830 the garbage collection, no matter what happens during the process. We 4839 the garbage collection, no matter what happens during the process. We
4831 accomplish this by @code{record_unwind_protect}ing the suitable function 4840 accomplish this by @code{record_unwind_protect}ing the suitable function
4832 @code{restore_gc_inhibit} together with the current value of 4841 @code{restore_gc_inhibit} together with the current value of
4833 @code{gc_currently_forbidden}. 4842 @code{gc_currently_forbidden}.
4834 @item 4843 @item
4835 If we are currently running an interactive xemacs session, the next step 4844 If we are currently running an interactive xemacs session, the next step
4836 is simply to show the garbage collector's cursor/message. 4845 is simply to show the garbage collector's cursor/message.
4837 @item 4846 @item
4838 The following steps are the intrinsic steps of the garbage collector, 4847 The following steps are the intrinsic steps of the garbage collector,
4842 frame. However, this seems to be a currently unused feature. 4851 frame. However, this seems to be a currently unused feature.
4843 @item 4852 @item
4844 Before actually starting to go over all live objects, references to 4853 Before actually starting to go over all live objects, references to
4845 objects that are no longer used are pruned. We only have to do this for events 4854 objects that are no longer used are pruned. We only have to do this for events
4846 (@code{clear_event_resource}) and for specifiers 4855 (@code{clear_event_resource}) and for specifiers
4847 (@code{cleanup_specifiers}). 4856 (@code{cleanup_specifiers}).
4848 @item 4857 @item
4849 Now the mark phase begins and marks all accessible elements. In order to 4858 Now the mark phase begins and marks all accessible elements. In order to
4850 start from 4859 start from
4851 all slots that serve as roots of accessibility, the function 4860 all slots that serve as roots of accessibility, the function
4852 @code{mark_object} is called for each root individually to go out from 4861 @code{mark_object} is called for each root individually to go out from
4854 shown in their processed order: 4863 shown in their processed order:
4855 @itemize @bullet 4864 @itemize @bullet
4856 @item 4865 @item
4857 all constant symbols and static variables that are registered via 4866 all constant symbols and static variables that are registered via
4858 @code{staticpro}@ in the array @code{staticvec}. 4867 @code{staticpro}@ in the array @code{staticvec}.
4859 @xref{Adding Global Lisp Variables}. 4868 @xref{Adding Global Lisp Variables}.
4860 @item 4869 @item
4861 all Lisp objects that are created in C functions and that must be 4870 all Lisp objects that are created in C functions and that must be
4862 protected from being freed. They are registered in the global 4871 protected from being freed. They are registered in the global
4863 list @code{gcprolist}. 4872 list @code{gcprolist}.
4864 @xref{GCPROing}. 4873 @xref{GCPROing}.
4865 @item 4874 @item
4866 all local variables (i.e. their name fields @code{symbol} and old 4875 all local variables (i.e. their name fields @code{symbol} and old
4867 values @code{old_values}) that are bound during the evaluation by the Lisp 4876 values @code{old_values}) that are bound during the evaluation by the Lisp
4868 engine. They are stored in @code{specbinding} structs pushed on a stack 4877 engine. They are stored in @code{specbinding} structs pushed on a stack
4869 called @code{specpdl}. 4878 called @code{specpdl}.
4870 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}. 4879 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
4873 cause the creation of structs @code{catchtag} inserted in the list 4882 cause the creation of structs @code{catchtag} inserted in the list
4874 @code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields 4883 @code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
4875 are freshly created objects and therefore have to be marked. 4884 are freshly created objects and therefore have to be marked.
4876 @xref{Catch and Throw}. 4885 @xref{Catch and Throw}.
4877 @item 4886 @item
4878 every function application pushes new structs @code{backtrace} 4887 every function application pushes new structs @code{backtrace}
4879 on the call stack of the Lisp engine (@code{backtrace_list}). The unique 4888 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
4880 parts that have to be marked are the fields for each function 4889 parts that have to be marked are the fields for each function
4881 (@code{function}) and all their arguments (@code{args}). 4890 (@code{function}) and all their arguments (@code{args}).
4882 @xref{Evaluation}. 4891 @xref{Evaluation}.
4883 @item 4892 @item
4884 all objects that are used by the redisplay engine that must not be freed 4893 all objects that are used by the redisplay engine that must not be freed
4885 are marked by a special function called @code{mark_redisplay} (in 4894 are marked by a special function called @code{mark_redisplay} (in
4886 @code{redisplay.c}). 4895 @code{redisplay.c}).
4887 @item 4896 @item
4888 all objects created for profiling purposes are allocated by C functions 4897 all objects created for profiling purposes are allocated by C functions
4889 instead of using the lisp allocation mechanisms. In order to receive the 4898 instead of using the lisp allocation mechanisms. In order to receive the
4897 during the estimation of the live objects during garbage collection. 4906 during the estimation of the live objects during garbage collection.
4898 Any object referenced only by weak pointers is collected 4907 Any object referenced only by weak pointers is collected
4899 anyway, and the reference to it is cleared. In hash tables there are 4908 anyway, and the reference to it is cleared. In hash tables there are
4900 different usage patterns of them, manifesting in different types of hash 4909 different usage patterns of them, manifesting in different types of hash
4901 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak' 4910 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
4902 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each 4911 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each
4903 clearing entries depending on different conditions. More information can 4912 clearing entries depending on different conditions. More information can
4904 be found in the documentation to the function @code{make-hash-table}. 4913 be found in the documentation to the function @code{make-hash-table}.
4905 4914
4906 Because there are complicated dependency rules about when and what to 4915 Because there are complicated dependency rules about when and what to
4907 mark while processing weak hash tables, the standard @code{marker} 4916 mark while processing weak hash tables, the standard @code{marker}
4908 method is only active if it is marking non-weak hash tables. As soon as 4917 method is only active if it is marking non-weak hash tables. As soon as
4909 a weak component is in the table, the hash table entries are ignored 4918 a weak component is in the table, the hash table entries are ignored
4910 while marking. Instead their marking is done each separately by the 4919 while marking. Instead their marking is done each separately by the
4911 function @code{finish_marking_weak_hash_tables}. This function iterates 4920 function @code{finish_marking_weak_hash_tables}. This function iterates
4912 over each hash table entry @code{hentries} for each weak hash table in 4921 over each hash table entry @code{hentries} for each weak hash table in
4913 @code{Vall_weak_hash_tables}. Depending on the type of a table, the 4922 @code{Vall_weak_hash_tables}. Depending on the type of a table, the
4914 appropriate action is performed. 4923 appropriate action is performed.
4915 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked, 4924 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked,
4916 everything reachable from the @code{value} component is marked. If it is 4925 everything reachable from the @code{value} component is marked. If it is
4917 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is 4926 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
4918 already marked, the marking starts only from the 4927 already marked, the marking starts only from the
4919 @code{key} component. 4928 @code{key} component.
4920 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car 4929 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
4921 of the key entry is already marked, we mark both the @code{key} and 4930 of the key entry is already marked, we mark both the @code{key} and
4922 @code{value} components. 4931 @code{value} components.
4923 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK} 4932 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
4924 and the car of the value components is already marked, again both the 4933 and the car of the value components is already marked, again both the
4925 @code{key} and the @code{value} components get marked. 4934 @code{key} and the @code{value} components get marked.
4927 Again, there are lists with comparable properties called weak 4936 Again, there are lists with comparable properties called weak
4928 lists. There exist different peculiarities of their types called 4937 lists. There exist different peculiarities of their types called
4929 @code{simple}, @code{assoc}, @code{key-assoc} and 4938 @code{simple}, @code{assoc}, @code{key-assoc} and
4930 @code{value-assoc}. You can find further details about them in the 4939 @code{value-assoc}. You can find further details about them in the
4931 description to the function @code{make-weak-list}. The scheme of their 4940 description to the function @code{make-weak-list}. The scheme of their
4932 marking is similar: all weak lists are listed in @code{Vall_weak_lists}, 4941 marking is similar: all weak lists are listed in @code{Vall_weak_lists},
4933 therefore we iterate over them. The marking is advanced until we hit an 4942 therefore we iterate over them. The marking is advanced until we hit an
4934 already marked pair. Then we know that during a former run all 4943 already marked pair. Then we know that during a former run all
4935 the rest has been marked completely. Again, depending on the special 4944 the rest has been marked completely. Again, depending on the special
4936 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE} 4945 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
4937 and the elem is marked, we mark the @code{cons} part. If it is a 4946 and the elem is marked, we mark the @code{cons} part. If it is a
4938 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and 4947 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
4939 cdr, we mark the @code{cons} and the @code{elem}. If it is a 4948 cdr, we mark the @code{cons} and the @code{elem}. If it is a
4942 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked 4951 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
4943 cdr of the elem, we mark both the @code{cons} and the @code{elem}. 4952 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
4944 4953
4945 Since marking the objects reachable from weak hash tables and weak lists 4954 Since marking the objects reachable from weak hash tables and weak lists
4946 may mark further objects, which in turn may require more marking of 4955 may mark further objects, which in turn may require more marking of
4947 other weak objects, both finishing functions are rerun for as long as 4956 other weak objects, both finishing functions are rerun for as long as
4948 previously unmarked objects get freshly marked. 4957 previously unmarked objects get freshly marked.
4949 4958
4950 @item 4959 @item
4951 After completing the special marking for the weak hash tables and for the weak 4960 After completing the special marking for the weak hash tables and for the weak
4952 lists, all entries that point to objects that are going to be swept in 4961 lists, all entries that point to objects that are going to be swept in
4954 the table or the list. 4963 the table or the list.
4955 4964
4956 The function @code{prune_weak_hash_tables} does the job for weak hash 4965 The function @code{prune_weak_hash_tables} does the job for weak hash
4957 tables. Totally unmarked hash tables are removed from the list 4966 tables. Totally unmarked hash tables are removed from the list
4958 @code{Vall_weak_hash_tables}. The other ones are treated more carefully 4967 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
4959 by scanning over all entries and removing one as soon as one of 4968 by scanning over all entries and removing one as soon as one of
4960 the components @code{key} and @code{value} is unmarked. 4969 the components @code{key} and @code{value} is unmarked.
4961 4970
4962 The same idea applies to the weak lists. It is accomplished by 4971 The same idea applies to the weak lists. It is accomplished by
4963 @code{prune_weak_lists}: An unmarked list is pruned from 4972 @code{prune_weak_lists}: An unmarked list is pruned from
4964 @code{Vall_weak_lists} immediately. A marked list is treated more 4973 @code{Vall_weak_lists} immediately. A marked list is treated more
4965 carefully by going over it and removing just the unmarked pairs. 4974 carefully by going over it and removing just the unmarked pairs.
4966 4975
4967 @item 4976 @item
4968 The function @code{prune_specifiers} checks all listed specifiers held 4977 The function @code{prune_specifiers} checks all listed specifiers held
4969 in @code{Vall_speficiers} and removes the ones from the lists that are 4978 in @code{Vall_specifiers} and removes the ones from the lists that are
4970 unmarked. 4979 unmarked.
4971 4980
4972 @item 4981 @item
4973 All syntax tables are stored in a list called 4982 All syntax tables are stored in a list called
4974 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks 4983 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
4975 through it and unlinks the tables that are unmarked. 4984 through it and unlinks the tables that are unmarked.
4976 4985
4977 @item 4986 @item
4978 Next comes the sweeping proper, carried out by the function 4987 Next comes the sweeping proper, carried out by the function
4979 @code{gc_sweep}, which does the bulk of the work. 4988 @code{gc_sweep}, which does the bulk of the work.
4980 @item 4989 @item
4981 First, all the variables with respect to garbage collection are 4990 First, all the variables with respect to garbage collection are
4982 reset. @code{consing_since_gc} - the counter of the created cells since 4991 reset. @code{consing_since_gc} - the counter of the created cells since
4983 the last garbage collection - is set back to 0, and 4992 the last garbage collection - is set back to 0, and
4984 @code{gc_in_progress} is not @code{true} anymore. 4993 @code{gc_in_progress} is not @code{true} anymore.
4985 @item 4994 @item
4986 In case the session is interactive, the displayed cursor and message are 4995 In case the session is interactive, the displayed cursor and message are
4987 removed again. 4996 removed again.
4988 @item 4997 @item
4989 The state of @code{gc_inhibit} is restored to the former value by 4998 The state of @code{gc_inhibit} is restored to the former value by
4990 unwinding the stack. 4999 unwinding the stack.
4991 @item 5000 @item
4992 A small memory reserve is always held back that can be reached by 5001 A small memory reserve is always held back that can be reached by
4993 @code{breathing_space}. If nothing more is left, we create a new reserve 5002 @code{breathing_space}. If nothing more is left, we create a new reserve
4994 and exit. 5003 and exit.
4995 @end enumerate 5004 @end enumerate
4996 5005
4997 @node mark_object 5006 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step
4998 @subsection @code{mark_object} 5007 @subsection @code{mark_object}
4999 @cindex @code{mark_object} 5008 @cindex @code{mark_object}
5000 5009
5001 The first thing that is checked while marking an object is whether the 5010 The first thing that is checked while marking an object is whether the
5002 object is a real Lisp object @code{Lisp_Type_Record} or just an integer 5011 object is a real Lisp object @code{Lisp_Type_Record} or just an integer
5003 or a character. Integers and characters are the only two types that are 5012 or a character. Integers and characters are the only two types that are
5004 stored directly - without another level of indirection, and therefore they 5013 stored directly - without another level of indirection, and therefore they
5005 don't have to be marked and collected. 5014 don't have to be marked and collected.
5006 @xref{How Lisp Objects Are Represented in C}. 5015 @xref{How Lisp Objects Are Represented in C}.
5007 5016
5008 The second case is the one we have to handle. It is the one when we are 5017 The second case is the one we have to handle. It is the one when we are
5009 dealing with a pointer to a Lisp object. There are, however, three 5018 dealing with a pointer to a Lisp object. There are, however, three
5010 cases in which nothing has to be done while marking: the 5019 cases in which nothing has to be done while marking: the
5011 object is read only, which prevents it from being garbage collected, 5020 object is read only, which prevents it from being garbage collected,
5012 i.e. marked (@code{C_READONLY_RECORD_HEADER}); the object in question is 5021 i.e. marked (@code{C_READONLY_RECORD_HEADER}); the object in question is
5013 already marked, and need not be marked a second time (checked by 5022 already marked, and need not be marked a second time (checked by
5014 @code{MARKED_RECORD_HEADER_P}); or it is a special, unmarkable object 5023 @code{MARKED_RECORD_HEADER_P}); or it is a special, unmarkable object
5015 (@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that 5024 (@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that
5016 sit in some CONST space, and can therefore not be marked, see 5025 sit in some const space, and can therefore not be marked, see
5017 @code{this_one_is_unmarkable} in @code{alloc.c}). 5026 @code{this_one_is_unmarkable} in @code{alloc.c}).
5018 5027
5019 Now, the actual marking is feasible. We do so by first using the macro 5028 Now, the actual marking is feasible. We do so by first using the macro
5020 @code{MARK_RECORD_HEADER} to mark the object itself (actually the 5029 @code{MARK_RECORD_HEADER} to mark the object itself (actually the
5021 special flag in the lrecord header), and calling its special marker 5030 special flag in the lrecord header), and calling its special marker
5022 "method" @code{marker} if available. The marker method marks every 5031 "method" @code{marker} if available. The marker method marks every
5023 other object that is in reach from our current object. Note that these 5032 other object that is in reach from our current object. Note that these
5024 marker methods should not call @code{mark_object} recursively, but 5033 marker methods should not call @code{mark_object} recursively, but
5025 instead should return the next object from where further marking has to 5034 instead should return the next object from where further marking has to
5026 be performed. 5035 be performed.
5027 5036
5028 In case another object was returned, as mentioned before, we reiterate 5037 In case another object was returned, as mentioned before, we reiterate
5029 the whole @code{mark_object} process beginning with this next object. 5038 the whole @code{mark_object} process beginning with this next object.
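
The loop described above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: @code{struct toy_obj}, @code{toy_marker_next} and @code{toy_mark_object} are hypothetical names, the three flags stand in for the header checks named in the text, and each object reaches at most one other object. It is not the code in @code{alloc.c}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object; this is not the XEmacs lrecord layout. */
struct toy_obj {
    int immediate;              /* stands in for an integer or character */
    int readonly;               /* stands in for the read-only header flag */
    int marked;                 /* stands in for the mark flag in the header */
    struct toy_obj *(*marker)(struct toy_obj *);  /* may be NULL */
    struct toy_obj *next;       /* the one object reachable from here */
};

/* Marker "method": does not recurse, only reports where to continue. */
static struct toy_obj *toy_marker_next(struct toy_obj *obj)
{
    return obj->next;
}

/* Iterative marking: loop on the returned object instead of recursing. */
static void toy_mark_object(struct toy_obj *obj)
{
    while (obj != NULL) {
        if (obj->immediate || obj->readonly || obj->marked)
            return;             /* nothing to do in these cases */
        obj->marked = 1;        /* mark the object itself */
        if (obj->marker == NULL)
            return;             /* no marker method, nothing reachable */
        obj = obj->marker(obj); /* continue with the returned object */
    }
}

/* Builds the cycle a -> b -> c -> a and returns how many got marked. */
static int toy_mark_demo(void)
{
    struct toy_obj a = {0, 0, 0, toy_marker_next, NULL};
    struct toy_obj b = {0, 0, 0, toy_marker_next, NULL};
    struct toy_obj c = {0, 0, 0, toy_marker_next, NULL};
    a.next = &b;
    b.next = &c;
    c.next = &a;                /* the walk stops at the already-marked a */
    toy_mark_object(&a);
    return a.marked + b.marked + c.marked;
}
```

The check for an already marked object is what terminates the walk on cyclic structures, and returning the next object keeps the C stack flat on long chains.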
5030 5039
5031 @node gc_sweep 5040 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step
5032 @subsection @code{gc_sweep} 5041 @subsection @code{gc_sweep}
5033 @cindex @code{gc_sweep} 5042 @cindex @code{gc_sweep}
5034 5043
5035 The job of this function is to free all unmarked records from memory. As 5044 The job of this function is to free all unmarked records from memory. As
5036 we know, there are different types of objects implemented and managed, and 5045 we know, there are different types of objects implemented and managed, and
5037 consequently different ways to free them from memory. 5046 consequently different ways to free them from memory.
5038 @xref{Introduction to Allocation}. 5047 @xref{Introduction to Allocation}.
5039 5048
5040 We start with all objects stored through @code{lcrecords}. All 5049 We start with all objects stored through @code{lcrecords}. All
5041 bulkier objects are allocated and handled using that scheme of 5050 bulkier objects are allocated and handled using that scheme of
5042 @code{lcrecords}. Each object is @code{malloc}ed separately 5051 @code{lcrecords}. Each object is @code{malloc}ed separately
5043 instead of placing it in one of the contiguous frob blocks. All types 5052 instead of placing it in one of the contiguous frob blocks. All types
5044 that are currently stored 5053 that are currently stored
5045 using @code{lcrecords}'s @code{alloc_lcrecord} and 5054 using @code{lcrecords}'s @code{alloc_lcrecord} and
5046 @code{make_lcrecord_list} are the types: vectors, buffers, 5055 @code{make_lcrecord_list} are the types: vectors, buffers,
5047 char-table, char-table-entry, console, weak-list, database, device, 5056 char-table, char-table-entry, console, weak-list, database, device,
5048 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face, 5057 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
5049 coding-system, frame, image-instance, glyph, popup-data, gui-item, 5058 coding-system, frame, image-instance, glyph, popup-data, gui-item,
5057 doing the whole job for us. 5066 doing the whole job for us.
5058 For a description about the internals: @xref{lrecords}. 5067 For a description about the internals: @xref{lrecords}.
5059 5068
5060 Our next candidates are the other objects that behave quite differently 5069 Our next candidates are the other objects that behave quite differently
5061 than everything else: the strings. They consist of two parts, a 5070 than everything else: the strings. They consist of two parts, a
5062 fixed-size portion (@code{struct Lisp_string}) holding the string's 5071 fixed-size portion (@code{struct Lisp_String}) holding the string's
5063 length, its property list and a pointer to the second part, and the 5072 length, its property list and a pointer to the second part, and the
5064 actual string data, which is stored in string-chars blocks comparable to 5073 actual string data, which is stored in string-chars blocks comparable to
5065 frob blocks. In this block, the data is not only freed, but also a 5074 frob blocks. In this block, the data is not only freed, but also a
5066 compression of holes is made, i.e. all strings are relocated together. 5075 compression of holes is made, i.e. all strings are relocated together.
5067 @xref{String}. This compacting phase is performed by the function 5076 @xref{String}. This compacting phase is performed by the function
5074 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and 5083 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
5075 @code{sweep_events}. They are the fixed-size types cons, floats, 5084 @code{sweep_events}. They are the fixed-size types cons, floats,
5076 compiled-functions, symbol, marker, extent, and event stored in 5085 compiled-functions, symbol, marker, extent, and event stored in
5077 so-called "frob blocks", and therefore we can basically do the same on 5086 so-called "frob blocks", and therefore we can basically do the same on
5078 every type of object, using the same macros, defined specifically to 5087 every type of object, using the same macros, defined specifically to
5079 handle everything with respect to fixed-size blocks. The only fixed-size 5088 handle everything with respect to fixed-size blocks. The only fixed-size
5080 type that is not handled here is the fixed-size portion of strings, 5089 type that is not handled here is the fixed-size portion of strings,
5081 because we took special care of them earlier. 5090 because we took special care of them earlier.
5082 5091
5083 The only big exceptions are bit vectors stored differently and 5092 The only big exceptions are bit vectors stored differently and
5084 therefore treated differently by the function @code{sweep_bit_vectors_1} 5093 therefore treated differently by the function @code{sweep_bit_vectors_1}
5085 described later. 5094 described later.
5086 5095
5087 First, we need some brief information about how 5096 First, we need some brief information about how
5088 these fixed-size types are managed in general, in order to understand 5097 these fixed-size types are managed in general, in order to understand
5089 how the sweeping is done. They all have a fixed size, and are therefore 5098 how the sweeping is done. They all have a fixed size, and are therefore
5090 stored in big blocks of memory - allocated at once - that can hold a 5099 stored in big blocks of memory - allocated at once - that can hold a
5091 certain amount of objects of one type. The macro 5100 certain amount of objects of one type. The macro
5092 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for 5101 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
5093 every type. More precisely, we have the block struct 5102 every type. More precisely, we have the block struct
5094 (holding a pointer to the previous block @code{prev} and the 5103 (holding a pointer to the previous block @code{prev} and the
5095 objects in @code{block[]}), a pointer to current block 5104 objects in @code{block[]}), a pointer to current block
5096 (@code{current_..._block}) and its last index 5105 (@code{current_..._block}) and its last index
5097 (@code{current_..._block_index}), and a pointer to the free list that 5106 (@code{current_..._block_index}), and a pointer to the free list that
5098 will be created. Also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some 5107 will be created. Also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some
5104 The rest works as follows: all of them define a 5113 The rest works as follows: all of them define a
5105 macro @code{UNMARK_...} that is used to unmark the object. They define a 5114 macro @code{UNMARK_...} that is used to unmark the object. They define a
5106 macro @code{ADDITIONAL_FREE_...} that defines additional work that has 5115 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5107 to be done when converting an object from in use to not in use (so far, 5116 to be done when converting an object from in use to not in use (so far,
5108 only markers use it in order to unchain them). Then, they all call 5117 only markers use it in order to unchain them). Then, they all call
5109 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name 5118 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5110 and their struct name. 5119 and their struct name.
5111 5120
5112 This call in particular does the following: we go over all blocks 5121 This call in particular does the following: we go over all blocks
5113 starting with the current one and moving towards the oldest. 5122 starting with the current one and moving towards the oldest.
5114 For each block, we look at every object in it. If the object is already 5123 For each block, we look at every object in it. If the object is already
5115 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the 5124 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
5116 object), or if it is 5125 object), or if it is
5117 set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing needs to be 5126 set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing needs to be
5118 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it 5127 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
5119 is put in the free list and set free (using the macro 5128 is put in the free list and set free (using the macro
5120 @code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked 5129 @code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
5121 (by @code{UNMARK_...}). While going through one block, we note if the 5130 (by @code{UNMARK_...}). While going through one block, we note if the
5122 whole block is empty. If so, the whole block is freed (using 5131 whole block is empty. If so, the whole block is freed (using
5123 @code{xfree}) and the free list state is set to the state it had before 5132 @code{xfree}) and the free list state is set to the state it had before
5124 handling this block. 5133 handling this block.
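
As a rough illustration of the block sweep just described, the following C sketch walks a chain of fixed-size blocks from the current one towards the oldest. All names (@code{toy_slot}, @code{toy_block}, @code{toy_heap}, @code{toy_sweep}) are hypothetical, and the sketch assumes that the free list is rebuilt from scratch on every sweep, so that rolling it back when an empty block is released never leaves pointers into freed memory. The real macro @code{SWEEP_FIXED_TYPE_BLOCK} differs in detail.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SLOTS 4

struct toy_slot {
    int is_free;                /* stands in for FREE_STRUCT_P */
    int readonly;               /* stands in for the read-only flag */
    int marked;                 /* stands in for the mark flag */
    struct toy_slot *next_free; /* free-list link */
};

struct toy_block {
    struct toy_block *prev;     /* next older block */
    struct toy_slot slot[TOY_SLOTS];
};

struct toy_heap {
    struct toy_block *current;  /* newest block */
    struct toy_slot *free_list;
};

static void toy_sweep(struct toy_heap *h)
{
    struct toy_block **link = &h->current;
    h->free_list = NULL;        /* assumption: rebuild the list each sweep */
    while (*link != NULL) {
        struct toy_block *b = *link;
        struct toy_slot *saved_free = h->free_list;
        int used = 0;
        for (int i = 0; i < TOY_SLOTS; i++) {
            struct toy_slot *s = &b->slot[i];
            if (s->is_free) {               /* freed earlier: just relink */
                s->next_free = h->free_list;
                h->free_list = s;
            } else if (s->readonly) {       /* read only: always survives */
                used++;
            } else if (!s->marked) {        /* garbage: put on free list */
                s->is_free = 1;
                s->next_free = h->free_list;
                h->free_list = s;
            } else {                        /* live: unmark for next GC */
                s->marked = 0;
                used++;
            }
        }
        if (used == 0) {                    /* whole block is empty */
            h->free_list = saved_free;      /* forget this block's slots */
            *link = b->prev;
            free(b);
        } else {
            link = &b->prev;
        }
    }
}

/* Two blocks: the newer one holds only garbage, the older one has one
   marked slot.  Returns the free-list length after sweeping, or -1. */
static int toy_sweep_demo(void)
{
    struct toy_block *old = calloc(1, sizeof *old);
    struct toy_block *cur = calloc(1, sizeof *cur);
    struct toy_heap h;
    int n = 0;
    if (old == NULL || cur == NULL) {
        free(old);
        free(cur);
        return -1;
    }
    cur->prev = old;
    old->slot[0].marked = 1;
    h.current = cur;
    h.free_list = NULL;
    toy_sweep(&h);
    for (struct toy_slot *s = h.free_list; s != NULL; s = s->next_free)
        n++;
    if (h.current != old || old->slot[0].marked)
        n = -1;
    free(old);
    return n;
}
```

In the demo the newer block is released as a whole, and the three unmarked slots of the older block end up on the free list.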
5125 5134
5126 @node sweep_lcrecords_1 5135 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step
5127 @subsection @code{sweep_lcrecords_1} 5136 @subsection @code{sweep_lcrecords_1}
5128 @cindex @code{sweep_lcrecords_1} 5137 @cindex @code{sweep_lcrecords_1}
5129 5138
5130 After nullifying the complete lcrecord statistics, we go over all 5139 After nullifying the complete lcrecord statistics, we go over all
5131 lcrecords two separate times. They are all chained together in a list with 5140 lcrecords two separate times. They are all chained together in a list with
5132 a head called @code{all_lcrecords}. 5141 a head called @code{all_lcrecords}.
5133 5142
5134 The first loop calls for each object its @code{finalizer} method, but only 5143 The first loop calls for each object its @code{finalizer} method, but only
5135 in the case that it is not read only 5144 in the case that it is not read only
5136 (@code{C_READONLY_RECORD_HEADER_P}), it is not already marked 5145 (@code{C_READONLY_RECORD_HEADER_P}), it is not already marked
5137 (@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of 5146 (@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of
5138 freed objects, field @code{free}) and finally it owns a finalizer 5147 freed objects, field @code{free}) and finally it owns a finalizer
5139 method. 5148 method.
5140 5149
5141 The second loop actually frees the appropriate objects again by iterating 5150 The second loop actually frees the appropriate objects again by iterating
5142 through the whole list. In case an object is read only or marked, it 5151 through the whole list. In case an object is read only or marked, it
5143 has to persist, otherwise it is manually freed by calling 5152 has to persist, otherwise it is manually freed by calling
5144 @code{xfree}. During this loop, the lcrecord statistics are kept up to 5153 @code{xfree}. During this loop, the lcrecord statistics are kept up to
5145 date by calling @code{tick_lcrecord_stats} with the right arguments. 5154 date by calling @code{tick_lcrecord_stats} with the right arguments.
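
The two passes can be sketched as follows. The names (@code{toy_lcrecord}, @code{toy_sweep_lcrecords}) are hypothetical, statistics are left out, and unmarking the survivors in the second pass is an assumption of this sketch; records parked on a free list are assumed to survive because they are marked elsewhere.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an lcrecord header. */
struct toy_lcrecord {
    struct toy_lcrecord *next;  /* chain of all records */
    int readonly;
    int marked;
    int on_free_list;           /* stands in for the "free" field */
    void (*finalizer)(struct toy_lcrecord *);  /* may be NULL */
};

static int toy_finalized = 0;

static void toy_count_finalizer(struct toy_lcrecord *r)
{
    (void)r;
    toy_finalized++;
}

/* Returns the number of records released. */
static int toy_sweep_lcrecords(struct toy_lcrecord **head)
{
    struct toy_lcrecord *r;
    struct toy_lcrecord **link;
    int freed = 0;

    /* Pass 1: run finalizers of records that are about to die. */
    for (r = *head; r != NULL; r = r->next)
        if (!r->readonly && !r->marked && !r->on_free_list && r->finalizer)
            r->finalizer(r);

    /* Pass 2: unlink and free the dead records, keep the rest. */
    link = head;
    while ((r = *link) != NULL) {
        if (r->readonly || r->marked) {
            r->marked = 0;      /* assumption: unmark survivors here */
            link = &r->next;
        } else {
            *link = r->next;
            free(r);
            freed++;
        }
    }
    return freed;
}

static struct toy_lcrecord *toy_push(struct toy_lcrecord *head, int marked,
                                     void (*fin)(struct toy_lcrecord *))
{
    struct toy_lcrecord *r = calloc(1, sizeof *r);
    if (r == NULL)
        abort();
    r->next = head;
    r->marked = marked;
    r->finalizer = fin;
    return r;
}

static int toy_lcrecord_demo(void)
{
    struct toy_lcrecord *head = NULL;
    int freed;
    head = toy_push(head, 1, toy_count_finalizer);  /* survives */
    head = toy_push(head, 0, toy_count_finalizer);  /* finalized and freed */
    head = toy_push(head, 0, NULL);                 /* freed, no finalizer */
    freed = toy_sweep_lcrecords(&head);
    if (head == NULL || head->next != NULL || head->marked)
        freed = -1;
    free(head);
    return freed;
}
```

Running all finalizers before anything is freed means a finalizer can still safely look at other records that are about to be released in the same collection.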
5146 5155
5147 @node compact_string_chars 5156 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step
5148 @subsection @code{compact_string_chars} 5157 @subsection @code{compact_string_chars}
5149 @cindex @code{compact_string_chars} 5158 @cindex @code{compact_string_chars}
5150 5159
5151 The purpose of this function is to compact all the data parts of the 5160 The purpose of this function is to compact all the data parts of the
5152 strings that are held in so-called @code{string_chars_block}, i.e. the 5161 strings that are held in so-called @code{string_chars_block}, i.e. the
5154 5163
5155 The procedure with which this is done is as follows. We are keeping two 5164 The procedure with which this is done is as follows. We are keeping two
5156 positions in the @code{string_chars_block}s using two pointer/integer 5165 positions in the @code{string_chars_block}s using two pointer/integer
5157 pairs, namely @code{from_sb}/@code{from_pos} and 5166 pairs, namely @code{from_sb}/@code{from_pos} and
5158 @code{to_sb}/@code{to_pos}. They stand for the current positions from 5167 @code{to_sb}/@code{to_pos}. They stand for the current positions from
5159 which and to which the string being handled is copied. 5168 which and to which the string being handled is copied.
5160 5169
5161 While going over all chained @code{string_chars_block}s and their held 5170 While going over all chained @code{string_chars_block}s and their held
5162 strings, starting at @code{first_string_chars_block}, both pointers 5171 strings, starting at @code{first_string_chars_block}, both pointers
5163 are advanced and eventually a string is copied from @code{from_sb} to 5172 are advanced and eventually a string is copied from @code{from_sb} to
5164 @code{to_sb}, depending on the status of the pointed at strings. 5173 @code{to_sb}, depending on the status of the pointed at strings.
5165 5174
5166 More precisely, we can distinguish between the following actions. 5175 More precisely, we can distinguish between the following actions.
5167 @itemize @bullet 5176 @itemize @bullet
5168 @item 5177 @item
5169 The string at @code{from_sb}'s position could be marked as free, which 5178 The string at @code{from_sb}'s position could be marked as free, which
5170 is indicated by an invalid pointer to the pointer that should point back 5179 is indicated by an invalid pointer to the pointer that should point back
5171 to the fixed size string object, and which is checked by 5180 to the fixed size string object, and which is checked by
5172 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos} 5181 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
5173 is advanced to the next string, and nothing has to be copied. 5182 is advanced to the next string, and nothing has to be copied.
5174 @item 5183 @item
5175 Also, if a string object itself is unmarked, nothing has to be 5184 Also, if a string object itself is unmarked, nothing has to be
5176 copied. We likewise advance the @code{from_sb}/@code{from_pos} 5185 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5177 pair as described above. 5186 pair as described above.
5178 @item 5187 @item
5179 In all other cases, we have a marked string at hand. The string data 5188 In all other cases, we have a marked string at hand. The string data
5180 must be moved from the from-position to the to-position. In case 5189 must be moved from the from-position to the to-position. In case
5181 there is not enough space in the current @code{to_sb}-block, we advance 5190 there is not enough space in the current @code{to_sb}-block, we advance
5182 this pointer to the beginning of the next block before copying. In case the 5191 this pointer to the beginning of the next block before copying. In case the
5183 from and to positions are different, we perform the 5192 from and to positions are different, we perform the
5184 actual copying using the library function @code{memmove}. 5193 actual copying using the library function @code{memmove}.
5188 @code{string_chars_block}, sitting in @code{current_string_chars_block}, 5197 @code{string_chars_block}, sitting in @code{current_string_chars_block},
5189 is reset on the last block to which we moved a string, 5198 is reset on the last block to which we moved a string,
5190 i.e. @code{to_block}, and all remaining blocks (we know that they just 5199 i.e. @code{to_block}, and all remaining blocks (we know that they just
5191 carry garbage) are explicitly @code{xfree}d. 5200 carry garbage) are explicitly @code{xfree}d.
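
The from/to bookkeeping can be illustrated with a much reduced sketch. It collapses the chain of blocks into one character arena and describes the strings by a table kept in arena order; @code{toy_string} and @code{toy_compact} are hypothetical names, and the handling of block boundaries is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size part of a string. */
struct toy_string {
    int marked;                 /* still in use? */
    size_t offset;              /* where its characters start in the arena */
    size_t len;                 /* number of characters */
};

/* Slides the data of all marked strings towards the start of the arena.
   strs[] must be sorted by offset.  Returns the new fill position. */
static size_t toy_compact(char *arena, struct toy_string *strs, size_t n)
{
    size_t to_pos = 0;
    for (size_t i = 0; i < n; i++) {
        struct toy_string *s = &strs[i];
        size_t from_pos = s->offset;
        if (!s->marked)
            continue;           /* garbage: only the from position advances */
        if (from_pos != to_pos) /* copy only when the data really moves */
            memmove(arena + to_pos, arena + from_pos, s->len);
        s->offset = to_pos;     /* the fixed-size part follows the data */
        to_pos += s->len;
    }
    return to_pos;
}

static int toy_compact_demo(void)
{
    char arena[8] = {'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'};
    struct toy_string strs[3] = { {1, 0, 3}, {0, 3, 3}, {1, 6, 2} };
    size_t fill = toy_compact(arena, strs, 3);
    if (memcmp(arena, "aaacc", 5) != 0 || strs[2].offset != 3)
        return -1;
    return (int)fill;
}
```

@code{memmove} rather than @code{memcpy} is used because source and destination may overlap when a string slides down by less than its own length.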
5192 5201
5193 @node sweep_strings 5202 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step
5194 @subsection @code{sweep_strings} 5203 @subsection @code{sweep_strings}
5195 @cindex @code{sweep_strings} 5204 @cindex @code{sweep_strings}
5196 5205
5197 The sweeping for the fixed sized string objects is essentially exactly 5206 The sweeping for the fixed sized string objects is essentially exactly
5198 the same as it is for all other fixed size types. As before, the freeing 5207 the same as it is for all other fixed size types. As before, the freeing
5200 @code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros 5209 @code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros
5201 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two 5210 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
5202 definitions are a little bit special compared to the ones used 5211 definitions are a little bit special compared to the ones used
5203 for the other fixed size types. 5212 for the other fixed size types.
5204 5213
5205 @code{UNMARK_string} is defined the same way except some additional code 5214 @code{UNMARK_string} is defined the same way except some additional code
5206 used for updating the bookkeeping information. 5215 used for updating the bookkeeping information.
5207 5216
5208 For strings, @code{ADDITIONAL_FREE_string} has to do something in 5217 For strings, @code{ADDITIONAL_FREE_string} has to do something in
5209 addition: in case the string was not allocated in a 5218 addition: in case the string was not allocated in a
5210 @code{string_chars_block} because it exceeded the maximal length, and 5219 @code{string_chars_block} because it exceeded the maximal length, and
5211 was therefore @code{malloc}ed separately, we now also @code{xfree} 5220 was therefore @code{malloc}ed separately, we now also @code{xfree}
5212 it explicitly. 5221 it explicitly.
5213 5222
5214 @node sweep_bit_vectors_1 5223 @node sweep_bit_vectors_1, , sweep_strings, Garbage Collection - Step by Step
5215 @subsection @code{sweep_bit_vectors_1} 5224 @subsection @code{sweep_bit_vectors_1}
5216 @cindex @code{sweep_bit_vectors_1} 5225 @cindex @code{sweep_bit_vectors_1}
5217 5226
5218 Bit vectors are also one of the rare types that are @code{malloc}ed 5227 Bit vectors are also one of the rare types that are @code{malloc}ed
5219 individually. Consequently, while sweeping, all further needless 5228 individually. Consequently, while sweeping, all further needless
5220 bit vectors must be freed by hand. This is done, as one might imagine, 5229 bit vectors must be freed by hand. This is done, as one might imagine,
5221 the expected way: since they are all registered in a list called 5230 the expected way: since they are all registered in a list called
5222 @code{all_bit_vectors}, all elements of that list are traversed, 5231 @code{all_bit_vectors}, all elements of that list are traversed,
5223 unmarked bit vectors are unlinked and released by calling @code{xfree}, and 5232 unmarked bit vectors are unlinked and released by calling @code{xfree}, and
5224 the remaining ones are unmarked. 5233 the remaining ones are unmarked.
5225 In addition, the bookkeeping information used for garbage 5234 In addition, the bookkeeping information used for garbage
5226 collector's output purposes is updated. 5235 collector's output purposes is updated.
5227 5236
5228 @node Integers and Characters 5237 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp
5229 @section Integers and Characters 5238 @section Integers and Characters
5230 5239
5231 Integer and character Lisp objects are created from integers using the 5240 Integer and character Lisp objects are created from integers using the
5232 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent 5241 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5233 functions @code{make_int()} and @code{make_char()}. (These are actually 5242 functions @code{make_int()} and @code{make_char()}. (These are actually
5237 5246
5238 @code{XSETINT()} and the like will truncate values given to them that 5247 @code{XSETINT()} and the like will truncate values given to them that
5239 are too big; i.e. you won't get the value you expected but the tag bits 5248 are too big; i.e. you won't get the value you expected but the tag bits
5240 will at least be correct. 5249 will at least be correct.
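
The truncation behaviour can be shown with a toy encoding. The layout below (a 32-bit word with two low tag bits, tag value 1 for integers) is an assumption made only for this illustration, and @code{toy_make_int}, @code{toy_tag} and @code{toy_int_value} are hypothetical names; the actual XEmacs tag layout is described elsewhere in this manual.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TAG_BITS 2
#define TOY_TAG_MASK UINT32_C(3)
#define TOY_TAG_INT  UINT32_C(1)   /* assumed tag value for integers */

/* Packs v into the upper 30 bits; excess high bits are silently lost. */
static uint32_t toy_make_int(int32_t v)
{
    return ((uint32_t)v << TOY_TAG_BITS) | TOY_TAG_INT;
}

static uint32_t toy_tag(uint32_t obj)
{
    return obj & TOY_TAG_MASK;
}

/* Recovers the 30-bit two's complement payload. */
static int32_t toy_int_value(uint32_t obj)
{
    uint32_t payload = obj >> TOY_TAG_BITS;
    if (payload & (UINT32_C(1) << 29))      /* sign bit of the payload */
        return (int32_t)(payload & UINT32_C(0x1FFFFFFF))
               - INT32_C(0x20000000);
    return (int32_t)payload;
}
```

With this encoding, 42 and -5 survive a round trip, while a value that needs more than 30 bits comes back changed even though the tag is still the integer tag.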
5241 5250
5242 @node Allocation from Frob Blocks 5251 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp
5243 @section Allocation from Frob Blocks 5252 @section Allocation from Frob Blocks
5244 5253
5245 The uninitialized memory required by a @code{Lisp_Object} of a particular type 5254 The uninitialized memory required by a @code{Lisp_Object} of a particular type
5246 is allocated using 5255 is allocated using
5247 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the 5256 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the
5264 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the 5273 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5265 last frob block for space, and creates a new frob block if there is 5274 last frob block for space, and creates a new frob block if there is
5266 none. (There are actually two versions of these macros, one of which is 5275 none. (There are actually two versions of these macros, one of which is
5267 more defensive but less efficient and is used for error-checking.) 5276 more defensive but less efficient and is used for error-checking.)
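
A minimal sketch of this allocation path, under stated assumptions: the names (@code{toy_cell}, @code{toy_frob_block}, @code{toy_allocate}) are hypothetical, a block holds only two cells to keep the demo short, and the sketch assumes that a non-empty free list is consulted before the end of the last block.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PER_BLOCK 2

struct toy_cell {
    struct toy_cell *next_free; /* link while the cell is on the free list */
    int payload;
};

struct toy_frob_block {
    struct toy_frob_block *prev;
    struct toy_cell cell[TOY_PER_BLOCK];
};

struct toy_allocator {
    struct toy_frob_block *current; /* last block */
    int index;                      /* first unused cell in current */
    struct toy_cell *free_list;
    int blocks;                     /* number of blocks created so far */
};

static struct toy_cell *toy_allocate(struct toy_allocator *a)
{
    /* Assumption: recycled cells are preferred. */
    if (a->free_list != NULL) {
        struct toy_cell *c = a->free_list;
        a->free_list = c->next_free;
        return c;
    }
    /* Otherwise look at the end of the last block; add a block if full. */
    if (a->current == NULL || a->index == TOY_PER_BLOCK) {
        struct toy_frob_block *nb = malloc(sizeof *nb);
        if (nb == NULL)
            return NULL;
        nb->prev = a->current;
        a->current = nb;
        a->index = 0;
        a->blocks++;
    }
    return &a->current->cell[a->index++];
}

/* Allocates three cells, recycles one, and returns the block count,
   or -1 if the recycled cell was not reused. */
static int toy_alloc_demo(void)
{
    struct toy_allocator a = {NULL, 0, NULL, 0};
    struct toy_cell *c1 = toy_allocate(&a);
    struct toy_cell *c2 = toy_allocate(&a);
    struct toy_cell *c3 = toy_allocate(&a);  /* forces a second block */
    struct toy_cell *c4;
    int blocks;
    if (c1 == NULL || c2 == NULL || c3 == NULL)
        abort();
    c2->next_free = a.free_list;             /* put c2 on the free list */
    a.free_list = c2;
    c4 = toy_allocate(&a);                   /* expected to reuse c2 */
    blocks = (c4 == c2) ? a.blocks : -1;
    while (a.current != NULL) {
        struct toy_frob_block *prev = a.current->prev;
        free(a.current);
        a.current = prev;
    }
    return blocks;
}
```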
5268 5277
5269 @node lrecords 5278 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp
5270 @section lrecords 5279 @section lrecords
5271 5280
5272 [see @file{lrecord.h}] 5281 [see @file{lrecord.h}]
5273 5282
5274 All lrecords have at the beginning of their structure a @code{struct 5283 All lrecords have at the beginning of their structure a @code{struct
5275 lrecord_header}. This just contains a pointer to a @code{struct 5284 lrecord_header}. This just contains a type number and some flags,
5285 including the mark bit. All builtin type numbers are defined as
5286 constants in @code{enum lrecord_type}, to allow the compiler to generate
5287 more efficient code for @code{@var{type}P}. The type number, thru the
5288 @code{lrecord_implementation_table}, gives access to a @code{struct
5276 lrecord_implementation}, which is a structure containing method pointers 5289 lrecord_implementation}, which is a structure containing method pointers
5277 and such. There is one of these for each type, and it is a global, 5290 and such. There is one of these for each type, and it is a global,
5278 constant, statically-declared structure that is declared in the 5291 constant, statically-declared structure that is declared in the
5279 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually 5292 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
5280 declares an array of two @code{struct lrecord_implementation} 5293
5281 structures. The first one contains all the standard method pointers, 5294 Simple lrecords (of type (b) above) just have a @code{struct
5282 and is used in all normal circumstances. During garbage collection,
5283 however, the lrecord is @dfn{marked} by bumping its implementation
5284 pointer by one, so that it points to the second structure in the array.
5285 This structure contains a special indication in it that it's a
5286 @dfn{marked-object} structure: the finalize method is the special
5287 function @code{this_marks_a_marked_record()}, and all other methods are
5288 null pointers. At the end of garbage collection, all lrecords will
5289 either be reclaimed or unmarked by decrementing their implementation
5290 pointers, so this second structure pointer will never remain past
5291 garbage collection.
5292
5293 Simple lrecords (of type (c) above) just have a @code{struct
5294 lrecord_header} at their beginning. lcrecords, however, actually have a 5295 lrecord_header} at their beginning. lcrecords, however, actually have a
5295 @code{struct lcrecord_header}. This, in turn, has a @code{struct 5296 @code{struct lcrecord_header}. This, in turn, has a @code{struct
5296 lrecord_header} at its beginning, so sanity is preserved; but it also 5297 lrecord_header} at its beginning, so sanity is preserved; but it also
5297 has a pointer used to chain all lcrecords together, and a special ID 5298 has a pointer used to chain all lcrecords together, and a special ID
5298 field used to distinguish one lcrecord from another. (This field is used 5299 field used to distinguish one lcrecord from another. (This field is used
5316 type. 5317 type.
5317 5318
5318 Whenever you create an lrecord, you need to call either 5319 Whenever you create an lrecord, you need to call either
5319 @code{DEFINE_LRECORD_IMPLEMENTATION()} or 5320 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5320 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be 5321 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be
5321 specified in a C file, at the top level. What this actually does is 5322 specified in a @file{.c} file, at the top level. What this actually
5322 define and initialize the implementation structure for the lrecord. (And 5323 does is define and initialize the implementation structure for the
5323 possibly declares a function @code{error_check_foo()} that implements 5324 lrecord. (And possibly declares a function @code{error_check_foo()} that
5324 the @code{XFOO()} macro when error-checking is enabled.) The arguments 5325 implements the @code{XFOO()} macro when error-checking is enabled.) The
5325 to the macros are the actual type name (this is used to construct the C 5326 arguments to the macros are the actual type name (this is used to
5326 variable name of the lrecord implementation structure and related 5327 construct the C variable name of the lrecord implementation structure
5327 structures using the @samp{##} macro concatenation operator), a string 5328 and related structures using the @samp{##} macro concatenation
5328 that names the type on the Lisp level (this may not be the same as the C 5329 operator), a string that names the type on the Lisp level (this may not
5329 type name; typically, the C type name has underscores, while the Lisp 5330 be the same as the C type name; typically, the C type name has
5330 string has dashes), various method pointers, and the name of the C 5331 underscores, while the Lisp string has dashes), various method pointers,
5331 structure that contains the object. The methods are used to encapsulate 5332 and the name of the C structure that contains the object. The methods
5332 type-specific information about the object, such as how to print it or 5333 are used to encapsulate type-specific information about the object, such
5333 mark it for garbage collection, so that it's easy to add new object 5334 as how to print it or mark it for garbage collection, so that it's easy
5334 types without having to add a specific case for each new type in a bunch 5335 to add new object types without having to add a specific case for each
5335 of different places. 5336 new type in a bunch of different places.
5336 5337
5337 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and 5338 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5338 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is 5339 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5339 used for fixed-size object types and the latter is for variable-size 5340 used for fixed-size object types and the latter is for variable-size
5340 object types. Most object types are fixed-size; some complex 5341 object types. Most object types are fixed-size; some complex
5344 (Currently this is only used for keeping allocation statistics.) 5345 (Currently this is only used for keeping allocation statistics.)
5345 5346
5346 For the purpose of keeping allocation statistics, the allocation 5347 For the purpose of keeping allocation statistics, the allocation
5347 engine keeps a list of all the different types that exist. Note that, 5348 engine keeps a list of all the different types that exist. Note that,
5348 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is 5349 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5349 specified at top-level, there is no way for it to add to the list of all 5350 specified at top-level, there is no way for it to initialize the global
5350 existing types. What happens instead is that each implementation 5351 data structures containing type information, like
5351 structure contains in it a dynamically assigned number that is 5352 @code{lrecord_implementations_table}. For this reason a call to
5352 particular to that type. (Or rather, it contains a pointer to another 5353 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file
5353 structure that contains this number. This evasiveness is done so that 5354 containing @code{DEFINE_LRECORD_IMPLEMENTATION}, but instead of to the
5354 the implementation structure can be declared const.) In the sweep stage 5355 top level, to one of the init functions, typically
5355 of garbage collection, each lrecord is examined to see if its 5356 @code{syms_of_@var{foo}()}. @code{INIT_LRECORD_IMPLEMENTATION} must be
5356 implementation structure has its dynamically-assigned number set. If 5357 called before an object of this type is used.
5357 not, it must be a new type, and it is added to the list of known types 5358
5358 and a new number assigned. The number is used to index into an array 5359 The type number is also used to index into an array holding the number
5359 holding the number of objects of each type and the total memory 5360 of objects of each type and the total memory allocated for objects of
5360 allocated for objects of that type. The statistics in this array are 5361 that type. The statistics in this array are computed during the sweep
5361 also computed during the sweep stage. These statistics are returned by 5362 stage. These statistics are returned by the call to
5362 the call to @code{garbage-collect} and are printed out at the end of the 5363 @code{garbage-collect}.
5363 loadup phase.
5364 5364
5365 Note that for every type defined with a @code{DEFINE_LRECORD_*()} 5365 Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5366 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()} 5366 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5367 somewhere in a @file{.h} file, and this @file{.h} file needs to be 5367 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5368 included by @file{inline.c}. 5368 included by @file{inline.c}.
5369 5369
5370 Furthermore, there should generally be a set of @code{XFOOBAR()}, 5370 Furthermore, there should generally be a set of @code{XFOOBAR()},
5371 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c}) 5371 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
5372 file. To create one of these, copy an existing model and modify as 5372 file. To create one of these, copy an existing model and modify as
5373 necessary. 5373 necessary.
5374
5375 @strong{Please note:} If you define an lrecord in an external
5376 dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD},
5377 @code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and
5378 @code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the
5379 non-EXTERNAL forms. These macros will dynamically add new type numbers
5380 to the global enum that records them, whereas the non-EXTERNAL forms
5381 assume that the programmer has already inserted the correct type numbers
5382 into the enum's code at compile-time.
5374 5383
5375 The various methods in the lrecord implementation structure are: 5384 The various methods in the lrecord implementation structure are:
5376 5385
5377 @enumerate 5386 @enumerate
5378 @item 5387 @item
5503 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should 5512 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should
5504 simply return the object's size in bytes, exactly as you might expect. 5513 simply return the object's size in bytes, exactly as you might expect.
5505 For an example, see the methods for window configurations and opaques. 5514 For an example, see the methods for window configurations and opaques.
5506 @end enumerate 5515 @end enumerate
5507 5516
5508 @node Low-level allocation 5517 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp
5509 @section Low-level allocation 5518 @section Low-level allocation
5510 5519
5511 Memory that you want to allocate directly should be allocated using 5520 Memory that you want to allocate directly should be allocated using
5512 @code{xmalloc()} rather than @code{malloc()}. This implements 5521 @code{xmalloc()} rather than @code{malloc()}. This implements
5513 error-checking on the return value, and once upon a time did some more 5522 error-checking on the return value, and once upon a time did some more
5564 XEmacs taps into them and issues a warning through the standard 5573 XEmacs taps into them and issues a warning through the standard
5565 warning system, when memory gets to 75%, 85%, and 95% full. 5574 warning system, when memory gets to 75%, 85%, and 95% full.
5566 (On some systems, the memory warnings are not functional.) 5575 (On some systems, the memory warnings are not functional.)
5567 5576
5568 Allocated memory that is going to be used to make a Lisp object 5577 Allocated memory that is going to be used to make a Lisp object
5569 is created using @code{allocate_lisp_storage()}. This calls @code{xmalloc()} 5578 is created using @code{allocate_lisp_storage()}. This just calls
5570 but also verifies that the pointer to the memory can fit into 5579 @code{xmalloc()}. It used to verify that the pointer to the memory can
5571 a Lisp word (remember that some bits are taken away for a type 5580 fit into a Lisp word, before the current Lisp object representation was
5572 tag and a mark bit). If not, an error is issued through @code{memory_full()}. 5581 introduced. @code{allocate_lisp_storage()} is called by
5573 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()}, 5582 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector
5574 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation 5583 and bit-vector creation routines. These routines also call
5575 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the 5584 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
5576 appropriate times; this keeps statistics on how much memory is 5585 statistics on how much memory is allocated, so that garbage-collection
5577 allocated, so that garbage-collection can be invoked when the 5586 can be invoked when the threshold is reached.
5578 threshold is reached. 5587
5579 5588 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp
5580 @node Pure Space
5581 @section Pure Space
5582
5583 Not yet documented.
5584
5585 @node Cons
5586 @section Cons 5589 @section Cons
5587 5590
5588 Conses are allocated in standard frob blocks. The only thing to 5591 Conses are allocated in standard frob blocks. The only thing to
5589 note is that conses can be explicitly freed using @code{free_cons()} 5592 note is that conses can be explicitly freed using @code{free_cons()}
5590 and associated functions @code{free_list()} and @code{free_alist()}. This 5593 and associated functions @code{free_list()} and @code{free_alist()}. This
5594 generating extra objects and thereby triggering GC sooner. 5597 generating extra objects and thereby triggering GC sooner.
5595 However, you have to be @emph{extremely} careful when doing this. 5598 However, you have to be @emph{extremely} careful when doing this.
5596 If you mess this up, you will get BADLY BURNED, and it has happened 5599 If you mess this up, you will get BADLY BURNED, and it has happened
5597 before. 5600 before.
5598 5601
5599 @node Vector 5602 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp
5600 @section Vector 5603 @section Vector
5601 5604
5602 As mentioned above, each vector is @code{malloc()}ed individually, and 5605 As mentioned above, each vector is @code{malloc()}ed individually, and
5603 all are threaded through the variable @code{all_vectors}. Vectors are 5606 all are threaded through the variable @code{all_vectors}. Vectors are
5604 marked strangely during garbage collection, by kludging the size field. 5607 marked strangely during garbage collection, by kludging the size field.
5605 Note that the @code{struct Lisp_Vector} is declared with its 5608 Note that the @code{struct Lisp_Vector} is declared with its
5606 @code{contents} field being a @emph{stretchy} array of one element. It 5609 @code{contents} field being a @emph{stretchy} array of one element. It
5607 is actually @code{malloc()}ed with the right size, however, and access 5610 is actually @code{malloc()}ed with the right size, however, and access
5608 to any element through the @code{contents} array works fine. 5611 to any element through the @code{contents} array works fine.
5609 5612
5610 @node Bit Vector 5613 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp
5611 @section Bit Vector 5614 @section Bit Vector
5612 5615
5613 Bit vectors work exactly like vectors, except for more complicated 5616 Bit vectors work exactly like vectors, except for more complicated
5614 code to access an individual bit, and except for the fact that bit 5617 code to access an individual bit, and except for the fact that bit
5615 vectors are lrecords while vectors are not. (The only difference here is 5618 vectors are lrecords while vectors are not. (The only difference here is
5616 that there's an lrecord implementation pointer at the beginning and the 5619 that there's an lrecord implementation pointer at the beginning and the
5617 tag field in bit vector Lisp words is ``lrecord'' rather than 5620 tag field in bit vector Lisp words is ``lrecord'' rather than
5618 ``vector''.) 5621 ``vector''.)
5619 5622
5620 @node Symbol 5623 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp
5621 @section Symbol 5624 @section Symbol
5622 5625
5623 Symbols are also allocated in frob blocks. Note that the code 5626 Symbols are also allocated in frob blocks. Symbols in the awful
5624 exists for symbols to be either lrecords (category (c) above) 5627 horrible obarray structure are chained through their @code{next} field.
5625 or simple types (category (b) above), and are lrecords by
5626 default (I think), although there is no good reason for this.
5627
5628 Note that symbols in the awful horrible obarray structure are
5629 chained through their @code{next} field.
5630 5628
5631 Remember that @code{intern} looks up a symbol in an obarray, creating 5629 Remember that @code{intern} looks up a symbol in an obarray, creating
5632 one if necessary. 5630 one if necessary.
5633 5631
5634 @node Marker 5632 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp
5635 @section Marker 5633 @section Marker
5636 5634
5637 Markers are allocated in frob blocks, as usual. They are kept 5635 Markers are allocated in frob blocks, as usual. They are kept
5638 in a buffer unordered, but in a doubly-linked list so that they 5636 in a buffer unordered, but in a doubly-linked list so that they
5639 can easily be removed. (Formerly this was a singly-linked list, 5637 can easily be removed. (Formerly this was a singly-linked list,
5640 but in some cases garbage collection took an extraordinarily 5638 but in some cases garbage collection took an extraordinarily
5641 long time due to the O(N^2) time required to remove lots of 5639 long time due to the O(N^2) time required to remove lots of
5642 markers from a buffer.) Markers are removed from a buffer in 5640 markers from a buffer.) Markers are removed from a buffer in
5643 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. 5641 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
5644 5642
5645 @node String 5643 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp
5646 @section String 5644 @section String
5647 5645
5648 As mentioned above, strings are a special case. A string is logically 5646 As mentioned above, strings are a special case. A string is logically
5649 two parts, a fixed-size object (containing the length, property list, 5647 two parts, a fixed-size object (containing the length, property list,
5650 and a pointer to the actual data), and the actual data in the string. 5648 and a pointer to the actual data), and the actual data in the string.
5701 string data (which would normally be obtained from the now-non-existent 5699 string data (which would normally be obtained from the now-non-existent
5702 @code{struct Lisp_String}) at the beginning of the dead string data gap. 5700 @code{struct Lisp_String}) at the beginning of the dead string data gap.
5703 The string compactor recognizes this special 0xFFFFFFFF marker and 5701 The string compactor recognizes this special 0xFFFFFFFF marker and
5704 handles it correctly. 5702 handles it correctly.
5705 5703
5706 @node Compiled Function 5704 @node Compiled Function, , String, Allocation of Objects in XEmacs Lisp
5707 @section Compiled Function 5705 @section Compiled Function
5708 5706
5709 Not yet documented. 5707 Not yet documented.
5710 5708
5711 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top 5709
5710 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
5711 @chapter Dumping
5712
5713 @section What is dumping and its justification
5714
5715 The C code of XEmacs is just a Lisp engine with a lot of built-in
5716 primitives useful for writing an editor. The editor itself is written
5717 mostly in Lisp, and represents around 100K lines of code. Loading and
5718 executing the initialization of all this code takes a bit of time (five
5719 to ten times the usual startup time of current xemacs) and requires
5720 having all the lisp source files around. Having to reload them each
5721 time the editor is started would not be acceptable.
5722
5723 The traditional solution to this problem is called dumping: the build
5724 process first creates the lisp engine under the name @file{temacs}, then
5725 runs it until it has finished loading and initializing all the lisp
5726 code, and eventually creates a new executable called @file{xemacs}
5727 including both the object code in @file{temacs} and all the contents of
5728 the memory after the initialization.
5729
5730 This solution, while working, has a huge problem: the creation of the
5731 new executable from the actual contents of memory is an extremely
5732 system-specific process, quite error-prone, and which interferes with a
5733 lot of system libraries (like malloc). It is even getting worse
5734 nowadays with libraries using constructors which are automatically
5735 called when the program is started (even before main()) which tend to
5736 crash when they are called multiple times, once before dumping and once
5737 after (IRIX 6.x libz.so pulls in some C++ image libraries thru
5738 dependencies which have this problem). Writing the dumper is also one
5739 of the most difficult parts of porting XEmacs to a new operating system.
5740 Basically, `dumping' is an operation that is just not officially
5741 supported on many operating systems.
5742
5743 The aim of the portable dumper is to solve the same problem as the
5744 system-specific dumper, that is to be able to reload quickly, using only
5745 a small number of files, the fully initialized lisp part of the editor,
5746 without any system-specific hacks.
5747
5748 @menu
5749 * Overview::
5750 * Data descriptions::
5751 * Dumping phase::
5752 * Reloading phase::
5753 * Remaining issues::
5754 @end menu
5755
5756 @node Overview, Data descriptions, Dumping, Dumping
5757 @section Overview
5758
5759 The portable dumping system has to:
5760
5761 @enumerate
5762 @item
5763 At dump time, write all initialized, non-quickly-rebuildable data to a
5764 file [Note: currently named @file{xemacs.dmp}, but the name will
5765 change], along with all the information needed for the reloading.
5766
5767 @item
5768 When starting xemacs, reload the dump file, relocate it to its new
5769 starting address if needed, and reinitialize all pointers to this
5770 data. Also, rebuild all the quickly rebuildable data.
5771 @end enumerate
5772
5773 @node Data descriptions, Dumping phase, Overview, Dumping
5774 @section Data descriptions
5775
5776 The most complex task of the dumper is to be able to write lisp objects
5777 (lrecords) and C structs to disk and reload them at a different address,
5778 updating all the pointers they include in the process. This is done by
5779 using external data descriptions that give information about the layout
5780 of the structures in memory.
5781
5782 The specification of these descriptions is in lrecord.h. A description
5783 of an lrecord is an array of struct lrecord_description. Each of these
5784 structs includes a type, an offset in the structure and some optional
5785 parameters depending on the type. For instance, here is the string
5786 description:
5787
5788 @example
5789 static const struct lrecord_description string_description[] = @{
5790 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @},
5791 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
5792 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @},
5793 @{ XD_END @}
5794 @};
5795 @end example
5796
5797 The first line indicates a member of type Bytecount, which is used by
5798 the next, indirect directive. The second means "there is a pointer to
5799 some opaque data in the field @code{data}". The length of said data is
5800 given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value
5801 in the 0th line of the description (welcome to C) plus one". The third
5802 line means "there is a Lisp_Object member @code{plist} in the Lisp_String
5803 structure". @code{XD_END} then ends the description.
5804
5805 This gives us all the information we need to move around what is pointed
5806 to by a structure (C or lrecord) and, by transitivity, everything that
5807 it points to. The only missing information for dumping is the size of
5808 the structure. For lrecords, this is part of the
5809 lrecord_implementation, so we don't need to duplicate it. For C
5810 structures we use a struct struct_description, which includes a size
5811 field and a pointer to an associated array of lrecord_description.
5812
5813 @node Dumping phase, Reloading phase, Data descriptions, Dumping
5814 @section Dumping phase
5815
5816 Dumping is done by calling the function pdump() (in dumper.c) which is
5817 invoked from Fdump_emacs (in emacs.c). This function performs a number
5818 of tasks.
5819
5820 @menu
5821 * Object inventory::
5822 * Address allocation::
5823 * The header::
5824 * Data dumping::
5825 * Pointers dumping::
5826 @end menu
5827
5828 @node Object inventory, Address allocation, Dumping phase, Dumping phase
5829 @subsection Object inventory
5830
5831 The first task is to build the list of the objects to dump. This
5832 includes:
5833
5834 @itemize @bullet
5835 @item lisp objects
5836 @item C structures
5837 @end itemize
5838
5839 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
5840 of C structs are kept together) which includes a pointer to the first
5841 object of the group, the per-object size and the count of objects in the
5842 group, along with some other information which is initialized later.
5843
5844 These entries are linked together in @code{pdump_entry_list} structures
5845 and can be enumerated thru either:
5846
5847 @enumerate
5848 @item
5849 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
5850 per lrecord type, indexed by type number.
5851
5852 @item
5853 the @code{pdump_opaque_data_list}, used for the opaque data which does
5854 not include pointers, and hence does not need descriptions.
5855
5856 @item
5857 the @code{pdump_struct_table}, which is a vector of
5858 @code{struct_description}/@code{pdump_entry_list} pairs, used for
5859 non-opaque C structures.
5860 @end enumerate
5861
5862 This uses a marking strategy similar to the garbage collector. Some
5863 differences though:
5864
5865 @enumerate
5866 @item
5867 We do not use the mark bit (which does not exist for C structures
5868 anyway), we use a big hash table instead.
5869
5870 @item
5871 We do not use the mark function of lrecords but instead rely on the
5872 external descriptions. This happens essentially because we need to
5873 follow pointers to C structures and opaque data in addition to
5874 Lisp_Object members.
5875 @end enumerate
5876
5877 This is done by @code{pdump_register_object}, which handles Lisp_Object
5878 variables, and pdump_register_struct which handles C structures, which
5879 both delegate the description management to pdump_register_sub.
5880
5881 The hash table doubles as a map from objects to pdump_entry_list_elmt (i.e.
5882 it allows us to look up a pdump_entry_list_elmt with the object it points
5883 to). Entries are added with @code{pdump_add_entry()} and looked up with
5884 @code{pdump_get_entry()}. There is no need for entry removal. The hash
5885 value is computed quite basically from the object pointer by
5886 @code{pdump_make_hash()}.
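
The idea of a pointer-keyed table that serves both as the ``already
seen'' mark and as the lookup from object to entry can be sketched as
follows. This is an illustration only, not the code in dumper.c: the
@code{toy_} names, the table size, the chaining scheme and the hash
function are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HASH_SIZE 1024  /* arbitrary size chosen for this sketch */

/* Hypothetical stand-in for a pdump_entry_list_elmt. */
struct toy_entry
{
  struct toy_entry *next;  /* chain of entries in the same bucket */
  const void *obj;         /* first object of the group */
  size_t size;             /* per-object size */
  int count;               /* number of objects in the group */
};

static struct toy_entry *toy_table[TOY_HASH_SIZE];

/* Hash an object by its address.  The low bits are dropped because
   alignment makes them mostly zero. */
static size_t
toy_make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) % TOY_HASH_SIZE);
}

/* Return the entry registered for OBJ, or NULL if OBJ has not been
   seen yet.  A non-NULL result plays the role of a set mark bit. */
static struct toy_entry *
toy_get_entry (const void *obj)
{
  struct toy_entry *e;
  for (e = toy_table[toy_make_hash (obj)]; e != NULL; e = e->next)
    if (e->obj == obj)
      return e;
  return NULL;
}

/* Register OBJ.  Entries are never removed, so no deletion code
   is needed. */
static struct toy_entry *
toy_add_entry (const void *obj, size_t size, int count)
{
  size_t h = toy_make_hash (obj);
  struct toy_entry *e = malloc (sizeof *e);
  assert (e != NULL);
  e->obj = obj;
  e->size = size;
  e->count = count;
  e->next = toy_table[h];
  toy_table[h] = e;
  return e;
}
```

A registration routine would call @code{toy_get_entry()} first and
return early when the object is already known, which is what keeps the
traversal from looping on cyclic structures.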
5887
5888 The roots for the marking are:
5889
5890 @enumerate
5891 @item
5892 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
5893 call for protected variables we do not want to dump).
5894
5895 @item
5896 the @code{pdump_wire}'d variables (@code{staticpro} is equivalent to
5897 @code{staticpro_nodump()} + @code{pdump_wire()}).
5898
5899 @item
5900 the @code{dumpstruct}'ed variables, which point to C structures.
5901 @end enumerate
5902
5903 This does not include the GCPRO'ed variables, the specbinds, the
5904 catchtags, the backlist, the redisplay or the profiling info, since we
5905 do not want to rebuild the actual chain of lisp calls which ends in
5906 the dump-emacs call, only the global variables.
5907
5908 Weak lists and weak hash tables are dumped as if they were their
5909 non-weak equivalent (without changing their type, of course). This has
5910 not yet been a problem.
5911
5912 @node Address allocation, The header, Object inventory, Dumping phase
5913 @subsection Address allocation
5914
5915
5916 The next step is to allocate the offsets of each of the objects in the
5917 final dump file. This is done by @code{pdump_allocate_offset()} which
5918 is called indirectly by @code{pdump_scan_by_alignment()}.
5919
5920 The strategy to deal with alignment problems uses these facts:
5921
5922 @enumerate
5923 @item
5924 real world alignment requirements are powers of two.
5925
5926 @item
5927 the C compiler is required to adjust the size of a struct so that you
5928 can have an array of them next to each other. This means you can get an
5929 upper bound on the alignment requirements of a given structure by
5930 looking at the largest power of two of which its size is a multiple.
5931
5932 @item
5933 the non-variant part of variable size lrecords has an alignment
5934 requirement of 4.
5935 @end enumerate
5936
5937 Hence, for each lrecord type, C struct type or opaque data block the
5938 alignment requirement is computed as a power of two, with a minimum of
5939 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the
5940 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
5941 first. This ensures the best packing.
5942
5943 The maximum alignment requirement we take into account is 2^8.
5944
5945 @code{pdump_allocate_offset()} only has to do a linear allocation,
5946 starting at offset 256 (this leaves room for the header and keeps the
5947 alignments happy).
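
The alignment rule and the linear allocation can be sketched as below.
The function names and the way the current offset is kept in a static
variable are assumptions made for this example; only the constants (a
cap of 2^8, a minimum of 2^2 for lrecords, and a base offset of 256)
come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ALIGN_LOG2 8  /* 2^8 is the largest alignment honored */
#define TOY_BASE_OFFSET 256   /* first offset available for objects */

/* Return log2 of the largest power of two dividing SIZE, clamped to
   the range [MIN_LOG2, TOY_MAX_ALIGN_LOG2].  Pass 2 as MIN_LOG2 for
   lrecords and 0 otherwise. */
static int
toy_align_log2 (size_t size, int min_log2)
{
  int n = 0;
  assert (size > 0);
  while (n < TOY_MAX_ALIGN_LOG2 && (size & ((size_t) 1 << n)) == 0)
    n++;
  return n < min_log2 ? min_log2 : n;
}

/* Round OFFSET up to the next multiple of 2^ALIGN_LOG2. */
static size_t
toy_align_up (size_t offset, int align_log2)
{
  size_t mask = ((size_t) 1 << align_log2) - 1;
  return (offset + mask) & ~mask;
}

/* Next free offset in the (imaginary) dump file. */
static size_t toy_cur_offset = TOY_BASE_OFFSET;

/* Reserve room for COUNT objects of SIZE bytes each and return the
   offset of the first one. */
static size_t
toy_allocate_offset (size_t size, int count, int align_log2)
{
  size_t start = toy_align_up (toy_cur_offset, align_log2);
  toy_cur_offset = start + size * (size_t) count;
  return start;
}
```

When groups are allocated from the highest alignment down and each size
is a multiple of its own alignment, the running offset is already
suitably aligned for every following group, so @code{toy_align_up()}
rarely has to insert padding. That is the packing property the text
refers to.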
5948
5949 @node The header, Data dumping, Address allocation, Dumping phase
5950 @subsection The header
5951
5952 The next step creates the file and writes a header with a signature and
5953 some miscellaneous information in it (number of staticpro, number of assigned
5954 lrecord types, etc.). The reloc_address field, which indicates at
5955 which address the file should be loaded if we want to avoid post-reload
5956 relocation, is set to 0. It then seeks to offset 256 (base offset for
5957 the objects).
5958
5959 @node Data dumping, Pointers dumping, The header, Dumping phase
5960 @subsection Data dumping
5961
5962 The data is dumped in the same order as the addresses were allocated by
5963 @code{pdump_dump_data()}, called from @code{pdump_scan_by_alignment()}.
5964 This function copies the data to a temporary buffer, relocates all
5965 pointers in the object to the addresses allocated in step Address
5966 Allocation, and writes it to the file. Using the same order means that,
5967 if we are careful with lrecords whose size is not a multiple of 4, we
5968 are assured that the object is always written at the offset in the file
5969 allocated in step Address Allocation.
5970
5971 @node Pointers dumping, , Data dumping, Dumping phase
5972 @subsection Pointers dumping
5973
5974 A bunch of tables needed to properly reassign the global pointers are
5975 then written. They are:
5976
5977 @enumerate
5978 @item
5979 the staticpro array
5980 @item
5981 the dumpstruct array
5982 @item
5983 the lrecord_implementations_table array
5984 @item
5985 a vector of all the offsets to the objects in the file that include a
5986 description (for faster relocation at reload time)
5987 @item
5988 the pdump_wired and pdump_wired_list arrays
5989 @end enumerate
5990
5991 For each of the arrays we write both the pointer to the variables and
5992 the relocated offset of the object they point to. Since these variables
5993 are global, the pointers are still valid when restarting the program and
5994 are used to regenerate the global pointers.
5995
5996 The @code{pdump_wired_list} array is a special case. The variables it
5997 points to are the head of weak linked lists of lisp objects of the same
5998 type. Not all objects of this list are dumped so the relocated pointer
5999 we associate with them points to the first dumped object of the list, or
6000 Qnil if none is available. This is also the reason why they are not
6001 used as roots for the purpose of object enumeration.
6002
6003 This is the end of the dumping part.
6004
6005 @node Reloading phase, Remaining issues, Dumping phase, Dumping
6006 @section Reloading phase
6007
6008 @subsection File loading
6009
6010 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
6011 least 4096), or if mmap is unavailable or fails, a 256-byte-aligned
6012 malloc is done and the file is loaded.
6013
6014 Some variables are reinitialized from the values found in the header.
6015
6016 The difference between the actual loading address and the reloc_address
6017 is computed and will be used for all the relocations.
6018
6019
6020 @subsection Putting back the staticvec
6021
6022 The staticvec array is memcpy'd from the file and the variables it
6023 points to are reset to the relocated object addresses.
6024
6025
6026 @subsection Putting back the dumpstructed variables
6027
6028 The variables pointed to by dumpstruct in the dump phase are reset to
6029 the right relocated object addresses.
6030
6031
6032 @subsection lrecord_implementations_table
6033
6034 The lrecord_implementations_table is reset to its dump time state and
6035 the right lrecord_type_index values are put in.
6036
6037
6038 @subsection Object relocation
6039
6040 All the objects are relocated using their description and their offset
6041 by @code{pdump_reloc_one}. This step is unnecessary if the
6042 reloc_address is equal to the file loading address.
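
The arithmetic behind relocation is small enough to show in full. The
sketch below is not @code{pdump_reloc_one}; the @code{toy_} functions
are hypothetical, and leaving null pointers untouched is an assumption
of this example rather than a documented property of the dumper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Difference between where the file was actually loaded and the
   address its pointers were written for. */
static intptr_t
toy_reloc_delta (const void *load_address, uintptr_t reloc_address)
{
  return (intptr_t) ((uintptr_t) load_address - reloc_address);
}

/* Adjust the pointer member found MEMBER_OFFSET bytes into OBJ.
   The offset would come from the object's description. */
static void
toy_reloc_pointer (void *obj, size_t member_offset, intptr_t delta)
{
  void **slot = (void **) ((char *) obj + member_offset);
  if (*slot != NULL)  /* assumption: null pointers are left alone */
    *slot = (void *) ((uintptr_t) *slot + (uintptr_t) delta);
}
```

With a reloc_address of 0, a stored pointer is simply an offset into
the file, and adding the delta turns it into a real address inside the
loaded image. When the delta is zero nothing needs to be written, which
is why the step can be skipped in that case.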
6043
6044
6045 @subsection Putting back the pdump_wire and pdump_wire_list variables
6046
6047 Same as Putting back the dumpstructed variables.
6048
6049
6050 @subsection Reorganize the hash tables
6051
6052 Since some of the hash values in the lisp hash tables are
6053 address-dependent, their layout is now wrong. So we go through each of
6054 them and have them re-sorted by calling @code{pdump_reorganize_hash_table}.
6055
6056 @node Remaining issues, , Reloading phase, Dumping
6057 @section Remaining issues
6058
6059 The build process will have to start a post-dump xemacs, ask it the
6060 loading address (which will, hopefully, always be the same between
6061 different xemacs invocations) and relocate the file to the new address.
6062 This way the object relocation phase will not have to be done, which
6063 means no writes in the objects and that, because of the use of mmap, the
6064 dumped data will be shared between all the xemacs running on the
6065 computer.
6066
6067 Some executable signature will be necessary to ensure that a given dump
6068 file is really associated with a given executable, or random crashes
6069 will occur. One option is a random number set at compile or configure
6070 time through a define. This will also allow for having differently-compiled
6071 xemacsen on the same system (mule and no-mule come to mind).
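
One way such a signature check might look is sketched below. Nothing here exists in the current code: the header layout, the @code{DUMP_MAGIC} string, the @code{BUILD_SIGNATURE} define and the function @code{dump_matches_executable} are all hypothetical, and the fallback value of the define is a placeholder for whatever configure would generate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker identifying a dump file at all. */
#define DUMP_MAGIC "XEDUMP01"

/* Hypothetical per-build value; in practice it would be passed in by
   configure or the compiler command line (-DBUILD_SIGNATURE=...). */
#ifndef BUILD_SIGNATURE
#define BUILD_SIGNATURE 0x5eedf00dU
#endif

/* Hypothetical header placed at the start of the dump file. */
struct dump_header
{
  char magic[8];
  uint32_t signature;
};

/* Return 1 if the LEN bytes at DATA start with a header written by
   this very build of the executable, 0 otherwise. */
static int
dump_matches_executable (const void *data, size_t len)
{
  struct dump_header h;

  if (len < sizeof h)
    return 0;                   /* too short to hold a header */
  memcpy (&h, data, sizeof h);  /* avoid alignment assumptions */
  return memcmp (h.magic, DUMP_MAGIC, sizeof h.magic) == 0
         && h.signature == (uint32_t) BUILD_SIGNATURE;
}
```

A loader built this way would refuse a mismatched file up front instead of crashing later, and a mule and a no-mule build would simply carry different signature values.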
6072
6073 The DOC file contents should probably end up in the dump file.
6074
6075
6076 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
5712 @chapter Events and the Event Loop 6077 @chapter Events and the Event Loop
5713 6078
5714 @menu 6079 @menu
5715 * Introduction to Events:: 6080 * Introduction to Events::
5716 * Main Loop:: 6081 * Main Loop::
5720 * Other Event Loop Functions:: 6085 * Other Event Loop Functions::
5721 * Converting Events:: 6086 * Converting Events::
5722 * Dispatching Events; The Command Builder:: 6087 * Dispatching Events; The Command Builder::
5723 @end menu 6088 @end menu
5724 6089
5725 @node Introduction to Events 6090 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop
5726 @section Introduction to Events 6091 @section Introduction to Events
5727 6092
5728 An event is an object that encapsulates information about an 6093 An event is an object that encapsulates information about an
5729 interesting occurrence in the operating system. Events are 6094 interesting occurrence in the operating system. Events are
5730 generated either by user action, direct (e.g. typing on the 6095 generated either by user action, direct (e.g. typing on the
5759 Emacs events---there may not be a one-to-one correspondence. 6124 Emacs events---there may not be a one-to-one correspondence.
5760 6125
5761 Emacs events are documented in @file{events.h}; I'll discuss them 6126 Emacs events are documented in @file{events.h}; I'll discuss them
5762 later. 6127 later.
5763 6128
5764 @node Main Loop 6129 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop
5765 @section Main Loop 6130 @section Main Loop
5766 6131
5767 The @dfn{command loop} is the top-level loop that the editor is always 6132 The @dfn{command loop} is the top-level loop that the editor is always
5768 running. It loops endlessly, calling @code{next-event} to retrieve an 6133 running. It loops endlessly, calling @code{next-event} to retrieve an
5769 event and @code{dispatch-event} to execute it. @code{dispatch-event} does 6134 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
5826 wrapper similar to @code{command_loop_2()}. Note also that 6191 wrapper similar to @code{command_loop_2()}. Note also that
5827 @code{initial_command_loop()} sets up a catch for @code{top-level} when 6192 @code{initial_command_loop()} sets up a catch for @code{top-level} when
5828 invoking @code{top_level_1()}, just like when it invokes 6193 invoking @code{top_level_1()}, just like when it invokes
5829 @code{command_loop_2()}. 6194 @code{command_loop_2()}.
5830 6195
5831 @node Specifics of the Event Gathering Mechanism 6196 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop
5832 @section Specifics of the Event Gathering Mechanism 6197 @section Specifics of the Event Gathering Mechanism
5833 6198
5834 Here is an approximate diagram of the collection processes 6199 Here is an approximate diagram of the collection processes
5835 at work in XEmacs, under TTY's (TTY's are simpler than X 6200 at work in XEmacs, under TTY's (TTY's are simpler than X
5836 so we'll look at this first): 6201 so we'll look at this first):
6065 which repeatedly calls `next-event' 6430 which repeatedly calls `next-event'
6066 and then dispatches the event 6431 and then dispatches the event
6067 using `dispatch-event' 6432 using `dispatch-event'
6068 @end example 6433 @end example
6069 6434
6070 @node Specifics About the Emacs Event 6435 @node Specifics About the Emacs Event, The Event Stream Callback Routines, Specifics of the Event Gathering Mechanism, Events and the Event Loop
6071 @section Specifics About the Emacs Event 6436 @section Specifics About the Emacs Event
6072 6437
6073 @node The Event Stream Callback Routines 6438 @node The Event Stream Callback Routines, Other Event Loop Functions, Specifics About the Emacs Event, Events and the Event Loop
6074 @section The Event Stream Callback Routines 6439 @section The Event Stream Callback Routines
6075 6440
6076 @node Other Event Loop Functions 6441 @node Other Event Loop Functions, Converting Events, The Event Stream Callback Routines, Events and the Event Loop
6077 @section Other Event Loop Functions 6442 @section Other Event Loop Functions
6078 6443
6079 @code{detect_input_pending()} and @code{input-pending-p} look for 6444 @code{detect_input_pending()} and @code{input-pending-p} look for
6080 input by calling @code{event_stream->event_pending_p} and looking in 6445 input by calling @code{event_stream->event_pending_p} and looking in
6081 @code{[V]unread-command-event} and the @code{command_event_queue} (they 6446 @code{[V]unread-command-event} and the @code{command_event_queue} (they
6093 @code{read-char} calls @code{next-command-event} and uses 6458 @code{read-char} calls @code{next-command-event} and uses
6094 @code{event_to_character()} to return the character equivalent. With 6459 @code{event_to_character()} to return the character equivalent. With
6095 the right kind of input method support, it is possible for (read-char) 6460 the right kind of input method support, it is possible for (read-char)
6096 to return a Kanji character. 6461 to return a Kanji character.
6097 6462
6098 @node Converting Events 6463 @node Converting Events, Dispatching Events; The Command Builder, Other Event Loop Functions, Events and the Event Loop
6099 @section Converting Events 6464 @section Converting Events
6100 6465
6101 @code{character_to_event()}, @code{event_to_character()}, 6466 @code{character_to_event()}, @code{event_to_character()},
6102 @code{event-to-character}, and @code{character-to-event} convert between 6467 @code{event-to-character}, and @code{character-to-event} convert between
6103 characters and keypress events corresponding to the characters. If the 6468 characters and keypress events corresponding to the characters. If the
6104 event was not a keypress, @code{event_to_character()} returns -1 and 6469 event was not a keypress, @code{event_to_character()} returns -1 and
6105 @code{event-to-character} returns @code{nil}. These functions convert 6470 @code{event-to-character} returns @code{nil}. These functions convert
6106 between character representation and the split-up event representation 6471 between character representation and the split-up event representation
6107 (keysym plus mod keys). 6472 (keysym plus mod keys).
6108 6473
6109 @node Dispatching Events; The Command Builder 6474 @node Dispatching Events; The Command Builder, , Converting Events, Events and the Event Loop
6110 @section Dispatching Events; The Command Builder 6475 @section Dispatching Events; The Command Builder
6111 6476
6112 Not yet documented. 6477 Not yet documented.
6113 6478
6114 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top 6479 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
6119 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: 6484 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
6120 * Simple Special Forms:: 6485 * Simple Special Forms::
6121 * Catch and Throw:: 6486 * Catch and Throw::
6122 @end menu 6487 @end menu
6123 6488
6124 @node Evaluation 6489 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings
6125 @section Evaluation 6490 @section Evaluation
6126 6491
6127 @code{Feval()} evaluates the form (a Lisp object) that is passed to 6492 @code{Feval()} evaluates the form (a Lisp object) that is passed to
6128 it. Note that evaluation is only non-trivial for two types of objects: 6493 it. Note that evaluation is only non-trivial for two types of objects:
6129 symbols and conses. A symbol is evaluated simply by calling 6494 symbols and conses. A symbol is evaluated simply by calling
6188 @code{funcall_compiled_function()} calls the real byte-code interpreter 6553 @code{funcall_compiled_function()} calls the real byte-code interpreter
6189 @code{execute_optimized_program()} on the byte-code instructions, which 6554 @code{execute_optimized_program()} on the byte-code instructions, which
6190 are converted into an internal form for faster execution. 6555 are converted into an internal form for faster execution.
6191 6556
6192 When a compiled function is executed for the first time by 6557 When a compiled function is executed for the first time by
6193 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed 6558 @code{funcall_compiled_function()}, or during the dump phase of building
6194 during the dump phase of building XEmacs, the byte-code instructions are 6559 XEmacs, the byte-code instructions are converted from a
6195 converted from a @code{Lisp_String} (which is inefficient to access, 6560 @code{Lisp_String} (which is inefficient to access, especially in the
6196 especially in the presence of MULE) into a @code{Lisp_Opaque} object 6561 presence of MULE) into a @code{Lisp_Opaque} object containing an array
6197 containing an array of unsigned char, which can be directly executed by 6562 of unsigned char, which can be directly executed by the byte-code
6198 the byte-code interpreter. At this time the byte code is also analyzed 6563 interpreter. At this time the byte code is also analyzed for validity
6199 for validity and transformed into a more optimized form, so that 6564 and transformed into a more optimized form, so that
6200 @code{execute_optimized_program()} can really fly. 6565 @code{execute_optimized_program()} can really fly.
6201 6566
6202 Here are some of the optimizations performed by the internal byte-code 6567 Here are some of the optimizations performed by the internal byte-code
6203 transformer: 6568 transformer:
6204 @enumerate 6569 @enumerate
6209 References to the @code{constants} array that will be used as a Lisp 6574 References to the @code{constants} array that will be used as a Lisp
6210 variable are checked for being correct non-constant (i.e. not @code{t}, 6575 variable are checked for being correct non-constant (i.e. not @code{t},
6211 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter 6576 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
6212 doesn't have to. 6577 doesn't have to.
6213 @item 6578 @item
6214 The maxiumum number of variable bindings in the byte-code is 6579 The maximum number of variable bindings in the byte-code is
6215 pre-computed, so that space on the @code{specpdl} stack can be 6580 pre-computed, so that space on the @code{specpdl} stack can be
6216 pre-reserved once for the whole function execution. 6581 pre-reserved once for the whole function execution.
6217 @item 6582 @item
6218 All byte-code jumps are relative to the current program counter instead 6583 All byte-code jumps are relative to the current program counter instead
6219 of the start of the program, thereby saving a register. 6584 of the start of the program, thereby saving a register.
6249 @code{call3()} call a function, passing it the argument(s) given (the 6614 @code{call3()} call a function, passing it the argument(s) given (the
6250 arguments are given as separate C arguments rather than being passed as 6615 arguments are given as separate C arguments rather than being passed as
6251 an array). @code{apply1()} uses @code{Fapply()} while the others use 6616 an array). @code{apply1()} uses @code{Fapply()} while the others use
6252 @code{Ffuncall()} to do the real work. 6617 @code{Ffuncall()} to do the real work.
6253 6618
6254 @node Dynamic Binding; The specbinding Stack; Unwind-Protects 6619 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings
6255 @section Dynamic Binding; The specbinding Stack; Unwind-Protects 6620 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
6256 6621
6257 @example 6622 @example
6258 struct specbinding 6623 struct specbinding
6259 @{ 6624 @{
6303 a local-variable binding (@code{func} is 0, @code{symbol} is not 6668 a local-variable binding (@code{func} is 0, @code{symbol} is not
6304 @code{nil}, and @code{old_value} holds the old value, which is stored as 6669 @code{nil}, and @code{old_value} holds the old value, which is stored as
6305 the symbol's value). 6670 the symbol's value).
6306 @end enumerate 6671 @end enumerate
6307 6672
6308 @node Simple Special Forms 6673 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings
6309 @section Simple Special Forms 6674 @section Simple Special Forms
6310 6675
6311 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, 6676 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
6312 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, 6677 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
6313 @code{let*}, @code{let}, @code{while} 6678 @code{let*}, @code{let}, @code{while}
6315 All of these are very simple and work as expected, calling 6680 All of these are very simple and work as expected, calling
6316 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of 6681 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
6317 @code{let} and @code{let*}) using @code{specbind()} to create bindings 6682 @code{let} and @code{let*}) using @code{specbind()} to create bindings
6318 and @code{unbind_to()} to undo the bindings when finished. 6683 and @code{unbind_to()} to undo the bindings when finished.
6319 6684
6320 Note that, with the exeption of @code{Fprogn}, these functions are 6685 Note that, with the exception of @code{Fprogn}, these functions are
6321 typically called in real life only in interpreted code, since the byte 6686 typically called in real life only in interpreted code, since the byte
6322 compiler knows how to convert calls to these functions directly into 6687 compiler knows how to convert calls to these functions directly into
6323 byte code. 6688 byte code.
6324 6689
6325 @node Catch and Throw 6690 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings
6326 @section Catch and Throw 6691 @section Catch and Throw
6327 6692
6328 @example 6693 @example
6329 struct catchtag 6694 struct catchtag
6330 @{ 6695 @{
6388 * Introduction to Symbols:: 6753 * Introduction to Symbols::
6389 * Obarrays:: 6754 * Obarrays::
6390 * Symbol Values:: 6755 * Symbol Values::
6391 @end menu 6756 @end menu
6392 6757
6393 @node Introduction to Symbols 6758 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables
6394 @section Introduction to Symbols 6759 @section Introduction to Symbols
6395 6760
6396 A symbol is basically just an object with four fields: a name (a 6761 A symbol is basically just an object with four fields: a name (a
6397 string), a value (some Lisp object), a function (some Lisp object), and 6762 string), a value (some Lisp object), a function (some Lisp object), and
6398 a property list (usually a list of alternating keyword/value pairs). 6763 a property list (usually a list of alternating keyword/value pairs).
6405 there can be a distinct function and variable with the same name. The 6770 there can be a distinct function and variable with the same name. The
6406 property list is used as a more general mechanism of associating 6771 property list is used as a more general mechanism of associating
6407 additional values with particular names, and once again the namespace is 6772 additional values with particular names, and once again the namespace is
6408 independent of the function and variable namespaces. 6773 independent of the function and variable namespaces.
6409 6774
6410 @node Obarrays 6775 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables
6411 @section Obarrays 6776 @section Obarrays
6412 6777
6413 The identity of symbols with their names is accomplished through a 6778 The identity of symbols with their names is accomplished through a
6414 structure called an obarray, which is just a poorly-implemented hash 6779 structure called an obarray, which is just a poorly-implemented hash
6415 table mapping from strings to symbols whose name is that string. (I say 6780 table mapping from strings to symbols whose name is that string. (I say
6472 a new one, and @code{unintern} to remove a symbol from an obarray. This 6837 a new one, and @code{unintern} to remove a symbol from an obarray. This
6473 returns the removed symbol. (Remember: You can't put the symbol back 6838 returns the removed symbol. (Remember: You can't put the symbol back
6474 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols 6839 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
6475 in an obarray. 6840 in an obarray.
6476 6841
6477 @node Symbol Values 6842 @node Symbol Values, , Obarrays, Symbols and Variables
6478 @section Symbol Values 6843 @section Symbol Values
6479 6844
6480 The value field of a symbol normally contains a Lisp object. However, 6845 The value field of a symbol normally contains a Lisp object. However,
6481 a symbol can be @dfn{unbound}, meaning that it logically has no value. 6846 a symbol can be @dfn{unbound}, meaning that it logically has no value.
6482 This is internally indicated by storing a special Lisp object, called 6847 This is internally indicated by storing a special Lisp object, called
6527 * Markers and Extents:: Tagging locations within a buffer. 6892 * Markers and Extents:: Tagging locations within a buffer.
6528 * Bufbytes and Emchars:: Representation of individual characters. 6893 * Bufbytes and Emchars:: Representation of individual characters.
6529 * The Buffer Object:: The Lisp object corresponding to a buffer. 6894 * The Buffer Object:: The Lisp object corresponding to a buffer.
6530 @end menu 6895 @end menu
6531 6896
6532 @node Introduction to Buffers 6897 @node Introduction to Buffers, The Text in a Buffer, Buffers and Textual Representation, Buffers and Textual Representation
6533 @section Introduction to Buffers 6898 @section Introduction to Buffers
6534 6899
6535 A buffer is logically just a Lisp object that holds some text. 6900 A buffer is logically just a Lisp object that holds some text.
6536 In this, it is like a string, but a buffer is optimized for 6901 In this, it is like a string, but a buffer is optimized for
6537 frequent insertion and deletion, while a string is not. Furthermore: 6902 frequent insertion and deletion, while a string is not. Furthermore:
6580 and @dfn{buffer of the selected window}, and the distinction between 6945 and @dfn{buffer of the selected window}, and the distinction between
6581 @dfn{point} of the current buffer and @dfn{window-point} of the selected 6946 @dfn{point} of the current buffer and @dfn{window-point} of the selected
6582 window. (This latter distinction is explained in detail in the section 6947 window. (This latter distinction is explained in detail in the section
6583 on windows.) 6948 on windows.)
6584 6949
6585 @node The Text in a Buffer 6950 @node The Text in a Buffer, Buffer Lists, Introduction to Buffers, Buffers and Textual Representation
6586 @section The Text in a Buffer 6951 @section The Text in a Buffer
6587 6952
6588 The text in a buffer consists of a sequence of zero or more 6953 The text in a buffer consists of a sequence of zero or more
6589 characters. A @dfn{character} is an integer that logically represents 6954 characters. A @dfn{character} is an integer that logically represents
6590 a letter, number, space, or other unit of text. Most of the characters 6955 a letter, number, space, or other unit of text. Most of the characters
6720 Bufbytes underscores the fact that we are working with a string of bytes 7085 Bufbytes underscores the fact that we are working with a string of bytes
6721 in the internal Emacs buffer representation rather than in one of a 7086 in the internal Emacs buffer representation rather than in one of a
6722 number of possible alternative representations (e.g. EUC-encoded text, 7087 number of possible alternative representations (e.g. EUC-encoded text,
6723 etc.). 7088 etc.).
6724 7089
6725 @node Buffer Lists 7090 @node Buffer Lists, Markers and Extents, The Text in a Buffer, Buffers and Textual Representation
6726 @section Buffer Lists 7091 @section Buffer Lists
6727 7092
6728 Recall earlier that buffers are @dfn{permanent} objects, i.e. that 7093 Recall earlier that buffers are @dfn{permanent} objects, i.e. that
6729 they remain around until explicitly deleted. This entails that there is 7094 they remain around until explicitly deleted. This entails that there is
6730 a list of all the buffers in existence. This list is actually an 7095 a list of all the buffers in existence. This list is actually an
6756 respectively. You can also force a new buffer to be created using 7121 respectively. You can also force a new buffer to be created using
6757 @code{generate-new-buffer}, which takes a name and (if necessary) makes 7122 @code{generate-new-buffer}, which takes a name and (if necessary) makes
6758 a unique name from this by appending a number, and then creates the 7123 a unique name from this by appending a number, and then creates the
6759 buffer. This is basically like the symbol operation @code{gensym}. 7124 buffer. This is basically like the symbol operation @code{gensym}.
6760 7125
6761 @node Markers and Extents 7126 @node Markers and Extents, Bufbytes and Emchars, Buffer Lists, Buffers and Textual Representation
6762 @section Markers and Extents 7127 @section Markers and Extents
6763 7128
6764 Among the things associated with a buffer are things that are 7129 Among the things associated with a buffer are things that are
6765 logically attached to certain buffer positions. This can be used to 7130 logically attached to certain buffer positions. This can be used to
6766 keep track of a buffer position when text is inserted and deleted, so 7131 keep track of a buffer position when text is inserted and deleted, so
6782 7147
6783 The important thing here is that markers and extents simply contain 7148 The important thing here is that markers and extents simply contain
6784 buffer positions in them as integers, and every time text is inserted or 7149 buffer positions in them as integers, and every time text is inserted or
6785 deleted, these positions must be updated. In order to minimize the 7150 deleted, these positions must be updated. In order to minimize the
6786 amount of shuffling that needs to be done, the positions in markers and 7151 amount of shuffling that needs to be done, the positions in markers and
6787 extents (there's one per marker, two per extent) and stored in Meminds. 7152 extents (there's one per marker, two per extent) are stored in Meminds.
6788 This means that they only need to be moved when the text is physically 7153 This means that they only need to be moved when the text is physically
6789 moved in memory; since the gap structure tries to minimize this, it also 7154 moved in memory; since the gap structure tries to minimize this, it also
6790 minimizes the number of marker and extent indices that need to be 7155 minimizes the number of marker and extent indices that need to be
6791 adjusted. Look in @file{insdel.c} for the details of how this works. 7156 adjusted. Look in @file{insdel.c} for the details of how this works.
6792 7157
6796 is no way to determine what markers are in a buffer if you are just 7161 is no way to determine what markers are in a buffer if you are just
6797 given the buffer. Extents remain in a buffer until they are detached 7162 given the buffer. Extents remain in a buffer until they are detached
6798 (which could happen as a result of text being deleted) or the buffer is 7163 (which could happen as a result of text being deleted) or the buffer is
6799 deleted, and primitives do exist to enumerate the extents in a buffer. 7164 deleted, and primitives do exist to enumerate the extents in a buffer.
6800 7165
6801 @node Bufbytes and Emchars 7166 @node Bufbytes and Emchars, The Buffer Object, Markers and Extents, Buffers and Textual Representation
6802 @section Bufbytes and Emchars 7167 @section Bufbytes and Emchars
6803 7168
6804 Not yet documented. 7169 Not yet documented.
6805 7170
6806 @node The Buffer Object 7171 @node The Buffer Object, , Bufbytes and Emchars, Buffers and Textual Representation
6807 @section The Buffer Object 7172 @section The Buffer Object
6808 7173
6809 Buffers contain fields not directly accessible by the Lisp programmer. 7174 Buffers contain fields not directly accessible by the Lisp programmer.
6810 We describe them here, naming them by the names used in the C code. 7175 We describe them here, naming them by the names used in the C code.
6811 Many are accessible indirectly in Lisp programs via Lisp primitives. 7176 Many are accessible indirectly in Lisp programs via Lisp primitives.
6920 * Encodings:: 7285 * Encodings::
6921 * Internal Mule Encodings:: 7286 * Internal Mule Encodings::
6922 * CCL:: 7287 * CCL::
6923 @end menu 7288 @end menu
6924 7289
6925 @node Character Sets 7290 @node Character Sets, Encodings, MULE Character Sets and Encodings, MULE Character Sets and Encodings
6926 @section Character Sets 7291 @section Character Sets
6927 7292
6928 A character set (or @dfn{charset}) is an ordered set of characters. A 7293 A character set (or @dfn{charset}) is an ordered set of characters. A
6929 particular character in a charset is indexed using one or more 7294 particular character in a charset is indexed using one or more
6930 @dfn{position codes}, which are non-negative integers. The number of 7295 @dfn{position codes}, which are non-negative integers. The number of
7001 160 - 255 Latin-1 32 - 127 7366 160 - 255 Latin-1 32 - 127
7002 @end example 7367 @end example
7003 7368
7004 This is a bit ad-hoc but gets the job done. 7369 This is a bit ad-hoc but gets the job done.
7005 7370
7006 @node Encodings 7371 @node Encodings, Internal Mule Encodings, Character Sets, MULE Character Sets and Encodings
7007 @section Encodings 7372 @section Encodings
7008 7373
7009 An @dfn{encoding} is a way of numerically representing characters from 7374 An @dfn{encoding} is a way of numerically representing characters from
7010 one or more character sets. If an encoding only encompasses one 7375 one or more character sets. If an encoding only encompasses one
7011 character set, then the position codes for the characters in that 7376 character set, then the position codes for the characters in that
7028 @menu 7393 @menu
7029 * Japanese EUC (Extended Unix Code):: 7394 * Japanese EUC (Extended Unix Code)::
7030 * JIS7:: 7395 * JIS7::
7031 @end menu 7396 @end menu
7032 7397
7033 @node Japanese EUC (Extended Unix Code) 7398 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings
7034 @subsection Japanese EUC (Extended Unix Code) 7399 @subsection Japanese EUC (Extended Unix Code)
7035 7400
7036 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, 7401 This encompasses the character sets Printing-ASCII, Japanese-JISX0201,
7037 and Japanese-JISX0208-Kana (half-width katakana, the right half of 7402 and Japanese-JISX0208-Kana (half-width katakana, the right half of
7038 JISX0201). It uses 8-bit bytes. 7403 JISX0201). It uses 8-bit bytes.
7050 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 7415 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80
7051 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80 7416 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80
7052 @end example 7417 @end example
7053 7418
7054 7419
7055 @node JIS7 7420 @node JIS7, , Japanese EUC (Extended Unix Code), Encodings
7056 @subsection JIS7 7421 @subsection JIS7
7057 7422
7058 This encompasses the character sets Printing-ASCII, 7423 This encompasses the character sets Printing-ASCII,
7059 Japanese-JISX0201-Roman (the left half of JISX0201; this character set 7424 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
7060 is very similar to Printing-ASCII and is a 94-character charset), 7425 is very similar to Printing-ASCII and is a 94-character charset),
7085 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII 7450 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII
7086 @end example 7451 @end example
7087 7452
7088 Initially, Printing-ASCII is invoked. 7453 Initially, Printing-ASCII is invoked.
7089 7454
7090 @node Internal Mule Encodings 7455 @node Internal Mule Encodings, CCL, Encodings, MULE Character Sets and Encodings
7091 @section Internal Mule Encodings 7456 @section Internal Mule Encodings
7092 7457
7093 In XEmacs/Mule, each character set is assigned a unique number, called a 7458 In XEmacs/Mule, each character set is assigned a unique number, called a
7094 @dfn{leading byte}. This is used in the encodings of a character. 7459 @dfn{leading byte}. This is used in the encodings of a character.
7095 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has 7460 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
7131 @menu 7496 @menu
7132 * Internal String Encoding:: 7497 * Internal String Encoding::
7133 * Internal Character Encoding:: 7498 * Internal Character Encoding::
7134 @end menu 7499 @end menu
7135 7500
7136 @node Internal String Encoding 7501 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings
7137 @subsection Internal String Encoding 7502 @subsection Internal String Encoding
7138 7503
7139 ASCII characters are encoded using their position code directly. Other 7504 ASCII characters are encoded using their position code directly. Other
7140 characters are encoded using their leading byte followed by their 7505 characters are encoded using their leading byte followed by their
7141 position code(s) with the high bit set. Characters in private character 7506 position code(s) with the high bit set. Characters in private character
7181 None of the standard non-modal encodings meet all of these 7546 None of the standard non-modal encodings meet all of these
7182 conditions. For example, EUC satisfies only (2) and (3), while 7547 conditions. For example, EUC satisfies only (2) and (3), while
7183 Shift-JIS and Big5 (not yet described) satisfy only (2). (All 7548 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
7184 non-modal encodings must satisfy (2), in order to be unambiguous.) 7549 non-modal encodings must satisfy (2), in order to be unambiguous.)
7185 7550
7186 @node Internal Character Encoding 7551 @node Internal Character Encoding, , Internal String Encoding, Internal Mule Encodings
7187 @subsection Internal Character Encoding 7552 @subsection Internal Character Encoding
7188 7553
7189 One 19-bit word represents a single character. The word is 7554 One 19-bit word represents a single character. The word is
7190 separated into three fields: 7555 separated into three fields:
7191 7556
7216 @end example 7581 @end example
7217 7582
7218 Note that character codes 0 - 255 are the same as the ``binary encoding'' 7583 Note that character codes 0 - 255 are the same as the ``binary encoding''
7219 described above. 7584 described above.
7220 7585
7221 @node CCL 7586 @node CCL, , Internal Mule Encodings, MULE Character Sets and Encodings
7222 @section CCL 7587 @section CCL
7223 7588
7224 @example 7589 @example
7225 CCL PROGRAM SYNTAX: 7590 CCL PROGRAM SYNTAX:
7226 CCL_PROGRAM := (CCL_MAIN_BLOCK 7591 CCL_PROGRAM := (CCL_MAIN_BLOCK
7270 this is the code executed to handle any stuff that needs to be done 7635 this is the code executed to handle any stuff that needs to be done
7271 (e.g. designating back to ASCII and left-to-right mode) after all 7636 (e.g. designating back to ASCII and left-to-right mode) after all
7272 other encoded/decoded data has been written out. This is not used for 7637 other encoded/decoded data has been written out. This is not used for
7273 charset CCL programs. 7638 charset CCL programs.
7274 7639
7275 REGISTER: 0..7 -- refered by RRR or rrr 7640 REGISTER: 0..7 -- referred by RRR or rrr
7276 7641
7277 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT 7642 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
7278 TTTTT (5-bit): operator type 7643 TTTTT (5-bit): operator type
7279 RRR (3-bit): register number 7644 RRR (3-bit): register number
7280 XXXXXXXXXXXXXXXX (15-bit): 7645 XXXXXXXXXXXXXXXX (15-bit):
7407 * Lstream Types:: Different sorts of things that are streamed. 7772 * Lstream Types:: Different sorts of things that are streamed.
7408 * Lstream Functions:: Functions for working with lstreams. 7773 * Lstream Functions:: Functions for working with lstreams.
7409 * Lstream Methods:: Creating new lstream types. 7774 * Lstream Methods:: Creating new lstream types.
7410 @end menu 7775 @end menu
7411 7776
7412 @node Creating an Lstream 7777 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams
7413 @section Creating an Lstream 7778 @section Creating an Lstream
7414 7779
7415 Lstreams come in different types, depending on what is being interfaced 7780 Lstreams come in different types, depending on what is being interfaced
7416 to. Although the primitive for creating new lstreams is 7781 to. Although the primitive for creating new lstreams is
7417 @code{Lstream_new()}, generally you do not call this directly. Instead, 7782 @code{Lstream_new()}, generally you do not call this directly. Instead,
7438 Open for reading, but ``read'' never returns partial MULE characters. 7803 Open for reading, but ``read'' never returns partial MULE characters.
7439 @item "wc" 7804 @item "wc"
7440 Open for writing, but never writes partial MULE characters. 7805 Open for writing, but never writes partial MULE characters.
7441 @end table 7806 @end table
7442 7807
7443 @node Lstream Types 7808 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams
7444 @section Lstream Types 7809 @section Lstream Types
7445 7810
7446 @table @asis 7811 @table @asis
7447 @item stdio 7812 @item stdio
7448 7813
7463 @item decoding 7828 @item decoding
7464 7829
7465 @item encoding 7830 @item encoding
7466 @end table 7831 @end table
7467 7832
7468 @node Lstream Functions 7833 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams
7469 @section Lstream Functions 7834 @section Lstream Functions
7470 7835
7471 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode}) 7836 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
7472 Allocate and return a new Lstream. This function is not really meant to 7837 Allocate and return a new Lstream. This function is not really meant to
7473 be called directly; rather, each stream type should provide its own 7838 be called directly; rather, each stream type should provide its own
7474 stream creation function, which creates the stream and does any other 7839 stream creation function, which creates the stream and does any other
7475 necessary creation stuff (e.g. opening a file). 7840 necessary creation stuff (e.g. opening a file).
7476 @end deftypefun 7841 @end deftypefun
7546 7911
7547 @deftypefun void Lstream_rewind (Lstream *@var{stream}) 7912 @deftypefun void Lstream_rewind (Lstream *@var{stream})
7548 Rewind the stream to the beginning. 7913 Rewind the stream to the beginning.
7549 @end deftypefun 7914 @end deftypefun
7550 7915
7551 @node Lstream Methods 7916 @node Lstream Methods, , Lstream Functions, Lstreams
7552 @section Lstream Methods 7917 @section Lstream Methods
7553 7918
7554 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size}) 7919 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
7555 Read some data from the stream's end and store it into @var{data}, which 7920 Read some data from the stream's end and store it into @var{data}, which
7556 can hold @var{size} bytes. Return the number of bytes read. A return 7921 can hold @var{size} bytes. Return the number of bytes read. A return
7566 calls @code{Lstream_read()} with a very small size. 7931 calls @code{Lstream_read()} with a very small size.
7567 7932
7568 This function can be @code{NULL} if the stream is output-only. 7933 This function can be @code{NULL} if the stream is output-only.
7569 @end deftypefn 7934 @end deftypefn
7570 7935
7571 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, size_t @var{size}) 7936 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
7572 Send some data to the stream's end. Data to be sent is in @var{data} 7937 Send some data to the stream's end. Data to be sent is in @var{data}
7573 and is @var{size} bytes. Return the number of bytes sent. This 7938 and is @var{size} bytes. Return the number of bytes sent. This
7574 function can send and return fewer bytes than is passed in; in that 7939 function can send and return fewer bytes than is passed in; in that
7575 case, the function will just be called again until there is no data left 7940 case, the function will just be called again until there is no data left
7576 or 0 is returned. A return value of 0 means that no more data can be 7941 or 0 is returned. A return value of 0 means that no more data can be
7621 * Point:: 7986 * Point::
7622 * Window Hierarchy:: 7987 * Window Hierarchy::
7623 * The Window Object:: 7988 * The Window Object::
7624 @end menu 7989 @end menu
7625 7990
7626 @node Introduction to Consoles; Devices; Frames; Windows 7991 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
7627 @section Introduction to Consoles; Devices; Frames; Windows 7992 @section Introduction to Consoles; Devices; Frames; Windows
7628 7993
7629 A window-system window that you see on the screen is called a 7994 A window-system window that you see on the screen is called a
7630 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or 7995 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or
7631 more non-overlapping panes, called (confusingly) @dfn{windows}. Each 7996 more non-overlapping panes, called (confusingly) @dfn{windows}. Each
7656 There is a separate Lisp object type for each of these four concepts. 8021 There is a separate Lisp object type for each of these four concepts.
7657 Furthermore, there is logically a @dfn{selected console}, 8022 Furthermore, there is logically a @dfn{selected console},
7658 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. 8023 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
7659 Each of these objects is distinguished in various ways, such as being the 8024 Each of these objects is distinguished in various ways, such as being the
7660 default object for various functions that act on objects of that type. 8025 default object for various functions that act on objects of that type.
7661 Note that every containing object rememembers the ``selected'' object 8026 Note that every containing object remembers the ``selected'' object
7662 among the objects that it contains: e.g. not only is there a selected 8027 among the objects that it contains: e.g. not only is there a selected
7663 window, but every frame remembers the last window in it that was 8028 window, but every frame remembers the last window in it that was
7664 selected, and changing the selected frame causes the remembered window 8029 selected, and changing the selected frame causes the remembered window
7665 within it to become the selected window. Similar relationships apply 8030 within it to become the selected window. Similar relationships apply
7666 for consoles to devices and devices to frames. 8031 for consoles to devices and devices to frames.
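The ``remembered selection'' relationship can be sketched in C as follows.
This is only an illustration at the frame/window level, not the actual XEmacs
code: the names @code{sketch_window}, @code{sketch_frame}, @code{sketch_state},
@code{sketch_select_window} and @code{sketch_select_frame} are hypothetical,
and the same pattern is assumed to repeat for consoles to devices and devices
to frames.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real window and frame objects. */
struct sketch_window {
  int id;
};

struct sketch_frame {
  struct sketch_window *selected_window;  /* last window selected in this frame */
};

/* The globally selected frame and window. */
struct sketch_state {
  struct sketch_frame *selected_frame;
  struct sketch_window *selected_window;
};

/* Select WINDOW, which lives in FRAME.  The frame remembers the window. */
void
sketch_select_window (struct sketch_state *st, struct sketch_frame *frame,
                      struct sketch_window *window)
{
  assert (st != NULL && frame != NULL && window != NULL);
  frame->selected_window = window;
  st->selected_frame = frame;
  st->selected_window = window;
}

/* Select FRAME.  The window it remembered becomes the selected window. */
void
sketch_select_frame (struct sketch_state *st, struct sketch_frame *frame)
{
  assert (st != NULL && frame != NULL);
  st->selected_frame = frame;
  st->selected_window = frame->selected_window;
}
```

Selecting a different frame therefore never loses the window that was last
selected in the previous frame; that frame still remembers it.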
7667 8032
7668 @node Point 8033 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
7669 @section Point 8034 @section Point
7670 8035
7671 Recall that every buffer has a current insertion position, called 8036 Recall that every buffer has a current insertion position, called
7672 @dfn{point}. Now, two or more windows may be displaying the same buffer, 8037 @dfn{point}. Now, two or more windows may be displaying the same buffer,
7673 and the text cursor in the two windows (i.e. @code{point}) can be in 8038 and the text cursor in the two windows (i.e. @code{point}) can be in
7684 want to retrieve the correct value of @code{point} for a window, 8049 want to retrieve the correct value of @code{point} for a window,
7685 you must special-case on the selected window and retrieve the 8050 you must special-case on the selected window and retrieve the
7686 buffer's point instead. This is related to why @code{save-window-excursion} 8051 buffer's point instead. This is related to why @code{save-window-excursion}
7687 does not save the selected window's value of @code{point}. 8052 does not save the selected window's value of @code{point}.
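A minimal sketch of this special case, in C. The struct and function names
(@code{sketch_buffer}, @code{sketch_window}, @code{sketch_window_point}) are
hypothetical and are not the real XEmacs definitions; the sketch assumes only
that each window stores its own value of point and that this stored value may
be stale while the window is selected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real buffer and window objects. */
struct sketch_buffer {
  long point;                    /* the buffer's own value of point */
};

struct sketch_window {
  struct sketch_buffer *buffer;  /* buffer displayed in this window */
  long saved_point;              /* point remembered by the window itself */
};

/* Return the value of point for W.  SELECTED is the currently selected
   window; for that window the buffer's point is authoritative and the
   value saved in the window is ignored. */
long
sketch_window_point (const struct sketch_window *w,
                     const struct sketch_window *selected)
{
  assert (w != NULL && w->buffer != NULL);
  if (w == selected)
    return w->buffer->point;
  return w->saved_point;
}
```

Two windows showing the same buffer can thus report different positions: the
selected one reports the buffer's point, the other its own saved value.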
7688 8053
7689 @node Window Hierarchy 8054 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows
7690 @section Window Hierarchy 8055 @section Window Hierarchy
7691 @cindex window hierarchy 8056 @cindex window hierarchy
7692 @cindex hierarchy of windows 8057 @cindex hierarchy of windows
7693 8058
7694 If a frame contains multiple windows (panes), they are always created 8059 If a frame contains multiple windows (panes), they are always created
7782 frames have no root window, and the @code{next} of the minibuffer window 8147 frames have no root window, and the @code{next} of the minibuffer window
7783 is @code{nil} but the @code{prev} points to itself. (#### This is an 8148 is @code{nil} but the @code{prev} points to itself. (#### This is an
7784 artifact that should be fixed.) 8149 artifact that should be fixed.)
7785 @end enumerate 8150 @end enumerate
7786 8151
7787 @node The Window Object 8152 @node The Window Object, , Window Hierarchy, Consoles; Devices; Frames; Windows
7788 @section The Window Object 8153 @section The Window Object
7789 8154
7790 Windows have the following accessible fields: 8155 Windows have the following accessible fields:
7791 8156
7792 @table @code 8157 @table @code
7914 * Critical Redisplay Sections:: 8279 * Critical Redisplay Sections::
7915 * Line Start Cache:: 8280 * Line Start Cache::
7916 * Redisplay Piece by Piece:: 8281 * Redisplay Piece by Piece::
7917 @end menu 8282 @end menu
7918 8283
7919 @node Critical Redisplay Sections 8284 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism
7920 @section Critical Redisplay Sections 8285 @section Critical Redisplay Sections
7921 @cindex critical redisplay sections 8286 @cindex critical redisplay sections
7922 8287
7923 Within this section, we are defenseless and assume that the 8288 Within this section, we are defenseless and assume that the
7924 following cannot happen: 8289 following cannot happen:
7946 we simply return. #### We should abort instead. 8311 we simply return. #### We should abort instead.
7947 8312
7948 #### If a frame-size change does occur we should probably 8313 #### If a frame-size change does occur we should probably
7949 actually be preempting redisplay. 8314 actually be preempting redisplay.
7950 8315
7951 @node Line Start Cache 8316 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism
7952 @section Line Start Cache 8317 @section Line Start Cache
7953 @cindex line start cache 8318 @cindex line start cache
7954 8319
7955 The traditional scrolling code in Emacs breaks in a variable height 8320 The traditional scrolling code in Emacs breaks in a variable height
7956 world. It depends on the key assumption that the number of lines that 8321 world. It depends on the key assumption that the number of lines that
8007 @end itemize 8372 @end itemize
8008 8373
8009 In case you're wondering, the Second Golden Rule of Redisplay is not 8374 In case you're wondering, the Second Golden Rule of Redisplay is not
8010 applicable. 8375 applicable.
8011 8376
8012 @node Redisplay Piece by Piece 8377 @node Redisplay Piece by Piece, , Line Start Cache, The Redisplay Mechanism
8013 @section Redisplay Piece by Piece 8378 @section Redisplay Piece by Piece
8014 @cindex Redisplay Piece by Piece 8379 @cindex Redisplay Piece by Piece
8015 8380
8016 As you can begin to see, redisplay is complex and also not well 8381 As you can begin to see, redisplay is complex and also not well
8017 documented. Chuck no longer works on XEmacs so this section is my take 8382 documented. Chuck no longer works on XEmacs so this section is my take
8029 @item 8394 @item
8030 Output changes. Implemented by @code{redisplay-output.c}, 8395 Output changes. Implemented by @code{redisplay-output.c},
8031 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c} 8396 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}
8032 @end enumerate 8397 @end enumerate
8033 8398
8034 Steps 1 and 2 are device-independant and relatively complex. Step 3 is 8399 Steps 1 and 2 are device-independent and relatively complex. Step 3 is
8035 mostly device-dependent. 8400 mostly device-dependent.
8036 8401
8037 Determining the desired display 8402 Determining the desired display
8038 8403
8039 Display attributes are stored in @code{display_line} structures. Each 8404 Display attributes are stored in @code{display_line} structures. Each
8040 @code{display_line} consists of a set of @code{display_block}'s and each 8405 @code{display_line} consists of a set of @code{display_block}'s and each
8041 @code{display_block} contains a number of @code{rune}'s. Generally 8406 @code{display_block} contains a number of @code{rune}'s. Generally
8042 dynarr's of @code{display_line}'s are held by each window representing 8407 dynarr's of @code{display_line}'s are held by each window representing
8043 the current display and the desired display. 8408 the current display and the desired display.
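The nesting described above can be pictured with the following C sketch. It
is a simplification under stated assumptions: the @code{sketch_} types are
hypothetical, plain counted arrays stand in for dynarrs, and the
@code{xpos} and @code{width} fields are invented for illustration. The real
structures carry far more information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified versions of the redisplay structures. */
struct sketch_rune {
  int xpos;    /* illustrative field: horizontal position */
  int width;   /* illustrative field: width of the rune */
};

struct sketch_display_block {
  struct sketch_rune *runes;   /* stands in for a dynarr of runes */
  size_t num_runes;
};

struct sketch_display_line {
  struct sketch_display_block *blocks;  /* stands in for a dynarr of blocks */
  size_t num_blocks;
};

/* Count every rune on a line by walking line -> blocks -> runes. */
size_t
sketch_count_runes (const struct sketch_display_line *dl)
{
  size_t total = 0;
  size_t i;

  assert (dl != NULL);
  for (i = 0; i < dl->num_blocks; i++)
    total += dl->blocks[i].num_runes;
  return total;
}
```

A window would hold two collections of such lines, one for the current
display and one for the desired display.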
8044 8409
8045 The @code{display_line} structures are tighly tied to buffers which 8410 The @code{display_line} structures are tightly tied to buffers which
8046 presents a problem for redisplay as this connection is bogus for the 8411 presents a problem for redisplay as this connection is bogus for the
8047 modeline. Hence the @code{display_line} generation routines are 8412 modeline. Hence the @code{display_line} generation routines are
8048 duplicated for generating the modeline. This means that the modeline 8413 duplicated for generating the modeline. This means that the modeline
8049 display code has many bugs that the standard redisplay code does not. 8414 display code has many bugs that the standard redisplay code does not.
8050 8415
8051 The guts of @code{display_line} generation are in 8416 The guts of @code{display_line} generation are in
8052 @code{create_text_block}, which creates a single display line for the 8417 @code{create_text_block}, which creates a single display line for the
8053 desired locale. This incrementally parses the characters on the current 8418 desired locale. This incrementally parses the characters on the current
8054 line and generates redisplay structures for each. 8419 line and generates redisplay structures for each.
8055 8420
8056 Gutter redisplay is different. Because the data to display is stored in 8421 Gutter redisplay is different. Because the data to display is stored in
8057 a string we cannot use @code{create_text_block}. Instead we use 8422 a string we cannot use @code{create_text_block}. Instead we use
8058 @code{create_text_string_block} which performs the same function as 8423 @code{create_text_string_block} which performs the same function as
8059 @code{create_text_block} but for strings. Many of the complexities of 8424 @code{create_text_block} but for strings. Many of the complexities of
8066 @menu 8431 @menu
8067 * Introduction to Extents:: Extents are ranges over text, with properties. 8432 * Introduction to Extents:: Extents are ranges over text, with properties.
8068 * Extent Ordering:: How extents are ordered internally. 8433 * Extent Ordering:: How extents are ordered internally.
8069 * Format of the Extent Info:: The extent information in a buffer or string. 8434 * Format of the Extent Info:: The extent information in a buffer or string.
8070 * Zero-Length Extents:: A weird special case. 8435 * Zero-Length Extents:: A weird special case.
8071 * Mathematics of Extent Ordering:: A rigorous foundation. 8436 * Mathematics of Extent Ordering:: A rigorous foundation.
8072 * Extent Fragments:: Cached information useful for redisplay. 8437 * Extent Fragments:: Cached information useful for redisplay.
8073 @end menu 8438 @end menu
8074 8439
8075 @node Introduction to Extents 8440 @node Introduction to Extents, Extent Ordering, Extents, Extents
8076 @section Introduction to Extents 8441 @section Introduction to Extents
8077 8442
8078 Extents are regions over a buffer, with a start and an end position 8443 Extents are regions over a buffer, with a start and an end position
8079 denoting the region of the buffer included in the extent. In 8444 denoting the region of the buffer included in the extent. In
8080 addition, either end can be closed or open, meaning that the endpoint 8445 addition, either end can be closed or open, meaning that the endpoint
8092 automatically go inside or out of extents as necessary with no 8457 automatically go inside or out of extents as necessary with no
8093 further work needing to be done. It didn't work out that way, 8458 further work needing to be done. It didn't work out that way,
8094 however, and just ended up complexifying and buggifying all the 8459 however, and just ended up complexifying and buggifying all the
8095 rest of the code.) 8460 rest of the code.)
8096 8461
8097 @node Extent Ordering 8462 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents
8098 @section Extent Ordering 8463 @section Extent Ordering
8099 8464
8100 Extents are compared using memory indices. There are two orderings 8465 Extents are compared using memory indices. There are two orderings
8101 for extents and both orders are kept current at all times. The normal 8466 for extents and both orders are kept current at all times. The normal
8102 or @dfn{display} order is as follows: 8467 or @dfn{display} order is as follows:
8126 The display order and the e-order are complementary orders: any 8491 The display order and the e-order are complementary orders: any
8127 theorem about the display order also applies to the e-order if you swap 8492 theorem about the display order also applies to the e-order if you swap
8128 all occurrences of ``display order'' and ``e-order'', ``less than'' and 8493 all occurrences of ``display order'' and ``e-order'', ``less than'' and
8129 ``greater than'', and ``extent start'' and ``extent end''. 8494 ``greater than'', and ``extent start'' and ``extent end''.
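The symmetry between the two orders can be seen in a small C sketch. The
comparison rules below are an assumption of this sketch, not a quotation of
the definitions: display order is taken to sort by start position, with ties
broken by putting the extent with the greater end first, and e-order is
taken to be its mirror image. The @code{sketch_} names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal extent: just its two endpoints. */
struct sketch_extent {
  long start;
  long end;
};

/* Nonzero if A comes before B in (the assumed) display order:
   smaller start first; for equal starts, greater end first. */
int
sketch_display_less (const struct sketch_extent *a,
                     const struct sketch_extent *b)
{
  assert (a != NULL && b != NULL);
  return a->start < b->start
    || (a->start == b->start && a->end > b->end);
}

/* Nonzero if A comes before B in (the assumed) e-order:
   smaller end first; for equal ends, greater start first.
   This is sketch_display_less with start and end exchanged. */
int
sketch_e_less (const struct sketch_extent *a,
               const struct sketch_extent *b)
{
  assert (a != NULL && b != NULL);
  return a->end < b->end
    || (a->end == b->end && a->start > b->start);
}
```

Under these rules the extents [1,10], [1,5] and [3,5] appear in that sequence
in display order and in the reverse sequence in e-order.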
8130 8495
8131 @node Format of the Extent Info 8496 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents
8132 @section Format of the Extent Info 8497 @section Format of the Extent Info
8133 8498
8134 An extent-info structure consists of a list of the buffer or string's 8499 An extent-info structure consists of a list of the buffer or string's
8135 extents and a @dfn{stack of extents} that lists all of the extents over 8500 extents and a @dfn{stack of extents} that lists all of the extents over
8136 a particular position. The stack-of-extents info is used for 8501 a particular position. The stack-of-extents info is used for
8160 between two extents. Note also that callers of these functions should 8525 between two extents. Note also that callers of these functions should
8161 not be aware of the fact that the extent list is implemented as an 8526 not be aware of the fact that the extent list is implemented as an
8162 array, except for the fact that positions are integers (this should be 8527 array, except for the fact that positions are integers (this should be
8163 generalized to handle integers and linked lists equally well). 8528 generalized to handle integers and linked lists equally well).
8164 8529
8165 @node Zero-Length Extents 8530 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents
8166 @section Zero-Length Extents 8531 @section Zero-Length Extents
8167 8532
8168 Extents can be zero-length, and will end up that way if their endpoints 8533 Extents can be zero-length, and will end up that way if their endpoints
8169 are explicitly set that way or if their detachable property is nil 8534 are explicitly set that way or if their detachable property is nil
8170 and all the text in the extent is deleted. (The exception is open-open 8535 and all the text in the extent is deleted. (The exception is open-open
8189 8554
8190 Note that closed-open, non-detachable zero-length extents behave 8555 Note that closed-open, non-detachable zero-length extents behave
8191 exactly like markers and that open-closed, non-detachable zero-length 8556 exactly like markers and that open-closed, non-detachable zero-length
8192 extents behave like the ``point-type'' marker in Mule. 8557 extents behave like the ``point-type'' marker in Mule.
8193 8558
8194 @node Mathematics of Extent Ordering 8559 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents
8195 @section Mathematics of Extent Ordering 8560 @section Mathematics of Extent Ordering
8196 @cindex extent mathematics 8561 @cindex extent mathematics
8197 @cindex mathematics of extents 8562 @cindex mathematics of extents
8198 @cindex extent ordering 8563 @cindex extent ordering
8199 8564
8324 Proof: If @math{F2} does not include @math{I} then its start index is 8689 Proof: If @math{F2} does not include @math{I} then its start index is
8325 greater than @math{I} and thus it is greater than any extent in 8690 greater than @math{I} and thus it is greater than any extent in
8326 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} 8691 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I}
8327 and thus is in @math{S}, and thus @math{F2 >= F}. 8692 and thus is in @math{S}, and thus @math{F2 >= F}.
8328 8693
8329 @node Extent Fragments 8694 @node Extent Fragments, , Mathematics of Extent Ordering, Extents
8330 @section Extent Fragments 8695 @section Extent Fragments
8331 @cindex extent fragment 8696 @cindex extent fragment
8332 8697
8333 Imagine that the buffer is divided up into contiguous, non-overlapping 8698 Imagine that the buffer is divided up into contiguous, non-overlapping
8334 @dfn{runs} of text such that no extent starts or ends within a run 8699 @dfn{runs} of text such that no extent starts or ends within a run
8373 caching is done by @code{image_instantiate} and is necessary because it 8738 caching is done by @code{image_instantiate} and is necessary because it
8374 is generally possible to display an image-instance in multiple 8739 is generally possible to display an image-instance in multiple
8375 domains. For instance if we create a Pixmap, we can actually display 8740 domains. For instance if we create a Pixmap, we can actually display
8376 this on multiple windows - even though we only need a single Pixmap 8741 this on multiple windows - even though we only need a single Pixmap
8377 instance to do this. If caching wasn't done then it would be necessary 8742 instance to do this. If caching wasn't done then it would be necessary
8378 to create image-instances for every displayable occurrance of a glyph - 8743 to create image-instances for every displayable occurrence of a glyph -
8379 and every usage - and this would be extremely memory and cpu intensive. 8744 and every usage - and this would be extremely memory and cpu intensive.
8380 8745
8381 Widget-glyphs (a.k.a native widgets) are not cached in this way. This is 8746 Widget-glyphs (a.k.a native widgets) are not cached in this way. This is
8382 because widget-glyph image-instances on screen are toolkit windows, and 8747 because widget-glyph image-instances on screen are toolkit windows, and
8383 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are 8748 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
8407 multiple types of toolkit. Each element in the widget hierarchy is updated 8772 multiple types of toolkit. Each element in the widget hierarchy is updated
8408 from its corresponding widget_instance by walking the widget_instance 8773 from its corresponding widget_instance by walking the widget_instance
8409 tree recursively. 8774 tree recursively.
8410 8775
8411 This has desirable properties such as lw_modify_all_widgets which is 8776 This has desirable properties such as lw_modify_all_widgets which is
8412 called from glyphs-x.c and updates all the properties of a widget 8777 called from @file{glyphs-x.c} and updates all the properties of a widget
8413 without having to know what the widget is or what toolkit it is from. 8778 without having to know what the widget is or what toolkit it is from.
8414 Unfortunately this also has hairy properrties such as making the lwlib 8779 Unfortunately this also has hairy properties such as making the lwlib
8415 code quite complex. And of course lwlib has to know at some level what 8780 code quite complex. And of course lwlib has to know at some level what
8416 the widget is and how to set its properties. 8781 the widget is and how to set its properties.
8417 8782
8418 @node Specifiers, Menus, Glyphs, Top 8783 @node Specifiers, Menus, Glyphs, Top
8419 @chapter Specifiers 8784 @chapter Specifiers
8543 @item tty_name 8908 @item tty_name
8544 The name of the terminal that the subprocess is using, 8909 The name of the terminal that the subprocess is using,
8545 or @code{nil} if it is using pipes. 8910 or @code{nil} if it is using pipes.
8546 @end table 8911 @end table
8547 8912
8548 @node Interface to X Windows, Index, Subprocesses, Top 8913 @node Interface to X Windows, Index , Subprocesses, Top
8549 @chapter Interface to X Windows 8914 @chapter Interface to X Windows
8550 8915
8551 Not yet documented. 8916 Not yet documented.
8552 8917
8553 @include index.texi 8918 @include index.texi
8556 @summarycontents 8921 @summarycontents
8557 @contents 8922 @contents
8558 @c That's all 8923 @c That's all
8559 8924
8560 @bye 8925 @bye
8561