comparison man/internals/internals.texi @ 442:abe6d1db359e r21-2-36
Import from CVS: tag r21-2-36
author | cvs |
---|---|
date | Mon, 13 Aug 2007 11:35:02 +0200 |
parents | 8de8e3f6228a |
children | 576fb035e263 |
441:72a7cfa4a488 | 442:abe6d1db359e |
---|---|
67 | 67 |
68 @author Ben Wing | 68 @author Ben Wing |
69 @author Martin Buchholz | 69 @author Martin Buchholz |
70 @author Hrvoje Niksic | 70 @author Hrvoje Niksic |
71 @author Matthias Neubauer | 71 @author Matthias Neubauer |
72 @author Olivier Galibert | |
72 @page | 73 @page |
73 @vskip 0pt plus 1fill | 74 @vskip 0pt plus 1fill |
74 | 75 |
75 @noindent | 76 @noindent |
76 Copyright @copyright{} 1992 - 1996 Ben Wing. @* | 77 Copyright @copyright{} 1992 - 1996 Ben Wing. @* |
116 * The XEmacs Object System (Abstractly Speaking):: | 117 * The XEmacs Object System (Abstractly Speaking):: |
117 * How Lisp Objects Are Represented in C:: | 118 * How Lisp Objects Are Represented in C:: |
118 * Rules When Writing New C Code:: | 119 * Rules When Writing New C Code:: |
119 * A Summary of the Various XEmacs Modules:: | 120 * A Summary of the Various XEmacs Modules:: |
120 * Allocation of Objects in XEmacs Lisp:: | 121 * Allocation of Objects in XEmacs Lisp:: |
122 * Dumping:: | |
121 * Events and the Event Loop:: | 123 * Events and the Event Loop:: |
122 * Evaluation; Stack Frames; Bindings:: | 124 * Evaluation; Stack Frames; Bindings:: |
123 * Symbols and Variables:: | 125 * Symbols and Variables:: |
124 * Buffers and Textual Representation:: | 126 * Buffers and Textual Representation:: |
125 * MULE Character Sets and Encodings:: | 127 * MULE Character Sets and Encodings:: |
132 * Glyphs:: | 134 * Glyphs:: |
133 * Specifiers:: | 135 * Specifiers:: |
134 * Menus:: | 136 * Menus:: |
135 * Subprocesses:: | 137 * Subprocesses:: |
136 * Interface to X Windows:: | 138 * Interface to X Windows:: |
137 * Index:: Index including concepts, functions, variables, | 139 * Index:: |
138 and other terms. | 140 |
139 | 141 @detailmenu |
140 --- The Detailed Node Listing --- | 142 |
141 | 143 --- The Detailed Node Listing --- |
142 Here are other nodes that are inferiors of those already listed, | |
143 mentioned here so you can get to them in one step: | |
144 | 144 |
145 A History of Emacs | 145 A History of Emacs |
146 | 146 |
147 * Through Version 18:: Unification prevails. | 147 * Through Version 18:: Unification prevails. |
148 * Lucid Emacs:: One version 19 Emacs. | 148 * Lucid Emacs:: One version 19 Emacs. |
149 * GNU Emacs 19:: The other version 19 Emacs. | 149 * GNU Emacs 19:: The other version 19 Emacs. |
150 * GNU Emacs 20:: The other version 20 Emacs. | |
150 * XEmacs:: The continuation of Lucid Emacs. | 151 * XEmacs:: The continuation of Lucid Emacs. |
151 | 152 |
152 Rules When Writing New C Code | 153 Rules When Writing New C Code |
153 | 154 |
154 * General Coding Rules:: | 155 * General Coding Rules:: |
155 * Writing Lisp Primitives:: | 156 * Writing Lisp Primitives:: |
156 * Adding Global Lisp Variables:: | 157 * Adding Global Lisp Variables:: |
158 * Coding for Mule:: | |
157 * Techniques for XEmacs Developers:: | 159 * Techniques for XEmacs Developers:: |
160 | |
161 Coding for Mule | |
162 | |
163 * Character-Related Data Types:: | |
164 * Working With Character and Byte Positions:: | |
165 * Conversion to and from External Data:: | |
166 * General Guidelines for Writing Mule-Aware Code:: | |
167 * An Example of Mule-Aware Code:: | |
158 | 168 |
159 A Summary of the Various XEmacs Modules | 169 A Summary of the Various XEmacs Modules |
160 | 170 |
161 * Low-Level Modules:: | 171 * Low-Level Modules:: |
162 * Basic Lisp Modules:: | 172 * Basic Lisp Modules:: |
179 * Garbage Collection - Step by Step:: | 189 * Garbage Collection - Step by Step:: |
180 * Integers and Characters:: | 190 * Integers and Characters:: |
181 * Allocation from Frob Blocks:: | 191 * Allocation from Frob Blocks:: |
182 * lrecords:: | 192 * lrecords:: |
183 * Low-level allocation:: | 193 * Low-level allocation:: |
184 * Pure Space:: | |
185 * Cons:: | 194 * Cons:: |
186 * Vector:: | 195 * Vector:: |
187 * Bit Vector:: | 196 * Bit Vector:: |
188 * Symbol:: | 197 * Symbol:: |
189 * Marker:: | 198 * Marker:: |
190 * String:: | 199 * String:: |
191 * Compiled Function:: | 200 * Compiled Function:: |
201 | |
202 Garbage Collection - Step by Step | |
203 | |
204 * Invocation:: | |
205 * garbage_collect_1:: | |
206 * mark_object:: | |
207 * gc_sweep:: | |
208 * sweep_lcrecords_1:: | |
209 * compact_string_chars:: | |
210 * sweep_strings:: | |
211 * sweep_bit_vectors_1:: | |
212 | |
213 Dumping | |
214 | |
215 * Overview:: | |
216 * Data descriptions:: | |
217 * Dumping phase:: | |
218 * Reloading phase:: | |
219 | |
220 Dumping phase | |
221 | |
222 * Object inventory:: | |
223 * Address allocation:: | |
224 * The header:: | |
225 * Data dumping:: | |
226 * Pointers dumping:: | |
192 | 227 |
193 Events and the Event Loop | 228 Events and the Event Loop |
194 | 229 |
195 * Introduction to Events:: | 230 * Introduction to Events:: |
196 * Main Loop:: | 231 * Main Loop:: |
226 MULE Character Sets and Encodings | 261 MULE Character Sets and Encodings |
227 | 262 |
228 * Character Sets:: | 263 * Character Sets:: |
229 * Encodings:: | 264 * Encodings:: |
230 * Internal Mule Encodings:: | 265 * Internal Mule Encodings:: |
266 * CCL:: | |
231 | 267 |
232 Encodings | 268 Encodings |
233 | 269 |
234 * Japanese EUC (Extended Unix Code):: | 270 * Japanese EUC (Extended Unix Code):: |
235 * JIS7:: | 271 * JIS7:: |
237 Internal Mule Encodings | 273 Internal Mule Encodings |
238 | 274 |
239 * Internal String Encoding:: | 275 * Internal String Encoding:: |
240 * Internal Character Encoding:: | 276 * Internal Character Encoding:: |
241 | 277 |
242 The Lisp Reader and Compiler | |
243 | |
244 Lstreams | 278 Lstreams |
279 | |
280 * Creating an Lstream:: Creating an lstream object. | |
281 * Lstream Types:: Different sorts of things that are streamed. | |
282 * Lstream Functions:: Functions for working with lstreams. | |
283 * Lstream Methods:: Creating new lstream types. | |
245 | 284 |
246 Consoles; Devices; Frames; Windows | 285 Consoles; Devices; Frames; Windows |
247 | 286 |
248 * Introduction to Consoles; Devices; Frames; Windows:: | 287 * Introduction to Consoles; Devices; Frames; Windows:: |
249 * Point:: | 288 * Point:: |
250 * Window Hierarchy:: | 289 * Window Hierarchy:: |
290 * The Window Object:: | |
251 | 291 |
252 The Redisplay Mechanism | 292 The Redisplay Mechanism |
253 | 293 |
254 * Critical Redisplay Sections:: | 294 * Critical Redisplay Sections:: |
255 * Line Start Cache:: | 295 * Line Start Cache:: |
296 * Redisplay Piece by Piece:: | |
256 | 297 |
257 Extents | 298 Extents |
258 | 299 |
259 * Introduction to Extents:: Extents are ranges over text, with properties. | 300 * Introduction to Extents:: Extents are ranges over text, with properties. |
260 * Extent Ordering:: How extents are ordered internally. | 301 * Extent Ordering:: How extents are ordered internally. |
261 * Format of the Extent Info:: The extent information in a buffer or string. | 302 * Format of the Extent Info:: The extent information in a buffer or string. |
262 * Zero-Length Extents:: A weird special case. | 303 * Zero-Length Extents:: A weird special case. |
263 * Mathematics of Extent Ordering:: A rigorous foundation. | 304 * Mathematics of Extent Ordering:: A rigorous foundation. |
264 * Extent Fragments:: Cached information useful for redisplay. | 305 * Extent Fragments:: Cached information useful for redisplay. |
265 | 306 |
266 Faces | 307 @end detailmenu |
267 | |
268 Glyphs | |
269 | |
270 Specifiers | |
271 | |
272 Menus | |
273 | |
274 Subprocesses | |
275 | |
276 Interface to X Windows | |
277 | |
278 @end menu | 308 @end menu |
279 | 309 |
280 @node A History of Emacs, XEmacs From the Outside, Top, Top | 310 @node A History of Emacs, XEmacs From the Outside, Top, Top |
281 @chapter A History of Emacs | 311 @chapter A History of Emacs |
282 @cindex history of Emacs | 312 @cindex history of Emacs |
313 * GNU Emacs 19:: The other version 19 Emacs. | 343 * GNU Emacs 19:: The other version 19 Emacs. |
314 * GNU Emacs 20:: The other version 20 Emacs. | 344 * GNU Emacs 20:: The other version 20 Emacs. |
315 * XEmacs:: The continuation of Lucid Emacs. | 345 * XEmacs:: The continuation of Lucid Emacs. |
316 @end menu | 346 @end menu |
317 | 347 |
318 @node Through Version 18 | 348 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs |
319 @section Through Version 18 | 349 @section Through Version 18 |
320 @cindex Gosling, James | 350 @cindex Gosling, James |
321 @cindex Great Usenet Renaming | 351 @cindex Great Usenet Renaming |
322 | 352 |
323 Although the history of the early versions of GNU Emacs is unclear, | 353 Although the history of the early versions of GNU Emacs is unclear, |
426 version 18.58 released ?????. | 456 version 18.58 released ?????. |
427 @item | 457 @item |
428 version 18.59 released October 31, 1992. | 458 version 18.59 released October 31, 1992. |
429 @end itemize | 459 @end itemize |
430 | 460 |
431 @node Lucid Emacs | 461 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs |
432 @section Lucid Emacs | 462 @section Lucid Emacs |
433 @cindex Lucid Emacs | 463 @cindex Lucid Emacs |
434 @cindex Lucid Inc. | 464 @cindex Lucid Inc. |
435 @cindex Energize | 465 @cindex Energize |
436 @cindex Epoch | 466 @cindex Epoch |
514 version 20.3 (the first stable version of XEmacs 20.x) released November 30, | 544 version 20.3 (the first stable version of XEmacs 20.x) released November 30, |
515 1997. | 545 1997. |
516 version 20.4 released February 28, 1998. | 546 version 20.4 released February 28, 1998. |
517 @end itemize | 547 @end itemize |
518 | 548 |
519 @node GNU Emacs 19 | 549 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs |
520 @section GNU Emacs 19 | 550 @section GNU Emacs 19 |
521 @cindex GNU Emacs 19 | 551 @cindex GNU Emacs 19 |
522 @cindex FSF Emacs | 552 @cindex FSF Emacs |
523 | 553 |
524 About a year after the initial release of Lucid Emacs, the FSF | 554 About a year after the initial release of Lucid Emacs, the FSF |
591 worse. Lucid soon began incorporating features from GNU Emacs 19 into | 621 worse. Lucid soon began incorporating features from GNU Emacs 19 into |
592 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been | 622 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been |
593 working on and using GNU Emacs for a long time (back as far as version | 623 working on and using GNU Emacs for a long time (back as far as version |
594 16 or 17). | 624 16 or 17). |
595 | 625 |
596 @node GNU Emacs 20 | 626 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs |
597 @section GNU Emacs 20 | 627 @section GNU Emacs 20 |
598 @cindex GNU Emacs 20 | 628 @cindex GNU Emacs 20 |
599 @cindex FSF Emacs | 629 @cindex FSF Emacs |
600 | 630 |
601 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first | 631 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first |
610 version 20.2 released September 20, 1997. | 640 version 20.2 released September 20, 1997. |
611 @item | 641 @item |
612 version 20.3 released August 19, 1998. | 642 version 20.3 released August 19, 1998. |
613 @end itemize | 643 @end itemize |
614 | 644 |
615 @node XEmacs | 645 @node XEmacs, , GNU Emacs 20, A History of Emacs |
616 @section XEmacs | 646 @section XEmacs |
617 @cindex XEmacs | 647 @cindex XEmacs |
618 | 648 |
619 @cindex Sun Microsystems | 649 @cindex Sun Microsystems |
620 @cindex University of Illinois | 650 @cindex University of Illinois |
935 Java, which is inexcusable. | 965 Java, which is inexcusable. |
936 @end enumerate | 966 @end enumerate |
937 | 967 |
938 Unfortunately, there is no perfect language. Static typing allows a | 968 Unfortunately, there is no perfect language. Static typing allows a |
939 compiler to catch programmer errors and produce more efficient code, but | 969 compiler to catch programmer errors and produce more efficient code, but |
940 makes programming more tedious and less fun. For the forseeable future, | 970 makes programming more tedious and less fun. For the foreseeable future, |
941 an Ideal Editing and Programming Environment (and that is what XEmacs | 971 an Ideal Editing and Programming Environment (and that is what XEmacs |
942 aspires to) will be programmable in multiple languages: high level ones | 972 aspires to) will be programmable in multiple languages: high level ones |
943 like Lisp for user customization and prototyping, and lower level ones | 973 like Lisp for user customization and prototyping, and lower level ones |
944 for infrastructure and industrial strength applications. If I had my | 974 for infrastructure and industrial strength applications. If I had my |
945 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, | 975 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, |
1186 @file{loadup.el} tells the C code about this function by setting its | 1216 @file{loadup.el} tells the C code about this function by setting its |
1187 name as the value of the Lisp variable @code{top-level}. | 1217 name as the value of the Lisp variable @code{top-level}. |
1188 | 1218 |
1189 When the Lisp initialization code is done, the C code enters the event | 1219 When the Lisp initialization code is done, the C code enters the event |
1190 loop, and stays there for the duration of the XEmacs process. The code | 1220 loop, and stays there for the duration of the XEmacs process. The code |
1191 for the event loop is contained in @file{keyboard.c}, and is called | 1221 for the event loop is contained in @file{cmdloop.c}, and is called |
1192 @code{Fcommand_loop_1()}. Note that this event loop could very well be | 1222 @code{Fcommand_loop_1()}. Note that this event loop could very well be |
1193 written in Lisp, and in fact a Lisp version exists; but apparently, | 1223 written in Lisp, and in fact a Lisp version exists; but apparently, |
1194 doing this makes XEmacs run noticeably slower. | 1224 doing this makes XEmacs run noticeably slower. |
1195 | 1225 |
1196 Notice how much of the initialization is done in Lisp, not in C. | 1226 Notice how much of the initialization is done in Lisp, not in C. |
1588 | 1618 |
1589 @example | 1619 @example |
1590 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] | 1620 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] |
1591 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] | 1621 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] |
1592 | 1622 |
1593 <---> ^ <------------------------------------------------------> | 1623 <---------------------------------------------------------> <-> |
1594 tag | a pointer to a structure, or an integer | 1624 a pointer to a structure, or an integer tag |
1595 | | 1625 @end example |
1596 mark bit | 1626 |
1597 @end example | 1627 A tag of 00 is used for all pointer object types, a tag of 10 is used |
1598 | 1628 for characters, and the other two tags 01 and 11 are joined together to |
1599 The tag describes the type of the Lisp object. For integers and chars, | 1629 form the integer object type. This representation gives us 31 bit |
1600 the lower 28 bits contain the value of the integer or char; for all | 1630 integers and 30 bit characters, while pointers are represented directly |
1601 others, the lower 28 bits contain a pointer. The mark bit is used | 1631 without any bit masking or shifting. This representation, though, |
1602 during garbage-collection, and is always 0 when garbage collection is | 1632 assumes that pointers to structs are always aligned to multiples of 4, |
1603 not happening. (The way that garbage collection works, basically, is that it | 1633 so the lower 2 bits are always zero. |
1604 loops over all places where Lisp objects could exist---this includes | |
1605 all global variables in C that contain Lisp objects [including | |
1606 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all | |
1607 Lisp variables will get marked], plus various other places---and | |
1608 recursively scans through the Lisp objects, marking each object it finds | |
1609 by setting the mark bit. Then it goes through the lists of all objects | |
1610 allocated, freeing the ones that are not marked and turning off the mark | |
1611 bit of the ones that are marked.) | |
1612 | 1634 |
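The two-bit low tag described in the right-hand (442) column can be illustrated with a small sketch. This is not the XEmacs source: `obj_t`, `classify`, `from_pointer`, `from_char`, `to_char` and `from_small_int` are hypothetical names chosen for the example, and the only facts taken from the text are the tag values (00 for pointers, 10 for characters, 01 and 11 for integers) and the assumption that struct pointers are aligned to multiples of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Object: one machine word. */
typedef uintptr_t obj_t;

enum kind { KIND_POINTER, KIND_INT, KIND_CHAR };

/* Decode the two low tag bits as described in the text. */
static enum kind classify(obj_t o)
{
    if (o & 1u)              /* tags 01 and 11: integer (one tag bit) */
        return KIND_INT;
    if ((o & 3u) == 2u)      /* tag 10: character */
        return KIND_CHAR;
    return KIND_POINTER;     /* tag 00: pointer, stored unmodified */
}

/* A pointer needs no masking or shifting, provided it is 4-aligned. */
static obj_t from_pointer(void *p)
{
    obj_t o = (obj_t)p;
    assert((o & 3u) == 0);   /* the alignment assumption from the text */
    return o;
}

/* Characters are shifted left by two and tagged 10. */
static obj_t from_char(uint32_t c) { return ((obj_t)c << 2) | 2u; }
static uint32_t to_char(obj_t o)   { return (uint32_t)(o >> 2); }

/* Non-negative integers only here: shifted left by one, low bit set. */
static obj_t from_small_int(uint32_t n) { return ((obj_t)n << 1) | 1u; }
```

Because the integer test looks at a single bit, integers keep 31 bits of payload on a 32-bit word, while characters, which need both tag bits, keep 30.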
1613 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type | 1635 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type |
1614 used for the Lisp object can vary. It can be either a simple type | 1636 used for the Lisp object can vary. It can be either a simple type |
1615 (@code{long} on the DEC Alpha, @code{int} on other machines) or a | 1637 (@code{long} on the DEC Alpha, @code{int} on other machines) or a |
1616 structure whose fields are bit fields that line up properly (actually, a | 1638 structure whose fields are bit fields that line up properly (actually, a |
1617 union of structures is used). Generally the simple integral type is | 1639 union of structures is used). Generally the simple integral type is |
1618 preferable because it ensures that the compiler will actually use a | 1640 preferable because it ensures that the compiler will actually use a |
1619 machine word to represent the object (some compilers will use more | 1641 machine word to represent the object (some compilers will use more |
1620 general and less efficient code for unions and structs even if they can | 1642 general and less efficient code for unions and structs even if they can |
1621 fit in a machine word). The union type, however, has the advantage of | 1643 fit in a machine word). The union type, however, has the advantage of |
1622 stricter type checking (if you accidentally pass an integer where a Lisp | 1644 stricter type checking. If you accidentally pass an integer where a Lisp |
1623 object is desired, you get a compile error), and it makes it easier to | 1645 object is desired, you get a compile error. The choice of which type |
1624 decode Lisp objects when debugging. The choice of which type to use is | 1646 to use is determined by the preprocessor constant @code{USE_UNION_TYPE} |
1625 determined by the preprocessor constant @code{USE_UNION_TYPE} which is | 1647 which is defined via the @code{--use-union-type} option to |
1626 defined via the @code{--use-union-type} option to @code{configure}. | 1648 @code{configure}. |
1627 | 1649 |
1628 @cindex record type | 1650 Various macros are used to convert between Lisp_Objects and the |
1629 | 1651 corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()}, |
1630 Note that there are only eight types that the tag can represent, but | 1652 @code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or |
1631 many more actual types than this. This is handled by having one of the | 1653 masking and cast it to the appropriate type. @code{XINT()} needs to be |
1632 tag types specify a meta-type called a @dfn{record}; for all such | 1654 a bit tricky so that negative numbers are properly sign-extended. Since |
1633 objects, the first four bytes of the pointed-to structure indicate what | 1655 integers are stored left-shifted, if the right-shift operator does an |
1634 the actual type is. | 1656 arithmetic shift (i.e. it leaves the most-significant bit as-is rather |
1635 | 1657 than shifting in a zero, so that it mimics a divide-by-two even for |
1636 Note also that having 28 bits for pointers and integers restricts a lot | 1658 negative numbers) the shift to remove the tag bit is enough. This is |
1637 of things to 256 megabytes of memory. (Basically, enough pointers and | 1659 the case on all the systems we support. |
1638 indices and whatnot get stuffed into Lisp objects that the total amount | 1660 |
1639 of memory used by XEmacs can't grow above 256 megabytes. In older | 1661 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter |
1640 versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for | |
1641 32 types, which was more than the actual number of types that existed at | |
1642 the time, and no ``record'' type was necessary. However, this limited | |
1643 the editor to 64 megabytes total, which some users who edited large | |
1644 files might conceivably exceed.) | |
1645 | |
1646 Also, note that there is an implicit assumption here that all pointers | |
1647 are low enough that the top bits are all zero and can just be chopped | |
1648 off. On standard machines that allocate memory from the bottom up (and | |
1649 give each process its own address space), this works fine. Some | |
1650 machines, however, put the data space somewhere else in memory | |
1651 (e.g. beginning at 0x80000000). Those machines cope by defining | |
1652 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to | |
1653 the proper mask. Then, pointers retrieved from Lisp objects are | |
1654 automatically OR'ed with this value prior to being used. | |
1655 | |
1656 A corollary of the previous paragraph is that @strong{(pointers to) | |
1657 stack-allocated structures cannot be put into Lisp objects}. The stack | |
1658 is generally located near the top of memory; if you put such a pointer | |
1659 into a Lisp object, it will get its top bits chopped off, and you will | |
1660 lose. | |
1661 | |
1662 Actually, there's an alternative representation of a @code{Lisp_Object}, | |
1663 invented by Kyle Jones, that is used when the | |
1664 @code{--use-minimal-tagbits} option to @code{configure} is used. In | |
1665 this case the 2 lower bits are used for the tag bits. This | |
1666 representation assumes that pointers to structs are always aligned to | |
1667 multiples of 4, so the lower 2 bits are always zero. | |
1668 | |
1669 @example | |
1670 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] | |
1671 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] | |
1672 | |
1673 <---------------------------------------------------------> <-> | |
1674 a pointer to a structure, or an integer tag | |
1675 @end example | |
1676 | |
1677 A tag of 00 is used for all pointer object types, a tag of 10 is used | |
1678 for characters, and the other two tags 01 and 11 are joined together to | |
1679 form the integer object type. The markbit is moved to part of the | |
1680 structure being pointed at (integers and chars do not need to be marked, | |
1681 since no memory is allocated). This representation has these | |
1682 advantages: | |
1683 | |
1684 @enumerate | |
1685 @item | |
1686 31 bits can be used for Lisp Integers. | |
1687 @item | |
1688 @emph{Any} pointer can be represented directly, and no bit masking | |
1689 operations are necessary. | |
1690 @end enumerate | |
1691 | |
1692 The disadvantages are: | |
1693 | |
1694 @enumerate | |
1695 @item | |
1696 An extra level of indirection is needed when accessing the object types | |
1697 that were not record types. So checking whether a Lisp object is a cons | |
1698 cell becomes a slower operation. | |
1699 @item | |
1700 Mark bits can no longer be stored directly in Lisp objects, so another | |
1701 place for them must be found. This means that a cons cell requires more | |
1702 memory than merely room for 2 lisp objects, leading to extra memory use. | |
1703 @end enumerate | |
1704 | |
1705 Various macros are used to construct Lisp objects and extract the | |
1706 components. Macros of the form @code{XINT()}, @code{XCHAR()}, | |
1707 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer | |
1708 field and cast it to the appropriate type. All of the macros that | |
1709 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if | |
1710 necessary. @code{XINT()} needs to be a bit tricky so that negative | |
1711 numbers are properly sign-extended: Usually it does this by shifting the | |
1712 number four bits to the left and then four bits to the right. This | |
1713 assumes that the right-shift operator does an arithmetic shift (i.e. it | |
1714 leaves the most-significant bit as-is rather than shifting in a zero, so | |
1715 that it mimics a divide-by-two even for negative numbers). Not all | |
1716 machines/compilers do this, and on the ones that don't, a more | |
1717 complicated definition is selected by defining | |
1718 @code{EXPLICIT_SIGN_EXTEND}. | |
1719 | |
1720 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor | |
1721 macros become more complicated---they check the tag bits and/or the | 1662 macros become more complicated---they check the tag bits and/or the |
1722 type field in the first four bytes of a record type to ensure that the | 1663 type field in the first four bytes of a record type to ensure that the |
1723 object is really of the correct type. This is great for catching places | 1664 object is really of the correct type. This is great for catching places |
1724 where an incorrect type is being dereferenced---this typically results | 1665 where an incorrect type is being dereferenced---this typically results |
1725 in a pointer being dereferenced as the wrong type of structure, with | 1666 in a pointer being dereferenced as the wrong type of structure, with |
1726 unpredictable (and sometimes not easily traceable) results. | 1667 unpredictable (and sometimes not easily traceable) results. |
1727 | 1668 |
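The remark in the right-hand (442) column that a single right shift is enough for `XINT()` can be sketched as follows. The names `word_t`, `box_int`, `unbox_int` and `unbox_int_portable` are hypothetical and are not the XEmacs macros; the sketch assumes a one-bit integer tag in the lowest bit and a compiler whose `>>` on negative signed values is an arithmetic shift, which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical signed machine word holding a tagged integer. */
typedef intptr_t word_t;

/* Store n shifted left by one with the low tag bit set.  The shift is
   done on the unsigned type to avoid undefined behavior for negative n. */
static word_t box_int(word_t n)
{
    return (word_t)(((uintptr_t)n << 1) | 1u);
}

/* Remove the tag bit.  With an arithmetic right shift the sign bit is
   replicated, so negative values come back correctly sign-extended. */
static word_t unbox_int(word_t o)
{
    return o >> 1;
}

/* Illustrative alternative that does not rely on the shift behavior:
   a boxed value is 2n + 1, so (o - 1) / 2 divides exactly. */
static word_t unbox_int_portable(word_t o)
{
    return (o - 1) / 2;
}
```

On a platform with arithmetic shifts the two extractors agree for every boxed value, which is why the simpler shift suffices there.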
1728 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp | 1669 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp |
1729 object. These macros are of the form @code{XSET@var{TYPE} | 1670 object. These macros are of the form @code{XSET@var{TYPE} |
1730 (@var{lvalue}, @var{result})}, | 1671 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather |
1731 i.e. they have to be a statement rather than just used in an expression. | 1672 than just used in an expression. The reason for this is that standard C |
1732 The reason for this is that standard C doesn't let you ``construct'' a | 1673 doesn't let you ``construct'' a structure (but GCC does). Granted, this |
1733 structure (but GCC does). Granted, this sometimes isn't too convenient; | 1674 sometimes isn't too convenient; for the case of integers, at least, you |
1734 for the case of integers, at least, you can use the function | 1675 can use the function @code{make_int()}, which constructs and |
1735 @code{make_int()}, which constructs and @emph{returns} an integer | 1676 @emph{returns} an integer Lisp object. Note that the |
1736 Lisp object. Note that the @code{XSET@var{TYPE}()} macros are also | 1677 @code{XSET@var{TYPE}()} macros are also affected by |
1737 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the | 1678 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the |
1738 structure is of the right type in the case of record types, where the | 1679 right type in the case of record types, where the type is contained in |
1739 type is contained in the structure. | 1680 the structure. |
1740 | 1681 |
1741 The C programmer is responsible for @strong{guaranteeing} that a | 1682 The C programmer is responsible for @strong{guaranteeing} that a |
1742 Lisp_Object is is the correct type before using the @code{X@var{TYPE}} | 1683 Lisp_Object is the correct type before using the @code{X@var{TYPE}} |
1743 macros. This is especially important in the case of lists. Use | 1684 macros. This is especially important in the case of lists. Use |
1744 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, | 1685 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, |
1745 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not | 1686 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not |
1746 Lisp code. On the other hand, if XEmacs has an internal logic error, | 1687 Lisp code. On the other hand, if XEmacs has an internal logic error, |
1747 it's better to crash immediately, so sprinkle ``unreachable'' | 1688 it's better to crash immediately, so sprinkle @code{assert()}s and |
1748 @code{abort()}s liberally about the source code. | 1689 ``unreachable'' @code{abort()}s liberally about the source code. Where |
1690 performance is an issue, use @code{type_checking_assert}, | |
1691 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do | |
1692 nothing unless the corresponding configure error checking flag was | |
1693 specified. | |
1749 | 1694 |
1750 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top | 1695 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top |
1751 @chapter Rules When Writing New C Code | 1696 @chapter Rules When Writing New C Code |
1752 | 1697 |
1753 The XEmacs C Code is extremely complex and intricate, and there are many | 1698 The XEmacs C Code is extremely complex and intricate, and there are many |
1763 * Adding Global Lisp Variables:: | 1708 * Adding Global Lisp Variables:: |
1764 * Coding for Mule:: | 1709 * Coding for Mule:: |
1765 * Techniques for XEmacs Developers:: | 1710 * Techniques for XEmacs Developers:: |
1766 @end menu | 1711 @end menu |
1767 | 1712 |
1768 @node General Coding Rules | 1713 @node General Coding Rules, Writing Lisp Primitives, Rules When Writing New C Code, Rules When Writing New C Code |
1769 @section General Coding Rules | 1714 @section General Coding Rules |
1770 | 1715 |
1771 The C code is actually written in a dialect of C called @dfn{Clean C}, | 1716 The C code is actually written in a dialect of C called @dfn{Clean C}, |
1772 meaning that it can be compiled, mostly warning-free, with either a C or | 1717 meaning that it can be compiled, mostly warning-free, with either a C or |
1773 C++ compiler. Coding in Clean C has several advantages over plain C. | 1718 C++ compiler. Coding in Clean C has several advantages over plain C. |
1797 must always be included before any other header files (including | 1742 must always be included before any other header files (including |
1798 system header files) to ensure that certain tricks played by various | 1743 system header files) to ensure that certain tricks played by various |
1799 @file{s/} and @file{m/} files work out correctly. | 1744 @file{s/} and @file{m/} files work out correctly. |
1800 | 1745 |
1801 When including header files, always use angle brackets, not double | 1746 When including header files, always use angle brackets, not double |
1802 quotes, except when the file to be included is in the same directory as | 1747 quotes, except when the file to be included is always in the same |
1803 the including file. If either file is a generated file, then that is | 1748 directory as the including file. If either file is a generated file, |
1804 not likely to be the case. In order to understand why we have this | 1749 then that is not likely to be the case. In order to understand why we |
1805 rule, imagine what happens when you do a build in the source directory | 1750 have this rule, imagine what happens when you do a build in the source |
1806 using @samp{./configure} and another build in another directory using | 1751 directory using @samp{./configure} and another build in another |
1807 @samp{../work/configure}. There will be two different @file{config.h} | 1752 directory using @samp{../work/configure}. There will be two different |
1808 files. Which one will be used if you @samp{#include "config.h"}? | 1753 @file{config.h} files. Which one will be used if you @samp{#include |
1754 "config.h"}? | |
1809 | 1755 |
1810 @strong{All global and static variables that are to be modifiable must | 1756 @strong{All global and static variables that are to be modifiable must |
1811 be declared uninitialized.} This means that you may not use the | 1757 be declared uninitialized.} This means that you may not use the |
1812 ``declare with initializer'' form for these variables, such as @code{int | 1758 ``declare with initializer'' form for these variables, such as @code{int |
1813 some_variable = 0;}. The reason for this has to do with some kludges | 1759 some_variable = 0;}. The reason for this has to do with some kludges |
1814 done during the dumping process: If possible, the initialized data | 1760 done during the dumping process: If possible, the initialized data |
1815 segment is re-mapped so that it becomes part of the (unmodifiable) code | 1761 segment is re-mapped so that it becomes part of the (unmodifiable) code |
1816 segment in the dumped executable. This allows this memory to be shared | 1762 segment in the dumped executable. This allows this memory to be shared |
1817 among multiple running XEmacs processes. XEmacs is careful to place as | 1763 among multiple running XEmacs processes. XEmacs is careful to place as |
1818 much constant data as possible into initialized variables (in | 1764 much constant data as possible into initialized variables during the |
1819 particular, into what's called the @dfn{pure space}---see below) during | 1765 @file{temacs} phase. |
1820 the @file{temacs} phase. | |
1821 | 1766 |
1822 @cindex copy-on-write | 1767 @cindex copy-on-write |
1823 @strong{Please note:} This kludge only works on a few systems nowadays, | 1768 @strong{Please note:} This kludge only works on a few systems nowadays, |
1824 and is rapidly becoming irrelevant because most modern operating systems | 1769 and is rapidly becoming irrelevant because most modern operating systems |
1825 provide @dfn{copy-on-write} semantics. All data is initially shared | 1770 provide @dfn{copy-on-write} semantics. All data is initially shared |
1849 | 1794 |
1850 The C source code makes heavy use of C preprocessor macros. One popular | 1795 The C source code makes heavy use of C preprocessor macros. One popular |
1851 macro style is: | 1796 macro style is: |
1852 | 1797 |
1853 @example | 1798 @example |
1854 #define FOO(var, value) do @{ \ | 1799 #define FOO(var, value) do @{ \ |
1855 Lisp_Object FOO_value = (value); \ | 1800 Lisp_Object FOO_value = (value); \ |
1856 ... /* compute using FOO_value */ \ | 1801 ... /* compute using FOO_value */ \ |
1857 (var) = bar; \ | 1802 (var) = bar; \ |
1858 @} while (0) | 1803 @} while (0) |
1859 @end example | 1804 @end example |
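As a hedged illustration of this macro style, the sketch below uses a hypothetical macro `CLAMP_INTO` and a hypothetical helper `next_value`; neither is part of the XEmacs sources. Each argument is copied once into a local whose name is prefixed with the macro name, so an argument with side effects runs a single time and the locals are unlikely to collide with the caller's variables.

```c
/* Hypothetical macro, for illustration only.  Every argument is copied
   into a prefixed local exactly once before it is used. */
#define CLAMP_INTO(var, value, lo, hi) do {     \
  int CLAMP_INTO_value = (value);               \
  int CLAMP_INTO_lo = (lo);                     \
  int CLAMP_INTO_hi = (hi);                     \
  if (CLAMP_INTO_value < CLAMP_INTO_lo)         \
    CLAMP_INTO_value = CLAMP_INTO_lo;           \
  if (CLAMP_INTO_value > CLAMP_INTO_hi)         \
    CLAMP_INTO_value = CLAMP_INTO_hi;           \
  (var) = CLAMP_INTO_value;                     \
} while (0)

static int calls;               /* counts calls to next_value () */

/* An argument expression with a visible side effect. */
static int
next_value (void)
{
  calls++;
  return 42;
}

/* Clamp the value produced by next_value () into the range 0..10. */
static int
clamp_demo (void)
{
  int result;
  calls = 0;
  CLAMP_INTO (result, next_value (), 0, 10);
  return result;
}
```

After `clamp_demo ()` returns, `calls` is 1 even though the macro body mentions the value several times, because only the local copy is reused.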
1878 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of | 1823 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of |
1879 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and | 1824 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and |
1880 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some | 1825 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some |
1881 predicate. | 1826 predicate. |
1882 | 1827 |
1883 @node Writing Lisp Primitives | 1828 @node Writing Lisp Primitives, Adding Global Lisp Variables, General Coding Rules, Rules When Writing New C Code |
1884 @section Writing Lisp Primitives | 1829 @section Writing Lisp Primitives |
1885 | 1830 |
1886 Lisp primitives are Lisp functions implemented in C. The details of | 1831 Lisp primitives are Lisp functions implemented in C. The details of |
1887 interfacing the C function so that Lisp can call it are handled by a few | 1832 interfacing the C function so that Lisp can call it are handled by a few |
1888 C macros. The only way to really understand how to write new C code is | 1833 C macros. The only way to really understand how to write new C code is |
2122 | 2067 |
2123 @file{eval.c} is a very good file to look through for examples; | 2068 @file{eval.c} is a very good file to look through for examples; |
2124 @file{lisp.h} contains the definitions for important macros and | 2069 @file{lisp.h} contains the definitions for important macros and |
2125 functions. | 2070 functions. |
2126 | 2071 |
2127 @node Adding Global Lisp Variables | 2072 @node Adding Global Lisp Variables, Coding for Mule, Writing Lisp Primitives, Rules When Writing New C Code |
2128 @section Adding Global Lisp Variables | 2073 @section Adding Global Lisp Variables |
2129 | 2074 |
2130 Global variables whose names begin with @samp{Q} are constants whose | 2075 Global variables whose names begin with @samp{Q} are constants whose |
2131 value is a symbol of a particular name. The name of the variable should | 2076 value is a symbol of a particular name. The name of the variable should |
2132 be derived from the name of the symbol using the same rules as for Lisp | 2077 be derived from the name of the symbol using the same rules as for Lisp |
2184 garbage-collection mechanism won't know that the object in this variable | 2129 garbage-collection mechanism won't know that the object in this variable |
2185 is in use, and will happily collect it and reuse its storage for another | 2130 is in use, and will happily collect it and reuse its storage for another |
2186 Lisp object, and you will be the one who's unhappy when you can't figure | 2131 Lisp object, and you will be the one who's unhappy when you can't figure |
2187 out how your variable got overwritten. | 2132 out how your variable got overwritten. |
2188 | 2133 |
2189 @node Coding for Mule | 2134 @node Coding for Mule, Techniques for XEmacs Developers, Adding Global Lisp Variables, Rules When Writing New C Code |
2190 @section Coding for Mule | 2135 @section Coding for Mule |
2191 @cindex Coding for Mule | 2136 @cindex Coding for Mule |
2192 | 2137 |
2193 Although Mule support is not compiled by default in XEmacs, many people | 2138 Although Mule support is not compiled by default in XEmacs, many people |
2194 are using it, and we consider it crucial that new code works correctly | 2139 are using it, and we consider it crucial that new code works correctly |
2207 * Conversion to and from External Data:: | 2152 * Conversion to and from External Data:: |
2208 * General Guidelines for Writing Mule-Aware Code:: | 2153 * General Guidelines for Writing Mule-Aware Code:: |
2209 * An Example of Mule-Aware Code:: | 2154 * An Example of Mule-Aware Code:: |
2210 @end menu | 2155 @end menu |
2211 | 2156 |
2212 @node Character-Related Data Types | 2157 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule |
2213 @subsection Character-Related Data Types | 2158 @subsection Character-Related Data Types |
2214 | 2159 |
2215 First, let's review the basic character-related datatypes used by | 2160 First, let's review the basic character-related datatypes used by |
2216 XEmacs. Note that the separate @code{typedef}s are not mandatory in the | 2161 XEmacs. Note that the separate @code{typedef}s are not mandatory in the |
2217 current implementation (all of them boil down to @code{unsigned char} or | 2162 current implementation (all of them boil down to @code{unsigned char} or |
2234 @item Bufbyte | 2179 @item Bufbyte |
2235 @cindex Bufbyte | 2180 @cindex Bufbyte |
2236 The data representing the text in a buffer or string is logically a set | 2181 The data representing the text in a buffer or string is logically a set |
2237 of @code{Bufbyte}s. | 2182 of @code{Bufbyte}s. |
2238 | 2183 |
2239 XEmacs does not work with character formats all the time; when reading | 2184 XEmacs does not work with the same character formats all the time; when |
2240 characters from the outside, it decodes them to an internal format, and | 2185 reading characters from the outside, it decodes them to an internal |
2241 likewise encodes them when writing. @code{Bufbyte} (in fact | 2186 format, and likewise encodes them when writing. @code{Bufbyte} (in fact |
2242 @code{unsigned char}) is the basic unit of XEmacs internal buffers and | 2187 @code{unsigned char}) is the basic unit of XEmacs internal buffers and |
2243 strings format. | 2188 strings format. A @code{Bufbyte *} is the type that points at text |
2189 encoded in the variable-width internal encoding. | |
2244 | 2190 |
2245 One character can correspond to one or more @code{Bufbyte}s. In the | 2191 One character can correspond to one or more @code{Bufbyte}s. In the |
2246 current implementation, an ASCII character is represented by the same | 2192 current Mule implementation, an ASCII character is represented by the |
2247 @code{Bufbyte}, and extended characters are represented by a sequence of | 2193 same @code{Bufbyte}, and other characters are represented by a sequence |
2248 @code{Bufbyte}s. | 2194 of two or more @code{Bufbyte}s. |
2249 | 2195 |
2250 Without Mule support, a @code{Bufbyte} is equivalent to an | 2196 Without Mule support, there are exactly 256 characters, implicitly |
2251 @code{Emchar}. | 2197 Latin-1, and each character is represented using one @code{Bufbyte}, and |
2198 there is a one-to-one correspondence between @code{Bufbyte}s and | |
2199 @code{Emchar}s. | |
2252 | 2200 |
2253 @item Bufpos | 2201 @item Bufpos |
2254 @itemx Charcount | 2202 @itemx Charcount |
2255 @cindex Bufpos | 2203 @cindex Bufpos |
2256 @cindex Charcount | 2204 @cindex Charcount |
2257 A @code{Bufpos} represents a character position in a buffer or string. | 2205 A @code{Bufpos} represents a character position in a buffer or string. |
2258 A @code{Charcount} represents a number (count) of characters. | 2206 A @code{Charcount} represents a number (count) of characters. |
2259 Logically, subtracting two @code{Bufpos} values yields a | 2207 Logically, subtracting two @code{Bufpos} values yields a |
2260 @code{Charcount} value. Although all of these are @code{typedef}ed to | 2208 @code{Charcount} value. Although all of these are @code{typedef}ed to |
2261 @code{int}, we use them in preference to @code{int} to make it clear | 2209 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make |
2262 what sort of position is being used. | 2210 it clear what sort of position is being used. |
2263 | 2211 |
2264 @code{Bufpos} and @code{Charcount} values are the only ones that are | 2212 @code{Bufpos} and @code{Charcount} values are the only ones that are |
2265 ever visible to Lisp. | 2213 ever visible to Lisp. |
2266 | 2214 |
2267 @item Bytind | 2215 @item Bytind |
2268 @itemx Bytecount | 2216 @itemx Bytecount |
2269 @cindex Bytind | 2217 @cindex Bytind |
2270 @cindex Bytecount | 2218 @cindex Bytecount |
2271 A @code{Bytind} represents a byte position in a buffer or string. A | 2219 A @code{Bytind} represents a byte position in a buffer or string. A |
2272 @code{Bytecount} represents the distance between two positions in bytes. | 2220 @code{Bytecount} represents the distance between two positions, in bytes. |
2273 The relationship between @code{Bytind} and @code{Bytecount} is the same | 2221 The relationship between @code{Bytind} and @code{Bytecount} is the same |
2274 as the relationship between @code{Bufpos} and @code{Charcount}. | 2222 as the relationship between @code{Bufpos} and @code{Charcount}. |
2275 | 2223 |
2276 @item Extbyte | 2224 @item Extbyte |
2277 @itemx Extcount | 2225 @itemx Extcount |
2281 which are equivalent to @code{unsigned char}. Obviously, an | 2229 which are equivalent to @code{unsigned char}. Obviously, an |
2282 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes | 2230 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes |
2283 and Extcounts are not all that frequent in XEmacs code. | 2231 and Extcounts are not all that frequent in XEmacs code. |
2284 @end table | 2232 @end table |
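The distinction between character counts and byte counts described above can be sketched as follows. This is only an illustration under stated assumptions: the typedefs are local stand-ins for the XEmacs ones, `count_chars` is a hypothetical helper, and UTF-8 is used as a familiar variable-width encoding (the Mule internal encoding is a different format).

```c
typedef unsigned char Bufbyte;  /* local stand-ins for the XEmacs typedefs */
typedef long Charcount;
typedef long Bytecount;

/* Count the characters in LEN bytes of UTF-8 text.  A byte of the form
   10xxxxxx continues a character; any other byte starts a new one. */
static Charcount
count_chars (const Bufbyte *text, Bytecount len)
{
  Charcount chars = 0;
  Bytecount i;

  for (i = 0; i < len; i++)
    if ((text[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}
```

For the four bytes `'a', 0xC3, 0xA9, 'b'` the byte count is 4 but the character count is 3, which is why byte positions and character positions need separate types.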
2285 | 2233 |
2286 @node Working With Character and Byte Positions | 2234 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule |
2287 @subsection Working With Character and Byte Positions | 2235 @subsection Working With Character and Byte Positions |
2288 | 2236 |
2289 Now that we have defined the basic character-related types, we can look | 2237 Now that we have defined the basic character-related types, we can look |
2290 at the macros and functions designed for work with them and for | 2238 at the macros and functions designed for work with them and for |
2291 conversion between them. Most of these macros are defined in | 2239 conversion between them. Most of these macros are defined in |
2294 learn about them. | 2242 learn about them. |
2295 | 2243 |
2296 @table @code | 2244 @table @code |
2297 @item MAX_EMCHAR_LEN | 2245 @item MAX_EMCHAR_LEN |
2298 @cindex MAX_EMCHAR_LEN | 2246 @cindex MAX_EMCHAR_LEN |
2299 This preprocessor constant is the maximum number of buffer bytes per | 2247 This preprocessor constant is the maximum number of buffer bytes to |
2300 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful | 2248 represent an Emacs character in the variable width internal encoding. |
2301 when allocating temporary strings to keep a known number of characters. | 2249 It is useful when allocating temporary strings to keep a known number of |
2302 For instance: | 2250 characters. For instance: |
2303 | 2251 |
2304 @example | 2252 @example |
2305 @group | 2253 @group |
2306 @{ | 2254 @{ |
2307 Charcount cclen; | 2255 Charcount cclen; |
2405 @example | 2353 @example |
2406 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); | 2354 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); |
2407 @end example | 2355 @end example |
2408 @end table | 2356 @end table |
2409 | 2357 |
2410 @node Conversion to and from External Data | 2358 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule |
2411 @subsection Conversion to and from External Data | 2359 @subsection Conversion to and from External Data |
2412 | 2360 |
2413 When an external function, such as a C library function, returns a | 2361 When an external function, such as a C library function, returns a |
2414 @code{char} pointer, you should almost never treat it as @code{Bufbyte}. | 2362 @code{char} pointer, you should almost never treat it as @code{Bufbyte}. |
2415 This is because these returned strings may contain 8bit characters which | 2363 This is because these returned strings may contain 8bit characters which |
2418 always convert it to an appropriate external encoding, lest the internal | 2366 always convert it to an appropriate external encoding, lest the internal |
2419 stuff (such as the infamous \201 characters) leak out. | 2367 stuff (such as the infamous \201 characters) leak out. |
2420 | 2368 |
2421 The interface to conversion between the internal and external | 2369 The interface to conversion between the internal and external |
2422 representations of text are the numerous conversion macros defined in | 2370 representations of text are the numerous conversion macros defined in |
2423 @file{buffer.h}. Before looking at them, we'll look at the external | 2371 @file{buffer.h}. There used to be a fixed set of external formats |
2424 formats supported by these macros. | 2372 supported by these macros, but now any coding system can be used with |
2425 | 2373 these macros. The coding system alias mechanism is used to create the |
2426 Currently meaningful formats are @code{FORMAT_BINARY}, | 2374 following logical coding systems, which replace the fixed external |
2427 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. Here | 2375 formats. The (dontusethis-set-symbol-value-handler) mechanism was |
2428 is a description of these. | 2376 enhanced to make this possible (more work on that is needed, such as |
2377 removing the @code{dontusethis-} prefix). | |
2429 | 2378 |
2430 @table @code | 2379 @table @code |
2431 @item FORMAT_BINARY | 2380 @item Qbinary |
2432 Binary format. This is the simplest format and is what we use in the | 2381 This is the simplest format and is what we use in the absence of a more |
2433 absence of a more appropriate format. This converts according to the | 2382 appropriate format. This converts according to the @code{binary} coding |
2434 @code{binary} coding system: | 2383 system: |
2435 | 2384 |
2436 @enumerate a | 2385 @enumerate a |
2437 @item | 2386 @item |
2438 On input, bytes 0--255 are converted into characters 0--255. | 2387 On input, bytes 0--255 are converted into (implicitly Latin-1) |
2388 characters 0--255. A non-Mule xemacs doesn't really know about | |
2389 different character sets and the fonts to display them, so the bytes can | |
2390 be treated as text in different 1-byte encodings by simply setting the | |
2391 appropriate fonts. So in a sense, non-Mule xemacs is a multi-lingual | |
2392 editor if, for example, different fonts are used to display text in | |
2393 different buffers, faces, or windows. The specifier mechanism gives the | |
2394 user complete control over this kind of behavior. | |
2439 @item | 2395 @item |
2440 On output, characters 0--255 are converted into bytes 0--255 and other | 2396 On output, characters 0--255 are converted into bytes 0--255 and other |
2441 characters are converted into `X'. | 2397 characters are converted into `~'. |
2442 @end enumerate | 2398 @end enumerate |
2443 | 2399 |
2444 @item FORMAT_FILENAME | 2400 @item Qfile_name |
2445 Format used for filenames. In the original Mule, this is user-definable | 2401 Format used for filenames. This is user-definable via either the |
2446 with the @code{pathname-coding-system} variable. For the moment, we | 2402 @code{file-name-coding-system} or @code{pathname-coding-system} (now |
2447 just use the @code{binary} coding system. | 2403 obsolete) variables. |
2448 | 2404 |
2449 @item FORMAT_OS | 2405 @item Qnative |
2450 Format used for the external Unix environment---@code{argv[]}, stuff | 2406 Format used for the external Unix environment---@code{argv[]}, stuff |
2451 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. | 2407 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. |
2452 | 2408 Currently this is the same as Qfile_name. The two should be |
2453 Perhaps should be the same as FORMAT_FILENAME. | 2409 distinguished for clarity and possible future separation. |
2454 | 2410 |
2455 @item FORMAT_CTEXT | 2411 @item Qctext |
2456 Compound--text format. This is the standard X format used for data | 2412 Compound--text format. This is the standard X11 format used for data |
2457 stored in properties, selections, and the like. This is an 8-bit | 2413 stored in properties, selections, and the like. This is an 8-bit |
2458 no-lock-shift ISO2022 coding system. | 2414 no-lock-shift ISO2022 coding system. This is a real coding system, |
2415 unlike Qfile_name, which is user-definable. | |
2459 @end table | 2416 @end table |
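The output rule given for the `binary` coding system in the revised text (characters 0--255 become the same byte, anything else becomes `~`) can be sketched in a few lines. The function name `encode_binary_sketch` and the use of plain `int` character codes are assumptions made for this example; the real conversion goes through the coding system machinery.

```c
#include <stddef.h>

/* Hypothetical helper: write COUNT character codes from CHARS to OUT.
   Codes 0..255 map to the identical byte; all other codes are replaced
   by '~'.  OUT must have room for COUNT bytes. */
static void
encode_binary_sketch (const int *chars, size_t count, unsigned char *out)
{
  size_t i;

  for (i = 0; i < count; i++)
    out[i] = (chars[i] >= 0 && chars[i] <= 255)
      ? (unsigned char) chars[i]
      : (unsigned char) '~';
}
```

The replacement is lossy, so a round trip through `binary` only preserves text whose characters all lie in the range 0--255.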
2460 | 2417 |
2461 The macros to convert between these formats and the internal format, and | 2418 There are two fundamental macros to convert between external and |
2462 vice versa, follow. | 2419 internal format. |
2420 | |
2421 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and | |
2422 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments | |
2423 each of these receives are a source type, a source, a sink type, a sink, | |
2424 and a coding system (or a symbol naming a coding system). | |
2425 | |
2426 A typical call looks like | |
2427 @example | |
2428 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name); | |
2429 @end example | |
2430 | |
2431 which means that the contents of the lisp string @code{str} are written | |
2432 to a malloc'ed memory area which will be pointed to by @code{ptr}, after | |
2433 the function returns. The conversion will be done using the | |
2434 @code{file-name} coding system, which will be controlled by the user | |
2435 indirectly by setting or binding the variable | |
2436 @code{file-name-coding-system}. | |
2437 | |
2438 Some sources and sinks require two C variables to specify. We use some | |
2439 preprocessor magic to allow different source and sink types, and even | |
2440 different numbers of arguments to specify different types of sources and | |
2441 sinks. | |
2442 | |
2443 So we can have a call that looks like | |
2444 @example | |
2445 TO_INTERNAL_FORMAT (DATA, (ptr, len), | |
2446 MALLOC, (ptr, len), | |
2447 coding_system); | |
2448 @end example | |
2449 | |
2450 The parenthesized argument pairs are required to make the preprocessor | |
2451 magic work. | |
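One way such parenthesized pairs can be handled by the preprocessor is sketched below. This is not the actual definition of `TO_INTERNAL_FORMAT`; the macros `PAIR_PTR`, `PAIR_LEN`, and `SUM_DATA` are hypothetical and only show why the parentheses matter: they keep the comma inside a single macro argument, and a helper macro placed in front of the pair then takes it apart.

```c
#include <stddef.h>

/* Hypothetical helpers: "PAIR_PTR_ (p, n)" expands to (p), and so on. */
#define PAIR_PTR_(ptr, len) (ptr)
#define PAIR_LEN_(ptr, len) (len)
#define PAIR_PTR(pair) PAIR_PTR_ pair
#define PAIR_LEN(pair) PAIR_LEN_ pair

/* Hypothetical consumer of a (ptr, len) source: add up its bytes. */
#define SUM_DATA(result, pair) do {                     \
  const unsigned char *SUM_DATA_p = PAIR_PTR (pair);    \
  size_t SUM_DATA_n = PAIR_LEN (pair);                  \
  unsigned long SUM_DATA_sum = 0;                       \
  while (SUM_DATA_n-- > 0)                              \
    SUM_DATA_sum += *SUM_DATA_p++;                      \
  (result) = SUM_DATA_sum;                              \
} while (0)

static unsigned long
sum_demo (void)
{
  static const unsigned char bytes[] = { 1, 2, 3, 4 };
  unsigned long total;

  /* The pair is one macro argument because of its parentheses. */
  SUM_DATA (total, (bytes, sizeof (bytes)));
  return total;
}
```

Without the parentheses, `SUM_DATA (total, bytes, sizeof (bytes))` would be a call with three arguments and would not match the two-parameter macro.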
2452 | |
2453 Here are the different source and sink types: | |
2463 | 2454 |
2464 @table @code | 2455 @table @code |
2465 @item GET_CHARPTR_INT_DATA_ALLOCA | 2456 @item @code{DATA, (ptr, len),} |
2466 @itemx GET_CHARPTR_EXT_DATA_ALLOCA | 2457 input data is a fixed buffer of size @var{len} at address @var{ptr} |
2467 These two are the most basic conversion macros. | 2458 @item @code{ALLOCA, (ptr, len),} |
2468 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal | 2459 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr} |
2469 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way | 2460 @item @code{MALLOC, (ptr, len),} |
2470 around. The arguments each of these receives are @var{ptr} (pointer to | 2461 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr} |
2471 the text in external format), @var{len} (length of texts in bytes), | 2462 @item @code{C_STRING_ALLOCA, ptr,} |
2472 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which | 2463 equivalent to @code{ALLOCA (ptr, len_ignored)} on output. |
2473 new text should be copied), and @var{len_out} (lvalue which will be | 2464 @item @code{C_STRING_MALLOC, ptr,} |
2474 assigned the length of the internal text in bytes). The resulting text | 2465 equivalent to @code{MALLOC (ptr, len_ignored)} on output |
2475 is stored to a stack-allocated buffer. If the text doesn't need | 2466 @item @code{C_STRING, ptr,} |
2476 changing, these macros will do nothing, except for setting | 2467 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input |
2477 @var{len_out}. | 2468 @item @code{LISP_STRING, string,} |
2478 | 2469 input or output is a Lisp_Object of type string |
2479 The macros above take many arguments which makes them unwieldy. For | 2470 @item @code{LISP_BUFFER, buffer,} |
2480 this reason, a number of convenience macros are defined with obvious | 2471 output is written to @code{(point)} in lisp buffer @var{buffer} |
2481 functionality, but accepting less arguments. The general rule is that | 2472 @item @code{LISP_LSTREAM, lstream,} |
2482 macros with @samp{INT} in their name convert text to internal Emacs | 2473 input or output is a Lisp_Object of type lstream |
2483 representation, whereas the @samp{EXT} macros convert to external | 2474 @item @code{LISP_OPAQUE, object,} |
2484 representation. | 2475 input or output is a Lisp_Object of type opaque |
2485 | |
2486 @item GET_C_CHARPTR_INT_DATA_ALLOCA | |
2487 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA | |
2488 As their names imply, these macros work on C char pointers, which are | |
2489 zero-terminated, and thus do not need @var{len} or @var{len_out} | |
2490 parameters. | |
2491 | |
2492 @item GET_STRING_EXT_DATA_ALLOCA | |
2493 @itemx GET_C_STRING_EXT_DATA_ALLOCA | |
2494 These two macros convert a Lisp string into an external representation. | |
2495 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA} | |
2496 stores its output to a generic string, providing @var{len_out}, the | |
2497 length of the resulting external string. On the other hand, | |
2498 @code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be | |
2499 satisfied with output string being zero-terminated. | |
2500 | |
2501 Note that for Lisp strings only one conversion direction makes sense. | |
2502 | |
2503 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA | |
2504 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA | |
2505 @itemx GET_STRING_BINARY_DATA_ALLOCA | |
2506 @itemx GET_C_STRING_BINARY_DATA_ALLOCA | |
2507 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA | |
2508 @itemx ... | |
2509 These macros convert internal text to a specific external | |
2510 representation, with the external format being encoded into the name of | |
2511 the macro. Note that the @code{GET_STRING_...} and | |
2512 @code{GET_C_STRING...} macros lack the @samp{EXT} tag, because they | |
2513 only make sense in that direction. | |
2514 | |
2515 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA | |
2516 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA | |
2517 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA | |
2518 @itemx ... | |
2519 These macros convert external text of a specific format to its internal | |
2520 representation, with the external format being incoded into the name of | |
2521 the macro. | |
2522 @end table | 2476 @end table |
2523 | 2477 |
2524 @node General Guidelines for Writing Mule-Aware Code | 2478 Often, the data is being converted to a '\0'-byte-terminated string, |
2479 which is the format required by many external system C APIs. For these | |
2480 purposes, a source type of @code{C_STRING} or a sink type of | |
2481 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate. | |
2482 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means | |
2483 using (ptr, len) pairs. | |
2484 | |
2485 The sinks to be specified must be lvalues, unless they are the lisp | |
2486 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}. | |
2487 | |
2488 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the | |
2489 resulting text is stored in a stack-allocated buffer, which is | |
2490 automatically freed on returning from the function. However, the sink | |
2491 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed | |
2492 memory. The caller is responsible for freeing this memory using | |
2493 @code{xfree()}. | |
2494 | |
2495 Note that it doesn't make sense for @code{LISP_STRING} to be a source | |
2496 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}. | |
2497 You'll get an assertion failure if you try. | |
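The ownership rule for the `MALLOC` and `C_STRING_MALLOC` sinks can be illustrated with a small stand-in. The function `copy_to_c_string_malloc` is hypothetical and performs no encoding conversion, and plain `malloc()` and `free()` are used here in place of `xmalloc()` and `xfree()`; the point is only that the caller receives heap memory and must release it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a C_STRING_MALLOC-style sink: copy LEN bytes
   from PTR and append a terminating '\0'.  The caller owns the result
   and must free it.  Returns NULL if allocation fails. */
static char *
copy_to_c_string_malloc (const char *ptr, size_t len)
{
  char *out = (char *) malloc (len + 1);

  if (out == NULL)
    return NULL;
  memcpy (out, ptr, len);
  out[len] = '\0';
  return out;
}

/* Convert a (ptr, len) pair, use the result, then release it. */
static size_t
ownership_demo (void)
{
  char *name = copy_to_c_string_malloc ("init.el-extra", 7);
  size_t n;

  if (name == NULL)
    return 0;
  n = strlen (name);            /* the copy holds "init.el" */
  free (name);                  /* caller's responsibility */
  return n;
}
```

A stack-allocated sink would need no `free` call, but its result must not be used after the calling function returns.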
2498 | |
2499 | |
2500 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule | |
2525 @subsection General Guidelines for Writing Mule-Aware Code | 2501 @subsection General Guidelines for Writing Mule-Aware Code |
2526 | 2502 |
2527 This section contains some general guidance on how to write Mule-aware | 2503 This section contains some general guidance on how to write Mule-aware |
2528 code, as well as some pitfalls you should avoid. | 2504 code, as well as some pitfalls you should avoid. |
2529 | 2505 |
2546 It is extremely important to always convert external data, because | 2522 It is extremely important to always convert external data, because |
2547 XEmacs can crash if unexpected 8bit sequences are copied to its internal | 2523 XEmacs can crash if unexpected 8bit sequences are copied to its internal |
2548 buffers literally. | 2524 buffers literally. |
2549 | 2525 |
2550 This means that when a system function, such as @code{readdir}, returns | 2526 This means that when a system function, such as @code{readdir}, returns |
2551 a string, you need to convert it using one of the conversion macros | 2527 a string, you may need to convert it using one of the conversion macros |
2552 described in the previous chapter, before passing it further to Lisp. | 2528 described in the previous chapter, before passing it further to Lisp. |
2553 In the case of @code{readdir}, you would use the | 2529 |
2554 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro. | 2530 Actually, most of the basic system functions that accept '\0'-terminated |
2531 string arguments, like @code{stat()} and @code{open()}, have been | |
2532 @strong{encapsulated} so that they @code{always} do internal to | |
2533 external conversion themselves. This means you must pass internally | |
2534 encoded data, typically the @code{XSTRING_DATA} of a Lisp_String to | |
2535 these functions. This is actually a design bug, since it unexpectedly | |
2536 changes the semantics of the system functions. A better design would be | |
2537 to provide separate versions of these system functions that accepted | |
2538 Lisp_Objects which were lisp strings in place of their current | |
2539 @code{char *} arguments. | |
2540 | |
2541 @example | |
2542 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */ | |
2543 @end example | |
2555 | 2544 |
2556 Also note that many internal functions, such as @code{make_string}, | 2545 Also note that many internal functions, such as @code{make_string}, |
2557 accept Bufbytes, which removes the need for them to convert the data | 2546 accept Bufbytes, which removes the need for them to convert the data |
2558 they receive. This increases efficiency because that way external data | 2547 they receive. This increases efficiency because that way external data |
2559 needs to be decoded only once, when it is read. After that, it is | 2548 needs to be decoded only once, when it is read. After that, it is |
2560 passed around in internal format. | 2549 passed around in internal format. |
2561 @end table | 2550 @end table |
2562 | 2551 |
2563 @node An Example of Mule-Aware Code | 2552 @node An Example of Mule-Aware Code, , General Guidelines for Writing Mule-Aware Code, Coding for Mule |
2564 @subsection An Example of Mule-Aware Code | 2553 @subsection An Example of Mule-Aware Code |
2565 | 2554 |
2566 As an example of Mule-aware code, we shall will analyze the | 2555 As an example of Mule-aware code, we will analyze the @code{string} |
2567 @code{string} function, which conses up a Lisp string from the character | 2556 function, which conses up a Lisp string from the character arguments it |
2568 arguments it receives. Here is the definition, pasted from | 2557 receives. Here is the definition, pasted from @code{alloc.c}: |
2569 @code{alloc.c}: | |
2570 | 2558 |
2571 @example | 2559 @example |
2572 @group | 2560 @group |
2573 DEFUN ("string", Fstring, 0, MANY, 0, /* | 2561 DEFUN ("string", Fstring, 0, MANY, 0, /* |
2574 Concatenate all the argument characters and make the result a string. | 2562 Concatenate all the argument characters and make the result a string. |
2609 over the XEmacs code. For starters, I recommend | 2597 over the XEmacs code. For starters, I recommend |
2610 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have | 2598 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have |
2611 understood this section of the manual and studied the examples, you can | 2599 understood this section of the manual and studied the examples, you can |
2612 proceed writing new Mule-aware code. | 2600 proceed writing new Mule-aware code. |
2613 | 2601 |
2614 @node Techniques for XEmacs Developers | 2602 @node Techniques for XEmacs Developers, , Coding for Mule, Rules When Writing New C Code |
2615 @section Techniques for XEmacs Developers | 2603 @section Techniques for XEmacs Developers |
2616 | 2604 |
2605 To make a purified XEmacs, do: @code{make puremacs}. | |
2617 To make a quantified XEmacs, do: @code{make quantmacs}. | 2606 To make a quantified XEmacs, do: @code{make quantmacs}. |
2618 | 2607 |
2619 You simply can't dump Quantified and Purified images. Run the image | 2608 You simply can't dump Quantified and Purified images (unless using the |
2620 like so: @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}. | 2609 portable dumper). Purify gets confused when xemacs frees memory in one |
2610 process that was allocated in a @emph{different} process on a different | |
2611 machine! Run it like so: | |
2612 @example | |
2613 temacs -batch -l loadup.el run-temacs @var{xemacs-args...} | |
2614 @end example | |
2621 | 2615 |
2622 Before you go through the trouble, are you compiling with all | 2616 Before you go through the trouble, are you compiling with all |
2623 debugging and error-checking off? If not try that first. Be warned | 2617 debugging and error-checking off? If not, try that first. Be warned |
2624 that while Quantify is directly responsible for quite a few | 2618 that while Quantify is directly responsible for quite a few |
2625 optimizations which have been made to XEmacs, doing a run which | 2619 optimizations which have been made to XEmacs, doing a run which |
2626 generates results which can be acted upon is not necessarily a trivial | 2620 generates results which can be acted upon is not necessarily a trivial |
2627 task. | 2621 task. |
2628 | 2622 |
2657 | 2651 |
2658 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function | 2652 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function |
2659 calls in elisp are especially expensive. Iterating over a long list is | 2653 calls in elisp are especially expensive. Iterating over a long list is |
2660 going to be 30 times faster implemented in C than in Elisp. | 2654 going to be 30 times faster implemented in C than in Elisp. |
2661 | 2655 |
2656 Heavily used small code fragments need to be fast. The traditional way | |
2657 to implement such code fragments in C is with macros. But macros in C | |
2658 are known to be broken. | |
2659 | |
2660 Macro arguments that are repeatedly evaluated may suffer from repeated | |
2661 side effects or suboptimal performance. | |
2662 | |
2663 Variable names used in macros may collide with caller's variables, | |
2664 causing (at least) unwanted compiler warnings. | |
2665 | |
2666 In order to solve these problems, and maintain statement semantics, one | |
2667 should use the @code{do @{ ... @} while (0)} trick while trying to | |
2668 reference macro arguments exactly once using local variables. | |
2669 | |
2670 Let's take a look at this poor macro definition: | |
2671 | |
2672 @example | |
2673 #define MARK_OBJECT(obj) \ | |
2674 if (!marked_p (obj)) mark_object (obj), did_mark = 1 | |
2675 @end example | |
2676 | |
2677 This macro evaluates its argument twice, and also fails if used like this: | |
2678 @example | |
2679 if (flag) MARK_OBJECT (obj); else do_something(); | |
2680 @end example | |
2681 | |
2682 A much better definition is | |
2683 | |
2684 @example | |
2685 #define MARK_OBJECT(obj) do @{ \ | |
2686 Lisp_Object mo_obj = (obj); \ | |
2687 if (!marked_p (mo_obj)) \ | |
2688 @{ \ | |
2689 mark_object (mo_obj); \ | |
2690 did_mark = 1; \ | |
2691 @} \ | |
2692 @} while (0) | |
2693 @end example | |
2694 | |
2695 Notice the elimination of double evaluation by using the local variable | |
2696 with the obscure name. Writing safe and efficient macros requires great | |
2697 care. The one problem with macros that cannot be portably worked around | |
2698 is this: since a C block has no value, a macro used as an expression rather | |
2699 than a statement cannot use the techniques just described to avoid | |
2700 multiple evaluation. | |
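The difference between the two definitions can be checked with a small stand-alone sketch. Everything here except the shape of the two macros is an assumption made for illustration: objects are plain @code{int} indices rather than @code{Lisp_Object}s, and @code{marked_p}, @code{mark_object}, @code{next_object} and the counters are hypothetical stand-ins, not the real XEmacs functions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real marking machinery. */
enum { N_OBJECTS = 8 };
static int marked[N_OBJECTS];
static int did_mark;

static int marked_p (int obj) { return marked[obj]; }
static void mark_object (int obj) { marked[obj] = 1; }

/* Side-effecting argument: counts how often it is evaluated. */
static int evaluations;
static int next_object (void) { return evaluations++; }

static void
reset (void)
{
  int i;
  for (i = 0; i < N_OBJECTS; i++)
    marked[i] = 0;
  did_mark = 0;
  evaluations = 0;
}

/* The poor definition from the text: (obj) appears twice. */
#define MARK_OBJECT_NAIVE(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1

/* The better definition: one evaluation, statement semantics. */
#define MARK_OBJECT(obj) do {       \
  int mo_obj = (obj);               \
  if (!marked_p (mo_obj))           \
    {                               \
      mark_object (mo_obj);         \
      did_mark = 1;                 \
    }                               \
} while (0)

/* Returns how many times the naive macro evaluated its argument. */
static int
count_naive_evaluations (void)
{
  reset ();
  MARK_OBJECT_NAIVE (next_object ());
  return evaluations;
}

/* Returns how many times the safe macro evaluated its argument. */
static int
count_safe_evaluations (void)
{
  reset ();
  MARK_OBJECT (next_object ());
  return evaluations;
}

/* The safe macro composes with if/else; the naive one would not compile
   here, because the semicolon after it ends the if statement. */
static int
mark_in_if_else (int flag)
{
  reset ();
  if (flag) MARK_OBJECT (3); else did_mark = -1;
  return did_mark;
}

/* Usage example. */
static void
self_check (void)
{
  assert (count_naive_evaluations () == 2);
  assert (count_safe_evaluations () == 1);
}
```

In the naive case the second evaluation also marks a different object from the one that was tested, which is the kind of bug that is hard to spot at the call site.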
2701 | |
2702 In most cases where a macro has function semantics, an inline function | |
2703 is a better implementation technique. Modern compiler optimizers tend | |
2704 to inline functions even if they have no @code{inline} keyword, and | |
2705 configure magic ensures that the @code{inline} keyword can be safely | |
2706 used as an additional compiler hint. Inline functions used in a single | |
2707 .c file are easy. The function must already be defined to be | |
2708 @code{static}. Just add another @code{inline} keyword to the | |
2709 definition. | |
2710 | |
2711 @example | |
2712 inline static int | |
2713 heavily_used_small_function (int arg) | |
2714 @{ | |
2715 ... | |
2716 @} | |
2717 @end example | |
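As a rough illustration of why an inline function is preferable where a macro would be used as an expression, the following sketch compares the two. The names @code{MAX_MACRO}, @code{max_inline}, @code{next_value} and @code{calls} are hypothetical and do not come from the XEmacs sources; the sketch also assumes a compiler that accepts the @code{inline} keyword directly.

```c
#include <assert.h>

/* Side-effecting argument: counts how often it is evaluated. */
static int calls;
static int next_value (void) { return ++calls; }

/* Expression macro: the winning argument is evaluated twice, and the
   do-while trick cannot be used because a value is required. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline function: each argument is evaluated exactly once. */
static inline int
max_inline (int a, int b)
{
  return a > b ? a : b;
}

/* Returns the number of evaluations of next_value () via the macro. */
static int
calls_with_macro (void)
{
  calls = 0;
  (void) MAX_MACRO (next_value (), 0);
  return calls;
}

/* Returns the number of evaluations of next_value () via the function. */
static int
calls_with_inline (void)
{
  calls = 0;
  (void) max_inline (next_value (), 0);
  return calls;
}

/* Usage example. */
static void
self_check (void)
{
  assert (calls_with_macro () == 2);
  assert (calls_with_inline () == 1);
}
```

The function version also gets argument type checking for free, which the macro does not.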
2718 | |
2719 Inline functions in header files are trickier, because we would like to | |
2720 make the following optimization if the function is @emph{not} inlined | |
2721 (for example, because we're compiling for debugging). We would like the | |
2722 function to be defined externally exactly once, and each calling | |
2723 translation unit would create an external reference to the function, | |
2724 instead of including a definition of the inline function in the object | |
2725 code of every translation unit that uses it. This optimization is | |
2726 currently only available for gcc. But you don't have to worry about the | |
2727 trickiness; just define your inline functions in header files using this | |
2728 pattern: | |
2729 | |
2730 @example | |
2731 INLINE_HEADER int | |
2732 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg); | |
2733 INLINE_HEADER int | |
2734 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg) | |
2735 @{ | |
2736 ... | |
2737 @} | |
2738 @end example | |
2739 | |
2740 The declaration right before the definition is to prevent warnings when | |
2741 compiling with @code{gcc -Wmissing-declarations}. I consider issuing | |
2742 this warning for inline functions a gcc bug, but the gcc maintainers disagree. | |
2743 | |
2744 Every header which contains inline functions, either directly by using | |
2745 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must | |
2746 be added to @file{inline.c}'s includes to make the optimization | |
2747 described above work. (Optimization note: if all @code{INLINE_HEADER} | |
2748 functions are in fact inlined in all translation units, then the linker | |
2749 can just discard @code{inline.o}, since it contains only unreferenced code.) | |
2750 | |
2662 To get started debugging XEmacs, take a look at the @file{.gdbinit} and | 2751 To get started debugging XEmacs, take a look at the @file{.gdbinit} and |
2663 @file{.dbxrc} files in the @file{src} directory. | 2752 @file{.dbxrc} files in the @file{src} directory. See the section in the |
2664 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,, | 2753 XEmacs FAQ on How to Debug an XEmacs problem with a debugger. |
2665 xemacs-faq, XEmacs FAQ}. | |
2666 | 2754 |
2667 After making source code changes, run @code{make check} to ensure that | 2755 After making source code changes, run @code{make check} to ensure that |
2668 you haven't introduced any regressions. If you're feeling ambitious, | 2756 you haven't introduced any regressions. If you want to make XEmacs more |
2669 you can try to improve the test suite in @file{tests/automated}. | 2757 reliable, please improve the test suite in @file{tests/automated}. |
2758 | |
2759 Did you make sure you didn't introduce any new compiler warnings? | |
2760 | |
2761 Before submitting a patch, please try compiling at least once with | |
2762 | |
2763 @example | |
2764 configure --with-mule --with-union-type --error-checking=all | |
2765 @end example | |
2670 | 2766 |
2671 Here are things to know when you create a new source file: | 2767 Here are things to know when you create a new source file: |
2672 | 2768 |
2673 @itemize @bullet | 2769 @itemize @bullet |
2674 @item | 2770 @item |
2677 | 2773 |
2678 @item | 2774 @item |
2679 Generated header files should be included using the @code{#include <...>} syntax, | 2775 Generated header files should be included using the @code{#include <...>} syntax, |
2680 not the @code{#include "..."} syntax. The generated headers are: | 2776 not the @code{#include "..."} syntax. The generated headers are: |
2681 | 2777 |
2682 @file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h} | 2778 @file{config.h sheap-adjust.h paths.h Emacs.ad.h} |
2683 | 2779 |
2684 The basic rule is that you should assume builds using @code{--srcdir} | 2780 The basic rule is that you should assume builds using @code{--srcdir} |
2685 and the @code{#include <...>} syntax needs to be used when the | 2781 and the @code{#include <...>} syntax needs to be used when the |
2686 to-be-included generated file is in a potentially different directory | 2782 to-be-included generated file is in a potentially different directory |
2687 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."} | 2783 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."} |
2691 @item | 2787 @item |
2692 Header files should @emph{not} include @code{<config.h>} and | 2788 Header files should @emph{not} include @code{<config.h>} and |
2693 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that | 2789 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that |
2694 use it to do so. | 2790 use it to do so. |
2695 | 2791 |
2696 @item | |
2697 If the header uses @code{INLINE}, either directly or through | |
2698 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s | |
2699 includes. | |
2700 | |
2701 @item | |
2702 Try compiling at least once with | |
2703 | |
2704 @example | |
2705 gcc --with-mule --with-union-type --error-checking=all | |
2706 @end example | |
2707 | |
2708 @item | |
2709 Did I mention that you should run the test suite? | |
2710 @example | |
2711 make check | |
2712 @end example | |
2713 @end itemize | 2792 @end itemize |
2714 | 2793 |
2794 Here is a checklist of things to do when creating a new lisp object type | |
2795 named @var{foo}: | |
2796 | |
2797 @enumerate | |
2798 @item | |
2799 create @file{@var{foo}.h} | |
2800 @item | |
2801 create @file{@var{foo}.c} | |
2802 @item | |
2803 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c} | |
2804 @item | |
2805 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h} | |
2806 @item | |
2807 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c} | |
2808 @item | |
2809 add definitions of macros like @code{CHECK_@var{FOO}} and | |
2810 @code{@var{FOO}P} to @file{@var{foo}.h} | |
2811 @item | |
2812 add the new type index to @code{enum lrecord_type} | |
2813 @item | |
2814 add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c} | |
2815 @item | |
2816 add an @code{INIT_LRECORD_IMPLEMENTATION} call to @code{syms_of_@var{foo}} in @file{@var{foo}.c} | |
2817 @end enumerate | |
2715 | 2818 |
2716 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top | 2819 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top |
2717 @chapter A Summary of the Various XEmacs Modules | 2820 @chapter A Summary of the Various XEmacs Modules |
2718 | 2821 |
2719 This is accurate as of XEmacs 20.0. | 2822 This is accurate as of XEmacs 20.0. |
2731 * Modules for Interfacing with the Operating System:: | 2834 * Modules for Interfacing with the Operating System:: |
2732 * Modules for Interfacing with X Windows:: | 2835 * Modules for Interfacing with X Windows:: |
2733 * Modules for Internationalization:: | 2836 * Modules for Internationalization:: |
2734 @end menu | 2837 @end menu |
2735 | 2838 |
2736 @node Low-Level Modules | 2839 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, A Summary of the Various XEmacs Modules |
2737 @section Low-Level Modules | 2840 @section Low-Level Modules |
2738 | 2841 |
2739 @example | 2842 @example |
2740 config.h | 2843 config.h |
2741 @end example | 2844 @end example |
2805 chosen by @file{configure}. | 2908 chosen by @file{configure}. |
2806 | 2909 |
2807 | 2910 |
2808 | 2911 |
2809 @example | 2912 @example |
2810 crt0.c | 2913 ecrt0.c |
2811 lastfile.c | 2914 lastfile.c |
2812 pre-crt0.c | 2915 pre-crt0.c |
2813 @end example | 2916 @end example |
2814 | 2917 |
2815 These modules are used in conjunction with the dump mechanism. On some | 2918 These modules are used in conjunction with the dump mechanism. On some |
2940 provided by the @samp{--error-check-*} configuration options. | 3043 provided by the @samp{--error-check-*} configuration options. |
2941 | 3044 |
2942 | 3045 |
2943 | 3046 |
2944 @example | 3047 @example |
2945 prefix-args.c | |
2946 @end example | |
2947 | |
2948 This is actually the source for a small, self-contained program | |
2949 used during building. | |
2950 | |
2951 | |
2952 @example | |
2953 universe.h | 3048 universe.h |
2954 @end example | 3049 @end example |
2955 | 3050 |
2956 This is not currently used. | 3051 This is not currently used. |
2957 | 3052 |
2958 | 3053 |
2959 | 3054 |
2960 @node Basic Lisp Modules | 3055 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, A Summary of the Various XEmacs Modules |
2961 @section Basic Lisp Modules | 3056 @section Basic Lisp Modules |
2962 | 3057 |
2963 @example | 3058 @example |
2964 emacsfns.h | |
2965 lisp-disunion.h | 3059 lisp-disunion.h |
2966 lisp-union.h | 3060 lisp-union.h |
2967 lisp.h | 3061 lisp.h |
2968 lrecord.h | 3062 lrecord.h |
2969 symsinit.h | 3063 symsinit.h |
3008 | 3102 |
3009 | 3103 |
3010 | 3104 |
3011 @example | 3105 @example |
3012 alloc.c | 3106 alloc.c |
3013 pure.c | |
3014 puresize.h | |
3015 @end example | 3107 @end example |
3016 | 3108 |
3017 The large module @file{alloc.c} implements all of the basic allocation and | 3109 The large module @file{alloc.c} implements all of the basic allocation and |
3018 garbage collection for Lisp objects. The most commonly used Lisp | 3110 garbage collection for Lisp objects. The most commonly used Lisp |
3019 objects are allocated in chunks, similar to the Blocktype data type | 3111 objects are allocated in chunks, similar to the Blocktype data type |
3035 code, adding a new subtype within a subsystem will in general not | 3127 code, adding a new subtype within a subsystem will in general not |
3036 require changes to the generic subsystem code or affect any of the other | 3128 require changes to the generic subsystem code or affect any of the other |
3037 subtypes in the subsystem; this provides a great deal of robustness to | 3129 subtypes in the subsystem; this provides a great deal of robustness to |
3038 the XEmacs code. | 3130 the XEmacs code. |
3039 | 3131 |
3040 @cindex pure space | |
3041 @file{pure.c} contains the declaration of the @dfn{purespace} array. | |
3042 Pure space is a hack used to place some constant Lisp data into the code | |
3043 segment of the XEmacs executable, even though the data needs to be | |
3044 initialized through function calls. (See above in section VIII for more | |
3045 info about this.) During startup, certain sorts of data is | |
3046 automatically copied into pure space, and other data is copied manually | |
3047 in some of the basic Lisp files by calling the function @code{purecopy}, | |
3048 which copies the object if possible (this only works in temacs, of | |
3049 course) and returns the new object. In particular, while temacs is | |
3050 executing, the Lisp reader automatically copies all compiled-function | |
3051 objects that it reads into pure space. Since compiled-function objects | |
3052 are large, are never modified, and typically comprise the majority of | |
3053 the contents of a compiled-Lisp file, this works well. While XEmacs is | |
3054 running, any attempt to modify an object that resides in pure space | |
3055 causes an error. Objects in pure space are never garbage collected -- | |
3056 almost all of the time, they're intended to be permanent, and in any | |
3057 case you can't write into pure space to set the mark bits. | |
3058 | |
3059 @file{puresize.h} contains the declaration of the size of the pure space | |
3060 array. This depends on the optional features that are compiled in, any | |
3061 extra purespace requested by the user at compile time, and certain other | |
3062 factors (e.g. 64-bit machines need more pure space because their Lisp | |
3063 objects are larger). The smallest size that suffices should be used, so | |
3064 that there's no wasted space. If there's not enough pure space, you | |
3065 will get an error during the build process, specifying how much more | |
3066 pure space is needed. | |
3067 | |
3068 | |
3069 | 3132 |
3070 @example | 3133 @example |
3071 eval.c | 3134 eval.c |
3072 backtrace.h | 3135 backtrace.h |
3073 @end example | 3136 @end example |
3161 structures. Note that the byte-code @emph{compiler} is written in Lisp. | 3224 structures. Note that the byte-code @emph{compiler} is written in Lisp. |
3162 | 3225 |
3163 | 3226 |
3164 | 3227 |
3165 | 3228 |
3166 @node Modules for Standard Editing Operations | 3229 @node Modules for Standard Editing Operations, Editor-Level Control Flow Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules |
3167 @section Modules for Standard Editing Operations | 3230 @section Modules for Standard Editing Operations |
3168 | 3231 |
3169 @example | 3232 @example |
3170 buffer.c | 3233 buffer.c |
3171 buffer.h | 3234 buffer.h |
3331 This module implements the undo mechanism for tracking buffer changes. | 3394 This module implements the undo mechanism for tracking buffer changes. |
3332 Most of this could be implemented in Lisp. | 3395 Most of this could be implemented in Lisp. |
3333 | 3396 |
3334 | 3397 |
3335 | 3398 |
3336 @node Editor-Level Control Flow Modules | 3399 @node Editor-Level Control Flow Modules, Modules for the Basic Displayable Lisp Objects, Modules for Standard Editing Operations, A Summary of the Various XEmacs Modules |
3337 @section Editor-Level Control Flow Modules | 3400 @section Editor-Level Control Flow Modules |
3338 | 3401 |
3339 @example | 3402 @example |
3340 event-Xt.c | 3403 event-Xt.c |
3404 event-msw.c | |
3341 event-stream.c | 3405 event-stream.c |
3342 event-tty.c | 3406 event-tty.c |
3407 events-mod.h | |
3408 gpmevent.c | |
3409 gpmevent.h | |
3343 events.c | 3410 events.c |
3344 events.h | 3411 events.h |
3345 @end example | 3412 @end example |
3346 | 3413 |
3347 These implement the handling of events (user input and other system | 3414 These implement the handling of events (user input and other system |
3392 relevant keymaps.) | 3459 relevant keymaps.) |
3393 | 3460 |
3394 | 3461 |
3395 | 3462 |
3396 @example | 3463 @example |
3397 keyboard.c | 3464 cmdloop.c |
3398 @end example | 3465 @end example |
3399 | 3466 |
3400 @file{keyboard.c} contains functions that implement the actual editor | 3467 @file{cmdloop.c} contains functions that implement the actual editor |
3401 command loop---i.e. the event loop that cyclically retrieves and | 3468 command loop---i.e. the event loop that cyclically retrieves and |
3402 dispatches events. This code is also rather tricky, just like | 3469 dispatches events. This code is also rather tricky, just like |
3403 @file{event-stream.c}. | 3470 @file{event-stream.c}. |
3404 | 3471 |
3405 | 3472 |
3429 bootstrapping implementations early in temacs, before the echo-area Lisp | 3496 bootstrapping implementations early in temacs, before the echo-area Lisp |
3430 code is loaded). | 3497 code is loaded). |
3431 | 3498 |
3432 | 3499 |
3433 | 3500 |
3434 @node Modules for the Basic Displayable Lisp Objects | 3501 @node Modules for the Basic Displayable Lisp Objects, Modules for other Display-Related Lisp Objects, Editor-Level Control Flow Modules, A Summary of the Various XEmacs Modules |
3435 @section Modules for the Basic Displayable Lisp Objects | 3502 @section Modules for the Basic Displayable Lisp Objects |
3436 | 3503 |
3437 @example | 3504 @example |
3438 device-ns.h | 3505 console-msw.c |
3439 device-stream.c | 3506 console-msw.h |
3440 device-stream.h | 3507 console-stream.c |
3508 console-stream.h | |
3509 console-tty.c | |
3510 console-tty.h | |
3511 console-x.c | |
3512 console-x.h | |
3513 console.c | |
3514 console.h | |
3515 @end example | |
3516 | |
3517 These modules implement the @dfn{console} Lisp object type. A console | |
3518 contains multiple display devices, but only one keyboard and mouse. | |
3519 Most of the time, a console will contain exactly one device. | |
3520 | |
3521 Consoles are the top of a lisp object inclusion hierarchy. Consoles | |
3522 contain devices, which contain frames, which contain windows. | |
3523 | |
3524 | |
3525 | |
3526 @example | |
3527 device-msw.c | |
3441 device-tty.c | 3528 device-tty.c |
3442 device-tty.h | |
3443 device-x.c | 3529 device-x.c |
3444 device-x.h | |
3445 device.c | 3530 device.c |
3446 device.h | 3531 device.h |
3447 @end example | 3532 @end example |
3448 | 3533 |
3449 These modules implement the @dfn{device} Lisp object type. This | 3534 These modules implement the @dfn{device} Lisp object type. This |
3460 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do. | 3545 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do. |
3461 | 3546 |
3462 | 3547 |
3463 | 3548 |
3464 @example | 3549 @example |
3465 frame-ns.h | 3550 frame-msw.c |
3466 frame-tty.c | 3551 frame-tty.c |
3467 frame-x.c | 3552 frame-x.c |
3468 frame-x.h | |
3469 frame.c | 3553 frame.c |
3470 frame.h | 3554 frame.h |
3471 @end example | 3555 @end example |
3472 | 3556 |
3473 Each device contains one or more frames in which objects (e.g. text) are | 3557 Each device contains one or more frames in which objects (e.g. text) are |
3503 is part of the redisplay mechanism or the code for particular object | 3587 is part of the redisplay mechanism or the code for particular object |
3504 types such as scrollbars. | 3588 types such as scrollbars. |
3505 | 3589 |
3506 | 3590 |
3507 | 3591 |
3508 @node Modules for other Display-Related Lisp Objects | 3592 @node Modules for other Display-Related Lisp Objects, Modules for the Redisplay Mechanism, Modules for the Basic Displayable Lisp Objects, A Summary of the Various XEmacs Modules |
3509 @section Modules for other Display-Related Lisp Objects | 3593 @section Modules for other Display-Related Lisp Objects |
3510 | 3594 |
3511 @example | 3595 @example |
3512 faces.c | 3596 faces.c |
3513 faces.h | 3597 faces.h |
3515 | 3599 |
3516 | 3600 |
3517 | 3601 |
3518 @example | 3602 @example |
3519 bitmaps.h | 3603 bitmaps.h |
3520 glyphs-ns.h | 3604 glyphs-eimage.c |
3605 glyphs-msw.c | |
3606 glyphs-msw.h | |
3607 glyphs-widget.c | |
3521 glyphs-x.c | 3608 glyphs-x.c |
3522 glyphs-x.h | 3609 glyphs-x.h |
3523 glyphs.c | 3610 glyphs.c |
3524 glyphs.h | 3611 glyphs.h |
3525 @end example | 3612 @end example |
3526 | 3613 |
3527 | 3614 |
3528 | 3615 |
3529 @example | 3616 @example |
3530 objects-ns.h | 3617 objects-msw.c |
3618 objects-msw.h | |
3531 objects-tty.c | 3619 objects-tty.c |
3532 objects-tty.h | 3620 objects-tty.h |
3533 objects-x.c | 3621 objects-x.c |
3534 objects-x.h | 3622 objects-x.h |
3535 objects.c | 3623 objects.c |
3537 @end example | 3625 @end example |
3538 | 3626 |
3539 | 3627 |
3540 | 3628 |
3541 @example | 3629 @example |
3630 menubar-msw.c | |
3631 menubar-msw.h | |
3542 menubar-x.c | 3632 menubar-x.c |
3543 menubar.c | 3633 menubar.c |
3544 @end example | 3634 menubar.h |
3545 | 3635 @end example |
3546 | 3636 |
3547 | 3637 |
3548 @example | 3638 |
3639 @example | |
3640 scrollbar-msw.c | |
3641 scrollbar-msw.h | |
3549 scrollbar-x.c | 3642 scrollbar-x.c |
3550 scrollbar-x.h | 3643 scrollbar-x.h |
3551 scrollbar.c | 3644 scrollbar.c |
3552 scrollbar.h | 3645 scrollbar.h |
3553 @end example | 3646 @end example |
3554 | 3647 |
3555 | 3648 |
3556 | 3649 |
3557 @example | 3650 @example |
3651 toolbar-msw.c | |
3558 toolbar-x.c | 3652 toolbar-x.c |
3559 toolbar.c | 3653 toolbar.c |
3560 toolbar.h | 3654 toolbar.h |
3561 @end example | 3655 @end example |
3562 | 3656 |
3579 gif_lib.h | 3673 gif_lib.h |
3580 gifalloc.c | 3674 gifalloc.c |
3581 @end example | 3675 @end example |
3582 | 3676 |
3583 These modules decode GIF-format image files, for use with glyphs. | 3677 These modules decode GIF-format image files, for use with glyphs. |
3584 | 3678 These files were removed due to Unisys patent infringement concerns. |
3585 | 3679 |
3586 | 3680 |
3587 @node Modules for the Redisplay Mechanism | 3681 |
3682 @node Modules for the Redisplay Mechanism, Modules for Interfacing with the File System, Modules for other Display-Related Lisp Objects, A Summary of the Various XEmacs Modules | |
3588 @section Modules for the Redisplay Mechanism | 3683 @section Modules for the Redisplay Mechanism |
3589 | 3684 |
3590 @example | 3685 @example |
3591 redisplay-output.c | 3686 redisplay-output.c |
3687 redisplay-msw.c | |
3592 redisplay-tty.c | 3688 redisplay-tty.c |
3593 redisplay-x.c | 3689 redisplay-x.c |
3594 redisplay.c | 3690 redisplay.c |
3595 redisplay.h | 3691 redisplay.h |
3596 @end example | 3692 @end example |
3654 These files provide some miscellaneous TTY-output functions and should | 3750 These files provide some miscellaneous TTY-output functions and should |
3655 probably be merged into @file{redisplay-tty.c}. | 3751 probably be merged into @file{redisplay-tty.c}. |
3656 | 3752 |
3657 | 3753 |
3658 | 3754 |
3659 @node Modules for Interfacing with the File System | 3755 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for the Redisplay Mechanism, A Summary of the Various XEmacs Modules |
3660 @section Modules for Interfacing with the File System | 3756 @section Modules for Interfacing with the File System |
3661 | 3757 |
3662 @example | 3758 @example |
3663 lstream.c | 3759 lstream.c |
3664 lstream.h | 3760 lstream.h |
3681 streams and C++ I/O streams. | 3777 streams and C++ I/O streams. |
3682 | 3778 |
3683 Similar to other subsystems in XEmacs, lstreams are separated into | 3779 Similar to other subsystems in XEmacs, lstreams are separated into |
3684 generic functions and a set of methods for the different types of | 3780 generic functions and a set of methods for the different types of |
3685 lstreams. @file{lstream.c} provides implementations of many different | 3781 lstreams. @file{lstream.c} provides implementations of many different |
3686 types of streams; others are provided, e.g., in @file{mule-coding.c}. | 3782 types of streams; others are provided, e.g., in @file{file-coding.c}. |
3687 | 3783 |
3688 | 3784 |
3689 | 3785 |
3690 @example | 3786 @example |
3691 fileio.c | 3787 fileio.c |
3755 for expanding symbolic links, on systems that don't implement it or have | 3851 for expanding symbolic links, on systems that don't implement it or have |
3756 a broken implementation. | 3852 a broken implementation. |
3757 | 3853 |
3758 | 3854 |
3759 | 3855 |
3760 @node Modules for Other Aspects of the Lisp Interpreter and Object System | 3856 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, A Summary of the Various XEmacs Modules |
3761 @section Modules for Other Aspects of the Lisp Interpreter and Object System | 3857 @section Modules for Other Aspects of the Lisp Interpreter and Object System |
3762 | 3858 |
3763 @example | 3859 @example |
3764 elhash.c | 3860 elhash.c |
3765 elhash.h | 3861 elhash.h |
3917 various security applications on the Internet. | 4013 various security applications on the Internet. |
3918 | 4014 |
3919 | 4015 |
3920 | 4016 |
3921 | 4017 |
3922 @node Modules for Interfacing with the Operating System | 4018 @node Modules for Interfacing with the Operating System, Modules for Interfacing with X Windows, Modules for Other Aspects of the Lisp Interpreter and Object System, A Summary of the Various XEmacs Modules |
3923 @section Modules for Interfacing with the Operating System | 4019 @section Modules for Interfacing with the Operating System |
3924 | 4020 |
3925 @example | 4021 @example |
3926 callproc.c | 4022 callproc.c |
3927 process.c | 4023 process.c |
4146 This module provides some terminal-control code necessary on versions of | 4242 This module provides some terminal-control code necessary on versions of |
4147 AIX prior to 4.1. | 4243 AIX prior to 4.1. |
4148 | 4244 |
4149 | 4245 |
4150 | 4246 |
4151 @example | 4247 @node Modules for Interfacing with X Windows, Modules for Internationalization, Modules for Interfacing with the Operating System, A Summary of the Various XEmacs Modules |
4152 msdos.c | |
4153 msdos.h | |
4154 @end example | |
4155 | |
4156 These modules are used for MS-DOS support, which does not work in | |
4157 XEmacs. | |
4158 | |
4159 | |
4160 | |
4161 @node Modules for Interfacing with X Windows | |
4162 @section Modules for Interfacing with X Windows | 4248 @section Modules for Interfacing with X Windows |
4163 | 4249 |
4164 @example | 4250 @example |
4165 Emacs.ad.h | 4251 Emacs.ad.h |
4166 @end example | 4252 @end example |
4223 needs to be rewritten. | 4309 needs to be rewritten. |
4224 | 4310 |
4225 | 4311 |
4226 | 4312 |
4227 @example | 4313 @example |
4228 xselect.c | 4314 select-msw.c |
4315 select-x.c | |
4316 select.c | |
4317 select.h | |
4229 @end example | 4318 @end example |
4230 | 4319 |
4231 @cindex selections | 4320 @cindex selections |
4232 This module provides an interface to the X Window System's concept of | 4321 This module provides an interface to the X Window System's concept of |
4233 @dfn{selections}, the standard way for X applications to communicate | 4322 @dfn{selections}, the standard way for X applications to communicate |
4298 | 4387 |
4299 Don't touch this code; something is liable to break if you do. | 4388 Don't touch this code; something is liable to break if you do. |
4300 | 4389 |
4301 | 4390 |
4302 | 4391 |
4303 @node Modules for Internationalization | 4392 @node Modules for Internationalization, , Modules for Interfacing with X Windows, A Summary of the Various XEmacs Modules |
4304 @section Modules for Internationalization | 4393 @section Modules for Internationalization |
4305 | 4394 |
4306 @example | 4395 @example |
4307 mule-canna.c | 4396 mule-canna.c |
4308 mule-ccl.c | 4397 mule-ccl.c |
4309 mule-charset.c | 4398 mule-charset.c |
4310 mule-charset.h | 4399 mule-charset.h |
4311 mule-coding.c | 4400 file-coding.c |
4312 mule-coding.h | 4401 file-coding.h |
4313 mule-mcpath.c | 4402 mule-mcpath.c |
4314 mule-mcpath.h | 4403 mule-mcpath.h |
4315 mule-wnnfns.c | 4404 mule-wnnfns.c |
4316 mule.c | 4405 mule.c |
4317 @end example | 4406 @end example |
4319 These files implement the MULE (Asian-language) support. Note that MULE | 4408 These files implement the MULE (Asian-language) support. Note that MULE |
4320 actually provides a general interface for all sorts of languages, not | 4409 actually provides a general interface for all sorts of languages, not |
4321 just Asian languages (although they are generally the most complicated | 4410 just Asian languages (although they are generally the most complicated |
4322 to support). This code is still in beta. | 4411 to support). This code is still in beta. |
4323 | 4412 |
4324 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the | 4413 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the |
4325 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset} | 4414 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset} |
4326 Lisp object type, which encapsulates a character set (an ordered one- or | 4415 Lisp object type, which encapsulates a character set (an ordered one- or |
4327 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese | 4416 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese |
4328 Kanji). | 4417 Kanji). |
4329 | 4418 |
4330 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object | 4419 @file{file-coding.*} implements the @dfn{coding-system} Lisp object |
4331 type, which encapsulates a method of converting between different | 4420 type, which encapsulates a method of converting between different |
4332 encodings. An encoding is a representation of a stream of characters, | 4421 encodings. An encoding is a representation of a stream of characters, |
4333 possibly from multiple character sets, using a stream of bytes or words, | 4422 possibly from multiple character sets, using a stream of bytes or words, |
4334 and defines (e.g.) which escape sequences are used to specify particular | 4423 and defines (e.g.) which escape sequences are used to specify particular |
4335 character sets, how the indices for a character are converted into bytes | 4424 character sets, how the indices for a character are converted into bytes |
4375 Asian-language support, and is not currently used. | 4464 Asian-language support, and is not currently used. |
4376 | 4465 |
4377 | 4466 |
4378 | 4467 |
4379 | 4468 |
4380 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top | 4469 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top |
4381 @chapter Allocation of Objects in XEmacs Lisp | 4470 @chapter Allocation of Objects in XEmacs Lisp |
4382 | 4471 |
4383 @menu | 4472 @menu |
4384 * Introduction to Allocation:: | 4473 * Introduction to Allocation:: |
4385 * Garbage Collection:: | 4474 * Garbage Collection:: |
4387 * Garbage Collection - Step by Step:: | 4476 * Garbage Collection - Step by Step:: |
4388 * Integers and Characters:: | 4477 * Integers and Characters:: |
4389 * Allocation from Frob Blocks:: | 4478 * Allocation from Frob Blocks:: |
4390 * lrecords:: | 4479 * lrecords:: |
4391 * Low-level allocation:: | 4480 * Low-level allocation:: |
4392 * Pure Space:: | |
4393 * Cons:: | 4481 * Cons:: |
4394 * Vector:: | 4482 * Vector:: |
4395 * Bit Vector:: | 4483 * Bit Vector:: |
4396 * Symbol:: | 4484 * Symbol:: |
4397 * Marker:: | 4485 * Marker:: |
4398 * String:: | 4486 * String:: |
4399 * Compiled Function:: | 4487 * Compiled Function:: |
4400 @end menu | 4488 @end menu |
4401 | 4489 |
4402 @node Introduction to Allocation | 4490 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp |
4403 @section Introduction to Allocation | 4491 @section Introduction to Allocation |
4404 | 4492 |
4405 Emacs Lisp, like all Lisps, has garbage collection. This means that | 4493 Emacs Lisp, like all Lisps, has garbage collection. This means that |
4406 the programmer never has to explicitly free (destroy) an object; it | 4494 the programmer never has to explicitly free (destroy) an object; it |
4407 happens automatically when the object becomes inaccessible. Most | 4495 happens automatically when the object becomes inaccessible. Most |
4418 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. | 4506 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. |
4419 Some Lisp objects, especially those that are primarily used internally, | 4507 Some Lisp objects, especially those that are primarily used internally, |
4420 have no corresponding Lisp primitives. Every Lisp object, though, | 4508 have no corresponding Lisp primitives. Every Lisp object, though, |
4421 has at least one C primitive for creating it. | 4509 has at least one C primitive for creating it. |
4422 | 4510 |
4423 Recall from section (VII) that a Lisp object, as stored in a 32-bit | 4511 Recall from section (VII) that a Lisp object, as stored in a 32-bit or |
4424 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that | 4512 64-bit word, has a few tag bits, and a ``value'' that occupies the |
4425 occupies the remainder of the bits. We can separate the different | 4513 remainder of the bits. We can separate the different Lisp object types |
4426 Lisp object types into four broad categories: | 4514 into three broad categories: |
4427 | 4515 |
4428 @itemize @bullet | 4516 @itemize @bullet |
4429 @item | 4517 @item |
4430 (a) Those for whom the value directly represents the contents of the | 4518 (a) Those for whom the value directly represents the contents of the |
4431 Lisp object. Only two types are in this category: integers and | 4519 Lisp object. Only two types are in this category: integers and |
4432 characters. No special allocation or garbage collection is necessary | 4520 characters. No special allocation or garbage collection is necessary |
4433 for such objects. Lisp objects of these types do not need to be | 4521 for such objects. Lisp objects of these types do not need to be |
4434 @code{GCPRO}ed. | 4522 @code{GCPRO}ed. |
4435 @end itemize | 4523 @end itemize |
4436 | 4524 |
4437 In the remaining three categories, the value is a pointer to a | |
4438 structure. | |
4439 | |
4440 @itemize @bullet | |
4441 @item | |
4442 @cindex frob block | |
4443 (b) Those for whom the tag directly specifies the type. Recall that | |
4444 there are only three tag bits; this means that at most five types can be | |
4445 specified this way. The most commonly-used types are stored in this | |
4446 format; this includes conses, strings, vectors, and sometimes symbols. | |
4447 With the exception of vectors, objects in this category are allocated in | |
4448 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into | |
4449 individual objects. This saves a lot on malloc overhead, since there | |
4450 are typically quite a lot of these objects around, and the objects are | |
4451 small. (A cons, for example, occupies 8 bytes on 32-bit machines---4 | |
4452 bytes for each of the two objects it contains.) Vectors are individually | |
4453 @code{malloc()}ed since they are of variable size. (It would be | |
4454 possible, and desirable, to allocate vectors of certain small sizes out | |
4455 of frob blocks, but it isn't currently done.) Strings are handled | |
4456 specially: Each string is allocated in two parts, a fixed size structure | |
4457 containing a length and a data pointer, and the actual data of the | |
4458 string. The former structure is allocated in frob blocks as usual, and | |
4459 the latter data is stored in @dfn{string chars blocks} and is relocated | |
4460 during garbage collection to eliminate holes. | |
4461 @end itemize | |
4462 | |
4463 In the remaining two categories, the type is stored in the object | 4525 In the remaining two categories, the type is stored in the object |
4464 itself. The tag for all such objects is the generic @dfn{lrecord} | 4526 itself. The tag for all such objects is the generic @dfn{lrecord} |
4465 (Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines) | 4527 (Lisp_Type_Record) tag. The first bytes of the object's structure are an |
4466 of the object's structure are a pointer to a structure that describes | 4528 integer (actually a char) characterising the object's type and some |
4467 the object's type, which includes method pointers and a pointer to a | 4529 flags, in particular the mark bit used for garbage collection. A |
4468 string naming the type. Note that it's possible to save some space by | 4530 structure describing the type is accessible through the |
4469 using a one- or two-byte tag, rather than a four- or eight-byte pointer | 4531 lrecord_implementation_table indexed with said integer. This structure |
4470 to store the type, but it's not clear it's worth making the change. | 4532 includes the method pointers and a pointer to a string naming the type. |
4471 | 4533 |
4472 @itemize @bullet | 4534 @itemize @bullet |
4473 @item | 4535 @item |
4474 (c) Those lrecords that are allocated in frob blocks (see above). This | 4536 (b) Those lrecords that are allocated in frob blocks (see above). This |
4475 includes the objects that are most common and relatively small, and | 4537 includes the objects that are most common and relatively small, and |
4476 includes floats, compiled functions, symbols (when not in category (b)), | 4538 includes conses, strings, subrs, floats, compiled functions, symbols, |
4477 extents, events, and markers. With the cleanup of frob blocks done in | 4539 extents, events, and markers. With the cleanup of frob blocks done in |
4478 19.12, it's not terribly hard to add more objects to this category, but | 4540 19.12, it's not terribly hard to add more objects to this category, but |
4479 it's a bit trickier than adding an object type to type (d) (esp. if the | 4541 it's a bit trickier than adding an object type to type (c) (esp. if the |
4480 object needs a finalization method), and is not likely to save much | 4542 object needs a finalization method), and is not likely to save much |
4481 space unless the object is small and there are many of them. (In fact, | 4543 space unless the object is small and there are many of them. (In fact, |
4482 if there are very few of them, it might actually waste space.) | 4544 if there are very few of them, it might actually waste space.) |
4483 @item | 4545 @item |
4484 (d) Those lrecords that are individually @code{malloc()}ed. These are | 4546 (c) Those lrecords that are individually @code{malloc()}ed. These are |
4485 called @dfn{lcrecords}. All other types are in this category. Adding a | 4547 called @dfn{lcrecords}. All other types are in this category. Adding a |
4486 new type to this category is comparatively easy, and all types added | 4548 new type to this category is comparatively easy, and all types added |
4487 since 19.8 (when the current allocation scheme was devised, by Richard | 4549 since 19.8 (when the current allocation scheme was devised, by Richard |
4488 Mlynarik), with the exception of the character type, have been in this | 4550 Mlynarik), with the exception of the character type, have been in this |
4489 category. | 4551 category. |
4490 @end itemize | 4552 @end itemize |
4491 | 4553 |
4492 Note that bit vectors are a bit of a special case. They are | 4554 Note that bit vectors are a bit of a special case. They are |
4493 simple lrecords as in category (c), but are individually @code{malloc()}ed | 4555 simple lrecords as in category (b), but are individually @code{malloc()}ed |
4494 like vectors. You can basically view them as exactly like vectors | 4556 like vectors. You can basically view them as exactly like vectors |
4495 except that their type is stored in lrecord fashion rather than | 4557 except that their type is stored in lrecord fashion rather than |
4496 in directly-tagged fashion. | 4558 in directly-tagged fashion. |
4497 | 4559 |
4498 Note that FSF Emacs redesigned their object system in 19.29 to follow | 4560 |
4499 a similar scheme. However, given RMS's expressed dislike for data | 4561 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp |
4500 abstraction, the FSF scheme is not nearly as clean or as easy to | |
4501 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type | |
4502 (d) @code{Lisp_Vectorlike}, with separate tags for each, although | |
4503 @code{Lisp_Vectorlike} is also used for vectors.) | |
4504 | |
4505 @node Garbage Collection | |
4506 @section Garbage Collection | 4562 @section Garbage Collection |
4507 @cindex garbage collection | 4563 @cindex garbage collection |
4508 | 4564 |
4509 @cindex mark and sweep | 4565 @cindex mark and sweep |
4510 Garbage collection is simple in theory but tricky to implement. | 4566 Garbage collection is simple in theory but tricky to implement. |
4518 that ``all of memory'' means all currently allocated objects. | 4574 that ``all of memory'' means all currently allocated objects. |
4519 Traversing all these objects means traversing all frob blocks, | 4575 Traversing all these objects means traversing all frob blocks, |
4520 all vectors (which are chained in one big list), and all | 4576 all vectors (which are chained in one big list), and all |
4521 lcrecords (which are likewise chained). | 4577 lcrecords (which are likewise chained). |
4522 | 4578 |
4523 Note that, when an object is marked, the mark has to occur | 4579 Garbage collection can be invoked explicitly by calling |
4524 inside of the object's structure, rather than in the 32-bit | 4580 @code{garbage-collect} but is also called automatically by @code{eval}, |
4525 @code{Lisp_Object} holding the object's pointer; i.e. you can't just | 4581 once a certain amount of memory has been allocated since the last |
4526 set the pointer's mark bit. This is because there may be many | 4582 garbage collection (according to @code{gc-cons-threshold}). |
4527 pointers to the same object. This means that the method of | 4583 |
4528 marking an object can differ depending on the type. The | 4584 |
4529 different marking methods are approximately as follows: | 4585 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp |
4530 | |
4531 @enumerate | |
4532 @item | |
4533 For conses, the mark bit of the car is set. | |
4534 @item | |
4535 For strings, the mark bit of the string's plist is set. | |
4536 @item | |
4537 For symbols when not lrecords, the mark bit of the | |
4538 symbol's plist is set. | |
4539 @item | |
4540 For vectors, the length is negated after adding 1. | |
4541 @item | |
4542 For lrecords, the pointer to the structure describing | |
4543 the type is changed (see below). | |
4544 @item | |
4545 Integers and characters do not need to be marked, since | |
4546 no allocation occurs for them. | |
4547 @end enumerate | |
4548 | |
4549 The details of this are in the @code{mark_object()} function. | |
4550 | |
4551 Note that any code that operates during garbage collection has | |
4552 to be especially careful because of the fact that some objects | |
4553 may be marked and as such may not look like they normally do. | |
4554 In particular: | |
4555 | |
4556 @itemize @bullet | |
4557 Some object pointers may have their mark bit set. This will make | |
4558 @code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with | |
4559 this. | |
4560 @item | |
4561 Even if you clear the mark bit, @code{FOOBARP()} will still fail | |
4562 for lrecords because the implementation pointer has been | |
4563 changed (see below). @code{GC_FOOBARP()} will correctly deal with | |
4564 this. | |
4565 @item | |
4566 Vectors have their size field munged, so anything that | |
4567 looks at this field will fail. | |
4568 @item | |
4569 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object | |
4570 pointers with their mark bit set, because the logical shift operations | |
4571 that remove the tag also remove the mark bit. | |
4572 @end itemize | |
4573 | |
4574 Finally, note that garbage collection can be invoked explicitly | |
4575 by calling @code{garbage-collect} but is also called automatically | |
4576 by @code{eval}, once a certain amount of memory has been allocated | |
4577 since the last garbage collection (according to @code{gc-cons-threshold}). | |
4578 | |
4579 @node GCPROing | |
4580 @section @code{GCPRO}ing | 4586 @section @code{GCPRO}ing |
4581 | 4587 |
4582 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs | 4588 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs |
4583 internals. The basic idea is that whenever garbage collection | 4589 internals. The basic idea is that whenever garbage collection |
4584 occurs, all in-use objects must be reachable somehow or | 4590 occurs, all in-use objects must be reachable somehow or |
4585 other from one of the roots of accessibility. The roots | 4591 other from one of the roots of accessibility. The roots |
4586 of accessibility are: | 4592 of accessibility are: |
4587 | 4593 |
4588 @enumerate | 4594 @enumerate |
4589 @item | 4595 @item |
4590 All objects that have been @code{staticpro()}d. This is used for | 4596 All objects that have been @code{staticpro()}d or |
4591 any global C variables that hold Lisp objects. A call to | 4597 @code{staticpro_nodump()}ed. This is used for any global C variables |
4592 @code{staticpro()} happens implicitly as a result of any symbols | 4598 that hold Lisp objects. A call to @code{staticpro()} happens implicitly |
4593 declared with @code{defsymbol()} and any variables declared with | 4599 as a result of any symbols declared with @code{defsymbol()} and any |
4594 @code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()} | 4600 variables declared with @code{DEFVAR_FOO()}. You need to explicitly |
4595 (in the @code{vars_of_foo()} method of a module) for other global | 4601 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module) |
4596 C variables holding Lisp objects. (This typically includes | 4602 for other global C variables holding Lisp objects. (This typically |
4597 internal lists and such things.) | 4603 includes internal lists and such things.) Use |
4604 @code{staticpro_nodump()} only in the rare cases when you do not want | |
4605 the pointed variable to be saved at dump time but rather recompute it at | |
4606 startup. | |
4598 | 4607 |
4599 Note that @code{obarray} is one of the @code{staticpro()}d things. | 4608 Note that @code{obarray} is one of the @code{staticpro()}d things. |
4600 Therefore, all functions and variables get marked through this. | 4609 Therefore, all functions and variables get marked through this. |
4601 @item | 4610 @item |
4602 Any shadowed bindings that are sitting on the @code{specpdl} stack. | 4611 Any shadowed bindings that are sitting on the @code{specpdl} stack. |
4727 anything that looks like a reference to an object as a reference. This | 4736 anything that looks like a reference to an object as a reference. This |
4728 will result in a few objects not getting collected when they should, but | 4737 will result in a few objects not getting collected when they should, but |
4729 it obviates the need for @code{GCPRO}ing, and allows garbage collection | 4738 it obviates the need for @code{GCPRO}ing, and allows garbage collection |
4730 to happen at any point at all, such as during object allocation. | 4739 to happen at any point at all, such as during object allocation. |
4731 | 4740 |
4732 @node Garbage Collection - Step by Step | 4741 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp |
4733 @section Garbage Collection - Step by Step | 4742 @section Garbage Collection - Step by Step |
4734 @cindex garbage collection step by step | 4743 @cindex garbage collection step by step |
4735 | 4744 |
4736 @menu | 4745 @menu |
4737 * Invocation:: | 4746 * Invocation:: |
4742 * compact_string_chars:: | 4751 * compact_string_chars:: |
4743 * sweep_strings:: | 4752 * sweep_strings:: |
4744 * sweep_bit_vectors_1:: | 4753 * sweep_bit_vectors_1:: |
4745 @end menu | 4754 @end menu |
4746 | 4755 |
4747 @node Invocation | 4756 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step |
4748 @subsection Invocation | 4757 @subsection Invocation |
4749 @cindex garbage collection, invocation | 4758 @cindex garbage collection, invocation |
4750 | 4759 |
4751 The first thing that anyone should know about garbage collection is: | 4760 The first thing that anyone should know about garbage collection is: |
4752 when and how the garbage collector is invoked. One might think that this | 4761 when and how the garbage collector is invoked. One might think that this |
4753 could happen every time new memory is allocated, e.g. new objects are | 4762 could happen every time new memory is allocated, e.g. new objects are |
4754 created, but this is @emph{not} the case. Instead, we have the following | 4763 created, but this is @emph{not} the case. Instead, we have the following |
4755 situation: | 4764 situation: |
4756 | 4765 |
4757 The entry point of any process of garbage collection is an invocation | 4766 The entry point of any process of garbage collection is an invocation |
4758 of the function @code{garbage_collect_1} in file @code{alloc.c}. The | 4767 of the function @code{garbage_collect_1} in file @code{alloc.c}. The |
4759 invocation can occur @emph{explicitly} by calling the function | 4768 invocation can occur @emph{explicitly} by calling the function |
4760 @code{Fgarbage_collect} (in addition this function provides information | 4769 @code{Fgarbage_collect} (in addition this function provides information |
4761 about the freed memory), or can occur @emph{implicitly} in four different | 4770 about the freed memory), or can occur @emph{implicitly} in four different |
4762 situations: | 4771 situations: |
4763 @enumerate | 4772 @enumerate |
4764 @item | 4773 @item |
4765 In function @code{main_1} in file @code{emacs.c}. This function is called | 4774 In function @code{main_1} in file @code{emacs.c}. This function is called |
4766 at each startup of xemacs. The garbage collection is invoked after all | 4775 at each startup of xemacs. The garbage collection is invoked after all |
4767 initial creations are completed, but only if a special internal error | 4776 initial creations are completed, but only if a special internal error |
4768 checking-constant @code{ERROR_CHECK_GC} is defined. | 4777 checking-constant @code{ERROR_CHECK_GC} is defined. |
4769 @item | 4778 @item |
4770 In function @code{disksave_object_finalization} in file | 4779 In function @code{disksave_object_finalization} in file |
4771 @code{alloc.c}. The only purpose of this function is to clear the | 4780 @code{alloc.c}. The only purpose of this function is to clear the |
4772 objects from memory which need not be stored with xemacs when we dump out | 4781 objects from memory which need not be stored with xemacs when we dump out |
4773 an executable. This is only done by @code{Fdump_emacs} or by | 4782 an executable. This is only done by @code{Fdump_emacs} or by |
4774 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The | 4783 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The |
4775 actual clearing is accomplished by making these objects unreachable and | 4784 actual clearing is accomplished by making these objects unreachable and |
4776 starting a garbage collection. The function is only used while building | 4785 starting a garbage collection. The function is only used while building |
4777 xemacs. | 4786 xemacs. |
4791 @code{Feval}. | 4800 @code{Feval}. |
4792 @end enumerate | 4801 @end enumerate |
4793 | 4802 |
4794 The upshot is that garbage collection can basically occur everywhere | 4803 The upshot is that garbage collection can basically occur everywhere |
4795 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or | 4804 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or |
4796 through another function. Since calls to these two functions are | 4805 through another function. Since calls to these two functions are hidden |
4797 hidden in various other functions, many calls to | 4806 in various other functions, many calls to @code{garbage_collect_1} are |
4798 @code{garabge_collect_1} are not obviously foreseeable, and therefore | 4807 not obviously foreseeable, and therefore unexpected. Instances where |
4799 unexpected. Instances where they are used that are worth remembering are | 4808 they are used that are worth remembering are various elisp commands, as |
4800 various elisp commands, as for example @code{or}, | 4809 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while}, |
4801 @code{and}, @code{if}, @code{cond}, @code{while}, @code{setq}, etc., | 4810 @code{setq}, etc., miscellaneous @code{gui_item_...} functions, |
4802 miscellaneous @code{gui_item_...} functions, everything related to | 4811 everything related to @code{eval} (@code{Feval_buffer}, @code{call0}, |
4803 @code{eval} (@code{Feval_buffer}, @code{call0}, ...) and inside | 4812 ...) and inside @code{Fsignal}. The latter is used to handle signals, as |
4804 @code{Fsignal}. The latter is used to handle signals, as for example the | 4813 for example the ones raised by every @code{QUIT}-macro triggered after |
4805 ones raised by every @code{QUIT}-macro triggered after pressing Ctrl-g. | 4814 pressing Ctrl-g. |
4806 | 4815 |
4807 @node garbage_collect_1 | 4816 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step |
4808 @subsection @code{garbage_collect_1} | 4817 @subsection @code{garbage_collect_1} |
4809 @cindex @code{garbage_collect_1} | 4818 @cindex @code{garbage_collect_1} |
4810 | 4819 |
4811 We can now describe exactly what happens after the invocation takes | 4820 We can now describe exactly what happens after the invocation takes |
4812 place. | 4821 place. |
4813 @enumerate | 4822 @enumerate |
4814 @item | 4823 @item |
4815 There are several cases in which the garbage collector is left immediately: | 4824 There are several cases in which the garbage collector is left immediately: |
4816 when we are already garbage collecting (@code{gc_in_progress}), when | 4825 when we are already garbage collecting (@code{gc_in_progress}), when |
4817 the garbage collection is somehow forbidden | 4826 the garbage collection is somehow forbidden |
4818 (@code{gc_currently_forbidden}), when we are currently displaying something | 4827 (@code{gc_currently_forbidden}), when we are currently displaying something |
4819 (@code{in_display}) or when we are preparing for the armageddon of the | 4828 (@code{in_display}) or when we are preparing for the armageddon of the |
4820 whole system (@code{preparing_for_armageddon}). | 4829 whole system (@code{preparing_for_armageddon}). |
4821 @item | 4830 @item |
4822 Next the correct frame in which to put | 4831 Next the correct frame in which to put |
4823 all the output occurring during garbage collecting is determined. In | 4832 all the output occurring during garbage collecting is determined. In |
4824 order to be able to restore the old display's state after displaying the | 4833 order to be able to restore the old display's state after displaying the |
4825 message, some data about the current cursor position has to be | 4834 message, some data about the current cursor position has to be |
4826 saved. The variables @code{pre_gc_curser} and @code{cursor_changed} take | 4835 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take |
4827 care of that. | 4836 care of that. |
4828 @item | 4837 @item |
4829 The state of @code{gc_currently_forbidden} must be restored after | 4838 The state of @code{gc_currently_forbidden} must be restored after |
4830 the garbage collection, no matter what happens during the process. We | 4839 the garbage collection, no matter what happens during the process. We |
4831 accomplish this by @code{record_unwind_protect}ing the suitable function | 4840 accomplish this by @code{record_unwind_protect}ing the suitable function |
4832 @code{restore_gc_inhibit} together with the current value of | 4841 @code{restore_gc_inhibit} together with the current value of |
4833 @code{gc_currently_forbidden}. | 4842 @code{gc_currently_forbidden}. |
4834 @item | 4843 @item |
4835 If we are concurrently running an interactive xemacs session, the next step | 4844 If we are concurrently running an interactive xemacs session, the next step |
4836 is simply to show the garbage collector's cursor/message. | 4845 is simply to show the garbage collector's cursor/message. |
4837 @item | 4846 @item |
4838 The following steps are the intrinsic steps of the garbage collector, | 4847 The following steps are the intrinsic steps of the garbage collector, |
4842 frame. However, this seems to be a currently unused feature. | 4851 frame. However, this seems to be a currently unused feature. |
4843 @item | 4852 @item |
4844 Before actually starting to go over all live objects, references to | 4853 Before actually starting to go over all live objects, references to |
4845 objects that are no longer used are pruned. We only have to do this for events | 4854 objects that are no longer used are pruned. We only have to do this for events |
4846 (@code{clear_event_resource}) and for specifiers | 4855 (@code{clear_event_resource}) and for specifiers |
4847 (@code{cleanup_specifiers}). | 4856 (@code{cleanup_specifiers}). |
4848 @item | 4857 @item |
4849 Now the mark phase begins and marks all accessible elements. In order to | 4858 Now the mark phase begins and marks all accessible elements. In order to |
4850 start from | 4859 start from |
4851 all slots that serve as roots of accessibility, the function | 4860 all slots that serve as roots of accessibility, the function |
4852 @code{mark_object} is called for each root individually to go out from | 4861 @code{mark_object} is called for each root individually to go out from |
4854 shown in their processed order: | 4863 shown in their processed order: |
4855 @itemize @bullet | 4864 @itemize @bullet |
4856 @item | 4865 @item |
4857 all constant symbols and static variables that are registered via | 4866 all constant symbols and static variables that are registered via |
4858 @code{staticpro}@ in the array @code{staticvec}. | 4867 @code{staticpro}@ in the array @code{staticvec}. |
4859 @xref{Adding Global Lisp Variables}. | 4868 @xref{Adding Global Lisp Variables}. |
4860 @item | 4869 @item |
4861 all Lisp objects that are created in C functions and that must be | 4870 all Lisp objects that are created in C functions and that must be |
4862 protected from being freed. They are registered in the global | 4871 protected from being freed. They are registered in the global |
4863 list @code{gcprolist}. | 4872 list @code{gcprolist}. |
4864 @xref{GCPROing}. | 4873 @xref{GCPROing}. |
4865 @item | 4874 @item |
4866 all local variables (i.e. their name fields @code{symbol} and old | 4875 all local variables (i.e. their name fields @code{symbol} and old |
4867 values @code{old_values}) that are bound during the evaluation by the Lisp | 4876 values @code{old_values}) that are bound during the evaluation by the Lisp |
4868 engine. They are stored in @code{specbinding} structs pushed on a stack | 4877 engine. They are stored in @code{specbinding} structs pushed on a stack |
4869 called @code{specpdl}. | 4878 called @code{specpdl}. |
4870 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}. | 4879 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}. |
4873 cause the creation of structs @code{catchtag} inserted in the list | 4882 cause the creation of structs @code{catchtag} inserted in the list |
4874 @code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields | 4883 @code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields |
4875 are freshly created objects and therefore have to be marked. | 4884 are freshly created objects and therefore have to be marked. |
4876 @xref{Catch and Throw}. | 4885 @xref{Catch and Throw}. |
4877 @item | 4886 @item |
4878 every function application pushes new structs @code{backtrace} | 4887 every function application pushes new structs @code{backtrace} |
4879 on the call stack of the Lisp engine (@code{backtrace_list}). The unique | 4888 on the call stack of the Lisp engine (@code{backtrace_list}). The unique |
4880 parts that have to be marked are the fields for each function | 4889 parts that have to be marked are the fields for each function |
4881 (@code{function}) and all their arguments (@code{args}). | 4890 (@code{function}) and all their arguments (@code{args}). |
4882 @xref{Evaluation}. | 4891 @xref{Evaluation}. |
4883 @item | 4892 @item |
4884 all objects that are used by the redisplay engine that must not be freed | 4893 all objects that are used by the redisplay engine that must not be freed |
4885 are marked by a special function called @code{mark_redisplay} (in | 4894 are marked by a special function called @code{mark_redisplay} (in |
4886 @code{redisplay.c}). | 4895 @code{redisplay.c}). |
4887 @item | 4896 @item |
4888 all objects created for profiling purposes are allocated by C functions | 4897 all objects created for profiling purposes are allocated by C functions |
4889 instead of using the lisp allocation mechanisms. In order to receive the | 4898 instead of using the lisp allocation mechanisms. In order to receive the |
4897 during the estimation of the live objects during garbage collection. | 4906 during the estimation of the live objects during garbage collection. |
4898 Any object referenced only by weak pointers is collected | 4907 Any object referenced only by weak pointers is collected |
4899 anyway, and the reference to it is cleared. In hash tables there are | 4908 anyway, and the reference to it is cleared. In hash tables there are |
4900 different usage patterns of them, manifesting in different types of hash | 4909 different usage patterns of them, manifesting in different types of hash |
4901 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak' | 4910 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak' |
4902 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each | 4911 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each |
4903 clearing entries depending on different conditions. More information can | 4912 clearing entries depending on different conditions. More information can |
4904 be found in the documentation to the function @code{make-hash-table}. | 4913 be found in the documentation to the function @code{make-hash-table}. |
4905 | 4914 |
4906 Because there are complicated dependency rules about when and what to | 4915 Because there are complicated dependency rules about when and what to |
4907 mark while processing weak hash tables, the standard @code{marker} | 4916 mark while processing weak hash tables, the standard @code{marker} |
4908 method is only active if it is marking non-weak hash tables. As soon as | 4917 method is only active if it is marking non-weak hash tables. As soon as |
4909 a weak component is in the table, the hash table entries are ignored | 4918 a weak component is in the table, the hash table entries are ignored |
4910 while marking. Instead their marking is done each separately by the | 4919 while marking. Instead their marking is done each separately by the |
4911 function @code{finish_marking_weak_hash_tables}. This function iterates | 4920 function @code{finish_marking_weak_hash_tables}. This function iterates |
4912 over each hash table entry @code{hentries} for each weak hash table in | 4921 over each hash table entry @code{hentries} for each weak hash table in |
4913 @code{Vall_weak_hash_tables}. Depending on the type of a table, the | 4922 @code{Vall_weak_hash_tables}. Depending on the type of a table, the |
4914 appropriate action is performed. | 4923 appropriate action is performed. |
4915 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked, | 4924 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked, |
4916 everything reachable from the @code{value} component is marked. If it is | 4925 everything reachable from the @code{value} component is marked. If it is |
4917 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is | 4926 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is |
4918 already marked, the marking starts only from the | 4927 already marked, the marking starts only from the |
4919 @code{key} component. | 4928 @code{key} component. |
4920 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car | 4929 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car |
4921 of the key entry is already marked, we mark both the @code{key} and | 4930 of the key entry is already marked, we mark both the @code{key} and |
4922 @code{value} components. | 4931 @code{value} components. |
4923 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK} | 4932 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK} |
4924 and the car of the value components is already marked, again both the | 4933 and the car of the value components is already marked, again both the |
4925 @code{key} and the @code{value} components get marked. | 4934 @code{key} and the @code{value} components get marked. |
4927 Again, there are lists with comparable properties called weak | 4936 Again, there are lists with comparable properties called weak |
4928 lists. There exist different peculiarities of their types called | 4937 lists. There exist different peculiarities of their types called |
4929 @code{simple}, @code{assoc}, @code{key-assoc} and | 4938 @code{simple}, @code{assoc}, @code{key-assoc} and |
4930 @code{value-assoc}. You can find further details about them in the | 4939 @code{value-assoc}. You can find further details about them in the |
4931 description to the function @code{make-weak-list}. The scheme of their | 4940 description to the function @code{make-weak-list}. The scheme of their |
4932 marking is similar: all weak lists are listed in @code{Vall_weak_lists}, | 4941 marking is similar: all weak lists are listed in @code{Vall_weak_lists}, |
4933 therefore we iterate over them. The marking is advanced until we hit an | 4942 therefore we iterate over them. The marking is advanced until we hit an |
4934 already marked pair. Then we know that during a former run all | 4943 already marked pair. Then we know that during a former run all |
4935 the rest has been marked completely. Again, depending on the special | 4944 the rest has been marked completely. Again, depending on the special |
4936 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE} | 4945 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE} |
4937 and the elem is marked, we mark the @code{cons} part. If it is a | 4946 and the elem is marked, we mark the @code{cons} part. If it is a |
4938 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and | 4947 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and |
4939 cdr, we mark the @code{cons} and the @code{elem}. If it is a | 4948 cdr, we mark the @code{cons} and the @code{elem}. If it is a |
4942 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked | 4951 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked |
4943 cdr of the elem, we mark both the @code{cons} and the @code{elem}. | 4952 cdr of the elem, we mark both the @code{cons} and the @code{elem}. |
4944 | 4953 |
4945 Since, by marking objects in reach from weak hash tables and weak lists, | 4954 Since, by marking objects in reach from weak hash tables and weak lists, |
4946 other objects could get marked, this perhaps implies further marking of | 4955 other objects could get marked, this perhaps implies further marking of |
4947 other weak objects, both finishing functions are redone as long as | 4956 other weak objects, both finishing functions are redone as long as |
4948 yet unmarked objects get freshly marked. | 4957 yet unmarked objects get freshly marked. |
4949 | 4958 |
4950 @item | 4959 @item |
4951 After completing the special marking for the weak hash tables and for the weak | 4960 After completing the special marking for the weak hash tables and for the weak |
4952 lists, all entries that point to objects that are going to be swept in | 4961 lists, all entries that point to objects that are going to be swept in |
4954 the table or the list. | 4963 the table or the list. |
4955 | 4964 |
4956 The function @code{prune_weak_hash_tables} does the job for weak hash | 4965 The function @code{prune_weak_hash_tables} does the job for weak hash |
4957 tables. Totally unmarked hash tables are removed from the list | 4966 tables. Totally unmarked hash tables are removed from the list |
4958 @code{Vall_weak_hash_tables}. The other ones are treated more carefully | 4967 @code{Vall_weak_hash_tables}. The other ones are treated more carefully |
4959 by scanning over all entries and removing one as soon as one of | 4968 by scanning over all entries and removing one as soon as one of |
4960 the components @code{key} and @code{value} is unmarked. | 4969 the components @code{key} and @code{value} is unmarked. |
4961 | 4970 |
4962 The same idea applies to the weak lists. It is accomplished by | 4971 The same idea applies to the weak lists. It is accomplished by |
4963 @code{prune_weak_lists}: An unmarked list is pruned from | 4972 @code{prune_weak_lists}: An unmarked list is pruned from |
4964 @code{Vall_weak_lists} immediately. A marked list is treated more | 4973 @code{Vall_weak_lists} immediately. A marked list is treated more |
4965 carefully by going over it and removing just the unmarked pairs. | 4974 carefully by going over it and removing just the unmarked pairs. |
4966 | 4975 |
4967 @item | 4976 @item |
4968 The function @code{prune_specifiers} checks all listed specifiers held | 4977 The function @code{prune_specifiers} checks all listed specifiers held |
4969 in @code{Vall_speficiers} and removes the ones from the lists that are | 4978 in @code{Vall_specifiers} and removes the ones from the lists that are |
4970 unmarked. | 4979 unmarked. |
4971 | 4980 |
4972 @item | 4981 @item |
4973 All syntax tables are stored in a list called | 4982 All syntax tables are stored in a list called |
4974 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks | 4983 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks |
4975 through it and unlinks the tables that are unmarked. | 4984 through it and unlinks the tables that are unmarked. |
4976 | 4985 |
4977 @item | 4986 @item |
4978 Next comes the complete sweeping, carried out by the function | 4987 Next comes the complete sweeping, carried out by the function |
4979 @code{gc_sweep}, which does the bulk of the work. | 4988 @code{gc_sweep}, which does the bulk of the work. |
4980 @item | 4989 @item |
4981 First, all the variables with respect to garbage collection are | 4990 First, all the variables with respect to garbage collection are |
4982 reset. @code{consing_since_gc} - the counter of the created cells since | 4991 reset. @code{consing_since_gc} - the counter of the created cells since |
4983 the last garbage collection - is set back to 0, and | 4992 the last garbage collection - is set back to 0, and |
4984 @code{gc_in_progress} is not @code{true} anymore. | 4993 @code{gc_in_progress} is not @code{true} anymore. |
4985 @item | 4994 @item |
4986 In case the session is interactive, the displayed cursor and message are | 4995 In case the session is interactive, the displayed cursor and message are |
4987 removed again. | 4996 removed again. |
4988 @item | 4997 @item |
4989 The state of @code{gc_inhibit} is restored to the former value by | 4998 The state of @code{gc_inhibit} is restored to the former value by |
4990 unwinding the stack. | 4999 unwinding the stack. |
4991 @item | 5000 @item |
4992 A small memory reserve is always held back that can be reached by | 5001 A small memory reserve is always held back that can be reached by |
4993 @code{breathing_space}. If nothing more is left, we create a new reserve | 5002 @code{breathing_space}. If nothing more is left, we create a new reserve |
4994 and exit. | 5003 and exit. |
4995 @end enumerate | 5004 @end enumerate |
4996 | 5005 |
4997 @node mark_object | 5006 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step |
4998 @subsection @code{mark_object} | 5007 @subsection @code{mark_object} |
4999 @cindex @code{mark_object} | 5008 @cindex @code{mark_object} |
5000 | 5009 |
5001 The first thing that is checked while marking an object is whether the | 5010 The first thing that is checked while marking an object is whether the |
5002 object is a real Lisp object @code{Lisp_Type_Record} or just an integer | 5011 object is a real Lisp object @code{Lisp_Type_Record} or just an integer |
5003 or a character. Integers and characters are the only two types that are | 5012 or a character. Integers and characters are the only two types that are |
5004 stored directly - without another level of indirection, and therefore they | 5013 stored directly - without another level of indirection, and therefore they |
5005 don't have to be marked and collected. | 5014 don't have to be marked and collected. |
5006 @xref{How Lisp Objects Are Represented in C}. | 5015 @xref{How Lisp Objects Are Represented in C}. |
5007 | 5016 |
5008 The second case is the one we have to handle. It is the one when we are | 5017 The second case is the one we have to handle. It is the one when we are |
5009 dealing with a pointer to a Lisp object. But there are also three | 5018 dealing with a pointer to a Lisp object. But there are also three |
5010 possibilities that prevent us from doing anything while marking: The | 5019 possibilities that prevent us from doing anything while marking: The |
5011 object is read only which prevents it from being garbage collected, | 5020 object is read only which prevents it from being garbage collected, |
5012 i.e. marked (@code{C_READONLY_RECORD_HEADER}). The object in question is | 5021 i.e. marked (@code{C_READONLY_RECORD_HEADER}). The object in question is |
5013 already marked, and need not be marked for the second time (checked by | 5022 already marked, and need not be marked for the second time (checked by |
5014 @code{MARKED_RECORD_HEADER_P}). If it is a special, unmarkable object | 5023 @code{MARKED_RECORD_HEADER_P}). If it is a special, unmarkable object |
5015 (@code{UNMARKABLE_RECORD_HEADER_P}, apparently, these are objects that | 5024 (@code{UNMARKABLE_RECORD_HEADER_P}, apparently, these are objects that |
5016 sit in some CONST space, and can therefore not be marked, see | 5025 sit in some const space, and can therefore not be marked, see |
5017 @code{this_one_is_unmarkable} in @code{alloc.c}). | 5026 @code{this_one_is_unmarkable} in @code{alloc.c}). |
5018 | 5027 |
5019 Now, the actual marking is feasible. We do so by first using the macro | 5028 Now, the actual marking is feasible. We do so by first using the macro |
5020 @code{MARK_RECORD_HEADER} to mark the object itself (actually the | 5029 @code{MARK_RECORD_HEADER} to mark the object itself (actually the |
5021 special flag in the lrecord header), and calling its special marker | 5030 special flag in the lrecord header), and calling its special marker |
5022 "method" @code{marker} if available. The marker method marks every | 5031 "method" @code{marker} if available. The marker method marks every |
5023 other object that is in reach from our current object. Note that these | 5032 other object that is in reach from our current object. Note that these |
5024 marker methods should not call @code{mark_object} recursively, but | 5033 marker methods should not call @code{mark_object} recursively, but |
5025 instead should return the next object from where further marking has to | 5034 instead should return the next object from where further marking has to |
5026 be performed. | 5035 be performed. |
5027 | 5036 |
5028 In case another object was returned, as mentioned before, we reiterate | 5037 In case another object was returned, as mentioned before, we reiterate |
5029 the whole @code{mark_object} process beginning with this next object. | 5038 the whole @code{mark_object} process beginning with this next object. |
5030 | 5039 |
5031 @node gc_sweep | 5040 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step |
5032 @subsection @code{gc_sweep} | 5041 @subsection @code{gc_sweep} |
5033 @cindex @code{gc_sweep} | 5042 @cindex @code{gc_sweep} |
5034 | 5043 |
5035 The job of this function is to free all unmarked records from memory. As | 5044 The job of this function is to free all unmarked records from memory. As |
5036 we know, there are different types of objects implemented and managed, and | 5045 we know, there are different types of objects implemented and managed, and |
5037 consequently different ways to free them from memory. | 5046 consequently different ways to free them from memory. |
5038 @xref{Introduction to Allocation}. | 5047 @xref{Introduction to Allocation}. |
5039 | 5048 |
5040 We start with all objects stored through @code{lcrecords}. All | 5049 We start with all objects stored through @code{lcrecords}. All |
5041 bulkier objects are allocated and handled using that scheme of | 5050 bulkier objects are allocated and handled using that scheme of |
5042 @code{lcrecords}. Each object is @code{malloc}ed separately | 5051 @code{lcrecords}. Each object is @code{malloc}ed separately |
5043 instead of placing it in one of the contiguous frob blocks. All types | 5052 instead of placing it in one of the contiguous frob blocks. All types |
5044 that are currently stored | 5053 that are currently stored |
5045 using @code{lcrecords}'s @code{alloc_lcrecord} and | 5054 using @code{lcrecords}'s @code{alloc_lcrecord} and |
5046 @code{make_lcrecord_list} are the types: vectors, buffers, | 5055 @code{make_lcrecord_list} are the types: vectors, buffers, |
5047 char-table, char-table-entry, console, weak-list, database, device, | 5056 char-table, char-table-entry, console, weak-list, database, device, |
5048 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face, | 5057 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face, |
5049 coding-system, frame, image-instance, glyph, popup-data, gui-item, | 5058 coding-system, frame, image-instance, glyph, popup-data, gui-item, |
5057 doing the whole job for us. | 5066 doing the whole job for us. |
5058 For a description about the internals: @xref{lrecords}. | 5067 For a description about the internals: @xref{lrecords}. |
5059 | 5068 |
5060 Our next candidates are the other objects that behave quite differently | 5069 Our next candidates are the other objects that behave quite differently |
5061 than everything else: the strings. They consist of two parts, a | 5070 than everything else: the strings. They consist of two parts, a |
5062 fixed-size portion (@code{struct Lisp_string}) holding the string's | 5071 fixed-size portion (@code{struct Lisp_String}) holding the string's |
5063 length, its property list and a pointer to the second part, and the | 5072 length, its property list and a pointer to the second part, and the |
5064 actual string data, which is stored in string-chars blocks comparable to | 5073 actual string data, which is stored in string-chars blocks comparable to |
5065 frob blocks. In this block, the data is not only freed, but also a | 5074 frob blocks. In this block, the data is not only freed, but also a |
5066 compression of holes is made, i.e. all strings are relocated together. | 5075 compression of holes is made, i.e. all strings are relocated together. |
5067 @xref{String}. This compacting phase is performed by the function | 5076 @xref{String}. This compacting phase is performed by the function |
5074 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and | 5083 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and |
5075 @code{sweep_events}. They are the fixed-size types cons, floats, | 5084 @code{sweep_events}. They are the fixed-size types cons, floats, |
5076 compiled-functions, symbol, marker, extent, and event stored in | 5085 compiled-functions, symbol, marker, extent, and event stored in |
5077 so-called "frob blocks", and therefore we can basically do the same on | 5086 so-called "frob blocks", and therefore we can basically do the same on |
5078 every type of object, using the same macros, especially defined only to | 5087 every type of object, using the same macros, especially defined only to |
5079 handle everything with respect to fixed-size blocks. The only fixed-size | 5088 handle everything with respect to fixed-size blocks. The only fixed-size |
5080 type that is not handled here is the fixed-size portion of strings, | 5089 type that is not handled here is the fixed-size portion of strings, |
5081 because we took special care of them earlier. | 5090 because we took special care of them earlier. |
5082 | 5091 |
5083 The only big exceptions are bit vectors stored differently and | 5092 The only big exceptions are bit vectors stored differently and |
5084 therefore treated differently by the function @code{sweep_bit_vectors_1} | 5093 therefore treated differently by the function @code{sweep_bit_vectors_1} |
5085 described later. | 5094 described later. |
5086 | 5095 |
5087 At first, we need some brief information about how | 5096 At first, we need some brief information about how |
5088 these fixed-size types are managed in general, in order to understand | 5097 these fixed-size types are managed in general, in order to understand |
5089 how the sweeping is done. They all have a fixed size, and are therefore | 5098 how the sweeping is done. They all have a fixed size, and are therefore |
5090 stored in big blocks of memory - allocated at once - that can hold a | 5099 stored in big blocks of memory - allocated at once - that can hold a |
5091 certain amount of objects of one type. The macro | 5100 certain amount of objects of one type. The macro |
5092 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for | 5101 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for |
5093 every type. More precisely, we have the block struct | 5102 every type. More precisely, we have the block struct |
5094 (holding a pointer to the previous block @code{prev} and the | 5103 (holding a pointer to the previous block @code{prev} and the |
5095 objects in @code{block[]}), a pointer to current block | 5104 objects in @code{block[]}), a pointer to current block |
5096 (@code{current_..._block}) and its last index | 5105 (@code{current_..._block}) and its last index |
5097 (@code{current_..._block_index}), and a pointer to the free list that | 5106 (@code{current_..._block_index}), and a pointer to the free list that |
5098 will be created. Also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some | 5107 will be created. Also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some |
5104 The rest works as follows: all of them define a | 5113 The rest works as follows: all of them define a |
5105 macro @code{UNMARK_...} that is used to unmark the object. They define a | 5114 macro @code{UNMARK_...} that is used to unmark the object. They define a |
5106 macro @code{ADDITIONAL_FREE_...} that defines additional work that has | 5115 macro @code{ADDITIONAL_FREE_...} that defines additional work that has |
5107 to be done when converting an object from in use to not in use (so far, | 5116 to be done when converting an object from in use to not in use (so far, |
5108 only markers use it in order to unchain them). Then, they all call | 5117 only markers use it in order to unchain them). Then, they all call |
5109 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name | 5118 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name |
5110 and their struct name. | 5119 and their struct name. |
5111 | 5120 |
5112 This call in particular does the following: we go over all blocks | 5121 This call in particular does the following: we go over all blocks |
5113 starting with the current moving towards the oldest. | 5122 starting with the current moving towards the oldest. |
5114 For each block, we look at every object in it. If the object is already | 5123 For each block, we look at every object in it. If the object is already |
5115 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the | 5124 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the |
5116 object), or if it is | 5125 object), or if it is |
5117 set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing has to be | 5126 set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing has to be |
5118 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it | 5127 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it |
5119 is put in the free list and set free (using the macro | 5128 is put in the free list and set free (using the macro |
5120 @code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked | 5129 @code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked |
5121 (by @code{UNMARK_...}). While going through one block, we note if the | 5130 (by @code{UNMARK_...}). While going through one block, we note if the |
5122 whole block is empty. If so, the whole block is freed (using | 5131 whole block is empty. If so, the whole block is freed (using |
5123 @code{xfree}) and the free list state is set to the state it had before | 5132 @code{xfree}) and the free list state is set to the state it had before |
5124 handling this block. | 5133 handling this block. |
5125 | 5134 |
5126 @node sweep_lcrecords_1 | 5135 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step |
5127 @subsection @code{sweep_lcrecords_1} | 5136 @subsection @code{sweep_lcrecords_1} |
5128 @cindex @code{sweep_lcrecords_1} | 5137 @cindex @code{sweep_lcrecords_1} |
5129 | 5138 |
5130 After nullifying the complete lcrecord statistics, we go over all | 5139 After nullifying the complete lcrecord statistics, we go over all |
5131 lcrecords two separate times. They are all chained together in a list with | 5140 lcrecords two separate times. They are all chained together in a list with |
5132 a head called @code{all_lcrecords}. | 5141 a head called @code{all_lcrecords}. |
5133 | 5142 |
5134 The first loop calls for each object its @code{finalizer} method, but only | 5143 The first loop calls for each object its @code{finalizer} method, but only |
5135 in the case that it is not read only | 5144 in the case that it is not read only |
5136 (@code{C_READONLY_RECORD_HEADER_P}), it is not already marked | 5145 (@code{C_READONLY_RECORD_HEADER_P}), it is not already marked |
5137 (@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of | 5146 (@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of |
5138 freed objects, field @code{free}) and finally it owns a finalizer | 5147 freed objects, field @code{free}) and finally it owns a finalizer |
5139 method. | 5148 method. |
5140 | 5149 |
5141 The second loop actually frees the appropriate objects again by iterating | 5150 The second loop actually frees the appropriate objects again by iterating |
5142 through the whole list. In case an object is read only or marked, it | 5151 through the whole list. In case an object is read only or marked, it |
5143 has to persist, otherwise it is manually freed by calling | 5152 has to persist, otherwise it is manually freed by calling |
5144 @code{xfree}. During this loop, the lcrecord statistics are kept up to | 5153 @code{xfree}. During this loop, the lcrecord statistics are kept up to |
5145 date by calling @code{tick_lcrecord_stats} with the right arguments. | 5154 date by calling @code{tick_lcrecord_stats} with the right arguments. |
5146 | 5155 |
5147 @node compact_string_chars | 5156 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step |
5148 @subsection @code{compact_string_chars} | 5157 @subsection @code{compact_string_chars} |
5149 @cindex @code{compact_string_chars} | 5158 @cindex @code{compact_string_chars} |
5150 | 5159 |
5151 The purpose of this function is to compact all the data parts of the | 5160 The purpose of this function is to compact all the data parts of the |
5152 strings that are held in so-called @code{string_chars_block}, i.e. the | 5161 strings that are held in so-called @code{string_chars_block}, i.e. the |
5154 | 5163 |
5155 The procedure with which this is done is as follows. We are keeping two | 5164 The procedure with which this is done is as follows. We are keeping two |
5156 positions in the @code{string_chars_block}s using two pointer/integer | 5165 positions in the @code{string_chars_block}s using two pointer/integer |
5157 pairs, namely @code{from_sb}/@code{from_pos} and | 5166 pairs, namely @code{from_sb}/@code{from_pos} and |
5158 @code{to_sb}/@code{to_pos}. They stand for the current positions, from | 5167 @code{to_sb}/@code{to_pos}. They stand for the current positions, from |
5159 where to where, to copy the string currently being handled. | 5168 where to where, to copy the string currently being handled. |
5160 | 5169 |
5161 While going over all chained @code{string_chars_block}s and their held | 5170 While going over all chained @code{string_chars_block}s and their held |
5162 strings, starting at @code{first_string_chars_block}, both pointers | 5171 strings, starting at @code{first_string_chars_block}, both pointers |
5163 are advanced and possibly a string is copied from @code{from_sb} to | 5172 are advanced and possibly a string is copied from @code{from_sb} to |
5164 @code{to_sb}, depending on the status of the pointed at strings. | 5173 @code{to_sb}, depending on the status of the pointed at strings. |
5165 | 5174 |
5166 More precisely, we can distinguish between the following actions. | 5175 More precisely, we can distinguish between the following actions. |
5167 @itemize @bullet | 5176 @itemize @bullet |
5168 @item | 5177 @item |
5169 The string at @code{from_sb}'s position could be marked as free, which | 5178 The string at @code{from_sb}'s position could be marked as free, which |
5170 is indicated by an invalid pointer to the pointer that should point back | 5179 is indicated by an invalid pointer to the pointer that should point back |
5171 to the fixed size string object, and which is checked by | 5180 to the fixed size string object, and which is checked by |
5172 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos} | 5181 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos} |
5173 is advanced to the next string, and nothing has to be copied. | 5182 is advanced to the next string, and nothing has to be copied. |
5174 @item | 5183 @item |
5175 Also, if a string object itself is unmarked, nothing has to be | 5184 Also, if a string object itself is unmarked, nothing has to be |
5176 copied. We likewise advance the @code{from_sb}/@code{from_pos} | 5185 copied. We likewise advance the @code{from_sb}/@code{from_pos} |
5177 pair as described above. | 5186 pair as described above. |
5178 @item | 5187 @item |
5179 In all other cases, we have a marked string at hand. The string data | 5188 In all other cases, we have a marked string at hand. The string data |
5180 must be moved from the from-position to the to-position. In case | 5189 must be moved from the from-position to the to-position. In case |
5181 there is not enough space in the current @code{to_sb}-block, we advance | 5190 there is not enough space in the current @code{to_sb}-block, we advance |
5182 this pointer to the beginning of the next block before copying. In case the | 5191 this pointer to the beginning of the next block before copying. In case the |
5183 from and to positions are different, we perform the | 5192 from and to positions are different, we perform the |
5184 actual copying using the library function @code{memmove}. | 5193 actual copying using the library function @code{memmove}. |
5188 @code{string_chars_block}, sitting in @code{current_string_chars_block}, | 5197 @code{string_chars_block}, sitting in @code{current_string_chars_block}, |
5189 is reset on the last block to which we moved a string, | 5198 is reset on the last block to which we moved a string, |
5190 i.e. @code{to_sb}, and all remaining blocks (we know that they just | 5199 i.e. @code{to_sb}, and all remaining blocks (we know that they just |
5191 carry garbage) are explicitly @code{xfree}d. | 5200 carry garbage) are explicitly @code{xfree}d. |
5192 | 5201 |
5193 @node sweep_strings | 5202 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step |
5194 @subsection @code{sweep_strings} | 5203 @subsection @code{sweep_strings} |
5195 @cindex @code{sweep_strings} | 5204 @cindex @code{sweep_strings} |
5196 | 5205 |
5197 The sweeping for the fixed sized string objects is essentially exactly | 5206 The sweeping for the fixed sized string objects is essentially exactly |
5198 the same as it is for all other fixed size types. As before, the freeing | 5207 the same as it is for all other fixed size types. As before, the freeing |
5200 @code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros | 5209 @code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros |
5201 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two | 5210 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two |
5202 definitions are a little bit special compared to the ones used | 5211 definitions are a little bit special compared to the ones used |
5203 for the other fixed size types. | 5212 for the other fixed size types. |
5204 | 5213 |
5205 @code{UNMARK_string} is defined the same way except for some additional code | 5214 @code{UNMARK_string} is defined the same way except for some additional code |
5206 used for updating the bookkeeping information. | 5215 used for updating the bookkeeping information. |
5207 | 5216 |
5208 For strings, @code{ADDITIONAL_FREE_string} has to do something in | 5217 For strings, @code{ADDITIONAL_FREE_string} has to do something in |
5209 addition: if the string was not allocated in a | 5218 addition: if the string was not allocated in a |
5210 @code{string_chars_block} because it exceeded the maximal length, and | 5219 @code{string_chars_block} because it exceeded the maximal length, and |
5211 was therefore @code{malloc}ed separately, we now also @code{xfree} | 5220 was therefore @code{malloc}ed separately, we now also @code{xfree} |
5212 it explicitly. | 5221 it explicitly. |
5213 | 5222 |
5214 @node sweep_bit_vectors_1 | 5223 @node sweep_bit_vectors_1, , sweep_strings, Garbage Collection - Step by Step |
5215 @subsection @code{sweep_bit_vectors_1} | 5224 @subsection @code{sweep_bit_vectors_1} |
5216 @cindex @code{sweep_bit_vectors_1} | 5225 @cindex @code{sweep_bit_vectors_1} |
5217 | 5226 |
5218 Bit vectors are also one of the rare types that are @code{malloc}ed | 5227 Bit vectors are also one of the rare types that are @code{malloc}ed |
5219 individually. Consequently, while sweeping, all bit vectors that are no | 5228 individually. Consequently, while sweeping, all bit vectors that are no |
5220 longer needed must be freed by hand. This is done in the expected way: | 5229 longer needed must be freed by hand. This is done in the expected way: |
5221 since they are all registered in a list called | 5230 since they are all registered in a list called |
5222 @code{all_bit_vectors}, all elements of that list are traversed, | 5231 @code{all_bit_vectors}, all elements of that list are traversed, |
5223 all unmarked bit vectors are unlinked and freed by calling @code{xfree}, and | 5232 all unmarked bit vectors are unlinked and freed by calling @code{xfree}, and |
5224 all surviving ones become unmarked. | 5233 all surviving ones become unmarked. |
5225 In addition, the bookkeeping information used for garbage | 5234 In addition, the bookkeeping information used for garbage |
5226 collector's output purposes is updated. | 5235 collector's output purposes is updated. |
5227 | 5236 |
5228 @node Integers and Characters | 5237 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp |
5229 @section Integers and Characters | 5238 @section Integers and Characters |
5230 | 5239 |
5231 Integer and character Lisp objects are created from integers using the | 5240 Integer and character Lisp objects are created from integers using the |
5232 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent | 5241 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent |
5233 functions @code{make_int()} and @code{make_char()}. (These are actually | 5242 functions @code{make_int()} and @code{make_char()}. (These are actually |
5237 | 5246 |
5238 @code{XSETINT()} and the like will truncate values given to them that | 5247 @code{XSETINT()} and the like will truncate values given to them that |
5239 are too big; i.e. you won't get the value you expected but the tag bits | 5248 are too big; i.e. you won't get the value you expected but the tag bits |
5240 will at least be correct. | 5249 will at least be correct. |
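The truncation behaviour can be illustrated with a minimal sketch. The layout below (a 32-bit word with a one-bit integer tag and a 31-bit payload) and the names @code{sketch_make_int()} and @code{sketch_int_value()} are assumptions made for illustration only; they are not the actual XEmacs definitions of @code{XSETINT()} or @code{make_int()}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Lisp word: bit 0 is the integer tag, bits 1..31 the value. */
typedef uint32_t sketch_word;
#define SKETCH_INT_TAG 1u

/* Encode an integer; bits that do not fit into 31 bits are silently lost. */
static sketch_word sketch_make_int(int64_t value)
{
    return ((uint32_t)value << 1) | SKETCH_INT_TAG;
}

/* Decode the 31-bit payload, sign-extending it by hand. */
static int32_t sketch_int_value(sketch_word w)
{
    uint32_t payload = w >> 1;                 /* 31 significant bits */
    if (payload & 0x40000000u)                 /* sign bit of the payload */
        return (int32_t)((int64_t)payload - ((int64_t)1 << 31));
    return (int32_t)payload;
}
```

In this sketch a value outside the 31-bit range comes back different from what was stored, yet the tag bit of the resulting word is still set, which mirrors the behaviour described above.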
5241 | 5250 |
5242 @node Allocation from Frob Blocks | 5251 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp |
5243 @section Allocation from Frob Blocks | 5252 @section Allocation from Frob Blocks |
5244 | 5253 |
5245 The uninitialized memory required by a @code{Lisp_Object} of a particular type | 5254 The uninitialized memory required by a @code{Lisp_Object} of a particular type |
5246 is allocated using | 5255 is allocated using |
5247 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the | 5256 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the |
5264 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the | 5273 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the |
5265 last frob block for space, and creates a new frob block if there is | 5274 last frob block for space, and creates a new frob block if there is |
5266 none. (There are actually two versions of these macros, one of which is | 5275 none. (There are actually two versions of these macros, one of which is |
5267 more defensive but less efficient and is used for error-checking.) | 5276 more defensive but less efficient and is used for error-checking.) |
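As a rough illustration of the ``look at the end of the last block, else create a new block'' strategy, here is a minimal sketch. The type @code{struct sketch_thing}, the block size and all function names are hypothetical; the real macros additionally consult a free list and keep consing statistics, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_OBJS_PER_BLOCK 4      /* deliberately tiny for the example */

struct sketch_thing { int payload; };

/* A frob block: a chunk holding several objects of one fixed-size type. */
struct sketch_block {
    struct sketch_block *prev;
    struct sketch_thing objs[SKETCH_OBJS_PER_BLOCK];
};

static struct sketch_block *sketch_current_block = NULL;
static size_t sketch_current_index = SKETCH_OBJS_PER_BLOCK; /* forces first block */
static size_t sketch_blocks_created = 0;

/* Hand out the next free slot, creating a new block when the last is full. */
static struct sketch_thing *sketch_allocate_thing(void)
{
    if (sketch_current_index == SKETCH_OBJS_PER_BLOCK) {
        struct sketch_block *b = malloc(sizeof *b);
        if (b == NULL)
            abort();
        b->prev = sketch_current_block;
        sketch_current_block = b;
        sketch_current_index = 0;
        sketch_blocks_created++;
    }
    return &sketch_current_block->objs[sketch_current_index++];
}
```

Consecutive allocations land next to each other inside one block, and only every @code{SKETCH_OBJS_PER_BLOCK}-th allocation has to call @code{malloc()}.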
5268 | 5277 |
5269 @node lrecords | 5278 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp |
5270 @section lrecords | 5279 @section lrecords |
5271 | 5280 |
5272 [see @file{lrecord.h}] | 5281 [see @file{lrecord.h}] |
5273 | 5282 |
5274 All lrecords have at the beginning of their structure a @code{struct | 5283 All lrecords have at the beginning of their structure a @code{struct |
5275 lrecord_header}. This just contains a pointer to a @code{struct | 5284 lrecord_header}. This just contains a type number and some flags, |
5285 including the mark bit. All builtin type numbers are defined as | |
5286 constants in @code{enum lrecord_type}, to allow the compiler to generate | |
5287 more efficient code for @code{@var{type}P}. The type number, through the | |
5288 @code{lrecord_implementations_table}, gives access to a @code{struct | |
5276 lrecord_implementation}, which is a structure containing method pointers | 5289 lrecord_implementation}, which is a structure containing method pointers |
5277 and such. There is one of these for each type, and it is a global, | 5290 and such. There is one of these for each type, and it is a global, |
5278 constant, statically-declared structure that is declared in the | 5291 constant, statically-declared structure that is declared in the |
5279 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually | 5292 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. |
5280 declares an array of two @code{struct lrecord_implementation} | 5293 |
5281 structures. The first one contains all the standard method pointers, | 5294 Simple lrecords (of type (b) above) just have a @code{struct |
5282 and is used in all normal circumstances. During garbage collection, | |
5283 however, the lrecord is @dfn{marked} by bumping its implementation | |
5284 pointer by one, so that it points to the second structure in the array. | |
5285 This structure contains a special indication in it that it's a | |
5286 @dfn{marked-object} structure: the finalize method is the special | |
5287 function @code{this_marks_a_marked_record()}, and all other methods are | |
5288 null pointers. At the end of garbage collection, all lrecords will | |
5289 either be reclaimed or unmarked by decrementing their implementation | |
5290 pointers, so this second structure pointer will never remain past | |
5291 garbage collection. | |
5292 | |
5293 Simple lrecords (of type (c) above) just have a @code{struct | |
5294 lrecord_header} at their beginning. lcrecords, however, actually have a | 5295 lrecord_header} at their beginning. lcrecords, however, actually have a |
5295 @code{struct lcrecord_header}. This, in turn, has a @code{struct | 5296 @code{struct lcrecord_header}. This, in turn, has a @code{struct |
5296 lrecord_header} at its beginning, so sanity is preserved; but it also | 5297 lrecord_header} at its beginning, so sanity is preserved; but it also |
5297 has a pointer used to chain all lcrecords together, and a special ID | 5298 has a pointer used to chain all lcrecords together, and a special ID |
5298 field used to distinguish one lcrecord from another. (This field is used | 5299 field used to distinguish one lcrecord from another. (This field is used |
5316 type. | 5317 type. |
5317 | 5318 |
5318 Whenever you create an lrecord, you need to call either | 5319 Whenever you create an lrecord, you need to call either |
5319 @code{DEFINE_LRECORD_IMPLEMENTATION()} or | 5320 @code{DEFINE_LRECORD_IMPLEMENTATION()} or |
5320 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be | 5321 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be |
5321 specified in a C file, at the top level. What this actually does is | 5322 specified in a @file{.c} file, at the top level. What this actually |
5322 define and initialize the implementation structure for the lrecord. (And | 5323 does is define and initialize the implementation structure for the |
5323 possibly declares a function @code{error_check_foo()} that implements | 5324 lrecord. (And possibly declares a function @code{error_check_foo()} that |
5324 the @code{XFOO()} macro when error-checking is enabled.) The arguments | 5325 implements the @code{XFOO()} macro when error-checking is enabled.) The |
5325 to the macros are the actual type name (this is used to construct the C | 5326 arguments to the macros are the actual type name (this is used to |
5326 variable name of the lrecord implementation structure and related | 5327 construct the C variable name of the lrecord implementation structure |
5327 structures using the @samp{##} macro concatenation operator), a string | 5328 and related structures using the @samp{##} macro concatenation |
5328 that names the type on the Lisp level (this may not be the same as the C | 5329 operator), a string that names the type on the Lisp level (this may not |
5329 type name; typically, the C type name has underscores, while the Lisp | 5330 be the same as the C type name; typically, the C type name has |
5330 string has dashes), various method pointers, and the name of the C | 5331 underscores, while the Lisp string has dashes), various method pointers, |
5331 structure that contains the object. The methods are used to encapsulate | 5332 and the name of the C structure that contains the object. The methods |
5332 type-specific information about the object, such as how to print it or | 5333 are used to encapsulate type-specific information about the object, such |
5333 mark it for garbage collection, so that it's easy to add new object | 5334 as how to print it or mark it for garbage collection, so that it's easy |
5334 types without having to add a specific case for each new type in a bunch | 5335 to add new object types without having to add a specific case for each |
5335 of different places. | 5336 new type in a bunch of different places. |
5336 | 5337 |
5337 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and | 5338 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and |
5338 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is | 5339 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is |
5339 used for fixed-size object types and the latter is for variable-size | 5340 used for fixed-size object types and the latter is for variable-size |
5340 object types. Most object types are fixed-size; some complex | 5341 object types. Most object types are fixed-size; some complex |
5344 (Currently this is only used for keeping allocation statistics.) | 5345 (Currently this is only used for keeping allocation statistics.) |
5345 | 5346 |
5346 For the purpose of keeping allocation statistics, the allocation | 5347 For the purpose of keeping allocation statistics, the allocation |
5347 engine keeps a list of all the different types that exist. Note that, | 5348 engine keeps a list of all the different types that exist. Note that, |
5348 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is | 5349 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is |
5349 specified at top-level, there is no way for it to add to the list of all | 5350 specified at top-level, there is no way for it to initialize the global |
5350 existing types. What happens instead is that each implementation | 5351 data structures containing type information, like |
5351 structure contains in it a dynamically assigned number that is | 5352 @code{lrecord_implementations_table}. For this reason a call to |
5352 particular to that type. (Or rather, it contains a pointer to another | 5353 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file |
5353 structure that contains this number. This evasiveness is done so that | 5354 containing @code{DEFINE_LRECORD_IMPLEMENTATION}, not at top level but |
5354 the implementation structure can be declared const.) In the sweep stage | 5355 inside one of the init functions, typically |
5355 of garbage collection, each lrecord is examined to see if its | 5356 @code{syms_of_@var{foo}()}. @code{INIT_LRECORD_IMPLEMENTATION} must be |
5356 implementation structure has its dynamically-assigned number set. If | 5357 called before an object of this type is used. |
5357 not, it must be a new type, and it is added to the list of known types | 5358 |
5358 and a new number assigned. The number is used to index into an array | 5359 The type number is also used to index into an array holding the number |
5359 holding the number of objects of each type and the total memory | 5360 of objects of each type and the total memory allocated for objects of |
5360 allocated for objects of that type. The statistics in this array are | 5361 that type. The statistics in this array are computed during the sweep |
5361 also computed during the sweep stage. These statistics are returned by | 5362 stage. These statistics are returned by the call to |
5362 the call to @code{garbage-collect} and are printed out at the end of the | 5363 @code{garbage-collect}. |
5363 loadup phase. | |
5364 | 5364 |
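The relationship between the type number in the header, the table of implementation structures, and the per-type statistics array can be sketched as follows. Every name here (@code{sketch_header}, @code{sketch_table}, @code{sketch_init_types()} and so on) is a hypothetical stand-in; the real structures in @file{lrecord.h} carry many more fields and method pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Built-in type numbers are compile-time constants. */
enum sketch_type { SKETCH_CONS, SKETCH_STRING, SKETCH_TYPE_COUNT };

/* Header found at the start of every object: type number plus flags. */
struct sketch_header {
    unsigned int type : 8;
    unsigned int mark : 1;
};

/* Per-type constant data (the real structure holds method pointers). */
struct sketch_implementation {
    const char *name;
    size_t size;
};

static const struct sketch_implementation sketch_cons_impl = { "cons", 16 };
static const struct sketch_implementation sketch_string_impl = { "string", 24 };

/* Global tables indexed by type number. */
static const struct sketch_implementation *sketch_table[SKETCH_TYPE_COUNT];
static size_t sketch_instances[SKETCH_TYPE_COUNT];

/* Must run at init time: a top-level definition cannot fill the table. */
static void sketch_init_types(void)
{
    sketch_table[SKETCH_CONS] = &sketch_cons_impl;
    sketch_table[SKETCH_STRING] = &sketch_string_impl;
}

/* Header -> implementation lookup through the table. */
static const struct sketch_implementation *
sketch_impl_of(const struct sketch_header *h)
{
    return sketch_table[h->type];
}

/* What a sweep might do for statistics: count one live object. */
static void sketch_tally(const struct sketch_header *h)
{
    sketch_instances[h->type]++;
}
```

The point of the sketch is that the same small integer serves both as the index for finding the implementation and as the index into the statistics array, so no per-type registration has to happen during the sweep itself.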
5365 Note that for every type defined with a @code{DEFINE_LRECORD_*()} | 5365 Note that for every type defined with a @code{DEFINE_LRECORD_*()} |
5366 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()} | 5366 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()} |
5367 somewhere in a @file{.h} file, and this @file{.h} file needs to be | 5367 somewhere in a @file{.h} file, and this @file{.h} file needs to be |
5368 included by @file{inline.c}. | 5368 included by @file{inline.c}. |
5369 | 5369 |
5370 Furthermore, there should generally be a set of @code{XFOOBAR()}, | 5370 Furthermore, there should generally be a set of @code{XFOOBAR()}, |
5371 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c}) | 5371 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c}) |
5372 file. To create one of these, copy an existing model and modify as | 5372 file. To create one of these, copy an existing model and modify as |
5373 necessary. | 5373 necessary. |
5374 | |
5375 @strong{Please note:} If you define an lrecord in an external | |
5376 dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD}, | |
5377 @code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and | |
5378 @code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the | |
5379 non-EXTERNAL forms. These macros will dynamically add new type numbers | |
5380 to the global enum that records them, whereas the non-EXTERNAL forms | |
5381 assume that the programmer has already inserted the correct type numbers | |
5382 into the enum's code at compile-time. | |
5374 | 5383 |
5375 The various methods in the lrecord implementation structure are: | 5384 The various methods in the lrecord implementation structure are: |
5376 | 5385 |
5377 @enumerate | 5386 @enumerate |
5378 @item | 5387 @item |
5503 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should | 5512 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should |
5504 simply return the object's size in bytes, exactly as you might expect. | 5513 simply return the object's size in bytes, exactly as you might expect. |
5505 For an example, see the methods for window configurations and opaques. | 5514 For an example, see the methods for window configurations and opaques. |
5506 @end enumerate | 5515 @end enumerate |
5507 | 5516 |
5508 @node Low-level allocation | 5517 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp |
5509 @section Low-level allocation | 5518 @section Low-level allocation |
5510 | 5519 |
5511 Memory that you want to allocate directly should be allocated using | 5520 Memory that you want to allocate directly should be allocated using |
5512 @code{xmalloc()} rather than @code{malloc()}. This implements | 5521 @code{xmalloc()} rather than @code{malloc()}. This implements |
5513 error-checking on the return value, and once upon a time did some more | 5522 error-checking on the return value, and once upon a time did some more |
5564 XEmacs taps into them and issues a warning through the standard | 5573 XEmacs taps into them and issues a warning through the standard |
5565 warning system, when memory gets to 75%, 85%, and 95% full. | 5574 warning system, when memory gets to 75%, 85%, and 95% full. |
5566 (On some systems, the memory warnings are not functional.) | 5575 (On some systems, the memory warnings are not functional.) |
5567 | 5576 |
5568 Allocated memory that is going to be used to make a Lisp object | 5577 Allocated memory that is going to be used to make a Lisp object |
5569 is created using @code{allocate_lisp_storage()}. This calls @code{xmalloc()} | 5578 is created using @code{allocate_lisp_storage()}. This just calls |
5570 but also verifies that the pointer to the memory can fit into | 5579 @code{xmalloc()}. It used to verify that the pointer to the memory can |
5571 a Lisp word (remember that some bits are taken away for a type | 5580 fit into a Lisp word, before the current Lisp object representation was |
5572 tag and a mark bit). If not, an error is issued through @code{memory_full()}. | 5581 introduced. @code{allocate_lisp_storage()} is called by |
5573 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()}, | 5582 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector |
5574 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation | 5583 and bit-vector creation routines. These routines also call |
5575 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the | 5584 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps |
5576 appropriate times; this keeps statistics on how much memory is | 5585 statistics on how much memory is allocated, so that garbage-collection |
5577 allocated, so that garbage-collection can be invoked when the | 5586 can be invoked when the threshold is reached. |
5578 threshold is reached. | 5587 |
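The idea of counting allocated bytes and collecting once a threshold is crossed can be sketched like this. The counter, the threshold value and the function names are assumptions for illustration; in particular, where and when XEmacs actually checks the counter is not shown in this section, so the explicit @code{sketch_maybe_collect()} call is only a placeholder for that check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t sketch_consing_since_gc = 0;   /* bytes allocated since last GC */
static size_t sketch_gc_threshold = 1000;    /* hypothetical threshold */
static int sketch_gc_runs = 0;

/* Stand-in for a real collection: only resets the counter. */
static void sketch_collect(void)
{
    sketch_gc_runs++;
    sketch_consing_since_gc = 0;
}

/* Allocate storage and record how much was handed out. */
static void *sketch_allocate(size_t bytes)
{
    void *p = malloc(bytes);
    if (p == NULL)
        abort();
    sketch_consing_since_gc += bytes;
    return p;
}

/* Called at some safe point: collect if enough has been allocated. */
static void sketch_maybe_collect(void)
{
    if (sketch_consing_since_gc > sketch_gc_threshold)
        sketch_collect();
}
```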
5579 | 5588 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp |
5580 @node Pure Space | |
5581 @section Pure Space | |
5582 | |
5583 Not yet documented. | |
5584 | |
5585 @node Cons | |
5586 @section Cons | 5589 @section Cons |
5587 | 5590 |
5588 Conses are allocated in standard frob blocks. The only thing to | 5591 Conses are allocated in standard frob blocks. The only thing to |
5589 note is that conses can be explicitly freed using @code{free_cons()} | 5592 note is that conses can be explicitly freed using @code{free_cons()} |
5590 and associated functions @code{free_list()} and @code{free_alist()}. This | 5593 and associated functions @code{free_list()} and @code{free_alist()}. This |
5594 generating extra objects and thereby triggering GC sooner. | 5597 generating extra objects and thereby triggering GC sooner. |
5595 However, you have to be @emph{extremely} careful when doing this. | 5598 However, you have to be @emph{extremely} careful when doing this. |
5596 If you mess this up, you will get BADLY BURNED, and it has happened | 5599 If you mess this up, you will get BADLY BURNED, and it has happened |
5597 before. | 5600 before. |
5598 | 5601 |
5599 @node Vector | 5602 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp |
5600 @section Vector | 5603 @section Vector |
5601 | 5604 |
5602 As mentioned above, each vector is @code{malloc()}ed individually, and | 5605 As mentioned above, each vector is @code{malloc()}ed individually, and |
5603 all are threaded through the variable @code{all_vectors}. Vectors are | 5606 all are threaded through the variable @code{all_vectors}. Vectors are |
5604 marked strangely during garbage collection, by kludging the size field. | 5607 marked strangely during garbage collection, by kludging the size field. |
5605 Note that the @code{struct Lisp_Vector} is declared with its | 5608 Note that the @code{struct Lisp_Vector} is declared with its |
5606 @code{contents} field being a @emph{stretchy} array of one element. It | 5609 @code{contents} field being a @emph{stretchy} array of one element. It |
5607 is actually @code{malloc()}ed with the right size, however, and access | 5610 is actually @code{malloc()}ed with the right size, however, and access |
5608 to any element through the @code{contents} array works fine. | 5611 to any element through the @code{contents} array works fine. |
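The ``stretchy array'' allocation pattern looks roughly like the sketch below. Note one deliberate substitution: the text describes a trailing array declared with one element, whereas this sketch uses a C99 flexible array member, which expresses the same idea in a way the C standard defines. The structure and function names are hypothetical and the element type is simplified to @code{int}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A vector whose contents are allocated together with its header. */
struct sketch_vector {
    size_t size;
    struct sketch_vector *next;   /* threads all vectors together */
    int contents[];               /* C99 flexible array member */
};

static struct sketch_vector *sketch_all_vectors = NULL;

/* malloc() header and elements in one piece and link it into the chain. */
static struct sketch_vector *sketch_make_vector(size_t n)
{
    struct sketch_vector *v =
        malloc(offsetof(struct sketch_vector, contents) + n * sizeof(int));
    if (v == NULL)
        abort();
    v->size = n;
    v->next = sketch_all_vectors;
    sketch_all_vectors = v;
    return v;
}
```

Because the block is sized for @code{n} elements at allocation time, indexing @code{contents} up to @code{n - 1} stays inside the allocated memory even though the declaration itself names no fixed length.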
5609 | 5612 |
5610 @node Bit Vector | 5613 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp |
5611 @section Bit Vector | 5614 @section Bit Vector |
5612 | 5615 |
5613 Bit vectors work exactly like vectors, except for more complicated | 5616 Bit vectors work exactly like vectors, except for more complicated |
5614 code to access an individual bit, and except for the fact that bit | 5617 code to access an individual bit, and except for the fact that bit |
5615 vectors are lrecords while vectors are not. (The only difference here is | 5618 vectors are lrecords while vectors are not. (The only difference here is |
5616 that there's an lrecord implementation pointer at the beginning and the | 5619 that there's an lrecord implementation pointer at the beginning and the |
5617 tag field in bit vector Lisp words is ``lrecord'' rather than | 5620 tag field in bit vector Lisp words is ``lrecord'' rather than |
5618 ``vector''.) | 5621 ``vector''.) |
5619 | 5622 |
5620 @node Symbol | 5623 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp |
5621 @section Symbol | 5624 @section Symbol |
5622 | 5625 |
5623 Symbols are also allocated in frob blocks. Note that the code | 5626 Symbols are also allocated in frob blocks. Symbols in the awful |
5624 exists for symbols to be either lrecords (category (c) above) | 5627 horrible obarray structure are chained through their @code{next} field. |
5625 or simple types (category (b) above), and are lrecords by | |
5626 default (I think), although there is no good reason for this. | |
5627 | |
5628 Note that symbols in the awful horrible obarray structure are | |
5629 chained through their @code{next} field. | |
5630 | 5628 |
5631 Remember that @code{intern} looks up a symbol in an obarray, creating | 5629 Remember that @code{intern} looks up a symbol in an obarray, creating |
5632 one if necessary. | 5630 one if necessary. |
5633 | 5631 |
5634 @node Marker | 5632 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp |
5635 @section Marker | 5633 @section Marker |
5636 | 5634 |
5637 Markers are allocated in frob blocks, as usual. They are kept | 5635 Markers are allocated in frob blocks, as usual. They are kept |
5638 in a buffer unordered, but in a doubly-linked list so that they | 5636 in a buffer unordered, but in a doubly-linked list so that they |
5639 can easily be removed. (Formerly this was a singly-linked list, | 5637 can easily be removed. (Formerly this was a singly-linked list, |
5640 but in some cases garbage collection took an extraordinarily | 5638 but in some cases garbage collection took an extraordinarily |
5641 long time due to the O(N^2) time required to remove lots of | 5639 long time due to the O(N^2) time required to remove lots of |
5642 markers from a buffer.) Markers are removed from a buffer in | 5640 markers from a buffer.) Markers are removed from a buffer in |
5643 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. | 5641 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. |
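Why a doubly-linked list avoids the quadratic behaviour can be seen in a small sketch: unlinking a marker needs no search for its predecessor. The structures and function names below are hypothetical simplifications, not the real marker or buffer definitions.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_marker {
    struct sketch_marker *prev;
    struct sketch_marker *next;
};

struct sketch_buffer {
    struct sketch_marker *markers;   /* head of the unordered marker list */
};

/* Push a marker onto the front of the buffer's list. */
static void sketch_attach(struct sketch_buffer *b, struct sketch_marker *m)
{
    m->prev = NULL;
    m->next = b->markers;
    if (b->markers != NULL)
        b->markers->prev = m;
    b->markers = m;
}

/* Unlink a marker in constant time, wherever it sits in the list. */
static void sketch_detach(struct sketch_buffer *b, struct sketch_marker *m)
{
    if (m->prev != NULL)
        m->prev->next = m->next;
    else
        b->markers = m->next;
    if (m->next != NULL)
        m->next->prev = m->prev;
    m->prev = m->next = NULL;
}
```

With a singly-linked list, each removal would first have to walk the list to find the predecessor, so removing N markers costs on the order of N^2 steps; here each removal is a fixed number of pointer updates.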
5644 | 5642 |
5645 @node String | 5643 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp |
5646 @section String | 5644 @section String |
5647 | 5645 |
5648 As mentioned above, strings are a special case. A string is logically | 5646 As mentioned above, strings are a special case. A string is logically |
5649 two parts, a fixed-size object (containing the length, property list, | 5647 two parts, a fixed-size object (containing the length, property list, |
5650 and a pointer to the actual data), and the actual data in the string. | 5648 and a pointer to the actual data), and the actual data in the string. |
5701 string data (which would normally be obtained from the now-non-existent | 5699 string data (which would normally be obtained from the now-non-existent |
5702 @code{struct Lisp_String}) at the beginning of the dead string data gap. | 5700 @code{struct Lisp_String}) at the beginning of the dead string data gap. |
5703 The string compactor recognizes this special 0xFFFFFFFF marker and | 5701 The string compactor recognizes this special 0xFFFFFFFF marker and |
5704 handles it correctly. | 5702 handles it correctly. |
5705 | 5703 |
5706 @node Compiled Function | 5704 @node Compiled Function, , String, Allocation of Objects in XEmacs Lisp |
5707 @section Compiled Function | 5705 @section Compiled Function |
5708 | 5706 |
5709 Not yet documented. | 5707 Not yet documented. |
5710 | 5708 |
5711 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top | 5709 |
5710 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top | |
5711 @chapter Dumping | |
5712 | |
5713 @section What is dumping and its justification | |
5714 | |
5715 The C code of XEmacs is just a Lisp engine with a lot of built-in | |
5716 primitives useful for writing an editor. The editor itself is written | |
5717 mostly in Lisp, and represents around 100K lines of code. Loading and | |
5718 executing the initialization of all this code takes a bit of time (five | |
5719 to ten times the usual startup time of current xemacs) and requires | |
5720 having all the lisp source files around. Having to reload them each | |
5721 time the editor is started would not be acceptable. | |
5722 | |
5723 The traditional solution to this problem is called dumping: the build | |
5724 process first creates the lisp engine under the name @file{temacs}, then | |
5725 runs it until it has finished loading and initializing all the lisp | |
5726 code, and eventually creates a new executable called @file{xemacs} | |
5727 including both the object code in @file{temacs} and all the contents of | |
5728 the memory after the initialization. | |
5729 | |
5730 This solution, while working, has a huge problem: the creation of the | |
5731 new executable from the actual contents of memory is an extremely | |
5732 system-specific and quite error-prone process that interferes with a | |
5733 lot of system libraries (like malloc). It is even getting worse | |
5734 nowadays with libraries using constructors which are automatically | |
5735 called when the program is started (even before main()) which tend to | |
5736 crash when they are called multiple times, once before dumping and once | |
5737 after (IRIX 6.x libz.so pulls in some C++ image libraries through | |
5738 dependencies which have this problem). Writing the dumper is also one | |
5739 of the most difficult parts of porting XEmacs to a new operating system. | |
5740 Basically, `dumping' is an operation that is just not officially | |
5741 supported on many operating systems. | |
5742 | |
5743 The aim of the portable dumper is to solve the same problem as the | |
5744 system-specific dumper, that is to be able to reload quickly, using only | |
5745 a small number of files, the fully initialized lisp part of the editor, | |
5746 without any system-specific hacks. | |
5747 | |
5748 @menu | |
5749 * Overview:: | |
5750 * Data descriptions:: | |
5751 * Dumping phase:: | |
5752 * Reloading phase:: | |
5753 * Remaining issues:: | |
5754 @end menu | |
5755 | |
5756 @node Overview, Data descriptions, Dumping, Dumping | |
5757 @section Overview | |
5758 | |
5759 The portable dumping system has to: | |
5760 | |
5761 @enumerate | |
5762 @item | |
5763 At dump time, write all initialized, non-quickly-rebuildable data to a | |
5764 file [Note: currently named @file{xemacs.dmp}, but the name will | |
5765 change], along with all the information needed for reloading. | |
5766 | |
5767 @item | |
5768 When starting xemacs, reload the dump file, relocate it to its new | |
5769 starting address if needed, and reinitialize all pointers to this | |
5770 data. Also, rebuild all the quickly rebuildable data. | |
5771 @end enumerate | |
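The ``relocate it to its new starting address'' step amounts to adjusting every stored pointer by the distance between the old and the new base address. The sketch below shows this for a trivial image made of linked nodes; the node type and the function names are hypothetical, and a real dump file would record the old base address in its header rather than receive it as an argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_node {
    int value;
    struct sketch_node *next;   /* points inside the same image, or NULL */
};

/* Rewrite each pointer so that it refers to the image at its new address. */
static void sketch_relocate(struct sketch_node *image, size_t count,
                            const void *old_base)
{
    for (size_t i = 0; i < count; i++) {
        if (image[i].next != NULL) {
            uintptr_t offset =
                (uintptr_t)image[i].next - (uintptr_t)old_base;
            image[i].next = (struct sketch_node *)((char *)image + offset);
        }
    }
}

/* Copy the raw bytes to a new place, then fix up the pointers. */
static void sketch_reload(struct sketch_node *dest,
                          const struct sketch_node *src, size_t count)
{
    memcpy(dest, src, count * sizeof *src);
    sketch_relocate(dest, count, src);
}
```

After the byte copy alone, the pointers in the new image would still refer to the old location; the relocation pass is what makes the copy self-contained.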
5772 | |
5773 @node Data descriptions, Dumping phase, Overview, Dumping | |
5774 @section Data descriptions | |
5775 | |
5776 The most complex task of the dumper is to be able to write lisp objects | |
5777 (lrecords) and C structs to disk and reload them at a different address, | |
5778 updating all the pointers they include in the process. This is done by | |
5779 using external data descriptions that give information about the layout | |
5780 of the structures in memory. | |
5781 | |
5782 The specification of these descriptions is in @file{lrecord.h}. A description | |
5783 of an lrecord is an array of @code{struct lrecord_description}. Each of these | |
5784 structs includes a type, an offset in the structure and some optional | |
5785 parameters depending on the type. For instance, here is the string | |
5786 description: | |
5787 | |
5788 @example | |
5789 static const struct lrecord_description string_description[] = @{ | |
5790 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @}, | |
5791 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @}, | |
5792 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @}, | |
5793 @{ XD_END @} | |
5794 @}; | |
5795 @end example | |
5796 | |
5797 The first line indicates a member of type @code{Bytecount}, which is used by | |
5798 the next, indirect directive. The second means ``there is a pointer to | |
5799 some opaque data in the field @code{data}''. The length of said data is | |
5800 given by the expression @code{XD_INDIRECT(0, 1)}, which means ``the value | |
5801 in the 0th line of the description (welcome to C) plus one''. The third | |
5802 line means ``there is a @code{Lisp_Object} member @code{plist} in the @code{Lisp_String} | |
5803 structure''. @code{XD_END} then ends the description. | |
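To make the meaning of the indirect length concrete, here is a minimal sketch of how a description walker might resolve ``the value in line 0 plus one''. The encoding is an assumption: instead of the real @code{XD_INDIRECT} packing, the sketch stores the referenced line and the delta in two separate, hypothetical fields, and all @code{sk_} names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sk_type { SK_END, SK_BYTECOUNT, SK_OPAQUE_DATA_PTR, SK_LISP_OBJECT };

/* One description line: field type, field offset, optional indirect length. */
struct sk_description {
    enum sk_type type;
    size_t offset;
    int indirect_line;    /* which earlier line holds the count */
    int indirect_delta;   /* constant added to that count */
};

struct sk_string {
    long size;            /* stand-in for a Bytecount member */
    char *data;
    void *plist;
};

static const struct sk_description sk_string_description[] = {
    { SK_BYTECOUNT,       offsetof(struct sk_string, size),  0, 0 },
    { SK_OPAQUE_DATA_PTR, offsetof(struct sk_string, data),  0, 1 },
    { SK_LISP_OBJECT,     offsetof(struct sk_string, plist), 0, 0 },
    { SK_END,             0,                                 0, 0 }
};

/* Length of the opaque data described by `line`: referenced count + delta. */
static long sk_opaque_length(const void *obj,
                             const struct sk_description *desc, int line)
{
    const struct sk_description *ref = &desc[desc[line].indirect_line];
    long count;
    memcpy(&count, (const char *)obj + ref->offset, sizeof count);
    return count + desc[line].indirect_delta;
}
```

For a five-byte string the walker reports six bytes of opaque data, the extra byte being the one that the ``plus one'' in the description accounts for.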
5804 | |
5805 This gives us all the information we need to move around what is pointed | |
5806 to by a structure (C or lrecord) and, by transitivity, everything that | |
5807 it points to. The only missing information for dumping is the size of | |
5808 the structure. For lrecords, this is part of the | |
5809 @code{lrecord_implementation}, so we don't need to duplicate it. For C | |
5810 structures we use a @code{struct struct_description}, which includes a size | |
5811 field and a pointer to an associated array of @code{lrecord_description}. | |
5812 | |
5813 @node Dumping phase, Reloading phase, Data descriptions, Dumping | |
5814 @section Dumping phase | |
5815 | |
5816 Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is | |
5817 invoked from @code{Fdump_emacs} (in @file{emacs.c}). This function performs a number | |
5818 of tasks. | |
5819 | |
5820 @menu | |
5821 * Object inventory:: | |
5822 * Address allocation:: | |
5823 * The header:: | |
5824 * Data dumping:: | |
5825 * Pointers dumping:: | |
5826 @end menu | |
5827 | |
5828 @node Object inventory, Address allocation, Dumping phase, Dumping phase | |
5829 @subsection Object inventory | |
5830 | |
5831 The first task is to build the list of the objects to dump. This | |
5832 includes: | |
5833 | |
5834 @itemize @bullet | |
5835 @item lisp objects | |
5836 @item C structures | |
5837 @end itemize | |
5838 | |
5839 We end up with one @code{pdump_entry_list_elmt} per object group (arrays | |
5840 of C structs are kept together) which includes a pointer to the first | |
5841 object of the group, the per-object size and the count of objects in the | |
5842 group, along with some other information which is initialized later. | |
5843 | |
5844 These entries are linked together in @code{pdump_entry_list} structures | |
5845 and can be enumerated through any of: | |
5846 | |
5847 @enumerate | |
5848 @item | |
5849 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one | |
5850 per lrecord type, indexed by type number. | |
5851 | |
5852 @item | |
5853 the @code{pdump_opaque_data_list}, used for the opaque data which does | |
5854 not include pointers, and hence does not need descriptions. | |
5855 | |
5856 @item | |
5857 the @code{pdump_struct_table}, which is a vector of | |
5858 @code{struct_description}/@code{pdump_entry_list} pairs, used for | |
5859 non-opaque C structures. | |
5860 @end enumerate | |
5861 | |
5862 This uses a marking strategy similar to that of the garbage collector, | |
5863 with some differences: | |
5864 | |
5865 @enumerate | |
5866 @item | |
5867 We do not use the mark bit (which does not exist for C structures | |
5868 anyway); we use a big hash table instead. | |
5869 | |
5870 @item | |
5871 We do not use the mark function of lrecords but instead rely on the | |
5872 external descriptions. This happens essentially because we need to | |
5873 follow pointers to C structures and opaque data in addition to | |
5874 Lisp_Object members. | |
5875 @end enumerate | |
5876 | |
5877 This is done by @code{pdump_register_object}, which handles Lisp_Object | |
5878 variables, and @code{pdump_register_struct}, which handles C structures; | |
5879 both delegate the description management to @code{pdump_register_sub}. | |
5880 | |
5881 The hash table doubles as a map from object to pdump_entry_list_elmt (i.e. | |
5882 it allows us to look up a pdump_entry_list_elmt given the object it points | |
5883 to). Entries are added with @code{pdump_add_entry()} and looked up with | |
5884 @code{pdump_get_entry()}. There is no need for entry removal. The hash | |
5885 value is computed quite basically from the object pointer by | |
5886 @code{pdump_make_hash()}. | |
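
The add-only, pointer-keyed table described above can be sketched as follows. This is an illustration under assumptions, not the code in dumper.c: the @code{toy_} names, the table size, the shift amount and the linear-probing scheme are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 1024   /* assumed size; a real table would be far larger */

/* Hypothetical stand-in for pdump_entry_list_elmt. */
struct toy_entry { const void *obj; size_t size; int count; };

static struct toy_entry *toy_hash[TOY_HASH_SIZE];

/* Hash the object address; the low bits are dropped because they are
   mostly zero for aligned objects. */
static size_t
toy_make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) % TOY_HASH_SIZE);
}

/* Return the entry registered for OBJ, or NULL if it was never added. */
static struct toy_entry *
toy_get_entry (const void *obj)
{
  size_t pos = toy_make_hash (obj);
  while (toy_hash[pos] != NULL)
    {
      if (toy_hash[pos]->obj == obj)
        return toy_hash[pos];
      pos = (pos + 1) % TOY_HASH_SIZE;   /* linear probing */
    }
  return NULL;
}

/* Register E.  The sketch assumes the table never fills up; since
   entries are never removed, no tombstones are needed. */
static void
toy_add_entry (struct toy_entry *e)
{
  size_t pos = toy_make_hash (e->obj);
  while (toy_hash[pos] != NULL)
    pos = (pos + 1) % TOY_HASH_SIZE;
  toy_hash[pos] = e;
}
```

A successful lookup plays the role of the garbage collector's mark bit: if @code{toy_get_entry} returns non-NULL, the object has already been visited.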
5887 | |
5888 The roots for the marking are: | |
5889 | |
5890 @enumerate | |
5891 @item | |
5892 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()} | |
5893 call for protected variables we do not want to dump). | |
5894 | |
5895 @item | |
5896 the @code{pdump_wire}'d variables (@code{staticpro} is equivalent to | |
5897 @code{staticpro_nodump()} + @code{pdump_wire()}). | |
5898 | |
5899 @item | |
5900 the @code{dumpstruct}'ed variables, which point to C structures. | |
5901 @end enumerate | |
5902 | |
5903 This does not include the GCPRO'ed variables, the specbinds, the | |
5904 catchtags, the backlist, the redisplay or the profiling info, since we | |
5905 do not want to rebuild the actual chain of Lisp calls that leads up to | |
5906 the dump-emacs call, only the global variables. | |
5907 | |
5908 Weak lists and weak hash tables are dumped as if they were their | |
5909 non-weak equivalent (without changing their type, of course). This has | |
5910 not yet been a problem. | |
5911 | |
5912 @node Address allocation, The header, Object inventory, Dumping phase | |
5913 @subsection Address allocation | |
5914 | |
5915 | |
5916 The next step is to allocate the offsets of each of the objects in the | |
5917 final dump file. This is done by @code{pdump_allocate_offset()} which | |
5918 is called indirectly by @code{pdump_scan_by_alignment()}. | |
5919 | |
5920 The strategy to deal with alignment problems uses these facts: | |
5921 | |
5922 @enumerate | |
5923 @item | |
5924 real world alignment requirements are powers of two. | |
5925 | |
5926 @item | |
5927 the C compiler is required to adjust the size of a struct so that you | |
5928 can have an array of them next to each other. This means you can get an | |
5929 upper bound on the alignment requirements of a given structure by | |
5930 finding the largest power of two of which its size is a multiple. | |
5931 | |
5932 @item | |
5933 the non-variant part of variable size lrecords has an alignment | |
5934 requirement of 4. | |
5935 @end enumerate | |
5936 | |
5937 Hence, for each lrecord type, C struct type or opaque data block the | |
5938 alignment requirement is computed as a power of two, with a minimum of | |
5939 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the | |
5940 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements | |
5941 first. This ensures the best packing. | |
5942 | |
5943 The maximum alignment requirement we take into account is 2^8. | |
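
The size-based bound can be computed by counting the trailing zero bits of the size. The following is a minimal sketch of that idea with the 2^2 minimum for lrecords and the 2^8 cap applied; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ALIGN_LOG2 8   /* cap at 2^8, as stated in the text */
#define TOY_MIN_LREC_LOG2  2   /* minimum of 2^2 for lrecords */

/* Return log2 of the alignment assumed for an object of SIZE bytes:
   the largest power of two dividing SIZE, clamped to the limits above. */
static int
toy_align_log2 (size_t size, int is_lrecord)
{
  assert (size > 0);
  int log2 = 0;
  while (log2 < TOY_MAX_ALIGN_LOG2 && (size & ((size_t) 1 << log2)) == 0)
    log2++;
  if (is_lrecord && log2 < TOY_MIN_LREC_LOG2)
    log2 = TOY_MIN_LREC_LOG2;
  return log2;
}
```

For example, a 24-byte structure is a multiple of 8 but not of 16, so it is treated as needing 8-byte alignment. This may overestimate the true requirement, which is harmless.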
5944 | |
5945 @code{pdump_allocate_offset()} only has to do a linear allocation, | |
5946 starting at offset 256 (this leaves room for the header and keeps the | |
5947 alignments happy). | |
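
A linear allocator of this kind might look like the sketch below. The names are hypothetical, and the round-up step is an assumption added for safety; when groups are visited from the highest alignment to the lowest, it is usually a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Next free offset in the dump file; the first 256 bytes are reserved
   for the header. */
static size_t toy_cur_offset = 256;

/* Reserve room for COUNT objects of SIZE bytes aligned to 2^ALIGN_LOG2
   and return the offset of the first one. */
static size_t
toy_allocate_offset (size_t size, size_t count, int align_log2)
{
  assert (align_log2 >= 0 && align_log2 <= 8);
  size_t mask = ((size_t) 1 << align_log2) - 1;
  size_t start = (toy_cur_offset + mask) & ~mask;   /* round up */
  toy_cur_offset = start + size * count;
  return start;
}
```

Starting at 256 means any alignment up to 2^8 is already satisfied for the first object.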
5948 | |
5949 @node The header, Data dumping, Address allocation, Dumping phase | |
5950 @subsection The header | |
5951 | |
5952 The next step creates the file and writes a header with a signature and | |
5953 some miscellaneous information in it (number of staticpro, number of assigned | |
5954 lrecord types, etc.). The reloc_address field, which indicates at | |
5955 which address the file should be loaded if we want to avoid post-reload | |
5956 relocation, is set to 0. It then seeks to offset 256 (base offset for | |
5957 the objects). | |
5958 | |
5959 @node Data dumping, Pointers dumping, The header, Dumping phase | |
5960 @subsection Data dumping | |
5961 | |
5962 The data is dumped by @code{pdump_dump_data()}, called from | |
5963 @code{pdump_scan_by_alignment()}, in the same order as the addresses were allocated. | |
5964 This function copies the data to a temporary buffer, relocates all | |
5965 pointers in the object to the addresses allocated in the Address | |
5966 allocation step, and writes it to the file. Using the same order means that, | |
5967 if we are careful with lrecords whose size is not a multiple of 4, we | |
5968 are assured that the object is always written at the offset in the file | |
5969 allocated in the Address allocation step. | |
5970 | |
5971 @node Pointers dumping, , Data dumping, Dumping phase | |
5972 @subsection Pointers dumping | |
5973 | |
5974 A bunch of tables needed to properly reassign the global pointers are | |
5975 then written. They are: | |
5976 | |
5977 @enumerate | |
5978 @item | |
5979 the staticpro array | |
5980 @item | |
5981 the dumpstruct array | |
5982 @item | |
5983 the lrecord_implementations_table array | |
5984 @item | |
5985 a vector of all the offsets to the objects in the file that include a | |
5986 description (for faster relocation at reload time) | |
5987 @item | |
5988 the pdump_wired and pdump_wired_list arrays | |
5989 @end enumerate | |
5990 | |
5991 For each of the arrays we write both the pointer to the variables and | |
5992 the relocated offset of the object they point to. Since these variables | |
5993 are global, the pointers are still valid when restarting the program and | |
5994 are used to regenerate the global pointers. | |
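
The pairing of a variable's address with the relocated offset of its target could be represented as in this sketch. The @code{toy_root} layout and the function name are assumptions made for illustration, not the on-disk format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: where the global variable lives in the
   executable, and where its target lives in the dump file. */
struct toy_root {
  void **address;   /* address of the global variable */
  size_t offset;    /* offset of the pointed-to object in the file */
};

/* After the file has been loaded at BASE, point every recorded global
   back at its object. */
static void
toy_restore_roots (const struct toy_root *roots, size_t n, char *base)
{
  assert (base != NULL);
  for (size_t i = 0; i < n; i++)
    *roots[i].address = base + roots[i].offset;
}
```

This only works because the variables are global: their addresses are fixed in the executable and so remain meaningful across the dump and the later reload.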
5995 | |
5996 The @code{pdump_wired_list} array is a special case. The variables it | |
5997 points to are the heads of weak linked lists of lisp objects of the same | |
5998 type. Not all objects of this list are dumped so the relocated pointer | |
5999 we associate with them points to the first dumped object of the list, or | |
6000 Qnil if none is available. This is also the reason why they are not | |
6001 used as roots for the purpose of object enumeration. | |
6002 | |
6003 This is the end of the dumping part. | |
6004 | |
6005 @node Reloading phase, Remaining issues, Dumping phase, Dumping | |
6006 @section Reloading phase | |
6007 | |
6008 @subsection File loading | |
6009 | |
6010 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at | |
6011 least 4096) or, if mmap is unavailable or fails, a 256-byte-aligned | |
6012 malloc is done and the file is loaded into it. | |
6013 | |
6014 Some variables are reinitialized from the values found in the header. | |
6015 | |
6016 The difference between the actual loading address and the reloc_address | |
6017 is computed and will be used for all the relocations. | |
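
The arithmetic behind this step is small enough to show directly. The sketch below is an illustration with hypothetical names; it assumes pointers stored in the file were written relative to reloc_address, so that with reloc_address equal to 0 they are plain file offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Adjust one pointer value read from the dump file.  STORED was written
   as if the file were loaded at RELOC_ADDRESS; LOAD_ADDRESS is where it
   actually ended up. */
static uintptr_t
toy_reloc_pointer (uintptr_t stored, uintptr_t reloc_address,
                   uintptr_t load_address)
{
  /* Unsigned subtraction wraps around, so the result is still correct
     when the file is loaded below reloc_address. */
  uintptr_t delta = load_address - reloc_address;
  assert (stored >= reloc_address);
  return stored + delta;
}
```

When the two addresses are equal the delta is zero and every pointer is already correct, which is why the relocation pass can be skipped in that case.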
6018 | |
6019 | |
6020 @subsection Putting back the staticvec | |
6021 | |
6022 The staticvec array is memcpy'd from the file and the variables it | |
6023 points to are reset to the relocated object addresses. | |
6024 | |
6025 | |
6026 @subsection Putting back the dumpstructed variables | |
6027 | |
6028 The variables pointed to by dumpstruct in the dump phase are reset to | |
6029 the right relocated object addresses. | |
6030 | |
6031 | |
6032 @subsection lrecord_implementations_table | |
6033 | |
6034 The lrecord_implementations_table is reset to its dump time state and | |
6035 the right lrecord_type_index values are put in. | |
6036 | |
6037 | |
6038 @subsection Object relocation | |
6039 | |
6040 All the objects are relocated using their description and their offset | |
6041 by @code{pdump_reloc_one}. This step is unnecessary if the | |
6042 reloc_address is equal to the file loading address. | |
6043 | |
6044 | |
6045 @subsection Putting back the pdump_wire and pdump_wire_list variables | |
6046 | |
6047 Same as Putting back the dumpstructed variables. | |
6048 | |
6049 | |
6050 @subsection Reorganize the hash tables | |
6051 | |
6052 Since some of the hash values in the lisp hash tables are | |
6053 address-dependent, their layout is now wrong. So we go through each of | |
6054 them and have them resorted by calling @code{pdump_reorganize_hash_table}. | |
6055 | |
6056 @node Remaining issues, , Reloading phase, Dumping | |
6057 @section Remaining issues | |
6058 | |
6059 The build process will have to start a post-dump xemacs, ask it the | |
6060 loading address (which will, hopefully, be always the same between | |
6061 different xemacs invocations) and relocate the file to the new address. | |
6062 This way the object relocation phase will not have to be done, which | |
6063 means no writes in the objects and that, because of the use of mmap, the | |
6064 dumped data will be shared between all the xemacs running on the | |
6065 computer. | |
6066 | |
6067 Some executable signature will be necessary to ensure that a given dump | |
6068 file is really associated with a given executable, or random crashes | |
6069 will occur. One option is a random number set at compile or configure time through | |
6070 a define. This will also allow for having differently-compiled xemacsen | |
6071 on the same system (mule and no-mule come to mind). | |
6072 | |
6073 The DOC file contents should probably end up in the dump file. | |
6074 | |
6075 | |
6076 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top | |
5712 @chapter Events and the Event Loop | 6077 @chapter Events and the Event Loop |
5713 | 6078 |
5714 @menu | 6079 @menu |
5715 * Introduction to Events:: | 6080 * Introduction to Events:: |
5716 * Main Loop:: | 6081 * Main Loop:: |
5720 * Other Event Loop Functions:: | 6085 * Other Event Loop Functions:: |
5721 * Converting Events:: | 6086 * Converting Events:: |
5722 * Dispatching Events; The Command Builder:: | 6087 * Dispatching Events; The Command Builder:: |
5723 @end menu | 6088 @end menu |
5724 | 6089 |
5725 @node Introduction to Events | 6090 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop |
5726 @section Introduction to Events | 6091 @section Introduction to Events |
5727 | 6092 |
5728 An event is an object that encapsulates information about an | 6093 An event is an object that encapsulates information about an |
5729 interesting occurrence in the operating system. Events are | 6094 interesting occurrence in the operating system. Events are |
5730 generated either by user action, direct (e.g. typing on the | 6095 generated either by user action, direct (e.g. typing on the |
5759 Emacs events---there may not be a one-to-one correspondence. | 6124 Emacs events---there may not be a one-to-one correspondence. |
5760 | 6125 |
5761 Emacs events are documented in @file{events.h}; I'll discuss them | 6126 Emacs events are documented in @file{events.h}; I'll discuss them |
5762 later. | 6127 later. |
5763 | 6128 |
5764 @node Main Loop | 6129 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop |
5765 @section Main Loop | 6130 @section Main Loop |
5766 | 6131 |
5767 The @dfn{command loop} is the top-level loop that the editor is always | 6132 The @dfn{command loop} is the top-level loop that the editor is always |
5768 running. It loops endlessly, calling @code{next-event} to retrieve an | 6133 running. It loops endlessly, calling @code{next-event} to retrieve an |
5769 event and @code{dispatch-event} to execute it. @code{dispatch-event} does | 6134 event and @code{dispatch-event} to execute it. @code{dispatch-event} does |
5826 wrapper similar to @code{command_loop_2()}. Note also that | 6191 wrapper similar to @code{command_loop_2()}. Note also that |
5827 @code{initial_command_loop()} sets up a catch for @code{top-level} when | 6192 @code{initial_command_loop()} sets up a catch for @code{top-level} when |
5828 invoking @code{top_level_1()}, just like when it invokes | 6193 invoking @code{top_level_1()}, just like when it invokes |
5829 @code{command_loop_2()}. | 6194 @code{command_loop_2()}. |
5830 | 6195 |
5831 @node Specifics of the Event Gathering Mechanism | 6196 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop |
5832 @section Specifics of the Event Gathering Mechanism | 6197 @section Specifics of the Event Gathering Mechanism |
5833 | 6198 |
5834 Here is an approximate diagram of the collection processes | 6199 Here is an approximate diagram of the collection processes |
5835 at work in XEmacs, under TTY's (TTY's are simpler than X | 6200 at work in XEmacs, under TTY's (TTY's are simpler than X |
5836 so we'll look at this first): | 6201 so we'll look at this first): |
6065 which repeatedly calls `next-event' | 6430 which repeatedly calls `next-event' |
6066 and then dispatches the event | 6431 and then dispatches the event |
6067 using `dispatch-event' | 6432 using `dispatch-event' |
6068 @end example | 6433 @end example |
6069 | 6434 |
6070 @node Specifics About the Emacs Event | 6435 @node Specifics About the Emacs Event, The Event Stream Callback Routines, Specifics of the Event Gathering Mechanism, Events and the Event Loop |
6071 @section Specifics About the Emacs Event | 6436 @section Specifics About the Emacs Event |
6072 | 6437 |
6073 @node The Event Stream Callback Routines | 6438 @node The Event Stream Callback Routines, Other Event Loop Functions, Specifics About the Emacs Event, Events and the Event Loop |
6074 @section The Event Stream Callback Routines | 6439 @section The Event Stream Callback Routines |
6075 | 6440 |
6076 @node Other Event Loop Functions | 6441 @node Other Event Loop Functions, Converting Events, The Event Stream Callback Routines, Events and the Event Loop |
6077 @section Other Event Loop Functions | 6442 @section Other Event Loop Functions |
6078 | 6443 |
6079 @code{detect_input_pending()} and @code{input-pending-p} look for | 6444 @code{detect_input_pending()} and @code{input-pending-p} look for |
6080 input by calling @code{event_stream->event_pending_p} and looking in | 6445 input by calling @code{event_stream->event_pending_p} and looking in |
6081 @code{[V]unread-command-event} and the @code{command_event_queue} (they | 6446 @code{[V]unread-command-event} and the @code{command_event_queue} (they |
6093 @code{read-char} calls @code{next-command-event} and uses | 6458 @code{read-char} calls @code{next-command-event} and uses |
6094 @code{event_to_character()} to return the character equivalent. With | 6459 @code{event_to_character()} to return the character equivalent. With |
6095 the right kind of input method support, it is possible for (read-char) | 6460 the right kind of input method support, it is possible for (read-char) |
6096 to return a Kanji character. | 6461 to return a Kanji character. |
6097 | 6462 |
6098 @node Converting Events | 6463 @node Converting Events, Dispatching Events; The Command Builder, Other Event Loop Functions, Events and the Event Loop |
6099 @section Converting Events | 6464 @section Converting Events |
6100 | 6465 |
6101 @code{character_to_event()}, @code{event_to_character()}, | 6466 @code{character_to_event()}, @code{event_to_character()}, |
6102 @code{event-to-character}, and @code{character-to-event} convert between | 6467 @code{event-to-character}, and @code{character-to-event} convert between |
6103 characters and keypress events corresponding to the characters. If the | 6468 characters and keypress events corresponding to the characters. If the |
6104 event was not a keypress, @code{event_to_character()} returns -1 and | 6469 event was not a keypress, @code{event_to_character()} returns -1 and |
6105 @code{event-to-character} returns @code{nil}. These functions convert | 6470 @code{event-to-character} returns @code{nil}. These functions convert |
6106 between character representation and the split-up event representation | 6471 between character representation and the split-up event representation |
6107 (keysym plus mod keys). | 6472 (keysym plus mod keys). |
6108 | 6473 |
6109 @node Dispatching Events; The Command Builder | 6474 @node Dispatching Events; The Command Builder, , Converting Events, Events and the Event Loop |
6110 @section Dispatching Events; The Command Builder | 6475 @section Dispatching Events; The Command Builder |
6111 | 6476 |
6112 Not yet documented. | 6477 Not yet documented. |
6113 | 6478 |
6114 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top | 6479 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top |
6119 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: | 6484 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: |
6120 * Simple Special Forms:: | 6485 * Simple Special Forms:: |
6121 * Catch and Throw:: | 6486 * Catch and Throw:: |
6122 @end menu | 6487 @end menu |
6123 | 6488 |
6124 @node Evaluation | 6489 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings |
6125 @section Evaluation | 6490 @section Evaluation |
6126 | 6491 |
6127 @code{Feval()} evaluates the form (a Lisp object) that is passed to | 6492 @code{Feval()} evaluates the form (a Lisp object) that is passed to |
6128 it. Note that evaluation is only non-trivial for two types of objects: | 6493 it. Note that evaluation is only non-trivial for two types of objects: |
6129 symbols and conses. A symbol is evaluated simply by calling | 6494 symbols and conses. A symbol is evaluated simply by calling |
6188 @code{funcall_compiled_function()} calls the real byte-code interpreter | 6553 @code{funcall_compiled_function()} calls the real byte-code interpreter |
6189 @code{execute_optimized_program()} on the byte-code instructions, which | 6554 @code{execute_optimized_program()} on the byte-code instructions, which |
6190 are converted into an internal form for faster execution. | 6555 are converted into an internal form for faster execution. |
6191 | 6556 |
6192 When a compiled function is executed for the first time by | 6557 When a compiled function is executed for the first time by |
6193 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed | 6558 @code{funcall_compiled_function()}, or during the dump phase of building |
6194 during the dump phase of building XEmacs, the byte-code instructions are | 6559 XEmacs, the byte-code instructions are converted from a |
6195 converted from a @code{Lisp_String} (which is inefficient to access, | 6560 @code{Lisp_String} (which is inefficient to access, especially in the |
6196 especially in the presence of MULE) into a @code{Lisp_Opaque} object | 6561 presence of MULE) into a @code{Lisp_Opaque} object containing an array |
6197 containing an array of unsigned char, which can be directly executed by | 6562 of unsigned char, which can be directly executed by the byte-code |
6198 the byte-code interpreter. At this time the byte code is also analyzed | 6563 interpreter. At this time the byte code is also analyzed for validity |
6199 for validity and transformed into a more optimized form, so that | 6564 and transformed into a more optimized form, so that |
6200 @code{execute_optimized_program()} can really fly. | 6565 @code{execute_optimized_program()} can really fly. |
6201 | 6566 |
6202 Here are some of the optimizations performed by the internal byte-code | 6567 Here are some of the optimizations performed by the internal byte-code |
6203 transformer: | 6568 transformer: |
6204 @enumerate | 6569 @enumerate |
6209 References to the @code{constants} array that will be used as a Lisp | 6574 References to the @code{constants} array that will be used as a Lisp |
6210 variable are checked for being correct non-constant (i.e. not @code{t}, | 6575 variable are checked for being correct non-constant (i.e. not @code{t}, |
6211 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter | 6576 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter |
6212 doesn't have to. | 6577 doesn't have to. |
6213 @item | 6578 @item |
6214 The maxiumum number of variable bindings in the byte-code is | 6579 The maximum number of variable bindings in the byte-code is |
6215 pre-computed, so that space on the @code{specpdl} stack can be | 6580 pre-computed, so that space on the @code{specpdl} stack can be |
6216 pre-reserved once for the whole function execution. | 6581 pre-reserved once for the whole function execution. |
6217 @item | 6582 @item |
6218 All byte-code jumps are relative to the current program counter instead | 6583 All byte-code jumps are relative to the current program counter instead |
6219 of the start of the program, thereby saving a register. | 6584 of the start of the program, thereby saving a register. |
6249 @code{call3()} call a function, passing it the argument(s) given (the | 6614 @code{call3()} call a function, passing it the argument(s) given (the |
6250 arguments are given as separate C arguments rather than being passed as | 6615 arguments are given as separate C arguments rather than being passed as |
6251 an array). @code{apply1()} uses @code{Fapply()} while the others use | 6616 an array). @code{apply1()} uses @code{Fapply()} while the others use |
6252 @code{Ffuncall()} to do the real work. | 6617 @code{Ffuncall()} to do the real work. |
6253 | 6618 |
6254 @node Dynamic Binding; The specbinding Stack; Unwind-Protects | 6619 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings |
6255 @section Dynamic Binding; The specbinding Stack; Unwind-Protects | 6620 @section Dynamic Binding; The specbinding Stack; Unwind-Protects |
6256 | 6621 |
6257 @example | 6622 @example |
6258 struct specbinding | 6623 struct specbinding |
6259 @{ | 6624 @{ |
6303 a local-variable binding (@code{func} is 0, @code{symbol} is not | 6668 a local-variable binding (@code{func} is 0, @code{symbol} is not |
6304 @code{nil}, and @code{old_value} holds the old value, which is stored as | 6669 @code{nil}, and @code{old_value} holds the old value, which is stored as |
6305 the symbol's value). | 6670 the symbol's value). |
6306 @end enumerate | 6671 @end enumerate |
6307 | 6672 |
6308 @node Simple Special Forms | 6673 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings |
6309 @section Simple Special Forms | 6674 @section Simple Special Forms |
6310 | 6675 |
6311 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, | 6676 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, |
6312 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, | 6677 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, |
6313 @code{let*}, @code{let}, @code{while} | 6678 @code{let*}, @code{let}, @code{while} |
6315 All of these are very simple and work as expected, calling | 6680 All of these are very simple and work as expected, calling |
6316 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of | 6681 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of |
6317 @code{let} and @code{let*}) using @code{specbind()} to create bindings | 6682 @code{let} and @code{let*}) using @code{specbind()} to create bindings |
6318 and @code{unbind_to()} to undo the bindings when finished. | 6683 and @code{unbind_to()} to undo the bindings when finished. |
6319 | 6684 |
6320 Note that, with the exeption of @code{Fprogn}, these functions are | 6685 Note that, with the exception of @code{Fprogn}, these functions are |
6321 typically called in real life only in interpreted code, since the byte | 6686 typically called in real life only in interpreted code, since the byte |
6322 compiler knows how to convert calls to these functions directly into | 6687 compiler knows how to convert calls to these functions directly into |
6323 byte code. | 6688 byte code. |
6324 | 6689 |
6325 @node Catch and Throw | 6690 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings |
6326 @section Catch and Throw | 6691 @section Catch and Throw |
6327 | 6692 |
6328 @example | 6693 @example |
6329 struct catchtag | 6694 struct catchtag |
6330 @{ | 6695 @{ |
6388 * Introduction to Symbols:: | 6753 * Introduction to Symbols:: |
6389 * Obarrays:: | 6754 * Obarrays:: |
6390 * Symbol Values:: | 6755 * Symbol Values:: |
6391 @end menu | 6756 @end menu |
6392 | 6757 |
6393 @node Introduction to Symbols | 6758 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables |
6394 @section Introduction to Symbols | 6759 @section Introduction to Symbols |
6395 | 6760 |
6396 A symbol is basically just an object with four fields: a name (a | 6761 A symbol is basically just an object with four fields: a name (a |
6397 string), a value (some Lisp object), a function (some Lisp object), and | 6762 string), a value (some Lisp object), a function (some Lisp object), and |
6398 a property list (usually a list of alternating keyword/value pairs). | 6763 a property list (usually a list of alternating keyword/value pairs). |
6405 there can be a distinct function and variable with the same name. The | 6770 there can be a distinct function and variable with the same name. The |
6406 property list is used as a more general mechanism of associating | 6771 property list is used as a more general mechanism of associating |
6407 additional values with particular names, and once again the namespace is | 6772 additional values with particular names, and once again the namespace is |
6408 independent of the function and variable namespaces. | 6773 independent of the function and variable namespaces. |
6409 | 6774 |
6410 @node Obarrays | 6775 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables |
6411 @section Obarrays | 6776 @section Obarrays |
6412 | 6777 |
6413 The identity of symbols with their names is accomplished through a | 6778 The identity of symbols with their names is accomplished through a |
6414 structure called an obarray, which is just a poorly-implemented hash | 6779 structure called an obarray, which is just a poorly-implemented hash |
6415 table mapping from strings to symbols whose name is that string. (I say | 6780 table mapping from strings to symbols whose name is that string. (I say |
6472 a new one, and @code{unintern} to remove a symbol from an obarray. This | 6837 a new one, and @code{unintern} to remove a symbol from an obarray. This |
6473 returns the removed symbol. (Remember: You can't put the symbol back | 6838 returns the removed symbol. (Remember: You can't put the symbol back |
6474 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols | 6839 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols |
6475 in an obarray. | 6840 in an obarray. |
6476 | 6841 |
6477 @node Symbol Values | 6842 @node Symbol Values, , Obarrays, Symbols and Variables |
6478 @section Symbol Values | 6843 @section Symbol Values |
6479 | 6844 |
6480 The value field of a symbol normally contains a Lisp object. However, | 6845 The value field of a symbol normally contains a Lisp object. However, |
6481 a symbol can be @dfn{unbound}, meaning that it logically has no value. | 6846 a symbol can be @dfn{unbound}, meaning that it logically has no value. |
6482 This is internally indicated by storing a special Lisp object, called | 6847 This is internally indicated by storing a special Lisp object, called |
6527 * Markers and Extents:: Tagging locations within a buffer. | 6892 * Markers and Extents:: Tagging locations within a buffer. |
6528 * Bufbytes and Emchars:: Representation of individual characters. | 6893 * Bufbytes and Emchars:: Representation of individual characters. |
6529 * The Buffer Object:: The Lisp object corresponding to a buffer. | 6894 * The Buffer Object:: The Lisp object corresponding to a buffer. |
6530 @end menu | 6895 @end menu |
6531 | 6896 |
6532 @node Introduction to Buffers | 6897 @node Introduction to Buffers, The Text in a Buffer, Buffers and Textual Representation, Buffers and Textual Representation |
6533 @section Introduction to Buffers | 6898 @section Introduction to Buffers |
6534 | 6899 |
6535 A buffer is logically just a Lisp object that holds some text. | 6900 A buffer is logically just a Lisp object that holds some text. |
6536 In this, it is like a string, but a buffer is optimized for | 6901 In this, it is like a string, but a buffer is optimized for |
6537 frequent insertion and deletion, while a string is not. Furthermore: | 6902 frequent insertion and deletion, while a string is not. Furthermore: |
6580 and @dfn{buffer of the selected window}, and the distinction between | 6945 and @dfn{buffer of the selected window}, and the distinction between |
6581 @dfn{point} of the current buffer and @dfn{window-point} of the selected | 6946 @dfn{point} of the current buffer and @dfn{window-point} of the selected |
6582 window. (This latter distinction is explained in detail in the section | 6947 window. (This latter distinction is explained in detail in the section |
6583 on windows.) | 6948 on windows.) |
6584 | 6949 |
6585 @node The Text in a Buffer | 6950 @node The Text in a Buffer, Buffer Lists, Introduction to Buffers, Buffers and Textual Representation |
6586 @section The Text in a Buffer | 6951 @section The Text in a Buffer |
6587 | 6952 |
6588 The text in a buffer consists of a sequence of zero or more | 6953 The text in a buffer consists of a sequence of zero or more |
6589 characters. A @dfn{character} is an integer that logically represents | 6954 characters. A @dfn{character} is an integer that logically represents |
6590 a letter, number, space, or other unit of text. Most of the characters | 6955 a letter, number, space, or other unit of text. Most of the characters |
6720 Bufbytes underscores the fact that we are working with a string of bytes | 7085 Bufbytes underscores the fact that we are working with a string of bytes |
6721 in the internal Emacs buffer representation rather than in one of a | 7086 in the internal Emacs buffer representation rather than in one of a |
6722 number of possible alternative representations (e.g. EUC-encoded text, | 7087 number of possible alternative representations (e.g. EUC-encoded text, |
6723 etc.). | 7088 etc.). |
6724 | 7089 |
6725 @node Buffer Lists | 7090 @node Buffer Lists, Markers and Extents, The Text in a Buffer, Buffers and Textual Representation |
6726 @section Buffer Lists | 7091 @section Buffer Lists |
6727 | 7092 |
6728 Recall earlier that buffers are @dfn{permanent} objects, i.e. that | 7093 Recall earlier that buffers are @dfn{permanent} objects, i.e. that |
6729 they remain around until explicitly deleted. This entails that there is | 7094 they remain around until explicitly deleted. This entails that there is |
6730 a list of all the buffers in existence. This list is actually an | 7095 a list of all the buffers in existence. This list is actually an |
6756 respectively. You can also force a new buffer to be created using | 7121 respectively. You can also force a new buffer to be created using |
6757 @code{generate-new-buffer}, which takes a name and (if necessary) makes | 7122 @code{generate-new-buffer}, which takes a name and (if necessary) makes |
6758 a unique name from this by appending a number, and then creates the | 7123 a unique name from this by appending a number, and then creates the |
6759 buffer. This is basically like the symbol operation @code{gensym}. | 7124 buffer. This is basically like the symbol operation @code{gensym}. |
6760 | 7125 |
6761 @node Markers and Extents | 7126 @node Markers and Extents, Bufbytes and Emchars, Buffer Lists, Buffers and Textual Representation |
6762 @section Markers and Extents | 7127 @section Markers and Extents |
6763 | 7128 |
6764 Among the things associated with a buffer are things that are | 7129 Among the things associated with a buffer are things that are |
6765 logically attached to certain buffer positions. This can be used to | 7130 logically attached to certain buffer positions. This can be used to |
6766 keep track of a buffer position when text is inserted and deleted, so | 7131 keep track of a buffer position when text is inserted and deleted, so |
6782 | 7147 |
6783 The important thing here is that markers and extents simply contain | 7148 The important thing here is that markers and extents simply contain |
6784 buffer positions in them as integers, and every time text is inserted or | 7149 buffer positions in them as integers, and every time text is inserted or |
6785 deleted, these positions must be updated. In order to minimize the | 7150 deleted, these positions must be updated. In order to minimize the |
6786 amount of shuffling that needs to be done, the positions in markers and | 7151 amount of shuffling that needs to be done, the positions in markers and |
6787 extents (there's one per marker, two per extent) and stored in Meminds. | 7152 extents (there's one per marker, two per extent) are stored in Meminds. |
6788 This means that they only need to be moved when the text is physically | 7153 This means that they only need to be moved when the text is physically |
6789 moved in memory; since the gap structure tries to minimize this, it also | 7154 moved in memory; since the gap structure tries to minimize this, it also |
6790 minimizes the number of marker and extent indices that need to be | 7155 minimizes the number of marker and extent indices that need to be |
6791 adjusted. Look in @file{insdel.c} for the details of how this works. | 7156 adjusted. Look in @file{insdel.c} for the details of how this works. |
6792 | 7157 |
6796 is no way to determine what markers are in a buffer if you are just | 7161 is no way to determine what markers are in a buffer if you are just |
6797 given the buffer. Extents remain in a buffer until they are detached | 7162 given the buffer. Extents remain in a buffer until they are detached |
6798 (which could happen as a result of text being deleted) or the buffer is | 7163 (which could happen as a result of text being deleted) or the buffer is |
6799 deleted, and primitives do exist to enumerate the extents in a buffer. | 7164 deleted, and primitives do exist to enumerate the extents in a buffer. |
6800 | 7165 |
6801 @node Bufbytes and Emchars | 7166 @node Bufbytes and Emchars, The Buffer Object, Markers and Extents, Buffers and Textual Representation |
6802 @section Bufbytes and Emchars | 7167 @section Bufbytes and Emchars |
6803 | 7168 |
6804 Not yet documented. | 7169 Not yet documented. |
6805 | 7170 |
6806 @node The Buffer Object | 7171 @node The Buffer Object, , Bufbytes and Emchars, Buffers and Textual Representation |
6807 @section The Buffer Object | 7172 @section The Buffer Object |
6808 | 7173 |
6809 Buffers contain fields not directly accessible by the Lisp programmer. | 7174 Buffers contain fields not directly accessible by the Lisp programmer. |
6810 We describe them here, naming them by the names used in the C code. | 7175 We describe them here, naming them by the names used in the C code. |
6811 Many are accessible indirectly in Lisp programs via Lisp primitives. | 7176 Many are accessible indirectly in Lisp programs via Lisp primitives. |
6920 * Encodings:: | 7285 * Encodings:: |
6921 * Internal Mule Encodings:: | 7286 * Internal Mule Encodings:: |
6922 * CCL:: | 7287 * CCL:: |
6923 @end menu | 7288 @end menu |
6924 | 7289 |
6925 @node Character Sets | 7290 @node Character Sets, Encodings, MULE Character Sets and Encodings, MULE Character Sets and Encodings |
6926 @section Character Sets | 7291 @section Character Sets |
6927 | 7292 |
6928 A character set (or @dfn{charset}) is an ordered set of characters. A | 7293 A character set (or @dfn{charset}) is an ordered set of characters. A |
6929 particular character in a charset is indexed using one or more | 7294 particular character in a charset is indexed using one or more |
6930 @dfn{position codes}, which are non-negative integers. The number of | 7295 @dfn{position codes}, which are non-negative integers. The number of |
7001 160 - 255 Latin-1 32 - 127 | 7366 160 - 255 Latin-1 32 - 127 |
7002 @end example | 7367 @end example |
7003 | 7368 |
7004 This is a bit ad-hoc but gets the job done. | 7369 This is a bit ad-hoc but gets the job done. |
7005 | 7370 |
7006 @node Encodings | 7371 @node Encodings, Internal Mule Encodings, Character Sets, MULE Character Sets and Encodings |
7007 @section Encodings | 7372 @section Encodings |
7008 | 7373 |
7009 An @dfn{encoding} is a way of numerically representing characters from | 7374 An @dfn{encoding} is a way of numerically representing characters from |
7010 one or more character sets. If an encoding only encompasses one | 7375 one or more character sets. If an encoding only encompasses one |
7011 character set, then the position codes for the characters in that | 7376 character set, then the position codes for the characters in that |
7028 @menu | 7393 @menu |
7029 * Japanese EUC (Extended Unix Code):: | 7394 * Japanese EUC (Extended Unix Code):: |
7030 * JIS7:: | 7395 * JIS7:: |
7031 @end menu | 7396 @end menu |
7032 | 7397 |
7033 @node Japanese EUC (Extended Unix Code) | 7398 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings |
7034 @subsection Japanese EUC (Extended Unix Code) | 7399 @subsection Japanese EUC (Extended Unix Code) |
7035 | 7400 |
7036 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, | 7401 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, |
7037 and Japanese-JISX0208-Kana (half-width katakana, the right half of | 7402 and Japanese-JISX0208-Kana (half-width katakana, the right half of |
7038 JISX0201). It uses 8-bit bytes. | 7403 JISX0201). It uses 8-bit bytes. |
7050 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 | 7415 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 |
7051 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80 | 7416 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80 |
7052 @end example | 7417 @end example |
7053 | 7418 |
7054 | 7419 |
7055 @node JIS7 | 7420 @node JIS7, , Japanese EUC (Extended Unix Code), Encodings |
7056 @subsection JIS7 | 7421 @subsection JIS7 |
7057 | 7422 |
7058 This encompasses the character sets Printing-ASCII, | 7423 This encompasses the character sets Printing-ASCII, |
7059 Japanese-JISX0201-Roman (the left half of JISX0201; this character set | 7424 Japanese-JISX0201-Roman (the left half of JISX0201; this character set |
7060 is very similar to Printing-ASCII and is a 94-character charset), | 7425 is very similar to Printing-ASCII and is a 94-character charset), |
7085 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII | 7450 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII |
7086 @end example | 7451 @end example |
7087 | 7452 |
7088 Initially, Printing-ASCII is invoked. | 7453 Initially, Printing-ASCII is invoked. |
7089 | 7454 |
7090 @node Internal Mule Encodings | 7455 @node Internal Mule Encodings, CCL, Encodings, MULE Character Sets and Encodings |
7091 @section Internal Mule Encodings | 7456 @section Internal Mule Encodings |
7092 | 7457 |
7093 In XEmacs/Mule, each character set is assigned a unique number, called a | 7458 In XEmacs/Mule, each character set is assigned a unique number, called a |
7094 @dfn{leading byte}. This is used in the encodings of a character. | 7459 @dfn{leading byte}. This is used in the encodings of a character. |
7095 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has | 7460 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has |
7131 @menu | 7496 @menu |
7132 * Internal String Encoding:: | 7497 * Internal String Encoding:: |
7133 * Internal Character Encoding:: | 7498 * Internal Character Encoding:: |
7134 @end menu | 7499 @end menu |
7135 | 7500 |
7136 @node Internal String Encoding | 7501 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings |
7137 @subsection Internal String Encoding | 7502 @subsection Internal String Encoding |
7138 | 7503 |
7139 ASCII characters are encoded using their position code directly. Other | 7504 ASCII characters are encoded using their position code directly. Other |
7140 characters are encoded using their leading byte followed by their | 7505 characters are encoded using their leading byte followed by their |
7141 position code(s) with the high bit set. Characters in private character | 7506 position code(s) with the high bit set. Characters in private character |
7181 None of the standard non-modal encodings meet all of these | 7546 None of the standard non-modal encodings meet all of these |
7182 conditions. For example, EUC satisfies only (2) and (3), while | 7547 conditions. For example, EUC satisfies only (2) and (3), while |
7183 Shift-JIS and Big5 (not yet described) satisfy only (2). (All | 7548 Shift-JIS and Big5 (not yet described) satisfy only (2). (All |
7184 non-modal encodings must satisfy (2), in order to be unambiguous.) | 7549 non-modal encodings must satisfy (2), in order to be unambiguous.) |
7185 | 7550 |
7186 @node Internal Character Encoding | 7551 @node Internal Character Encoding, , Internal String Encoding, Internal Mule Encodings |
7187 @subsection Internal Character Encoding | 7552 @subsection Internal Character Encoding |
7188 | 7553 |
7189 One 19-bit word represents a single character. The word is | 7554 One 19-bit word represents a single character. The word is |
7190 separated into three fields: | 7555 separated into three fields: |
7191 | 7556 |
7216 @end example | 7581 @end example |
7217 | 7582 |
7218 Note that character codes 0 - 255 are the same as the ``binary encoding'' | 7583 Note that character codes 0 - 255 are the same as the ``binary encoding'' |
7219 described above. | 7584 described above. |
7220 | 7585 |
7221 @node CCL | 7586 @node CCL, , Internal Mule Encodings, MULE Character Sets and Encodings |
7222 @section CCL | 7587 @section CCL |
7223 | 7588 |
7224 @example | 7589 @example |
7225 CCL PROGRAM SYNTAX: | 7590 CCL PROGRAM SYNTAX: |
7226 CCL_PROGRAM := (CCL_MAIN_BLOCK | 7591 CCL_PROGRAM := (CCL_MAIN_BLOCK |
7270 this is the code executed to handle any stuff that needs to be done | 7635 this is the code executed to handle any stuff that needs to be done |
7271 (e.g. designating back to ASCII and left-to-right mode) after all | 7636 (e.g. designating back to ASCII and left-to-right mode) after all |
7272 other encoded/decoded data has been written out. This is not used for | 7637 other encoded/decoded data has been written out. This is not used for |
7273 charset CCL programs. | 7638 charset CCL programs. |
7274 | 7639 |
7275 REGISTER: 0..7 -- refered by RRR or rrr | 7640 REGISTER: 0..7 -- referred by RRR or rrr |
7276 | 7641 |
7277 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT | 7642 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT |
7278 TTTTT (5-bit): operator type | 7643 TTTTT (5-bit): operator type |
7279 RRR (3-bit): register number | 7644 RRR (3-bit): register number |
7280 XXXXXXXXXXXXXXXX (15-bit): | 7645 XXXXXXXXXXXXXXXX (15-bit): |
7407 * Lstream Types:: Different sorts of things that are streamed. | 7772 * Lstream Types:: Different sorts of things that are streamed. |
7408 * Lstream Functions:: Functions for working with lstreams. | 7773 * Lstream Functions:: Functions for working with lstreams. |
7409 * Lstream Methods:: Creating new lstream types. | 7774 * Lstream Methods:: Creating new lstream types. |
7410 @end menu | 7775 @end menu |
7411 | 7776 |
7412 @node Creating an Lstream | 7777 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams |
7413 @section Creating an Lstream | 7778 @section Creating an Lstream |
7414 | 7779 |
7415 Lstreams come in different types, depending on what is being interfaced | 7780 Lstreams come in different types, depending on what is being interfaced |
7416 to. Although the primitive for creating new lstreams is | 7781 to. Although the primitive for creating new lstreams is |
7417 @code{Lstream_new()}, generally you do not call this directly. Instead, | 7782 @code{Lstream_new()}, generally you do not call this directly. Instead, |
7438 Open for reading, but ``read'' never returns partial MULE characters. | 7803 Open for reading, but ``read'' never returns partial MULE characters. |
7439 @item "wc" | 7804 @item "wc" |
7440 Open for writing, but never writes partial MULE characters. | 7805 Open for writing, but never writes partial MULE characters. |
7441 @end table | 7806 @end table |
7442 | 7807 |
7443 @node Lstream Types | 7808 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams |
7444 @section Lstream Types | 7809 @section Lstream Types |
7445 | 7810 |
7446 @table @asis | 7811 @table @asis |
7447 @item stdio | 7812 @item stdio |
7448 | 7813 |
7463 @item decoding | 7828 @item decoding |
7464 | 7829 |
7465 @item encoding | 7830 @item encoding |
7466 @end table | 7831 @end table |
7467 | 7832 |
7468 @node Lstream Functions | 7833 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams |
7469 @section Lstream Functions | 7834 @section Lstream Functions |
7470 | 7835 |
7471 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode}) | 7836 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) |
7472 Allocate and return a new Lstream. This function is not really meant to | 7837 Allocate and return a new Lstream. This function is not really meant to |
7473 be called directly; rather, each stream type should provide its own | 7838 be called directly; rather, each stream type should provide its own |
7474 stream creation function, which creates the stream and does any other | 7839 stream creation function, which creates the stream and does any other |
7475 necessary creation stuff (e.g. opening a file). | 7840 necessary creation stuff (e.g. opening a file). |
7476 @end deftypefun | 7841 @end deftypefun |
7546 | 7911 |
7547 @deftypefun void Lstream_rewind (Lstream *@var{stream}) | 7912 @deftypefun void Lstream_rewind (Lstream *@var{stream}) |
7548 Rewind the stream to the beginning. | 7913 Rewind the stream to the beginning. |
7549 @end deftypefun | 7914 @end deftypefun |
7550 | 7915 |
7551 @node Lstream Methods | 7916 @node Lstream Methods, , Lstream Functions, Lstreams |
7552 @section Lstream Methods | 7917 @section Lstream Methods |
7553 | 7918 |
7554 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size}) | 7919 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size}) |
7555 Read some data from the stream's end and store it into @var{data}, which | 7920 Read some data from the stream's end and store it into @var{data}, which |
7556 can hold @var{size} bytes. Return the number of bytes read. A return | 7921 can hold @var{size} bytes. Return the number of bytes read. A return |
7566 calls @code{Lstream_read()} with a very small size. | 7931 calls @code{Lstream_read()} with a very small size. |
7567 | 7932 |
7568 This function can be @code{NULL} if the stream is output-only. | 7933 This function can be @code{NULL} if the stream is output-only. |
7569 @end deftypefn | 7934 @end deftypefn |
7570 | 7935 |
7571 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, size_t @var{size}) | 7936 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size}) |
7572 Send some data to the stream's end. Data to be sent is in @var{data} | 7937 Send some data to the stream's end. Data to be sent is in @var{data} |
7573 and is @var{size} bytes. Return the number of bytes sent. This | 7938 and is @var{size} bytes. Return the number of bytes sent. This |
7574 function can send and return fewer bytes than is passed in; in that | 7939 function can send and return fewer bytes than is passed in; in that |
7575 case, the function will just be called again until there is no data left | 7940 case, the function will just be called again until there is no data left |
7576 or 0 is returned. A return value of 0 means that no more data can be | 7941 or 0 is returned. A return value of 0 means that no more data can be |
7621 * Point:: | 7986 * Point:: |
7622 * Window Hierarchy:: | 7987 * Window Hierarchy:: |
7623 * The Window Object:: | 7988 * The Window Object:: |
7624 @end menu | 7989 @end menu |
7625 | 7990 |
7626 @node Introduction to Consoles; Devices; Frames; Windows | 7991 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows |
7627 @section Introduction to Consoles; Devices; Frames; Windows | 7992 @section Introduction to Consoles; Devices; Frames; Windows |
7628 | 7993 |
7629 A window-system window that you see on the screen is called a | 7994 A window-system window that you see on the screen is called a |
7630 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or | 7995 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or |
7631 more non-overlapping panes, called (confusingly) @dfn{windows}. Each | 7996 more non-overlapping panes, called (confusingly) @dfn{windows}. Each |
7656 There is a separate Lisp object type for each of these four concepts. | 8021 There is a separate Lisp object type for each of these four concepts. |
7657 Furthermore, there is logically a @dfn{selected console}, | 8022 Furthermore, there is logically a @dfn{selected console}, |
7658 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. | 8023 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. |
7659 Each of these objects is distinguished in various ways, such as being the | 8024 Each of these objects is distinguished in various ways, such as being the |
7660 default object for various functions that act on objects of that type. | 8025 default object for various functions that act on objects of that type. |
7661 Note that every containing object rememembers the ``selected'' object | 8026 Note that every containing object remembers the ``selected'' object |
7662 among the objects that it contains: e.g. not only is there a selected | 8027 among the objects that it contains: e.g. not only is there a selected |
7663 window, but every frame remembers the last window in it that was | 8028 window, but every frame remembers the last window in it that was |
7664 selected, and changing the selected frame causes the remembered window | 8029 selected, and changing the selected frame causes the remembered window |
7665 within it to become the selected window. Similar relationships apply | 8030 within it to become the selected window. Similar relationships apply |
7666 for consoles to devices and devices to frames. | 8031 for consoles to devices and devices to frames. |
7667 | 8032 |
7668 @node Point | 8033 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows |
7669 @section Point | 8034 @section Point |
7670 | 8035 |
7671 Recall that every buffer has a current insertion position, called | 8036 Recall that every buffer has a current insertion position, called |
7672 @dfn{point}. Now, two or more windows may be displaying the same buffer, | 8037 @dfn{point}. Now, two or more windows may be displaying the same buffer, |
7673 and the text cursor in the two windows (i.e. @code{point}) can be in | 8038 and the text cursor in the two windows (i.e. @code{point}) can be in |
7684 want to retrieve the correct value of @code{point} for a window, | 8049 want to retrieve the correct value of @code{point} for a window, |
7685 you must special-case on the selected window and retrieve the | 8050 you must special-case on the selected window and retrieve the |
7686 buffer's point instead. This is related to why @code{save-window-excursion} | 8051 buffer's point instead. This is related to why @code{save-window-excursion} |
7687 does not save the selected window's value of @code{point}. | 8052 does not save the selected window's value of @code{point}. |
7688 | 8053 |
7689 @node Window Hierarchy | 8054 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows |
7690 @section Window Hierarchy | 8055 @section Window Hierarchy |
7691 @cindex window hierarchy | 8056 @cindex window hierarchy |
7692 @cindex hierarchy of windows | 8057 @cindex hierarchy of windows |
7693 | 8058 |
7694 If a frame contains multiple windows (panes), they are always created | 8059 If a frame contains multiple windows (panes), they are always created |
7782 frames have no root window, and the @code{next} of the minibuffer window | 8147 frames have no root window, and the @code{next} of the minibuffer window |
7783 is @code{nil} but the @code{prev} points to itself. (#### This is an | 8148 is @code{nil} but the @code{prev} points to itself. (#### This is an |
7784 artifact that should be fixed.) | 8149 artifact that should be fixed.) |
7785 @end enumerate | 8150 @end enumerate |
7786 | 8151 |
7787 @node The Window Object | 8152 @node The Window Object, , Window Hierarchy, Consoles; Devices; Frames; Windows |
7788 @section The Window Object | 8153 @section The Window Object |
7789 | 8154 |
7790 Windows have the following accessible fields: | 8155 Windows have the following accessible fields: |
7791 | 8156 |
7792 @table @code | 8157 @table @code |
7914 * Critical Redisplay Sections:: | 8279 * Critical Redisplay Sections:: |
7915 * Line Start Cache:: | 8280 * Line Start Cache:: |
7916 * Redisplay Piece by Piece:: | 8281 * Redisplay Piece by Piece:: |
7917 @end menu | 8282 @end menu |
7918 | 8283 |
7919 @node Critical Redisplay Sections | 8284 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism |
7920 @section Critical Redisplay Sections | 8285 @section Critical Redisplay Sections |
7921 @cindex critical redisplay sections | 8286 @cindex critical redisplay sections |
7922 | 8287 |
7923 Within this section, we are defenseless and assume that the | 8288 Within this section, we are defenseless and assume that the |
7924 following cannot happen: | 8289 following cannot happen: |
7946 we simply return. #### We should abort instead. | 8311 we simply return. #### We should abort instead. |
7947 | 8312 |
7948 #### If a frame-size change does occur we should probably | 8313 #### If a frame-size change does occur we should probably |
7949 actually be preempting redisplay. | 8314 actually be preempting redisplay. |
7950 | 8315 |
7951 @node Line Start Cache | 8316 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism |
7952 @section Line Start Cache | 8317 @section Line Start Cache |
7953 @cindex line start cache | 8318 @cindex line start cache |
7954 | 8319 |
7955 The traditional scrolling code in Emacs breaks in a variable height | 8320 The traditional scrolling code in Emacs breaks in a variable height |
7956 world. It depends on the key assumption that the number of lines that | 8321 world. It depends on the key assumption that the number of lines that |
8007 @end itemize | 8372 @end itemize |
8008 | 8373 |
8009 In case you're wondering, the Second Golden Rule of Redisplay is not | 8374 In case you're wondering, the Second Golden Rule of Redisplay is not |
8010 applicable. | 8375 applicable. |
8011 | 8376 |
8012 @node Redisplay Piece by Piece | 8377 @node Redisplay Piece by Piece, , Line Start Cache, The Redisplay Mechanism |
8013 @section Redisplay Piece by Piece | 8378 @section Redisplay Piece by Piece |
8014 @cindex Redisplay Piece by Piece | 8379 @cindex Redisplay Piece by Piece |
8015 | 8380 |
8016 As you can begin to see redisplay is complex and also not well | 8381 As you can begin to see redisplay is complex and also not well |
8017 documented. Chuck no longer works on XEmacs so this section is my take | 8382 documented. Chuck no longer works on XEmacs so this section is my take |
8029 @item | 8394 @item |
8030 Output changes Implemented by @code{redisplay-output.c}, | 8395 Output changes Implemented by @code{redisplay-output.c}, |
8031 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c} | 8396 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c} |
8032 @end enumerate | 8397 @end enumerate |
8033 | 8398 |
8034 Steps 1 and 2 are device-independant and relatively complex. Step 3 is | 8399 Steps 1 and 2 are device-independent and relatively complex. Step 3 is |
8035 mostly device-dependent. | 8400 mostly device-dependent. |
8036 | 8401 |
8037 Determining the desired display | 8402 Determining the desired display |
8038 | 8403 |
8039 Display attributes are stored in @code{display_line} structures. Each | 8404 Display attributes are stored in @code{display_line} structures. Each |
8040 @code{display_line} consists of a set of @code{display_block}'s and each | 8405 @code{display_line} consists of a set of @code{display_block}'s and each |
8041 @code{display_block} contains a number of @code{rune}'s. Generally | 8406 @code{display_block} contains a number of @code{rune}'s. Generally |
8042 dynarr's of @code{display_line}'s are held by each window representing | 8407 dynarr's of @code{display_line}'s are held by each window representing |
8043 the current display and the desired display. | 8408 the current display and the desired display. |
8044 | 8409 |
8045 The @code{display_line} structures are tighly tied to buffers which | 8410 The @code{display_line} structures are tightly tied to buffers which |
8046 presents a problem for redisplay as this connection is bogus for the | 8411 presents a problem for redisplay as this connection is bogus for the |
8047 modeline. Hence the @code{display_line} generation routines are | 8412 modeline. Hence the @code{display_line} generation routines are |
8048 duplicated for generating the modeline. This means that the modeline | 8413 duplicated for generating the modeline. This means that the modeline |
8049 display code has many bugs that the standard redisplay code does not. | 8414 display code has many bugs that the standard redisplay code does not. |
8050 | 8415 |
8051 The guts of @code{display_line} generation are in | 8416 The guts of @code{display_line} generation are in |
8052 @code{create_text_block}, which creates a single display line for the | 8417 @code{create_text_block}, which creates a single display line for the |
8053 desired locale. This incrementally parses the characters on the current | 8418 desired locale. This incrementally parses the characters on the current |
8054 line and generates redisplay structures for each. | 8419 line and generates redisplay structures for each. |
8055 | 8420 |
8056 Gutter redisplay is different. Because the data to display is stored in | 8421 Gutter redisplay is different. Because the data to display is stored in |
8057 a string we cannot use @code{create_text_block}. Instead we use | 8422 a string we cannot use @code{create_text_block}. Instead we use |
8058 @code{create_text_string_block} which performs the same function as | 8423 @code{create_text_string_block} which performs the same function as |
8059 @code{create_text_block} but for strings. Many of the complexities of | 8424 @code{create_text_block} but for strings. Many of the complexities of |
8066 @menu | 8431 @menu |
8067 * Introduction to Extents:: Extents are ranges over text, with properties. | 8432 * Introduction to Extents:: Extents are ranges over text, with properties. |
8068 * Extent Ordering:: How extents are ordered internally. | 8433 * Extent Ordering:: How extents are ordered internally. |
8069 * Format of the Extent Info:: The extent information in a buffer or string. | 8434 * Format of the Extent Info:: The extent information in a buffer or string. |
8070 * Zero-Length Extents:: A weird special case. | 8435 * Zero-Length Extents:: A weird special case. |
8071 * Mathematics of Extent Ordering:: A rigorous foundation. | 8436 * Mathematics of Extent Ordering:: A rigorous foundation. |
8072 * Extent Fragments:: Cached information useful for redisplay. | 8437 * Extent Fragments:: Cached information useful for redisplay. |
8073 @end menu | 8438 @end menu |
8074 | 8439 |
8075 @node Introduction to Extents | 8440 @node Introduction to Extents, Extent Ordering, Extents, Extents |
8076 @section Introduction to Extents | 8441 @section Introduction to Extents |
8077 | 8442 |
8078 Extents are regions over a buffer, with a start and an end position | 8443 Extents are regions over a buffer, with a start and an end position |
8079 denoting the region of the buffer included in the extent. In | 8444 denoting the region of the buffer included in the extent. In |
8080 addition, either end can be closed or open, meaning that the endpoint | 8445 addition, either end can be closed or open, meaning that the endpoint |
8092 automatically go inside or out of extents as necessary with no | 8457 automatically go inside or out of extents as necessary with no |
8093 further work needing to be done. It didn't work out that way, | 8458 further work needing to be done. It didn't work out that way, |
8094 however, and just ended up complexifying and buggifying all the | 8459 however, and just ended up complexifying and buggifying all the |
8095 rest of the code.) | 8460 rest of the code.) |
8096 | 8461 |
8097 @node Extent Ordering | 8462 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents |
8098 @section Extent Ordering | 8463 @section Extent Ordering |
8099 | 8464 |
8100 Extents are compared using memory indices. There are two orderings | 8465 Extents are compared using memory indices. There are two orderings |
8101 for extents and both orders are kept current at all times. The normal | 8466 for extents and both orders are kept current at all times. The normal |
8102 or @dfn{display} order is as follows: | 8467 or @dfn{display} order is as follows: |
8126 The display order and the e-order are complementary orders: any | 8491 The display order and the e-order are complementary orders: any |
8127 theorem about the display order also applies to the e-order if you swap | 8492 theorem about the display order also applies to the e-order if you swap |
8128 all occurrences of ``display order'' and ``e-order'', ``less than'' and | 8493 all occurrences of ``display order'' and ``e-order'', ``less than'' and |
8129 ``greater than'', and ``extent start'' and ``extent end''. | 8494 ``greater than'', and ``extent start'' and ``extent end''. |
8130 | 8495 |
8131 @node Format of the Extent Info | 8496 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents |
8132 @section Format of the Extent Info | 8497 @section Format of the Extent Info |
8133 | 8498 |
8134 An extent-info structure consists of a list of the buffer or string's | 8499 An extent-info structure consists of a list of the buffer or string's |
8135 extents and a @dfn{stack of extents} that lists all of the extents over | 8500 extents and a @dfn{stack of extents} that lists all of the extents over |
8136 a particular position. The stack-of-extents info is used for | 8501 a particular position. The stack-of-extents info is used for |
8160 between two extents. Note also that callers of these functions should | 8525 between two extents. Note also that callers of these functions should |
8161 not be aware of the fact that the extent list is implemented as an | 8526 not be aware of the fact that the extent list is implemented as an |
8162 array, except for the fact that positions are integers (this should be | 8527 array, except for the fact that positions are integers (this should be |
8163 generalized to handle integers and linked lists equally well). | 8528 generalized to handle integers and linked lists equally well). |
8164 | 8529 |
8165 @node Zero-Length Extents | 8530 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents |
8166 @section Zero-Length Extents | 8531 @section Zero-Length Extents |
8167 | 8532 |
8168 Extents can be zero-length, and will end up that way if their endpoints | 8533 Extents can be zero-length, and will end up that way if their endpoints |
8169 are explicitly set that way or if their detachable property is nil | 8534 are explicitly set that way or if their detachable property is nil |
8170 and all the text in the extent is deleted. (The exception is open-open | 8535 and all the text in the extent is deleted. (The exception is open-open |
8189 | 8554 |
8190 Note that closed-open, non-detachable zero-length extents behave | 8555 Note that closed-open, non-detachable zero-length extents behave |
8191 exactly like markers and that open-closed, non-detachable zero-length | 8556 exactly like markers and that open-closed, non-detachable zero-length |
8192 extents behave like the ``point-type'' marker in Mule. | 8557 extents behave like the ``point-type'' marker in Mule. |
8193 | 8558 |
8194 @node Mathematics of Extent Ordering | 8559 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents |
8195 @section Mathematics of Extent Ordering | 8560 @section Mathematics of Extent Ordering |
8196 @cindex extent mathematics | 8561 @cindex extent mathematics |
8197 @cindex mathematics of extents | 8562 @cindex mathematics of extents |
8198 @cindex extent ordering | 8563 @cindex extent ordering |
8199 | 8564 |
8324 Proof: If @math{F2} does not include @math{I} then its start index is | 8689 Proof: If @math{F2} does not include @math{I} then its start index is |
8325 greater than @math{I} and thus it is greater than any extent in | 8690 greater than @math{I} and thus it is greater than any extent in |
8326 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} | 8691 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} |
8327 and thus is in @math{S}, and thus @math{F2 >= F}. | 8692 and thus is in @math{S}, and thus @math{F2 >= F}. |
8328 | 8693 |
8329 @node Extent Fragments | 8694 @node Extent Fragments, , Mathematics of Extent Ordering, Extents |
8330 @section Extent Fragments | 8695 @section Extent Fragments |
8331 @cindex extent fragment | 8696 @cindex extent fragment |
8332 | 8697 |
8333 Imagine that the buffer is divided up into contiguous, non-overlapping | 8698 Imagine that the buffer is divided up into contiguous, non-overlapping |
8334 @dfn{runs} of text such that no extent starts or ends within a run | 8699 @dfn{runs} of text such that no extent starts or ends within a run |
8373 caching is done by @code{image_instantiate} and is necessary because it | 8738 caching is done by @code{image_instantiate} and is necessary because it |
8374 is generally possible to display an image-instance in multiple | 8739 is generally possible to display an image-instance in multiple |
8375 domains. For instance if we create a Pixmap, we can actually display | 8740 domains. For instance if we create a Pixmap, we can actually display |
8376 this on multiple windows - even though we only need a single Pixmap | 8741 this on multiple windows - even though we only need a single Pixmap |
8377 instance to do this. If caching wasn't done then it would be necessary | 8742 instance to do this. If caching wasn't done then it would be necessary |
8378 to create image-instances for every displayable occurrance of a glyph - | 8743 to create image-instances for every displayable occurrence of a glyph - |
8379 and every usage - and this would be extremely memory- and CPU-intensive. | 8744 and every usage - and this would be extremely memory- and CPU-intensive. |
8380 | 8745 |
8381 Widget-glyphs (a.k.a. native widgets) are not cached in this way. This is | 8746 Widget-glyphs (a.k.a. native widgets) are not cached in this way. This is |
8382 because widget-glyph image-instances on screen are toolkit windows, and | 8747 because widget-glyph image-instances on screen are toolkit windows, and |
8383 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are | 8748 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are |
8407 multiple types of toolkit. Each element in the widget hierarchy is updated | 8772 multiple types of toolkit. Each element in the widget hierarchy is updated |
8408 from its corresponding widget_instance by walking the widget_instance | 8773 from its corresponding widget_instance by walking the widget_instance |
8409 tree recursively. | 8774 tree recursively. |
8410 | 8775 |
8411 This has desirable properties such as lw_modify_all_widgets which is | 8776 This has desirable properties such as lw_modify_all_widgets which is |
8412 called from glyphs-x.c and updates all the properties of a widget | 8777 called from @file{glyphs-x.c} and updates all the properties of a widget |
8413 without having to know what the widget is or what toolkit it is from. | 8778 without having to know what the widget is or what toolkit it is from. |
8414 Unfortunately this also has hairy properrties such as making the lwlib | 8779 Unfortunately this also has hairy properties such as making the lwlib |
8415 code quite complex. And of course lwlib has to know at some level what | 8780 code quite complex. And of course lwlib has to know at some level what |
8416 the widget is and how to set its properties. | 8781 the widget is and how to set its properties. |
8417 | 8782 |
8418 @node Specifiers, Menus, Glyphs, Top | 8783 @node Specifiers, Menus, Glyphs, Top |
8419 @chapter Specifiers | 8784 @chapter Specifiers |
8543 @item tty_name | 8908 @item tty_name |
8544 The name of the terminal that the subprocess is using, | 8909 The name of the terminal that the subprocess is using, |
8545 or @code{nil} if it is using pipes. | 8910 or @code{nil} if it is using pipes. |
8546 @end table | 8911 @end table |
8547 | 8912 |
8548 @node Interface to X Windows, Index, Subprocesses, Top | 8913 @node Interface to X Windows, Index , Subprocesses, Top |
8549 @chapter Interface to X Windows | 8914 @chapter Interface to X Windows |
8550 | 8915 |
8551 Not yet documented. | 8916 Not yet documented. |
8552 | 8917 |
8553 @include index.texi | 8918 @include index.texi |
8556 @summarycontents | 8921 @summarycontents |
8557 @contents | 8922 @contents |
8558 @c That's all | 8923 @c That's all |
8559 | 8924 |
8560 @bye | 8925 @bye |
8561 |