comparison man/internals/internals.texi @ 412:697ef44129c6 r21-2-14
Import from CVS: tag r21-2-14
author | cvs |
---|---|
date | Mon, 13 Aug 2007 11:20:41 +0200 |
parents | de805c49cfc1 |
children | da8ed4261e83 |
411:12e008d41344 | 412:697ef44129c6 |
---|---|
5 @c %**end of header | 5 @c %**end of header |
6 | 6 |
7 @ifinfo | 7 @ifinfo |
8 @dircategory XEmacs Editor | 8 @dircategory XEmacs Editor |
9 @direntry | 9 @direntry |
10 * Internals: (internals). XEmacs Internals Manual. | 10 * Internals: (internals). XEmacs Internals Manual. |
11 @end direntry | 11 @end direntry |
12 | 12 |
13 Copyright @copyright{} 1992 - 1996 Ben Wing. | 13 Copyright @copyright{} 1992 - 1996 Ben Wing. |
14 Copyright @copyright{} 1996, 1997 Sun Microsystems. | 14 Copyright @copyright{} 1996, 1997 Sun Microsystems. |
15 Copyright @copyright{} 1994 - 1998 Free Software Foundation. | 15 Copyright @copyright{} 1994 - 1998 Free Software Foundation. |
61 @setchapternewpage odd | 61 @setchapternewpage odd |
62 @finalout | 62 @finalout |
63 | 63 |
64 @titlepage | 64 @titlepage |
65 @title XEmacs Internals Manual | 65 @title XEmacs Internals Manual |
66 @subtitle Version 1.3, August 1999 | 66 @subtitle Version 1.2, October 1998 |
67 | 67 |
68 @author Ben Wing | 68 @author Ben Wing |
69 @author Martin Buchholz | 69 @author Martin Buchholz |
70 @author Hrvoje Niksic | 70 @author Hrvoje Niksic |
71 @author Matthias Neubauer | |
72 @author Olivier Galibert | |
73 @page | 71 @page |
74 @vskip 0pt plus 1fill | 72 @vskip 0pt plus 1fill |
75 | 73 |
76 @noindent | 74 @noindent |
77 Copyright @copyright{} 1992 - 1996 Ben Wing. @* | 75 Copyright @copyright{} 1992 - 1996 Ben Wing. @* |
78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @* | 76 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @* |
79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @* | 77 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @* |
80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. | 78 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. |
81 | 79 |
82 @sp 2 | 80 @sp 2 |
83 Version 1.3 @* | 81 Version 1.2 @* |
84 August 1999.@* | 82 October 1998.@* |
85 | 83 |
86 Permission is granted to make and distribute verbatim copies of this | 84 Permission is granted to make and distribute verbatim copies of this |
87 manual provided the copyright notice and this permission notice are | 85 manual provided the copyright notice and this permission notice are |
88 preserved on all copies. | 86 preserved on all copies. |
89 | 87 |
117 * The XEmacs Object System (Abstractly Speaking):: | 115 * The XEmacs Object System (Abstractly Speaking):: |
118 * How Lisp Objects Are Represented in C:: | 116 * How Lisp Objects Are Represented in C:: |
119 * Rules When Writing New C Code:: | 117 * Rules When Writing New C Code:: |
120 * A Summary of the Various XEmacs Modules:: | 118 * A Summary of the Various XEmacs Modules:: |
121 * Allocation of Objects in XEmacs Lisp:: | 119 * Allocation of Objects in XEmacs Lisp:: |
122 * Dumping:: | |
123 * Events and the Event Loop:: | 120 * Events and the Event Loop:: |
124 * Evaluation; Stack Frames; Bindings:: | 121 * Evaluation; Stack Frames; Bindings:: |
125 * Symbols and Variables:: | 122 * Symbols and Variables:: |
126 * Buffers and Textual Representation:: | 123 * Buffers and Textual Representation:: |
127 * MULE Character Sets and Encodings:: | 124 * MULE Character Sets and Encodings:: |
128 * The Lisp Reader and Compiler:: | 125 * The Lisp Reader and Compiler:: |
129 * Lstreams:: | 126 * Lstreams:: |
130 * Consoles; Devices; Frames; Windows:: | 127 * Consoles; Devices; Frames; Windows:: |
131 * The Redisplay Mechanism:: | 128 * The Redisplay Mechanism:: |
132 * Extents:: | 129 * Extents:: |
133 * Faces:: | 130 * Faces and Glyphs:: |
134 * Glyphs:: | |
135 * Specifiers:: | 131 * Specifiers:: |
136 * Menus:: | 132 * Menus:: |
137 * Subprocesses:: | 133 * Subprocesses:: |
138 * Interface to X Windows:: | 134 * Interface to X Windows:: |
139 * Index:: | 135 * Index:: Index including concepts, functions, variables, |
140 | 136 and other terms. |
141 @detailmenu | 137 |
142 | 138 --- The Detailed Node Listing --- |
143 --- The Detailed Node Listing --- | 139 |
| 140 Here are other nodes that are inferiors of those already listed, |
| 141 mentioned here so you can get to them in one step: |
144 | 142 |
145 A History of Emacs | 143 A History of Emacs |
146 | 144 |
147 * Through Version 18:: Unification prevails. | 145 * Through Version 18:: Unification prevails. |
148 * Lucid Emacs:: One version 19 Emacs. | 146 * Lucid Emacs:: One version 19 Emacs. |
149 * GNU Emacs 19:: The other version 19 Emacs. | 147 * GNU Emacs 19:: The other version 19 Emacs. |
150 * GNU Emacs 20:: The other version 20 Emacs. | |
151 * XEmacs:: The continuation of Lucid Emacs. | 148 * XEmacs:: The continuation of Lucid Emacs. |
152 | 149 |
153 Rules When Writing New C Code | 150 Rules When Writing New C Code |
154 | 151 |
155 * General Coding Rules:: | 152 * General Coding Rules:: |
156 * Writing Lisp Primitives:: | 153 * Writing Lisp Primitives:: |
157 * Adding Global Lisp Variables:: | 154 * Adding Global Lisp Variables:: |
158 * Coding for Mule:: | |
159 * Techniques for XEmacs Developers:: | 155 * Techniques for XEmacs Developers:: |
160 | |
161 Coding for Mule | |
162 | |
163 * Character-Related Data Types:: | |
164 * Working With Character and Byte Positions:: | |
165 * Conversion to and from External Data:: | |
166 * General Guidelines for Writing Mule-Aware Code:: | |
167 * An Example of Mule-Aware Code:: | |
168 | 156 |
169 A Summary of the Various XEmacs Modules | 157 A Summary of the Various XEmacs Modules |
170 | 158 |
171 * Low-Level Modules:: | 159 * Low-Level Modules:: |
172 * Basic Lisp Modules:: | 160 * Basic Lisp Modules:: |
184 Allocation of Objects in XEmacs Lisp | 172 Allocation of Objects in XEmacs Lisp |
185 | 173 |
186 * Introduction to Allocation:: | 174 * Introduction to Allocation:: |
187 * Garbage Collection:: | 175 * Garbage Collection:: |
188 * GCPROing:: | 176 * GCPROing:: |
189 * Garbage Collection - Step by Step:: | |
190 * Integers and Characters:: | 177 * Integers and Characters:: |
191 * Allocation from Frob Blocks:: | 178 * Allocation from Frob Blocks:: |
192 * lrecords:: | 179 * lrecords:: |
193 * Low-level allocation:: | 180 * Low-level allocation:: |
| 181 * Pure Space:: |
194 * Cons:: | 182 * Cons:: |
195 * Vector:: | 183 * Vector:: |
196 * Bit Vector:: | 184 * Bit Vector:: |
197 * Symbol:: | 185 * Symbol:: |
198 * Marker:: | 186 * Marker:: |
199 * String:: | 187 * String:: |
200 * Compiled Function:: | 188 * Compiled Function:: |
201 | |
202 Garbage Collection - Step by Step | |
203 | |
204 * Invocation:: | |
205 * garbage_collect_1:: | |
206 * mark_object:: | |
207 * gc_sweep:: | |
208 * sweep_lcrecords_1:: | |
209 * compact_string_chars:: | |
210 * sweep_strings:: | |
211 * sweep_bit_vectors_1:: | |
212 | |
213 Dumping | |
214 | |
215 * Overview:: | |
216 * Data descriptions:: | |
217 * Dumping phase:: | |
218 * Reloading phase:: | |
219 | |
220 Dumping phase | |
221 | |
222 * Object inventory:: | |
223 * Address allocation:: | |
224 * The header:: | |
225 * Data dumping:: | |
226 * Pointers dumping:: | |
227 | 189 |
228 Events and the Event Loop | 190 Events and the Event Loop |
229 | 191 |
230 * Introduction to Events:: | 192 * Introduction to Events:: |
231 * Main Loop:: | 193 * Main Loop:: |
261 MULE Character Sets and Encodings | 223 MULE Character Sets and Encodings |
262 | 224 |
263 * Character Sets:: | 225 * Character Sets:: |
264 * Encodings:: | 226 * Encodings:: |
265 * Internal Mule Encodings:: | 227 * Internal Mule Encodings:: |
266 * CCL:: | |
267 | 228 |
268 Encodings | 229 Encodings |
269 | 230 |
270 * Japanese EUC (Extended Unix Code):: | 231 * Japanese EUC (Extended Unix Code):: |
271 * JIS7:: | 232 * JIS7:: |
273 Internal Mule Encodings | 234 Internal Mule Encodings |
274 | 235 |
275 * Internal String Encoding:: | 236 * Internal String Encoding:: |
276 * Internal Character Encoding:: | 237 * Internal Character Encoding:: |
277 | 238 |
| 239 The Lisp Reader and Compiler |
| 240 |
278 Lstreams | 241 Lstreams |
279 | |
280 * Creating an Lstream:: Creating an lstream object. | |
281 * Lstream Types:: Different sorts of things that are streamed. | |
282 * Lstream Functions:: Functions for working with lstreams. | |
283 * Lstream Methods:: Creating new lstream types. | |
284 | 242 |
285 Consoles; Devices; Frames; Windows | 243 Consoles; Devices; Frames; Windows |
286 | 244 |
287 * Introduction to Consoles; Devices; Frames; Windows:: | 245 * Introduction to Consoles; Devices; Frames; Windows:: |
288 * Point:: | 246 * Point:: |
289 * Window Hierarchy:: | 247 * Window Hierarchy:: |
290 * The Window Object:: | |
291 | 248 |
292 The Redisplay Mechanism | 249 The Redisplay Mechanism |
293 | 250 |
294 * Critical Redisplay Sections:: | 251 * Critical Redisplay Sections:: |
295 * Line Start Cache:: | 252 * Line Start Cache:: |
296 * Redisplay Piece by Piece:: | |
297 | 253 |
298 Extents | 254 Extents |
299 | 255 |
300 * Introduction to Extents:: Extents are ranges over text, with properties. | 256 * Introduction to Extents:: Extents are ranges over text, with properties. |
301 * Extent Ordering:: How extents are ordered internally. | 257 * Extent Ordering:: How extents are ordered internally. |
302 * Format of the Extent Info:: The extent information in a buffer or string. | 258 * Format of the Extent Info:: The extent information in a buffer or string. |
303 * Zero-Length Extents:: A weird special case. | 259 * Zero-Length Extents:: A weird special case. |
304 * Mathematics of Extent Ordering:: A rigorous foundation. | 260 * Mathematics of Extent Ordering:: A rigorous foundation. |
305 * Extent Fragments:: Cached information useful for redisplay. | 261 * Extent Fragments:: Cached information useful for redisplay. |
306 | 262 |
307 @end detailmenu | 263 Faces and Glyphs |
| 264 |
| 265 Specifiers |
| 266 |
| 267 Menus |
| 268 |
| 269 Subprocesses |
| 270 |
| 271 Interface to X Windows |
| 272 |
308 @end menu | 273 @end menu |
309 | 274 |
310 @node A History of Emacs, XEmacs From the Outside, Top, Top | 275 @node A History of Emacs, XEmacs From the Outside, Top, Top |
311 @chapter A History of Emacs | 276 @chapter A History of Emacs |
312 @cindex history of Emacs | 277 @cindex history of Emacs |
343 * GNU Emacs 19:: The other version 19 Emacs. | 308 * GNU Emacs 19:: The other version 19 Emacs. |
344 * GNU Emacs 20:: The other version 20 Emacs. | 309 * GNU Emacs 20:: The other version 20 Emacs. |
345 * XEmacs:: The continuation of Lucid Emacs. | 310 * XEmacs:: The continuation of Lucid Emacs. |
346 @end menu | 311 @end menu |
347 | 312 |
348 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs | 313 @node Through Version 18 |
349 @section Through Version 18 | 314 @section Through Version 18 |
350 @cindex Gosling, James | 315 @cindex Gosling, James |
351 @cindex Great Usenet Renaming | 316 @cindex Great Usenet Renaming |
352 | 317 |
353 Although the history of the early versions of GNU Emacs is unclear, | 318 Although the history of the early versions of GNU Emacs is unclear, |
456 version 18.58 released ?????. | 421 version 18.58 released ?????. |
457 @item | 422 @item |
458 version 18.59 released October 31, 1992. | 423 version 18.59 released October 31, 1992. |
459 @end itemize | 424 @end itemize |
460 | 425 |
461 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs | 426 @node Lucid Emacs |
462 @section Lucid Emacs | 427 @section Lucid Emacs |
463 @cindex Lucid Emacs | 428 @cindex Lucid Emacs |
464 @cindex Lucid Inc. | 429 @cindex Lucid Inc. |
465 @cindex Energize | 430 @cindex Energize |
466 @cindex Epoch | 431 @cindex Epoch |
544 version 20.3 (the first stable version of XEmacs 20.x) released November 30, | 509 version 20.3 (the first stable version of XEmacs 20.x) released November 30, |
545 1997. | 510 1997. |
546 version 20.4 released February 28, 1998. | 511 version 20.4 released February 28, 1998. |
547 @end itemize | 512 @end itemize |
548 | 513 |
549 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs | 514 @node GNU Emacs 19 |
550 @section GNU Emacs 19 | 515 @section GNU Emacs 19 |
551 @cindex GNU Emacs 19 | 516 @cindex GNU Emacs 19 |
552 @cindex FSF Emacs | 517 @cindex FSF Emacs |
553 | 518 |
554 About a year after the initial release of Lucid Emacs, the FSF | 519 About a year after the initial release of Lucid Emacs, the FSF |
621 worse. Lucid soon began incorporating features from GNU Emacs 19 into | 586 worse. Lucid soon began incorporating features from GNU Emacs 19 into |
622 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been | 587 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been |
623 working on and using GNU Emacs for a long time (back as far as version | 588 working on and using GNU Emacs for a long time (back as far as version |
624 16 or 17). | 589 16 or 17). |
625 | 590 |
626 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs | 591 @node GNU Emacs 20 |
627 @section GNU Emacs 20 | 592 @section GNU Emacs 20 |
628 @cindex GNU Emacs 20 | 593 @cindex GNU Emacs 20 |
629 @cindex FSF Emacs | 594 @cindex FSF Emacs |
630 | 595 |
631 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first | 596 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first |
640 version 20.2 released September 20, 1997. | 605 version 20.2 released September 20, 1997. |
641 @item | 606 @item |
642 version 20.3 released August 19, 1998. | 607 version 20.3 released August 19, 1998. |
643 @end itemize | 608 @end itemize |
644 | 609 |
645 @node XEmacs, , GNU Emacs 20, A History of Emacs | 610 @node XEmacs |
646 @section XEmacs | 611 @section XEmacs |
647 @cindex XEmacs | 612 @cindex XEmacs |
648 | 613 |
649 @cindex Sun Microsystems | 614 @cindex Sun Microsystems |
650 @cindex University of Illinois | 615 @cindex University of Illinois |
728 windows, frames, events) that are useful for implementing an editor. | 693 windows, frames, events) that are useful for implementing an editor. |
729 Some of these objects (in particular windows and frames) have | 694 Some of these objects (in particular windows and frames) have |
730 displayable representations, and XEmacs provides a function | 695 displayable representations, and XEmacs provides a function |
731 @code{redisplay()} that ensures that the display of all such objects | 696 @code{redisplay()} that ensures that the display of all such objects |
732 matches their internal state. Most of the time, a standard Lisp | 697 matches their internal state. Most of the time, a standard Lisp |
733 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp | 698 environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp |
734 code, execute it, and print the results''. XEmacs has a similar loop: | 699 code, execute it, and print the results''. XEmacs has a similar loop: |
735 | 700 |
736 @itemize @bullet | 701 @itemize @bullet |
737 @item | 702 @item |
738 read an event | 703 read an event |
903 handler for some or all classes of errors. (If no handler is registered, | 868 handler for some or all classes of errors. (If no handler is registered, |
904 a default handler, generally installed by the top-level event loop, is | 869 a default handler, generally installed by the top-level event loop, is |
905 executed; this prints out the error and continues.) Routines can also | 870 executed; this prints out the error and continues.) Routines can also |
906 specify cleanup code (called an @dfn{unwind-protect}) that will be | 871 specify cleanup code (called an @dfn{unwind-protect}) that will be |
907 called when control exits from a block of code, no matter how that exit | 872 called when control exits from a block of code, no matter how that exit |
908 occurs---i.e. even if a function deeply nested below it causes a | 873 occurs -- i.e. even if a function deeply nested below it causes a |
909 non-local exit back to the top level. | 874 non-local exit back to the top level. |
910 | 875 |
911 Note that this facility has appeared in some recent vintages of C, in | 876 Note that this facility has appeared in some recent vintages of C, in |
912 particular Visual C++ and other PC compilers written for the Microsoft | 877 particular Visual C++ and other PC compilers written for the Microsoft |
913 Win32 API. | 878 Win32 API. |
917 that if you declare a local variable in a particular function, and then | 882 that if you declare a local variable in a particular function, and then |
918 call another function, that subfunction can ``see'' the local variable | 883 call another function, that subfunction can ``see'' the local variable |
919 you declared. This is actually considered a bug in Emacs Lisp and in | 884 you declared. This is actually considered a bug in Emacs Lisp and in |
920 all other early dialects of Lisp, and was corrected in Common Lisp. (In | 885 all other early dialects of Lisp, and was corrected in Common Lisp. (In |
921 Common Lisp, you can still declare dynamically scoped variables if you | 886 Common Lisp, you can still declare dynamically scoped variables if you |
922 want to---they are sometimes useful---but variables by default are | 887 want to -- they are sometimes useful -- but variables by default are |
923 @dfn{lexically scoped} as in C.) | 888 @dfn{lexically scoped} as in C.) |
924 @end enumerate | 889 @end enumerate |
925 | 890 |
926 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an | 891 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an |
927 early dialect of Lisp developed at MIT (no relation to the Macintosh | 892 early dialect of Lisp developed at MIT (no relation to the Macintosh |
965 Java, which is inexcusable. | 930 Java, which is inexcusable. |
966 @end enumerate | 931 @end enumerate |
967 | 932 |
968 Unfortunately, there is no perfect language. Static typing allows a | 933 Unfortunately, there is no perfect language. Static typing allows a |
969 compiler to catch programmer errors and produce more efficient code, but | 934 compiler to catch programmer errors and produce more efficient code, but |
970 makes programming more tedious and less fun. For the foreseeable future, | 935 makes programming more tedious and less fun. For the forseeable future, |
971 an Ideal Editing and Programming Environment (and that is what XEmacs | 936 an Ideal Editing and Programming Environment (and that is what XEmacs |
972 aspires to) will be programmable in multiple languages: high level ones | 937 aspires to) will be programmable in multiple languages: high level ones |
973 like Lisp for user customization and prototyping, and lower level ones | 938 like Lisp for user customization and prototyping, and lower level ones |
974 for infrastructure and industrial strength applications. If I had my | 939 for infrastructure and industrial strength applications. If I had my |
975 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, | 940 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, |
1275 most other data structures in Lisp. | 1240 most other data structures in Lisp. |
1276 @item char | 1241 @item char |
1277 An object representing a single character of text; chars behave like | 1242 An object representing a single character of text; chars behave like |
1278 integers in many ways but are logically considered text rather than | 1243 integers in many ways but are logically considered text rather than |
1279 numbers and have a different read syntax. (the read syntax for a char | 1244 numbers and have a different read syntax. (the read syntax for a char |
1280 contains the char itself or some textual encoding of it---for example, | 1245 contains the char itself or some textual encoding of it -- for example, |
1281 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the | 1246 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the |
1282 ISO-2022 encoding standard---rather than the numerical representation | 1247 ISO-2022 encoding standard -- rather than the numerical representation |
1283 of the char; this way, if the mapping between chars and integers | 1248 of the char; this way, if the mapping between chars and integers |
1284 changes, which is quite possible for Kanji characters and other extended | 1249 changes, which is quite possible for Kanji characters and other extended |
1285 characters, the same character will still be created. Note that some | 1250 characters, the same character will still be created. Note that some |
1286 primitives confuse chars and integers. The worst culprit is @code{eq}, | 1251 primitives confuse chars and integers. The worst culprit is @code{eq}, |
1287 which makes a special exception and considers a char to be @code{eq} to | 1252 which makes a special exception and considers a char to be @code{eq} to |
1495 | 1460 |
1496 @example | 1461 @example |
1497 1.983e-4 | 1462 1.983e-4 |
1498 @end example | 1463 @end example |
1499 | 1464 |
1500 converts to a float whose value is 1.983e-4, or .0001983. | 1465 converts to a float whose value is 1983.23e-4, or .0001983. |
1501 | 1466 |
1502 @example | 1467 @example |
1503 ?b | 1468 ?b |
1504 @end example | 1469 @end example |
1505 | 1470 |
1618 | 1583 |
1619 @example | 1584 @example |
1620 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] | 1585 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] |
1621 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] | 1586 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] |
1622 | 1587 |
1623 <---------------------------------------------------------> <-> | 1588 <---> ^ <------------------------------------------------------> |
1624 a pointer to a structure, or an integer tag | 1589 tag | a pointer to a structure, or an integer |
1625 @end example | 1590 | |
1626 | 1591 mark bit |
1627 A tag of 00 is used for all pointer object types, a tag of 10 is used | 1592 @end example |
1628 for characters, and the other two tags 01 and 11 are joined together to | 1593 |
1629 form the integer object type. This representation gives us 31 bit | 1594 The tag describes the type of the Lisp object. For integers and chars, |
1630 integers and 30 bit characters, while pointers are represented directly | 1595 the lower 28 bits contain the value of the integer or char; for all |
1631 without any bit masking or shifting. This representation, though, | 1596 others, the lower 28 bits contain a pointer. The mark bit is used |
1632 assumes that pointers to structs are always aligned to multiples of 4, | 1597 during garbage-collection, and is always 0 when garbage collection is |
1633 so the lower 2 bits are always zero. | 1598 not happening. (The way that garbage collection works, basically, is that it |
| 1599 loops over all places where Lisp objects could exist -- this includes |
| 1600 all global variables in C that contain Lisp objects [including |
| 1601 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all |
| 1602 Lisp variables will get marked], plus various other places -- and |
| 1603 recursively scans through the Lisp objects, marking each object it finds |
| 1604 by setting the mark bit. Then it goes through the lists of all objects |
| 1605 allocated, freeing the ones that are not marked and turning off the mark |
| 1606 bit of the ones that are marked.) |
1634 | 1607 |
1635 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type | 1608 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type |
1636 used for the Lisp object can vary. It can be either a simple type | 1609 used for the Lisp object can vary. It can be either a simple type |
1637 (@code{long} on the DEC Alpha, @code{int} on other machines) or a | 1610 (@code{long} on the DEC Alpha, @code{int} on other machines) or a |
1638 structure whose fields are bit fields that line up properly (actually, a | 1611 structure whose fields are bit fields that line up properly (actually, a |
1639 union of structures is used). Generally the simple integral type is | 1612 union of structures is used). Generally the simple integral type is |
1640 preferable because it ensures that the compiler will actually use a | 1613 preferable because it ensures that the compiler will actually use a |
1641 machine word to represent the object (some compilers will use more | 1614 machine word to represent the object (some compilers will use more |
1642 general and less efficient code for unions and structs even if they can | 1615 general and less efficient code for unions and structs even if they can |
1643 fit in a machine word). The union type, however, has the advantage of | 1616 fit in a machine word). The union type, however, has the advantage of |
1644 stricter type checking. If you accidentally pass an integer where a Lisp | 1617 stricter type checking (if you accidentally pass an integer where a Lisp |
1645 object is desired, you get a compile error. The choice of which type | 1618 object is desired, you get a compile error), and it makes it easier to |
1646 to use is determined by the preprocessor constant @code{USE_UNION_TYPE} | 1619 decode Lisp objects when debugging. The choice of which type to use is |
1647 which is defined via the @code{--use-union-type} option to | 1620 determined by the preprocessor constant @code{USE_UNION_TYPE} which is |
1648 @code{configure}. | 1621 defined via the @code{--use-union-type} option to @code{configure}. |
1649 | 1622 |
1650 Various macros are used to convert between Lisp_Objects and the | 1623 @cindex record type |
1651 corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()}, | 1624 |
1652 @code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or | 1625 Note that there are only eight types that the tag can represent, but |
1653 masking and cast it to the appropriate type. @code{XINT()} needs to be | 1626 many more actual types than this. This is handled by having one of the |
1654 a bit tricky so that negative numbers are properly sign-extended. Since | 1627 tag types specify a meta-type called a @dfn{record}; for all such |
1655 integers are stored left-shifted, if the right-shift operator does an | 1628 objects, the first four bytes of the pointed-to structure indicate what |
1656 arithmetic shift (i.e. it leaves the most-significant bit as-is rather | 1629 the actual type is. |
1657 than shifting in a zero, so that it mimics a divide-by-two even for | 1630 |
1658 negative numbers) the shift to remove the tag bit is enough. This is | 1631 Note also that having 28 bits for pointers and integers restricts a lot |
1659 the case on all the systems we support. | 1632 of things to 256 megabytes of memory. (Basically, enough pointers and |
1660 | 1633 indices and whatnot get stuffed into Lisp objects that the total amount |
1661 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter | 1634 of memory used by XEmacs can't grow above 256 megabytes. In older |
1662 macros become more complicated---they check the tag bits and/or the | 1635 versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for |
| 1636 32 types, which was more than the actual number of types that existed at |
| 1637 the time, and no ``record'' type was necessary. However, this limited |
| 1638 the editor to 64 megabytes total, which some users who edited large |
| 1639 files might conceivably exceed.) |
| 1640 |
| 1641 Also, note that there is an implicit assumption here that all pointers |
| 1642 are low enough that the top bits are all zero and can just be chopped |
| 1643 off. On standard machines that allocate memory from the bottom up (and |
| 1644 give each process its own address space), this works fine. Some |
| 1645 machines, however, put the data space somewhere else in memory |
| 1646 (e.g. beginning at 0x80000000). Those machines cope by defining |
| 1647 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to |
| 1648 the proper mask. Then, pointers retrieved from Lisp objects are |
| 1649 automatically OR'ed with this value prior to being used. |
| 1650 |
| 1651 A corollary of the previous paragraph is that @strong{(pointers to) |
| 1652 stack-allocated structures cannot be put into Lisp objects}. The stack |
| 1653 is generally located near the top of memory; if you put such a pointer |
| 1654 into a Lisp object, it will get its top bits chopped off, and you will |
| 1655 lose. |
| 1656 |
| 1657 Actually, there's an alternative representation of a @code{Lisp_Object}, |
| 1658 invented by Kyle Jones, that is used when the |
| 1659 @code{--use-minimal-tagbits} option to @code{configure} is used. In |
| 1660 this case the 2 lower bits are used for the tag bits. This |
| 1661 representation assumes that pointers to structs are always aligned to |
| 1662 multiples of 4, so the lower 2 bits are always zero. |
| 1663 |
| 1664 @example |
| 1665 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] |
| 1666 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] |
| 1667 |
| 1668 <---------------------------------------------------------> <-> |
| 1669 a pointer to a structure, or an integer tag |
| 1670 @end example |
| 1671 |
| 1672 A tag of 00 is used for all pointer object types, a tag of 10 is used |
| 1673 for characters, and the other two tags 01 and 11 are joined together to |
| 1674 form the integer object type. The markbit is moved to part of the |
| 1675 structure being pointed at (integers and chars do not need to be marked, |
| 1676 since no memory is allocated). This representation has these |
| 1677 advantages: |
| 1678 |
| 1679 @enumerate |
| 1680 @item |
| 1681 31 bits can be used for Lisp Integers. |
| 1682 @item |
| 1683 @emph{Any} pointer can be represented directly, and no bit masking |
| 1684 operations are necessary. |
| 1685 @end enumerate |
| 1686 |
| 1687 The disadvantages are: |
| 1688 |
| 1689 @enumerate |
| 1690 @item |
| 1691 An extra level of indirection is needed when accessing the object types |
| 1692 that were not record types. So checking whether a Lisp object is a cons |
| 1693 cell becomes a slower operation. |
| 1694 @item |
| 1695 Mark bits can no longer be stored directly in Lisp objects, so another |
| 1696 place for them must be found. This means that a cons cell requires more |
| 1697 memory than merely room for 2 lisp objects, leading to extra memory use. |
| 1698 @end enumerate |
| 1699 |
| 1700 Various macros are used to construct Lisp objects and extract the |
| 1701 components. Macros of the form @code{XINT()}, @code{XCHAR()}, |
| 1702 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer |
| 1703 field and cast it to the appropriate type. All of the macros that |
| 1704 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if |
| 1705 necessary. @code{XINT()} needs to be a bit tricky so that negative |
| 1706 numbers are properly sign-extended: Usually it does this by shifting the |
| 1707 number four bits to the left and then four bits to the right. This |
| 1708 assumes that the right-shift operator does an arithmetic shift (i.e. it |
| 1709 leaves the most-significant bit as-is rather than shifting in a zero, so |
| 1710 that it mimics a divide-by-two even for negative numbers). Not all |
| 1711 machines/compilers do this, and on the ones that don't, a more |
| 1712 complicated definition is selected by defining |
| 1713 @code{EXPLICIT_SIGN_EXTEND}. |
| 1714 |
| 1715 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor |
| 1716 macros become more complicated -- they check the tag bits and/or the |
1663 type field in the first four bytes of a record type to ensure that the | 1717 type field in the first four bytes of a record type to ensure that the |
1664 object is really of the correct type. This is great for catching places | 1718 object is really of the correct type. This is great for catching places |
1665 where an incorrect type is being dereferenced---this typically results | 1719 where an incorrect type is being dereferenced -- this typically results |
1666 in a pointer being dereferenced as the wrong type of structure, with | 1720 in a pointer being dereferenced as the wrong type of structure, with |
1667 unpredictable (and sometimes not easily traceable) results. | 1721 unpredictable (and sometimes not easily traceable) results. |
1668 | 1722 |
1669 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp | 1723 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp |
1670 object. These macros are of the form @code{XSET@var{TYPE} | 1724 object. These macros are of the form @code{XSET@var{TYPE} |
1671 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather | 1725 (@var{lvalue}, @var{result})}, |
1672 than just used in an expression. The reason for this is that standard C | 1726 i.e. they have to be a statement rather than just used in an expression. |
1673 doesn't let you ``construct'' a structure (but GCC does). Granted, this | 1727 The reason for this is that standard C doesn't let you ``construct'' a |
1674 sometimes isn't too convenient; for the case of integers, at least, you | 1728 structure (but GCC does). Granted, this sometimes isn't too convenient; |
1675 can use the function @code{make_int()}, which constructs and | 1729 for the case of integers, at least, you can use the function |
1676 @emph{returns} an integer Lisp object. Note that the | 1730 @code{make_int()}, which constructs and @emph{returns} an integer |
1677 @code{XSET@var{TYPE}()} macros are also affected by | 1731 Lisp object. Note that the @code{XSET@var{TYPE}()} macros are also |
1678 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the | 1732 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the |
1679 right type in the case of record types, where the type is contained in | 1733 structure is of the right type in the case of record types, where the |
1680 the structure. | 1734 type is contained in the structure. |
1681 | 1735 |
1682 The C programmer is responsible for @strong{guaranteeing} that a | 1736 The C programmer is responsible for @strong{guaranteeing} that a |
1683 Lisp_Object is the correct type before using the @code{X@var{TYPE}} | 1737 Lisp_Object is the correct type before using the @code{X@var{TYPE}} |
1684 macros. This is especially important in the case of lists. Use | 1738 macros. This is especially important in the case of lists. Use |
1685 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, | 1739 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, |
1686 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not | 1740 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not |
1687 Lisp code. On the other hand, if XEmacs has an internal logic error, | 1741 Lisp code. On the other hand, if XEmacs has an internal logic error, |
1688 it's better to crash immediately, so sprinkle @code{assert()}s and | 1742 it's better to crash immediately, so sprinkle ``unreachable'' |
1689 ``unreachable'' @code{abort()}s liberally about the source code. Where | 1743 @code{abort()}s liberally about the source code. |
1690 performance is an issue, use @code{type_checking_assert}, | |
1691 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do | |
1692 nothing unless the corresponding configure error checking flag was | |
1693 specified. | |
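The idea behind these conditional assertions can be sketched as follows. The names @code{DEMO_ERROR_CHECK}, @code{demo_checking_assert} and @code{demo_byte_at} are hypothetical; the real macros live in the XEmacs sources and may be defined differently.

```c
#include <stdlib.h>

/* Checked builds define DEMO_ERROR_CHECK (an assumed flag name).
   Otherwise the macro expands to an unevaluated sizeof, so the
   condition is still type-checked but generates no code. */
#ifdef DEMO_ERROR_CHECK
# define demo_checking_assert(cond) do { if (!(cond)) abort (); } while (0)
#else
# define demo_checking_assert(cond) ((void) sizeof (cond))
#endif

/* Return the byte at index i; the bounds check exists only in checked builds. */
static int demo_byte_at(const unsigned char *p, int len, int i)
{
    demo_checking_assert(p != NULL && i >= 0 && i < len);
    return p[i];
}
```

Performance-sensitive code keeps the check in the source without paying for it in an ordinary build.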
1694 | 1744 |
1695 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top | 1745 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top |
1696 @chapter Rules When Writing New C Code | 1746 @chapter Rules When Writing New C Code |
1697 | 1747 |
1698 The XEmacs C Code is extremely complex and intricate, and there are many | 1748 The XEmacs C Code is extremely complex and intricate, and there are many |
1708 * Adding Global Lisp Variables:: | 1758 * Adding Global Lisp Variables:: |
1709 * Coding for Mule:: | 1759 * Coding for Mule:: |
1710 * Techniques for XEmacs Developers:: | 1760 * Techniques for XEmacs Developers:: |
1711 @end menu | 1761 @end menu |
1712 | 1762 |
1713 @node General Coding Rules, Writing Lisp Primitives, Rules When Writing New C Code, Rules When Writing New C Code | 1763 @node General Coding Rules |
1714 @section General Coding Rules | 1764 @section General Coding Rules |
1715 | 1765 |
1716 The C code is actually written in a dialect of C called @dfn{Clean C}, | 1766 The C code is actually written in a dialect of C called @dfn{Clean C}, |
1717 meaning that it can be compiled, mostly warning-free, with either a C or | 1767 meaning that it can be compiled, mostly warning-free, with either a C or |
1718 C++ compiler. Coding in Clean C has several advantages over plain C. | 1768 C++ compiler. Coding in Clean C has several advantages over plain C. |
1741 the same directory as the C sources) and @file{lisp.h}. @file{config.h} | 1791 the same directory as the C sources) and @file{lisp.h}. @file{config.h} |
1742 must always be included before any other header files (including | 1792 must always be included before any other header files (including |
1743 system header files) to ensure that certain tricks played by various | 1793 system header files) to ensure that certain tricks played by various |
1744 @file{s/} and @file{m/} files work out correctly. | 1794 @file{s/} and @file{m/} files work out correctly. |
1745 | 1795 |
1746 When including header files, always use angle brackets, not double | |
1747 quotes, except when the file to be included is always in the same | |
1748 directory as the including file. If either file is a generated file, | |
1749 then that is not likely to be the case. In order to understand why we | |
1750 have this rule, imagine what happens when you do a build in the source | |
1751 directory using @samp{./configure} and another build in another | |
1752 directory using @samp{../work/configure}. There will be two different | |
1753 @file{config.h} files. Which one will be used if you @samp{#include | |
1754 "config.h"}? | |
1755 | |
1756 @strong{All global and static variables that are to be modifiable must | 1796 @strong{All global and static variables that are to be modifiable must |
1757 be declared uninitialized.} This means that you may not use the | 1797 be declared uninitialized.} This means that you may not use the |
1758 ``declare with initializer'' form for these variables, such as @code{int | 1798 ``declare with initializer'' form for these variables, such as @code{int |
1759 some_variable = 0;}. The reason for this has to do with some kludges | 1799 some_variable = 0;}. The reason for this has to do with some kludges |
1760 done during the dumping process: If possible, the initialized data | 1800 done during the dumping process: If possible, the initialized data |
1761 segment is re-mapped so that it becomes part of the (unmodifiable) code | 1801 segment is re-mapped so that it becomes part of the (unmodifiable) code |
1762 segment in the dumped executable. This allows this memory to be shared | 1802 segment in the dumped executable. This allows this memory to be shared |
1763 among multiple running XEmacs processes. XEmacs is careful to place as | 1803 among multiple running XEmacs processes. XEmacs is careful to place as |
1764 much constant data as possible into initialized variables during the | 1804 much constant data as possible into initialized variables (in |
1765 @file{temacs} phase. | 1805 particular, into what's called the @dfn{pure space} -- see below) during |
1806 the @file{temacs} phase. | |
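In practice the rule means moving the starting value out of the declaration and into startup code. This is a minimal sketch: @code{some_variable} comes from the text above, while @code{demo_init_module} and @code{demo_get_some_variable} are hypothetical names standing in for whatever initialization routine a module uses.

```c
/* Not allowed for a modifiable variable:  int some_variable = 0;  */

/* Declared without an initializer, so it does not land in the
   initialized data segment that may be remapped as read-only. */
static int some_variable;

/* Hypothetical startup hook that assigns the starting value explicitly. */
static void demo_init_module(void)
{
    some_variable = 42;
}

static int demo_get_some_variable(void)
{
    return some_variable;
}
```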
1766 | 1807 |
1767 @cindex copy-on-write | 1808 @cindex copy-on-write |
1768 @strong{Please note:} This kludge only works on a few systems nowadays, | 1809 @strong{Please note:} This kludge only works on a few systems nowadays, |
1769 and is rapidly becoming irrelevant because most modern operating systems | 1810 and is rapidly becoming irrelevant because most modern operating systems |
1770 provide @dfn{copy-on-write} semantics. All data is initially shared | 1811 provide @dfn{copy-on-write} semantics. All data is initially shared |
1794 | 1835 |
1795 The C source code makes heavy use of C preprocessor macros. One popular | 1836 The C source code makes heavy use of C preprocessor macros. One popular |
1796 macro style is: | 1837 macro style is: |
1797 | 1838 |
1798 @example | 1839 @example |
1799 #define FOO(var, value) do @{ \ | 1840 #define FOO(var, value) do @{ \ |
1800 Lisp_Object FOO_value = (value); \ | 1841 Lisp_Object FOO_value = (value); \ |
1801 ... /* compute using FOO_value */ \ | 1842 ... /* compute using FOO_value */ \ |
1802 (var) = bar; \ | 1843 (var) = bar; \ |
1803 @} while (0) | 1844 @} while (0) |
1804 @end example | 1845 @end example |
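A compilable instance of this macro style shows why the wrapper matters. @code{DEMO_SET_DOUBLED} and @code{demo_classify} are hypothetical names, and @code{int} stands in for @code{Lisp_Object}.

```c
/* Multi-statement macro in the style shown above.  The temporary is
   prefixed with the macro name so it cannot clash with a caller's variable. */
#define DEMO_SET_DOUBLED(var, value) do {      \
    int DEMO_SET_DOUBLED_value = (value);      \
    (var) = 2 * DEMO_SET_DOUBLED_value;        \
} while (0)

static int demo_classify(int n)
{
    int result;
    /* With a bare { ... } block instead of do ... while (0), the
       semicolon after the macro call would end the if statement and
       leave the else without a matching if. */
    if (n > 0)
        DEMO_SET_DOUBLED(result, n);
    else
        result = 0;
    return result;
}
```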
1805 | 1846 |
1806 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have | 1847 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have |
1807 statement semantics, so that it can safely be used within an @code{if} | 1848 statement semantics, so that it can safely be used within an @code{if} |
1823 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of | 1864 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of |
1824 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and | 1865 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and |
1825 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some | 1866 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some |
1826 predicate. | 1867 predicate. |
1827 | 1868 |
1828 @node Writing Lisp Primitives, Adding Global Lisp Variables, General Coding Rules, Rules When Writing New C Code | 1869 @node Writing Lisp Primitives |
1829 @section Writing Lisp Primitives | 1870 @section Writing Lisp Primitives |
1830 | 1871 |
1831 Lisp primitives are Lisp functions implemented in C. The details of | 1872 Lisp primitives are Lisp functions implemented in C. The details of |
1832 interfacing the C function so that Lisp can call it are handled by a few | 1873 interfacing the C function so that Lisp can call it are handled by a few |
1833 C macros. The only way to really understand how to write new C code is | 1874 C macros. The only way to really understand how to write new C code is |
2067 | 2108 |
2068 @file{eval.c} is a very good file to look through for examples; | 2109 @file{eval.c} is a very good file to look through for examples; |
2069 @file{lisp.h} contains the definitions for important macros and | 2110 @file{lisp.h} contains the definitions for important macros and |
2070 functions. | 2111 functions. |
2071 | 2112 |
2072 @node Adding Global Lisp Variables, Coding for Mule, Writing Lisp Primitives, Rules When Writing New C Code | 2113 @node Adding Global Lisp Variables |
2073 @section Adding Global Lisp Variables | 2114 @section Adding Global Lisp Variables |
2074 | 2115 |
2075 Global variables whose names begin with @samp{Q} are constants whose | 2116 Global variables whose names begin with @samp{Q} are constants whose |
2076 value is a symbol of a particular name. The name of the variable should | 2117 value is a symbol of a particular name. The name of the variable should |
2077 be derived from the name of the symbol using the same rules as for Lisp | 2118 be derived from the name of the symbol using the same rules as for Lisp |
2129 garbage-collection mechanism won't know that the object in this variable | 2170 garbage-collection mechanism won't know that the object in this variable |
2130 is in use, and will happily collect it and reuse its storage for another | 2171 is in use, and will happily collect it and reuse its storage for another |
2131 Lisp object, and you will be the one who's unhappy when you can't figure | 2172 Lisp object, and you will be the one who's unhappy when you can't figure |
2132 out how your variable got overwritten. | 2173 out how your variable got overwritten. |
2133 | 2174 |
2134 @node Coding for Mule, Techniques for XEmacs Developers, Adding Global Lisp Variables, Rules When Writing New C Code | 2175 @node Coding for Mule |
2135 @section Coding for Mule | 2176 @section Coding for Mule |
2136 @cindex Coding for Mule | 2177 @cindex Coding for Mule |
2137 | 2178 |
2138 Although Mule support is not compiled by default in XEmacs, many people | 2179 Although Mule support is not compiled by default in XEmacs, many people |
2139 are using it, and we consider it crucial that new code works correctly | 2180 are using it, and we consider it crucial that new code works correctly |
2152 * Conversion to and from External Data:: | 2193 * Conversion to and from External Data:: |
2153 * General Guidelines for Writing Mule-Aware Code:: | 2194 * General Guidelines for Writing Mule-Aware Code:: |
2154 * An Example of Mule-Aware Code:: | 2195 * An Example of Mule-Aware Code:: |
2155 @end menu | 2196 @end menu |
2156 | 2197 |
2157 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule | 2198 @node Character-Related Data Types |
2158 @subsection Character-Related Data Types | 2199 @subsection Character-Related Data Types |
2159 | 2200 |
2160 First, let's review the basic character-related datatypes used by | 2201 First, let's review the basic character-related datatypes used by |
2161 XEmacs. Note that the separate @code{typedef}s are not mandatory in the | 2202 XEmacs. Note that the separate @code{typedef}s are not mandatory in the |
2162 current implementation (all of them boil down to @code{unsigned char} or | 2203 current implementation (all of them boil down to @code{unsigned char} or |
2179 @item Bufbyte | 2220 @item Bufbyte |
2180 @cindex Bufbyte | 2221 @cindex Bufbyte |
2181 The data representing the text in a buffer or string is logically a set | 2222 The data representing the text in a buffer or string is logically a set |
2182 of @code{Bufbyte}s. | 2223 of @code{Bufbyte}s. |
2183 | 2224 |
2184 XEmacs does not work with the same character formats all the time; when | 2225 XEmacs does not work with character formats all the time; when reading |
2185 reading characters from the outside, it decodes them to an internal | 2226 characters from the outside, it decodes them to an internal format, and |
2186 format, and likewise encodes them when writing. @code{Bufbyte} (in fact | 2227 likewise encodes them when writing. @code{Bufbyte} (in fact |
2187 @code{unsigned char}) is the basic unit of XEmacs internal buffers and | 2228 @code{unsigned char}) is the basic unit of XEmacs internal buffers and |
2188 strings format. A @code{Bufbyte *} is the type that points at text | 2229 strings format. |
2189 encoded in the variable-width internal encoding. | |
2190 | 2230 |
2191 One character can correspond to one or more @code{Bufbyte}s. In the | 2231 One character can correspond to one or more @code{Bufbyte}s. In the |
2192 current Mule implementation, an ASCII character is represented by the | 2232 current implementation, an ASCII character is represented by the same |
2193 same @code{Bufbyte}, and other characters are represented by a sequence | 2233 @code{Bufbyte}, and extended characters are represented by a sequence of |
2194 of two or more @code{Bufbyte}s. | 2234 @code{Bufbyte}s. |
2195 | 2235 |
2196 Without Mule support, there are exactly 256 characters, implicitly | 2236 Without Mule support, a @code{Bufbyte} is equivalent to an |
2197 Latin-1, and each character is represented using one @code{Bufbyte}, and | 2237 @code{Emchar}. |
2198 there is a one-to-one correspondence between @code{Bufbyte}s and | |
2199 @code{Emchar}s. | |
2200 | 2238 |
2201 @item Bufpos | 2239 @item Bufpos |
2202 @itemx Charcount | 2240 @itemx Charcount |
2203 @cindex Bufpos | 2241 @cindex Bufpos |
2204 @cindex Charcount | 2242 @cindex Charcount |
2205 A @code{Bufpos} represents a character position in a buffer or string. | 2243 A @code{Bufpos} represents a character position in a buffer or string. |
2206 A @code{Charcount} represents a number (count) of characters. | 2244 A @code{Charcount} represents a number (count) of characters. |
2207 Logically, subtracting two @code{Bufpos} values yields a | 2245 Logically, subtracting two @code{Bufpos} values yields a |
2208 @code{Charcount} value. Although all of these are @code{typedef}ed to | 2246 @code{Charcount} value. Although all of these are @code{typedef}ed to |
2209 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make | 2247 @code{int}, we use them in preference to @code{int} to make it clear |
2210 it clear what sort of position is being used. | 2248 what sort of position is being used. |
2211 | 2249 |
2212 @code{Bufpos} and @code{Charcount} values are the only ones that are | 2250 @code{Bufpos} and @code{Charcount} values are the only ones that are |
2213 ever visible to Lisp. | 2251 ever visible to Lisp. |
2214 | 2252 |
2215 @item Bytind | 2253 @item Bytind |
2216 @itemx Bytecount | 2254 @itemx Bytecount |
2217 @cindex Bytind | 2255 @cindex Bytind |
2218 @cindex Bytecount | 2256 @cindex Bytecount |
2219 A @code{Bytind} represents a byte position in a buffer or string. A | 2257 A @code{Bytind} represents a byte position in a buffer or string. A |
2220 @code{Bytecount} represents the distance between two positions, in bytes. | 2258 @code{Bytecount} represents the distance between two positions in bytes. |
2221 The relationship between @code{Bytind} and @code{Bytecount} is the same | 2259 The relationship between @code{Bytind} and @code{Bytecount} is the same |
2222 as the relationship between @code{Bufpos} and @code{Charcount}. | 2260 as the relationship between @code{Bufpos} and @code{Charcount}. |
2223 | 2261 |
2224 @item Extbyte | 2262 @item Extbyte |
2225 @itemx Extcount | 2263 @itemx Extcount |
2229 which are equivalent to @code{unsigned char}. Obviously, an | 2267 which are equivalent to @code{unsigned char}. Obviously, an |
2230 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes | 2268 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes |
2231 and Extcounts are not all that frequent in XEmacs code. | 2269 and Extcounts are not all that frequent in XEmacs code. |
2232 @end table | 2270 @end table |
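The difference between character counts and byte counts can be made concrete with a toy variable-width encoding. This sketch is not XEmacs code: the @code{demo_} types stand in for @code{Bufbyte}, @code{Charcount} and @code{Bytecount}, and the rule that a byte of 0xA0 or above continues the preceding character is an assumption about the encoding, made for illustration.

```c
typedef unsigned char demo_bufbyte;   /* stand-in for Bufbyte */
typedef int demo_charcount;           /* stand-in for Charcount */
typedef int demo_bytecount;           /* stand-in for Bytecount */

/* Assumed rule: a byte >= 0xA0 is a non-leading byte of a character. */
static int demo_is_continuation(demo_bufbyte b)
{
    return b >= 0xA0;
}

/* Convert a length in bytes into a length in characters by counting
   only the bytes that start a character. */
static demo_charcount demo_bytecount_to_charcount(const demo_bufbyte *p,
                                                  demo_bytecount len)
{
    demo_charcount chars = 0;
    demo_bytecount i;
    for (i = 0; i < len; i++)
        if (!demo_is_continuation(p[i]))
            chars++;
    return chars;
}
```

For pure ASCII text the two counts coincide, which is why code that confuses them often appears to work until it meets multi-byte text.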
2233 | 2271 |
2234 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule | 2272 @node Working With Character and Byte Positions |
2235 @subsection Working With Character and Byte Positions | 2273 @subsection Working With Character and Byte Positions |
2236 | 2274 |
2237 Now that we have defined the basic character-related types, we can look | 2275 Now that we have defined the basic character-related types, we can look |
2238 at the macros and functions designed for work with them and for | 2276 at the macros and functions designed for work with them and for |
2239 conversion between them. Most of these macros are defined in | 2277 conversion between them. Most of these macros are defined in |
2242 learn about them. | 2280 learn about them. |
2243 | 2281 |
2244 @table @code | 2282 @table @code |
2245 @item MAX_EMCHAR_LEN | 2283 @item MAX_EMCHAR_LEN |
2246 @cindex MAX_EMCHAR_LEN | 2284 @cindex MAX_EMCHAR_LEN |
2247 This preprocessor constant is the maximum number of buffer bytes to | 2285 This preprocessor constant is the maximum number of buffer bytes per |
2248 represent an Emacs character in the variable width internal encoding. | 2286 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful |
2249 It is useful when allocating temporary strings to keep a known number of | 2287 when allocating temporary strings to keep a known number of characters. |
2250 characters. For instance: | 2288 For instance: |
2251 | 2289 |
2252 @example | 2290 @example |
2253 @group | 2291 @group |
2254 @{ | 2292 @{ |
2255 Charcount cclen; | 2293 Charcount cclen; |
2353 @example | 2391 @example |
2354 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); | 2392 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); |
2355 @end example | 2393 @end example |
2356 @end table | 2394 @end table |
2357 | 2395 |
2358 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule | 2396 @node Conversion to and from External Data |
2359 @subsection Conversion to and from External Data | 2397 @subsection Conversion to and from External Data |
2360 | 2398 |
2361 When an external function, such as a C library function, returns a | 2399 When an external function, such as a C library function, returns a |
2362 @code{char} pointer, you should almost never treat it as @code{Bufbyte}. | 2400 @code{char} pointer, you should almost never treat it as @code{Bufbyte}. |
2363 This is because these returned strings may contain 8bit characters which | 2401 This is because these returned strings may contain 8bit characters which |
2366 always convert it to an appropriate external encoding, lest the internal | 2404 always convert it to an appropriate external encoding, lest the internal |
2367 stuff (such as the infamous \201 characters) leak out. | 2405 stuff (such as the infamous \201 characters) leak out. |
2368 | 2406 |
2369 The interface to conversion between the internal and external | 2407 The interface to conversion between the internal and external |
2370 representations of text consists of the numerous conversion macros defined in | 2408 representations of text consists of the numerous conversion macros defined in |
2371 @file{buffer.h}. There used to be a fixed set of external formats | 2409 @file{buffer.h}. Before looking at them, we'll look at the external |
2372 supported by these macros, but now any coding system can be used with | 2410 formats supported by these macros. |
2373 these macros. The coding system alias mechanism is used to create the | 2411 |
2374 following logical coding systems, which replace the fixed external | 2412 Currently meaningful formats are @code{FORMAT_BINARY}, |
2375 formats. The (dontusethis-set-symbol-value-handler) mechanism was | 2413 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. Here |
2376 enhanced to make this possible (more work on that is needed - like | 2414 is a description of these. |
2377 removing the @code{dontusethis-} prefix). | |
2378 | 2415 |
2379 @table @code | 2416 @table @code |
2380 @item Qbinary | 2417 @item FORMAT_BINARY |
2381 This is the simplest format and is what we use in the absence of a more | 2418 Binary format. This is the simplest format and is what we use in the |
2382 appropriate format. This converts according to the @code{binary} coding | 2419 absence of a more appropriate format. This converts according to the |
2383 system: | 2420 @code{binary} coding system: |
2384 | 2421 |
2385 @enumerate a | 2422 @enumerate a |
2386 @item | 2423 @item |
2387 On input, bytes 0--255 are converted into (implicitly Latin-1) | 2424 On input, bytes 0--255 are converted into characters 0--255. |
2388 characters 0--255. A non-Mule xemacs doesn't really know about | |
2389 different character sets and the fonts to display them, so the bytes can | |
2390 be treated as text in different 1-byte encodings by simply setting the | |
2391 appropriate fonts. So in a sense, non-Mule xemacs is a multi-lingual | |
2392 editor if, for example, different fonts are used to display text in | |
2393 different buffers, faces, or windows. The specifier mechanism gives the | |
2394 user complete control over this kind of behavior. | |
2395 @item | 2425 @item |
2396 On output, characters 0--255 are converted into bytes 0--255 and other | 2426 On output, characters 0--255 are converted into bytes 0--255 and other |
2397 characters are converted into `~'. | 2427 characters are converted into `X'. |
2398 @end enumerate | 2428 @end enumerate |
2399 | 2429 |
2400 @item Qfile_name | 2430 @item FORMAT_FILENAME |
2401 Format used for filenames. This is user-definable via either the | 2431 Format used for filenames. In the original Mule, this is user-definable |
2402 @code{file-name-coding-system} or @code{pathname-coding-system} (now | 2432 with the @code{pathname-coding-system} variable. For the moment, we |
2403 obsolete) variables. | 2433 just use the @code{binary} coding system. |
2404 | 2434 |
2405 @item Qnative | 2435 @item FORMAT_OS |
2406 Format used for the external Unix environment---@code{argv[]}, stuff | 2436 Format used for the external Unix environment---@code{argv[]}, stuff |
2407 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. | 2437 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. |
2408 Currently this is the same as Qfile_name. The two should be | 2438 |
2409 distinguished for clarity and possible future separation. | 2439 Perhaps should be the same as FORMAT_FILENAME. |
2410 | 2440 |
2411 @item Qctext | 2441 @item FORMAT_CTEXT |
2412 Compound--text format. This is the standard X11 format used for data | 2442 Compound--text format. This is the standard X format used for data |
2413 stored in properties, selections, and the like. This is an 8-bit | 2443 stored in properties, selections, and the like. This is an 8-bit |
2414 no-lock-shift ISO2022 coding system. This is a real coding system, | 2444 no-lock-shift ISO2022 coding system. |
2415 unlike Qfile_name, which is user-definable. | |
2416 @end table | 2445 @end table |
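The two conversion rules given for the binary format amount to a simple mapping. A minimal sketch follows, using hypothetical names (@code{demo_emchar}, @code{demo_binary_decode_byte}, @code{demo_binary_encode_char}) rather than the real conversion code. The replacement byte is a parameter because the text above gives both `~' and `X' for it.

```c
typedef int demo_emchar;   /* stand-in for Emchar: one character code */

/* Input direction: byte 0-255 becomes the character with the same code. */
static demo_emchar demo_binary_decode_byte(unsigned char b)
{
    return (demo_emchar) b;
}

/* Output direction: characters 0-255 map to the same byte; any other
   character cannot be represented and becomes `fallback`. */
static unsigned char demo_binary_encode_char(demo_emchar c,
                                             unsigned char fallback)
{
    return (c >= 0 && c <= 255) ? (unsigned char) c : fallback;
}
```

Decoding followed by encoding returns every byte unchanged, so a round trip through this format does not alter binary data.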
2417 | 2446 |
2418 There are two fundamental macros to convert between external and | 2447 The macros to convert between these formats and the internal format, and |
2419 internal format. | 2448 vice versa, follow. |
2420 | |
2421 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and | |
2422 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments | |
2423 each of these receives are a source type, a source, a sink type, a sink, | |
2424 and a coding system (or a symbol naming a coding system). | |
2425 | |
2426 A typical call looks like | |
2427 @example | |
2428 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name); | |
2429 @end example | |
2430 | |
2431 which means that the contents of the lisp string @code{str} are written | |
2432 to a malloc'ed memory area which will be pointed to by @code{ptr}, after | |
2433 the function returns. The conversion will be done using the | |
2434 @code{file-name} coding system, which will be controlled by the user | |
2435 indirectly by setting or binding the variable | |
2436 @code{file-name-coding-system}. | |
2437 | |
2438 Some sources and sinks require two C variables to specify. We use some | |
2439 preprocessor magic to allow different source and sink types, and even | |
2440 different numbers of arguments to specify different types of sources and | |
2441 sinks. | |
2442 | |
2443 So we can have a call that looks like | |
2444 @example | |
2445 TO_INTERNAL_FORMAT (DATA, (ptr, len), | |
2446 MALLOC, (ptr, len), | |
2447 coding_system); | |
2448 @end example | |
2449 | |
2450 The parenthesized argument pairs are required to make the preprocessor | |
2451 magic work. | |
2452 | |
2453 Here are the different source and sink types: | |
2454 | 2449 |
2455 @table @code | 2450 @table @code |
2456 @item @code{DATA, (ptr, len),} | 2451 @item GET_CHARPTR_INT_DATA_ALLOCA |
2457 input data is a fixed buffer of size @var{len} at address @var{ptr} | 2452 @itemx GET_CHARPTR_EXT_DATA_ALLOCA |
2458 @item @code{ALLOCA, (ptr, len),} | 2453 These two are the most basic conversion macros. |
2459 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr} | 2454 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal |
2460 @item @code{MALLOC, (ptr, len),} | 2455 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way |
2461 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr} | 2456 around. The arguments each of these receives are @var{ptr} (pointer to |
2462 @item @code{C_STRING_ALLOCA, ptr,} | 2457 the text in external format), @var{len} (length of the text in bytes), |
2463 equivalent to @code{ALLOCA (ptr, len_ignored)} on output. | 2458 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which |
2464 @item @code{C_STRING_MALLOC, ptr,} | 2459 new text should be copied), and @var{len_out} (lvalue which will be |
2465 equivalent to @code{MALLOC (ptr, len_ignored)} on output | 2460 assigned the length of the internal text in bytes). The resulting text |
2466 @item @code{C_STRING, ptr,} | 2461 is stored to a stack-allocated buffer. If the text doesn't need |
2467 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input | 2462 changing, these macros will do nothing, except for setting |
2468 @item @code{LISP_STRING, string,} | 2463 @var{len_out}. |
2469 input or output is a Lisp_Object of type string | 2464 |
2470 @item @code{LISP_BUFFER, buffer,} | 2465 The macros above take many arguments which makes them unwieldy. For |
2471 output is written to @code{(point)} in lisp buffer @var{buffer} | 2466 this reason, a number of convenience macros are defined with obvious |
2472 @item @code{LISP_LSTREAM, lstream,} | 2467 functionality, but accepting fewer arguments. The general rule is that |
2473 input or output is a Lisp_Object of type lstream | 2468 macros with @samp{INT} in their name convert text to internal Emacs |
2474 @item @code{LISP_OPAQUE, object,} | 2469 representation, whereas the @samp{EXT} macros convert to external |
2475 input or output is a Lisp_Object of type opaque | 2470 representation. |
2471 | |
2472 @item GET_C_CHARPTR_INT_DATA_ALLOCA | |
2473 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA | |
2474 As their names imply, these macros work on C char pointers, which are | |
2475 zero-terminated, and thus do not need @var{len} or @var{len_out} | |
2476 parameters. | |
2477 | |
2478 @item GET_STRING_EXT_DATA_ALLOCA | |
2479 @itemx GET_C_STRING_EXT_DATA_ALLOCA | |
2480 These two macros convert a Lisp string into an external representation. | |
2481 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA} | |
2482 stores its output to a generic string, providing @var{len_out}, the | |
2483 length of the resulting external string. On the other hand, | |
2484 @code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be | |
2485 satisfied with output string being zero-terminated. | |
2486 | |
2487 Note that for Lisp strings only one conversion direction makes sense. | |
2488 | |
2489 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA | |
2490 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA | |
2491 @itemx GET_STRING_BINARY_DATA_ALLOCA | |
2492 @itemx GET_C_STRING_BINARY_DATA_ALLOCA | |
2493 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA | |
2494 @itemx ... | |
2495 These macros convert internal text to a specific external | |
2496 representation, with the external format being encoded into the name of | |
2497 the macro. Note that the @code{GET_STRING_...} and | |
2498 @code{GET_C_STRING...} macros lack the @samp{EXT} tag, because they | |
2499 only make sense in that direction. | |
2500 | |
2501 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA | |
2502 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA | |
2503 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA | |
2504 @itemx ... | |
2505 These macros convert external text of a specific format to its internal | |
| 2506 representation, with the external format being encoded into the name of |
2507 the macro. | |
2476 @end table | 2508 @end table |
2477 | 2509 |
2478 Often, the data is being converted to a '\0'-byte-terminated string, | 2510 @node General Guidelines for Writing Mule-Aware Code |
2479 which is the format required by many external system C APIs. For these | |
2480 purposes, a source type of @code{C_STRING} or a sink type of | |
2481 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate. | |
2482 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means | |
2483 using (ptr, len) pairs. | |
2484 | |
2485 The sinks to be specified must be lvalues, unless they are the lisp | |
2486 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}. | |
2487 | |
2488 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the | |
2489 resulting text is stored in a stack-allocated buffer, which is | |
2490 automatically freed on returning from the function. However, the sink | |
2491 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed | |
2492 memory. The caller is responsible for freeing this memory using | |
2493 @code{xfree()}. | |
2494 | |
2495 Note that it doesn't make sense for @code{LISP_STRING} to be a source | |
2496 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}. | |
2497 You'll get an assertion failure if you try. | |
2498 | |
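The difference between '\0'-terminated strings and (ptr, len) pairs can be seen in plain C, independent of the XEmacs conversion macros. The following is only a minimal sketch: @code{struct text_span}, @code{make_span}, @code{span_length} and @code{c_string_length} are hypothetical names made up for illustration and are not part of XEmacs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (ptr, len) pair; not an XEmacs type. */
struct text_span
{
  const char *ptr;
  size_t len;
};

/* Length as seen by a '\0'-terminated C API: stops at the first '\0'. */
static size_t
c_string_length (const char *s)
{
  return strlen (s);
}

/* Build a span over a buffer whose size is known to the caller. */
static struct text_span
make_span (const char *ptr, size_t len)
{
  struct text_span span;
  span.ptr = ptr;
  span.len = len;
  return span;
}

/* Length as carried by a (ptr, len) pair: embedded '\0' bytes survive. */
static size_t
span_length (struct text_span span)
{
  return span.len;
}

static void
demo_zero_byte_clean (void)
{
  /* Five bytes of data with a '\0' in the middle, plus a final '\0'
     so that strlen() is safe to call on the buffer. */
  static const char data[] = { 'a', 'b', '\0', 'c', 'd', '\0' };
  struct text_span span = make_span (data, 5);

  assert (c_string_length (data) == 2);  /* truncated at the embedded '\0' */
  assert (span_length (span) == 5);      /* full length is preserved */
  assert (memcmp (span.ptr, "ab\0cd", 5) == 0);
}
```

The '\0'-terminated view silently loses everything after the embedded zero byte, which is why the pair form is preferred whenever the external API does not insist on a C string.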
2499 | |
2500 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule | |
2501 @subsection General Guidelines for Writing Mule-Aware Code | 2511 @subsection General Guidelines for Writing Mule-Aware Code |
2502 | 2512 |
2503 This section contains some general guidance on how to write Mule-aware | 2513 This section contains some general guidance on how to write Mule-aware |
2504 code, as well as some pitfalls you should avoid. | 2514 code, as well as some pitfalls you should avoid. |
2505 | 2515 |
2522 It is extremely important to always convert external data, because | 2532 It is extremely important to always convert external data, because |
2523 XEmacs can crash if unexpected 8bit sequences are copied to its internal | 2533 XEmacs can crash if unexpected 8bit sequences are copied to its internal |
2524 buffers literally. | 2534 buffers literally. |
2525 | 2535 |
2526 This means that when a system function, such as @code{readdir}, returns | 2536 This means that when a system function, such as @code{readdir}, returns |
2527 a string, you may need to convert it using one of the conversion macros | 2537 a string, you need to convert it using one of the conversion macros |
2528 described in the previous chapter, before passing it further to Lisp. | 2538 described in the previous chapter, before passing it further to Lisp. |
2529 | 2539 In the case of @code{readdir}, you would use the |
2530 Actually, most of the basic system functions that accept '\0'-terminated | 2540 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro. |
2531 string arguments, like @code{stat()} and @code{open()}, have been | |
2532 @strong{encapsulated} so that they @emph{always} do internal to | |
2533 external conversion themselves. This means you must pass internally | |
2534 encoded data, typically the @code{XSTRING_DATA} of a Lisp_String to | |
2535 these functions. This is actually a design bug, since it unexpectedly | |
2536 changes the semantics of the system functions. A better design would be | |
2537 to provide separate versions of these system functions that accepted | |
2538 Lisp_Objects which were lisp strings in place of their current | |
2539 @code{char *} arguments. | |
2540 | |
2541 @example | |
2542 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */ | |
2543 @end example | |
2544 | 2541 |
2545 Also note that many internal functions, such as @code{make_string}, | 2542 Also note that many internal functions, such as @code{make_string}, |
2546 accept Bufbytes, which removes the need for them to convert the data | 2543 accept Bufbytes, which removes the need for them to convert the data |
2547 they receive. This increases efficiency because that way external data | 2544 they receive. This increases efficiency because that way external data |
2548 needs to be decoded only once, when it is read. After that, it is | 2545 needs to be decoded only once, when it is read. After that, it is |
2549 passed around in internal format. | 2546 passed around in internal format. |
2550 @end table | 2547 @end table |
2551 | 2548 |
2552 @node An Example of Mule-Aware Code, , General Guidelines for Writing Mule-Aware Code, Coding for Mule | 2549 @node An Example of Mule-Aware Code |
2553 @subsection An Example of Mule-Aware Code | 2550 @subsection An Example of Mule-Aware Code |
2554 | 2551 |
2555 As an example of Mule-aware code, we will analyze the @code{string} | 2552 As an example of Mule-aware code, we shall analyze the |
2556 function, which conses up a Lisp string from the character arguments it | 2553 @code{string} function, which conses up a Lisp string from the character |
2557 receives. Here is the definition, pasted from @code{alloc.c}: | 2554 arguments it receives. Here is the definition, pasted from |
2555 @code{alloc.c}: | |
2558 | 2556 |
2559 @example | 2557 @example |
2560 @group | 2558 @group |
2561 DEFUN ("string", Fstring, 0, MANY, 0, /* | 2559 DEFUN ("string", Fstring, 0, MANY, 0, /* |
2562 Concatenate all the argument characters and make the result a string. | 2560 Concatenate all the argument characters and make the result a string. |
2597 over the XEmacs code. For starters, I recommend | 2595 over the XEmacs code. For starters, I recommend |
2598 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have | 2596 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have |
2599 understood this section of the manual and studied the examples, you can | 2597 understood this section of the manual and studied the examples, you can |
2600 proceed writing new Mule-aware code. | 2598 proceed writing new Mule-aware code. |
2601 | 2599 |
2602 @node Techniques for XEmacs Developers, , Coding for Mule, Rules When Writing New C Code | 2600 @node Techniques for XEmacs Developers |
2603 @section Techniques for XEmacs Developers | 2601 @section Techniques for XEmacs Developers |
2604 | 2602 |
2605 To make a purified XEmacs, do: @code{make puremacs}. | |
2606 To make a quantified XEmacs, do: @code{make quantmacs}. | 2603 To make a quantified XEmacs, do: @code{make quantmacs}. |
2607 | 2604 |
2608 You simply can't dump Quantified and Purified images (unless using the | 2605 You simply can't dump Quantified and Purified images. Run the image |
2609 portable dumper). Purify gets confused when xemacs frees memory in one | 2606 like so: @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}. |
2610 process that was allocated in a @emph{different} process on a different | |
2611 machine! Run it like so: | |
2612 @example | |
2613 temacs -batch -l loadup.el run-temacs @var{xemacs-args...} | |
2614 @end example | |
2615 | 2607 |
2616 Before you go through the trouble, are you compiling with all | 2608 Before you go through the trouble, are you compiling with all |
2617 debugging and error-checking off? If not, try that first. Be warned | 2609 debugging and error-checking off? If not try that first. Be warned |
2618 that while Quantify is directly responsible for quite a few | 2610 that while Quantify is directly responsible for quite a few |
2619 optimizations which have been made to XEmacs, doing a run which | 2611 optimizations which have been made to XEmacs, doing a run which |
2620 generates results which can be acted upon is not necessarily a trivial | 2612 generates results which can be acted upon is not necessarily a trivial |
2621 task. | 2613 task. |
2622 | 2614 |
2651 | 2643 |
2652 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function | 2644 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function |
2653 calls in elisp are especially expensive. Iterating over a long list is | 2645 calls in elisp are especially expensive. Iterating over a long list is |
2654 going to be 30 times faster implemented in C than in Elisp. | 2646 going to be 30 times faster implemented in C than in Elisp. |
2655 | 2647 |
2656 Heavily used small code fragments need to be fast. The traditional way | 2648 To get started debugging XEmacs, take a look at the @file{gdbinit} and |
2657 to implement such code fragments in C is with macros. But macros in C | 2649 @file{dbxrc} files in the @file{src} directory. |
2658 are known to be broken. | 2650 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,, |
2659 | 2651 xemacs-faq, XEmacs FAQ}. |
2660 Macro arguments that are repeatedly evaluated may suffer from repeated | |
2661 side effects or suboptimal performance. | |
2662 | |
2663 Variable names used in macros may collide with caller's variables, | |
2664 causing (at least) unwanted compiler warnings. | |
2665 | |
2666 In order to solve these problems, and maintain statement semantics, one | |
2667 should use the @code{do @{ ... @} while (0)} trick while trying to | |
2668 reference macro arguments exactly once using local variables. | |
2669 | |
2670 Let's take a look at this poor macro definition: | |
2671 | |
2672 @example | |
2673 #define MARK_OBJECT(obj) \ | |
2674 if (!marked_p (obj)) mark_object (obj), did_mark = 1 | |
2675 @end example | |
2676 | |
2677 This macro evaluates its argument twice, and also fails if used like this: | |
2678 @example | |
2679 if (flag) MARK_OBJECT (obj); else do_something(); | |
2680 @end example | |
2681 | |
2682 A much better definition is | |
2683 | |
2684 @example | |
2685 #define MARK_OBJECT(obj) do @{ \ | |
2686 Lisp_Object mo_obj = (obj); \ | |
2687 if (!marked_p (mo_obj)) \ | |
2688 @{ \ | |
2689 mark_object (mo_obj); \ | |
2690 did_mark = 1; \ | |
2691 @} \ | |
2692 @} while (0) | |
2693 @end example | |
2694 | |
2695 Notice the elimination of double evaluation by using the local variable | |
2696 with the obscure name. Writing safe and efficient macros requires great | |
2697 care. The one problem with macros that cannot be portably worked around | |
2698 is, since a C block has no value, a macro used as an expression rather | |
2699 than a statement cannot use the techniques just described to avoid | |
2700 multiple evaluation. | |
2701 | |
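The double evaluation described above can be made visible by passing an argument with a side effect. This is a toy sketch only: here @code{marked_p} and @code{mark_object} are stand-in stubs operating on small integers, not the real XEmacs functions, and @code{next_object} and the two counting helpers are hypothetical names introduced for the demonstration.

```c
#include <assert.h>
#include <string.h>

static int marked[8];     /* toy mark bits; stand-in for real objects */
static int did_mark;
static int evaluations;   /* how often the macro argument was evaluated */

static int marked_p (int obj) { return marked[obj]; }
static void mark_object (int obj) { marked[obj] = 1; }

/* Argument with a side effect, so repeated evaluation can be counted. */
static int next_object (void) { return evaluations++; }

/* Poor version: OBJ is evaluated twice when the object is unmarked. */
#define MARK_OBJECT_POOR(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1

/* Safe version: OBJ is evaluated exactly once, into a local variable. */
#define MARK_OBJECT_SAFE(obj) do {      \
  int mo_obj = (obj);                   \
  if (!marked_p (mo_obj))               \
    {                                   \
      mark_object (mo_obj);             \
      did_mark = 1;                     \
    }                                   \
} while (0)

static int
count_evaluations_poor (void)
{
  memset (marked, 0, sizeof (marked));  /* start with nothing marked */
  evaluations = 0;
  MARK_OBJECT_POOR (next_object ());
  return evaluations;
}

static int
count_evaluations_safe (void)
{
  memset (marked, 0, sizeof (marked));
  evaluations = 0;
  MARK_OBJECT_SAFE (next_object ());
  return evaluations;
}
```

With the poor macro the test and the marking operate on two different objects, because the argument expression runs twice; with the safe macro it runs once.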
2702 In most cases where a macro has function semantics, an inline function | |
2703 is a better implementation technique. Modern compiler optimizers tend | |
2704 to inline functions even if they have no @code{inline} keyword, and | |
2705 configure magic ensures that the @code{inline} keyword can be safely | |
2706 used as an additional compiler hint. Inline functions used in a single | |
2707 .c file are easy. The function must already be defined to be | |
2708 @code{static}. Just add another @code{inline} keyword to the | |
2709 definition. | |
2710 | |
2711 @example | |
2712 inline static int | |
2713 heavily_used_small_function (int arg) | |
2714 @{ | |
2715 ... | |
2716 @} | |
2717 @end example | |
2718 | |
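The advantage of an inline function over a macro in expression context can be sketched as follows. This assumes a compiler that accepts the C99 @code{inline} keyword directly; @code{SQUARE_MACRO}, @code{square_inline}, @code{next_value} and the two counting helpers are hypothetical names, chosen only for this illustration.

```c
#include <assert.h>

static int calls;   /* how often the argument expression was evaluated */

static int next_value (void) { return ++calls; }

/* Expression macro: ARG appears twice, so it is evaluated twice. */
#define SQUARE_MACRO(arg) ((arg) * (arg))

/* Inline function: the argument is evaluated once, at the call site. */
static inline int
square_inline (int arg)
{
  return arg * arg;
}

static int
macro_calls (void)
{
  calls = 0;
  (void) SQUARE_MACRO (next_value ());
  return calls;
}

static int
inline_calls (void)
{
  calls = 0;
  (void) square_inline (next_value ());
  return calls;
}
```

Unlike the @code{do @{ ... @} while (0)} form, the inline function can be used wherever an expression is allowed and still evaluates its argument once.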
2719 Inline functions in header files are trickier, because we would like to | |
2720 make the following optimization if the function is @emph{not} inlined | |
2721 (for example, because we're compiling for debugging). We would like the | |
2722 function to be defined externally exactly once, and each calling | |
2723 translation unit would create an external reference to the function, | |
2724 instead of including a definition of the inline function in the object | |
2725 code of every translation unit that uses it. This optimization is | |
2726 currently only available for gcc. But you don't have to worry about the | |
2727 trickiness; just define your inline functions in header files using this | |
2728 pattern: | |
2729 | |
2730 @example | |
2731 INLINE_HEADER int | |
2732 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg); | |
2733 INLINE_HEADER int | |
2734 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg) | |
2735 @{ | |
2736 ... | |
2737 @} | |
2738 @end example | |
2739 | |
2740 The declaration right before the definition is to prevent warnings when | |
2741 compiling with @code{gcc -Wmissing-declarations}. I consider issuing | |
2742 this warning for inline functions a gcc bug, but the gcc maintainers disagree. | |
2743 | |
2744 Every header which contains inline functions, either directly by using | |
2745 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must | |
2746 be added to @file{inline.c}'s includes to make the optimization | |
2747 described above work. (Optimization note: if all INLINE_HEADER | |
2748 functions are in fact inlined in all translation units, then the linker | |
2749 can just discard @code{inline.o}, since it contains only unreferenced code). | |
2750 | |
2751 To get started debugging XEmacs, take a look at the @file{.gdbinit} and | |
2752 @file{.dbxrc} files in the @file{src} directory. See the section in the | |
2753 XEmacs FAQ on How to Debug an XEmacs problem with a debugger. | |
2754 | 2652 |
2755 After making source code changes, run @code{make check} to ensure that | 2653 After making source code changes, run @code{make check} to ensure that |
2756 you haven't introduced any regressions. If you want to make xemacs more | 2654 you haven't introduced any regressions. If you're feeling ambitious, |
2757 reliable, please improve the test suite in @file{tests/automated}. | 2655 you can try to improve the test suite in @file{tests/automated}. |
2758 | |
2759 Did you make sure you didn't introduce any new compiler warnings? | |
2760 | |
2761 Before submitting a patch, please try compiling at least once with | |
2762 | |
2763 @example | |
2764 configure --with-mule --with-union-type --error-checking=all | |
2765 @end example | |
2766 | 2656 |
2767 Here are things to know when you create a new source file: | 2657 Here are things to know when you create a new source file: |
2768 | 2658 |
2769 @itemize @bullet | 2659 @itemize @bullet |
2770 @item | 2660 @item |
2773 | 2663 |
2774 @item | 2664 @item |
2775 Generated header files should be included using the @code{#include <...>} syntax, | 2665 Generated header files should be included using the @code{#include <...>} syntax, |
2776 not the @code{#include "..."} syntax. The generated headers are: | 2666 not the @code{#include "..."} syntax. The generated headers are: |
2777 | 2667 |
2778 @file{config.h sheap-adjust.h paths.h Emacs.ad.h} | 2668 @file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h} |
2779 | 2669 |
2780 The basic rule is that you should assume builds using @code{--srcdir} | 2670 The basic rule is that you should assume builds using @code{--srcdir} |
2781 and the @code{#include <...>} syntax needs to be used when the | 2671 and the @code{#include <...>} syntax needs to be used when the |
2782 to-be-included generated file is in a potentially different directory | 2672 to-be-included generated file is in a potentially different directory |
2783 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."} | 2673 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."} |
2787 @item | 2677 @item |
2788 Header files should @emph{not} include @code{<config.h>} and | 2678 Header files should @emph{not} include @code{<config.h>} and |
2789 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that | 2679 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that |
2790 use it to do so. | 2680 use it to do so. |
2791 | 2681 |
2682 @item | |
2683 If the header uses @code{INLINE}, either directly or through | |
2684 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s | |
2685 includes. | |
2686 | |
2687 @item | |
2688 Try compiling at least once with | |
2689 | |
2690 @example | |
2691 gcc --with-mule --with-union-type --error-checking=all | |
2692 @end example | |
2693 | |
2694 @item | |
2695 Did I mention that you should run the test suite? | |
2696 @example | |
2697 make check | |
2698 @end example | |
2792 @end itemize | 2699 @end itemize |
2793 | 2700 |
2794 Here is a checklist of things to do when creating a new lisp object type | |
2795 named @var{foo}: | |
2796 | |
2797 @enumerate | |
2798 @item | |
2799 create @var{foo}.h | |
2800 @item | |
2801 create @var{foo}.c | |
2802 @item | |
2803 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c} | |
2804 @item | |
2805 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h} | |
2806 @item | |
2807 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c} | |
2808 @item | |
2809 add definitions of macros like @code{CHECK_@var{FOO}} and | |
2810 @code{@var{FOO}P} to @file{@var{foo}.h} | |
2811 @item | |
2812 add the new type index to @code{enum lrecord_type} | |
2813 @item | |
2814 add a DEFINE_LRECORD_IMPLEMENTATION call to @file{@var{foo}.c} | |
2815 @item | |
2816 add an INIT_LRECORD_IMPLEMENTATION call to @code{syms_of_@var{foo}.c} | |
2817 @end enumerate | |
2818 | 2701 |
2819 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top | 2702 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top |
2820 @chapter A Summary of the Various XEmacs Modules | 2703 @chapter A Summary of the Various XEmacs Modules |
2821 | 2704 |
2822 This is accurate as of XEmacs 20.0. | 2705 This is accurate as of XEmacs 20.0. |
2834 * Modules for Interfacing with the Operating System:: | 2717 * Modules for Interfacing with the Operating System:: |
2835 * Modules for Interfacing with X Windows:: | 2718 * Modules for Interfacing with X Windows:: |
2836 * Modules for Internationalization:: | 2719 * Modules for Internationalization:: |
2837 @end menu | 2720 @end menu |
2838 | 2721 |
2839 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, A Summary of the Various XEmacs Modules | 2722 @node Low-Level Modules |
2840 @section Low-Level Modules | 2723 @section Low-Level Modules |
2841 | 2724 |
2842 @example | 2725 @example |
2843 config.h | 2726 config.h |
2844 @end example | 2727 @end example |
3058 | 2941 |
3059 This is not currently used. | 2942 This is not currently used. |
3060 | 2943 |
3061 | 2944 |
3062 | 2945 |
3063 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, A Summary of the Various XEmacs Modules | 2946 @node Basic Lisp Modules |
3064 @section Basic Lisp Modules | 2947 @section Basic Lisp Modules |
3065 | 2948 |
3066 @example | 2949 @example |
3067 emacsfns.h | 2950 emacsfns.h |
3068 lisp-disunion.h | 2951 lisp-disunion.h |
3093 declarations (i.e. a simple declaration like @code{struct foo;} where | 2976 declarations (i.e. a simple declaration like @code{struct foo;} where |
3094 the structure itself is defined elsewhere) should be placed into the | 2977 the structure itself is defined elsewhere) should be placed into the |
3095 typedefs section as necessary. | 2978 typedefs section as necessary. |
3096 | 2979 |
3097 @file{lrecord.h} contains the basic structures and macros that implement | 2980 @file{lrecord.h} contains the basic structures and macros that implement |
3098 all record-type Lisp objects---i.e. all objects whose type is a field | 2981 all record-type Lisp objects -- i.e. all objects whose type is a field |
3099 in their C structure, which includes all objects except the few most | 2982 in their C structure, which includes all objects except the few most |
3100 basic ones. | 2983 basic ones. |
3101 | 2984 |
3102 @file{lisp.h} contains prototypes for most of the exported functions in | 2985 @file{lisp.h} contains prototypes for most of the exported functions in |
3103 the various modules. Lisp primitives defined using @code{DEFUN} that | 2986 the various modules. Lisp primitives defined using @code{DEFUN} that |
3111 | 2994 |
3112 | 2995 |
3113 | 2996 |
3114 @example | 2997 @example |
3115 alloc.c | 2998 alloc.c |
2999 pure.c | |
3000 puresize.h | |
3116 @end example | 3001 @end example |
3117 | 3002 |
3118 The large module @file{alloc.c} implements all of the basic allocation and | 3003 The large module @file{alloc.c} implements all of the basic allocation and |
3119 garbage collection for Lisp objects. The most commonly used Lisp | 3004 garbage collection for Lisp objects. The most commonly used Lisp |
3120 objects are allocated in chunks, similar to the Blocktype data type | 3005 objects are allocated in chunks, similar to the Blocktype data type |
3127 not dependent on any particular object type, and interfaces to | 3012 not dependent on any particular object type, and interfaces to |
3128 particular types of objects using a standardized interface of | 3013 particular types of objects using a standardized interface of |
3129 type-specific methods. This scheme is a fundamental principle of | 3014 type-specific methods. This scheme is a fundamental principle of |
3130 object-oriented programming and is heavily used throughout XEmacs. The | 3015 object-oriented programming and is heavily used throughout XEmacs. The |
3131 great advantage of this is that it allows for a clean separation of | 3016 great advantage of this is that it allows for a clean separation of |
3132 functionality into different modules---new classes of Lisp objects, new | 3017 functionality into different modules -- new classes of Lisp objects, new |
3133 event interfaces, new device types, new stream interfaces, etc. can be | 3018 event interfaces, new device types, new stream interfaces, etc. can be |
3134 added transparently without affecting code anywhere else in XEmacs. | 3019 added transparently without affecting code anywhere else in XEmacs. |
3135 Because the different subsystems are divided into general and specific | 3020 Because the different subsystems are divided into general and specific |
3136 code, adding a new subtype within a subsystem will in general not | 3021 code, adding a new subtype within a subsystem will in general not |
3137 require changes to the generic subsystem code or affect any of the other | 3022 require changes to the generic subsystem code or affect any of the other |
3138 subtypes in the subsystem; this provides a great deal of robustness to | 3023 subtypes in the subsystem; this provides a great deal of robustness to |
3139 the XEmacs code. | 3024 the XEmacs code. |
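The idea of generic code calling type-specific methods can be sketched in a few lines of C. This is a simplified illustration, not the actual XEmacs implementation: @code{struct toy_object}, @code{struct toy_methods} and all function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object;

/* Per-type method table; a toy stand-in for a real type description. */
struct toy_methods
{
  const char *name;
  int (*mark) (struct toy_object *obj);  /* returns number of objects marked */
};

struct toy_object
{
  const struct toy_methods *methods;
  int marked;
  struct toy_object *child;   /* used only by the "pair" type */
};

/* Generic code: knows nothing about the individual types. */
static int
generic_mark (struct toy_object *obj)
{
  return obj->methods->mark (obj);
}

/* Type-specific method for a leaf object with no children. */
static int
mark_leaf (struct toy_object *obj)
{
  if (obj->marked)
    return 0;
  obj->marked = 1;
  return 1;
}

/* Type-specific method for an object that refers to another object. */
static int
mark_pair (struct toy_object *obj)
{
  if (obj->marked)
    return 0;
  obj->marked = 1;
  return 1 + (obj->child ? generic_mark (obj->child) : 0);
}

static const struct toy_methods leaf_methods = { "leaf", mark_leaf };
static const struct toy_methods pair_methods = { "pair", mark_pair };

static int
demo_generic_mark (void)
{
  struct toy_object leaf = { &leaf_methods, 0, NULL };
  struct toy_object pair = { &pair_methods, 0, &leaf };
  return generic_mark (&pair);   /* marks the pair and its child */
}
```

Adding a third type in this sketch would mean writing one new method and one new table; @code{generic_mark} and the existing types stay untouched, which is the separation the paragraph above describes.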
3025 | |
3026 @cindex pure space | |
3027 @file{pure.c} contains the declaration of the @dfn{purespace} array. | |
3028 Pure space is a hack used to place some constant Lisp data into the code | |
3029 segment of the XEmacs executable, even though the data needs to be | |
3030 initialized through function calls. (See above in section VIII for more | |
| 3031 info about this.) During startup, certain sorts of data are |
3032 automatically copied into pure space, and other data is copied manually | |
3033 in some of the basic Lisp files by calling the function @code{purecopy}, | |
3034 which copies the object if possible (this only works in temacs, of | |
3035 course) and returns the new object. In particular, while temacs is | |
3036 executing, the Lisp reader automatically copies all compiled-function | |
3037 objects that it reads into pure space. Since compiled-function objects | |
3038 are large, are never modified, and typically comprise the majority of | |
3039 the contents of a compiled-Lisp file, this works well. While XEmacs is | |
3040 running, any attempt to modify an object that resides in pure space | |
3041 causes an error. Objects in pure space are never garbage collected -- | |
3042 almost all of the time, they're intended to be permanent, and in any | |
3043 case you can't write into pure space to set the mark bits. | |
3044 | |
3045 @file{puresize.h} contains the declaration of the size of the pure space | |
3046 array. This depends on the optional features that are compiled in, any | |
3047 extra purespace requested by the user at compile time, and certain other | |
3048 factors (e.g. 64-bit machines need more pure space because their Lisp | |
3049 objects are larger). The smallest size that suffices should be used, so | |
3050 that there's no wasted space. If there's not enough pure space, you | |
3051 will get an error during the build process, specifying how much more | |
3052 pure space is needed. | |
3053 | |
3140 | 3054 |
3141 | 3055 |
3142 @example | 3056 @example |
3143 eval.c | 3057 eval.c |
3144 backtrace.h | 3058 backtrace.h |
3189 @end example | 3103 @end example |
3190 | 3104 |
3191 @file{symbols.c} implements the handling of symbols, obarrays, and | 3105 @file{symbols.c} implements the handling of symbols, obarrays, and |
3192 retrieving the values of symbols. Much of the code is devoted to | 3106 retrieving the values of symbols. Much of the code is devoted to |
3193 handling the special @dfn{symbol-value-magic} objects that define | 3107 handling the special @dfn{symbol-value-magic} objects that define |
3194 special types of variables---this includes buffer-local variables, | 3108 special types of variables -- this includes buffer-local variables, |
3195 variable aliases, variables that forward into C variables, etc. This | 3109 variable aliases, variables that forward into C variables, etc. This |
3196 module is initialized extremely early (right after @file{alloc.c}), | 3110 module is initialized extremely early (right after @file{alloc.c}), |
3197 because it is here that the basic symbols @code{t} and @code{nil} are | 3111 because it is here that the basic symbols @code{t} and @code{nil} are |
3198 created, and those symbols are used everywhere throughout XEmacs. | 3112 created, and those symbols are used everywhere throughout XEmacs. |
3199 | 3113 |
3233 structures. Note that the byte-code @emph{compiler} is written in Lisp. | 3147 structures. Note that the byte-code @emph{compiler} is written in Lisp. |
3234 | 3148 |
3235 | 3149 |
3236 | 3150 |
3237 | 3151 |
3238 @node Modules for Standard Editing Operations, Editor-Level Control Flow Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules | 3152 @node Modules for Standard Editing Operations |
3239 @section Modules for Standard Editing Operations | 3153 @section Modules for Standard Editing Operations |
3240 | 3154 |
3241 @example | 3155 @example |
3242 buffer.c | 3156 buffer.c |
3243 buffer.h | 3157 buffer.h |
3403 This module implements the undo mechanism for tracking buffer changes. | 3317 This module implements the undo mechanism for tracking buffer changes. |
3404 Most of this could be implemented in Lisp. | 3318 Most of this could be implemented in Lisp. |
3405 | 3319 |
3406 | 3320 |
3407 | 3321 |
3408 @node Editor-Level Control Flow Modules, Modules for the Basic Displayable Lisp Objects, Modules for Standard Editing Operations, A Summary of the Various XEmacs Modules | 3322 @node Editor-Level Control Flow Modules |
3409 @section Editor-Level Control Flow Modules | 3323 @section Editor-Level Control Flow Modules |
3410 | 3324 |
3411 @example | 3325 @example |
3412 event-Xt.c | 3326 event-Xt.c |
3413 event-stream.c | 3327 event-stream.c |
3468 @example | 3382 @example |
3469 keyboard.c | 3383 keyboard.c |
3470 @end example | 3384 @end example |
3471 | 3385 |
3472 @file{keyboard.c} contains functions that implement the actual editor | 3386 @file{keyboard.c} contains functions that implement the actual editor |
3473 command loop---i.e. the event loop that cyclically retrieves and | 3387 command loop -- i.e. the event loop that cyclically retrieves and |
3474 dispatches events. This code is also rather tricky, just like | 3388 dispatches events. This code is also rather tricky, just like |
3475 @file{event-stream.c}. | 3389 @file{event-stream.c}. |
3476 | 3390 |
3477 | 3391 |
3478 | 3392 |
3501 bootstrapping implementations early in temacs, before the echo-area Lisp | 3415 bootstrapping implementations early in temacs, before the echo-area Lisp |
3502 code is loaded). | 3416 code is loaded). |
3503 | 3417 |
3504 | 3418 |
3505 | 3419 |
3506 @node Modules for the Basic Displayable Lisp Objects, Modules for other Display-Related Lisp Objects, Editor-Level Control Flow Modules, A Summary of the Various XEmacs Modules | 3420 @node Modules for the Basic Displayable Lisp Objects |
3507 @section Modules for the Basic Displayable Lisp Objects | 3421 @section Modules for the Basic Displayable Lisp Objects |
3508 | 3422 |
3509 @example | 3423 @example |
3510 device-ns.h | 3424 device-ns.h |
3511 device-stream.c | 3425 device-stream.c |
3575 is part of the redisplay mechanism or the code for particular object | 3489 is part of the redisplay mechanism or the code for particular object |
3576 types such as scrollbars. | 3490 types such as scrollbars. |
3577 | 3491 |
3578 | 3492 |
3579 | 3493 |
3580 @node Modules for other Display-Related Lisp Objects, Modules for the Redisplay Mechanism, Modules for the Basic Displayable Lisp Objects, A Summary of the Various XEmacs Modules | 3494 @node Modules for other Display-Related Lisp Objects |
3581 @section Modules for other Display-Related Lisp Objects | 3495 @section Modules for other Display-Related Lisp Objects |
3582 | 3496 |
3583 @example | 3497 @example |
3584 faces.c | 3498 faces.c |
3585 faces.h | 3499 faces.h |
3636 | 3550 |
3637 @example | 3551 @example |
3638 font-lock.c | 3552 font-lock.c |
3639 @end example | 3553 @end example |
3640 | 3554 |
3641 This file provides C support for syntax highlighting---i.e. | 3555 This file provides C support for syntax highlighting -- i.e. |
3642 highlighting different syntactic constructs of a source file in | 3556 highlighting different syntactic constructs of a source file in |
3643 different colors, for easy reading. The C support is provided so that | 3557 different colors, for easy reading. The C support is provided so that |
3644 this is fast. | 3558 this is fast. |
3645 | 3559 |
3646 | 3560 |
3654 | 3568 |
3655 These modules decode GIF-format image files, for use with glyphs. | 3569 These modules decode GIF-format image files, for use with glyphs. |
3656 | 3570 |
3657 | 3571 |
3658 | 3572 |
3659 @node Modules for the Redisplay Mechanism, Modules for Interfacing with the File System, Modules for other Display-Related Lisp Objects, A Summary of the Various XEmacs Modules | 3573 @node Modules for the Redisplay Mechanism |
3660 @section Modules for the Redisplay Mechanism | 3574 @section Modules for the Redisplay Mechanism |
3661 | 3575 |
3662 @example | 3576 @example |
3663 redisplay-output.c | 3577 redisplay-output.c |
3664 redisplay-tty.c | 3578 redisplay-tty.c |
3726 These files provide some miscellaneous TTY-output functions and should | 3640 These files provide some miscellaneous TTY-output functions and should |
3727 probably be merged into @file{redisplay-tty.c}. | 3641 probably be merged into @file{redisplay-tty.c}. |
3728 | 3642 |
3729 | 3643 |
3730 | 3644 |
3731 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for the Redisplay Mechanism, A Summary of the Various XEmacs Modules | 3645 @node Modules for Interfacing with the File System |
3732 @section Modules for Interfacing with the File System | 3646 @section Modules for Interfacing with the File System |
3733 | 3647 |
3734 @example | 3648 @example |
3735 lstream.c | 3649 lstream.c |
3736 lstream.h | 3650 lstream.h |
3827 for expanding symbolic links, on systems that don't implement it or have | 3741 for expanding symbolic links, on systems that don't implement it or have |
3828 a broken implementation. | 3742 a broken implementation. |
3829 | 3743 |
3830 | 3744 |
3831 | 3745 |
3832 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, A Summary of the Various XEmacs Modules | 3746 @node Modules for Other Aspects of the Lisp Interpreter and Object System |
3833 @section Modules for Other Aspects of the Lisp Interpreter and Object System | 3747 @section Modules for Other Aspects of the Lisp Interpreter and Object System |
3834 | 3748 |
3835 @example | 3749 @example |
3836 elhash.c | 3750 elhash.c |
3837 elhash.h | 3751 elhash.h |
3941 @cindex mark method | 3855 @cindex mark method |
3942 Opaque objects can also have an arbitrary @dfn{mark method} associated | 3856 Opaque objects can also have an arbitrary @dfn{mark method} associated |
3943 with them, in case the block of memory contains other Lisp objects that | 3857 with them, in case the block of memory contains other Lisp objects that |
3944 need to be marked for garbage-collection purposes. (If you need other | 3858 need to be marked for garbage-collection purposes. (If you need other |
3945 object methods, such as a finalize method, you should just go ahead and | 3859 object methods, such as a finalize method, you should just go ahead and |
3946 create a new Lisp object type---it's not hard.) | 3860 create a new Lisp object type -- it's not hard.) |
3947 | 3861 |
3948 | 3862 |
3949 | 3863 |
3950 @example | 3864 @example |
3951 abbrev.c | 3865 abbrev.c |
3989 various security applications on the Internet. | 3903 various security applications on the Internet. |
3990 | 3904 |
3991 | 3905 |
3992 | 3906 |
3993 | 3907 |
3994 @node Modules for Interfacing with the Operating System, Modules for Interfacing with X Windows, Modules for Other Aspects of the Lisp Interpreter and Object System, A Summary of the Various XEmacs Modules | 3908 @node Modules for Interfacing with the Operating System |
3995 @section Modules for Interfacing with the Operating System | 3909 @section Modules for Interfacing with the Operating System |
3996 | 3910 |
3997 @example | 3911 @example |
3998 callproc.c | 3912 callproc.c |
3999 process.c | 3913 process.c |
4228 These modules are used for MS-DOS support, which does not work in | 4142 These modules are used for MS-DOS support, which does not work in |
4229 XEmacs. | 4143 XEmacs. |
4230 | 4144 |
4231 | 4145 |
4232 | 4146 |
4233 @node Modules for Interfacing with X Windows, Modules for Internationalization, Modules for Interfacing with the Operating System, A Summary of the Various XEmacs Modules | 4147 @node Modules for Interfacing with X Windows |
4234 @section Modules for Interfacing with X Windows | 4148 @section Modules for Interfacing with X Windows |
4235 | 4149 |
4236 @example | 4150 @example |
4237 Emacs.ad.h | 4151 Emacs.ad.h |
4238 @end example | 4152 @end example |
4370 | 4284 |
4371 Don't touch this code; something is liable to break if you do. | 4285 Don't touch this code; something is liable to break if you do. |
4372 | 4286 |
4373 | 4287 |
4374 | 4288 |
4375 @node Modules for Internationalization, , Modules for Interfacing with X Windows, A Summary of the Various XEmacs Modules | 4289 @node Modules for Internationalization |
4376 @section Modules for Internationalization | 4290 @section Modules for Internationalization |
4377 | 4291 |
4378 @example | 4292 @example |
4379 mule-canna.c | 4293 mule-canna.c |
4380 mule-ccl.c | 4294 mule-ccl.c |
4447 Asian-language support, and is not currently used. | 4361 Asian-language support, and is not currently used. |
4448 | 4362 |
4449 | 4363 |
4450 | 4364 |
4451 | 4365 |
4452 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top | 4366 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top |
4453 @chapter Allocation of Objects in XEmacs Lisp | 4367 @chapter Allocation of Objects in XEmacs Lisp |
4454 | 4368 |
4455 @menu | 4369 @menu |
4456 * Introduction to Allocation:: | 4370 * Introduction to Allocation:: |
4457 * Garbage Collection:: | 4371 * Garbage Collection:: |
4458 * GCPROing:: | 4372 * GCPROing:: |
4459 * Garbage Collection - Step by Step:: | |
4460 * Integers and Characters:: | 4373 * Integers and Characters:: |
4461 * Allocation from Frob Blocks:: | 4374 * Allocation from Frob Blocks:: |
4462 * lrecords:: | 4375 * lrecords:: |
4463 * Low-level allocation:: | 4376 * Low-level allocation:: |
4377 * Pure Space:: | |
4464 * Cons:: | 4378 * Cons:: |
4465 * Vector:: | 4379 * Vector:: |
4466 * Bit Vector:: | 4380 * Bit Vector:: |
4467 * Symbol:: | 4381 * Symbol:: |
4468 * Marker:: | 4382 * Marker:: |
4469 * String:: | 4383 * String:: |
4470 * Compiled Function:: | 4384 * Compiled Function:: |
4471 @end menu | 4385 @end menu |
4472 | 4386 |
4473 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp | 4387 @node Introduction to Allocation |
4474 @section Introduction to Allocation | 4388 @section Introduction to Allocation |
4475 | 4389 |
4476 Emacs Lisp, like all Lisps, has garbage collection. This means that | 4390 Emacs Lisp, like all Lisps, has garbage collection. This means that |
4477 the programmer never has to explicitly free (destroy) an object; it | 4391 the programmer never has to explicitly free (destroy) an object; it |
4478 happens automatically when the object becomes inaccessible. Most | 4392 happens automatically when the object becomes inaccessible. Most |
4489 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. | 4403 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. |
4490 Some Lisp objects, especially those that are primarily used internally, | 4404 Some Lisp objects, especially those that are primarily used internally, |
4491 have no corresponding Lisp primitives. Every Lisp object, though, | 4405 have no corresponding Lisp primitives. Every Lisp object, though, |
4492 has at least one C primitive for creating it. | 4406 has at least one C primitive for creating it. |
4493 | 4407 |
4494 Recall from section (VII) that a Lisp object, as stored in a 32-bit or | 4408 Recall from section (VII) that a Lisp object, as stored in a 32-bit |
4495 64-bit word, has a few tag bits, and a ``value'' that occupies the | 4409 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that |
4496 remainder of the bits. We can separate the different Lisp object types | 4410 occupies the remainder of the bits. We can separate the different |
4497 into three broad categories: | 4411 Lisp object types into four broad categories: |
4498 | 4412 |
4499 @itemize @bullet | 4413 @itemize @bullet |
4500 @item | 4414 @item |
4501 (a) Those for whom the value directly represents the contents of the | 4415 (a) Those for whom the value directly represents the contents of the |
4502 Lisp object. Only two types are in this category: integers and | 4416 Lisp object. Only two types are in this category: integers and |
4503 characters. No special allocation or garbage collection is necessary | 4417 characters. No special allocation or garbage collection is necessary |
4504 for such objects. Lisp objects of these types do not need to be | 4418 for such objects. Lisp objects of these types do not need to be |
4505 @code{GCPRO}ed. | 4419 @code{GCPRO}ed. |
4506 @end itemize | 4420 @end itemize |
4507 | 4421 |
4422 In the remaining three categories, the value is a pointer to a | |
4423 structure. | |
4424 | |
4425 @itemize @bullet | |
4426 @item | |
4427 @cindex frob block | |
4428 (b) Those for whom the tag directly specifies the type. Recall that | |
4429 there are only three tag bits; this means that at most five types can be | |
4430 specified this way. The most commonly-used types are stored in this | |
4431 format; this includes conses, strings, vectors, and sometimes symbols. | |
4432 With the exception of vectors, objects in this category are allocated in | |
4433 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into | |
4434 individual objects. This saves a lot on malloc overhead, since there | |
4435 are typically quite a lot of these objects around, and the objects are | |
4436 small. (A cons, for example, occupies 8 bytes on 32-bit machines -- 4 | |
4437 bytes for each of the two objects it contains.) Vectors are individually | |
4438 @code{malloc()}ed since they are of variable size. (It would be | |
4439 possible, and desirable, to allocate vectors of certain small sizes out | |
4440 of frob blocks, but it isn't currently done.) Strings are handled | |
4441 specially: Each string is allocated in two parts, a fixed size structure | |
4442 containing a length and a data pointer, and the actual data of the | |
4443 string. The former structure is allocated in frob blocks as usual, and | |
4444 the latter data is stored in @dfn{string chars blocks} and is relocated | |
4445 during garbage collection to eliminate holes. | |
4446 @end itemize | |
4447 | |
4508 In the remaining two categories, the type is stored in the object | 4448 In the remaining two categories, the type is stored in the object |
4509 itself. The tag for all such objects is the generic @dfn{lrecord} | 4449 itself. The tag for all such objects is the generic @dfn{lrecord} |
4510 (Lisp_Type_Record) tag. The first bytes of the object's structure are an | 4450 (Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines) |
4511 integer (actually a char) characterising the object's type and some | 4451 of the object's structure are a pointer to a structure that describes |
4512 flags, in particular the mark bit used for garbage collection. A | 4452 the object's type, which includes method pointers and a pointer to a |
4513 structure describing the type is accessible through the | 4453 string naming the type. Note that it's possible to save some space by |
4514 lrecord_implementation_table indexed with said integer. This structure | 4454 using a one- or two-byte tag, rather than a four- or eight-byte pointer |
4515 includes the method pointers and a pointer to a string naming the type. | 4455 to store the type, but it's not clear it's worth making the change. |
4516 | 4456 |
4517 @itemize @bullet | 4457 @itemize @bullet |
4518 @item | 4458 @item |
4519 (b) Those lrecords that are allocated in frob blocks (see above). This | 4459 (c) Those lrecords that are allocated in frob blocks (see above). This |
4520 includes the objects that are most common and relatively small, and | 4460 includes the objects that are most common and relatively small, and |
4521 includes conses, strings, subrs, floats, compiled functions, symbols, | 4461 includes floats, compiled functions, symbols (when not in category (b)), |
4522 extents, events, and markers. With the cleanup of frob blocks done in | 4462 extents, events, and markers. With the cleanup of frob blocks done in |
4523 19.12, it's not terribly hard to add more objects to this category, but | 4463 19.12, it's not terribly hard to add more objects to this category, but |
4524 it's a bit trickier than adding an object type to type (c) (esp. if the | 4464 it's a bit trickier than adding an object type to type (d) (esp. if the |
4525 object needs a finalization method), and is not likely to save much | 4465 object needs a finalization method), and is not likely to save much |
4526 space unless the object is small and there are many of them. (In fact, | 4466 space unless the object is small and there are many of them. (In fact, |
4527 if there are very few of them, it might actually waste space.) | 4467 if there are very few of them, it might actually waste space.) |
4528 @item | 4468 @item |
4529 (c) Those lrecords that are individually @code{malloc()}ed. These are | 4469 (d) Those lrecords that are individually @code{malloc()}ed. These are |
4530 called @dfn{lcrecords}. All other types are in this category. Adding a | 4470 called @dfn{lcrecords}. All other types are in this category. Adding a |
4531 new type to this category is comparatively easy, and all types added | 4471 new type to this category is comparatively easy, and all types added |
4532 since 19.8 (when the current allocation scheme was devised, by Richard | 4472 since 19.8 (when the current allocation scheme was devised, by Richard |
4533 Mlynarik), with the exception of the character type, have been in this | 4473 Mlynarik), with the exception of the character type, have been in this |
4534 category. | 4474 category. |
4535 @end itemize | 4475 @end itemize |
4536 | 4476 |
4537 Note that bit vectors are a bit of a special case. They are | 4477 Note that bit vectors are a bit of a special case. They are |
4538 simple lrecords as in category (b), but are individually @code{malloc()}ed | 4478 simple lrecords as in category (c), but are individually @code{malloc()}ed |
4539 like vectors. You can basically view them as exactly like vectors | 4479 like vectors. You can basically view them as exactly like vectors |
4540 except that their type is stored in lrecord fashion rather than | 4480 except that their type is stored in lrecord fashion rather than |
4541 in directly-tagged fashion. | 4481 in directly-tagged fashion. |
4542 | 4482 |
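The lrecord layout described above (a small type index plus flags at the start of the object, used to look up a per-type description) can be sketched roughly as follows. This is a toy model, not the XEmacs definitions: every `toy_` name, the flag layout, and the two example types are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical per-type description: a name and an optional mark method. */
struct toy_lrecord_implementation {
  const char *name;
  void (*marker)(void *obj);          /* may be NULL in this sketch */
};

/* Hypothetical header placed first in every lrecord-style object. */
struct toy_lrecord_header {
  unsigned char type;                 /* index into the table below */
  unsigned char flags;                /* bit 0 plays the role of the mark bit */
};

#define TOY_MARK_BIT 0x01

enum { TOY_TYPE_CONS, TOY_TYPE_MARKER, TOY_TYPE_COUNT };

/* Stand-in for a table indexed by the header's type field. */
static const struct toy_lrecord_implementation
toy_implementation_table[TOY_TYPE_COUNT] = {
  [TOY_TYPE_CONS]   = { "cons",   NULL },
  [TOY_TYPE_MARKER] = { "marker", NULL },
};

/* An example object: the header must be the first member. */
struct toy_cons {
  struct toy_lrecord_header header;
  void *car, *cdr;
};

/* Recover the type name of any object that starts with the header. */
static const char *toy_type_name(const void *obj)
{
  const struct toy_lrecord_header *h = obj;
  return toy_implementation_table[h->type].name;
}
```

Because the type is an index rather than a pointer, setting or clearing the mark flag never disturbs the type lookup, which is one plausible reason to keep the two in separate fields.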
4543 | 4483 Note that FSF Emacs redesigned their object system in 19.29 to follow |
4544 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp | 4484 a similar scheme. However, given RMS's expressed dislike for data |
4485 abstraction, the FSF scheme is not nearly as clean or as easy to | |
4486 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type | |
4487 (d) @code{Lisp_Vectorlike}, with separate tags for each, although | |
4488 @code{Lisp_Vectorlike} is also used for vectors.) | |
4489 | |
4490 @node Garbage Collection | |
4545 @section Garbage Collection | 4491 @section Garbage Collection |
4546 @cindex garbage collection | 4492 @cindex garbage collection |
4547 | 4493 |
4548 @cindex mark and sweep | 4494 @cindex mark and sweep |
4549 Garbage collection is simple in theory but tricky to implement. | 4495 Garbage collection is simple in theory but tricky to implement. |
4557 that ``all of memory'' means all currently allocated objects. | 4503 that ``all of memory'' means all currently allocated objects. |
4558 Traversing all these objects means traversing all frob blocks, | 4504 Traversing all these objects means traversing all frob blocks, |
4559 all vectors (which are chained in one big list), and all | 4505 all vectors (which are chained in one big list), and all |
4560 lcrecords (which are likewise chained). | 4506 lcrecords (which are likewise chained). |
4561 | 4507 |
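As a rough illustration of the sweep over chained objects, the sketch below keeps every allocated record on one list, frees the unmarked ones, and clears the mark on survivors. It is a simplified model under stated assumptions: the `toy_` names and the boolean mark field are hypothetical and do not reflect the actual structures in `alloc.c`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record: every allocation is chained into one list. */
struct toy_lcrecord {
  struct toy_lcrecord *next;
  bool marked;                       /* set by the mark phase */
};

static struct toy_lcrecord *toy_all_lcrecords = NULL;

/* Allocate a record and push it onto the chain. */
static struct toy_lcrecord *toy_alloc_lcrecord(void)
{
  struct toy_lcrecord *r = malloc(sizeof *r);
  if (!r)
    abort();
  r->marked = false;
  r->next = toy_all_lcrecords;
  toy_all_lcrecords = r;
  return r;
}

/* Sweep: free every unmarked record and clear the mark on survivors.
   Returns the number of records freed. */
static size_t toy_sweep_lcrecords(void)
{
  size_t freed = 0;
  struct toy_lcrecord **link = &toy_all_lcrecords;
  while (*link) {
    struct toy_lcrecord *r = *link;
    if (r->marked) {
      r->marked = false;             /* ready for the next collection */
      link = &r->next;
    } else {
      *link = r->next;               /* unlink, then release */
      free(r);
      freed++;
    }
  }
  return freed;
}
```

Walking the list through a pointer to the link field avoids a special case for the head of the chain.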
4562 Garbage collection can be invoked explicitly by calling | 4508 Note that, when an object is marked, the mark has to occur |
4563 @code{garbage-collect} but is also called automatically by @code{eval}, | 4509 inside of the object's structure, rather than in the 32-bit |
4564 once a certain amount of memory has been allocated since the last | 4510 @code{Lisp_Object} holding the object's pointer; i.e. you can't just |
4565 garbage collection (according to @code{gc-cons-threshold}). | 4511 set the pointer's mark bit. This is because there may be many |
4566 | 4512 pointers to the same object. This means that the method of |
4567 | 4513 marking an object can differ depending on the type. The |
4568 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp | 4514 different marking methods are approximately as follows: |
4515 | |
4516 @enumerate | |
4517 @item | |
4518 For conses, the mark bit of the car is set. | |
4519 @item | |
4520 For strings, the mark bit of the string's plist is set. | |
4521 @item | |
4522 For symbols when not lrecords, the mark bit of the | |
4523 symbol's plist is set. | |
4524 @item | |
4525 For vectors, the length is negated after adding 1. | |
4526 @item | |
4527 For lrecords, the pointer to the structure describing | |
4528 the type is changed (see below). | |
4529 @item | |
4530 Integers and characters do not need to be marked, since | |
4531 no allocation occurs for them. | |
4532 @end enumerate | |
4533 | |
4534 The details of this are in the @code{mark_object()} function. | |
4535 | |
4536 Note that any code that operates during garbage collection has | |
4537 to be especially careful because of the fact that some objects | |
4538 may be marked and as such may not look like they normally do. | |
4539 In particular: | |
4540 | |
4541 @itemize @bullet | |
4542 Some object pointers may have their mark bit set. This will make | |
4543 @code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with | |
4544 this. | |
4545 @item | |
4546 Even if you clear the mark bit, @code{FOOBARP()} will still fail | |
4547 for lrecords because the implementation pointer has been | |
4548 changed (see below). @code{GC_FOOBARP()} will correctly deal with | |
4549 this. | |
4550 @item | |
4551 Vectors have their size field munged, so anything that | |
4552 looks at this field will fail. | |
4553 @item | |
4554 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object | |
4555 pointers with their mark bit set, because the logical shift operations | |
4556 that remove the tag also remove the mark bit. | |
4557 @end itemize | |
4558 | |
4559 Finally, note that garbage collection can be invoked explicitly | |
4560 by calling @code{garbage-collect} but is also called automatically | |
4561 by @code{eval}, once a certain amount of memory has been allocated | |
4562 since the last garbage collection (according to @code{gc-cons-threshold}). | |
4563 | |
4564 @node GCPROing | |
4569 @section @code{GCPRO}ing | 4565 @section @code{GCPRO}ing |
4570 | 4566 |
4571 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs | 4567 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs |
4572 internals. The basic idea is that whenever garbage collection | 4568 internals. The basic idea is that whenever garbage collection |
4573 occurs, all in-use objects must be reachable somehow or | 4569 occurs, all in-use objects must be reachable somehow or |
4574 other from one of the roots of accessibility. The roots | 4570 other from one of the roots of accessibility. The roots |
4575 of accessibility are: | 4571 of accessibility are: |
4576 | 4572 |
4577 @enumerate | 4573 @enumerate |
4578 @item | 4574 @item |
4579 All objects that have been @code{staticpro()}d or | 4575 All objects that have been @code{staticpro()}d. This is used for |
4580 @code{staticpro_nodump()}ed. This is used for any global C variables | 4576 any global C variables that hold Lisp objects. A call to |
4581 that hold Lisp objects. A call to @code{staticpro()} happens implicitly | 4577 @code{staticpro()} happens implicitly as a result of any symbols |
4582 as a result of any symbols declared with @code{defsymbol()} and any | 4578 declared with @code{defsymbol()} and any variables declared with |
4583 variables declared with @code{DEFVAR_FOO()}. You need to explicitly | 4579 @code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()} |
4584 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module) | 4580 (in the @code{vars_of_foo()} method of a module) for other global |
4585 for other global C variables holding Lisp objects. (This typically | 4581 C variables holding Lisp objects. (This typically includes |
4586 includes internal lists and such things.). Use | 4582 internal lists and such things.) |
4587 @code{staticpro_nodump()} only in the rare cases when you do not want | |
4588 the pointed variable to be saved at dump time but rather recompute it at | |
4589 startup. | |
4590 | 4583 |
4591 Note that @code{obarray} is one of the @code{staticpro()}d things. | 4584 Note that @code{obarray} is one of the @code{staticpro()}d things. |
4592 Therefore, all functions and variables get marked through this. | 4585 Therefore, all functions and variables get marked through this. |
4593 @item | 4586 @item |
4594 Any shadowed bindings that are sitting on the @code{specpdl} stack. | 4587 Any shadowed bindings that are sitting on the @code{specpdl} stack. |
4628 variable @samp{gcprolist} pointing to the head of the list and the nth | 4621 variable @samp{gcprolist} pointing to the head of the list and the nth |
4629 local @code{gcpro} variable pointing to the first @code{gcpro} variable | 4622 local @code{gcpro} variable pointing to the first @code{gcpro} variable |
4630 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an | 4623 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an |
4631 lvalue, and the @code{struct gcpro} local variable contains a pointer to | 4624 lvalue, and the @code{struct gcpro} local variable contains a pointer to |
4632 this lvalue. This is why things will mess up badly if you don't pair up | 4625 this lvalue. This is why things will mess up badly if you don't pair up |
4633 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with | 4626 the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with |
4634 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local | 4627 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local |
4635 @code{Lisp_Object} variables in no-longer-active stack frames. | 4628 @code{Lisp_Object} variables in no-longer-active stack frames. |
4636 | 4629 |
4637 @item | 4630 @item |
4638 It is actually possible for a single @code{struct gcpro} to | 4631 It is actually possible for a single @code{struct gcpro} to |
4719 anything that looks like a reference to an object as a reference. This | 4712 anything that looks like a reference to an object as a reference. This |
4720 will result in a few objects not getting collected when they should, but | 4713 will result in a few objects not getting collected when they should, but |
4721 it obviates the need for @code{GCPRO}ing, and allows garbage collection | 4714 it obviates the need for @code{GCPRO}ing, and allows garbage collection |
4722 to happen at any point at all, such as during object allocation. | 4715 to happen at any point at all, such as during object allocation. |
4723 | 4716 |
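The chaining of `struct gcpro` records through `gcprolist` can be pictured with the toy model below. It is only a sketch: the `toy_` names, the field names, and the use of `long` as a stand-in for `Lisp_Object` are assumptions for illustration, and the real `GCPRO`/`UNGCPRO` macros are not reproduced here.

```c
#include <stddef.h>

typedef long toy_object;             /* stand-in for Lisp_Object */

/* Hypothetical protection record, normally a local variable. */
struct toy_gcpro {
  struct toy_gcpro *next;            /* record of the enclosing protection */
  toy_object *var;                   /* address of the protected lvalue */
  int nvars;                         /* how many consecutive objects at var */
};

static struct toy_gcpro *toy_gcprolist = NULL;

/* Roughly what protecting a variable does: link the record at the head. */
static void toy_gcpro_push(struct toy_gcpro *g, toy_object *var, int nvars)
{
  g->next = toy_gcprolist;
  g->var = var;
  g->nvars = nvars;
  toy_gcprolist = g;
}

/* Roughly what unprotecting does: restore the head saved in the record. */
static void toy_gcpro_pop(struct toy_gcpro *g)
{
  toy_gcprolist = g->next;
}

/* What a collector would walk: every protected lvalue, innermost first. */
static int toy_count_protected(void)
{
  int n = 0;
  for (struct toy_gcpro *g = toy_gcprolist; g != NULL; g = g->next)
    n += g->nvars;
  return n;
}
```

If a function returns without the matching pop, `toy_gcprolist` keeps pointing into a dead stack frame, which is the failure mode the text warns about.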
4724 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp | 4717 @node Integers and Characters |
4725 @section Garbage Collection - Step by Step | |
4726 @cindex garbage collection step by step | |
4727 | |
4728 @menu | |
4729 * Invocation:: | |
4730 * garbage_collect_1:: | |
4731 * mark_object:: | |
4732 * gc_sweep:: | |
4733 * sweep_lcrecords_1:: | |
4734 * compact_string_chars:: | |
4735 * sweep_strings:: | |
4736 * sweep_bit_vectors_1:: | |
4737 @end menu | |
4738 | |
4739 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step | |
4740 @subsection Invocation | |
4741 @cindex garbage collection, invocation | |
4742 | |
4743 The first thing that anyone should know about garbage collection is | |
4744 when and how the garbage collector is invoked. One might think that this | |
4745 could happen every time new memory is allocated, e.g. new objects are | |
4746 created, but this is @emph{not} the case. Instead, we have the following | |
4747 situation: | |
4748 | |
4749 The entry point of any process of garbage collection is an invocation | |
4750 of the function @code{garbage_collect_1} in file @code{alloc.c}. The | |
4751 invocation can occur @emph{explicitly} by calling the function | |
4752 @code{Fgarbage_collect} (in addition this function provides information | |
4753 about the freed memory), or can occur @emph{implicitly} in four different | |
4754 situations: | |
4755 @enumerate | |
4756 @item | |
4757 In function @code{main_1} in file @code{emacs.c}. This function is called | |
4758 at each startup of xemacs. The garbage collection is invoked after all | |
4759 initial creations are completed, but only if a special internal error | |
4760 checking-constant @code{ERROR_CHECK_GC} is defined. | |
4761 @item | |
4762 In function @code{disksave_object_finalization} in file | |
4763 @code{alloc.c}. The only purpose of this function is to clear the | |
4764 objects from memory which need not be stored with xemacs when we dump out | |
4765 an executable. This is only done by @code{Fdump_emacs} or by | |
4766 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The | |
4767 actual clearing is accomplished by making these objects unreachable and | |
4768 starting a garbage collection. The function is only used while building | |
4769 xemacs. | |
4770 @item | |
4771 In function @code{Feval / eval} in file @code{eval.c}. Each time the | |
4772 well known and often used function eval is called to evaluate a form, | |
4773 one of the first things that can happen is a call of | |
4774 @code{garbage_collect_1}. There exist three global variables, | |
4775 @code{consing_since_gc} (counts the created cons-cells since the last | |
4776 garbage collection), @code{gc_cons_threshold} (a specified threshold | |
4777 after which a garbage collection occurs) and @code{always_gc}. If | |
4778 @code{always_gc} is set or if the threshold is exceeded, the garbage | |
4779 collection will start. | |
4780 @item | |
4781 In function @code{Ffuncall / funcall} in file @code{eval.c}. This | |
4782 function evaluates calls of elisp functions and works according to | |
4783 @code{Feval}. | |
4784 @end enumerate | |
4785 | |
4786 The upshot is that garbage collection can basically occur everywhere | |
4787 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or | |
4788 through another function. Since calls to these two functions are hidden | |
4789 in various other functions, many calls to @code{garbage_collect_1} are | |
4790 not obviously foreseeable, and therefore unexpected. Instances where | |
4791 they are used that are worth remembering are various elisp commands, as | |
4792 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while}, | |
4793 @code{setq}, etc., miscellaneous @code{gui_item_...} functions, | |
4794 everything related to @code{eval} (@code{Feval_buffer}, @code{call0}, | |
4795 ...) and inside @code{Fsignal}. The latter is used to handle signals, as | |
4796 for example the ones raised by every @code{QUIT} macro triggered after | |
4797 pressing Ctrl-g. | |
4798 | |
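The implicit trigger in `eval` and `funcall` amounts to a counter compared against a threshold. The sketch below models that check; the `toy_` globals merely mirror the names `consing_since_gc`, `gc_cons_threshold` and `always_gc` from the text, and their types, units, and the exact comparison are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the three globals named in the text. */
static size_t toy_consing_since_gc = 0;
static size_t toy_gc_cons_threshold = 500000;
static bool toy_always_gc = false;
static unsigned toy_gc_runs = 0;

/* Hypothetical placeholder for garbage_collect_1(). */
static void toy_garbage_collect(void)
{
  toy_gc_runs++;
  toy_consing_since_gc = 0;          /* a collection resets the counter */
}

/* Allocation only counts; it never collects by itself. */
static void toy_note_allocation(size_t amount)
{
  toy_consing_since_gc += amount;
}

/* What an eval/funcall entry point would do before evaluating a form. */
static void toy_maybe_gc(void)
{
  if (toy_always_gc || toy_consing_since_gc > toy_gc_cons_threshold)
    toy_garbage_collect();
}
```

This split is why C code that never reaches `eval` or `funcall` can allocate freely without a collection starting underneath it.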
4799 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step | |
4800 @subsection @code{garbage_collect_1} | |
4801 @cindex @code{garbage_collect_1} | |
4802 | |
4803 We can now describe exactly what happens after the invocation takes | |
4804 place. | |
4805 @enumerate | |
4806 @item | |
4807 There are several cases in which the garbage collector is left immediately: | |
4808 when we are already garbage collecting (@code{gc_in_progress}), when | |
4809 the garbage collection is somehow forbidden | |
4810 (@code{gc_currently_forbidden}), when we are currently displaying something | |
4811 (@code{in_display}) or when we are preparing for the armageddon of the | |
4812 whole system (@code{preparing_for_armageddon}). | |
4813 @item | |
4814 Next the correct frame in which to put | |
4815 all the output occurring during garbage collecting is determined. In | |
4816 order to be able to restore the old display's state after displaying the | |
4817 message, some data about the current cursor position has to be | |
4818 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take | |
4819 care of that. | |
4820 @item | |
4821 The state of @code{gc_currently_forbidden} must be restored after | |
4822 the garbage collection, no matter what happens during the process. We | |
4823 accomplish this by @code{record_unwind_protect}ing the suitable function | |
4824 @code{restore_gc_inhibit} together with the current value of | |
4825 @code{gc_currently_forbidden}. | |
4826 @item | |
4827 If we are currently running an interactive xemacs session, the next step | |
4828 is simply to show the garbage collector's cursor/message. | |
4829 @item | |
4830 The following steps are the intrinsic steps of the garbage collector, | |
4831 therefore @code{gc_in_progress} is set. | |
4832 @item | |
4833 For debugging purposes, it is possible to copy the current C stack | |
4834 frame. However, this seems to be a currently unused feature. | |
4835 @item | |
4836 Before actually starting to go over all live objects, references to | |
4837 objects that are no longer used are pruned. We only have to do this for events | |
4838 (@code{clear_event_resource}) and for specifiers | |
4839 (@code{cleanup_specifiers}). | |
4840 @item | |
4841 Now the mark phase begins and marks all accessible elements. In order to | |
4842 start from | |
4843 all slots that serve as roots of accessibility, the function | |
4844 @code{mark_object} is called for each root individually to go out from | |
4845 there to mark all reachable objects. All roots that are traversed are | |
4846 shown in their processed order: | |
4847 @itemize @bullet | |
4848 @item | |
4849 all constant symbols and static variables that are registered via | |
4850 @code{staticpro}@ in the array @code{staticvec}. | |
4851 @xref{Adding Global Lisp Variables}. | |
4852 @item | |
4853 all Lisp objects that are created in C functions and that must be | |
4854 protected from freeing them. They are registered in the global | |
4855 list @code{gcprolist}. | |
4856 @xref{GCPROing}. | |
4857 @item | |
4858 all local variables (i.e. their name fields @code{symbol} and old | |
4859 values @code{old_values}) that are bound during the evaluation by the Lisp | |
4860 engine. They are stored in @code{specbinding} structs pushed on a stack | |
4861 called @code{specpdl}. | |
4862 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}. | |
4863 @item | |
4864 all catch blocks that the Lisp engine encounters during the evaluation | |
4865 cause the creation of structs @code{catchtag} inserted in the list | |
4866 @code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields | |
4867 are freshly created objects and therefore have to be marked. | |
4868 @xref{Catch and Throw}. | |
4869 @item | |
4870 every function application pushes new structs @code{backtrace} | |
4871 on the call stack of the Lisp engine (@code{backtrace_list}). The unique | |
4872 parts that have to be marked are the fields for each function | |
4873 (@code{function}) and all their arguments (@code{args}). | |
4874 @xref{Evaluation}. | |
4875 @item | |
4876 all objects that are used by the redisplay engine that must not be freed | |
4877 are marked by a special function called @code{mark_redisplay} (in | |
4878 @code{redisplay.c}). | |
4879 @item | |
4880 all objects created for profiling purposes are allocated by C functions | |
4881 instead of using the lisp allocation mechanisms. In order to survive | |
4882 the sweep phase, they also have to be marked | |
4883 manually. That is done by the function @code{mark_profiling_info}. | |
4884 @end itemize | |
4885 @item | |
4886 Hash tables in XEmacs are among the special objects that | |
4887 make use of a concept often called 'weak pointers'. | |
4888 To make a long story short, pointers of this kind are not followed | |
4889 when the live objects are determined during garbage collection. | |
4890 Any object referenced only by weak pointers is collected | |
4891 anyway, and the reference to it is cleared. In hash tables there are | |
4892 different usage patterns of them, manifesting in different types of hash | |
4893 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak' | |
4894 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each | |
4895 clearing entries depending on different conditions. More information can | |
4896 be found in the documentation to the function @code{make-hash-table}. | |
4897 | |
4898 Because there are complicated dependency rules about when and what to | |
4899 mark while processing weak hash tables, the standard @code{marker} | |
4900 method is only active if it is marking non-weak hash tables. As soon as | |
4901 a weak component is in the table, the hash table entries are ignored | |
4902 while marking. Instead, each of them is marked separately by the | |
4903 function @code{finish_marking_weak_hash_tables}. This function iterates | |
4904 over each hash table entry @code{hentries} for each weak hash table in | |
4905 @code{Vall_weak_hash_tables}. Depending on the type of a table, the | |
4906 appropriate action is performed. | |
4907 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked, | |
4908 everything reachable from the @code{value} component is marked. If it is | |
4909 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is | |
4910 already marked, the marking starts beginning only from the | |
4911 @code{key} component. | |
4912 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car | |
4913 of the key entry is already marked, we mark both the @code{key} and | |
4914 @code{value} components. | |
4915 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK} | |
4916 and the car of the value component is already marked, again both the | |
4917 @code{key} and the @code{value} components get marked. | |
4918 | |
4919 There are also lists with comparable properties, called weak | |
4920 lists. They come in different types called | |
4921 @code{simple}, @code{assoc}, @code{key-assoc} and | |
4922 @code{value-assoc}. You can find further details about them in the | |
4923 description of the function @code{make-weak-list}. The scheme of their | |
4924 marking is similar: all weak lists are listed in @code{Vall_weak_lists}, | |
4925 therefore we iterate over them. The marking is advanced until we hit an | |
4926 already marked pair. Then we know that during a former run all | |
4927 the rest has been marked completely. Again, depending on the special | |
4928 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE} | |
4929 and the elem is marked, we mark the @code{cons} part. If it is a | |
4930 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and | |
4931 cdr, we mark the @code{cons} and the @code{elem}. If it is a | |
4932 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of | |
4933 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is | |
4934 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked | |
4935 cdr of the elem, we mark both the @code{cons} and the @code{elem}. | |
4936 | |
4937 Marking objects reachable from weak hash tables and weak lists can | |
4938 cause other objects to become marked, which in turn may require further | |
4939 marking of other weak objects. Both finishing functions are therefore | |
4940 rerun as long as previously unmarked objects get freshly marked. | |
4941 | |
4942 @item | |
4943 After completing the special marking for the weak hash tables and for the weak | |
4944 lists, all entries that point to objects that are going to be swept in | |
4945 the further process are useless, and therefore have to be removed from | |
4946 the table or the list. | |
4947 | |
4948 The function @code{prune_weak_hash_tables} does the job for weak hash | |
4949 tables. Totally unmarked hash tables are removed from the list | |
4950 @code{Vall_weak_hash_tables}. The other ones are treated more carefully | |
4951 by scanning over all entries and removing one as soon as one of | |
4952 the components @code{key} and @code{value} is unmarked. | |
4953 | |
4954 The same idea applies to the weak lists. It is accomplished by | |
4955 @code{prune_weak_lists}: An unmarked list is pruned from | |
4956 @code{Vall_weak_lists} immediately. A marked list is treated more | |
4957 carefully by going over it and removing just the unmarked pairs. | |
4958 | |
4959 @item | |
4960 The function @code{prune_specifiers} checks all listed specifiers held | |
4961 in @code{Vall_specifiers} and removes the ones from the lists that are | |
4962 unmarked. | |
4963 | |
4964 @item | |
4965 All syntax tables are stored in a list called | |
4966 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks | |
4967 through it and unlinks the tables that are unmarked. | |
4968 | |
4969 @item | |
4970 Next comes the complete sweeping, done by the function | |
4971 @code{gc_sweep}, which does the bulk of the work. | |
4972 @item | |
4973 First, all the variables with respect to garbage collection are | |
4974 reset. @code{consing_since_gc} - the counter of the created cells since | |
4975 the last garbage collection - is set back to 0, and | |
4976 @code{gc_in_progress} is not @code{true} anymore. | |
4977 @item | |
4978 In case the session is interactive, the displayed cursor and message are | |
4979 removed again. | |
4980 @item | |
4981 The state of @code{gc_inhibit} is restored to the former value by | |
4982 unwinding the stack. | |
4983 @item | |
4984 A small memory reserve is always held back that can be reached by | |
4985 @code{breathing_space}. If nothing more is left, we create a new reserve | |
4986 and exit. | |
4987 @end enumerate | |
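
The weak marking and pruning steps above can be sketched in C as follows.
This is only a minimal illustration for a key-weak table, not the code in
XEmacs: the types @code{struct node} and @code{struct weak_entry} and all
function names ending in @code{_sketch} are hypothetical, every object has at
most one strong reference (@code{ref}), and the table is a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: a mark flag and at most one strong reference. */
struct node { int marked; struct node *ref; };

/* Hypothetical entry of a key-weak table. */
struct weak_entry { struct node *key; struct node *value; };

/* Mark everything reachable from n; return how many objects were
   freshly marked. */
int mark_from_sketch(struct node *n)
{
  int fresh = 0;
  while (n != NULL && !n->marked) {
    n->marked = 1;
    fresh++;
    n = n->ref;
  }
  return fresh;
}

/* One pass of the key-weak rule: if the key is already marked, mark
   everything reachable from the value. */
int finish_marking_key_weak_sketch(struct weak_entry *e, size_t n)
{
  int fresh = 0;
  size_t i;
  for (i = 0; i < n; i++)
    if (e[i].key != NULL && e[i].key->marked)
      fresh += mark_from_sketch(e[i].value);
  return fresh;
}

/* Clear every entry whose key or value stayed unmarked; return the
   number of entries that remain. */
size_t prune_key_weak_sketch(struct weak_entry *e, size_t n)
{
  size_t i, kept = 0;
  for (i = 0; i < n; i++) {
    if (e[i].key == NULL)
      continue;                       /* already empty */
    if (e[i].key->marked && e[i].value != NULL && e[i].value->marked)
      kept++;
    else
      e[i].key = e[i].value = NULL;   /* entry refers to garbage */
  }
  return kept;
}

/* Repeat the marking pass until nothing new gets marked, then prune. */
size_t process_weak_table_sketch(struct weak_entry *e, size_t n)
{
  while (finish_marking_key_weak_sketch(e, n) > 0)
    ;
  return prune_key_weak_sketch(e, n);
}
```

The loop in @code{process_weak_table_sketch} is the point of the example: an
entry whose key only becomes marked through another entry's value is picked
up in a later pass.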
4988 | |
4989 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step | |
4990 @subsection @code{mark_object} | |
4991 @cindex @code{mark_object} | |
4992 | |
4993 The first thing that is checked while marking an object is whether the | |
4994 object is a real Lisp object (@code{Lisp_Type_Record}) or just an integer | |
4995 or a character. Integers and characters are the only two types that are | |
4996 stored directly - without another level of indirection, and therefore they | |
4997 don't have to be marked and collected. | |
4998 @xref{How Lisp Objects Are Represented in C}. | |
4999 | |
5000 The second case is the one we have to handle. It is the one when we are | |
5001 dealing with a pointer to a Lisp object. But there are also three | |
5002 possibilities that prevent us from doing anything while marking: The | |
5003 object is read only, which prevents it from being garbage collected, | |
5004 i.e. marked (@code{C_READONLY_RECORD_HEADER_P}). The object in question is | |
5005 already marked, and need not be marked a second time (checked by | |
5006 @code{MARKED_RECORD_HEADER_P}). Or it is a special, unmarkable object | |
5007 (@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that | |
5008 sit in some const space, and can therefore not be marked, see | |
5009 @code{this_one_is_unmarkable} in @code{alloc.c}). | |
5010 | |
5011 Now, the actual marking is feasible. We do so by first using the macro | |
5012 @code{MARK_RECORD_HEADER} to mark the object itself (actually the | |
5013 special flag in the lrecord header), and then calling its special marker | |
5014 "method" @code{marker} if available. The marker method marks every | |
5015 other object that is in reach from our current object. Note that these | |
5016 marker methods should not call @code{mark_object} recursively, but | |
5017 instead should return the next object from where further marking has to | |
5018 be performed. | |
5019 | |
5020 In case another object was returned, as mentioned before, we reiterate | |
5021 the whole @code{mark_object} process beginning with this next object. | |
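
The loop described in this subsection can be sketched in C as follows. This
is a minimal illustration under simplifying assumptions, not the function in
@code{alloc.c}: the type @code{struct obj}, its fields and the names
@code{mark_object_sketch} and @code{mark_next_sketch} are hypothetical, and
each object refers to at most one other object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object model; not the real XEmacs structures. */
struct obj;
typedef struct obj *(*marker_fn)(struct obj *);

struct obj {
  int is_immediate;  /* stands in for integers and characters */
  int readonly;      /* stands in for the read-only check */
  int marked;        /* stands in for the mark flag in the header */
  marker_fn marker;  /* type-specific marker method, may be NULL */
  struct obj *next;  /* the single object this one refers to */
};

/* Marker method: does not recurse, only reports where to continue. */
struct obj *mark_next_sketch(struct obj *o)
{
  return o->next;
}

void mark_object_sketch(struct obj *o)
{
  while (o != NULL) {
    if (o->is_immediate || o->readonly || o->marked)
      return;                /* nothing to do for these objects */
    o->marked = 1;           /* mark the object itself */
    if (o->marker == NULL)
      return;                /* no marker method: no children to follow */
    o = o->marker(o);        /* continue with the object the marker returned */
  }
}
```

Because the marker returns the next object instead of calling the marking
function itself, a long chain of objects is marked in a loop and does not
consume C stack; a cycle terminates at the first object that is already
marked.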
5022 | |
5023 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step | |
5024 @subsection @code{gc_sweep} | |
5025 @cindex @code{gc_sweep} | |
5026 | |
5027 The job of this function is to free all unmarked records from memory. As | |
5028 we know, there are different types of objects implemented and managed, and | |
5029 consequently different ways to free them from memory. | |
5030 @xref{Introduction to Allocation}. | |
5031 | |
5032 We start with all objects stored through @code{lcrecords}. All | |
5033 bulkier objects are allocated and handled using that scheme of | |
5034 @code{lcrecords}. Each object is @code{malloc}ed separately | |
5035 instead of placing it in one of the contiguous frob blocks. The types | |
5036 that are currently stored | |
5037 using @code{alloc_lcrecord} and | |
5038 @code{make_lcrecord_list} are: vectors, buffers, | |
5039 char-table, char-table-entry, console, weak-list, database, device, | |
5040 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face, | |
5041 coding-system, frame, image-instance, glyph, popup-data, gui-item, | |
5042 keymap, charset, color_instance, font_instance, opaque, opaque-list, | |
5043 process, range-table, specifier, symbol-value-buffer-local, | |
5044 symbol-value-lisp-magic, symbol-value-varalias, toolbar-button, | |
5045 tooltalk-message, tooltalk-pattern, window, and window-configuration. We | |
5046 take care of them first | |
5047 in order to be able to handle and to finalize items stored in them more | |
5048 easily. The function @code{sweep_lcrecords_1} as described below is | |
5049 doing the whole job for us. | |
5050 @xref{lrecords}, for a description of the internals. | |
5051 | |
5052 Our next candidates are the other objects that behave quite differently | |
5053 than everything else: the strings. They consist of two parts, a | |
5054 fixed-size portion (@code{struct Lisp_String}) holding the string's | |
5055 length, its property list and a pointer to the second part, and the | |
5056 actual string data, which is stored in string-chars blocks comparable to | |
5057 frob blocks. In these blocks, the data is not only freed, but the | |
5058 holes are also compacted away, i.e. all strings are relocated together. | |
5059 @xref{String}. This compacting phase is performed by the function | |
5060 @code{compact_string_chars}; the actual sweeping by the function | |
5061 @code{sweep_strings} is described below. | |
5062 | |
5063 After that, the other types are swept step by step using functions | |
5064 @code{sweep_conses}, @code{sweep_bit_vectors_1}, | |
5065 @code{sweep_compiled_functions}, @code{sweep_floats}, | |
5066 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and | |
5067 @code{sweep_events}. They are the fixed-size types cons, floats, | |
5068 compiled-functions, symbol, marker, extent, and event stored in | |
5069 so-called "frob blocks", and therefore we can basically do the same on | |
5070 objects of every type, using the same macros, defined specifically to | |
5071 handle everything with respect to fixed-size blocks. The only fixed-size | |
5072 type that is not handled here is the fixed-size portion of strings, | |
5073 because we took special care of them earlier. | |
5074 | |
5075 The only big exception is bit vectors, which are stored differently and | |
5076 are therefore treated differently by the function @code{sweep_bit_vectors_1} | |
5077 described later. | |
5078 | |
5079 At first, we need some brief information about how | |
5080 these fixed-size types are managed in general, in order to understand | |
5081 how the sweeping is done. They all have a fixed size, and are therefore | |
5082 stored in big blocks of memory - allocated at once - that can hold a | |
5083 certain number of objects of one type. The macro | |
5084 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for | |
5085 every type. More precisely, we have the block struct | |
5086 (holding a pointer to the previous block @code{prev} and the | |
5087 objects in @code{block[]}), a pointer to the current block | |
5088 (@code{current_..._block}) and its last index | |
5089 (@code{current_..._block_index}), and a pointer to the free list that | |
5090 will be created. There are also a macro @code{FIXED_TYPE_FROM_BLOCK} and some | |
5091 related macros that are used to obtain a new object, either from | |
5092 the free list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused object | |
5093 of that type stored or by allocating a completely new block using | |
5094 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}. | |
5095 | |
5096 The rest works as follows: all of them define a | |
5097 macro @code{UNMARK_...} that is used to unmark the object. They define a | |
5098 macro @code{ADDITIONAL_FREE_...} that defines additional work that has | |
5099 to be done when converting an object from in use to not in use (so far, | |
5100 only markers use it in order to unchain them). Then, they all call | |
5101 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name | |
5102 and their struct name. | |
5103 | |
5104 This call in particular does the following: we go over all blocks | |
5105 starting with the current one and moving towards the oldest. | |
5106 For each block, we look at every object in it. If the object is already | |
5107 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the | |
5108 object), or if it is | |
5109 set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing must be | |
5110 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it | |
5111 is put in the free list and set free (using the macro | |
5112 @code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked | |
5113 (by @code{UNMARK_...}). While going through one block, we note if the | |
5114 whole block is empty. If so, the whole block is freed (using | |
5115 @code{xfree}) and the free list state is set to the state it had before | |
5116 handling this block. | |
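
The block sweep just described can be sketched in C as follows. This is a
simplified illustration, not the @code{SWEEP_FIXED_TYPE_BLOCK} macro itself:
the names @code{struct cell}, @code{struct cell_block},
@code{OBJS_PER_BLOCK} and the @code{_sketch} functions are hypothetical.
The sketch also assumes that every block is completely filled, that there
are no read-only objects, and that the free list is rebuilt from scratch on
each sweep.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJS_PER_BLOCK 4        /* hypothetical; real blocks are much larger */

struct cell {
  int in_use;                   /* 0 plays the role of a freed object */
  int marked;
  struct cell *next_free;       /* link used while on the free list */
};

struct cell_block {
  struct cell_block *prev;      /* older block, or NULL */
  struct cell block[OBJS_PER_BLOCK];
};

struct cell_block *current_cell_block;
struct cell *cell_free_list;

/* Allocate a zeroed block in front of prev (no error handling here). */
struct cell_block *new_cell_block_sketch(struct cell_block *prev)
{
  struct cell_block *b = calloc(1, sizeof *b);
  b->prev = prev;
  return b;
}

void sweep_cells_sketch(void)
{
  struct cell_block **link = &current_cell_block;
  struct cell_block *b = current_cell_block;

  cell_free_list = NULL;        /* assumption: rebuild the list each time */
  while (b != NULL) {
    struct cell *saved_free_list = cell_free_list;
    int empty = 1;
    int i;

    for (i = 0; i < OBJS_PER_BLOCK; i++) {
      struct cell *c = &b->block[i];
      if (c->in_use && c->marked) {
        c->marked = 0;          /* survivor: unmark it for the next cycle */
        empty = 0;
      } else {
        c->in_use = 0;          /* garbage or already free */
        c->next_free = cell_free_list;
        cell_free_list = c;
      }
    }

    if (empty) {
      struct cell_block *victim = b;
      cell_free_list = saved_free_list;  /* forget cells inside the victim */
      *link = b->prev;                   /* unlink the block from the chain */
      b = b->prev;
      free(victim);
    } else {
      link = &b->prev;
      b = b->prev;
    }
  }
}
```

Restoring the saved free list before freeing an empty block matters because
all cells pushed while scanning that block live inside it and would
otherwise dangle.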
5117 | |
5118 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step | |
5119 @subsection @code{sweep_lcrecords_1} | |
5120 @cindex @code{sweep_lcrecords_1} | |
5121 | |
5122 After nullifying the complete lcrecord statistics, we go over all | |
5123 lcrecords two separate times. They are all chained together in a list with | |
5124 a head called @code{all_lcrecords}. | |
5125 | |
5126 The first loop calls for each object its @code{finalizer} method, but only | |
5127 in the case that it is not read only | |
5128 (@code{C_READONLY_RECORD_HEADER_P}), it is not already marked | |
5129 (@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of | |
5130 freed objects, field @code{free}) and finally it owns a finalizer | |
5131 method. | |
5132 | |
5133 The second loop actually frees the appropriate objects by iterating | |
5134 through the whole list again. In case an object is read only or marked, it | |
5135 has to persist, otherwise it is manually freed by calling | |
5136 @code{xfree}. During this loop, the lcrecord statistics are kept up to | |
5137 date by calling @code{tick_lcrecord_stats} with the right arguments. | |
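
The two loops can be sketched in C as follows. This is a minimal
illustration, not the XEmacs function: @code{struct lcrec}, its fields and
the @code{_sketch} names are hypothetical, the statistics are left out, and
@code{free} stands in for @code{xfree}.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified record header chained into one global list. */
struct lcrec {
  struct lcrec *next;
  int readonly;
  int marked;
  int on_free_list;
  void (*finalizer)(struct lcrec *);
};

struct lcrec *all_lcrecs;
int finalized_count;            /* only used to observe the sketch */

void count_finalizer_sketch(struct lcrec *r)
{
  (void)r;
  finalized_count++;
}

/* Allocate a record and put it at the head of the list. */
struct lcrec *push_lcrec_sketch(int marked, int readonly)
{
  struct lcrec *r = calloc(1, sizeof *r);
  r->marked = marked;
  r->readonly = readonly;
  r->finalizer = count_finalizer_sketch;
  r->next = all_lcrecs;
  all_lcrecs = r;
  return r;
}

void sweep_lcrecs_sketch(void)
{
  struct lcrec *r;
  struct lcrec **link;

  /* First loop: finalize every record that is about to die. */
  for (r = all_lcrecs; r != NULL; r = r->next)
    if (!r->readonly && !r->marked && !r->on_free_list && r->finalizer)
      r->finalizer(r);

  /* Second loop: unlink and free the dead, unmark the survivors. */
  link = &all_lcrecs;
  while ((r = *link) != NULL) {
    if (r->readonly || r->marked) {
      r->marked = 0;
      link = &r->next;
    } else {
      *link = r->next;
      free(r);
    }
  }
}
```

Running all finalizers before freeing anything means a finalizer can still
look at other dying records, which would not be safe if finalizing and
freeing were interleaved in a single loop.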
5138 | |
5139 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step | |
5140 @subsection @code{compact_string_chars} | |
5141 @cindex @code{compact_string_chars} | |
5142 | |
5143 The purpose of this function is to compact all the data parts of the | |
5144 strings that are held in so-called @code{string_chars_block}, i.e. the | |
5145 strings that do not exceed a certain maximal length. | |
5146 | |
5147 The procedure with which this is done is as follows. We are keeping two | |
5148 positions in the @code{string_chars_block}s using two pointer/integer | |
5149 pairs, namely @code{from_sb}/@code{from_pos} and | |
5150 @code{to_sb}/@code{to_pos}. They stand for the current positions from | |
5151 which and to which the string being handled is copied. | |
5152 | |
5153 While going over all chained @code{string_chars_block}s and the strings | |
5154 they hold, starting at @code{first_string_chars_block}, both pointers | |
5155 are advanced and eventually a string is copied from @code{from_sb} to | |
5156 @code{to_sb}, depending on the status of the pointed at strings. | |
5157 | |
5158 More precisely, we can distinguish between the following actions. | |
5159 @itemize @bullet | |
5160 @item | |
5161 The string at @code{from_sb}'s position could be marked as free, which | |
5162 is indicated by an invalid pointer in place of the pointer that should point back | |
5163 to the fixed size string object, and which is checked by | |
5164 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos} | |
5165 is advanced to the next string, and nothing has to be copied. | |
5166 @item | |
5167 Also, if a string object itself is unmarked, nothing has to be | |
5168 copied. We likewise advance the @code{from_sb}/@code{from_pos} | |
5169 pair as described above. | |
5170 @item | |
5171 In all other cases, we have a marked string at hand. The string data | |
5172 must be moved from the from-position to the to-position. In case | |
5173 there is not enough space in the current @code{to_sb}-block, we advance | |
5174 this pointer to the beginning of the next block before copying. In case the | |
5175 from and to positions are different, we perform the | |
5176 actual copying using the library function @code{memmove}. | |
5177 @end itemize | |
5178 | |
5179 After compacting, the pointer to the current | |
5180 @code{string_chars_block}, sitting in @code{current_string_chars_block}, | |
5181 is reset to the last block to which we moved a string, | |
5182 i.e. @code{to_sb}, and all remaining blocks (we know that they just | |
5183 carry garbage) are explicitly @code{xfree}d. | |
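
The compaction can be sketched in C as follows. This is a minimal
illustration under several assumptions, not the XEmacs function:
@code{struct chars_block}, @code{struct str_obj}, @code{BLOCK_BYTES} and
@code{compact_sketch} are hypothetical names, the strings are handed in as
an array ordered by the position of their data (the real code instead walks
the entries stored inside the blocks), and the leftover blocks are only
emptied, leaving the freeing to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 16          /* hypothetical; real blocks are larger */

struct chars_block {
  struct chars_block *next;
  size_t used;                  /* number of bytes in use */
  char bytes[BLOCK_BYTES];
};

/* Stands in for the fixed-size part of a string. */
struct str_obj {
  int marked;
  size_t len;                   /* at most BLOCK_BYTES */
  char *data;                   /* points into some chars_block */
};

/* Slide the data of all marked strings towards the front of the chain.
   Returns the last block that still holds data, i.e. the new current
   block. */
struct chars_block *compact_sketch(struct chars_block *first,
                                   struct str_obj **strs, size_t n)
{
  struct chars_block *to_sb = first;
  struct chars_block *b;
  size_t to_pos = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    struct str_obj *s = strs[i];
    if (!s->marked)
      continue;                 /* dead string: nothing is copied */
    if (to_pos + s->len > BLOCK_BYTES) {
      to_sb->used = to_pos;     /* does not fit: move on to the next block */
      to_sb = to_sb->next;
      to_pos = 0;
    }
    if (s->data != to_sb->bytes + to_pos)
      memmove(to_sb->bytes + to_pos, s->data, s->len);
    s->data = to_sb->bytes + to_pos;   /* relocate the string object */
    to_pos += s->len;
  }
  to_sb->used = to_pos;
  for (b = to_sb->next; b != NULL; b = b->next)
    b->used = 0;                /* these blocks now carry only garbage */
  return to_sb;
}
```

The to-position never overtakes the from-position, so the to-block always
has a successor when the sketch needs one; @code{memmove} is used instead of
@code{memcpy} because source and destination may overlap within a block.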
5184 | |
5185 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step | |
5186 @subsection @code{sweep_strings} | |
5187 @cindex @code{sweep_strings} | |
5188 | |
5189 The sweeping for the fixed sized string objects is essentially exactly | |
5190 the same as it is for all other fixed size types. As before, the freeing | |
5191 into the suitable free list is done by using the macro | |
5192 @code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros | |
5193 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two | |
5194 definitions are a little bit special compared to the ones used | |
5195 for the other fixed size types. | |
5196 | |
5197 @code{UNMARK_string} is defined the same way except for some additional code | |
5198 used for updating the bookkeeping information. | |
5199 | |
5200 For strings, @code{ADDITIONAL_FREE_string} has to do something in | |
5201 addition: in case the string was not allocated in a | |
5202 @code{string_chars_block} because it exceeded the maximal length, and | |
5203 therefore it was @code{malloc}ed separately, we now also @code{xfree} | |
5204 it explicitly. | |
5205 | |
5206 @node sweep_bit_vectors_1, , sweep_strings, Garbage Collection - Step by Step | |
5207 @subsection @code{sweep_bit_vectors_1} | |
5208 @cindex @code{sweep_bit_vectors_1} | |
5209 | |
5210 Bit vectors are also one of the rare types that are @code{malloc}ed | |
5211 individually. Consequently, while sweeping, all bit vectors that are | |
5212 no longer needed must be freed by hand. This is done, as one might imagine, | |
5213 the expected way: since they are all registered in a list called | |
5214 @code{all_bit_vectors}, all elements of that list are traversed, | |
5215 all unmarked bit vectors are unlinked and freed by calling @code{xfree}, and all | |
5216 remaining ones become unmarked. | |
5217 In addition, the bookkeeping information used for the garbage | |
5218 collector's output purposes is updated. | |
5219 | |
5220 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp | |
5221 @section Integers and Characters | 4718 @section Integers and Characters |
5222 | 4719 |
5223 Integer and character Lisp objects are created from integers using the | 4720 Integer and character Lisp objects are created from integers using the |
5224 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent | 4721 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent |
5225 functions @code{make_int()} and @code{make_char()}. (These are actually | 4722 functions @code{make_int()} and @code{make_char()}. (These are actually |
5229 | 4726 |
5230 @code{XSETINT()} and the like will truncate values given to them that | 4727 @code{XSETINT()} and the like will truncate values given to them that |
5231 are too big; i.e. you won't get the value you expected but the tag bits | 4728 are too big; i.e. you won't get the value you expected but the tag bits |
5232 will at least be correct. | 4729 will at least be correct. |
5233 | 4730 |
5234 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp | 4731 @node Allocation from Frob Blocks |
5235 @section Allocation from Frob Blocks | 4732 @section Allocation from Frob Blocks |
5236 | 4733 |
5237 The uninitialized memory required by a @code{Lisp_Object} of a particular type | 4734 The uninitialized memory required by a @code{Lisp_Object} of a particular type |
5238 is allocated using | 4735 is allocated using |
5239 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the | 4736 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the |
5256 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the | 4753 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the |
5257 last frob block for space, and creates a new frob block if there is | 4754 last frob block for space, and creates a new frob block if there is |
5258 none. (There are actually two versions of these macros, one of which is | 4755 none. (There are actually two versions of these macros, one of which is |
5259 more defensive but less efficient and is used for error-checking.) | 4756 more defensive but less efficient and is used for error-checking.) |
5260 | 4757 |
5261 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp | 4758 @node lrecords |
5262 @section lrecords | 4759 @section lrecords |
5263 | 4760 |
5264 [see @file{lrecord.h}] | 4761 [see @file{lrecord.h}] |
5265 | 4762 |
5266 All lrecords have at the beginning of their structure a @code{struct | 4763 All lrecords have at the beginning of their structure a @code{struct |
5267 lrecord_header}. This just contains a type number and some flags, | 4764 lrecord_header}. This just contains a pointer to a @code{struct |
5268 including the mark bit. All builtin type numbers are defined as | |
5269 constants in @code{enum lrecord_type}, to allow the compiler to generate | |
5270 more efficient code for @code{@var{type}P}. The type number, thru the | |
5271 @code{lrecord_implementation_table}, gives access to a @code{struct | |
5272 lrecord_implementation}, which is a structure containing method pointers | 4765 lrecord_implementation}, which is a structure containing method pointers |
5273 and such. There is one of these for each type, and it is a global, | 4766 and such. There is one of these for each type, and it is a global, |
5274 constant, statically-declared structure that is declared in the | 4767 constant, statically-declared structure that is declared in the |
5275 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. | 4768 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually |
5276 | 4769 declares an array of two @code{struct lrecord_implementation} |
5277 Simple lrecords (of type (b) above) just have a @code{struct | 4770 structures. The first one contains all the standard method pointers, |
4771 and is used in all normal circumstances. During garbage collection, | |
4772 however, the lrecord is @dfn{marked} by bumping its implementation | |
4773 pointer by one, so that it points to the second structure in the array. | |
4774 This structure contains a special indication in it that it's a | |
4775 @dfn{marked-object} structure: the finalize method is the special | |
4776 function @code{this_marks_a_marked_record()}, and all other methods are | |
4777 null pointers. At the end of garbage collection, all lrecords will | |
4778 either be reclaimed or unmarked by decrementing their implementation | |
4779 pointers, so this second structure pointer will never remain past | |
4780 garbage collection. | |
4781 | |
4782 Simple lrecords (of type (c) above) just have a @code{struct | |
5278 lrecord_header} at their beginning. lcrecords, however, actually have a | 4783 lrecord_header} at their beginning. lcrecords, however, actually have a |
5279 @code{struct lcrecord_header}. This, in turn, has a @code{struct | 4784 @code{struct lcrecord_header}. This, in turn, has a @code{struct |
5280 lrecord_header} at its beginning, so sanity is preserved; but it also | 4785 lrecord_header} at its beginning, so sanity is preserved; but it also |
5281 has a pointer used to chain all lcrecords together, and a special ID | 4786 has a pointer used to chain all lcrecords together, and a special ID |
5282 field used to distinguish one lcrecord from another. (This field is used | 4787 field used to distinguish one lcrecord from another. (This field is used |
5300 type. | 4805 type. |
5301 | 4806 |
5302 Whenever you create an lrecord, you need to call either | 4807 Whenever you create an lrecord, you need to call either |
5303 @code{DEFINE_LRECORD_IMPLEMENTATION()} or | 4808 @code{DEFINE_LRECORD_IMPLEMENTATION()} or |
5304 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be | 4809 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be |
5305 specified in a @file{.c} file, at the top level. What this actually | 4810 specified in a C file, at the top level. What this actually does is |
5306 does is define and initialize the implementation structure for the | 4811 define and initialize the implementation structure for the lrecord. (And |
5307 lrecord. (And possibly declares a function @code{error_check_foo()} that | 4812 possibly declares a function @code{error_check_foo()} that implements |
5308 implements the @code{XFOO()} macro when error-checking is enabled.) The | 4813 the @code{XFOO()} macro when error-checking is enabled.) The arguments |
5309 arguments to the macros are the actual type name (this is used to | 4814 to the macros are the actual type name (this is used to construct the C |
5310 construct the C variable name of the lrecord implementation structure | 4815 variable name of the lrecord implementation structure and related |
5311 and related structures using the @samp{##} macro concatenation | 4816 structures using the @samp{##} macro concatenation operator), a string |
5312 operator), a string that names the type on the Lisp level (this may not | 4817 that names the type on the Lisp level (this may not be the same as the C |
5313 be the same as the C type name; typically, the C type name has | 4818 type name; typically, the C type name has underscores, while the Lisp |
5314 underscores, while the Lisp string has dashes), various method pointers, | 4819 string has dashes), various method pointers, and the name of the C |
5315 and the name of the C structure that contains the object. The methods | 4820 structure that contains the object. The methods are used to encapsulate |
5316 are used to encapsulate type-specific information about the object, such | 4821 type-specific information about the object, such as how to print it or |
5317 as how to print it or mark it for garbage collection, so that it's easy | 4822 mark it for garbage collection, so that it's easy to add new object |
5318 to add new object types without having to add a specific case for each | 4823 types without having to add a specific case for each new type in a bunch |
5319 new type in a bunch of different places. | 4824 of different places. |
5320 | 4825 |
5321 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and | 4826 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and |
5322 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is | 4827 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is |
5323 used for fixed-size object types and the latter is for variable-size | 4828 used for fixed-size object types and the latter is for variable-size |
5324 object types. Most object types are fixed-size; some complex | 4829 object types. Most object types are fixed-size; some complex |
5328 (Currently this is only used for keeping allocation statistics.) | 4833 (Currently this is only used for keeping allocation statistics.) |
5329 | 4834 |
5330 For the purpose of keeping allocation statistics, the allocation | 4835 For the purpose of keeping allocation statistics, the allocation |
5331 engine keeps a list of all the different types that exist. Note that, | 4836 engine keeps a list of all the different types that exist. Note that, |
5332 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is | 4837 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is |
5333 specified at top-level, there is no way for it to initialize the global | 4838 specified at top-level, there is no way for it to add to the list of all |
5334 data structures containing type information, like | 4839 existing types. What happens instead is that each implementation |
5335 @code{lrecord_implementations_table}. For this reason a call to | 4840 structure contains in it a dynamically assigned number that is |
5336 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file | 4841 particular to that type. (Or rather, it contains a pointer to another |
5337 containing @code{DEFINE_LRECORD_IMPLEMENTATION}, but instead of to the | 4842 structure that contains this number. This evasiveness is done so that |
5338 top level, to one of the init functions, typically | 4843 the implementation structure can be declared const.) In the sweep stage |
5339 @code{syms_of_@var{foo}.c}. @code{INIT_LRECORD_IMPLEMENTATION} must be | 4844 of garbage collection, each lrecord is examined to see if its |
5340 called before an object of this type is used. | 4845 implementation structure has its dynamically-assigned number set. If |
5341 | 4846 not, it must be a new type, and it is added to the list of known types |
5342 The type number is also used to index into an array holding the number | 4847 and a new number assigned. The number is used to index into an array |
5343 of objects of each type and the total memory allocated for objects of | 4848 holding the number of objects of each type and the total memory |
5344 that type. The statistics in this array are computed during the sweep | 4849 allocated for objects of that type. The statistics in this array are |
5345 stage. These statistics are returned by the call to | 4850 also computed during the sweep stage. These statistics are returned by |
5346 @code{garbage-collect}. | 4851 the call to @code{garbage-collect} and are printed out at the end of the |
4852 loadup phase. | |
5347 | 4853 |
5348 Note that for every type defined with a @code{DEFINE_LRECORD_*()} | 4854 Note that for every type defined with a @code{DEFINE_LRECORD_*()} |
5349 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()} | 4855 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()} |
5350 somewhere in a @file{.h} file, and this @file{.h} file needs to be | 4856 somewhere in a @file{.h} file, and this @file{.h} file needs to be |
5351 included by @file{inline.c}. | 4857 included by @file{inline.c}. |
5486 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should | 4992 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should |
5487 simply return the object's size in bytes, exactly as you might expect. | 4993 simply return the object's size in bytes, exactly as you might expect. |
5488 For an example, see the methods for window configurations and opaques. | 4994 For an example, see the methods for window configurations and opaques. |
5489 @end enumerate | 4995 @end enumerate |
5490 | 4996 |
5491 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp | 4997 @node Low-level allocation |
5492 @section Low-level allocation | 4998 @section Low-level allocation |
5493 | 4999 |
5494 Memory that you want to allocate directly should be allocated using | 5000 Memory that you want to allocate directly should be allocated using |
5495 @code{xmalloc()} rather than @code{malloc()}. This implements | 5001 @code{xmalloc()} rather than @code{malloc()}. This implements |
5496 error-checking on the return value, and once upon a time did some more | 5002 error-checking on the return value, and once upon a time did some more |
5547 XEmacs taps into them and issues a warning through the standard | 5053 XEmacs taps into them and issues a warning through the standard |
5548 warning system, when memory gets to 75%, 85%, and 95% full. | 5054 warning system, when memory gets to 75%, 85%, and 95% full. |
5549 (On some systems, the memory warnings are not functional.) | 5055 (On some systems, the memory warnings are not functional.) |
5550 | 5056 |
5551 Allocated memory that is going to be used to make a Lisp object | 5057 Allocated memory that is going to be used to make a Lisp object |
5552 is created using @code{allocate_lisp_storage()}. This just calls | 5058 is created using @code{allocate_lisp_storage()}. This calls @code{xmalloc()} |
5553 @code{xmalloc()}. It used to verify that the pointer to the memory can | 5059 but also verifies that the pointer to the memory can fit into |
5554 fit into a Lisp word, before the current Lisp object representation was | 5060 a Lisp word (remember that some bits are taken away for a type |
5555 introduced. @code{allocate_lisp_storage()} is called by | 5061 tag and a mark bit). If not, an error is issued through @code{memory_full()}. |
5556 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector | 5062 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()}, |
5557 and bit-vector creation routines. These routines also call | 5063 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation |
5558 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps | 5064 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the |
5559 statistics on how much memory is allocated, so that garbage-collection | 5065 appropriate times; this keeps statistics on how much memory is |
5560 can be invoked when the threshold is reached. | 5066 allocated, so that garbage-collection can be invoked when the |
5561 | 5067 threshold is reached. |
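The allocation pattern described above can be sketched roughly as follows. This is not the XEmacs implementation: the names `checked_malloc`, `alloc_storage_sketch`, `gc_needed`, `consing_since_gc` and `gc_threshold` are hypothetical, and the threshold value is arbitrary. It shows only the two ideas in the text: a `malloc()` wrapper that checks its return value, and a counter that records how much has been allocated so that garbage collection can be requested once a threshold is crossed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical statistics; the real code keeps comparable counters. */
static size_t consing_since_gc = 0;
static size_t gc_threshold = 1024;      /* arbitrary value for the sketch */

/* Like malloc(), but never returns NULL: exhaustion is treated as fatal. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        fputs("memory exhausted\n", stderr);
        abort();
    }
    return p;
}

/* Allocate storage for an object and account for the bytes consumed. */
void *alloc_storage_sketch(size_t size)
{
    void *p = checked_malloc(size);
    consing_since_gc += size;
    return p;
}

/* Nonzero once enough has been allocated that a collection is due. */
int gc_needed(void)
{
    return consing_since_gc >= gc_threshold;
}
```

A caller would allocate through `alloc_storage_sketch()` and poll `gc_needed()` at a safe point, rather than collecting from inside the allocator.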
5562 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp | 5068 |
5069 @node Pure Space | |
5070 @section Pure Space | |
5071 | |
5072 Not yet documented. | |
5073 | |
5074 @node Cons | |
5563 @section Cons | 5075 @section Cons |
5564 | 5076 |
5565 Conses are allocated in standard frob blocks. The only thing to | 5077 Conses are allocated in standard frob blocks. The only thing to |
5566 note is that conses can be explicitly freed using @code{free_cons()} | 5078 note is that conses can be explicitly freed using @code{free_cons()} |
5567 and associated functions @code{free_list()} and @code{free_alist()}. This | 5079 and associated functions @code{free_list()} and @code{free_alist()}. This |
5571 generating extra objects and thereby triggering GC sooner. | 5083 generating extra objects and thereby triggering GC sooner. |
5572 However, you have to be @emph{extremely} careful when doing this. | 5084 However, you have to be @emph{extremely} careful when doing this. |
5573 If you mess this up, you will get BADLY BURNED, and it has happened | 5085 If you mess this up, you will get BADLY BURNED, and it has happened |
5574 before. | 5086 before. |
5575 | 5087 |
5576 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp | 5088 @node Vector |
5577 @section Vector | 5089 @section Vector |
5578 | 5090 |
5579 As mentioned above, each vector is @code{malloc()}ed individually, and | 5091 As mentioned above, each vector is @code{malloc()}ed individually, and |
5580 all are threaded through the variable @code{all_vectors}. Vectors are | 5092 all are threaded through the variable @code{all_vectors}. Vectors are |
5581 marked strangely during garbage collection, by kludging the size field. | 5093 marked strangely during garbage collection, by kludging the size field. |
5582 Note that the @code{struct Lisp_Vector} is declared with its | 5094 Note that the @code{struct Lisp_Vector} is declared with its |
5583 @code{contents} field being a @emph{stretchy} array of one element. It | 5095 @code{contents} field being a @emph{stretchy} array of one element. It |
5584 is actually @code{malloc()}ed with the right size, however, and access | 5096 is actually @code{malloc()}ed with the right size, however, and access |
5585 to any element through the @code{contents} array works fine. | 5097 to any element through the @code{contents} array works fine. |
5586 | 5098 |
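The "stretchy array" idiom mentioned above looks roughly like this. The struct and function names are hypothetical stand-ins, not the real `struct Lisp_Vector` definition. The trailing array is declared with one element, but the block is allocated large enough for all of them; modern C would normally use a flexible array member instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vector header followed by its elements. */
struct vec_sketch {
    long size;
    void *contents[1];      /* "stretchy": really `size' elements long */
};

struct vec_sketch *make_vec_sketch(long size)
{
    assert(size >= 0);
    /* Header up to `contents', plus room for every element
       (at least one, so the declared array always exists). */
    size_t slots = size > 0 ? (size_t) size : 1;
    size_t bytes = offsetof(struct vec_sketch, contents)
                   + slots * sizeof(void *);
    struct vec_sketch *v = malloc(bytes);
    if (v == NULL)
        abort();
    v->size = size;
    for (long i = 0; i < size; i++)
        v->contents[i] = NULL;
    return v;
}

/* Element access goes through the one-element array as if it were longer. */
void vec_set(struct vec_sketch *v, long i, void *value)
{
    assert(i >= 0 && i < v->size);
    v->contents[i] = value;
}

void *vec_ref(const struct vec_sketch *v, long i)
{
    assert(i >= 0 && i < v->size);
    return v->contents[i];
}
```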
5587 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp | 5099 @node Bit Vector |
5588 @section Bit Vector | 5100 @section Bit Vector |
5589 | 5101 |
5590 Bit vectors work exactly like vectors, except for more complicated | 5102 Bit vectors work exactly like vectors, except for more complicated |
5591 code to access an individual bit, and except for the fact that bit | 5103 code to access an individual bit, and except for the fact that bit |
5592 vectors are lrecords while vectors are not. (The only difference here is | 5104 vectors are lrecords while vectors are not. (The only difference here is |
5593 that there's an lrecord implementation pointer at the beginning and the | 5105 that there's an lrecord implementation pointer at the beginning and the |
5594 tag field in bit vector Lisp words is ``lrecord'' rather than | 5106 tag field in bit vector Lisp words is ``lrecord'' rather than |
5595 ``vector''.) | 5107 ``vector''.) |
5596 | 5108 |
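The "more complicated code to access an individual bit" amounts to picking a word and then a bit within it. A minimal sketch, assuming the bits are packed into an array of `unsigned long` words (the helper names `bit_ref` and `bit_set` are hypothetical, and the real layout may differ):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return bit number `i' (0 or 1) from the packed word array. */
int bit_ref(const unsigned long *words, size_t i)
{
    assert(words != NULL);
    return (int) ((words[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}

/* Set or clear bit number `i' in the packed word array. */
void bit_set(unsigned long *words, size_t i, int value)
{
    assert(words != NULL);
    unsigned long mask = 1UL << (i % BITS_PER_WORD);
    if (value)
        words[i / BITS_PER_WORD] |= mask;
    else
        words[i / BITS_PER_WORD] &= ~mask;
}
```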
5597 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp | 5109 @node Symbol |
5598 @section Symbol | 5110 @section Symbol |
5599 | 5111 |
5600 Symbols are also allocated in frob blocks. Symbols in the awful | 5112 Symbols are also allocated in frob blocks. Note that the code |
5601 horrible obarray structure are chained through their @code{next} field. | 5113 exists for symbols to be either lrecords (category (c) above) |
5114 or simple types (category (b) above), and are lrecords by | |
5115 default (I think), although there is no good reason for this. | |
5116 | |
5117 Note that symbols in the awful horrible obarray structure are | |
5118 chained through their @code{next} field. | |
5602 | 5119 |
5603 Remember that @code{intern} looks up a symbol in an obarray, creating | 5120 Remember that @code{intern} looks up a symbol in an obarray, creating |
5604 one if necessary. | 5121 one if necessary. |
5605 | 5122 |
5606 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp | 5123 @node Marker |
5607 @section Marker | 5124 @section Marker |
5608 | 5125 |
5609 Markers are allocated in frob blocks, as usual. They are kept | 5126 Markers are allocated in frob blocks, as usual. They are kept |
5610 in a buffer unordered, but in a doubly-linked list so that they | 5127 in a buffer unordered, but in a doubly-linked list so that they |
5611 can easily be removed. (Formerly this was a singly-linked list, | 5128 can easily be removed. (Formerly this was a singly-linked list, |
5612 but in some cases garbage collection took an extraordinarily | 5129 but in some cases garbage collection took an extraordinarily |
5613 long time due to the O(N^2) time required to remove lots of | 5130 long time due to the O(N^2) time required to remove lots of |
5614 markers from a buffer.) Markers are removed from a buffer in | 5131 markers from a buffer.) Markers are removed from a buffer in |
5615 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. | 5132 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. |
5616 | 5133 |
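The reason a doubly-linked list helps is that a marker can be unlinked without searching for its predecessor. The following sketch uses hypothetical types (`marker_sketch`, `buffer_sketch`), not the real marker and buffer structures:

```c
#include <assert.h>
#include <stddef.h>

struct marker_sketch {
    struct marker_sketch *prev, *next;
};

struct buffer_sketch {
    struct marker_sketch *markers;   /* head of the unordered list */
};

/* Push a marker onto the front of the buffer's list. */
void attach_marker(struct buffer_sketch *b, struct marker_sketch *m)
{
    assert(b != NULL && m != NULL);
    m->prev = NULL;
    m->next = b->markers;
    if (b->markers != NULL)
        b->markers->prev = m;
    b->markers = m;
}

/* Unlink a marker in constant time; no list traversal is needed. */
void detach_marker(struct buffer_sketch *b, struct marker_sketch *m)
{
    assert(b != NULL && m != NULL);
    if (m->prev != NULL)
        m->prev->next = m->next;
    else
        b->markers = m->next;        /* m was the head */
    if (m->next != NULL)
        m->next->prev = m->prev;
    m->prev = m->next = NULL;
}
```

With a singly-linked list, each removal has to walk from the head to find the predecessor, which is where the O(N^2) cost for removing many markers came from.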
5617 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp | 5134 @node String |
5618 @section String | 5135 @section String |
5619 | 5136 |
5620 As mentioned above, strings are a special case. A string is logically | 5137 As mentioned above, strings are a special case. A string is logically |
5621 two parts, a fixed-size object (containing the length, property list, | 5138 two parts, a fixed-size object (containing the length, property list, |
5622 and a pointer to the actual data), and the actual data in the string. | 5139 and a pointer to the actual data), and the actual data in the string. |
5646 Note that there is one situation not handled: a string that is too big | 5163 Note that there is one situation not handled: a string that is too big |
5647 to fit into a string-chars block. Such strings, called @dfn{big | 5164 to fit into a string-chars block. Such strings, called @dfn{big |
5648 strings}, are all @code{malloc()}ed as their own block. (#### Although it | 5165 strings}, are all @code{malloc()}ed as their own block. (#### Although it |
5649 would make more sense for the threshold for big strings to be somewhat | 5166 would make more sense for the threshold for big strings to be somewhat |
5650 lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that | 5167 lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that |
5651 this was indeed the case formerly---indeed, the threshold was set at | 5168 this was indeed the case formerly -- indeed, the threshold was set at |
5652 1/8---but Mly forgot about this when rewriting things for 19.8.) | 5169 1/8 -- but Mly forgot about this when rewriting things for 19.8.) |
5653 | 5170 |
5654 Note also that the string data in string-chars blocks is padded as | 5171 Note also that the string data in string-chars blocks is padded as |
5655 necessary so that proper alignment constraints on the @code{struct | 5172 necessary so that proper alignment constraints on the @code{struct |
5656 Lisp_String} back pointers are maintained. | 5173 Lisp_String} back pointers are maintained. |
5657 | 5174 |
5673 string data (which would normally be obtained from the now-non-existent | 5190 string data (which would normally be obtained from the now-non-existent |
5674 @code{struct Lisp_String}) at the beginning of the dead string data gap. | 5191 @code{struct Lisp_String}) at the beginning of the dead string data gap. |
5675 The string compactor recognizes this special 0xFFFFFFFF marker and | 5192 The string compactor recognizes this special 0xFFFFFFFF marker and |
5676 handles it correctly. | 5193 handles it correctly. |
5677 | 5194 |
5678 @node Compiled Function, , String, Allocation of Objects in XEmacs Lisp | 5195 @node Compiled Function |
5679 @section Compiled Function | 5196 @section Compiled Function |
5680 | 5197 |
5681 Not yet documented. | 5198 Not yet documented. |
5682 | 5199 |
5683 | 5200 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top |
5684 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top | |
5685 @chapter Dumping | |
5686 | |
5687 @section What is dumping and its justification | |
5688 | |
5689 The C code of XEmacs is just a Lisp engine with a lot of built-in | |
5690 primitives useful for writing an editor. The editor itself is written | |
5691 mostly in Lisp, and represents around 100K lines of code. Loading and | |
5692 executing the initialization of all this code takes a bit of time (five | |
5693 to ten times the usual startup time of current xemacs) and requires | |
5694 having all the lisp source files around. Having to reload them each | |
5695 time the editor is started would not be acceptable. | |
5696 | |
5697 The traditional solution to this problem is called dumping: the build | |
5698 process first creates the lisp engine under the name @file{temacs}, then | |
5699 runs it until it has finished loading and initializing all the lisp | |
5700 code, and eventually creates a new executable called @file{xemacs} | |
5701 including both the object code in @file{temacs} and all the contents of | |
5702 the memory after the initialization. | |
5703 | |
5704 This solution, while working, has a huge problem: the creation of the | |
5705 new executable from the actual contents of memory is an extremely | |
5706 system-specific process, quite error-prone, and one which interferes with a | |
5707 lot of system libraries (like malloc). It is even getting worse | |
5708 nowadays with libraries using constructors which are automatically | |
5709 called when the program is started (even before main()) which tend to | |
5710 crash when they are called multiple times, once before dumping and once | |
5711 after (IRIX 6.x libz.so pulls in some C++ image libraries thru | |
5712 dependencies which have this problem). Writing the dumper is also one | |
5713 of the most difficult parts of porting XEmacs to a new operating system. | |
5714 Basically, `dumping' is an operation that is just not officially | |
5715 supported on many operating systems. | |
5716 | |
5717 The aim of the portable dumper is to solve the same problem as the | |
5718 system-specific dumper, that is to be able to reload quickly, using only | |
5719 a small number of files, the fully initialized lisp part of the editor, | |
5720 without any system-specific hacks. | |
5721 | |
5722 @menu | |
5723 * Overview:: | |
5724 * Data descriptions:: | |
5725 * Dumping phase:: | |
5726 * Reloading phase:: | |
5727 * Remaining issues:: | |
5728 @end menu | |
5729 | |
5730 @node Overview, Data descriptions, Dumping, Dumping | |
5731 @section Overview | |
5732 | |
5733 The portable dumping system has to: | |
5734 | |
5735 @enumerate | |
5736 @item | |
5737 At dump time, write all initialized, non-quickly-rebuildable data to a | |
5738 file [Note: currently named @file{xemacs.dmp}, but the name will | |
5739 change], along with all the information needed for reloading. | |
5740 | |
5741 @item | |
5742 When starting xemacs, reload the dump file, relocate it to its new | |
5743 starting address if needed, and reinitialize all pointers to this | |
5744 data. Also, rebuild all the quickly rebuildable data. | |
5745 @end enumerate | |
5746 | |
5747 @node Data descriptions, Dumping phase, Overview, Dumping | |
5748 @section Data descriptions | |
5749 | |
5750 The more complex task of the dumper is to be able to write lisp objects | |
5751 (lrecords) and C structs to disk and reload them at a different address, | |
5752 updating all the pointers they include in the process. This is done by | |
5753 using external data descriptions that give information about the layout | |
5754 of the structures in memory. | |
5755 | |
5756 The specification of these descriptions is in lrecord.h. A description | |
5757 of an lrecord is an array of struct lrecord_description. Each of these | |
5758 structs includes a type, an offset in the structure and some optional | |
5759 parameters depending on the type. For instance, here is the string | |
5760 description: | |
5761 | |
5762 @example | |
5763 static const struct lrecord_description string_description[] = @{ | |
5764 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @}, | |
5765 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @}, | |
5766 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @}, | |
5767 @{ XD_END @} | |
5768 @}; | |
5769 @end example | |
5770 | |
5771 The first line indicates a member of type Bytecount, which is used by | |
5772 the next, indirect directive. The second means "there is a pointer to | |
5773 some opaque data in the field @code{data}". The length of said data is | |
5774 given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value | |
5775 in the 0th line of the description (welcome to C) plus one". The third | |
5776 line means "there is a Lisp_Object member @code{plist} in the Lisp_String | |
5777 structure". @code{XD_END} then ends the description. | |
5778 | |
5779 This gives us all the information we need to move around what is pointed | |
5780 to by a structure (C or lrecord) and, by transitivity, everything that | |
5781 it points to. The only missing information for dumping is the size of | |
5782 the structure. For lrecords, this is part of the | |
5783 lrecord_implementation, so we don't need to duplicate it. For C | |
5784 structures we use a struct struct_description, which includes a size | |
5785 field and a pointer to an associated array of lrecord_description. | |
5786 | |
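To make the idea of an external description concrete, here is a simplified sketch of a routine that walks a description array and adjusts every pointer field it names. The enum values, `struct field_desc`, `struct node` and `relocate_fields` are all hypothetical and much simpler than the real `XD_*` directives in lrecord.h; in particular there is no indirect length and no distinction between Lisp objects and opaque data. The sketch also assumes that a null pointer is represented by all-zero bits and that `uintptr_t` has the same size as a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified description entries (not the real XD_* set). */
enum field_kind { FIELD_END, FIELD_PLAIN, FIELD_POINTER };

struct field_desc {
    enum field_kind kind;
    size_t offset;          /* byte offset of the field in the structure */
};

/* An example structure and its external description. */
struct node {
    long count;
    struct node *next;
    void *payload;
};

const struct field_desc node_description[] = {
    { FIELD_PLAIN,   offsetof(struct node, count) },
    { FIELD_POINTER, offsetof(struct node, next) },
    { FIELD_POINTER, offsetof(struct node, payload) },
    { FIELD_END,     0 }
};

/* Add `delta' bytes to every non-null pointer field named in `desc'. */
void relocate_fields(void *obj, const struct field_desc *desc,
                     ptrdiff_t delta)
{
    assert(obj != NULL && desc != NULL);
    for (; desc->kind != FIELD_END; desc++) {
        if (desc->kind != FIELD_POINTER)
            continue;
        unsigned char *slot = (unsigned char *) obj + desc->offset;
        uintptr_t value;
        memcpy(&value, slot, sizeof value);     /* read the pointer bits */
        if (value != 0) {
            value += (uintptr_t) delta;         /* wraps for negative delta */
            memcpy(slot, &value, sizeof value); /* write them back */
        }
    }
}
```

The walker never needs to know the C type of the structure; everything it uses comes from the description, which is what allows one generic routine to handle every described type.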
5787 @node Dumping phase, Reloading phase, Data descriptions, Dumping | |
5788 @section Dumping phase | |
5789 | |
5790 Dumping is done by calling the function pdump() (in dumper.c) which is | |
5791 invoked from Fdump_emacs (in emacs.c). This function performs a number | |
5792 of tasks. | |
5793 | |
5794 @menu | |
5795 * Object inventory:: | |
5796 * Address allocation:: | |
5797 * The header:: | |
5798 * Data dumping:: | |
5799 * Pointers dumping:: | |
5800 @end menu | |
5801 | |
5802 @node Object inventory, Address allocation, Dumping phase, Dumping phase | |
5803 @subsection Object inventory | |
5804 | |
5805 The first task is to build the list of the objects to dump. This | |
5806 includes: | |
5807 | |
5808 @itemize @bullet | |
5809 @item lisp objects | |
5810 @item C structures | |
5811 @end itemize | |
5812 | |
5813 We end up with one @code{pdump_entry_list_elmt} per object group (arrays | |
5814 of C structs are kept together) which includes a pointer to the first | |
5815 object of the group, the per-object size and the count of objects in the | |
5816 group, along with some other information which is initialized later. | |
5817 | |
5818 These entries are linked together in @code{pdump_entry_list} structures | |
5819 and can be enumerated thru either: | |
5820 | |
5821 @enumerate | |
5822 @item | |
5823 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one | |
5824 per lrecord type, indexed by type number. | |
5825 | |
5826 @item | |
5827 the @code{pdump_opaque_data_list}, used for the opaque data which does | |
5828 not include pointers, and hence does not need descriptions. | |
5829 | |
5830 @item | |
5831 the @code{pdump_struct_table}, which is a vector of | |
5832 @code{struct_description}/@code{pdump_entry_list} pairs, used for | |
5833 non-opaque C structures. | |
5834 @end enumerate | |
5835 | |
5836 This uses a marking strategy similar to the garbage collector. Some | |
5837 differences though: | |
5838 | |
5839 @enumerate | |
5840 @item | |
5841 We do not use the mark bit (which does not exist for C structures | |
5842 anyway), we use a big hash table instead. | |
5843 | |
5844 @item | |
5845 We do not use the mark function of lrecords but instead rely on the | |
5846 external descriptions. This happens essentially because we need to | |
5847 follow pointers to C structures and opaque data in addition to | |
5848 Lisp_Object members. | |
5849 @end enumerate | |
5850 | |
5851 This is done by @code{pdump_register_object}, which handles Lisp_Object | |
5852 variables, and @code{pdump_register_struct}, which handles C structures; | |
5853 both delegate the description management to @code{pdump_register_sub}. | |
5854 | |
5855 The hash table doubles as a map from object to pdump_entry_list_elmt (i.e. | |
5856 allows us to look up a pdump_entry_list_elmt with the object it points | |
5857 to). Entries are added with @code{pdump_add_entry()} and looked up with | |
5858 @code{pdump_get_entry()}. There is no need for entry removal. The hash | |
5859 value is computed very simply from the object pointer by | |
5860 @code{pdump_make_hash()}. | |
5861 | |
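A pointer-keyed table with insertion and lookup but no removal can be sketched as below. This is an illustration only: the fixed capacity, the linear probing, the hash function and the names `entry_sketch`, `get_entry` and `add_entry` are assumptions, and the actual `pdump_make_hash()` and table layout in dumper.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SLOTS 4096        /* hypothetical fixed capacity */

struct entry_sketch {
    const void *obj;            /* key: object address; NULL marks a free slot */
    size_t file_offset;         /* filled in later, during address allocation */
};

static struct entry_sketch table[TABLE_SLOTS];
static size_t entries_used;

static size_t hash_pointer(const void *p)
{
    /* Drop the low bits, which are usually zero because of alignment. */
    return (size_t) (((uintptr_t) p >> 3) % TABLE_SLOTS);
}

/* Return the entry for `obj', or NULL if it was never registered. */
struct entry_sketch *get_entry(const void *obj)
{
    size_t i = hash_pointer(obj);
    while (table[i].obj != NULL) {
        if (table[i].obj == obj)
            return &table[i];
        i = (i + 1) % TABLE_SLOTS;
    }
    return NULL;
}

/* Register `obj' if needed and return its entry. Doubles as the
   "already visited?" test of the marking pass. */
struct entry_sketch *add_entry(const void *obj)
{
    assert(obj != NULL);
    size_t i = hash_pointer(obj);
    while (table[i].obj != NULL) {
        if (table[i].obj == obj)
            return &table[i];
        i = (i + 1) % TABLE_SLOTS;
    }
    assert(entries_used < TABLE_SLOTS - 1);   /* always keep one free slot */
    table[i].obj = obj;
    table[i].file_offset = 0;
    entries_used++;
    return &table[i];
}
```

Because entries are never removed, linear probing needs no tombstones, which keeps both operations short.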
5862 The roots for the marking are: | |
5863 | |
5864 @enumerate | |
5865 @item | |
5866 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()} | |
5867 call for protected variables we do not want to dump). | |
5868 | |
5869 @item | |
5870 the @code{pdump_wire}'d variables (@code{staticpro} is equivalent to | |
5871 @code{staticpro_nodump()} + @code{pdump_wire()}). | |
5872 | |
5873 @item | |
5874 the @code{dumpstruct}'ed variables, which point to C structures. | |
5875 @end enumerate | |
5876 | |
5877 This does not include the GCPRO'ed variables, the specbinds, the | |
5878 catchtags, the backlist, the redisplay or the profiling info, since we | |
5879 do not want to rebuild the actual chain of lisp calls which lead up to | |
5880 the dump-emacs call, only the global variables. | |
5881 | |
5882 Weak lists and weak hash tables are dumped as if they were their | |
5883 non-weak equivalent (without changing their type, of course). This has | |
5884 not yet been a problem. | |
5885 | |
5886 @node Address allocation, The header, Object inventory, Dumping phase | |
5887 @subsection Address allocation | |
5888 | |
5889 | |
5890 The next step is to allocate the offsets of each of the objects in the | |
5891 final dump file. This is done by @code{pdump_allocate_offset()} which | |
5892 is called indirectly by @code{pdump_scan_by_alignment()}. | |
5893 | |
5894 The strategy to deal with alignment problems uses these facts: | |
5895 | |
5896 @enumerate | |
5897 @item | |
5898 real world alignment requirements are powers of two. | |
5899 | |
5900 @item | |
5901 the C compiler is required to adjust the size of a struct so that you | |
5902 can have an array of them next to each other. This means you can get an | |
5903 upper bound on the alignment requirement of a given structure by | |
5904 finding the largest power of two of which its size is a multiple. | |
5905 | |
5906 @item | |
5907 the non-variant part of variable size lrecords has an alignment | |
5908 requirement of 4. | |
5909 @end enumerate | |
5910 | |
5911 Hence, for each lrecord type, C struct type or opaque data block the | |
5912 alignment requirement is computed as a power of two, with a minimum of | |
5913 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the | |
5914 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements | |
5915 first. This ensures the best packing. | |
5916 | |
5917 The maximum alignment requirement we take into account is 2^8. | |
5918 | |
5919 @code{pdump_allocate_offset()} only has to do a linear allocation, | |
5920 starting at offset 256 (this leaves room for the header and keeps the | |
5921 alignments happy). | |
5922 | |
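The alignment rule and the linear allocation can be sketched as follows. The function names are hypothetical and this is not the code of `pdump_allocate_offset()`; it only illustrates the arithmetic: the alignment bound is the largest power of two dividing the size (capped at 2^8), and each object is placed at the next offset that satisfies its alignment, starting from 256.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ALIGN   256     /* 2^8, the largest alignment considered */
#define BASE_OFFSET 256     /* first offset after the header */

/* Largest power of two (up to MAX_ALIGN) that divides `size'. */
size_t alignment_bound(size_t size)
{
    assert(size > 0);
    size_t a = 1;
    while (a < MAX_ALIGN && size % (a * 2) == 0)
        a *= 2;
    return a;
}

/* Round `offset' up to a multiple of `align' (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Linear allocation: place an object at the next suitably aligned
   offset and advance the cursor past it. */
size_t allocate_offset(size_t *cursor, size_t size, size_t align)
{
    size_t at = align_up(*cursor, align);
    *cursor = at + size;
    return at;
}
```

Handing out offsets in order of decreasing alignment, as the text describes, means that each object's alignment divides that of the object before it, so padding is only needed where an object's size is not a multiple of its alignment.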
5923 @node The header, Data dumping, Address allocation, Dumping phase | |
5924 @subsection The header | |
5925 | |
5926 The next step creates the file and writes a header with a signature and | |
5927 some miscellaneous information in it (number of staticpro, number of assigned | |
5928 lrecord types, etc.). The reloc_address field, which indicates at | |
5929 which address the file should be loaded if we want to avoid post-reload | |
5930 relocation, is set to 0. It then seeks to offset 256 (base offset for | |
5931 the objects). | |
5932 | |
5933 @node Data dumping, Pointers dumping, The header, Dumping phase | |
5934 @subsection Data dumping | |
5935 | |
5936 The data is dumped by @code{pdump_dump_data()}, called from | |
5937 @code{pdump_scan_by_alignment()}, in the same order as the addresses were | |
5938 allocated. This function copies the data to a temporary buffer, relocates | |
5939 all pointers in the object to the addresses allocated in step Address | |
5940 Allocation, and writes it to the file. Using the same order means that, | |
5941 if we are careful with lrecords whose size is not a multiple of 4, we | |
5942 are assured that the object is always written at the offset in the file | |
5943 allocated in step Address Allocation. | |
5944 | |
5945 @node Pointers dumping, , Data dumping, Dumping phase | |
5946 @subsection Pointers dumping | |
5947 | |
5948 A bunch of tables needed to properly reassign the global pointers are | |
5949 then written. They are: | |
5950 | |
5951 @enumerate | |
5952 @item | |
5953 the staticpro array | |
5954 @item | |
5955 the dumpstruct array | |
5956 @item | |
5957 the lrecord_implementations_table array | |
5958 @item | |
5959 a vector of all the offsets to the objects in the file that include a | |
5960 description (for faster relocation at reload time) | |
5961 @item | |
5962 the pdump_wired and pdump_wired_list arrays | |
5963 @end enumerate | |
5964 | |
5965 For each of the arrays we write both the pointer to the variables and | |
5966 the relocated offset of the object they point to. Since these variables | |
5967 are global, the pointers are still valid when restarting the program and | |
5968 are used to regenerate the global pointers. | |
5969 | |
5970 The @code{pdump_wired_list} array is a special case. The variables it | |
5971 points to are the heads of weak linked lists of lisp objects of the same | |
5972 type. Not all objects of this list are dumped so the relocated pointer | |
5973 we associate with them points to the first dumped object of the list, or | |
5974 Qnil if none is available. This is also the reason why they are not | |
5975 used as roots for the purpose of object enumeration. | |
5976 | |
5977 This is the end of the dumping part. | |
5978 | |
5979 @node Reloading phase, Remaining issues, Dumping phase, Dumping | |
5980 @section Reloading phase | |
5981 | |
5982 @subsection File loading | |
5983 | |
5984 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at | |
5985 least 4096), or if mmap is unavailable or fails, a 256-byte aligned | |
5986 malloc is done and the file is loaded. | |
5987 | |
5988 Some variables are reinitialized from the values found in the header. | |
5989 | |
5990 The difference between the actual loading address and the reloc_address | |
5991 is computed and will be used for all the relocations. | |
5992 | |
5993 | |
5994 @subsection Putting back the staticvec | |
5995 | |
5996 The staticvec array is memcpy'd from the file and the variables it | |
5997 points to are reset to the relocated object addresses. | |
5998 | |
5999 | |
6000 @subsection Putting back the dumpstructed variables | |
6001 | |
6002 The variables pointed to by dumpstruct in the dump phase are reset to | |
6003 the right relocated object addresses. | |
6004 | |
6005 | |
6006 @subsection lrecord_implementations_table | |
6007 | |
6008 The lrecord_implementations_table is reset to its dump time state and | |
6009 the right lrecord_type_index values are put in. | |
6010 | |
6011 | |
6012 @subsection Object relocation | |
6013 | |
6014 All the objects are relocated using their description and their offset | |
6015 by @code{pdump_reloc_one}. This step is unnecessary if the | |
6016 reloc_address is equal to the file loading address. | |
6017 | |
6018 | |
6019 @subsection Putting back the pdump_wire and pdump_wire_list variables | |
6020 | |
6021 Same as Putting back the dumpstructed variables. | |
6022 | |
6023 | |
6024 @subsection Reorganize the hash tables | |
6025 | |
6026 Since some of the hash values in the lisp hash tables are | |
6027 address-dependent, their layout is now wrong. So we go through each of | |
6028 them and have them re-sorted by calling @code{pdump_reorganize_hash_table}. | |
6029 | |
6030 @node Remaining issues, , Reloading phase, Dumping | |
6031 @section Remaining issues | |
6032 | |
6033 The build process will have to start a post-dump xemacs, ask it for the | |
6034 loading address (which will, hopefully, always be the same between | |
6035 different xemacs invocations) and relocate the file to the new address. | |
6036 This way the object relocation phase will not have to be done, which | |
6037 means no writes in the objects and that, because of the use of mmap, the | |
6038 dumped data will be shared between all the xemacs running on the | |
6039 computer. | |
6040 | |
6041 Some executable signature will be necessary to ensure that a given dump | |
6042 file is really associated with a given executable, or random crashes | |
6043 will occur. Maybe a random number set at compile or configure time thru | |
6044 a define. This will also allow for having differently-compiled xemacsen | |
6045 on the same system (mule and no-mule come to mind). | |
6046 | |
6047 The DOC file contents should probably end up in the dump file. | |
6048 | |
6049 | |
6050 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top | |
6051 @chapter Events and the Event Loop | 5201 @chapter Events and the Event Loop |
6052 | 5202 |
6053 @menu | 5203 @menu |
6054 * Introduction to Events:: | 5204 * Introduction to Events:: |
6055 * Main Loop:: | 5205 * Main Loop:: |
6059 * Other Event Loop Functions:: | 5209 * Other Event Loop Functions:: |
6060 * Converting Events:: | 5210 * Converting Events:: |
6061 * Dispatching Events; The Command Builder:: | 5211 * Dispatching Events; The Command Builder:: |
6062 @end menu | 5212 @end menu |
6063 | 5213 |
6064 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop | 5214 @node Introduction to Events |
6065 @section Introduction to Events | 5215 @section Introduction to Events |
6066 | 5216 |
6067 An event is an object that encapsulates information about an | 5217 An event is an object that encapsulates information about an |
6068 interesting occurrence in the operating system. Events are | 5218 interesting occurrence in the operating system. Events are |
6069 generated either by user action, direct (e.g. typing on the | 5219 generated either by user action, direct (e.g. typing on the |
6093 XEmacs has its own types of events (called @dfn{Emacs events}), | 5243 XEmacs has its own types of events (called @dfn{Emacs events}), |
6094 which provides an abstract layer on top of the system-dependent | 5244 which provides an abstract layer on top of the system-dependent |
6095 nature of the most basic events that are received. Part of the | 5245 nature of the most basic events that are received. Part of the |
6096 complex nature of the XEmacs event collection process involves | 5246 complex nature of the XEmacs event collection process involves |
6097 converting from the operating-system events into the proper | 5247 converting from the operating-system events into the proper |
6098 Emacs events---there may not be a one-to-one correspondence. | 5248 Emacs events -- there may not be a one-to-one correspondence. |
6099 | 5249 |
6100 Emacs events are documented in @file{events.h}; I'll discuss them | 5250 Emacs events are documented in @file{events.h}; I'll discuss them |
6101 later. | 5251 later. |
6102 | 5252 |
6103 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop | 5253 @node Main Loop |
6104 @section Main Loop | 5254 @section Main Loop |
6105 | 5255 |
6106 The @dfn{command loop} is the top-level loop that the editor is always | 5256 The @dfn{command loop} is the top-level loop that the editor is always |
6107 running. It loops endlessly, calling @code{next-event} to retrieve an | 5257 running. It loops endlessly, calling @code{next-event} to retrieve an |
6108 event and @code{dispatch-event} to execute it. @code{dispatch-event} does | 5258 event and @code{dispatch-event} to execute it. @code{dispatch-event} does |
6118 one console), and the engine that looks up keystrokes and | 5268 one console), and the engine that looks up keystrokes and |
6119 constructs full key sequences is called the @dfn{command builder}. | 5269 constructs full key sequences is called the @dfn{command builder}. |
6120 This is documented elsewhere. | 5270 This is documented elsewhere. |
6121 | 5271 |
6122 The guts of the command loop are in @code{command_loop_1()}. This | 5272 The guts of the command loop are in @code{command_loop_1()}. This |
6123 function doesn't catch errors, though---that's the job of | 5273 function doesn't catch errors, though -- that's the job of |
6124 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping) | 5274 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping) |
6125 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never | 5275 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never |
6126 returns, but may get thrown out of. | 5276 returns, but may get thrown out of. |
6127 | 5277 |
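The relationship between the inner loop and its error-trapping wrapper can be illustrated with `setjmp()`/`longjmp()`. This is only an analogy under stated assumptions: the real code uses the Lisp condition-case machinery, not raw `setjmp()`, and every name here (`run_loop_sketch`, `loop_1_sketch`, `signal_error_sketch`, the integer "events") is hypothetical. Unlike the real command loop, the sketch stops when its event queue is empty so that it can be run to completion.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf error_trap;          /* where a signalled error unwinds to */
static const int *queue;            /* hypothetical pending events */
static size_t queue_len, queue_pos;
static int dispatched, errors_trapped;

/* Stand-in for signalling an error: throw out of the inner loop. */
static void signal_error_sketch(void)
{
    longjmp(error_trap, 1);
}

/* Inner loop: fetch the next event and dispatch it. A negative event
   stands for a command that signals an error. This function does not
   trap errors itself. */
static void loop_1_sketch(void)
{
    while (queue_pos < queue_len) {
        int event = queue[queue_pos++];
        if (event < 0)
            signal_error_sketch();
        dispatched++;
    }
}

/* Outer wrapper: trap any error, note it, and re-enter the inner loop.
   Returns the number of events dispatched. */
int run_loop_sketch(const int *events, size_t n, int *errors_out)
{
    queue = events;
    queue_len = n;
    queue_pos = 0;
    dispatched = 0;
    errors_trapped = 0;
    if (setjmp(error_trap) != 0)
        errors_trapped++;           /* an error was thrown: report and resume */
    loop_1_sketch();
    *errors_out = errors_trapped;
    return dispatched;
}
```

All state that changes between the `setjmp()` and the `longjmp()` lives at file scope, which avoids the rule that modified non-volatile locals become indeterminate after a `longjmp()`.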
6128 When an error occurs, @code{cmd_error()} is called, which usually | 5278 When an error occurs, @code{cmd_error()} is called, which usually |
6165 wrapper similar to @code{command_loop_2()}. Note also that | 5315 wrapper similar to @code{command_loop_2()}. Note also that |
6166 @code{initial_command_loop()} sets up a catch for @code{top-level} when | 5316 @code{initial_command_loop()} sets up a catch for @code{top-level} when |
6167 invoking @code{top_level_1()}, just like when it invokes | 5317 invoking @code{top_level_1()}, just like when it invokes |
6168 @code{command_loop_2()}. | 5318 @code{command_loop_2()}. |
6169 | 5319 |
6170 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop | 5320 @node Specifics of the Event Gathering Mechanism |
6171 @section Specifics of the Event Gathering Mechanism | 5321 @section Specifics of the Event Gathering Mechanism |
6172 | 5322 |
6173 Here is an approximate diagram of the collection processes | 5323 Here is an approximate diagram of the collection processes |
6174 at work in XEmacs, under TTY's (TTY's are simpler than X | 5324 at work in XEmacs, under TTY's (TTY's are simpler than X |
6175 so we'll look at this first): | 5325 so we'll look at this first): |
6404 which repeatedly calls `next-event' | 5554 which repeatedly calls `next-event' |
6405 and then dispatches the event | 5555 and then dispatches the event |
6406 using `dispatch-event' | 5556 using `dispatch-event' |
6407 @end example | 5557 @end example |
6408 | 5558 |
6409 @node Specifics About the Emacs Event, The Event Stream Callback Routines, Specifics of the Event Gathering Mechanism, Events and the Event Loop | 5559 @node Specifics About the Emacs Event |
6410 @section Specifics About the Emacs Event | 5560 @section Specifics About the Emacs Event |
6411 | 5561 |
6412 @node The Event Stream Callback Routines, Other Event Loop Functions, Specifics About the Emacs Event, Events and the Event Loop | 5562 @node The Event Stream Callback Routines |
6413 @section The Event Stream Callback Routines | 5563 @section The Event Stream Callback Routines |
6414 | 5564 |
6415 @node Other Event Loop Functions, Converting Events, The Event Stream Callback Routines, Events and the Event Loop | 5565 @node Other Event Loop Functions |
6416 @section Other Event Loop Functions | 5566 @section Other Event Loop Functions |
6417 | 5567 |
6418 @code{detect_input_pending()} and @code{input-pending-p} look for | 5568 @code{detect_input_pending()} and @code{input-pending-p} look for |
6419 input by calling @code{event_stream->event_pending_p} and looking in | 5569 input by calling @code{event_stream->event_pending_p} and looking in |
6420 @code{[V]unread-command-event} and the @code{command_event_queue} (they | 5570 @code{[V]unread-command-event} and the @code{command_event_queue} (they |
6432 @code{read-char} calls @code{next-command-event} and uses | 5582 @code{read-char} calls @code{next-command-event} and uses |
6433 @code{event_to_character()} to return the character equivalent. With | 5583 @code{event_to_character()} to return the character equivalent. With |
6434 the right kind of input method support, it is possible for (read-char) | 5584 the right kind of input method support, it is possible for (read-char) |
6435 to return a Kanji character. | 5585 to return a Kanji character. |
6436 | 5586 |
6437 @node Converting Events, Dispatching Events; The Command Builder, Other Event Loop Functions, Events and the Event Loop | 5587 @node Converting Events |
6438 @section Converting Events | 5588 @section Converting Events |
6439 | 5589 |
6440 @code{character_to_event()}, @code{event_to_character()}, | 5590 @code{character_to_event()}, @code{event_to_character()}, |
6441 @code{event-to-character}, and @code{character-to-event} convert between | 5591 @code{event-to-character}, and @code{character-to-event} convert between |
6442 characters and keypress events corresponding to the characters. If the | 5592 characters and keypress events corresponding to the characters. If the |
6443 event was not a keypress, @code{event_to_character()} returns -1 and | 5593 event was not a keypress, @code{event_to_character()} returns -1 and |
6444 @code{event-to-character} returns @code{nil}. These functions convert | 5594 @code{event-to-character} returns @code{nil}. These functions convert |
6445 between character representation and the split-up event representation | 5595 between character representation and the split-up event representation |
6446 (keysym plus mod keys). | 5596 (keysym plus mod keys). |
6447 | 5597 |
6448 @node Dispatching Events; The Command Builder, , Converting Events, Events and the Event Loop | 5598 @node Dispatching Events; The Command Builder |
6449 @section Dispatching Events; The Command Builder | 5599 @section Dispatching Events; The Command Builder |
6450 | 5600 |
6451 Not yet documented. | 5601 Not yet documented. |
6452 | 5602 |
6453 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top | 5603 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top |
6458 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: | 5608 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: |
6459 * Simple Special Forms:: | 5609 * Simple Special Forms:: |
6460 * Catch and Throw:: | 5610 * Catch and Throw:: |
6461 @end menu | 5611 @end menu |
6462 | 5612 |
6463 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings | 5613 @node Evaluation |
6464 @section Evaluation | 5614 @section Evaluation |
6465 | 5615 |
6466 @code{Feval()} evaluates the form (a Lisp object) that is passed to | 5616 @code{Feval()} evaluates the form (a Lisp object) that is passed to |
6467 it. Note that evaluation is only non-trivial for two types of objects: | 5617 it. Note that evaluation is only non-trivial for two types of objects: |
6468 symbols and conses. A symbol is evaluated simply by calling | 5618 symbols and conses. A symbol is evaluated simply by calling |
6527 @code{funcall_compiled_function()} calls the real byte-code interpreter | 5677 @code{funcall_compiled_function()} calls the real byte-code interpreter |
6528 @code{execute_optimized_program()} on the byte-code instructions, which | 5678 @code{execute_optimized_program()} on the byte-code instructions, which |
6529 are converted into an internal form for faster execution. | 5679 are converted into an internal form for faster execution. |
6530 | 5680 |
6531 When a compiled function is executed for the first time by | 5681 When a compiled function is executed for the first time by |
6532 @code{funcall_compiled_function()}, or during the dump phase of building | 5682 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed |
6533 XEmacs, the byte-code instructions are converted from a | 5683 during the dump phase of building XEmacs, the byte-code instructions are |
6534 @code{Lisp_String} (which is inefficient to access, especially in the | 5684 converted from a @code{Lisp_String} (which is inefficient to access, |
6535 presence of MULE) into a @code{Lisp_Opaque} object containing an array | 5685 especially in the presence of MULE) into a @code{Lisp_Opaque} object |
6536 of unsigned char, which can be directly executed by the byte-code | 5686 containing an array of unsigned char, which can be directly executed by |
6537 interpreter. At this time the byte code is also analyzed for validity | 5687 the byte-code interpreter. At this time the byte code is also analyzed |
6538 and transformed into a more optimized form, so that | 5688 for validity and transformed into a more optimized form, so that |
6539 @code{execute_optimized_program()} can really fly. | 5689 @code{execute_optimized_program()} can really fly. |
6540 | 5690 |
6541 Here are some of the optimizations performed by the internal byte-code | 5691 Here are some of the optimizations performed by the internal byte-code |
6542 transformer: | 5692 transformer: |
6543 @enumerate | 5693 @enumerate |
6548 References to the @code{constants} array that will be used as a Lisp | 5698 References to the @code{constants} array that will be used as a Lisp |
6549 variable are checked for being correct non-constant (i.e. not @code{t}, | 5699 variable are checked for being correct non-constant (i.e. not @code{t}, |
6550 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter | 5700 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter |
6551 doesn't have to. | 5701 doesn't have to. |
6552 @item | 5702 @item |
6553 The maximum number of variable bindings in the byte-code is | 5703 The maxiumum number of variable bindings in the byte-code is |
6554 pre-computed, so that space on the @code{specpdl} stack can be | 5704 pre-computed, so that space on the @code{specpdl} stack can be |
6555 pre-reserved once for the whole function execution. | 5705 pre-reserved once for the whole function execution. |
6556 @item | 5706 @item |
6557 All byte-code jumps are relative to the current program counter instead | 5707 All byte-code jumps are relative to the current program counter instead |
6558 of the start of the program, thereby saving a register. | 5708 of the start of the program, thereby saving a register. |
6588 @code{call3()} call a function, passing it the argument(s) given (the | 5738 @code{call3()} call a function, passing it the argument(s) given (the |
6589 arguments are given as separate C arguments rather than being passed as | 5739 arguments are given as separate C arguments rather than being passed as |
6590 an array). @code{apply1()} uses @code{Fapply()} while the others use | 5740 an array). @code{apply1()} uses @code{Fapply()} while the others use |
6591 @code{Ffuncall()} to do the real work. | 5741 @code{Ffuncall()} to do the real work. |
6592 | 5742 |
6593 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings | 5743 @node Dynamic Binding; The specbinding Stack; Unwind-Protects |
6594 @section Dynamic Binding; The specbinding Stack; Unwind-Protects | 5744 @section Dynamic Binding; The specbinding Stack; Unwind-Protects |
6595 | 5745 |
6596 @example | 5746 @example |
6597 struct specbinding | 5747 struct specbinding |
6598 @{ | 5748 @{ |
6642 a local-variable binding (@code{func} is 0, @code{symbol} is not | 5792 a local-variable binding (@code{func} is 0, @code{symbol} is not |
6643 @code{nil}, and @code{old_value} holds the old value, which is stored as | 5793 @code{nil}, and @code{old_value} holds the old value, which is stored as |
6644 the symbol's value). | 5794 the symbol's value). |
6645 @end enumerate | 5795 @end enumerate |
6646 | 5796 |
6647 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings | 5797 @node Simple Special Forms |
6648 @section Simple Special Forms | 5798 @section Simple Special Forms |
6649 | 5799 |
6650 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, | 5800 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, |
6651 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, | 5801 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, |
6652 @code{let*}, @code{let}, @code{while} | 5802 @code{let*}, @code{let}, @code{while} |
6654 All of these are very simple and work as expected, calling | 5804 All of these are very simple and work as expected, calling |
6655 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of | 5805 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of |
6656 @code{let} and @code{let*}) using @code{specbind()} to create bindings | 5806 @code{let} and @code{let*}) using @code{specbind()} to create bindings |
6657 and @code{unbind_to()} to undo the bindings when finished. | 5807 and @code{unbind_to()} to undo the bindings when finished. |
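The bind-then-unwind pattern that `let` relies on can be sketched as follows. This is a simplified model under stated assumptions: values are plain `int`s instead of Lisp objects, the stack has a fixed size, and all `toy_` names are hypothetical rather than the real `specbind()` and `unbind_to()` signatures.

```c
#include <assert.h>

/* Hypothetical, simplified symbol: the value is an int, not a Lisp object. */
struct toy_symbol { const char *name; int value; };

/* One saved binding: which symbol was rebound and what it held before. */
struct toy_binding { struct toy_symbol *symbol; int old_value; };

#define TOY_SPECPDL_SIZE 64
static struct toy_binding toy_specpdl[TOY_SPECPDL_SIZE];
static int toy_specpdl_depth;

/* Remember the current stack depth so bindings can be undone back to it. */
static int toy_specpdl_mark(void) { return toy_specpdl_depth; }

/* Save the symbol's current value on the stack, then install the new one. */
static void toy_specbind(struct toy_symbol *sym, int new_value)
{
    assert(toy_specpdl_depth < TOY_SPECPDL_SIZE);
    toy_specpdl[toy_specpdl_depth].symbol = sym;
    toy_specpdl[toy_specpdl_depth].old_value = sym->value;
    toy_specpdl_depth++;
    sym->value = new_value;
}

/* Pop bindings, newest first, restoring each saved value. */
static void toy_unbind_to(int depth)
{
    while (toy_specpdl_depth > depth) {
        struct toy_binding *b = &toy_specpdl[--toy_specpdl_depth];
        b->symbol->value = b->old_value;
    }
}
```

A caller records `toy_specpdl_mark()` before binding, binds as many symbols as it needs, evaluates its body, and then calls `toy_unbind_to()` with the recorded mark. Nested bindings of the same symbol unwind correctly because they are popped in reverse order.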
6658 | 5808 |
6659 Note that, with the exception of @code{Fprogn}, these functions are | 5809 Note that, with the exeption of @code{Fprogn}, these functions are |
6660 typically called in real life only in interpreted code, since the byte | 5810 typically called in real life only in interpreted code, since the byte |
6661 compiler knows how to convert calls to these functions directly into | 5811 compiler knows how to convert calls to these functions directly into |
6662 byte code. | 5812 byte code. |
6663 | 5813 |
6664 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings | 5814 @node Catch and Throw |
6665 @section Catch and Throw | 5815 @section Catch and Throw |
6666 | 5816 |
6667 @example | 5817 @example |
6668 struct catchtag | 5818 struct catchtag |
6669 @{ | 5819 @{ |
6727 * Introduction to Symbols:: | 5877 * Introduction to Symbols:: |
6728 * Obarrays:: | 5878 * Obarrays:: |
6729 * Symbol Values:: | 5879 * Symbol Values:: |
6730 @end menu | 5880 @end menu |
6731 | 5881 |
6732 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables | 5882 @node Introduction to Symbols |
6733 @section Introduction to Symbols | 5883 @section Introduction to Symbols |
6734 | 5884 |
6735 A symbol is basically just an object with four fields: a name (a | 5885 A symbol is basically just an object with four fields: a name (a |
6736 string), a value (some Lisp object), a function (some Lisp object), and | 5886 string), a value (some Lisp object), a function (some Lisp object), and |
6737 a property list (usually a list of alternating keyword/value pairs). | 5887 a property list (usually a list of alternating keyword/value pairs). |
6744 there can be a distinct function and variable with the same name. The | 5894 there can be a distinct function and variable with the same name. The |
6745 property list is used as a more general mechanism of associating | 5895 property list is used as a more general mechanism of associating |
6746 additional values with particular names, and once again the namespace is | 5896 additional values with particular names, and once again the namespace is |
6747 independent of the function and variable namespaces. | 5897 independent of the function and variable namespaces. |
6748 | 5898 |
6749 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables | 5899 @node Obarrays |
6750 @section Obarrays | 5900 @section Obarrays |
6751 | 5901 |
6752 The identity of symbols with their names is accomplished through a | 5902 The identity of symbols with their names is accomplished through a |
6753 structure called an obarray, which is just a poorly-implemented hash | 5903 structure called an obarray, which is just a poorly-implemented hash |
6754 table mapping from strings to symbols whose name is that string. (I say | 5904 table mapping from strings to symbols whose name is that string. (I say |
6811 a new one, and @code{unintern} to remove a symbol from an obarray. This | 5961 a new one, and @code{unintern} to remove a symbol from an obarray. This |
6812 returns the removed symbol. (Remember: You can't put the symbol back | 5962 returns the removed symbol. (Remember: You can't put the symbol back |
6813 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols | 5963 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols |
6814 in an obarray. | 5964 in an obarray. |
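As a rough illustration of how interning ties a name to a single symbol object, here is a toy obarray written as a chained hash table. It is a sketch only: the bucket count, the hash function, and the `toy_` names are assumptions made for the example and do not mirror the real obarray layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 31   /* arbitrary bucket count for the sketch */

struct toy_symbol {
    char *name;
    struct toy_symbol *next;  /* next symbol in the same bucket */
};

struct toy_obarray { struct toy_symbol *buckets[TOY_OBARRAY_SIZE]; };

static unsigned toy_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % TOY_OBARRAY_SIZE;
}

/* Return the interned symbol with this name, or NULL if there is none. */
static struct toy_symbol *toy_lookup(struct toy_obarray *ob, const char *name)
{
    struct toy_symbol *s = ob->buckets[toy_hash(name)];
    while (s && strcmp(s->name, name) != 0)
        s = s->next;
    return s;
}

/* Return the existing symbol, or create and register a new one. */
static struct toy_symbol *toy_intern(struct toy_obarray *ob, const char *name)
{
    struct toy_symbol *s = toy_lookup(ob, name);
    if (s)
        return s;
    s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = malloc(strlen(name) + 1);
    assert(s->name != NULL);
    strcpy(s->name, name);
    s->next = ob->buckets[toy_hash(name)];
    ob->buckets[toy_hash(name)] = s;
    return s;
}

/* Unlink the symbol from its bucket and return it, or NULL if absent. */
static struct toy_symbol *toy_unintern(struct toy_obarray *ob, const char *name)
{
    struct toy_symbol **link = &ob->buckets[toy_hash(name)];
    for (; *link; link = &(*link)->next) {
        if (strcmp((*link)->name, name) == 0) {
            struct toy_symbol *s = *link;
            *link = s->next;
            s->next = NULL;
            return s;
        }
    }
    return NULL;
}
```

Interning the same name twice yields the same pointer. After `toy_unintern()`, the removed object still exists, but interning that name again creates a fresh, distinct symbol.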
6815 | 5965 |
6816 @node Symbol Values, , Obarrays, Symbols and Variables | 5966 @node Symbol Values |
6817 @section Symbol Values | 5967 @section Symbol Values |
6818 | 5968 |
6819 The value field of a symbol normally contains a Lisp object. However, | 5969 The value field of a symbol normally contains a Lisp object. However, |
6820 a symbol can be @dfn{unbound}, meaning that it logically has no value. | 5970 a symbol can be @dfn{unbound}, meaning that it logically has no value. |
6821 This is internally indicated by storing a special Lisp object, called | 5971 This is internally indicated by storing a special Lisp object, called |
6866 * Markers and Extents:: Tagging locations within a buffer. | 6016 * Markers and Extents:: Tagging locations within a buffer. |
6867 * Bufbytes and Emchars:: Representation of individual characters. | 6017 * Bufbytes and Emchars:: Representation of individual characters. |
6868 * The Buffer Object:: The Lisp object corresponding to a buffer. | 6018 * The Buffer Object:: The Lisp object corresponding to a buffer. |
6869 @end menu | 6019 @end menu |
6870 | 6020 |
6871 @node Introduction to Buffers, The Text in a Buffer, Buffers and Textual Representation, Buffers and Textual Representation | 6021 @node Introduction to Buffers |
6872 @section Introduction to Buffers | 6022 @section Introduction to Buffers |
6873 | 6023 |
6874 A buffer is logically just a Lisp object that holds some text. | 6024 A buffer is logically just a Lisp object that holds some text. |
6875 In this, it is like a string, but a buffer is optimized for | 6025 In this, it is like a string, but a buffer is optimized for |
6876 frequent insertion and deletion, while a string is not. Furthermore: | 6026 frequent insertion and deletion, while a string is not. Furthermore: |
6919 and @dfn{buffer of the selected window}, and the distinction between | 6069 and @dfn{buffer of the selected window}, and the distinction between |
6920 @dfn{point} of the current buffer and @dfn{window-point} of the selected | 6070 @dfn{point} of the current buffer and @dfn{window-point} of the selected |
6921 window. (This latter distinction is explained in detail in the section | 6071 window. (This latter distinction is explained in detail in the section |
6922 on windows.) | 6072 on windows.) |
6923 | 6073 |
6924 @node The Text in a Buffer, Buffer Lists, Introduction to Buffers, Buffers and Textual Representation | 6074 @node The Text in a Buffer |
6925 @section The Text in a Buffer | 6075 @section The Text in a Buffer |
6926 | 6076 |
6927 The text in a buffer consists of a sequence of zero or more | 6077 The text in a buffer consists of a sequence of zero or more |
6928 characters. A @dfn{character} is an integer that logically represents | 6078 characters. A @dfn{character} is an integer that logically represents |
6929 a letter, number, space, or other unit of text. Most of the characters | 6079 a letter, number, space, or other unit of text. Most of the characters |
7059 Bufbytes underscores the fact that we are working with a string of bytes | 6209 Bufbytes underscores the fact that we are working with a string of bytes |
7060 in the internal Emacs buffer representation rather than in one of a | 6210 in the internal Emacs buffer representation rather than in one of a |
7061 number of possible alternative representations (e.g. EUC-encoded text, | 6211 number of possible alternative representations (e.g. EUC-encoded text, |
7062 etc.). | 6212 etc.). |
7063 | 6213 |
7064 @node Buffer Lists, Markers and Extents, The Text in a Buffer, Buffers and Textual Representation | 6214 @node Buffer Lists |
7065 @section Buffer Lists | 6215 @section Buffer Lists |
7066 | 6216 |
7067 Recall earlier that buffers are @dfn{permanent} objects, i.e. that | 6217 Recall earlier that buffers are @dfn{permanent} objects, i.e. that |
7068 they remain around until explicitly deleted. This entails that there is | 6218 they remain around until explicitly deleted. This entails that there is |
7069 a list of all the buffers in existence. This list is actually an | 6219 a list of all the buffers in existence. This list is actually an |
7095 respectively. You can also force a new buffer to be created using | 6245 respectively. You can also force a new buffer to be created using |
7096 @code{generate-new-buffer}, which takes a name and (if necessary) makes | 6246 @code{generate-new-buffer}, which takes a name and (if necessary) makes |
7097 a unique name from this by appending a number, and then creates the | 6247 a unique name from this by appending a number, and then creates the |
7098 buffer. This is basically like the symbol operation @code{gensym}. | 6248 buffer. This is basically like the symbol operation @code{gensym}. |
7099 | 6249 |
7100 @node Markers and Extents, Bufbytes and Emchars, Buffer Lists, Buffers and Textual Representation | 6250 @node Markers and Extents |
7101 @section Markers and Extents | 6251 @section Markers and Extents |
7102 | 6252 |
7103 Among the things associated with a buffer are things that are | 6253 Among the things associated with a buffer are things that are |
7104 logically attached to certain buffer positions. This can be used to | 6254 logically attached to certain buffer positions. This can be used to |
7105 keep track of a buffer position when text is inserted and deleted, so | 6255 keep track of a buffer position when text is inserted and deleted, so |
7121 | 6271 |
7122 The important thing here is that markers and extents simply contain | 6272 The important thing here is that markers and extents simply contain |
7123 buffer positions in them as integers, and every time text is inserted or | 6273 buffer positions in them as integers, and every time text is inserted or |
7124 deleted, these positions must be updated. In order to minimize the | 6274 deleted, these positions must be updated. In order to minimize the |
7125 amount of shuffling that needs to be done, the positions in markers and | 6275 amount of shuffling that needs to be done, the positions in markers and |
7126 extents (there's one per marker, two per extent) are stored in Meminds. | 6276 extents (there's one per marker, two per extent) and stored in Meminds. |
7127 This means that they only need to be moved when the text is physically | 6277 This means that they only need to be moved when the text is physically |
7128 moved in memory; since the gap structure tries to minimize this, it also | 6278 moved in memory; since the gap structure tries to minimize this, it also |
7129 minimizes the number of marker and extent indices that need to be | 6279 minimizes the number of marker and extent indices that need to be |
7130 adjusted. Look in @file{insdel.c} for the details of how this works. | 6280 adjusted. Look in @file{insdel.c} for the details of how this works. |
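To see why storing memory indices saves work, consider this minimal gap-buffer sketch. It assumes 0-based indices, a fixed 32-byte array, and a marker held as a plain `int` memory index. These are simplifications for illustration, and the `toy_` names are not taken from `insdel.c`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical gap buffer: text before the gap, the gap, text after it. */
struct toy_gapbuf {
    char text[32];
    int gap_start;  /* memory index of the first byte of the gap */
    int gap_size;   /* number of unused bytes in the gap */
};

static void toy_init(struct toy_gapbuf *b, const char *before,
                     const char *after, int gap_size)
{
    size_t nb = strlen(before), na = strlen(after);
    assert(nb + (size_t)gap_size + na <= sizeof b->text);
    memcpy(b->text, before, nb);
    memcpy(b->text + nb + gap_size, after, na);
    b->gap_start = (int)nb;
    b->gap_size = gap_size;
}

/* Convert a memory index (gap included) to a buffer position (gap excluded). */
static int toy_memind_to_bufpos(const struct toy_gapbuf *b, int memind)
{
    /* A valid memory index never points into the gap itself. */
    assert(memind < b->gap_start || memind >= b->gap_start + b->gap_size);
    return memind < b->gap_start ? memind : memind - b->gap_size;
}

/* Insert one character at the gap; no text after the gap moves in memory. */
static void toy_insert_at_gap(struct toy_gapbuf *b, char c)
{
    assert(b->gap_size > 0);
    b->text[b->gap_start++] = c;
    b->gap_size--;
}
```

With `"abc"`, a 4-byte gap, and `"def"`, a marker on `e` has memory index 8 and buffer position 4. After one insertion at the gap the stored index is still 8, yet the derived buffer position becomes 5 with no marker update at all. Stored indices would need adjusting only when the gap itself is moved.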
7131 | 6281 |
7135 is no way to determine what markers are in a buffer if you are just | 6285 is no way to determine what markers are in a buffer if you are just |
7136 given the buffer. Extents remain in a buffer until they are detached | 6286 given the buffer. Extents remain in a buffer until they are detached |
7137 (which could happen as a result of text being deleted) or the buffer is | 6287 (which could happen as a result of text being deleted) or the buffer is |
7138 deleted, and primitives do exist to enumerate the extents in a buffer. | 6288 deleted, and primitives do exist to enumerate the extents in a buffer. |
7139 | 6289 |
7140 @node Bufbytes and Emchars, The Buffer Object, Markers and Extents, Buffers and Textual Representation | 6290 @node Bufbytes and Emchars |
7141 @section Bufbytes and Emchars | 6291 @section Bufbytes and Emchars |
7142 | 6292 |
7143 Not yet documented. | 6293 Not yet documented. |
7144 | 6294 |
7145 @node The Buffer Object, , Bufbytes and Emchars, Buffers and Textual Representation | 6295 @node The Buffer Object |
7146 @section The Buffer Object | 6296 @section The Buffer Object |
7147 | 6297 |
7148 Buffers contain fields not directly accessible by the Lisp programmer. | 6298 Buffers contain fields not directly accessible by the Lisp programmer. |
7149 We describe them here, naming them by the names used in the C code. | 6299 We describe them here, naming them by the names used in the C code. |
7150 Many are accessible indirectly in Lisp programs via Lisp primitives. | 6300 Many are accessible indirectly in Lisp programs via Lisp primitives. |
7259 * Encodings:: | 6409 * Encodings:: |
7260 * Internal Mule Encodings:: | 6410 * Internal Mule Encodings:: |
7261 * CCL:: | 6411 * CCL:: |
7262 @end menu | 6412 @end menu |
7263 | 6413 |
7264 @node Character Sets, Encodings, MULE Character Sets and Encodings, MULE Character Sets and Encodings | 6414 @node Character Sets |
7265 @section Character Sets | 6415 @section Character Sets |
7266 | 6416 |
7267 A character set (or @dfn{charset}) is an ordered set of characters. A | 6417 A character set (or @dfn{charset}) is an ordered set of characters. A |
7268 particular character in a charset is indexed using one or more | 6418 particular character in a charset is indexed using one or more |
7269 @dfn{position codes}, which are non-negative integers. The number of | 6419 @dfn{position codes}, which are non-negative integers. The number of |
7340 160 - 255 Latin-1 32 - 127 | 6490 160 - 255 Latin-1 32 - 127 |
7341 @end example | 6491 @end example |
7342 | 6492 |
7343 This is a bit ad-hoc but gets the job done. | 6493 This is a bit ad-hoc but gets the job done. |
7344 | 6494 |
7345 @node Encodings, Internal Mule Encodings, Character Sets, MULE Character Sets and Encodings | 6495 @node Encodings |
7346 @section Encodings | 6496 @section Encodings |
7347 | 6497 |
7348 An @dfn{encoding} is a way of numerically representing characters from | 6498 An @dfn{encoding} is a way of numerically representing characters from |
7349 one or more character sets. If an encoding only encompasses one | 6499 one or more character sets. If an encoding only encompasses one |
7350 character set, then the position codes for the characters in that | 6500 character set, then the position codes for the characters in that |
7367 @menu | 6517 @menu |
7368 * Japanese EUC (Extended Unix Code):: | 6518 * Japanese EUC (Extended Unix Code):: |
7369 * JIS7:: | 6519 * JIS7:: |
7370 @end menu | 6520 @end menu |
7371 | 6521 |
7372 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings | 6522 @node Japanese EUC (Extended Unix Code) |
7373 @subsection Japanese EUC (Extended Unix Code) | 6523 @subsection Japanese EUC (Extended Unix Code) |
7374 | 6524 |
7375 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, | 6525 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, |
7376 and Japanese-JISX0208-Kana (half-width katakana, the right half of | 6526 and Japanese-JISX0208-Kana (half-width katakana, the right half of |
7377 JISX0201). It uses 8-bit bytes. | 6527 JISX0201). It uses 8-bit bytes. |
7389 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 | 6539 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 |
7390 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80 | 6540 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80 |
7391 @end example | 6541 @end example |
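The `PC1 + 0x80 | PC2 + 0x80` rule for Japanese-JISX0208 amounts to setting the high bit of each position code. A small sketch, assuming position codes in the range 33 to 126 (the usual range for a 94x94 charset) and using hypothetical function names:

```c
#include <assert.h>

/* Encode the two position codes of a Japanese-JISX0208 character as
   two EUC bytes.  The range check is an assumption of this sketch. */
static void toy_jisx0208_to_euc(unsigned char pc1, unsigned char pc2,
                                unsigned char out[2])
{
    assert(pc1 >= 33 && pc1 <= 126 && pc2 >= 33 && pc2 <= 126);
    out[0] = (unsigned char)(pc1 + 0x80);
    out[1] = (unsigned char)(pc2 + 0x80);
}

/* Reverse direction: recover the position codes from two EUC bytes. */
static void toy_euc_to_jisx0208(const unsigned char in[2],
                                unsigned char *pc1, unsigned char *pc2)
{
    assert(in[0] >= 0xA1 && in[0] <= 0xFE && in[1] >= 0xA1 && in[1] <= 0xFE);
    *pc1 = (unsigned char)(in[0] - 0x80);
    *pc2 = (unsigned char)(in[1] - 0x80);
}
```

For example, position codes 0x24 and 0x22 become the bytes 0xA4 0xA2. Because both bytes have the high bit set, they cannot be confused with Printing-ASCII bytes in the same stream.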
7392 | 6542 |
7393 | 6543 |
7394 @node JIS7, , Japanese EUC (Extended Unix Code), Encodings | 6544 @node JIS7 |
7395 @subsection JIS7 | 6545 @subsection JIS7 |
7396 | 6546 |
7397 This encompasses the character sets Printing-ASCII, | 6547 This encompasses the character sets Printing-ASCII, |
7398 Japanese-JISX0201-Roman (the left half of JISX0201; this character set | 6548 Japanese-JISX0201-Roman (the left half of JISX0201; this character set |
7399 is very similar to Printing-ASCII and is a 94-character charset), | 6549 is very similar to Printing-ASCII and is a 94-character charset), |
7424 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII | 6574 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII |
7425 @end example | 6575 @end example |
7426 | 6576 |
7427 Initially, Printing-ASCII is invoked. | 6577 Initially, Printing-ASCII is invoked. |
7428 | 6578 |
7429 @node Internal Mule Encodings, CCL, Encodings, MULE Character Sets and Encodings | 6579 @node Internal Mule Encodings |
7430 @section Internal Mule Encodings | 6580 @section Internal Mule Encodings |
7431 | 6581 |
7432 In XEmacs/Mule, each character set is assigned a unique number, called a | 6582 In XEmacs/Mule, each character set is assigned a unique number, called a |
7433 @dfn{leading byte}. This is used in the encodings of a character. | 6583 @dfn{leading byte}. This is used in the encodings of a character. |
7434 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has | 6584 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has |
7470 @menu | 6620 @menu |
7471 * Internal String Encoding:: | 6621 * Internal String Encoding:: |
7472 * Internal Character Encoding:: | 6622 * Internal Character Encoding:: |
7473 @end menu | 6623 @end menu |
7474 | 6624 |
7475 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings | 6625 @node Internal String Encoding |
7476 @subsection Internal String Encoding | 6626 @subsection Internal String Encoding |
7477 | 6627 |
7478 ASCII characters are encoded using their position code directly. Other | 6628 ASCII characters are encoded using their position code directly. Other |
7479 characters are encoded using their leading byte followed by their | 6629 characters are encoded using their leading byte followed by their |
7480 position code(s) with the high bit set. Characters in private character | 6630 position code(s) with the high bit set. Characters in private character |
7520 None of the standard non-modal encodings meet all of these | 6670 None of the standard non-modal encodings meet all of these |
7521 conditions. For example, EUC satisfies only (2) and (3), while | 6671 conditions. For example, EUC satisfies only (2) and (3), while |
7522 Shift-JIS and Big5 (not yet described) satisfy only (2). (All | 6672 Shift-JIS and Big5 (not yet described) satisfy only (2). (All |
7523 non-modal encodings must satisfy (2), in order to be unambiguous.) | 6673 non-modal encodings must satisfy (2), in order to be unambiguous.) |
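The basic rule of the internal string encoding (ASCII stored directly, other characters stored as a leading byte followed by position codes with the high bit set) can be sketched like this. The sketch deliberately ignores private character sets, uses `0` as a made-up marker for "ASCII", and the leading-byte value in the usage note is only an example, not a claim about any real charset assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one character into `out` and return the number of bytes written.
   `leading_byte` == 0 is this sketch's hypothetical way of saying "ASCII". */
static size_t toy_encode_char(unsigned char leading_byte,
                              const unsigned char *pos_codes, size_t n_codes,
                              unsigned char *out)
{
    size_t i, len = 0;

    if (leading_byte == 0) {
        /* ASCII: the position code is stored as-is. */
        assert(n_codes == 1 && pos_codes[0] < 0x80);
        out[len++] = pos_codes[0];
        return len;
    }

    /* Non-ASCII: leading byte, then each position code with bit 7 set. */
    assert(leading_byte >= 0x80);
    assert(n_codes == 1 || n_codes == 2);
    out[len++] = leading_byte;
    for (i = 0; i < n_codes; i++)
        out[len++] = (unsigned char)(pos_codes[i] | 0x80);
    return len;
}
```

With an example leading byte of 0x92 and position codes 0x24 and 0x22, the function writes the three bytes 0x92 0xA4 0xA2, while the ASCII letter `A` is written as the single byte 0x41.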
7524 | 6674 |
7525 @node Internal Character Encoding, , Internal String Encoding, Internal Mule Encodings | 6675 @node Internal Character Encoding |
7526 @subsection Internal Character Encoding | 6676 @subsection Internal Character Encoding |
7527 | 6677 |
7528 One 19-bit word represents a single character. The word is | 6678 One 19-bit word represents a single character. The word is |
7529 separated into three fields: | 6679 separated into three fields: |
7530 | 6680 |
7555 @end example | 6705 @end example |
7556 | 6706 |
7557 Note that character codes 0 - 255 are the same as the ``binary encoding'' | 6707 Note that character codes 0 - 255 are the same as the ``binary encoding'' |
7558 described above. | 6708 described above. |
7559 | 6709 |
7560 @node CCL, , Internal Mule Encodings, MULE Character Sets and Encodings | 6710 @node CCL |
7561 @section CCL | 6711 @section CCL |
7562 | 6712 |
7563 @example | 6713 @example |
7564 CCL PROGRAM SYNTAX: | 6714 CCL PROGRAM SYNTAX: |
7565 CCL_PROGRAM := (CCL_MAIN_BLOCK | 6715 CCL_PROGRAM := (CCL_MAIN_BLOCK |
7609 this is the code executed to handle any stuff that needs to be done | 6759 this is the code executed to handle any stuff that needs to be done |
7610 (e.g. designating back to ASCII and left-to-right mode) after all | 6760 (e.g. designating back to ASCII and left-to-right mode) after all |
7611 other encoded/decoded data has been written out. This is not used for | 6761 other encoded/decoded data has been written out. This is not used for |
7612 charset CCL programs. | 6762 charset CCL programs. |
7613 | 6763 |
7614 REGISTER: 0..7 -- referred by RRR or rrr | 6764 REGISTER: 0..7 -- refered by RRR or rrr |
7615 | 6765 |
7616 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT | 6766 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT |
7617 TTTTT (5-bit): operator type | 6767 TTTTT (5-bit): operator type |
7618 RRR (3-bit): register number | 6768 RRR (3-bit): register number |
7619 XXXXXXXXXXXXXXXX (15-bit): | 6769 XXXXXXXXXXXXXXXX (15-bit): |
7746 * Lstream Types:: Different sorts of things that are streamed. | 6896 * Lstream Types:: Different sorts of things that are streamed. |
7747 * Lstream Functions:: Functions for working with lstreams. | 6897 * Lstream Functions:: Functions for working with lstreams. |
7748 * Lstream Methods:: Creating new lstream types. | 6898 * Lstream Methods:: Creating new lstream types. |
7749 @end menu | 6899 @end menu |
7750 | 6900 |
7751 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams | 6901 @node Creating an Lstream |
7752 @section Creating an Lstream | 6902 @section Creating an Lstream |
7753 | 6903 |
7754 Lstreams come in different types, depending on what is being interfaced | 6904 Lstreams come in different types, depending on what is being interfaced |
7755 to. Although the primitive for creating new lstreams is | 6905 to. Although the primitive for creating new lstreams is |
7756 @code{Lstream_new()}, generally you do not call this directly. Instead, | 6906 @code{Lstream_new()}, generally you do not call this directly. Instead, |
7777 Open for reading, but ``read'' never returns partial MULE characters. | 6927 Open for reading, but ``read'' never returns partial MULE characters. |
7778 @item "wc" | 6928 @item "wc" |
7779 Open for writing, but never writes partial MULE characters. | 6929 Open for writing, but never writes partial MULE characters. |
7780 @end table | 6930 @end table |
7781 | 6931 |
7782 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams | 6932 @node Lstream Types |
7783 @section Lstream Types | 6933 @section Lstream Types |
7784 | 6934 |
7785 @table @asis | 6935 @table @asis |
7786 @item stdio | 6936 @item stdio |
7787 | 6937 |
7802 @item decoding | 6952 @item decoding |
7803 | 6953 |
7804 @item encoding | 6954 @item encoding |
7805 @end table | 6955 @end table |
7806 | 6956 |
7807 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams | 6957 @node Lstream Functions |
7808 @section Lstream Functions | 6958 @section Lstream Functions |
7809 | 6959 |
7810 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) | 6960 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode}) |
7811 Allocate and return a new Lstream. This function is not really meant to | 6961 Allocate and return a new Lstream. This function is not really meant to |
7812 be called directly; rather, each stream type should provide its own | 6962 be called directly; rather, each stream type should provide its own |
7813 stream creation function, which creates the stream and does any other | 6963 stream creation function, which creates the stream and does any other |
7814 necessary creation stuff (e.g. opening a file). | 6964 necessary creation stuff (e.g. opening a file). |
7815 @end deftypefun | 6965 @end deftypefun |
7838 @end deftypefn | 6988 @end deftypefn |
7839 | 6989 |
7840 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c}) | 6990 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c}) |
7841 Push one byte back onto the input queue. This will be the next byte | 6991 Push one byte back onto the input queue. This will be the next byte |
7842 read from the stream. Any number of bytes can be pushed back and will | 6992 read from the stream. Any number of bytes can be pushed back and will |
7843 be read in the reverse order they were pushed back---most recent | 6993 be read in the reverse order they were pushed back -- most recent |
7844 first. (This is necessary for consistency---if there are a number of | 6994 first. (This is necessary for consistency -- if there are a number of |
7845 bytes that have been unread and I read and unread a byte, it needs to be | 6995 bytes that have been unread and I read and unread a byte, it needs to be |
7846 the first to be read again.) This is a macro and so it is very | 6996 the first to be read again.) This is a macro and so it is very |
7847 efficient. The @var{c} argument is only evaluated once but the @var{stream} | 6997 efficient. The @var{c} argument is only evaluated once but the @var{stream} |
7848 argument is evaluated more than once. | 6998 argument is evaluated more than once. |
7849 @end deftypefn | 6999 @end deftypefn |
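The push-back ordering described for `Lstream_ungetc` can be illustrated with a minimal sketch. Everything named `sketch_*` here is hypothetical and is not the layout used in `lstream.h`; the fixed-size buffer is an assumption made only to keep the example short. It shows why the most recently unread byte must be the first one read back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream with an unget buffer; the real
   Lstream structure is laid out differently. */
#define SKETCH_UNGET_MAX 16

struct sketch_stream
{
  const unsigned char *data;              /* underlying input bytes */
  size_t len, pos;                        /* input length and read offset */
  unsigned char unget[SKETCH_UNGET_MAX];  /* pushed-back bytes, used as a stack */
  size_t unget_count;                     /* number of bytes currently pushed back */
};

/* Push one byte back; it becomes the next byte read. */
static void
sketch_ungetc (struct sketch_stream *s, int c)
{
  assert (s->unget_count < SKETCH_UNGET_MAX);
  s->unget[s->unget_count++] = (unsigned char) c;
}

/* Read one byte: pushed-back bytes come first, most recent first,
   then the underlying data.  Returns -1 at end of input. */
static int
sketch_getc (struct sketch_stream *s)
{
  if (s->unget_count > 0)
    return s->unget[--s->unget_count];
  if (s->pos < s->len)
    return s->data[s->pos++];
  return -1;
}
```

With bytes `a` then `b` pushed back, reads return `b`, then `a`, then the underlying data. Reading a byte and immediately unreading it leaves the stream in the state it started in, which is the consistency property the text refers to.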
7852 @deftypefunx int Lstream_fgetc (Lstream *@var{stream}) | 7002 @deftypefunx int Lstream_fgetc (Lstream *@var{stream}) |
7853 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c}) | 7003 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c}) |
7854 Function equivalents of the above macros. | 7004 Function equivalents of the above macros. |
7855 @end deftypefun | 7005 @end deftypefun |
7856 | 7006 |
7857 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size}) | 7007 @deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size}) |
7858 Read @var{size} bytes of @var{data} from the stream. Return the number | 7008 Read @var{size} bytes of @var{data} from the stream. Return the number |
7859 of bytes read. 0 means EOF. -1 means an error occurred and no bytes | 7009 of bytes read. 0 means EOF. -1 means an error occurred and no bytes |
7860 were read. | 7010 were read. |
7861 @end deftypefun | 7011 @end deftypefun |
7862 | 7012 |
7863 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size}) | 7013 @deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size}) |
7864 Write @var{size} bytes of @var{data} to the stream. Return the number | 7014 Write @var{size} bytes of @var{data} to the stream. Return the number |
7865 of bytes written. -1 means an error occurred and no bytes were written. | 7015 of bytes written. -1 means an error occurred and no bytes were written. |
7866 @end deftypefun | 7016 @end deftypefun |
7867 | 7017 |
7868 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size}) | 7018 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size}) |
7869 Push back @var{size} bytes of @var{data} onto the input queue. The next | 7019 Push back @var{size} bytes of @var{data} onto the input queue. The next |
7870 call to @code{Lstream_read()} with the same size will read the same | 7020 call to @code{Lstream_read()} with the same size will read the same |
7871 bytes back. Note that this will be the case even if there is other | 7021 bytes back. Note that this will be the case even if there is other |
7872 pending unread data. | 7022 pending unread data. |
7873 @end deftypefun | 7023 @end deftypefun |
7877 @end deftypefun | 7027 @end deftypefun |
7878 | 7028 |
7879 @deftypefun void Lstream_reopen (Lstream *@var{stream}) | 7029 @deftypefun void Lstream_reopen (Lstream *@var{stream}) |
7880 Reopen a closed stream. This enables I/O on it again. This is not | 7030 Reopen a closed stream. This enables I/O on it again. This is not |
7881 meant to be called except from a wrapper routine that reinitializes | 7031 meant to be called except from a wrapper routine that reinitializes |
7882 variables and such---the close routine may well have freed some | 7032 variables and such -- the close routine may well have freed some |
7883 necessary storage structures, for example. | 7033 necessary storage structures, for example. |
7884 @end deftypefun | 7034 @end deftypefun |
7885 | 7035 |
7886 @deftypefun void Lstream_rewind (Lstream *@var{stream}) | 7036 @deftypefun void Lstream_rewind (Lstream *@var{stream}) |
7887 Rewind the stream to the beginning. | 7037 Rewind the stream to the beginning. |
7888 @end deftypefun | 7038 @end deftypefun |
7889 | 7039 |
7890 @node Lstream Methods, , Lstream Functions, Lstreams | 7040 @node Lstream Methods |
7891 @section Lstream Methods | 7041 @section Lstream Methods |
7892 | 7042 |
7893 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size}) | 7043 @deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size}) |
7894 Read some data from the stream's end and store it into @var{data}, which | 7044 Read some data from the stream's end and store it into @var{data}, which |
7895 can hold @var{size} bytes. Return the number of bytes read. A return | 7045 can hold @var{size} bytes. Return the number of bytes read. A return |
7896 value of 0 means no bytes can be read at this time. This may be because | 7046 value of 0 means no bytes can be read at this time. This may be because |
7897 of an EOF, or because there is a granularity greater than one byte that | 7047 of an EOF, or because there is a granularity greater than one byte that |
7898 the stream imposes on the returned data, and @var{size} is less than | 7048 the stream imposes on the returned data, and @var{size} is less than |
7905 calls @code{Lstream_read()} with a very small size. | 7055 calls @code{Lstream_read()} with a very small size. |
7906 | 7056 |
7907 This function can be @code{NULL} if the stream is output-only. | 7057 This function can be @code{NULL} if the stream is output-only. |
7908 @end deftypefn | 7058 @end deftypefn |
7909 | 7059 |
7910 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size}) | 7060 @deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size}) |
7911 Send some data to the stream's end. Data to be sent is in @var{data} | 7061 Send some data to the stream's end. Data to be sent is in @var{data} |
7912 and is @var{size} bytes. Return the number of bytes sent. This | 7062 and is @var{size} bytes. Return the number of bytes sent. This |
7913 function can send and return fewer bytes than is passed in; in that | 7063 function can send and return fewer bytes than is passed in; in that |
7914 case, the function will just be called again until there is no data left | 7064 case, the function will just be called again until there is no data left |
7915 or 0 is returned. A return value of 0 means that no more data can be | 7065 or 0 is returned. A return value of 0 means that no more data can be |
7923 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream}) | 7073 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream}) |
7924 Rewind the stream. If this is @code{NULL}, the stream is not seekable. | 7074 Rewind the stream. If this is @code{NULL}, the stream is not seekable. |
7925 @end deftypefn | 7075 @end deftypefn |
7926 | 7076 |
7927 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream}) | 7077 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream}) |
7928 Indicate whether this stream is seekable---i.e. it can be rewound. | 7078 Indicate whether this stream is seekable -- i.e. it can be rewound. |
7929 This method is ignored if the stream does not have a rewind method. If | 7079 This method is ignored if the stream does not have a rewind method. If |
7930 this method is not present, the result is determined by whether a rewind | 7080 this method is not present, the result is determined by whether a rewind |
7931 method is present. | 7081 method is present. |
7932 @end deftypefn | 7082 @end deftypefn |
7933 | 7083 |
7960 * Point:: | 7110 * Point:: |
7961 * Window Hierarchy:: | 7111 * Window Hierarchy:: |
7962 * The Window Object:: | 7112 * The Window Object:: |
7963 @end menu | 7113 @end menu |
7964 | 7114 |
7965 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows | 7115 @node Introduction to Consoles; Devices; Frames; Windows |
7966 @section Introduction to Consoles; Devices; Frames; Windows | 7116 @section Introduction to Consoles; Devices; Frames; Windows |
7967 | 7117 |
7968 A window-system window that you see on the screen is called a | 7118 A window-system window that you see on the screen is called a |
7969 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or | 7119 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or |
7970 more non-overlapping panes, called (confusingly) @dfn{windows}. Each | 7120 more non-overlapping panes, called (confusingly) @dfn{windows}. Each |
7995 There is a separate Lisp object type for each of these four concepts. | 7145 There is a separate Lisp object type for each of these four concepts. |
7996 Furthermore, there is logically a @dfn{selected console}, | 7146 Furthermore, there is logically a @dfn{selected console}, |
7997 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. | 7147 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. |
7998 Each of these objects is distinguished in various ways, such as being the | 7148 Each of these objects is distinguished in various ways, such as being the |
7999 default object for various functions that act on objects of that type. | 7149 default object for various functions that act on objects of that type. |
8000 Note that every containing object remembers the ``selected'' object | 7150 Note that every containing object rememembers the ``selected'' object |
8001 among the objects that it contains: e.g. not only is there a selected | 7151 among the objects that it contains: e.g. not only is there a selected |
8002 window, but every frame remembers the last window in it that was | 7152 window, but every frame remembers the last window in it that was |
8003 selected, and changing the selected frame causes the remembered window | 7153 selected, and changing the selected frame causes the remembered window |
8004 within it to become the selected window. Similar relationships apply | 7154 within it to become the selected window. Similar relationships apply |
8005 for consoles to devices and devices to frames. | 7155 for consoles to devices and devices to frames. |
8006 | 7156 |
8007 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows | 7157 @node Point |
8008 @section Point | 7158 @section Point |
8009 | 7159 |
8010 Recall that every buffer has a current insertion position, called | 7160 Recall that every buffer has a current insertion position, called |
8011 @dfn{point}. Now, two or more windows may be displaying the same buffer, | 7161 @dfn{point}. Now, two or more windows may be displaying the same buffer, |
8012 and the text cursor in the two windows (i.e. @code{point}) can be in | 7162 and the text cursor in the two windows (i.e. @code{point}) can be in |
8023 want to retrieve the correct value of @code{point} for a window, | 7173 want to retrieve the correct value of @code{point} for a window, |
8024 you must special-case on the selected window and retrieve the | 7174 you must special-case on the selected window and retrieve the |
8025 buffer's point instead. This is related to why @code{save-window-excursion} | 7175 buffer's point instead. This is related to why @code{save-window-excursion} |
8026 does not save the selected window's value of @code{point}. | 7176 does not save the selected window's value of @code{point}. |
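The special case for the selected window might look roughly like the following sketch. The `sketch_*` types and the function name are assumptions for illustration only. In particular, the real `pointm` field is a marker rather than a plain integer, and the real code locates the selected window through the frame rather than taking it as an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real buffer and window
   objects. */
struct sketch_buffer
{
  long point;                    /* the buffer's own value of point */
};

struct sketch_window
{
  struct sketch_buffer *buffer;  /* buffer displayed in this window */
  long pointm;                   /* stashed point; not authoritative while
                                    this window is the selected one */
};

/* Return the value of point that applies to window W, given which
   window is currently selected. */
static long
sketch_window_point (const struct sketch_window *w,
                     const struct sketch_window *selected)
{
  assert (w != NULL && w->buffer != NULL);
  if (w == selected)
    return w->buffer->point;     /* selected window: use the buffer's point */
  return w->pointm;              /* any other window: use the stashed value */
}
```

Two windows on the same buffer can therefore report different positions: the selected one follows the buffer's point, and the other keeps its own stashed value.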
8027 | 7177 |
8028 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows | 7178 @node Window Hierarchy |
8029 @section Window Hierarchy | 7179 @section Window Hierarchy |
8030 @cindex window hierarchy | 7180 @cindex window hierarchy |
8031 @cindex hierarchy of windows | 7181 @cindex hierarchy of windows |
8032 | 7182 |
8033 If a frame contains multiple windows (panes), they are always created | 7183 If a frame contains multiple windows (panes), they are always created |
8092 @dfn{one above the other}. | 7242 @dfn{one above the other}. |
8093 | 7243 |
8094 @item | 7244 @item |
8095 Leaf windows also have markers in their @code{start} (the | 7245 Leaf windows also have markers in their @code{start} (the |
8096 first buffer position displayed in the window) and @code{pointm} | 7246 first buffer position displayed in the window) and @code{pointm} |
8097 (the window's stashed value of @code{point}---see above) fields, | 7247 (the window's stashed value of @code{point} -- see above) fields, |
8098 while combination windows have nil in these fields. | 7248 while combination windows have nil in these fields. |
8099 | 7249 |
8100 @item | 7250 @item |
8101 The list of children for a window is threaded through the | 7251 The list of children for a window is threaded through the |
8102 @code{next} and @code{prev} fields of each child window. | 7252 @code{next} and @code{prev} fields of each child window. |
8108 does nothing except set a special @code{dead} bit to 1 and clear out the | 7258 does nothing except set a special @code{dead} bit to 1 and clear out the |
8109 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for | 7259 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for |
8110 GC purposes. | 7260 GC purposes. |
8111 | 7261 |
8112 @item | 7262 @item |
8113 Most frames actually have two top-level windows---one for the | 7263 Most frames actually have two top-level windows -- one for the |
8114 minibuffer and one (the @dfn{root}) for everything else. The modeline | 7264 minibuffer and one (the @dfn{root}) for everything else. The modeline |
8115 (if present) separates these two. The @code{next} field of the root | 7265 (if present) separates these two. The @code{next} field of the root |
8116 points to the minibuffer, and the @code{prev} field of the minibuffer | 7266 points to the minibuffer, and the @code{prev} field of the minibuffer |
8117 points to the root. The other @code{next} and @code{prev} fields are | 7267 points to the root. The other @code{next} and @code{prev} fields are |
8118 @code{nil}, and the frame points to both of these windows. | 7268 @code{nil}, and the frame points to both of these windows. |
8121 frames have no root window, and the @code{next} of the minibuffer window | 7271 frames have no root window, and the @code{next} of the minibuffer window |
8122 is @code{nil} but the @code{prev} points to itself. (#### This is an | 7272 is @code{nil} but the @code{prev} points to itself. (#### This is an |
8123 artifact that should be fixed.) | 7273 artifact that should be fixed.) |
8124 @end enumerate | 7274 @end enumerate |
8125 | 7275 |
8126 @node The Window Object, , Window Hierarchy, Consoles; Devices; Frames; Windows | 7276 @node The Window Object |
8127 @section The Window Object | 7277 @section The Window Object |
8128 | 7278 |
8129 Windows have the following accessible fields: | 7279 Windows have the following accessible fields: |
8130 | 7280 |
8131 @table @code | 7281 @table @code |
8250 @end enumerate | 7400 @end enumerate |
8251 | 7401 |
8252 @menu | 7402 @menu |
8253 * Critical Redisplay Sections:: | 7403 * Critical Redisplay Sections:: |
8254 * Line Start Cache:: | 7404 * Line Start Cache:: |
8255 * Redisplay Piece by Piece:: | |
8256 @end menu | 7405 @end menu |
8257 | 7406 |
8258 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism | 7407 @node Critical Redisplay Sections |
8259 @section Critical Redisplay Sections | 7408 @section Critical Redisplay Sections |
8260 @cindex critical redisplay sections | 7409 @cindex critical redisplay sections |
8261 | 7410 |
8262 Within this section, we are defenseless and assume that the | 7411 Within this section, we are defenseless and assume that the |
8263 following cannot happen: | 7412 following cannot happen: |
8285 we simply return. #### We should abort instead. | 7434 we simply return. #### We should abort instead. |
8286 | 7435 |
8287 #### If a frame-size change does occur we should probably | 7436 #### If a frame-size change does occur we should probably |
8288 actually be preempting redisplay. | 7437 actually be preempting redisplay. |
8289 | 7438 |
8290 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism | 7439 @node Line Start Cache |
8291 @section Line Start Cache | 7440 @section Line Start Cache |
8292 @cindex line start cache | 7441 @cindex line start cache |
8293 | 7442 |
8294 The traditional scrolling code in Emacs breaks in a variable height | 7443 The traditional scrolling code in Emacs breaks in a variable height |
8295 world. It depends on the key assumption that the number of lines that | 7444 world. It depends on the key assumption that the number of lines that |
8329 information basically for free. In those cases where a user is simply | 7478 information basically for free. In those cases where a user is simply |
8330 scrolling around viewing a buffer there is a high probability that this | 7479 scrolling around viewing a buffer there is a high probability that this |
8331 is sufficient to always provide the needed information. The second | 7480 is sufficient to always provide the needed information. The second |
8332 thing we can do is be smart about invalidating the cache. | 7481 thing we can do is be smart about invalidating the cache. |
8333 | 7482 |
8334 TODO---Be smart about invalidating the cache. Potential places: | 7483 TODO -- Be smart about invalidating the cache. Potential places: |
8335 | 7484 |
8336 @itemize @bullet | 7485 @itemize @bullet |
8337 @item | 7486 @item |
8338 Insertions at end-of-line which don't cause line-wraps do not alter the | 7487 Insertions at end-of-line which don't cause line-wraps do not alter the |
8339 starting positions of any display lines. These types of buffer | 7488 starting positions of any display lines. These types of buffer |
8346 @end itemize | 7495 @end itemize |
8347 | 7496 |
8348 In case you're wondering, the Second Golden Rule of Redisplay is not | 7497 In case you're wondering, the Second Golden Rule of Redisplay is not |
8349 applicable. | 7498 applicable. |
8350 | 7499 |
8351 @node Redisplay Piece by Piece, , Line Start Cache, The Redisplay Mechanism | 7500 @node Extents, Faces and Glyphs, The Redisplay Mechanism, Top |
8352 @section Redisplay Piece by Piece | |
8353 @cindex Redisplay Piece by Piece | |
8354 | |
8355 As you can begin to see redisplay is complex and also not well | |
8356 documented. Chuck no longer works on XEmacs so this section is my take | |
8357 on the workings of redisplay. | |
8358 | |
8359 Redisplay happens in three phases: | |
8360 | |
8361 @enumerate | |
8362 @item | |
8363 Determine desired display in area that needs redisplay. | |
8364 Implemented by @code{redisplay.c} | |
8365 @item | |
8366 Compare desired display with current display | |
8367 Implemented by @code{redisplay-output.c} | |
8368 @item | |
8369 Output changes Implemented by @code{redisplay-output.c}, | |
8370 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c} | |
8371 @end enumerate | |
8372 | |
8373 Steps 1 and 2 are device-independent and relatively complex. Step 3 is | |
8374 mostly device-dependent. | |
8375 | |
8376 Determining the desired display | |
8377 | |
8378 Display attributes are stored in @code{display_line} structures. Each | |
8379 @code{display_line} consists of a set of @code{display_block}'s and each | |
8380 @code{display_block} contains a number of @code{rune}'s. Generally | |
8381 dynarr's of @code{display_line}'s are held by each window representing | |
8382 the current display and the desired display. | |
8383 | |
8384 The @code{display_line} structures are tightly tied to buffers which | |
8385 presents a problem for redisplay as this connection is bogus for the | |
8386 modeline. Hence the @code{display_line} generation routines are | |
8387 duplicated for generating the modeline. This means that the modeline | |
8388 display code has many bugs that the standard redisplay code does not. | |
8389 | |
8390 The guts of @code{display_line} generation are in | |
8391 @code{create_text_block}, which creates a single display line for the | |
8392 desired locale. This incrementally parses the characters on the current | |
8393 line and generates redisplay structures for each. | |
8394 | |
8395 Gutter redisplay is different. Because the data to display is stored in | |
8396 a string we cannot use @code{create_text_block}. Instead we use | |
8397 @code{create_text_string_block} which performs the same function as | |
8398 @code{create_text_block} but for strings. Many of the complexities of | |
8399 @code{create_text_block} to do with cursor handling and selective | |
8400 display have been removed. | |
8401 | |
8402 @node Extents, Faces, The Redisplay Mechanism, Top | |
8403 @chapter Extents | 7501 @chapter Extents |
8404 | 7502 |
8405 @menu | 7503 @menu |
8406 * Introduction to Extents:: Extents are ranges over text, with properties. | 7504 * Introduction to Extents:: Extents are ranges over text, with properties. |
8407 * Extent Ordering:: How extents are ordered internally. | 7505 * Extent Ordering:: How extents are ordered internally. |
8408 * Format of the Extent Info:: The extent information in a buffer or string. | 7506 * Format of the Extent Info:: The extent information in a buffer or string. |
8409 * Zero-Length Extents:: A weird special case. | 7507 * Zero-Length Extents:: A weird special case. |
8410 * Mathematics of Extent Ordering:: A rigorous foundation. | 7508 * Mathematics of Extent Ordering:: A rigorous foundation. |
8411 * Extent Fragments:: Cached information useful for redisplay. | 7509 * Extent Fragments:: Cached information useful for redisplay. |
8412 @end menu | 7510 @end menu |
8413 | 7511 |
8414 @node Introduction to Extents, Extent Ordering, Extents, Extents | 7512 @node Introduction to Extents |
8415 @section Introduction to Extents | 7513 @section Introduction to Extents |
8416 | 7514 |
8417 Extents are regions over a buffer, with a start and an end position | 7515 Extents are regions over a buffer, with a start and an end position |
8418 denoting the region of the buffer included in the extent. In | 7516 denoting the region of the buffer included in the extent. In |
8419 addition, either end can be closed or open, meaning that the endpoint | 7517 addition, either end can be closed or open, meaning that the endpoint |
8431 automatically go inside or out of extents as necessary with no | 7529 automatically go inside or out of extents as necessary with no |
8432 further work needing to be done. It didn't work out that way, | 7530 further work needing to be done. It didn't work out that way, |
8433 however, and just ended up complexifying and buggifying all the | 7531 however, and just ended up complexifying and buggifying all the |
8434 rest of the code.) | 7532 rest of the code.) |
8435 | 7533 |
8436 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents | 7534 @node Extent Ordering |
8437 @section Extent Ordering | 7535 @section Extent Ordering |
8438 | 7536 |
8439 Extents are compared using memory indices. There are two orderings | 7537 Extents are compared using memory indices. There are two orderings |
8440 for extents and both orders are kept current at all times. The normal | 7538 for extents and both orders are kept current at all times. The normal |
8441 or @dfn{display} order is as follows: | 7539 or @dfn{display} order is as follows: |
8465 The display order and the e-order are complementary orders: any | 7563 The display order and the e-order are complementary orders: any |
8466 theorem about the display order also applies to the e-order if you swap | 7564 theorem about the display order also applies to the e-order if you swap |
8467 all occurrences of ``display order'' and ``e-order'', ``less than'' and | 7565 all occurrences of ``display order'' and ``e-order'', ``less than'' and |
8468 ``greater than'', and ``extent start'' and ``extent end''. | 7566 ``greater than'', and ``extent start'' and ``extent end''. |
8469 | 7567 |
8470 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents | 7568 @node Format of the Extent Info |
8471 @section Format of the Extent Info | 7569 @section Format of the Extent Info |
8472 | 7570 |
8473 An extent-info structure consists of a list of the buffer or string's | 7571 An extent-info structure consists of a list of the buffer or string's |
8474 extents and a @dfn{stack of extents} that lists all of the extents over | 7572 extents and a @dfn{stack of extents} that lists all of the extents over |
8475 a particular position. The stack-of-extents info is used for | 7573 a particular position. The stack-of-extents info is used for |
8476 optimization purposes---it basically caches some info that might | 7574 optimization purposes -- it basically caches some info that might |
8477 be expensive to compute. Certain otherwise hard computations are easy | 7575 be expensive to compute. Certain otherwise hard computations are easy |
8478 given the stack of extents over a particular position, and if the | 7576 given the stack of extents over a particular position, and if the |
8479 stack of extents over a nearby position is known (because it was | 7577 stack of extents over a nearby position is known (because it was |
8480 calculated at some prior point in time), it's easy to move the stack | 7578 calculated at some prior point in time), it's easy to move the stack |
8481 of extents to the proper position. | 7579 of extents to the proper position. |
8499 between two extents. Note also that callers of these functions should | 7597 between two extents. Note also that callers of these functions should |
8500 not be aware of the fact that the extent list is implemented as an | 7598 not be aware of the fact that the extent list is implemented as an |
8501 array, except for the fact that positions are integers (this should be | 7599 array, except for the fact that positions are integers (this should be |
8502 generalized to handle integers and linked list equally well). | 7600 generalized to handle integers and linked list equally well). |
8503 | 7601 |
8504 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents | 7602 @node Zero-Length Extents |
8505 @section Zero-Length Extents | 7603 @section Zero-Length Extents |
8506 | 7604 |
8507 Extents can be zero-length, and will end up that way if their endpoints | 7605 Extents can be zero-length, and will end up that way if their endpoints |
8508 are explicitly set that way or if their detachable property is nil | 7606 are explicitly set that way or if their detachable property is nil |
8509 and all the text in the extent is deleted. (The exception is open-open | 7607 and all the text in the extent is deleted. (The exception is open-open |
8528 | 7626 |
8529 Note that closed-open, non-detachable zero-length extents behave | 7627 Note that closed-open, non-detachable zero-length extents behave |
8530 exactly like markers and that open-closed, non-detachable zero-length | 7628 exactly like markers and that open-closed, non-detachable zero-length |
8531 extents behave like the ``point-type'' marker in Mule. | 7629 extents behave like the ``point-type'' marker in Mule. |
8532 | 7630 |
8533 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents | 7631 @node Mathematics of Extent Ordering |
8534 @section Mathematics of Extent Ordering | 7632 @section Mathematics of Extent Ordering |
8535 @cindex extent mathematics | 7633 @cindex extent mathematics |
8536 @cindex mathematics of extents | 7634 @cindex mathematics of extents |
8537 @cindex extent ordering | 7635 @cindex extent ordering |
8538 | 7636 |
8663 Proof: If @math{F2} does not include @math{I} then its start index is | 7761 Proof: If @math{F2} does not include @math{I} then its start index is |
8664 greater than @math{I} and thus it is greater than any extent in | 7762 greater than @math{I} and thus it is greater than any extent in |
8665 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} | 7763 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} |
8666 and thus is in @math{S}, and thus @math{F2 >= F}. | 7764 and thus is in @math{S}, and thus @math{F2 >= F}. |
8667 | 7765 |
8668 @node Extent Fragments, , Mathematics of Extent Ordering, Extents | 7766 @node Extent Fragments |
8669 @section Extent Fragments | 7767 @section Extent Fragments |
8670 @cindex extent fragment | 7768 @cindex extent fragment |
8671 | 7769 |
8672 Imagine that the buffer is divided up into contiguous, non-overlapping | 7770 Imagine that the buffer is divided up into contiguous, non-overlapping |
8673 @dfn{runs} of text such that no extent starts or ends within a run | 7771 @dfn{runs} of text such that no extent starts or ends within a run |
8674 (extents that abut the run don't count). | 7772 (extents that abut the run don't count). |
8675 | 7773 |
8676 An extent fragment is a structure that holds data about the run that | 7774 An extent fragment is a structure that holds data about the run that |
8677 contains a particular buffer position (if the buffer position is at the | 7775 contains a particular buffer position (if the buffer position is at the |
8678 junction of two runs, the run after the position is used)---the | 7776 junction of two runs, the run after the position is used) -- the |
8679 beginning and end of the run, a list of all of the extents in that run, | 7777 beginning and end of the run, a list of all of the extents in that run, |
8680 the @dfn{merged face} that results from merging all of the faces | 7778 the @dfn{merged face} that results from merging all of the faces |
8681 corresponding to those extents, the begin and end glyphs at the | 7779 corresponding to those extents, the begin and end glyphs at the |
8682 beginning of the run, etc. This is the information that redisplay needs | 7780 beginning of the run, etc. This is the information that redisplay needs |
8683 in order to display this run. | 7781 in order to display this run. |
8685 Extent fragments have to be very quick to update to a new buffer | 7783 Extent fragments have to be very quick to update to a new buffer |
8686 position when moving linearly through the buffer. They rely on the | 7784 position when moving linearly through the buffer. They rely on the |
8687 stack-of-extents code, which does the heavy-duty algorithmic work of | 7785 stack-of-extents code, which does the heavy-duty algorithmic work of |
8688 determining which extents overlie a particular position. | 7786 determining which extents overlie a particular position. |
8689 | 7787 |
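The notion of a run described in the Extent Fragments section can be illustrated with a small C sketch. Everything here is hypothetical: @code{toy_extent}, @code{toy_fragment} and @code{toy_fragment_at} are simplified stand-ins invented for illustration, not the XEmacs definitions, and extents are assumed to cover the half-open range from start to end. The sketch scans every extent, whereas the real code relies on the stack-of-extents machinery to update quickly when moving linearly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the XEmacs structures. */
struct toy_extent {
  int start;                    /* first position covered */
  int end;                      /* first position no longer covered */
};

struct toy_fragment {
  int start;                    /* first position of the run */
  int end;                      /* first position after the run */
  size_t n_extents;             /* number of extents covering the run */
};

/* Find the run containing POS by scanning all N extents.
   A run boundary is any extent start or end.  If POS sits exactly on a
   boundary, the run after the position is chosen, as the text describes. */
struct toy_fragment
toy_fragment_at (const struct toy_extent *ext, size_t n,
                 int pos, int buf_start, int buf_end)
{
  struct toy_fragment f = { buf_start, buf_end, 0 };
  size_t i;
  int k;

  assert (buf_start <= pos && pos < buf_end);
  for (i = 0; i < n; i++)
    {
      int bound[2];
      bound[0] = ext[i].start;
      bound[1] = ext[i].end;
      for (k = 0; k < 2; k++)
        {
          /* Nearest boundary at or before POS starts the run.  */
          if (bound[k] <= pos && bound[k] > f.start)
            f.start = bound[k];
          /* Nearest boundary after POS ends the run.  */
          if (bound[k] > pos && bound[k] < f.end)
            f.end = bound[k];
        }
      if (ext[i].start <= pos && pos < ext[i].end)
        f.n_extents++;
    }
  return f;
}
```

With extents covering 2 to 8 and 5 to 10 in a buffer spanning 0 to 12, position 6 falls in the run from 5 to 8, which both extents cover, while position 8 falls in the run from 8 to 10, which only the second extent covers.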
8690 @node Faces, Glyphs, Extents, Top | 7788 @node Faces and Glyphs, Specifiers, Extents, Top |
8691 @chapter Faces | 7789 @chapter Faces and Glyphs |
8692 | 7790 |
8693 Not yet documented. | 7791 Not yet documented. |
8694 | 7792 |
8695 @node Glyphs, Specifiers, Faces, Top | 7793 @node Specifiers, Menus, Faces and Glyphs, Top |
8696 @chapter Glyphs | |
8697 | |
8698 Glyphs are graphical elements that can be displayed in XEmacs buffers or | |
8699 gutters. We use the term graphical element here in the broadest possible | |
8700 sense, since glyphs can range from something as mundane as text to | |
8701 something as arcane as a native tab widget. | |
8702 | |
8703 In XEmacs, glyphs represent the uninstantiated state of graphical | |
8704 elements, i.e. they hold all the information necessary to produce an | |
8705 image on-screen but the image does not exist at this stage. | |
8706 | |
8707 Glyphs are lazily instantiated by calling one of the glyph | |
8708 functions. This usually occurs within redisplay when | |
8709 @code{Fglyph_height} is called. Instantiation causes an image-instance | |
8710 to be created and cached. This cache is on a device basis for all glyphs | |
8711 except widget-glyphs, and on a window basis for widget-glyphs. The | |
8712 caching is done by @code{image_instantiate} and is necessary because it | |
8713 is generally possible to display an image-instance in multiple | |
8714 domains. For instance, if we create a Pixmap, we can display it in | |
8715 multiple windows even though we need only a single Pixmap instance to | |
8716 do so. If caching were not done, it would be necessary to create an | |
8717 image-instance for every displayable occurrence of a glyph, and for | |
8718 every usage, which would be extremely memory- and CPU-intensive. | |
8719 | |
8720 Widget-glyphs (also known as native widgets) are not cached per device. | |
8721 This is because widget-glyph image-instances on screen are toolkit | |
8722 windows, and thus cannot be reused in multiple XEmacs domains. | |
8723 Widget-glyphs are therefore cached on a window basis. | |
8724 | |
8725 Any action on a glyph first consults the cache before actually | |
8726 instantiating a widget. | |
8727 | |
8728 @section Widget-Glyphs in the MS-Windows Environment | |
8729 | |
8730 To Do | |
8731 | |
8732 @section Widget-Glyphs in the X Environment | |
8733 | |
8734 Widget-glyphs under X make heavy use of lwlib for manipulating the | |
8735 native toolkit objects. This is primarily so that different toolkits can | |
8736 be supported for widget-glyphs, just as they are supported for features | |
8737 such as menubars etc. | |
8738 | |
8739 Lwlib is extremely poorly documented and quite hairy so here is my | |
8740 understanding of what goes on. | |
8741 | |
8742 Lwlib maintains a set of @code{widget_instance}s which mirror the | |
8743 hierarchical state of Xt widgets. I think this is so that widgets can be | |
8744 updated and manipulated generically by the lwlib library. For instance, | |
8745 @code{update_one_widget_instance} can cope with multiple types of widget | |
8746 and multiple types of toolkit. Each element in the widget hierarchy is | |
8747 updated from its corresponding @code{widget_instance} by walking the | |
8748 @code{widget_instance} tree recursively. | |
8749 | |
8750 This has desirable properties: for example, @code{lw_modify_all_widgets}, | |
8751 which is called from @file{glyphs-x.c}, updates all the properties of a | |
8752 widget without having to know what the widget is or what toolkit it is | |
8753 from. Unfortunately this also has hairy properties, such as making the | |
8754 lwlib code quite complex. And of course lwlib has to know at some level | |
8755 what the widget is and how to set its properties. | |
8756 | |
8757 @node Specifiers, Menus, Glyphs, Top | |
8758 @chapter Specifiers | 7794 @chapter Specifiers |
8759 | 7795 |
8760 Not yet documented. | 7796 Not yet documented. |
8761 | 7797 |
8762 @node Menus, Subprocesses, Specifiers, Top | 7798 @node Menus, Subprocesses, Specifiers, Top |
8882 @item tty_name | 7918 @item tty_name |
8883 The name of the terminal that the subprocess is using, | 7919 The name of the terminal that the subprocess is using, |
8884 or @code{nil} if it is using pipes. | 7920 or @code{nil} if it is using pipes. |
8885 @end table | 7921 @end table |
8886 | 7922 |
8887 @node Interface to X Windows, Index , Subprocesses, Top | 7923 @node Interface to X Windows, Index, Subprocesses, Top |
8888 @chapter Interface to X Windows | 7924 @chapter Interface to X Windows |
8889 | 7925 |
8890 Not yet documented. | 7926 Not yet documented. |
8891 | 7927 |
8892 @include index.texi | 7928 @include index.texi |
8895 @summarycontents | 7931 @summarycontents |
8896 @contents | 7932 @contents |
8897 @c That's all | 7933 @c That's all |
8898 | 7934 |
8899 @bye | 7935 @bye |
7936 |