comparison man/internals/internals.texi @ 412:697ef44129c6 r21-2-14

Import from CVS: tag r21-2-14
author cvs
date Mon, 13 Aug 2007 11:20:41 +0200
parents de805c49cfc1
children da8ed4261e83
411:12e008d41344 412:697ef44129c6
5 @c %**end of header 5 @c %**end of header
6 6
7 @ifinfo 7 @ifinfo
8 @dircategory XEmacs Editor 8 @dircategory XEmacs Editor
9 @direntry 9 @direntry
10 * Internals: (internals). XEmacs Internals Manual. 10 * Internals: (internals). XEmacs Internals Manual.
11 @end direntry 11 @end direntry
12 12
13 Copyright @copyright{} 1992 - 1996 Ben Wing. 13 Copyright @copyright{} 1992 - 1996 Ben Wing.
14 Copyright @copyright{} 1996, 1997 Sun Microsystems. 14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
15 Copyright @copyright{} 1994 - 1998 Free Software Foundation. 15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
61 @setchapternewpage odd 61 @setchapternewpage odd
62 @finalout 62 @finalout
63 63
64 @titlepage 64 @titlepage
65 @title XEmacs Internals Manual 65 @title XEmacs Internals Manual
66 @subtitle Version 1.3, August 1999 66 @subtitle Version 1.2, October 1998
67 67
68 @author Ben Wing 68 @author Ben Wing
69 @author Martin Buchholz 69 @author Martin Buchholz
70 @author Hrvoje Niksic 70 @author Hrvoje Niksic
71 @author Matthias Neubauer
72 @author Olivier Galibert
73 @page 71 @page
74 @vskip 0pt plus 1fill 72 @vskip 0pt plus 1fill
75 73
76 @noindent 74 @noindent
77 Copyright @copyright{} 1992 - 1996 Ben Wing. @* 75 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @* 76 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @* 77 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. 78 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
81 79
82 @sp 2 80 @sp 2
83 Version 1.3 @* 81 Version 1.2 @*
84 August 1999.@* 82 October 1998.@*
85 83
86 Permission is granted to make and distribute verbatim copies of this 84 Permission is granted to make and distribute verbatim copies of this
87 manual provided the copyright notice and this permission notice are 85 manual provided the copyright notice and this permission notice are
88 preserved on all copies. 86 preserved on all copies.
89 87
117 * The XEmacs Object System (Abstractly Speaking):: 115 * The XEmacs Object System (Abstractly Speaking)::
118 * How Lisp Objects Are Represented in C:: 116 * How Lisp Objects Are Represented in C::
119 * Rules When Writing New C Code:: 117 * Rules When Writing New C Code::
120 * A Summary of the Various XEmacs Modules:: 118 * A Summary of the Various XEmacs Modules::
121 * Allocation of Objects in XEmacs Lisp:: 119 * Allocation of Objects in XEmacs Lisp::
122 * Dumping::
123 * Events and the Event Loop:: 120 * Events and the Event Loop::
124 * Evaluation; Stack Frames; Bindings:: 121 * Evaluation; Stack Frames; Bindings::
125 * Symbols and Variables:: 122 * Symbols and Variables::
126 * Buffers and Textual Representation:: 123 * Buffers and Textual Representation::
127 * MULE Character Sets and Encodings:: 124 * MULE Character Sets and Encodings::
128 * The Lisp Reader and Compiler:: 125 * The Lisp Reader and Compiler::
129 * Lstreams:: 126 * Lstreams::
130 * Consoles; Devices; Frames; Windows:: 127 * Consoles; Devices; Frames; Windows::
131 * The Redisplay Mechanism:: 128 * The Redisplay Mechanism::
132 * Extents:: 129 * Extents::
133 * Faces:: 130 * Faces and Glyphs::
134 * Glyphs::
135 * Specifiers:: 131 * Specifiers::
136 * Menus:: 132 * Menus::
137 * Subprocesses:: 133 * Subprocesses::
138 * Interface to X Windows:: 134 * Interface to X Windows::
139 * Index:: 135 * Index:: Index including concepts, functions, variables,
140 136 and other terms.
141 @detailmenu 137
142 138 --- The Detailed Node Listing ---
143 --- The Detailed Node Listing --- 139
140 Here are other nodes that are inferiors of those already listed,
141 mentioned here so you can get to them in one step:
144 142
145 A History of Emacs 143 A History of Emacs
146 144
147 * Through Version 18:: Unification prevails. 145 * Through Version 18:: Unification prevails.
148 * Lucid Emacs:: One version 19 Emacs. 146 * Lucid Emacs:: One version 19 Emacs.
149 * GNU Emacs 19:: The other version 19 Emacs. 147 * GNU Emacs 19:: The other version 19 Emacs.
150 * GNU Emacs 20:: The other version 20 Emacs.
151 * XEmacs:: The continuation of Lucid Emacs. 148 * XEmacs:: The continuation of Lucid Emacs.
152 149
153 Rules When Writing New C Code 150 Rules When Writing New C Code
154 151
155 * General Coding Rules:: 152 * General Coding Rules::
156 * Writing Lisp Primitives:: 153 * Writing Lisp Primitives::
157 * Adding Global Lisp Variables:: 154 * Adding Global Lisp Variables::
158 * Coding for Mule::
159 * Techniques for XEmacs Developers:: 155 * Techniques for XEmacs Developers::
160
161 Coding for Mule
162
163 * Character-Related Data Types::
164 * Working With Character and Byte Positions::
165 * Conversion to and from External Data::
166 * General Guidelines for Writing Mule-Aware Code::
167 * An Example of Mule-Aware Code::
168 156
169 A Summary of the Various XEmacs Modules 157 A Summary of the Various XEmacs Modules
170 158
171 * Low-Level Modules:: 159 * Low-Level Modules::
172 * Basic Lisp Modules:: 160 * Basic Lisp Modules::
184 Allocation of Objects in XEmacs Lisp 172 Allocation of Objects in XEmacs Lisp
185 173
186 * Introduction to Allocation:: 174 * Introduction to Allocation::
187 * Garbage Collection:: 175 * Garbage Collection::
188 * GCPROing:: 176 * GCPROing::
189 * Garbage Collection - Step by Step::
190 * Integers and Characters:: 177 * Integers and Characters::
191 * Allocation from Frob Blocks:: 178 * Allocation from Frob Blocks::
192 * lrecords:: 179 * lrecords::
193 * Low-level allocation:: 180 * Low-level allocation::
181 * Pure Space::
194 * Cons:: 182 * Cons::
195 * Vector:: 183 * Vector::
196 * Bit Vector:: 184 * Bit Vector::
197 * Symbol:: 185 * Symbol::
198 * Marker:: 186 * Marker::
199 * String:: 187 * String::
200 * Compiled Function:: 188 * Compiled Function::
201
202 Garbage Collection - Step by Step
203
204 * Invocation::
205 * garbage_collect_1::
206 * mark_object::
207 * gc_sweep::
208 * sweep_lcrecords_1::
209 * compact_string_chars::
210 * sweep_strings::
211 * sweep_bit_vectors_1::
212
213 Dumping
214
215 * Overview::
216 * Data descriptions::
217 * Dumping phase::
218 * Reloading phase::
219
220 Dumping phase
221
222 * Object inventory::
223 * Address allocation::
224 * The header::
225 * Data dumping::
226 * Pointers dumping::
227 189
228 Events and the Event Loop 190 Events and the Event Loop
229 191
230 * Introduction to Events:: 192 * Introduction to Events::
231 * Main Loop:: 193 * Main Loop::
261 MULE Character Sets and Encodings 223 MULE Character Sets and Encodings
262 224
263 * Character Sets:: 225 * Character Sets::
264 * Encodings:: 226 * Encodings::
265 * Internal Mule Encodings:: 227 * Internal Mule Encodings::
266 * CCL::
267 228
268 Encodings 229 Encodings
269 230
270 * Japanese EUC (Extended Unix Code):: 231 * Japanese EUC (Extended Unix Code)::
271 * JIS7:: 232 * JIS7::
273 Internal Mule Encodings 234 Internal Mule Encodings
274 235
275 * Internal String Encoding:: 236 * Internal String Encoding::
276 * Internal Character Encoding:: 237 * Internal Character Encoding::
277 238
239 The Lisp Reader and Compiler
240
278 Lstreams 241 Lstreams
279
280 * Creating an Lstream:: Creating an lstream object.
281 * Lstream Types:: Different sorts of things that are streamed.
282 * Lstream Functions:: Functions for working with lstreams.
283 * Lstream Methods:: Creating new lstream types.
284 242
285 Consoles; Devices; Frames; Windows 243 Consoles; Devices; Frames; Windows
286 244
287 * Introduction to Consoles; Devices; Frames; Windows:: 245 * Introduction to Consoles; Devices; Frames; Windows::
288 * Point:: 246 * Point::
289 * Window Hierarchy:: 247 * Window Hierarchy::
290 * The Window Object::
291 248
292 The Redisplay Mechanism 249 The Redisplay Mechanism
293 250
294 * Critical Redisplay Sections:: 251 * Critical Redisplay Sections::
295 * Line Start Cache:: 252 * Line Start Cache::
296 * Redisplay Piece by Piece::
297 253
298 Extents 254 Extents
299 255
300 * Introduction to Extents:: Extents are ranges over text, with properties. 256 * Introduction to Extents:: Extents are ranges over text, with properties.
301 * Extent Ordering:: How extents are ordered internally. 257 * Extent Ordering:: How extents are ordered internally.
302 * Format of the Extent Info:: The extent information in a buffer or string. 258 * Format of the Extent Info:: The extent information in a buffer or string.
303 * Zero-Length Extents:: A weird special case. 259 * Zero-Length Extents:: A weird special case.
304 * Mathematics of Extent Ordering:: A rigorous foundation. 260 * Mathematics of Extent Ordering:: A rigorous foundation.
305 * Extent Fragments:: Cached information useful for redisplay. 261 * Extent Fragments:: Cached information useful for redisplay.
306 262
307 @end detailmenu 263 Faces and Glyphs
264
265 Specifiers
266
267 Menus
268
269 Subprocesses
270
271 Interface to X Windows
272
308 @end menu 273 @end menu
309 274
310 @node A History of Emacs, XEmacs From the Outside, Top, Top 275 @node A History of Emacs, XEmacs From the Outside, Top, Top
311 @chapter A History of Emacs 276 @chapter A History of Emacs
312 @cindex history of Emacs 277 @cindex history of Emacs
343 * GNU Emacs 19:: The other version 19 Emacs. 308 * GNU Emacs 19:: The other version 19 Emacs.
344 * GNU Emacs 20:: The other version 20 Emacs. 309 * GNU Emacs 20:: The other version 20 Emacs.
345 * XEmacs:: The continuation of Lucid Emacs. 310 * XEmacs:: The continuation of Lucid Emacs.
346 @end menu 311 @end menu
347 312
348 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs 313 @node Through Version 18
349 @section Through Version 18 314 @section Through Version 18
350 @cindex Gosling, James 315 @cindex Gosling, James
351 @cindex Great Usenet Renaming 316 @cindex Great Usenet Renaming
352 317
353 Although the history of the early versions of GNU Emacs is unclear, 318 Although the history of the early versions of GNU Emacs is unclear,
456 version 18.58 released ?????. 421 version 18.58 released ?????.
457 @item 422 @item
458 version 18.59 released October 31, 1992. 423 version 18.59 released October 31, 1992.
459 @end itemize 424 @end itemize
460 425
461 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs 426 @node Lucid Emacs
462 @section Lucid Emacs 427 @section Lucid Emacs
463 @cindex Lucid Emacs 428 @cindex Lucid Emacs
464 @cindex Lucid Inc. 429 @cindex Lucid Inc.
465 @cindex Energize 430 @cindex Energize
466 @cindex Epoch 431 @cindex Epoch
544 version 20.3 (the first stable version of XEmacs 20.x) released November 30, 509 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
545 1997. 510 1997.
546 version 20.4 released February 28, 1998. 511 version 20.4 released February 28, 1998.
547 @end itemize 512 @end itemize
548 513
549 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs 514 @node GNU Emacs 19
550 @section GNU Emacs 19 515 @section GNU Emacs 19
551 @cindex GNU Emacs 19 516 @cindex GNU Emacs 19
552 @cindex FSF Emacs 517 @cindex FSF Emacs
553 518
554 About a year after the initial release of Lucid Emacs, the FSF 519 About a year after the initial release of Lucid Emacs, the FSF
621 worse. Lucid soon began incorporating features from GNU Emacs 19 into 586 worse. Lucid soon began incorporating features from GNU Emacs 19 into
622 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been 587 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
623 working on and using GNU Emacs for a long time (back as far as version 588 working on and using GNU Emacs for a long time (back as far as version
624 16 or 17). 589 16 or 17).
625 590
626 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs 591 @node GNU Emacs 20
627 @section GNU Emacs 20 592 @section GNU Emacs 20
628 @cindex GNU Emacs 20 593 @cindex GNU Emacs 20
629 @cindex FSF Emacs 594 @cindex FSF Emacs
630 595
631 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first 596 On February 2, 1997 work began on GNU Emacs to integrate Mule. The first
640 version 20.2 released September 20, 1997. 605 version 20.2 released September 20, 1997.
641 @item 606 @item
642 version 20.3 released August 19, 1998. 607 version 20.3 released August 19, 1998.
643 @end itemize 608 @end itemize
644 609
645 @node XEmacs, , GNU Emacs 20, A History of Emacs 610 @node XEmacs
646 @section XEmacs 611 @section XEmacs
647 @cindex XEmacs 612 @cindex XEmacs
648 613
649 @cindex Sun Microsystems 614 @cindex Sun Microsystems
650 @cindex University of Illinois 615 @cindex University of Illinois
728 windows, frames, events) that are useful for implementing an editor. 693 windows, frames, events) that are useful for implementing an editor.
729 Some of these objects (in particular windows and frames) have 694 Some of these objects (in particular windows and frames) have
730 displayable representations, and XEmacs provides a function 695 displayable representations, and XEmacs provides a function
731 @code{redisplay()} that ensures that the display of all such objects 696 @code{redisplay()} that ensures that the display of all such objects
732 matches their internal state. Most of the time, a standard Lisp 697 matches their internal state. Most of the time, a standard Lisp
733 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp 698 environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp
734 code, execute it, and print the results''. XEmacs has a similar loop: 699 code, execute it, and print the results''. XEmacs has a similar loop:
735 700
736 @itemize @bullet 701 @itemize @bullet
737 @item 702 @item
738 read an event 703 read an event
903 handler for some or all classes of errors. (If no handler is registered, 868 handler for some or all classes of errors. (If no handler is registered,
904 a default handler, generally installed by the top-level event loop, is 869 a default handler, generally installed by the top-level event loop, is
905 executed; this prints out the error and continues.) Routines can also 870 executed; this prints out the error and continues.) Routines can also
906 specify cleanup code (called an @dfn{unwind-protect}) that will be 871 specify cleanup code (called an @dfn{unwind-protect}) that will be
907 called when control exits from a block of code, no matter how that exit 872 called when control exits from a block of code, no matter how that exit
908 occurs---i.e. even if a function deeply nested below it causes a 873 occurs -- i.e. even if a function deeply nested below it causes a
909 non-local exit back to the top level. 874 non-local exit back to the top level.
910 875
911 Note that this facility has appeared in some recent vintages of C, in 876 Note that this facility has appeared in some recent vintages of C, in
912 particular Visual C++ and other PC compilers written for the Microsoft 877 particular Visual C++ and other PC compilers written for the Microsoft
913 Win32 API. 878 Win32 API.
917 that if you declare a local variable in a particular function, and then 882 that if you declare a local variable in a particular function, and then
918 call another function, that subfunction can ``see'' the local variable 883 call another function, that subfunction can ``see'' the local variable
919 you declared. This is actually considered a bug in Emacs Lisp and in 884 you declared. This is actually considered a bug in Emacs Lisp and in
920 all other early dialects of Lisp, and was corrected in Common Lisp. (In 885 all other early dialects of Lisp, and was corrected in Common Lisp. (In
921 Common Lisp, you can still declare dynamically scoped variables if you 886 Common Lisp, you can still declare dynamically scoped variables if you
922 want to---they are sometimes useful---but variables by default are 887 want to -- they are sometimes useful -- but variables by default are
923 @dfn{lexically scoped} as in C.) 888 @dfn{lexically scoped} as in C.)
924 @end enumerate 889 @end enumerate
925 890
926 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an 891 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
927 early dialect of Lisp developed at MIT (no relation to the Macintosh 892 early dialect of Lisp developed at MIT (no relation to the Macintosh
965 Java, which is inexcusable. 930 Java, which is inexcusable.
966 @end enumerate 931 @end enumerate
967 932
968 Unfortunately, there is no perfect language. Static typing allows a 933 Unfortunately, there is no perfect language. Static typing allows a
969 compiler to catch programmer errors and produce more efficient code, but 934 compiler to catch programmer errors and produce more efficient code, but
970 makes programming more tedious and less fun. For the foreseeable future, 935 makes programming more tedious and less fun. For the forseeable future,
971 an Ideal Editing and Programming Environment (and that is what XEmacs 936 an Ideal Editing and Programming Environment (and that is what XEmacs
972 aspires to) will be programmable in multiple languages: high level ones 937 aspires to) will be programmable in multiple languages: high level ones
973 like Lisp for user customization and prototyping, and lower level ones 938 like Lisp for user customization and prototyping, and lower level ones
974 for infrastructure and industrial strength applications. If I had my 939 for infrastructure and industrial strength applications. If I had my
975 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, 940 way, XEmacs would be friendly towards the Python, Scheme, C++, ML,
1275 most other data structures in Lisp. 1240 most other data structures in Lisp.
1276 @item char 1241 @item char
1277 An object representing a single character of text; chars behave like 1242 An object representing a single character of text; chars behave like
1278 integers in many ways but are logically considered text rather than 1243 integers in many ways but are logically considered text rather than
1279 numbers and have a different read syntax. (the read syntax for a char 1244 numbers and have a different read syntax. (the read syntax for a char
1280 contains the char itself or some textual encoding of it---for example, 1245 contains the char itself or some textual encoding of it -- for example,
1281 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the 1246 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1282 ISO-2022 encoding standard---rather than the numerical representation 1247 ISO-2022 encoding standard -- rather than the numerical representation
1283 of the char; this way, if the mapping between chars and integers 1248 of the char; this way, if the mapping between chars and integers
1284 changes, which is quite possible for Kanji characters and other extended 1249 changes, which is quite possible for Kanji characters and other extended
1285 characters, the same character will still be created. Note that some 1250 characters, the same character will still be created. Note that some
1286 primitives confuse chars and integers. The worst culprit is @code{eq}, 1251 primitives confuse chars and integers. The worst culprit is @code{eq},
1287 which makes a special exception and considers a char to be @code{eq} to 1252 which makes a special exception and considers a char to be @code{eq} to
1495 1460
1496 @example 1461 @example
1497 1.983e-4 1462 1.983e-4
1498 @end example 1463 @end example
1499 1464
1500 converts to a float whose value is 1.983e-4, or .0001983. 1465 converts to a float whose value is 1983.23e-4, or .0001983.
1501 1466
1502 @example 1467 @example
1503 ?b 1468 ?b
1504 @end example 1469 @end example
1505 1470
1618 1583
1619 @example 1584 @example
1620 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] 1585 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1621 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] 1586 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1622 1587
1623 <---------------------------------------------------------> <-> 1588 <---> ^ <------------------------------------------------------>
1624 a pointer to a structure, or an integer tag 1589 tag | a pointer to a structure, or an integer
1625 @end example 1590 |
1626 1591 mark bit
1627 A tag of 00 is used for all pointer object types, a tag of 10 is used 1592 @end example
1628 for characters, and the other two tags 01 and 11 are joined together to 1593
1629 form the integer object type. This representation gives us 31 bit 1594 The tag describes the type of the Lisp object. For integers and chars,
1630 integers and 30 bit characters, while pointers are represented directly 1595 the lower 28 bits contain the value of the integer or char; for all
1631 without any bit masking or shifting. This representation, though, 1596 others, the lower 28 bits contain a pointer. The mark bit is used
1632 assumes that pointers to structs are always aligned to multiples of 4, 1597 during garbage-collection, and is always 0 when garbage collection is
1633 so the lower 2 bits are always zero. 1598 not happening. (The way that garbage collection works, basically, is that it
1599 loops over all places where Lisp objects could exist -- this includes
1600 all global variables in C that contain Lisp objects [including
1601 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all
1602 Lisp variables will get marked], plus various other places -- and
1603 recursively scans through the Lisp objects, marking each object it finds
1604 by setting the mark bit. Then it goes through the lists of all objects
1605 allocated, freeing the ones that are not marked and turning off the mark
1606 bit of the ones that are marked.)
1634 1607
1635 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type 1608 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1636 used for the Lisp object can vary. It can be either a simple type 1609 used for the Lisp object can vary. It can be either a simple type
1637 (@code{long} on the DEC Alpha, @code{int} on other machines) or a 1610 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1638 structure whose fields are bit fields that line up properly (actually, a 1611 structure whose fields are bit fields that line up properly (actually, a
1639 union of structures is used). Generally the simple integral type is 1612 union of structures is used). Generally the simple integral type is
1640 preferable because it ensures that the compiler will actually use a 1613 preferable because it ensures that the compiler will actually use a
1641 machine word to represent the object (some compilers will use more 1614 machine word to represent the object (some compilers will use more
1642 general and less efficient code for unions and structs even if they can 1615 general and less efficient code for unions and structs even if they can
1643 fit in a machine word). The union type, however, has the advantage of 1616 fit in a machine word). The union type, however, has the advantage of
1644 stricter type checking. If you accidentally pass an integer where a Lisp 1617 stricter type checking (if you accidentally pass an integer where a Lisp
1645 object is desired, you get a compile error. The choice of which type 1618 object is desired, you get a compile error), and it makes it easier to
1646 to use is determined by the preprocessor constant @code{USE_UNION_TYPE} 1619 decode Lisp objects when debugging. The choice of which type to use is
1647 which is defined via the @code{--use-union-type} option to 1620 determined by the preprocessor constant @code{USE_UNION_TYPE} which is
1648 @code{configure}. 1621 defined via the @code{--use-union-type} option to @code{configure}.
1649 1622
1650 Various macros are used to convert between Lisp_Objects and the 1623 @cindex record type
1651 corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()}, 1624
1652 @code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or 1625 Note that there are only eight types that the tag can represent, but
1653 masking and cast it to the appropriate type. @code{XINT()} needs to be 1626 many more actual types than this. This is handled by having one of the
1654 a bit tricky so that negative numbers are properly sign-extended. Since 1627 tag types specify a meta-type called a @dfn{record}; for all such
1655 integers are stored left-shifted, if the right-shift operator does an 1628 objects, the first four bytes of the pointed-to structure indicate what
1656 arithmetic shift (i.e. it leaves the most-significant bit as-is rather 1629 the actual type is.
1657 than shifting in a zero, so that it mimics a divide-by-two even for 1630
1658 negative numbers) the shift to remove the tag bit is enough. This is 1631 Note also that having 28 bits for pointers and integers restricts a lot
1659 the case on all the systems we support. 1632 of things to 256 megabytes of memory. (Basically, enough pointers and
1660 1633 indices and whatnot get stuffed into Lisp objects that the total amount
1661 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter 1634 of memory used by XEmacs can't grow above 256 megabytes. In older
1662 macros become more complicated---they check the tag bits and/or the 1635 versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for
1636 32 types, which was more than the actual number of types that existed at
1637 the time, and no ``record'' type was necessary. However, this limited
1638 the editor to 64 megabytes total, which some users who edited large
1639 files might conceivably exceed.)
1640
1641 Also, note that there is an implicit assumption here that all pointers
1642 are low enough that the top bits are all zero and can just be chopped
1643 off. On standard machines that allocate memory from the bottom up (and
1644 give each process its own address space), this works fine. Some
1645 machines, however, put the data space somewhere else in memory
1646 (e.g. beginning at 0x80000000). Those machines cope by defining
1647 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
1648 the proper mask. Then, pointers retrieved from Lisp objects are
1649 automatically OR'ed with this value prior to being used.
1650
1651 A corollary of the previous paragraph is that @strong{(pointers to)
1652 stack-allocated structures cannot be put into Lisp objects}. The stack
1653 is generally located near the top of memory; if you put such a pointer
1654 into a Lisp object, it will get its top bits chopped off, and you will
1655 lose.
1656
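The effect of chopping the top bits off a pointer, and of OR'ing @code{DATA_SEG_BITS} back in, can be illustrated with a small sketch. It is not the XEmacs source: the @code{sketch_} names are hypothetical, addresses are modelled as plain 32-bit integers, and the layout (3 tag bits, 1 mark bit, 28 value bits) and the 0x80000000 data segment are taken from the description above.

```c
#include <stdint.h>

/* Assumed layout: tag in bits 31..29, mark bit 28, value in bits 27..0. */
#define SKETCH_VALMASK       UINT32_C(0x0FFFFFFF)
/* Assumed data segment base, as in the 0x80000000 example in the text. */
#define SKETCH_DATA_SEG_BITS UINT32_C(0x80000000)

/* Store an address in an object: everything above bit 27 is discarded. */
static uint32_t sketch_store_pointer(uint32_t tag, uint32_t address)
{
    return (tag << 29) | (address & SKETCH_VALMASK);
}

/* Retrieve the address: mask out tag and mark bit, then OR the segment
   bits back in. */
static uint32_t sketch_load_pointer(uint32_t obj)
{
    return (obj & SKETCH_VALMASK) | SKETCH_DATA_SEG_BITS;
}
```

Under these assumptions a data-segment address such as 0x80001234 survives the round trip, while a hypothetical stack address such as 0xBFFFF000 comes back as 0x8FFFF000, which is why pointers to stack-allocated structures cannot be stored in Lisp objects.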
1657 Actually, there's an alternative representation of a @code{Lisp_Object},
1658 invented by Kyle Jones, that is used when the
1659 @code{--use-minimal-tagbits} option to @code{configure} is used. In
1660 this case the 2 lower bits are used for the tag bits. This
1661 representation assumes that pointers to structs are always aligned to
1662 multiples of 4, so the lower 2 bits are always zero.
1663
1664 @example
1665 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1666 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1667
1668 <---------------------------------------------------------> <->
1669 a pointer to a structure, or an integer tag
1670 @end example
1671
1672 A tag of 00 is used for all pointer object types, a tag of 10 is used
1673 for characters, and the other two tags 01 and 11 are joined together to
1674 form the integer object type. The markbit is moved to part of the
1675 structure being pointed at (integers and chars do not need to be marked,
1676 since no memory is allocated). This representation has these
1677 advantages:
1678
1679 @enumerate
1680 @item
1681 31 bits can be used for Lisp Integers.
1682 @item
1683 @emph{Any} pointer can be represented directly, and no bit masking
1684 operations are necessary.
1685 @end enumerate
1686
1687 The disadvantages are:
1688
1689 @enumerate
1690 @item
1691 An extra level of indirection is needed when accessing the object types
1692 that were not record types. So checking whether a Lisp object is a cons
1693 cell becomes a slower operation.
1694 @item
1695 Mark bits can no longer be stored directly in Lisp objects, so another
1696 place for them must be found. This means that a cons cell requires more
1697 memory than merely room for 2 lisp objects, leading to extra memory use.
1698 @end enumerate
1699
1700 Various macros are used to construct Lisp objects and extract the
1701 components. Macros of the form @code{XINT()}, @code{XCHAR()},
1702 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
1703 field and cast it to the appropriate type. All of the macros that
1704 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
1705 necessary. @code{XINT()} needs to be a bit tricky so that negative
1706 numbers are properly sign-extended: Usually it does this by shifting the
1707 number four bits to the left and then four bits to the right. This
1708 assumes that the right-shift operator does an arithmetic shift (i.e. it
1709 leaves the most-significant bit as-is rather than shifting in a zero, so
1710 that it mimics a divide-by-two even for negative numbers). Not all
1711 machines/compilers do this, and on the ones that don't, a more
1712 complicated definition is selected by defining
1713 @code{EXPLICIT_SIGN_EXTEND}.
1714
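The sign-extension trick described above can be sketched as follows. This is an illustration, not the real @code{XINT()} definition: the @code{sketch_} names are hypothetical, a 32-bit object with a 28-bit value field is assumed, and the second function is only one portable way a ``more complicated definition'' could look; it is not claimed to be what @code{EXPLICIT_SIGN_EXTEND} actually selects.

```c
#include <stdint.h>

#define SKETCH_VALBITS 28
#define SKETCH_VALMASK ((UINT32_C(1) << SKETCH_VALBITS) - 1)

/* Pack a tag (bits 31..29) and a 28-bit value; the mark bit (28) stays 0. */
static uint32_t sketch_pack(uint32_t tag, int32_t value)
{
    return (tag << 29) | ((uint32_t)value & SKETCH_VALMASK);
}

/* Shift four bits left to push out tag and mark bit, then four bits
   right.  This relies on >> being an arithmetic shift for signed
   operands, which C leaves implementation-defined. */
static int32_t sketch_xint_shift(uint32_t obj)
{
    return (int32_t)(obj << 4) >> 4;
}

/* Portable alternative: flip the sign bit of the 28-bit field, then
   subtract its weight.  No right shift of a negative value is needed. */
static int32_t sketch_xint_explicit(uint32_t obj)
{
    uint32_t v = obj & SKETCH_VALMASK;
    uint32_t sign = UINT32_C(1) << (SKETCH_VALBITS - 1);
    return (int32_t)(v ^ sign) - (int32_t)sign;
}
```

On a compiler with arithmetic right shift both functions return the same value; for example, an object built with @code{sketch_pack(5, -5)} decodes to -5 either way.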
1715 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
1716 macros become more complicated -- they check the tag bits and/or the
1663 type field in the first four bytes of a record type to ensure that the 1717 type field in the first four bytes of a record type to ensure that the
1664 object is really of the correct type. This is great for catching places 1718 object is really of the correct type. This is great for catching places
1665 where an incorrect type is being dereferenced---this typically results 1719 where an incorrect type is being dereferenced -- this typically results
1666 in a pointer being dereferenced as the wrong type of structure, with 1720 in a pointer being dereferenced as the wrong type of structure, with
1667 unpredictable (and sometimes not easily traceable) results. 1721 unpredictable (and sometimes not easily traceable) results.
1668 1722
1669 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp 1723 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1670 object. These macros are of the form @code{XSET@var{TYPE} 1724 object. These macros are of the form @code{XSET@var{TYPE}
1671 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather 1725 (@var{lvalue}, @var{result})},
1672 than just used in an expression. The reason for this is that standard C 1726 i.e. they have to be a statement rather than just used in an expression.
1673 doesn't let you ``construct'' a structure (but GCC does). Granted, this 1727 The reason for this is that standard C doesn't let you ``construct'' a
1674 sometimes isn't too convenient; for the case of integers, at least, you 1728 structure (but GCC does). Granted, this sometimes isn't too convenient;
1675 can use the function @code{make_int()}, which constructs and 1729 for the case of integers, at least, you can use the function
1676 @emph{returns} an integer Lisp object. Note that the 1730 @code{make_int()}, which constructs and @emph{returns} an integer
1677 @code{XSET@var{TYPE}()} macros are also affected by 1731 Lisp object. Note that the @code{XSET@var{TYPE}()} macros are also
1678 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the 1732 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
1679 right type in the case of record types, where the type is contained in 1733 structure is of the right type in the case of record types, where the
1680 the structure. 1734 type is contained in the structure.
1681 1735
1682 The C programmer is responsible for @strong{guaranteeing} that a 1736 The C programmer is responsible for @strong{guaranteeing} that a
1683 Lisp_Object is the correct type before using the @code{X@var{TYPE}} 1737 Lisp_Object is the correct type before using the @code{X@var{TYPE}}
1684 macros. This is especially important in the case of lists. Use 1738 macros. This is especially important in the case of lists. Use
1685 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, 1739 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1686 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not 1740 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not
1687 Lisp code. On the other hand, if XEmacs has an internal logic error, 1741 Lisp code. On the other hand, if XEmacs has an internal logic error,
1688 it's better to crash immediately, so sprinkle @code{assert()}s and 1742 it's better to crash immediately, so sprinkle ``unreachable''
1689 ``unreachable'' @code{abort()}s liberally about the source code. Where 1743 @code{abort()}s liberally about the source code.
1690 performance is an issue, use @code{type_checking_assert},
1691 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
1692 nothing unless the corresponding configure error checking flag was
1693 specified.
1694 1744
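The division of labour described above (unchecked accessors for data the C code already trusts, checked accessors for anything that may have come from Lisp) can be sketched with a toy object type. Everything named @code{toy_...} below is hypothetical and stands in for the real @code{Lisp_Object} machinery, which is considerably more involved; in particular, the real @code{Fcar()} signals a Lisp error rather than returning a status code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only: these are NOT the real XEmacs definitions. */
enum toy_type { TOY_INT, TOY_CONS };

struct toy_object {
  enum toy_type type;
  int int_value;
  struct toy_object *car;
  struct toy_object *cdr;
};

/* In the spirit of XCAR: the caller guarantees that obj is a cons.
   A violation is an internal logic error, so crash immediately. */
static struct toy_object *toy_xcar(struct toy_object *obj)
{
  assert(obj != NULL && obj->type == TOY_CONS);
  return obj->car;
}

/* In the spirit of Fcar: the argument is untrusted, so check it and
   report failure instead of crashing.  Returns 1 on success, 0 if
   obj is not a cons. */
static int toy_safe_car(struct toy_object *obj, struct toy_object **out)
{
  if (obj == NULL || obj->type != TOY_CONS)
    return 0;
  *out = obj->car;
  return 1;
}
```

The assertion in @code{toy_xcar} plays the role of the optional type-checking build: it costs something at run time, but turns a silent misinterpretation of memory into an immediate, traceable crash.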
1695 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top 1745 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1696 @chapter Rules When Writing New C Code 1746 @chapter Rules When Writing New C Code
1697 1747
1698 The XEmacs C Code is extremely complex and intricate, and there are many 1748 The XEmacs C Code is extremely complex and intricate, and there are many
1708 * Adding Global Lisp Variables:: 1758 * Adding Global Lisp Variables::
1709 * Coding for Mule:: 1759 * Coding for Mule::
1710 * Techniques for XEmacs Developers:: 1760 * Techniques for XEmacs Developers::
1711 @end menu 1761 @end menu
1712 1762
1713 @node General Coding Rules, Writing Lisp Primitives, Rules When Writing New C Code, Rules When Writing New C Code 1763 @node General Coding Rules
1714 @section General Coding Rules 1764 @section General Coding Rules
1715 1765
1716 The C code is actually written in a dialect of C called @dfn{Clean C}, 1766 The C code is actually written in a dialect of C called @dfn{Clean C},
1717 meaning that it can be compiled, mostly warning-free, with either a C or 1767 meaning that it can be compiled, mostly warning-free, with either a C or
1718 C++ compiler. Coding in Clean C has several advantages over plain C. 1768 C++ compiler. Coding in Clean C has several advantages over plain C.
1741 the same directory as the C sources) and @file{lisp.h}. @file{config.h} 1791 the same directory as the C sources) and @file{lisp.h}. @file{config.h}
1742 must always be included before any other header files (including 1792 must always be included before any other header files (including
1743 system header files) to ensure that certain tricks played by various 1793 system header files) to ensure that certain tricks played by various
1744 @file{s/} and @file{m/} files work out correctly. 1794 @file{s/} and @file{m/} files work out correctly.
1745 1795
1746 When including header files, always use angle brackets, not double
1747 quotes, except when the file to be included is always in the same
1748 directory as the including file. If either file is a generated file,
1749 then that is not likely to be the case. In order to understand why we
1750 have this rule, imagine what happens when you do a build in the source
1751 directory using @samp{./configure} and another build in another
1752 directory using @samp{../work/configure}. There will be two different
1753 @file{config.h} files. Which one will be used if you @samp{#include
1754 "config.h"}?
1755
1756 @strong{All global and static variables that are to be modifiable must 1796 @strong{All global and static variables that are to be modifiable must
1757 be declared uninitialized.} This means that you may not use the 1797 be declared uninitialized.} This means that you may not use the
1758 ``declare with initializer'' form for these variables, such as @code{int 1798 ``declare with initializer'' form for these variables, such as @code{int
1759 some_variable = 0;}. The reason for this has to do with some kludges 1799 some_variable = 0;}. The reason for this has to do with some kludges
1760 done during the dumping process: If possible, the initialized data 1800 done during the dumping process: If possible, the initialized data
1761 segment is re-mapped so that it becomes part of the (unmodifiable) code 1801 segment is re-mapped so that it becomes part of the (unmodifiable) code
1762 segment in the dumped executable. This allows this memory to be shared 1802 segment in the dumped executable. This allows this memory to be shared
1763 among multiple running XEmacs processes. XEmacs is careful to place as 1803 among multiple running XEmacs processes. XEmacs is careful to place as
1764 much constant data as possible into initialized variables during the 1804 much constant data as possible into initialized variables (in
1765 @file{temacs} phase. 1805 particular, into what's called the @dfn{pure space} -- see below) during
1806 the @file{temacs} phase.
1766 1807
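As a minimal sketch of the rule above, modifiable file-scope variables are declared without initializers and receive their starting values from a function run at startup. The names @code{example_counter}, @code{example_name}, @code{vars_of_example} and @code{example_next} are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Modifiable state: declared WITHOUT an initializer, so it does not
   land in the initialized data segment. */
static int example_counter;
static const char *example_name;

/* Hypothetical startup hook that assigns the starting values
   explicitly instead of relying on "declare with initializer". */
void vars_of_example(void)
{
  example_counter = 0;
  example_name = "example";
}

/* Uses the state; asserts that the startup hook has already run. */
int example_next(void)
{
  assert(example_name != NULL);
  return ++example_counter;
}
```

Writing @code{static int example_counter = 0;} instead would be the form the rule forbids, even though the resulting value is the same.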
1767 @cindex copy-on-write 1808 @cindex copy-on-write
1768 @strong{Please note:} This kludge only works on a few systems nowadays, 1809 @strong{Please note:} This kludge only works on a few systems nowadays,
1769 and is rapidly becoming irrelevant because most modern operating systems 1810 and is rapidly becoming irrelevant because most modern operating systems
1770 provide @dfn{copy-on-write} semantics. All data is initially shared 1811 provide @dfn{copy-on-write} semantics. All data is initially shared
1794 1835
1795 The C source code makes heavy use of C preprocessor macros. One popular 1836 The C source code makes heavy use of C preprocessor macros. One popular
1796 macro style is: 1837 macro style is:
1797 1838
1798 @example 1839 @example
1799 #define FOO(var, value) do @{ \ 1840 #define FOO(var, value) do @{ \
1800 Lisp_Object FOO_value = (value); \ 1841 Lisp_Object FOO_value = (value); \
1801 ... /* compute using FOO_value */ \ 1842 ... /* compute using FOO_value */ \
1802 (var) = bar; \ 1843 (var) = bar; \
1803 @} while (0) 1844 @} while (0)
1804 @end example 1845 @end example
1805 1846
1806 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have 1847 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
1807 statement semantics, so that it can safely be used within an @code{if} 1848 statement semantics, so that it can safely be used within an @code{if}
1823 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of 1864 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of
1824 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and 1865 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
1825 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some 1866 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some
1826 predicate. 1867 predicate.
1827 1868
1828 @node Writing Lisp Primitives, Adding Global Lisp Variables, General Coding Rules, Rules When Writing New C Code 1869 @node Writing Lisp Primitives
1829 @section Writing Lisp Primitives 1870 @section Writing Lisp Primitives
1830 1871
1831 Lisp primitives are Lisp functions implemented in C. The details of 1872 Lisp primitives are Lisp functions implemented in C. The details of
1832 interfacing the C function so that Lisp can call it are handled by a few 1873 interfacing the C function so that Lisp can call it are handled by a few
1833 C macros. The only way to really understand how to write new C code is 1874 C macros. The only way to really understand how to write new C code is
2067 2108
2068 @file{eval.c} is a very good file to look through for examples; 2109 @file{eval.c} is a very good file to look through for examples;
2069 @file{lisp.h} contains the definitions for important macros and 2110 @file{lisp.h} contains the definitions for important macros and
2070 functions. 2111 functions.
2071 2112
2072 @node Adding Global Lisp Variables, Coding for Mule, Writing Lisp Primitives, Rules When Writing New C Code 2113 @node Adding Global Lisp Variables
2073 @section Adding Global Lisp Variables 2114 @section Adding Global Lisp Variables
2074 2115
2075 Global variables whose names begin with @samp{Q} are constants whose 2116 Global variables whose names begin with @samp{Q} are constants whose
2076 value is a symbol of a particular name. The name of the variable should 2117 value is a symbol of a particular name. The name of the variable should
2077 be derived from the name of the symbol using the same rules as for Lisp 2118 be derived from the name of the symbol using the same rules as for Lisp
2129 garbage-collection mechanism won't know that the object in this variable 2170 garbage-collection mechanism won't know that the object in this variable
2130 is in use, and will happily collect it and reuse its storage for another 2171 is in use, and will happily collect it and reuse its storage for another
2131 Lisp object, and you will be the one who's unhappy when you can't figure 2172 Lisp object, and you will be the one who's unhappy when you can't figure
2132 out how your variable got overwritten. 2173 out how your variable got overwritten.
2133 2174
2134 @node Coding for Mule, Techniques for XEmacs Developers, Adding Global Lisp Variables, Rules When Writing New C Code 2175 @node Coding for Mule
2135 @section Coding for Mule 2176 @section Coding for Mule
2136 @cindex Coding for Mule 2177 @cindex Coding for Mule
2137 2178
2138 Although Mule support is not compiled by default in XEmacs, many people 2179 Although Mule support is not compiled by default in XEmacs, many people
2139 are using it, and we consider it crucial that new code works correctly 2180 are using it, and we consider it crucial that new code works correctly
2152 * Conversion to and from External Data:: 2193 * Conversion to and from External Data::
2153 * General Guidelines for Writing Mule-Aware Code:: 2194 * General Guidelines for Writing Mule-Aware Code::
2154 * An Example of Mule-Aware Code:: 2195 * An Example of Mule-Aware Code::
2155 @end menu 2196 @end menu
2156 2197
2157 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule 2198 @node Character-Related Data Types
2158 @subsection Character-Related Data Types 2199 @subsection Character-Related Data Types
2159 2200
2160 First, let's review the basic character-related datatypes used by 2201 First, let's review the basic character-related datatypes used by
2161 XEmacs. Note that the separate @code{typedef}s are not mandatory in the 2202 XEmacs. Note that the separate @code{typedef}s are not mandatory in the
2162 current implementation (all of them boil down to @code{unsigned char} or 2203 current implementation (all of them boil down to @code{unsigned char} or
2179 @item Bufbyte 2220 @item Bufbyte
2180 @cindex Bufbyte 2221 @cindex Bufbyte
2181 The data representing the text in a buffer or string is logically a set 2222 The data representing the text in a buffer or string is logically a set
2182 of @code{Bufbyte}s. 2223 of @code{Bufbyte}s.
2183 2224
2184 XEmacs does not work with the same character formats all the time; when 2225 XEmacs does not work with character formats all the time; when reading
2185 reading characters from the outside, it decodes them to an internal 2226 characters from the outside, it decodes them to an internal format, and
2186 format, and likewise encodes them when writing. @code{Bufbyte} (in fact 2227 likewise encodes them when writing. @code{Bufbyte} (in fact
2187 @code{unsigned char}) is the basic unit of XEmacs internal buffers and 2228 @code{unsigned char}) is the basic unit of XEmacs internal buffers and
2188 strings format. A @code{Bufbyte *} is the type that points at text 2229 strings format.
2189 encoded in the variable-width internal encoding.
2190 2230
2191 One character can correspond to one or more @code{Bufbyte}s. In the 2231 One character can correspond to one or more @code{Bufbyte}s. In the
2192 current Mule implementation, an ASCII character is represented by the 2232 current implementation, an ASCII character is represented by the same
2193 same @code{Bufbyte}, and other characters are represented by a sequence 2233 @code{Bufbyte}, and extended characters are represented by a sequence of
2194 of two or more @code{Bufbyte}s. 2234 @code{Bufbyte}s.
2195 2235
2196 Without Mule support, there are exactly 256 characters, implicitly 2236 Without Mule support, a @code{Bufbyte} is equivalent to an
2197 Latin-1, and each character is represented using one @code{Bufbyte}, and 2237 @code{Emchar}.
2198 there is a one-to-one correspondence between @code{Bufbyte}s and
2199 @code{Emchar}s.
2200 2238
2201 @item Bufpos 2239 @item Bufpos
2202 @itemx Charcount 2240 @itemx Charcount
2203 @cindex Bufpos 2241 @cindex Bufpos
2204 @cindex Charcount 2242 @cindex Charcount
2205 A @code{Bufpos} represents a character position in a buffer or string. 2243 A @code{Bufpos} represents a character position in a buffer or string.
2206 A @code{Charcount} represents a number (count) of characters. 2244 A @code{Charcount} represents a number (count) of characters.
2207 Logically, subtracting two @code{Bufpos} values yields a 2245 Logically, subtracting two @code{Bufpos} values yields a
2208 @code{Charcount} value. Although all of these are @code{typedef}ed to 2246 @code{Charcount} value. Although all of these are @code{typedef}ed to
2209 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make 2247 @code{int}, we use them in preference to @code{int} to make it clear
2210 it clear what sort of position is being used. 2248 what sort of position is being used.
2211 2249
2212 @code{Bufpos} and @code{Charcount} values are the only ones that are 2250 @code{Bufpos} and @code{Charcount} values are the only ones that are
2213 ever visible to Lisp. 2251 ever visible to Lisp.
2214 2252
2215 @item Bytind 2253 @item Bytind
2216 @itemx Bytecount 2254 @itemx Bytecount
2217 @cindex Bytind 2255 @cindex Bytind
2218 @cindex Bytecount 2256 @cindex Bytecount
2219 A @code{Bytind} represents a byte position in a buffer or string. A 2257 A @code{Bytind} represents a byte position in a buffer or string. A
2220 @code{Bytecount} represents the distance between two positions, in bytes. 2258 @code{Bytecount} represents the distance between two positions in bytes.
2221 The relationship between @code{Bytind} and @code{Bytecount} is the same 2259 The relationship between @code{Bytind} and @code{Bytecount} is the same
2222 as the relationship between @code{Bufpos} and @code{Charcount}. 2260 as the relationship between @code{Bufpos} and @code{Charcount}.
2223 2261
2224 @item Extbyte 2262 @item Extbyte
2225 @itemx Extcount 2263 @itemx Extcount
2229 which are equivalent to @code{unsigned char}. Obviously, an 2267 which are equivalent to @code{unsigned char}. Obviously, an
2230 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes 2268 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes
2231 and Extcounts are not all that frequent in XEmacs code. 2269 and Extcounts are not all that frequent in XEmacs code.
2232 @end table 2270 @end table
2233 2271
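The distinction between character-based and byte-based quantities in the table above can be illustrated with a toy variable-width encoding. This is only a sketch: the @code{Toy...} typedefs stand in for @code{Bufbyte}, @code{Charcount} and @code{Bytecount}, and the rule that a byte below 0xA0 starts a character is an assumption made for this example, not a statement about the actual internal format.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char ToyByte;   /* stand-in for Bufbyte   */
typedef long ToyCharcount;       /* stand-in for Charcount */
typedef long ToyBytecount;       /* stand-in for Bytecount */

/* Assumed toy rule: bytes below 0xA0 begin a character, bytes at or
   above 0xA0 continue the previous one. */
static int toy_first_byte_p(ToyByte b)
{
  return b < 0xA0;
}

/* Number of characters contained in the first len bytes of p. */
static ToyCharcount toy_bytecount_to_charcount(const ToyByte *p,
                                               ToyBytecount len)
{
  ToyCharcount count = 0;
  ToyBytecount i;
  for (i = 0; i < len; i++)
    if (toy_first_byte_p(p[i]))
      count++;
  return count;
}

/* Number of bytes occupied by the first n characters of p, never
   reading beyond len bytes. */
static ToyBytecount toy_charcount_to_bytecount(const ToyByte *p,
                                               ToyBytecount len,
                                               ToyCharcount n)
{
  ToyBytecount i = 0;
  while (n > 0 && i < len)
    {
      i++;                                  /* the first byte */
      while (i < len && !toy_first_byte_p(p[i]))
        i++;                                /* continuation bytes */
      n--;
    }
  return i;
}
```

Keeping the two kinds of quantity in separate typedefs, as the manual recommends, makes it harder to pass a byte offset where a character count is expected.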
2234 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule 2272 @node Working With Character and Byte Positions
2235 @subsection Working With Character and Byte Positions 2273 @subsection Working With Character and Byte Positions
2236 2274
2237 Now that we have defined the basic character-related types, we can look 2275 Now that we have defined the basic character-related types, we can look
2238 at the macros and functions designed for work with them and for 2276 at the macros and functions designed for work with them and for
2239 conversion between them. Most of these macros are defined in 2277 conversion between them. Most of these macros are defined in
2242 learn about them. 2280 learn about them.
2243 2281
2244 @table @code 2282 @table @code
2245 @item MAX_EMCHAR_LEN 2283 @item MAX_EMCHAR_LEN
2246 @cindex MAX_EMCHAR_LEN 2284 @cindex MAX_EMCHAR_LEN
2247 This preprocessor constant is the maximum number of buffer bytes to 2285 This preprocessor constant is the maximum number of buffer bytes per
2248 represent an Emacs character in the variable width internal encoding. 2286 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful
2249 It is useful when allocating temporary strings to keep a known number of 2287 when allocating temporary strings to keep a known number of characters.
2250 characters. For instance: 2288 For instance:
2251 2289
2252 @example 2290 @example
2253 @group 2291 @group
2254 @{ 2292 @{
2255 Charcount cclen; 2293 Charcount cclen;
2353 @example 2391 @example
2354 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); 2392 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2355 @end example 2393 @end example
2356 @end table 2394 @end table
2357 2395
2358 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule 2396 @node Conversion to and from External Data
2359 @subsection Conversion to and from External Data 2397 @subsection Conversion to and from External Data
2360 2398
2361 When an external function, such as a C library function, returns a 2399 When an external function, such as a C library function, returns a
2362 @code{char} pointer, you should almost never treat it as @code{Bufbyte}. 2400 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
2363 This is because these returned strings may contain 8bit characters which 2401 This is because these returned strings may contain 8bit characters which
2366 always convert it to an appropriate external encoding, lest the internal 2404 always convert it to an appropriate external encoding, lest the internal
2367 stuff (such as the infamous \201 characters) leak out. 2405 stuff (such as the infamous \201 characters) leak out.
2368 2406
2369 The interface to conversion between the internal and external 2407 The interface to conversion between the internal and external
2370 representations of text are the numerous conversion macros defined in 2408 representations of text are the numerous conversion macros defined in
2371 @file{buffer.h}. There used to be a fixed set of external formats 2409 @file{buffer.h}. Before looking at them, we'll look at the external
2372 supported by these macros, but now any coding system can be used with 2410 formats supported by these macros.
2373 these macros. The coding system alias mechanism is used to create the 2411
2374 following logical coding systems, which replace the fixed external 2412 Currently meaningful formats are @code{FORMAT_BINARY},
2375 formats. The (dontusethis-set-symbol-value-handler) mechanism was 2413 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. Here
2376 enhanced to make this possible (more work on that is needed - like 2414 is a description of these.
2377 remove the @code{dontusethis-} prefix).
2378 2415
2379 @table @code 2416 @table @code
2380 @item Qbinary 2417 @item FORMAT_BINARY
2381 This is the simplest format and is what we use in the absence of a more 2418 Binary format. This is the simplest format and is what we use in the
2382 appropriate format. This converts according to the @code{binary} coding 2419 absence of a more appropriate format. This converts according to the
2383 system: 2420 @code{binary} coding system:
2384 2421
2385 @enumerate a 2422 @enumerate a
2386 @item 2423 @item
2387 On input, bytes 0--255 are converted into (implicitly Latin-1) 2424 On input, bytes 0--255 are converted into characters 0--255.
2388 characters 0--255. A non-Mule xemacs doesn't really know about
2389 different character sets and the fonts to display them, so the bytes can
2390 be treated as text in different 1-byte encodings by simply setting the
2391 appropriate fonts. So in a sense, non-Mule xemacs is a multi-lingual
2392 editor if, for example, different fonts are used to display text in
2393 different buffers, faces, or windows. The specifier mechanism gives the
2394 user complete control over this kind of behavior.
2395 @item 2425 @item
2396 On output, characters 0--255 are converted into bytes 0--255 and other 2426 On output, characters 0--255 are converted into bytes 0--255 and other
2397 characters are converted into `~'. 2427 characters are converted into `X'.
2398 @end enumerate 2428 @end enumerate
2399 2429
2400 @item Qfile_name 2430 @item FORMAT_FILENAME
2401 Format used for filenames. This is user-definable via either the 2431 Format used for filenames. In the original Mule, this is user-definable
2402 @code{file-name-coding-system} or @code{pathname-coding-system} (now 2432 with the @code{pathname-coding-system} variable. For the moment, we
2403 obsolete) variables. 2433 just use the @code{binary} coding system.
2404 2434
2405 @item Qnative 2435 @item FORMAT_OS
2406 Format used for the external Unix environment---@code{argv[]}, stuff 2436 Format used for the external Unix environment---@code{argv[]}, stuff
2407 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. 2437 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2408 Currently this is the same as Qfile_name. The two should be 2438
2409 distinguished for clarity and possible future separation. 2439 Perhaps should be the same as FORMAT_FILENAME.
2410 2440
2411 @item Qctext 2441 @item FORMAT_CTEXT
2412 Compound--text format. This is the standard X11 format used for data 2442 Compound--text format. This is the standard X format used for data
2413 stored in properties, selections, and the like. This is an 8-bit 2443 stored in properties, selections, and the like. This is an 8-bit
2414 no-lock-shift ISO2022 coding system. This is a real coding system, 2444 no-lock-shift ISO2022 coding system.
2415 unlike Qfile_name, which is user-definable.
2416 @end table 2445 @end table
2417 2446
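The output rule given for the binary format (characters 0--255 map to the same byte, anything else is replaced by a placeholder) can be sketched as follows. The function @code{toy_binary_encode} is hypothetical and is not one of the conversion macros; because the two revisions compared here name different placeholder characters, the sketch takes the placeholder as a parameter.

```c
#include <assert.h>
#include <stdlib.h>

/* Encode len character codes into a newly malloc()ed byte buffer
   using the "binary" output rule: codes 0..255 pass through
   unchanged, all other codes become `substitute'.
   Returns NULL if allocation fails.  The caller owns the result and
   must free() it. */
static unsigned char *toy_binary_encode(const unsigned int *chars,
                                        size_t len,
                                        unsigned char substitute)
{
  unsigned char *out = malloc(len > 0 ? len : 1);
  size_t i;

  if (out == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    out[i] = chars[i] <= 255 ? (unsigned char) chars[i] : substitute;
  return out;
}
```

Returning heap memory puts the burden of freeing on the caller; a stack-allocated result avoids that but cannot outlive the calling function.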
2418 There are two fundamental macros to convert between external and 2447 The macros to convert between these formats and the internal format, and
2419 internal format. 2448 vice versa, follow.
2420
2421 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
2422 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
2423 each of these receives are a source type, a source, a sink type, a sink,
2424 and a coding system (or a symbol naming a coding system).
2425
2426 A typical call looks like
2427 @example
2428 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
2429 @end example
2430
2431 which means that the contents of the lisp string @code{str} are written
2432 to a malloc'ed memory area which will be pointed to by @code{ptr}, after
2433 the function returns. The conversion will be done using the
2434 @code{file-name} coding system, which will be controlled by the user
2435 indirectly by setting or binding the variable
2436 @code{file-name-coding-system}.
2437
2438 Some sources and sinks require two C variables to specify. We use some
2439 preprocessor magic to allow different source and sink types, and even
2440 different numbers of arguments to specify different types of sources and
2441 sinks.
2442
2443 So we can have a call that looks like
2444 @example
2445 TO_INTERNAL_FORMAT (DATA, (ptr, len),
2446 MALLOC, (ptr, len),
2447 coding_system);
2448 @end example
2449
2450 The parenthesized argument pairs are required to make the preprocessor
2451 magic work.
2452
2453 Here are the different source and sink types:
2454 2449
2455 @table @code 2450 @table @code
2456 @item @code{DATA, (ptr, len),} 2451 @item GET_CHARPTR_INT_DATA_ALLOCA
2457 input data is a fixed buffer of size @var{len} at address @var{ptr} 2452 @itemx GET_CHARPTR_EXT_DATA_ALLOCA
2458 @item @code{ALLOCA, (ptr, len),} 2453 These two are the most basic conversion macros.
2459 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr} 2454 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
2460 @item @code{MALLOC, (ptr, len),} 2455 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
2461 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr} 2456 around. The arguments each of these receives are @var{ptr} (pointer to
2462 @item @code{C_STRING_ALLOCA, ptr,} 2457 the text in external format), @var{len} (length of texts in bytes),
2463 equivalent to @code{ALLOCA (ptr, len_ignored)} on output. 2458 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which
2464 @item @code{C_STRING_MALLOC, ptr,} 2459 new text should be copied), and @var{len_out} (lvalue which will be
2465 equivalent to @code{MALLOC (ptr, len_ignored)} on output 2460 assigned the length of the internal text in bytes). The resulting text
2466 @item @code{C_STRING, ptr,} 2461 is stored to a stack-allocated buffer. If the text doesn't need
2467 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input 2462 changing, these macros will do nothing, except for setting
2468 @item @code{LISP_STRING, string,} 2463 @var{len_out}.
2469 input or output is a Lisp_Object of type string 2464
2470 @item @code{LISP_BUFFER, buffer,} 2465 The macros above take many arguments which makes them unwieldy. For
2471 output is written to @code{(point)} in lisp buffer @var{buffer} 2466 this reason, a number of convenience macros are defined with obvious
2472 @item @code{LISP_LSTREAM, lstream,} 2467 functionality, but accepting fewer arguments. The general rule is that
2473 input or output is a Lisp_Object of type lstream 2468 macros with @samp{INT} in their name convert text to internal Emacs
2474 @item @code{LISP_OPAQUE, object,} 2469 representation, whereas the @samp{EXT} macros convert to external
2475 input or output is a Lisp_Object of type opaque 2470 representation.
2471
2472 @item GET_C_CHARPTR_INT_DATA_ALLOCA
2473 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
2474 As their names imply, these macros work on C char pointers, which are
2475 zero-terminated, and thus do not need @var{len} or @var{len_out}
2476 parameters.
2477
2478 @item GET_STRING_EXT_DATA_ALLOCA
2479 @itemx GET_C_STRING_EXT_DATA_ALLOCA
2480 These two macros convert a Lisp string into an external representation.
2481 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
2482 stores its output to a generic string, providing @var{len_out}, the
2483 length of the resulting external string. On the other hand,
2484 @code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
2485 satisfied with output string being zero-terminated.
2486
2487 Note that for Lisp strings only one conversion direction makes sense.
2488
2489 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
2490 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
2491 @itemx GET_STRING_BINARY_DATA_ALLOCA
2492 @itemx GET_C_STRING_BINARY_DATA_ALLOCA
2493 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
2494 @itemx ...
2495 These macros convert internal text to a specific external
2496 representation, with the external format being encoded into the name of
2497 the macro. Note that the @code{GET_STRING_...} and
2498 @code{GET_C_STRING...} macros lack the @samp{EXT} tag, because they
2499 only make sense in that direction.
2500
2501 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
2502 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
2503 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
2504 @itemx ...
2505 These macros convert external text of a specific format to its internal
2506 representation, with the external format being encoded into the name of
2507 the macro.
2476 @end table 2508 @end table
2477 2509
2478 Often, the data is being converted to a '\0'-byte-terminated string, 2510 @node General Guidelines for Writing Mule-Aware Code
2479 which is the format required by many external system C APIs. For these
2480 purposes, a source type of @code{C_STRING} or a sink type of
2481 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
2482 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
2483 using (ptr, len) pairs.
2484
2485 The sinks to be specified must be lvalues, unless they are the lisp
2486 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
2487
2488 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
2489 resulting text is stored in a stack-allocated buffer, which is
2490 automatically freed on returning from the function. However, the sink
2491 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
2492 memory. The caller is responsible for freeing this memory using
2493 @code{xfree()}.
2494
2495 Note that it doesn't make sense for @code{LISP_STRING} to be a source
2496 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
2497 You'll get an assertion failure if you try.
2498
2499
2500 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
2501 @subsection General Guidelines for Writing Mule-Aware Code 2511 @subsection General Guidelines for Writing Mule-Aware Code
2502 2512
2503 This section contains some general guidance on how to write Mule-aware 2513 This section contains some general guidance on how to write Mule-aware
2504 code, as well as some pitfalls you should avoid. 2514 code, as well as some pitfalls you should avoid.
2505 2515
2522 It is extremely important to always convert external data, because 2532 It is extremely important to always convert external data, because
2523 XEmacs can crash if unexpected 8bit sequences are copied to its internal 2533 XEmacs can crash if unexpected 8bit sequences are copied to its internal
2524 buffers literally. 2534 buffers literally.
2525 2535
2526 This means that when a system function, such as @code{readdir}, returns 2536 This means that when a system function, such as @code{readdir}, returns
2527 a string, you may need to convert it using one of the conversion macros 2537 a string, you need to convert it using one of the conversion macros
2528 described in the previous chapter, before passing it further to Lisp. 2538 described in the previous chapter, before passing it further to Lisp.
2529 2539 In the case of @code{readdir}, you would use the
2530 Actually, most of the basic system functions that accept '\0'-terminated 2540 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
2531 string arguments, like @code{stat()} and @code{open()}, have been
2532 @strong{encapsulated} so that they are they @code{always} do internal to
2533 external conversion themselves. This means you must pass internally
2534 encoded data, typically the @code{XSTRING_DATA} of a Lisp_String to
2535 these functions. This is actually a design bug, since it unexpectedly
2536 changes the semantics of the system functions. A better design would be
2537 to provide separate versions of these system functions that accepted
2538 Lisp_Objects which were lisp strings in place of their current
2539 @code{char *} arguments.
2540
2541 @example
2542 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
2543 @end example
2544 2541
2545 Also note that many internal functions, such as @code{make_string}, 2542 Also note that many internal functions, such as @code{make_string},
2546 accept Bufbytes, which removes the need for them to convert the data 2543 accept Bufbytes, which removes the need for them to convert the data
2547 they receive. This increases efficiency because that way external data 2544 they receive. This increases efficiency because that way external data
2548 needs to be decoded only once, when it is read. After that, it is 2545 needs to be decoded only once, when it is read. After that, it is
2549 passed around in internal format. 2546 passed around in internal format.
2550 @end table 2547 @end table
2551 2548
2552 @node An Example of Mule-Aware Code, , General Guidelines for Writing Mule-Aware Code, Coding for Mule 2549 @node An Example of Mule-Aware Code
2553 @subsection An Example of Mule-Aware Code 2550 @subsection An Example of Mule-Aware Code
2554 2551
2555 As an example of Mule-aware code, we will analyze the @code{string} 2552 As an example of Mule-aware code, we shall will analyze the
2556 function, which conses up a Lisp string from the character arguments it 2553 @code{string} function, which conses up a Lisp string from the character
2557 receives. Here is the definition, pasted from @code{alloc.c}: 2554 arguments it receives. Here is the definition, pasted from
2555 @code{alloc.c}:
2558 2556
2559 @example 2557 @example
2560 @group 2558 @group
2561 DEFUN ("string", Fstring, 0, MANY, 0, /* 2559 DEFUN ("string", Fstring, 0, MANY, 0, /*
2562 Concatenate all the argument characters and make the result a string. 2560 Concatenate all the argument characters and make the result a string.
2597 over the XEmacs code. For starters, I recommend 2595 over the XEmacs code. For starters, I recommend
2598 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have 2596 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
2599 understood this section of the manual and studied the examples, you can 2597 understood this section of the manual and studied the examples, you can
2600 proceed writing new Mule-aware code. 2598 proceed writing new Mule-aware code.
2601 2599
2602 @node Techniques for XEmacs Developers, , Coding for Mule, Rules When Writing New C Code 2600 @node Techniques for XEmacs Developers
2603 @section Techniques for XEmacs Developers 2601 @section Techniques for XEmacs Developers
2604 2602
2605 To make a purified XEmacs, do: @code{make puremacs}.
2606 To make a quantified XEmacs, do: @code{make quantmacs}. 2603 To make a quantified XEmacs, do: @code{make quantmacs}.
2607 2604
2608 You simply can't dump Quantified and Purified images (unless using the 2605 You simply can't dump Quantified and Purified images. Run the image
2609 portable dumper). Purify gets confused when xemacs frees memory in one 2606 like so: @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}.
2610 process that was allocated in a @emph{different} process on a different
2611 machine! Run it like so:
2612 @example
2613 temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
2614 @end example
2615 2607
2616 Before you go through the trouble, are you compiling with all 2608 Before you go through the trouble, are you compiling with all
2617 debugging and error-checking off? If not, try that first. Be warned 2609 debugging and error-checking off? If not try that first. Be warned
2618 that while Quantify is directly responsible for quite a few 2610 that while Quantify is directly responsible for quite a few
2619 optimizations which have been made to XEmacs, doing a run which 2611 optimizations which have been made to XEmacs, doing a run which
2620 generates results which can be acted upon is not necessarily a trivial 2612 generates results which can be acted upon is not necessarily a trivial
2621 task. 2613 task.
2622 2614
2651 2643
2652 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function 2644 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function
2653 calls in elisp are especially expensive. Iterating over a long list is 2645 calls in elisp are especially expensive. Iterating over a long list is
2654 going to be 30 times faster implemented in C than in Elisp. 2646 going to be 30 times faster implemented in C than in Elisp.
2655 2647
2656 Heavily used small code fragments need to be fast. The traditional way 2648 To get started debugging XEmacs, take a look at the @file{gdbinit} and
2657 to implement such code fragments in C is with macros. But macros in C 2649 @file{dbxrc} files in the @file{src} directory.
2658 are known to be broken. 2650 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
2659 2651 xemacs-faq, XEmacs FAQ}.
2660 Macro arguments that are repeatedly evaluated may suffer from repeated
2661 side effects or suboptimal performance.
2662
2663 Variable names used in macros may collide with caller's variables,
2664 causing (at least) unwanted compiler warnings.
2665
2666 In order to solve these problems, and maintain statement semantics, one
2667 should use the @code{do @{ ... @} while (0)} trick while trying to
2668 reference macro arguments exactly once using local variables.
2669
2670 Let's take a look at this poor macro definition:
2671
2672 @example
2673 #define MARK_OBJECT(obj) \
2674 if (!marked_p (obj)) mark_object (obj), did_mark = 1
2675 @end example
2676
2677 This macro evaluates its argument twice, and also fails if used like this:
2678 @example
2679 if (flag) MARK_OBJECT (obj); else do_something();
2680 @end example
2681
2682 A much better definition is
2683
2684 @example
2685 #define MARK_OBJECT(obj) do @{ \
2686 Lisp_Object mo_obj = (obj); \
2687 if (!marked_p (mo_obj)) \
2688 @{ \
2689 mark_object (mo_obj); \
2690 did_mark = 1; \
2691 @} \
2692 @} while (0)
2693 @end example
2694
2695 Notice the elimination of double evaluation by using the local variable
2696 with the obscure name. Writing safe and efficient macros requires great
2697 care. The one problem with macros that cannot be portably worked around
2698 is, since a C block has no value, a macro used as an expression rather
2699 than a statement cannot use the techniques just described to avoid
2700 multiple evaluation.
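
The double evaluation described above can be observed directly. What follows is only a sketch under stated assumptions: objects are plain @code{int}s rather than @code{Lisp_Object}s, and @code{marked_p}, @code{mark_object}, @code{next_obj} and the two macro names are toy stand-ins invented for this illustration, not the real garbage collector interface.

```c
/* Toy stand-ins: objects are small ints and the "mark bits" live in an
   array.  None of this is the real XEmacs garbage collector.  */
static int marked[4];
static int did_mark;
static int evaluations;   /* times the macro argument was evaluated */

static int marked_p (int obj) { return marked[obj]; }
static void mark_object (int obj) { marked[obj] = 1; }

/* A macro argument with a side effect: returns 0, then 1, then 2 ...  */
static int next_obj (void) { return evaluations++; }

/* The poor definition: OBJ appears twice in the expansion.  */
#define MARK_OBJECT_POOR(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1

/* The better definition: OBJ is evaluated once into a local.  */
#define MARK_OBJECT_BETTER(obj) do {  \
  int mo_obj = (obj);                 \
  if (!marked_p (mo_obj))             \
    {                                 \
      mark_object (mo_obj);           \
      did_mark = 1;                   \
    }                                 \
} while (0)

static void reset_toy_state (void)
{
  int i;
  for (i = 0; i < 4; i++)
    marked[i] = 0;
  did_mark = 0;
  evaluations = 0;
}

/* Returns how often next_obj ran when passed to the poor macro.  */
static int evaluations_with_poor_macro (void)
{
  reset_toy_state ();
  MARK_OBJECT_POOR (next_obj ());
  return evaluations;
}

/* Returns how often next_obj ran when passed to the better macro.  */
static int evaluations_with_better_macro (void)
{
  reset_toy_state ();
  MARK_OBJECT_BETTER (next_obj ());
  return evaluations;
}
```

With the poor macro the argument runs twice, so the object that gets tested (0) is not the object that gets marked (1). With the better macro it runs once and object 0 is both tested and marked.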
2701
2702 In most cases where a macro has function semantics, an inline function
2703 is a better implementation technique. Modern compiler optimizers tend
2704 to inline functions even if they have no @code{inline} keyword, and
2705 configure magic ensures that the @code{inline} keyword can be safely
2706 used as an additional compiler hint. Inline functions used in a single
2707 .c file are easy. The function must already be defined to be
2708 @code{static}. Just add another @code{inline} keyword to the
2709 definition.
2710
2711 @example
2712 inline static int
2713 heavily_used_small_function (int arg)
2714 @{
2715 ...
2716 @}
2717 @end example
2718
2719 Inline functions in header files are trickier, because we would like to
2720 make the following optimization if the function is @emph{not} inlined
2721 (for example, because we're compiling for debugging). We would like the
2722 function to be defined externally exactly once, and each calling
2723 translation unit would create an external reference to the function,
2724 instead of including a definition of the inline function in the object
2725 code of every translation unit that uses it. This optimization is
2726 currently only available for gcc. But you don't have to worry about the
2727 trickiness; just define your inline functions in header files using this
2728 pattern:
2729
2730 @example
2731 INLINE_HEADER int
2732 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
2733 INLINE_HEADER int
2734 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
2735 @{
2736 ...
2737 @}
2738 @end example
2739
2740 The declaration right before the definition is to prevent warnings when
2741 compiling with @code{gcc -Wmissing-declarations}. I consider issuing
2742 this warning for inline functions a gcc bug, but the gcc maintainers disagree.
2743
2744 Every header which contains inline functions, either directly by using
2745 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
2746 be added to @file{inline.c}'s includes to make the optimization
2747 described above work. (Optimization note: if all INLINE_HEADER
2748 functions are in fact inlined in all translation units, then the linker
2749 can just discard @code{inline.o}, since it contains only unreferenced code).
2750
2751 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
2752 @file{.dbxrc} files in the @file{src} directory. See the section in the
2753 XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
2754 2652
2755 After making source code changes, run @code{make check} to ensure that 2653 After making source code changes, run @code{make check} to ensure that
2756 you haven't introduced any regressions. If you want to make xemacs more 2654 you haven't introduced any regressions. If you're feeling ambitious,
2757 reliable, please improve the test suite in @file{tests/automated}. 2655 you can try to improve the test suite in @file{tests/automated}.
2758
2759 Did you make sure you didn't introduce any new compiler warnings?
2760
2761 Before submitting a patch, please try compiling at least once with
2762
2763 @example
2764 configure --with-mule --with-union-type --error-checking=all
2765 @end example
2766 2656
2767 Here are things to know when you create a new source file: 2657 Here are things to know when you create a new source file:
2768 2658
2769 @itemize @bullet 2659 @itemize @bullet
2770 @item 2660 @item
2773 2663
2774 @item 2664 @item
2775 Generated header files should be included using the @code{#include <...>} syntax, 2665 Generated header files should be included using the @code{#include <...>} syntax,
2776 not the @code{#include "..."} syntax. The generated headers are: 2666 not the @code{#include "..."} syntax. The generated headers are:
2777 2667
2778 @file{config.h sheap-adjust.h paths.h Emacs.ad.h} 2668 @file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h}
2779 2669
2780 The basic rule is that you should assume builds using @code{--srcdir} 2670 The basic rule is that you should assume builds using @code{--srcdir}
2781 and the @code{#include <...>} syntax needs to be used when the 2671 and the @code{#include <...>} syntax needs to be used when the
2782 to-be-included generated file is in a potentially different directory 2672 to-be-included generated file is in a potentially different directory
2783 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."} 2673 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."}
2787 @item 2677 @item
2788 Header files should @emph{not} include @code{<config.h>} and 2678 Header files should @emph{not} include @code{<config.h>} and
2789 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that 2679 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that
2790 use it to do so. 2680 use it to do so.
2791 2681
2682 @item
2683 If the header uses @code{INLINE}, either directly or through
2684 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
2685 includes.
2686
2687 @item
2688 Try compiling at least once with
2689
2690 @example
2691 gcc --with-mule --with-union-type --error-checking=all
2692 @end example
2693
2694 @item
2695 Did I mention that you should run the test suite?
2696 @example
2697 make check
2698 @end example
2792 @end itemize 2699 @end itemize
2793 2700
2794 Here is a checklist of things to do when creating a new lisp object type
2795 named @var{foo}:
2796
2797 @enumerate
2798 @item
2799 create @var{foo}.h
2800 @item
2801 create @var{foo}.c
2802 @item
2803 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
2804 @item
2805 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
2806 @item
2807 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
2808 @item
2809 add definitions of macros like @code{CHECK_@var{FOO}} and
2810 @code{@var{FOO}P} to @file{@var{foo}.h}
2811 @item
2812 add the new type index to @code{enum lrecord_type}
2813 @item
2814 add a DEFINE_LRECORD_IMPLEMENTATION call to @file{@var{foo}.c}
2815 @item
2816 add an INIT_LRECORD_IMPLEMENTATION call to @code{syms_of_@var{foo}}
2817 @end enumerate
2818 2701
2819 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top 2702 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2820 @chapter A Summary of the Various XEmacs Modules 2703 @chapter A Summary of the Various XEmacs Modules
2821 2704
2822 This is accurate as of XEmacs 20.0. 2705 This is accurate as of XEmacs 20.0.
2834 * Modules for Interfacing with the Operating System:: 2717 * Modules for Interfacing with the Operating System::
2835 * Modules for Interfacing with X Windows:: 2718 * Modules for Interfacing with X Windows::
2836 * Modules for Internationalization:: 2719 * Modules for Internationalization::
2837 @end menu 2720 @end menu
2838 2721
2839 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, A Summary of the Various XEmacs Modules 2722 @node Low-Level Modules
2840 @section Low-Level Modules 2723 @section Low-Level Modules
2841 2724
2842 @example 2725 @example
2843 config.h 2726 config.h
2844 @end example 2727 @end example
3058 2941
3059 This is not currently used. 2942 This is not currently used.
3060 2943
3061 2944
3062 2945
3063 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, A Summary of the Various XEmacs Modules 2946 @node Basic Lisp Modules
3064 @section Basic Lisp Modules 2947 @section Basic Lisp Modules
3065 2948
3066 @example 2949 @example
3067 emacsfns.h 2950 emacsfns.h
3068 lisp-disunion.h 2951 lisp-disunion.h
3093 declarations (i.e. a simple declaration like @code{struct foo;} where 2976 declarations (i.e. a simple declaration like @code{struct foo;} where
3094 the structure itself is defined elsewhere) should be placed into the 2977 the structure itself is defined elsewhere) should be placed into the
3095 typedefs section as necessary. 2978 typedefs section as necessary.
3096 2979
3097 @file{lrecord.h} contains the basic structures and macros that implement 2980 @file{lrecord.h} contains the basic structures and macros that implement
3098 all record-type Lisp objects---i.e. all objects whose type is a field 2981 all record-type Lisp objects -- i.e. all objects whose type is a field
3099 in their C structure, which includes all objects except the few most 2982 in their C structure, which includes all objects except the few most
3100 basic ones. 2983 basic ones.
3101 2984
3102 @file{lisp.h} contains prototypes for most of the exported functions in 2985 @file{lisp.h} contains prototypes for most of the exported functions in
3103 the various modules. Lisp primitives defined using @code{DEFUN} that 2986 the various modules. Lisp primitives defined using @code{DEFUN} that
3111 2994
3112 2995
3113 2996
3114 @example 2997 @example
3115 alloc.c 2998 alloc.c
2999 pure.c
3000 puresize.h
3116 @end example 3001 @end example
3117 3002
3118 The large module @file{alloc.c} implements all of the basic allocation and 3003 The large module @file{alloc.c} implements all of the basic allocation and
3119 garbage collection for Lisp objects. The most commonly used Lisp 3004 garbage collection for Lisp objects. The most commonly used Lisp
3120 objects are allocated in chunks, similar to the Blocktype data type 3005 objects are allocated in chunks, similar to the Blocktype data type
3127 not dependent on any particular object type, and interfaces to 3012 not dependent on any particular object type, and interfaces to
3128 particular types of objects using a standardized interface of 3013 particular types of objects using a standardized interface of
3129 type-specific methods. This scheme is a fundamental principle of 3014 type-specific methods. This scheme is a fundamental principle of
3130 object-oriented programming and is heavily used throughout XEmacs. The 3015 object-oriented programming and is heavily used throughout XEmacs. The
3131 great advantage of this is that it allows for a clean separation of 3016 great advantage of this is that it allows for a clean separation of
3132 functionality into different modules---new classes of Lisp objects, new 3017 functionality into different modules -- new classes of Lisp objects, new
3133 event interfaces, new device types, new stream interfaces, etc. can be 3018 event interfaces, new device types, new stream interfaces, etc. can be
3134 added transparently without affecting code anywhere else in XEmacs. 3019 added transparently without affecting code anywhere else in XEmacs.
3135 Because the different subsystems are divided into general and specific 3020 Because the different subsystems are divided into general and specific
3136 code, adding a new subtype within a subsystem will in general not 3021 code, adding a new subtype within a subsystem will in general not
3137 require changes to the generic subsystem code or affect any of the other 3022 require changes to the generic subsystem code or affect any of the other
3138 subtypes in the subsystem; this provides a great deal of robustness to 3023 subtypes in the subsystem; this provides a great deal of robustness to
3139 the XEmacs code. 3024 the XEmacs code.
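
As a rough illustration of the "generic code plus type-specific methods" scheme described above, here is a minimal sketch. It assumes nothing about the actual structures in @file{alloc.c} or @file{lrecord.h}; every name in it (@code{toy_object}, @code{toy_methods}, @code{toy_mark} and so on) is hypothetical.

```c
#include <stddef.h>

struct toy_object;

/* Per-type method table.  The generic code only ever goes through
   this table and never looks at type-specific fields itself.  */
struct toy_methods
{
  const char *name;
  void (*mark_children) (struct toy_object *obj);  /* may be NULL */
};

struct toy_object
{
  const struct toy_methods *methods;
  int marked;
  struct toy_object *child;   /* only meaningful for the "box" type */
};

/* Generic marking code: independent of any particular object type.  */
static void toy_mark (struct toy_object *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  if (obj->methods->mark_children != NULL)
    obj->methods->mark_children (obj);
}

/* Type-specific method for "box": it knows that a box has one child.  */
static void box_mark_children (struct toy_object *obj)
{
  toy_mark (obj->child);
}

static const struct toy_methods toy_leaf_methods = { "leaf", NULL };
static const struct toy_methods toy_box_methods = { "box", box_mark_children };

/* Marks a box that holds a leaf; returns how many objects got marked.  */
static int toy_demo (void)
{
  struct toy_object leaf = { &toy_leaf_methods, 0, NULL };
  struct toy_object box = { &toy_box_methods, 0, &leaf };
  toy_mark (&box);
  return leaf.marked + box.marked;
}
```

Adding a third type in this sketch means writing one more method table; @code{toy_mark} itself does not change, which is the separation the paragraph above is describing.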
3025
3026 @cindex pure space
3027 @file{pure.c} contains the declaration of the @dfn{purespace} array.
3028 Pure space is a hack used to place some constant Lisp data into the code
3029 segment of the XEmacs executable, even though the data needs to be
3030 initialized through function calls. (See above in section VIII for more
3031 info about this.) During startup, certain sorts of data are
3032 automatically copied into pure space, and other data is copied manually
3033 in some of the basic Lisp files by calling the function @code{purecopy},
3034 which copies the object if possible (this only works in temacs, of
3035 course) and returns the new object. In particular, while temacs is
3036 executing, the Lisp reader automatically copies all compiled-function
3037 objects that it reads into pure space. Since compiled-function objects
3038 are large, are never modified, and typically comprise the majority of
3039 the contents of a compiled-Lisp file, this works well. While XEmacs is
3040 running, any attempt to modify an object that resides in pure space
3041 causes an error. Objects in pure space are never garbage collected --
3042 almost all of the time, they're intended to be permanent, and in any
3043 case you can't write into pure space to set the mark bits.
3044
3045 @file{puresize.h} contains the declaration of the size of the pure space
3046 array. This depends on the optional features that are compiled in, any
3047 extra purespace requested by the user at compile time, and certain other
3048 factors (e.g. 64-bit machines need more pure space because their Lisp
3049 objects are larger). The smallest size that suffices should be used, so
3050 that there's no wasted space. If there's not enough pure space, you
3051 will get an error during the build process, specifying how much more
3052 pure space is needed.
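
The general idea of a fixed-size pure space can be pictured with the following sketch. It is not the code in @file{pure.c}: the array size, the names @code{toy_purespace}, @code{toy_purecopy} and @code{toy_purified}, and the choice to return @code{NULL} when the space is exhausted are all assumptions made for illustration, and alignment is deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PURESIZE 64   /* hypothetical size; the real one is computed */

static char toy_purespace[TOY_PURESIZE];
static size_t toy_pure_used = 0;

/* Copy SIZE bytes from SRC into the pure array and return the copy,
   or NULL if the array is too small.  Alignment is ignored here, so
   this toy version is only safe for byte data such as strings.  */
static void *toy_purecopy (const void *src, size_t size)
{
  void *dst;
  if (size > TOY_PURESIZE - toy_pure_used)
    return NULL;
  dst = toy_purespace + toy_pure_used;
  memcpy (dst, src, size);
  toy_pure_used += size;
  return dst;
}

/* Nonzero if PTR points into the pure array.  Code that modifies
   objects could use such a check to signal an error first.  */
static int toy_purified (const void *ptr)
{
  uintptr_t p = (uintptr_t) ptr;
  uintptr_t start = (uintptr_t) toy_purespace;
  return p >= start && p < start + TOY_PURESIZE;
}
```

In this sketch an over-large request simply fails, which loosely mirrors the build-time error mentioned above when the pure space is too small.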
3053
3140 3054
3141 3055
3142 @example 3056 @example
3143 eval.c 3057 eval.c
3144 backtrace.h 3058 backtrace.h
3189 @end example 3103 @end example
3190 3104
3191 @file{symbols.c} implements the handling of symbols, obarrays, and 3105 @file{symbols.c} implements the handling of symbols, obarrays, and
3192 retrieving the values of symbols. Much of the code is devoted to 3106 retrieving the values of symbols. Much of the code is devoted to
3193 handling the special @dfn{symbol-value-magic} objects that define 3107 handling the special @dfn{symbol-value-magic} objects that define
3194 special types of variables---this includes buffer-local variables, 3108 special types of variables -- this includes buffer-local variables,
3195 variable aliases, variables that forward into C variables, etc. This 3109 variable aliases, variables that forward into C variables, etc. This
3196 module is initialized extremely early (right after @file{alloc.c}), 3110 module is initialized extremely early (right after @file{alloc.c}),
3197 because it is here that the basic symbols @code{t} and @code{nil} are 3111 because it is here that the basic symbols @code{t} and @code{nil} are
3198 created, and those symbols are used everywhere throughout XEmacs. 3112 created, and those symbols are used everywhere throughout XEmacs.
3199 3113
3233 structures. Note that the byte-code @emph{compiler} is written in Lisp. 3147 structures. Note that the byte-code @emph{compiler} is written in Lisp.
3234 3148
3235 3149
3236 3150
3237 3151
3238 @node Modules for Standard Editing Operations, Editor-Level Control Flow Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules 3152 @node Modules for Standard Editing Operations
3239 @section Modules for Standard Editing Operations 3153 @section Modules for Standard Editing Operations
3240 3154
3241 @example 3155 @example
3242 buffer.c 3156 buffer.c
3243 buffer.h 3157 buffer.h
3403 This module implements the undo mechanism for tracking buffer changes. 3317 This module implements the undo mechanism for tracking buffer changes.
3404 Most of this could be implemented in Lisp. 3318 Most of this could be implemented in Lisp.
3405 3319
3406 3320
3407 3321
3408 @node Editor-Level Control Flow Modules, Modules for the Basic Displayable Lisp Objects, Modules for Standard Editing Operations, A Summary of the Various XEmacs Modules 3322 @node Editor-Level Control Flow Modules
3409 @section Editor-Level Control Flow Modules 3323 @section Editor-Level Control Flow Modules
3410 3324
3411 @example 3325 @example
3412 event-Xt.c 3326 event-Xt.c
3413 event-stream.c 3327 event-stream.c
3468 @example 3382 @example
3469 keyboard.c 3383 keyboard.c
3470 @end example 3384 @end example
3471 3385
3472 @file{keyboard.c} contains functions that implement the actual editor 3386 @file{keyboard.c} contains functions that implement the actual editor
3473 command loop---i.e. the event loop that cyclically retrieves and 3387 command loop -- i.e. the event loop that cyclically retrieves and
3474 dispatches events. This code is also rather tricky, just like 3388 dispatches events. This code is also rather tricky, just like
3475 @file{event-stream.c}. 3389 @file{event-stream.c}.
3476 3390
3477 3391
3478 3392
3501 bootstrapping implementations early in temacs, before the echo-area Lisp 3415 bootstrapping implementations early in temacs, before the echo-area Lisp
3502 code is loaded). 3416 code is loaded).
3503 3417
3504 3418
3505 3419
3506 @node Modules for the Basic Displayable Lisp Objects, Modules for other Display-Related Lisp Objects, Editor-Level Control Flow Modules, A Summary of the Various XEmacs Modules 3420 @node Modules for the Basic Displayable Lisp Objects
3507 @section Modules for the Basic Displayable Lisp Objects 3421 @section Modules for the Basic Displayable Lisp Objects
3508 3422
3509 @example 3423 @example
3510 device-ns.h 3424 device-ns.h
3511 device-stream.c 3425 device-stream.c
3575 is part of the redisplay mechanism or the code for particular object 3489 is part of the redisplay mechanism or the code for particular object
3576 types such as scrollbars. 3490 types such as scrollbars.
3577 3491
3578 3492
3579 3493
3580 @node Modules for other Display-Related Lisp Objects, Modules for the Redisplay Mechanism, Modules for the Basic Displayable Lisp Objects, A Summary of the Various XEmacs Modules 3494 @node Modules for other Display-Related Lisp Objects
3581 @section Modules for other Display-Related Lisp Objects 3495 @section Modules for other Display-Related Lisp Objects
3582 3496
3583 @example 3497 @example
3584 faces.c 3498 faces.c
3585 faces.h 3499 faces.h
3636 3550
3637 @example 3551 @example
3638 font-lock.c 3552 font-lock.c
3639 @end example 3553 @end example
3640 3554
3641 This file provides C support for syntax highlighting---i.e. 3555 This file provides C support for syntax highlighting -- i.e.
3642 highlighting different syntactic constructs of a source file in 3556 highlighting different syntactic constructs of a source file in
3643 different colors, for easy reading. The C support is provided so that 3557 different colors, for easy reading. The C support is provided so that
3644 this is fast. 3558 this is fast.
3645 3559
3646 3560
3654 3568
3655 These modules decode GIF-format image files, for use with glyphs. 3569 These modules decode GIF-format image files, for use with glyphs.
3656 3570
3657 3571
3658 3572
3659 @node Modules for the Redisplay Mechanism, Modules for Interfacing with the File System, Modules for other Display-Related Lisp Objects, A Summary of the Various XEmacs Modules 3573 @node Modules for the Redisplay Mechanism
3660 @section Modules for the Redisplay Mechanism 3574 @section Modules for the Redisplay Mechanism
3661 3575
3662 @example 3576 @example
3663 redisplay-output.c 3577 redisplay-output.c
3664 redisplay-tty.c 3578 redisplay-tty.c
3726 These files provide some miscellaneous TTY-output functions and should 3640 These files provide some miscellaneous TTY-output functions and should
3727 probably be merged into @file{redisplay-tty.c}. 3641 probably be merged into @file{redisplay-tty.c}.
3728 3642
3729 3643
3730 3644
3731 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for the Redisplay Mechanism, A Summary of the Various XEmacs Modules 3645 @node Modules for Interfacing with the File System
3732 @section Modules for Interfacing with the File System 3646 @section Modules for Interfacing with the File System
3733 3647
3734 @example 3648 @example
3735 lstream.c 3649 lstream.c
3736 lstream.h 3650 lstream.h
3827 for expanding symbolic links, on systems that don't implement it or have 3741 for expanding symbolic links, on systems that don't implement it or have
3828 a broken implementation. 3742 a broken implementation.
3829 3743
3830 3744
3831 3745
3832 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, A Summary of the Various XEmacs Modules 3746 @node Modules for Other Aspects of the Lisp Interpreter and Object System
3833 @section Modules for Other Aspects of the Lisp Interpreter and Object System 3747 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3834 3748
3835 @example 3749 @example
3836 elhash.c 3750 elhash.c
3837 elhash.h 3751 elhash.h
3941 @cindex mark method 3855 @cindex mark method
3942 Opaque objects can also have an arbitrary @dfn{mark method} associated 3856 Opaque objects can also have an arbitrary @dfn{mark method} associated
3943 with them, in case the block of memory contains other Lisp objects that 3857 with them, in case the block of memory contains other Lisp objects that
3944 need to be marked for garbage-collection purposes. (If you need other 3858 need to be marked for garbage-collection purposes. (If you need other
3945 object methods, such as a finalize method, you should just go ahead and 3859 object methods, such as a finalize method, you should just go ahead and
3946 create a new Lisp object type---it's not hard.) 3860 create a new Lisp object type -- it's not hard.)
3947 3861
3948 3862
3949 3863
3950 @example 3864 @example
3951 abbrev.c 3865 abbrev.c
3989 various security applications on the Internet. 3903 various security applications on the Internet.
3990 3904
3991 3905
3992 3906
3993 3907
3994 @node Modules for Interfacing with the Operating System, Modules for Interfacing with X Windows, Modules for Other Aspects of the Lisp Interpreter and Object System, A Summary of the Various XEmacs Modules 3908 @node Modules for Interfacing with the Operating System
3995 @section Modules for Interfacing with the Operating System 3909 @section Modules for Interfacing with the Operating System
3996 3910
3997 @example 3911 @example
3998 callproc.c 3912 callproc.c
3999 process.c 3913 process.c
4228 These modules are used for MS-DOS support, which does not work in 4142 These modules are used for MS-DOS support, which does not work in
4229 XEmacs. 4143 XEmacs.
4230 4144
4231 4145
4232 4146
4233 @node Modules for Interfacing with X Windows, Modules for Internationalization, Modules for Interfacing with the Operating System, A Summary of the Various XEmacs Modules 4147 @node Modules for Interfacing with X Windows
4234 @section Modules for Interfacing with X Windows 4148 @section Modules for Interfacing with X Windows
4235 4149
4236 @example 4150 @example
4237 Emacs.ad.h 4151 Emacs.ad.h
4238 @end example 4152 @end example
4370 4284
4371 Don't touch this code; something is liable to break if you do. 4285 Don't touch this code; something is liable to break if you do.
4372 4286
4373 4287
4374 4288
4375 @node Modules for Internationalization, , Modules for Interfacing with X Windows, A Summary of the Various XEmacs Modules 4289 @node Modules for Internationalization
4376 @section Modules for Internationalization 4290 @section Modules for Internationalization
4377 4291
4378 @example 4292 @example
4379 mule-canna.c 4293 mule-canna.c
4380 mule-ccl.c 4294 mule-ccl.c
4447 Asian-language support, and is not currently used. 4361 Asian-language support, and is not currently used.
4448 4362
4449 4363
4450 4364
4451 4365
4452 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top 4366 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
4453 @chapter Allocation of Objects in XEmacs Lisp 4367 @chapter Allocation of Objects in XEmacs Lisp
4454 4368
4455 @menu 4369 @menu
4456 * Introduction to Allocation:: 4370 * Introduction to Allocation::
4457 * Garbage Collection:: 4371 * Garbage Collection::
4458 * GCPROing:: 4372 * GCPROing::
4459 * Garbage Collection - Step by Step::
4460 * Integers and Characters:: 4373 * Integers and Characters::
4461 * Allocation from Frob Blocks:: 4374 * Allocation from Frob Blocks::
4462 * lrecords:: 4375 * lrecords::
4463 * Low-level allocation:: 4376 * Low-level allocation::
4377 * Pure Space::
4464 * Cons:: 4378 * Cons::
4465 * Vector:: 4379 * Vector::
4466 * Bit Vector:: 4380 * Bit Vector::
4467 * Symbol:: 4381 * Symbol::
4468 * Marker:: 4382 * Marker::
4469 * String:: 4383 * String::
4470 * Compiled Function:: 4384 * Compiled Function::
4471 @end menu 4385 @end menu
4472 4386
4473 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp 4387 @node Introduction to Allocation
4474 @section Introduction to Allocation 4388 @section Introduction to Allocation
4475 4389
4476 Emacs Lisp, like all Lisps, has garbage collection. This means that 4390 Emacs Lisp, like all Lisps, has garbage collection. This means that
4477 the programmer never has to explicitly free (destroy) an object; it 4391 the programmer never has to explicitly free (destroy) an object; it
4478 happens automatically when the object becomes inaccessible. Most 4392 happens automatically when the object becomes inaccessible. Most
4489 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. 4403 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4490 Some Lisp objects, especially those that are primarily used internally, 4404 Some Lisp objects, especially those that are primarily used internally,
4491 have no corresponding Lisp primitives. Every Lisp object, though, 4405 have no corresponding Lisp primitives. Every Lisp object, though,
4492 has at least one C primitive for creating it. 4406 has at least one C primitive for creating it.
4493 4407
4494 Recall from section (VII) that a Lisp object, as stored in a 32-bit or 4408 Recall from section (VII) that a Lisp object, as stored in a 32-bit
4495 64-bit word, has a few tag bits, and a ``value'' that occupies the 4409 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
4496 remainder of the bits. We can separate the different Lisp object types 4410 occupies the remainder of the bits. We can separate the different
4497 into three broad categories: 4411 Lisp object types into four broad categories:
4498 4412
4499 @itemize @bullet 4413 @itemize @bullet
4500 @item 4414 @item
4501 (a) Those for whom the value directly represents the contents of the 4415 (a) Those for whom the value directly represents the contents of the
4502 Lisp object. Only two types are in this category: integers and 4416 Lisp object. Only two types are in this category: integers and
4503 characters. No special allocation or garbage collection is necessary 4417 characters. No special allocation or garbage collection is necessary
4504 for such objects. Lisp objects of these types do not need to be 4418 for such objects. Lisp objects of these types do not need to be
4505 @code{GCPRO}ed. 4419 @code{GCPRO}ed.
4506 @end itemize 4420 @end itemize
4507 4421
4422 In the remaining three categories, the value is a pointer to a
4423 structure.
4424
4425 @itemize @bullet
4426 @item
4427 @cindex frob block
4428 (b) Those for whom the tag directly specifies the type. Recall that
4429 there are only three tag bits; this means that at most five types can be
4430 specified this way. The most commonly-used types are stored in this
4431 format; this includes conses, strings, vectors, and sometimes symbols.
4432 With the exception of vectors, objects in this category are allocated in
4433 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
4434 individual objects. This saves a lot on malloc overhead, since there
4435 are typically quite a lot of these objects around, and the objects are
4436 small. (A cons, for example, occupies 8 bytes on 32-bit machines -- 4
4437 bytes for each of the two objects it contains.) Vectors are individually
4438 @code{malloc()}ed since they are of variable size. (It would be
4439 possible, and desirable, to allocate vectors of certain small sizes out
4440 of frob blocks, but it isn't currently done.) Strings are handled
4441 specially: Each string is allocated in two parts, a fixed size structure
4442 containing a length and a data pointer, and the actual data of the
4443 string. The former structure is allocated in frob blocks as usual, and
4444 the latter data is stored in @dfn{string chars blocks} and is relocated
4445 during garbage collection to eliminate holes.
4446 @end itemize
4447
4508 In the remaining two categories, the type is stored in the object 4448 In the remaining two categories, the type is stored in the object
4509 itself. The tag for all such objects is the generic @dfn{lrecord} 4449 itself. The tag for all such objects is the generic @dfn{lrecord}
4510 (Lisp_Type_Record) tag. The first bytes of the object's structure are an 4450 (Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines)
4511 integer (actually a char) characterising the object's type and some 4451 of the object's structure are a pointer to a structure that describes
4512 flags, in particular the mark bit used for garbage collection. A 4452 the object's type, which includes method pointers and a pointer to a
4513 structure describing the type is accessible through the 4453 the object's type, which includes method pointers and a pointer to a
4514 lrecord_implementation_table indexed with said integer. This structure 4454 using a one- or two-byte tag, rather than a four- or eight-byte pointer
4515 includes the method pointers and a pointer to a string naming the type. 4455 to store the type, but it's not clear it's worth making the change.
4516 4456
4517 @itemize @bullet 4457 @itemize @bullet
4518 @item 4458 @item
4519 (b) Those lrecords that are allocated in frob blocks (see above). This 4459 (c) Those lrecords that are allocated in frob blocks (see above). This
4520 includes the objects that are most common and relatively small, and 4460 includes the objects that are most common and relatively small, and
4521 includes conses, strings, subrs, floats, compiled functions, symbols, 4461 includes floats, compiled functions, symbols (when not in category (b)),
4522 extents, events, and markers. With the cleanup of frob blocks done in 4462 extents, events, and markers. With the cleanup of frob blocks done in
4523 19.12, it's not terribly hard to add more objects to this category, but 4463 19.12, it's not terribly hard to add more objects to this category, but
4524 it's a bit trickier than adding an object type to type (c) (esp. if the 4464 it's a bit trickier than adding an object type to type (d) (esp. if the
4525 object needs a finalization method), and is not likely to save much 4465 object needs a finalization method), and is not likely to save much
4526 space unless the object is small and there are many of them. (In fact, 4466 space unless the object is small and there are many of them. (In fact,
4527 if there are very few of them, it might actually waste space.) 4467 if there are very few of them, it might actually waste space.)
4528 @item 4468 @item
4529 (c) Those lrecords that are individually @code{malloc()}ed. These are 4469 (d) Those lrecords that are individually @code{malloc()}ed. These are
4530 called @dfn{lcrecords}. All other types are in this category. Adding a 4470 called @dfn{lcrecords}. All other types are in this category. Adding a
4531 new type to this category is comparatively easy, and all types added 4471 new type to this category is comparatively easy, and all types added
4532 since 19.8 (when the current allocation scheme was devised, by Richard 4472 since 19.8 (when the current allocation scheme was devised, by Richard
4533 Mlynarik), with the exception of the character type, have been in this 4473 Mlynarik), with the exception of the character type, have been in this
4534 category. 4474 category.
4535 @end itemize 4475 @end itemize
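
The lrecord header layout described above (a small integer type index plus a mark flag, with the type description looked up in a table) can be sketched roughly as follows. This is only an illustrative toy: every @code{toy_} name, the two example types, and the exact field layout are assumptions made for the sketch, not the real XEmacs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real XEmacs structures. */
struct toy_implementation {
  const char *name;     /* string naming the type */
  size_t static_size;   /* size of one object of this type */
};

struct toy_header {
  unsigned char type;   /* index into the implementation table */
  unsigned char mark;   /* mark flag used by the garbage collector */
};

enum { TOY_TYPE_CONS, TOY_TYPE_FLOAT, TOY_TYPE_COUNT };

/* Every object starts with the header, so any object pointer can be
   viewed as a pointer to a header. */
struct toy_cons  { struct toy_header header; void *car, *cdr; };
struct toy_float { struct toy_header header; double value; };

static const struct toy_implementation
toy_implementation_table[TOY_TYPE_COUNT] = {
  { "cons",  sizeof (struct toy_cons) },
  { "float", sizeof (struct toy_float) },
};

/* Find the type description of an object through its header. */
static const struct toy_implementation *
toy_implementation_of (const void *object)
{
  const struct toy_header *h = (const struct toy_header *) object;
  assert (h->type < TOY_TYPE_COUNT);
  return &toy_implementation_table[h->type];
}
```

Storing a one-byte index rather than a full pointer keeps the header small, at the cost of one extra table lookup whenever the type description is needed.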
4536 4476
4537 Note that bit vectors are a bit of a special case. They are 4477 Note that bit vectors are a bit of a special case. They are
4538 simple lrecords as in category (b), but are individually @code{malloc()}ed 4478 simple lrecords as in category (c), but are individually @code{malloc()}ed
4539 like vectors. You can basically view them as exactly like vectors 4479 like vectors. You can basically view them as exactly like vectors
4540 except that their type is stored in lrecord fashion rather than 4480 except that their type is stored in lrecord fashion rather than
4541 in directly-tagged fashion. 4481 in directly-tagged fashion.
4542 4482
4543 4483 Note that FSF Emacs redesigned their object system in 19.29 to follow
4544 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp 4484 a similar scheme. However, given RMS's expressed dislike for data
4485 abstraction, the FSF scheme is not nearly as clean or as easy to
4486 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
4487 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
4488 @code{Lisp_Vectorlike} is also used for vectors.)
4489
4490 @node Garbage Collection
4545 @section Garbage Collection 4491 @section Garbage Collection
4546 @cindex garbage collection 4492 @cindex garbage collection
4547 4493
4548 @cindex mark and sweep 4494 @cindex mark and sweep
4549 Garbage collection is simple in theory but tricky to implement. 4495 Garbage collection is simple in theory but tricky to implement.
4557 that ``all of memory'' means all currently allocated objects. 4503 that ``all of memory'' means all currently allocated objects.
4558 Traversing all these objects means traversing all frob blocks, 4504 Traversing all these objects means traversing all frob blocks,
4559 all vectors (which are chained in one big list), and all 4505 all vectors (which are chained in one big list), and all
4560 lcrecords (which are likewise chained). 4506 lcrecords (which are likewise chained).
4561 4507
4562 Garbage collection can be invoked explicitly by calling 4508 Note that, when an object is marked, the mark has to occur
4563 @code{garbage-collect} but is also called automatically by @code{eval}, 4509 inside of the object's structure, rather than in the 32-bit
4564 once a certain amount of memory has been allocated since the last 4510 @code{Lisp_Object} holding the object's pointer; i.e. you can't just
4565 garbage collection (according to @code{gc-cons-threshold}). 4511 set the pointer's mark bit. This is because there may be many
4566 4512 pointers to the same object. This means that the method of
4567 4513 marking an object can differ depending on the type. The
4568 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp 4514 different marking methods are approximately as follows:
4515
4516 @enumerate
4517 @item
4518 For conses, the mark bit of the car is set.
4519 @item
4520 For strings, the mark bit of the string's plist is set.
4521 @item
4522 For symbols when not lrecords, the mark bit of the
4523 symbol's plist is set.
4524 @item
4525 For vectors, the length is negated after adding 1.
4526 @item
4527 For lrecords, the pointer to the structure describing
4528 the type is changed (see below).
4529 @item
4530 Integers and characters do not need to be marked, since
4531 no allocation occurs for them.
4532 @end enumerate
4533
4534 The details of this are in the @code{mark_object()} function.
4535
4536 Note that any code that operates during garbage collection has
4537 to be especially careful because of the fact that some objects
4538 may be marked and as such may not look like they normally do.
4539 In particular:
4540
4541 @itemize @bullet
4542 Some object pointers may have their mark bit set. This will make
4543 @code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with
4544 this.
4545 @item
4546 Even if you clear the mark bit, @code{FOOBARP()} will still fail
4547 for lrecords because the implementation pointer has been
4548 changed (see below). @code{GC_FOOBARP()} will correctly deal with
4549 this.
4550 @item
4551 Vectors have their size field munged, so anything that
4552 looks at this field will fail.
4553 @item
4554 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
4555 pointers with their mark bit set, because the logical shift operations
4556 that remove the tag also remove the mark bit.
4557 @end itemize
4558
4559 Finally, note that garbage collection can be invoked explicitly
4560 by calling @code{garbage-collect} but is also called automatically
4561 by @code{eval}, once a certain amount of memory has been allocated
4562 since the last garbage collection (according to @code{gc-cons-threshold}).
4563
4564 @node GCPROing
4569 @section @code{GCPRO}ing 4565 @section @code{GCPRO}ing
4570 4566
4571 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs 4567 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4572 internals. The basic idea is that whenever garbage collection 4568 internals. The basic idea is that whenever garbage collection
4573 occurs, all in-use objects must be reachable somehow or 4569 occurs, all in-use objects must be reachable somehow or
4574 other from one of the roots of accessibility. The roots 4570 other from one of the roots of accessibility. The roots
4575 of accessibility are: 4571 of accessibility are:
4576 4572
4577 @enumerate 4573 @enumerate
4578 @item 4574 @item
4579 All objects that have been @code{staticpro()}d or 4575 All objects that have been @code{staticpro()}d. This is used for
4580 @code{staticpro_nodump()}ed. This is used for any global C variables 4576 any global C variables that hold Lisp objects. A call to
4581 that hold Lisp objects. A call to @code{staticpro()} happens implicitly 4577 @code{staticpro()} happens implicitly as a result of any symbols
4582 as a result of any symbols declared with @code{defsymbol()} and any 4578 declared with @code{defsymbol()} and any variables declared with
4583 variables declared with @code{DEFVAR_FOO()}. You need to explicitly 4579 @code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()}
4584 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module) 4580 (in the @code{vars_of_foo()} method of a module) for other global
4585 for other global C variables holding Lisp objects. (This typically 4581 C variables holding Lisp objects. (This typically includes
4586 includes internal lists and such things.) Use 4582 internal lists and such things.)
4587 @code{staticpro_nodump()} only in the rare cases when you do not want
4588 the pointed variable to be saved at dump time but rather recompute it at
4589 startup.
4590 4583
4591 Note that @code{obarray} is one of the @code{staticpro()}d things. 4584 Note that @code{obarray} is one of the @code{staticpro()}d things.
4592 Therefore, all functions and variables get marked through this. 4585 Therefore, all functions and variables get marked through this.
4593 @item 4586 @item
4594 Any shadowed bindings that are sitting on the @code{specpdl} stack. 4587 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4628 variable @samp{gcprolist} pointing to the head of the list and the nth 4621 variable @samp{gcprolist} pointing to the head of the list and the nth
4629 local @code{gcpro} variable pointing to the first @code{gcpro} variable 4622 local @code{gcpro} variable pointing to the first @code{gcpro} variable
4630 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an 4623 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an
4631 lvalue, and the @code{struct gcpro} local variable contains a pointer to 4624 lvalue, and the @code{struct gcpro} local variable contains a pointer to
4632 this lvalue. This is why things will mess up badly if you don't pair up 4625 this lvalue. This is why things will mess up badly if you don't pair up
4633 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with 4626 the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with
4634 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local 4627 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
4635 @code{Lisp_Object} variables in no-longer-active stack frames. 4628 @code{Lisp_Object} variables in no-longer-active stack frames.
4636 4629
4637 @item 4630 @item
4638 It is actually possible for a single @code{struct gcpro} to 4631 It is actually possible for a single @code{struct gcpro} to
4719 anything that looks like a reference to an object as a reference. This 4712 anything that looks like a reference to an object as a reference. This
4720 will result in a few objects not getting collected when they should, but 4713 will result in a few objects not getting collected when they should, but
4721 it obviates the need for @code{GCPRO}ing, and allows garbage collection 4714 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4722 to happen at any point at all, such as during object allocation. 4715 to happen at any point at all, such as during object allocation.
4723 4716
4724 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp 4717 @node Integers and Characters
4725 @section Garbage Collection - Step by Step
4726 @cindex garbage collection step by step
4727
4728 @menu
4729 * Invocation::
4730 * garbage_collect_1::
4731 * mark_object::
4732 * gc_sweep::
4733 * sweep_lcrecords_1::
4734 * compact_string_chars::
4735 * sweep_strings::
4736 * sweep_bit_vectors_1::
4737 @end menu
4738
4739 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step
4740 @subsection Invocation
4741 @cindex garbage collection, invocation
4742
4743 The first thing that anyone should know about garbage collection is
4744 when and how the garbage collector is invoked. One might think that this
4745 could happen every time new memory is allocated, e.g. new objects are
4746 created, but this is @emph{not} the case. Instead, we have the following
4747 situation:
4748
4749 The entry point of any process of garbage collection is an invocation
4750 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
4751 invocation can occur @emph{explicitly} by calling the function
4752 @code{Fgarbage_collect} (in addition this function provides information
4753 about the freed memory), or can occur @emph{implicitly} in four different
4754 situations:
4755 @enumerate
4756 @item
4757 In function @code{main_1} in file @code{emacs.c}. This function is called
4758 at each startup of xemacs. The garbage collection is invoked after all
4759 initial creations are completed, but only if a special internal error
4760 checking-constant @code{ERROR_CHECK_GC} is defined.
4761 @item
4762 In function @code{disksave_object_finalization} in file
4763 @code{alloc.c}. The only purpose of this function is to clear the
4764 objects from memory which need not be stored with xemacs when we dump out
4765 an executable. This is only done by @code{Fdump_emacs} or by
4766 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
4767 actual clearing is accomplished by making these objects unreachable and
4768 starting a garbage collection. The function is only used while building
4769 xemacs.
4770 @item
4771 In function @code{Feval / eval} in file @code{eval.c}. Each time the
4772 well known and often used function eval is called to evaluate a form,
4773 one of the first things that could happen, is a potential call of
4774 @code{garbage_collect_1}. There exist three global variables,
4775 @code{consing_since_gc} (counts the created cons-cells since the last
4776 garbage collection), @code{gc_cons_threshold} (a specified threshold
4777 after which a garbage collection occurs) and @code{always_gc}. If
4778 @code{always_gc} is set or if the threshold is exceeded, the garbage
4779 collection will start.
4780 @item
4781 In function @code{Ffuncall / funcall} in file @code{eval.c}. This
4782 function evaluates calls of elisp functions and works according to
4783 @code{Feval}.
4784 @end enumerate
4785
4786 The upshot is that garbage collection can occur almost anywhere that
4787 @code{Feval} or @code{Ffuncall} is used, either directly or
4788 through another function. Since calls to these two functions are hidden
4789 in various other functions, many calls to @code{garbage_collect_1} are
4790 not obviously foreseeable, and therefore unexpected. Places worth
4791 remembering where they are used include various elisp constructs, such
4792 as @code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
4793 @code{setq}, etc., miscellaneous @code{gui_item_...} functions,
4794 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
4795 ...) and inside @code{Fsignal}. The latter is used to handle signals,
4796 such as the ones raised by every @code{QUIT} macro triggered after
4797 pressing Ctrl-g.
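
The implicit trigger in @code{Feval} and @code{Ffuncall} can be pictured with the following minimal sketch. The variable names @code{consing_since_gc}, @code{gc_cons_threshold} and @code{always_gc} come from the description above; the @code{toy_} functions, the counter units and the threshold value are assumptions made for illustration, and the real collector is replaced by a stub that only counts its invocations.

```c
#include <assert.h>

static long consing_since_gc;          /* allocation since the last GC */
static long gc_cons_threshold = 1000;  /* illustrative value only */
static int always_gc;                  /* force a GC at every check */
static int toy_gc_runs;                /* hypothetical: counts stub GCs */

/* Stub standing in for garbage_collect_1: it only resets the counter. */
static void
toy_garbage_collect (void)
{
  toy_gc_runs++;
  consing_since_gc = 0;
}

/* Hypothetical allocator hook: allocation never collects by itself,
   it merely records how much has been allocated. */
static void
toy_note_allocation (long amount)
{
  consing_since_gc += amount;
}

/* The check made on entry to the evaluator. */
static void
toy_eval_entry (void)
{
  if (always_gc || consing_since_gc > gc_cons_threshold)
    toy_garbage_collect ();
}
```

The point of the design is that allocation only bumps a counter, so collection is confined to well-defined points in the evaluator rather than happening in the middle of arbitrary C code.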
4798
4799 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step
4800 @subsection @code{garbage_collect_1}
4801 @cindex @code{garbage_collect_1}
4802
4803 We can now describe exactly what happens after the invocation takes
4804 place.
4805 @enumerate
4806 @item
4807 There are several cases in which the garbage collector is left immediately:
4808 when we are already garbage collecting (@code{gc_in_progress}), when
4809 the garbage collection is somehow forbidden
4810 (@code{gc_currently_forbidden}), when we are currently displaying something
4811 (@code{in_display}) or when we are preparing for the armageddon of the
4812 whole system (@code{preparing_for_armageddon}).
4813 @item
4814 Next the correct frame in which to put
4815 all the output occurring during garbage collecting is determined. In
4816 order to be able to restore the old display's state after displaying the
4817 message, some data about the current cursor position has to be
4818 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
4819 care of that.
4820 @item
4821 The state of @code{gc_currently_forbidden} must be restored after
4822 the garbage collection, no matter what happens during the process. We
4823 accomplish this by @code{record_unwind_protect}ing the suitable function
4824 @code{restore_gc_inhibit} together with the current value of
4825 @code{gc_currently_forbidden}.
4826 @item
4827 If we are concurrently running an interactive xemacs session, the next step
4828 is simply to show the garbage collector's cursor/message.
4829 @item
4830 The following steps are the intrinsic steps of the garbage collector,
4831 therefore @code{gc_in_progress} is set.
4832 @item
4833 For debugging purposes, it is possible to copy the current C stack
4834 frame. However, this seems to be a currently unused feature.
4835 @item
4836 Before actually starting to go over all live objects, references to
4837 objects that are no longer used are pruned. We only have to do this for events
4838 (@code{clear_event_resource}) and for specifiers
4839 (@code{cleanup_specifiers}).
4840 @item
4841 Now the mark phase begins and marks all accessible elements. In order to
4842 start from
4843 all slots that serve as roots of accessibility, the function
4844 @code{mark_object} is called for each root individually to go out from
4845 there to mark all reachable objects. All roots that are traversed are
4846 shown in their processed order:
4847 @itemize @bullet
4848 @item
4849 all constant symbols and static variables that are registered via
4850 @code{staticpro}@ in the array @code{staticvec}.
4851 @xref{Adding Global Lisp Variables}.
4852 @item
4853 all Lisp objects that are created in C functions and that must be
4854 protected from freeing them. They are registered in the global
4855 list @code{gcprolist}.
4856 @xref{GCPROing}.
4857 @item
4858 all local variables (i.e. their name fields @code{symbol} and old
4859 values @code{old_values}) that are bound during the evaluation by the Lisp
4860 engine. They are stored in @code{specbinding} structs pushed on a stack
4861 called @code{specpdl}.
4862 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
4863 @item
4864 all catch blocks that the Lisp engine encounters during the evaluation
4865 cause the creation of structs @code{catchtag} inserted in the list
4866 @code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
4867 are freshly created objects and therefore have to be marked.
4868 @xref{Catch and Throw}.
4869 @item
4870 every function application pushes new structs @code{backtrace}
4871 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
4872 parts that have to be marked are the fields for each function
4873 (@code{function}) and all their arguments (@code{args}).
4874 @xref{Evaluation}.
4875 @item
4876 all objects that are used by the redisplay engine that must not be freed
4877 are marked by a special function called @code{mark_redisplay} (in
4878 @code{redisplay.c}).
4879 @item
4880 all objects created for profiling purposes are allocated by C functions
4881 instead of using the lisp allocation mechanisms. In order to receive the
4882 right ones during the sweep phase, they also have to be marked
4883 manually. That is done by the function @code{mark_profiling_info}.
4884 @end itemize
4885 @item
4886 Hash tables in XEmacs are among the special objects that
4887 make use of a concept often called 'weak pointers'.
4888 To make a long story short, pointers of this kind are not followed
4889 while the live objects are determined during garbage collection.
4890 Any object referenced only by weak pointers is collected
4891 anyway, and the reference to it is cleared. In hash tables there are
4892 different usage patterns of them, manifesting in different types of hash
4893 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
4894 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each
4895 clearing entries depending on different conditions. More information can
4896 be found in the documentation to the function @code{make-hash-table}.
4897
4898 Because there are complicated dependency rules about when and what to
4899 mark while processing weak hash tables, the standard @code{marker}
4900 method is only active if it is marking non-weak hash tables. As soon as
4901 a weak component is in the table, the hash table entries are ignored
4902 while marking. Instead, their entries are marked separately by the
4903 function @code{finish_marking_weak_hash_tables}. This function iterates
4904 over each hash table entry @code{hentries} for each weak hash table in
4905 @code{Vall_weak_hash_tables}. Depending on the type of a table, the
4906 appropriate action is performed.
4907 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked,
4908 everything reachable from the @code{value} component is marked. If it is
4909 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
4910 already marked, marking starts only from the
4911 @code{key} component.
4912 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
4913 of the key entry is already marked, we mark both the @code{key} and
4914 @code{value} components.
4915 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
4916 and the car of the value components is already marked, again both the
4917 @code{key} and the @code{value} components get marked.
4918
4919 Similarly, there are lists with comparable properties, called weak
4920 lists. They come in several types, called
4921 @code{simple}, @code{assoc}, @code{key-assoc} and
4922 @code{value-assoc}. You can find further details about them in the
4923 description of the function @code{make-weak-list}. The scheme of their
4924 marking is similar: all weak lists are listed in @code{Vall_weak_lists},
4925 therefore we iterate over them. The marking is advanced until we hit an
4926 already marked pair. Then we know that during a former run all
4927 the rest has been marked completely. Again, depending on the special
4928 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
4929 and the elem is marked, we mark the @code{cons} part. If it is a
4930 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
4931 cdr, we mark the @code{cons} and the @code{elem}. If it is a
4932 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
4933 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
4934 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
4935 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
4936
4937 Marking the objects reachable from weak hash tables and weak lists
4938 can mark other objects, which in turn may require further marking of
4939 other weak objects. Both finishing functions are therefore repeated
4940 for as long as previously unmarked objects get freshly marked.
4941
4942 @item
4943 After completing the special marking for the weak hash tables and for the weak
4944 lists, all entries that point to objects that are going to be swept in
4945 the further process are useless, and therefore have to be removed from
4946 the table or the list.
4947
4948 The function @code{prune_weak_hash_tables} does the job for weak hash
4949 tables. Totally unmarked hash tables are removed from the list
4950 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
4951 by scanning over all entries and removing one as soon as one of
4952 the components @code{key} and @code{value} is unmarked.
4953
4954 The same idea applies to the weak lists. It is accomplished by
4955 @code{prune_weak_lists}: An unmarked list is pruned from
4956 @code{Vall_weak_lists} immediately. A marked list is treated more
4957 carefully by going over it and removing just the unmarked pairs.
4958
4959 @item
4960 The function @code{prune_specifiers} checks all listed specifiers held
4961 in @code{Vall_specifiers} and removes the ones from the lists that are
4962 unmarked.
4963
4964 @item
4965 All syntax tables are stored in a list called
4966 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
4967 through it and unlinks the tables that are unmarked.
4968
4969 @item
4970 Next comes the actual sweeping, which is done mainly by the function
4971 @code{gc_sweep}.
4972 @item
4973 First, all the variables with respect to garbage collection are
4974 reset. @code{consing_since_gc} - the counter of the created cells since
4975 the last garbage collection - is set back to 0, and
4976 @code{gc_in_progress} is not @code{true} anymore.
4977 @item
4978 In case the session is interactive, the displayed cursor and message are
4979 removed again.
4980 @item
4981 The state of @code{gc_currently_forbidden} is restored to the former value by
4982 unwinding the stack.
4983 @item
4984 A small memory reserve is always held back that can be reached by
4985 @code{breathing_space}. If nothing more is left, we create a new reserve
4986 and exit.
4987 @end enumerate
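
The repeat-until-nothing-changes marking of weak tables, followed by pruning, can be sketched as below for the key-weak case only. This is a toy model under stated assumptions: objects carry at most one outgoing strong reference, the table is a plain array, and all @code{toy_} names are hypothetical rather than taken from @code{alloc.c}.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: a mark flag and at most one outgoing strong reference. */
struct toy_object { int marked; struct toy_object *ref; };

/* Toy key-weak table entry. */
struct toy_entry { struct toy_object *key, *value; };

/* Mark an object and everything reachable through its references. */
static void
toy_mark (struct toy_object *o)
{
  while (o != NULL && !o->marked)
    {
      o->marked = 1;
      o = o->ref;
    }
}

/* One pass: for each entry whose key is already marked, mark the
   value.  Returns nonzero if anything was newly marked. */
static int
toy_finish_marking_key_weak (struct toy_entry *e, size_t n)
{
  int did_mark = 0;
  size_t i;
  for (i = 0; i < n; i++)
    if (e[i].key->marked && !e[i].value->marked)
      {
        toy_mark (e[i].value);
        did_mark = 1;
      }
  return did_mark;
}

/* Drop every entry with an unmarked key or value; return the number
   of entries kept (they are compacted to the front of the array). */
static size_t
toy_prune (struct toy_entry *e, size_t n)
{
  size_t i, kept = 0;
  for (i = 0; i < n; i++)
    if (e[i].key->marked && e[i].value->marked)
      e[kept++] = e[i];
  return kept;
}

/* Repeat the marking pass until a pass marks nothing new, then prune. */
static size_t
toy_process_weak_table (struct toy_entry *e, size_t n)
{
  while (toy_finish_marking_key_weak (e, n))
    ;
  return toy_prune (e, n);
}
```

A single pass is not enough in general: marking the value of one entry may mark the key of an entry that was already visited, which is why the pass is repeated until it marks nothing new.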
4988
4989 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step
4990 @subsection @code{mark_object}
4991 @cindex @code{mark_object}
4992
4993 The first thing that is checked while marking an object is whether the
4994 object is a real Lisp object @code{Lisp_Type_Record} or just an integer
4995 or a character. Integers and characters are the only two types that are
4996 stored directly - without another level of indirection, and therefore they
4997 don't have to be marked and collected.
4998 @xref{How Lisp Objects Are Represented in C}.
4999
5000 The second case is the one we have to handle: we are
5001 dealing with a pointer to a Lisp object. Even then, there are three
5002 situations in which nothing is done while marking. First, the
5003 object is read-only, which prevents it from being garbage collected,
5004 i.e. marked (@code{C_READONLY_RECORD_HEADER}). Second, the object in
5005 question is already marked, and need not be marked a second time (checked by
5006 @code{MARKED_RECORD_HEADER_P}). Third, it is a special, unmarkable object
5007 (@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that
5008 sit in some const space, and can therefore not be marked, see
5009 @code{this_one_is_unmarkable} in @code{alloc.c}).
5010
5011 Now the actual marking is feasible. We first use the macro
5012 @code{MARK_RECORD_HEADER} to mark the object itself (actually the
5013 special flag in the lrecord header), and then call its special marker
5014 "method" @code{marker} if available. The marker method marks every
5015 other object that is in reach from our current object. Note that these
5016 marker methods should not call @code{mark_object} recursively, but
5017 instead should return the next object from where further marking has to
5018 be performed.
5019
5020 In case another object was returned, as mentioned before, we reiterate
5021 the whole @code{mark_object} process beginning with this next object.
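
The loop structure described above, in which the marker method hands back the next object instead of recursing, might look roughly like this. It is a simplified sketch, not the code in @code{alloc.c}: the @code{toy_} names and the record layout are hypothetical. One further assumption is that a record with two children, such as a cons, marks one child with a direct call and returns the other, so that only the returned child is handled iteratively.

```c
#include <assert.h>
#include <stddef.h>

struct toy_record;
typedef struct toy_record *(*toy_marker_fn) (struct toy_record *);

/* Hypothetical record: flags, an optional marker method, two children. */
struct toy_record {
  int marked;
  int read_only;
  toy_marker_fn marker;
  struct toy_record *car, *cdr;
};

static void toy_mark_object (struct toy_record *obj);

/* Marker for a cons-like record: the car is marked by a direct call,
   the cdr is returned so that long lists are walked iteratively. */
static struct toy_record *
toy_mark_cons (struct toy_record *obj)
{
  toy_mark_object (obj->car);
  return obj->cdr;
}

static void
toy_mark_object (struct toy_record *obj)
{
  while (obj != NULL)
    {
      /* Read-only and already-marked records need no further work. */
      if (obj->read_only || obj->marked)
        return;
      obj->marked = 1;
      if (obj->marker == NULL)
        return;
      /* Continue with whatever the marker method hands back. */
      obj = obj->marker (obj);
    }
}
```

Returning the next object keeps the C stack depth independent of the length of a list, since only the car side of each cons costs a nested call.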
5022
5023 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step
5024 @subsection @code{gc_sweep}
5025 @cindex @code{gc_sweep}
5026
5027 The job of this function is to free all unmarked records from memory. As
5028 we know, there are different types of objects implemented and managed, and
5029 consequently different ways to free them from memory.
5030 @xref{Introduction to Allocation}.
5031
5032 We start with all objects stored through @code{lcrecords}. All
5033 bulkier objects are allocated and handled using that scheme of
5034 @code{lcrecords}. Each object is @code{malloc}ed separately
5035 instead of placing it in one of the contiguous frob blocks. The types
5036 that are currently stored
5037 using @code{lcrecords}'s @code{alloc_lcrecord} and
5038 @code{make_lcrecord_list} are: vectors, buffers,
5039 char-table, char-table-entry, console, weak-list, database, device,
5040 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
5041 coding-system, frame, image-instance, glyph, popup-data, gui-item,
5042 keymap, charset, color_instance, font_instance, opaque, opaque-list,
5043 process, range-table, specifier, symbol-value-buffer-local,
5044 symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
5045 tooltalk-message, tooltalk-pattern, window, and window-configuration. We
5046 take care of them in the first place
5047 in order to be able to handle and to finalize items stored in them more
5048 easily. The function @code{sweep_lcrecords_1} as described below is
5049 doing the whole job for us.
5050 For a description about the internals: @xref{lrecords}.
5051
5052 Our next candidates are objects that behave quite differently
5053 from everything else: the strings. They consist of two parts: a
5054 fixed-size portion (@code{struct Lisp_String}) holding the string's
5055 length, its property list and a pointer to the second part, and the
5056 actual string data, which is stored in string-chars blocks comparable to
5057 frob blocks. In these blocks, the data is not only freed; the resulting
5058 holes are also compacted, i.e. all surviving strings are relocated together.
5059 @xref{String}. This compacting phase is performed by the function
5060 @code{compact_string_chars}, the actual sweeping by the function
5061 @code{sweep_strings}; both are described below.
5062
5063 After that, the other types are swept step by step using the functions
5064 @code{sweep_conses}, @code{sweep_bit_vectors_1},
5065 @code{sweep_compiled_functions}, @code{sweep_floats},
5066 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
5067 @code{sweep_events}. These handle the fixed-size types cons, float,
5068 compiled-function, symbol, marker, extent, and event, which are stored in
5069 so-called "frob blocks"; we can therefore do basically the same for
5070 objects of every type, using the same macros, which are defined specifically to
5071 handle everything with respect to fixed-size blocks. The only fixed-size
5072 type that is not handled here is the fixed-size portion of strings,
5073 because we took special care of it earlier.
5074
5075 The only big exception is bit vectors, which are stored differently and
5076 are therefore treated differently by the function @code{sweep_bit_vectors_1}
5077 described later.
5078
5079 First, we need some brief information about how
5080 these fixed-size types are managed in general, in order to understand
5081 how the sweeping is done. They all have a fixed size, and are therefore
5082 stored in big blocks of memory (each allocated at once) that can hold a
5083 certain number of objects of one type. The macro
5084 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
5085 every type. More precisely, we have the block struct
5086 (holding a pointer to the previous block @code{prev} and the
5087 objects in @code{block[]}), a pointer to the current block
5088 (@code{current_..._block}) and its last index
5089 (@code{current_..._block_index}), and a pointer to the free list that
5090 will be created. There is also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some
5091 related macros that are used to obtain a new object, either from
5092 the free list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused object
5093 of that type stored, or by allocating a completely new block using
5094 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
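
As a rough illustration of these structures, the following C sketch
writes out by hand what such declarations could look like for one
made-up type. It is not the expansion of
@code{DECLARE_FIXED_TYPE_ALLOC}: the type @code{toy_cons}, the block
size @code{TOY_PER_BLOCK} and the functions @code{toy_allocate_cons}
and @code{toy_free_cons} are assumptions made for this example; only
the general shape (block struct with @code{prev} and @code{block[]},
current block, block index, free list) follows the description above.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PER_BLOCK 3     /* objects per block; tiny for illustration */

struct toy_cons {
    struct toy_cons *chain; /* free-list link while the object is unused */
    int car, cdr;
};

struct toy_cons_block {
    struct toy_cons_block *prev;            /* previously allocated block */
    struct toy_cons block[TOY_PER_BLOCK];   /* the objects themselves */
};

static struct toy_cons_block *current_toy_cons_block;
/* Starting out "full" forces a block to be allocated on first use. */
static int current_toy_cons_block_index = TOY_PER_BLOCK;
static struct toy_cons *toy_cons_free_list;

static struct toy_cons *toy_allocate_cons(void)
{
    /* Prefer an unused object from the free list. */
    if (toy_cons_free_list != NULL) {
        struct toy_cons *result = toy_cons_free_list;
        toy_cons_free_list = result->chain;
        return result;
    }
    /* Otherwise take the next slot, opening a new block when needed. */
    if (current_toy_cons_block_index == TOY_PER_BLOCK) {
        struct toy_cons_block *b = malloc(sizeof *b);
        assert(b != NULL);
        b->prev = current_toy_cons_block;
        current_toy_cons_block = b;
        current_toy_cons_block_index = 0;
    }
    return &current_toy_cons_block->block[current_toy_cons_block_index++];
}

static void toy_free_cons(struct toy_cons *c)
{
    c->chain = toy_cons_free_list;          /* push onto the free list */
    toy_cons_free_list = c;
}
```

Objects handed out from one block are adjacent in memory, and a freed
object is reused before any new block is requested.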
5095
5096 The rest works as follows: all of them define a
5097 macro @code{UNMARK_...} that is used to unmark the object. They define a
5098 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5099 to be done when converting an object from in use to not in use (so far,
5100 only markers use it in order to unchain them). Then, they all call
5101 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5102 and their struct name.
5103
5104 This call in particular does the following: we go over all blocks,
5105 starting with the current one and moving towards the oldest.
5106 For each block, we look at every object in it. If the object is already
5107 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
5108 object), or if it is
5109 set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing needs to be
5110 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
5111 is put in the free list and set free (using the macro
5112 @code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
5113 (by @code{UNMARK_...}). While going through one block, we note whether the
5114 whole block is empty. If so, the whole block is freed (using
5115 @code{xfree}) and the free list state is set to the state it had before
5116 handling this block.
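
The following C sketch mimics this sweep on a toy data structure. It is
not the @code{SWEEP_FIXED_TYPE_BLOCK} macro: all names
(@code{toy_cell}, @code{toy_block}, @code{toy_sweep},
@code{toy_push_block}) are hypothetical, the flags are ordinary
integers, every block is assumed to be completely filled, and the
sketch assumes that the free list is rebuilt from scratch on every
sweep, which is what makes "restoring the free list state" for an empty
block safe.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SLOTS 4                 /* objects per block; illustrative */

struct toy_cell {
    int is_free;                    /* was already free before this sweep */
    int readonly;
    int marked;
    struct toy_cell *next_free;     /* free-list link */
};

struct toy_block {
    struct toy_block *prev;         /* next older block */
    struct toy_cell cells[TOY_SLOTS];
};

static struct toy_block *toy_current_block;
static struct toy_cell *toy_free_list;

/* Helper for experiments: add a zero-filled block as the current one. */
static struct toy_block *toy_push_block(void)
{
    struct toy_block *blk = calloc(1, sizeof *blk);
    assert(blk != NULL);
    blk->prev = toy_current_block;
    toy_current_block = blk;
    return blk;
}

/* Sweep all blocks, newest first; returns the number of blocks released. */
static int toy_sweep(void)
{
    struct toy_block **link = &toy_current_block;
    struct toy_block *blk;
    int released = 0;

    toy_free_list = NULL;           /* assumption: rebuilt on every sweep */
    while ((blk = *link) != NULL) {
        struct toy_cell *saved = toy_free_list;   /* state before this block */
        int empty = 1, i;

        for (i = 0; i < TOY_SLOTS; i++) {
            struct toy_cell *c = &blk->cells[i];
            if (c->readonly) {
                empty = 0;                        /* stays, untouched */
            } else if (c->is_free || !c->marked) {
                c->is_free = 1;                   /* thread onto free list */
                c->next_free = toy_free_list;
                toy_free_list = c;
            } else {
                c->marked = 0;                    /* survivor: unmark it */
                empty = 0;
            }
        }
        if (empty) {
            *link = blk->prev;                    /* unlink the block */
            toy_free_list = saved;                /* forget its cells */
            free(blk);
            released++;
        } else {
            link = &blk->prev;
        }
    }
    return released;
}
```

Keeping a pointer to the link field (@code{link}) rather than to the
block lets an empty block be unlinked without special-casing the
current block.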
5117
5118 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step
5119 @subsection @code{sweep_lcrecords_1}
5120 @cindex @code{sweep_lcrecords_1}
5121
5122 After nullifying the complete lcrecord statistics, we go over all
5123 lcrecords two separate times. They are all chained together in a list with
5124 a head called @code{all_lcrecords}.
5125
5126 The first loop calls each object's @code{finalizer} method, but only
5127 if the object is not read only
5128 (@code{C_READONLY_RECORD_HEADER_P}), is not marked
5129 (@code{MARKED_RECORD_HEADER_P}), is not already in a free list (list of
5130 freed objects, field @code{free}) and actually has a finalizer
5131 method.
5132
5133 The second loop actually frees the appropriate objects by iterating
5134 through the whole list again. If an object is read only or marked, it
5135 has to persist; otherwise it is manually freed by calling
5136 @code{xfree}. During this loop, the lcrecord statistics are kept up to
5137 date by calling @code{tick_lcrecord_stats} with the right arguments.
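
The two passes can be sketched as follows. This is a simplified model
under stated assumptions, not the real function: the record type
@code{toy_lcrecord}, the list head @code{toy_all_lcrecords} and the
helpers are hypothetical, statistics are left out, and the sketch
assumes that surviving records are unmarked during the second pass.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_lcrecord {
    struct toy_lcrecord *next;      /* chain through all records */
    int readonly, marked, is_free;
    void (*finalizer)(struct toy_lcrecord *);   /* may be NULL */
};

static struct toy_lcrecord *toy_all_lcrecords;
static int toy_finalized;           /* counts finalizer calls (demo only) */

static void toy_count_finalizer(struct toy_lcrecord *r)
{
    (void)r;
    toy_finalized++;
}

/* Helper: allocate a record and chain it in at the head of the list. */
static struct toy_lcrecord *toy_alloc(int marked,
                                      void (*fin)(struct toy_lcrecord *))
{
    struct toy_lcrecord *r = calloc(1, sizeof *r);
    assert(r != NULL);
    r->marked = marked;
    r->finalizer = fin;
    r->next = toy_all_lcrecords;
    toy_all_lcrecords = r;
    return r;
}

/* Returns the number of records that survive the sweep. */
static int toy_sweep_lcrecords(void)
{
    struct toy_lcrecord *r, **link;
    int kept = 0;

    /* Pass 1: finalize every record that is about to die. */
    for (r = toy_all_lcrecords; r != NULL; r = r->next)
        if (!r->readonly && !r->marked && !r->is_free && r->finalizer != NULL)
            r->finalizer(r);

    /* Pass 2: unlink and free dead records, keep the others. */
    link = &toy_all_lcrecords;
    while ((r = *link) != NULL) {
        if (r->readonly || r->marked) {
            r->marked = 0;          /* assumption: unmark for the next GC */
            kept++;
            link = &r->next;
        } else {
            *link = r->next;
            free(r);
        }
    }
    return kept;
}
```

Running all finalizers before anything is freed means a finalizer can
still look at other dying records without touching released memory.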
5138
5139 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step
5140 @subsection @code{compact_string_chars}
5141 @cindex @code{compact_string_chars}
5142
5143 The purpose of this function is to compact all the data parts of the
5144 strings that are held in so-called @code{string_chars_block}, i.e. the
5145 strings that do not exceed a certain maximal length.
5146
5147 The procedure is as follows. We keep two
5148 positions in the @code{string_chars_block}s using two pointer/integer
5149 pairs, namely @code{from_sb}/@code{from_pos} and
5150 @code{to_sb}/@code{to_pos}. They denote the current positions from
5151 which and to which the string being handled is copied.
5152
5153 While going over all chained @code{string_chars_block}s and the strings
5154 they hold, starting at @code{first_string_chars_block}, both pointers
5155 are advanced and, where necessary, a string is copied from @code{from_sb} to
5156 @code{to_sb}, depending on the status of the strings pointed at.
5157
5158 More precisely, we can distinguish between the following actions.
5159 @itemize @bullet
5160 @item
5161 The string at @code{from_sb}'s position could be marked as free, which
5162 is indicated by an invalid value in the pointer that should point back
5163 to the fixed-size string object, and which is checked by
5164 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
5165 pair is advanced to the next string, and nothing has to be copied.
5166 @item
5167 Also, if a string object itself is unmarked, nothing has to be
5168 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5169 pair as described above.
5170 @item
5171 In all other cases, we have a marked string at hand. The string data
5172 must be moved from the from-position to the to-position. If
5173 there is not enough space in the current @code{to_sb} block, we advance
5174 this pointer to the beginning of the next block before copying. If the
5175 from and to positions are different, we perform the
5176 actual copying using the library function @code{memmove}.
5177 @end itemize
5178
5179 After compacting, the pointer to the current
5180 @code{string_chars_block}, sitting in @code{current_string_chars_block},
5181 is reset to the last block to which we moved a string,
5182 i.e. @code{to_sb}, and all remaining blocks (we know that they just
5183 carry garbage) are explicitly @code{xfree}d.
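
A compaction pass of this kind can be sketched in C as shown below.
The sketch is a toy under several assumptions and does not reproduce
the real data layout: @code{toy_string}, @code{toy_header},
@code{toy_chars_block}, @code{toy_store} and @code{toy_compact} are
hypothetical names, every entry carries an explicit length in its
header, a null back pointer stands for "free", and alignment padding
and big strings are ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 64          /* tiny on purpose */

struct toy_string {                /* fixed-size part of a string */
    int marked;
    unsigned char *data;           /* points into a toy_chars_block */
};

struct toy_header {                /* stored in front of each string's bytes */
    struct toy_string *owner;      /* back pointer; NULL stands for "free" */
    size_t len;                    /* number of data bytes that follow */
};

struct toy_chars_block {
    struct toy_chars_block *next;
    size_t pos;                    /* bytes in use */
    unsigned char chars[TOY_BLOCK_SIZE];
};

static struct toy_chars_block *toy_first_block, *toy_current_block;

/* Append TEXT to the newest block, opening a new block when it is full. */
static void toy_store(struct toy_string *s, const char *text)
{
    struct toy_header h;
    size_t total;

    h.owner = s;
    h.len = strlen(text);
    total = sizeof h + h.len;
    assert(total <= TOY_BLOCK_SIZE);          /* no "big strings" here */
    if (toy_current_block == NULL
        || toy_current_block->pos + total > TOY_BLOCK_SIZE) {
        struct toy_chars_block *b = calloc(1, sizeof *b);
        assert(b != NULL);
        if (toy_current_block != NULL)
            toy_current_block->next = b;
        else
            toy_first_block = b;
        toy_current_block = b;
    }
    s->data = toy_current_block->chars + toy_current_block->pos + sizeof h;
    memcpy(s->data - sizeof h, &h, sizeof h);
    memcpy(s->data, text, h.len);
    toy_current_block->pos += total;
}

/* Slide all marked strings to the front; returns the number of blocks freed. */
static int toy_compact(void)
{
    struct toy_chars_block *from_sb, *to_sb = toy_first_block, *victim;
    size_t to_pos = 0;
    int freed = 0;

    if (to_sb == NULL)
        return 0;
    for (from_sb = toy_first_block; from_sb != NULL; from_sb = from_sb->next) {
        size_t from_pos = 0, from_end = from_sb->pos;
        while (from_pos < from_end) {
            unsigned char *src = from_sb->chars + from_pos;
            struct toy_header h;
            size_t total;

            memcpy(&h, src, sizeof h);
            total = sizeof h + h.len;
            from_pos += total;                 /* advance the from-position */
            if (h.owner == NULL || !h.owner->marked)
                continue;                      /* free or unmarked: skip */
            if (to_pos + total > TOY_BLOCK_SIZE) {
                to_sb->pos = to_pos;           /* close this block ... */
                to_sb = to_sb->next;           /* ... and fill the next one */
                to_pos = 0;
            }
            if (to_sb->chars + to_pos != src)
                memmove(to_sb->chars + to_pos, src, total);
            h.owner->data = to_sb->chars + to_pos + sizeof h;
            to_pos += total;
        }
    }
    to_sb->pos = to_pos;
    victim = to_sb->next;                      /* the rest is garbage */
    to_sb->next = NULL;
    toy_current_block = to_sb;
    while (victim != NULL) {
        struct toy_chars_block *next = victim->next;
        free(victim);
        victim = next;
        freed++;
    }
    return freed;
}
```

Since strings only ever move towards the front, the to-position can
never overtake the from-position, so @code{memmove} is needed only for
the overlapping case within a single block.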
5184
5185 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step
5186 @subsection @code{sweep_strings}
5187 @cindex @code{sweep_strings}
5188
5189 The sweeping for the fixed-size string objects is essentially
5190 the same as it is for all other fixed-size types. As before, the freeing
5191 into the suitable free list is done by using the macro
5192 @code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros
5193 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
5194 definitions are a little bit special compared to the ones used
5195 for the other fixed-size types.
5196
5197 @code{UNMARK_string} is defined the same way, except for some additional code
5198 used for updating the bookkeeping information.
5199
5200 For strings, @code{ADDITIONAL_FREE_string} has to do something in
5201 addition: if the string was not allocated in a
5202 @code{string_chars_block} because it exceeded the maximal length, and
5203 was therefore @code{malloc}ed separately, we now also @code{xfree}
5204 it explicitly.
5205
5206 @node sweep_bit_vectors_1, , sweep_strings, Garbage Collection - Step by Step
5207 @subsection @code{sweep_bit_vectors_1}
5208 @cindex @code{sweep_bit_vectors_1}
5209
5210 Bit vectors are also one of the rare types that are @code{malloc}ed
5211 individually. Consequently, while sweeping, all bit vectors that are no
5212 longer needed must be freed by hand. This is done
5213 the expected way: since they are all registered in a list called
5214 @code{all_bit_vectors}, all elements of that list are traversed,
5215 all unmarked bit vectors are unlinked and freed by calling @code{xfree}, and all
5216 remaining ones become unmarked.
5217 In addition, the bookkeeping information used for the garbage
5218 collector's output purposes is updated.
5219
5220 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp
5221 @section Integers and Characters 4718 @section Integers and Characters
5222 4719
5223 Integer and character Lisp objects are created from integers using the 4720 Integer and character Lisp objects are created from integers using the
5224 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent 4721 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5225 functions @code{make_int()} and @code{make_char()}. (These are actually 4722 functions @code{make_int()} and @code{make_char()}. (These are actually
5229 4726
5230 @code{XSETINT()} and the like will truncate values given to them that 4727 @code{XSETINT()} and the like will truncate values given to them that
5231 are too big; i.e. you won't get the value you expected but the tag bits 4728 are too big; i.e. you won't get the value you expected but the tag bits
5232 will at least be correct. 4729 will at least be correct.
5233 4730
5234 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp 4731 @node Allocation from Frob Blocks
5235 @section Allocation from Frob Blocks 4732 @section Allocation from Frob Blocks
5236 4733
5237 The uninitialized memory required by a @code{Lisp_Object} of a particular type 4734 The uninitialized memory required by a @code{Lisp_Object} of a particular type
5238 is allocated using 4735 is allocated using
5239 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the 4736 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the
5256 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the 4753 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5257 last frob block for space, and creates a new frob block if there is 4754 last frob block for space, and creates a new frob block if there is
5258 none. (There are actually two versions of these macros, one of which is 4755 none. (There are actually two versions of these macros, one of which is
5259 more defensive but less efficient and is used for error-checking.) 4756 more defensive but less efficient and is used for error-checking.)
5260 4757
5261 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp 4758 @node lrecords
5262 @section lrecords 4759 @section lrecords
5263 4760
5264 [see @file{lrecord.h}] 4761 [see @file{lrecord.h}]
5265 4762
5266 All lrecords have at the beginning of their structure a @code{struct 4763 All lrecords have at the beginning of their structure a @code{struct
5267 lrecord_header}. This just contains a type number and some flags, 4764 lrecord_header}. This just contains a pointer to a @code{struct
5268 including the mark bit. All builtin type numbers are defined as
5269 constants in @code{enum lrecord_type}, to allow the compiler to generate
5270 more efficient code for @code{@var{type}P}. The type number, thru the
5271 @code{lrecord_implementation_table}, gives access to a @code{struct
5272 lrecord_implementation}, which is a structure containing method pointers 4765 lrecord_implementation}, which is a structure containing method pointers
5273 and such. There is one of these for each type, and it is a global, 4766 and such. There is one of these for each type, and it is a global,
5274 constant, statically-declared structure that is declared in the 4767 constant, statically-declared structure that is declared in the
5275 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. 4768 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually
5276 4769 declares an array of two @code{struct lrecord_implementation}
5277 Simple lrecords (of type (b) above) just have a @code{struct 4770 structures. The first one contains all the standard method pointers,
4771 and is used in all normal circumstances. During garbage collection,
4772 however, the lrecord is @dfn{marked} by bumping its implementation
4773 pointer by one, so that it points to the second structure in the array.
4774 This structure contains a special indication in it that it's a
4775 @dfn{marked-object} structure: the finalize method is the special
4776 function @code{this_marks_a_marked_record()}, and all other methods are
4777 null pointers. At the end of garbage collection, all lrecords will
4778 either be reclaimed or unmarked by decrementing their implementation
4779 pointers, so this second structure pointer will never remain past
4780 garbage collection.
4781
4782 Simple lrecords (of type (c) above) just have a @code{struct
5278 lrecord_header} at their beginning. lcrecords, however, actually have a 4783 lrecord_header} at their beginning. lcrecords, however, actually have a
5279 @code{struct lcrecord_header}. This, in turn, has a @code{struct 4784 @code{struct lcrecord_header}. This, in turn, has a @code{struct
5280 lrecord_header} at its beginning, so sanity is preserved; but it also 4785 lrecord_header} at its beginning, so sanity is preserved; but it also
5281 has a pointer used to chain all lcrecords together, and a special ID 4786 has a pointer used to chain all lcrecords together, and a special ID
5282 field used to distinguish one lcrecord from another. (This field is used 4787 field used to distinguish one lcrecord from another. (This field is used
5300 type. 4805 type.
5301 4806
5302 Whenever you create an lrecord, you need to call either 4807 Whenever you create an lrecord, you need to call either
5303 @code{DEFINE_LRECORD_IMPLEMENTATION()} or 4808 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5304 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be 4809 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be
5305 specified in a @file{.c} file, at the top level. What this actually 4810 specified in a C file, at the top level. What this actually does is
5306 does is define and initialize the implementation structure for the 4811 define and initialize the implementation structure for the lrecord. (And
5307 lrecord. (And possibly declares a function @code{error_check_foo()} that 4812 possibly declares a function @code{error_check_foo()} that implements
5308 implements the @code{XFOO()} macro when error-checking is enabled.) The 4813 the @code{XFOO()} macro when error-checking is enabled.) The arguments
5309 arguments to the macros are the actual type name (this is used to 4814 to the macros are the actual type name (this is used to construct the C
5310 construct the C variable name of the lrecord implementation structure 4815 variable name of the lrecord implementation structure and related
5311 and related structures using the @samp{##} macro concatenation 4816 structures using the @samp{##} macro concatenation operator), a string
5312 operator), a string that names the type on the Lisp level (this may not 4817 that names the type on the Lisp level (this may not be the same as the C
5313 be the same as the C type name; typically, the C type name has 4818 type name; typically, the C type name has underscores, while the Lisp
5314 underscores, while the Lisp string has dashes), various method pointers, 4819 string has dashes), various method pointers, and the name of the C
5315 and the name of the C structure that contains the object. The methods 4820 structure that contains the object. The methods are used to encapsulate
5316 are used to encapsulate type-specific information about the object, such 4821 type-specific information about the object, such as how to print it or
5317 as how to print it or mark it for garbage collection, so that it's easy 4822 mark it for garbage collection, so that it's easy to add new object
5318 to add new object types without having to add a specific case for each 4823 types without having to add a specific case for each new type in a bunch
5319 new type in a bunch of different places. 4824 of different places.
5320 4825
5321 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and 4826 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5322 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is 4827 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5323 used for fixed-size object types and the latter is for variable-size 4828 used for fixed-size object types and the latter is for variable-size
5324 object types. Most object types are fixed-size; some complex 4829 object types. Most object types are fixed-size; some complex
5328 (Currently this is only used for keeping allocation statistics.) 4833 (Currently this is only used for keeping allocation statistics.)
5329 4834
5330 For the purpose of keeping allocation statistics, the allocation 4835 For the purpose of keeping allocation statistics, the allocation
5331 engine keeps a list of all the different types that exist. Note that, 4836 engine keeps a list of all the different types that exist. Note that,
5332 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is 4837 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5333 specified at top-level, there is no way for it to initialize the global 4838 specified at top-level, there is no way for it to add to the list of all
5334 data structures containing type information, like 4839 existing types. What happens instead is that each implementation
5335 @code{lrecord_implementations_table}. For this reason a call to 4840 structure contains in it a dynamically assigned number that is
5336 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file 4841 particular to that type. (Or rather, it contains a pointer to another
5337 containing @code{DEFINE_LRECORD_IMPLEMENTATION}, but instead of to the 4842 structure that contains this number. This evasiveness is done so that
5338 top level, to one of the init functions, typically 4843 the implementation structure can be declared const.) In the sweep stage
5339 @code{syms_of_@var{foo}.c}. @code{INIT_LRECORD_IMPLEMENTATION} must be 4844 of garbage collection, each lrecord is examined to see if its
5340 called before an object of this type is used. 4845 implementation structure has its dynamically-assigned number set. If
5341 4846 not, it must be a new type, and it is added to the list of known types
5342 The type number is also used to index into an array holding the number 4847 and a new number assigned. The number is used to index into an array
5343 of objects of each type and the total memory allocated for objects of 4848 holding the number of objects of each type and the total memory
5344 that type. The statistics in this array are computed during the sweep 4849 allocated for objects of that type. The statistics in this array are
5345 stage. These statistics are returned by the call to 4850 also computed during the sweep stage. These statistics are returned by
5346 @code{garbage-collect}. 4851 the call to @code{garbage-collect} and are printed out at the end of the
4852 loadup phase.
5347 4853
5348 Note that for every type defined with a @code{DEFINE_LRECORD_*()} 4854 Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5349 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()} 4855 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5350 somewhere in a @file{.h} file, and this @file{.h} file needs to be 4856 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5351 included by @file{inline.c}. 4857 included by @file{inline.c}.
5486 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should 4992 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should
5487 simply return the object's size in bytes, exactly as you might expect. 4993 simply return the object's size in bytes, exactly as you might expect.
5488 For an example, see the methods for window configurations and opaques. 4994 For an example, see the methods for window configurations and opaques.
5489 @end enumerate 4995 @end enumerate
5490 4996
5491 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp 4997 @node Low-level allocation
5492 @section Low-level allocation 4998 @section Low-level allocation
5493 4999
5494 Memory that you want to allocate directly should be allocated using 5000 Memory that you want to allocate directly should be allocated using
5495 @code{xmalloc()} rather than @code{malloc()}. This implements 5001 @code{xmalloc()} rather than @code{malloc()}. This implements
5496 error-checking on the return value, and once upon a time did some more 5002 error-checking on the return value, and once upon a time did some more
5547 XEmacs taps into them and issues a warning through the standard 5053 XEmacs taps into them and issues a warning through the standard
5548 warning system, when memory gets to 75%, 85%, and 95% full. 5054 warning system, when memory gets to 75%, 85%, and 95% full.
5549 (On some systems, the memory warnings are not functional.) 5055 (On some systems, the memory warnings are not functional.)
5550 5056
5551 Allocated memory that is going to be used to make a Lisp object 5057 Allocated memory that is going to be used to make a Lisp object
5552 is created using @code{allocate_lisp_storage()}. This just calls 5058 is created using @code{allocate_lisp_storage()}. This calls @code{xmalloc()}
5553 @code{xmalloc()}. It used to verify that the pointer to the memory can 5059 but also verifies that the pointer to the memory can fit into
5554 fit into a Lisp word, before the current Lisp object representation was 5060 a Lisp word (remember that some bits are taken away for a type
5555 introduced. @code{allocate_lisp_storage()} is called by 5061 tag and a mark bit). If not, an error is issued through @code{memory_full()}.
5556 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector 5062 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
5557 and bit-vector creation routines. These routines also call 5063 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
5558 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps 5064 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the
5559 statistics on how much memory is allocated, so that garbage-collection 5065 appropriate times; this keeps statistics on how much memory is
5560 can be invoked when the threshold is reached. 5066 allocated, so that garbage-collection can be invoked when the
5561 5067 threshold is reached.
5562 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp 5068
5069 @node Pure Space
5070 @section Pure Space
5071
5072 Not yet documented.
5073
5074 @node Cons
5563 @section Cons 5075 @section Cons
5564 5076
5565 Conses are allocated in standard frob blocks. The only thing to 5077 Conses are allocated in standard frob blocks. The only thing to
5566 note is that conses can be explicitly freed using @code{free_cons()} 5078 note is that conses can be explicitly freed using @code{free_cons()}
5567 and associated functions @code{free_list()} and @code{free_alist()}. This 5079 and associated functions @code{free_list()} and @code{free_alist()}. This
5571 generating extra objects and thereby triggering GC sooner. 5083 generating extra objects and thereby triggering GC sooner.
5572 However, you have to be @emph{extremely} careful when doing this. 5084 However, you have to be @emph{extremely} careful when doing this.
5573 If you mess this up, you will get BADLY BURNED, and it has happened 5085 If you mess this up, you will get BADLY BURNED, and it has happened
5574 before. 5086 before.
5575 5087
5576 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp 5088 @node Vector
5577 @section Vector 5089 @section Vector
5578 5090
5579 As mentioned above, each vector is @code{malloc()}ed individually, and 5091 As mentioned above, each vector is @code{malloc()}ed individually, and
5580 all are threaded through the variable @code{all_vectors}. Vectors are 5092 all are threaded through the variable @code{all_vectors}. Vectors are
5581 marked strangely during garbage collection, by kludging the size field. 5093 marked strangely during garbage collection, by kludging the size field.
5582 Note that the @code{struct Lisp_Vector} is declared with its 5094 Note that the @code{struct Lisp_Vector} is declared with its
5583 @code{contents} field being a @emph{stretchy} array of one element. It 5095 @code{contents} field being a @emph{stretchy} array of one element. It
5584 is actually @code{malloc()}ed with the right size, however, and access 5096 is actually @code{malloc()}ed with the right size, however, and access
5585 to any element through the @code{contents} array works fine. 5097 to any element through the @code{contents} array works fine.
5586 5098
5587 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp 5099 @node Bit Vector
5588 @section Bit Vector 5100 @section Bit Vector
5589 5101
5590 Bit vectors work exactly like vectors, except for more complicated 5102 Bit vectors work exactly like vectors, except for more complicated
5591 code to access an individual bit, and except for the fact that bit 5103 code to access an individual bit, and except for the fact that bit
5592 vectors are lrecords while vectors are not. (The only difference here is 5104 vectors are lrecords while vectors are not. (The only difference here is
5593 that there's an lrecord implementation pointer at the beginning and the 5105 that there's an lrecord implementation pointer at the beginning and the
5594 tag field in bit vector Lisp words is ``lrecord'' rather than 5106 tag field in bit vector Lisp words is ``lrecord'' rather than
5595 ``vector''.) 5107 ``vector''.)
5596 5108
5597 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp 5109 @node Symbol
5598 @section Symbol 5110 @section Symbol
5599 5111
5600 Symbols are also allocated in frob blocks. Symbols in the awful 5112 Symbols are also allocated in frob blocks. Note that the code
5601 horrible obarray structure are chained through their @code{next} field. 5113 exists for symbols to be either lrecords (category (c) above)
5114 or simple types (category (b) above), and are lrecords by
5115 default (I think), although there is no good reason for this.
5116
5117 Note that symbols in the awful horrible obarray structure are
5118 chained through their @code{next} field.
5602 5119
5603 Remember that @code{intern} looks up a symbol in an obarray, creating 5120 Remember that @code{intern} looks up a symbol in an obarray, creating
5604 one if necessary. 5121 one if necessary.
5605 5122
5606 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp 5123 @node Marker
5607 @section Marker 5124 @section Marker
5608 5125
5609 Markers are allocated in frob blocks, as usual. They are kept 5126 Markers are allocated in frob blocks, as usual. They are kept
5610 in a buffer unordered, but in a doubly-linked list so that they 5127 in a buffer unordered, but in a doubly-linked list so that they
5611 can easily be removed. (Formerly this was a singly-linked list, 5128 can easily be removed. (Formerly this was a singly-linked list,
5612 but in some cases garbage collection took an extraordinarily 5129 but in some cases garbage collection took an extraordinarily
5613 long time due to the O(N^2) time required to remove lots of 5130 long time due to the O(N^2) time required to remove lots of
5614 markers from a buffer.) Markers are removed from a buffer in 5131 markers from a buffer.) Markers are removed from a buffer in
5615 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. 5132 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
5616 5133
5617 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp 5134 @node String
5618 @section String 5135 @section String
5619 5136
5620 As mentioned above, strings are a special case. A string is logically 5137 As mentioned above, strings are a special case. A string is logically
5621 two parts, a fixed-size object (containing the length, property list, 5138 two parts, a fixed-size object (containing the length, property list,
5622 and a pointer to the actual data), and the actual data in the string. 5139 and a pointer to the actual data), and the actual data in the string.
5646 Note that there is one situation not handled: a string that is too big 5163 Note that there is one situation not handled: a string that is too big
5647 to fit into a string-chars block. Such strings, called @dfn{big 5164 to fit into a string-chars block. Such strings, called @dfn{big
5648 strings}, are all @code{malloc()}ed as their own block. (#### Although it 5165 strings}, are all @code{malloc()}ed as their own block. (#### Although it
5649 would make more sense for the threshold for big strings to be somewhat 5166 would make more sense for the threshold for big strings to be somewhat
5650 lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that 5167 lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that
5651 this was indeed the case formerly---indeed, the threshold was set at 5168 this was indeed the case formerly -- indeed, the threshold was set at
5652 1/8---but Mly forgot about this when rewriting things for 19.8.) 5169 1/8 -- but Mly forgot about this when rewriting things for 19.8.)
5653 5170
5654 Note also that the string data in string-chars blocks is padded as 5171 Note also that the string data in string-chars blocks is padded as
5655 necessary so that proper alignment constraints on the @code{struct 5172 necessary so that proper alignment constraints on the @code{struct
5656 Lisp_String} back pointers are maintained. 5173 Lisp_String} back pointers are maintained.
5657 5174
5673 string data (which would normally be obtained from the now-non-existent 5190 string data (which would normally be obtained from the now-non-existent
5674 @code{struct Lisp_String}) at the beginning of the dead string data gap. 5191 @code{struct Lisp_String}) at the beginning of the dead string data gap.
5675 The string compactor recognizes this special 0xFFFFFFFF marker and 5192 The string compactor recognizes this special 0xFFFFFFFF marker and
5676 handles it correctly. 5193 handles it correctly.
5677 5194
5678 @node Compiled Function, , String, Allocation of Objects in XEmacs Lisp 5195 @node Compiled Function
5679 @section Compiled Function 5196 @section Compiled Function
5680 5197
5681 Not yet documented. 5198 Not yet documented.
5682 5199
5683 5200 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
5684 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
5685 @chapter Dumping
5686
5687 @section What is dumping and its justification
5688
5689 The C code of XEmacs is just a Lisp engine with a lot of built-in
5690 primitives useful for writing an editor. The editor itself is written
5691 mostly in Lisp, and represents around 100K lines of code. Loading and
5692 executing the initialization of all this code takes a bit of time (five
5693 to ten times the usual startup time of current xemacs) and requires
5694 having all the lisp source files around. Having to reload them each
5695 time the editor is started would not be acceptable.
5696
5697 The traditional solution to this problem is called dumping: the build
5698 process first creates the lisp engine under the name @file{temacs}, then
5699 runs it until it has finished loading and initializing all the lisp
5700 code, and eventually creates a new executable called @file{xemacs}
5701 including both the object code in @file{temacs} and all the contents of
5702 the memory after the initialization.
5703
5704 This solution, while working, has a huge problem: the creation of the
5705 new executable from the actual contents of memory is an extremely
5706 system-specific process that is quite error-prone and interferes with a
5707 lot of system libraries (like malloc). It is even getting worse
5708 nowadays with libraries using constructors which are automatically
5709 called when the program is started (even before main()) which tend to
5710 crash when they are called multiple times, once before dumping and once
5711 after (IRIX 6.x libz.so pulls in some C++ image libraries thru
5712 dependencies which have this problem). Writing the dumper is also one
5713 of the most difficult parts of porting XEmacs to a new operating system.
5714 Basically, `dumping' is an operation that is just not officially
5715 supported on many operating systems.
5716
5717 The aim of the portable dumper is to solve the same problem as the
5718 system-specific dumper, that is to be able to reload quickly, using only
5719 a small number of files, the fully initialized lisp part of the editor,
5720 without any system-specific hacks.
5721
5722 @menu
5723 * Overview::
5724 * Data descriptions::
5725 * Dumping phase::
5726 * Reloading phase::
5727 * Remaining issues::
5728 @end menu
5729
5730 @node Overview, Data descriptions, Dumping, Dumping
5731 @section Overview
5732
5733 The portable dumping system has to:
5734
5735 @enumerate
5736 @item
5737 At dump time, write all initialized, non-quickly-rebuildable data to a
5738 file [Note: currently named @file{xemacs.dmp}, but the name will
5739 change], along with all the information needed for reloading.
5740
5741 @item
5742 When starting xemacs, reload the dump file, relocate it to its new
5743 starting address if needed, and reinitialize all pointers to this
5744 data. Also, rebuild all the quickly rebuildable data.
5745 @end enumerate
5746
5747 @node Data descriptions, Dumping phase, Overview, Dumping
5748 @section Data descriptions
5749
5750 The most complex task of the dumper is to be able to write lisp objects
5751 (lrecords) and C structs to disk and reload them at a different address,
5752 updating all the pointers they include in the process. This is done by
5753 using external data descriptions that give information about the layout
5754 of the structures in memory.
5755
5756 The specification of these descriptions is in @file{lrecord.h}. A description
5757 of an lrecord is an array of @code{struct lrecord_description}. Each of these
5758 structs includes a type, an offset in the structure and some optional
5759 parameters depending on the type. For instance, here is the string
5760 description:
5761
5762 @example
5763 static const struct lrecord_description string_description[] = @{
5764 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @},
5765 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
5766 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @},
5767 @{ XD_END @}
5768 @};
5769 @end example
5770
5771 The first line indicates a member of type Bytecount, which is used by
5772 the next, indirect directive. The second means ``there is a pointer to
5773 some opaque data in the field @code{data}''. The length of said data is
5774 given by the expression @code{XD_INDIRECT(0, 1)}, which means ``the value
5775 in the 0th line of the description (welcome to C) plus one''. The third
5776 line means ``there is a @code{Lisp_Object} member @code{plist} in the
5777 @code{Lisp_String} structure''. @code{XD_END} then ends the description.
5778
5779 This gives us all the information we need to move around what is pointed
5780 to by a structure (C or lrecord) and, by transitivity, everything that
5781 it points to. The only missing information for dumping is the size of
5782 the structure. For lrecords, this is part of the
5783 lrecord_implementation, so we don't need to duplicate it. For C
5784 structures we use a struct struct_description, which includes a size
5785 field and a pointer to an associated array of lrecord_description.
5786
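The way an indirect count gets resolved while walking a description can
be sketched in a few lines of C. This is only an illustration: the
@code{toy_} types, the encoding chosen for @code{TOY_XD_INDIRECT} and the
helper names are assumptions made for this sketch, not the definitions
found in @file{lrecord.h}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the description types. */
enum toy_xd_type
{
  TOY_XD_END,
  TOY_XD_BYTECOUNT,
  TOY_XD_OPAQUE_DATA_PTR,
  TOY_XD_LISP_OBJECT
};

/* Assumed encoding: "value found through description line LINE, plus
   DELTA" is packed into a negative number so that it cannot be
   confused with a literal (non-negative) count.  */
#define TOY_XD_INDIRECT(line, delta) (-1L - ((line) | ((long) (delta) << 8)))
#define TOY_XD_IS_INDIRECT(code) ((code) < 0)
#define TOY_XD_INDIRECT_LINE(code) ((-1L - (code)) & 255)
#define TOY_XD_INDIRECT_DELTA(code) ((-1L - (code)) >> 8)

struct toy_description
{
  enum toy_xd_type type;
  size_t offset;   /* offset of the member in the structure */
  long data1;      /* optional parameter, here a count */
};

/* A toy structure laid out like the string example above. */
struct toy_string
{
  long size;
  char *data;
  void *plist;
};

static const struct toy_description toy_string_description[] = {
  { TOY_XD_BYTECOUNT, offsetof (struct toy_string, size), 0 },
  { TOY_XD_OPAQUE_DATA_PTR, offsetof (struct toy_string, data),
    TOY_XD_INDIRECT (0, 1) },
  { TOY_XD_LISP_OBJECT, offsetof (struct toy_string, plist), 0 },
  { TOY_XD_END, 0, 0 }
};

/* Return CODE itself if it is a literal count; otherwise read the
   count from the member described by the referenced line and add
   the delta.  */
long
toy_resolve_count (long code, const void *obj,
                   const struct toy_description *desc)
{
  if (!TOY_XD_IS_INDIRECT (code))
    return code;

  const struct toy_description *ref = &desc[TOY_XD_INDIRECT_LINE (code)];
  assert (ref->type == TOY_XD_BYTECOUNT);
  long value = *(const long *) ((const char *) obj + ref->offset);
  return value + TOY_XD_INDIRECT_DELTA (code);
}

/* Sum the sizes of all opaque data blocks OBJ points to. */
long
toy_opaque_bytes (const void *obj, const struct toy_description *desc)
{
  long total = 0;
  for (int i = 0; desc[i].type != TOY_XD_END; i++)
    if (desc[i].type == TOY_XD_OPAQUE_DATA_PTR)
      total += toy_resolve_count (desc[i].data1, obj, desc);
  return total;
}
```

For a toy string whose @code{size} member is 5, the opaque data block
is reported as 6 bytes long, i.e. the byte count plus one.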
5787 @node Dumping phase, Reloading phase, Data descriptions, Dumping
5788 @section Dumping phase
5789
5790 Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is
5791 invoked from @code{Fdump_emacs} (in @file{emacs.c}). This function performs a number
5792 of tasks.
5793
5794 @menu
5795 * Object inventory::
5796 * Address allocation::
5797 * The header::
5798 * Data dumping::
5799 * Pointers dumping::
5800 @end menu
5801
5802 @node Object inventory, Address allocation, Dumping phase, Dumping phase
5803 @subsection Object inventory
5804
5805 The first task is to build the list of the objects to dump. This
5806 includes:
5807
5808 @itemize @bullet
5809 @item lisp objects
5810 @item C structures
5811 @end itemize
5812
5813 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
5814 of C structs are kept together) which includes a pointer to the first
5815 object of the group, the per-object size and the count of objects in the
5816 group, along with some other information which is initialized later.
5817
5818 These entries are linked together in @code{pdump_entry_list} structures
5819 and can be enumerated thru either:
5820
5821 @enumerate
5822 @item
5823 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
5824 per lrecord type, indexed by type number.
5825
5826 @item
5827 the @code{pdump_opaque_data_list}, used for the opaque data which does
5828 not include pointers, and hence does not need descriptions.
5829
5830 @item
5831 the @code{pdump_struct_table}, which is a vector of
5832 @code{struct_description}/@code{pdump_entry_list} pairs, used for
5833 non-opaque C structures.
5834 @end enumerate
5835
5836 This uses a marking strategy similar to the garbage collector. Some
5837 differences though:
5838
5839 @enumerate
5840 @item
5841 We do not use the mark bit (which does not exist for C structures
5842 anyway), we use a big hash table instead.
5843
5844 @item
5845 We do not use the mark function of lrecords but instead rely on the
5846 external descriptions. This happens essentially because we need to
5847 follow pointers to C structures and opaque data in addition to
5848 Lisp_Object members.
5849 @end enumerate
5850
5851 This is done by @code{pdump_register_object}, which handles Lisp_Object
5852 variables, and @code{pdump_register_struct}, which handles C structures; both
5853 delegate the description management to @code{pdump_register_sub}.
5854
5855 The hash table doubles as a map from objects to @code{pdump_entry_list_elmt}s
5856 (i.e. it allows us to look up a @code{pdump_entry_list_elmt} given the object it
5857 points to). Entries are added with @code{pdump_add_entry()} and looked up with
5858 @code{pdump_get_entry()}. There is no need for entry removal. The hash
5859 value is computed quite basically from the object pointer by
5860 @code{pdump_make_hash()}.
5861
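A pointer-keyed table with insertion and lookup but no removal can be
sketched as follows. The table size, the hash function and all the
@code{toy_} names are assumptions chosen for the illustration; they do
not reproduce the code in @file{dumper.c}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two so that masking can be used. */
#define TOY_TABLE_SIZE 1024

struct toy_entry
{
  const void *obj;   /* key: address of the registered object,
                        NULL marks an empty slot */
  size_t size;       /* per-object size */
  int count;         /* number of objects in the group */
};

static struct toy_entry toy_table[TOY_TABLE_SIZE];
static int toy_table_used;

/* A deliberately basic hash: drop the low bits, which carry little
   information because of alignment, and mask to the table size.  */
size_t
toy_make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) & (TOY_TABLE_SIZE - 1));
}

/* Look up the entry registered for OBJ, or return NULL. */
struct toy_entry *
toy_get_entry (const void *obj)
{
  size_t pos = toy_make_hash (obj);
  while (toy_table[pos].obj != NULL)
    {
      if (toy_table[pos].obj == obj)
        return &toy_table[pos];
      pos = (pos + 1) & (TOY_TABLE_SIZE - 1);   /* linear probing */
    }
  return NULL;
}

/* Register OBJ.  Since entries are never removed, an empty slot
   always terminates a probe sequence.  */
struct toy_entry *
toy_add_entry (const void *obj, size_t size, int count)
{
  assert (obj != NULL && toy_get_entry (obj) == NULL);
  assert (toy_table_used < TOY_TABLE_SIZE - 1);   /* keep one slot free */

  size_t pos = toy_make_hash (obj);
  while (toy_table[pos].obj != NULL)
    pos = (pos + 1) & (TOY_TABLE_SIZE - 1);

  toy_table[pos].obj = obj;
  toy_table[pos].size = size;
  toy_table[pos].count = count;
  toy_table_used++;
  return &toy_table[pos];
}
```

A successful lookup doubles as the ``already marked'' test during the
inventory, which is why no separate mark bit is needed.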
5862 The roots for the marking are:
5863
5864 @enumerate
5865 @item
5866 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
5867 call for protected variables we do not want to dump).
5868
5869 @item
5870 the @code{pdump_wire}'d variables (@code{staticpro} is equivalent to
5871 @code{staticpro_nodump()} + @code{pdump_wire()}).
5872
5873 @item
5874 the @code{dumpstruct}'ed variables, which point to C structures.
5875 @end enumerate
5876
5877 This does not include the GCPRO'ed variables, the specbinds, the
5878 catchtags, the backlist, the redisplay or the profiling info, since we
5879 do not want to rebuild the actual chain of lisp calls which leads up to
5880 the dump-emacs call, only the global variables.
5881
5882 Weak lists and weak hash tables are dumped as if they were their
5883 non-weak equivalent (without changing their type, of course). This has
5884 not yet been a problem.
5885
5886 @node Address allocation, The header, Object inventory, Dumping phase
5887 @subsection Address allocation
5888
5889
5890 The next step is to allocate the offsets of each of the objects in the
5891 final dump file. This is done by @code{pdump_allocate_offset()} which
5892 is called indirectly by @code{pdump_scan_by_alignment()}.
5893
5894 The strategy to deal with alignment problems uses these facts:
5895
5896 @enumerate
5897 @item
5898 real world alignment requirements are powers of two.
5899
5900 @item
5901 the C compiler is required to adjust the size of a struct so that you
5902 can have an array of them next to each other. This means you can obtain
5903 an upper bound on the alignment requirement of a given structure by
5904 finding the largest power of two of which its size is a multiple.
5905
5906 @item
5907 the non-variant part of variable size lrecords has an alignment
5908 requirement of 4.
5909 @end enumerate
5910
5911 Hence, for each lrecord type, C struct type or opaque data block the
5912 alignment requirement is computed as a power of two, with a minimum of
5913 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the
5914 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
5915 first. This ensures the best packing.
5916
5917 The maximum alignment requirement we take into account is 2^8.
5918
5919 @code{pdump_allocate_offset()} only has to do a linear allocation,
5920 starting at offset 256 (this leaves room for the header and keeps the
5921 alignments happy).
5922
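The alignment estimate and the linear allocation described in this
section can be sketched as follows. The @code{toy_} names and the
@code{struct toy_group} layout are assumptions for the illustration
only; the constants 2^8 and 256 are the ones given in the text.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ALIGN_SHIFT 8   /* alignments above 2^8 are ignored */
#define TOY_BASE_OFFSET 256     /* room left for the header */

/* Return log2 of the largest power of two, capped at 2^8, of which
   SIZE is a multiple.  This is an upper bound on the alignment the
   structure can require.  */
int
toy_align_shift_for_size (size_t size)
{
  assert (size > 0);
  int shift = 0;
  while (shift < TOY_MAX_ALIGN_SHIFT
         && (size & ((size_t) 1 << shift)) == 0)
    shift++;
  return shift;
}

/* Hypothetical per-group record: SIZE bytes per object, COUNT objects. */
struct toy_group
{
  size_t size;
  int count;
  int align_shift;
  size_t offset;    /* filled in by toy_allocate_offsets */
};

/* Assign file offsets, most strictly aligned groups first, and return
   the first unused offset.  The rounding step is normally a no-op
   because of the ordering; it is kept so that the sketch stays correct
   for groups whose total size is not a multiple of their alignment.  */
size_t
toy_allocate_offsets (struct toy_group *groups, int n)
{
  size_t cur = TOY_BASE_OFFSET;
  for (int shift = TOY_MAX_ALIGN_SHIFT; shift >= 0; shift--)
    for (int i = 0; i < n; i++)
      if (groups[i].align_shift == shift)
        {
          size_t mask = ((size_t) 1 << shift) - 1;
          cur = (cur + mask) & ~mask;
          groups[i].offset = cur;
          cur += groups[i].size * (size_t) groups[i].count;
        }
  return cur;
}
```

Scanning from the highest alignment down means each group starts where
the previous one ended, so hardly any padding is needed.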
5923 @node The header, Data dumping, Address allocation, Dumping phase
5924 @subsection The header
5925
5926 The next step creates the file and writes a header with a signature and
5927 some miscellaneous information in it (number of staticpro, number of assigned
5928 lrecord types, etc.). The reloc_address field, which indicates at
5929 which address the file should be loaded if we want to avoid post-reload
5930 relocation, is set to 0. It then seeks to offset 256 (base offset for
5931 the objects).
5932
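A header of this kind can be sketched as a small structure copied to
the start of a zero-filled 256-byte area. The signature string, the
field names and the field widths below are assumptions made for the
sketch; only the presence of a signature, of some counts and of a
@code{reloc_address} set to 0 comes from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_DUMP_SIGNATURE "ToyDump1"   /* hypothetical, 8 bytes */
#define TOY_DUMP_BASE_OFFSET 256        /* objects start here */

struct toy_dump_header
{
  char signature[8];
  uint32_t nb_staticpro;       /* number of staticpro'ed variables */
  uint32_t nb_lrecord_types;   /* number of assigned lrecord types */
  uint64_t reloc_address;      /* 0: no preferred loading address */
};

/* Build the header area that precedes the first dumped object. */
void
toy_fill_header (unsigned char page[TOY_DUMP_BASE_OFFSET],
                 uint32_t nb_staticpro, uint32_t nb_lrecord_types)
{
  struct toy_dump_header h;
  assert (sizeof h <= TOY_DUMP_BASE_OFFSET);

  memset (page, 0, TOY_DUMP_BASE_OFFSET);
  memset (&h, 0, sizeof h);
  memcpy (h.signature, TOY_DUMP_SIGNATURE, sizeof h.signature);
  h.nb_staticpro = nb_staticpro;
  h.nb_lrecord_types = nb_lrecord_types;
  h.reloc_address = 0;
  memcpy (page, &h, sizeof h);
}

/* Check that a loaded file is long enough and carries the signature. */
int
toy_header_is_valid (const unsigned char *page, size_t len)
{
  return len >= TOY_DUMP_BASE_OFFSET
    && memcmp (page, TOY_DUMP_SIGNATURE, 8) == 0;
}
```

A check like @code{toy_header_is_valid()} only guards against files
that are not dump files at all; it does not tie a dump file to a
particular executable.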
5933 @node Data dumping, Pointers dumping, The header, Dumping phase
5934 @subsection Data dumping
5935
5936 The data is dumped in the same order as the addresses were allocated by
5937 @code{pdump_dump_data()}, called from @code{pdump_scan_by_alignment()}.
5938 This function copies the data to a temporary buffer, relocates all
5939 pointers in the object to the addresses allocated in step Address
5940 Allocation, and writes it to the file. Using the same order means that,
5941 if we are careful with lrecords whose size is not a multiple of 4, we
5942 are guaranteed that the object is always written at the offset in the file
5943 allocated in step Address Allocation.
5944
5945 @node Pointers dumping, , Data dumping, Dumping phase
5946 @subsection Pointers dumping
5947
5948 A number of tables needed to properly reassign the global pointers are
5949 then written. They are:
5950
5951 @enumerate
5952 @item
5953 the staticpro array
5954 @item
5955 the dumpstruct array
5956 @item
5957 the lrecord_implementations_table array
5958 @item
5959 a vector of all the offsets to the objects in the file that include a
5960 description (for faster relocation at reload time)
5961 @item
5962 the pdump_wired and pdump_wired_list arrays
5963 @end enumerate
5964
5965 For each of the arrays we write both the pointer to the variables and
5966 the relocated offset of the object they point to. Since these variables
5967 are global, the pointers are still valid when restarting the program and
5968 are used to regenerate the global pointers.
5969
5970 The @code{pdump_wired_list} array is a special case. The variables it
5971 points to are the heads of weak linked lists of lisp objects of the same
5972 type. Not all objects of this list are dumped so the relocated pointer
5973 we associate with them points to the first dumped object of the list, or
5974 Qnil if none is available. This is also the reason why they are not
5975 used as roots for the purpose of object enumeration.
5976
5977 This is the end of the dumping part.
5978
5979 @node Reloading phase, Remaining issues, Dumping phase, Dumping
5980 @section Reloading phase
5981
5982 @subsection File loading
5983
5984 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
5985 least 4096), or if mmap is unavailable or fails, a 256-byte aligned
5986 malloc is done and the file is loaded.
5987
5988 Some variables are reinitialized from the values found in the header.
5989
5990 The difference between the actual loading address and the reloc_address
5991 is computed and will be used for all the relocations.
5992
5993
5994 @subsection Putting back the staticvec
5995
5996 The staticvec array is memcpy'd from the file and the variables it
5997 points to are reset to the relocated object addresses.
5998
5999
6000 @subsection Putting back the dumpstructed variables
6001
6002 The variables pointed to by dumpstruct in the dump phase are reset to
6003 the right relocated object addresses.
6004
6005
6006 @subsection lrecord_implementations_table
6007
6008 The lrecord_implementations_table is reset to its dump time state and
6009 the right lrecord_type_index values are put in.
6010
6011
6012 @subsection Object relocation
6013
6014 All the objects are relocated using their description and their offset
6015 by @code{pdump_reloc_one}. This step is unnecessary if the
6016 reloc_address is equal to the file loading address.
6017
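The relocation of one object amounts to adding the same delta to each
pointer member listed in its description. The sketch below assumes
that the pointer members are given as a plain array of offsets and
that null pointers are left untouched; both points, like the
@code{toy_} name, are assumptions and not a rendering of
@code{pdump_reloc_one}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add DELTA (actual loading address minus reloc_address) to the N
   pointer-sized members of OBJ found at PTR_OFFSETS.  */
void
toy_relocate_fields (char *obj, const size_t *ptr_offsets, int n,
                     intptr_t delta)
{
  assert (obj != NULL);
  if (delta == 0)
    return;   /* loaded at the expected address: nothing to write */

  for (int i = 0; i < n; i++)
    {
      uintptr_t value;
      /* memcpy avoids any assumption about the member's alignment. */
      memcpy (&value, obj + ptr_offsets[i], sizeof value);
      if (value != 0)   /* assumption: 0 stands for a null pointer */
        value += (uintptr_t) delta;
      memcpy (obj + ptr_offsets[i], &value, sizeof value);
    }
}
```

When the delta is zero the function returns before touching the
object, so the pages holding the dumped data are never written to.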
6018
6019 @subsection Putting back the pdump_wire and pdump_wire_list variables
6020
6021 Same as Putting back the dumpstructed variables.
6022
6023
6024 @subsection Reorganize the hash tables
6025
6026 Since some of the hash values in the lisp hash tables are
6027 address-dependent, their layout is now wrong. So we go through each of
6028 them and have them resorted by calling @code{pdump_reorganize_hash_table}.
6029
6030 @node Remaining issues, , Reloading phase, Dumping
6031 @section Remaining issues
6032
6033 The build process will have to start a post-dump xemacs, ask it for the
6034 loading address (which will, hopefully, always be the same between
6035 different xemacs invocations) and relocate the file to the new address.
6036 This way the object relocation phase will not have to be done, which
6037 means no writes in the objects and that, because of the use of mmap, the
6038 dumped data will be shared between all the xemacs processes running on the
6039 computer.
6040
6041 Some executable signature will be necessary to ensure that a given dump
6042 file is really associated with a given executable, or random crashes
6043 will occur. Perhaps a random number set at compile or configure time thru
6044 a define would do. This will also allow for having differently-compiled xemacsen
6045 on the same system (mule and no-mule come to mind).
6046
6047 The DOC file contents should probably end up in the dump file.
6048
6049
6050 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
6051 @chapter Events and the Event Loop 5201 @chapter Events and the Event Loop
6052 5202
6053 @menu 5203 @menu
6054 * Introduction to Events:: 5204 * Introduction to Events::
6055 * Main Loop:: 5205 * Main Loop::
6059 * Other Event Loop Functions:: 5209 * Other Event Loop Functions::
6060 * Converting Events:: 5210 * Converting Events::
6061 * Dispatching Events; The Command Builder:: 5211 * Dispatching Events; The Command Builder::
6062 @end menu 5212 @end menu
6063 5213
6064 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop 5214 @node Introduction to Events
6065 @section Introduction to Events 5215 @section Introduction to Events
6066 5216
6067 An event is an object that encapsulates information about an 5217 An event is an object that encapsulates information about an
6068 interesting occurrence in the operating system. Events are 5218 interesting occurrence in the operating system. Events are
6069 generated either by user action, direct (e.g. typing on the 5219 generated either by user action, direct (e.g. typing on the
6093 XEmacs has its own types of events (called @dfn{Emacs events}), 5243 XEmacs has its own types of events (called @dfn{Emacs events}),
6094 which provides an abstract layer on top of the system-dependent 5244 which provides an abstract layer on top of the system-dependent
6095 nature of the most basic events that are received. Part of the 5245 nature of the most basic events that are received. Part of the
6096 complex nature of the XEmacs event collection process involves 5246 complex nature of the XEmacs event collection process involves
6097 converting from the operating-system events into the proper 5247 converting from the operating-system events into the proper
6098 Emacs events---there may not be a one-to-one correspondence. 5248 Emacs events -- there may not be a one-to-one correspondence.
6099 5249
6100 Emacs events are documented in @file{events.h}; I'll discuss them 5250 Emacs events are documented in @file{events.h}; I'll discuss them
6101 later. 5251 later.
6102 5252
6103 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop 5253 @node Main Loop
6104 @section Main Loop 5254 @section Main Loop
6105 5255
6106 The @dfn{command loop} is the top-level loop that the editor is always 5256 The @dfn{command loop} is the top-level loop that the editor is always
6107 running. It loops endlessly, calling @code{next-event} to retrieve an 5257 running. It loops endlessly, calling @code{next-event} to retrieve an
6108 event and @code{dispatch-event} to execute it. @code{dispatch-event} does 5258 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6118 one console), and the engine that looks up keystrokes and 5268 one console), and the engine that looks up keystrokes and
6119 constructs full key sequences is called the @dfn{command builder}. 5269 constructs full key sequences is called the @dfn{command builder}.
6120 This is documented elsewhere. 5270 This is documented elsewhere.
6121 5271
6122 The guts of the command loop are in @code{command_loop_1()}. This 5272 The guts of the command loop are in @code{command_loop_1()}. This
6123 function doesn't catch errors, though---that's the job of 5273 function doesn't catch errors, though -- that's the job of
6124 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping) 5274 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
6125 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never 5275 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never
6126 returns, but may get thrown out of. 5276 returns, but may get thrown out of.
6127 5277
6128 When an error occurs, @code{cmd_error()} is called, which usually 5278 When an error occurs, @code{cmd_error()} is called, which usually
6165 wrapper similar to @code{command_loop_2()}. Note also that 5315 wrapper similar to @code{command_loop_2()}. Note also that
6166 @code{initial_command_loop()} sets up a catch for @code{top-level} when 5316 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6167 invoking @code{top_level_1()}, just like when it invokes 5317 invoking @code{top_level_1()}, just like when it invokes
6168 @code{command_loop_2()}. 5318 @code{command_loop_2()}.
6169 5319
6170 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop 5320 @node Specifics of the Event Gathering Mechanism
6171 @section Specifics of the Event Gathering Mechanism 5321 @section Specifics of the Event Gathering Mechanism
6172 5322
6173 Here is an approximate diagram of the collection processes 5323 Here is an approximate diagram of the collection processes
6174 at work in XEmacs, under TTY's (TTY's are simpler than X 5324 at work in XEmacs, under TTY's (TTY's are simpler than X
6175 so we'll look at this first): 5325 so we'll look at this first):
6404 which repeatedly calls `next-event' 5554 which repeatedly calls `next-event'
6405 and then dispatches the event 5555 and then dispatches the event
6406 using `dispatch-event' 5556 using `dispatch-event'
6407 @end example 5557 @end example
6408 5558
6409 @node Specifics About the Emacs Event, The Event Stream Callback Routines, Specifics of the Event Gathering Mechanism, Events and the Event Loop 5559 @node Specifics About the Emacs Event
6410 @section Specifics About the Emacs Event 5560 @section Specifics About the Emacs Event
6411 5561
6412 @node The Event Stream Callback Routines, Other Event Loop Functions, Specifics About the Emacs Event, Events and the Event Loop 5562 @node The Event Stream Callback Routines
6413 @section The Event Stream Callback Routines 5563 @section The Event Stream Callback Routines
6414 5564
6415 @node Other Event Loop Functions, Converting Events, The Event Stream Callback Routines, Events and the Event Loop 5565 @node Other Event Loop Functions
6416 @section Other Event Loop Functions 5566 @section Other Event Loop Functions
6417 5567
6418 @code{detect_input_pending()} and @code{input-pending-p} look for 5568 @code{detect_input_pending()} and @code{input-pending-p} look for
6419 input by calling @code{event_stream->event_pending_p} and looking in 5569 input by calling @code{event_stream->event_pending_p} and looking in
6420 @code{[V]unread-command-event} and the @code{command_event_queue} (they 5570 @code{[V]unread-command-event} and the @code{command_event_queue} (they
6432 @code{read-char} calls @code{next-command-event} and uses 5582 @code{read-char} calls @code{next-command-event} and uses
6433 @code{event_to_character()} to return the character equivalent. With 5583 @code{event_to_character()} to return the character equivalent. With
6434 the right kind of input method support, it is possible for (read-char) 5584 the right kind of input method support, it is possible for (read-char)
6435 to return a Kanji character. 5585 to return a Kanji character.
6436 5586
6437 @node Converting Events, Dispatching Events; The Command Builder, Other Event Loop Functions, Events and the Event Loop 5587 @node Converting Events
6438 @section Converting Events 5588 @section Converting Events
6439 5589
6440 @code{character_to_event()}, @code{event_to_character()}, 5590 @code{character_to_event()}, @code{event_to_character()},
6441 @code{event-to-character}, and @code{character-to-event} convert between 5591 @code{event-to-character}, and @code{character-to-event} convert between
6442 characters and keypress events corresponding to the characters. If the 5592 characters and keypress events corresponding to the characters. If the
6443 event was not a keypress, @code{event_to_character()} returns -1 and 5593 event was not a keypress, @code{event_to_character()} returns -1 and
6444 @code{event-to-character} returns @code{nil}. These functions convert 5594 @code{event-to-character} returns @code{nil}. These functions convert
6445 between character representation and the split-up event representation 5595 between character representation and the split-up event representation
6446 (keysym plus mod keys). 5596 (keysym plus mod keys).
6447 5597
6448 @node Dispatching Events; The Command Builder, , Converting Events, Events and the Event Loop 5598 @node Dispatching Events; The Command Builder
6449 @section Dispatching Events; The Command Builder 5599 @section Dispatching Events; The Command Builder
6450 5600
6451 Not yet documented. 5601 Not yet documented.
6452 5602
6453 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top 5603 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
6458 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: 5608 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
6459 * Simple Special Forms:: 5609 * Simple Special Forms::
6460 * Catch and Throw:: 5610 * Catch and Throw::
6461 @end menu 5611 @end menu
6462 5612
6463 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings 5613 @node Evaluation
6464 @section Evaluation 5614 @section Evaluation
6465 5615
6466 @code{Feval()} evaluates the form (a Lisp object) that is passed to 5616 @code{Feval()} evaluates the form (a Lisp object) that is passed to
6467 it. Note that evaluation is only non-trivial for two types of objects: 5617 it. Note that evaluation is only non-trivial for two types of objects:
6468 symbols and conses. A symbol is evaluated simply by calling 5618 symbols and conses. A symbol is evaluated simply by calling
6527 @code{funcall_compiled_function()} calls the real byte-code interpreter 5677 @code{funcall_compiled_function()} calls the real byte-code interpreter
6528 @code{execute_optimized_program()} on the byte-code instructions, which 5678 @code{execute_optimized_program()} on the byte-code instructions, which
6529 are converted into an internal form for faster execution. 5679 are converted into an internal form for faster execution.
6530 5680
6531 When a compiled function is executed for the first time by 5681 When a compiled function is executed for the first time by
6532 @code{funcall_compiled_function()}, or during the dump phase of building 5682 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed
6533 XEmacs, the byte-code instructions are converted from a 5683 during the dump phase of building XEmacs, the byte-code instructions are
6534 @code{Lisp_String} (which is inefficient to access, especially in the 5684 converted from a @code{Lisp_String} (which is inefficient to access,
6535 presence of MULE) into a @code{Lisp_Opaque} object containing an array 5685 especially in the presence of MULE) into a @code{Lisp_Opaque} object
6536 of unsigned char, which can be directly executed by the byte-code 5686 containing an array of unsigned char, which can be directly executed by
6537 interpreter. At this time the byte code is also analyzed for validity 5687 the byte-code interpreter. At this time the byte code is also analyzed
6538 and transformed into a more optimized form, so that 5688 for validity and transformed into a more optimized form, so that
6539 @code{execute_optimized_program()} can really fly. 5689 @code{execute_optimized_program()} can really fly.
6540 5690
6541 Here are some of the optimizations performed by the internal byte-code 5691 Here are some of the optimizations performed by the internal byte-code
6542 transformer: 5692 transformer:
6543 @enumerate 5693 @enumerate
6548 References to the @code{constants} array that will be used as a Lisp 5698 References to the @code{constants} array that will be used as a Lisp
6549 variable are checked for being correct non-constant (i.e. not @code{t}, 5699 variable are checked for being correct non-constant (i.e. not @code{t},
6550 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter 5700 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
6551 doesn't have to. 5701 doesn't have to.
6552 @item 5702 @item
6553 The maximum number of variable bindings in the byte-code is 5703 The maxiumum number of variable bindings in the byte-code is
6554 pre-computed, so that space on the @code{specpdl} stack can be 5704 pre-computed, so that space on the @code{specpdl} stack can be
6555 pre-reserved once for the whole function execution. 5705 pre-reserved once for the whole function execution.
6556 @item 5706 @item
6557 All byte-code jumps are relative to the current program counter instead 5707 All byte-code jumps are relative to the current program counter instead
6558 of the start of the program, thereby saving a register. 5708 of the start of the program, thereby saving a register.
6588 @code{call3()} call a function, passing it the argument(s) given (the 5738 @code{call3()} call a function, passing it the argument(s) given (the
6589 arguments are given as separate C arguments rather than being passed as 5739 arguments are given as separate C arguments rather than being passed as
6590 an array). @code{apply1()} uses @code{Fapply()} while the others use 5740 an array). @code{apply1()} uses @code{Fapply()} while the others use
6591 @code{Ffuncall()} to do the real work. 5741 @code{Ffuncall()} to do the real work.
6592 5742
6593 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings 5743 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
6594 @section Dynamic Binding; The specbinding Stack; Unwind-Protects 5744 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
6595 5745
6596 @example 5746 @example
6597 struct specbinding 5747 struct specbinding
6598 @{ 5748 @{
6642 a local-variable binding (@code{func} is 0, @code{symbol} is not 5792 a local-variable binding (@code{func} is 0, @code{symbol} is not
6643 @code{nil}, and @code{old_value} holds the old value, which is stored as 5793 @code{nil}, and @code{old_value} holds the old value, which is stored as
6644 the symbol's value). 5794 the symbol's value).
6645 @end enumerate 5795 @end enumerate
6646 5796
6647 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings 5797 @node Simple Special Forms
6648 @section Simple Special Forms 5798 @section Simple Special Forms
6649 5799
6650 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, 5800 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
6651 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, 5801 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
6652 @code{let*}, @code{let}, @code{while} 5802 @code{let*}, @code{let}, @code{while}
6654 All of these are very simple and work as expected, calling 5804 All of these are very simple and work as expected, calling
6655 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of 5805 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
6656 @code{let} and @code{let*}) using @code{specbind()} to create bindings 5806 @code{let} and @code{let*}) using @code{specbind()} to create bindings
6657 and @code{unbind_to()} to undo the bindings when finished. 5807 and @code{unbind_to()} to undo the bindings when finished.
6658 5808
6659 Note that, with the exception of @code{Fprogn}, these functions are 5809 Note that, with the exeption of @code{Fprogn}, these functions are
6660 typically called in real life only in interpreted code, since the byte 5810 typically called in real life only in interpreted code, since the byte
6661 compiler knows how to convert calls to these functions directly into 5811 compiler knows how to convert calls to these functions directly into
6662 byte code. 5812 byte code.
6663 5813
6664 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings 5814 @node Catch and Throw
6665 @section Catch and Throw 5815 @section Catch and Throw
6666 5816
6667 @example 5817 @example
6668 struct catchtag 5818 struct catchtag
6669 @{ 5819 @{
6727 * Introduction to Symbols:: 5877 * Introduction to Symbols::
6728 * Obarrays:: 5878 * Obarrays::
6729 * Symbol Values:: 5879 * Symbol Values::
6730 @end menu 5880 @end menu
6731 5881
6732 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables 5882 @node Introduction to Symbols
6733 @section Introduction to Symbols 5883 @section Introduction to Symbols
6734 5884
6735 A symbol is basically just an object with four fields: a name (a 5885 A symbol is basically just an object with four fields: a name (a
6736 string), a value (some Lisp object), a function (some Lisp object), and 5886 string), a value (some Lisp object), a function (some Lisp object), and
6737 a property list (usually a list of alternating keyword/value pairs). 5887 a property list (usually a list of alternating keyword/value pairs).
6744 there can be a distinct function and variable with the same name. The 5894 there can be a distinct function and variable with the same name. The
6745 property list is used as a more general mechanism of associating 5895 property list is used as a more general mechanism of associating
6746 additional values with particular names, and once again the namespace is 5896 additional values with particular names, and once again the namespace is
6747 independent of the function and variable namespaces. 5897 independent of the function and variable namespaces.
6748 5898
6749 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables 5899 @node Obarrays
6750 @section Obarrays 5900 @section Obarrays
6751 5901
6752 The identity of symbols with their names is accomplished through a 5902 The identity of symbols with their names is accomplished through a
6753 structure called an obarray, which is just a poorly-implemented hash 5903 structure called an obarray, which is just a poorly-implemented hash
6754 table mapping from strings to symbols whose name is that string. (I say 5904 table mapping from strings to symbols whose name is that string. (I say
6811 a new one, and @code{unintern} to remove a symbol from an obarray. This 5961 a new one, and @code{unintern} to remove a symbol from an obarray. This
6812 returns the removed symbol. (Remember: You can't put the symbol back 5962 returns the removed symbol. (Remember: You can't put the symbol back
6813 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols 5963 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
6814 in an obarray. 5964 in an obarray.
6815 5965
6816 @node Symbol Values, , Obarrays, Symbols and Variables 5966 @node Symbol Values
6817 @section Symbol Values 5967 @section Symbol Values
6818 5968
6819 The value field of a symbol normally contains a Lisp object. However, 5969 The value field of a symbol normally contains a Lisp object. However,
6820 a symbol can be @dfn{unbound}, meaning that it logically has no value. 5970 a symbol can be @dfn{unbound}, meaning that it logically has no value.
6821 This is internally indicated by storing a special Lisp object, called 5971 This is internally indicated by storing a special Lisp object, called
6866 * Markers and Extents:: Tagging locations within a buffer. 6016 * Markers and Extents:: Tagging locations within a buffer.
6867 * Bufbytes and Emchars:: Representation of individual characters. 6017 * Bufbytes and Emchars:: Representation of individual characters.
6868 * The Buffer Object:: The Lisp object corresponding to a buffer. 6018 * The Buffer Object:: The Lisp object corresponding to a buffer.
6869 @end menu 6019 @end menu
6870 6020
6871 @node Introduction to Buffers, The Text in a Buffer, Buffers and Textual Representation, Buffers and Textual Representation 6021 @node Introduction to Buffers
6872 @section Introduction to Buffers 6022 @section Introduction to Buffers
6873 6023
6874 A buffer is logically just a Lisp object that holds some text. 6024 A buffer is logically just a Lisp object that holds some text.
6875 In this, it is like a string, but a buffer is optimized for 6025 In this, it is like a string, but a buffer is optimized for
6876 frequent insertion and deletion, while a string is not. Furthermore: 6026 frequent insertion and deletion, while a string is not. Furthermore:
6919 and @dfn{buffer of the selected window}, and the distinction between 6069 and @dfn{buffer of the selected window}, and the distinction between
6920 @dfn{point} of the current buffer and @dfn{window-point} of the selected 6070 @dfn{point} of the current buffer and @dfn{window-point} of the selected
6921 window. (This latter distinction is explained in detail in the section 6071 window. (This latter distinction is explained in detail in the section
6922 on windows.) 6072 on windows.)
6923 6073
6924 @node The Text in a Buffer, Buffer Lists, Introduction to Buffers, Buffers and Textual Representation 6074 @node The Text in a Buffer
6925 @section The Text in a Buffer 6075 @section The Text in a Buffer
6926 6076
6927 The text in a buffer consists of a sequence of zero or more 6077 The text in a buffer consists of a sequence of zero or more
6928 characters. A @dfn{character} is an integer that logically represents 6078 characters. A @dfn{character} is an integer that logically represents
6929 a letter, number, space, or other unit of text. Most of the characters 6079 a letter, number, space, or other unit of text. Most of the characters
7059 Bufbytes underscores the fact that we are working with a string of bytes 6209 Bufbytes underscores the fact that we are working with a string of bytes
7060 in the internal Emacs buffer representation rather than in one of a 6210 in the internal Emacs buffer representation rather than in one of a
7061 number of possible alternative representations (e.g. EUC-encoded text, 6211 number of possible alternative representations (e.g. EUC-encoded text,
7062 etc.). 6212 etc.).
7063 6213
7064 @node Buffer Lists, Markers and Extents, The Text in a Buffer, Buffers and Textual Representation 6214 @node Buffer Lists
7065 @section Buffer Lists 6215 @section Buffer Lists
7066 6216
7067 Recall earlier that buffers are @dfn{permanent} objects, i.e. that 6217 Recall earlier that buffers are @dfn{permanent} objects, i.e. that
7068 they remain around until explicitly deleted. This entails that there is 6218 they remain around until explicitly deleted. This entails that there is
7069 a list of all the buffers in existence. This list is actually an 6219 a list of all the buffers in existence. This list is actually an
7095 respectively. You can also force a new buffer to be created using 6245 respectively. You can also force a new buffer to be created using
7096 @code{generate-new-buffer}, which takes a name and (if necessary) makes 6246 @code{generate-new-buffer}, which takes a name and (if necessary) makes
7097 a unique name from this by appending a number, and then creates the 6247 a unique name from this by appending a number, and then creates the
7098 buffer. This is basically like the symbol operation @code{gensym}. 6248 buffer. This is basically like the symbol operation @code{gensym}.
7099 6249
7100 @node Markers and Extents, Bufbytes and Emchars, Buffer Lists, Buffers and Textual Representation 6250 @node Markers and Extents
7101 @section Markers and Extents 6251 @section Markers and Extents
7102 6252
7103 Among the things associated with a buffer are things that are 6253 Among the things associated with a buffer are things that are
7104 logically attached to certain buffer positions. This can be used to 6254 logically attached to certain buffer positions. This can be used to
7105 keep track of a buffer position when text is inserted and deleted, so 6255 keep track of a buffer position when text is inserted and deleted, so
7121 6271
7122 The important thing here is that markers and extents simply contain 6272 The important thing here is that markers and extents simply contain
7123 buffer positions in them as integers, and every time text is inserted or 6273 buffer positions in them as integers, and every time text is inserted or
7124 deleted, these positions must be updated. In order to minimize the 6274 deleted, these positions must be updated. In order to minimize the
7125 amount of shuffling that needs to be done, the positions in markers and 6275 amount of shuffling that needs to be done, the positions in markers and
7126 extents (there's one per marker, two per extent) are stored in Meminds. 6276 extents (there's one per marker, two per extent) and stored in Meminds.
7127 This means that they only need to be moved when the text is physically 6277 This means that they only need to be moved when the text is physically
7128 moved in memory; since the gap structure tries to minimize this, it also 6278 moved in memory; since the gap structure tries to minimize this, it also
7129 minimizes the number of marker and extent indices that need to be 6279 minimizes the number of marker and extent indices that need to be
7130 adjusted. Look in @file{insdel.c} for the details of how this works. 6280 adjusted. Look in @file{insdel.c} for the details of how this works.
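
The effect of storing memory indices can be illustrated with a small sketch. This is not the code in @file{insdel.c}; the type @code{gap_buffer} and every function name below are hypothetical, and the buffer is a fixed-size toy. It only shows that an insertion at the gap changes no stored memory index, while the buffer position derived from a marker still comes out right.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical gap buffer: text lives in [0, gap_start) and in
   [gap_start + gap_size, sizeof data). */
typedef struct {
  char data[64];
  size_t gap_start;   /* memory index where the gap begins */
  size_t gap_size;    /* number of unused bytes in the gap */
  size_t text_len;    /* number of text bytes, gap excluded */
} gap_buffer;

/* Place BEFORE ahead of the gap and AFTER behind it. */
static void gap_buffer_init(gap_buffer *b, const char *before, const char *after)
{
  size_t nb = strlen(before), na = strlen(after);
  assert(nb + na <= sizeof b->data);
  memcpy(b->data, before, nb);
  memcpy(b->data + sizeof b->data - na, after, na);
  b->gap_start = nb;
  b->gap_size = sizeof b->data - nb - na;
  b->text_len = nb + na;
}

/* Buffer position (gap ignored) -> memory index (gap counted). */
static size_t bufpos_to_memind(const gap_buffer *b, size_t pos)
{
  assert(pos <= b->text_len);
  return pos < b->gap_start ? pos : pos + b->gap_size;
}

/* Memory index -> buffer position; MEM must not lie inside the gap. */
static size_t memind_to_bufpos(const gap_buffer *b, size_t mem)
{
  assert(mem < b->gap_start || mem >= b->gap_start + b->gap_size);
  return mem < b->gap_start ? mem : mem - b->gap_size;
}

/* Insert LEN bytes at the gap.  Text on both sides of the gap stays
   where it is in memory, so no stored memory index needs adjusting. */
static void insert_at_gap(gap_buffer *b, const char *s, size_t len)
{
  assert(len <= b->gap_size);
  memcpy(b->data + b->gap_start, s, len);
  b->gap_start += len;
  b->gap_size -= len;
  b->text_len += len;
}
```

A marker stored as a memory index behind the gap keeps the same value across @code{insert_at_gap()}, yet @code{memind_to_bufpos()} reports a position that has advanced by the inserted length. Only an operation that moves the gap itself would have to revisit the markers in the moved range.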
7131 6281
7135 is no way to determine what markers are in a buffer if you are just 6285 is no way to determine what markers are in a buffer if you are just
7136 given the buffer. Extents remain in a buffer until they are detached 6286 given the buffer. Extents remain in a buffer until they are detached
7137 (which could happen as a result of text being deleted) or the buffer is 6287 (which could happen as a result of text being deleted) or the buffer is
7138 deleted, and primitives do exist to enumerate the extents in a buffer. 6288 deleted, and primitives do exist to enumerate the extents in a buffer.
7139 6289
7140 @node Bufbytes and Emchars, The Buffer Object, Markers and Extents, Buffers and Textual Representation 6290 @node Bufbytes and Emchars
7141 @section Bufbytes and Emchars 6291 @section Bufbytes and Emchars
7142 6292
7143 Not yet documented. 6293 Not yet documented.
7144 6294
7145 @node The Buffer Object, , Bufbytes and Emchars, Buffers and Textual Representation 6295 @node The Buffer Object
7146 @section The Buffer Object 6296 @section The Buffer Object
7147 6297
7148 Buffers contain fields not directly accessible by the Lisp programmer. 6298 Buffers contain fields not directly accessible by the Lisp programmer.
7149 We describe them here, naming them by the names used in the C code. 6299 We describe them here, naming them by the names used in the C code.
7150 Many are accessible indirectly in Lisp programs via Lisp primitives. 6300 Many are accessible indirectly in Lisp programs via Lisp primitives.
7259 * Encodings:: 6409 * Encodings::
7260 * Internal Mule Encodings:: 6410 * Internal Mule Encodings::
7261 * CCL:: 6411 * CCL::
7262 @end menu 6412 @end menu
7263 6413
7264 @node Character Sets, Encodings, MULE Character Sets and Encodings, MULE Character Sets and Encodings 6414 @node Character Sets
7265 @section Character Sets 6415 @section Character Sets
7266 6416
7267 A character set (or @dfn{charset}) is an ordered set of characters. A 6417 A character set (or @dfn{charset}) is an ordered set of characters. A
7268 particular character in a charset is indexed using one or more 6418 particular character in a charset is indexed using one or more
7269 @dfn{position codes}, which are non-negative integers. The number of 6419 @dfn{position codes}, which are non-negative integers. The number of
7340 160 - 255 Latin-1 32 - 127 6490 160 - 255 Latin-1 32 - 127
7341 @end example 6491 @end example
7342 6492
7343 This is a bit ad-hoc but gets the job done. 6493 This is a bit ad-hoc but gets the job done.
7344 6494
7345 @node Encodings, Internal Mule Encodings, Character Sets, MULE Character Sets and Encodings 6495 @node Encodings
7346 @section Encodings 6496 @section Encodings
7347 6497
7348 An @dfn{encoding} is a way of numerically representing characters from 6498 An @dfn{encoding} is a way of numerically representing characters from
7349 one or more character sets. If an encoding only encompasses one 6499 one or more character sets. If an encoding only encompasses one
7350 character set, then the position codes for the characters in that 6500 character set, then the position codes for the characters in that
7367 @menu 6517 @menu
7368 * Japanese EUC (Extended Unix Code):: 6518 * Japanese EUC (Extended Unix Code)::
7369 * JIS7:: 6519 * JIS7::
7370 @end menu 6520 @end menu
7371 6521
7372 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings 6522 @node Japanese EUC (Extended Unix Code)
7373 @subsection Japanese EUC (Extended Unix Code) 6523 @subsection Japanese EUC (Extended Unix Code)
7374 6524
7375 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, 6525 This encompasses the character sets Printing-ASCII, Japanese-JISX0201,
7376 and Japanese-JISX0208-Kana (half-width katakana, the right half of 6526 and Japanese-JISX0208-Kana (half-width katakana, the right half of
7377 JISX0201). It uses 8-bit bytes. 6527 JISX0201). It uses 8-bit bytes.
7389 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 6539 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80
7390 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80 6540 Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80
7391 @end example 6541 @end example
7392 6542
7393 6543
7394 @node JIS7, , Japanese EUC (Extended Unix Code), Encodings 6544 @node JIS7
7395 @subsection JIS7 6545 @subsection JIS7
7396 6546
7397 This encompasses the character sets Printing-ASCII, 6547 This encompasses the character sets Printing-ASCII,
7398 Japanese-JISX0201-Roman (the left half of JISX0201; this character set 6548 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
7399 is very similar to Printing-ASCII and is a 94-character charset), 6549 is very similar to Printing-ASCII and is a 94-character charset),
7424 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII 6574 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII
7425 @end example 6575 @end example
7426 6576
7427 Initially, Printing-ASCII is invoked. 6577 Initially, Printing-ASCII is invoked.
7428 6578
7429 @node Internal Mule Encodings, CCL, Encodings, MULE Character Sets and Encodings 6579 @node Internal Mule Encodings
7430 @section Internal Mule Encodings 6580 @section Internal Mule Encodings
7431 6581
7432 In XEmacs/Mule, each character set is assigned a unique number, called a 6582 In XEmacs/Mule, each character set is assigned a unique number, called a
7433 @dfn{leading byte}. This is used in the encodings of a character. 6583 @dfn{leading byte}. This is used in the encodings of a character.
7434 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has 6584 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
7470 @menu 6620 @menu
7471 * Internal String Encoding:: 6621 * Internal String Encoding::
7472 * Internal Character Encoding:: 6622 * Internal Character Encoding::
7473 @end menu 6623 @end menu
7474 6624
7475 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings 6625 @node Internal String Encoding
7476 @subsection Internal String Encoding 6626 @subsection Internal String Encoding
7477 6627
7478 ASCII characters are encoded using their position code directly. Other 6628 ASCII characters are encoded using their position code directly. Other
7479 characters are encoded using their leading byte followed by their 6629 characters are encoded using their leading byte followed by their
7480 position code(s) with the high bit set. Characters in private character 6630 position code(s) with the high bit set. Characters in private character
7520 None of the standard non-modal encodings meet all of these 6670 None of the standard non-modal encodings meet all of these
7521 conditions. For example, EUC satisfies only (2) and (3), while 6671 conditions. For example, EUC satisfies only (2) and (3), while
7522 Shift-JIS and Big5 (not yet described) satisfy only (2). (All 6672 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
7523 non-modal encodings must satisfy (2), in order to be unambiguous.) 6673 non-modal encodings must satisfy (2), in order to be unambiguous.)
7524 6674
7525 @node Internal Character Encoding, , Internal String Encoding, Internal Mule Encodings 6675 @node Internal Character Encoding
7526 @subsection Internal Character Encoding 6676 @subsection Internal Character Encoding
7527 6677
7528 One 19-bit word represents a single character. The word is 6678 One 19-bit word represents a single character. The word is
7529 separated into three fields: 6679 separated into three fields:
7530 6680
7555 @end example 6705 @end example
7556 6706
7557 Note that character codes 0 - 255 are the same as the ``binary encoding'' 6707 Note that character codes 0 - 255 are the same as the ``binary encoding''
7558 described above. 6708 described above.
7559 6709
7560 @node CCL, , Internal Mule Encodings, MULE Character Sets and Encodings 6710 @node CCL
7561 @section CCL 6711 @section CCL
7562 6712
7563 @example 6713 @example
7564 CCL PROGRAM SYNTAX: 6714 CCL PROGRAM SYNTAX:
7565 CCL_PROGRAM := (CCL_MAIN_BLOCK 6715 CCL_PROGRAM := (CCL_MAIN_BLOCK
7609 this is the code executed to handle any stuff that needs to be done 6759 this is the code executed to handle any stuff that needs to be done
7610 (e.g. designating back to ASCII and left-to-right mode) after all 6760 (e.g. designating back to ASCII and left-to-right mode) after all
7611 other encoded/decoded data has been written out. This is not used for 6761 other encoded/decoded data has been written out. This is not used for
7612 charset CCL programs. 6762 charset CCL programs.
7613 6763
7614 REGISTER: 0..7 -- referred by RRR or rrr 6764 REGISTER: 0..7 -- refered by RRR or rrr
7615 6765
7616 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT 6766 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
7617 TTTTT (5-bit): operator type 6767 TTTTT (5-bit): operator type
7618 RRR (3-bit): register number 6768 RRR (3-bit): register number
7619 XXXXXXXXXXXXXXXX (15-bit): 6769 XXXXXXXXXXXXXXXX (15-bit):
7746 * Lstream Types:: Different sorts of things that are streamed. 6896 * Lstream Types:: Different sorts of things that are streamed.
7747 * Lstream Functions:: Functions for working with lstreams. 6897 * Lstream Functions:: Functions for working with lstreams.
7748 * Lstream Methods:: Creating new lstream types. 6898 * Lstream Methods:: Creating new lstream types.
7749 @end menu 6899 @end menu
7750 6900
7751 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams 6901 @node Creating an Lstream
7752 @section Creating an Lstream 6902 @section Creating an Lstream
7753 6903
7754 Lstreams come in different types, depending on what is being interfaced 6904 Lstreams come in different types, depending on what is being interfaced
7755 to. Although the primitive for creating new lstreams is 6905 to. Although the primitive for creating new lstreams is
7756 @code{Lstream_new()}, generally you do not call this directly. Instead, 6906 @code{Lstream_new()}, generally you do not call this directly. Instead,
7777 Open for reading, but ``read'' never returns partial MULE characters. 6927 Open for reading, but ``read'' never returns partial MULE characters.
7778 @item "wc" 6928 @item "wc"
7779 Open for writing, but never writes partial MULE characters. 6929 Open for writing, but never writes partial MULE characters.
7780 @end table 6930 @end table
7781 6931
7782 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams 6932 @node Lstream Types
7783 @section Lstream Types 6933 @section Lstream Types
7784 6934
7785 @table @asis 6935 @table @asis
7786 @item stdio 6936 @item stdio
7787 6937
7802 @item decoding 6952 @item decoding
7803 6953
7804 @item encoding 6954 @item encoding
7805 @end table 6955 @end table
7806 6956
7807 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams 6957 @node Lstream Functions
7808 @section Lstream Functions 6958 @section Lstream Functions
7809 6959
7810 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) 6960 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
7811 Allocate and return a new Lstream. This function is not really meant to 6961 Allocate and return a new Lstream. This function is not really meant to
7812 be called directly; rather, each stream type should provide its own 6962 be called directly; rather, each stream type should provide its own
7813 stream creation function, which creates the stream and does any other 6963 stream creation function, which creates the stream and does any other
7814 necessary creation stuff (e.g. opening a file). 6964 necessary creation stuff (e.g. opening a file).
7815 @end deftypefun 6965 @end deftypefun
7838 @end deftypefn 6988 @end deftypefn
7839 6989
7840 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c}) 6990 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
7841 Push one byte back onto the input queue. This will be the next byte 6991 Push one byte back onto the input queue. This will be the next byte
7842 read from the stream. Any number of bytes can be pushed back and will 6992 read from the stream. Any number of bytes can be pushed back and will
7843 be read in the reverse order they were pushed back---most recent 6993 be read in the reverse order they were pushed back -- most recent
7844 first. (This is necessary for consistency---if there are a number of 6994 first. (This is necessary for consistency -- if there are a number of
7845 bytes that have been unread and I read and unread a byte, it needs to be 6995 bytes that have been unread and I read and unread a byte, it needs to be
7846 the first to be read again.) This is a macro and so it is very 6996 the first to be read again.) This is a macro and so it is very
7847 efficient. The @var{c} argument is only evaluated once but the @var{stream} 6997 efficient. The @var{c} argument is only evaluated once but the @var{stream}
7848 argument is evaluated more than once. 6998 argument is evaluated more than once.
7849 @end deftypefn 6999 @end deftypefn
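
The pushback ordering described above can be sketched with a last-in, first-out stack. This is an illustration only and is not the lstream implementation: @code{toy_stream}, @code{toy_getc()} and @code{toy_ungetc()} are hypothetical names, and the pushback area here has a fixed capacity, whereas the real macro is documented as accepting any number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream: a fixed input plus a LIFO stack of unread bytes. */
typedef struct {
  const unsigned char *in;   /* underlying input */
  size_t in_len, in_pos;     /* its length and the read cursor */
  unsigned char unget[16];   /* pushed-back bytes, oldest first */
  size_t unget_len;          /* how many bytes are pushed back */
} toy_stream;

/* Return the next byte, or -1 at end of input.  Pushed-back bytes are
   delivered first, most recently pushed first. */
static int toy_getc(toy_stream *s)
{
  if (s->unget_len > 0)
    return s->unget[--s->unget_len];
  if (s->in_pos < s->in_len)
    return s->in[s->in_pos++];
  return -1;
}

/* Push one byte back; it becomes the next byte read. */
static void toy_ungetc(toy_stream *s, int c)
{
  assert(s->unget_len < sizeof s->unget);
  s->unget[s->unget_len++] = (unsigned char) c;
}
```

With this ordering, reading a byte and immediately unreading it leaves the stream in the state it was in before, even when other pushed-back bytes are still pending, which is the consistency property mentioned above.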
7852 @deftypefunx int Lstream_fgetc (Lstream *@var{stream}) 7002 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
7853 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c}) 7003 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
7854 Function equivalents of the above macros. 7004 Function equivalents of the above macros.
7855 @end deftypefun 7005 @end deftypefun
7856 7006
7857 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size}) 7007 @deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size})
7858 Read @var{size} bytes of @var{data} from the stream. Return the number 7008 Read @var{size} bytes of @var{data} from the stream. Return the number
7859 of bytes read. 0 means EOF. -1 means an error occurred and no bytes 7009 of bytes read. 0 means EOF. -1 means an error occurred and no bytes
7860 were read. 7010 were read.
7861 @end deftypefun 7011 @end deftypefun
7862 7012
7863 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size}) 7013 @deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size})
7864 Write @var{size} bytes of @var{data} to the stream. Return the number 7014 Write @var{size} bytes of @var{data} to the stream. Return the number
7865 of bytes written. -1 means an error occurred and no bytes were written. 7015 of bytes written. -1 means an error occurred and no bytes were written.
7866 @end deftypefun 7016 @end deftypefun
7867 7017
7868 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size}) 7018 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size})
7869 Push back @var{size} bytes of @var{data} onto the input queue. The next 7019 Push back @var{size} bytes of @var{data} onto the input queue. The next
7870 call to @code{Lstream_read()} with the same size will read the same 7020 call to @code{Lstream_read()} with the same size will read the same
7871 bytes back. Note that this will be the case even if there is other 7021 bytes back. Note that this will be the case even if there is other
7872 pending unread data. 7022 pending unread data.
7873 @end deftypefun 7023 @end deftypefun
7877 @end deftypefun 7027 @end deftypefun
7878 7028
7879 @deftypefun void Lstream_reopen (Lstream *@var{stream}) 7029 @deftypefun void Lstream_reopen (Lstream *@var{stream})
7880 Reopen a closed stream. This enables I/O on it again. This is not 7030 Reopen a closed stream. This enables I/O on it again. This is not
7881 meant to be called except from a wrapper routine that reinitializes 7031 meant to be called except from a wrapper routine that reinitializes
7882 variables and such---the close routine may well have freed some 7032 variables and such -- the close routine may well have freed some
7883 necessary storage structures, for example. 7033 necessary storage structures, for example.
7884 @end deftypefun 7034 @end deftypefun
7885 7035
7886 @deftypefun void Lstream_rewind (Lstream *@var{stream}) 7036 @deftypefun void Lstream_rewind (Lstream *@var{stream})
7887 Rewind the stream to the beginning. 7037 Rewind the stream to the beginning.
7888 @end deftypefun 7038 @end deftypefun
7889 7039
7890 @node Lstream Methods, , Lstream Functions, Lstreams 7040 @node Lstream Methods
7891 @section Lstream Methods 7041 @section Lstream Methods
7892 7042
7893 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size}) 7043 @deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size})
7894 Read some data from the stream's end and store it into @var{data}, which 7044 Read some data from the stream's end and store it into @var{data}, which
7895 can hold @var{size} bytes. Return the number of bytes read. A return 7045 can hold @var{size} bytes. Return the number of bytes read. A return
7896 value of 0 means no bytes can be read at this time. This may be because 7046 value of 0 means no bytes can be read at this time. This may be because
7897 of an EOF, or because there is a granularity greater than one byte that 7047 of an EOF, or because there is a granularity greater than one byte that
7898 the stream imposes on the returned data, and @var{size} is less than 7048 the stream imposes on the returned data, and @var{size} is less than
7905 calls @code{Lstream_read()} with a very small size. 7055 calls @code{Lstream_read()} with a very small size.
7906 7056
7907 This function can be @code{NULL} if the stream is output-only. 7057 This function can be @code{NULL} if the stream is output-only.
7908 @end deftypefn 7058 @end deftypefn
7909 7059
7910 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size}) 7060 @deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size})
7911 Send some data to the stream's end. Data to be sent is in @var{data} 7061 Send some data to the stream's end. Data to be sent is in @var{data}
7912 and is @var{size} bytes. Return the number of bytes sent. This 7062 and is @var{size} bytes. Return the number of bytes sent. This
7913 function can send and return fewer bytes than is passed in; in that 7063 function can send and return fewer bytes than is passed in; in that
7914 case, the function will just be called again until there is no data left 7064 case, the function will just be called again until there is no data left
7915 or 0 is returned. A return value of 0 means that no more data can be 7065 or 0 is returned. A return value of 0 means that no more data can be
7923 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream}) 7073 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
7924 Rewind the stream. If this is @code{NULL}, the stream is not seekable. 7074 Rewind the stream. If this is @code{NULL}, the stream is not seekable.
7925 @end deftypefn 7075 @end deftypefn
7926 7076
7927 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream}) 7077 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
7928 Indicate whether this stream is seekable---i.e. it can be rewound. 7078 Indicate whether this stream is seekable -- i.e. it can be rewound.
7929 This method is ignored if the stream does not have a rewind method. If 7079 This method is ignored if the stream does not have a rewind method. If
7930 this method is not present, the result is determined by whether a rewind 7080 this method is not present, the result is determined by whether a rewind
7931 method is present. 7081 method is present.
7932 @end deftypefn 7082 @end deftypefn
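
The interaction between the rewind method and @code{seekable_p} can be summarized in a short sketch. The structure layout and the names @code{sk_stream}, @code{sk_methods} and @code{sk_is_seekable()} are assumptions made for illustration; the real method table has more fields and may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sk_stream sk_stream;

/* Hypothetical method table holding only the two methods discussed. */
typedef struct {
  int (*rewinder) (sk_stream *);
  int (*seekable_p) (sk_stream *);
} sk_methods;

struct sk_stream { const sk_methods *meth; };

/* Decide whether a stream can be rewound, following the rules above. */
static int sk_is_seekable(sk_stream *s)
{
  assert(s != NULL && s->meth != NULL);
  if (s->meth->rewinder == NULL)
    return 0;     /* no rewind method: seekable_p is ignored */
  if (s->meth->seekable_p == NULL)
    return 1;     /* a rewind method alone implies seekable */
  return s->meth->seekable_p(s) != 0;
}

/* Example methods used to exercise the three cases. */
static int stub_rewind(sk_stream *s) { (void) s; return 0; }
static int says_yes(sk_stream *s) { (void) s; return 1; }
static int says_no(sk_stream *s) { (void) s; return 0; }
```

A stream type that can sometimes be rewound (for instance, depending on what it is attached to) would supply both methods; one that can always be rewound needs only the rewind method.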
7933 7083
7960 * Point:: 7110 * Point::
7961 * Window Hierarchy:: 7111 * Window Hierarchy::
7962 * The Window Object:: 7112 * The Window Object::
7963 @end menu 7113 @end menu
7964 7114
7965 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows 7115 @node Introduction to Consoles; Devices; Frames; Windows
7966 @section Introduction to Consoles; Devices; Frames; Windows 7116 @section Introduction to Consoles; Devices; Frames; Windows
7967 7117
7968 A window-system window that you see on the screen is called a 7118 A window-system window that you see on the screen is called a
7969 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or 7119 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or
7970 more non-overlapping panes, called (confusingly) @dfn{windows}. Each 7120 more non-overlapping panes, called (confusingly) @dfn{windows}. Each
7995 There is a separate Lisp object type for each of these four concepts. 7145 There is a separate Lisp object type for each of these four concepts.
7996 Furthermore, there is logically a @dfn{selected console}, 7146 Furthermore, there is logically a @dfn{selected console},
7997 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. 7147 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
7998 Each of these objects is distinguished in various ways, such as being the 7148 Each of these objects is distinguished in various ways, such as being the
7999 default object for various functions that act on objects of that type. 7149 default object for various functions that act on objects of that type.
8000 Note that every containing object remembers the ``selected'' object 7150 Note that every containing object rememembers the ``selected'' object
8001 among the objects that it contains: e.g. not only is there a selected 7151 among the objects that it contains: e.g. not only is there a selected
8002 window, but every frame remembers the last window in it that was 7152 window, but every frame remembers the last window in it that was
8003 selected, and changing the selected frame causes the remembered window 7153 selected, and changing the selected frame causes the remembered window
8004 within it to become the selected window. Similar relationships apply 7154 within it to become the selected window. Similar relationships apply
8005 for consoles to devices and devices to frames. 7155 for consoles to devices and devices to frames.
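The chain of remembered selections can be pictured with a minimal C sketch. The structure and function names below are hypothetical simplifications chosen for illustration; they are not the actual XEmacs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified containers: each one remembers which of
   its children was selected last. */
struct window  { int id; };
struct frame   { struct window *selected_window; };
struct device  { struct frame  *selected_frame; };
struct console { struct device *selected_device; };

struct console *selected_console;

/* The selected window is found by following the remembered
   selection at every level of containment. */
struct window *selected_window(void)
{
  if (selected_console == NULL || selected_console->selected_device == NULL)
    return NULL;
  struct frame *f = selected_console->selected_device->selected_frame;
  return f != NULL ? f->selected_window : NULL;
}

/* Changing the selected frame implicitly changes the selected window
   to the one that frame remembered. */
void select_frame(struct device *d, struct frame *f)
{
  d->selected_frame = f;
}
```

In this sketch no separate step is needed to restore the window when the frame changes: the window remembered by the new frame becomes the selected window simply because the lookup passes through that frame.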
8006 7156
8007 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows 7157 @node Point
8008 @section Point 7158 @section Point
8009 7159
8010 Recall that every buffer has a current insertion position, called 7160 Recall that every buffer has a current insertion position, called
8011 @dfn{point}. Now, two or more windows may be displaying the same buffer, 7161 @dfn{point}. Now, two or more windows may be displaying the same buffer,
8012 and the text cursor in the two windows (i.e. @code{point}) can be in 7162 and the text cursor in the two windows (i.e. @code{point}) can be in
8023 want to retrieve the correct value of @code{point} for a window, 7173 want to retrieve the correct value of @code{point} for a window,
8024 you must special-case on the selected window and retrieve the 7174 you must special-case on the selected window and retrieve the
8025 buffer's point instead. This is related to why @code{save-window-excursion} 7175 buffer's point instead. This is related to why @code{save-window-excursion}
8026 does not save the selected window's value of @code{point}. 7176 does not save the selected window's value of @code{point}.
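The special case can be sketched in C as follows. This is only an illustration under simplifying assumptions: positions are plain integers rather than markers, and @code{window_point} and @code{select_window} are hypothetical helper names, not the real functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures hold markers, not integers. */
struct buffer { long point; };
struct window {
  struct buffer *buffer;
  long pointm;            /* stashed point, meaningful while not selected */
};

struct window *selected_window;

/* For the selected window the buffer's point is authoritative;
   for every other window the stashed value is used. */
long window_point(const struct window *w)
{
  return w == selected_window ? w->buffer->point : w->pointm;
}

/* Assumed protocol: stash the old window's point on deselection and
   load the new window's stashed value into its buffer. */
void select_window(struct window *w)
{
  if (selected_window != NULL)
    selected_window->pointm = selected_window->buffer->point;
  selected_window = w;
  w->buffer->point = w->pointm;
}
```

Reading @code{pointm} directly for the selected window would return a stale value, because in this model it is refreshed only when the window is deselected.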
8027 7177
8028 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows 7178 @node Window Hierarchy
8029 @section Window Hierarchy 7179 @section Window Hierarchy
8030 @cindex window hierarchy 7180 @cindex window hierarchy
8031 @cindex hierarchy of windows 7181 @cindex hierarchy of windows
8032 7182
8033 If a frame contains multiple windows (panes), they are always created 7183 If a frame contains multiple windows (panes), they are always created
8092 @dfn{one above the other}. 7242 @dfn{one above the other}.
8093 7243
8094 @item 7244 @item
8095 Leaf windows also have markers in their @code{start} (the 7245 Leaf windows also have markers in their @code{start} (the
8096 first buffer position displayed in the window) and @code{pointm} 7246 first buffer position displayed in the window) and @code{pointm}
8097 (the window's stashed value of @code{point}---see above) fields, 7247 (the window's stashed value of @code{point} -- see above) fields,
8098 while combination windows have nil in these fields. 7248 while combination windows have nil in these fields.
8099 7249
8100 @item 7250 @item
8101 The list of children for a window is threaded through the 7251 The list of children for a window is threaded through the
8102 @code{next} and @code{prev} fields of each child window. 7252 @code{next} and @code{prev} fields of each child window.
8108 does nothing except set a special @code{dead} bit to 1 and clear out the 7258 does nothing except set a special @code{dead} bit to 1 and clear out the
8109 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for 7259 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
8110 GC purposes. 7260 GC purposes.
8111 7261
8112 @item 7262 @item
8113 Most frames actually have two top-level windows---one for the 7263 Most frames actually have two top-level windows -- one for the
8114 minibuffer and one (the @dfn{root}) for everything else. The modeline 7264 minibuffer and one (the @dfn{root}) for everything else. The modeline
8115 (if present) separates these two. The @code{next} field of the root 7265 (if present) separates these two. The @code{next} field of the root
8116 points to the minibuffer, and the @code{prev} field of the minibuffer 7266 points to the minibuffer, and the @code{prev} field of the minibuffer
8117 points to the root. The other @code{next} and @code{prev} fields are 7267 points to the root. The other @code{next} and @code{prev} fields are
8118 @code{nil}, and the frame points to both of these windows. 7268 @code{nil}, and the frame points to both of these windows.
8121 frames have no root window, and the @code{next} of the minibuffer window 7271 frames have no root window, and the @code{next} of the minibuffer window
8122 is @code{nil} but the @code{prev} points to itself. (#### This is an 7272 is @code{nil} but the @code{prev} points to itself. (#### This is an
8123 artifact that should be fixed.) 7273 artifact that should be fixed.)
8124 @end enumerate 7274 @end enumerate
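The rules above can be summarized in a small C sketch. The field names @code{next}, @code{prev}, @code{hchild}, @code{vchild} and @code{dead} follow the text; the structure layout and the helpers @code{first_child}, @code{count_leaves} and @code{mark_deleted} are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified window node; the real structure has many more fields. */
struct window {
  struct window *parent, *next, *prev;
  struct window *hchild, *vchild;   /* at most one is non-NULL */
  int dead;
};

/* A combination window has exactly one kind of child list; a leaf has none. */
struct window *first_child(const struct window *w)
{
  return w->hchild != NULL ? w->hchild : w->vchild;
}

/* Count the leaf windows (the visible panes) below W by walking the
   child list threaded through the next fields. */
int count_leaves(const struct window *w)
{
  const struct window *kid = first_child(w);
  if (kid == NULL)
    return 1;
  int n = 0;
  for (; kid != NULL; kid = kid->next)
    n += count_leaves(kid);
  return n;
}

/* Deleting only sets the dead bit and clears the links, as described above. */
void mark_deleted(struct window *w)
{
  w->dead = 1;
  w->next = w->prev = w->hchild = w->vchild = NULL;
}
```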
8125 7275
8126 @node The Window Object, , Window Hierarchy, Consoles; Devices; Frames; Windows 7276 @node The Window Object
8127 @section The Window Object 7277 @section The Window Object
8128 7278
8129 Windows have the following accessible fields: 7279 Windows have the following accessible fields:
8130 7280
8131 @table @code 7281 @table @code
8250 @end enumerate 7400 @end enumerate
8251 7401
8252 @menu 7402 @menu
8253 * Critical Redisplay Sections:: 7403 * Critical Redisplay Sections::
8254 * Line Start Cache:: 7404 * Line Start Cache::
8255 * Redisplay Piece by Piece::
8256 @end menu 7405 @end menu
8257 7406
8258 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism 7407 @node Critical Redisplay Sections
8259 @section Critical Redisplay Sections 7408 @section Critical Redisplay Sections
8260 @cindex critical redisplay sections 7409 @cindex critical redisplay sections
8261 7410
8262 Within this section, we are defenseless and assume that the 7411 Within this section, we are defenseless and assume that the
8263 following cannot happen: 7412 following cannot happen:
8285 we simply return. #### We should abort instead. 7434 we simply return. #### We should abort instead.
8286 7435
8287 #### If a frame-size change does occur we should probably 7436 #### If a frame-size change does occur we should probably
8288 actually be preempting redisplay. 7437 actually be preempting redisplay.
8289 7438
8290 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism 7439 @node Line Start Cache
8291 @section Line Start Cache 7440 @section Line Start Cache
8292 @cindex line start cache 7441 @cindex line start cache
8293 7442
8294 The traditional scrolling code in Emacs breaks in a variable height 7443 The traditional scrolling code in Emacs breaks in a variable height
8295 world. It depends on the key assumption that the number of lines that 7444 world. It depends on the key assumption that the number of lines that
8329 information basically for free. In those cases where a user is simply 7478 information basically for free. In those cases where a user is simply
8330 scrolling around viewing a buffer there is a high probability that this 7479 scrolling around viewing a buffer there is a high probability that this
8331 is sufficient to always provide the needed information. The second 7480 is sufficient to always provide the needed information. The second
8332 thing we can do is be smart about invalidating the cache. 7481 thing we can do is be smart about invalidating the cache.
8333 7482
8334 TODO---Be smart about invalidating the cache. Potential places: 7483 TODO -- Be smart about invalidating the cache. Potential places:
8335 7484
8336 @itemize @bullet 7485 @itemize @bullet
8337 @item 7486 @item
8338 Insertions at end-of-line which don't cause line-wraps do not alter the 7487 Insertions at end-of-line which don't cause line-wraps do not alter the
8339 starting positions of any display lines. These types of buffer 7488 starting positions of any display lines. These types of buffer
8346 @end itemize 7495 @end itemize
8347 7496
8348 In case you're wondering, the Second Golden Rule of Redisplay is not 7497 In case you're wondering, the Second Golden Rule of Redisplay is not
8349 applicable. 7498 applicable.
8350 7499
8351 @node Redisplay Piece by Piece, , Line Start Cache, The Redisplay Mechanism 7500 @node Extents, Faces and Glyphs, The Redisplay Mechanism, Top
8352 @section Redisplay Piece by Piece
8353 @cindex Redisplay Piece by Piece
8354
8355 As you can begin to see redisplay is complex and also not well
8356 documented. Chuck no longer works on XEmacs so this section is my take
8357 on the workings of redisplay.
8358
8359 Redisplay happens in three phases:
8360
8361 @enumerate
8362 @item
8363 Determine desired display in area that needs redisplay.
8364 Implemented by @code{redisplay.c}
8365 @item
8366 Compare desired display with current display
8367 Implemented by @code{redisplay-output.c}
8368 @item
8369 Output changes Implemented by @code{redisplay-output.c},
8370 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}
8371 @end enumerate
8372
8373 Steps 1 and 2 are device-independent and relatively complex. Step 3 is
8374 mostly device-dependent.
8375
8376 Determining the desired display
8377
8378 Display attributes are stored in @code{display_line} structures. Each
8379 @code{display_line} consists of a set of @code{display_block}'s and each
8380 @code{display_block} contains a number of @code{rune}'s. Generally
8381 dynarr's of @code{display_line}'s are held by each window representing
8382 the current display and the desired display.
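The nesting, and the comparison of desired against current display in phase 2, might be pictured roughly as follows. The type names come from the text above, but every field and the helper functions are hypothetical; the real structures carry far more information (positions, faces, glyphs and so on).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down versions of the redisplay structures. */
struct rune          { int ch; };
struct display_block { struct rune *runes;           size_t n_runes;  };
struct display_line  { struct display_block *blocks; size_t n_blocks; };

/* Two lines are equal when they hold the same runes in the same blocks. */
int line_equal(const struct display_line *a, const struct display_line *b)
{
  if (a->n_blocks != b->n_blocks)
    return 0;
  for (size_t i = 0; i < a->n_blocks; i++) {
    const struct display_block *x = &a->blocks[i], *y = &b->blocks[i];
    if (x->n_runes != y->n_runes)
      return 0;
    for (size_t j = 0; j < x->n_runes; j++)
      if (x->runes[j].ch != y->runes[j].ch)
        return 0;
  }
  return 1;
}

/* Count the lines that differ and would therefore need to be output. */
size_t changed_lines(const struct display_line *current,
                     const struct display_line *desired, size_t n)
{
  size_t changed = 0;
  for (size_t i = 0; i < n; i++)
    if (!line_equal(&current[i], &desired[i]))
      changed++;
  return changed;
}
```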
8383
8384 The @code{display_line} structures are tightly tied to buffers which
8385 presents a problem for redisplay as this connection is bogus for the
8386 modeline. Hence the @code{display_line} generation routines are
8387 duplicated for generating the modeline. This means that the modeline
8388 display code has many bugs that the standard redisplay code does not.
8389
8390 The guts of @code{display_line} generation are in
8391 @code{create_text_block}, which creates a single display line for the
8392 desired locale. This incrementally parses the characters on the current
8393 line and generates redisplay structures for each.
8394
8395 Gutter redisplay is different. Because the data to display is stored in
8396 a string we cannot use @code{create_text_block}. Instead we use
8397 @code{create_text_string_block} which performs the same function as
8398 @code{create_text_block} but for strings. Many of the complexities of
8399 @code{create_text_block} to do with cursor handling and selective
8400 display have been removed.
8401
8402 @node Extents, Faces, The Redisplay Mechanism, Top
8403 @chapter Extents 7501 @chapter Extents
8404 7502
8405 @menu 7503 @menu
8406 * Introduction to Extents:: Extents are ranges over text, with properties. 7504 * Introduction to Extents:: Extents are ranges over text, with properties.
8407 * Extent Ordering:: How extents are ordered internally. 7505 * Extent Ordering:: How extents are ordered internally.
8408 * Format of the Extent Info:: The extent information in a buffer or string. 7506 * Format of the Extent Info:: The extent information in a buffer or string.
8409 * Zero-Length Extents:: A weird special case. 7507 * Zero-Length Extents:: A weird special case.
8410 * Mathematics of Extent Ordering:: A rigorous foundation. 7508 * Mathematics of Extent Ordering:: A rigorous foundation.
8411 * Extent Fragments:: Cached information useful for redisplay. 7509 * Extent Fragments:: Cached information useful for redisplay.
8412 @end menu 7510 @end menu
8413 7511
8414 @node Introduction to Extents, Extent Ordering, Extents, Extents 7512 @node Introduction to Extents
8415 @section Introduction to Extents 7513 @section Introduction to Extents
8416 7514
8417 Extents are regions over a buffer, with a start and an end position 7515 Extents are regions over a buffer, with a start and an end position
8418 denoting the region of the buffer included in the extent. In 7516 denoting the region of the buffer included in the extent. In
8419 addition, either end can be closed or open, meaning that the endpoint 7517 addition, either end can be closed or open, meaning that the endpoint
8431 automatically go inside or out of extents as necessary with no 7529 automatically go inside or out of extents as necessary with no
8432 further work needing to be done. It didn't work out that way, 7530 further work needing to be done. It didn't work out that way,
8433 however, and just ended up complexifying and buggifying all the 7531 however, and just ended up complexifying and buggifying all the
8434 rest of the code.) 7532 rest of the code.)
8435 7533
8436 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents 7534 @node Extent Ordering
8437 @section Extent Ordering 7535 @section Extent Ordering
8438 7536
8439 Extents are compared using memory indices. There are two orderings 7537 Extents are compared using memory indices. There are two orderings
8440 for extents and both orders are kept current at all times. The normal 7538 for extents and both orders are kept current at all times. The normal
8441 or @dfn{display} order is as follows: 7539 or @dfn{display} order is as follows:
8465 The display order and the e-order are complementary orders: any 7563 The display order and the e-order are complementary orders: any
8466 theorem about the display order also applies to the e-order if you swap 7564 theorem about the display order also applies to the e-order if you swap
8467 all occurrences of ``display order'' and ``e-order'', ``less than'' and 7565 all occurrences of ``display order'' and ``e-order'', ``less than'' and
8468 ``greater than'', and ``extent start'' and ``extent end''. 7566 ``greater than'', and ``extent start'' and ``extent end''.
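The duality can be made concrete with two comparison functions. The sketch below assumes the usual definitions: in display order an extent comes first if it starts earlier or, for equal starts, ends later; the e-order is the mirror image. The names @code{display_less} and @code{e_less} are hypothetical.

```c
#include <assert.h>

/* An extent reduced to its two endpoints. */
struct extent { long start, end; };

/* Display order: earlier start first; for equal starts, the extent
   that ends later (the enclosing one) comes first. */
int display_less(struct extent a, struct extent b)
{
  return a.start < b.start || (a.start == b.start && a.end > b.end);
}

/* E-order: earlier end first; for equal ends, the extent that starts
   later (the enclosed one) comes first. */
int e_less(struct extent a, struct extent b)
{
  return a.end < b.end || (a.end == b.end && a.start > b.start);
}
```

Swapping @code{start} with @code{end} and ``less than'' with ``greater than'' turns one function into the other, which is exactly the substitution described above.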
8469 7567
8470 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents 7568 @node Format of the Extent Info
8471 @section Format of the Extent Info 7569 @section Format of the Extent Info
8472 7570
8473 An extent-info structure consists of a list of the buffer or string's 7571 An extent-info structure consists of a list of the buffer or string's
8474 extents and a @dfn{stack of extents} that lists all of the extents over 7572 extents and a @dfn{stack of extents} that lists all of the extents over
8475 a particular position. The stack-of-extents info is used for 7573 a particular position. The stack-of-extents info is used for
8476 optimization purposes---it basically caches some info that might 7574 optimization purposes -- it basically caches some info that might
8477 be expensive to compute. Certain otherwise hard computations are easy 7575 be expensive to compute. Certain otherwise hard computations are easy
8478 given the stack of extents over a particular position, and if the 7576 given the stack of extents over a particular position, and if the
8479 stack of extents over a nearby position is known (because it was 7577 stack of extents over a nearby position is known (because it was
8480 calculated at some prior point in time), it's easy to move the stack 7578 calculated at some prior point in time), it's easy to move the stack
8481 of extents to the proper position. 7579 of extents to the proper position.
8499 between two extents. Note also that callers of these functions should 7597 between two extents. Note also that callers of these functions should
8500 not be aware of the fact that the extent list is implemented as an 7598 not be aware of the fact that the extent list is implemented as an
8501 array, except for the fact that positions are integers (this should be 7599 array, except for the fact that positions are integers (this should be
8502 generalized to handle integers and linked list equally well). 7600 generalized to handle integers and linked list equally well).
8503 7601
8504 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents 7602 @node Zero-Length Extents
8505 @section Zero-Length Extents 7603 @section Zero-Length Extents
8506 7604
8507 Extents can be zero-length, and will end up that way if their endpoints 7605 Extents can be zero-length, and will end up that way if their endpoints
8508 are explicitly set that way or if their detachable property is nil 7606 are explicitly set that way or if their detachable property is nil
8509 and all the text in the extent is deleted. (The exception is open-open 7607 and all the text in the extent is deleted. (The exception is open-open
8528 7626
8529 Note that closed-open, non-detachable zero-length extents behave 7627 Note that closed-open, non-detachable zero-length extents behave
8530 exactly like markers and that open-closed, non-detachable zero-length 7628 exactly like markers and that open-closed, non-detachable zero-length
8531 extents behave like the ``point-type'' marker in Mule. 7629 extents behave like the ``point-type'' marker in Mule.
8532 7630
8533 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents 7631 @node Mathematics of Extent Ordering
8534 @section Mathematics of Extent Ordering 7632 @section Mathematics of Extent Ordering
8535 @cindex extent mathematics 7633 @cindex extent mathematics
8536 @cindex mathematics of extents 7634 @cindex mathematics of extents
8537 @cindex extent ordering 7635 @cindex extent ordering
8538 7636
8663 Proof: If @math{F2} does not include @math{I} then its start index is 7761 Proof: If @math{F2} does not include @math{I} then its start index is
8664 greater than @math{I} and thus it is greater than any extent in 7762 greater than @math{I} and thus it is greater than any extent in
8665 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} 7763 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I}
8666 and thus is in @math{S}, and thus @math{F2 >= F}. 7764 and thus is in @math{S}, and thus @math{F2 >= F}.
8667 7765
8668 @node Extent Fragments, , Mathematics of Extent Ordering, Extents 7766 @node Extent Fragments
8669 @section Extent Fragments 7767 @section Extent Fragments
8670 @cindex extent fragment 7768 @cindex extent fragment
8671 7769
8672 Imagine that the buffer is divided up into contiguous, non-overlapping 7770 Imagine that the buffer is divided up into contiguous, non-overlapping
8673 @dfn{runs} of text such that no extent starts or ends within a run 7771 @dfn{runs} of text such that no extent starts or ends within a run
8674 (extents that abut the run don't count). 7772 (extents that abut the run don't count).
8675 7773
8676 An extent fragment is a structure that holds data about the run that 7774 An extent fragment is a structure that holds data about the run that
8677 contains a particular buffer position (if the buffer position is at the 7775 contains a particular buffer position (if the buffer position is at the
8678 junction of two runs, the run after the position is used)---the 7776 junction of two runs, the run after the position is used) -- the
8679 beginning and end of the run, a list of all of the extents in that run, 7777 beginning and end of the run, a list of all of the extents in that run,
8680 the @dfn{merged face} that results from merging all of the faces 7778 the @dfn{merged face} that results from merging all of the faces
8681 corresponding to those extents, the begin and end glyphs at the 7779 corresponding to those extents, the begin and end glyphs at the
8682 beginning of the run, etc. This is the information that redisplay needs 7780 beginning of the run, etc. This is the information that redisplay needs
8683 in order to display this run. 7781 in order to display this run.
8685 Extent fragments have to be very quick to update to a new buffer 7783 Extent fragments have to be very quick to update to a new buffer
8686 position when moving linearly through the buffer. They rely on the 7784 position when moving linearly through the buffer. They rely on the
8687 stack-of-extents code, which does the heavy-duty algorithmic work of 7785 stack-of-extents code, which does the heavy-duty algorithmic work of
8688 determining which extents overlie a particular position. 7786 determining which extents overlie a particular position.
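To illustrate what a run is, here is a naive C sketch that finds the boundaries of the run containing a given position by scanning every extent endpoint. It deliberately ignores the stack-of-extents machinery and is linear in the number of extents; @code{run_at} and both structures are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

struct extent { long start, end; };
struct run    { long start, end; };

/* Return the run containing POS.  Every extent endpoint is a run
   boundary; a position sitting exactly on a boundary belongs to the
   run after it, as described above. */
struct run run_at(const struct extent *ext, size_t n,
                  long buf_start, long buf_end, long pos)
{
  struct run r = { buf_start, buf_end };
  for (size_t i = 0; i < n; i++) {
    long edges[2] = { ext[i].start, ext[i].end };
    for (int k = 0; k < 2; k++) {
      long e = edges[k];
      if (e <= pos && e > r.start)
        r.start = e;              /* closest boundary at or before POS */
      if (e > pos && e < r.end)
        r.end = e;                /* closest boundary after POS */
    }
  }
  return r;
}
```

With extents covering 2 to 8 and 5 to 12 in a buffer spanning 0 to 20, position 6 falls in the run from 5 to 8, and position 8 falls in the run from 8 to 12.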
8689 7787
8690 @node Faces, Glyphs, Extents, Top 7788 @node Faces and Glyphs, Specifiers, Extents, Top
8691 @chapter Faces 7789 @chapter Faces and Glyphs
8692 7790
8693 Not yet documented. 7791 Not yet documented.
8694 7792
8695 @node Glyphs, Specifiers, Faces, Top 7793 @node Specifiers, Menus, Faces and Glyphs, Top
8696 @chapter Glyphs
8697
8698 Glyphs are graphical elements that can be displayed in XEmacs buffers or
8699 gutters. We use the term graphical element here in the broadest possible
8700 sense since glyphs can be as mundane as text to as arcane as a native
8701 tab widget.
8702
8703 In XEmacs, glyphs represent the uninstantiated state of graphical
8704 elements, i.e. they hold all the information necessary to produce an
8705 image on-screen but the image does not exist at this stage.
8706
8707 Glyphs are lazily instantiated by calling one of the glyph
8708 functions. This usually occurs within redisplay when
8709 @code{Fglyph_height} is called. Instantiation causes an image-instance
8710 to be created and cached. This cache is on a device basis for all glyphs
8711 except glyph-widgets, and on a window basis for glyph widgets. The
8712 caching is done by @code{image_instantiate} and is necessary because it
8713 is generally possible to display an image-instance in multiple
8714 domains. For instance if we create a Pixmap, we can actually display
8715 this on multiple windows - even though we only need a single Pixmap
8716 instance to do this. If caching wasn't done then it would be necessary
8717 to create image-instances for every displayable occurrence of a glyph -
8718 and every usage - and this would be extremely memory and cpu intensive.
8719
8720 Widget-glyphs (a.k.a native widgets) are not cached in this way. This is
8721 because widget-glyph image-instances on screen are toolkit windows, and
8722 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
8723 cached on a window basis.
8724
8725 Any action on a glyph first consults the cache before actually
8726 instantiating a widget.
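The caching policy can be sketched as a lookup keyed on the glyph and its caching domain. This is a toy model under stated assumptions: a fixed-size table stands in for whatever structure @code{image_instantiate} really uses, and the names @code{instantiate}, @code{struct glyph} and @code{struct instance} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { CACHE_MAX = 16 };

struct glyph    { int is_widget; };
struct instance { const struct glyph *glyph; const void *domain; };

static struct instance cache[CACHE_MAX];
static size_t cache_len;
int instantiations;   /* counts how often a new instance was created */

/* Return the cached image-instance for G, creating it on first use.
   Ordinary glyphs are cached per device, widget-glyphs per window. */
const struct instance *instantiate(const struct glyph *g,
                                   const void *device, const void *window)
{
  const void *domain = g->is_widget ? window : device;
  for (size_t i = 0; i < cache_len; i++)
    if (cache[i].glyph == g && cache[i].domain == domain)
      return &cache[i];           /* cache hit: reuse the instance */

  assert(cache_len < CACHE_MAX);  /* toy table; no eviction in this sketch */
  cache[cache_len].glyph = g;
  cache[cache_len].domain = domain;
  instantiations++;
  return &cache[cache_len++];
}
```

In this model a pixmap glyph shown in two windows of the same device shares one instance, while a widget-glyph gets a separate instance for each window.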
8727
8728 @section Widget-Glyphs in the MS-Windows Environment
8729
8730 To Do
8731
8732 @section Widget-Glyphs in the X Environment
8733
8734 Widget-glyphs under X make heavy use of lwlib for manipulating the
8735 native toolkit objects. This is primarily so that different toolkits can
8736 be supported for widget-glyphs, just as they are supported for features
8737 such as menubars etc.
8738
8739 Lwlib is extremely poorly documented and quite hairy so here is my
8740 understanding of what goes on.
8741
8742 Lwlib maintains a set of widget_instances which mirror the hierarchical
8743 state of Xt widgets. I think this is so that widgets can be updated and
8744 manipulated generically by the lwlib library. For instance
8745 update_one_widget_instance can cope with multiple types of widget and
8746 multiple types of toolkit. Each element in the widget hierarchy is updated
8747 from its corresponding widget_instance by walking the widget_instance
8748 tree recursively.
8749
8750 This has desirable properties such as lw_modify_all_widgets which is
8751 called from glyphs-x.c and updates all the properties of a widget
8752 without having to know what the widget is or what toolkit it is from.
8753 Unfortunately this also has hairy properties such as making the lwlib
8754 code quite complex. And of course lwlib has to know at some level what
8755 the widget is and how to set its properties.
8756
8757 @node Specifiers, Menus, Glyphs, Top
8758 @chapter Specifiers 7794 @chapter Specifiers
8759 7795
8760 Not yet documented. 7796 Not yet documented.
8761 7797
8762 @node Menus, Subprocesses, Specifiers, Top 7798 @node Menus, Subprocesses, Specifiers, Top
8882 @item tty_name 7918 @item tty_name
8883 The name of the terminal that the subprocess is using, 7919 The name of the terminal that the subprocess is using,
8884 or @code{nil} if it is using pipes. 7920 or @code{nil} if it is using pipes.
8885 @end table 7921 @end table
8886 7922
8887 @node Interface to X Windows, Index , Subprocesses, Top 7923 @node Interface to X Windows, Index, Subprocesses, Top
8888 @chapter Interface to X Windows 7924 @chapter Interface to X Windows
8889 7925
8890 Not yet documented. 7926 Not yet documented.
8891 7927
8892 @include index.texi 7928 @include index.texi
8895 @summarycontents 7931 @summarycontents
8896 @contents 7932 @contents
8897 @c That's all 7933 @c That's all
8898 7934
8899 @bye 7935 @bye
7936