comparison man/internals/internals.texi @ 0:376386a54a3c r19-14

Import from CVS: tag r19-14
author cvs
date Mon, 13 Aug 2007 08:45:50 +0200
parents
children ac2d302a0011
1 \input texinfo @c -*-texinfo-*-
2 @c %**start of header
3 @setfilename ../../info/internals.info
4 @settitle XEmacs Internals Manual
5 @c %**end of header
6
7 @ifinfo
8
9 Copyright @copyright{} 1992 - 1996 Ben Wing.
10 Copyright @copyright{} 1996 Sun Microsystems.
11 Copyright @copyright{} 1994, 1995 Free Software Foundation.
12 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
13
14
15 Permission is granted to make and distribute verbatim copies of this
16 manual provided the copyright notice and this permission notice are
17 preserved on all copies.
18
19 @ignore
20 Permission is granted to process this file through TeX and print the
21 results, provided the printed document carries copying permission notice
22 identical to this one except for the removal of this paragraph (this
23 paragraph not being relevant to the printed manual).
24
25 @end ignore
26 Permission is granted to copy and distribute modified versions of this
27 manual under the conditions for verbatim copying, provided that the
28 entire resulting derived work is distributed under the terms of a
29 permission notice identical to this one.
30
31 Permission is granted to copy and distribute translations of this manual
32 into another language, under the above conditions for modified versions,
33 except that this permission notice may be stated in a translation
34 approved by the Foundation.
35
36 Permission is granted to copy and distribute modified versions of this
37 manual under the conditions for verbatim copying, provided also that the
38 section entitled ``GNU General Public License'' is included exactly as
39 in the original, and provided that the entire resulting derived work is
40 distributed under the terms of a permission notice identical to this
41 one.
42
43 Permission is granted to copy and distribute translations of this manual
44 into another language, under the above conditions for modified versions,
45 except that the section entitled ``GNU General Public License'' may be
46 included in a translation approved by the Free Software Foundation
47 instead of in the original English.
48 @end ifinfo
49
50 @c Combine indices.
51 @synindex cp fn
52 @syncodeindex vr fn
53 @syncodeindex ky fn
54 @syncodeindex pg fn
55 @syncodeindex tp fn
56
57 @setchapternewpage odd
58 @finalout
59
60 @titlepage
61 @title XEmacs Internals Manual
62 @subtitle Version 1.0, March 1996
63
64 @author Ben Wing
65 @page
66 @vskip 0pt plus 1fill
67
68 @noindent
69 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
70 Copyright @copyright{} 1996 Sun Microsystems, Inc. @*
71 Copyright @copyright{} 1994 Free Software Foundation. @*
72 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
73
74 @sp 2
75 Version 1.0 @*
76 March, 1996.@*
77
78 Permission is granted to make and distribute verbatim copies of this
79 manual provided the copyright notice and this permission notice are
80 preserved on all copies.
81
82 Permission is granted to copy and distribute modified versions of this
83 manual under the conditions for verbatim copying, provided also that the
84 section entitled ``GNU General Public License'' is included
85 exactly as in the original, and provided that the entire resulting
86 derived work is distributed under the terms of a permission notice
87 identical to this one.
88
89 Permission is granted to copy and distribute translations of this manual
90 into another language, under the above conditions for modified versions,
91 except that the section entitled ``GNU General Public License'' may be
92 included in a translation approved by the Free Software Foundation
93 instead of in the original English.
94 @end titlepage
95 @page
96
97 @node Top, A History of Emacs, (dir), (dir)
98
99 @ifinfo
100 This Info file contains v1.0 of the XEmacs Internals Manual.
101 @end ifinfo
102
103 @menu
104 * A History of Emacs:: Times, dates, important events.
105 * XEmacs From the Outside:: A broad conceptual overview.
106 * The Lisp Language:: An overview.
107 * XEmacs From the Perspective of Building::
108 * XEmacs From the Inside::
109 * The XEmacs Object System (Abstractly Speaking)::
110 * How Lisp Objects Are Represented in C::
111 * Rules When Writing New C Code::
112 * A Summary of the Various XEmacs Modules::
113 * Allocation of Objects in XEmacs Lisp::
114 * Events and the Event Loop::
115 * Evaluation; Stack Frames; Bindings::
116 * Symbols and Variables::
117 * Buffers and Textual Representation::
118 * MULE Character Sets and Encodings::
119 * The Lisp Reader and Compiler::
120 * Lstreams::
121 * Consoles; Devices; Frames; Windows::
122 * The Redisplay Mechanism::
123 * Extents::
124 * Faces and Glyphs::
125 * Specifiers::
126 * Menus::
127 * Subprocesses::
128 * Interface to X Windows::
129 * Index:: Index including concepts, functions, variables,
130 and other terms.
131
132 --- The Detailed Node Listing ---
133
134 Here are other nodes that are inferiors of those already listed,
135 mentioned here so you can get to them in one step:
136
137 A History of Emacs
138
139 * Through Version 18:: Unification prevails.
140 * Lucid Emacs:: One version 19 Emacs.
141 * GNU Emacs 19:: The other version 19 Emacs.
142 * XEmacs:: The continuation of Lucid Emacs.
143
144 Rules When Writing New C Code
145
146 * General Coding Rules::
147 * Writing Lisp Primitives::
148 * Adding Global Lisp Variables::
149
150 A Summary of the Various XEmacs Modules
151
152 * Low-Level Modules::
153 * Basic Lisp Modules::
154 * Modules for Standard Editing Operations::
155 * Editor-Level Control Flow Modules::
156 * Modules for the Basic Displayable Lisp Objects::
157 * Modules for other Display-Related Lisp Objects::
158 * Modules for the Redisplay Mechanism::
159 * Modules for Interfacing with the File System::
160 * Modules for Other Aspects of the Lisp Interpreter and Object System::
161 * Modules for Interfacing with the Operating System::
162 * Modules for Interfacing with X Windows::
163 * Modules for Internationalization::
164
165 Allocation of Objects in XEmacs Lisp
166
167 * Introduction to Allocation::
168 * Garbage Collection::
169 * GCPROing::
170 * Integers and Characters::
171 * Allocation from Frob Blocks::
172 * lrecords::
173 * Low-level allocation::
174 * Pure Space::
175 * Cons::
176 * Vector::
177 * Bit Vector::
178 * Symbol::
179 * Marker::
180 * String::
181 * Bytecode::
182
183 Events and the Event Loop
184
185 * Introduction to Events::
186 * Main Loop::
187 * Specifics of the Event Gathering Mechanism::
188 * Specifics About the Emacs Event::
189 * The Event Stream Callback Routines::
190 * Other Event Loop Functions::
191 * Converting Events::
192 * Dispatching Events; The Command Builder::
193
194 Evaluation; Stack Frames; Bindings
195
196 * Evaluation::
197 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
198 * Simple Special Forms::
199 * Catch and Throw::
200
201 Symbols and Variables
202
203 * Introduction to Symbols::
204 * Obarrays::
205 * Symbol Values::
206
207 Buffers and Textual Representation
208
209 * Introduction to Buffers:: A buffer holds a block of text such as a file.
210 * A Buffer@'s Text:: Representation of the text in a buffer.
211 * Buffer Lists:: Keeping track of all buffers.
212 * Markers and Extents:: Tagging locations within a buffer.
213 * Bufbytes and Emchars:: Representation of individual characters.
214 * The Buffer Object:: The Lisp object corresponding to a buffer.
215
216 MULE Character Sets and Encodings
217
218 * Character Sets::
219 * Encodings::
220 * Internal Mule Encodings::
221
222 Encodings
223
224 * Japanese EUC (Extended Unix Code)::
225 * JIS7::
226
227 Internal Mule Encodings
228
229 * Internal String Encoding::
230 * Internal Character Encoding::
231
232 The Lisp Reader and Compiler
233
234 Lstreams
235
236 Consoles; Devices; Frames; Windows
237
238 * Introduction to Consoles; Devices; Frames; Windows::
239 * Point::
240 * Window Hierarchy::
241
242 The Redisplay Mechanism
243
244 * Critical Redisplay Sections::
245 * Line Start Cache::
246
247 Extents
248
249 * Introduction to Extents:: Extents are ranges over text, with properties.
250 * Extent Ordering:: How extents are ordered internally.
251 * Format of the Extent Info:: The extent information in a buffer or string.
252 * Zero-Length Extents:: A weird special case.
253 * Mathematics of Extent Ordering:: A rigorous foundation.
254 * Extent Fragments:: Cached information useful for redisplay.
255
256 Faces and Glyphs
257
258 Specifiers
259
260 Menus
261
262 Subprocesses
263
264 Interface to X Windows
265
266 @end menu
267
268 @node A History of Emacs, XEmacs From the Outside, Top, Top
269 @chapter A History of Emacs
270 @cindex history of Emacs
271 @cindex Hackers (Steven Levy)
272 @cindex Levy, Steven
273 @cindex ITS (Incompatible Timesharing System)
274 @cindex Stallman, Richard
275 @cindex RMS
276 @cindex MIT
277 @cindex TECO
278 @cindex FSF
279 @cindex Free Software Foundation
280
281 XEmacs is a powerful, customizable text editor and development
282 environment. It began as Lucid Emacs, which was in turn derived from
283 GNU Emacs, a program written by Richard Stallman of the Free Software
284 Foundation. GNU Emacs dates back to the 1970's, and was modelled
285 after a package called ``Emacs'', written in 1976, that was a set of
286 macros on top of TECO, an old, old text editor written at MIT on the
287 DEC PDP 10 under one of the earliest time-sharing operating systems,
288 ITS (Incompatible Timesharing System). (ITS dates back well before
289 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
290 who called themselves ``hackers'', who shared an idealistic belief
291 system about the free exchange of information and were fanatical in
292 their devotion to and time spent with computers. (The hacker
293 subculture dates back to the late 1950's at MIT and is described in
294 detail in Steven Levy's book @cite{Hackers}. This book also includes
295 a lot of information about Stallman himself and the development of
296 Lisp, a programming language developed at MIT that underlies Emacs.)
297
298 @menu
299 * Through Version 18:: Unification prevails.
300 * Lucid Emacs:: One version 19 Emacs.
301 * GNU Emacs 19:: The other version 19 Emacs.
302 * XEmacs:: The continuation of Lucid Emacs.
303 @end menu
304
305 @node Through Version 18
306 @section Through Version 18
307 @cindex Gosling, James
308 @cindex Great Usenet Renaming
309
310 Although the history of the early versions of GNU Emacs is unclear,
311 the history from the middle of 1985 onward is well known. A time line is:
312
313 @itemize @bullet
314 @item
315 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
316 shared some code with a version of Emacs written by James Gosling (the
317 same James Gosling who later created the Java language).
318 @item
319 GNU Emacs version 16 (first released version was 16.56) was released on
320 July 15, 1985. All Gosling code was removed due to potential copyright
321 problems with the code.
322 @item
323 version 16.57: released on September 16, 1985.
324 @item
325 versions 16.58, 16.59: released on September 17, 1985.
326 @item
327 version 16.60: released on September 19, 1985. These later 16.x versions
328 incorporated patches from the net, especially for getting Emacs to work
329 under System V.
330 @item
331 version 17.36 (first official v17 release) released on December 20,
332 1985. Included a TeX-able user manual. First official unpatched
333 version that worked on vanilla System V machines.
334 @item
335 version 17.43 (second official v17 release) released on January 25,
336 1986.
337 @item
338 version 17.45 released on January 30, 1986.
339 @item
340 version 17.46 released on February 4, 1986.
341 @item
342 version 17.48 released on February 10, 1986.
343 @item
344 version 17.49 released on February 12, 1986.
345 @item
346 version 17.55 released on March 18, 1986.
347 @item
348 version 17.57 released on March 27, 1986.
349 @item
350 version 17.58 released on April 4, 1986.
351 @item
352 version 17.61 released on April 12, 1986.
353 @item
354 version 17.63 released on May 7, 1986.
355 @item
356 version 17.64 released on May 12, 1986.
357 @item
358 version 18.24 (a beta version) released on October 2, 1986.
359 @item
360 version 18.30 (a beta version) released on November 15, 1986.
361 @item
362 version 18.31 (a beta version) released on November 23, 1986.
363 @item
364 version 18.32 (a beta version) released on December 7, 1986.
365 @item
366 version 18.33 (a beta version) released on December 12, 1986.
367 @item
368 version 18.35 (a beta version) released on January 5, 1987.
369 @item
370 version 18.36 (a beta version) released on January 21, 1987.
371 @item
372 version 18.37 (a beta version) released on February 12, 1987.
373 @item
374 version 18.38 (a beta version) released on March 3, 1987.
375 @item
376 version 18.39 (a beta version) released on March 14, 1987.
377 @item
378 version 18.40 (a beta version) released on March 18, 1987.
379 @item
380 version 18.41 (the first ``official'' release) released on March 22,
381 1987.
382 @item
383 version 18.45 released on June 2, 1987.
384 @item
385 version 18.46 released on June 9, 1987.
386 @item
387 version 18.47 released on June 18, 1987.
388 @item
389 version 18.48 released on September 3, 1987.
390 @item
391 version 18.49 released on September 18, 1987.
392 @item
393 version 18.50 released on February 13, 1988.
394 @item
395 version 18.51 released on May 7, 1988.
396 @item
397 version 18.52 released on September 1, 1988.
398 @item
399 January 27, 1989: The Great Usenet Renaming. net.emacs is now
400 comp.emacs.
401 @item
402 version 18.53 released on February 24, 1989.
403 @item
404 version 18.54 released on April 26, 1989.
405 @item
406 version 18.55 released on August 23, 1989. This is the earliest version
407 that is still available by FTP.
408 @item
409 version 18.56 released on January 17, 1991.
410 @item
411 version 18.57 released late January, 1991.
412 @item
413 version 18.58 released ?????.
414 @item
415 version 18.59 released October 31, 1992.
416 @end itemize
417
418 @node Lucid Emacs
419 @section Lucid Emacs
420 @cindex Lucid Emacs
421 @cindex Lucid Inc.
422 @cindex Energize
423 @cindex Epoch
424
425 Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
426 C++ and Lisp development environments. It began when Lucid decided they
427 wanted to use Emacs as the editor and cornerstone of their C++
428 development environment (called ``Energize''). They needed many features
429 that were not available in the existing version of GNU Emacs (version
430 18.5something), in particular good and integrated support for GUI
431 elements such as mouse support, multiple fonts, multiple window-system
432 windows, etc. A branch of GNU Emacs called Epoch, written at the
433 University of Illinois, existed that supplied many of these features;
434 however, Lucid needed more than what existed in Epoch. At the time, the
435 Free Software Foundation was working on version 19 of Emacs (this was
436 sometime around 1991), which was planned to have similar features, and
437 so Lucid decided to work with the Free Software Foundation. Their plan
438 was to add features that they needed, and coordinate with the FSF so
439 that the features would get included back into Emacs version 19.
440
441 The release of version 19 was delayed, however (it finally appeared
442 more than a year later than initially planned), and Lucid encountered
443 unexpected technical resistance in getting their changes merged back
444 into version 19, so they decided to release their own version of Emacs,
445 which became Lucid Emacs 19.0.
446
447 @cindex Zawinski, Jamie
448 @cindex Sexton, Harlan
449 @cindex Benson, Eric
450 @cindex Devin, Matthieu
451 The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
452 and Eric Benson, and the work was later taken over by Jamie Zawinski,
453 who became ``Mr. Lucid Emacs'' for many releases.
454
455 A time line for Lucid Emacs/XEmacs is:
456
457 @itemize @bullet
458 @item
459 version 19.0 shipped with Energize 1.0, April 1992.
460 @item
461 version 19.1 released June 4, 1992.
462 @item
463 version 19.2 released June 19, 1992.
464 @item
465 version 19.3 released September 9, 1992.
466 @item
467 version 19.4 released January 21, 1993.
468 @item
469 version 19.5 was a repackaging of 19.4 with a few bug fixes and
470 shipped with Energize 2.0. Never released to the net.
471 @item
472 version 19.6 released April 9, 1993.
473 @item
474 version 19.7 was a repackaging of 19.6 with a few bug fixes and
475 shipped with Energize 2.1. Never released to the net.
476 @item
477 version 19.8 released September 6, 1993.
478 @item
479 version 19.9 released January 12, 1994.
480 @item
481 version 19.10 released May 27, 1994.
482 @item
483 version 19.11 (first XEmacs) released September 13, 1994.
484 @item
485 version 19.12 released June 23, 1995.
486 @item
487 version 19.13 released September 1, 1995.
488 @end itemize
489
490 @node GNU Emacs 19
491 @section GNU Emacs 19
492 @cindex GNU Emacs 19
493 @cindex FSF Emacs
494
495 About a year after the initial release of Lucid Emacs, the FSF
496 released a beta of their version of Emacs 19 (referred to here as ``GNU
497 Emacs''). By this time, the current version of Lucid Emacs was
498 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
499 19.7.) A time line for GNU Emacs version 19 is:
500
501 @itemize @bullet
502 @item
503 version 19.8 (beta) released May 27, 1993.
504 @item
505 version 19.9 (beta) released May 27, 1993.
506 @item
507 version 19.10 (beta) released May 30, 1993.
508 @item
509 version 19.11 (beta) released June 1, 1993.
510 @item
511 version 19.12 (beta) released June 2, 1993.
512 @item
513 version 19.13 (beta) released June 8, 1993.
514 @item
515 version 19.14 (beta) released June 17, 1993.
516 @item
517 version 19.15 (beta) released June 19, 1993.
518 @item
519 version 19.16 (beta) released July 6, 1993.
520 @item
521 version 19.17 (beta) released late July, 1993.
522 @item
523 version 19.18 (beta) released August 9, 1993.
524 @item
525 version 19.19 (beta) released August 15, 1993.
526 @item
527 version 19.20 (beta) released November 17, 1993.
528 @item
529 version 19.21 (beta) released November 17, 1993.
530 @item
531 version 19.22 (beta) released November 28, 1993.
532 @item
533 version 19.23 (beta) released May 17, 1994.
534 @item
535 version 19.24 (beta) released May 16, 1994.
536 @item
537 version 19.25 (beta) released June 3, 1994.
538 @item
539 version 19.26 (beta) released September 11, 1994.
540 @item
541 version 19.27 (beta) released September 14, 1994.
542 @item
543 version 19.28 (first ``official'' release) released November 1, 1994.
544 @item
545 version 19.29 released June 21, 1995.
546 @end itemize
547
548 @cindex Mlynarik, Richard
549 In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
550 worse. Lucid soon began incorporating features from GNU Emacs 19 into
551 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
552 working on and using GNU Emacs for a long time (back as far as version
553 16 or 17).
554
555 @node XEmacs
556 @section XEmacs
557 @cindex XEmacs
558
559 @cindex Sun Microsystems
560 @cindex University of Illinois
561 @cindex Illinois, University of
562 @cindex SPARCWorks
563 @cindex Andreessen, Marc
564 @cindex Kaplan, Simon
565 @cindex Wing, Ben
566 @cindex Thompson, Chuck
567 @cindex Win-Emacs
568 @cindex Epoch
569 @cindex Amdahl Corporation
570 Around the time that Lucid was developing Energize, Sun Microsystems
571 was developing their own development environment (called ``SPARCWorks'')
572 and also decided to use Emacs. They joined forces with the Epoch team
573 at the University of Illinois and later with Lucid. The maintainer of
574 the last-released version of Epoch was Marc Andreessen, but he dropped
575 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
576 away from a system administration job to become the primary Lucid Emacs
577 author for Epoch and Sun. Chuck's area of specialty became the
578 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
579 a ported version from Epoch and then later rewrote it from scratch).
580 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
581 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
582 contract to fix some event problems but later became a many-year
583 involvement, punctuated by a six-month contract with Amdahl Corporation.
584
585 @cindex rename to XEmacs
586 In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
587 not favorable to either company); the first release called XEmacs was
588 version 19.11. In June 1994, Lucid folded and Jamie quit to work for
589 the newly formed Mosaic Communications Corp., later Netscape
590 Communications Corp. (co-founded by the same Marc Andreessen, who had
591 quit his Epoch job to work on a graphical browser for the World Wide
592 Web). Chuck then became the primary maintainer of XEmacs, and put out
593 versions 19.11, 19.12, and 19.13 in conjunction with Ben. For 19.12 and
594 19.13, Chuck added the new redisplay and many other display improvements
595 and Ben added MULE support (support for Asian and other languages) and
596 redesigned most of the internal Lisp subsystems to better support the
597 MULE work and the various other features being added to XEmacs.
598
599 @cindex merging attempts
600 Many attempts have been made to merge XEmacs and GNU Emacs, but they
601 have consistently run into the same technical disagreements and other
602 problems that Lucid ran into when originally attempting to merge Lucid
603 Emacs into GNU Emacs.
604
605 A more detailed history is contained in the XEmacs About page.
606
607 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
608 @chapter XEmacs From the Outside
609 @cindex read-eval-print
610
611 XEmacs appears to the outside world as an editor, but it is really a
612 Lisp environment. At its heart is a Lisp interpreter; it also
613 ``happens'' to contain many specialized object types (e.g. buffers,
614 windows, frames, events) that are useful for implementing an editor.
615 Some of these objects (in particular windows and frames) have
616 displayable representations, and XEmacs provides a function
617 @code{redisplay()} that ensures that the display of all such objects
618 matches their internal state. Most of the time, a standard Lisp
619 environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp
620 code, execute it, and print the results''. XEmacs has a similar loop:
621
622 @itemize @bullet
623 @item
624 read an event
625 @item
626 dispatch the event (i.e. ``do it'')
627 @item
628 redisplay
629 @end itemize
630
631 Reading an event is done using the Lisp function @code{next-event},
632 which waits for something to happen (typically, the user presses a key
633 or moves the mouse) and returns an event object describing this.
634 Dispatching an event is done using the Lisp function
635 @code{dispatch-event}, which looks up the event in a keymap object (a
636 particular kind of object that associates an event with a Lisp function)
637 and calls that function. The function ``does'' what the user has
638 requested by changing the state of particular frame objects, buffer
639 objects, etc. Finally, @code{redisplay()} is called, which updates the
640 display to reflect those changes just made. Thus is an ``editor'' born.
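
The read/dispatch/redisplay loop described above can be modelled in a few
lines of C. This is only a toy sketch, not code from the XEmacs sources:
every name in it (@code{toy_next_event}, @code{toy_dispatch_event},
@code{toy_redisplay}, @code{toy_keymap}, and so on) is hypothetical, events
are reduced to single characters read from a string, and the ``display'' is
just a copy of the cursor position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the loop described above; none of these
   names come from the XEmacs sources. */

enum { TOY_MAX_KEYS = 128 };

typedef struct {
    int cursor;        /* internal state changed by commands */
    int shown_cursor;  /* what the "display" currently shows */
} toy_editor;

typedef void (*toy_command)(toy_editor *);

/* A "keymap": associates an event (here, a key code) with a command. */
typedef struct {
    toy_command binding[TOY_MAX_KEYS];
} toy_keymap;

static void toy_forward(toy_editor *ed)  { ed->cursor++; }
static void toy_backward(toy_editor *ed) { if (ed->cursor > 0) ed->cursor--; }

/* Read step: return the next pending key, or -1 when input is exhausted. */
static int toy_next_event(const char **input)
{
    if (**input == '\0')
        return -1;
    return (unsigned char) *(*input)++;
}

/* Dispatch step: look the event up in the keymap and call its command.
   Unbound keys are silently ignored in this sketch. */
static void toy_dispatch_event(const toy_keymap *map, toy_editor *ed, int key)
{
    assert(key >= 0);
    if (key < TOY_MAX_KEYS && map->binding[key] != NULL)
        map->binding[key](ed);
}

/* Redisplay step: make the displayed state match the internal state. */
static void toy_redisplay(toy_editor *ed) { ed->shown_cursor = ed->cursor; }

/* The loop itself: read, dispatch, redisplay.  Returns the cursor
   position that the "display" shows once the input runs out. */
int toy_run(const char *input)
{
    toy_keymap map = { { NULL } };
    toy_editor ed = { 0, 0 };
    int key;

    map.binding['f'] = toy_forward;
    map.binding['b'] = toy_backward;

    while ((key = toy_next_event(&input)) != -1) {
        toy_dispatch_event(&map, &ed, key);
        toy_redisplay(&ed);
    }
    return ed.shown_cursor;
}
```

Calling @code{toy_run("ffb")} moves the cursor forward twice and back once.
The point of the sketch is the division of labor: commands change only
internal state, and a separate redisplay step brings the display up to date.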
641
642 @cindex bridge, playing
643 @cindex taxes, doing
644 @cindex pi, calculating
645 Note that you do not have to use XEmacs as an editor; you could just
646 as well make it do your taxes, compute pi, play bridge, etc. You'd just
647 have to write functions to do those operations in Lisp.
648
649 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
650 @chapter The Lisp Language
651 @cindex Lisp vs. C
652 @cindex C vs. Lisp
653 @cindex Lisp vs. Java
654 @cindex Java vs. Lisp
655 @cindex dynamic scoping
656 @cindex scoping, dynamic
657 @cindex dynamic types
658 @cindex types, dynamic
659 @cindex Java
660 @cindex Common Lisp
661 @cindex Gosling, James
662
663 Lisp is a general-purpose language that is higher-level than C and in
664 many ways more powerful than C. Powerful dialects of Lisp such as
665 Common Lisp are probably much better languages for writing very large
666 applications than is C. (Unfortunately, for many non-technical
667 reasons C and its successor C++ have become the dominant languages for
668 application development. These languages are both inadequate for
669 extremely large applications, which is evidenced by the fact that newer,
670 larger programs are becoming ever harder to write and are requiring ever
671 more programmers despite great improvements in C development environments;
672 and by the fact that, although hardware speeds and reliability have been
673 growing at an exponential rate, most software is still generally
674 considered to be slow and buggy.)
675
676 The new Java language holds promise as a better general-purpose
677 development language than C. Java has many features in common with
678 Lisp that are not shared by C (this is not a coincidence, since
679 Java was designed by James Gosling, a former Lisp hacker). This
680 will be discussed more later.
681
682 For those used to C, here is a summary of the basic differences between
683 C and Lisp:
684
685 @enumerate
686 @item
687 Lisp has an extremely regular syntax. Every function, expression,
688 and control statement is written in the form
689
690 @example
691 (@var{func} @var{arg1} @var{arg2} ...)
692 @end example
693
694 This is as opposed to C, which writes functions as
695
696 @example
697 func(@var{arg1}, @var{arg2}, ...)
698 @end example
699
700 but writes expressions involving operators as (e.g.)
701
702 @example
703 @var{arg1} + @var{arg2}
704 @end example
705
706 and writes control statements as (e.g.)
707
708 @example
709 while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
710 @end example
711
712 Lisp equivalents of the latter two would be
713
714 @example
715 (+ @var{arg1} @var{arg2} ...)
716 @end example
717
718 and
719
720 @example
721 (while @var{expr} @var{statement1} @var{statement2} ...)
722 @end example
723
724 @item
725 Lisp is a safe language. Assuming there are no bugs in the Lisp
726 interpreter/compiler, it is impossible to write a program that ``core
727 dumps'' or otherwise causes the machine to execute an illegal
728 instruction. This is very different from C, where perhaps the most
729 common outcome of a bug is exactly such a crash. A corollary of this is that
730 the C operation of casting a pointer is impossible (and unnecessary) in
731 Lisp, and that it is impossible to access memory outside the bounds of
732 an array.
733
734 @item
735 Programs and data are written in the same form. The
736 parenthesis-enclosing form described above for statements is the same
737 form used for the most common data type in Lisp, the list. Thus, it is
738 possible to represent any Lisp program using Lisp data types, and for
739 one program to construct Lisp statements and then dynamically
740 @dfn{evaluate} them, or cause them to execute.
741
742 @item
743 All objects are @dfn{dynamically typed}. This means that part of every
744 object is an indication of what type it is. A Lisp program can
745 manipulate an object without knowing what type it is, and can query an
746 object to determine its type. This means that, correspondingly,
747 variables and function parameters can hold objects of any type and are
748 not normally declared as being of any particular type. This is opposed
749 to the @dfn{static typing} of C, where variables can hold exactly one
750 type of object and must be declared as such, and objects do not contain
751 an indication of their type because it's implicit in the variables they
752 are stored in. It is possible in C to have a variable hold different
753 types of objects (e.g. through the use of @code{void *} pointers or
754 variable-argument functions), but the type information must then be
755 passed explicitly in some other fashion, leading to additional program
756 complexity.
757
758 @item
759 Allocated memory is automatically reclaimed when it is no longer in use.
760 This operation is called @dfn{garbage collection} and involves looking
761 through all variables to see what memory is being pointed to, and
762 reclaiming any memory that is not pointed to and is thus
763 ``inaccessible'' and out of use. This is as opposed to C, in which
764 allocated memory must be explicitly reclaimed using @code{free()}. If
765 you simply drop all pointers to memory without freeing it, it becomes
766 ``leaked'' memory that still takes up space. Over a long period of
767 time, this can cause your program to grow and grow until it runs out of
768 memory.
769
770 @item
771 Lisp has built-in facilities for handling errors and exceptions. In C,
772 when an error occurs, usually either the program exits entirely or the
773 routine in which the error occurs returns a value indicating this. If
774 an error occurs in a deeply-nested routine, then every routine currently
775 called must unwind itself normally and return an error value back up to
776 the next routine. This means that every routine must explicitly check
777 for an error in all the routines it calls; if it does not do so,
778 unexpected and often random behavior results. This is an extremely
779 common source of bugs in C programs. An alternative would be to do a
780 non-local exit using @code{longjmp()}, but that is often very dangerous
781 because the routines that were exited past had no opportunity to clean
782 up after themselves and may leave things in an inconsistent state,
783 causing a crash shortly afterwards.
784
785 Lisp provides mechanisms to make such non-local exits safe. When an
786 error occurs, a routine simply signals that an error of a particular
787 class has occurred, and a non-local exit takes place. Any routine can
788 trap errors occurring in routines it calls by registering an error
789 handler for some or all classes of errors. (If no handler is registered,
790 a default handler, generally installed by the top-level event loop, is
791 executed; this prints out the error and continues.) Routines can also
792 specify cleanup code (called an @dfn{unwind-protect}) that will be
793 called when control exits from a block of code, no matter how that exit
794 occurs -- i.e. even if a function deeply nested below it causes a
795 non-local exit back to the top level.
796
797 Note that this facility has appeared in some recent vintages of C, in
798 particular Visual C++ and other PC compilers written for the Microsoft
799 Win32 API.
800
801 @item
802 In Emacs Lisp, local variables are @dfn{dynamically scoped}. This means
803 that if you declare a local variable in a particular function, and then
804 call another function, that subfunction can ``see'' the local variable
805 you declared. This is actually considered a bug in Emacs Lisp and in
806 many other early dialects of Lisp, and was corrected in Common Lisp. (In
807 Common Lisp, you can still declare dynamically scoped variables if you
808 want to -- they are sometimes useful -- but variables by default are
809 @dfn{lexically scoped} as in C.)
810 @end enumerate
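
The point about dynamic typing in the list above says that a C variable can
hold different types of objects only if the type information is passed
explicitly. A minimal sketch of that idea follows, using a tagged union.
It is an illustration under invented names (@code{toy_object},
@code{toy_type_name}, @code{toy_size}), and it makes no claim about how
XEmacs itself represents Lisp objects in C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical illustration only: this is not how XEmacs represents
   Lisp objects; the names below are invented for this sketch. */

typedef enum { TOY_INT, TOY_STRING } toy_type;

typedef struct {
    toy_type type;            /* the explicit type tag */
    union {
        long        integer;
        const char *string;
    } u;
} toy_object;

toy_object toy_make_int(long n)
{
    toy_object obj;
    obj.type = TOY_INT;
    obj.u.integer = n;
    return obj;
}

toy_object toy_make_string(const char *s)
{
    toy_object obj;
    assert(s != NULL);
    obj.type = TOY_STRING;
    obj.u.string = s;
    return obj;
}

/* Query an object for its type, as a Lisp program could. */
const char *toy_type_name(toy_object obj)
{
    return obj.type == TOY_INT ? "integer" : "string";
}

/* A function whose parameter can hold either type: it must check the
   tag before touching the union. */
long toy_size(toy_object obj)
{
    switch (obj.type) {
    case TOY_INT:    return obj.u.integer;
    case TOY_STRING: return (long) strlen(obj.u.string);
    }
    return -1; /* not reached for valid tags */
}
```

Every function that accepts a @code{toy_object} has to inspect the tag by
hand, and nothing stops a caller from reading the wrong union member. That
bookkeeping is the additional program complexity the list refers to; in Lisp
the tag is part of every object and is checked by the language itself.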
811
812 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
813 early dialect of Lisp developed at MIT (no relation to the Macintosh
814 computer). There is a Common Lisp compatibility package available for
815 Emacs that provides many of the features of Common Lisp.
816
817 The Java language is derived in many ways from C, and shares a similar
818 syntax, but has the following features in common with Lisp (and different
819 from C):
820
821 @enumerate
822 @item
823 Java is a safe language, like Lisp.
824 @item
825 Java provides garbage collection, like Lisp.
826 @item
827 Java has built-in facilities for handling errors and exceptions, like
828 Lisp.
829 @item
830 Java has a type system that combines the best advantages of both static
831 and dynamic typing. Objects (except very simple types) are explicitly
832 marked with their type, as in dynamic typing; but there is a hierarchy
833 of types and functions are declared to accept only certain types, thus
834 providing the increased compile-time error-checking of static typing.
835 @end enumerate
836
837 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
838 @chapter XEmacs From the Perspective of Building
839
840 The heart of XEmacs is the Lisp environment, which is written in C.
841 This is contained in the @file{src/} subdirectory. Underneath
842 @file{src/} are two subdirectories of header files: @file{s/} (header
843 files for particular operating systems) and @file{m/} (header files for
844 particular machine types). In practice the distinction between the two
845 types of header files is blurred. These header files define or undefine
846 certain preprocessor constants and macros to indicate particular
847 characteristics of the associated machine or operating system. As part
848 of the configure process, one @file{s/} file and one @file{m/} file are
849 identified for the particular environment in which XEmacs is being
850 built.
851
852 XEmacs also contains a great deal of Lisp code. This implements the
853 operations that make XEmacs useful as an editor as well as just a
854 Lisp environment, and also contains many add-on packages that allow
855 XEmacs to browse directories, act as a mail and Usenet news reader,
856 compile Lisp code, etc. There is actually a lot more Lisp code than
857 C code associated with XEmacs, but much of the Lisp code is
858 peripheral to the actual operation of the editor. The Lisp code
859 all lies in subdirectories underneath the @file{lisp/} directory.
860
861 The @file{lwlib/} directory contains C code that implements a
862 generalized interface onto different X widget toolkits and also
863 implements some widgets of its own that behave like Motif widgets but
864 are faster, free, and in some cases more powerful. The code in this
865 directory compiles into a library and is mostly independent from XEmacs.
866
867 The @file{etc/} directory contains various data files associated with
868 XEmacs. Some of them are actually read by XEmacs at startup; others
869 merely contain useful information of various sorts.
870
871 The @file{lib-src/} directory contains C code for various auxiliary
872 programs that are used in connection with XEmacs. Some of them are used
873 during the build process; others are used to perform certain functions
874 that cannot conveniently be placed in the XEmacs executable (e.g. the
875 @file{movemail} program for fetching mail out of @file{/var/spool/mail}, which
876 must be setgid to @file{mail} on many systems; and the @file{gnuclient}
877 program, which allows an external script to communicate with a running
878 XEmacs process).
879
880 The @file{man/} directory contains the sources for the XEmacs
881 documentation. It is mostly in a form called Texinfo, which can be
882 converted into either a printed document (by passing it through TeX) or
883 into on-line documentation called @dfn{info files}.
884
885 The @file{info/} directory contains the results of formatting the
886 XEmacs documentation as @dfn{info files}, for on-line use. These files
887 are used when you enter the Info system using @kbd{C-h i} or through the
888 Help menu.
889
890 The @file{dynodump/} directory contains auxiliary code used to build
891 XEmacs on Solaris platforms.
892
893 The other directories contain various miscellaneous code and
894 information that is not normally used or needed.
895
896 The first step of building involves running the @file{configure}
897 program and passing it various parameters to specify any optional
898 features you want and compiler arguments and such, as described in the
899 @file{INSTALL} file. This determines what the build environment is,
900 chooses the appropriate @file{s/} and @file{m/} file, and runs a series
901 of tests to determine many details about your environment, such as which
902 library functions are available and exactly how they work. (The
903 @file{s/} and @file{m/} files only contain information that cannot be
904 conveniently detected in this fashion.) The reason for running these
905 tests is that it allows XEmacs to be compiled on a much wider variety of
906 platforms than those that the XEmacs developers happen to be familiar
907 with, including various sorts of hybrid platforms. This is especially
908 important now that many operating systems give you a great deal of
909 control over exactly what features you want installed, and allow for
910 easy upgrading of parts of a system without upgrading the rest. It
911 would be impossible to pre-determine and pre-specify the information for
912 all possible configurations.
913
914 When @file{configure} is done running, it generates @file{Makefile}s and the
915 file @file{config.h} (which describes the features of your system) from
916 template files. You then run @file{make}, which compiles the auxiliary
917 code and programs in @file{lib-src/} and @file{lwlib/} and the main
918 XEmacs executable in @file{src/}. The result of this is an executable
919 called @file{temacs}, which is @emph{not} the XEmacs executable.
920 @file{temacs} by itself cannot function as an editor or even display any
921 windows on the screen, and if you simply run it, it will exit
922 immediately. The Makefile runs @file{temacs} with certain options that
923 cause it to initialize itself, read in a number of basic Lisp files, and
924 then dump itself out into a new executable called @file{xemacs}. This
925 new executable has been pre-initialized and contains pre-digested Lisp
926 code that is necessary for the editor to function (this includes some
927 extremely basic Lisp functions, e.g. @code{not}, that can be defined in
928 terms of other Lisp primitives; some initialization code that is called
929 when certain objects, such as frames, are created; and all of the
930 standard keybindings and code for the actions they result in). This
931 executable, @file{xemacs}, is the executable that you run to use the
932 XEmacs editor.
933
934 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
935 @chapter XEmacs From the Inside
936
937 Internally, XEmacs is quite complex, and can be very confusing. To
938 simplify things, it can be useful to think of XEmacs as containing an
939 event loop that ``drives'' everything, and a number of other subsystems,
940 such as a Lisp engine and a redisplay mechanism. Each of these other
941 subsystems exists simultaneously in XEmacs, and each has a certain
942 state. The flow of control continually passes in and out of these
943 different subsystems in the course of normal operation of the editor.
944
945 It is important to keep in mind that, most of the time, the editor is
946 ``driven'' by the event loop. Except during initialization and batch
947 mode, all subsystems are entered directly or indirectly through the
948 event loop, and ultimately, control exits out of all subsystems back up
949 to the event loop. This cycle of entering a subsystem, exiting back out
950 to the event loop, and starting another iteration of the event loop
951 occurs once for each keystroke, mouse motion, etc.
952
953 If you're trying to understand a particular subsystem (other than the
954 event loop), think of it as a ``daemon'' process or ``servant'' that is
955 responsible for one particular aspect of a larger system, and
956 periodically receives commands or environment changes that cause it to
957 do something. Ultimately, these commands and environment changes are
958 always triggered by the event loop. For example:
959
960 @itemize @bullet
961 @item
962 The window and frame mechanism is responsible for keeping track of what
963 windows and frames exist, what buffers are in them, etc. It is
964 periodically given commands (usually from the user) to make a change to
965 the current window/frame state: e.g. create a new frame, delete a
966 window, etc.
967
968 @item
969 The buffer mechanism is responsible for keeping track of what buffers
970 exist and what text is in them. It is periodically given commands
971 (usually from the user) to insert or delete text, create a buffer, etc.
972 When it receives a textual-change command, it tells the redisplay
973 mechanism about this.
974
975 @item
976 The redisplay mechanism is responsible for making sure that windows and
977 frames are displayed correctly. It is periodically told (by the event
978 loop) to actually ``do its job'', i.e. snoop around and see what the
979 current state of the environment (mostly of the currently-existing
980 windows, frames, and buffers) is, and make sure that that state matches
981 what's actually displayed. It keeps lots and lots of information around
982 (such as what is actually being displayed currently, and what the
983 environment was last time it checked) so that it can minimize the work
984 it has to do. It is also helped along in that whenever a relevant
985 change to the environment occurs, the redisplay mechanism is told about
986 this, so it has a pretty good idea of where it has to look to find
987 possible changes and doesn't have to look everywhere.
988
989 @item
990 The Lisp engine is responsible for executing the Lisp code in which most
991 user commands are written. It is entered through a call to @code{eval}
992 or @code{funcall}, which occurs as a result of dispatching an event from
993 the event loop. The functions it calls issue commands to the buffer
994 mechanism, the window/frame subsystem, etc.
995
996 @item
997 The Lisp allocation subsystem is responsible for keeping track of Lisp
998 objects. It is given commands from the Lisp engine to allocate objects,
999 garbage collect, etc.
1000 @end itemize
1001
1002 etc.
1003
1004 The important idea here is that there are a number of independent
1005 subsystems each with its own responsibility and persistent state, just
1006 like different employees in a company, and each subsystem is
1007 periodically given commands from other subsystems. Commands can flow
1008 from any one subsystem to any other, but there is usually some sort of
1009 hierarchy, with all commands originating from the event subsystem.
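
The relationship between the event loop and the other subsystems can be
sketched in C as follows. This is a toy model under stated assumptions,
not XEmacs code: the types @code{struct event}, @code{struct buffer},
and @code{struct redisplay} and every function name are hypothetical.
It shows only the shape described above: each subsystem keeps its own
persistent state, the buffer mechanism tells redisplay about textual
changes, and the event loop drives both.

```c
#include <assert.h>
#include <stddef.h>

enum event_type { EV_INSERT_CHAR, EV_DELETE_CHAR };

struct event { enum event_type type; char ch; };

/* Redisplay subsystem: remembers what it last displayed and whether
   it has been told that something changed since then. */
struct redisplay { int dirty; int redraw_count; size_t displayed_length; };

/* Buffer subsystem: its persistent state is the text itself. */
struct buffer { char text[64]; size_t length; struct redisplay *display; };

static void redisplay_note_change(struct redisplay *r)
{
    r->dirty = 1;
}

static void buffer_insert(struct buffer *b, char ch)
{
    if (b->length + 1 < sizeof b->text) {
        b->text[b->length++] = ch;
        b->text[b->length] = '\0';
        redisplay_note_change(b->display);  /* tell redisplay about it */
    }
}

static void buffer_delete_last(struct buffer *b)
{
    if (b->length > 0) {
        b->text[--b->length] = '\0';
        redisplay_note_change(b->display);
    }
}

/* Told by the event loop to "do its job"; does nothing if no change
   was reported, which is how it minimizes its work. */
static void redisplay_update(struct redisplay *r, const struct buffer *b)
{
    if (!r->dirty)
        return;
    r->displayed_length = b->length;
    r->redraw_count++;
    r->dirty = 0;
}

/* The event loop: dispatch one event, then give redisplay a turn. */
void event_loop(struct buffer *b, const struct event *events, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        switch (events[i].type) {
        case EV_INSERT_CHAR: buffer_insert(b, events[i].ch); break;
        case EV_DELETE_CHAR: buffer_delete_last(b);          break;
        }
        redisplay_update(b->display, b);
    }
}

/* Usage example: three events, each causing one redisplay. */
void event_loop_demo(void)
{
    struct redisplay r = { 0, 0, 0 };
    struct buffer b;
    const struct event events[] = {
        { EV_INSERT_CHAR, 'h' }, { EV_INSERT_CHAR, 'i' }, { EV_DELETE_CHAR, 0 }
    };

    b.text[0] = '\0';
    b.length = 0;
    b.display = &r;
    event_loop(&b, events, 3);
    assert(b.length == 1 && b.text[0] == 'h');
    assert(r.redraw_count == 3);
    assert(r.displayed_length == 1);
}
```

Note that the buffer code never redraws anything itself; it merely
records that a change happened, and control returns to the loop.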
1010
1011 XEmacs is entered in @code{main()}, which is in @file{emacs.c}. When
1012 this is called the first time (in a properly-invoked @file{temacs}), it
1013 does the following:
1014
1015 @enumerate
1016 @item
1017 It does some very basic environment initializations, such as determining
1018 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1019 and setting up signal handlers.
1020 @item
1021 It initializes the entire Lisp interpreter.
1022 @item
1023 It sets the initial values of many built-in variables (including many
1024 variables that are visible to Lisp programs), such as the global keymap
1025 object and the built-in faces (a face is an object that describes the
1026 display characteristics of text). This involves creating Lisp objects
1027 and thus is dependent on step (2).
1028 @item
1029 It performs various other initializations that are relevant to the
1030 particular environment it is running in, such as retrieving environment
1031 variables, determining the current date and the user who is running the
1032 program, examining its standard input, creating any necessary file
1033 descriptors, etc.
1034 @item
1035 At this point, the C initialization is complete. A Lisp program that
1036 was specified on the command line (usually @file{loadup.el}) is called
1037 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1038 @file{loadup.el} loads all of the other Lisp files that are needed for
1039 the operation of the editor, calls the @code{dump-emacs} function to
1040 write out @file{xemacs}, and then kills the temacs process.
1041 @end enumerate
1042
1043 When @file{xemacs} is then run, it only redoes steps (1) and (4)
1044 above; all variables already contain the values they were set to when
1045 the executable was dumped, and all memory that was allocated with
1046 @code{malloc()} is still around. (XEmacs knows whether it is being run
1047 as @file{xemacs} or @file{temacs} because it sets the global variable
1048 @code{initialized} to 1 after step (4) above.) At this point,
1049 @file{xemacs} calls a Lisp function to do any further initialization,
1050 which includes parsing the command-line (the C code can only do limited
1051 command-line parsing, which includes looking for the @samp{-batch} and
1052 @samp{-l} flags and a few other flags that it needs to know about before
1053 initialization is complete), creating the first frame (or @dfn{window}
1054 in standard window-system parlance), running the user's init file
1055 (usually the file @file{.emacs} in the user's home directory), etc. The
1056 function to do this is usually called @code{normal-top-level};
1057 @file{loadup.el} tells the C code about this function by setting its
1058 name as the value of the Lisp variable @code{top-level}.
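
The way the @code{initialized} flag separates the two kinds of startup
can be sketched in C. This is an illustration only: apart from
@code{initialized}, which the text above names, every function and
variable here is hypothetical, and the dump itself is not modelled.
Calling @code{startup} twice in one process stands in for running
@file{temacs} and then, later, the dumped @file{xemacs}, on the
assumption that static data survives the dump as described above. The
Lisp-level work of step (5) is omitted.

```c
#include <assert.h>

static int initialized = 0;     /* preserved in the dumped image */
static int step_runs[5];        /* step_runs[k] counts runs of step (k) */

static void basic_environment_init(void)   { step_runs[1]++; }  /* step (1) */
static void lisp_interpreter_init(void)    { step_runs[2]++; }  /* step (2) */
static void builtin_variables_init(void)   { step_runs[3]++; }  /* step (3) */
static void runtime_environment_init(void) { step_runs[4]++; }  /* step (4) */

void startup(void)
{
    basic_environment_init();
    if (!initialized) {
        /* Only needed the first time: the dumped executable already
           contains the results of these two steps. */
        lisp_interpreter_init();
        builtin_variables_init();
    }
    runtime_environment_init();
    initialized = 1;            /* set after step (4), as in the text */
}

int step_run_count(int step)
{
    assert(step >= 1 && step <= 4);
    return step_runs[step];
}

/* Usage example: the second run redoes only steps (1) and (4). */
void startup_demo(void)
{
    startup();                  /* stands in for the temacs run */
    assert(step_run_count(2) == 1 && step_run_count(3) == 1);
    startup();                  /* stands in for running dumped xemacs */
    assert(step_run_count(1) == 2 && step_run_count(4) == 2);
    assert(step_run_count(2) == 1 && step_run_count(3) == 1);
}
```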
1059
1060 When the Lisp initialization code is done, the C code enters the event
1061 loop, and stays there for the duration of the XEmacs process. The code
1062 for the event loop is contained in @file{keyboard.c}, and is called
1063 @code{Fcommand_loop_1()}. Note that this event loop could very well be
1064 written in Lisp, and in fact a Lisp version exists; but apparently,
1065 doing this makes XEmacs run noticeably slower.
1066
1067 Notice how much of the initialization is done in Lisp, not in C.
1068 In general, XEmacs tries to move as much code as is possible
1069 into Lisp. Code that remains in C is code that implements the
1070 Lisp interpreter itself, or code that needs to be very fast, or
1071 code that needs to do system calls or other such stuff that
1072 needs to be done in C, or code that needs to have access to
1073 ``forbidden'' structures. (One conscious aspect of the design of
1074 Lisp under XEmacs is a clean separation between the external
1075 interface to a Lisp object's functionality and its internal
1076 implementation. Part of this design is that Lisp programs
1077 are forbidden from accessing the contents of the object other
1078 than through using a standard API. In this respect, XEmacs Lisp
1079 is similar to modern Lisp dialects but differs from GNU Emacs,
1080 which tends to expose the implementation and allow Lisp
1081 programs to look at it directly. The major advantage of
1082 hiding the implementation is that it allows the implementation
1083 to be redesigned without affecting any Lisp programs, including
1084 those that might want to be ``clever'' by looking directly at
1085 the object's contents and possibly manipulating them.)
1086
1087 Moving code into Lisp makes the code easier to debug and maintain and
1088 makes it much easier for people who are not XEmacs developers to
1089 customize XEmacs, because they can make a change with much less chance
1090 of obscure and unwanted interactions occurring than if they were to
1091 change the C code.
1092
1093 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1094 @chapter The XEmacs Object System (Abstractly Speaking)
1095
1096 At the heart of the Lisp interpreter is its management of objects.
1097 XEmacs Lisp contains many built-in objects, some of which are
1098 simple and others of which can be very complex; and some of which
1099 are very common, and others of which are rarely used or are only
1100 used internally. (Since the Lisp allocation system, with its
1101 automatic reclamation of unused storage, is so much more convenient
1102 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1103 in its internal operations.)
1104
1105 The basic Lisp objects are
1106
1107 @table @code
1108 @item integer
1109 28 bits of precision, or 60 bits on 64-bit machines; the reason for this
1110 is described below when the internal Lisp object representation is
1111 described.
1112 @item float
1113 Same precision as a double in C.
1114 @item cons
1115 A simple container for two Lisp objects, used to implement lists and
1116 most other data structures in Lisp.
1117 @item char
1118 An object representing a single character of text; chars behave like
1119 integers in many ways but are logically considered text rather than
1120 numbers and have a different read syntax. (The read syntax for a char
1121 contains the char itself or some textual encoding of it -- for example,
1122 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1123 ISO-2022 encoding standard -- rather than the numerical representation
1124 of the char; this way, if the mapping between chars and integers
1125 changes, which is quite possible for Kanji characters and other extended
1126 characters, the same character will still be created. Note that some
1127 primitives confuse chars and integers. The worst culprit is @code{eq},
1128 which makes a special exception and considers a char to be @code{eq} to
1129 its integer equivalent, even though in no other case are objects of two
1130 different types @code{eq}. The reason for this monstrosity is
1131 compatibility with existing code; the separation of char from integer
1132 came fairly recently.)
1133 @item symbol
1134 An object that contains Lisp objects and is referred to by name;
1135 symbols are used to implement variables and named functions
1136 and to provide the equivalent of preprocessor constants in C.
1137 @item vector
1138 A one-dimensional array of Lisp objects providing constant-time access
1139 to any of the objects; access to an arbitrary object in a vector is
1140 faster than for lists, but the operations that can be done on a vector
1141 are more limited.
1142 @item string
1143 Self-explanatory; behaves much like a vector of chars
1144 but has a different read syntax and is stored and manipulated
1145 more compactly and efficiently.
1146 @item bit-vector
1147 A vector of bits; similar to a string in spirit.
1148 @item compiled-function
1149 An object describing compiled Lisp code, known as @dfn{byte code}.
1150 @item subr
1151 An object describing a Lisp primitive.
1152 @end table
1153
1154 @cindex closure
1155 Note that there is no basic ``function'' type, as in more powerful
1156 versions of Lisp (where it's called a @dfn{closure}). XEmacs Lisp does
1157 not provide the closure semantics implemented by Common Lisp and Scheme.
1158 The guts of a function in XEmacs Lisp are represented in one of four
1159 ways: a symbol specifying another function (when one function is an
1160 alias for another), a list containing the function's source code, a
1161 bytecode object, or a subr object. (In other words, given a symbol
1162 specifying the name of a function, calling @code{symbol-function} to
1163 retrieve the contents of the symbol's function cell will return one of
1164 these types of objects.)
1165
1166 XEmacs Lisp also contains numerous specialized objects used to
1167 implement the editor:
1168
1169 @table @asis
1170 @item buffer
1171 Stores text like a string, but is optimized for insertion and deletion
1172 and has certain other properties that can be set.
1173 @item frame
1174 An object with various properties whose displayable representation is a
1175 @dfn{window} in window-system parlance.
1176 @item window
1177 A section of a frame that displays the contents of a buffer;
1178 often called a @dfn{pane} in window-system parlance.
1179 @item window-configuration
1180 An object that represents a saved configuration of windows in a frame.
1181 @item device
1182 An object representing a screen on which frames can be displayed;
1183 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1184 character mode.
1185 @item face
1186 An object specifying the appearance of text or graphics; it contains
1187 characteristics such as font, foreground color, and background color.
1188 @item marker
1189 An object that refers to a particular position in a buffer and moves
1190 around as text is inserted and deleted to stay in the same relative
1191 position to the text around it.
1192 @item extent
1193 Similar to a marker but covers a range of text in a buffer; can also
1194 specify properties of the text, such as a face in which the text is to
1195 be displayed, whether the text is invisible or unmodifiable, etc.
1196 @item event
1197 Generated by calling @code{next-event} and contains information
1198 describing a particular event happening in the system, such as the user
1199 pressing a key or a process terminating.
1200 @item keymap
1201 An object that maps from events (described using lists, vectors, and
1202 symbols rather than with an event object because the mapping is for
1203 classes of events, rather than individual events) to functions to
1204 execute or other events to recursively look up; the functions are
1205 described by name, using a symbol, or using lists to specify the
1206 function's code.
1207 @item glyph
1208 An object that describes the appearance of an image (e.g. pixmap) on
1209 the screen; glyphs can be attached to the beginning or end of extents
1210 and in some future version of XEmacs will be able to be inserted
1211 directly into a buffer.
1212 @item process
1213 An object that describes a connection to an externally-running process.
1214 @end table
1215
1216 There are some other, less-commonly-encountered general objects:
1217
1218 @table @asis
1219 @item hashtable
1220 An object that maps from an arbitrary Lisp object to another arbitrary
1221 Lisp object, using hashing for fast lookup.
1222 @item obarray
1223 A limited form of hashtable that maps from strings to symbols; obarrays
1224 are used to look up a symbol given its name and are not actually their
1225 own object type but are kludgily represented using vectors with hidden
1226 fields (this representation derives from GNU Emacs).
1227 @item specifier
1228 A complex object used to specify the value of a display property; a
1229 default value is given and different values can be specified for
1230 particular frames, buffers, windows, devices, or classes of device.
1231 @item char-table
1232 An object that maps from chars or classes of chars to arbitrary Lisp
1233 objects; internally char tables use a complex nested-vector
1234 representation that is optimized to the way characters are represented
1235 as integers.
1236 @item range-table
1237 An object that maps from ranges of integers to arbitrary Lisp objects.
1238 @end table
1239
1240 And some strange special-purpose objects:
1241
1242 @table @asis
1243 @item charset
1244 @itemx coding-system
1245 Objects used when MULE, or multi-lingual/Asian-language, support is
1246 enabled.
1247 @item color-instance
1248 @itemx font-instance
1249 @itemx image-instance
1250 An object that encapsulates a window-system resource; instances are
1251 mostly used internally but are exposed on the Lisp level for cleanness
1252 of the specifier model and because it's occasionally useful for Lisp
1253 programs to create or query the properties of instances.
1254 @item subwindow
1255 An object that encapsulates a @dfn{subwindow} resource, i.e. a
1256 window-system child window that is drawn into by an external process;
1257 this object should be integrated into the glyph system but isn't yet,
1258 and may change form when this is done.
1259 @item tooltalk-message
1260 @itemx tooltalk-pattern
1261 Objects that represent resources used in the ToolTalk interprocess
1262 communication protocol.
1263 @item toolbar-button
1264 An object used in conjunction with the toolbar.
1265 @item x-resource
1266 An object that encapsulates certain miscellaneous resources in the X
1267 window system, used only when Epoch support is enabled.
1268 @end table
1269
1270 And objects that are only used internally:
1271
1272 @table @asis
1273 @item opaque
1274 A generic object for encapsulating arbitrary memory; this allows you the
1275 generality of @code{malloc()} and the convenience of the Lisp object
1276 system.
1277 @item lstream
1278 A buffering I/O stream, used to provide a unified interface to anything
1279 that can accept output or provide input, such as a file descriptor, a
1280 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1281 it's a Lisp object to make its memory management more convenient.
1282 @item char-table-entry
1283 Subsidiary objects in the internal char-table representation.
1284 @item extent-auxiliary
1285 @itemx menubar-data
1286 @itemx toolbar-data
1287 Various special-purpose objects that are basically just used to
1288 encapsulate memory for particular subsystems, similar to the more
1289 general ``opaque'' object.
1290 @item symbol-value-forward
1291 @itemx symbol-value-buffer-local
1292 @itemx symbol-value-varalias
1293 @itemx symbol-value-lisp-magic
1294 Special internal-only objects that are placed in the value cell of a
1295 symbol to indicate that there is something special with this variable --
1296 e.g. it has no value, it mirrors another variable, or it mirrors some C
1297 variable; there is really only one kind of object, called a
1298 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1299 semi-different object types.
1300 @end table
1301
1302 @cindex permanent objects
1303 @cindex temporary objects
1304 Some types of objects are @dfn{permanent}, meaning that once created,
1305 they do not disappear until explicitly destroyed, using a function such
1306 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1307 Others will disappear once they are no longer used, through the garbage
1308 collection mechanism. Buffers, frames, windows, devices, and processes
1309 are among the objects that are permanent. Note that some objects can go
1310 both ways: Faces can be created either way; extents are normally
1311 permanent, but detached extents (extents not referring to any text, as
1312 happens to some extents when the text they are referring to is deleted)
1313 are temporary. Note that some permanent objects, such as faces and
1314 coding systems, cannot be deleted. Note also that windows are unique in
1315 that they can be @emph{undeleted} after having previously been
1316 deleted. (This happens as a result of restoring a window configuration.)
1317
1318 @cindex read syntax
1319 Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1320 specifying an object of that type in Lisp code. When you load a Lisp
1321 file, or type in code to be evaluated, what really happens is that the
1322 function @code{read} is called, which reads some text and creates an object
1323 based on the syntax of that text; then @code{eval} is called, which
1324 possibly does something special; then this loop repeats until there's
1325 no more text to read. (@code{eval} only actually does something special
1326 with symbols, which causes the symbol's value to be returned,
1327 similar to referencing a variable; and with conses [i.e. lists],
1328 which cause a function invocation. All other values are returned
1329 unchanged.)
1330
1331 The read syntax
1332
1333 @example
1334 17297
1335 @end example
1336
1337 converts to an integer whose value is 17297.
1338
1339 @example
1340 1.983e-4
1341 @end example
1342
1343 converts to a float whose value is 1.983e-4, or .0001983.
1344
1345 @example
1346 ?b
1347 @end example
1348
1349 converts to a char that represents the lowercase letter b.
1350
1351 @example
1352 ?^[$(B#&^[(B
1353 @end example
1354
1355 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1356 particular Kanji character. (To decode this gook: @samp{ESC} begins an
1357 escape sequence; @samp{ESC $ (} is a class of escape sequences meaning
1358 ``switch to a 94x94 character set''; @samp{ESC $ ( B} means ``switch to
1359 Japanese Kanji''; @samp{#} and @samp{&} collectively index into a
1360 94-by-94 array of characters [subtract 33 from the ASCII value of each
1361 character to get the corresponding index]; @samp{ESC (} is a class of
1362 escape sequences meaning ``switch to a 94 character set''; @samp{ESC ( B}
1363 means ``switch to US ASCII''. It is a coincidence that the letter
1364 @samp{B} is used to denote both Japanese Kanji and US ASCII. If the
1365 first @samp{B} were replaced with an @samp{A}, you'd be requesting a
1366 Chinese Hanzi character from the GB2312 character set.)
1367
1368 @example
1369 "foobar"
1370 @end example
1371
1372 converts to a string.
1373
1374 @example
1375 foobar
1376 @end example
1377
1378 converts to a symbol whose name is @code{"foobar"}. This is done by
1379 looking up the string equivalent in the global variable
1380 @code{obarray}, whose contents should be an obarray. If no symbol
1381 is found, a new symbol with the name @code{"foobar"} is automatically
1382 created and added to @code{obarray}; this process is called
1383 @dfn{interning} the symbol.
1384 @cindex interning
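
The lookup-or-create behaviour of interning can be sketched in C with a
small chained hash table. This is a simplified illustration, not the
XEmacs implementation: @code{struct symbol}, @code{hash_name},
@code{intern}, the table size, and the hash function are all
assumptions made for the example. The point it demonstrates is that
reading the same name twice yields the very same symbol object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OBARRAY_SIZE 31                 /* arbitrary size for the sketch */

struct symbol {
    char *name;
    struct symbol *next;                /* next symbol in the same bucket */
};

static struct symbol *obarray[OBARRAY_SIZE];

static unsigned hash_name(const char *name)
{
    unsigned h = 0;

    while (*name)
        h = h * 31u + (unsigned char) *name++;
    return h % OBARRAY_SIZE;
}

/* Return the symbol called NAME, creating and recording it if it does
   not exist yet.  Returns NULL only if memory allocation fails. */
struct symbol *intern(const char *name)
{
    unsigned slot = hash_name(name);
    struct symbol *sym;

    for (sym = obarray[slot]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;                 /* already interned */

    sym = malloc(sizeof *sym);
    if (sym == NULL)
        return NULL;
    sym->name = malloc(strlen(name) + 1);
    if (sym->name == NULL) {
        free(sym);
        return NULL;
    }
    strcpy(sym->name, name);
    sym->next = obarray[slot];          /* push onto the bucket's chain */
    obarray[slot] = sym;
    return sym;
}

/* Usage example: equal names give the identical symbol. */
void intern_demo(void)
{
    struct symbol *a = intern("foobar");
    struct symbol *b = intern("foobar");
    struct symbol *c = intern("foo");

    assert(a != NULL && c != NULL);
    assert(a == b);
    assert(a != c);
    assert(strcmp(a->name, "foobar") == 0);
}
```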
1385
1386 @example
1387 (foo . bar)
1388 @end example
1389
1390 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1391
1392 @example
1393 (1 a 2.5)
1394 @end example
1395
1396 converts to a three-element list containing the specified objects
1397 (note that a list is actually a set of nested conses; see the
1398 XEmacs Lisp Reference).
1399
1400 @example
1401 [1 a 2.5]
1402 @end example
1403
1404 converts to a three-element vector containing the specified objects.
1405
1406 @example
1407 #[... ... ... ...]
1408 @end example
1409
1410 converts to a compiled-function object (the actual contents are not
1411 shown since they are not relevant here; look at a file that ends with
1412 @file{.elc} for examples).
1413
1414 @example
1415 #*01110110
1416 @end example
1417
1418 converts to a bit-vector.
1419
1420 @example
1421 #s(range-table ... ...)
1422 @end example
1423
1424 converts to a range table (the actual contents are not shown).
1425
1426 @example
1427 #s(char-table ... ...)
1428 @end example
1429
1430 converts to a char table (the actual contents are not shown).
1431 (Note that the #s syntax is the general syntax for structures,
1432 which are not really implemented in XEmacs Lisp but should be.)
1433
1434 When an object is printed out (using @code{print} or a related
1435 function), the read syntax is used, so that the same object can be read
1436 in again.
1437
1438 The other objects do not have read syntaxes, usually because it does
1439 not really make sense to create them in this fashion (e.g. processes,
1440 where it doesn't make sense to have a subprocess created as a side
1441 effect of reading some Lisp code), or because they can't be created at
1442 all (e.g. subrs). Permanent objects, as a rule, do not have a read
1443 syntax; nor do most complex objects, which contain too much state to be
1444 easily initialized through a read syntax.
1445
1446 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1447 @chapter How Lisp Objects Are Represented in C
1448
1449 Lisp objects are represented in C using a 32- or 64-bit machine word
1450 (depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and
1451 most other processors use 32-bit Lisp objects). The representation
1452 stuffs a pointer together with a tag, as follows:
1453
1454 @example
1455 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1456 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1457
1458   ^ <---> <----------------------------------------------------->
1459   |  tag        a pointer to a structure, or an integer
1460   |
1461   `---> mark bit
1462 @end example
1463
1464 The tag describes the type of the Lisp object. For integers and
1465 chars, the lower 28 bits contain the value of the integer or char; for
1466 all others, the lower 28 bits contain a pointer. The mark bit is used
1467 during garbage-collection, and is always 0 when garbage collection is
1468 not happening. Many macros that extract parts of a Lisp object
1469 expect that the mark bit is 0, and will produce incorrect results if
1470 it's not. (The way that garbage collection works, basically, is that it
1471 loops over all places where Lisp objects could exist -- this includes
1472 all global variables in C that contain Lisp objects [including
1473 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all
1474 Lisp variables will get marked], plus various other places -- and
1475 recursively scans through the Lisp objects, marking each object it finds
1476 by setting the mark bit. Then it goes through the lists of all objects
1477 allocated, freeing the ones that are not marked and turning off the
1478 mark bit of the ones that are marked.)
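
The layout above can be sketched in plain C. This is an illustration only, assuming a 32-bit word; the names @code{make_word}, @code{word_tag}, @code{word_tag_naive} and the constants are hypothetical and are not the macros defined in @file{lisp.h}. The naive tag extractor shows why code that assumes a clear mark bit gives wrong answers while garbage collection is in progress.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the real lisp.h macros. */
#define VALBITS 28                              /* bits 27..0: pointer or integer */
#define VALMASK ((UINT32_C(1) << VALBITS) - 1)
#define TAGMASK UINT32_C(7)                     /* bits 30..28: 3-bit type tag */
#define MARKBIT (UINT32_C(1) << 31)             /* bit 31: mark bit */

/* Build a word from a tag and a 28-bit value; the mark bit starts at 0. */
static uint32_t make_word(uint32_t tag, uint32_t val)
{
    return ((tag & TAGMASK) << VALBITS) | (val & VALMASK);
}

static uint32_t word_val(uint32_t w) { return w & VALMASK; }

/* Masks the mark bit away, so it is right even during collection. */
static uint32_t word_tag(uint32_t w) { return (w >> VALBITS) & TAGMASK; }

/* Assumes the mark bit is 0; wrong whenever the object is marked. */
static uint32_t word_tag_naive(uint32_t w) { return w >> VALBITS; }

/* Small self-check of the layout. */
static void demo_layout(void)
{
    uint32_t w = make_word(5, 0x123456);

    assert(word_tag(w) == 5 && word_val(w) == 0x123456);
    assert(word_tag_naive(w) == 5);             /* fine while unmarked */
    assert(word_tag_naive(w | MARKBIT) == 13);  /* 8 + 5: mark bit leaked in */
    assert(word_tag(w | MARKBIT) == 5);
}
```

Calling @code{demo_layout()} from a test program exercises all four cases.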
1479
1480 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1481 used for the Lisp object can vary. It can be either a simple type
1482 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1483 structure whose fields are bit fields that line up properly (actually,
1484 it's a union of structures that's used). Generally the simple integral
1485 type is preferable because it ensures that the compiler will actually
1486 use a machine word to represent the object (some compilers will use more
1487 general and less efficient code for unions and structs even if they can
1488 fit in a machine word). The union type, however, has the advantage of
1489 stricter type checking (if you accidentally pass an integer where a Lisp
1490 object is desired, you get a compile error), and it makes it easier to
1491 decode Lisp objects when debugging. The choice of which type to use is
1492 determined by the presence or absence of the preprocessor constant
1493 @code{NO_UNION_TYPE}. (Shouldn't it be @code{USE_UNION_TYPE}, with
1494 opposite semantics? ``Hysterical reasons'', of course.)
1495
1496 @cindex record type
1497 Note that there are only eight types that the tag can represent,
1498 but many more actual types than this. This is handled by having
1499 one of the tag types specify a meta-object called a @dfn{record};
1500 for all such objects, the first four bytes of the pointed-to
1501 structure indicate what the actual type is.
1502
1503 Note also that having 28 bits for pointers and integers restricts a
1504 lot of things to 256 megabytes of memory. (Basically, enough pointers
1505 and indices and whatnot get stuffed into Lisp objects that the total
1506 amount of memory used by XEmacs can't grow above 256 megabytes. In
1507 older versions of XEmacs and GNU Emacs, the tag was 5 bits wide,
1508 allowing for 32 types, which was more than the actual number of types
1509 that existed at the time, and no ``record'' type was necessary.
1510 However, this limited the editor to 64 megabytes total, which some users
1511 who edited large files might conceivably exceed.)
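
The two limits quoted above follow directly from the field widths. As a quick check, assuming one mark bit and byte-addressed memory (the helper name is made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable through the value field of a tagged word. */
static uint64_t addressable_bytes(int word_bits, int tag_bits)
{
    int value_bits = word_bits - 1 - tag_bits;   /* 1 bit goes to the mark bit */

    return UINT64_C(1) << value_bits;
}

static void demo_limits(void)
{
    const uint64_t MB = UINT64_C(1) << 20;

    assert(addressable_bytes(32, 3) == 256 * MB);  /* 28-bit field */
    assert(addressable_bytes(32, 5) == 64 * MB);   /* 26-bit field */
}
```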
1512
1513 Also, note that there is an implicit assumption here that all pointers
1514 are low enough that the top bits are all zero and can just be chopped
1515 off. On standard machines that allocate memory from the bottom up (and
1516 give each process its own address space), this works fine. Some
1517 machines, however, put the data space somewhere else in memory
1518 (e.g. beginning at 0x80000000). Those machines cope by defining
1519 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
1520 the proper mask. Then, pointers retrieved from Lisp objects are
1521 automatically OR'ed with this value prior to being used.
1522
1523 A corollary of the previous paragraph is that @strong{stack-allocated
1524 structures cannot be put into Lisp objects}. The stack is generally
1525 located near the top of memory; if you put such a pointer into a Lisp
1526 object, it will get its top bits chopped off, and you will lose.
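
The effect of @code{DATA_SEG_BITS}, and the reason stack addresses do not survive, can be sketched with plain 32-bit integers standing in for pointers. The helper names and the stack address are invented for this example; only the mask value @code{0x80000000} comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define VALMASK       UINT32_C(0x0FFFFFFF)  /* low 28 bits */
#define DATA_SEG_BITS UINT32_C(0x80000000)  /* example value from the text */

/* Storing an address keeps only the low 28 bits. */
static uint32_t pack_address(uint32_t addr) { return addr & VALMASK; }

/* Retrieving it ORs the data-segment bits back in. */
static uint32_t unpack_address(uint32_t field) { return field | DATA_SEG_BITS; }

static void demo_data_seg(void)
{
    uint32_t heap_addr  = UINT32_C(0x80012340);  /* inside the data segment */
    uint32_t stack_addr = UINT32_C(0xBFFFF000);  /* made-up stack address */

    assert(unpack_address(pack_address(heap_addr)) == heap_addr);
    /* The bits above the 28-bit field cannot be recovered from the mask. */
    assert(unpack_address(pack_address(stack_addr)) == UINT32_C(0x8FFFF000));
    assert(unpack_address(pack_address(stack_addr)) != stack_addr);
}
```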
1527
1528 Various macros are used to construct Lisp objects and extract the
1529 components. Macros of the form @code{XINT()}, @code{XCHAR()},
1530 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
1531 field and cast it to the appropriate type. All of the macros that
1532 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
1533 necessary. @code{XINT()} needs to be a bit tricky so that negative
1534 numbers are properly sign-extended: Usually it does this by shifting the
1535 number four bits to the left and then four bits to the right. This
1536 assumes that the right-shift operator does an arithmetic shift (i.e. it
1537 leaves the most-significant bit as-is rather than shifting in a zero, so
1538 that it mimics a divide-by-two even for negative numbers). Not all
1539 machines/compilers do this, and on the ones that don't, a more
1540 complicated definition is selected by defining
1541 @code{EXPLICIT_SIGN_EXTEND}.
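
Both approaches to sign extension can be sketched as follows, assuming a 32-bit word with a 28-bit value field. The function names are hypothetical, and the explicit variant is one possible way to avoid the shift, not necessarily the definition that @code{EXPLICIT_SIGN_EXTEND} selects.

```c
#include <assert.h>
#include <stdint.h>

#define VALBITS 28
#define VALMASK UINT32_C(0x0FFFFFFF)

/* Store a small integer under a made-up tag value of 1. */
static uint32_t pack_int(int32_t n)
{
    return (UINT32_C(1) << VALBITS) | ((uint32_t)n & VALMASK);
}

/* Shift version: left 4 pushes out mark and tag, right 4 sign-extends.
   Relies on >> of a negative value being an arithmetic shift, which
   the C standard leaves implementation-defined. */
static int32_t xint_shift(uint32_t w)
{
    return (int32_t)(w << 4) >> 4;
}

/* Explicit version: no dependence on how >> treats negative values. */
static int32_t xint_explicit(uint32_t w)
{
    uint32_t sign = UINT32_C(1) << (VALBITS - 1);   /* bit 27 */
    uint32_t v = w & VALMASK;

    return (int32_t)(v ^ sign) - (int32_t)sign;
}

static void demo_xint(void)
{
    assert(xint_explicit(pack_int(-5)) == -5);
    assert(xint_explicit(pack_int(12345)) == 12345);
    assert(xint_explicit(pack_int(-134217728)) == -134217728);  /* -2^27 */
    assert(xint_explicit(pack_int(134217727)) == 134217727);    /* 2^27 - 1 */
}
```

On compilers that implement an arithmetic right shift, @code{xint_shift()} returns the same values as @code{xint_explicit()}.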
1542
1543 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
1544 macros become more complicated -- they check the tag bits and/or the
1545 type field in the first four bytes of a record type to ensure that the
1546 object is really of the correct type. This is great for catching places
1547 where an incorrect type is being dereferenced -- this typically results
1548 in a pointer being dereferenced as the wrong type of structure, with
1549 unpredictable (and sometimes not easily traceable) results.
1550
1551 There are similar @code{XSET()} macros that construct a Lisp object.
1552 These macros are of the form @code{XSET (@var{lvalue}, @var{result})},
1553 i.e. they have to be a statement rather than just used in an expression.
1554 The reason for this is that standard C doesn't let you ``construct'' a
1555 structure (but GCC does). Granted, this sometimes isn't too convenient;
1556 for the case of integers, at least, you can use the function
1557 @code{make_number()}, which constructs and @emph{returns} an integer
1558 Lisp object. Note that the @code{XSET()} macros are also affected by
1559 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the right
1560 type in the case of record types, where the type is contained in
1561 the structure.
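
The difference between a statement-style setter and a value-returning constructor can be sketched like this. The type and both names are stand-ins invented for the example, the tag is left at zero for brevity, and the real @code{XSET()} macros and @code{make_number()} are more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the union flavor of Lisp_Object: a struct, not an integer. */
typedef struct { uint32_t bits; } obj_sketch;

#define VALMASK UINT32_C(0x0FFFFFFF)

/* Statement-style setter: needs an lvalue to assign into, so it cannot
   appear in the middle of an expression. */
#define XSETINT_SKETCH(lvalue, n) \
    do { (lvalue).bits = (uint32_t)(n) & VALMASK; } while (0)

/* Function-style constructor: builds the object in a local and returns it. */
static obj_sketch make_number_sketch(int32_t n)
{
    obj_sketch result;

    XSETINT_SKETCH(result, n);
    return result;
}

static void demo_xset(void)
{
    obj_sketch a;

    XSETINT_SKETCH(a, 7);                          /* a statement */
    assert(a.bits == make_number_sketch(7).bits);  /* usable in an expression */
}
```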
1562
1563 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1564 @chapter Rules When Writing New C Code
1565
1566 The XEmacs C Code is extremely complex and intricate, and there are
1567 many rules that are more or less consistently followed throughout the code.
1568 Many of these rules are not obvious, so they are explained here. It is
1569 of the utmost importance that you follow them. If you don't, you may get
1570 something that appears to work, but which will crash in odd situations,
1571 often in code far away from where the actual breakage is.
1572
1573 @menu
1574 * General Coding Rules::
1575 * Writing Lisp Primitives::
1576 * Adding Global Lisp Variables::
1577 @end menu
1578
1579 @node General Coding Rules
1580 @section General Coding Rules
1581
1582 Almost every module contains a @code{syms_of_*()} function and a
1583 @code{vars_of_*()} function. The former declares any Lisp primitives
1584 you have defined and defines any symbols you will be using. The latter
1585 declares any global Lisp variables you have added and initializes global
1586 C variables in the module. For each such function, declare it in
1587 @file{symsinit.h} and make sure it's called in the appropriate place in
1588 @code{main()}. @strong{Important}: There are stringent requirements on
1589 exactly what can go into these functions. See the comment in
1590 @code{main()}. The reason for this is to avoid obscure unwanted
1591 interactions during initialization. If you don't follow these rules,
1592 you'll be sorry! If you want to do anything that isn't allowed, create
1593 a @code{complex_vars_of_*()} function for it. Doing this is tricky,
1594 though: You have to make sure your function is called at the right time
1595 so that all the initialization dependencies work out.
1596
1597 Every module includes @file{<config.h>} (angle brackets so that
1598 @samp{--srcdir} works correctly) and @file{lisp.h}. @file{config.h}
1599 should always be included before any other header files (including
1600 system header files) to ensure that certain tricks played by various
1601 @file{s/} and @file{m/} files work out correctly.
1602
1603 @strong{All global and static variables that are to be modifiable must
1604 be declared uninitialized.} This means that you may not use the ``declare
1605 with initializer'' form for these variables, such as @code{int
1606 some_variable = 0;}. The reason for this has to do with some kludges
1607 done during the dumping process: If possible, the initialized data
1608 segment is re-mapped so that it becomes part of the (unmodifiable) code
1609 segment in the dumped executable. This allows this memory to be shared
1610 among multiple running XEmacs processes. XEmacs is careful to place as
1611 much constant data as possible into initialized variables (in
1612 particular, into what's called the @dfn{pure space} -- see below) during
1613 the @file{temacs} phase.
1614
1615 @cindex copy-on-write
1616 @strong{Note:} This kludge only works on a few systems nowadays, and is
1617 rapidly becoming irrelevant because most modern operating systems provide
1618 @dfn{copy-on-write} semantics. All data is initially shared between
1619 processes, and a private copy is automatically made (on a page-by-page
1620 basis) when a process first attempts to write to a page of memory.
1621
1622 Formerly, there was a requirement that static variables not be
1623 declared inside of functions. This had to do with another hack along
1624 the same vein as what was just described: old USG systems put
1625 statically-declared variables in the initialized data space, so those
1626 header files had a @code{#define static} declaration. (That way, the
1627 data-segment remapping described above could still work.) This fails
1628 badly on static variables inside of functions, which suddenly become
1629 automatic variables; therefore, you weren't supposed to have any of
1630 them. This awful kludge has been removed in XEmacs because:
1631
1632 @enumerate
1633 @item
1634 almost all of the systems that used this kludge ended up having
1635 to disable the data-segment remapping anyway;
1636 @item
1637 the only systems that didn't were extremely outdated ones;
1638 @item
1639 this hack completely messed up inline functions.
1640 @end enumerate
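
To see why the @code{#define static} hack broke function-local statics, compare the two functions below. The second is written out by hand to show what the compiler would effectively see once @code{static} had been defined away; the names are invented for this sketch, and the counter is given an initializer only so that the example stays well defined.

```c
#include <assert.h>

/* Normal C: the counter lives in static storage and survives calls. */
static int next_id(void)
{
    static int counter;

    return ++counter;
}

/* The same function after "#define static" has removed the keyword from
   the local declaration: the counter is now an automatic variable,
   recreated on every call. */
static int next_id_after_hack(void)
{
    int counter = 0;

    return ++counter;
}

static void demo_static_hack(void)
{
    assert(next_id() == 1);
    assert(next_id() == 2);
    assert(next_id_after_hack() == 1);
    assert(next_id_after_hack() == 1);   /* state is lost between calls */
}
```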
1641
1642 @node Writing Lisp Primitives
1643 @section Writing Lisp Primitives
1644
1645 Lisp primitives are Lisp functions implemented in C. The details of
1646 interfacing the C function so that Lisp can call it are handled by a few
1647 C macros. The only way to really understand how to write new C code is
1648 to read the source, but we can explain some things here.
1649
1650 An example of a special form is the definition of @code{or}, from
1651 @file{eval.c}. (An ordinary function would have the same general
1652 appearance.)
1653
1654 @cindex garbage collection protection
1655 @smallexample
1656 @group
1657 DEFUN ("or", For, Sor, 0, UNEVALLED, 0 /*
1658 Eval args until one of them yields non-nil, then return that value.
1659 The remaining args are not evalled at all.
1660 @end group
1661 @group
1662 If all args return nil, return nil.
1663 */ )
1664   (args)
1665      Lisp_Object args;
1666 @{
1667   /* This function can GC */
1668   REGISTER Lisp_Object val;
1669   Lisp_Object args_left;
1670   struct gcpro gcpro1;
1671 @end group
1672
1673 @group
1674   if (NILP (args))
1675     return Qnil;
1676
1677   args_left = args;
1678   GCPRO1 (args_left);
1679 @end group
1680
1681 @group
1682   do
1683     @{
1684       val = Feval (Fcar (args_left));
1685       if (!NILP (val))
1686         break;
1687       args_left = Fcdr (args_left);
1688     @}
1689   while (!NILP (args_left));
1690 @end group
1691
1692 @group
1693   UNGCPRO;
1694   return val;
1695 @}
1696 @end group
1697 @end smallexample
1698
1699 Let's start with a precise explanation of the arguments to the
1700 @code{DEFUN} macro. Here is a template for them:
1701
1702 @example
1703 DEFUN (@var{lname}, @var{fname}, @var{sname}, @var{min}, @var{max}, @var{interactive} /* @var{doc} */ )
1704 @end example
1705
1706 @table @var
1707 @item lname
1708 This is the name of the Lisp symbol to define as the function name; in
1709 the example above, it is @code{or}.
1710
1711 @item fname
1712 This is the C function name for this function. This is
1713 the name that is used in C code for calling the function. The name is,
1714 by convention, @samp{F} prepended to the Lisp name, with all dashes
1715 (@samp{-}) in the Lisp name changed to underscores. Thus, to call this
1716 function from C code, call @code{For}. Remember that the arguments must
1717 be of type @code{Lisp_Object}; various macros and functions for creating
1718 values of type @code{Lisp_Object} are declared in the file
1719 @file{lisp.h}.
1720
1721 Primitives whose names are special characters (e.g. @code{+} or
1722 @code{<}) are named by spelling out, in some fashion, the special
1723 character: e.g. @code{Fplus()} or @code{Flss()}. Primitives whose names
1724 begin with normal alphanumeric characters but also contain special
1725 characters are spelled out in some creative way, e.g. @code{let*}
1726 becomes @code{FletX()}.
1727
1728 @item sname
1729 This is a C variable name to use for a structure that holds the data for
1730 the subr object that represents the function in Lisp. This structure
1731 conveys the Lisp symbol name to the initialization routine that will
1732 create the symbol and store the subr object as its definition. By
1733 convention, this name is always @var{fname} with @samp{F} replaced with
1734 @samp{S}.
1735
1736 @item min
1737 This is the minimum number of arguments that the function requires. The
1738 function @code{or} allows a minimum of zero arguments.
1739
1740 @item max
1741 This is the maximum number of arguments that the function accepts, if
1742 there is a fixed maximum. Alternatively, it can be @code{UNEVALLED},
1743 indicating a special form that receives unevaluated arguments, or
1744 @code{MANY}, indicating an unlimited number of evaluated arguments (the
1745 equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY} are
1746 macros. If @var{max} is a number, it may not be less than @var{min} and
1747 it may not be greater than 12. (If you need to add a function with
1748 more than 12 arguments, either use the @code{MANY} form or edit the
1749 definition of @code{DEFUN} in @file{lisp.h}. If you do the latter,
1750 make sure to also add another clause to the switch statement in
1751 @code{primitive_funcall()}.)
1752
1753 @item interactive
1754 This is an interactive specification, a string such as might be used as
1755 the argument of @code{interactive} in a Lisp function. In the case of
1756 @code{or}, it is 0 (a null pointer), indicating that @code{or} cannot be
1757 called interactively. A value of @code{""} indicates a function that
1758 should receive no arguments when called interactively.
1759
1760 @item doc
1761 This is the documentation string. It is written just like a
1762 documentation string for a function defined in Lisp; in particular,
1763 the first line should be a single sentence. Note how the documentation
1764 string is enclosed in a comment, none of the documentation is placed
1765 on the same lines as the comment-start and comment-end characters, and
1766 the comment-start characters are on the same line as the interactive
1767 specification. @file{make-docfile}, which scans the C files for
1768 documentation strings, is very particular about what it looks for,
1769 and will not properly note the doc string if it's not in this exact
1770 format.
1771 @end table
1772
1773 You are free to put the various arguments to @code{DEFUN} on separate
1774 lines to avoid overly long lines. However, make sure to put the
1775 comment-start characters for the doc string on the same line as the
1776 interactive specification, and put a newline directly after them
1777 (and before the comment-end characters).
1778
1779 After the call to the @code{DEFUN} macro, you must write the argument
1780 name list that every C function must have, followed by ordinary C
1781 declarations for the arguments. For a function with a fixed maximum
1782 number of arguments, declare a C argument for each Lisp argument, and
1783 give them all type @code{Lisp_Object}. When a Lisp function has no
1784 upper limit on the number of arguments, its implementation in C actually
1785 receives exactly two arguments: the first is the number of Lisp
1786 arguments, and the second is the address of a block containing their
1787 values. They have types @code{int} and @w{@code{Lisp_Object *}}.
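
The calling convention for a primitive with no upper limit on its arguments can be sketched as an ordinary C function. @code{Lisp_Object} is replaced by a stand-in integer type here, the function name is hypothetical, and a real primitive would be introduced with @code{DEFUN} rather than written as a plain function.

```c
#include <assert.h>

typedef long Lisp_Object;   /* stand-in; the real type is described earlier */

/* Shape of a primitive declared with MANY: a count plus a block of values. */
static Lisp_Object sum_many_sketch(int nargs, Lisp_Object *args)
{
    Lisp_Object total = 0;
    int i;

    for (i = 0; i < nargs; i++)
        total += args[i];
    return total;
}

static void demo_many(void)
{
    Lisp_Object args[3] = { 1, 2, 3 };

    assert(sum_many_sketch(3, args) == 6);
    assert(sum_many_sketch(0, args) == 0);   /* zero arguments is legal */
}
```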
1788
1789 The names of the C arguments will be used as the names of the arguments
1790 to the Lisp primitive as displayed in its documentation, modulo the
1791 same concerns described above for @code{F...} names (in particular,
1792 underscores in the C arguments become dashes in the Lisp arguments).
1793 There is one additional kludge: A C argument called @code{defalt}
1794 becomes the Lisp argument @code{default}. This deliberate misspelling
1795 is done because @code{default} is a reserved word in the C language.
1796
1797 Note that you @emph{must} use old-style prototypes for the arguments
1798 to @code{DEFUN}, even though all other functions in the C code use
1799 new-style prototypes.
1800
1801 Within the function @code{For} itself, note the use of the macros
1802 @code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect''
1803 a variable from garbage collection---to inform the garbage collector
1804 that it must look in that variable and regard its contents as an
1805 accessible object. This is necessary whenever you call @code{Feval} or
1806 anything that can directly or indirectly call @code{Feval} (this
1807 includes the @code{QUIT} macro!). At such a time, any Lisp object that
1808 you intend to refer to again must be protected somehow. @code{UNGCPRO}
1809 cancels the protection of the variables that are protected in the
1810 current function. It is necessary to do this explicitly.
1811
1812 The macro @code{GCPRO1} protects just one local variable. If you want
1813 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
1814 not work. Macros @code{GCPRO3} and @code{GCPRO4} also exist.
1815
1816 These macros implicitly use local variables such as @code{gcpro1}; you
1817 must declare these explicitly, with type @code{struct gcpro}. Thus, if
1818 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
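
A simplified model may help explain why the @code{gcproN} variables must be declared and why repeating @code{GCPRO1} does not work. The sketch below assumes the protection records form a linked chain that the collector walks; the real definitions in @file{lisp.h} may differ, and @code{Lisp_Object} is a stand-in type here.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;            /* stand-in for the real type */

/* Simplified model of a protection record; the real struct may differ. */
struct gcpro {
    struct gcpro *next;              /* record registered before this one */
    Lisp_Object *var;                /* address of the protected variable */
};

static struct gcpro *gcprolist;      /* chain a collector would walk */

#define GCPRO1(a) \
    (gcpro1.next = gcprolist, gcpro1.var = &(a), gcprolist = &gcpro1)
#define GCPRO2(a, b) \
    (gcpro1.next = gcprolist, gcpro1.var = &(a), \
     gcpro2.next = &gcpro1, gcpro2.var = &(b), gcprolist = &gcpro2)
#define UNGCPRO (gcprolist = gcpro1.next)

/* Bounded walk, so that a corrupted (cyclic) chain cannot hang the demo. */
static int is_protected(const Lisp_Object *p)
{
    const struct gcpro *g = gcprolist;
    int steps;

    for (steps = 0; g != NULL && steps < 8; steps++, g = g->next)
        if (g->var == p)
            return 1;
    return 0;
}

/* Returns how many of its two locals are protected: 2. */
static int demo_gcpro2(void)
{
    Lisp_Object x = 1, y = 2;
    struct gcpro gcpro1, gcpro2;
    int n;

    GCPRO2(x, y);
    n = is_protected(&x) + is_protected(&y);
    UNGCPRO;
    return n;
}

/* Repeating GCPRO1 reuses gcpro1, so x drops off the chain: returns 1. */
static int demo_repeated_gcpro1(void)
{
    Lisp_Object x = 1, y = 2;
    struct gcpro gcpro1;
    struct gcpro *saved = gcprolist;
    int n;

    GCPRO1(x);
    GCPRO1(y);           /* overwrites gcpro1; it now points at itself */
    n = is_protected(&x) + is_protected(&y);
    gcprolist = saved;   /* UNGCPRO would leave a dangling pointer here */
    return n;
}
```

In this model the second @code{GCPRO1} silently replaces the record for @code{x}, which matches the warning above.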
1819
1820 @cindex caller-protects (@code{GCPRO} rule)
1821 Note also that the general rule is @dfn{caller-protects}; i.e. you
1822 are only responsible for protecting those Lisp objects that you create.
1823 Any objects passed to you as parameters should have been protected
1824 by whoever created them, so you don't in general have to protect them.
1825 @code{For} is an exception; it protects its parameters to provide
1826 extra assurance against Lisp primitives elsewhere that are incorrectly
1827 written, and against malicious self-modifying code. There are a few
1828 other standard functions that also do this.
1829
1830 @code{GCPRO}ing is perhaps the trickiest and most error-prone part
1831 of XEmacs coding. It is @strong{extremely} important that you get this
1832 right and use a great deal of discipline when writing this code.
1833 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
1834
1835 What @code{DEFUN} actually does is declare a global structure of
1836 type @code{Lisp_Subr} whose name begins with a capital @samp{S} and
1837 which contains information about the primitive (e.g. a pointer to the
1838 function, its minimum and maximum number of arguments, a string describing
1839 its Lisp name); @code{DEFUN} then begins a normal C function
1840 declaration using the @code{F...} name. The Lisp subr object that is
1841 the function definition of a primitive (i.e. the object in the function
1842 slot of the symbol that names the primitive) actually points to this
1843 @samp{S} structure; when @code{Feval} encounters a subr, it looks in the
1844 structure to find out how to call the C function.
1845
1846 Defining the C function is not enough to make a Lisp primitive
1847 available; you must also create the Lisp symbol for the primitive (the
1848 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
1849 object in its function cell. (If you don't do this, the primitive won't
1850 be seen by Lisp code.) The code looks like this:
1851
1852 @example
1853 defsubr (&@var{subr-structure-name});
1854 @end example
1855
1856 @noindent
1857 Here @var{subr-structure-name} is the name you used as the third
1858 argument to @code{DEFUN}.
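
The division of labor between @code{DEFUN} and @code{defsubr} can be modeled in a few lines. Everything below is a toy under stated assumptions: @code{DEFUN_SKETCH}, @code{struct subr_sketch}, @code{defsubr_sketch} and @code{call_by_name} are invented names, the symbol table is a flat array, only one-argument primitives are handled, and a new-style prototype is used for brevity even though real @code{DEFUN}s require the old style.

```c
#include <assert.h>
#include <string.h>

typedef long Lisp_Object;                    /* stand-in for the real type */

/* Simplified stand-in for Lisp_Subr: what Lisp needs to know to call fn. */
struct subr_sketch {
    const char *name;                        /* Lisp-visible name */
    int min_args, max_args;
    Lisp_Object (*fn)(Lisp_Object);          /* the F... function */
};

/* Declares the S... structure, then opens the definition of the F... function. */
#define DEFUN_SKETCH(lname, fname, sname, min, max)               \
    static Lisp_Object fname(Lisp_Object);                        \
    static struct subr_sketch sname = { lname, min, max, fname }; \
    static Lisp_Object fname

DEFUN_SKETCH("add-one", Fadd_one, Sadd_one, 1, 1) (Lisp_Object n)
{
    return n + 1;
}

/* Toy symbol table: defsubr_sketch() makes the primitive findable by name. */
static struct subr_sketch *subr_table[16];
static int subr_count;

static void defsubr_sketch(struct subr_sketch *s)
{
    assert(subr_count < 16);
    subr_table[subr_count++] = s;
}

/* Returns 1 and stores the result if a primitive called name exists. */
static int call_by_name(const char *name, Lisp_Object arg, Lisp_Object *result)
{
    int i;

    for (i = 0; i < subr_count; i++)
        if (strcmp(subr_table[i]->name, name) == 0) {
            *result = subr_table[i]->fn(arg);
            return 1;
        }
    return 0;
}
```

Until @code{defsubr_sketch (&Sadd_one)} has run, @code{call_by_name()} cannot find @code{add-one}, even though @code{Fadd_one} is already callable from C.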
1859
1860 This call to @code{defsubr} should go in the @code{syms_of_*()}
1861 function at the end of the module. If no such function exists, create
1862 it and make sure to also declare it in @file{symsinit.h} and call it
1863 from the appropriate spot in @code{main()}. @xref{General Coding
1864 Rules}.
1865
1866 Note that C code cannot call functions by name unless they are defined
1867 in C. The way to call a function written in Lisp is to use
1868 @code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since
1869 the Lisp function @code{funcall} accepts an unlimited number of
1870 arguments, in C it takes two: the number of Lisp-level arguments, and a
1871 one-dimensional array containing their values. The first Lisp-level
1872 argument is the Lisp function to call, and the rest are the arguments to
1873 pass to it. Since @code{Ffuncall} can call the evaluator, you must
1874 protect pointers from garbage collection around the call to
1875 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
1876 its parameters, so you don't have to protect any pointers passed
1877 as parameters to it.)
1878
1879 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
1880 provide a convenient way to call a Lisp function with a fixed
1881 number of arguments. They work by calling @code{Ffuncall}.
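
The relationship between the array-based convention and the fixed-argument helpers can be sketched as follows. The object type and all function names are stand-ins made up for this example; they only mirror the shape of @code{Ffuncall} and @code{call2}, not their implementation, and garbage-collection protection is left out entirely.

```c
#include <assert.h>

/* Stand-in object: either a number or a callable taking (nargs, args). */
typedef struct object {
    long num;
    struct object (*fn)(int nargs, struct object *args);
} object;

/* Model of the funcall convention: args[0] is the function itself. */
static object funcall_sketch(int nargs, object *args)
{
    return args[0].fn(nargs - 1, args + 1);
}

/* Model of a call2-style helper: packs its arguments into an array. */
static object call2_sketch(object fn, object a, object b)
{
    object args[3];

    args[0] = fn;
    args[1] = a;
    args[2] = b;
    return funcall_sketch(3, args);
}

/* A callee with the (count, block) calling convention. */
static object add_all(int nargs, object *args)
{
    object result = { 0, 0 };
    int i;

    for (i = 0; i < nargs; i++)
        result.num += args[i].num;
    return result;
}

static void demo_funcall(void)
{
    object plus = { 0, add_all };
    object two = { 2, 0 }, three = { 3, 0 };

    assert(call2_sketch(plus, two, three).num == 5);
}
```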
1882
1883 @file{eval.c} is a very good file to look through for examples;
1884 @file{lisp.h} contains the definitions for some important macros and
1885 functions.
1886
1887 @node Adding Global Lisp Variables
1888 @section Adding Global Lisp Variables
1889
1890 Global variables whose names begin with @samp{Q} are constants whose
1891 value is a symbol of a particular name. The name of the variable should
1892 be derived from the name of the symbol using the same rules as for Lisp
1893 primitives. These variables are initialized using a call to
1894 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
1895 interns a symbol, sets the C variable to the resulting Lisp object, and
1896 calls @code{staticpro()} on the C variable to tell the
1897 garbage-collection mechanism about this variable. What
1898 @code{staticpro()} does is add a pointer to the variable to a large
1899 global array; when garbage-collection happens, all pointers listed in
1900 the array are used as starting points for marking Lisp objects. This is
1901 important because it's quite possible that the only current reference to
1902 the object is the C variable. In the case of symbols, the
1903 @code{staticpro()} doesn't matter all that much because the symbol is
1904 contained in @code{obarray}, which is itself @code{staticpro()}ed.
1905 However, it's possible that a naughty user could do something like
1906 uninterning the symbol out of @code{obarray} or even setting
1907 @code{obarray} to a different value [although this is likely to make
1908 XEmacs crash!].)
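
The mechanism described above, a global array of pointers to C variables, can be sketched as follows. The names ending in @code{_sketch}, the array size, and the reachability test are assumptions made for the illustration; the real @code{staticpro()} and the mark phase are more elaborate.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;                 /* stand-in for the real type */

#define MAX_ROOTS 32
static Lisp_Object *roots[MAX_ROOTS];     /* addresses of C variables */
static int root_count;

/* Register the variable itself, not its current value. */
static void staticpro_sketch(Lisp_Object *var)
{
    assert(root_count < MAX_ROOTS);
    roots[root_count++] = var;
}

/* What a mark phase would start from: is obj held by a registered variable? */
static int reachable_from_roots(Lisp_Object obj)
{
    int i;

    for (i = 0; i < root_count; i++)
        if (*roots[i] == obj)
            return 1;
    return 0;
}

static Lisp_Object Vprotected;            /* registered below */
static Lisp_Object Vforgotten;            /* never registered */

static void demo_staticpro(void)
{
    staticpro_sketch(&Vprotected);
    Vprotected = 101;                     /* assigned after registration */
    Vforgotten = 202;

    assert(reachable_from_roots(101));    /* seen through the stored address */
    assert(!reachable_from_roots(202));   /* a collector would reclaim this */
}
```

Because the array stores addresses rather than values, a variable registered once stays protected however often it is reassigned.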
1909
1910 @strong{Note:} It is potentially deadly if you declare a @samp{Q...}
1911 variable in two different modules. The two calls to @code{defsymbol()}
1912 are no problem, but some linkers will complain about multiply-defined
1913 symbols. The most insidious aspect of this is that often the link will
1914 succeed anyway, but then the resulting executable will sometimes crash
1915 in obscure ways during certain operations! To avoid this problem,
1916 declare any symbols with common names (such as @code{text}) that are not
1917 obviously associated with this particular module in the module
1918 @file{general.c}.
1919
1920 Global variables whose names begin with @samp{V} are variables that
1921 contain Lisp objects. The convention here is that all global variables
1922 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
1923 (including integer and boolean variables that have Lisp
1924 equivalents). Most of the time, these variables have equivalents in
1925 Lisp, but some don't. Those that do are declared this way by a call to
1926 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
1927 module. What this does is create a special @dfn{symbol-value-forward}
1928 Lisp object that contains a pointer to the C variable, intern a symbol
1929 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
1930 its value to the symbol-value-forward Lisp object; it also calls
1931 @code{staticpro()} on the C variable to tell the garbage-collection
1932 mechanism about the variable. When @code{eval} (or actually
1933 @code{symbol-value}) encounters this special object in the process of
1934 retrieving a variable's value, it follows the indirection to the C
1935 variable and gets its value. @code{setq} does similar things so that
1936 the C variable gets changed.
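
The forwarding idea can be modeled with a toy symbol structure. This is a sketch under simplifying assumptions: the forwarding pointer is kept in its own field instead of in a special object stored in the value slot, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;             /* stand-in for the real type */

/* Toy symbol: forward is non-NULL when the value lives in a C variable.
   (The real design stores a forwarding object in the value slot.) */
struct symbol_sketch {
    const char *name;
    Lisp_Object value;
    Lisp_Object *forward;
};

static Lisp_Object Vsketch_var;       /* hypothetical C-side variable */

/* Model of DEFVAR_LISP: tie the symbol to the C variable. */
static void defvar_lisp_sketch(struct symbol_sketch *sym, const char *name,
                               Lisp_Object *c_var)
{
    sym->name = name;
    sym->value = 0;
    sym->forward = c_var;
}

/* Reading follows the indirection when there is one. */
static Lisp_Object symbol_value_sketch(const struct symbol_sketch *sym)
{
    return sym->forward ? *sym->forward : sym->value;
}

/* Writing follows the same indirection. */
static void set_sketch(struct symbol_sketch *sym, Lisp_Object v)
{
    if (sym->forward)
        *sym->forward = v;
    else
        sym->value = v;
}

static void demo_forward(void)
{
    struct symbol_sketch sym;

    defvar_lisp_sketch(&sym, "sketch-var", &Vsketch_var);
    Vsketch_var = 70;                         /* C code writes ... */
    assert(symbol_value_sketch(&sym) == 70);  /* ... Lisp reads the same cell */
    set_sketch(&sym, 80);                     /* a Lisp-side assignment ... */
    assert(Vsketch_var == 80);                /* ... changes the C variable */
}
```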
1937
1938 Whether or not you @code{DEFVAR_LISP()} a variable, you need to
1939 initialize it in the @code{vars_of_*()} function; otherwise it will end
1940 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
1941 this is probably not what you want. Also, if the variable is not
1942 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
1943 C variable in the @code{vars_of_*()} function. Otherwise, the
1944 garbage-collection mechanism won't know that the object in this variable
1945 is in use, and will happily collect it and reuse its storage for another
1946 Lisp object, and you will be the one who's unhappy when you can't figure
1947 out how your variable got overwritten.
1948
1949 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
1950 @chapter A Summary of the Various XEmacs Modules
1951
1952 This is accurate as of XEmacs 20.0.
1953
1954 @menu
1955 * Low-Level Modules::
1956 * Basic Lisp Modules::
1957 * Modules for Standard Editing Operations::
1958 * Editor-Level Control Flow Modules::
1959 * Modules for the Basic Displayable Lisp Objects::
1960 * Modules for other Display-Related Lisp Objects::
1961 * Modules for the Redisplay Mechanism::
1962 * Modules for Interfacing with the File System::
1963 * Modules for Other Aspects of the Lisp Interpreter and Object System::
1964 * Modules for Interfacing with the Operating System::
1965 * Modules for Interfacing with X Windows::
1966 * Modules for Internationalization::
1967 @end menu
1968
1969 @node Low-Level Modules
1970 @section Low-Level Modules
1971
1972 @example
1973 size name
1974 ------- ---------------------
1975 18150 config.h
1976 @end example
1977
1978 This is automatically generated from @file{config.h.in} based on the
1979 results of configure tests and user-selected optional features and
1980 contains preprocessor definitions specifying the nature of the
1981 environment in which XEmacs is being compiled.
1982
1983
1984
1985 @example
1986 2347 paths.h
1987 @end example
1988
1989 This is automatically generated from @file{paths.h.in} based on supplied
1990 configure values, and allows for non-standard installed configurations
1991 of the XEmacs directories. It's currently broken, though.
1992
1993
1994
1995 @example
1996 47878 emacs.c
1997 20239 signal.c
1998 @end example
1999
2000 @file{emacs.c} contains @code{main()} and other code that performs the most
2001 basic environment initializations and handles shutting down the XEmacs
2002 process (this includes @code{kill-emacs}, the normal way that XEmacs is
2003 exited; @code{dump-emacs}, which is used during the build process to
2004 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
2005 be used to start XEmacs directly when temacs has finished loading all
2006 the Lisp code; and emergency code to handle crashes [XEmacs tries to
2007 auto-save all files before it crashes]).
2008
2009 Low-level code that directly interacts with the Unix signal mechanism,
2010 however, is in @file{signal.c}. Note that this code does not handle system
2011 dependencies in interfacing to signals; that is handled using the
2012 @file{syssignal.h} header file, described in @ref{Modules for Interfacing with the Operating System}.
2013
2014
2015
2016 @example
2017 23458 unexaix.c
2018 9893 unexalpha.c
2019 11302 unexapollo.c
2020 16544 unexconvex.c
2021 31967 unexec.c
2022 30959 unexelf.c
2023 35791 unexelfsgi.c
2024 3207 unexencap.c
2025 7276 unexenix.c
2026 20539 unexfreebsd.c
2027 1153 unexfx2800.c
2028 13432 unexhp9k3.c
2029 11049 unexhp9k800.c
2030 9165 unexmips.c
2031 8981 unexnext.c
2032 1673 unexsol2.c
2033 19261 unexsunos4.c
2034 @end example
2035
2036 These modules contain code for dumping out the XEmacs executable on various
2037 different systems. (This process is highly machine-specific and
2038 requires intimate knowledge of the executable format and the memory map
2039 of the process.) Only one of these modules is actually used; this is
2040 chosen by @file{configure}.
2041
2042
2043
2044 @example
2045 15715 crt0.c
2046 1484 lastfile.c
2047 1115 pre-crt0.c
2048 @end example
2049
2050 These modules are used in conjunction with the dump mechanism. On some
2051 systems, an alternative version of the C startup code (the actual code
2052 that receives control from the operating system when the process is
2053 started, and which calls @code{main()}) is required so that the dumping
2054 process works properly; @file{crt0.c} provides this.
2055
2056 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
2057 very last file linked, respectively. (Actually, this is not really true.
2058 @file{lastfile.c} should be after all Emacs modules whose initialized
2059 data should be made constant, and before all other Emacs files and all
2060 libraries. In particular, the allocation modules @file{gmalloc.c},
2061 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
2062 all of the files that implement Xt widget classes @emph{must} be placed
2063 after @file{lastfile.c} because they contain various structures that
2064 must be statically initialized and into which Xt writes at various
2065 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
2066 that are used to determine the start and end of XEmacs's initialized
2067 data space when dumping.
2068
2069
2070
2071 @example
2072 14786 alloca.c
2073 16678 free-hook.c
2074 1692 getpagesize.h
2075 41936 gmalloc.c
2076 25141 malloc.c
2077 3802 mem-limits.h
2078 39011 ralloc.c
2079 3436 vm-limit.c
2080 @end example
2081
2082 These handle basic C allocation of memory. @file{alloca.c} is an emulation of
2083 the stack allocation function @code{alloca()} on machines that lack
2084 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
2085
2086 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
2087 functions @code{malloc()}, @code{realloc()} and @code{free()}. They are
2088 often used in place of the standard system-provided @code{malloc()}
2089 because they usually provide a much faster implementation, at the
2090 expense of additional memory use. @file{gmalloc.c} is a newer implementation
2091 that is much more memory-efficient for large allocations than @file{malloc.c},
2092 and should always be preferred if it works. (At one point, @file{gmalloc.c}
2093 didn't work on some systems where @file{malloc.c} worked; but this should be
2094 fixed now.)
2095
2096 @cindex relocating allocator
2097 @file{ralloc.c} is the @dfn{relocating allocator}. It provides functions
2098 similar to @code{malloc()}, @code{realloc()} and @code{free()} that allocate
2099 memory that can be dynamically relocated in memory. The advantage of
2100 this is that allocated memory can be shuffled around to place all the
2101 free memory at the end of the heap, and the heap can then be shrunk,
2102 releasing the memory back to the operating system. The use of this can
2103 be controlled with the configure option @code{--rel-alloc}; if enabled, memory allocated for
2104 buffers will be relocatable, so that if a very large file is visited and
2105 the buffer is later killed, the memory can be released to the operating
2106 system. (The disadvantage of this mechanism is that it can be very
2107 slow. On systems with the @code{mmap()} system call, the XEmacs version
2108 of @file{ralloc.c} uses this to move memory around without actually having to
2109 block-copy it, which can speed things up; but it can still cause
2110 noticeable performance degradation.)
2111
2112 @file{free-hook.c} contains some debugging functions for checking for invalid
2113 arguments to @code{free()}.
2114
2115 @file{vm-limit.c} contains some functions that warn the user when memory is
2116 getting low. These are callback functions that are called by @file{gmalloc.c}
2117 and @file{malloc.c} at appropriate times.
2118
2119 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
2120 page in virtual memory. @file{mem-limits.h} provides a uniform interface for
2121 retrieving the total amount of available virtual memory. Both are
2122 similar in spirit to the @file{sys*.h} files described in section J, below.
2123
2124
2125
2126 @example
2127 2659 blocktype.c
2128 1410 blocktype.h
2129 7194 dynarr.c
2130 2671 dynarr.h
2131 @end example
2132
2133 These implement a couple of basic C data types to facilitate memory
2134 allocation. The @code{Blocktype} type efficiently manages the
2135 allocation of fixed-size blocks by minimizing the number of times that
2136 @code{malloc()} and @code{free()} are called. It allocates memory in
2137 large chunks, subdivides the chunks into blocks of the proper size, and
2138 returns the blocks as requested. When blocks are freed, they are placed
2139 onto a linked list, so they can be efficiently reused. This data type
2140 is not much used in XEmacs currently, because it's a fairly new
2141 addition.
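
The chunk-and-free-list idea described above can be sketched in a few
lines of C. This is only an illustration under stated assumptions: the
names @code{block_pool}, @code{pool_alloc()}, @code{pool_free()}, etc.
are hypothetical and are not the interface of @file{blocktype.c}, and
the counter @code{malloc_calls} exists only to make the saving visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size block pool; not the blocktype.c interface. */
struct free_node { struct free_node *next; };

/* Header placed in front of each chunk; the union keeps the blocks
   that follow it suitably aligned for any type. */
union chunk_header { union chunk_header *next; max_align_t align; };

struct block_pool {
  size_t block_size;            /* rounded-up size of one block */
  size_t blocks_per_chunk;      /* blocks carved out of each malloc() */
  struct free_node *free_list;  /* blocks ready to be handed out */
  union chunk_header *chunks;   /* all chunks, for pool_destroy() */
  size_t malloc_calls;          /* illustration only */
};

void pool_init(struct block_pool *p, size_t block_size, size_t blocks_per_chunk)
{
  size_t align = _Alignof(max_align_t);
  assert(blocks_per_chunk > 0);
  if (block_size < sizeof(struct free_node))
    block_size = sizeof(struct free_node);
  p->block_size = (block_size + align - 1) / align * align;
  p->blocks_per_chunk = blocks_per_chunk;
  p->free_list = NULL;
  p->chunks = NULL;
  p->malloc_calls = 0;
}

void *pool_alloc(struct block_pool *p)
{
  struct free_node *n;
  if (p->free_list == NULL) {
    /* Out of blocks: grab one large chunk and subdivide it. */
    union chunk_header *c =
      malloc(sizeof *c + p->block_size * p->blocks_per_chunk);
    char *base;
    size_t i;
    if (c == NULL)
      return NULL;
    p->malloc_calls++;
    c->next = p->chunks;
    p->chunks = c;
    base = (char *)(c + 1);
    for (i = 0; i < p->blocks_per_chunk; i++) {
      n = (struct free_node *)(base + i * p->block_size);
      n->next = p->free_list;
      p->free_list = n;
    }
  }
  n = p->free_list;
  p->free_list = n->next;
  return n;
}

/* Freed blocks go back onto the linked list; free() is not called. */
void pool_free(struct block_pool *p, void *block)
{
  struct free_node *n = block;
  n->next = p->free_list;
  p->free_list = n;
}

void pool_destroy(struct block_pool *p)
{
  while (p->chunks != NULL) {
    union chunk_header *next = p->chunks->next;
    free(p->chunks);
    p->chunks = next;
  }
  p->free_list = NULL;
}
```

In this sketch a freed block is the next one handed out, and
@code{malloc()} is called once per chunk rather than once per block.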
2142
2143 @cindex dynamic array
2144 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
2145 similar to a standard C array but has no fixed limit on the number of
2146 elements it can contain. Dynamic arrays can hold elements of any type,
2147 and when you add a new element, the array automatically resizes itself
2148 if it isn't big enough. Dynarrs are extensively used in the redisplay
2149 mechanism.
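
A minimal sketch of a growable array follows, assuming a simple
capacity-doubling policy. The names (@code{dyn_array},
@code{dyn_add()}, @code{dyn_at()}) are hypothetical and do not reflect
the macros or functions actually provided by @file{dynarr.h}; the
growth policy used there may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array holding elements of one fixed size. */
struct dyn_array {
  void *data;        /* element storage, or NULL when empty */
  size_t elem_size;  /* size of one element in bytes */
  size_t len;        /* elements currently stored */
  size_t cap;        /* elements the storage can hold */
};

void dyn_init(struct dyn_array *a, size_t elem_size)
{
  assert(elem_size > 0);
  a->data = NULL;
  a->elem_size = elem_size;
  a->len = 0;
  a->cap = 0;
}

/* Append a copy of *elem, growing the storage when it is full.
   Returns 0 on success, -1 if memory could not be obtained. */
int dyn_add(struct dyn_array *a, const void *elem)
{
  if (a->len == a->cap) {
    size_t new_cap = a->cap ? a->cap * 2 : 8;
    void *p = realloc(a->data, new_cap * a->elem_size);
    if (p == NULL)
      return -1;
    a->data = p;
    a->cap = new_cap;
  }
  memcpy((char *)a->data + a->len * a->elem_size, elem, a->elem_size);
  a->len++;
  return 0;
}

/* Pointer to element i; the pointer is invalidated by a later dyn_add(). */
void *dyn_at(const struct dyn_array *a, size_t i)
{
  assert(i < a->len);
  return (char *)a->data + i * a->elem_size;
}

void dyn_free(struct dyn_array *a)
{
  free(a->data);
  a->data = NULL;
  a->len = a->cap = 0;
}
```

Because the storage may move when the array grows, callers of such a
type should hold indices rather than element pointers across additions.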
2150
2151
2152
2153 @example
2154 2058 inline.c
2155 @end example
2156
2157 This module is used in connection with inline functions (available in
2158 some compilers). Often, inline functions need to have a corresponding
2159 non-inline function that does the same thing. This module is where they
2160 reside. It contains no actual code, but defines some special flags that
2161 cause inline functions defined in header files to be rendered as actual
2162 functions. It then includes all header files that contain any inline
2163 function definitions, so that each one gets a real function equivalent.
2164
2165
2166
2167 @example
2168 6489 debug.c
2169 2267 debug.h
2170 @end example
2171
2172 These functions provide a system for doing internal consistency checks
2173 during code development. This system is not currently used; instead the
2174 simpler @code{assert()} macro is used along with the various checks
2175 provided by the @samp{--error-check-*} configuration options.
2176
2177
2178
2179 @example
2180 1643 prefix-args.c
2181 @end example
2182
2183 This is actually the source for a small, self-contained program
2184 used during building.
2185
2186
2187 @example
2188 904 universe.h
2189 @end example
2190
2191 This is not currently used.
2192
2193
2194
2195 @node Basic Lisp Modules
2196 @section Basic Lisp Modules
2197
2198 @example
2199 size name
2200 ------- ---------------------
2201 70167 emacsfns.h
2202 6305 lisp-disunion.h
2203 7086 lisp-union.h
2204 54929 lisp.h
2205 14235 lrecord.h
2206 10728 symsinit.h
2207 @end example
2208
2209 These are the basic header files for all XEmacs modules. Each module
2210 includes @file{lisp.h}, which brings the other header files in.
2211 @file{lisp.h} contains the definitions of the structures and extractor
2212 and constructor macros for the basic Lisp objects and various other
2213 basic definitions for the Lisp environment, as well as some
2214 general-purpose definitions (e.g. @code{min()} and @code{max()}).
2215 @file{lisp.h} includes either @file{lisp-disunion.h} or
2216 @file{lisp-union.h}, depending on whether @code{NO_UNION_TYPE} is
2217 defined. These files define the typedef of the Lisp object itself (as
2218 described above) and the low-level macros that hide the actual
2219 implementation of the Lisp object. All extractor and constructor macros
2220 for particular types of Lisp objects are defined in terms of these
2221 low-level macros.
2222
2223 As a general rule, all typedefs should go into the typedefs section of
2224 @file{lisp.h} rather than into a module-specific header file even if the
2225 structure is defined elsewhere. This allows function prototypes that
2226 use the typedef to be placed into @file{emacsfns.h}. Forward structure
2227 declarations (i.e. a simple declaration like @code{struct foo;} where
2228 the structure itself is defined elsewhere) should be placed into the
2229 typedefs section as necessary.
2230
2231 @file{lrecord.h} contains the basic structures and macros that implement
2232 all record-type Lisp objects -- i.e. all objects whose type is a field
2233 in their C structure, which includes all objects except the few most
2234 basic ones.
2235
2236 @file{emacsfns.h} contains prototypes for most of the exported functions
2237 in the various modules. (In particular, prototypes for Lisp primitives
2238 should always go in this header file. Prototypes for other functions
2239 can either go here or in a module-specific header file, depending on how
2240 general-purpose the function is and whether it has special-purpose
2241 argument types requiring definitions not in @file{lisp.h}.) All
2242 initialization functions are prototyped in @file{symsinit.h}.
2243
2244
2245
2246 @example
2247 120478 alloc.c
2248 1029 pure.c
2249 2506 puresize.h
2250 @end example
2251
2252 The large module @file{alloc.c} implements all of the basic allocation and
2253 garbage collection for Lisp objects. The most commonly used Lisp
2254 objects are allocated in chunks, similar to the Blocktype data type
2255 described above; others are allocated in individually @code{malloc()}ed
2256 blocks. This module provides the foundation on which all other aspects
2257 of the Lisp environment sit, and is the first module initialized at
2258 startup.
2259
2260 Note that @file{alloc.c} provides a series of generic functions that are
2261 not dependent on any particular object type, and interfaces to
2262 particular types of objects using a standardized interface of
2263 type-specific methods. This scheme is a fundamental principle of
2264 object-oriented programming and is heavily used throughout XEmacs. The
2265 great advantage of this is that it allows for a clean separation of
2266 functionality into different modules -- new classes of Lisp objects, new
2267 event interfaces, new device types, new stream interfaces, etc. can be
2268 added transparently without affecting code anywhere else in XEmacs.
2269 Because the different subsystems are divided into general and specific
2270 code, adding a new subtype within a subsystem will in general not
2271 require changes to the generic subsystem code or affect any of the other
2272 subtypes in the subsystem; this provides a great deal of robustness to
2273 the XEmacs code.
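
The split between generic code and type-specific methods can be
illustrated with a small table of function pointers. This is a sketch
only: @code{object_methods}, @code{mark_object()} and the two toy types
are hypothetical names chosen for the example, and the real structures
in @file{alloc.c} and @file{lrecord.h} are more elaborate.

```c
#include <assert.h>
#include <stddef.h>

struct object;

/* Per-type method table; one instance exists for each object type. */
struct object_methods {
  const char *type_name;
  void (*mark_children)(struct object *);  /* NULL if no children */
};

/* Common header embedded at the start of every object. */
struct object {
  const struct object_methods *methods;
  int marked;
};

/* Generic code: knows nothing about any particular object type. */
void mark_object(struct object *obj)
{
  if (obj == NULL || obj->marked)
    return;
  assert(obj->methods != NULL);
  obj->marked = 1;
  if (obj->methods->mark_children != NULL)
    obj->methods->mark_children(obj);
}

/* Type-specific code for a toy "pair" type with two children. */
struct pair {
  struct object header;
  struct object *car, *cdr;
};

static void pair_mark_children(struct object *obj)
{
  struct pair *p = (struct pair *)obj;
  mark_object(p->car);
  mark_object(p->cdr);
}

const struct object_methods pair_methods = { "pair", pair_mark_children };

/* Type-specific code for a toy "leaf" type with no children. */
struct leaf {
  struct object header;
  int value;
};

const struct object_methods leaf_methods = { "leaf", NULL };
```

Adding a third type in this scheme means writing a new structure and a
new method table; @code{mark_object()} itself does not change.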
2274
2275 @cindex pure space
2276 @file{pure.c} contains the declaration of the @dfn{purespace} array.
2277 Pure space is a hack used to place some constant Lisp data into the code
2278 segment of the XEmacs executable, even though the data needs to be
2279 initialized through function calls. (See above in section VIII for more
2280 info about this.) During startup, certain sorts of data are
2281 automatically copied into pure space, and other data is copied manually
2282 in some of the basic Lisp files by calling the function @code{purecopy},
2283 which copies the object if possible (this only works in temacs, of
2284 course) and returns the new object. In particular, while temacs is
2285 executing, the Lisp reader automatically copies all compiled-function
2286 objects that it reads into pure space. Since compiled-function objects
2287 are large, are never modified, and typically comprise the majority of
2288 the contents of a compiled-Lisp file, this works well. While XEmacs is
2289 running, any attempt to modify an object that resides in pure space
2290 causes an error. Objects in pure space are never garbage collected --
2291 almost all of the time, they're intended to be permanent, and in any
2292 case you can't write into pure space to set the mark bits.
2293
2294 @file{puresize.h} contains the declaration of the size of the pure space
2295 array. This depends on the optional features that are compiled in, any
2296 extra purespace requested by the user at compile time, and certain other
2297 factors (e.g. 64-bit machines need more pure space because their Lisp
2298 objects are larger). The smallest size that suffices should be used, so
2299 that there's no wasted space. If there's not enough pure space, you
2300 will get an error during the build process, specifying how much more
2301 pure space is needed.
2302
2303
2304
2305 @example
2306 122243 eval.c
2307 2305 backtrace.h
2308 @end example
2309
2310 This module contains all of the functions to handle the flow of control.
2311 This includes the mechanisms of defining functions, calling functions,
2312 traversing stack frames, and binding variables; the control primitives
2313 and other special forms such as @code{while}, @code{if}, @code{eval},
2314 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
2315 non-local exits, unwind-protects, and exception handlers; entering the
2316 debugger; methods for the subr Lisp object type; etc. It does
2317 @emph{not} include the @code{read} function, the @code{print} function,
2318 or the handling of symbols and obarrays.
2319
2320 @file{backtrace.h} contains some structures related to stack frames and the
2321 flow of control.
2322
2323
2324
2325 @example
2326 64949 lread.c
2327 @end example
2328
2329 This module implements the Lisp reader and the @code{read} function,
2330 which converts text into Lisp objects, according to the read syntax of
2331 the objects, as described above. This is similar to the parser that is
2332 a part of all compilers.
2333
2334
2335
2336 @example
2337 40900 print.c
2338 @end example
2339
2340 This module implements the Lisp print mechanism and the @code{print}
2341 function and related functions. This is the inverse of the Lisp reader
2342 -- it converts Lisp objects to a printed, textual representation.
2343 (Hopefully something that can be read back in using @code{read} to get
2344 an equivalent object.)
2345
2346
2347
2348 @example
2349 4518 general.c
2350 60220 symbols.c
2351 9966 symeval.h
2352 @end example
2353
2354 @file{symbols.c} implements the handling of symbols, obarrays, and
2355 retrieving the values of symbols. Much of the code is devoted to
2356 handling the special @dfn{symbol-value-magic} objects that define
2357 special types of variables -- this includes buffer-local variables,
2358 variable aliases, variables that forward into C variables, etc. This
2359 module is initialized extremely early (right after @file{alloc.c}),
2360 because it is here that the basic symbols @code{t} and @code{nil} are
2361 created, and those symbols are used everywhere throughout XEmacs.
2362
2363 @file{symeval.h} contains the definitions of symbol structures and the
2364 @code{DEFVAR_LISP()} and related macros for declaring variables.
2365
2366
2367
2368 @example
2369 48973 data.c
2370 25694 floatfns.c
2371 71049 fns.c
2372 @end example
2373
2374 These modules implement the methods and standard Lisp primitives for all
2375 the basic Lisp object types other than symbols (which are described
2376 above). @file{data.c} contains all the predicates (primitives that return
2377 whether an object is of a particular type); the integer arithmetic
2378 functions; and the basic accessor and mutator primitives for the various
2379 object types. @file{fns.c} contains all the standard primitives for working
2380 with sequences (where, abstractly speaking, a sequence is an ordered set
2381 of objects, and can be represented by a list, string, vector, or
2382 bit-vector); it also contains @code{equal}, perhaps on the grounds that
2383 the bulk of the operation of @code{equal} is comparing sequences.
2384 @file{floatfns.c} contains methods and primitives for floats and floating-point
2385 arithmetic.
2386
2387
2388
2389 @example
2390 23555 bytecode.c
2391 3358 bytecode.h
2392 @end example
2393
2394 @file{bytecode.c} implements the byte-code interpreter, and @file{bytecode.h} contains
2395 associated structures. Note that the byte-code @emph{compiler} is
2396 written in Lisp.
2397
2398
2399
2400
2401 @node Modules for Standard Editing Operations
2402 @section Modules for Standard Editing Operations
2403
2404 @example
2405 size name
2406 ------- ---------------------
2407 82900 buffer.c
2408 60964 buffer.h
2409 6059 bufslots.h
2410 @end example
2411
2412 @file{buffer.c} implements the buffer Lisp object type. This includes
2413 functions that create and destroy buffers; retrieve buffers by name or
2414 by other properties; manipulate lists of buffers (remember that buffers
2415 are permanent objects and stored in various ordered lists); retrieve or
2416 change buffer properties; etc. It also contains the definitions of all
2417 the built-in buffer-local variables (which can be viewed as buffer
2418 properties). It does @emph{not} contain code to manipulate buffer-local
2419 variables (that's in @file{symbols.c}, described above); or code to manipulate
2420 the text in a buffer.
2421
2422 @file{buffer.h} defines the structures associated with a buffer and the various
2423 macros for retrieving text from a buffer and special buffer positions
2424 (e.g. @code{point}, the default location for text insertion). It also
2425 contains macros for working with buffer positions and converting between
2426 their representations as character offsets and as byte offsets (under
2427 MULE, they are different, because characters can be multi-byte). It is
2428 one of the largest header files.
2429
2430 @file{bufslots.h} defines the fields in the buffer structure that correspond to
2431 the built-in buffer-local variables. It is its own header file because
2432 it is included many times in @file{buffer.c}, as a way of iterating over all
2433 the built-in buffer-local variables.
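
The technique of expanding one list of slots several times, each time
with a different macro definition, can be sketched as follows. To keep
the example in a single file, a list macro stands in for the repeatedly
included header. The slot names and macro names are hypothetical and are
not taken from @file{bufslots.h}.

```c
#include <assert.h>
#include <stddef.h>

/* The single list of slots.  In a multi-file layout this list would
   live in its own header and be #included once per expansion. */
#define BUFFER_SLOTS(SLOT) \
  SLOT(fill_column)        \
  SLOT(tab_width)          \
  SLOT(left_margin)

/* Expansion 1: declare one structure field per slot. */
struct toy_buffer {
#define DECLARE_SLOT(name) int name;
  BUFFER_SLOTS(DECLARE_SLOT)
#undef DECLARE_SLOT
};

/* Expansion 2: build a table of slot names. */
static const char *const slot_names[] = {
#define NAME_SLOT(name) #name,
  BUFFER_SLOTS(NAME_SLOT)
#undef NAME_SLOT
};

enum { SLOT_COUNT = sizeof slot_names / sizeof slot_names[0] };
static_assert(SLOT_COUNT == 3, "one name per slot");

/* Expansion 3: visit every slot without naming any of them here. */
void reset_slots(struct toy_buffer *b)
{
#define RESET_SLOT(name) b->name = 0;
  BUFFER_SLOTS(RESET_SLOT)
#undef RESET_SLOT
}
```

Adding a slot to the list updates the structure, the name table and the
reset function together, which is the point of keeping the list in one
place.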
2434
2435
2436
2437 @example
2438 79888 insdel.c
2439 6103 insdel.h
2440 @end example
2441
2442 @file{insdel.c} contains low-level functions for inserting and deleting text in
2443 a buffer, keeping track of changed regions for use by redisplay, and
2444 calling any before-change and after-change functions that may have been
2445 registered for the buffer. It also contains the actual functions that
2446 convert between byte offsets and character offsets.
2447
2448 @file{insdel.h} contains associated headers.
2449
2450
2451
2452 @example
2453 10975 marker.c
2454 @end example
2455
2456 This module implements the marker Lisp object type, which conceptually
2457 is a pointer to a text position in a buffer that moves around as text is
2458 inserted and deleted, so as to remain in the same relative position.
2459 This module doesn't actually move the markers around -- that's handled
2460 in @file{insdel.c}. This module just creates them and implements the
2461 primitives for working with them. As markers are simple objects, this
2462 does not entail much.
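
What ``remaining in the same relative position'' means can be sketched
as below. This is a simplified illustration, not the code in
@file{insdel.c}: the names are hypothetical, positions are plain
character offsets, and a marker sitting exactly at the insertion point
is assumed to stay put (a real implementation may make that a
per-marker choice).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: just a position in the buffer text. */
struct toy_marker { long pos; };

/* Text of length LEN was inserted at position AT: markers after the
   insertion point shift right; markers at or before it do not move. */
void markers_adjust_for_insert(struct toy_marker *m, size_t count,
                               long at, long len)
{
  size_t i;
  assert(len >= 0);
  for (i = 0; i < count; i++)
    if (m[i].pos > at)
      m[i].pos += len;
}

/* The text in [FROM, TO) was deleted: markers inside the deleted
   range collapse to FROM; markers after it shift left. */
void markers_adjust_for_delete(struct toy_marker *m, size_t count,
                               long from, long to)
{
  size_t i;
  assert(from <= to);
  for (i = 0; i < count; i++) {
    if (m[i].pos >= to)
      m[i].pos -= to - from;
    else if (m[i].pos > from)
      m[i].pos = from;
  }
}
```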
2463
2464 Note that the standard arithmetic primitives (e.g. @code{+}) accept
2465 markers in place of integers and automatically substitute the value of
2466 @code{marker-position} for the marker, i.e. an integer describing the
2467 current buffer position of the marker.
2468
2469
2470
2471 @example
2472 193714 extents.c
2473 15686 extents.h
2474 @end example
2475
2476 This module implements the extent Lisp object type, which is like a
2477 marker that works over a range of text rather than a single position.
2478 Extents are also much more complex and powerful than markers and have a
2479 more efficient (and more algorithmically complex) implementation. The
2480 implementation is described in detail in comments in @file{extents.c}.
2481
2482 The code in @file{extents.c} works closely with @file{insdel.c} so that
2483 extents are properly moved around as text is inserted and deleted.
2484 There is also code in @file{extents.c} that provides information needed
2485 by the redisplay mechanism for efficient operation. (Remember that
2486 extents can have display properties that affect [sometimes drastically,
2487 as in the @code{invisible} property] the display of the text they
2488 cover.)
2489
2490
2491
2492 @example
2493 60155 editfns.c
2494 @end example
2495
2496 @file{editfns.c} contains the standard Lisp primitives for working with
2497 a buffer's text, and calls the low-level functions in @file{insdel.c}.
2498 It also contains primitives for working with @code{point} (the default
2499 buffer insertion location).
2500
2501 @file{editfns.c} also contains functions for retrieving various
2502 characteristics from the external environment: the current time, the
2503 process ID of the running XEmacs process, the name of the user who ran
2504 this XEmacs process, etc. It's not clear why this code is in
2505 @file{editfns.c}.
2506
2507
2508
2509 @example
2510 26081 callint.c
2511 12577 cmds.c
2512 2749 commands.h
2513 @end example
2514
2515 @cindex interactive
2516 These modules implement the basic @dfn{interactive} commands,
2517 i.e. user-callable functions. Commands, as opposed to other functions,
2518 have special ways of getting their parameters interactively (by querying
2519 the user), as opposed to having them passed in a normal function
2520 invocation. Many commands are not really meant to be called from other
2521 Lisp functions, because they modify global state in a way that's often
2522 undesired as part of other Lisp functions.
2523
2524 @file{callint.c} implements the mechanism for querying the user for
2525 parameters and calling interactive commands. The bulk of this module is
2526 code that parses the interactive spec that is supplied with an
2527 interactive command.
2528
2529 @file{cmds.c} implements the basic, most commonly used editing commands:
2530 commands to move around the current buffer and insert and delete
2531 characters. These commands are implemented using the Lisp primitives
2532 defined in @file{editfns.c}.
2533
2534 @file{commands.h} contains associated structure definitions and prototypes.
2535
2536
2537
2538 @example
2539 194863 regex.c
2540 18968 regex.h
2541 79800 search.c
2542 @end example
2543
2544 @file{search.c} implements the Lisp primitives for searching for text in
2545 a buffer, and some of the low-level algorithms for doing this. In
2546 particular, the fast fixed-string Boyer-Moore search algorithm is
2547 implemented in @file{search.c}. The low-level algorithms for doing
2548 regular-expression searching, however, are implemented in @file{regex.c}
2549 and @file{regex.h}. These two modules are largely independent of
2550 XEmacs, and are similar to (and based upon) the regular-expression
2551 routines used in @file{grep} and other GNU utilities.
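
For readers unfamiliar with this family of algorithms, here is a sketch
of the Horspool simplification of Boyer-Moore, which keeps only the
bad-character shift table. It is not the code in @file{search.c}; the
function name is hypothetical, and the real implementation also has to
deal with matters such as case translation and searching backwards.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT (length M) in
   TEXT (length N), or -1 if there is none.  Horspool variant. */
long horspool_search(const char *text, size_t n, const char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];
  size_t i, pos;

  assert(text != NULL && pat != NULL);
  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* A character absent from the pattern lets us skip M positions;
     otherwise skip so its last occurrence (excluding the final
     pattern character) lines up with the text. */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  for (i = 0; i + 1 < m; i++)
    shift[(unsigned char)pat[i]] = m - 1 - i;

  pos = 0;
  while (pos <= n - m) {
    if (memcmp(text + pos, pat, m) == 0)
      return (long)pos;
    /* Shift by the entry for the text character under the last
       pattern position. */
    pos += shift[(unsigned char)text[pos + m - 1]];
  }
  return -1;
}
```

The speed of this family comes from the skips: for a pattern of length
@var{m}, many text positions are never examined at all.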
2552
2553
2554
2555 @example
2556 20476 doprnt.c
2557 @end example
2558
2559 @file{doprnt.c} implements formatted-string processing, similar to
2560 @code{printf()} command in C.
2561
2562
2563
2564 @example
2565 15372 undo.c
2566 @end example
2567
2568 This module implements the undo mechanism for tracking buffer changes.
2569 Most of this could be implemented in Lisp.
2570
2571
2572
2573 @node Editor-Level Control Flow Modules
2574 @section Editor-Level Control Flow Modules
2575
2576 @example
2577 size name
2578 ------- ---------------------
2579 84546 event-Xt.c
2580 121483 event-stream.c
2581 6658 event-tty.c
2582 49271 events.c
2583 14459 events.h
2584 @end example
2585
2586 These implement the handling of events (user input and other system
2587 notifications).
2588
2589 @file{events.c} and @file{events.h} define the event Lisp object type
2590 and primitives for manipulating it.
2591
2592 @file{event-stream.c} implements the basic functions for working with
2593 event queues, dispatching an event by looking it up in relevant keymaps
2594 and such, and handling timeouts; this includes the primitives
2595 @code{next-event} and @code{dispatch-event}, as well as related
2596 primitives such as @code{sit-for}, @code{sleep-for}, and
2597 @code{accept-process-output}. (@file{event-stream.c} is one of the
2598 hairiest and trickiest modules in XEmacs. Beware! You can easily mess
2599 things up here.)
2600
2601 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
2602 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's
2603 (using @code{read()} and @code{select()}), respectively. The event
2604 interface enforces a clean separation between the specific code for
2605 interfacing with the operating system and the generic code for working
2606 with events, by defining an API of basic, low-level event methods;
2607 @file{event-Xt.c} and @file{event-tty.c} are two different
2608 implementations of this API. To add support for a new operating system
2609 (e.g. NeXTstep), one merely needs to provide another implementation of
2610 those API functions.
2611
2612 Note that the choice of whether to use @file{event-Xt.c} or
2613 @file{event-tty.c} is made at compile time! Or at the very latest, it
2614 is made at startup time. @file{event-Xt.c} handles events for
2615 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
2616 support is not compiled into XEmacs. The reason for this is that there
2617 is only one event loop in XEmacs: thus, it needs to be able to receive
2618 events from all different kinds of frames.
2619
2620
2621
2622 @example
2623 129583 keymap.c
2624 2621 keymap.h
2625 @end example
2626
2627 @file{keymap.c} and @file{keymap.h} define the keymap Lisp object type
2628 and associated methods and primitives. (Remember that keymaps are
2629 objects that associate event descriptions with functions to be called to
2630 ``execute'' those events; @code{dispatch-event} looks up events in the
2631 relevant keymaps.)
2632
2633
2634
2635 @example
2636 25212 keyboard.c
2637 @end example
2638
2639 @file{keyboard.c} contains functions that implement the actual editor
2640 command loop -- i.e. the event loop that cyclically retrieves and
2641 dispatches events. This code is also rather tricky, just like
2642 @file{event-stream.c}.
2643
2644
2645
2646 @example
2647 9973 macros.c
2648 1397 macros.h
2649 @end example
2650
2651 These two modules contain the basic code for defining keyboard macros.
2652 These functions don't actually do much; most of the code that handles keyboard
2653 macros is mixed in with the event-handling code in @file{event-stream.c}.
2654
2655
2656
2657 @example
2658 23234 minibuf.c
2659 @end example
2660
2661 This contains some miscellaneous code related to the minibuffer (most of
2662 the minibuffer code was moved into Lisp by Richard Mlynarik). This
2663 includes the primitives for completion (although filename completion is
2664 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
2665 command loop were cleaned up, this too could be in Lisp), and code for
2666 dealing with the echo area (this, too, was mostly moved into Lisp, and
2667 the only code remaining is code to call out to Lisp or provide simple
2668 bootstrapping implementations early in temacs, before the echo-area Lisp
2669 code is loaded).
2670
2671
2672
2673 @node Modules for the Basic Displayable Lisp Objects
2674 @section Modules for the Basic Displayable Lisp Objects
2675
2676 @example
2677 size name
2678 ------- ---------------------
2679 985 device-ns.h
2680 6454 device-stream.c
2681 1196 device-stream.h
2682 9526 device-tty.c
2683 8660 device-tty.h
2684 43798 device-x.c
2685 11667 device-x.h
2686 26056 device.c
2687 22993 device.h
2688 @end example
2689
2690 These modules implement the device Lisp object type. This abstracts a
2691 particular screen or connection on which frames are displayed. As with
2692 Lisp objects, event interfaces, and other subsystems, the device code is
2693 separated into a generic component that contains a standardized
2694 interface (in the form of a set of methods) onto particular device
2695 types.
2696
2697 The device subsystem defines all the methods and provides method
2698 services for not only device operations but also for the frame, window,
2699 menubar, scrollbar, toolbar, and other displayable-object subsystems.
2700 The reason for this is that all of these subsystems have the same
2701 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
2702
2703
2704
2705 @example
2706 934 frame-ns.h
2707 2303 frame-tty.c
2708 69205 frame-x.c
2709 5976 frame-x.h
2710 68175 frame.c
2711 15080 frame.h
2712 @end example
2713
2714 Each device contains one or more frames in which objects (e.g. text) are
2715 displayed. A frame corresponds to a window in the window system;
2716 usually this is a top-level window but it could potentially be one of a
2717 number of overlapping child windows within a top-level window, using the
2718 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
2719 similar scheme.
2720
2721 The @file{frame-*} files implement the frame Lisp object type and provide the
2722 generic and device-type-specific operations on frames (e.g. raising,
2723 lowering, resizing, moving, etc.).
2724
2725
2726
2727 @example
2728 160783 window.c
2729 15974 window.h
2730 @end example
2731
2732 @cindex window (in Emacs)
2733 @cindex pane
2734 Each frame consists of one or more non-overlapping @dfn{windows} (better
2735 known as @dfn{panes} in standard window-system terminology) in which a
2736 buffer's text can be displayed. Windows can also have scrollbars
2737 displayed around their edges.
2738
2739 @file{window.c} and @file{window.h} implement the window Lisp object
2740 type and provide code to manage windows. Since windows have no
2741 associated resources in the window system (the window system knows only
2742 about the frame; no child windows or anything are used for XEmacs
2743 windows), there is no device-type-specific code here; all of that code
2744 is part of the redisplay mechanism or the code for particular object
2745 types such as scrollbars.
2746
2747
2748
2749 @node Modules for other Display-Related Lisp Objects
2750 @section Modules for other Display-Related Lisp Objects
2751
2752 @example
2753 size name
2754 ------- ---------------------
2755 54397 faces.c
2756 15173 faces.h
2757 @end example
2758
2759
2760
2761 @example
2762 4961 bitmaps.h
2763 954 glyphs-ns.h
2764 105345 glyphs-x.c
2765 4288 glyphs-x.h
2766 72102 glyphs.c
2767 16356 glyphs.h
2768 @end example
2769
2770
2771
2772 @example
2773 952 objects-ns.h
2774 9971 objects-tty.c
2775 1465 objects-tty.h
2776 32326 objects-x.c
2777 2806 objects-x.h
2778 31944 objects.c
2779 6809 objects.h
2780 @end example
2781
2782
2783
2784 @example
2785 57511 menubar-x.c
2786 11243 menubar.c
2787 @end example
2788
2789
2790
2791 @example
2792 25012 scrollbar-x.c
2793 2554 scrollbar-x.h
2794 26954 scrollbar.c
2795 2778 scrollbar.h
2796 @end example
2797
2798
2799
2800 @example
2801 23117 toolbar-x.c
2802 43456 toolbar.c
2803 4280 toolbar.h
2804 @end example
2805
2806
2807
2808 @example
2809 25070 font-lock.c
2810 @end example
2811
2812 This file provides C support for syntax highlighting -- i.e.
2813 highlighting different syntactic constructs of a source file in
2814 different colors, for easy reading. The C support is provided so that
2815 this is fast.
2816
2817
2818
2819 @example
2820 32180 dgif_lib.c
2821 3999 gif_err.c
2822 10697 gif_lib.h
2823 9371 gifalloc.c
2824 @end example
2825
2826 These modules decode GIF-format image files, for use with glyphs.
2827
2828
2829
2830 @node Modules for the Redisplay Mechanism
2831 @section Modules for the Redisplay Mechanism
2832
2833 @example
2834 size name
2835 ------- ---------------------
2836 38692 redisplay-output.c
2837 40835 redisplay-tty.c
2838 65069 redisplay-x.c
2839 234142 redisplay.c
2840 17026 redisplay.h
2841 @end example
2842
2843 These files provide the redisplay mechanism. As with many other
2844 subsystems in XEmacs, there is a clean separation between the general
2845 and device-specific support.
2846
2847 @file{redisplay.c} contains the bulk of the redisplay engine. These
2848 functions update the redisplay structures (which describe how the screen
2849 is to appear) to reflect any changes made to the state of any
2850 displayable objects (buffer, frame, window, etc.) since the last time
2851 that redisplay was called. These functions are highly optimized to
2852 avoid doing more work than necessary (since redisplay is called
2853 extremely often and is potentially a huge time sink), and depend heavily
2854 on notifications from the objects themselves that changes have occurred,
2855 so that redisplay doesn't explicitly have to check each possible object.
2856 The redisplay mechanism also contains a great deal of caching to further
2857 speed things up; some of this caching is contained within the various
2858 displayable objects.
2859
2860 @file{redisplay-output.c} goes through the redisplay structures and converts
2861 them into calls to device-specific methods to actually output the screen
2862 changes.
2863
2864 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
2865 of these redisplay output methods, for X frames and TTY frames,
2866 respectively.
2867
2868
2869
2870 @example
2871 14129 indent.c
2872 @end example
2873
2874 This module contains various functions and Lisp primitives for
2875 converting between buffer positions and screen positions. These
2876 functions call the redisplay mechanism to do most of the work, and then
2877 examine the redisplay structures to get the necessary information. This
2878 module needs work.
2879
2880
2881
2882 @example
2883 14754 termcap.c
2884 2141 terminfo.c
2885 7253 tparam.c
2886 @end example
2887
2888 These files contain functions for working with the termcap (BSD-style)
2889 and terminfo (System V style) databases of terminal capabilities and
2890 escape sequences, used when XEmacs is displaying in a TTY.
2891
2892
2893
2894 @example
2895 10869 cm.c
2896 5876 cm.h
2897 @end example
2898
2899 These files provide some miscellaneous TTY-output functions and should
2900 probably be merged into @file{redisplay-tty.c}.
2901
2902
2903
2904 @node Modules for Interfacing with the File System
2905 @section Modules for Interfacing with the File System
2906
2907 @example
2908 size name
2909 ------- ---------------------
2910 43362 lstream.c
2911 14240 lstream.h
2912 @end example
2913
2914 These modules implement the stream Lisp object type. This is an
2915 internal-only Lisp object that implements a generic buffering stream.
2916 The idea is to provide a uniform interface onto all sources and sinks of
2917 data, including file descriptors, stdio streams, chunks of memory, Lisp
2918 buffers, Lisp strings, etc. That way, I/O functions can be written to
2919 the stream interface and can transparently handle all possible sources
2920 and sinks. (For example, the @code{read} function can read data from a
2921 file, a string, a buffer, or even a function that is called repeatedly
2922 to return data, without worrying about where the data is coming from or
2923 what-size chunks it is returned in.)
2924
2925 @cindex lstream
2926 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
2927 streams'') to distinguish them from other kinds of streams, e.g. stdio
2928 streams and C++ I/O streams.
2929
2930 Similar to other subsystems in XEmacs, lstreams are separated into
2931 generic functions and a set of methods for the different types of
2932 lstreams. @file{lstream.c} provides implementations of many different
2933 types of streams; others are provided, e.g., in @file{mule-coding.c}.
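
The split between generic functions and per-type methods can be
illustrated with a minimal sketch. Every name below (@code{toy_stream},
@code{toy_stream_read}, and so on) is hypothetical and is not taken from
@file{lstream.h}; the sketch only assumes that each stream type supplies
a reader method and that the generic layer hides the chunk size from the
caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the lstream.h API. */
struct toy_stream;

/* Per-type method table: each kind of stream supplies its own reader. */
struct toy_stream_methods {
  const char *name;
  /* Return the number of bytes stored in BUF; 0 means end of data. */
  size_t (*reader) (struct toy_stream *s, unsigned char *buf, size_t size);
};

struct toy_stream {
  const struct toy_stream_methods *methods;
  const unsigned char *data;    /* used by the memory-backed type below */
  size_t len, pos;
};

/* One concrete type: a stream that reads from a fixed chunk of memory,
   handing back at most 3 bytes per call to imitate odd chunk sizes. */
static size_t
memory_reader (struct toy_stream *s, unsigned char *buf, size_t size)
{
  size_t n = s->len - s->pos;
  if (n > size)
    n = size;
  if (n > 3)
    n = 3;
  memcpy (buf, s->data + s->pos, n);
  s->pos += n;
  return n;
}

static const struct toy_stream_methods memory_methods =
  { "memory", memory_reader };

/* Generic function: keeps calling the type's reader until SIZE bytes
   have arrived or the source is exhausted.  Callers never see the
   chunking. */
static size_t
toy_stream_read (struct toy_stream *s, unsigned char *buf, size_t size)
{
  size_t total = 0;
  assert (s != NULL && buf != NULL);
  while (total < size)
    {
      size_t n = s->methods->reader (s, buf + total, size - total);
      if (n == 0)
        break;
      total += n;
    }
  return total;
}
```

A second stream type would only need its own reader function and method
table; @code{toy_stream_read} and its callers would stay unchanged.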
2934
2935
2936
2937 @example
2938 126926 fileio.c
2939 @end example
2940
2941 This implements the basic primitives for interfacing with the file
2942 system. This includes primitives for reading files into buffers,
2943 writing buffers into files, checking for the presence or accessibility
2944 of files, canonicalizing file names, etc. Note that these primitives
2945 are usually not invoked directly by the user: There is a great deal of
2946 higher-level Lisp code that implements the user commands such as
2947 @code{find-file} and @code{save-buffer}. This is similar to the
2948 distinction between the lower-level primitives in @file{editfns.c} and
2949 the higher-level user commands in @file{commands.c} and
2950 @file{simple.el}.
2951
2952
2953
2954 @example
2955 10960 filelock.c
2956 @end example
2957
2958 This file provides functions for detecting clashes between different
2959 processes (e.g. XEmacs and some external process, or two different
2960 XEmacs processes) modifying the same file. (XEmacs can optionally use
2961 the @file{lock/} subdirectory to provide a form of ``locking'' between
2962 different XEmacs processes.) This module is also used by the low-level
2963 functions in @file{insdel.c} to ensure that, if the first modification
2964 is being made to a buffer whose corresponding file has been externally
2965 modified, the user is made aware of this so that the buffer can be
2966 synched up with the external changes if necessary.
2967
2968
2969 @example
2970 4527 filemode.c
2971 @end example
2972
2973 This file provides some miscellaneous functions that construct a
2974 @samp{rwxr-xr-x}-type permissions string (as might appear in an
2975 @file{ls}-style directory listing) given the information returned by the
2976 @code{stat()} system call.
2977
2978
2979
2980 @example
2981 22855 dired.c
2982 2094 ndir.h
2983 @end example
2984
2985 These files implement the XEmacs interface to directory searching. This
2986 includes a number of primitives for determining the files in a directory
2987 and for doing filename completion. (Remember that generic completion is
2988 handled by a different mechanism, in @file{minibuf.c}.)
2989
2990 @file{ndir.h} is a header file used for the directory-searching emulation
2991 functions provided in @file{sysdep.c} (@pxref{Modules for Interfacing with
2992 the Operating System}), for systems that don't provide any such functions. (On
2993 those systems, directories can be read directly as files, and parsed.)
2994
2995
2996
2997 @example
2998 4311 realpath.c
2999 @end example
3000
3001 This file provides an implementation of the @code{realpath()} function
3002 for expanding symbolic links, on systems that don't implement it or have
3003 a broken implementation.
3004
3005
3006
3007 @node Modules for Other Aspects of the Lisp Interpreter and Object System
3008 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3009
3010 @example
3011 size name
3012 ------- ---------------------
3013 22290 elhash.c
3014 2454 elhash.h
3015 12169 hash.c
3016 3369 hash.h
3017 @end example
3018
3019 These files implement the hashtable Lisp object type. @file{hash.c} and
3020 @file{hash.h} provide a generic C implementation of hash tables (which
3021 can stand independently of XEmacs), and @file{elhash.c} and
3022 @file{elhash.h} provide a Lisp interface onto the C hash tables using
3023 the hashtable Lisp object type.
3024
3025
3026
3027 @example
3028 95691 specifier.c
3029 11167 specifier.h
3030 @end example
3031
3032 This module implements the specifier Lisp object type. This is
3033 primarily used for displayable properties, and allows for values that
3034 are specific to a particular buffer, window, frame, device, or device
3035 class, as well as for a default value. This is used, for example,
3036 to control the height of the horizontal scrollbar or the appearance of
3037 the @code{default}, @code{bold}, or other faces. The specifier object
3038 consists of a number of specifications, each of which maps from a
3039 buffer, window, etc. to a value. The function @code{specifier-instance}
3040 looks up a value given a window (from which a buffer, frame, and device
3041 can be derived).
3042
3043
3044 @example
3045 43058 chartab.c
3046 6503 chartab.h
3047 9918 casetab.c
3048 @end example
3049
3050 @file{chartab.c} and @file{chartab.h} implement the char table Lisp
3051 object type, which maps from characters or certain sorts of character
3052 ranges to Lisp objects. The implementation of this object is optimized
3053 for the internal representation of characters. Char tables come in
3054 different types, which affect the allowed object types to which a
3055 character can be mapped and also dictate certain other properties of the
3056 char table.
3057
3058 @cindex case table
3059 @file{casetab.c} implements one sort of char table, the @dfn{case
3060 table}, which maps characters to other characters of possibly different
3061 case. These are used by XEmacs to implement case-changing primitives
3062 and to do case-insensitive searching.
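
The way a case table supports case-insensitive comparison can be
sketched as follows. This is a simplified illustration, not the code in
@file{casetab.c}: the table @code{toy_downcase} is a hypothetical
256-entry array that handles ASCII letters only, whereas a real case
table is a char table covering every character XEmacs can represent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 256-entry downcase table covering ASCII only; a real
   case table maps every character the editor can represent. */
static unsigned char toy_downcase[256];

static void
toy_init_downcase (void)
{
  int c;
  for (c = 0; c < 256; c++)
    toy_downcase[c] =
      (unsigned char) ((c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c);
}

/* Case-insensitive string comparison: translate both sides through the
   table before comparing.  Returns 1 when the strings match. */
static int
toy_equal_ignoring_case (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  while (*a && *b)
    {
      if (toy_downcase[(unsigned char) *a]
          != toy_downcase[(unsigned char) *b])
        return 0;
      a++, b++;
    }
  return *a == *b;              /* both strings must end together */
}
```

Because the mapping lives in a table rather than in code, swapping in a
different table changes the notion of ``same letter'' without touching
the comparison loop.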
3063
3064
3065
3066 @example
3067 49593 syntax.c
3068 10200 syntax.h
3069 @end example
3070
3071 @cindex scanner
3072 This module implements syntax tables, another sort of char table that
3073 maps characters into syntax classes that define the syntax of these
3074 characters (e.g. a parenthesis belongs to a class of @samp{open} characters
3075 that have corresponding @samp{close} characters and can be nested).
3076 This module also implements the Lisp @dfn{scanner}, a set of primitives
3077 for scanning over text based on syntax tables. This is used, for
3078 example, to find the matching parenthesis in a command such as
3079 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
3080 comments, etc.
3081
3082
3083
3084 @example
3085 10438 casefiddle.c
3086 @end example
3087
3088 This module implements various Lisp primitives for upcasing, downcasing
3089 and capitalizing strings or regions of buffers.
3090
3091
3092
3093 @example
3094 20234 rangetab.c
3095 @end example
3096
3097 This module implements the range table Lisp object type, which provides
3098 for a mapping from ranges of integers to arbitrary Lisp objects.
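
A minimal sketch of such a mapping is shown below. It is not the
implementation in @file{rangetab.c}: the type @code{toy_range} and the
function @code{toy_range_lookup} are hypothetical, values are plain C
strings rather than Lisp objects, and the sketch assumes that ranges are
closed at both ends, sorted, and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; the real range table stores Lisp objects. */
struct toy_range { long first, last; const char *value; };

/* Look up POS in TABLE, whose N entries are sorted by `first' and do
   not overlap.  Binary search; returns NULL when no range covers POS. */
static const char *
toy_range_lookup (const struct toy_range *table, size_t n, long pos)
{
  size_t lo = 0, hi = n;
  assert (table != NULL || n == 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (pos < table[mid].first)
        hi = mid;
      else if (pos > table[mid].last)
        lo = mid + 1;
      else
        return table[mid].value;
    }
  return NULL;
}
```

Storing one entry per range rather than one per integer keeps the table
small when long runs of integers share a value.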
3099
3100
3101
3102 @example
3103 3201 opaque.c
3104 2206 opaque.h
3105 @end example
3106
3107 This module implements the opaque Lisp object type, an internal-only
3108 Lisp object that encapsulates an arbitrary block of memory so that it
3109 can be managed by the Lisp allocation system. To create an opaque
3110 object, you call @code{make_opaque()}, passing a pointer to a block of
3111 memory. An object is created that is big enough to hold the memory,
3112 which is copied into the object's storage. The object will then stick
3113 around as long as you keep pointers to it, after which it will be
3114 automatically reclaimed.
3115
3116 @cindex mark method
3117 Opaque objects can also have an arbitrary @dfn{mark method} associated
3118 with them, in case the block of memory contains other Lisp objects that
3119 need to be marked for garbage-collection purposes. (If you need other
3120 object methods, such as a finalize method, you should just go ahead and
3121 create a new Lisp object type -- it's not hard.)
3122
3123
3124
3125 @example
3126 8783 abbrev.c
3127 @end example
3128
3129 This module provides a few primitives for doing dynamic abbreviation
3130 expansion. In XEmacs, most of the code for this has been moved into
3131 Lisp. Some C code remains for speed and because the primitive
3132 @code{self-insert-command} (which is executed for all self-inserting
3133 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
3134 is itself in C only for speed.)
3135
3136
3137
3138 @example
3139 21934 doc.c
3140 @end example
3141
3142 This module provides primitives for retrieving the documentation
3143 strings of functions and variables. These documentation strings contain
3144 certain special markers that get dynamically expanded (e.g. a
3145 reverse-lookup is performed on some named functions to retrieve their
3146 current key bindings). Some documentation strings (in particular, for
3147 the built-in primitives and pre-loaded Lisp functions) are stored
3148 externally in a file @file{DOC} in the @file{lib-src/} directory and
3149 need to be fetched from that file. (Part of the build stage involves
3150 building this file, and another part involves constructing an index for
3151 this file and embedding it into the executable, so that the functions in
3152 @file{doc.c} do not have to search the entire @file{DOC} file to find
3153 the appropriate documentation string.)
3154
3155
3156
3157 @example
3158 13197 md5.c
3159 @end example
3160
3161 This module provides a Lisp primitive that implements the MD5 secure
3162 hashing scheme, used to create a large hash value of a string of data such that
3163 the data cannot be derived from the hash value. This is used for
3164 various security applications on the Internet.
3165
3166
3167
3168 @example
3169 7000 mocklisp.c
3170 @end example
3171
3172 This module provides some emulation of MockLisp, a version of Lisp
3173 provided in Gosling Emacs (aka Unipress Emacs), from which some old
3174 versions of GNU Emacs were derived. You have to explicitly enable this
3175 code with a configure option and shouldn't normally, because it changes
3176 the semantics of XEmacs Lisp in ways that are not desirable for normal
3177 Lisp programs.
3178
3179
3180
3181 @node Modules for Interfacing with the Operating System
3182 @section Modules for Interfacing with the Operating System
3183
3184 @example
3185 size name
3186 ------- ---------------------
3187 33533 callproc.c
3188 89697 process.c
3189 4663 process.h
3190 @end example
3191
3192 These modules allow XEmacs to spawn and communicate with subprocesses
3193 and network connections.
3194
3195 @cindex synchronous subprocesses
3196 @cindex subprocesses, synchronous
3197 @file{callproc.c} implements (through the @code{call-process}
3198 primitive) what are called @dfn{synchronous subprocesses}. This means
3199 that XEmacs runs a program, waits till it's done, and retrieves its
3200 output. A typical example might be calling the @file{ls} program to get
3201 a directory listing.
3202
3203 @cindex asynchronous subprocesses
3204 @cindex subprocesses, asynchronous
3205 @file{process.c} and @file{process.h} implement @dfn{asynchronous
3206 subprocesses}. This means that XEmacs starts a program and then
3207 continues normally, not waiting for the process to finish. Data can be
3208 sent to the process or retrieved from it as it's running. This is used
3209 for the @code{shell} command (which provides a front end onto a shell
3210 program such as @file{csh}), the mail and news readers implemented in
3211 XEmacs, etc. The result of calling @code{start-process} to start a
3212 subprocess is a process object, a particular kind of object used to
3213 communicate with the subprocess. You can send data to the process by
3214 passing the process object and the data to @code{send-process}, and you
3215 can specify what happens to data retrieved from the process by setting
3216 properties of the process object. (When the process sends data, XEmacs
3217 receives a process event, which says that there is data ready. When
3218 @code{dispatch-event} is called on this event, it reads the data from
3219 the process and does something with it, as specified by the process
3220 object's properties. Typically, this means inserting the data into a
3221 buffer or calling a function.) Another property of the process object is
3222 called the @dfn{sentinel}, which is a function that is called when the
3223 process terminates.
3224
3225 @cindex network connections
3226 Process objects are also used for network connections (connections to a
3227 process running on another machine). Network connections are started
3228 with @code{open-network-stream} but otherwise work just like
3229 subprocesses.
3230
3231
3232
3233 @example
3234 136029 sysdep.c
3235 5986 sysdep.h
3236 @end example
3237
3238 These modules implement most of the low-level, messy operating-system
3239 interface code. This includes various device control (ioctl) operations
3240 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
3241 is fairly system-dependent; thus the name of this module), and emulation
3242 of standard library functions and system calls on systems that don't
3243 provide them or have broken versions.
3244
3245
3246
3247 @example
3248 3605 sysdir.h
3249 6708 sysfile.h
3250 2027 sysfloat.h
3251 2918 sysproc.h
3252 745 syspwd.h
3253 7643 syssignal.h
3254 6892 systime.h
3255 12477 systty.h
3256 3487 syswait.h
3257 @end example
3258
3259 These header files provide consistent interfaces onto system-dependent
3260 header files and system calls. The idea is that, instead of including a
3261 standard header file like @file{<sys/param.h>} (which may or may not
3262 exist on various systems) or having to worry about whether all systems
3263 provide a particular preprocessor constant, or having to deal with the
3264 four different paradigms for manipulating signals, you just include the
3265 appropriate @file{sys*.h} header file, which includes all the right
3266 system header files, defines any missing preprocessor constants,
3267 provides a uniform interface onto system calls, etc.
3268
3269 @file{sysdir.h} provides a uniform interface onto directory-querying
3270 functions. (In some cases, this is in conjunction with emulation
3271 functions in @file{sysdep.c}.)
3272
3273 @file{sysfile.h} includes all the necessary header files for standard
3274 system calls (e.g. @code{read()}), ensures that all necessary
3275 @code{open()} and @code{stat()} preprocessor constants are defined, and
3276 possibly (usually) substitutes sugared versions of @code{read()},
3277 @code{write()}, etc. that automatically restart interrupted I/O
3278 operations.
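
The kind of wrapper meant here can be sketched as follows. The name
@code{toy_read_restarting} is hypothetical, and the actual substitution
in @file{sysfile.h} may be arranged differently; the sketch only shows
the standard technique of retrying a @code{read()} that failed with
@code{EINTR}.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: like read(), but restart when a signal
   interrupts the call before any data was transferred. */
static ssize_t
toy_read_restarting (int fd, void *buf, size_t size)
{
  ssize_t n;
  assert (buf != NULL);
  do
    n = read (fd, buf, size);
  while (n == -1 && errno == EINTR);
  return n;
}
```

Callers then treat a return value of -1 as a real error, without having
to special-case interruption by a signal at every call site.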
3279
3280 @file{sysfloat.h} includes the necessary header files for floating-point
3281 operations.
3282
3283 @file{sysproc.h} includes the necessary header files for calling
3284 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
3285 the like, and ensures that the @code{FD_*()} macros for descriptor-set
3286 manipulations are available.
3287
3288 @file{syspwd.h} includes the necessary header files for obtaining
3289 information from @file{/etc/passwd} (the functions are emulated under
3290 VMS).
3291
3292 @file{syssignal.h} includes the necessary header files for
3293 signal-handling and provides a uniform interface onto the different
3294 signal-handling and signal-blocking paradigms.
3295
3296 @file{systime.h} includes the necessary header files and provides
3297 uniform interfaces for retrieving the time of day, setting file
3298 access/modification times, getting the amount of time used by the XEmacs
3299 process, etc.
3300
3301 @file{systty.h} buffers against the infinitude of different ways of
3302 controlling TTY's.
3303
3304 @file{syswait.h} provides a uniform way of retrieving the exit status
3305 from a @code{wait()}ed-on process (some systems use a union, others use
3306 an int).
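
On a POSIX system, where the status is an @code{int}, retrieving the
exit status looks roughly like the sketch below. The function
@code{toy_child_exit_status} is hypothetical and does not use the
@file{syswait.h} macros; it only illustrates what a uniform interface
has to hide on one kind of system.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits with CODE and return the exit status the
   parent observes, or -1 on failure or abnormal termination. */
static int
toy_child_exit_status (int code)
{
  int status;
  pid_t pid;

  assert (code >= 0 && code <= 255);
  pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (code);               /* child: terminate immediately */
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  /* The POSIX macros hide how the status is packed into the int. */
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```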
3307
3308
3309
3310 @example
3311 7940 hpplay.c
3312 10920 libsst.c
3313 1480 libsst.h
3314 3260 libst.h
3315 15355 linuxplay.c
3316 15849 nas.c
3317 19133 sgiplay.c
3318 15411 sound.c
3319 7358 sunplay.c
3320 @end example
3321
3322 These files implement the ability to play various sounds on some types
3323 of computers. You have to configure your XEmacs with sound support in
3324 order to get this capability.
3325
3326 @file{sound.c} provides the generic interface. It implements various
3327 Lisp primitives and variables that let you specify which sounds should
3328 be played in certain conditions. (The conditions are identified by
3329 symbols, which are passed to @code{ding} to make a sound. Various
3330 standard functions call this function at certain times; if sound support
3331 does not exist, a simple beep results.)
3332
3333 @cindex native sound
3334 @cindex sound, native
3335 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
3336 @file{linuxplay.c} interface to the machine's speaker for various
3337 different kinds of machines. This is called @dfn{native} sound.
3338
3339 @cindex sound, network
3340 @cindex network sound
3341 @cindex NAS
3342 @file{nas.c} interfaces to a computer somewhere else on the network
3343 using the NAS (Network Audio Server) protocol, playing sounds on that
3344 machine. This allows you to run XEmacs on a remote machine, with its
3345 display set to your local machine, and have the sounds be made on your
3346 local machine, provided that you have a NAS server running on your local
3347 machine.
3348
3349 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
3350 additional functions for playing sound on a Sun SPARC but are not
3351 currently in use.
3352
3353
3354
3355 @example
3356 44368 tooltalk.c
3357 2137 tooltalk.h
3358 @end example
3359
3360 These two modules implement an interface to the ToolTalk protocol, which
3361 is an interprocess communication protocol implemented on some versions
3362 of Unix. ToolTalk is a high-level protocol that allows processes to
3363 register themselves as providers of particular services; other processes
3364 can then request a service without knowing or caring exactly who is
3365 providing the service. It is similar in spirit to the DDE protocol
3366 provided under Microsoft Windows. ToolTalk is a part of the new CDE
3367 (Common Desktop Environment) specification and is used to connect the
3368 parts of the SPARCWorks development environment.
3369
3370
3371
3372 @example
3373 22695 getloadavg.c
3374 @end example
3375
3376 This module provides the ability to retrieve the system's current load
3377 average. (The way to do this is highly system-specific, unfortunately,
3378 and requires a lot of special-case code.)
3379
3380
3381
3382 @example
3383 148520 energize.c
3384 6896 energize.h
3385 @end example
3386
3387 This module provides code to interface to an Energize server (when
3388 XEmacs is used as part of Lucid's Energize development environment) and
3389 provides some other Energize-specific functions. Much of the code in
3390 this module should be made more general-purpose and moved elsewhere, but
3391 is no longer very relevant now that Lucid is defunct. It also hasn't
3392 worked since version 19.12, since nobody has been maintaining it.
3393
3394
3395
3396 @example
3397 2861 sunpro.c
3398 @end example
3399
3400 This module provides a small amount of code used internally at Sun to
3401 keep statistics on the usage of XEmacs.
3402
3403
3404
3405 @example
3406 5548 broken-sun.h
3407 3468 strcmp.c
3408 2179 strcpy.c
3409 1650 sunOS-fix.c
3410 @end example
3411
3412 These files provide replacement functions and prototypes to fix numerous
3413 bugs in early releases of SunOS 4.1.
3414
3415
3416
3417 @example
3418 11669 hftctl.c
3419 @end example
3420
3421 This module provides some terminal-control code necessary on versions of
3422 AIX prior to 4.1.
3423
3424
3425
3426 @example
3427 1776 acldef.h
3428 1602 chpdef.h
3429 9032 uaf.h
3430 105 vlimit.h
3431 7145 vms-pp.c
3432 1158 vms-pwd.h
3433 26532 vmsfns.c
3434 6038 vmsmap.c
3435 695 vmspaths.h
3436 17482 vmsproc.c
3437 469 vmsproc.h
3438 @end example
3439
3440 All of these files are used for VMS support, which has never worked in
3441 XEmacs.
3442
3443
3444
3445 @example
3446 28316 msdos.c
3447 1472 msdos.h
3448 @end example
3449
3450 These modules are used for MS-DOS support, which does not work in
3451 XEmacs.
3452
3453
3454
3455 @node Modules for Interfacing with X Windows
3456 @section Modules for Interfacing with X Windows
3457
3458 @example
3459 size name
3460 ------- ---------------------
3461 3196 Emacs.ad.h
3462 @end example
3463
3464 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
3465 fallback resources (so that XEmacs has pretty defaults).
3466
3467
3468
3469 @example
3470 24242 EmacsFrame.c
3471 6979 EmacsFrame.h
3472 3351 EmacsFrameP.h
3473 @end example
3474
3475 These modules implement an Xt widget class that encapsulates a frame.
3476 This is for ease in integrating with Xt. The EmacsFrame widget covers
3477 the entire X window except for the menubar; the scrollbars are
3478 positioned on top of the EmacsFrame widget.
3479
3480 @strong{Warning:} Abandon hope, all ye who enter here. This code took
3481 an ungodly amount of time to get right, and is likely to fall apart
3482 mercilessly at the slightest change. Such is life under Xt.
3483
3484
3485
3486 @example
3487 8178 EmacsManager.c
3488 1967 EmacsManager.h
3489 1895 EmacsManagerP.h
3490 @end example
3491
3492 These modules implement a simple Xt manager (i.e. composite) widget
3493 class that simply lets its children set whatever geometry they want.
3494 It's amazing that Xt doesn't provide this standardly, but on second
3495 thought, it makes sense, considering how amazingly broken Xt is.
3496
3497
3498 @example
3499 13188 EmacsShell-sub.c
3500 4588 EmacsShell.c
3501 2180 EmacsShell.h
3502 3133 EmacsShellP.h
3503 @end example
3504
3505 These modules implement two Xt widget classes that are subclasses of
3506 the TopLevelShell and TransientShell classes. This is necessary to deal
3507 with more brokenness that Xt has sadistically thrust onto the backs of
3508 developers.
3509
3510
3511
3512 @example
3513 9673 xgccache.c
3514 1111 xgccache.h
3515 @end example
3516
3517 These modules provide functions for maintenance and caching of GC's
3518 (graphics contexts) under the X Window System. This code is junky and
3519 needs to be rewritten.
3520
3521
3522
3523 @example
3524 69181 xselect.c
3525 @end example
3526
3527 @cindex selections
3528 This module provides an interface to the X Window System's concept of
3529 @dfn{selections}, the standard way for X applications to communicate
3530 with each other.
3531
3532
3533
3534 @example
3535 929 xintrinsic.h
3536 1038 xintrinsicp.h
3537 1579 xmmanagerp.h
3538 1585 xmprimitivep.h
3539 @end example
3540
3541 These header files are similar in spirit to the @file{sys*.h} files and buffer
3542 against different implementations of Xt and Motif.
3543
3544 @itemize @bullet
3545 @item
3546 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
3547 @item
3548 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
3549 @item
3550 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
3551 @item
3552 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
3553 @end itemize
3554
3555
3556
3557 @example
3558 16930 xmu.c
3559 936 xmu.h
3560 @end example
3561
3562 These files provide an emulation of the Xmu library for those systems
3563 (e.g. HP-UX) that don't provide it as a standard part of X.
3564
3565
3566
3567 @example
3568 4201 ExternalClient-Xlib.c
3569 18083 ExternalClient.c
3570 2035 ExternalClient.h
3571 2104 ExternalClientP.h
3572 22684 ExternalShell.c
3573 1709 ExternalShell.h
3574 1971 ExternalShellP.h
3575 2478 extw-Xlib.c
3576 1481 extw-Xlib.h
3577 6565 extw-Xt.c
3578 1430 extw-Xt.h
3579 @end example
3580
3581 @cindex external widget
3582 These files provide the @dfn{external widget} interface, which allows an
3583 XEmacs frame to appear as a widget in another application. To do this,
3584 you have to configure with @samp{--external-widget}.
3585
3586 @file{ExternalShell*} provides the server (XEmacs) side of the
3587 connection.
3588
3589 @file{ExternalClient*} provides the client (other application) side of
3590 the connection. These files are not compiled into XEmacs but are
3591 compiled into libraries that are then linked into your application.
3592
3593 @file{extw-*} is common code that is used for both the client and server.
3594
3595 Don't touch this code; something is liable to break if you do.
3596
3597
3598
3599 @example
3600 31014 epoch.c
3601 @end example
3602
3603 This file provides some additional, Epoch-compatible, functionality for
3604 interfacing to the X Window System.
3605
3606
3607
3608 @node Modules for Internationalization
3609 @section Modules for Internationalization
3610
3611 @example
3612 size name
3613 ------- ---------------------
3614 42836 mule-canna.c
3615 16737 mule-ccl.c
3616 41080 mule-charset.c
3617 30176 mule-charset.h
3618 146844 mule-coding.c
3619 16588 mule-coding.h
3620 6996 mule-mcpath.c
3621 2899 mule-mcpath.h
3622 57158 mule-wnnfns.c
3623 3351 mule.c
3624 @end example
3625
3626 These files implement the MULE (Asian-language) support. Note that MULE
3627 actually provides a general interface for all sorts of languages, not
3628 just Asian languages (although they are generally the most complicated
3629 to support). This code is still in beta.
3630
3631 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
3632 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset} Lisp object,
3633 which encapsulates a character set (an ordered one- or two-dimensional
3634 set of characters, such as US ASCII or JISX0208 Japanese Kanji).
3635 @file{mule-coding.*} implements the coding-system Lisp object, which
3636 encapsulates a method of converting between different encodings. An
3637 encoding is a representation of a stream of characters from multiple
3638 character sets using a stream of bytes or words and defines (e.g.) which
3639 escape sequences are used to specify particular character sets, how the
3640 indices for a character are converted into bytes (sometimes this
3641 involves setting the high bit; sometimes complicated rearranging of the
3642 values takes place, as in the Shift-JIS encoding), etc.
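
The two cases mentioned above can be made concrete with JIS X 0208,
whose characters are identified by two 7-bit indices. The sketch below
is not code from @file{mule-coding.c}, and the function names are
hypothetical; it shows the widely documented conversions to EUC-JP
(set the high bit of each index) and to Shift-JIS (rearrange the
indices arithmetically).

```c
#include <assert.h>

/* J1 and J2 are the two 7-bit indices (0x21..0x7E) of a JIS X 0208
   character.  Function names are hypothetical. */

/* EUC-JP: simply set the high bit of each index. */
static void
toy_jis_to_euc (int j1, int j2, unsigned char out[2])
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  out[0] = (unsigned char) (j1 | 0x80);
  out[1] = (unsigned char) (j2 | 0x80);
}

/* Shift-JIS: two rows are packed into each lead byte, and the trail
   byte is shifted so that it skips 0x7F. */
static void
toy_jis_to_sjis (int j1, int j2, unsigned char out[2])
{
  int s1, s2;

  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  s1 = ((j1 - 0x21) >> 1) + 0x81;
  if (s1 > 0x9F)
    s1 += 0x40;                 /* skip 0xA0..0xDF (single-byte kana) */
  if (j1 & 1)
    {
      s2 = j2 + 0x1F;           /* odd row: 0x40..0x7E, 0x80..0x9E */
      if (s2 >= 0x7F)
        s2++;
    }
  else
    s2 = j2 + 0x7E;             /* even row: 0x9F..0xFC */
  out[0] = (unsigned char) s1;
  out[1] = (unsigned char) s2;
}
```

For example, the hiragana letter at indices 0x24, 0x22 becomes the bytes
0xA4 0xA2 in EUC-JP and 0x82 0xA0 in Shift-JIS.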
3643
3644 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
3645 interpreter. CCL is similar in spirit to Lisp byte code and is used to
3646 implement converters for custom encodings.
3647
3648 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
3649 external programs used to implement the Canna and WNN input methods,
3650 respectively. This is currently broken.
3651
3652 @file{mule-mcpath.c} provides some functions to allow for pathnames
3653 containing extended characters. This code is fragmentary and completely
3654 non-working.
3655
3656 @file{mule.c} provides a few miscellaneous things that should probably
3657 be elsewhere.
3658
3659
3660
3661 @example
3662 9400 intl.c
3663 @end example
3664
3665 This provides some miscellaneous internationalization code for
3666 implementing message translation and interfacing to the Ximp input
3667 method. None of this code is currently working.
3668
3669
3670
3671 @example
3672 1764 iso-wide.h
3673 @end example
3674
3675 This contains leftover code from an earlier implementation of
3676 Asian-language support, and is not currently used.
3677
3678
3679
3680
3681 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
3682 @chapter Allocation of Objects in XEmacs Lisp
3683
3684 @menu
3685 * Introduction to Allocation::
3686 * Garbage Collection::
3687 * GCPROing::
3688 * Integers and Characters::
3689 * Allocation from Frob Blocks::
3690 * lrecords::
3691 * Low-level allocation::
3692 * Pure Space::
3693 * Cons::
3694 * Vector::
3695 * Bit Vector::
3696 * Symbol::
3697 * Marker::
3698 * String::
3699 * Bytecode::
3700 @end menu
3701
3702 @node Introduction to Allocation
3703 @section Introduction to Allocation
3704
3705 Emacs Lisp, like all Lisps, has garbage collection. This means that
3706 the programmer never has to explicitly free (destroy) an object; it
3707 happens automatically when the object becomes inaccessible. Most
3708 experts agree that garbage collection is a necessity in a modern,
3709 high-level language. Its omission from C stems from the fact that C was
3710 originally designed to be a nice abstract layer on top of assembly
3711 language, for writing kernels and basic system utilities rather than
3712 large applications.
3713
3714 Lisp objects can be created by any of a number of Lisp primitives.
3715 Most object types have one or a small number of basic primitives
3716 for creating objects. For conses, the basic primitive is @code{cons};
3717 for vectors, the primitives are @code{make-vector} and @code{vector}; for
3718 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
3719 Some Lisp objects, especially those that are primarily used internally,
3720 have no corresponding Lisp primitives. Every Lisp object, though,
3721 has at least one C primitive for creating it.
3722
3723 Recall from section (VII) that a Lisp object, as stored in a 32-bit
3724 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
3725 occupies the remainder of the bits. We can separate the different
3726 Lisp object types into four broad categories:
3727
3728 @itemize @bullet
3729 @item
3730 (a) Those for whom the value directly represents the contents of the
3731 Lisp object. Only two types are in this category: integers and
3732 characters. No special allocation or garbage collection is necessary
3733 for such objects.
3734 @end itemize
3735
3736 In the remaining three categories, the value is a pointer to a
3737 structure.
3738
3739 @itemize @bullet
3740 @item
3741 @cindex frob block
3742 (b) Those for whom the tag directly specifies the type. Three tag bits
3743 give eight tag values; integers, characters, and lrecords each take one,
3744 so at most five types can be specified this way. The most commonly-used
3745 types use this format: conses, strings, vectors, and sometimes symbols.
3746 With the exception of vectors, objects in this category are allocated in
3747 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
3748 individual objects. This saves a lot on malloc overhead, since there
3749 are typically quite a lot of these objects around, and the objects are
3750 small. (A cons, for example, occupies 8 bytes on 32-bit machines -- 4
3751 bytes for each of the two objects it contains.) Vectors are individually
3752 @code{malloc()}ed since they are of variable size. (It would be
3753 possible, and desirable, to allocate vectors of certain small sizes out
3754 of frob blocks, but it isn't currently done.) Strings are handled
3755 specially: Each string is allocated in two parts, a fixed size structure
3756 containing a length and a data pointer, and the actual data of the
3757 string. The former structure is allocated in frob blocks as usual, and
3758 the latter data is stored in @dfn{string chars blocks} and is relocated
3759 during garbage collection to eliminate holes.
3760 @end itemize
3761
3762 In the remaining two categories, the type is stored in the object
3763 itself. The tag for all such objects is the generic @dfn{lrecord}
3764 (@code{Lisp_Record}) tag. The first four bytes (or eight, for 64-bit machines)
3765 of the object's structure are a pointer to a structure that describes
3766 the object's type, which includes method pointers and a pointer to a
3767 string naming the type. Note that it's possible to save some space by
3768 using a one- or two-byte tag, rather than a four- or eight-byte pointer
3769 to store the type, but it's not clear it's worth making the change.
3770
3771 @itemize @bullet
3772 @item
3773 (c) Those lrecords that are allocated in frob blocks (see above). This
3774 includes the objects that are most common and relatively small, and
3775 includes floats, bytecodes, symbols (when not in category (b)), extents,
3776 events, and markers. With the cleanup of frob blocks done in 19.12,
3777 it's not terribly hard to add more objects to this category, but it's a
3778 bit trickier than adding an object type to type (d) (esp. if the object
3779 needs a finalization method), and is not likely to save much space
3780 unless the object is small and there are many of them. (In fact, if
3781 there are very few of them, it might actually waste space.)
3782 @item
3783 (d) Those lrecords that are individually @code{malloc()}ed. These are
3784 called @dfn{lcrecords}. All other types are in this category. Adding a
3785 new type to this category is comparatively easy, and all types added
3786 since 19.8 (when the current allocation scheme was devised, by Richard
3787 Mlynarik), with the exception of the character type, have been in this
3788 category.
3789 @end itemize
3790
3791 Note that bit vectors are a bit of a special case. They are
3792 simple lrecords as in category (c), but are individually @code{malloc()}ed
3793 like vectors. You can basically view them as exactly like vectors
3794 except that their type is stored in lrecord fashion rather than
3795 in directly-tagged fashion.
3796
3797 Note that FSF Emacs redesigned their object system in 19.29 to follow
3798 a similar scheme. However, given RMS's expressed dislike for data
3799 abstraction, the FSF scheme is not nearly as clean or as easy to
3800 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
3801 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
3802 @code{Lisp_Vectorlike} is also used for vectors.)
3803
3804 @node Garbage Collection
3805 @section Garbage Collection
3806 @cindex garbage collection
3807
3808 @cindex mark and sweep
3809 Garbage collection is simple in theory but tricky to implement.
3810 Emacs Lisp uses the oldest garbage collection method, called
3811 @dfn{mark and sweep}. Garbage collection begins by starting with
3812 all accessible locations (i.e. all variables and other slots where
3813 Lisp objects might occur) and recursively traversing all objects
3814 accessible from those slots, marking each one that is found.
3815 We then go through all of memory, freeing each object that is
3816 not marked and unmarking each object that is marked. Note
3817 that ``all of memory'' means all currently allocated objects.
3818 Traversing all these objects means traversing all frob blocks,
3819 all vectors (which are chained in one big list), and all
3820 lcrecords (which are likewise chained).
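
The mark and sweep cycle described above can be sketched in a few lines
of C. This is only a toy model, not the code in @file{alloc.c}: the
names @code{toy_obj}, @code{toy_alloc()}, @code{toy_mark()}, and
@code{toy_sweep()} are hypothetical, each object holds at most one
reference, and all objects sit on a single chain rather than in frob
blocks.

@example
#include <stdlib.h>

/* Toy heap object; every name here is hypothetical. */
struct toy_obj @{
  int marked;                  /* the mark lives inside the object */
  struct toy_obj *ref;         /* at most one outgoing reference */
  struct toy_obj *next_alloc;  /* chain of all allocated objects */
@};

static struct toy_obj *all_objs;   /* head of the allocation chain */

struct toy_obj *toy_alloc(struct toy_obj *ref)
@{
  struct toy_obj *o = malloc(sizeof *o);
  if (o == NULL)
    abort();
  o->marked = 0;
  o->ref = ref;
  o->next_alloc = all_objs;
  all_objs = o;
  return o;
@}

/* Mark phase: follow references from one root, stopping at
   objects that are already marked. */
void toy_mark(struct toy_obj *o)
@{
  while (o != NULL && !o->marked) @{
    o->marked = 1;
    o = o->ref;
  @}
@}

/* Sweep phase: free unmarked objects and unmark the survivors.
   Returns the number of objects freed. */
int toy_sweep(void)
@{
  int freed = 0;
  struct toy_obj **link = &all_objs;
  while (*link != NULL) @{
    struct toy_obj *o = *link;
    if (o->marked) @{
      o->marked = 0;
      link = &o->next_alloc;
    @} else @{
      *link = o->next_alloc;
      free(o);
      freed++;
    @}
  @}
  return freed;
@}
@end example

Calling @code{toy_mark()} on each root and then @code{toy_sweep()} frees
exactly the unreachable objects and leaves every survivor unmarked,
ready for the next cycle.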
3821
3822 Note that, when an object is marked, the mark has to occur
3823 inside of the object's structure, rather than in the 32-bit
3824 @code{Lisp_Object} holding the object's pointer; i.e. you can't just
3825 set the pointer's mark bit. This is because there may be many
3826 pointers to the same object. This means that the method of
3827 marking an object can differ depending on the type. The
3828 different marking methods are approximately as follows:
3829
3830 @enumerate
3831 @item
3832 For conses, the mark bit of the car is set.
3833 @item
3834 For strings, the mark bit of the string's plist is set.
3835 @item
3836 For symbols when not lrecords, the mark bit of the
3837 symbol's plist is set.
3838 @item
3839 For vectors, the length is negated after adding 1.
3840 @item
3841 For lrecords, the pointer to the structure describing
3842 the type is changed (see below).
3843 @item
3844 Integers and characters do not need to be marked, since
3845 no allocation occurs for them.
3846 @end enumerate
3847
3848 The details of this are in the @code{mark_object()} function.
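
As an illustration of the vector case, here is a minimal sketch of
marking by negating the length after adding 1. The structure and helper
names are hypothetical; the real logic lives in @code{mark_object()} and
the sweep code. Adding 1 first presumably exists so that a zero-length
vector can be marked too, since negating 0 alone would change nothing.

@example
/* Hypothetical stand-in for a vector header. */
struct toy_vector @{
  long size;    /* element count; negative while marked */
@};

void toy_mark_vector(struct toy_vector *v)
@{
  v->size = -(v->size + 1);    /* 0 becomes -1, 5 becomes -6 */
@}

int toy_vector_marked_p(const struct toy_vector *v)
@{
  return v->size < 0;
@}

void toy_unmark_vector(struct toy_vector *v)
@{
  v->size = -v->size - 1;      /* -1 becomes 0, -6 becomes 5 */
@}
@end example

While a vector is marked, any code that reads @code{size} directly sees
a negative number.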
3849
3850 Note that any code that operates during garbage collection has
3851 to be especially careful because of the fact that some objects
3852 may be marked and as such may not look like they normally do.
3853 In particular:
3854
3855 @itemize @bullet
3856 @item Some object pointers may have their mark bit set. This will make
3857 @code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with
3858 this.
3859 @item
3860 Even if you clear the mark bit, @code{FOOBARP()} will still fail
3861 for lrecords because the implementation pointer has been
3862 changed (see below). @code{GC_FOOBARP()} will correctly deal with
3863 this.
3864 @item
3865 Vectors have their size field munged, so anything that
3866 looks at this field will fail.
3867 @item
3868 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
3869 pointers with their mark bit set, because the logical shift operations
3870 that remove the tag also remove the mark bit.
3871 @end itemize
3872
3873 Finally, note that garbage collection can be invoked explicitly
3874 by calling @code{garbage-collect} but is also called automatically
3875 by @code{eval}, once a certain amount of memory has been allocated
3876 since the last garbage collection (according to @code{gc-cons-threshold}).
3877
3878 @node GCPROing
3879 @section @code{GCPRO}ing
3880
3881 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
3882 internals. The basic idea is that whenever garbage collection
3883 occurs, all in-use objects must be reachable somehow or
3884 other from one of the roots of accessibility. The roots
3885 of accessibility are:
3886
3887 @enumerate
3888 @item
3889 All objects that have been @code{staticpro()}d. This is used for
3890 any global C variables that hold Lisp objects. A call to
3891 @code{staticpro()} happens implicitly as a result of any symbols
3892 declared with @code{defsymbol()} and any variables declared with
3893 @code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()}
3894 (in the @code{vars_of_foo()} method of a module) for other global
3895 C variables holding Lisp objects. (This typically includes
3896 internal lists and such things.)
3897
3898 Note that @code{obarray} is one of the @code{staticpro()}d things.
3899 Therefore, all functions and variables get marked through this.
3900 @item
3901 Any shadowed bindings that are sitting on the specpdl stack.
3902 @item
3903 Any objects sitting in currently active stack frames,
3904 catches, and condition cases.
3905 @item
3906 A couple of special-case places where active objects are
3907 located.
3908 @item
3909 Anything currently marked with @code{GCPRO}.
3910 @end enumerate
3911
3912 Marking with @code{GCPRO} is necessary because some C functions (quite
3913 a lot, in fact), allocate objects during their operation. Quite
3914 frequently, there will be no other pointer to the object while the
3915 function is running, and if a garbage collection occurs and the object
3916 needs to be referenced again, bad things will happen. The solution is
3917 to mark those objects with @code{GCPRO}. Unfortunately this is easy to
3918 forget, and there is basically no way around this problem. Here are
3919 some rules, though:
3920
3921 @enumerate
3922 @item
3923 For every @code{GCPRO@var{n}}, there have to be declarations of
3924 @code{struct gcpro gcpro1, gcpro2}, etc.
3925
3926 @item
3927 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
3928 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed. Getting
3929 either of these wrong will lead to crashes, often in completely random
3930 places unrelated to where the problem lies.
3931
3932 @item
3933 The way this actually works is that all currently active @code{GCPRO}s
3934 are chained through the @code{struct gcpro} local variables, with the
3935 variable @samp{gcprolist} pointing to the head of the list and the nth
3936 local @code{gcpro} variable pointing to the first @code{gcpro} variable
3937 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an
3938 lvalue, and the @code{struct gcpro} local variable contains a pointer to
3939 this lvalue. This is why things will mess up badly if you don't pair up
3940 the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with
3941 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
3942 @code{Lisp_Object} variables in no-longer-active stack frames.
3943
3944 @item
3945 It is actually possible for a single @code{struct gcpro} to
3946 protect a contiguous array of any number of values, rather than
3947 just a single lvalue. To effect this, call @code{GCPRO@var{n}} as usual on
3948 the first object in the array and then set @code{gcpro@var{n}.nvars} to the number of values in the array.
3949
3950 @item
3951 @strong{Strings are relocated.} What this means in practice is that the
3952 pointer obtained using @code{string_data()} is liable to change at any
3953 time, and you should never keep it around past any function call, or
3954 pass it as an argument to any function that might cause a garbage
3955 collection. This is why a number of functions accept either a
3956 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
3957 and only access the Lisp string's data at the very last minute. In some
3958 cases, you may end up having to @code{alloca()} some space and copy the
3959 string's data into it.
3960
3961 @item
3962 By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
3963 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
3964 etc. This avoids compiler warnings about shadowed locals.
3965
3966 @item
3967 It is @emph{always} better to err on the side of extra @code{GCPRO}s
3968 rather than too few. The extra cycles spent on this are
3969 almost never going to make a whit of difference in the
3970 speed of anything.
3971
3972 @item
3973 The general rule to follow is that caller, not callee, @code{GCPRO}s.
3974 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
3975 that are passed in as parameters, but if you create any Lisp objects
3976 (remember, this happens in all sorts of circumstances, e.g. with
3977 @code{Fcons()}, etc.), you are responsible for @code{GCPRO}ing the
3978 objects unless you are @emph{absolutely sure} that there's no
3979 possibility that a garbage-collection can occur while you need to use
3980 the object. Even then, consider @code{GCPRO}ing.
3981
3982 @item
3983 A garbage collection can occur whenever anything calls @code{Feval}, or
3984 whenever a QUIT can occur where execution can continue past
3985 this. (Remember, this is almost anywhere.)
3986
3987 @item
3988 If you have the @emph{least smidgeon of doubt} about whether
3989 you need to @code{GCPRO}, you should @code{GCPRO}.
3990
3991 @item
3992 Beware of @code{GCPRO}ing something that is uninitialized. If you have
3993 any shade of doubt about this, initialize all your variables to @code{Qnil}.
3994
3995 @item
3996 Be careful of traps, like calling @code{Fcons()} in the argument to
3997 another function. By the ``caller protects'' law, you should be
3998 @code{GCPRO}ing the newly-created cons, but you aren't. A certain
3999 number of functions that are commonly called on freshly created stuff
4000 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
4001 law and go ahead and @code{GCPRO} their arguments so as to simplify
4002 things, but make sure and check if it's OK whenever doing something like
4003 this.
4004
4005 @item
4006 Once again, remember to @code{GCPRO}! Bugs resulting from insufficient
4007 @code{GCPRO}ing are intermittent and extremely difficult to track down,
4008 often showing up in crashes inside of @code{garbage-collect} or in
4009 weirdly corrupted objects or even in incorrect values in a totally
4010 different section of code.
4011 @end enumerate
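
The chaining described in the rules above can be modelled with a small
self-contained sketch. It is a simplified stand-in, not the real
definitions: @code{Lisp_Object} is reduced to a @code{long}, the macro
bodies and the exact order in which records are linked are assumptions,
and @code{count_protected()} is a hypothetical helper that walks the
chain the way a collector would when looking for roots.

@example
#include <stddef.h>

typedef long Lisp_Object;       /* stand-in for the real type */

struct gcpro @{
  struct gcpro *next;           /* next record on the chain */
  Lisp_Object *var;             /* first protected lvalue */
  int nvars;                    /* number of contiguous values */
@};

struct gcpro *gcprolist;        /* head of the chain of active records */

#define GCPRO1(v) \
  do @{ gcpro1.next = gcprolist; gcpro1.var = &(v); \
       gcpro1.nvars = 1; gcprolist = &gcpro1; @} while (0)

#define GCPRO2(v1, v2) \
  do @{ gcpro1.next = gcprolist; gcpro1.var = &(v1); gcpro1.nvars = 1; \
       gcpro2.next = &gcpro1; gcpro2.var = &(v2); gcpro2.nvars = 1; \
       gcprolist = &gcpro2; @} while (0)

/* Pop every record belonging to the current frame. */
#define UNGCPRO ((void) (gcprolist = gcpro1.next))

/* Hypothetical helper: how many values are currently protected? */
int count_protected(void)
@{
  int n = 0;
  struct gcpro *g;
  for (g = gcprolist; g != NULL; g = g->next)
    n += g->nvars;
  return n;
@}

int inner(Lisp_Object arg)
@{
  Lisp_Object tmp = 0;          /* initialized before being protected */
  struct gcpro gcpro1;
  int seen;

  (void) arg;                   /* the caller protects arguments */
  GCPRO1(tmp);
  seen = count_protected();
  UNGCPRO;                      /* always paired with the GCPRO1 */
  return seen;
@}

int outer(void)
@{
  Lisp_Object a = 0, b = 0;
  struct gcpro gcpro1, gcpro2;
  int seen;

  GCPRO2(a, b);
  seen = inner(a);              /* sees a, b, and inner's tmp */
  UNGCPRO;
  return seen;
@}
@end example

If @code{inner()} returned without its @code{UNGCPRO},
@code{gcprolist} would keep pointing at a @code{struct gcpro} in a dead
stack frame, which is the kind of corruption the rules above warn about.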
4012
4013 @cindex garbage collection, conservative
4014 @cindex conservative garbage collection
4015 Given the extremely error-prone nature of the @code{GCPRO} scheme, and
4016 the difficulty of tracking down the resulting bugs, it should be considered a deficiency
4017 in the XEmacs code. A solution to this problem would involve
4018 implementing so-called @dfn{conservative} garbage collection for the C
4019 stack. That involves looking through all of stack memory and treating
4020 anything that looks like a reference to an object as a reference. This
4021 will result in a few objects not getting collected when they should, but
4022 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4023 to happen at any point at all, such as during object allocation.
4024
4025 @node Integers and Characters
4026 @section Integers and Characters
4027
4028 Integer and character Lisp objects are created from integers using the
4029 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
4030 functions @code{make_int()} and @code{make_char()}. (These are actually
4031 macros on most systems.) These functions basically just do some moving
4032 of bits around, since the integral value of the object is stored
4033 directly in the @code{Lisp_Object}.
4034
4035 @code{XSETINT()} and the like will truncate values given to them that
4036 are too big; i.e. you won't get the value you expected but the tag bits
4037 will at least be correct.
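
The following sketch shows the kind of bit moving involved and why an
oversized value is truncated while the tag stays correct. The layout is
an assumption made for the example (a 32-bit word with one mark bit,
three tag bits, and 28 value bits, in that order from the top), and the
names @code{toy_make_int()}, @code{toy_xint()}, and @code{toy_tag()} are
hypothetical stand-ins rather than the real macros.

@example
#include <stdint.h>

/* Assumed layout: bit 31 mark, bits 28-30 tag, bits 0-27 value. */
#define TOY_VALBITS 28
#define TOY_VALMASK ((UINT32_C(1) << TOY_VALBITS) - 1)
#define TOY_TAG_INT UINT32_C(1)   /* arbitrary tag value for the sketch */

typedef uint32_t toy_lisp_object;

toy_lisp_object toy_make_int(int32_t n)
@{
  /* Bits that do not fit in the value field are silently dropped. */
  return (TOY_TAG_INT << TOY_VALBITS) | ((uint32_t) n & TOY_VALMASK);
@}

uint32_t toy_tag(toy_lisp_object obj)
@{
  return (obj >> TOY_VALBITS) & 7u;
@}

int32_t toy_xint(toy_lisp_object obj)
@{
  /* Sign-extend the 28-bit value field. */
  int32_t v = (int32_t) (obj & TOY_VALMASK);
  int32_t sign = INT32_C(1) << (TOY_VALBITS - 1);
  return (v ^ sign) - sign;
@}
@end example

With this layout, a value such as 2^28 + 3 comes back as 3, but
@code{toy_tag()} still reports the integer tag.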
4038
4039 @node Allocation from Frob Blocks
4040 @section Allocation from Frob Blocks
4041
4042 The uninitialized memory required by a @code{Lisp_Object} of a particular type
4043 is allocated using
4044 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the
4045 lowest-level object-creating functions in @file{alloc.c}:
4046 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
4047 @code{Fmake_symbol()}, @code{allocate_extent()},
4048 @code{allocate_event()}, @code{Fmake_marker()}, and
4049 @code{make_uninit_string()}. The idea is that, for each type, there are
4050 a number of frob blocks (each 2K in size); each frob block is divided up
4051 into object-sized chunks. Each frob block will have some of these
4052 chunks that are currently assigned to objects, and perhaps some that are
4053 free. (If a frob block has nothing but free chunks, it is freed at the
4054 end of the garbage collection cycle.) The free chunks are stored in a
4055 free list, which is chained by storing a pointer in the first four bytes
4056 of the chunk. (Except for the free chunks at the end of the last frob
4057 block, which are handled using an index which points past the end of the
4058 last-allocated chunk in the last frob block.)
4059 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
4060 free list; if that fails, it calls
4061 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
4062 last frob block for space, and creates a new frob block if there is
4063 none. (There are actually two versions of these macros, one of which is
4064 more defensive but less efficient and is used for error-checking.)
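
The allocation order just described (free list first, then the tail of
the last frob block, then a fresh block) can be sketched as follows.
This is a simplified model for a single object type. All names are
hypothetical, and the sketch leaves out the freeing of empty blocks and
the error-checking variant of the macros.

@example
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK_BYTES 2048    /* the 2K block size from the text */

struct toy_cons @{ void *car, *cdr; @};

/* A chunk holds either a live object or a free-list link. */
union toy_chunk @{
  struct toy_cons obj;
  union toy_chunk *next_free;
@};

#define TOY_CHUNKS_PER_BLOCK \
  ((TOY_BLOCK_BYTES - sizeof(void *)) / sizeof(union toy_chunk))

struct toy_block @{
  struct toy_block *prev;
  union toy_chunk chunks[TOY_CHUNKS_PER_BLOCK];
@};

static struct toy_block *current_block;   /* the last frob block */
/* Index just past the last-allocated chunk in current_block. */
static size_t current_index = TOY_CHUNKS_PER_BLOCK;
static union toy_chunk *free_list;

struct toy_cons *toy_alloc_cons(void)
@{
  union toy_chunk *c;
  if (free_list != NULL) @{
    /* 1. Reuse a chunk from the free list. */
    c = free_list;
    free_list = c->next_free;
  @} else @{
    if (current_index == TOY_CHUNKS_PER_BLOCK) @{
      /* 2. The last block is full (or absent): make a new one. */
      struct toy_block *b = malloc(sizeof *b);
      if (b == NULL)
        abort();
      b->prev = current_block;
      current_block = b;
      current_index = 0;
    @}
    /* 3. Carve the next chunk off the end of the last block. */
    c = &current_block->chunks[current_index++];
  @}
  return &c->obj;
@}

void toy_free_cons(struct toy_cons *p)
@{
  /* The link overlays the start of the dead object. */
  union toy_chunk *c = (union toy_chunk *) p;
  c->next_free = free_list;
  free_list = c;
@}
@end example

Because the free-list link reuses the object's own storage, free chunks
cost no extra memory.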
4065
4066 @node lrecords
4067 @section lrecords
4068
4069 [see @file{lrecord.h}]
4070
4071 All lrecords have at the beginning of their structure a @code{struct
4072 lrecord_header}. This just contains a pointer to a @code{struct
4073 lrecord_implementation}, which is a structure containing method pointers
4074 and such. There is one of these for each type, and it is a global,
4075 constant, statically-declared structure that is declared in the
4076 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually
4077 declares an array of two @code{struct lrecord_implementation}
4078 structures. The first one contains all the standard method pointers,
4079 and is used in all normal circumstances. During garbage collection,
4080 however, the lrecord is @dfn{marked} by bumping its implementation
4081 pointer by one, so that it points to the second structure in the array.
4082 This structure contains a special indication in it that it's a
4083 @dfn{marked-object} structure: the finalize method is the special
4084 function @code{this_marks_a_marked_record()}, and all other methods are
4085 null pointers. At the end of garbage collection, all lrecords will
4086 either be reclaimed or unmarked by decrementing their implementation
4087 pointers, so this second structure pointer will never remain past
4088 garbage collection.)
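
A minimal sketch of the pointer-bumping trick follows. The structure
layouts and the @code{toy_} names are assumptions made for the example.
Only the idea is taken from the text: an array of two implementation
structures, the second recognizable by its finalize slot.

@example
#include <stddef.h>

struct toy_impl @{
  const char *name;
  void (*finalizer)(void *obj);
@};

struct toy_lrecord_header @{
  const struct toy_impl *implementation;
@};

static void this_marks_a_marked_record(void *obj) @{ (void) obj; @}

/* Element 0 holds the normal methods; element 1 is the
   "marked" twin. */
static const struct toy_impl toy_float_impl[2] = @{
  @{ "float", NULL @},
  @{ NULL, this_marks_a_marked_record @}
@};

void toy_mark(struct toy_lrecord_header *h)
@{
  h->implementation++;          /* now points at element 1 */
@}

int toy_marked_p(const struct toy_lrecord_header *h)
@{
  return h->implementation->finalizer == this_marks_a_marked_record;
@}

void toy_unmark(struct toy_lrecord_header *h)
@{
  h->implementation--;          /* back to element 0 */
@}

/* Code running during GC must undo the bump before using methods. */
const char *toy_type_name(const struct toy_lrecord_header *h)
@{
  const struct toy_impl *impl = h->implementation;
  if (toy_marked_p(h))
    impl--;
  return impl->name;
@}
@end example

Comparing the implementation pointer directly against element 0 fails
on a marked object, which is why type predicates need special handling
during garbage collection.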
4089
4090 Simple lrecords (of type (c) above) just have a @code{struct
4091 lrecord_header} at their beginning. lcrecords, however, actually have a
4092 @code{struct lcrecord_header}. This, in turn, has a @code{struct
4093 lrecord_header} at its beginning, so sanity is preserved; but it also
4094 has a pointer used to chain all lcrecords together, and a special ID
4095 field used to distinguish one lcrecord from another. (This field is used
4096 only for debugging and could be removed, but the space gain is not
4097 significant.)
4098
4099 Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
4100 like for other frob blocks. The only change is that the implementation
4101 pointer must be initialized correctly. (The implementation structure for
4102 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
4103 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
4104
4105 lcrecords are created using @code{alloc_lcrecord()}. This takes a
4106 size to allocate and an implementation pointer. (The size needs to be
4107 passed because some lcrecords, such as window configurations, are of
4108 variable size.) This basically just @code{malloc()}s the storage,
4109 initializes the @code{struct lcrecord_header}, and chains the lcrecord
4110 onto the head of the list of all lcrecords, which is stored in the
4111 variable @code{all_lcrecords}. The calls to @code{alloc_lcrecord()}
4112 generally occur in the lowest-level allocation function for each lrecord
4113 type.
4114
4115 Whenever you create an lrecord, you need to call either
4116 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
4117 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be
4118 specified in a C file, at the top level. What this actually does is
4119 define and initialize the implementation structure for the lrecord. (And
4120 possibly declares a function @code{error_check_foo()} that implements
4121 the @code{XFOO()} macro when error-checking is enabled.) The arguments
4122 to the macros are the actual type name (this is used to construct the C
4123 variable name of the lrecord implementation structure and related
4124 structures using the @samp{##} macro concatenation operator), a string
4125 that names the type on the Lisp level (this may not be the same as the C
4126 type name; typically, the C type name has underscores, while the Lisp
4127 string has dashes), various method pointers, and the name of the C
4128 structure that contains the object. The methods are used to encapsulate
4129 type-specific information about the object, such as how to print it or
4130 mark it for garbage collection, so that it's easy to add new object
4131 types without having to add a specific case for each new type in a bunch
4132 of different places.
4133
4134 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
4135 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
4136 used for fixed-size object types and the latter is for variable-size
4137 object types. Most object types are fixed-size; some complex
4138 types, however (e.g. window configurations), are variable-size.
4139 Variable-size object types have an extra method, which is called
4140 to determine the actual size of a particular object of that type.
4141 (Currently this is only used for keeping allocation statistics.)
4142
4143 For the purpose of keeping allocation statistics, the allocation
4144 engine keeps a list of all the different types that exist. Note that,
4145 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
4146 specified at top-level, there is no way for it to add to the list of all
4147 existing types. What happens instead is that each implementation
4148 structure contains in it a dynamically assigned number that is
4149 particular to that type. (Or rather, it contains a pointer to another
4150 structure that contains this number. This evasiveness is done so that
4151 the implementation structure can be declared const.) In the sweep stage
4152 of garbage collection, each lrecord is examined to see if its
4153 implementation structure has its dynamically-assigned number set. If
4154 not, it must be a new type, and it is added to the list of known types
4155 and a new number assigned. The number is used to index into an array
4156 holding the number of objects of each type and the total memory
4157 allocated for objects of that type. The statistics in this array are
4158 also computed during the sweep stage. These statistics are returned by
4159 the call to @code{garbage-collect} and are printed out at the end of the
4160 loadup phase.
4161
4162 Note that for every type defined with a @code{DEFINE_LRECORD_*()}
4163 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
4164 somewhere in a @file{.h} file, and this @file{.h} file needs to be
4165 included by @file{inline.c}.
4166
4167 Furthermore, there should generally be a set of @code{XFOOBAR()},
4168 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
4169 file. To create one of these, copy an existing model and modify as
4170 necessary.
4171
4172 The various methods in the lrecord implementation structure are:
4173
4174 @enumerate
4175 @item
4176 @cindex mark method
4177 A @dfn{mark} method. This is called during the marking stage and passed
4178 a function pointer (usually the @code{mark_object()} function), which is
4179 used to mark an object. All Lisp objects that are contained within the
4180 object need to be marked by applying this function to them. The mark
4181 method should also return a Lisp object, which should be either nil or
4182 an object to mark. (This can be used in lieu of calling
4183 @code{mark_object()} on the object, to reduce the recursion depth, and
4184 consequently should be the most heavily nested sub-object, such as a
4185 long list.)
4186
4187 @strong{Note}: When the mark method is called, garbage collection
4188 is in progress, and special precautions need to be taken
4189 when accessing objects; see @ref{Garbage Collection}.
4190
4191 If your mark method does not need to do anything, it can be
4192 @code{NULL}.
4193
4194 @item
4195 A @dfn{print} method. This is called to create a printed representation
4196 of the object, whenever @code{princ}, @code{prin1}, or the like is
4197 called. It is passed the object, a stream to which the output is to be
4198 directed, and an @code{escapeflag} which indicates whether the object's
4199 printed representation should be @dfn{escaped} so that it is
4200 readable. (This corresponds to the difference between @code{princ} and
4201 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
4202 quotes around them and confusing characters in the strings such as
4203 quotes, backslashes, and newlines will be backslashed; and that special
4204 care will be taken to make symbols print in a readable fashion
4205 (e.g. symbols that look like numbers will be backslashed). Other
4206 readable objects should perhaps pass @code{escapeflag} on when
4207 sub-objects are printed, so that readability is preserved when necessary
4208 (or if not, always pass in a 1 for @code{escapeflag}). Non-readable
4209 objects should in general ignore @code{escapeflag}, except that some use
4210 it as an indication that more verbose output should be given.
4211
4212 Sub-objects are printed using @code{print_internal()}, which takes
4213 exactly the same arguments as are passed to the print method.
4214
4215 Literal C strings should be printed using @code{write_c_string()},
4216 or @code{write_string_1()} for non-null-terminated strings.
4217
4218 Print methods for objects that do not have a readable representation should check the
4219 @code{print_readably} flag and signal an error if it is set.
4220
4221 If you specify NULL for the print method, the
4222 @code{default_object_printer()} will be used.
4223
4224 @item
4225 A @dfn{finalize} method. This is called at the beginning of the sweep
4226 stage on lcrecords that are about to be freed, and should be used to
4227 perform any extra object cleanup. This typically involves freeing any
4228 extra @code{malloc()}ed memory associated with the object, releasing any
4229 operating-system and window-system resources associated with the object
4230 (e.g. pixmaps, fonts), etc.
4231
4232 The finalize method can be NULL if nothing needs to be done.
4233
4234 WARNING #1: The finalize method is also called at the end of the dump
4235 phase; this time with the @code{for_disksave} parameter set to non-zero. The
4236 object is @emph{not} about to disappear, so you have to make sure to
4237 @emph{not} free any extra @code{malloc()}ed memory if you're going to
4238 need it later. (Also, signal an error if there are any operating-system
4239 and window-system resources here, because they can't be dumped.)
4240
4241 Finalize methods should, as a rule, set to zero any pointers after
4242 they've been freed, and check to make sure pointers are not zero before
4243 freeing. Although I'm pretty sure that finalize methods are not called
4244 twice on the same object (except for the @code{for_disksave} proviso),
4245 we've gotten nastily burned in some cases by not doing this.
4246
4247 WARNING #2: The finalize method is @emph{only} called for
4248 lcrecords, @emph{not} for simple lrecords. If you need a
4249 finalize method for simple lrecords, you have to stick
4250 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
4251
4252 WARNING #3: Things are in an @emph{extremely} bizarre state
4253 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
4254 be incredibly careful when writing one of these functions.
4255 See the comment in @code{gc_sweep()}. If you ever have to add
4256 one of these, consider using an lcrecord or dealing with
4257 the problem in a different fashion.
4258
4259 @item
4260 An @dfn{equal} method. This compares the two objects for similarity,
4261 when @code{equal} is called. It should compare the contents of the
4262 objects in some reasonable fashion. It is passed the two objects and a
4263 @dfn{depth} value, which is used to catch circular objects. To compare
4264 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
4265 by one. If this value gets too high, a @code{circular-object} error
4266 will be signaled.
4267
4268 If this is NULL, objects are @code{equal} only when they are @code{eq},
4269 i.e. identical.
4270
4271 @item
4272 A @dfn{hash} method. This is used to hash objects when they are to be
4273 compared with @code{equal}. The rule here is that if two objects are
4274 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
4275 function should use some subset of the sub-fields of the object that are
4276 compared in the ``equal'' method. If you specify this method as
4277 @code{NULL}, the object's pointer will be used as the hash, which will
4278 @emph{fail} if the object has an @code{equal} method, so don't do this.
4279
4280 To hash a sub-Lisp-object, call @code{internal_hash()}. Bump the
4281 depth by one, just like in the ``equal'' method.
4282
4283 To convert a Lisp object directly into a hash value (using
4284 its pointer), use @code{LISP_HASH()}. This is what happens when
4285 the hash method is NULL.
4286
4287 To hash two or more values together into a single value, use
4288 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
4289
4290 @item
4291 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
4292 These are used for object types that have properties. I don't feel like
4293 documenting them here. If you create one of these objects, you have to
4294 use different macros to define them,
4295 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
4296 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
4297
4298 @item
4299 A @dfn{size_in_bytes} method, when the object is of variable size
4300 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro). This should
4301 simply return the object's size in bytes, exactly as you might expect.
4302 For an example, see the methods for window configurations and opaques.
4303 @end enumerate
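
To make the rule linking the equal and hash methods concrete, here is a
small sketch for an imaginary object type. Everything in it is
hypothetical: @code{toy_rect} is not an XEmacs type, the functions do
not use the real method signatures (which also take a depth argument),
and @code{toy_hash2()} merely stands in for @code{HASH2()}.

@example
#include <stdint.h>

struct toy_rect @{
  int width, height;
  const char *label;    /* not part of the object's identity */
@};

/* Hypothetical combiner in the spirit of HASH2(). */
static unsigned long toy_hash2(unsigned long a, unsigned long b)
@{
  return a * 31u + b;
@}

/* "equal" compares width and height; label is ignored on purpose. */
int toy_rect_equal(const struct toy_rect *a, const struct toy_rect *b)
@{
  return a->width == b->width && a->height == b->height;
@}

/* "hash" uses only fields that toy_rect_equal() compares, so
   equal objects always hash alike. */
unsigned long toy_rect_hash(const struct toy_rect *r)
@{
  return toy_hash2((unsigned long) r->width,
                   (unsigned long) r->height);
@}

/* Roughly what a NULL hash method amounts to: hash the address. */
uintptr_t toy_pointer_hash(const void *p)
@{
  return (uintptr_t) p;
@}
@end example

Two distinct rectangles of the same size are equal and share a
@code{toy_rect_hash()} value, yet their @code{toy_pointer_hash()} values
differ. That mismatch is why a type with an equal method should not
leave its hash method as @code{NULL}.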
4304
4305 @node Low-level allocation
4306 @section Low-level allocation
4307
4308 Memory that you want to allocate directly should be allocated using
4309 @code{xmalloc()} rather than @code{malloc()}. This implements
4310 error-checking on the return value, and once upon a time did some more
4311 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
4312 Free using @code{xfree()}, and realloc using @code{xrealloc()}. Note
4313 that @code{xmalloc()} will do a non-local exit if the memory can't be
4314 allocated. (Many functions, however, do not expect this, and thus XEmacs
4315 will likely crash if this happens. @strong{This is a bug.} If you can,
4316 you should strive to make your function handle this OK. However, it's
4317 difficult in the general circumstance, perhaps requiring extra
4318 unwind-protects and such.)
4319
4320 Note that XEmacs provides two separate replacements for the standard
4321 @code{malloc()} library function. These are called @dfn{old GNU malloc}
4322 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
4323 respectively. New GNU malloc is better in pretty much every way than
4324 old GNU malloc, and should be used if possible. (It used to be that on
4325 some systems, the old one worked but the new one didn't. I think this
4326 was due specifically to a bug in SunOS, which the new one now works
4327 around; so I don't think the old one ever has to be used any more.) The
4328 primary difference between both of these mallocs and the standard system
4329 malloc is that they are much faster, at the expense of increased space.
4330 The basic idea is that memory is allocated in fixed chunks of powers of
4331 two. This allows for basically constant malloc time, since the various
4332 chunks can just be kept on a number of free lists. (The standard system
4333 malloc typically allocates arbitrary-sized chunks and has to spend some
4334 time, sometimes a significant amount of time, walking the heap looking
4335 for a free block to use and cleaning things up.) The new GNU malloc
4336 improves on things by allocating large objects in chunks of 4096 bytes
4337 rather than in ever larger powers of two, which results in ever larger
4338 wastage. There is a slight speed loss here, but it's of doubtful
4339 significance.
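
The difference in wastage between the two policies is simple
arithmetic, sketched below. The function names are hypothetical, and
the 4096-byte cutoff between small and large requests is an assumption
made for the example; neither is taken from @file{malloc.c} or
@file{gmalloc.c}.

@example
#include <stddef.h>

/* Power-of-two policy: round every request up to a power of two.
   Assumes the request is well below the largest size_t value. */
size_t pow2_chunk(size_t request)
@{
  size_t chunk = 1;
  while (chunk < request)
    chunk <<= 1;
  return chunk;
@}

/* Page policy: small requests still use powers of two, but large
   ones are rounded up to a multiple of 4096 bytes instead. */
size_t paged_chunk(size_t request)
@{
  if (request <= 4096)          /* cutoff is an assumption */
    return pow2_chunk(request);
  return (request + 4095) / 4096 * 4096;
@}
@end example

For a 9000-byte request, the power-of-two policy hands out 16384 bytes
and the page policy 12288, so the waste drops from 7384 bytes to 3288.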
4340
4341 NOTE: Apparently there is a third-generation GNU malloc that is
4342 significantly better than the new GNU malloc, and should probably
4343 be included in XEmacs.
4344
4345 There is also the relocating allocator, @file{ralloc.c}. This actually
4346 moves blocks of memory around so that the @code{sbrk()} pointer can be shrunk
4347 and virtual memory released back to the system. On some systems,
4348 this is a big win. On all systems, it causes a noticeable (and
4349 sometimes huge) speed penalty, so I turn it off by default.
4350 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
4351 There are also two versions of @file{ralloc.c}, one of which uses @code{mmap()}
4352 rather than block copies to move data around. This purports to
4353 be faster, although that depends on the amount of data that would
4354 have had to be block copied and the system-call overhead for
4355 @code{mmap()}. I don't know exactly how this works, except that the
4356 relocating-allocation routines are pretty much used only for
4357 the memory allocated for a buffer, which is the biggest consumer
4358 of space, esp. of space that may get freed later.
4359
4360 Note that the GNU mallocs have some ``memory warning'' facilities.
4361 XEmacs taps into them and issues a warning through the standard
4362 warning system, when memory gets to 75%, 85%, and 95% full.
4363 (On some systems, the memory warnings are not functional.)
4364
4365 Allocated memory that is going to be used to make a Lisp object
4366 is created using @code{allocate_lisp_storage()}. This calls @code{xmalloc()}
4367 but also verifies that the pointer to the memory can fit into
4368 a Lisp word (remember that some bits are taken away for a type
4369 tag and a mark bit). If not, an error is issued through @code{memory_full()}.
4370 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
4371 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
4372 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the
4373 appropriate times; this keeps statistics on how much memory is
4374 allocated, so that garbage-collection can be invoked when the
4375 threshold is reached.
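
The two checks described above can be sketched as follows. Everything
here is hypothetical: the @code{demo_} names, the 28-bit pointer field,
and the tiny threshold are assumptions chosen for the example, and the
sketch only counts collections instead of running one.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_VALBITS 28   /* assumed: 32-bit word minus tag and mark bits */

static size_t demo_consing_since_gc;       /* bytes allocated since last GC */
static size_t demo_gc_threshold = 1000;    /* tiny value, for demonstration */
static int demo_gc_runs;

/* Nonzero if ADDR can be stored in the pointer field of a Lisp word.  */
int
demo_fits_in_lisp_word (uintptr_t addr)
{
  return (addr >> DEMO_VALBITS) == 0;
}

/* The bookkeeping an INCREMENT_CONS_COUNTER-style macro boils down to.  */
void
demo_note_allocation (size_t nbytes)
{
  demo_consing_since_gc += nbytes;
}

/* Called at a safe point: collect once the threshold has been
   crossed, then start counting afresh.  */
void
demo_maybe_collect (void)
{
  if (demo_consing_since_gc > demo_gc_threshold)
    {
      demo_gc_runs++;            /* a real collector would run here */
      demo_consing_since_gc = 0;
    }
}
```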
4376
4377 @node Pure Space
4378 @section Pure Space
4379
4380 Not yet documented.
4381
4382 @node Cons
4383 @section Cons
4384
4385 Conses are allocated in standard frob blocks. The only thing to
4386 note is that conses can be explicitly freed using @code{free_cons()}
4387 and associated functions @code{free_list()} and @code{free_alist()}. This
4388 immediately puts the conses onto the cons free list, and decrements
4389 the statistics on memory allocation appropriately. This is used
4390 to good effect by some extremely commonly-used code, to avoid
4391 generating extra objects and thereby triggering GC sooner.
4392 However, you have to be @emph{extremely} careful when doing this.
4393 If you mess this up, you will get BADLY BURNED, and it has happened
4394 before.
4395
4396 @node Vector
4397 @section Vector
4398
4399 As mentioned above, each vector is @code{malloc()}ed individually, and
4400 all are threaded through the variable @code{all_vectors}. Vectors are
4401 marked strangely during garbage collection, by kludging the size field.
4402 Note that the @code{struct Lisp_Vector} is declared with its contents
4403 being an array of one element. It is actually @code{malloc()}ed with
4404 the right size, however, and access to any element through the contents
4405 array works fine.
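
A minimal sketch of this allocation trick, using a made-up
@code{struct demo_vector} rather than the real @code{struct Lisp_Vector}
(the field names, the @code{long} stand-in for @code{Lisp_Object}, and
@code{make_demo_vector()} are assumptions):

```c
#include <stdlib.h>
#include <stddef.h>

typedef long Lisp_Object;   /* stand-in; the real type is more involved */

struct demo_vector
{
  long size;
  struct demo_vector *next;     /* chain through all vectors */
  Lisp_Object contents[1];      /* really `size' elements long */
};

static struct demo_vector *all_demo_vectors;

/* Allocate a vector of SIZE elements, each set to INIT, and thread it
   onto the list of all vectors.  Returns NULL if malloc fails.  */
struct demo_vector *
make_demo_vector (long size, Lisp_Object init)
{
  /* One element is already counted in sizeof, so add size - 1 more.  */
  size_t bytes = sizeof (struct demo_vector)
    + (size > 1 ? (size_t) (size - 1) * sizeof (Lisp_Object) : 0);
  struct demo_vector *v = malloc (bytes);
  long i;

  if (!v)
    return NULL;
  v->size = size;
  v->next = all_demo_vectors;
  all_demo_vectors = v;
  for (i = 0; i < size; i++)
    v->contents[i] = init;
  return v;
}
```

C99 and later offer flexible array members (@code{contents[]}) for the
same purpose; the one-element form shown here is the older idiom that
the text describes.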
4406
4407 @node Bit Vector
4408 @section Bit Vector
4409
4410 Bit vectors work exactly like vectors, except for more complicated
4411 code to access an individual bit, and except for the fact that bit
4412 vectors are lrecords while vectors are not. (The only difference here is
4413 that there's an lrecord implementation pointer at the beginning and the
4414 tag field in bit vector Lisp words is ``lrecord'' rather than
4415 ``vector''.)
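
The ``more complicated code'' amounts to a word index plus a shift and
mask. A sketch, assuming the bits are packed into an array of
@code{unsigned long} (the real layout and the names @code{bit_ref()} and
@code{bit_set()} are assumptions here):

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

/* Return bit I of the packed array BITS as 0 or 1.  */
int
bit_ref (const unsigned long *bits, size_t i)
{
  return (int) ((bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}

/* Set bit I of BITS to 1 if VALUE is nonzero, else clear it.  */
void
bit_set (unsigned long *bits, size_t i, int value)
{
  unsigned long mask = 1UL << (i % BITS_PER_WORD);

  if (value)
    bits[i / BITS_PER_WORD] |= mask;
  else
    bits[i / BITS_PER_WORD] &= ~mask;
}
```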
4416
4417 @node Symbol
4418 @section Symbol
4419
4420 Symbols are also allocated in frob blocks. Note that the code
4421 exists for symbols to be either lrecords (category (c) above)
4422 or simple types (category (b) above); they are lrecords by
4423 default (I think), although there is no good reason for this.
4424
4425 Note that symbols in the awful horrible obarray structure are
4426 chained through their @code{next} field.
4427
4428 Remember that @code{intern} looks up a symbol in an obarray, creating
4429 one if necessary.
4430
4431 @node Marker
4432 @section Marker
4433
4434 Markers are allocated in frob blocks, as usual. They are kept
4435 in a buffer unordered, but in a doubly-linked list so that they
4436 can easily be removed. (Formerly this was a singly-linked list,
4437 but in some cases garbage collection took an extraordinarily
4438 long time due to the O(N^2) time required to remove lots of
4439 markers from a buffer.) Markers are removed from a buffer in
4440 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
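
The benefit of the doubly-linked list is that a marker can be unlinked
without searching for its predecessor. A sketch with hypothetical
structures (@code{struct demo_marker} and @code{struct demo_buffer} are
not the real XEmacs types):

```c
#include <stddef.h>

struct demo_marker
{
  struct demo_marker *prev, *next;
  long pos;
};

struct demo_buffer
{
  struct demo_marker *markers;   /* head of the unordered list */
};

/* Push M onto the front of B's marker list.  */
void
attach_marker (struct demo_buffer *b, struct demo_marker *m)
{
  m->prev = NULL;
  m->next = b->markers;
  if (b->markers)
    b->markers->prev = m;
  b->markers = m;
}

/* Unlink M from B in constant time.  With a singly-linked list this
   would need a walk from the head to find M's predecessor, which is
   what made removing N markers cost O(N^2).  */
void
detach_marker (struct demo_buffer *b, struct demo_marker *m)
{
  if (m->prev)
    m->prev->next = m->next;
  else
    b->markers = m->next;
  if (m->next)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```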
4441
4442 @node String
4443 @section String
4444
4445 As mentioned above, strings are a special case. A string is logically
4446 two parts, a fixed-size object (containing the length, property list,
4447 and a pointer to the actual data), and the actual data in the string.
4448 The fixed-size object is a @code{struct Lisp_String} and is allocated in
4449 frob blocks, as usual. The actual data is stored in special
4450 @dfn{string-chars blocks}, which are 8K blocks of memory.
4451 Currently-allocated strings are simply laid end to end in these
4452 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
4453 stored before each string in the string-chars block. When a new string
4454 needs to be allocated, the remaining space at the end of the last
4455 string-chars block is used if there's enough, and a new string-chars
4456 block is created otherwise.
4457
4458 There are never any holes in the string-chars blocks due to the string
4459 compaction and relocation that happens at the end of garbage collection.
4460 During the sweep stage of garbage collection, when objects are
4461 reclaimed, the garbage collector goes through all string-chars blocks,
4462 looking for unused strings. Each chunk of string data is preceded by a
4463 pointer to the corresponding @code{struct Lisp_String}, which indicates
4464 both whether the string is used and how big the string is, i.e. how to
4465 get to the next chunk of string data. Holes are compressed by
4466 block-copying the next string into the empty space and relocating the
4467 pointer stored in the corresponding @code{struct Lisp_String}.
4468 @strong{This means you have to be careful with strings in your code.}
4469 See the section above on @code{GCPRO}ing.
4470
4471 Note that there is one situation not handled: a string that is too big
4472 to fit into a string-chars block. Such strings, called @dfn{big
4473 strings}, are all @code{malloc()}ed as their own block. (#### Although it
4474 would make more sense for the threshold for big strings to be somewhat
4475 lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that
4476 this was indeed the case formerly -- indeed, the threshold was set at
4477 1/8 -- but Mly forgot about this when rewriting things for 19.8.)
4478
4479 Note also that the string data in string-chars blocks is padded as
4480 necessary so that proper alignment constraints on the @code{struct
4481 Lisp_String} back pointers are maintained.
4482
4483 Finally, strings can be resized. This happens in Mule when a
4484 character is substituted with a different-length character, or during
4485 modeline frobbing. (You could also export this to Lisp, but it's not
4486 done so currently.) Resizing a string is a potentially tricky process.
4487 If the change is small enough that the padding can absorb it, nothing
4488 other than a simple memory move needs to be done. Keep in mind,
4489 however, that the string can't shrink too much because the offset to the
4490 next string in the string-chars block is computed by looking at the
4491 length and rounding up to the next multiple of four or eight. If the
4492 string would shrink or expand beyond the correct padding, new string
4493 data needs to be allocated at the end of the last string-chars block and
4494 the data moved appropriately. This leaves some dead string data, which
4495 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
4496 Lisp_String} pointer before the data (there's no real @code{struct
4497 Lisp_String} to point to and relocate), and storing the size of the dead
4498 string data (which would normally be obtained from the now-non-existent
4499 @code{struct Lisp_String}) at the beginning of the dead string data gap.
4500 The string compactor recognizes this special 0xFFFFFFFF marker and
4501 handles it correctly.
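
The layout and the compaction pass can be sketched for a single block.
This is a simplified model, not the code in @file{alloc.c}: all names
are made up, an all-ones pointer plays the role of the 0xFFFFFFFF
marker, and a chunk becomes dead through an explicit
@code{kill_chars()} call rather than through a resize or an unmarked
@code{struct Lisp_String}.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8192
#define ALIGNMENT  sizeof (void *)
#define DEAD_MARK  ((struct demo_string *) UINTPTR_MAX)

struct demo_string { size_t length; unsigned char *data; };
struct demo_block  { size_t used; unsigned char bytes[BLOCK_SIZE]; };

/* Bytes taken by one chunk: back pointer plus padded data (with NUL).  */
static size_t
chunk_size (size_t len)
{
  size_t data = (len + 1 + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
  return sizeof (struct demo_string *) + data;
}

/* Lay TEXT down at the end of B on behalf of S; 0 if it does not fit.  */
int
alloc_chars (struct demo_block *b, struct demo_string *s, const char *text)
{
  size_t len = strlen (text), size = chunk_size (len);
  unsigned char *chunk = b->bytes + b->used;

  if (size > BLOCK_SIZE - b->used)
    return 0;
  memcpy (chunk, &s, sizeof s);          /* back pointer to the owner */
  s->length = len;
  s->data = chunk + sizeof s;
  memcpy (s->data, text, len + 1);
  b->used += size;
  return 1;
}

/* Turn S's chunk into a dead gap: marker first, then the gap size.  */
void
kill_chars (struct demo_string *s)
{
  struct demo_string *dead = DEAD_MARK;
  size_t size = chunk_size (s->length);
  unsigned char *chunk = s->data - sizeof s;

  memcpy (chunk, &dead, sizeof dead);
  memcpy (chunk + sizeof dead, &size, sizeof size);
  s->data = NULL;
}

/* Slide live chunks down over dead ones and fix up the owners.  */
void
compact (struct demo_block *b)
{
  size_t from = 0, to = 0;

  while (from < b->used)
    {
      struct demo_string *owner;
      size_t size;

      memcpy (&owner, b->bytes + from, sizeof owner);
      if (owner == DEAD_MARK)
        memcpy (&size, b->bytes + from + sizeof owner, sizeof size);
      else
        {
          size = chunk_size (owner->length);
          memmove (b->bytes + to, b->bytes + from, size);
          owner->data = b->bytes + to + sizeof owner;
          to += size;
        }
      from += size;
    }
  b->used = to;
}
```

Because @code{compact()} rewrites the @code{data} pointer of every
string it moves, any raw pointer into string data that was obtained
before compaction is stale afterwards, which is the reason for the
warning above.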
4502
4503 @node Bytecode
4504 @section Bytecode
4505
4506 Not yet documented.
4507
4508 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
4509 @chapter Events and the Event Loop
4510
4511 @menu
4512 * Introduction to Events::
4513 * Main Loop::
4514 * Specifics of the Event Gathering Mechanism::
4515 * Specifics About the Emacs Event::
4516 * The Event Stream Callback Routines::
4517 * Other Event Loop Functions::
4518 * Converting Events::
4519 * Dispatching Events; The Command Builder::
4520 @end menu
4521
4522 @node Introduction to Events
4523 @section Introduction to Events
4524
4525 An event is an object that encapsulates information about an
4526 interesting occurrence in the operating system. Events are
4527 generated either by user action, direct (e.g. typing on the
4528 keyboard or moving the mouse) or indirect (moving another
4529 window, thereby generating an expose event on an Emacs frame),
4530 or as a result of some other typically asynchronous action happening,
4531 such as output from a subprocess being ready or a timer expiring.
4532 Events come into the system in an asynchronous fashion (typically
4533 through a callback being called) and are converted into a
4534 synchronous event queue (first-in, first-out) in a process that
4535 we will call @dfn{collection}.
4536
4537 Note that each application has its own event queue. (It is
4538 immaterial whether the collection process directly puts the
4539 events in the proper application's queue, or puts them into
4540 a single system queue, which is later split up.)
4541
4542 The most basic level of event collection is done by the
4543 operating system or window system. Typically, XEmacs does
4544 its own event collection as well. Often there are multiple
4545 layers of collection in XEmacs, with events from various
4546 sources being collected into a queue, which is then combined
4547 with other sources to go into another queue (i.e. a second
4548 level of collection), with perhaps another level on top of
4549 this, etc.
4550
4551 XEmacs has its own types of events (called @dfn{Emacs events}),
4552 which provide an abstract layer on top of the system-dependent
4553 nature of the most basic events that are received. Part of the
4554 complex nature of the XEmacs event collection process involves
4555 converting from the operating-system events into the proper
4556 Emacs events -- there may not be a one-to-one correspondence.
4557
4558 Emacs events are documented in @file{events.h}; I'll discuss them
4559 later.
4560
4561 @node Main Loop
4562 @section Main Loop
4563
4564 The @dfn{command loop} is the top-level loop that the editor is always
4565 running. It loops endlessly, calling @code{next-event} to retrieve an
4566 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
4567 the appropriate thing with non-user events (process, timeout,
4568 magic, eval, mouse motion); this involves calling a Lisp handler
4569 function, redrawing a newly-exposed part of a frame, reading
4570 subprocess output, etc. For user events, @code{dispatch-event}
4571 looks up the event in relevant keymaps or menubars; when a
4572 full key sequence or menubar selection is reached, the appropriate
4573 function is executed. @code{dispatch-event} may have to keep state
4574 across calls; this is done in the ``command-builder'' structure
4575 associated with each console (remember, there's usually only
4576 one console), and the engine that looks up keystrokes and
4577 constructs full key sequences is called the @dfn{command builder}.
4578 This is documented elsewhere.
4579
4580 The guts of the command loop are in @code{command_loop_1()}. This
4581 function doesn't catch errors, though -- that's the job of
4582 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
4583 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never
4584 returns, but may get thrown out of.
4585
4586 When an error occurs, @code{cmd_error()} is called, which usually
4587 invokes the Lisp error handler in @code{command-error}; however, a
4588 default error handler is provided if @code{command-error} is @code{nil}
4589 (e.g. during startup). The purpose of the error handler is simply to
4590 display the error message and do associated cleanup; it does not need to
4591 throw anywhere. When the error handler finishes, the condition-case in
4592 @code{command_loop_2()} will finish and @code{command_loop_2()} will
4593 reinvoke @code{command_loop_1()}.
4594
4595 @code{command_loop_2()} is invoked from three places: from
4596 @code{initial_command_loop()} (called from @code{main()} at the end of
4597 internal initialization), from the Lisp function @code{recursive-edit},
4598 and from @code{call_command_loop()}.
4599
4600 @code{call_command_loop()} is called when a macro is started and when
4601 the minibuffer is entered; normal termination of the macro or minibuffer
4602 causes a throw out of the recursive command loop. (The throw is to
4603 @code{execute-kbd-macro} for macros and to @code{exit} for minibuffers.
4604 Note also that the low-level minibuffer-entering function,
4605 @code{read-minibuffer-internal}, provides its own error handling and
4606 does not need @code{command_loop_2()}'s error encapsulation; so it tells
4607 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
4608
4609 Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a
4610 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
4611 throws to this catch, exits out of either one.
4612
4613 @code{initial_command_loop()}, called from @code{main()}, sets up a
4614 catch for @code{top-level} when invoking @code{command_loop_2()},
4615 allowing functions to throw all the way to the top level if they really
4616 need to. Before invoking @code{command_loop_2()},
4617 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
4618 all of the startup stuff (creating the initial frame, handling the
4619 command-line options, loading the user's @file{.emacs} file, etc.). The
4620 function that actually does this is in Lisp and is pointed to by the
4621 variable @code{top-level}; normally this function is
4622 @code{normal-top-level}. @code{top_level_1()} is just an error-handling
4623 wrapper similar to @code{command_loop_2()}. Note also that
4624 @code{initial_command_loop()} sets up a catch for @code{top-level} when
4625 invoking @code{top_level_1()}, just like when it invokes
4626 @code{command_loop_2()}.
4627
4628 @node Specifics of the Event Gathering Mechanism
4629 @section Specifics of the Event Gathering Mechanism
4630
4631 Here is an approximate diagram of the collection processes
4632 at work in XEmacs, under TTY's (TTY's are simpler than X
4633 so we'll look at this first):
4634
4635 @noindent
4636 @example
4637   asynch.    asynch.    asynch.   asynch.               [Collectors in
4638 kbd events kbd events   process   process                  the OS]
4639      |          |       output    output
4640      |          |          |         |
4641      |          |          |         |      SIGINT,     [signal handlers
4642      |          |          |         |      SIGQUIT,      in XEmacs]
4643      V          V          V         V      SIGWINCH,
4644     file       file       file      file    SIGALRM
4645    desc.      desc.      desc.     desc.       |
4646    (TTY)      (TTY)     (pipe)    (pipe)       |
4647      |          |          |         |       fake    timeouts
4648      |          |          |         |       file        |
4649      |          |          |         |       desc.       |
4650      |          |          |         |      (pipe)       |
4651      |          |          |         |         |         |
4652      |          |          |         |         |         |
4653      |          |          |         |         |         |
4654      V          V          V         V         V         V
4655      ------>-----------<----------------<----------------
4656                 |
4657                 |
4658                 | [collected using select() in emacs_tty_next_event()
4659                 |  and converted to the appropriate Emacs event]
4660                 |
4661                 |
4662                 V          (above this line is TTY-specific)
4663               Emacs ------------------------------------------------
4664               event (below this line is the generic event mechanism)
4665                 |
4666                 |
4667 was there     if not, call
4668 a SIGINT?  emacs_tty_next_event()
4669     |              |
4670     |              |
4671     |              |
4672     V              V
4673     --->-------<----
4674            |
4675            |     [collected in event_stream_next_event();
4676            |      SIGINT is converted using maybe_read_quit_event()]
4677            V
4678          Emacs
4679          event
4680            |
4681            \---->------>----- maybe_kbd_translate() ---->---\
4682                                                             |
4683                                                             |
4684                                                             |
4685      command event queue                                    |
4686                                                    if not from command
4687   (contains events that were                        event queue, call
4688   read earlier but not processed,               event_stream_next_event()
4689   typically when waiting in a                               |
4690   sit-for, sleep-for, etc. for                              |
4691  a particular event to be received)                         |
4692                |                                            |
4693                |                                            |
4694                V                                            V
4695                ---->------------------------------------<----
4696                                                |
4697                                                |    [collected in
4698                                                |     next_event_internal()]
4699                                                |
4700 unread-     unread-       event from           |
4701 command-    command-       keyboard        else, call
4702 events      event            macro     next_event_internal()
4703    |           |               |               |
4704    |           |               |               |
4705    |           |               |               |
4706    V           V               V               V
4707    --------->----------------------<------------
4708                      |
4709                      |      [collected in `next-event', which may loop
4710                      |       more than once if the event it gets is on
4711                      |       a dead frame, device, etc.]
4712                      |
4713                      |
4714                      V
4715       feed into top-level event loop,
4716     which repeatedly calls `next-event'
4717        and then dispatches the event
4718           using `dispatch-event'
4719 @end example
4720
4721 Notice the separation between TTY-specific and generic event mechanism.
4722 When using the Xt-based event loop, the TTY-specific stuff is replaced
4723 but the rest stays the same.
4724
4725 It's also important to realize that only one kind of
4726 system-specific event loop can be operating at a time, and it must be able
4727 to receive all kinds of events simultaneously. For the two existing
4728 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
4729 respectively), the TTY event loop @emph{only} handles TTY consoles,
4730 while the Xt event loop handles @emph{both} TTY and X consoles. This
4731 situation is different from all of the output handlers, where you simply
4732 have one per console type.
4733
4734 Here's the Xt Event Loop Diagram (notice that below a certain point,
4735 it's the same as the above diagram):
4736
4737 @example
4738 asynch. asynch.  asynch.  asynch.    [Collectors in
4739   kbd     kbd    process  process       the OS]
4740 events  events   output   output
4741    |       |        |        |
4742    |       |        |        |      asynch.   asynch.   [Collectors in the
4743    |       |        |        |         X         X       OS and X Window System]
4744    |       |        |        |      events    events
4745    |       |        |        |         |         |
4746    |       |        |        |         |         |
4747    |       |        |        |         |         |     SIGINT,   [signal handlers
4748    |       |        |        |         |         |     SIGQUIT,    in XEmacs]
4749    |       |        |        |         |         |     SIGWINCH,
4750    |       |        |        |         |         |     SIGALRM
4751    |       |        |        |         |         |        |
4752    |       |        |        |         |         |        |
4753    |       |        |        |         |         |        |    timeouts
4754    |       |        |        |         |         |        |       |
4755    |       |        |        |         |         |        |       |
4756    |       |        |        |         |         |        V       |
4757    V       V        V        V         V         V      fake      |
4758   file    file     file     file      file      file    file      |
4759  desc.   desc.    desc.    desc.     desc.     desc.    desc.     |
4760  (TTY)   (TTY)   (pipe)   (pipe)   (socket)  (socket)  (pipe)     |
4761    |       |        |        |         |         |        |       |
4762    |       |        |        |         |         |        |       |
4763    |       |        |        |         |         |        |       |
4764    V       V        V        V         V         V        V       V
4765    --->----------------------------------------<---------<------
4766                |              |             |
4767                |              |             | [collected using select() in
4768                |              |             |  _XtWaitForSomething(), called
4769                |              |             |  from XtAppProcessEvent(), called
4770                |              |             |  in emacs_Xt_next_event();
4771                |              |             |  dispatched to various callbacks]
4772                |              |             |
4773                |              |             |
4774            emacs_Xt_     p_s_callback(),    | [popup_selection_callback]
4775         event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
4776                |         x_u_h_s_callback(),|  callback]
4777                |         search_callback()  | [x_update_horizontal_scrollbar_
4778                |              |             |  callback]
4779                |              |             |
4780                |              |             |
4781           enqueue_Xt_  signal_special_      |
4782       dispatch_event() Xt_user_event()      |
4783         [maybe multiple       |             |
4784          times, maybe 0       |             |
4785          times]               |             |
4786                |         enqueue_Xt_        |
4787                |      dispatch_event()      |
4788                |              |             |
4789                |              |             |
4790                V              V             |
4791                -->----------<--             |
4792                       |                     |
4793                       |                     |
4794                   dispatch         Xt_what_callback()
4795                     event              sets flags
4796                     queue                   |
4797                       |                     |
4798                       |                     |
4799                       |                     |
4800                       |                     |
4801                     ---->-----------<--------
4802                       |
4803                       |
4804                       | [collected and converted as appropriate in
4805                       |  emacs_Xt_next_event()]
4806                       |
4807                       |
4808                       V          (above this line is Xt-specific)
4809                     Emacs ------------------------------------------------
4810                     event (below this line is the generic event mechanism)
4811                       |
4812                       |
4813       was there     if not, call
4814       a SIGINT?  emacs_Xt_next_event()
4815           |              |
4816           |              |
4817           |              |
4818           V              V
4819           --->-------<----
4820                  |
4821                  |     [collected in event_stream_next_event();
4822                  |      SIGINT is converted using maybe_read_quit_event()]
4823                  V
4824                Emacs
4825                event
4826                  |
4827                  \---->------>----- maybe_kbd_translate() -->-----\
4828                                                                   |
4829                                                                   |
4830                                                                   |
4831            command event queue                                    |
4832                                                          if not from command
4833         (contains events that were                        event queue, call
4834         read earlier but not processed,               event_stream_next_event()
4835         typically when waiting in a                               |
4836         sit-for, sleep-for, etc. for                              |
4837        a particular event to be received)                         |
4838                      |                                            |
4839                      |                                            |
4840                      V                                            V
4841                      ---->----------------------------------<------
4842                                                      |
4843                                                      |    [collected in
4844                                                      |     next_event_internal()]
4845                                                      |
4846       unread-     unread-       event from           |
4847       command-    command-       keyboard        else, call
4848       events      event            macro     next_event_internal()
4849          |           |               |               |
4850          |           |               |               |
4851          |           |               |               |
4852          V           V               V               V
4853          --------->----------------------<------------
4854                            |
4855                            |      [collected in `next-event', which may loop
4856                            |       more than once if the event it gets is on
4857                            |       a dead frame, device, etc.]
4858                            |
4859                            |
4860                            V
4861             feed into top-level event loop,
4862           which repeatedly calls `next-event'
4863              and then dispatches the event
4864                 using `dispatch-event'
4865 @end example
4866
4867 @node Specifics About the Emacs Event
4868 @section Specifics About the Emacs Event
4869
4870 @node The Event Stream Callback Routines
4871 @section The Event Stream Callback Routines
4872
4873 @node Other Event Loop Functions
4874 @section Other Event Loop Functions
4875
4876 @code{detect_input_pending()} and @code{input-pending-p} look for
4877 input by calling @code{event_stream->event_pending_p} and looking in
4878 @code{[V]unread-command-event} and the @code{command_event_queue} (they
4879 do not check for an executing keyboard macro, though).
4880
4881 @code{discard-input} cancels any command events pending (and any
4882 keyboard macros currently executing), and puts the others onto the
4883 @code{command_event_queue}. There is a comment about a ``race
4884 condition'', which is not a good sign.
4885
4886 @code{next-command-event} and @code{read-char} are higher-level
4887 interfaces to @code{next-event}. @code{next-command-event} gets the
4888 next @dfn{command} event (i.e. keypress, mouse event, or menu
4889 selection), calling @code{dispatch-event} on any others. @code{read-char}
4890 calls @code{next-command-event} and uses @code{event_to_character()} to
4891 return the ASCII equivalent.
4892
4893 @node Converting Events
4894 @section Converting Events
4895
4896 @code{character_to_event()}, @code{event_to_character()},
4897 @code{event-to-character}, and @code{character-to-event} convert between
4898 ASCII characters and keypresses corresponding to the characters. If the
4899 event was not a keypress, @code{event_to_character()} returns -1 and
4900 @code{event-to-character} returns @code{nil}. These functions convert
4901 between ASCII representation and the split-up event representation
4902 (keysym plus mod keys).
4903
4904 @node Dispatching Events; The Command Builder
4905 @section Dispatching Events; The Command Builder
4906
4907 Not yet documented.
4908
4909 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
4910 @chapter Evaluation; Stack Frames; Bindings
4911
4912 @menu
4913 * Evaluation::
4914 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
4915 * Simple Special Forms::
4916 * Catch and Throw::
4917 @end menu
4918
4919 @node Evaluation
4920 @section Evaluation
4921
4922 @code{Feval()} evaluates the form (a Lisp object) that is passed to
4923 it. Note that evaluation is only non-trivial for two types of objects:
4924 symbols and conses. Under normal circumstances (i.e. not mocklisp) a
4925 symbol is evaluated simply by calling @code{symbol-value} on it and returning
4926 the value.
4927
4928 Evaluating a cons means calling a function. First, @code{eval} checks
4929 to see if garbage-collection is necessary, and calls
4930 @code{Fgarbage_collect()} if so. It then increases the evaluation depth
4931 by 1 (@code{lisp_eval_depth}, which is always less than @code{max_lisp_eval_depth}) and adds an
4932 element to the linked list of @code{struct backtrace}'s
4933 (@code{backtrace_list}). Each such structure contains a pointer to the
4934 function being called plus a list of the function's arguments.
4935 Originally these values are stored unevalled, and as they are evaluated,
4936 the backtrace structure is updated. Garbage collection pays attention
4937 to the objects pointed to in the backtrace structures (garbage
4938 collection might happen while a function is being called or while an
4939 argument is being evaluated, and there could easily be no other
4940 references to the arguments in the argument list; once an argument is
4941 evaluated, however, the unevalled version is not needed by eval, and so
4942 the backtrace structure is changed).
4943
4944 At this point, the function to be called is determined by looking at
4945 the car of the cons (if this is a symbol, its function definition is
4946 retrieved and the process repeated). The function should then consist
4947 of either a Lisp_Subr (built-in function), a Lisp_Compiled object, or a
4948 cons whose car is the symbol @code{autoload}, @code{macro},
4949 @code{lambda}, or @code{mocklisp}.
4950
4951 If the function is a Lisp_Subr, the lisp object points to a struct
4952 Lisp_Subr (created by @code{DEFUN()}), which contains a pointer to the C
4953 function, a minimum and maximum number of arguments (possibly the
4954 special constants @code{MANY} or @code{UNEVALLED}), a pointer to the
4955 symbol referring to that subr, and a couple of other things. If the
4956 subr wants its arguments @code{UNEVALLED}, they are passed raw as a
4957 list. Otherwise, an array of evaluated arguments is created and put
4958 into the backtrace structure, and either passed whole (@code{MANY}) or
4959 each argument is passed as a C argument.
4960
4961 If the function is a Lisp_Compiled object or a lambda,
4962 @code{apply_lambda()} is called. If the function is a macro,
4963 [..... fill in] is done. If the function is an autoload,
4964 @code{do_autoload()} is called to load the definition and then eval
4965 starts over [explain this more]. If the function is a mocklisp,
4966 @code{ml_apply()} is called.
4967
4968 When @code{Feval()} exits, the evaluation depth is reduced by one, the
4969 debugger is called if appropriate, and the current backtrace structure
4970 is removed from the list.
4971
4972 @code{apply_lambda()} is passed a function, a list of arguments, and a
4973 flag indicating whether to evaluate the arguments. It creates an array
4974 of (possibly) evaluated arguments and fixes up the backtrace structure,
4975 just like eval does. Then it calls @code{funcall_lambda()}.
4976
4977 @code{funcall_lambda()} goes through the formal arguments to the
4978 function and binds them to the actual arguments, checking for
4979 @code{&rest} and @code{&optional} symbols in the formal arguments and
4980 making sure the number of actual arguments is correct. Then either
4981 @code{progn} or @code{byte-code} is called to actually execute the body and return a
4982 value.
4983
4984 @code{Ffuncall()} implements Lisp @code{funcall}. @code{(funcall fun
4985 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
4986 x2) (quote x3) ...))}. @code{Ffuncall()} contains its own code to do
4987 the evaluation, however, and is almost identical to eval.
4988
4989 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
4990 funcall except that if the last argument is a list, the result is the
4991 same as if each of the arguments in the list had been passed separately.
4992 @code{Fapply()} does some business to expand the last argument if it's a
4993 list, then calls @code{Ffuncall()} to do the work.
4994
4995 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
4996 @code{call3()} call a function, passing it the argument(s) given (the
4997 arguments are given as separate C arguments rather than being passed as
4998 an array). @code{apply1()} uses @code{apply} while the others use
4999 @code{funcall}.
5000
5001 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
5002 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
5003
5004 @example
5005 struct specbinding
5006 @{
5007 Lisp_Object symbol, old_value;
5008 Lisp_Object (*func) ();
5009 Lisp_Object unused; /* Dividing by 16 is faster than by 12 */
5010 @};
5011 @end example
5012
5013 @code{struct specbinding} is used for local-variable bindings and
5014 unwind-protects. @code{specpdl} holds an array of @code{struct specbinding}'s,
5015 @code{specpdl_ptr} points to the beginning of the free bindings in the
5016 array, @code{specpdl_size} specifies the total number of binding slots
5017 in the array, and @code{max_specpdl_size} specifies the maximum number
5018 of bindings the array can be expanded to hold. @code{grow_specpdl()}
5019 increases the size of the specpdl array, multiplying its size by 2 but
5020 never exceeding @code{max_specpdl_size} (except that if this number is less
5021 than 400, it is first set to 400).
5022
5023 @code{specbind()} binds a symbol to a value and is used for local
5024 variables and @code{let} forms. The symbol and its old value (which
5025 might be @code{Qunbound}, indicating no prior value) are recorded in the
5026 specpdl array, and @code{specpdl_ptr} is advanced by one slot.
5027
5028 @code{record_unwind_protect()} implements an @dfn{unwind-protect},
5029 which, when placed around a section of code, ensures that some specified
5030 cleanup routine will be executed even if the code exits abnormally
5031 (e.g. through a throw or quit). @code{record_unwind_protect()} simply
5032 adds a new specbinding to the specpdl array and stores the appropriate
5033 information in it. The cleanup routine can either be a C function,
5034 which is stored in the @code{func} field, or a progn form, which is stored in
5035 the @code{old_value} field.
5036
5037 @code{unbind_to()} removes specbindings from the specpdl array until
5038 the specified position is reached. The specbinding can be one of three
5039 types:
5040
5041 @enumerate
5042 @item
5043 an unwind-protect with a C cleanup function (@code{func} is not 0 --
5044 @code{old_value} holds an argument to be passed to the function);
5045 @item
5046 an unwind-protect with a Lisp form (@code{func} is 0 and @code{symbol}
5047 is @code{nil} -- @code{old_value} holds the form to be executed with
5048 @code{Fprogn()}); or
5049 @item
5050 a local-variable binding (@code{func} is 0 and @code{symbol} is not
5051 @code{nil} -- @code{old_value} holds the old value, which is stored as
5052 the symbol's value).
5053 @end enumerate
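
The interplay of binding, unwind-protect, and unbinding can be sketched
with a small fixed-size stack. The @code{demo_} names, the @code{long}
stand-in for @code{Lisp_Object}, and the fixed 64 slots are assumptions;
the Lisp-form flavor of unwind-protect is left out, so a slot with a
null @code{symbol} and null @code{func} is simply skipped.

```c
#include <assert.h>
#include <stddef.h>

typedef long demo_value;                  /* stand-in for Lisp_Object */
struct demo_symbol { demo_value value; };

struct demo_binding
{
  struct demo_symbol *symbol;   /* NULL marks an unwind-protect */
  demo_value old_value;         /* saved value, or the cleanup argument */
  void (*func) (demo_value);    /* C cleanup function, or NULL */
};

#define DEMO_SLOTS 64
static struct demo_binding demo_specpdl[DEMO_SLOTS];
static int demo_depth;          /* index of the first free slot */
static demo_value demo_last_cleanup;

int
demo_specpdl_depth (void)
{
  return demo_depth;
}

/* Save SYM's current value on the stack, then give it VALUE.  */
void
demo_specbind (struct demo_symbol *sym, demo_value value)
{
  struct demo_binding *b;

  assert (demo_depth < DEMO_SLOTS);
  b = &demo_specpdl[demo_depth++];
  b->symbol = sym;
  b->old_value = sym->value;
  b->func = NULL;
  sym->value = value;
}

/* Arrange for FUNC (ARG) to run when the stack is unwound past here.  */
void
demo_record_unwind_protect (void (*func) (demo_value), demo_value arg)
{
  struct demo_binding *b;

  assert (demo_depth < DEMO_SLOTS);
  b = &demo_specpdl[demo_depth++];
  b->symbol = NULL;
  b->old_value = arg;
  b->func = func;
}

/* Pop entries until the stack is back at depth COUNT, running
   cleanups and restoring old values in reverse order.  */
void
demo_unbind_to (int count)
{
  while (demo_depth > count)
    {
      struct demo_binding *b = &demo_specpdl[--demo_depth];

      if (b->func)
        b->func (b->old_value);
      else if (b->symbol)
        b->symbol->value = b->old_value;
    }
}

/* Example cleanup function: just records its argument.  */
void
demo_remember (demo_value arg)
{
  demo_last_cleanup = arg;
}
```

A caller records @code{demo_specpdl_depth()} before binding anything and
passes that count to @code{demo_unbind_to()} on the way out, which
restores everything pushed in between regardless of how many entries
there were.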
5054
5055 @node Simple Special Forms
5056 @section Simple Special Forms
5057
5058 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
5059 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
5060 @code{let*}, @code{let}, @code{while}
5061
5062 All of these are very simple and work as expected, calling
5063 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
5064 @code{let} and @code{let*}) using @code{specbind()} to create bindings
5065 and @code{unbind_to()} to undo the bindings when finished. Note that
5066 these functions do a lot of @code{GCPRO}ing to protect their arguments
5067 from garbage collection because they call @code{Feval()} (@pxref{Garbage
5068 Collection}).
5069
5070 @node Catch and Throw
5071 @section Catch and Throw
5072
5073 @example
5074 struct catchtag
5075 @{
5076 Lisp_Object tag;
5077 Lisp_Object val;
5078 struct catchtag *next;
5079 struct gcpro *gcpro;
5080 jmp_buf jmp;
5081 struct backtrace *backlist;
5082 int lisp_eval_depth;
5083 int pdlcount;
5084 @};
5085 @end example
5086
5087 @code{catch} is a Lisp special form that places a catch around a body
5088 of code. A catch is a means of non-local exit from the code. When a
5089 catch is created, a tag is specified; executing a @code{throw} to this
5090 tag exits from the body of code caught with this tag, and the value of
5091 the @code{catch} is the value given in the call to @code{throw}. If no
5092 such @code{throw} occurs, the body runs to completion normally.
5093
5094 Information pertaining to a catch is held in a @code{struct catchtag},
5095 which is placed at the head of a linked list pointed to by
5096 @code{catchlist}. @code{internal_catch()} is passed a C function to
5097 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
5098 give it, and places a catch around the function. Each @code{struct
5099 catchtag} is held in the stack frame of the @code{internal_catch()}
5100 instance that created the catch.
5101
5102 @code{internal_catch()} is fairly straightforward. It stores into the
5103 @code{struct catchtag} the tag name and the current values of
5104 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
5105 offset into the specpdl array, sets a jump point with @code{_setjmp()}
5106 (storing the jump point into the @code{struct catchtag}), and calls the
5107 function. Control will return to @code{internal_catch()} either when
5108 the function exits normally or through a @code{_longjmp()} to this jump
5109 point. In the latter case, @code{throw} will store the value to be
5110 returned into the @code{struct catchtag} before jumping. When it's
5111 done, @code{internal_catch()} removes the @code{struct catchtag} from
5112 the catchlist and returns the proper value.
5113
5114 @code{Fthrow()} goes up through the catchlist until it finds one with
5115 a matching tag. It then calls @code{unbind_catch()} to restore
5116 everything to what it was when the appropriate catch was set, stores the
5117 return value in the @code{struct catchtag}, and jumps (with
5118 @code{_longjmp()}) to its jump point.
5119
5120 @code{unbind_catch()} removes all catches from the catchlist until it
5121 finds the correct one. Some of the catches might have been placed for
5122 error-trapping, and if so, the appropriate entries on the handlerlist
5123 must be removed (see ``errors''). @code{unbind_catch()} also restores
5124 the values of @code{gcprolist}, @code{backtrace_list}, and
5125 @code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
5126 created since the catch.
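
The interplay between the catch list and the jump points can be
sketched in a few lines of C. This is only an illustration under
simplifying assumptions: it uses the standard @code{setjmp()} and
@code{longjmp()}, models @code{Lisp_Object} as an integer, keeps only
four of the @code{struct catchtag} fields, and does not restore
@code{gcprolist}, the backtrace list, or the specpdl. All function
names beginning with @code{sketch_} or @code{demo_} are hypothetical.

```c
#include <setjmp.h>
#include <stddef.h>

typedef long Lisp_Object;          /* stand-in for the real type */

/* Reduced catchtag: only the fields this sketch needs. */
struct catchtag
{
  Lisp_Object tag;
  volatile Lisp_Object val;        /* written between setjmp and longjmp */
  struct catchtag *next;
  jmp_buf jmp;
};

static struct catchtag *catchlist = NULL;

Lisp_Object
sketch_internal_catch (Lisp_Object tag,
                       Lisp_Object (*func) (Lisp_Object),
                       Lisp_Object arg)
{
  struct catchtag c;               /* lives in this stack frame */
  Lisp_Object result;

  c.tag = tag;
  c.val = 0;
  c.next = catchlist;
  catchlist = &c;                  /* push onto the head of the list */

  if (setjmp (c.jmp) == 0)
    result = func (arg);           /* the function exited normally */
  else
    result = c.val;                /* we got here through longjmp() */

  catchlist = c.next;              /* remove this catch from the list */
  return result;
}

void
sketch_throw (Lisp_Object tag, Lisp_Object value)
{
  struct catchtag *c;

  for (c = catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        catchlist = c;             /* drop inner catches, cf. unbind_catch() */
        c->val = value;
        longjmp (c->jmp, 1);
      }
  /* No matching catch: real code would signal an error; we just return. */
}

/* Hypothetical bodies used to exercise the sketch. */
Lisp_Object demo_returns (Lisp_Object arg) { return arg * 2; }

Lisp_Object
demo_throws (Lisp_Object arg)
{
  sketch_throw (1, arg + 100);
  return -1;                       /* not reached when tag 1 is caught */
}

Lisp_Object
demo_nested (Lisp_Object arg)
{
  return sketch_internal_catch (2, demo_throws, arg);
}
```

The @code{val} member is declared @code{volatile} because standard C
leaves non-volatile automatic objects indeterminate if they are changed
between the @code{setjmp()} call and the @code{longjmp()}.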
5127
5128
5129 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
5130 @chapter Symbols and Variables
5131
5132 @menu
5133 * Introduction to Symbols::
5134 * Obarrays::
5135 * Symbol Values::
5136 @end menu
5137
5138 @node Introduction to Symbols
5139 @section Introduction to Symbols
5140
5141 A symbol is basically just an object with four fields: a name (a
5142 string), a value (some Lisp object), a function (some Lisp object), and
5143 a property list (usually a list of alternating keyword/value pairs).
5144 What makes symbols special is that there is usually only one symbol with
5145 a given name, and the symbol is referred to by name. This makes a
5146 symbol a convenient way of calling up data by name, i.e. of implementing
5147 variables. (The variable's value is stored in the @dfn{value slot}.)
5148 Similarly, functions are referenced by name, and the definition of the
5149 function is stored in a symbol's @dfn{function slot}. This means that
5150 there can be a distinct function and variable with the same name. The
5151 property list is used as a more general mechanism of associating
5152 additional values with particular names, and once again the namespace is
5153 independent of the function and variable namespaces.
5154
5155 @node Obarrays
5156 @section Obarrays
5157
5158 The identity of symbols with their names is accomplished through a
5159 structure called an obarray, which is just a poorly-implemented hash
5160 table mapping from strings to symbols whose name is that string. (I say
5161 ``poorly implemented'' because an obarray appears in Lisp as a vector
5162 with some hidden fields rather than as its own opaque type. This is an
5163 Emacs Lisp artifact that should be fixed.)
5164
5165 Obarrays are implemented as a vector of some fixed size (which should
5166 be a prime for best results), where each ``bucket'' of the vector
5167 contains one or more symbols, threaded through a hidden @code{next}
5168 field in the symbol. Lookup of a symbol in an obarray, and adding a
5169 symbol to an obarray, is accomplished through standard hash-table
5170 techniques.
5171
5172 The standard Lisp function for working with symbols and obarrays is
5173 @code{intern}. This looks up a symbol in an obarray given its name; if
5174 it's not found, a new symbol is automatically created with the specified
5175 name, added to the obarray, and returned. This is what happens when the
5176 Lisp reader encounters a symbol (or more precisely, encounters the name
5177 of a symbol) in some text that it is reading. There is a standard
5178 obarray called @code{obarray} that is used for this purpose, although
5179 the Lisp programmer is free to create his own obarrays and @code{intern}
5180 symbols in them.
5181
5182 Note that, once a symbol is in an obarray, it stays there until
5183 something is done about it, and the standard obarray @code{obarray}
5184 always stays around, so once you use any particular variable name, a
5185 corresponding symbol will stay around in @code{obarray} until you exit
5186 XEmacs.
5187
5188 Note that @code{obarray} itself is a variable, and as such there is a
5189 symbol in @code{obarray} whose name is @code{"obarray"} and which
5190 contains @code{obarray} as its value.
5191
5192 Note also that this call to @code{intern} occurs only when in the Lisp
5193 reader, not when the code is executed (at which point the symbol is
5194 already around, stored as such in the definition of the function).
5195
5196 You can create your own obarray using @code{make-vector} (this is
5197 horrible but is an artifact) and intern symbols into that obarray.
5198 Doing that will result in two or more symbols with the same name.
5199 However, at most one of these symbols is in the standard @code{obarray}:
5200 You cannot have two symbols of the same name in any particular obarray.
5201 Note that you cannot add a symbol to an obarray in any fashion other
5202 than using @code{intern}: i.e. you can't take an existing symbol and put
5203 it in an existing obarray. Nor can you change the name of an existing
5204 symbol. (Since obarrays are vectors, you can violate the consistency of
5205 things by storing directly into the vector, but let's ignore that
5206 possibility.)
5207
5208 Usually symbols are created by @code{intern}, but if you really want,
5209 you can explicitly create a symbol using @code{make-symbol}, giving it
5210 some name. The resulting symbol is not in any obarray (i.e. it is
5211 @dfn{uninterned}), and you can't add it to any obarray. Therefore its
5212 primary purpose is as a carrier of information. (Cons cells could
5213 probably be used just as well.)
5214
5215 You can also use @code{intern-soft} to look up a symbol but not create
5216 a new one, and @code{unintern} to remove a symbol from an obarray. This
5217 returns the removed symbol. (Remember: You can't put the symbol back
5218 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
5219 in an obarray.
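
The bucket-and-chain layout described in this section can be sketched
as follows. This is an assumption-laden illustration rather than the
XEmacs implementation: the symbol carries only a name and the hidden
@code{next} pointer, the hash function is an arbitrary string hash,
and the names @code{sketch_intern()} and @code{sketch_intern_soft()}
are hypothetical stand-ins for the behaviour of @code{intern} and
@code{intern-soft}.

```c
#include <stdlib.h>
#include <string.h>

#define OBARRAY_SIZE 1009          /* a prime, as the text recommends */

struct symbol
{
  char *name;
  struct symbol *next;             /* the hidden bucket chain */
};

struct obarray
{
  struct symbol *buckets[OBARRAY_SIZE];
};

/* Arbitrary string hash, chosen only for the sketch. */
static unsigned long
hash_name (const char *name)
{
  unsigned long h = 5381;
  while (*name)
    h = h * 33 + (unsigned char) *name++;
  return h % OBARRAY_SIZE;
}

/* Like intern-soft: return the symbol, or NULL if there is none. */
struct symbol *
sketch_intern_soft (struct obarray *ob, const char *name)
{
  struct symbol *s;
  for (s = ob->buckets[hash_name (name)]; s != NULL; s = s->next)
    if (strcmp (s->name, name) == 0)
      return s;
  return NULL;
}

/* Like intern: look the name up, creating the symbol if necessary. */
struct symbol *
sketch_intern (struct obarray *ob, const char *name)
{
  unsigned long h = hash_name (name);
  struct symbol *s = sketch_intern_soft (ob, name);

  if (s != NULL)
    return s;
  s = malloc (sizeof *s);
  if (s == NULL)
    return NULL;
  s->name = malloc (strlen (name) + 1);
  if (s->name == NULL)
    {
      free (s);
      return NULL;
    }
  strcpy (s->name, name);
  s->next = ob->buckets[h];        /* thread onto the front of the bucket */
  ob->buckets[h] = s;
  return s;
}
```

Because lookup always precedes creation, interning the same name twice
in one obarray yields the same symbol, which is the uniqueness property
the section relies on.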
5220
5221 @node Symbol Values
5222 @section Symbol Values
5223
5224 The value field of a symbol normally contains a Lisp object. However,
5225 a symbol can be @dfn{unbound}, meaning that it logically has no value.
5226 This is indicated internally by storing in the value field a special
5227 Lisp object, called @dfn{the unbound marker}, which is held in the global
5228 variable @code{Qunbound}. The unbound marker is of a special Lisp object type
5229 called @dfn{symbol-value-magic}. It is impossible for the Lisp
5230 programmer to directly create or access any object of this type.
5231
5232 @strong{You must not let any ``symbol-value-magic'' object escape to
5233 the Lisp level.} Printing any of these objects will cause the message
5234 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
5235 (You may see this normally when you call @code{debug_print()} from the
5236 debugger on a Lisp object.) If you let one of these objects escape to
5237 the Lisp level, you will violate a number of assumptions contained in
5238 the C code and make the unbound marker not function right.
5239
5240 When a symbol is created, its value field (and function field) are set
5241 to @code{Qunbound}. The Lisp programmer can restore these conditions
5242 later using @code{makunbound} or @code{fmakunbound}, and can query to
5243 see whether the value or function fields are @dfn{bound} (i.e. have a
5244 value other than @code{Qunbound}) using @code{boundp} and
5245 @code{fboundp}. The fields are set to a normal Lisp object using
5246 @code{set} (or @code{setq}) and @code{fset}.
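
The role of the unbound marker as a sentinel can be sketched like
this. The object type, the two-field symbol, and every function name
prefixed with @code{sketch_} are assumptions made for the
illustration; real Lisp objects and symbols are more elaborate.

```c
/* Stand-in object type; real Lisp objects are more elaborate. */
struct object { int payload; };
typedef struct object *Lisp_Object;

/* One distinguished object serves as the unbound marker. */
static struct object unbound_marker;
#define Qunbound (&unbound_marker)

struct symbol
{
  Lisp_Object value;
  Lisp_Object function;
};

/* A freshly created symbol has both fields unbound. */
void
sketch_init_symbol (struct symbol *s)
{
  s->value = Qunbound;
  s->function = Qunbound;
}

int sketch_boundp (const struct symbol *s) { return s->value != Qunbound; }
int sketch_fboundp (const struct symbol *s) { return s->function != Qunbound; }

void sketch_set (struct symbol *s, Lisp_Object v) { s->value = v; }
void sketch_fset (struct symbol *s, Lisp_Object f) { s->function = f; }

void sketch_makunbound (struct symbol *s) { s->value = Qunbound; }
void sketch_fmakunbound (struct symbol *s) { s->function = Qunbound; }

/* Never hand the marker to the caller: report failure instead.
   (Real code would signal an error at this point.)  */
int
sketch_symbol_value (const struct symbol *s, Lisp_Object *out)
{
  if (s->value == Qunbound)
    return 0;
  *out = s->value;
  return 1;
}
```

The last function shows the discipline the warning above asks for: the
marker is compared against, but never returned.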
5247
5248 Other symbol-value-magic objects are used as special markers to
5249 indicate variables that have non-normal properties. This includes any
5250 variables that are tied into C variables (setting the variable magically
5251 sets some global variable in the C code, and likewise for retrieving the
5252 variable's value), variables that magically tie into slots in the
5253 current buffer, variables that are buffer-local, etc. The
5254 symbol-value-magic object is stored in the value cell in place of
5255 a normal object, and the code to retrieve a symbol's value
5256 (i.e. @code{symbol-value}) knows how to do special things with them.
5257 This means that you should not just fetch the value cell directly if you
5258 want a symbol's value.
5259
5260 The exact workings of this are rather complex and involved and are
5261 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
5262 @file{lisp.h}.
5263
5264 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
5265 @chapter Buffers and Textual Representation
5266
5267 @menu
5268 * Introduction to Buffers:: A buffer holds a block of text such as a file.
5269 * A Buffer@'s Text:: Representation of the text in a buffer.
5270 * Buffer Lists:: Keeping track of all buffers.
5271 * Markers and Extents:: Tagging locations within a buffer.
5272 * Bufbytes and Emchars:: Representation of individual characters.
5273 * The Buffer Object:: The Lisp object corresponding to a buffer.
5274 @end menu
5275
5276 @node Introduction to Buffers
5277 @section Introduction to Buffers
5278
5279 A buffer is logically just a Lisp object that holds some text.
5280 In this, it is like a string, but a buffer is optimized for
5281 frequent insertion and deletion, while a string is not. Furthermore:
5282
5283 @enumerate
5284 @item
5285 Buffers are @dfn{permanent} objects, i.e. once you create them, they
5286 remain around, and need to be explicitly deleted before they go away.
5287 @item
5288 Each buffer has a unique name, which is a string. Buffers are
5289 normally referred to by name. In this respect, they are like
5290 symbols.
5291 @item
5292 Buffers have a default insertion position, called @dfn{point}.
5293 Inserting text (unless you explicitly give a position) goes at point,
5294 and moves point forward past the text. This is what is going on when
5295 you type text into Emacs.
5296 @item
5297 Buffers have lots of extra properties associated with them.
5298 @item
5299 Buffers can be @dfn{displayed}. What this means is that there
5300 exist a number of @dfn{windows}, which are objects that correspond
5301 to some visible section of your display, and each window has
5302 an associated buffer, and the current contents of the buffer
5303 are shown in that section of the display. The redisplay mechanism
5304 (which takes care of doing this) knows how to look at the
5305 text of a buffer and come up with some reasonable way of displaying
5306 this. Many of the properties of a buffer control how the
5307 buffer's text is displayed.
5308 @item
5309 One buffer is distinguished and called the @dfn{current buffer}. It is
5310 stored in the variable @code{current_buffer}. Buffer operations operate
5311 on this buffer by default. When you are typing text into a buffer, the
5312 buffer you are typing into is always @code{current_buffer}. Switching
5313 to a different window changes the current buffer. Note that Lisp code
5314 can temporarily change the current buffer using @code{set-buffer} (often
5315 enclosed in a @code{save-excursion} so that the former current buffer
5316 gets restored when the code is finished). However, calling
5317 @code{set-buffer} will NOT cause a permanent change in the current
5318 buffer. The reason for this is that the top-level event loop sets
5319 the current buffer to the buffer of the selected window each time it
5320 finishes executing a user command.
5321 @end enumerate
5322
5323 Make sure you understand the distinction between @dfn{current buffer}
5324 and @dfn{buffer of the selected window}, and the distinction between
5325 @dfn{point} of the current buffer and @dfn{window-point} of the selected
5326 window. (This latter distinction is explained in detail in the section
5327 on windows.)
5328
5329 @node A Buffer@'s Text
5330 @section A Buffer's Text
5331
5332 The text in a buffer consists of a sequence of zero or more
5333 characters. A @dfn{character} is an integer that logically represents
5334 a letter, number, space, or other unit of text. Most of the characters
5335 that you will typically encounter belong to the ASCII set of characters,
5336 but there are also characters for various sorts of accented letters,
5337 special symbols, Chinese and Japanese characters (e.g. Kanji, Katakana,
5338 etc.), Cyrillic and Greek letters, etc. The actual number of possible
5339 characters is quite large.
5340
5341 For now, we can view a character as some non-negative integer that
5342 has some shape that defines how it typically appears (e.g. as an
5343 uppercase A). (The exact way in which a character appears depends
5344 on the font of the character.) The internal type of characters in
5345 the C code is an Emchar; this is just an int, but using a symbolic
5346 type makes the code clearer.
5347
5348 Between every character in a buffer is a @dfn{buffer position} or
5349 @dfn{character position}. We can speak of the character before or after
5350 a particular buffer position, and when you insert a character at a
5351 particular position, all characters after that position end up at new
5352 positions. When we speak of the character @dfn{at} a position, we
5353 really mean the character after the position. (This schizophrenia
5354 between a buffer position being ``between'' a character and ``on'' a
5355 character is rampant in Emacs.)
5356
5357 Buffer positions are numbered starting at 1. This means that
5358 position 1 is before the first character, and position 0 is not
5359 valid. If there are N characters in a buffer, then buffer
5360 position N+1 is after the last one, and position N+2 is not valid.
5361
5362 The internal makeup of the Emchar integer varies depending on whether
5363 we have compiled with MULE support. If not, the Emchar integer is an
5364 8-bit integer with possible values from 0 - 255. 0 - 127 are the
5365 standard ASCII characters, while 128 - 255 are the characters from the
5366 ISO-8859-1 character set. If we have compiled with MULE support, an
5367 Emchar is a 19-bit integer, with the various bits having meanings
5368 according to a complex scheme that will be detailed later. The
5369 characters numbered 0 - 255 still have the same meanings as for the
5370 non-MULE case, though.
5371
5372 Internally, the text in a buffer is represented in a fairly simple
5373 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
5374 in the middle. Although the gap is of some substantial size in bytes,
5375 there is no text contained within it: From the perspective of the text
5376 in the buffer, it does not exist. The gap logically sits at some buffer
5377 position, between two characters (or possibly at the beginning or end of
5378 the buffer). Insertion of text in a buffer at a particular position is
5379 always accomplished by first moving the gap to that position
5380 (i.e. through some block moving of text), then writing the text into the
5381 beginning of the gap, thereby shrinking the gap. If the gap shrinks
5382 down to nothing, a new gap is created. (What actually happens is that a
5383 new gap is ``created'' at the end of the buffer's text, which requires
5384 nothing more than changing a couple of indices; then the gap is
5385 ``moved'' to the position where the insertion needs to take place by
5386 moving up in memory all the text after that position.) Similarly,
5387 deletion occurs by moving the gap to the place where the text is to be
5388 deleted, and then simply expanding the gap to include the deleted text.
5389 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
5390 just that the internal indices that keep track of where the gap is
5391 located are changed.)
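
The gap technique just described can be sketched as a small
stand-alone structure. This is not the XEmacs code: it works on raw
bytes, uses 0-based offsets rather than 1-based buffer positions, and
the structure and function names (@code{gap_buffer},
@code{gb_insert()}, @code{gb_delete()}, @code{gb_char_at()}) as well
as the growth increment of 64 bytes are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

struct gap_buffer
{
  unsigned char *text;   /* one block: text, then gap, then text */
  size_t gap_start;      /* 0-based offset of the first gap byte */
  size_t gap_size;       /* number of bytes in the gap */
  size_t total;          /* allocated size of the block */
};

static size_t
text_length (const struct gap_buffer *b)
{
  return b->total - b->gap_size;
}

/* Move the gap so that it starts at text offset POS. */
static void
move_gap (struct gap_buffer *b, size_t pos)
{
  if (pos < b->gap_start)
    /* Slide the text between POS and the gap up to the gap's end. */
    memmove (b->text + pos + b->gap_size, b->text + pos,
             b->gap_start - pos);
  else if (pos > b->gap_start)
    /* Slide the text just after the gap down to the gap's start. */
    memmove (b->text + b->gap_start,
             b->text + b->gap_start + b->gap_size,
             pos - b->gap_start);
  b->gap_start = pos;
}

int
gb_insert (struct gap_buffer *b, size_t pos,
           const unsigned char *s, size_t len)
{
  if (pos > text_length (b))
    return -1;
  if (len > b->gap_size)
    {
      /* Gap too small: put it at the end of the text and enlarge it. */
      size_t grow = len - b->gap_size + 64;
      unsigned char *p;

      move_gap (b, text_length (b));
      p = realloc (b->text, b->total + grow);
      if (p == NULL)
        return -1;
      b->text = p;
      b->total += grow;
      b->gap_size += grow;
    }
  move_gap (b, pos);
  memcpy (b->text + b->gap_start, s, len);  /* write into the gap */
  b->gap_start += len;                      /* the gap shrinks */
  b->gap_size -= len;
  return 0;
}

int
gb_delete (struct gap_buffer *b, size_t pos, size_t len)
{
  if (pos > text_length (b) || len > text_length (b) - pos)
    return -1;
  move_gap (b, pos);
  b->gap_size += len;    /* the deleted bytes simply join the gap */
  return 0;
}

unsigned char
gb_char_at (const struct gap_buffer *b, size_t pos)
{
  return b->text[pos < b->gap_start ? pos : pos + b->gap_size];
}
```

Note that @code{gb_delete()} copies nothing beyond the gap motion
itself; deletion is purely a matter of adjusting the bookkeeping.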
5392
5393 Note that the total amount of memory allocated for a buffer's text never
5394 decreases while the buffer is live. Therefore, if you load up a
5395 20-megabyte file and then delete all but one character, there will be a
5396 20-megabyte gap, which won't get any smaller (except by inserting
5397 characters back again). Once the buffer is killed, the memory allocated
5398 for the buffer text will be freed, but it will still be sitting on the
5399 heap, taking up virtual memory, and will not be released back to the
5400 operating system. (However, if you have compiled XEmacs with rel-alloc,
5401 the situation is different. In this case, the space @emph{will} be
5402 released back to the operating system. However, this tends to effect a
5403 noticeable speed penalty.)
5404
5405 Astute readers may notice that the text in a buffer is represented as
5406 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
5407 a 19-bit integer, which clearly cannot fit in a byte. This means (of
5408 course) that the text in a buffer uses a different representation from
5409 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
5410 four bytes. The conversion between these two representations is complex
5411 and will be described later.
5412
5413 In the non-MULE case, everything is very simple: An Emchar
5414 is an 8-bit value, which fits neatly into one byte.
5415
5416 If we are given a buffer position and want to retrieve the
5417 character at that position, we need to follow these steps:
5418
5419 @enumerate
5420 @item
5421 Pretend there's no gap, and convert the buffer position into a @dfn{byte
5422 index} that indexes to the appropriate byte in the buffer's stream of
5423 textual bytes. By convention, byte indices begin at 1, just like buffer
5424 positions. In the non-MULE case, byte indices and buffer positions are
5425 identical, since one character equals one byte.
5426 @item
5427 Convert the byte index into a @dfn{memory index}, which takes the gap
5428 into account. The memory index is a direct index into the block of
5429 memory that stores the text of a buffer. This basically just involves
5430 checking to see if the byte index is past the gap, and if so, adding the
5431 size of the gap to it. By convention, memory indices begin at 1, just
5432 like buffer positions and byte indices, and when referring to the
5433 position that is @dfn{at} the gap, we always use the memory position at
5434 the @emph{beginning}, not at the end, of the gap.
5435 @item
5436 Fetch the appropriate bytes at the determined memory position.
5437 @item
5438 Convert these bytes into an Emchar.
5439 @end enumerate
5440
5441 In the non-Mule case, (3) and (4) boil down to a simple one-byte
5442 memory access.
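
For the non-MULE case, the four steps above can be sketched as
follows. The typedef names come from this section, but the structure
@code{sketch_text}, its fields, and the @code{sketch_} functions are
assumptions made for the example and are not the real macros.

```c
typedef int Bufpos;
typedef int Bytind;
typedef int Memind;
typedef unsigned char Bufbyte;

struct sketch_text
{
  const Bufbyte *beg;    /* start of the block of memory */
  Bytind gpt;            /* byte index at which the gap sits (1-based) */
  int gap_size;          /* size of the gap in bytes */
};

/* Step 1, non-MULE only: one character is exactly one byte. */
Bytind
sketch_bufpos_to_bytind (Bufpos pos)
{
  return (Bytind) pos;
}

/* Step 2: positions past the gap are shifted by the gap size; the
   position at the gap maps to the beginning of the gap.  */
Memind
sketch_bytind_to_memind (const struct sketch_text *t, Bytind bi)
{
  return bi > t->gpt ? bi + t->gap_size : bi;
}

/* Steps 3 and 4, non-MULE: fetch the byte after the position.  The
   character "at" the gap position lies just past the gap, so the
   comparison here is >= rather than >.  */
Bufbyte
sketch_char_at (const struct sketch_text *t, Bufpos pos)
{
  Bytind bi = sketch_bufpos_to_bytind (pos);
  int offset = (bi >= t->gpt ? bi + t->gap_size : bi) - 1; /* 1-based */
  return t->beg[offset];
}
```

The two comparisons differ deliberately: naming a position at the gap
uses the gap's beginning, while fetching the character after that
position has to skip over the gap.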
5443
5444 Note that we have defined three types of positions in a buffer:
5445
5446 @enumerate
5447 @item
5448 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
5449 @item
5450 @dfn{byte indices}, typedef @code{Bytind}
5451 @item
5452 @dfn{memory indices}, typedef @code{Memind}
5453 @end enumerate
5454
5455 All three typedefs are just ints, but defining them this way makes
5456 things a lot clearer.
5457
5458 Most code works with buffer positions. In particular, all Lisp code
5459 that refers to text in a buffer uses buffer positions. Lisp code does
5460 not know that byte indices or memory indices exist.
5461
5462 Finally, we have a typedef for the bytes in a buffer. This is a
5463 @code{Bufbyte}, which is an unsigned char. Referring to them as
5464 Bufbytes underscores the fact that we are working with a string of bytes
5465 in the internal Emacs buffer representation rather than in one of a
5466 number of possible alternative representations (e.g. EUC-coded text,
5467 etc.).
5468
5469 @node Buffer Lists
5470 @section Buffer Lists
5471
5472 Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
5473 they remain around until explicitly deleted. This entails that there is
5474 a list of all the buffers in existence. This list is actually an
5475 assoc-list (mapping from the buffer's name to the buffer) and is stored
5476 in the global variable @code{Vbuffer_alist}.
5477
5478 The order of the buffers in the list is important: the buffers are
5479 ordered approximately from most-recently-used to least-recently-used.
5480 Switching to a buffer using @code{switch-to-buffer},
5481 @code{pop-to-buffer}, etc. and switching windows using
5482 @code{other-window}, etc. usually brings the new current buffer to the
5483 front of the list. @code{switch-to-buffer}, @code{other-buffer},
5484 etc. look at the beginning of the list to find an alternative buffer to
5485 suggest. You can also explicitly move a buffer to the end of the list
5486 using @code{bury-buffer}.
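
The most-recently-used ordering can be sketched with a singly linked
list. The real list is a Lisp assoc-list; the structure
@code{buffer_entry} and the functions @code{sketch_select_buffer()}
and @code{sketch_bury_buffer()} below are hypothetical and only model
the reordering.

```c
#include <stddef.h>

struct buffer_entry
{
  const char *name;
  struct buffer_entry *next;
};

/* Unlink ENTRY from *LIST.  Returns 1 if it was found, else 0. */
static int
unlink_entry (struct buffer_entry **list, struct buffer_entry *entry)
{
  struct buffer_entry **p;

  for (p = list; *p != NULL; p = &(*p)->next)
    if (*p == entry)
      {
        *p = entry->next;
        entry->next = NULL;
        return 1;
      }
  return 0;
}

/* Selecting a buffer brings it to the front of the list. */
void
sketch_select_buffer (struct buffer_entry **list,
                      struct buffer_entry *entry)
{
  if (unlink_entry (list, entry))
    {
      entry->next = *list;
      *list = entry;
    }
}

/* Burying a buffer moves it to the end of the list. */
void
sketch_bury_buffer (struct buffer_entry **list,
                    struct buffer_entry *entry)
{
  struct buffer_entry **p;

  if (!unlink_entry (list, entry))
    return;
  for (p = list; *p != NULL; p = &(*p)->next)
    ;
  *p = entry;
}
```

With this ordering, a function that suggests an alternative buffer
only needs to look at the first entries of the list.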
5487
5488 In addition to the global ordering in @code{Vbuffer_alist}, each frame
5489 has its own ordering of the list. These lists always contain the same
5490 elements as in @code{Vbuffer_alist} although possibly in a different
5491 order. @code{buffer-list} normally returns the list for the selected
5492 frame. This allows you to work in separate frames without things
5493 interfering with each other.
5494
5495 The standard way to look up a buffer given a name is
5496 @code{get-buffer}, and the standard way to create a new buffer is
5497 @code{get-buffer-create}, which looks up a buffer with a given name,
5498 creating a new one if necessary. These operations correspond exactly
5499 with the symbol operations @code{intern-soft} and @code{intern},
5500 respectively. You can also force a new buffer to be created using
5501 @code{generate-new-buffer}, which takes a name and (if necessary) makes
5502 a unique name from this by appending a number, and then creates the
5503 buffer. This is basically like the symbol operation @code{gensym}.
5504
5505 @node Markers and Extents
5506 @section Markers and Extents
5507
5508 Among the things associated with a buffer are things that are
5509 logically attached to certain buffer positions. This can be used to
5510 keep track of a buffer position when text is inserted and deleted, so
5511 that it remains at the same spot relative to the text around it; to
5512 assign properties to particular sections of text; etc. There are two
5513 such objects that are useful in this regard: they are @dfn{markers} and
5514 @dfn{extents}.
5515
5516 A @dfn{marker} is simply a flag placed at a particular buffer
5517 position, which is moved around as text is inserted and deleted.
5518 Markers are used for all sorts of purposes, such as the @code{mark} that
5519 is the other end of textual regions to be cut, copied, etc.
5520
5521 An @dfn{extent} is similar to two markers plus some associated
5522 properties, and is used to keep track of regions in a buffer as text is
5523 inserted and deleted, and to add properties (e.g. fonts) to particular
5524 regions of text. The external interface of extents is explained
5525 elsewhere.
5526
5527 The important thing here is that markers and extents simply contain
5528 buffer positions in them as integers, and every time text is inserted or
5529 deleted, these positions must be updated. In order to minimize the
5530 amount of shuffling that needs to be done, the positions in markers and
5531 extents (there's one per marker, two per extent) are stored as Meminds.
5532 This means that they only need to be moved when the text is physically
5533 moved in memory; since the gap structure tries to minimize this, it also
5534 minimizes the number of marker and extent indices that need to be
5535 adjusted. Look in @file{insdel.c} for the details of how this works.
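
A small sketch shows why storing memory indices saves work. It
assumes a gap described only by its position and size, an insertion
that fits in the gap, and hypothetical names (@code{sketch_gap},
@code{sketch_memind_to_bytind()}, @code{sketch_insert_at_gap()}); it
is not taken from @file{insdel.c}.

```c
/* Gap bookkeeping only; the text itself is not needed here. */
struct sketch_gap
{
  int gpt;        /* byte index at which the gap sits (1-based) */
  int gap_size;   /* size of the gap in bytes */
};

/* Recover the logical byte index from a stored memory index. */
int
sketch_memind_to_bytind (const struct sketch_gap *g, int memind)
{
  return memind > g->gpt ? memind - g->gap_size : memind;
}

/* Inserting LEN bytes at the gap (LEN must not exceed the gap size)
   moves no text: the gap just starts later and becomes smaller.  */
void
sketch_insert_at_gap (struct sketch_gap *g, int len)
{
  g->gpt += len;
  g->gap_size -= len;
}
```

A marker past the gap keeps the same stored memory index across the
insertion, yet the byte index recovered from it grows by the number of
bytes inserted; a marker before the gap is unaffected in both senses.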
5536
5537 One other important distinction is that markers are @dfn{temporary}
5538 while extents are @dfn{permanent}. This means that markers disappear as
5539 soon as there are no more pointers to them, and correspondingly, there
5540 is no way to determine what markers are in a buffer if you are just
5541 given the buffer. Extents remain in a buffer until they are detached
5542 (which could happen as a result of text being deleted) or the buffer is
5543 deleted, and primitives do exist to enumerate the extents in a buffer.
5544
5545 @node Bufbytes and Emchars
5546 @section Bufbytes and Emchars
5547
5548 Not yet documented.
5549
5550 @node The Buffer Object
5551 @section The Buffer Object
5552
5553 Buffers contain fields not directly accessible by the Lisp programmer.
5554 We describe them here, naming them by the names used in the C code.
5555 Many are accessible indirectly in Lisp programs via Lisp primitives.
5556
5557 @table @code
5558 @item name
5559 The buffer name is a string that names the buffer. It is guaranteed to
5560 be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
5561 Manual}.
5562
5563 @item save_modified
5564 This field contains the time when the buffer was last saved, as an
5565 integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
5566 Manual}.
5567
5568 @item modtime
5569 This field contains the modification time of the visited file. It is
5570 set when the file is written or read. Every time the buffer is written
5571 to the file, this field is compared to the modification time of the
5572 file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
5573 Manual}.
5574
5575 @item auto_save_modified
5576 This field contains the time when the buffer was last auto-saved.
5577
5578 @item last_window_start
5579 This field contains the @code{window-start} position in the buffer as of
5580 the last time the buffer was displayed in a window.
5581
5582 @item undo_list
5583 This field points to the buffer's undo list. @xref{Undo,,, lispref,
5584 XEmacs Lisp Programmer's Manual}.
5585
5586 @item syntax_table_v
5587 This field contains the syntax table for the buffer. @xref{Syntax
5588 Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
5589
5590 @item downcase_table
5591 This field contains the conversion table for converting text to lower
5592 case. @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
5593
5594 @item upcase_table
5595 This field contains the conversion table for converting text to upper
5596 case. @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
5597
5598 @item case_canon_table
5599 This field contains the conversion table for canonicalizing text for
5600 case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp
5601 Programmer's Manual}.
5602
5603 @item case_eqv_table
5604 This field contains the equivalence table for case-folding search.
5605 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
5606
5607 @item display_table
5608 This field contains the buffer's display table, or @code{nil} if it
5609 doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp
5610 Programmer's Manual}.
5611
5612 @item markers
5613 This field contains the chain of all markers that currently point into
5614 the buffer. Deletion of text in the buffer, and motion of the buffer's
5615 gap, must check each of these markers and perhaps update it.
5616 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
5617
5618 @item backed_up
5619 This field is a flag that tells whether a backup file has been made for
5620 the visited file of this buffer.
5621
5622 @item mark
5623 This field contains the mark for the buffer. The mark is a marker,
5624 hence it is also included on the list @code{markers}. @xref{The Mark,,,
5625 lispref, XEmacs Lisp Programmer's Manual}.
5626
5627 @item mark_active
5628 This field is non-@code{nil} if the buffer's mark is active.
5629
5630 @item local_var_alist
5631 This field contains the association list describing the variables local
5632 in this buffer, and their values, with the exception of local variables
5633 that have special slots in the buffer object. (Those slots are omitted
5634 from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
5635 Programmer's Manual}.
5636
5637 @item modeline_format
5638 This field contains a Lisp object which controls how to display the mode
5639 line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp
5640 Programmer's Manual}.
5641
5642 @item base_buffer
5643 This field holds the buffer's base buffer (if it is an indirect buffer),
5644 or @code{nil}.
5645 @end table
5646
5647 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
5648 @chapter MULE Character Sets and Encodings
5649
5650 Recall that there are two primary ways that text is represented in
5651 XEmacs. The @dfn{buffer} representation sees the text as a series of
5652 bytes (Bufbytes), with a variable number of bytes used per character.
5653 The @dfn{character} representation sees the text as a series of integers
5654 (Emchars), one per character. The character representation is a cleaner
5655 representation from a theoretical standpoint, and is thus used in many
5656 cases when lots of manipulations on a string need to be done. However,
5657 the buffer representation is the standard representation used in both
5658 Lisp strings and buffers, and because of this, it is the ``default''
5659 representation that text comes in. The reason for using this
5660 representation is that it's compact and is compatible with ASCII.
5661
5662 @menu
5663 * Character Sets::
5664 * Encodings::
5665 * Internal Mule Encodings::
5666 * CCL::
5667 @end menu
5668
5669 @node Character Sets
5670 @section Character Sets
5671
5672 A character set (or @dfn{charset}) is an ordered set of characters. A
5673 particular character in a charset is indexed using one or more
5674 @dfn{position codes}, which are non-negative integers. The number of
5675 position codes needed to identify a particular character in a charset is
5676 called the @dfn{dimension} of the charset. In XEmacs/Mule, all charsets
5677 have dimension 1 or 2, and the size of all charsets (except for a few
5678 special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of
5679 position codes used to index characters from any of these types of
5680 character sets is as follows:
5681
5682 @example
5683 Charset type            Position code 1         Position code 2
5684 ------------------------------------------------------------
5685 94                      33 - 126                N/A
5686 96                      32 - 127                N/A
5687 94x94                   33 - 126                33 - 126
5688 96x96                   32 - 127                32 - 127
5689 @end example
5690
5691 Note that in the above cases position codes do not start at an
5692 expected value such as 0 or 1. The reason for this will become clear
5693 later.
5694
5695 For example, Latin-1 is a 96-character charset, and JISX0208 (the
5696 Japanese national character set) is a 94x94-character charset.
5697
5698 [Note that, although the ranges above define the @emph{valid} position
5699 codes for a charset, some of the slots in a particular charset may in
5700 fact be empty. This is the case for JISX0208, for example, where (e.g.)
5701 all the slots whose first position code is in the range 117 - 126 are
5702 empty.]
5703
5704 There are three charsets that do not follow the above rules. All of
5705 them have one dimension, and have ranges of position codes as follows:
5706
5707 @example
5708 Charset name            Position code 1
5709 ------------------------------------
5710 ASCII                   0 - 127
5711 Control-1               0 - 31
5712 Composite               0 - some large number
5713 @end example
5714
5715 (The upper bound of the position code for composite characters has not
5716 yet been determined, but it will probably be at least 16,383.)
5717
5718 ASCII is the union of two subsidiary character sets: Printing-ASCII
5719 (the printing ASCII character set, consisting of position codes 33 -
5720 126, like for a standard 94-character charset) and Control-ASCII (the
5721 non-printing characters that would appear in a binary file with codes 0
5722 - 32 and 127).
5723
5724 Control-1 contains the non-printing characters that would appear in a
5725 binary file with codes 128 - 159.
5726
5727 Composite contains characters that are generated by overstriking one
5728 or more characters from other charsets.
5729
5730 Note that some characters in ASCII, and all characters in Control-1,
5731 are @dfn{control} (non-printing) characters. These have no printed
5732 representation but instead control some other aspect of printing
5733 (e.g. TAB, position code 9, moves the current character position to the
5734 next tab stop). All other characters in all charsets are @dfn{graphic}
5735 (printing) characters.
5736
5737 When a binary file is read in, the bytes in the file are assigned to
5738 character sets as follows:
5739
5740 @example
5741 Bytes             Character set     Range
5742 --------------------------------------------------
5743 0 - 127           ASCII             0 - 127
5744 128 - 159         Control-1         0 - 31
5745 160 - 255         Latin-1           32 - 127
5746 @end example
5747
5748 This is a bit ad-hoc but gets the job done.
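
The byte-to-charset table above can be sketched in C as follows. This is only an illustration: the type and function names (@code{binary_charset}, @code{classify_binary_byte}) are hypothetical and are not taken from the XEmacs sources.

```c
#include <assert.h>

/* Hypothetical names, used only for this sketch. */
enum binary_charset { CS_ASCII, CS_CONTROL_1, CS_LATIN_1 };

struct charset_pos
{
  enum binary_charset charset;
  int position_code;            /* position code 1 within that charset */
};

/* Map one byte of a binary file to (charset, position code),
   following the table above. */
static struct charset_pos
classify_binary_byte (int byte)
{
  struct charset_pos result;

  assert (byte >= 0 && byte <= 255);

  if (byte <= 127)
    {
      result.charset = CS_ASCII;
      result.position_code = byte;          /* 0 - 127 */
    }
  else if (byte <= 159)
    {
      result.charset = CS_CONTROL_1;
      result.position_code = byte - 128;    /* 0 - 31 */
    }
  else
    {
      result.charset = CS_LATIN_1;
      result.position_code = byte - 128;    /* 32 - 127 */
    }
  return result;
}
```

Both non-ASCII cases subtract 128; they differ only in which charset the resulting position code belongs to.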
5749
5750 @node Encodings
5751 @section Encodings
5752
5753 An @dfn{encoding} is a way of numerically representing characters from
5754 one or more character sets. If an encoding only encompasses one
5755 character set, then the position codes for the characters in that
5756 character set could be used directly. This is not possible, however, if
5757 more than one character set is to be used in the encoding.
5758
5759 For example, the conversion detailed above between bytes in a binary
5760 file and characters is effectively an encoding that encompasses the
5761 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
5762 bytes.
5763
5764 Thus, an encoding can be viewed as a way of encoding characters from a
5765 specified group of character sets using a stream of bytes, each of which
5766 contains a fixed number of bits (but not necessarily 8, as in the common
5767 usage of ``byte'').
5768
5769 Here are descriptions of a couple of common
5770 encodings:
5771
5772 @menu
5773 * Japanese EUC (Extended Unix Code)::
5774 * JIS7::
5775 @end menu
5776
5777 @node Japanese EUC (Extended Unix Code)
5778 @subsection Japanese EUC (Extended Unix Code)
5779
5780 This encompasses the character sets Printing-ASCII, Japanese (aka
5781 JISX0208), and Japanese-Kana (half-width katakana, the right half of
5782 JISX0201). It uses 8-bit bytes.
5783
5784 Note that Printing-ASCII and Japanese-Kana are 94-character charsets,
5785 while Japanese is a 94x94-character charset.
5786
5787 The encoding is as follows:
5788
5789 @example
5790 Character set            Representation (PC=position-code)
5791 -------------            --------------
5792 Printing-ASCII           PC1
5793 Japanese                 PC1 + 0x80 | PC2 + 0x80
5794 Japanese-Kana            0x8E       | PC1 + 0x80
5795 @end example
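
As a minimal sketch of the table above, the following C function encodes one character into Japanese EUC. The enum and function names are hypothetical and exist only for this example; position codes are assumed to be in the 33 - 126 range used by 94 and 94x94 charsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, used only for this sketch. */
enum euc_charset { EUC_PRINTING_ASCII, EUC_JAPANESE, EUC_JAPANESE_KANA };

/* Encode one character into OUT (which must hold 2 bytes) and return
   the number of bytes stored.  PC2 is ignored for the two
   dimension-1 charsets. */
static size_t
euc_jp_encode (enum euc_charset cs, int pc1, int pc2, unsigned char *out)
{
  assert (pc1 >= 33 && pc1 <= 126);

  switch (cs)
    {
    case EUC_PRINTING_ASCII:
      out[0] = (unsigned char) pc1;
      return 1;
    case EUC_JAPANESE:
      assert (pc2 >= 33 && pc2 <= 126);
      out[0] = (unsigned char) (pc1 + 0x80);
      out[1] = (unsigned char) (pc2 + 0x80);
      return 2;
    case EUC_JAPANESE_KANA:
      out[0] = 0x8E;
      out[1] = (unsigned char) (pc1 + 0x80);
      return 2;
    }
  return 0;
}
```

For instance, the JISX0208 character with position codes 0x24 and 0x22 comes out as the two bytes 0xA4 0xA2.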
5796
5797
5798 @node JIS7
5799 @subsection JIS7
5800
5801 This encompasses the character sets Printing-ASCII,
5802 Japanese-Roman (the left half of JISX0201; this character
5803 set is very similar to Printing-ASCII and is a 94-character
5804 charset), Japanese, and Japanese-Kana. It uses 7-bit bytes.
5805
5806 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
5807 means that there are multiple states that the encoding can
5808 be in, which affect how the bytes are to be interpreted.
5809 Special sequences of bytes (called @dfn{escape sequences})
5810 are used to change states.
5811
5812 The encoding is as follows:
5813
5814 @example
5815 Character set            Representation (PC=position-code)
5816 -------------            --------------
5817 Printing-ASCII           PC1
5818 Japanese-Roman           PC1
5819 Japanese                 PC1 PC2
5820 Japanese-Kana            PC1
5821
5822
5823 Escape sequence   ASCII equivalent   Meaning
5824 ---------------   ----------------   -------
5825 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-Roman
5826 0x1B 0x24 0x42    ESC $ B            invoke Japanese
5827 0x1B 0x28 0x49    ESC ( I            invoke Japanese-Kana
5828 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
5829 @end example
5830
5831 Initially, Printing-ASCII is invoked.
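
To make the modal behaviour concrete, here is a small C sketch that walks a JIS7 byte stream, tracks the invoked charset, and counts characters. The names are hypothetical. As a simplification, every byte other than ESC is treated as part of a character in the currently invoked charset; a real decoder would also have to deal with control characters and malformed input.

```c
#include <stddef.h>

/* Hypothetical names, used only for this sketch. */
enum jis7_state
{
  JIS7_PRINTING_ASCII,
  JIS7_JAPANESE_ROMAN,
  JIS7_JAPANESE,
  JIS7_JAPANESE_KANA
};

/* Count the characters in BUF and store the charset invoked at the
   end in *FINAL_STATE.  Return -1 for an unknown or truncated escape
   sequence, or for a truncated two-byte character. */
static long
jis7_count_chars (const unsigned char *buf, size_t len,
                  enum jis7_state *final_state)
{
  enum jis7_state state = JIS7_PRINTING_ASCII;  /* the initial state */
  long count = 0;
  size_t i = 0;

  while (i < len)
    {
      if (buf[i] == 0x1B)
        {
          /* Every escape sequence in the table is three bytes long. */
          if (i + 2 >= len)
            return -1;
          if (buf[i + 1] == 0x28 && buf[i + 2] == 0x4A)
            state = JIS7_JAPANESE_ROMAN;
          else if (buf[i + 1] == 0x24 && buf[i + 2] == 0x42)
            state = JIS7_JAPANESE;
          else if (buf[i + 1] == 0x28 && buf[i + 2] == 0x49)
            state = JIS7_JAPANESE_KANA;
          else if (buf[i + 1] == 0x28 && buf[i + 2] == 0x42)
            state = JIS7_PRINTING_ASCII;
          else
            return -1;
          i += 3;
        }
      else
        {
          /* Only the Japanese charset has dimension 2. */
          size_t width = (state == JIS7_JAPANESE) ? 2 : 1;
          if (i + width > len)
            return -1;
          i += width;
          count++;
        }
    }
  *final_state = state;
  return count;
}
```

Note that the same byte value (0x42 here) means different things depending on the state: it can be the letter B, the second byte of a Japanese character, or the tail of an escape sequence.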
5832
5833 @node Internal Mule Encodings
5834 @section Internal Mule Encodings
5835
5836 In XEmacs/Mule, each character set is assigned a unique number,
5837 called a @dfn{leading byte}. This is used in the encodings of a
5838 character. Leading bytes are in the range 0x80 - 0xFF
5839 (except for ASCII, which has a leading byte of 0), although
5840 some leading bytes are reserved.
5841
5842 Charsets whose leading byte is in the range 0x80 - 0x9F are
5843 called @dfn{official} and are used for built-in charsets.
5844 Other charsets are called @dfn{private} and have leading bytes
5845 in the range 0xA0 - 0xFF; these are user-defined charsets.
5846
5847 More specifically:
5848
5849 @example
5850 Character set            Leading byte
5851 -------------            ------------
5852 ASCII                    0
5853 Composite                0x80
5854 Dimension-1 Official     0x81 - 0x8D
5855                            (0x8E is free)
5856 Control-1                0x8F
5857 Dimension-2 Official     0x90 - 0x99
5858                            (0x9A - 0x9D are free;
5859                             0x9E and 0x9F are reserved)
5860 Dimension-1 Private      0xA0 - 0xEF
5861 Dimension-2 Private      0xF0 - 0xFF
5862 @end example
5863
5864 There are two internal encodings for characters in XEmacs/Mule. One
5865 is called @dfn{string encoding} and is an 8-bit encoding that is used
5866 for representing characters in a buffer or string. It uses 1 to 4 bytes
5867 per character. The other is called @dfn{character encoding} and is a
5868 19-bit encoding that is used for representing characters individually in
5869 a variable.
5870
5871 (In the following descriptions, we'll ignore composite
5872 characters for the moment. We also give a general (structural)
5873 overview first, followed later by the exact details.)
5874
5875 @menu
5876 * Internal String Encoding::
5877 * Internal Character Encoding::
5878 @end menu
5879
5880 @node Internal String Encoding
5881 @subsection Internal String Encoding
5882
5883 ASCII characters are encoded using their position code directly.
5884 Other characters are encoded using their leading byte followed
5885 by their position code(s) with the high bit set. Characters
5886 in private character sets have their leading byte prefixed with
5887 a @dfn{leading byte prefix}, which is either 0x9E or 0x9F. (No
5888 character sets are ever assigned these leading bytes.) Specifically:
5889
5890 @example
5891 Character set            Encoding (PC=position-code, LB=leading-byte)
5892 -------------            --------
5893 ASCII                    PC1  |
5894 Control-1                LB   | PC1 + 0xA0 |
5895 Dimension-1 official     LB   | PC1 + 0x80 |
5896 Dimension-1 private      0x9E | LB         | PC1 + 0x80 |
5897 Dimension-2 official     LB   | PC1 + 0x80 | PC2 + 0x80 |
5898 Dimension-2 private      0x9F | LB         | PC1 + 0x80 | PC2 + 0x80
5899 @end example
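
The table above can be turned into a short C sketch. The function name @code{mule_string_encode} is hypothetical, composite characters are left out, and the leading-byte ranges are the ones listed earlier in this section (the free values 0x8E and 0x9A - 0x9D are simply treated like their neighbours).

```c
#include <assert.h>
#include <stddef.h>

/* Encode one character in the internal string encoding.  LB is the
   leading byte of its charset (0 for ASCII); PC2 is ignored for
   dimension-1 charsets.  OUT must hold 4 bytes.  Returns the number
   of bytes stored (1 to 4).  Hypothetical helper for illustration. */
static size_t
mule_string_encode (int lb, int pc1, int pc2, unsigned char *out)
{
  size_t n = 0;
  int dimension_2;

  if (lb == 0)                          /* ASCII */
    {
      out[0] = (unsigned char) pc1;
      return 1;
    }
  if (lb == 0x8F)                       /* Control-1 */
    {
      out[0] = 0x8F;
      out[1] = (unsigned char) (pc1 + 0xA0);
      return 2;
    }

  /* 0x80 (Composite) is not handled here; 0x9E and 0x9F are
     prefixes, never leading bytes. */
  assert (lb > 0x80 && lb <= 0xFF && lb != 0x9E && lb != 0x9F);

  dimension_2 = (lb >= 0x90 && lb <= 0x9D) || lb >= 0xF0;

  if (lb >= 0xF0)
    out[n++] = 0x9F;                    /* dimension-2 private prefix */
  else if (lb >= 0xA0)
    out[n++] = 0x9E;                    /* dimension-1 private prefix */

  out[n++] = (unsigned char) lb;
  out[n++] = (unsigned char) (pc1 + 0x80);
  if (dimension_2)
    out[n++] = (unsigned char) (pc2 + 0x80);
  return n;
}
```

In every case the first byte stored is at most 0x9F and all later bytes are at least 0xA0.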
5900
5901 The basic characteristic of this encoding is that the first byte
5902 of every character is in the range 0x00 - 0x9F, and the second and
5903 following bytes of every character are in the range 0xA0 - 0xFF.
5904 This means that it is impossible to get out of sync, or more
5905 specifically:
5906
5907 @enumerate
5908 @item
5909 Given any byte position, the beginning of the character it is
5910 within can be determined in constant time.
5911 @item
5912 Given any byte position at the beginning of a character, the
5913 beginning of the next character can be determined in constant
5914 time.
5915 @item
5916 Given any byte position at the beginning of a character, the
5917 beginning of the previous character can be determined in constant
5918 time.
5919 @item
5920 Textual searches can simply treat encoded strings as if they
5921 were encoded in a one-byte-per-character fashion rather than
5922 the actual multi-byte encoding.
5923 @end enumerate
5924
5925 None of the standard non-modal encodings meet all of these
5926 conditions. For example, EUC satisfies only (2) and (3), while
5927 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
5928 non-modal encodings must satisfy (2), in order to be unambiguous.)
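
Properties (1) and (2) follow directly from the byte ranges, as this C sketch suggests. The function names are hypothetical. The loops are bounded because a character occupies at most 4 bytes, which is what makes the operations constant-time.

```c
#include <stddef.h>

/* A byte in 0x00 - 0x9F begins a character; a byte in 0xA0 - 0xFF
   continues one. */
static int
is_first_byte (unsigned char b)
{
  return b <= 0x9F;
}

/* Property (1): return the index of the first byte of the character
   that contains byte position POS. */
static size_t
char_start (const unsigned char *text, size_t pos)
{
  while (pos > 0 && !is_first_byte (text[pos]))
    pos--;
  return pos;
}

/* Property (2): POS is at the beginning of a character; return the
   index where the next character begins (LEN if there is none). */
static size_t
next_char_start (const unsigned char *text, size_t len, size_t pos)
{
  pos++;
  while (pos < len && !is_first_byte (text[pos]))
    pos++;
  return pos;
}
```

Property (3) is the same idea as @code{char_start} applied to position @code{pos - 1}.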
5929
5930 @node Internal Character Encoding
5931 @subsection Internal Character Encoding
5932
5933 One 19-bit word represents a single character. The word is
5934 separated into three fields:
5935
5936 @example
5937 Bit number:   18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
5938               <------------> <------------------> <------------------>
5939 Field:              1                  2                    3
5940 @end example
5941
5942 Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
5943
5944 @example
5945 Character set            Field 1         Field 2         Field 3
5946 -------------            -------         -------         -------
5947 ASCII                       0               0              PC1
5948    range:                                                (00 - 7F)
5949 Control-1                   0               1              PC1
5950    range:                                                (00 - 1F)
5951 Dimension-1 official        0            LB - 0x80         PC1
5952    range:                                (01 - 0D)       (20 - 7F)
5953 Dimension-1 private         0            LB - 0x80         PC1
5954    range:                                (20 - 6F)       (20 - 7F)
5955 Dimension-2 official    LB - 0x8F          PC1             PC2
5956    range:               (01 - 0A)       (20 - 7F)       (20 - 7F)
5957 Dimension-2 private     LB - 0xE1          PC1             PC2
5958    range:               (0F - 1E)       (20 - 7F)       (20 - 7F)
5959 Composite                 0x1F              ?               ?
5960 @end example
5961
5962 Note that character codes 0 - 255 are the same as the ``binary encoding''
5963 described above.
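
The field layout can be checked with a small C sketch that packs a leading byte and position codes into a 19-bit character code. The function name @code{make_char_code} is hypothetical, and composite characters are not handled.

```c
#include <assert.h>

/* Build the 19-bit character code from the charset's leading byte LB
   (0 for ASCII) and the position codes.  PC2 is ignored for
   dimension-1 charsets.  Hypothetical helper for illustration. */
static unsigned long
make_char_code (int lb, int pc1, int pc2)
{
  unsigned long field1, field2, field3;

  if (lb == 0)                                  /* ASCII */
    {
      field1 = 0; field2 = 0; field3 = pc1;
    }
  else if (lb == 0x8F)                          /* Control-1 */
    {
      field1 = 0; field2 = 1; field3 = pc1;
    }
  else if ((lb >= 0x81 && lb <= 0x8D)           /* dimension-1 official */
           || (lb >= 0xA0 && lb <= 0xEF))       /* dimension-1 private */
    {
      field1 = 0; field2 = lb - 0x80; field3 = pc1;
    }
  else if (lb >= 0x90 && lb <= 0x99)            /* dimension-2 official */
    {
      field1 = lb - 0x8F; field2 = pc1; field3 = pc2;
    }
  else                                          /* dimension-2 private */
    {
      assert (lb >= 0xF0 && lb <= 0xFF);
      field1 = lb - 0xE1; field2 = pc1; field3 = pc2;
    }

  /* Field 1 is bits 18-14, field 2 is bits 13-7, field 3 is bits 6-0. */
  return (field1 << 14) | (field2 << 7) | field3;
}
```

With a dimension-1 official leading byte of 0x81 and position code 0x69 the result is 233, which is how codes 160 - 255 line up with the binary encoding.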
5964
5965 @node CCL
5966 @section CCL
5967
5968 @example
5969 CCL PROGRAM SYNTAX:
5970 CCL_PROGRAM := (CCL_MAIN_BLOCK
5971 [ CCL_EOF_BLOCK ])
5972
5973 CCL_MAIN_BLOCK := CCL_BLOCK
5974 CCL_EOF_BLOCK := CCL_BLOCK
5975
5976 CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
5977 STATEMENT :=
5978 SET | IF | BRANCH | LOOP | REPEAT | BREAK
5979 | READ | WRITE
5980
5981 SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
5982 | INT-OR-CHAR
5983
5984 EXPRESSION := ARG | (EXPRESSION OP ARG)
5985
5986 IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
5987 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
5988 LOOP := (loop STATEMENT [STATEMENT ...])
5989 BREAK := (break)
5990 REPEAT := (repeat)
5991 | (write-repeat [REG | INT-OR-CHAR | string])
5992 | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
5993 READ := (read REG) | (read REG REG)
5994 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
5995 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
5996 WRITE := (write REG) | (write REG REG)
5997 | (write INT-OR-CHAR) | (write STRING) | STRING
5998 | (write REG ARRAY)
5999 END := (end)
6000
6001 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
6002 ARG := REG | INT-OR-CHAR
6003 OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
6004 | < | > | == | <= | >= | !=
6005 SELF_OP :=
6006 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
6007 ARRAY := '[' INT-OR-CHAR ... ']'
6008 INT-OR-CHAR := INT | CHAR
6009
6010 MACHINE CODE:
6011
6012 The machine code consists of a vector of 32-bit words.
6013 The first such word specifies the start of the EOF section of the code;
6014 this is the code executed to handle any stuff that needs to be done
6015 (e.g. designating back to ASCII and left-to-right mode) after all
6016 other encoded/decoded data has been written out. This is not used for
6017 charset CCL programs.
6018
6019 REGISTER: 0..7 -- referred to by RRR or rrr
6020
6021 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
6022 TTTTT (5-bit): operator type
6023 RRR (3-bit): register number
6024 XXXXXXXXXXXXXXX (15-bit):
6025 CCCCCCCCCCCCCCC: constant or address
6026 000000000000rrr: register number
6027
6028 AAAAA: 00000 +
6029 00001 -
6030 00010 *
6031 00011 /
6032 00100 %
6033 00101 &
6034 00110 |
6035 00111 ^
6036
6037 01000 <<
6038 01001 >>
6039 01010 <8
6040 01011 >8
6041 01100 //
6042 01101 not used
6043 01110 not used
6044 01111 not used
6045
6046 10000 <
6047 10001 >
6048 10010 ==
6049 10011 <=
6050 10100 >=
6051 10101 !=
6052
6053 OPERATORS: TTTTT RRR XX..
6054
6055 SetCS: 00000 RRR C...C RRR = C...C
6056 SetCL: 00001 RRR ..... RRR = c...c
6057 c.............c
6058 SetR: 00010 RRR ..rrr RRR = rrr
6059 SetA: 00011 RRR ..rrr RRR = array[rrr]
6060 C.............C size of array = C...C
6061 c.............c contents = c...c
6062
6063 Jump: 00100 000 c...c jump to c...c
6064 JumpCond: 00101 RRR c...c if (!RRR) jump to c...c
6065 WriteJump: 00110 RRR c...c Write1 RRR, jump to c...c
6066 WriteReadJump: 00111 RRR c...c Write1, Read1 RRR, jump to c...c
6067 WriteCJump: 01000 000 c...c Write1 C...C, jump to c...c
6068 C...C
6069 WriteCReadJump: 01001 RRR c...c Write1 C...C, Read1 RRR,
6070 C.............C and jump to c...c
6071 WriteSJump: 01010 000 c...c WriteS, jump to c...c
6072 C.............C
6073 S.............S
6074 ...
6075 WriteSReadJump: 01011 RRR c...c WriteS, Read1 RRR, jump to c...c
6076 C.............C
6077 S.............S
6078 ...
6079 WriteAReadJump: 01100 RRR c...c WriteA, Read1 RRR, jump to c...c
6080 C.............C size of array = C...C
6081 c.............c contents = c...c
6082 ...
6083 Branch: 01101 RRR C...C if (RRR >= 0 && RRR < C..)
6084 c.............c branch to (RRR+1)th address
6085 Read1: 01110 RRR ... read 1-byte to RRR
6086 Read2: 01111 RRR ..rrr read 2-byte to RRR and rrr
6087 ReadBranch: 10000 RRR C...C Read1 and Branch
6088 c.............c
6089 ...
6090 Write1: 10001 RRR ..... write 1-byte RRR
6091 Write2: 10010 RRR ..rrr write 2-byte RRR and rrr
6092 WriteC: 10011 000 ..... write 1-char C...C
6093 C.............C
6094 WriteS: 10100 000 ..... write C..-byte of string
6095 C.............C
6096 S.............S
6097 ...
6098 WriteA: 10101 RRR ..... write array[RRR]
6099 C.............C size of array = C...C
6100 c.............c contents = c...c
6101 ...
6102 End: 10110 000 ..... terminate the execution
6103
6104 SetSelfCS: 10111 RRR C...C RRR AAAAA= C...C
6105 ..........AAAAA
6106 SetSelfCL: 11000 RRR ..... RRR AAAAA= c...c
6107 c.............c
6108 ..........AAAAA
6109 SetSelfR: 11001 RRR ..Rrr RRR AAAAA= rrr
6110 ..........AAAAA
6111 SetExprCL: 11010 RRR ..Rrr RRR = rrr AAAAA c...c
6112 c.............c
6113 ..........AAAAA
6114 SetExprR: 11011 RRR ..rrr RRR = rrr AAAAA Rrr
6115 ............Rrr
6116 ..........AAAAA
6117 JumpCondC: 11100 RRR c...c if !(RRR AAAAA C..) jump to c...c
6118 C.............C
6119 ..........AAAAA
6120 JumpCondR: 11101 RRR c...c if !(RRR AAAAA rrr) jump to c...c
6121 ............rrr
6122 ..........AAAAA
6123 ReadJumpCondC: 11110 RRR c...c Read1 and JumpCondC
6124 C.............C
6125 ..........AAAAA
6126 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR
6127 ............rrr
6128 ..........AAAAA
6129 @end example
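
As a rough illustration of the operator word layout, the following C sketch splits a word into its three fields. It assumes that the picture @samp{XXXXXXXXXXXXXXX RRR TTTTT} is written most-significant bit first, so that the operator type occupies the low 5 bits; that reading, and all names used here, are assumptions of this sketch rather than something stated in the description above.

```c
#include <stdint.h>

/* Hypothetical structure holding the decoded fields of one word. */
struct ccl_insn
{
  unsigned type;        /* TTTTT: operator type (5 bits) */
  unsigned reg;         /* RRR: register number (3 bits) */
  unsigned arg;         /* X...X: constant, address, or register (15 bits) */
};

/* Assumed layout: bits 0-4 type, bits 5-7 register, bits 8-22 argument. */
static struct ccl_insn
ccl_decode_word (uint32_t word)
{
  struct ccl_insn insn;

  insn.type = word & 0x1F;
  insn.reg = (word >> 5) & 0x07;
  insn.arg = (word >> 8) & 0x7FFF;
  return insn;
}

/* Inverse of ccl_decode_word under the same assumed layout. */
static uint32_t
ccl_encode_word (unsigned type, unsigned reg, unsigned arg)
{
  return ((uint32_t) (arg & 0x7FFF) << 8)
         | ((uint32_t) (reg & 0x07) << 5)
         | (uint32_t) (type & 0x1F);
}
```

Under this assumption a @samp{Jump} (type 00100) to address 10 would be encoded as the word 0xA04.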
6130
6131 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
6132 @chapter The Lisp Reader and Compiler
6133
6134 Not yet documented.
6135
6136 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
6137 @chapter Lstreams
6138
6139 An @dfn{lstream} is an internal Lisp object that provides a generic
6140 buffering stream implementation. Conceptually, you send data to the
6141 stream or read data from the stream, not caring what's on the other end
6142 of the stream. The other end could be another stream, a file
6143 descriptor, a stdio stream, a fixed block of memory, a reallocating
6144 block of memory, etc. The main purpose of the stream is to provide a
6145 standard interface and to do buffering. Macros are defined to read or
6146 write characters, so the calling functions do not have to worry about
6147 blocking data together in order to achieve efficiency.
6148
6149 @menu
6150 * Creating an Lstream:: Creating an lstream object.
6151 * Lstream Types:: Different sorts of things that are streamed.
6152 * Lstream Functions:: Functions for working with lstreams.
6153 * Lstream Methods:: Creating new lstream types.
6154 @end menu
6155
6156 @node Creating an Lstream
6157 @section Creating an Lstream
6158
6159 Lstreams come in different types, depending on what is being interfaced
6160 to. Although the primitive for creating new lstreams is
6161 @code{Lstream_new()}, generally you do not call this directly. Instead,
6162 you call some type-specific creation function, which creates the lstream
6163 and initializes it as appropriate for the particular type.
6164
6165 All lstream creation functions take a @var{mode} argument, specifying
6166 what mode the lstream should be opened in. This controls whether the
6167 lstream is for input or output, and optionally whether data should be
6168 blocked up in units of MULE characters. Note that some types of
6169 lstreams can only be opened for input; others only for output; and
6170 others can be opened either way. #### Richard Mlynarik thinks that
6171 there should be a strict separation between input and output streams,
6172 and he's probably right.
6173
6174 @var{mode} is a string, one of
6175
6176 @table @code
6177 @item "r"
6178 Open for reading.
6179 @item "w"
6180 Open for writing.
6181 @item "rc"
6182 Open for reading, but ``read'' never returns partial MULE characters.
6183 @item "wc"
6184 Open for writing, but never writes partial MULE characters.
6185 @end table
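
One way to picture how a @var{mode} string might be interpreted is the following C sketch. The flag names and the function @code{parse_lstream_mode} are hypothetical; they are not the names used in @file{lstream.c}.

```c
#include <string.h>

/* Hypothetical flag bits, used only for this sketch. */
enum
{
  SKETCH_MODE_READ = 1,         /* stream is open for reading */
  SKETCH_MODE_WRITE = 2,        /* stream is open for writing */
  SKETCH_MODE_WHOLE_CHARS = 4   /* never split a MULE character */
};

/* Translate one of the four mode strings listed above into flag bits.
   Return -1 for any other string. */
static int
parse_lstream_mode (const char *mode)
{
  if (strcmp (mode, "r") == 0)
    return SKETCH_MODE_READ;
  if (strcmp (mode, "w") == 0)
    return SKETCH_MODE_WRITE;
  if (strcmp (mode, "rc") == 0)
    return SKETCH_MODE_READ | SKETCH_MODE_WHOLE_CHARS;
  if (strcmp (mode, "wc") == 0)
    return SKETCH_MODE_WRITE | SKETCH_MODE_WHOLE_CHARS;
  return -1;
}
```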
6186
6187 @node Lstream Types
6188 @section Lstream Types
6189
6190 @table @asis
6191 @item stdio
6192
6193 @item filedesc
6194
6195 @item lisp-string
6196
6197 @item fixed-buffer
6198
6199 @item resizing-buffer
6200
6201 @item dynarr
6202
6203 @item lisp-buffer
6204
6205 @item print
6206
6207 @item decoding
6208
6209 @item encoding
6210 @end table
6211
6212 @node Lstream Functions
6213 @section Lstream Functions
6214
6215 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
6216 Allocate and return a new Lstream. This function is not really meant to
6217 be called directly; rather, each stream type should provide its own
6218 stream creation function, which creates the stream and does any other
6219 necessary creation stuff (e.g. opening a file).
6220 @end deftypefun
6221
6222 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
6223 Change the buffering of a stream. See @file{lstream.h}. By default the
6224 buffering is @code{STREAM_BLOCK_BUFFERED}.
6225 @end deftypefun
6226
6227 @deftypefun int Lstream_flush (Lstream *@var{lstr})
6228 Flush out any pending unwritten data in the stream. Clear any buffered
6229 input data. Returns 0 on success, -1 on error.
6230 @end deftypefun
6231
6232 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
6233 Write out one byte to the stream. This is a macro and so it is very
6234 efficient. The @var{c} argument is only evaluated once but the @var{stream}
6235 argument is evaluated more than once. Returns 0 on success, -1 on
6236 error.
6237 @end deftypefn
6238
6239 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
6240 Read one byte from the stream. This is a macro and so it is very
6241 efficient. The @var{stream} argument is evaluated more than once. Return
6242 value is -1 for EOF or error.
6243 @end deftypefn
6244
6245 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
6246 Push one byte back onto the input queue. This will be the next byte
6247 read from the stream. Any number of bytes can be pushed back and will
6248 be read in the reverse order they were pushed back -- most recent
6249 first. (This is necessary for consistency -- if there are a number of
6250 bytes that have been unread and I read and unread a byte, it needs to be
6251 the first to be read again.) This is a macro and so it is very
6252 efficient. The @var{c} argument is only evaluated once but the @var{stream}
6253 argument is evaluated more than once.
6254 @end deftypefn
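
The last-in, first-out behaviour of pushed-back bytes can be modelled with a toy stream. This is not the lstream implementation: the @code{toy_stream} type, its functions, and the fixed 16-byte pushback limit are all inventions of this sketch (the real interface places no such limit on pushback).

```c
#include <stddef.h>

#define TOY_PUSHBACK_MAX 16

/* A toy input stream over a fixed block of memory, with a small
   stack of pushed-back bytes. */
struct toy_stream
{
  const unsigned char *data;
  size_t len;
  size_t pos;
  unsigned char unget[TOY_PUSHBACK_MAX];
  size_t unget_count;
};

/* Return the next byte, or -1 at end of data.  Pushed-back bytes are
   returned first, most recently pushed first. */
static int
toy_getc (struct toy_stream *s)
{
  if (s->unget_count > 0)
    return s->unget[--s->unget_count];
  if (s->pos < s->len)
    return s->data[s->pos++];
  return -1;
}

/* Push C back so that it is the next byte read.  Return 0 on
   success, -1 if the toy pushback stack is full. */
static int
toy_ungetc (struct toy_stream *s, int c)
{
  if (s->unget_count == TOY_PUSHBACK_MAX)
    return -1;
  s->unget[s->unget_count++] = (unsigned char) c;
  return 0;
}
```

Reading a byte and immediately pushing it back leaves the stream exactly as it was, which is the consistency requirement mentioned above.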
6255
6256 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
6257 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
6258 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
6259 Function equivalents of the above macros.
6260 @end deftypefun
6261
6262 @deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size})
6263 Read @var{size} bytes of @var{data} from the stream. Return the number
6264 of bytes read. 0 means EOF. -1 means an error occurred and no bytes
6265 were read.
6266 @end deftypefun
6267
6268 @deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size})
6269 Write @var{size} bytes of @var{data} to the stream. Return the number
6270 of bytes written. -1 means an error occurred and no bytes were written.
6271 @end deftypefun
6272
6273 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size})
6274 Push back @var{size} bytes of @var{data} onto the input queue. The next
6275 call to @code{Lstream_read()} with the same size will read the same
6276 bytes back. Note that this will be the case even if there is other
6277 pending unread data.
6278 @end deftypefun
6279
6280 @deftypefun int Lstream_close (Lstream *@var{stream})
6281 Close the stream. All data will be flushed out.
6282 @end deftypefun
6283
6284 @deftypefun void Lstream_reopen (Lstream *@var{stream})
6285 Reopen a closed stream. This enables I/O on it again. This is not
6286 meant to be called except from a wrapper routine that reinitializes
6287 variables and such -- the close routine may well have freed some
6288 necessary storage structures, for example.
6289 @end deftypefun
6290
6291 @deftypefun void Lstream_rewind (Lstream *@var{stream})
6292 Rewind the stream to the beginning.
6293 @end deftypefun
6294
6295 @node Lstream Methods
6296 @section Lstream Methods
6297
6298 @deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size})
6299 Read some data from the stream's end and store it into @var{data}, which
6300 can hold @var{size} bytes. Return the number of bytes read. A return
6301 value of 0 means no bytes can be read at this time. This may be because
6302 of an EOF, or because there is a granularity greater than one byte that
6303 the stream imposes on the returned data, and @var{size} is less than
6304 this granularity. (This will happen frequently for streams that need to
6305 return whole characters, because @code{Lstream_read()} calls the reader
6306 function repeatedly until it has the number of bytes it wants or until 0
6307 is returned.) The lstream functions do not treat a 0 return as EOF or
6308 do anything special; however, the calling function will interpret any 0
6309 it gets back as EOF. This will normally not happen unless the caller
6310 calls @code{Lstream_read()} with a very small size.
6311
6312 This function can be @code{NULL} if the stream is output-only.
6313 @end deftypefn
6314
6315 @deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size})
6316 Send some data to the stream's end. Data to be sent is in @var{data}
6317 and is @var{size} bytes. Return the number of bytes sent. This
6318 function can send and return fewer bytes than is passed in; in that
6319 case, the function will just be called again until there is no data left
6320 or 0 is returned. A return value of 0 means that no more data can be
6321 currently stored, but there is no error; the data will be squirreled
6322 away until the writer can accept data. (This is useful, e.g., if you're
6323 dealing with a non-blocking file descriptor and are getting
6324 @code{EWOULDBLOCK} errors.) This function can be @code{NULL} if the
6325 stream is input-only.
6326 @end deftypefn
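
The calling convention for a writer method, where the method is called repeatedly until the data runs out or it returns 0, might look like this in outline. The types and names are hypothetical, and treating a negative return value as an error is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical signature for a writer method in this sketch. */
typedef int (*writer_fn) (void *stream_end, const unsigned char *data,
                          int size);

/* Call WRITER until all SIZE bytes are sent or it returns 0.  Return
   the number of bytes sent, or -1 if an error occurs before anything
   was sent. */
static int
write_all (writer_fn writer, void *stream_end,
           const unsigned char *data, int size)
{
  int sent = 0;

  while (sent < size)
    {
      int n = writer (stream_end, data + sent, size - sent);
      if (n < 0)
        return sent > 0 ? sent : -1;
      if (n == 0)
        break;                  /* no room right now, but not an error */
      sent += n;
    }
  return sent;
}

/* Example stream end: an 8-byte sink that accepts at most 3 bytes
   per call. */
struct small_sink
{
  unsigned char buf[8];
  int used;
};

static int
small_sink_writer (void *stream_end, const unsigned char *data, int size)
{
  struct small_sink *sink = stream_end;
  int room = (int) sizeof sink->buf - sink->used;
  int n = size < 3 ? size : 3;
  int i;

  if (n > room)
    n = room;
  for (i = 0; i < n; i++)
    sink->buf[sink->used++] = data[i];
  return n;
}
```

A caller in the position of @code{write_all} would keep the unsent remainder and try again later.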
6327
6328 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
6329 Rewind the stream. If this is @code{NULL}, the stream is not seekable.
6330 @end deftypefn
6331
6332 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
6333 Indicate whether this stream is seekable -- i.e. it can be rewound.
6334 This method is ignored if the stream does not have a rewind method. If
6335 this method is not present, the result is determined by whether a rewind
6336 method is present.
6337 @end deftypefn
6338
6339 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
6340 Perform any additional operations necessary to flush the data in this
6341 stream.
6342 @end deftypefn
6343
6344 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
6345 @end deftypefn
6346
6347 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
6348 Perform any additional operations necessary to close this stream down.
6349 May be @code{NULL}. This function is called when @code{Lstream_close()}
6350 is called or when the stream is garbage-collected. When this function
6351 is called, all pending data in the stream will already have been written
6352 out.
6353 @end deftypefn
6354
6355 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
6356 Mark this object for garbage collection. Same semantics as a standard
6357 @code{Lisp_Object} marker. This function can be @code{NULL}.
6358 @end deftypefn
6359
6360 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
6361 @chapter Consoles; Devices; Frames; Windows
6362
6363 @menu
6364 * Introduction to Consoles; Devices; Frames; Windows::
6365 * Point::
6366 * Window Hierarchy::
6367 * The Window Object::
6368 @end menu
6369
6370 @node Introduction to Consoles; Devices; Frames; Windows
6371 @section Introduction to Consoles; Devices; Frames; Windows
6372
6373 A window-system window that you see on the screen is called a
6374 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or
6375 more non-overlapping panes, called (confusingly) @dfn{windows}. Each
6376 window displays the text of a buffer in it. (See above on Buffers.) Note
6377 that buffers and windows are independent entities: Two or more windows
6378 can be displaying the same buffer (potentially in different locations),
6379 and a buffer can be displayed in no windows.
6380
6381 A single display screen that contains one or more frames is called
6382 a @dfn{display}. Under most circumstances, there is only one display.
6383 However, more than one display can exist, for example if you have
6384 a @dfn{multi-headed} console, i.e. one with a single keyboard but
6385 multiple displays. (Typically in such a situation, the various
6386 displays act like one large display, in that the mouse is only
6387 in one of them at a time, and moving the mouse off of one moves
6388 it into another.) In some cases, the different displays will
6389 have different characteristics, e.g. one color and one mono.
6390
6391 XEmacs can display frames on multiple displays. It can even deal
6392 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
6393 XEmacs terminology). Here is one case where this might be useful: You
6394 are using XEmacs on your workstation at work, and leave it running.
6395 Then you go home and dial in on a TTY line, and you can use the
6396 already-running XEmacs process to display another frame on your local
6397 TTY.
6398
6399 Thus, there is a hierarchy console -> display -> frame -> window.
6400 There is a separate Lisp object type for each of these four concepts.
6401 Furthermore, there is logically a @dfn{selected console},
6402 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
6403 This particular object is distinguished in various ways, such as
6404 that it is the default object for various functions that act
6405 on objects of that type. Note that every containing object
6406 remembers the ``selected'' object among the objects that it
6407 contains: e.g. not only is there a selected window, but
6408 every frame remembers the last window in it that was selected,
6409 and changing the selected frame causes the remembered window
6410 within it to become the selected window. Similar relationships
6411 apply for consoles to devices and devices to frames.
6412
6413 @node Point
6414 @section Point
6415
6416 Recall that every buffer has a current insertion position, called
6417 @dfn{point}. Now, two or more windows may be displaying the same buffer,
6418 and the text cursor in the two windows (i.e. @code{point}) can be in
6419 two different places. You may ask, how can that be, since each
6420 buffer has only one value of @code{point}? The answer is that each window
6421 also has a value of @code{point} that is squirreled away in it. There
6422 is only one selected window, and the value of @code{point} in its buffer
6423 corresponds to that window. When the selected window is changed
6424 from one window to another displaying the same buffer, the old
6425 value of @code{point} is stored into the old window's ``point'' and the
6426 value of @code{point} from the new window is retrieved and made the
6427 value of @code{point} in the buffer. This means that @code{window-point}
6428 for the selected window is potentially inaccurate, and if you
6429 want to retrieve the correct value of @code{point} for a window,
6430 you must special-case on the selected window and retrieve the
6431 buffer's point instead. This is related to why @code{save-window-excursion}
6432 does not save the selected window's value of @code{point}.
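
The bookkeeping described above can be sketched with two toy structures. The type and function names are hypothetical and the real window and buffer objects hold far more state; @code{pointm} is used here as the name of a window's stashed point.

```c
#include <stddef.h>

/* Toy stand-ins for the real buffer and window objects. */
struct toy_buffer
{
  long point;                   /* the buffer's single value of point */
};

struct toy_window
{
  struct toy_buffer *buffer;
  long pointm;                  /* this window's stashed point */
};

/* Return the correct point for window W: the buffer's point if W is
   the selected window, otherwise W's stashed value. */
static long
effective_window_point (const struct toy_window *w,
                        const struct toy_window *selected)
{
  return (w == selected) ? w->buffer->point : w->pointm;
}

/* Make NEW_WINDOW the selected window: stash the buffer's point in
   the old window, then load the new window's stashed point into its
   buffer. */
static void
select_toy_window (struct toy_window **selected,
                   struct toy_window *new_window)
{
  struct toy_window *old = *selected;

  if (old != NULL)
    old->pointm = old->buffer->point;
  new_window->buffer->point = new_window->pointm;
  *selected = new_window;
}
```

While a window is selected its stashed value goes stale, which is why @code{effective_window_point} has to special-case it.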
6433
6434 @node Window Hierarchy
6435 @section Window Hierarchy
6436 @cindex window hierarchy
6437 @cindex hierarchy of windows
6438
6439 If a frame contains multiple windows (panes), they are always created
6440 by splitting an existing window along the horizontal or vertical axis.
6441 Terminology is a bit confusing here: to @dfn{split a window
6442 horizontally} means to create two side-by-side windows, i.e. to make a
6443 @emph{vertical} cut in a window. Likewise, to @dfn{split a window
6444 vertically} means to create two windows, one above the other, by making
6445 a @emph{horizontal} cut.
6446
6447 If you split a window and then split again along the same axis, you
6448 will end up with a number of panes all arranged along the same axis.
6449 The precise way in which the splits were made should not be important,
6450 and this is reflected internally. Internally, all windows are arranged
6451 in a tree, consisting of two types of windows, @dfn{combination} windows
6452 (which have children, and are covered completely by those children) and
6453 @dfn{leaf} windows, which have no children and are visible. Every
6454 combination window has two or more children, all arranged along the same
6455 axis. There are (logically) two subtypes of combination window, depending
6456 on whether their children are horizontally or vertically arrayed. There is
6457 always one root window, which is either a leaf window (if the frame
6458 contains only one window) or a combination window (if the frame contains
6459 more than one window). In the latter case, the root window will have
6460 two or more children, either horizontally or vertically arrayed, and
6461 each of those children will be either a leaf window or another
6462 combination window.
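
A minimal C model of such a tree, using the @code{hchild}, @code{vchild}, @code{buffer}, and @code{next} field names that this section uses for window objects, might look like the following. The structure is a toy (the buffer is just a name string) and the function @code{count_leaf_windows} is hypothetical.

```c
#include <stddef.h>

/* Toy window: exactly one of hchild, vchild, and buffer is non-NULL. */
struct sketch_window
{
  struct sketch_window *hchild; /* first of the side-by-side children */
  struct sketch_window *vchild; /* first of the stacked children */
  struct sketch_window *next;   /* next sibling under the same parent */
  const char *buffer;           /* buffer name, for leaf windows only */
};

/* Count the leaf (visible) windows in the tree rooted at W. */
static int
count_leaf_windows (const struct sketch_window *w)
{
  const struct sketch_window *child;
  int total = 0;

  if (w->buffer != NULL)
    return 1;                   /* a leaf window */

  /* A combination window: walk its list of children. */
  child = w->hchild != NULL ? w->hchild : w->vchild;
  for (; child != NULL; child = child->next)
    total += count_leaf_windows (child);
  return total;
}
```

For example, a frame split side by side, with the right-hand pane then split into two stacked panes, has a horizontal combination window at the root, one leaf child, and one vertical combination child with two leaves.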
6463
6464 Here are some rules:
6465
6466 @enumerate
6467 @item
6468 Horizontal combination windows can never have children that
6469 are horizontal combination windows; same for vertical.
6470
6471 @item
6472 Only leaf windows can be split (obviously) and this splitting does one
6473 of two things: (a) turns the leaf window into a combination window and
6474 creates two new leaf children, or (b) turns the leaf window into one of
6475 the two new leaves and creates the other leaf. Rule (1) dictates which
6476 of these two outcomes happens.
6477
6478 @item
6479 Every combination window must have at least two children.
6480
6481 @item
6482 Leaf windows can never become combination windows. They can be deleted,
6483 however. If this results in a violation of (3), the parent combination
6484 window also gets deleted.
6485
6486 @item
6487 All functions that accept windows must be prepared to accept combination
6488 windows and do something sane with them (e.g. signal an error).
6489 Combination windows @emph{do} escape to the Lisp level.
6490
6491 @item
6492 All windows have three fields governing their contents:
6493 these are @dfn{hchild} (a list of horizontally-arrayed children),
6494 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
6495 (the buffer contained in a leaf window). Exactly one of
6496 these will be non-nil. Remember that @dfn{horizontally-arrayed}
6497 means ``side-by-side'' and @dfn{vertically-arrayed} means
6498 ``one above the other''.
6499
6500 @item
6501 Leaf windows also have markers in their @code{start} (the
6502 first buffer position displayed in the window) and @code{pointm}
6503 (the window's stashed value of @code{point} -- see above) fields,
6504 while combination windows have nil in these fields.
6505
6506 @item
6507 The list of children for a window is threaded through the
6508 @code{next} and @code{prev} fields of each child window.
6509
6510 @item
6511 @strong{Deleted windows can be undeleted}. This happens as a result of
6512 restoring a window configuration, and is unlike frames, displays, and
6513 consoles, which, once deleted, can never be restored. Deleting a window
6514 does nothing except set a special @code{dead} bit to 1 and clear out the
6515 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
6516 GC purposes.
6517
6518 @item
6519 Most frames actually have two top-level windows -- one for the
6520 minibuffer and one (the @dfn{root}) for everything else. The modeline
6521 (if present) separates these two. The @code{next} field of the root
6522 points to the minibuffer, and the @code{prev} field of the minibuffer
6523 points to the root. The other @code{next} and @code{prev} fields are
6524 @code{nil}, and the frame points to both of these windows.
6525 Minibuffer-less frames have no minibuffer window, and the @code{next}
6526 and @code{prev} of the root window are @code{nil}. Minibuffer-only
6527 frames have no root window, and the @code{next} of the minibuffer window
6528 is @code{nil} but the @code{prev} points to itself. (#### This is an
6529 artifact that should be fixed.)
6530 @end enumerate
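
Rules (1) through (4) can be sketched in C. The following is a minimal
model, not the code in @file{window.c}: @code{struct win},
@code{win_new_leaf} and @code{win_split_leaf} are hypothetical names, a
single @code{first_child} pointer stands in for the @code{hchild} and
@code{vchild} fields, and an integer stands in for the buffer. It also
assumes one particular reading of case (a) of rule (2), chosen so that
rule (4) stays true: a fresh combination window takes the leaf's place
in the tree, and the leaf becomes its first child.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified window node (not the real struct window). */
enum win_kind { WIN_LEAF, WIN_HCOMB, WIN_VCOMB };

struct win {
    enum win_kind kind;
    struct win *parent;
    struct win *first_child;  /* stands in for hchild / vchild */
    struct win *next, *prev;  /* sibling chain, as in rule (8) */
    int buffer_id;            /* stands in for the buffer field; leaves only */
};

static struct win *win_alloc(enum win_kind kind)
{
    struct win *w = calloc(1, sizeof *w);  /* all links start out NULL */
    if (w == NULL)
        abort();
    w->kind = kind;
    return w;
}

struct win *win_new_leaf(int buffer_id)
{
    struct win *w = win_alloc(WIN_LEAF);
    w->buffer_id = buffer_id;
    return w;
}

/* Split LEAF along AXIS (WIN_HCOMB: side by side, WIN_VCOMB: one above
   the other) and return the new leaf, which becomes LEAF's next sibling. */
struct win *win_split_leaf(struct win *leaf, enum win_kind axis)
{
    struct win *fresh;

    assert(leaf->kind == WIN_LEAF);
    assert(axis == WIN_HCOMB || axis == WIN_VCOMB);

    if (leaf->parent == NULL || leaf->parent->kind != axis) {
        /* Case (a): put a new combination window where LEAF was. */
        struct win *comb = win_alloc(axis);
        comb->parent = leaf->parent;
        comb->prev = leaf->prev;
        comb->next = leaf->next;
        if (comb->prev != NULL)
            comb->prev->next = comb;
        else if (comb->parent != NULL)
            comb->parent->first_child = comb;
        if (comb->next != NULL)
            comb->next->prev = comb;
        comb->first_child = leaf;
        leaf->parent = comb;
        leaf->prev = leaf->next = NULL;
    }

    /* Case (b), and the second half of case (a): add a sibling leaf. */
    fresh = win_new_leaf(leaf->buffer_id);
    fresh->parent = leaf->parent;
    fresh->prev = leaf;
    fresh->next = leaf->next;
    if (fresh->next != NULL)
        fresh->next->prev = fresh;
    leaf->next = fresh;
    return fresh;
}
```

In this model, splitting along the axis of the existing parent only adds
a sibling, so repeated splits along one axis produce a flat row of
children under a single combination window, and rule (1) is never
violated.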
6531
6532 @node The Window Object
6533 @section The Window Object
6534
6535 Windows have the following accessible fields:
6536
6537 @table @code
6538 @item frame
6539 The frame that this window is on.
6540
6541 @item mini_p
6542 Non-@code{nil} if this window is a minibuffer window.
6543
6544 @item buffer
6545 The buffer that the window is displaying. This may change often during
6546 the life of the window.
6547
6548 @item dedicated
6549 Non-@code{nil} if this window is dedicated to its buffer.
6550
6551 @item pointm
6552 @cindex window point internals
6553 This is the value of point in the current buffer when this window is
6554 selected; when it is not selected, it retains its previous value.
6555
6556 @item start
6557 The position in the buffer that is the first character to be displayed
6558 in the window.
6559
6560 @item force_start
6561 If this flag is non-@code{nil}, it says that the window has been
6562 scrolled explicitly by the Lisp program. This affects what the next
6563 redisplay does if point is off the screen: instead of scrolling the
6564 window to show the text around point, it moves point to a location that
6565 is on the screen.
6566
6567 @item last_modified
6568 The @code{modified} field of the window's buffer, as of the last time
6569 a redisplay completed in this window.
6570
6571 @item last_point
6572 The buffer's value of point, as of the last time
6573 a redisplay completed in this window.
6574
6575 @item left
6576 This is the left-hand edge of the window, measured in columns. (The
6577 leftmost column on the screen is @w{column 0}.)
6578
6579 @item top
6580 This is the top edge of the window, measured in lines. (The top line on
6581 the screen is @w{line 0}.)
6582
6583 @item height
6584 The height of the window, measured in lines.
6585
6586 @item width
6587 The width of the window, measured in columns.
6588
6589 @item next
6590 This is the window that is the next in the chain of siblings. It is
6591 @code{nil} in a window that is the rightmost or bottommost of a group of
6592 siblings.
6593
6594 @item prev
6595 This is the window that is the previous in the chain of siblings. It is
6596 @code{nil} in a window that is the leftmost or topmost of a group of
6597 siblings.
6598
6599 @item parent
6600 Internally, XEmacs arranges windows in a tree; each group of siblings has
6601 a parent window whose area includes all the siblings. This field points
6602 to a window's parent.
6603
6604 Parent windows do not display buffers, and play little role in display
6605 except to shape their child windows. Emacs Lisp programs usually have
6606 no access to the parent windows; they operate on the windows at the
6607 leaves of the tree, which actually display buffers.
6608
6609 @item hscroll
6610 This is the number of columns that the display in the window is scrolled
6611 horizontally to the left. Normally, this is 0.
6612
6613 @item use_time
6614 This is the last time that the window was selected. The function
6615 @code{get-lru-window} uses this field.
6616
6617 @item display_table
6618 The window's display table, or @code{nil} if none is specified for it.
6619
6620 @item update_mode_line
6621 Non-@code{nil} means this window's mode line needs to be updated.
6622
6623 @item base_line_number
6624 The line number of a certain position in the buffer, or @code{nil}.
6625 This is used for displaying the line number of point in the mode line.
6626
6627 @item base_line_pos
6628 The position in the buffer for which the line number is known, or
6629 @code{nil} meaning none is known.
6630
6631 @item region_showing
6632 If the region (or part of it) is highlighted in this window, this field
6633 holds the mark position that made one end of that region. Otherwise,
6634 this field is @code{nil}.
6635 @end table
6636
6637 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
6638 @chapter The Redisplay Mechanism
6639
6640 The redisplay mechanism is one of the most complicated sections of
6641 XEmacs, especially from a conceptual standpoint. This is doubly so
6642 because, unlike for the basic aspects of the Lisp interpreter, the
6643 computer science theories of how to efficiently handle redisplay are not
6644 well-developed.
6645
6646 When working with the redisplay mechanism, remember the Golden Rules
6647 of Redisplay:
6648
6649 @enumerate
6650 @item
6651 It Is Better To Be Correct Than Fast.
6652 @item
6653 Thou Shalt Not Run Elisp From Within Redisplay.
6654 @item
6655 It Is Better To Be Fast Than Not To Be.
6656 @end enumerate
6657
6658 @menu
6659 * Critical Redisplay Sections::
6660 * Line Start Cache::
6661 @end menu
6662
6663 @node Critical Redisplay Sections
6664 @section Critical Redisplay Sections
6665 @cindex critical redisplay sections
6666
6667 Within a critical redisplay section, we are defenseless and assume that the
6668 following cannot happen:
6669
6670 @enumerate
6671 @item
6672 garbage collection
6673 @item
6674 Lisp code evaluation
6675 @item
6676 frame size changes
6677 @end enumerate
6678
6679 We ensure (3) by calling @code{hold_frame_size_changes()}, which
6680 will cause any pending frame size changes to get put on hold
6681 till after the end of the critical section. (1) follows
6682 automatically if (2) is met. #### Unfortunately, there are
6683 some places where Lisp code can be called within this section.
6684 We need to remove them.
6685
6686 If @code{Fsignal()} is called during this critical section, we
6687 will @code{abort()}.
6688
6689 If garbage collection is called during this critical section,
6690 we simply return. #### We should abort instead.
6691
6692 #### If a frame-size change does occur we should probably
6693 actually be preempting redisplay.
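
The deferral of frame size changes can be modelled with a hold counter.
This is only a sketch under stated assumptions: @code{struct
frame_model} and the three @code{model_} functions are hypothetical
names, and nothing here reproduces the real
@code{hold_frame_size_changes()} or the data structures it works on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a frame's size state. */
struct frame_model {
    int width, height;        /* size currently in effect */
    int hold_depth;           /* > 0 while a critical section is active */
    bool change_pending;      /* a request arrived while held */
    int pending_width, pending_height;
};

/* Enter a critical section: later size requests are only recorded. */
void model_hold_size_changes(struct frame_model *f)
{
    f->hold_depth++;
}

/* A size change request, e.g. from a window-system event. */
void model_request_size(struct frame_model *f, int width, int height)
{
    if (f->hold_depth > 0) {
        f->change_pending = true;      /* the latest request wins */
        f->pending_width = width;
        f->pending_height = height;
    } else {
        f->width = width;
        f->height = height;
    }
}

/* Leave a critical section; apply a deferred change once none is active. */
void model_unhold_size_changes(struct frame_model *f)
{
    assert(f->hold_depth > 0);
    if (--f->hold_depth == 0 && f->change_pending) {
        f->width = f->pending_width;
        f->height = f->pending_height;
        f->change_pending = false;
    }
}
```

With this shape, code inside the critical section always sees the size
that was in effect when the section began.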
6694
6695 @node Line Start Cache
6696 @section Line Start Cache
6697 @cindex line start cache
6698
6699 The traditional scrolling code in Emacs breaks in a variable height
6700 world. It depends on the key assumption that the number of lines that
6701 can be displayed at any given time is fixed. This led to a complete
6702 separation of the scrolling code from the redisplay code. In order to
6703 fully support variable height lines, the scrolling code must actually be
6704 tightly integrated with redisplay. Only redisplay can determine how
6705 many lines will be displayed on a screen for any given starting point.
6706
6707 What is ideally wanted is a complete list of the starting buffer
6708 position for every possible display line of a buffer along with the
6709 height of that display line. Maintaining such a full list would be very
6710 expensive. We settle for having it include information for all areas
6711 which we happen to generate anyhow (i.e. the region currently being
6712 displayed) and for those areas we need to work with.
6713
6714 In order to ensure that the cache accurately represents what redisplay
6715 would actually show, it is necessary to invalidate it in many
6716 situations. If the buffer changes, the starting positions may no longer
6717 be correct. If a face or an extent has changed then the line heights
6718 may have altered. These events happen frequently enough that the cache
6719 can end up being constantly invalidated. With this potentially constant
6720 invalidation, when is the cache ever useful?
6721
6722 Even if the cache is invalidated before every single usage, it is
6723 necessary. Scrolling often requires knowledge about display lines which
6724 are actually above or below the visible region. The cache provides a
6725 convenient light-weight method of storing this information for multiple
6726 display regions. This knowledge is necessary for the scrolling code to
6727 always obey the First Golden Rule of Redisplay.
6728
6729 If the cache already contains all of the information that the scrolling
6730 routines happen to need so that it doesn't have to go generate it, then
6731 we are able to obey the Third Golden Rule of Redisplay. The first thing
6732 we do to help out the cache is to always add the displayed region. This
6733 region had to be generated anyway, so the cache ends up getting the
6734 information basically for free. In those cases where a user is simply
6735 scrolling around viewing a buffer there is a high probability that this
6736 is sufficient to always provide the needed information. The second
6737 thing we can do is be smart about invalidating the cache.
6738
6739 TODO -- Be smart about invalidating the cache. Potential places:
6740
6741 @itemize @bullet
6742 @item
6743 Insertions at end-of-line which don't cause line-wraps do not alter the
6744 starting positions of any display lines. These types of buffer
6745 modifications should not invalidate the cache. This is actually a large
6746 optimization for redisplay speed as well.
6747 @item
6748 Buffer modifications frequently only affect the display of lines at and
6749 below where they occur. In these situations we should only invalidate
6750 the part of the cache starting at where the modification occurs.
6751 @end itemize
6752
6753 In case you're wondering, the Second Golden Rule of Redisplay is not
6754 applicable.
6755
6756 @node Extents, Faces and Glyphs, The Redisplay Mechanism, Top
6757 @chapter Extents
6758
6759 @menu
6760 * Introduction to Extents:: Extents are ranges over text, with properties.
6761 * Extent Ordering:: How extents are ordered internally.
6762 * Format of the Extent Info:: The extent information in a buffer or string.
6763 * Zero-Length Extents:: A weird special case.
6764 * Mathematics of Extent Ordering:: A rigorous foundation.
6765 * Extent Fragments:: Cached information useful for redisplay.
6766 @end menu
6767
6768 @node Introduction to Extents
6769 @section Introduction to Extents
6770
6771 Extents are regions over a buffer, with a start and an end position
6772 denoting the region of the buffer included in the extent. In
6773 addition, either end can be closed or open, meaning that the endpoint
6774 is or is not logically included in the extent. Insertion of a character
6775 at a closed endpoint causes the character to go inside the extent;
6776 insertion at an open endpoint causes the character to go outside.
6777
6778 Extent endpoints are stored using memory indices (see @file{insdel.c}),
6779 to minimize the amount of adjusting that needs to be done when
6780 characters are inserted or deleted.
6781
6782 (Formerly, extent endpoints at the gap could be either before or
6783 after the gap, depending on the open/closedness of the endpoint.
6784 The intent of this was to make it so that insertions would
6785 automatically go inside or out of extents as necessary with no
6786 further work needing to be done. It didn't work out that way,
6787 however, and just ended up complexifying and buggifying all the
6788 rest of the code.)
6789
6790 @node Extent Ordering
6791 @section Extent Ordering
6792
6793 Extents are compared using memory indices. There are two orderings
6794 for extents and both orders are kept current at all times. The normal
6795 or @dfn{display} order is as follows:
6796
6797 @example
6798 Extent A is ``less than'' extent B, that is, earlier in the display order,
6799 if: A-start < B-start,
6800 or if: A-start = B-start, and A-end > B-end
6801 @end example
6802
6803 So if two extents begin at the same position, the larger of them is the
6804 earlier one in the display order (@code{EXTENT_LESS} is true).
6805
6806 For the e-order, the same thing holds:
6807
6808 @example
6809 Extent A is ``less than'' extent B, that is, earlier in the e-order,
6810 if: A-end < B-end,
6811 or if: A-end = B-end, and A-start > B-start
6812 @end example
6813
6814 So if two extents end at the same position, the smaller of them is the
6815 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
6816
6817 The display order and the e-order are complementary orders: any
6818 theorem about the display order also applies to the e-order if you swap
6819 all occurrences of ``display order'' and ``e-order'', ``less than'' and
6820 ``greater than'', and ``extent start'' and ``extent end''.
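
Both orderings translate directly into comparison predicates. The sketch
below only restates the two definitions; @code{struct ext},
@code{ext_less} and @code{ext_e_less} are hypothetical stand-ins, not
the real extent structure or the @code{EXTENT_LESS} and
@code{EXTENT_E_LESS} macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent: just its two endpoints. */
struct ext {
    long start, end;
};

/* Display order: earlier start first; for equal starts, later end first. */
bool ext_less(const struct ext *a, const struct ext *b)
{
    return a->start < b->start
        || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; for equal ends, later start first. */
bool ext_e_less(const struct ext *a, const struct ext *b)
{
    return a->end < b->end
        || (a->end == b->end && a->start > b->start);
}
```

Note how @code{ext_e_less} is obtained from @code{ext_less} purely by
swapping the roles of @code{start} and @code{end}.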
6821
6822 @node Format of the Extent Info
6823 @section Format of the Extent Info
6824
6825 An extent-info structure consists of a list of the buffer or string's
6826 extents and a @dfn{stack of extents} that lists all of the extents over
6827 a particular position. The stack-of-extents info is used for
6828 optimization purposes -- it basically caches some info that might
6829 be expensive to compute. Certain otherwise hard computations are easy
6830 given the stack of extents over a particular position, and if the
6831 stack of extents over a nearby position is known (because it was
6832 calculated at some prior point in time), it's easy to move the stack
6833 of extents to the proper position.
6834
6835 Given that the stack of extents is an optimization, and given that
6836 it requires memory, a string's stack of extents is wiped out each
6837 time a garbage collection occurs. Therefore, any time you retrieve
6838 the stack of extents, it might not be there. If you need it to
6839 be there, use the @code{_force} version.
6840
6841 Similarly, a string may or may not have an extent_info structure.
6842 (Generally it won't if there haven't been any extents added to the
6843 string.) So use the @code{_force} version if you need the extent_info
6844 structure to be there.
6845
6846 A list of extents is maintained as a double gap array: one gap array
6847 is ordered by start index (the @dfn{display order}) and the other is
6848 ordered by end index (the @dfn{e-order}). Note that positions in an
6849 extent list should logically be conceived of as referring @emph{to} a
6850 particular extent (as is the norm in programs) rather than sitting
6851 between two extents. Note also that callers of these functions should
6852 not be aware of the fact that the extent list is implemented as an
6853 array, except for the fact that positions are integers (this should be
6854 generalized to handle integers and linked lists equally well).
6855
6856 @node Zero-Length Extents
6857 @section Zero-Length Extents
6858
6859 Extents can be zero-length, and will end up that way if their endpoints
6860 are explicitly set that way or if their detachable property is nil
6861 and all the text in the extent is deleted. (The exception is open-open
6862 zero-length extents, which are barred from existing because there is
6863 no sensible way to define their properties. Deletion of the text in
6864 an open-open extent causes it to be converted into a closed-open
6865 extent.) Zero-length extents are primarily used to represent
6866 annotations, and behave as follows:
6867
6868 @enumerate
6869 @item
6870 Insertion at the position of a zero-length extent expands the extent
6871 if both endpoints are closed; goes after the extent if it is closed-open;
6872 and goes before the extent if it is open-closed.
6873
6874 @item
6875 Deletion of a character on a side of a zero-length extent whose
6876 corresponding endpoint is closed causes the extent to be detached if
6877 it is detachable; if the extent is not detachable or the corresponding
6878 endpoint is open, the extent remains in the buffer, moving as necessary.
6879 @end enumerate
6880
6881 Note that closed-open, non-detachable zero-length extents behave
6882 exactly like markers and that open-closed, non-detachable zero-length
6883 extents behave like the ``point-type'' marker in Mule.
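
The two rules amount to a small decision table. The following sketch
encodes them as stated above; the enumeration and both function names
are hypothetical and do not come from @file{extents.c}.

```c
#include <assert.h>
#include <stdbool.h>

/* What an insertion at the position of a zero-length extent does. */
enum insert_effect {
    INSERT_EXPANDS_EXTENT,   /* closed-closed */
    INSERT_GOES_AFTER,       /* closed-open */
    INSERT_GOES_BEFORE       /* open-closed */
};

/* Rule 1.  Open-open zero-length extents cannot exist, so that
   combination is rejected. */
enum insert_effect zero_length_insert_effect(bool start_open, bool end_open)
{
    assert(!(start_open && end_open));
    if (!start_open && !end_open)
        return INSERT_EXPANDS_EXTENT;
    if (end_open)
        return INSERT_GOES_AFTER;
    return INSERT_GOES_BEFORE;
}

/* Rule 2.  SIDE_CLOSED says whether the endpoint on the side of the
   deleted character is closed.  Returns true if the extent is detached,
   false if it stays in the buffer. */
bool zero_length_detached_by_delete(bool side_closed, bool detachable)
{
    return side_closed && detachable;
}
```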
6884
6885 @node Mathematics of Extent Ordering
6886 @section Mathematics of Extent Ordering
6887 @cindex extent mathematics
6888 @cindex mathematics of extents
6889 @cindex extent ordering
6890
6891 @cindex display order of extents
6892 @cindex extents, display order
6893 The extents in a buffer are ordered by ``display order'' because that
6894 is the order that the redisplay mechanism needs to process them in.
6895 The e-order is an auxiliary ordering used to facilitate operations
6896 over extents. The operations that can be performed on the ordered
6897 list of extents in a buffer are:
6898
6899 @enumerate
6900 @item
6901 Locate where an extent would go if inserted into the list.
6902 @item
6903 Insert an extent into the list.
6904 @item
6905 Remove an extent from the list.
6906 @item
6907 Map over all the extents that overlap a range.
6908 @end enumerate
6909
6910 (4) requires being able to determine the first and last extents
6911 that overlap a range.
6912
6913 NOTE: @dfn{overlap} is used as follows:
6914
6915 @itemize @bullet
6916 @item
6917 Two ranges overlap if they have at least one point in common.
6918 Whether the endpoints are open or closed makes a difference here.
6919 @item
6920 A point overlaps a range if the point is contained within the
6921 range; this is equivalent to treating a point @math{P} as the range
6922 @math{[P, P]}.
6923 @item
6924 In the case of an @emph{extent} overlapping a point or range, the extent
6925 is normally treated as having closed endpoints. This applies
6926 consistently in the discussion of stacks of extents and such below.
6927 Note that this definition of overlap is not necessarily consistent with
6928 the extents that @code{map-extents} maps over, since @code{map-extents}
6929 sometimes pays attention to whether the endpoints of an extent are open
6930 or closed. But for our purposes, it greatly simplifies things to treat
6931 all extents as having closed endpoints.
6932 @end itemize
6933
6934 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
6935 to mean comparison according to the display order. Comparison between
6936 an extent @math{E} and an index @math{I} means comparison between
6937 @math{E} and the range @math{[I, I]}.
6938
6939 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
6940 according to the e-order.
6941
6942 For any range @math{R}, define @math{R(0)} to be the starting index of
6943 the range and @math{R(1)} to be the ending index of the range.
6944
6945 For any extent @math{E}, define @math{E(next)} to be the extent directly
6946 following @math{E}, and @math{E(prev)} to be the extent directly
6947 preceding @math{E}. Assume @math{E(next)} and @math{E(prev)} can be
6948 determined from @math{E} in constant time. (This is because we store
6949 the extent list as a doubly linked list.)
6950
6951 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
6952 extents directly following and preceding @math{E} in the e-order.
6953
6954 Now:
6955
6956 Let @math{R} be a range.
6957 Let @math{F} be the first extent overlapping @math{R}.
6958 Let @math{L} be the last extent overlapping @math{R}.
6959
6960 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
6961 i.e. @math{L <= R(1) < L(next)}.
6962
6963 This follows easily from the definition of display order. The
6964 basic reason that this theorem applies is that the display order
6965 sorts by increasing starting index.
6966
6967 Therefore, we can determine @math{L} just by looking at where we would
6968 insert @math{R(1)} into the list, and if we know @math{F} and are moving
6969 forward over extents, we can easily determine when we've hit @math{L} by
6970 comparing the extent we're at to @math{R(1)}.
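
The stopping test can be sketched as follows, assuming the extents are
held in an array sorted in display order and are treated as having
closed endpoints. @code{struct ext} and @code{map_overlapping} are
hypothetical names. For simplicity the sketch scans from the head of the
list and filters on the end index; it does not use the e-order to locate
@math{F}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent: just its two endpoints. */
struct ext {
    long start, end;
};

/* EXTS holds N extents sorted in display order.  Call FN (if non-NULL)
   on every extent that overlaps the closed range [R0, R1], in display
   order, and return how many were found. */
size_t map_overlapping(const struct ext *exts, size_t n, long r0, long r1,
                       void (*fn)(const struct ext *, void *), void *arg)
{
    size_t hits = 0;
    size_t i;

    assert(r0 <= r1);
    /* Display order sorts by increasing start, so once an extent starts
       after R1, no later extent can overlap the range. */
    for (i = 0; i < n && exts[i].start <= r1; i++) {
        if (exts[i].end >= r0) {
            if (fn != NULL)
                fn(&exts[i], arg);
            hits++;
        }
    }
    return hits;
}
```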
6971
6972 @example
6973 Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
6974 @end example
6975
6976 This is the analog of Theorem 1, and applies because the e-order
6977 sorts by increasing ending index.
6978
6979 Therefore, @math{F} can be found in the same amount of time as
6980 operation (1), i.e. the time that it takes to locate where an extent
6981 would go if inserted into the e-order list.
6982
6983 If the lists were stored as balanced binary trees, then operation (1)
6984 would take logarithmic time, which is usually quite fast. However,
6985 currently they're stored as simple doubly-linked lists, and instead we
6986 do some caching to try to speed things up.
6987
6988 Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
6989 (ordered in the display order) that overlap an index @math{I}, together
6990 with the SOE's @dfn{previous} extent, which is an extent that precedes
6991 @math{I} in the e-order. (Hopefully there will not be very many extents
6992 between @math{I} and the previous extent.)
6993
6994 Now:
6995
6996 Let @math{I} be an index, let @math{S} be the stack of extents on
6997 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
6998 be @math{S}'s previous extent.
6999
7000 Theorem 3: The first extent in @math{S} is the first extent that overlaps
7001 any range @math{[I, J]}.
7002
7003 Proof: Any extent that overlaps @math{[I, J]} but does not include
7004 @math{I} must have a start index @math{> I}, and thus be greater than
7005 any extent in @math{S}.
7006
7007 Therefore, finding the first extent that overlaps a range @math{R} is
7008 the same as finding the first extent that overlaps @math{R(0)}.
7009
7010 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
7011 @math{F2} be the first extent that overlaps @math{I2}. Then, either
7012 @math{F2} is in @math{S} or @math{F2} is greater than any extent in
7013 @math{S}.
7014
7015 Proof: If @math{F2} does not include @math{I} then its start index is
7016 greater than @math{I} and thus it is greater than any extent in
7017 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I}
7018 and thus is in @math{S}, and thus @math{F2 >= F}.
7019
7020 @node Extent Fragments
7021 @section Extent Fragments
7022 @cindex extent fragment
7023
7024 Imagine that the buffer is divided up into contiguous, non-overlapping
7025 @dfn{runs} of text such that no extent starts or ends within a run
7026 (extents that abut the run don't count).
7027
7028 An extent fragment is a structure that holds data about the run that
7029 contains a particular buffer position (if the buffer position is at the
7030 junction of two runs, the run after the position is used) -- the
7031 beginning and end of the run, a list of all of the extents in that run,
7032 the @dfn{merged face} that results from merging all of the faces
7033 corresponding to those extents, the begin and end glyphs at the
7034 beginning of the run, etc. This is the information that redisplay needs
7035 in order to display this run.
7036
7037 Extent fragments have to be very quick to update to a new buffer
7038 position when moving linearly through the buffer. They rely on the
7039 stack-of-extents code, which does the heavy-duty algorithmic work of
7040 determining which extents overlie a particular position.
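
To make the notion of a run concrete, the sketch below computes the run
containing a position by brute force over all extent endpoints. It
ignores the stack-of-extents machinery and zero-length extents entirely,
and @code{struct ext}, @code{struct run}, @code{run_containing} and
@code{extents_over_run} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent: just its two endpoints. */
struct ext {
    long start, end;
};

/* A run of text, half-open: positions START up to but excluding END. */
struct run {
    long start, end;
};

/* Return the run containing POS in a buffer spanning [BUF_START,
   BUF_END).  If POS is at the junction of two runs, the run after POS
   is returned. */
struct run run_containing(const struct ext *exts, size_t n, long pos,
                          long buf_start, long buf_end)
{
    struct run r;
    size_t i;
    int k;

    assert(buf_start <= pos && pos < buf_end);
    r.start = buf_start;
    r.end = buf_end;
    for (i = 0; i < n; i++) {
        long edge[2];
        edge[0] = exts[i].start;
        edge[1] = exts[i].end;
        for (k = 0; k < 2; k++) {
            if (edge[k] <= pos && edge[k] > r.start)
                r.start = edge[k];   /* nearest boundary at or before POS */
            if (edge[k] > pos && edge[k] < r.end)
                r.end = edge[k];     /* nearest boundary after POS */
        }
    }
    return r;
}

/* Count the extents that cover the whole of run R. */
size_t extents_over_run(const struct ext *exts, size_t n, struct run r)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (exts[i].start <= r.start && exts[i].end >= r.end)
            count++;
    return count;
}
```

Because no extent boundary falls strictly inside a run, every extent
that touches the interior of the run covers all of it, which is why a
single merged face can describe the whole run.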
7041
7042 @node Faces and Glyphs, Specifiers, Extents, Top
7043 @chapter Faces and Glyphs
7044
7045 Not yet documented.
7046
7047 @node Specifiers, Menus, Faces and Glyphs, Top
7048 @chapter Specifiers
7049
7050 Not yet documented.
7051
7052 @node Menus, Subprocesses, Specifiers, Top
7053 @chapter Menus
7054
7055 A menu is set by setting the value of the variable
7056 @code{current-menubar} (which may be buffer-local) and then calling
7057 @code{set-menubar-dirty-flag} to signal a change. This will cause the
7058 menu to be redrawn at the next redisplay. The format of the data in
7059 @code{current-menubar} is described in @file{menubar.c}.
7060
7061 Internally the data in @code{current-menubar} is parsed into a tree of
7062 @code{widget_value}s (defined in @file{lwlib.h}); this is accomplished
7063 by the recursive function @code{menu_item_descriptor_to_widget_value()},
7064 called by @code{compute_menubar_data()}. Such a tree is deallocated
7065 using @code{free_widget_value()}.
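
The shape of such a tree, and the recursion needed to walk and free it,
can be sketched as follows. @code{struct menu_node} is a hypothetical,
much reduced stand-in for the real structure in @file{lwlib.h}, and the
@code{menu_} functions are illustrative only; they are not the functions
named above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical menu tree node. */
struct menu_node {
    char *name;
    struct menu_node *contents;  /* first entry of the submenu, or NULL */
    struct menu_node *next;      /* next entry at the same level, or NULL */
};

/* Allocate a node with a private copy of NAME. */
struct menu_node *menu_node_new(const char *name, struct menu_node *contents,
                                struct menu_node *next)
{
    struct menu_node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL)
        abort();
    strcpy(n->name, name);
    n->contents = contents;
    n->next = next;
    return n;
}

/* Count N, its later siblings, and everything below them. */
size_t menu_tree_count(const struct menu_node *n)
{
    size_t total = 0;
    for (; n != NULL; n = n->next)
        total += 1 + menu_tree_count(n->contents);
    return total;
}

/* Free N, its later siblings, and everything below them. */
void menu_tree_free(struct menu_node *n)
{
    while (n != NULL) {
        struct menu_node *next = n->next;
        menu_tree_free(n->contents);
        free(n->name);
        free(n);
        n = next;
    }
}
```

Iterating over siblings and recursing only into submenus keeps the
recursion depth equal to the nesting depth of the menus rather than to
the total number of entries.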
7066
7067 @code{update_screen_menubars()} is one of the external entry points.
7068 This checks to see, for each screen, if that screen's menubar needs to
7069 be updated. This is the case if
7070
7071 @enumerate
7072 @item
7073 @code{set-menubar-dirty-flag} was called since the last redisplay. (This
7074 function sets the C variable @code{menubar_has_changed}.)
7075 @item
7076 The buffer displayed in the screen has changed.
7077 @item
7078 The screen has no menubar currently displayed.
7079 @end enumerate
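
Taken together, the three conditions form a single predicate. The sketch
below is an assumption about how they might combine, not the code in
@file{menubar.c}: @code{struct screen_model}, its fields and
@code{screen_menubar_needs_update} are hypothetical, and buffers are
represented by integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-screen menubar state. */
struct screen_model {
    const void *menubar_widget;  /* NULL if no menubar is displayed */
    int displayed_buffer;        /* id of the buffer shown now */
    int menubar_buffer;          /* id of the buffer the menubar was built for */
};

/* True if any of the three conditions above holds for screen S. */
bool screen_menubar_needs_update(const struct screen_model *s,
                                 bool menubar_has_changed)
{
    return menubar_has_changed                         /* condition 1 */
        || s->displayed_buffer != s->menubar_buffer    /* condition 2 */
        || s->menubar_widget == NULL;                  /* condition 3 */
}
```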
7080
7081 @code{set_screen_menubar()} is called for each such screen. This
7082 function calls @code{compute_menubar_data()} to create the tree of
7083 @code{widget_value}s, then calls @code{lw_create_widget()},
7084 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
7085 to create the X-Toolkit widget associated with the menu.
7086
7087 @code{update_psheets()}, the other external entry point, actually
7088 changes the menus being displayed. It uses the widgets fixed by
7089 @code{update_screen_menubars()} and calls various X functions to ensure
7090 that the menus are displayed properly.
7091
7092 The menubar widget is set up so that @code{pre_activate_callback()} is
7093 called when the menu is first selected (i.e. mouse button goes down),
7094 and @code{menubar_selection_callback()} is called when an item is
7095 selected. @code{pre_activate_callback()} calls the function in
7096 @code{activate-menubar-hook}, which can change the menubar (this is described
7097 in @file{menubar.c}). If the menubar is changed,
7098 @code{set_screen_menubar()} is called.
7099 @code{menubar_selection_callback()} enqueues a menu event, putting in it
7100 a function to call (either @code{eval} or @code{call-interactively}) and
7101 its argument, which is the callback function or form given in the menu's
7102 description.
7103
7104 @node Subprocesses, Interface to X Windows, Menus, Top
7105 @chapter Subprocesses
7106
7107 The fields of a process are:
7108
7109 @table @code
7110 @item name
7111 A string, the name of the process.
7112
7113 @item command
7114 A list containing the command arguments that were used to start this
7115 process.
7116
7117 @item filter
7118 A function used to accept output from the process instead of a buffer,
7119 or @code{nil}.
7120
7121 @item sentinel
7122 A function called whenever the process receives a signal, or @code{nil}.
7123
7124 @item buffer
7125 The associated buffer of the process.
7126
7127 @item pid
7128 An integer, the Unix process @sc{id}.
7129
7130 @item childp
7131 A flag, non-@code{nil} if this is really a child process.
7132 It is @code{nil} for a network connection.
7133
7134 @item mark
7135 A marker indicating the position of the end of the last output from this
7136 process inserted into the buffer. This is often but not always the end
7137 of the buffer.
7138
7139 @item kill_without_query
7140 If this is non-@code{nil}, killing XEmacs while this process is still
7141 running does not ask for confirmation about killing the process.
7142
7143 @item raw_status_low
7144 @itemx raw_status_high
7145 These two fields record 16 bits each of the process status returned by
7146 the @code{wait} system call.
7147
7148 @item status
7149 The process status, as @code{process-status} should return it.
7150
7151 @item tick
7152 @itemx update_tick
7153 If these two fields are not equal, a change in the status of the process
7154 needs to be reported, either by running the sentinel or by inserting a
7155 message in the process buffer.
7156
7157 @item pty_flag
7158 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
7159 @code{nil} if it uses a pipe.
7160
7161 @item infd
7162 The file descriptor for input from the process.
7163
7164 @item outfd
7165 The file descriptor for output to the process.
7166
7167 @item subtty
7168 The file descriptor for the terminal that the subprocess is using. (On
7169 some systems, there is no need to record this, so the value is
7170 @code{-1}.)
7171
7172 @item tty_name
7173 The name of the terminal that the subprocess is using,
7174 or @code{nil} if it is using pipes.
7175 @end table
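
Two of these field pairs lend themselves to a short illustration: the
split of the raw status into two 16-bit halves, and the @code{tick} and
@code{update_tick} comparison. The sketch below uses plain C integers
rather than Lisp objects, and @code{struct proc_model} and the
@code{proc_} functions are hypothetical names. The split is presumably
needed because a Lisp integer cannot hold a full 32-bit value; that
rationale is an assumption, not something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the process fields. */
struct proc_model {
    uint16_t raw_status_low;   /* low 16 bits of the wait status */
    uint16_t raw_status_high;  /* high 16 bits of the wait status */
    int tick;                  /* advanced whenever the status changes */
    int update_tick;           /* value of tick when last reported */
};

/* Record a new raw status and note that it has not been reported yet. */
void proc_record_status(struct proc_model *p, uint32_t raw)
{
    p->raw_status_low = (uint16_t)(raw & 0xffffu);
    p->raw_status_high = (uint16_t)(raw >> 16);
    p->tick++;
}

/* Reassemble the raw status from its two halves. */
uint32_t proc_raw_status(const struct proc_model *p)
{
    return ((uint32_t)p->raw_status_high << 16) | p->raw_status_low;
}

/* True if a status change still has to be reported. */
bool proc_status_unreported(const struct proc_model *p)
{
    return p->tick != p->update_tick;
}

/* Call after running the sentinel or inserting the status message. */
void proc_mark_reported(struct proc_model *p)
{
    p->update_tick = p->tick;
}
```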
7176
7177 @node Interface to X Windows, Index, Subprocesses, Top
7178 @chapter Interface to X Windows
7179
7180 Not yet documented.
7181
7182 @include index.texi
7183
7184 @c Print the tables of contents
7185 @summarycontents
7186 @contents
7187 @c That's all
7188
7189 @bye
7190