xemacs-beta: man/internals/internals.texi comparison

comparison man/internals/internals.texi @ 428:3ecd8885ac67 r21-2-22

Import from CVS: tag r21-2-22

author	cvs
date	Mon, 13 Aug 2007 11:28:15 +0200
parents
children	080151679be2

-:0a0253eac470
+:3ecd8885ac67
+\input texinfo  @c -*-texinfo-*-
+@c %**start of header
+@setfilename ../../info/internals.info
+@settitle XEmacs Internals Manual
+@c %**end of header
+@ifinfo
+@dircategory XEmacs Editor
+@direntry
+* Internals: (internals).	XEmacs Internals Manual.
+@end direntry
+Copyright @copyright{} 1992 - 1996 Ben Wing.
+Copyright @copyright{} 1996, 1997 Sun Microsystems.
+Copyright @copyright{} 1994 - 1998 Free Software Foundation.
+Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
+Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+@ignore
+Permission is granted to process this file through TeX and print the
+results, provided the printed document carries copying permission notice
+identical to this one except for the removal of this paragraph (this
+paragraph not being relevant to the printed manual).
+@end ignore
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided that the
+entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions,
+except that this permission notice may be stated in a translation
+approved by the Foundation.
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided also that the
+section entitled ``GNU General Public License'' is included exactly as
+in the original, and provided that the entire resulting derived work is
+distributed under the terms of a permission notice identical to this
+one.
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions,
+except that the section entitled ``GNU General Public License'' may be
+included in a translation approved by the Free Software Foundation
+instead of in the original English.
+@end ifinfo
+@c Combine indices.
+@synindex cp fn
+@syncodeindex vr fn
+@syncodeindex ky fn
+@syncodeindex pg fn
+@syncodeindex tp fn
+@setchapternewpage odd
+@finalout
+@titlepage
+@title XEmacs Internals Manual
+@subtitle Version 1.3, August 1999
+@author Ben Wing
+@author Martin Buchholz
+@author Hrvoje Niksic
+@author Matthias Neubauer
+@page
+@vskip 0pt plus 1fill
+@noindent
+Copyright @copyright{} 1992 - 1996 Ben Wing. @*
+Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
+Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
+Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
+@sp 2
+Version 1.3 @*
+August 1999.@*
+Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided also that the
+section entitled ``GNU General Public License'' is included
+exactly as in the original, and provided that the entire resulting
+derived work is distributed under the terms of a permission notice
+identical to this one.
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions,
+except that the section entitled ``GNU General Public License'' may be
+included in a translation approved by the Free Software Foundation
+instead of in the original English.
+@end titlepage
+@page
+@node Top, A History of Emacs, (dir), (dir)
+@ifinfo
+This Info file contains v1.3 of the XEmacs Internals Manual.
+@end ifinfo
+@menu
+* A History of Emacs::          Times, dates, important events.
+* XEmacs From the Outside::     A broad conceptual overview.
+* The Lisp Language::           An overview.
+* XEmacs From the Perspective of Building::
+* XEmacs From the Inside::
+* The XEmacs Object System (Abstractly Speaking)::
+* How Lisp Objects Are Represented in C::
+* Rules When Writing New C Code::
+* A Summary of the Various XEmacs Modules::
+* Allocation of Objects in XEmacs Lisp::
+* Events and the Event Loop::
+* Evaluation; Stack Frames; Bindings::
+* Symbols and Variables::
+* Buffers and Textual Representation::
+* MULE Character Sets and Encodings::
+* The Lisp Reader and Compiler::
+* Lstreams::
+* Consoles; Devices; Frames; Windows::
+* The Redisplay Mechanism::
+* Extents::
+* Faces::
+* Glyphs::
+* Specifiers::
+* Menus::
+* Subprocesses::
+* Interface to X Windows::
+* Index::                   Index including concepts, functions, variables,
+and other terms.
+--- The Detailed Node Listing ---
+Here are other nodes that are inferiors of those already listed,
+mentioned here so you can get to them in one step:
+A History of Emacs
+* Through Version 18::          Unification prevails.
+* Lucid Emacs::                 One version 19 Emacs.
+* GNU Emacs 19::                The other version 19 Emacs.
+* GNU Emacs 20::                The other version 20 Emacs.
+* XEmacs::                      The continuation of Lucid Emacs.
+Rules When Writing New C Code
+* General Coding Rules::
+* Writing Lisp Primitives::
+* Adding Global Lisp Variables::
+* Techniques for XEmacs Developers::
+A Summary of the Various XEmacs Modules
+* Low-Level Modules::
+* Basic Lisp Modules::
+* Modules for Standard Editing Operations::
+* Editor-Level Control Flow Modules::
+* Modules for the Basic Displayable Lisp Objects::
+* Modules for other Display-Related Lisp Objects::
+* Modules for the Redisplay Mechanism::
+* Modules for Interfacing with the File System::
+* Modules for Other Aspects of the Lisp Interpreter and Object System::
+* Modules for Interfacing with the Operating System::
+* Modules for Interfacing with X Windows::
+* Modules for Internationalization::
+Allocation of Objects in XEmacs Lisp
+* Introduction to Allocation::
+* Garbage Collection::
+* GCPROing::
+* Garbage Collection - Step by Step::
+* Integers and Characters::
+* Allocation from Frob Blocks::
+* lrecords::
+* Low-level allocation::
+* Pure Space::
+* Cons::
+* Vector::
+* Bit Vector::
+* Symbol::
+* Marker::
+* String::
+* Compiled Function::
+Events and the Event Loop
+* Introduction to Events::
+* Main Loop::
+* Specifics of the Event Gathering Mechanism::
+* Specifics About the Emacs Event::
+* The Event Stream Callback Routines::
+* Other Event Loop Functions::
+* Converting Events::
+* Dispatching Events; The Command Builder::
+Evaluation; Stack Frames; Bindings
+* Evaluation::
+* Dynamic Binding; The specbinding Stack; Unwind-Protects::
+* Simple Special Forms::
+* Catch and Throw::
+Symbols and Variables
+* Introduction to Symbols::
+* Obarrays::
+* Symbol Values::
+Buffers and Textual Representation
+* Introduction to Buffers::     A buffer holds a block of text such as a file.
+* The Text in a Buffer::        Representation of the text in a buffer.
+* Buffer Lists::                Keeping track of all buffers.
+* Markers and Extents::         Tagging locations within a buffer.
+* Bufbytes and Emchars::        Representation of individual characters.
+* The Buffer Object::           The Lisp object corresponding to a buffer.
+MULE Character Sets and Encodings
+* Character Sets::
+* Encodings::
+* Internal Mule Encodings::
+Encodings
+* Japanese EUC (Extended Unix Code)::
+* JIS7::
+Internal Mule Encodings
+* Internal String Encoding::
+* Internal Character Encoding::
+The Lisp Reader and Compiler
+Lstreams
+Consoles; Devices; Frames; Windows
+* Introduction to Consoles; Devices; Frames; Windows::
+* Point::
+* Window Hierarchy::
+The Redisplay Mechanism
+* Critical Redisplay Sections::
+* Line Start Cache::
+Extents
+* Introduction to Extents::     Extents are ranges over text, with properties.
+* Extent Ordering::             How extents are ordered internally.
+* Format of the Extent Info::   The extent information in a buffer or string.
+* Zero-Length Extents::         A weird special case.
+* Mathematics of Extent Ordering::      A rigorous foundation.
+* Extent Fragments::            Cached information useful for redisplay.
+Faces
+Glyphs
+Specifiers
+Menus
+Subprocesses
+Interface to X Windows
+@end menu
+@node A History of Emacs, XEmacs From the Outside, Top, Top
+@chapter A History of Emacs
+@cindex history of Emacs
+@cindex Hackers (Steven Levy)
+@cindex Levy, Steven
+@cindex ITS (Incompatible Timesharing System)
+@cindex Stallman, Richard
+@cindex RMS
+@cindex MIT
+@cindex TECO
+@cindex FSF
+@cindex Free Software Foundation
+XEmacs is a powerful, customizable text editor and development
+environment.  It began as Lucid Emacs, which was in turn derived from
+GNU Emacs, a program written by Richard Stallman of the Free Software
+Foundation.  GNU Emacs traces its lineage back to the 1970s, and was
+modelled after a package called ``Emacs'', written in 1976, that was a
+set of macros on top of TECO, an old, old text editor written at MIT on
+the DEC PDP-10 under one of the earliest time-sharing operating systems,
+ITS (Incompatible Timesharing System). (ITS dates back well before
+Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
+who called themselves ``hackers'', who shared an idealistic belief
+system about the free exchange of information and were fanatical in
+their devotion to and time spent with computers. (The hacker
+subculture dates back to the late 1950s at MIT and is described in
+detail in Steven Levy's book @cite{Hackers}.  This book also includes
+a lot of information about Stallman himself and the development of
+Lisp, a programming language developed at MIT that underlies Emacs.)
+@menu
+* Through Version 18::          Unification prevails.
+* Lucid Emacs::                 One version 19 Emacs.
+* GNU Emacs 19::                The other version 19 Emacs.
+* GNU Emacs 20::                The other version 20 Emacs.
+* XEmacs::                      The continuation of Lucid Emacs.
+@end menu
+@node Through Version 18
+@section Through Version 18
+@cindex Gosling, James
+@cindex Great Usenet Renaming
+Although the history of the early versions of GNU Emacs is unclear,
+it is well known from the middle of 1985 onward.  A time line follows:
+@itemize @bullet
+@item
+GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
+shared some code with a version of Emacs written by James Gosling (the
+same James Gosling who later created the Java language).
+@item
+GNU Emacs version 16 (first released version was 16.56) was released on
+July 15, 1985.  All Gosling code was removed due to potential copyright
+problems with the code.
+@item
+version 16.57: released on September 16, 1985.
+@item
+versions 16.58, 16.59: released on September 17, 1985.
+@item
+version 16.60: released on September 19, 1985.  These later version 16
+releases incorporated patches from the net, especially for getting Emacs
+to work under System V.
+@item
+version 17.36 (first official v17 release) released on December 20,
+1985.  Included a TeX-able user manual.  First official unpatched
+version that worked on vanilla System V machines.
+@item
+version 17.43 (second official v17 release) released on January 25,
+1986.
+@item
+version 17.45 released on January 30, 1986.
+@item
+version 17.46 released on February 4, 1986.
+@item
+version 17.48 released on February 10, 1986.
+@item
+version 17.49 released on February 12, 1986.
+@item
+version 17.55 released on March 18, 1986.
+@item
+version 17.57 released on March 27, 1986.
+@item
+version 17.58 released on April 4, 1986.
+@item
+version 17.61 released on April 12, 1986.
+@item
+version 17.63 released on May 7, 1986.
+@item
+version 17.64 released on May 12, 1986.
+@item
+version 18.24 (a beta version) released on October 2, 1986.
+@item
+version 18.30 (a beta version) released on November 15, 1986.
+@item
+version 18.31 (a beta version) released on November 23, 1986.
+@item
+version 18.32 (a beta version) released on December 7, 1986.
+@item
+version 18.33 (a beta version) released on December 12, 1986.
+@item
+version 18.35 (a beta version) released on January 5, 1987.
+@item
+version 18.36 (a beta version) released on January 21, 1987.
+@item
+January 27, 1987: The Great Usenet Renaming.  net.emacs is now
+comp.emacs.
+@item
+version 18.37 (a beta version) released on February 12, 1987.
+@item
+version 18.38 (a beta version) released on March 3, 1987.
+@item
+version 18.39 (a beta version) released on March 14, 1987.
+@item
+version 18.40 (a beta version) released on March 18, 1987.
+@item
+version 18.41 (the first ``official'' release) released on March 22,
+1987.
+@item
+version 18.45 released on June 2, 1987.
+@item
+version 18.46 released on June 9, 1987.
+@item
+version 18.47 released on June 18, 1987.
+@item
+version 18.48 released on September 3, 1987.
+@item
+version 18.49 released on September 18, 1987.
+@item
+version 18.50 released on February 13, 1988.
+@item
+version 18.51 released on May 7, 1988.
+@item
+version 18.52 released on September 1, 1988.
+@item
+version 18.53 released on February 24, 1989.
+@item
+version 18.54 released on April 26, 1989.
+@item
+version 18.55 released on August 23, 1989.  This is the earliest version
+that is still available by FTP.
+@item
+version 18.56 released on January 17, 1991.
+@item
+version 18.57 released late January, 1991.
+@item
+version 18.58 released ?????.
+@item
+version 18.59 released October 31, 1992.
+@end itemize
+@node Lucid Emacs
+@section Lucid Emacs
+@cindex Lucid Emacs
+@cindex Lucid Inc.
+@cindex Energize
+@cindex Epoch
+Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
+C++ and Lisp development environments.  It began when Lucid decided they
+wanted to use Emacs as the editor and cornerstone of their C++
+development environment (called ``Energize'').  They needed many features
+that were not available in the existing version of GNU Emacs (version
+18.5something), in particular good and integrated support for GUI
+elements such as mouse support, multiple fonts, multiple window-system
+windows, etc.  A branch of GNU Emacs called Epoch, written at the
+University of Illinois, existed that supplied many of these features;
+however, Lucid needed more than what existed in Epoch.  At the time, the
+Free Software Foundation was working on version 19 of Emacs (this was
+sometime around 1991), which was planned to have similar features, and
+so Lucid decided to work with the Free Software Foundation.  Their plan
+was to add features that they needed, and coordinate with the FSF so
+that the features would get included back into Emacs version 19.
+Delays in the release of version 19 occurred, however (resulting in it
+finally being released more than a year after what was initially
+planned), and Lucid encountered unexpected technical resistance in
+getting their changes merged back into version 19, so they decided to
+release their own version of Emacs, which became Lucid Emacs 19.0.
+@cindex Zawinski, Jamie
+@cindex Sexton, Harlan
+@cindex Benson, Eric
+@cindex Devin, Matthieu
+The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
+and Eric Benson, and the work was later taken over by Jamie Zawinski,
+who became ``Mr. Lucid Emacs'' for many releases.
+A time line for Lucid Emacs/XEmacs is:
+@itemize @bullet
+@item
+version 19.0 shipped with Energize 1.0, April 1992.
+@item
+version 19.1 released June 4, 1992.
+@item
+version 19.2 released June 19, 1992.
+@item
+version 19.3 released September 9, 1992.
+@item
+version 19.4 released January 21, 1993.
+@item
+version 19.5 was a repackaging of 19.4 with a few bug fixes and
+shipped with Energize 2.0.  Never released to the net.
+@item
+version 19.6 released April 9, 1993.
+@item
+version 19.7 was a repackaging of 19.6 with a few bug fixes and
+shipped with Energize 2.1.  Never released to the net.
+@item
+version 19.8 released September 6, 1993.
+@item
+version 19.9 released January 12, 1994.
+@item
+version 19.10 released May 27, 1994.
+@item
+version 19.11 (first XEmacs) released September 13, 1994.
+@item
+version 19.12 released June 23, 1995.
+@item
+version 19.13 released September 1, 1995.
+@item
+version 19.14 released June 23, 1996.
+@item
+version 20.0 released February 9, 1997.
+@item
+version 19.15 released March 28, 1997.
+@item
+version 20.1 (not released to the net) April 15, 1997.
+@item
+version 20.2 released May 16, 1997.
+@item
+version 19.16 released October 31, 1997.
+@item
+version 20.3 (the first stable version of XEmacs 20.x) released November 30,
+1997.
+@item
+version 20.4 released February 28, 1998.
+@end itemize
+@node GNU Emacs 19
+@section GNU Emacs 19
+@cindex GNU Emacs 19
+@cindex FSF Emacs
+About a year after the initial release of Lucid Emacs, the FSF
+released a beta of their version of Emacs 19 (referred to here as ``GNU
+Emacs'').  By this time, the current version of Lucid Emacs was
+19.6. (Strangely, the first released beta from the FSF was GNU Emacs
+19.7.) A time line for GNU Emacs version 19 is
+@itemize @bullet
+@item
+version 19.8 (beta) released May 27, 1993.
+@item
+version 19.9 (beta) released May 27, 1993.
+@item
+version 19.10 (beta) released May 30, 1993.
+@item
+version 19.11 (beta) released June 1, 1993.
+@item
+version 19.12 (beta) released June 2, 1993.
+@item
+version 19.13 (beta) released June 8, 1993.
+@item
+version 19.14 (beta) released June 17, 1993.
+@item
+version 19.15 (beta) released June 19, 1993.
+@item
+version 19.16 (beta) released July 6, 1993.
+@item
+version 19.17 (beta) released late July, 1993.
+@item
+version 19.18 (beta) released August 9, 1993.
+@item
+version 19.19 (beta) released August 15, 1993.
+@item
+version 19.20 (beta) released November 17, 1993.
+@item
+version 19.21 (beta) released November 17, 1993.
+@item
+version 19.22 (beta) released November 28, 1993.
+@item
+version 19.23 (beta) released May 17, 1994.
+@item
+version 19.24 (beta) released May 16, 1994.
+@item
+version 19.25 (beta) released June 3, 1994.
+@item
+version 19.26 (beta) released September 11, 1994.
+@item
+version 19.27 (beta) released September 14, 1994.
+@item
+version 19.28 (first ``official'' release) released November 1, 1994.
+@item
+version 19.29 released June 21, 1995.
+@item
+version 19.30 released November 24, 1995.
+@item
+version 19.31 released May 25, 1996.
+@item
+version 19.32 released July 31, 1996.
+@item
+version 19.33 released August 11, 1996.
+@item
+version 19.34 released August 21, 1996.
+@item
+version 19.34b released September 6, 1996.
+@end itemize
+@cindex Mlynarik, Richard
+In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
+worse.  Lucid soon began incorporating features from GNU Emacs 19 into
+Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
+working on and using GNU Emacs for a long time (back as far as version
+16 or 17).
+@node GNU Emacs 20
+@section GNU Emacs 20
+@cindex GNU Emacs 20
+@cindex FSF Emacs
+On February 2, 1997, work began on GNU Emacs to integrate Mule.  The first
+release was made in September of that year.
+A time line for GNU Emacs version 20 is:
+@itemize @bullet
+@item
+version 20.1 released September 17, 1997.
+@item
+version 20.2 released September 20, 1997.
+@item
+version 20.3 released August 19, 1998.
+@end itemize
+@node XEmacs
+@section XEmacs
+@cindex XEmacs
+@cindex Sun Microsystems
+@cindex University of Illinois
+@cindex Illinois, University of
+@cindex SPARCWorks
+@cindex Andreessen, Marc
+@cindex Baur, Steve
+@cindex Buchholz, Martin
+@cindex Kaplan, Simon
+@cindex Wing, Ben
+@cindex Thompson, Chuck
+@cindex Win-Emacs
+@cindex Epoch
+@cindex Amdahl Corporation
+Around the time that Lucid was developing Energize, Sun Microsystems
+was developing their own development environment (called ``SPARCWorks'')
+and also decided to use Emacs.  They joined forces with the Epoch team
+at the University of Illinois and later with Lucid.  The maintainer of
+the last-released version of Epoch was Marc Andreessen, but he dropped
+out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
+away from a system administration job to become the primary Lucid Emacs
+author for Epoch and Sun.  Chuck's area of specialty became the
+redisplay engine (he replaced the old Lucid Emacs redisplay engine with
+a ported version from Epoch and then later rewrote it from scratch).
+Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
+to Microsoft Windows 3.1) in 1993, for what was initially a one-month
+contract to fix some event problems but later became a many-year
+involvement, punctuated by a six-month contract with Amdahl Corporation.
+@cindex rename to XEmacs
+In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
+that favored neither company); the first release called XEmacs was
+version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
+the newly formed Mosaic Communications Corp., later Netscape
+Communications Corp. (co-founded by the same Marc Andreessen, who had
+quit his Epoch job to work on a graphical browser for the World Wide
+Web).  Chuck then became the primary maintainer of XEmacs, and put out
+versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
+19.13, Chuck added the new redisplay and many other display improvements
+and Ben added MULE support (support for Asian and other languages) and
+redesigned most of the internal Lisp subsystems to better support the
+MULE work and the various other features being added to XEmacs.  After
+19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
+@cindex MULE merged XEmacs appears
+Soon after 19.13 was released, work began in earnest on the MULE
+internationalization code and the source tree was divided into two
+development paths.  The MULE version was initially called 19.20, but was
+soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
+over the care and feeding of it and worked on it in parallel with the
+19.14 development that was occurring at the same time.  After much work
+by Martin, it was decided to release 20.0 ahead of 19.15 in February
+1997.  The source tree remained divided until 20.2 when the version 19
+source was finally retired at version 19.16.
+@cindex Baur, Steve
+@cindex Buchholz, Martin
+@cindex Jones, Kyle
+@cindex Niksic, Hrvoje
+@cindex XEmacs goes it alone
+In 1997, Sun finally dropped all pretense of support for XEmacs and
+Martin Buchholz left the company in November.  Since then (and largely
+for the year before as well, because Steve Baur was never paid to work
+on XEmacs), XEmacs has existed solely on the contributions of volunteers
+from the Free Software Community.  Starting from 1997, Hrvoje Niksic and
+Kyle Jones have figured prominently in XEmacs development.
+@cindex merging attempts
+Many attempts have been made to merge XEmacs and GNU Emacs, but they
+have consistently failed.
+A more detailed history is contained in the XEmacs About page.
+@node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
+@chapter XEmacs From the Outside
+@cindex read-eval-print
+XEmacs appears to the outside world as an editor, but it is really a
+Lisp environment.  At its heart is a Lisp interpreter; it also
+``happens'' to contain many specialized object types (e.g. buffers,
+windows, frames, events) that are useful for implementing an editor.
+Some of these objects (in particular windows and frames) have
+displayable representations, and XEmacs provides a function
+@code{redisplay()} that ensures that the display of all such objects
+matches their internal state.  Most of the time, a standard Lisp
+environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp
+code, execute it, and print the results''.  XEmacs has a similar loop:
+@itemize @bullet
+@item
+read an event
+@item
+dispatch the event (i.e. ``do it'')
+@item
+redisplay
+@end itemize
+Reading an event is done using the Lisp function @code{next-event},
+which waits for something to happen (typically, the user presses a key
+or moves the mouse) and returns an event object describing this.
+Dispatching an event is done using the Lisp function
+@code{dispatch-event}, which looks up the event in a keymap object (a
+particular kind of object that associates an event with a Lisp function)
+and calls that function.  The function ``does'' what the user has
+requested by changing the state of particular frame objects, buffer
+objects, etc.  Finally, @code{redisplay()} is called, which updates the
+display to reflect those changes just made.  Thus is an ``editor'' born.
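+The loop described above can be sketched in C.  What follows is a toy
+model, not the actual XEmacs implementation: the @code{struct editor}
+and @code{struct event} types, the scripted input string standing in for
+a keyboard, the two commands, and @code{run_event_loop} are all
+assumptions made for illustration.  Only the names @code{next_event},
+@code{dispatch_event} and @code{redisplay} echo the functions named
+above, and here they are greatly simplified stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-ins for XEmacs objects. */
enum event_type { EV_KEY, EV_QUIT };

struct event {
    enum event_type type;
    char key;                 /* meaningful only for EV_KEY */
};

struct editor {
    char text[64];            /* the "buffer" contents */
    size_t length;            /* number of characters in text */
    char screen[64];          /* what the "display" currently shows */
    int redisplays;           /* how many times redisplay() has run */
};

typedef void (*command_fn)(struct editor *, const struct event *);

/* Command: append the typed character to the buffer, if it fits. */
static void self_insert(struct editor *ed, const struct event *ev)
{
    if (ed->length + 1 < sizeof ed->text) {
        ed->text[ed->length++] = ev->key;
        ed->text[ed->length] = '\0';
    }
}

/* Command: delete the last character, if there is one. */
static void delete_backward(struct editor *ed, const struct event *ev)
{
    (void)ev;
    if (ed->length > 0)
        ed->text[--ed->length] = '\0';
}

/* The "keymap": associates an event with the command to run. */
static command_fn lookup_key(const struct event *ev)
{
    return ev->key == '\b' ? delete_backward : self_insert;
}

/* Step 1: read an event.  Input comes from a scripted string rather
   than a real keyboard; an exhausted string yields EV_QUIT. */
static struct event next_event(const char **input)
{
    struct event ev = { EV_QUIT, '\0' };
    if (**input != '\0') {
        ev.type = EV_KEY;
        ev.key = *(*input)++;
    }
    return ev;
}

/* Step 2: look the event up in the keymap and call the command. */
static void dispatch_event(struct editor *ed, const struct event *ev)
{
    lookup_key(ev)(ed, ev);
}

/* Step 3: make the display match the internal state. */
static void redisplay(struct editor *ed)
{
    assert(ed->length < sizeof ed->screen);
    memcpy(ed->screen, ed->text, ed->length + 1);
    ed->redisplays++;
}

/* The loop itself: read, dispatch, redisplay, until input runs out. */
void run_event_loop(struct editor *ed, const char *input)
{
    for (;;) {
        struct event ev = next_event(&input);
        if (ev.type == EV_QUIT)
            break;
        dispatch_event(ed, &ev);
        redisplay(ed);
    }
}
```

+Calling @code{run_event_loop} on a zero-initialized @code{struct editor}
+with the input @code{"ab\bc"} runs four iterations and leaves
+@code{"ac"} in both the text and the screen.  The loop itself knows
+nothing about editing; everything specific lives in the commands that
+the keymap returns.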
+@cindex bridge, playing
+@cindex taxes, doing
+@cindex pi, calculating
+Note that you do not have to use XEmacs as an editor; you could just
+as well make it do your taxes, compute pi, play bridge, etc.  You'd just
+have to write functions to do those operations in Lisp.
+@node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
+@chapter The Lisp Language
+@cindex Lisp vs. C
+@cindex C vs. Lisp
+@cindex Lisp vs. Java
+@cindex Java vs. Lisp
+@cindex dynamic scoping
+@cindex scoping, dynamic
+@cindex dynamic types
+@cindex types, dynamic
+@cindex Java
+@cindex Common Lisp
+@cindex Gosling, James
+Lisp is a general-purpose language that is higher-level than C and in
+many ways more powerful than C.  Powerful dialects of Lisp such as
+Common Lisp are probably much better languages for writing very large
+applications than is C. (Unfortunately, for many non-technical
+reasons C and its successor C++ have become the dominant languages for
+application development.  These languages are both inadequate for
+extremely large applications, which is evidenced by the fact that newer,
+larger programs are becoming ever harder to write and are requiring ever
+more programmers despite great improvements in C development environments;
+and by the fact that, although hardware speeds and reliability have been
+growing at an exponential rate, most software is still generally
+considered to be slow and buggy.)
+The new Java language holds promise as a better general-purpose
+development language than C.  Java has many features in common with
+Lisp that are not shared by C (this is not a coincidence, since
+Java was designed by James Gosling, a former Lisp hacker).  This
+will be discussed more later.
+For those used to C, here is a summary of the basic differences between
+C and Lisp:
+@enumerate
+@item
+Lisp has an extremely regular syntax.  Every function, expression,
+and control statement is written in the form
+@example
+(@var{func} @var{arg1} @var{arg2} ...)
+@end example
+This is as opposed to C, which writes functions as
+@example
+func(@var{arg1}, @var{arg2}, ...)
+@end example
+but writes expressions involving operators as (e.g.)
+@example
+@var{arg1} + @var{arg2}
+@end example
+and writes control statements as (e.g.)
+@example
+while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
+@end example
+Lisp equivalents of the latter two would be
+@example
+(+ @var{arg1} @var{arg2} ...)
+@end example
+and
+@example
+(while @var{expr} @var{statement1} @var{statement2} ...)
+@end example
+@item
+Lisp is a safe language.  Assuming there are no bugs in the Lisp
+interpreter/compiler, it is impossible to write a program that ``core
+dumps'' or otherwise causes the machine to execute an illegal
+instruction.  This is very different from C, where perhaps the most
+common outcome of a bug is exactly such a crash.  A corollary of this is that
+the C operation of casting a pointer is impossible (and unnecessary) in
+Lisp, and that it is impossible to access memory outside the bounds of
+an array.
+@item
+Programs and data are written in the same form.  The
+parenthesis-enclosing form described above for statements is the same
+form used for the most common data type in Lisp, the list.  Thus, it is
+possible to represent any Lisp program using Lisp data types, and for
+one program to construct Lisp statements and then dynamically
+@dfn{evaluate} them, or cause them to execute.
+@item
+All objects are @dfn{dynamically typed}.  This means that part of every
+object is an indication of what type it is.  A Lisp program can
+manipulate an object without knowing what type it is, and can query an
+object to determine its type.  This means that, correspondingly,
+variables and function parameters can hold objects of any type and are
+not normally declared as being of any particular type.  This is opposed
+to the @dfn{static typing} of C, where variables can hold exactly one
+type of object and must be declared as such, and objects do not contain
+an indication of their type because it's implicit in the variables they
+are stored in.  It is possible in C to have a variable hold different
+types of objects (e.g. through the use of @code{void *} pointers or
+variable-argument functions), but the type information must then be
+passed explicitly in some other fashion, leading to additional program
+complexity.
+@item
+Allocated memory is automatically reclaimed when it is no longer in use.
+This operation is called @dfn{garbage collection} and involves looking
+through all variables to see what memory is being pointed to, and
+reclaiming any memory that is not pointed to and is thus
+``inaccessible'' and out of use.  This is as opposed to C, in which
+allocated memory must be explicitly reclaimed using @code{free()}.  If
+you simply drop all pointers to memory without freeing it, it becomes
+``leaked'' memory that still takes up space.  Over a long period of
+time, this can cause your program to grow and grow until it runs out of
+memory.
+@item
+Lisp has built-in facilities for handling errors and exceptions.  In C,
+when an error occurs, usually either the program exits entirely or the
+routine in which the error occurs returns a value indicating this.  If
+an error occurs in a deeply-nested routine, then every routine currently
+called must unwind itself normally and return an error value back up to
+the next routine.  This means that every routine must explicitly check
+for an error in all the routines it calls; if it does not do so,
+unexpected and often random behavior results.  This is an extremely
+common source of bugs in C programs.  An alternative would be to do a
+non-local exit using @code{longjmp()}, but that is often very dangerous
+because the routines that were exited past had no opportunity to clean
+up after themselves and may leave things in an inconsistent state,
+causing a crash shortly afterwards.
+Lisp provides mechanisms to make such non-local exits safe.  When an
+error occurs, a routine simply signals that an error of a particular
+class has occurred, and a non-local exit takes place.  Any routine can
+trap errors occurring in routines it calls by registering an error
+handler for some or all classes of errors. (If no handler is registered,
+a default handler, generally installed by the top-level event loop, is
+executed; this prints out the error and continues.) Routines can also
+specify cleanup code (called an @dfn{unwind-protect}) that will be
+called when control exits from a block of code, no matter how that exit
+occurs -- i.e. even if a function deeply nested below it causes a
+non-local exit back to the top level.
+Note that this facility has appeared in some recent vintages of C, in
+particular Visual C++ and other PC compilers written for the Microsoft
+Win32 API.
+@item
+In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
+that if you declare a local variable in a particular function, and then
+call another function, that subfunction can ``see'' the local variable
+you declared.  This is actually considered a bug in Emacs Lisp and in
+all other early dialects of Lisp, and was corrected in Common Lisp. (In
+Common Lisp, you can still declare dynamically scoped variables if you
+want to -- they are sometimes useful -- but variables by default are
+@dfn{lexically scoped} as in C.)
+@end enumerate
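+The safe non-local exit described in the error-handling item above can
+be imitated in C.  The following is only a minimal sketch, assuming a
+fixed-size cleanup stack and a single error-class integer; every name in
+it (@code{record_unwind_protect}, @code{unbind_to}, @code{signal_error},
+@code{call_with_handler}, @code{nested_routine}) is chosen for
+illustration and is not taken from the XEmacs sources.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define MAX_UNWIND 16

/* One registered cleanup action (a hypothetical "unwind-protect"). */
struct unwind_entry {
    void (*cleanup)(void *arg);
    void *arg;
};

static struct unwind_entry unwind_stack[MAX_UNWIND];
static int unwind_depth = 0;

/* The innermost error handler: where to jump and how far to unwind. */
static jmp_buf *current_handler = NULL;
static int handler_depth = 0;
static int pending_error = 0;

/* Register cleanup code to run when control leaves the current block. */
void record_unwind_protect(void (*cleanup)(void *), void *arg)
{
    assert(unwind_depth < MAX_UNWIND);
    unwind_stack[unwind_depth].cleanup = cleanup;
    unwind_stack[unwind_depth].arg = arg;
    unwind_depth++;
}

/* Run and pop every cleanup registered above DEPTH, innermost first. */
void unbind_to(int depth)
{
    while (unwind_depth > depth) {
        unwind_depth--;
        unwind_stack[unwind_depth].cleanup(unwind_stack[unwind_depth].arg);
    }
}

/* Signal an error: run the pending cleanups, then exit non-locally. */
void signal_error(int error_class)
{
    assert(current_handler != NULL && error_class != 0);
    pending_error = error_class;
    unbind_to(handler_depth);
    longjmp(*current_handler, 1);
}

/* Cleanup action used below: mark the resource as closed. */
void close_resource(void *arg) { *(int *)arg = 0; }

/* A nested routine that acquires a resource and optionally fails. */
void nested_routine(int *resource_open, int fail)
{
    *resource_open = 1;
    record_unwind_protect(close_resource, resource_open);
    if (fail)
        signal_error(42);        /* does not return */
}

/* Trap errors from nested_routine(); returns the error class or 0. */
int call_with_handler(int *resource_open, int fail)
{
    jmp_buf here;
    jmp_buf *saved_handler = current_handler;
    int saved_depth = handler_depth;
    int error_class;

    current_handler = &here;
    handler_depth = unwind_depth;
    if (setjmp(here) == 0) {
        nested_routine(resource_open, fail);
        unbind_to(handler_depth);    /* normal exit also runs cleanups */
        error_class = 0;
    } else {
        error_class = pending_error; /* reached through longjmp() */
    }
    current_handler = saved_handler;
    handler_depth = saved_depth;
    return error_class;
}
```

+In this sketch the cleanup runs on both the normal and the error path,
+so the caller never sees the resource left open; the routines between
+the handler and the failing call do not have to check return values.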
+For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
+early dialect of Lisp developed at MIT (no relation to the Macintosh
+computer).  There is a Common Lisp compatibility package available for
+Emacs that provides many of the features of Common Lisp.
+The Java language is derived in many ways from C, and shares a similar
+syntax, but has the following features in common with Lisp (and different
+from C):
+@enumerate
+@item
+Java is a safe language, like Lisp.
+@item
+Java provides garbage collection, like Lisp.
+@item
+Java has built-in facilities for handling errors and exceptions, like
+Lisp.
+@item
+Java has a type system that combines the best advantages of both static
+and dynamic typing.  Objects (except very simple types) are explicitly
+marked with their type, as in dynamic typing; but there is a hierarchy
+of types and functions are declared to accept only certain types, thus
+providing the increased compile-time error-checking of static typing.
+@end enumerate
+The Java language also has some negative attributes:
+@enumerate
+@item
+Java uses the edit/compile/run model of software development.  This
+makes it hard to use interactively.  For example, to use Java like
+@code{bc} it is necessary to write a special purpose, albeit tiny,
+application.  In Emacs Lisp, a calculator comes built-in without any
+effort -- one can always just type an expression in the @code{*scratch*}
+buffer.
+@item
+Java tries too hard to enforce, not merely enable, portability, making
+ordinary access to standard OS facilities painful.  Java has an
+@dfn{agenda}.  I think this is why @code{chdir} is not part of standard
+Java, which is inexcusable.
+@end enumerate
+Unfortunately, there is no perfect language.  Static typing allows a
+compiler to catch programmer errors and produce more efficient code, but
+makes programming more tedious and less fun.  For the foreseeable future,
+an Ideal Editing and Programming Environment (and that is what XEmacs
+aspires to) will be programmable in multiple languages: high level ones
+like Lisp for user customization and prototyping, and lower level ones
+for infrastructure and industrial strength applications.  If I had my
+way, XEmacs would be friendly towards the Python, Scheme, C++, ML, and
+other communities.  But there are serious technical difficulties in
+achieving that goal.
+The word @dfn{application} in the previous paragraph was used
+intentionally.  XEmacs implements an API for programs written in Lisp
+that makes it a full-fledged application platform, very much like an OS
+inside the real OS.
+@node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
+@chapter XEmacs From the Perspective of Building
+The heart of XEmacs is the Lisp environment, which is written in C.
+This is contained in the @file{src/} subdirectory.  Underneath
+@file{src/} are two subdirectories of header files: @file{s/} (header
+files for particular operating systems) and @file{m/} (header files for
+particular machine types).  In practice the distinction between the two
+types of header files is blurred.  These header files define or undefine
+certain preprocessor constants and macros to indicate particular
+characteristics of the associated machine or operating system.  As part
+of the configure process, one @file{s/} file and one @file{m/} file are
+identified for the particular environment in which XEmacs is being
+built.
+XEmacs also contains a great deal of Lisp code.  This implements the
+operations that make XEmacs useful as an editor as well as just a Lisp
+environment, and also contains many add-on packages that allow XEmacs to
+browse directories, act as a mail and Usenet news reader, compile Lisp
+code, etc.  There is actually more Lisp code than C code associated with
+XEmacs, but much of the Lisp code is peripheral to the actual operation
+of the editor.  The Lisp code all lies in subdirectories underneath the
+@file{lisp/} directory.
+The @file{lwlib/} directory contains C code that implements a
+generalized interface onto different X widget toolkits and also
+implements some widgets of its own that behave like Motif widgets but
+are faster, free, and in some cases more powerful.  The code in this
+directory compiles into a library and is mostly independent from XEmacs.
+The @file{etc/} directory contains various data files associated with
+XEmacs.  Some of them are actually read by XEmacs at startup; others
+merely contain useful information of various sorts.
+The @file{lib-src/} directory contains C code for various auxiliary
+programs that are used in connection with XEmacs.  Some of them are used
+during the build process; others are used to perform certain functions
+that cannot conveniently be placed in the XEmacs executable (e.g. the
+@file{movemail} program for fetching mail out of @file{/var/spool/mail},
+which must be setgid to @file{mail} on many systems; and the
+@file{gnuclient} program, which allows an external script to communicate
+with a running XEmacs process).
+The @file{man/} directory contains the sources for the XEmacs
+documentation.  It is mostly in a form called Texinfo, which can be
+converted into either a printed document (by passing it through @TeX{})
+or into on-line documentation called @dfn{info files}.
+The @file{info/} directory contains the results of formatting the XEmacs
+documentation as @dfn{info files}, for on-line use.  These files are
+used when you enter the Info system using @kbd{C-h i} or through the
+Help menu.
+The @file{dynodump/} directory contains auxiliary code used to build
+XEmacs on Solaris platforms.
+The other directories contain various miscellaneous code and information
+that is not normally used or needed.
+The first step of building involves running the @file{configure} program
+and passing it various parameters to specify any optional features you
+want and compiler arguments and such, as described in the @file{INSTALL}
+file.  This determines what the build environment is, chooses the
+appropriate @file{s/} and @file{m/} files, and runs a series of tests to
+determine many details about your environment, such as which library
+functions are available and exactly how they work.  The reason for
+running these tests is that it allows XEmacs to be compiled on a much
+wider variety of platforms than those that the XEmacs developers happen
+to be familiar with, including various sorts of hybrid platforms.  This
+is especially important now that many operating systems give you a great
+deal of control over exactly what features you want installed, and allow
+for easy upgrading of parts of a system without upgrading the rest.  It
+would be impossible to pre-determine and pre-specify the information for
+all possible configurations.
+In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
+since they contain unmaintainable platform-specific hard-coded
+information.  XEmacs has been moving in the direction of having all
+system-specific information be determined dynamically by
+@file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
+When @file{configure} is done running, it generates @file{Makefile}s and
+@file{GNUmakefile}s and the file @file{src/config.h} (which describes
+the features of your system) from template files.  You then run
+@file{make}, which compiles the auxiliary code and programs in
+@file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
+@file{src/}.  The result of compiling and linking is an executable
+called @file{temacs}, which is @emph{not} the final XEmacs executable.
+@file{temacs} by itself is not intended to function as an editor or even
+display any windows on the screen, and if you simply run it, it will
+exit immediately.  The @file{Makefile} runs @file{temacs} with certain
+options that cause it to initialize itself, read in a number of basic
+Lisp files, and then dump itself out into a new executable called
+@file{xemacs}.  This new executable has been pre-initialized and
+contains pre-digested Lisp code that is necessary for the editor to
+function (this includes most basic editing functions,
+e.g. @code{kill-line}, that can be defined in terms of other Lisp
+primitives; some initialization code that is called when certain
+objects, such as frames, are created; and all of the standard
+keybindings and code for the actions they result in).  This executable,
+@file{xemacs}, is the executable that you run to use the XEmacs editor.
+Although @file{temacs} is not intended to be run as an editor, it can,
+by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
+This is useful when the dumping procedure described above is broken, or
+when using certain program debugging tools such as Purify.  These tools
+get mighty confused by the tricks played by the XEmacs build process,
+such as allocating memory in one process, and freeing it in the next.
+@node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
+@chapter XEmacs From the Inside
+Internally, XEmacs is quite complex, and can be very confusing.  To
+simplify things, it can be useful to think of XEmacs as containing an
+event loop that ``drives'' everything, and a number of other subsystems,
+such as a Lisp engine and a redisplay mechanism.  Each of these other
+subsystems exists simultaneously in XEmacs, and each has a certain
+state.  The flow of control continually passes in and out of these
+different subsystems in the course of normal operation of the editor.
+It is important to keep in mind that, most of the time, the editor is
+``driven'' by the event loop.  Except during initialization and batch
+mode, all subsystems are entered directly or indirectly through the
+event loop, and ultimately, control exits out of all subsystems back up
+to the event loop.  This cycle of entering a subsystem, exiting back out
+to the event loop, and starting another iteration of the event loop
+occurs once for each keystroke, mouse motion, etc.
+If you're trying to understand a particular subsystem (other than the
+event loop), think of it as a ``daemon'' process or ``servant'' that is
+responsible for one particular aspect of a larger system, and
+periodically receives commands or environment changes that cause it to
+do something.  Ultimately, these commands and environment changes are
+always triggered by the event loop.  For example:
+@itemize @bullet
+@item
+The window and frame mechanism is responsible for keeping track of what
+windows and frames exist, what buffers are in them, etc.  It is
+periodically given commands (usually from the user) to make a change to
+the current window/frame state: e.g. create a new frame, delete a
+window, etc.
+@item
+The buffer mechanism is responsible for keeping track of what buffers
+exist and what text is in them.  It is periodically given commands
+(usually from the user) to insert or delete text, create a buffer, etc.
+When it receives a text-change command, it notifies the redisplay
+mechanism.
+@item
+The redisplay mechanism is responsible for making sure that windows and
+frames are displayed correctly.  It is periodically told (by the event
+loop) to actually ``do its job'', i.e. snoop around and see what the
+current state of the environment (mostly of the currently-existing
+windows, frames, and buffers) is, and make sure that that state matches
+what's actually displayed.  It keeps lots and lots of information around
+(such as what is actually being displayed currently, and what the
+environment was last time it checked) so that it can minimize the work
+it has to do.  It is also helped along in that whenever a relevant
+change to the environment occurs, the redisplay mechanism is told about
+this, so it has a pretty good idea of where it has to look to find
+possible changes and doesn't have to look everywhere.
+@item
+The Lisp engine is responsible for executing the Lisp code in which most
+user commands are written.  It is entered through a call to @code{eval}
+or @code{funcall}, which occurs as a result of dispatching an event from
+the event loop.  The functions it calls issue commands to the buffer
+mechanism, the window/frame subsystem, etc.
+@item
+The Lisp allocation subsystem is responsible for keeping track of Lisp
+objects.  It is given commands from the Lisp engine to allocate objects,
+garbage collect, etc.
+@end itemize
+etc.
+The important idea here is that there are a number of independent
+subsystems each with its own responsibility and persistent state, just
+like different employees in a company, and each subsystem is
+periodically given commands from other subsystems.  Commands can flow
+from any one subsystem to any other, but there is usually some sort of
+hierarchy, with all commands originating from the event subsystem.
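+The relationship between the event loop and the other subsystems can be
+pictured with a toy model.  The following is only a sketch under heavy
+simplifying assumptions: a single fixed-size buffer, a ``display'' that
+is just a copy of the text, and an event queue passed in as an array.
+All names (@code{event_loop}, @code{buffer_insert_char},
+@code{redisplay_note_change}, and so on) are hypothetical and do not
+correspond to XEmacs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum event_type { EV_KEY, EV_QUIT };

struct event {
    enum event_type type;
    char key;                    /* meaningful only for EV_KEY */
};

/* Buffer mechanism: its persistent state is the text. */
static char buffer_text[64];
static size_t buffer_len = 0;

/* Redisplay mechanism: remembers what is shown and whether to look. */
static char displayed_text[64];
static int redisplay_dirty = 0;
static int redisplay_count = 0;

/* Other subsystems call this to tell redisplay that something changed. */
void redisplay_note_change(void)
{
    redisplay_dirty = 1;
}

/* A text-change command given to the buffer mechanism. */
void buffer_insert_char(char c)
{
    assert(buffer_len + 1 < sizeof buffer_text);
    buffer_text[buffer_len++] = c;
    buffer_text[buffer_len] = '\0';
    redisplay_note_change();
}

/* Told to "do its job": bring the display in line with the buffer. */
void redisplay(void)
{
    if (!redisplay_dirty)
        return;                  /* nothing changed, nothing to do */
    strcpy(displayed_text, buffer_text);
    redisplay_dirty = 0;
    redisplay_count++;
}

/* The event loop drives everything: one iteration per event. */
void event_loop(const struct event *queue, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (queue[i].type == EV_QUIT)
            break;
        buffer_insert_char(queue[i].key);   /* enter a subsystem */
        redisplay();                        /* then update the display */
    }
}
```

+Each subsystem keeps its own state between calls, and control always
+returns to @code{event_loop()} before the next event is handled.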
+XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
+this is called the first time (in a properly-invoked @file{temacs}), it
+does the following:
+@enumerate
+@item
+It does some very basic environment initializations, such as determining
+where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
+and setting up signal handlers.
+@item
+It initializes the entire Lisp interpreter.
+@item
+It sets the initial values of many built-in variables (including many
+variables that are visible to Lisp programs), such as the global keymap
+object and the built-in faces (a face is an object that describes the
+display characteristics of text).  This involves creating Lisp objects
+and thus is dependent on step (2).
+@item
+It performs various other initializations that are relevant to the
+particular environment it is running in, such as retrieving environment
+variables, determining the current date and the user who is running the
+program, examining its standard input, creating any necessary file
+descriptors, etc.
+@item
+At this point, the C initialization is complete.  A Lisp program that
+was specified on the command line (usually @file{loadup.el}) is called
+(@file{temacs} is normally invoked as @code{temacs -batch -l loadup.el dump}).
+@file{loadup.el} loads all of the other Lisp files that are needed for
+the operation of the editor, calls the @code{dump-emacs} function to
+write out @file{xemacs}, and then kills the @file{temacs} process.
+@end enumerate
+When @file{xemacs} is then run, it only redoes steps (1) and (4)
+above; all variables already contain the values they were set to when
+the executable was dumped, and all memory that was allocated with
+@code{malloc()} is still around. (XEmacs knows whether it is being run
+as @file{xemacs} or @file{temacs} because it sets the global variable
+@code{initialized} to 1 after step (4) above.) At this point,
+@file{xemacs} calls a Lisp function to do any further initialization,
+which includes parsing the command-line (the C code can only do limited
+command-line parsing, which includes looking for the @samp{-batch} and
+@samp{-l} flags and a few other flags that it needs to know about before
+initialization is complete), creating the first frame (or @dfn{window}
+in standard window-system parlance), running the user's init file
+(usually the file @file{.emacs} in the user's home directory), etc.  The
+function to do this is usually called @code{normal-top-level};
+@file{loadup.el} tells the C code about this function by setting its
+name as the value of the Lisp variable @code{top-level}.
+When the Lisp initialization code is done, the C code enters the event
+loop, and stays there for the duration of the XEmacs process.  The code
+for the event loop is contained in @file{keyboard.c}, and is called
+@code{Fcommand_loop_1()}.  Note that this event loop could very well be
+written in Lisp, and in fact a Lisp version exists; but apparently,
+doing this makes XEmacs run noticeably slower.
+Notice how much of the initialization is done in Lisp, not in C.
+In general, XEmacs tries to move as much code as is possible
+into Lisp.  Code that remains in C is code that implements the
+Lisp interpreter itself, or code that needs to be very fast, or
+code that needs to do system calls or other such stuff that
+needs to be done in C, or code that needs to have access to
+``forbidden'' structures. (One conscious aspect of the design of
+Lisp under XEmacs is a clean separation between the external
+interface to a Lisp object's functionality and its internal
+implementation.  Part of this design is that Lisp programs
+are forbidden from accessing the contents of the object other
+than through using a standard API.  In this respect, XEmacs Lisp
+is similar to modern Lisp dialects but differs from GNU Emacs,
+which tends to expose the implementation and allow Lisp
+programs to look at it directly.  The major advantage of
+hiding the implementation is that it allows the implementation
+to be redesigned without affecting any Lisp programs, including
+those that might want to be ``clever'' by looking directly at
+the object's contents and possibly manipulating them.)
+Moving code into Lisp makes the code easier to debug and maintain and
+makes it much easier for people who are not XEmacs developers to
+customize XEmacs, because they can make a change with much less chance
+of obscure and unwanted interactions occurring than if they were to
+change the C code.
+@node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
+@chapter The XEmacs Object System (Abstractly Speaking)
+At the heart of the Lisp interpreter is its management of objects.
+XEmacs Lisp contains many built-in objects, some of which are
+simple and others of which can be very complex; and some of which
+are very common, and others of which are rarely used or are only
+used internally. (Since the Lisp allocation system, with its
+automatic reclamation of unused storage, is so much more convenient
+than @code{malloc()} and @code{free()}, the C code makes extensive use of it
+in its internal operations.)
+The basic Lisp objects are
+@table @code
+@item integer
+28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
+reason for this is described below when the internal Lisp object
+representation is described.
+@item float
+Same precision as a double in C.
+@item cons
+A simple container for two Lisp objects, used to implement lists and
+most other data structures in Lisp.
+@item char
+An object representing a single character of text; chars behave like
+integers in many ways but are logically considered text rather than
+numbers and have a different read syntax. (The read syntax for a char
+contains the char itself or some textual encoding of it -- for example,
+a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
+ISO-2022 encoding standard -- rather than the numerical representation
+of the char; this way, if the mapping between chars and integers
+changes, which is quite possible for Kanji characters and other extended
+characters, the same character will still be created.  Note that some
+primitives confuse chars and integers.  The worst culprit is @code{eq},
+which makes a special exception and considers a char to be @code{eq} to
+its integer equivalent, even though in no other case are objects of two
+different types @code{eq}.  The reason for this monstrosity is
+compatibility with existing code; the separation of char from integer
+came fairly recently.)
+@item symbol
+An object that contains Lisp objects and is referred to by name;
+symbols are used to implement variables and named functions
+and to provide the equivalent of preprocessor constants in C.
+@item vector
+A one-dimensional array of Lisp objects providing constant-time access
+to any of the objects; access to an arbitrary object in a vector is
+faster than for lists, but the operations that can be done on a vector
+are more limited.
+@item string
+Self-explanatory; behaves much like a vector of chars
+but has a different read syntax and is stored and manipulated
+more compactly.
+@item bit-vector
+A vector of bits; similar to a string in spirit.
+@item compiled-function
+An object containing compiled Lisp code, known as @dfn{byte code}.
+@item subr
+A Lisp primitive, i.e. a Lisp-callable function implemented in C.
+@end table
+@cindex closure
+Note that there is no basic ``function'' type, as in more powerful
+versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
+not provide the closure semantics implemented by Common Lisp and Scheme.
+The guts of a function in XEmacs Lisp are represented in one of four
+ways: a symbol specifying another function (when one function is an
+alias for another), a list (whose first element must be the symbol
+@code{lambda}) containing the function's source code, a
+compiled-function object, or a subr object. (In other words, given a
+symbol specifying the name of a function, calling @code{symbol-function}
+to retrieve the contents of the symbol's function cell will return one
+of these types of objects.)
+XEmacs Lisp also contains numerous specialized objects used to implement
+the editor:
+@table @code
+@item buffer
+Stores text like a string, but is optimized for insertion and deletion
+and has certain other properties that can be set.
+@item frame
+An object with various properties whose displayable representation is a
+@dfn{window} in window-system parlance.
+@item window
+A section of a frame that displays the contents of a buffer;
+often called a @dfn{pane} in window-system parlance.
+@item window-configuration
+An object that represents a saved configuration of windows in a frame.
+@item device
+An object representing a screen on which frames can be displayed;
+equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
+character mode.
+@item face
+An object specifying the appearance of text or graphics; it has
+properties such as font, foreground color, and background color.
+@item marker
+An object that refers to a particular position in a buffer and moves
+around as text is inserted and deleted to stay in the same relative
+position to the text around it.
+@item extent
+Similar to a marker but covers a range of text in a buffer; can also
+specify properties of the text, such as a face in which the text is to
+be displayed, whether the text is invisible or unmodifiable, etc.
+@item event
+Generated by calling @code{next-event} and contains information
+describing a particular event happening in the system, such as the user
+pressing a key or a process terminating.
+@item keymap
+An object that maps from events (described using lists, vectors, and
+symbols rather than with an event object because the mapping is for
+classes of events, rather than individual events) to functions to
+execute or other events to recursively look up; the functions are
+described by name, using a symbol, or using lists to specify the
+function's code.
+@item glyph
+An object that describes the appearance of an image (e.g.  pixmap) on
+the screen; glyphs can be attached to the beginning or end of extents
+and in some future version of XEmacs will be able to be inserted
+directly into a buffer.
+@item process
+An object that describes a connection to an externally-running process.
+@end table
+There are some other, less-commonly-encountered general objects:
+@table @code
+@item hash-table
+An object that maps from an arbitrary Lisp object to another arbitrary
+Lisp object, using hashing for fast lookup.
+@item obarray
+A limited form of hash-table that maps from strings to symbols; obarrays
+are used to look up a symbol given its name and are not actually their
+own object type but are kludgily represented using vectors with hidden
+fields (this representation derives from GNU Emacs).
+@item specifier
+A complex object used to specify the value of a display property; a
+default value is given and different values can be specified for
+particular frames, buffers, windows, devices, or classes of device.
+@item char-table
+An object that maps from chars or classes of chars to arbitrary Lisp
+objects; internally char tables use a complex nested-vector
+representation that is optimized to the way characters are represented
+as integers.
+@item range-table
+An object that maps from ranges of integers to arbitrary Lisp objects.
+@end table
+And some strange special-purpose objects:
+@table @code
+@item charset
+@itemx coding-system
+Objects used when MULE, or multi-lingual/Asian-language, support is
+enabled.
+@item color-instance
+@itemx font-instance
+@itemx image-instance
+An object that encapsulates a window-system resource; instances are
+mostly used internally but are exposed on the Lisp level for cleanness
+of the specifier model and because it's occasionally useful for Lisp
+programs to create or query the properties of instances.
+@item subwindow
+An object that encapsulates a @dfn{subwindow} resource, i.e. a
+window-system child window that is drawn into by an external process;
+this object should be integrated into the glyph system but isn't yet,
+and may change form when this is done.
+@item tooltalk-message
+@itemx tooltalk-pattern
+Objects that represent resources used in the ToolTalk interprocess
+communication protocol.
+@item toolbar-button
+An object used in conjunction with the toolbar.
+@end table
+And objects that are only used internally:
+@table @code
+@item opaque
+A generic object for encapsulating arbitrary memory; this allows you the
+generality of @code{malloc()} and the convenience of the Lisp object
+system.
+@item lstream
+A buffering I/O stream, used to provide a unified interface to anything
+that can accept output or provide input, such as a file descriptor, a
+stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
+it's a Lisp object to make its memory management more convenient.
+@item char-table-entry
+Subsidiary objects in the internal char-table representation.
+@item extent-auxiliary
+@itemx menubar-data
+@itemx toolbar-data
+Various special-purpose objects that are basically just used to
+encapsulate memory for particular subsystems, similar to the more
+general ``opaque'' object.
+@item symbol-value-forward
+@itemx symbol-value-buffer-local
+@itemx symbol-value-varalias
+@itemx symbol-value-lisp-magic
+Special internal-only objects that are placed in the value cell of a
+symbol to indicate that there is something special with this variable --
+e.g. it has no value, it mirrors another variable, or it mirrors some C
+variable; there is really only one kind of object, called a
+@dfn{symbol-value-magic}, but it is sort-of halfway kludged into
+semi-different object types.
+@end table
+@cindex permanent objects
+@cindex temporary objects
+Some types of objects are @dfn{permanent}, meaning that once created,
+they do not disappear until explicitly destroyed, using a function such
+as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
+Others will disappear once they are no longer used, through the garbage
+collection mechanism.  Buffers, frames, windows, devices, and processes
+are among the objects that are permanent.  Note that some objects can go
+both ways: Faces can be created either way; extents are normally
+permanent, but detached extents (extents not referring to any text, as
+happens to some extents when the text they are referring to is deleted)
+are temporary.  Note that some permanent objects, such as faces and
+coding systems, cannot be deleted.  Note also that windows are unique in
+that they can be @emph{undeleted} after having previously been
+deleted. (This happens as a result of restoring a window configuration.)
+@cindex read syntax
+Note that many types of objects have a @dfn{read syntax}, i.e. a way of
+specifying an object of that type in Lisp code.  When you load a Lisp
+file, or type in code to be evaluated, what really happens is that the
+function @code{read} is called, which reads some text and creates an object
+based on the syntax of that text; then @code{eval} is called, which
+possibly does something special; then this loop repeats until there's
+no more text to read. (@code{eval} only actually does something special
+with symbols, which causes the symbol's value to be returned,
+similar to referencing a variable; and with conses [i.e. lists],
+which cause a function invocation.  All other values are returned
+unchanged.)
+The read syntax
+@example
+17297
+@end example
+converts to an integer whose value is 17297.
+@example
+1.983e-4
+@end example
+converts to a float whose value is 1.983e-4, or .0001983.
+@example
+?b
+@end example
+converts to a char that represents the lowercase letter b.
+@example
+?^[$(B#&^[(B
+@end example
+(where @samp{^[} actually is an @samp{ESC} character) converts to a
+particular Kanji character when using an ISO-2022-based coding system for
+input. (To decode this goo: @samp{ESC} begins an escape sequence;
+@samp{ESC $ (} is a class of escape sequences meaning ``switch to a
+94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
+Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
+of characters [subtract 33 from the ASCII value of each character to get
+the corresponding index]; @samp{ESC (} is a class of escape sequences
+meaning ``switch to a 94 character set''; @samp{ESC ( B} means ``switch
+to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
+denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
+replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
+from the GB2312 character set.)
+@example
+"foobar"
+@end example
+converts to a string.
+@example
+foobar
+@end example
+converts to a symbol whose name is @code{"foobar"}.  This is done by
+looking up the string equivalent in the global variable
+@code{obarray}, whose contents should be an obarray.  If no symbol
+is found, a new symbol with the name @code{"foobar"} is automatically
+created and added to @code{obarray}; this process is called
+@dfn{interning} the symbol.
+@cindex interning
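+The lookup-or-create behaviour of interning can be sketched in C.  This
+is an illustration only: it assumes a small chained hash table, whereas
+the real obarray is a vector with hidden fields, and the names
+@code{struct symbol}, @code{hash_name}, and @code{intern} as written
+here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OBARRAY_SIZE 64

struct symbol {
    char *name;
    struct symbol *next;         /* next symbol in the same bucket */
};

/* The table that maps names to symbols. */
static struct symbol *obarray[OBARRAY_SIZE];

/* A simple string hash, reduced to a bucket index. */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;

    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % OBARRAY_SIZE;
}

/* Return the symbol called NAME, creating and adding it if needed. */
struct symbol *intern(const char *name)
{
    unsigned bucket = hash_name(name);
    struct symbol *sym;
    size_t len;

    /* An existing symbol with this name is returned as-is. */
    for (sym = obarray[bucket]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;

    /* Otherwise make a new symbol and link it into the table. */
    sym = malloc(sizeof *sym);
    assert(sym != NULL);
    len = strlen(name) + 1;
    sym->name = malloc(len);
    assert(sym->name != NULL);
    memcpy(sym->name, name, len);
    sym->next = obarray[bucket];
    obarray[bucket] = sym;
    return sym;
}
```

+Because a second call with the same name returns the same pointer, two
+occurrences of @code{foobar} in Lisp source refer to one symbol object,
+which is what makes comparing symbols by identity meaningful.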
+@example
+(foo . bar)
+@end example
+converts to a cons cell containing the symbols @code{foo} and @code{bar}.
+@example
+(1 a 2.5)
+@end example
+converts to a three-element list containing the specified objects
+(note that a list is actually a set of nested conses; see the
+XEmacs Lisp Reference).
+@example
+[1 a 2.5]
+@end example
+converts to a three-element vector containing the specified objects.
+@example
+#[... ... ... ...]
+@end example
+converts to a compiled-function object (the actual contents are not
+shown since they are not relevant here; look at a file that ends with
+@file{.elc} for examples).
+@example
+#*01110110
+@end example
+converts to a bit-vector.
+@example
+#s(hash-table ... ...)
+@end example
+converts to a hash table (the actual contents are not shown).
+@example
+#s(range-table ... ...)
+@end example
+converts to a range table (the actual contents are not shown).
+@example
+#s(char-table ... ...)
+@end example
+converts to a char table (the actual contents are not shown).
+Note that the @code{#s()} syntax is the general syntax for structures,
+which are not really implemented in XEmacs Lisp but should be.
+When an object is printed out (using @code{print} or a related
+function), the read syntax is used, so that the same object can be read
+in again.
+The other objects do not have read syntaxes, usually because it does not
+really make sense to create them in this fashion (e.g. processes, where
+it doesn't make sense to have a subprocess created as a side effect of
+reading some Lisp code), or because they can't be created at all
+(e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
+nor do most complex objects, which contain too much state to be easily
+initialized through a read syntax.
+@node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
+@chapter How Lisp Objects Are Represented in C
+Lisp objects are represented in C using a 32-bit or 64-bit machine word
+(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
+most other processors use 32-bit Lisp objects).  The representation
+stuffs a pointer together with a tag, as follows:
+@example
+[ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
+[ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
+  <---> ^ <------------------------------------------------------>
+   tag  |       a pointer to a structure, or an integer
+        |
+     mark bit
+@end example
+The tag describes the type of the Lisp object.  For integers and chars,
+the lower 28 bits contain the value of the integer or char; for all
+others, the lower 28 bits contain a pointer.  The mark bit is used
+during garbage-collection, and is always 0 when garbage collection is
+not happening. (The way that garbage collection works, basically, is that it
+loops over all places where Lisp objects could exist -- this includes
+all global variables in C that contain Lisp objects [including
+@code{Vobarray}, the C equivalent of @code{obarray}; through this, all
+Lisp variables will get marked], plus various other places -- and
+recursively scans through the Lisp objects, marking each object it finds
+by setting the mark bit.  Then it goes through the lists of all objects
+allocated, freeing the ones that are not marked and turning off the mark
+bit of the ones that are marked.)
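+The layout in the diagram can be sketched in C as follows.  This is only
+an illustration under stated assumptions: a 32-bit word, a 3-bit tag in
+the top bits, one mark bit below it, and 28 value bits.  All
+@code{sketch_} names are hypothetical and are not the macros that
+@file{lisp.h} actually defines.
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names.  Layout as in the diagram above:
   bits 31-29 tag, bit 28 mark bit, bits 27-0 pointer or integer. */
typedef uint32_t sketch_object;

#define SKETCH_VALBITS 28
#define SKETCH_VALMASK ((UINT32_C(1) << SKETCH_VALBITS) - 1)
#define SKETCH_MARKBIT (UINT32_C(1) << SKETCH_VALBITS)

/* Combine a 3-bit tag with a 28-bit value; the mark bit starts out clear. */
sketch_object sketch_make(unsigned tag, uint32_t value)
{
  assert(tag < 8 && value <= SKETCH_VALMASK);
  return ((uint32_t) tag << (SKETCH_VALBITS + 1)) | value;
}

unsigned sketch_tag(sketch_object obj)   { return obj >> (SKETCH_VALBITS + 1); }
uint32_t sketch_value(sketch_object obj) { return obj & SKETCH_VALMASK; }
int sketch_marked(sketch_object obj)     { return (obj & SKETCH_MARKBIT) != 0; }

/* What the collector does while marking, and undoes while sweeping. */
sketch_object sketch_mark(sketch_object obj)   { return obj | SKETCH_MARKBIT; }
sketch_object sketch_unmark(sketch_object obj) { return obj & ~SKETCH_MARKBIT; }
```
+Setting and clearing the mark bit leaves both the tag and the value
+untouched, which is why the bit can be 0 whenever garbage collection is
+not running.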
+Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
+used for the Lisp object can vary.  It can be either a simple type
+(@code{long} on the DEC Alpha, @code{int} on other machines) or a
+structure whose fields are bit fields that line up properly (actually, a
+union of structures is used).  Generally the simple integral type is
+preferable because it ensures that the compiler will actually use a
+machine word to represent the object (some compilers will use more
+general and less efficient code for unions and structs even if they can
+fit in a machine word).  The union type, however, has the advantage of
+stricter type checking (if you accidentally pass an integer where a Lisp
+object is desired, you get a compile error), and it makes it easier to
+decode Lisp objects when debugging.  The choice of which type to use is
+determined by the preprocessor constant @code{USE_UNION_TYPE}, which is
+defined via the @code{--use-union-type} option to @code{configure}.
+@cindex record type
+Note that there are only eight types that the tag can represent, but
+many more actual types than this.  This is handled by having one of the
+tag types specify a meta-type called a @dfn{record}; for all such
+objects, the first four bytes of the pointed-to structure indicate what
+the actual type is.
+Note also that having 28 bits for pointers and integers restricts a lot
+of things to 256 megabytes of memory. (Basically, enough pointers and
+indices and whatnot get stuffed into Lisp objects that the total amount
+of memory used by XEmacs can't grow above 256 megabytes.  In older
+versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for
+32 types, which was more than the actual number of types that existed at
+the time, and no ``record'' type was necessary.  However, this limited
+the editor to 64 megabytes total, which some users who edited large
+files might conceivably exceed.)
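+The two limits quoted above follow directly from the number of bits
+left for the pointer.  The helper below is a hypothetical sketch of that
+arithmetic, assuming one mark bit in addition to the tag: a 3-bit tag
+leaves 28 bits, and a 5-bit tag leaves 26.
```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable by the pointer field of a tagged word that also
   holds tag_bits of tag and one mark bit (hypothetical helper). */
uint64_t sketch_addressable_bytes(unsigned word_bits, unsigned tag_bits)
{
  unsigned pointer_bits;
  assert(word_bits <= 64 && tag_bits + 1 < word_bits);
  pointer_bits = word_bits - tag_bits - 1;
  return UINT64_C(1) << pointer_bits;
}
```
+With a 32-bit word this gives 2^28 bytes (256 megabytes) for the 3-bit
+tag and 2^26 bytes (64 megabytes) for the older 5-bit tag.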
+Also, note that there is an implicit assumption here that all pointers
+are low enough that the top bits are all zero and can just be chopped
+off.  On standard machines that allocate memory from the bottom up (and
+give each process its own address space), this works fine.  Some
+machines, however, put the data space somewhere else in memory
+(e.g. beginning at 0x80000000).  Those machines cope by defining
+@code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
+the proper mask.  Then, pointers retrieved from Lisp objects are
+automatically OR'ed with this value prior to being used.
+A corollary of the previous paragraph is that @strong{(pointers to)
+stack-allocated structures cannot be put into Lisp objects}.  The stack
+is generally located near the top of memory; if you put such a pointer
+into a Lisp object, it will get its top bits chopped off, and you will
+lose.
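+The masking described in the last two paragraphs can be modelled with
+plain 32-bit numbers standing in for addresses.  This is a sketch only:
+the names are hypothetical, and the base @code{0x80000000} is simply the
+example value mentioned above, not a claim about any particular
+@file{m/} or @file{s/} file.
```c
#include <assert.h>
#include <stdint.h>

/* Addresses are modelled as plain 32-bit numbers; all names and the
   0x80000000 base are illustrative assumptions. */
#define SKETCH_VALMASK       UINT32_C(0x0FFFFFFF)
#define SKETCH_DATA_SEG_BITS UINT32_C(0x80000000)

/* Storing a pointer keeps only the low 28 bits. */
uint32_t sketch_store_pointer(uint32_t address)
{
  return address & SKETCH_VALMASK;
}

/* Fetching ORs the fixed high bits back in. */
uint32_t sketch_fetch_pointer(uint32_t stored)
{
  assert((stored & ~SKETCH_VALMASK) == 0);
  return stored | SKETCH_DATA_SEG_BITS;
}

/* True if an address survives the store/fetch round trip. */
int sketch_round_trips(uint32_t address)
{
  return sketch_fetch_pointer(sketch_store_pointer(address)) == address;
}
```
+An address inside the data space, such as @code{0x80012340}, survives
+the round trip.  An address near the top of memory, such as
+@code{0xEFFF0000}, comes back as @code{0x8FFF0000}, which is the failure
+described above for stack-allocated structures.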
+Actually, there's an alternative representation of a @code{Lisp_Object},
+invented by Kyle Jones, that is used when the
+@code{--use-minimal-tagbits} option to @code{configure} is used.  In
+this case the 2 lower bits are used for the tag bits.  This
+representation assumes that pointers to structs are always aligned to
+multiples of 4, so the lower 2 bits are always zero.
+@example
+[ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
+[ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
+  <---------------------------------------------------------> <->
+           a pointer to a structure, or an integer            tag
+@end example
+A tag of 00 is used for all pointer object types, a tag of 10 is used
+for characters, and the other two tags 01 and 11 are joined together to
+form the integer object type.  The mark bit is moved into the
+structure being pointed at (integers and chars do not need to be marked,
+since no memory is allocated for them).  This representation has these
+advantages:
+@enumerate
+@item
+31 bits can be used for Lisp integers.
+@item
+@emph{Any} pointer can be represented directly, and no bit masking
+operations are necessary.
+@end enumerate
+The disadvantages are:
+@enumerate
+@item
+An extra level of indirection is needed to determine the type of objects
+that were not record types in the original representation, since their
+type is no longer encoded in the tag.  So checking whether a Lisp object
+is a cons cell becomes a slower operation.
+@item
+Mark bits can no longer be stored directly in Lisp objects, so another
+place for them must be found.  This means that a cons cell requires more
+memory than merely room for two Lisp objects, leading to extra memory use.
+@end enumerate
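+The tag assignment of the alternative representation can be sketched as
+follows, assuming a 32-bit word.  The @code{sketch_} names are
+hypothetical, and the sign extension is written out explicitly so that
+the example does not depend on how the compiler shifts negative numbers.
```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_object;   /* hypothetical 32-bit Lisp word */

int sketch_is_pointer(sketch_object o) { return (o & 3) == 0; }  /* tag 00 */
int sketch_is_char(sketch_object o)    { return (o & 3) == 2; }  /* tag 10 */
int sketch_is_int(sketch_object o)     { return (o & 1) == 1; }  /* tags 01, 11 */

/* A 4-byte-aligned address is already a valid pointer object. */
sketch_object sketch_make_pointer(uint32_t address)
{
  assert((address & 3) == 0);
  return address;
}

sketch_object sketch_make_char(uint32_t c)  { return (c << 2) | 2; }
uint32_t sketch_char_value(sketch_object o) { return o >> 2; }

/* Integers keep 31 bits: everything above the low tag bit. */
sketch_object sketch_make_int(int32_t n)
{
  return ((uint32_t) n << 1) | 1;
}

int32_t sketch_int_value(sketch_object o)
{
  uint32_t bits = o >> 1;               /* 31-bit field, zero-filled on top */
  if (bits & UINT32_C(0x40000000))      /* sign bit of the 31-bit field */
    return -(int32_t) (~bits & UINT32_C(0x3FFFFFFF)) - 1;
  return (int32_t) bits;
}
```
+Because tags 01 and 11 both count as integers, only the lowest bit is
+spent on the integer tag, which is where the 31-bit range comes from.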
+Various macros are used to construct Lisp objects and extract the
+components.  Macros of the form @code{XINT()}, @code{XCHAR()},
+@code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
+field and cast it to the appropriate type.  All of the macros that
+construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
+necessary.  @code{XINT()} needs to be a bit tricky so that negative
+numbers are properly sign-extended: Usually it does this by shifting the
+number four bits to the left and then four bits to the right.  This
+assumes that the right-shift operator does an arithmetic shift (i.e. it
+leaves the most-significant bit as-is rather than shifting in a zero, so
+that it mimics a divide-by-two even for negative numbers).  Not all
+machines/compilers do this, and on the ones that don't, a more
+complicated definition is selected by defining
+@code{EXPLICIT_SIGN_EXTEND}.
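+Both ways of extracting a 28-bit integer can be sketched as below.  The
+function names are hypothetical; the second function is merely in the
+spirit of @code{EXPLICIT_SIGN_EXTEND} and is not the definition used in
+the XEmacs sources.
```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VALBITS 28
#define SKETCH_VALMASK ((UINT32_C(1) << SKETCH_VALBITS) - 1)

/* Hypothetical constructor: 3-bit tag on top, mark bit clear,
   integer truncated to 28 bits. */
uint32_t sketch_make_int(unsigned tag, int32_t n)
{
  assert(tag < 8);
  return ((uint32_t) tag << (SKETCH_VALBITS + 1))
         | ((uint32_t) n & SKETCH_VALMASK);
}

/* Shift-based extraction.  The left shift discards the tag and mark
   bit; the right shift relies on the compiler shifting in copies of
   the sign bit, which C leaves implementation-defined. */
int32_t sketch_xint_shift(uint32_t obj)
{
  return (int32_t) (obj << 4) >> 4;
}

/* Portable extraction: flip the field's sign bit, then subtract it. */
int32_t sketch_xint_explicit(uint32_t obj)
{
  uint32_t value = obj & SKETCH_VALMASK;
  uint32_t sign = UINT32_C(1) << (SKETCH_VALBITS - 1);
  return (int32_t) (value ^ sign) - (int32_t) sign;
}
```
+On a compiler with an arithmetic right shift the two functions agree;
+the explicit version gives the same answer without relying on that
+behavior.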
+Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
+macros become more complicated -- they check the tag bits and/or the
+type field in the first four bytes of a record type to ensure that the
+object is really of the correct type.  This is great for catching places
+where an incorrect type is being dereferenced -- this typically results
+in a pointer being dereferenced as the wrong type of structure, with
+unpredictable (and sometimes not easily traceable) results.
+There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
+object.  These macros are of the form @code{XSET@var{TYPE}
+(@var{lvalue}, @var{result})},
+i.e. they have to be a statement rather than just used in an expression.
+The reason for this is that standard C doesn't let you ``construct'' a
+structure (but GCC does).  Granted, this sometimes isn't too convenient;
+for the case of integers, at least, you can use the function
+@code{make_int()}, which constructs and @emph{returns} an integer
+Lisp object.  Note that the @code{XSET@var{TYPE}()} macros are also
+affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
+structure is of the right type in the case of record types, where the
+type is contained in the structure.
+The C programmer is responsible for @strong{guaranteeing} that a
+@code{Lisp_Object} is of the correct type before using the @code{X@var{TYPE}}
+macros.  This is especially important in the case of lists.  Use
+@code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
+else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
+Lisp code.  On the other hand, if XEmacs has an internal logic error,
+it's better to crash immediately, so sprinkle ``unreachable''
+@code{abort()}s liberally about the source code.
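+The difference between the two styles of accessor can be sketched with
+a toy object model.  Everything here is hypothetical: the real
+@code{XCAR} is a macro and the real @code{Fcar()} signals a Lisp error
+rather than returning a status code.  The @code{assert} stands in for
+the kind of check that @code{ERROR_CHECK_TYPECHECK} enables.
```c
#include <assert.h>
#include <stddef.h>

/* A toy object model; all names are hypothetical. */
enum sketch_type { SKETCH_NIL, SKETCH_INT, SKETCH_CONS };

struct sketch_obj
{
  enum sketch_type type;
  struct sketch_obj *car, *cdr;   /* meaningful only for SKETCH_CONS */
};

/* Unchecked accessor, like XCAR: the caller has already proved that
   obj is a cons.  A wrong guess is an internal logic error, so fail
   immediately rather than read garbage. */
struct sketch_obj *sketch_xcar(struct sketch_obj *obj)
{
  assert(obj->type == SKETCH_CONS);
  return obj->car;
}

/* Checked accessor, like Fcar: safe on values that came from Lisp.
   Returns 0 and stores the car on success, -1 on a wrong type (where
   the real primitive would signal a Lisp error). */
int sketch_fcar(struct sketch_obj *obj, struct sketch_obj **result)
{
  if (obj->type == SKETCH_NIL)
    {
      *result = obj;              /* the car of nil is nil */
      return 0;
    }
  if (obj->type != SKETCH_CONS)
    return -1;
  *result = obj->car;
  return 0;
}
```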
+@node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
+@chapter Rules When Writing New C Code
+The XEmacs C code is extremely complex and intricate, and there are many
+rules that are more or less consistently followed throughout the code.
+Many of these rules are not obvious, so they are explained here.  It is
+of the utmost importance that you follow them.  If you don't, you may
+get something that appears to work, but which will crash in odd
+situations, often in code far away from where the actual breakage is.
+@menu
+* General Coding Rules::
+* Writing Lisp Primitives::
+* Adding Global Lisp Variables::
+* Coding for Mule::
+* Techniques for XEmacs Developers::
+@end menu
+@node General Coding Rules
+@section General Coding Rules
+The C code is actually written in a dialect of C called @dfn{Clean C},
+meaning that it can be compiled, mostly warning-free, with either a C or
+C++ compiler.  Coding in Clean C has several advantages over plain C.
+C++ compilers are more nit-picking, and a number of coding errors have
+been found by compiling with C++.  The ability to use both C and C++
+tools means that a greater variety of development tools is available to
+the developer.
+Almost every module contains a @code{syms_of_*()} function and a
+@code{vars_of_*()} function.  The former declares any Lisp primitives
+you have defined and defines any symbols you will be using.  The latter
+declares any global Lisp variables you have added and initializes global
+C variables in the module.  For each such function, declare it in
+@file{symsinit.h} and make sure it's called in the appropriate place in
+@file{emacs.c}.  @strong{Important}: There are stringent requirements on
+exactly what can go into these functions.  See the comment in
+@file{emacs.c}.  The reason for this is to avoid obscure unwanted
+interactions during initialization.  If you don't follow these rules,
+you'll be sorry!  If you want to do anything that isn't allowed, create
+a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
+though: You have to make sure your function is called at the right time
+so that all the initialization dependencies work out.
+Every module includes @file{<config.h>} (angle brackets so that
+@samp{--srcdir} works correctly; @file{config.h} may or may not be in
+the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
+must always be included before any other header files (including
+system header files) to ensure that certain tricks played by various
+@file{s/} and @file{m/} files work out correctly.
+@strong{All global and static variables that are to be modifiable must
+be declared uninitialized.}  This means that you may not use the
+``declare with initializer'' form for these variables, such as @code{int
+some_variable = 0;}.  The reason for this has to do with some kludges
+done during the dumping process: If possible, the initialized data
+segment is re-mapped so that it becomes part of the (unmodifiable) code
+segment in the dumped executable.  This allows this memory to be shared
+among multiple running XEmacs processes.  XEmacs is careful to place as
+much constant data as possible into initialized variables (in
+particular, into what's called the @dfn{pure space} -- see below) during
+the @file{temacs} phase.
+@cindex copy-on-write
+@strong{Please note:} This kludge only works on a few systems nowadays,
+and is rapidly becoming irrelevant because most modern operating systems
+provide @dfn{copy-on-write} semantics.  All data is initially shared
+between processes, and a private copy is automatically made (on a
+page-by-page basis) when a process first attempts to write to a page of
+memory.
+Formerly, there was a requirement that static variables not be declared
+inside of functions.  This had to do with another hack in the same
+vein as what was just described: old USG systems put statically-declared
+variables in the initialized data space, so those header files had a
+@code{#define static} declaration. (That way, the data-segment remapping
+described above could still work.) This fails badly on static variables
+inside of functions, which suddenly become automatic variables;
+therefore, you weren't supposed to have any of them.  This awful kludge
+has been removed in XEmacs because
+@enumerate
+@item
+almost all of the systems that used this kludge ended up having
+to disable the data-segment remapping anyway;
+@item
+the only systems that didn't were extremely outdated ones;
+@item
+this hack completely messed up inline functions.
+@end enumerate
+The C source code makes heavy use of C preprocessor macros.  One popular
+macro style is:
+@example
+#define FOO(var, value) do @{		\
+  Lisp_Object FOO_value = (value);	\
+  ... /* compute using FOO_value */	\
+  (var) = bar;				\
+@} while (0)
+@end example
+The @code{do @{...@} while (0)} is a standard trick to allow @code{FOO} to have
+statement semantics, so that it can safely be used within an @code{if}
+statement in C, for example.  Multiple evaluation is prevented by
+copying a supplied argument into a local variable, so that
+@code{FOO(var,fun(1))} only calls @code{fun} once.
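+Both properties can be checked with a small self-contained example.
+The macros and the counter below are hypothetical and exist only to make
+the number of evaluations visible.
```c
#include <assert.h>

int sketch_calls;                       /* counts evaluations */
int sketch_next(void) { return ++sketch_calls; }

/* Naive version: expands `value' twice. */
#define SKETCH_SQUARE_NAIVE(var, value) ((var) = (value) * (value))

/* Statement-like version: `value' is evaluated once into a local. */
#define SKETCH_SQUARE(var, value) do {		\
  int sketch_value_ = (value);			\
  (var) = sketch_value_ * sketch_value_;	\
} while (0)

/* Returns the number of sketch_next() calls made by one expansion. */
int sketch_calls_per_use(int naive)
{
  int x = 0;
  sketch_calls = 0;
  if (!naive)
    SKETCH_SQUARE(x, sketch_next());    /* legal before `else' thanks to do/while */
  else
    SKETCH_SQUARE_NAIVE(x, sketch_next());
  assert(x > 0);
  return sketch_calls;
}
```
+Had @code{SKETCH_SQUARE} been written as a bare brace block, the
+semicolon after the macro call would end the @code{if} statement and
+the @code{else} would no longer compile.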
+Lisp lists are popular data structures in the C code as well as in
+Elisp.  There are two sets of macros that iterate over lists.
+@code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
+supplied by the user, and cannot be trusted to be acyclic and
+nil-terminated.  A @code{malformed-list} or @code{circular-list} error
+will be generated if the list being iterated over is not entirely
+kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
+safe, and can be used only on trusted lists.
+Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
+@code{GET_LIST_LENGTH}, which calculate the length of a list;
+@code{GET_EXTERNAL_LIST_LENGTH} also validates that the list is proper.
+The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
+@code{LIST_LOOP_DELETE_IF} delete the elements of a Lisp list that
+satisfy some predicate.
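+What it means to validate an untrusted list can be sketched as below.
+This is not the implementation of the XEmacs macros; it is a
+hypothetical function over a toy cons type that uses the common
+tortoise-and-hare technique to detect cycles, and returns error codes
+where the real macros would signal @code{malformed-list} or
+@code{circular-list}.
```c
#include <assert.h>
#include <stddef.h>

enum sketch_type { SKETCH_NIL, SKETCH_INT, SKETCH_CONS };

struct sketch_obj
{
  enum sketch_type type;
  struct sketch_obj *car, *cdr;   /* meaningful only for SKETCH_CONS */
};

#define SKETCH_MALFORMED (-1L)    /* list ends in something other than nil */
#define SKETCH_CIRCULAR  (-2L)    /* list loops back on itself */

long sketch_external_list_length(const struct sketch_obj *list)
{
  const struct sketch_obj *hare = list, *tortoise = list;
  long len = 0;

  assert(list != NULL);
  while (hare->type == SKETCH_CONS)
    {
      hare = hare->cdr;
      len++;
      if ((len & 1) == 0)
        tortoise = tortoise->cdr; /* tortoise moves at half speed */
      if (hare == tortoise)
        return SKETCH_CIRCULAR;   /* hare caught up: there is a cycle */
    }
  return hare->type == SKETCH_NIL ? len : SKETCH_MALFORMED;
}
```
+A loop over a trusted list can skip both checks, which is the speed
+advantage mentioned above.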
+@node Writing Lisp Primitives
+@section Writing Lisp Primitives
+Lisp primitives are Lisp functions implemented in C.  The details of
+interfacing the C function so that Lisp can call it are handled by a few
+C macros.  The only way to really understand how to write new C code is
+to read the source, but we can explain some things here.
+An example of a special form is the definition of @code{prog1}, from
+@file{eval.c}.  (An ordinary function would have the same general
+appearance.)
+@cindex garbage collection protection
+@smallexample
+@group
+DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
+Similar to `progn', but the value of the first form is returned.
+\(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
+The value of FIRST is saved during evaluation of the remaining args,
+whose values are discarded.
+*/
+(args))
+@{
+  /* This function can GC */
+  REGISTER Lisp_Object val, form, tail;
+  struct gcpro gcpro1;
+  val = Feval (XCAR (args));
+  GCPRO1 (val);
+  LIST_LOOP_3 (form, XCDR (args), tail)
+    Feval (form);
+  UNGCPRO;
+  return val;
+@}
+@end group
+@end smallexample
+Let's start with a precise explanation of the arguments to the
+@code{DEFUN} macro.  Here is a template for them:
+@example
+@group
+DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
+@var{docstring}
+*/
+(@var{arglist}))
+@end group
+@end example
+@table @var
+@item lname
+This string is the name of the Lisp symbol to define as the function
+name; in the example above, it is @code{"prog1"}.
+@item fname
+This is the C function name for this function.  This is the name that is
+used in C code for calling the function.  The name is, by convention,
+@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
+Lisp name changed to underscores.  Thus, to call this function from C
+code, call @code{Fprog1}.  Remember that the arguments are of type
+@code{Lisp_Object}; various macros and functions for creating values of
+type @code{Lisp_Object} are declared in the file @file{lisp.h}.
+Primitives whose names are special characters (e.g. @code{+} or
+@code{<}) are named by spelling out, in some fashion, the special
+character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
+begin with normal alphanumeric characters but also contain special
+characters are spelled out in some creative way, e.g. @code{let*}
+becomes @code{FletX()}.
+Each function also has an associated structure that holds the data for
+the subr object that represents the function in Lisp.  This structure
+conveys the Lisp symbol name to the initialization routine that will
+create the symbol and store the subr object as its definition.  The C
+variable name of this structure is always @samp{S} prepended to the
+@var{fname}.  You hardly ever need to be aware of the existence of this
+structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
+details.
+@item min_args
+This is the minimum number of arguments that the function requires.  The
+function @code{prog1} allows a minimum of one argument.
+@item max_args
+This is the maximum number of arguments that the function accepts, if
+there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
+indicating a special form that receives unevaluated arguments, or
+@code{MANY}, indicating an unlimited number of evaluated arguments (the
+C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
+are macros.  If @var{max_args} is a number, it may not be less than
+@var{min_args} and it may not be greater than 8. (If you need to add a
+function with more than 8 arguments, use the @code{MANY} form.  Resist
+the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
+you do it anyway, make sure to also add another clause to the switch
+statement in @code{primitive_funcall()}.)
+@item interactive
+This is an interactive specification, a string such as might be used as
+the argument of @code{interactive} in a Lisp function.  In the case of
+@code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
+cannot be called interactively.  A value of @code{""} indicates a
+function that should receive no arguments when called interactively.
+@item docstring
+This is the documentation string.  It is written just like a
+documentation string for a function defined in Lisp; in particular, the
+first line should be a single sentence.  Note how the documentation
+string is enclosed in a comment, none of the documentation is placed on
+the same lines as the comment-start and comment-end characters, and the
+comment-start characters are on the same line as the interactive
+specification.  @file{make-docfile}, which scans the C files for
+documentation strings, is very particular about what it looks for, and
+will not properly extract the doc string if it's not in this exact format.
+In order to make both @file{etags} and @file{make-docfile} happy, make
+sure that the @code{DEFUN} line contains the @var{lname} and
+@var{fname}, and that the comment-start characters for the doc string
+are on the same line as the interactive specification, and put a newline
+directly after them (and before the comment-end characters).
+@item arglist
+This is the comma-separated list of arguments to the C function.  For a
+function with a fixed maximum number of arguments, provide a C argument
+for each Lisp argument.  In this case, unlike regular C functions, the
+types of the arguments are not declared; they are simply always of type
+@code{Lisp_Object}.
+The names of the C arguments will be used as the names of the arguments
+to the Lisp primitive as displayed in its documentation, modulo the same
+concerns described above for @code{F...} names (in particular,
+underscores in the C arguments become dashes in the Lisp arguments).
+There is one additional kludge: A trailing @samp{_} on the C argument is
+discarded when forming the Lisp argument.  This allows C language
+reserved words (like @code{default}) or global symbols (like
+@code{dirname}) to be used as argument names without compiler warnings
+or errors.
+A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
+@w{@dfn{special form}}; its arguments are not evaluated.  Instead it
+receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
+unevaluated arguments, conventionally named @code{(args)}.
+When a Lisp function has no upper limit on the number of arguments,
+specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
+C actually receives exactly two arguments: the number of Lisp arguments
+(an @code{int}) and the address of a block containing their values (a
+@w{@code{Lisp_Object *}}).  Only in this case are the C types specified
+in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
+@end table
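+The calling convention used for @code{MANY} can be illustrated without
+the @code{DEFUN} machinery.  In this sketch a plain @code{long} stands
+in for @code{Lisp_Object} and the function name is hypothetical.
```c
#include <assert.h>
#include <stddef.h>

typedef long sketch_object;   /* stand-in for Lisp_Object */

/* Like a primitive declared with max_args = MANY: the evaluated
   arguments arrive as a count plus a pointer to a block of values. */
sketch_object sketch_plus(int nargs, sketch_object *args)
{
  sketch_object sum = 0;
  int i;

  assert(nargs >= 0 && (nargs == 0 || args != NULL));
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```
+The callee never sees individual C parameters, so the same function
+body serves for zero, one, or any number of Lisp-level arguments.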
+Within the function @code{Fprog1} itself, note the use of the macros
+@code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
+a variable from garbage collection---to inform the garbage collector
+that it must look in that variable and regard the object pointed at by
+its contents as an accessible object.  This is necessary whenever you
+call @code{Feval} or anything that can directly or indirectly call
+@code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
+any Lisp object that you intend to refer to again must be protected
+somehow.  @code{UNGCPRO} cancels the protection of the variables that
+are protected in the current function.  It is necessary to do this
+explicitly.
+The macro @code{GCPRO1} protects just one local variable.  If you want
+to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
+not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
+These macros implicitly use local variables such as @code{gcpro1}; you
+must declare these explicitly, with type @code{struct gcpro}.  Thus, if
+you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
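+One way to picture these macros is as operations on a linked chain of
+records, each holding the address of a protected local variable.  The
+sketch below is written under that assumption; the @code{sketch_} names
+are hypothetical and the real definitions in @file{lisp.h} may differ
+in detail.
```c
#include <assert.h>
#include <stddef.h>

typedef long sketch_object;            /* stand-in for Lisp_Object */

struct sketch_gcpro
{
  struct sketch_gcpro *next;           /* previously registered entry */
  sketch_object *var;                  /* address of the protected local */
};

struct sketch_gcpro *sketch_gcprolist; /* head of the chain; NULL when empty */

#define SKETCH_GCPRO1(v)						\
  (gcpro1.next = sketch_gcprolist, gcpro1.var = &(v),			\
   sketch_gcprolist = &gcpro1)

#define SKETCH_GCPRO2(v1, v2)						\
  (gcpro1.next = sketch_gcprolist, gcpro1.var = &(v1),			\
   gcpro2.next = &gcpro1, gcpro2.var = &(v2),				\
   sketch_gcprolist = &gcpro2)

/* Unlinks everything registered by the one GCPROn in this function. */
#define SKETCH_UNGCPRO (sketch_gcprolist = gcpro1.next)

/* What a collector would walk: every variable currently registered. */
int sketch_protected_count(void)
{
  int n = 0;
  struct sketch_gcpro *p;
  for (p = sketch_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}

int sketch_demo(void)
{
  sketch_object a = 1, b = 2;
  struct sketch_gcpro gcpro1, gcpro2;  /* must be declared by hand */
  int during;

  SKETCH_GCPRO2(a, b);
  assert(*sketch_gcprolist->var == 2 && *gcpro2.next->var == 1);
  during = sketch_protected_count();
  SKETCH_UNGCPRO;
  return during;
}
```
+In this sketch, a second @code{SKETCH_GCPRO1} in the same function would
+overwrite @code{gcpro1} and link it to itself, which illustrates why
+repeating the one-variable macro cannot protect two variables.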
+@cindex caller-protects (@code{GCPRO} rule)
+Note also that the general rule is @dfn{caller-protects}; i.e. you are
+only responsible for protecting those Lisp objects that you create.  Any
+objects passed to you as arguments should have been protected by whoever
+created them, so you don't in general have to protect them.
+In particular, the arguments to any Lisp primitive are always
+automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
+bytecode.  So only a few Lisp primitives that are called frequently from
+C code, such as @code{Fprogn}, protect their arguments as a service to
+their caller.  You don't need to protect your arguments when writing a
+new @code{DEFUN}.
+@code{GCPRO}ing is perhaps the trickiest and most error-prone part of
+XEmacs coding.  It is @strong{extremely} important that you get this
+right and use a great deal of discipline when writing this code.
+@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
+What @code{DEFUN} actually does is declare a global structure of type
+@code{Lisp_Subr} whose name begins with capital @samp{SF} and which
+contains information about the primitive (e.g. a pointer to the
+function, its minimum and maximum allowed arguments, a string describing
+its Lisp name); @code{DEFUN} then begins a normal C function declaration
+using the @code{F...} name.  The Lisp subr object that is the function
+definition of a primitive (i.e. the object in the function slot of the
+symbol that names the primitive) actually points to this @samp{SF}
+structure; when @code{Feval} encounters a subr, it looks in the
+structure to find out how to call the C function.
+Defining the C function is not enough to make a Lisp primitive
+available; you must also create the Lisp symbol for the primitive (the
+symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
+object in its function cell. (If you don't do this, the primitive won't
+be seen by Lisp code.) The code looks like this:
+@example
+DEFSUBR (@var{fname});
+@end example
+@noindent
+Here @var{fname} is the same name you used as the second argument to
+@code{DEFUN}.
+This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
+at the end of the module.  If no such function exists, create it and
+make sure to also declare it in @file{symsinit.h} and call it from the
+appropriate spot in @code{main()}.  @xref{General Coding Rules}.
+Note that C code cannot call functions by name unless they are defined
+in C.  The way to call a function written in Lisp from C is to use
+@code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
+the Lisp function @code{funcall} accepts an unlimited number of
+arguments, in C it takes two: the number of Lisp-level arguments, and a
+one-dimensional array containing their values.  The first Lisp-level
+argument is the Lisp function to call, and the rest are the arguments to
+pass to it.  Since @code{Ffuncall} can call the evaluator, you must
+protect pointers from garbage collection around the call to
+@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
+its parameters, so you don't have to protect any pointers passed as
+parameters to it.)
+The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
+provide handy ways to call a Lisp function conveniently with a fixed
+number of arguments.  They work by calling @code{Ffuncall}.
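+The relationship between the two interfaces can be sketched with a toy
+value type in which a function is just a C function pointer.  The names
+are hypothetical, and garbage-collection protection is left out
+entirely.
```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_obj sketch_obj;
typedef sketch_obj (*sketch_fn) (int nargs, sketch_obj *args);

/* A toy value: either a number or a callable (hypothetical model). */
struct sketch_obj
{
  long n;
  sketch_fn fn;    /* non-NULL when the value is a function */
};

/* Like Ffuncall: args[0] is the function, args[1..] its arguments. */
sketch_obj sketch_funcall(int nargs, sketch_obj *args)
{
  assert(nargs >= 1 && args[0].fn != NULL);
  return args[0].fn(nargs - 1, args + 1);
}

/* Like call2: a fixed-arity convenience wrapper around the above. */
sketch_obj sketch_call2(sketch_obj fn, sketch_obj a, sketch_obj b)
{
  sketch_obj args[3];
  args[0] = fn;
  args[1] = a;
  args[2] = b;
  return sketch_funcall(3, args);
}

/* An example callee that adds up its arguments. */
sketch_obj sketch_add(int nargs, sketch_obj *args)
{
  sketch_obj result = { 0, NULL };
  int i;
  for (i = 0; i < nargs; i++)
    result.n += args[i].n;
  return result;
}
```
+The wrapper does nothing more than pack its parameters into an array
+with the function in slot 0, which matches the description of
+@code{call0}, @code{call1}, and friends above.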
+@file{eval.c} is a very good file to look through for examples;
+@file{lisp.h} contains the definitions for important macros and
+functions.
+@node Adding Global Lisp Variables
+@section Adding Global Lisp Variables
+Global variables whose names begin with @samp{Q} are constants whose
+value is a symbol of a particular name.  The name of the variable should
+be derived from the name of the symbol using the same rules as for Lisp
+primitives.  These variables are initialized using a call to
+@code{defsymbol()} in the @code{syms_of_*()} function. (This call
+interns a symbol, sets the C variable to the resulting Lisp object, and
+calls @code{staticpro()} on the C variable to tell the
+garbage-collection mechanism about this variable.  What
+@code{staticpro()} does is add a pointer to the variable to a large
+global array; when garbage-collection happens, all pointers listed in
+the array are used as starting points for marking Lisp objects.  This is
+important because it's quite possible that the only current reference to
+the object is the C variable.  In the case of symbols, the
+@code{staticpro()} doesn't matter all that much because the symbol is
+contained in @code{obarray}, which is itself @code{staticpro()}ed.
+However, it's possible that a naughty user could do something like
+uninterning the symbol out of @code{obarray} or even setting
+@code{obarray} to a different value [although this is likely to make
+XEmacs crash!].)
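+The mechanism described for @code{staticpro()} can be sketched as a
+global array of variable addresses that the mark phase walks.  The
+names, the fixed array size, and the one-field object type are all
+assumptions made for the sake of a short example.
```c
#include <assert.h>
#include <stddef.h>

struct sketch_obj { int marked; };     /* toy heap object */

#define SKETCH_MAX_ROOTS 8
struct sketch_obj **sketch_roots[SKETCH_MAX_ROOTS];
int sketch_nroots;

/* Record the address of a C variable, not its current value, so that
   later assignments to the variable are still seen by the collector. */
void sketch_staticpro(struct sketch_obj **var)
{
  assert(sketch_nroots < SKETCH_MAX_ROOTS);
  sketch_roots[sketch_nroots++] = var;
}

/* Mark phase, roots only: mark whatever each variable holds now. */
void sketch_mark_roots(void)
{
  int i;
  for (i = 0; i < sketch_nroots; i++)
    if (*sketch_roots[i] != NULL)
      (*sketch_roots[i])->marked = 1;
}

struct sketch_obj *Vsketch_protected;  /* registered by the caller */
struct sketch_obj *Vsketch_forgotten;  /* never registered */
```
+An object reachable only through @code{Vsketch_forgotten} is never
+marked, so a real collector would free it while the C variable still
+points at it.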
+@strong{Please note:} It is potentially deadly if you declare a
+@samp{Q...}  variable in two different modules.  The two calls to
+@code{defsymbol()} are no problem, but some linkers will complain about
+multiply-defined symbols.  The most insidious aspect of this is that
+often the link will succeed anyway, but then the resulting executable
+will sometimes crash in obscure ways during certain operations!  To
+avoid this problem, declare any symbols with common names (such as
+@code{text}) that are not obviously associated with this particular
+module in the module @file{general.c}.
+Global variables whose names begin with @samp{V} are variables that
+contain Lisp objects.  The convention here is that all global variables
+of type @code{Lisp_Object} begin with @samp{V}, and all others don't
+(including integer and boolean variables that have Lisp
+equivalents). Most of the time, these variables have equivalents in
+Lisp, but some don't.  Those that do are declared this way by a call to
+@code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
+module.  What this does is create a special @dfn{symbol-value-forward}
+Lisp object that contains a pointer to the C variable, intern a symbol
+whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
+its value to the symbol-value-forward Lisp object; it also calls
+@code{staticpro()} on the C variable to tell the garbage-collection
+mechanism about the variable.  When @code{eval} (or actually
+@code{symbol-value}) encounters this special object in the process of
+retrieving a variable's value, it follows the indirection to the C
+variable and gets its value.  @code{setq} does similar things so that
+the C variable gets changed.
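+The indirection can be sketched with a toy symbol structure.  Here the
+forwarding object is reduced to a plain pointer field and a
+@code{long} stands in for @code{Lisp_Object}; the names are
+hypothetical and the real symbol-value-forward object is a Lisp object
+in its own right.
```c
#include <assert.h>
#include <stddef.h>

typedef long sketch_object;            /* stand-in for Lisp_Object */

struct sketch_symbol
{
  const char *name;
  sketch_object value;                 /* used when forward is NULL */
  sketch_object *forward;              /* else: address of a C variable */
};

sketch_object Vsketch_limit;           /* the C side of the variable */

/* Like DEFVAR_LISP: tie a symbol to a C variable. */
void sketch_defvar(struct sketch_symbol *sym, const char *name,
                   sketch_object *c_variable)
{
  assert(c_variable != NULL);
  sym->name = name;
  sym->value = 0;
  sym->forward = c_variable;
}

/* Like symbol-value: follow the indirection when there is one. */
sketch_object sketch_symbol_value(const struct sketch_symbol *sym)
{
  return sym->forward != NULL ? *sym->forward : sym->value;
}

/* Like setq: write through to the C variable when there is one. */
void sketch_set(struct sketch_symbol *sym, sketch_object v)
{
  if (sym->forward != NULL)
    *sym->forward = v;
  else
    sym->value = v;
}
```
+Because reads and writes both go through the pointer, the C variable
+and the Lisp variable can never disagree.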
+Whether or not you @code{DEFVAR_LISP()} a variable, you need to
+initialize it in the @code{vars_of_*()} function; otherwise it will end
+up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
+this is probably not what you want.  Also, if the variable is not
+@code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
+C variable in the @code{vars_of_*()} function.  Otherwise, the
+garbage-collection mechanism won't know that the object in this variable
+is in use, and will happily collect it and reuse its storage for another
+Lisp object, and you will be the one who's unhappy when you can't figure
+out how your variable got overwritten.
+@node Coding for Mule
+@section Coding for Mule
+@cindex Coding for Mule
+Although Mule support is not compiled by default in XEmacs, many people
+are using it, and we consider it crucial that new code works correctly
+with multibyte characters.  This is not hard; it is only a matter of
+following several simple user-interface guidelines.  Even if you never
+compile with Mule, with a little practice you will find it quite easy
+to code Mule-correctly.
+Note that these guidelines are not necessarily tied to the current Mule
+implementation; they are also a good idea to follow on the grounds of
+code generalization for future I18N work.
+@menu
+* Character-Related Data Types::
+* Working With Character and Byte Positions::
+* Conversion to and from External Data::
+* General Guidelines for Writing Mule-Aware Code::
+* An Example of Mule-Aware Code::
+@end menu
+@node Character-Related Data Types
+@subsection Character-Related Data Types
+First, let's review the basic character-related datatypes used by
+XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
+current implementation (all of them boil down to @code{unsigned char} or
+@code{int}), but they improve clarity of code a great deal, because one
+glance at the declaration can tell the intended use of the variable.
+@table @code
+@item Emchar
+@cindex Emchar
+An @code{Emchar} holds a single Emacs character.
+Obviously, the equality between characters and bytes is lost in the Mule
+world.  Characters can be represented by one or more bytes in the
+buffer, and @code{Emchar} is the C type large enough to hold any
+character.
+Without Mule support, an @code{Emchar} is equivalent to an
+@code{unsigned char}.
+@item Bufbyte
+@cindex Bufbyte
+The data representing the text in a buffer or string is logically a set
+of @code{Bufbyte}s.
+XEmacs does not work with the same character formats all the time; when
+reading characters from the outside, it decodes them to an internal
+format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
+@code{unsigned char}) is the basic unit of the internal format used by
+XEmacs buffers and strings.
+One character can correspond to one or more @code{Bufbyte}s.  In the
+current implementation, an ASCII character is represented by the same
+@code{Bufbyte}, and extended characters are represented by a sequence of
+@code{Bufbyte}s.
+Without Mule support, a @code{Bufbyte} is equivalent to an
+@code{Emchar}.
+@item Bufpos
+@itemx Charcount
+@cindex Bufpos
+@cindex Charcount
+A @code{Bufpos} represents a character position in a buffer or string.
+A @code{Charcount} represents a number (count) of characters.
+Logically, subtracting two @code{Bufpos} values yields a
+@code{Charcount} value.  Although all of these are @code{typedef}ed to
+@code{int}, we use them in preference to @code{int} to make it clear
+what sort of position is being used.
+@code{Bufpos} and @code{Charcount} values are the only ones that are
+ever visible to Lisp.
+@item Bytind
+@itemx Bytecount
+@cindex Bytind
+@cindex Bytecount
+A @code{Bytind} represents a byte position in a buffer or string.  A
+@code{Bytecount} represents the distance between two positions in bytes.
+The relationship between @code{Bytind} and @code{Bytecount} is the same
+as the relationship between @code{Bufpos} and @code{Charcount}.
+@item Extbyte
+@itemx Extcount
+@cindex Extbyte
+@cindex Extcount
+When dealing with the outside world, XEmacs works with @code{Extbyte}s,
+which are equivalent to @code{unsigned char}.  Obviously, an
+@code{Extcount} is the distance between two @code{Extbyte}s.  Extbytes
+and Extcounts are not all that frequent in XEmacs code.
+@end table
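+The difference between counting bytes and counting characters can be
+illustrated with a small standalone sketch.  It is not XEmacs code: the
+typedefs only mirror the names above, the @code{toy_} functions are
+hypothetical, and UTF-8 is assumed as a stand-in for the internal
+encoding, which it is not.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in typedefs mirroring the names used in the text.  */
typedef unsigned char Bufbyte;  /* one byte of internal text */
typedef int Charcount;          /* a number of characters */
typedef int Bytecount;          /* a number of bytes */

/* Assumption: UTF-8 stands in for the internal encoding.  A byte of
   the form 10xxxxxx continues a character; any other byte starts one.  */
int
toy_starts_char (Bufbyte b)
{
  return (b & 0xC0) != 0x80;
}

/* Count the characters in the LEN bytes of well-formed text at P.
   The argument is a Bytecount and the result is a Charcount; the
   distinct typedefs document which unit each number is in.  */
Charcount
toy_bytecount_to_charcount (const Bufbyte *p, Bytecount len)
{
  Charcount chars = 0;
  Bytecount i;

  assert (p != NULL && len >= 0);
  for (i = 0; i < len; i++)
    if (toy_starts_char (p[i]))
      chars++;
  return chars;
}
```

+For pure ASCII text the two counts agree; as soon as one multibyte
+character is present, the byte count is larger than the character count.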
+@node Working With Character and Byte Positions
+@subsection Working With Character and Byte Positions
+Now that we have defined the basic character-related types, we can look
+at the macros and functions designed for work with them and for
+conversion between them.  Most of these macros are defined in
+@file{buffer.h}, and we don't discuss all of them here, but only the
+most important ones.  Examining the existing code is the best way to
+learn about them.
+@table @code
+@item MAX_EMCHAR_LEN
+@cindex MAX_EMCHAR_LEN
+This preprocessor constant is the maximum number of buffer bytes per
+Emacs character, i.e. the largest byte length an @code{Emchar} can have
+in a buffer.  It is useful when allocating temporary strings to hold a
+known number of characters.  For instance:
+@example
+@group
+@{
+Charcount cclen;
+...
+@{
+/* Allocate place for @var{cclen} characters. */
+Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
+...
+@end group
+@end example
+If you followed the previous section, you can guess that, logically,
+multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
+a @code{Bytecount} value (an upper bound on the space needed).
+In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
+Without Mule, it is 1.
+@item charptr_emchar
+@itemx set_charptr_emchar
+@cindex charptr_emchar
+@cindex set_charptr_emchar
+The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
+returns the @code{Emchar} stored at that position.  If it were a
+function, its prototype would be:
+@example
+Emchar charptr_emchar (Bufbyte *p);
+@end example
+@code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
+position.  It returns the number of bytes stored:
+@example
+Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
+@end example
+It is important to note that @code{set_charptr_emchar} is safe only for
+appending a character at the end of a buffer, not for overwriting a
+character in the middle.  This is because the width of characters
+varies, and @code{set_charptr_emchar} cannot resize the string if it
+writes, say, a two-byte character where a single-byte character used to
+reside.
+A typical use of @code{set_charptr_emchar} can be demonstrated by this
+example, which copies characters from buffer @var{buf} to a temporary
+string of Bufbytes.
+@example
+@group
+@{
+Bufpos pos;
+for (pos = beg; pos < end; pos++)
+@{
+Emchar c = BUF_FETCH_CHAR (buf, pos);
+p += set_charptr_emchar (p, c);
+@}
+@}
+@end group
+@end example
+Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
+and advance the pointer @code{p} at the same time.
+@item INC_CHARPTR
+@itemx DEC_CHARPTR
+@cindex INC_CHARPTR
+@cindex DEC_CHARPTR
+These two macros increment and decrement a @code{Bufbyte} pointer,
+respectively.  They will adjust the pointer by the appropriate number of
+bytes according to the byte length of the character stored there.  Both
+macros assume that the memory address is located at the beginning of a
+valid character.
+Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
+simply expand to @code{p++} and @code{p--}, respectively.
+@item bytecount_to_charcount
+@cindex bytecount_to_charcount
+Given a pointer to a text string and a length in bytes, return the
+equivalent length in characters.
+@example
+Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
+@end example
+@item charcount_to_bytecount
+@cindex charcount_to_bytecount
+Given a pointer to a text string and a length in characters, return the
+equivalent length in bytes.
+@example
+Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
+@end example
+@item charptr_n_addr
+@cindex charptr_n_addr
+Return a pointer to the beginning of the character offset @var{cc} (in
+characters) from @var{p}.
+@example
+Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
+@end example
+@end table
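+The pointer-stepping macros and the count conversions described above
+can be sketched as follows.  This is an illustration only: the
+@code{TOY_} and @code{toy_} names are hypothetical, UTF-8 is assumed as
+a stand-in for the internal encoding, and the text is assumed to be
+well-formed and zero-terminated.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Bufbyte;
typedef int Charcount;
typedef int Bytecount;

/* Assumption: UTF-8 stands in for the internal encoding.  */
#define TOY_IS_CONTINUATION(b) (((b) & 0xC0) == 0x80)

/* Analogue of INC_CHARPTR: move P past one whole character.  P must
   point at the first byte of a character.  */
#define TOY_INC_CHARPTR(p) \
  do { (p)++; } while (TOY_IS_CONTINUATION (*(p)))

/* Analogue of DEC_CHARPTR: move P back to the start of the previous
   character.  P must point at the first byte of a character that is
   not the first one in the text.  */
#define TOY_DEC_CHARPTR(p) \
  do { (p)--; } while (TOY_IS_CONTINUATION (*(p)))

/* Analogue of charptr_n_addr: address of the character CC characters
   after P.  The text must contain at least CC characters.  */
const Bufbyte *
toy_charptr_n_addr (const Bufbyte *p, Charcount cc)
{
  assert (p != NULL && cc >= 0);
  while (cc-- > 0)
    TOY_INC_CHARPTR (p);
  return p;
}

/* Analogue of charcount_to_bytecount: bytes occupied by the first CC
   characters at P.  */
Bytecount
toy_charcount_to_bytecount (const Bufbyte *p, Charcount cc)
{
  return (Bytecount) (toy_charptr_n_addr (p, cc) - p);
}
```

+Note that converting a character count to a byte count requires walking
+the text, which is why such conversions are not free in a multibyte
+world.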
+@node Conversion to and from External Data
+@subsection Conversion to and from External Data
+When an external function, such as a C library function, returns a
+@code{char} pointer, you should almost never treat it as @code{Bufbyte}.
+This is because these returned strings may contain 8-bit characters which
+can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
+exporting a piece of internal text to the outside world, you should
+always convert it to an appropriate external encoding, lest the internal
+stuff (such as the infamous \201 characters) leak out.
+The interface for converting between the internal and external
+representations of text consists of the numerous conversion macros
+defined in @file{buffer.h}.  Before looking at them, we'll look at the
+external formats supported by these macros.
+Currently meaningful formats are @code{FORMAT_BINARY},
+@code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.  Here
+is a description of these.
+@table @code
+@item FORMAT_BINARY
+Binary format.  This is the simplest format and is what we use in the
+absence of a more appropriate format.  This converts according to the
+@code{binary} coding system:
+@enumerate a
+@item
+On input, bytes 0--255 are converted into characters 0--255.
+@item
+On output, characters 0--255 are converted into bytes 0--255 and other
+characters are converted into `X'.
+@end enumerate
+@item FORMAT_FILENAME
+Format used for filenames.  In the original Mule, this is user-definable
+with the @code{pathname-coding-system} variable.  For the moment, we
+just use the @code{binary} coding system.
+@item FORMAT_OS
+Format used for the external Unix environment---@code{argv[]}, stuff
+from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
+Perhaps this should be the same as @code{FORMAT_FILENAME}.
+@item FORMAT_CTEXT
+Compound-text format.  This is the standard X format used for data
+stored in properties, selections, and the like.  This is an 8-bit
+no-lock-shift ISO2022 coding system.
+@end table
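+The two conversion rules given for @code{FORMAT_BINARY} above can be
+sketched in a few lines of standalone C.  The @code{toy_} functions are
+hypothetical and are not the XEmacs conversion macros; an array of
+@code{Emchar} is used here only to keep the example small, and is not
+how XEmacs stores text.

```c
#include <assert.h>
#include <stddef.h>

typedef int Emchar;             /* stand-in: wide enough for any character */
typedef unsigned char Extbyte;  /* one byte of external data */

/* Input direction: each byte 0..255 becomes the character with the
   same code.  DST must have room for LEN characters.  */
void
toy_binary_decode (const Extbyte *src, size_t len, Emchar *dst)
{
  size_t i;

  assert (src != NULL && dst != NULL);
  for (i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Output direction: characters 0..255 become the byte with the same
   code; every other character is replaced by 'X', as described in
   the text.  DST must have room for LEN bytes.  */
void
toy_binary_encode (const Emchar *src, size_t len, Extbyte *dst)
{
  size_t i;

  assert (src != NULL && dst != NULL);
  for (i = 0; i < len; i++)
    dst[i] = (src[i] >= 0 && src[i] <= 255) ? (Extbyte) src[i] : 'X';
}
```

+Decoding never loses information, whereas encoding does for any
+character outside the range 0 to 255.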
+The macros to convert between these formats and the internal format, and
+vice versa, follow.
+@table @code
+@item GET_CHARPTR_INT_DATA_ALLOCA
+@itemx GET_CHARPTR_EXT_DATA_ALLOCA
+These two are the most basic conversion macros.
+@code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
+format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
+around.  The arguments each of these receives are @var{ptr} (pointer to
+the text to be converted), @var{len} (length of that text in bytes),
+@var{fmt} (format of the external text), @var{ptr_out} (lvalue to which
+new text should be copied), and @var{len_out} (lvalue which will be
+assigned the length of the converted text in bytes).  The resulting text
+is stored to a stack-allocated buffer.  If the text doesn't need
+changing, these macros will do nothing, except for setting
+@var{len_out}.
+The macros above take many arguments, which makes them unwieldy.  For
+this reason, a number of convenience macros are defined with obvious
+functionality, but accepting fewer arguments.  The general rule is that
+macros with @samp{INT} in their name convert text to internal Emacs
+representation, whereas the @samp{EXT} macros convert to external
+representation.
+@item GET_C_CHARPTR_INT_DATA_ALLOCA
+@itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
+As their names imply, these macros work on C char pointers, which are
+zero-terminated, and thus do not need @var{len} or @var{len_out}
+parameters.
+@item GET_STRING_EXT_DATA_ALLOCA
+@itemx GET_C_STRING_EXT_DATA_ALLOCA
+These two macros convert a Lisp string into an external representation.
+The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
+stores its output to a generic string, providing @var{len_out}, the
+length of the resulting external string.  On the other hand,
+@code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
+satisfied with the output string being zero-terminated.
+Note that for Lisp strings only one conversion direction makes sense.
+@item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
+@itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
+@itemx GET_STRING_BINARY_DATA_ALLOCA
+@itemx GET_C_STRING_BINARY_DATA_ALLOCA
+@itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
+@itemx ...
+These macros convert internal text to a specific external
+representation, with the external format being encoded into the name of
+the macro.  Note that the @code{GET_STRING_...} and
+@code{GET_C_STRING...}  macros lack the @samp{EXT} tag, because they
+only make sense in that direction.
+@item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
+@itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
+@itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
+@itemx ...
+These macros convert external text of a specific format to its internal
+representation, with the external format being encoded into the name of
+the macro.
+@end table
+@node General Guidelines for Writing Mule-Aware Code
+@subsection General Guidelines for Writing Mule-Aware Code
+This section contains some general guidance on how to write Mule-aware
+code, as well as some pitfalls you should avoid.
+@table @emph
+@item Never use @code{char} and @code{char *}.
+In XEmacs, the use of @code{char} and @code{char *} is almost always a
+mistake.  If you want to manipulate an Emacs character from ``C'', use
+@code{Emchar}.  If you want to examine a specific octet in the internal
+format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
+@code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
+through the internal text, use @code{Bufbyte *}.  Also note that you
+almost certainly do not need @code{Emchar *}.
+@item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
+The whole point of using different types is to avoid confusion about the
+use of certain variables.  Lest this effect be nullified, you need to be
+careful about using the right types.
+@item Always convert external data
+It is extremely important to always convert external data, because
+XEmacs can crash if unexpected 8-bit sequences are copied to its internal
+buffers literally.
+This means that when a system function, such as @code{readdir}, returns
+a string, you need to convert it using one of the conversion macros
+described in the previous section, before passing it further to Lisp.
+In the case of @code{readdir}, you would use the
+@code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
+Also note that many internal functions, such as @code{make_string},
+accept Bufbytes, which removes the need for them to convert the data
+they receive.  This increases efficiency because that way external data
+needs to be decoded only once, when it is read.  After that, it is
+passed around in internal format.
+@end table
+@node An Example of Mule-Aware Code
+@subsection An Example of Mule-Aware Code
+As an example of Mule-aware code, we will analyze the
+@code{string} function, which conses up a Lisp string from the character
+arguments it receives.  Here is the definition, pasted from
+@file{alloc.c}:
+@example
+@group
+DEFUN ("string", Fstring, 0, MANY, 0, /*
+Concatenate all the argument characters and make the result a string.
+*/
+(int nargs, Lisp_Object *args))
+@{
+Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
+Bufbyte *p = storage;
+for (; nargs; nargs--, args++)
+@{
+Lisp_Object lisp_char = *args;
+CHECK_CHAR_COERCE_INT (lisp_char);
+p += set_charptr_emchar (p, XCHAR (lisp_char));
+@}
+return make_string (storage, p - storage);
+@}
+@end group
+@end example
+Now we can analyze the source line by line.
+Obviously, the string will contain as many characters as there are
+arguments to the function.  This is why we allocate
+@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
+worst-case number of bytes for @var{nargs} @code{Emchar}s to fit in the
+string.
+Then, the loop checks that each element is a character, converting
+integers in the process.  Like many other functions in XEmacs, this
+function silently accepts integers where characters are expected, for
+historical and compatibility reasons.  In new code, unless you know that
+you need this coercion, plain @code{CHECK_CHAR} will suffice.
+@code{XCHAR (lisp_char)} extracts the @code{Emchar} from the
+@code{Lisp_Object}, and @code{set_charptr_emchar} stores it to
+@code{storage}, advancing @code{p} in the process.
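+The same pattern (allocate for the worst case, append each character,
+and let the difference between the two pointers give the byte length)
+can be tried outside XEmacs with a standalone sketch.  Everything here
+is an assumption made for illustration: the @code{toy_} names are
+hypothetical, UTF-8 stands in for the internal encoding, @code{malloc}
+replaces stack allocation, and a plain array of @code{Emchar} replaces
+the @code{Lisp_Object} arguments.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char Bufbyte;
typedef int Emchar;
typedef int Bytecount;

/* Worst-case bytes per character in the stand-in encoding (UTF-8).  */
#define TOY_MAX_EMCHAR_LEN 4

/* Analogue of set_charptr_emchar: store C at P and return the number
   of bytes written.  P must have room for TOY_MAX_EMCHAR_LEN bytes.  */
Bytecount
toy_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  assert (p != NULL && c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Bufbyte) (0xC0 | (c >> 6));
      p[1] = (Bufbyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Bufbyte) (0xE0 | (c >> 12));
      p[1] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Bufbyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Bufbyte) (0xF0 | (c >> 18));
  p[1] = (Bufbyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Bufbyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Concatenate the N characters in CHARS into freshly allocated
   storage and store the resulting byte length in *LEN_OUT.  Returns
   NULL if memory cannot be obtained; the caller frees the result.  */
Bufbyte *
toy_string_from_chars (const Emchar *chars, int n, Bytecount *len_out)
{
  Bufbyte *storage, *p;
  int i;

  assert (chars != NULL && n >= 0 && len_out != NULL);
  /* Worst case: every character needs TOY_MAX_EMCHAR_LEN bytes.  The
     extra byte avoids a zero-sized request when N is 0.  */
  storage = malloc ((size_t) n * TOY_MAX_EMCHAR_LEN + 1);
  if (storage == NULL)
    return NULL;
  p = storage;
  for (i = 0; i < n; i++)
    p += toy_set_charptr_emchar (p, chars[i]);
  *len_out = (Bytecount) (p - storage);
  return storage;
}
```

+The byte length is only known after the loop has run, which is why the
+allocation has to be sized for the worst case up front.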
+Other instructive examples of correct coding under Mule can be found all
+over the XEmacs code.  For starters, I recommend
+@code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
+understood this section of the manual and studied the examples, you can
+proceed to write new Mule-aware code.
+@node Techniques for XEmacs Developers
+@section Techniques for XEmacs Developers
+To make a quantified XEmacs, do: @code{make quantmacs}.
+You simply can't dump Quantified and Purified images.  Run the image
+like so:  @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}.
+Before you go through the trouble, are you compiling with all
+debugging and error-checking off?  If not, try that first.  Be warned
+that while Quantify is directly responsible for quite a few
+optimizations which have been made to XEmacs, doing a run which
+generates results which can be acted upon is not necessarily a trivial
+task.
+Also, if you're still willing to do some runs, make sure you configure
+with the @samp{--quantify} flag.  That will keep Quantify from starting
+to record data until after the loadup is completed and will shut off
+recording right before it shuts down (which generates enough bogus data
+to throw most results off).  It also enables three additional elisp
+commands: @code{quantify-start-recording-data},
+@code{quantify-stop-recording-data} and @code{quantify-clear-data}.
+If you want to make XEmacs faster, target your favorite slow benchmark,
+run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
+out where the cycles are going.  Specific projects:
+@itemize @bullet
+@item
+Make the garbage collector faster.  Figure out how to write an
+incremental garbage collector.
+@item
+Write a compiler that takes bytecode and spits out C code.
+Unfortunately, you will then need a C compiler and a more fully
+developed module system.
+@item
+Speed up redisplay.
+@item
+Speed up syntax highlighting.  Maybe moving some of the syntax
+highlighting capabilities into C would make a difference.
+@item
+Implement tail recursion in Emacs Lisp (hard!).
+@end itemize
+Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
+calls in elisp are especially expensive.  Iterating over a long list is
+going to be 30 times faster implemented in C than in Elisp.
+To get started debugging XEmacs, take a look at the @file{gdbinit} and
+@file{dbxrc} files in the @file{src} directory.
+@xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
+xemacs-faq, XEmacs FAQ}.
+After making source code changes, run @code{make check} to ensure that
+you haven't introduced any regressions.  If you're feeling ambitious,
+you can try to improve the test suite in @file{tests/automated}.
+Here are things to know when you create a new source file:
+@itemize @bullet
+@item
+All @file{.c} files should @code{#include <config.h>} first.  Almost all
+@file{.c} files should @code{#include "lisp.h"} second.
+@item
+Generated header files should be included using the @code{#include <...>} syntax,
+not the @code{#include "..."} syntax.  The generated headers are:
+@file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h}
+The basic rule is that you should assume builds using @code{--srcdir},
+so the @code{#include <...>} syntax needs to be used whenever the
+to-be-included generated file may be in a different directory
+@emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
+means to search for the included file in the same directory as the
+including file, @emph{not} in the current directory.
+@item
+Header files should @emph{not} include @code{<config.h>} and
+@code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
+use them to do so.
+@item
+If the header uses @code{INLINE}, either directly or through
+@code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
+includes.
+@item
+Try compiling at least once with @code{gcc}, configuring with the
+following options:
+@example
+--with-mule --with-union-type --error-checking=all
+@end example
+@item
+Did I mention that you should run the test suite?
+@example
+make check
+@end example
+@end itemize
+@node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
+@chapter A Summary of the Various XEmacs Modules
+This is accurate as of XEmacs 20.0.
+@menu
+* Low-Level Modules::
+* Basic Lisp Modules::
+* Modules for Standard Editing Operations::
+* Editor-Level Control Flow Modules::
+* Modules for the Basic Displayable Lisp Objects::
+* Modules for other Display-Related Lisp Objects::
+* Modules for the Redisplay Mechanism::
+* Modules for Interfacing with the File System::
+* Modules for Other Aspects of the Lisp Interpreter and Object System::
+* Modules for Interfacing with the Operating System::
+* Modules for Interfacing with X Windows::
+* Modules for Internationalization::
+@end menu
+@node Low-Level Modules
+@section Low-Level Modules
+@example
+config.h
+@end example
+This is automatically generated from @file{config.h.in} based on the
+results of configure tests and user-selected optional features and
+contains preprocessor definitions specifying the nature of the
+environment in which XEmacs is being compiled.
+@example
+paths.h
+@end example
+This is automatically generated from @file{paths.h.in} based on supplied
+configure values, and allows for non-standard installed configurations
+of the XEmacs directories.  It's currently broken, though.
+@example
+emacs.c
+signal.c
+@end example
+@file{emacs.c} contains @code{main()} and other code that performs the most
+basic environment initializations and handles shutting down the XEmacs
+process (this includes @code{kill-emacs}, the normal way that XEmacs is
+exited; @code{dump-emacs}, which is used during the build process to
+write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
+be used to start XEmacs directly when temacs has finished loading all
+the Lisp code; and emergency code to handle crashes [XEmacs tries to
+auto-save all files before it crashes]).
+Low-level code that directly interacts with the Unix signal mechanism,
+however, is in @file{signal.c}.  Note that this code does not handle system
+dependencies in interfacing to signals; that is handled using the
+@file{syssignal.h} header file, described in
+@ref{Modules for Interfacing with the Operating System}.
+@example
+unexaix.c
+unexalpha.c
+unexapollo.c
+unexconvex.c
+unexec.c
+unexelf.c
+unexelfsgi.c
+unexencap.c
+unexenix.c
+unexfreebsd.c
+unexfx2800.c
+unexhp9k3.c
+unexhp9k800.c
+unexmips.c
+unexnext.c
+unexsol2.c
+unexsunos4.c
+@end example
+These modules contain code for dumping out the XEmacs executable on various
+different systems. (This process is highly machine-specific and
+requires intimate knowledge of the executable format and the memory map
+of the process.) Only one of these modules is actually used; this is
+chosen by @file{configure}.
+@example
+crt0.c
+lastfile.c
+pre-crt0.c
+@end example
+These modules are used in conjunction with the dump mechanism.  On some
+systems, an alternative version of the C startup code (the actual code
+that receives control from the operating system when the process is
+started, and which calls @code{main()}) is required so that the dumping
+process works properly; @file{crt0.c} provides this.
+@file{pre-crt0.c} and @file{lastfile.c} should be the very first and
+very last file linked, respectively. (Actually, this is not really true.
+@file{lastfile.c} should be after all Emacs modules whose initialized
+data should be made constant, and before all other Emacs files and all
+libraries.  In particular, the allocation modules @file{gmalloc.c},
+@file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
+all of the files that implement Xt widget classes @emph{must} be placed
+after @file{lastfile.c} because they contain various structures that
+must be statically initialized and into which Xt writes at various
+times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
+that are used to determine the start and end of XEmacs' initialized
+data space when dumping.
+@example
+alloca.c
+free-hook.c
+getpagesize.h
+gmalloc.c
+malloc.c
+mem-limits.h
+ralloc.c
+vm-limit.c
+@end example
+These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
+the stack allocation function @code{alloca()} on machines that lack
+this. (XEmacs makes extensive use of @code{alloca()} in its code.)
+@file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
+functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
+often used in place of the standard system-provided @code{malloc()}
+because they usually provide a much faster implementation, at the
+expense of additional memory use.  @file{gmalloc.c} is a newer implementation
+that is much more memory-efficient for large allocations than @file{malloc.c},
+and should always be preferred if it works. (At one point, @file{gmalloc.c}
+didn't work on some systems where @file{malloc.c} worked; but this should be
+fixed now.)
+@cindex relocating allocator
+@file{ralloc.c} is the @dfn{relocating allocator}.  It provides
+functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
+that allocate memory that can be dynamically relocated in memory.  The
+advantage of this is that allocated memory can be shuffled around to
+place all the free memory at the end of the heap, and the heap can then
+be shrunk, releasing the memory back to the operating system.  The use
+of this can be controlled with the configure option @code{--rel-alloc};
+if enabled, memory allocated for buffers will be relocatable, so that if
+a very large file is visited and the buffer is later killed, the memory
+can be released to the operating system.  (The disadvantage of this
+mechanism is that it can be very slow.  On systems with the
+@code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
+this to move memory around without actually having to block-copy it,
+which can speed things up; but it can still cause noticeable performance
+degradation.)
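+The idea behind a relocating allocator can be sketched in standalone C.
+This is a toy under stated assumptions, not the @file{ralloc.c}
+interface: the @code{toy_} names, the fixed-size arena and the table of
+blocs are all hypothetical, and only byte data is stored, so alignment
+is ignored.  The essential point is that the allocator remembers the
+address of the caller's pointer variable, so it can update that
+variable when it moves the memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ARENA_SIZE 1024
#define TOY_MAX_BLOCS 16

typedef struct
{
  void **owner;   /* the caller's pointer variable, updated on moves */
  size_t offset;  /* start of the bloc within the arena */
  size_t size;    /* size of the bloc in bytes */
} toy_bloc;

static unsigned char toy_arena[TOY_ARENA_SIZE];
static toy_bloc toy_blocs[TOY_MAX_BLOCS];
static size_t toy_nblocs;
static size_t toy_used;  /* bytes in use; all free space lies above */

/* Allocate SIZE bytes and store their address in *OWNER.  Returns 1
   on success and 0 if the arena or the bloc table is full.  */
int
toy_r_alloc (void **owner, size_t size)
{
  assert (owner != NULL);
  if (toy_nblocs == TOY_MAX_BLOCS || size > TOY_ARENA_SIZE - toy_used)
    return 0;
  toy_blocs[toy_nblocs].owner = owner;
  toy_blocs[toy_nblocs].offset = toy_used;
  toy_blocs[toy_nblocs].size = size;
  toy_nblocs++;
  *owner = toy_arena + toy_used;
  toy_used += size;
  return 1;
}

/* Free the bloc held in *OWNER.  Every later bloc slides down to fill
   the hole and its owner variable is updated, so the free memory
   always ends up in one piece at the top of the arena.  */
void
toy_r_alloc_free (void **owner)
{
  size_t i, j, hole;

  assert (owner != NULL);
  for (i = 0; i < toy_nblocs && toy_blocs[i].owner != owner; i++)
    ;
  assert (i < toy_nblocs);
  hole = toy_blocs[i].size;
  memmove (toy_arena + toy_blocs[i].offset,
           toy_arena + toy_blocs[i].offset + hole,
           toy_used - (toy_blocs[i].offset + hole));
  for (j = i + 1; j < toy_nblocs; j++)
    {
      toy_blocs[j].offset -= hole;
      *toy_blocs[j].owner = toy_arena + toy_blocs[j].offset;
      toy_blocs[j - 1] = toy_blocs[j];
    }
  toy_nblocs--;
  toy_used -= hole;
  *owner = NULL;
}
```

+The cost mentioned above is visible in @code{toy_r_alloc_free}: every
+release may copy all the memory that lies above the freed bloc, and
+callers must never keep a second copy of a pointer that the allocator
+does not know about.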
+@file{free-hook.c} contains some debugging functions for checking for invalid
+arguments to @code{free()}.
+@file{vm-limit.c} contains some functions that warn the user when memory is
+getting low.  These are callback functions that are called by @file{gmalloc.c}
+and @file{malloc.c} at appropriate times.
+@file{getpagesize.h} provides a uniform interface for retrieving the size of a
+page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
+retrieving the total amount of available virtual memory.  Both are
+similar in spirit to the @file{sys*.h} files described in
+@ref{Modules for Interfacing with the Operating System}.
+@example
+blocktype.c
+blocktype.h
+dynarr.c
+@end example
+These implement a couple of basic C data types to facilitate memory
+allocation.  The @code{Blocktype} type efficiently manages the
+allocation of fixed-size blocks by minimizing the number of times that
+@code{malloc()} and @code{free()} are called.  It allocates memory in
+large chunks, subdivides the chunks into blocks of the proper size, and
+returns the blocks as requested.  When blocks are freed, they are placed
+onto a linked list, so they can be efficiently reused.  This data type
+is not much used in XEmacs currently, because it's a fairly new
+addition.
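+The allocation strategy just described can be sketched as follows.
+This is not the code in @file{blocktype.c}: the @code{toy_} names and
+the chunk size are hypothetical, and the real interface may differ.
+The sketch only shows the technique of carving large chunks into
+fixed-size blocks and recycling freed blocks through a linked list.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_BLOCKS_PER_CHUNK 64

/* A free block holds the link to the next free block.  The second
   member only forces a conservative alignment.  */
typedef union toy_block
{
  union toy_block *next;
  long double align;
} toy_block;

typedef struct
{
  size_t block_size;     /* rounded-up size of each block in bytes */
  toy_block *free_list;  /* blocks available for reuse */
  toy_block *chunks;     /* all chunks, linked through their first block */
} toy_blocktype;

void
toy_blocktype_init (toy_blocktype *bt, size_t elsize)
{
  assert (bt != NULL && elsize > 0);
  /* Round up so that every block can hold a link and stays aligned.  */
  bt->block_size = (elsize + sizeof (toy_block) - 1)
    / sizeof (toy_block) * sizeof (toy_block);
  bt->free_list = NULL;
  bt->chunks = NULL;
}

/* Return one block, or NULL if memory cannot be obtained.  */
void *
toy_blocktype_alloc (toy_blocktype *bt)
{
  toy_block *b;

  assert (bt != NULL);
  if (bt->free_list == NULL)
    {
      /* One malloc() call supplies a whole chunk of blocks.  */
      char *raw = malloc (bt->block_size * (TOY_BLOCKS_PER_CHUNK + 1));
      size_t i;

      if (raw == NULL)
        return NULL;
      /* Block 0 only links the chunk into the chunk list.  */
      ((toy_block *) raw)->next = bt->chunks;
      bt->chunks = (toy_block *) raw;
      for (i = 1; i <= TOY_BLOCKS_PER_CHUNK; i++)
        {
          b = (toy_block *) (raw + i * bt->block_size);
          b->next = bt->free_list;
          bt->free_list = b;
        }
    }
  b = bt->free_list;
  bt->free_list = b->next;
  return b;
}

/* Put BLOCK back on the free list; free() is not called.  */
void
toy_blocktype_free (toy_blocktype *bt, void *block)
{
  toy_block *b = block;

  assert (bt != NULL && block != NULL);
  b->next = bt->free_list;
  bt->free_list = b;
}

/* Release every chunk; all blocks become invalid.  */
void
toy_blocktype_destroy (toy_blocktype *bt)
{
  assert (bt != NULL);
  while (bt->chunks != NULL)
    {
      toy_block *next = bt->chunks->next;
      free (bt->chunks);
      bt->chunks = next;
    }
  bt->free_list = NULL;
}
```

+In this sketch a freed block is the first one handed out again, and
+@code{malloc()} is called only once per 64 allocations.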
+@cindex dynamic array
+The @code{Dynarr} type implements a @dfn{dynamic array}, which is
+similar to a standard C array but has no fixed limit on the number of
+elements it can contain.  Dynamic arrays can hold elements of any type,
+and when you add a new element, the array automatically resizes itself
+if it isn't big enough.  Dynarrs are extensively used in the redisplay
+mechanism.
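+A dynamic array of this kind can be sketched in a few lines.  The
+@code{toy_dynarr} type and its functions are hypothetical and do not
+reproduce the @code{Dynarr} interface; the doubling growth policy and
+the initial capacity of 8 are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
  void *base;     /* storage for the elements */
  size_t elsize;  /* size of one element in bytes */
  size_t cur;     /* number of elements in use */
  size_t max;     /* number of elements allocated */
} toy_dynarr;

void
toy_dynarr_init (toy_dynarr *d, size_t elsize)
{
  assert (d != NULL && elsize > 0);
  d->base = NULL;
  d->elsize = elsize;
  d->cur = 0;
  d->max = 0;
}

/* Append a copy of the element at EL, growing the storage if it is
   full.  Returns 1 on success, 0 if memory cannot be obtained.  */
int
toy_dynarr_add (toy_dynarr *d, const void *el)
{
  assert (d != NULL && el != NULL);
  if (d->cur == d->max)
    {
      size_t new_max = d->max ? d->max * 2 : 8;
      void *new_base = realloc (d->base, new_max * d->elsize);

      if (new_base == NULL)
        return 0;
      d->base = new_base;
      d->max = new_max;
    }
  memcpy ((char *) d->base + d->cur * d->elsize, el, d->elsize);
  d->cur++;
  return 1;
}

/* Address of element I.  The address is only valid until the next
   call to toy_dynarr_add, which may move the storage.  */
void *
toy_dynarr_at (const toy_dynarr *d, size_t i)
{
  assert (d != NULL && i < d->cur);
  return (char *) d->base + i * d->elsize;
}

void
toy_dynarr_free (toy_dynarr *d)
{
  assert (d != NULL);
  free (d->base);
  d->base = NULL;
  d->cur = 0;
  d->max = 0;
}
```

+Because the element size is stored at run time, the same code serves
+arrays of any element type.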
+@example
+inline.c
+@end example
+This module is used in connection with inline functions (available in
+some compilers).  Often, inline functions need to have a corresponding
+non-inline function that does the same thing.  This module is where they
+reside.  It contains no actual code, but defines some special flags that
+cause inline functions defined in header files to be rendered as actual
+functions.  It then includes all header files that contain any inline
+function definitions, so that each one gets a real function equivalent.
+@example
+debug.c
+debug.h
+@end example
+These functions provide a system for doing internal consistency checks
+during code development.  This system is not currently used; instead the
+simpler @code{assert()} macro is used along with the various checks
+provided by the @samp{--error-check-*} configuration options.
+@example
+prefix-args.c
+@end example
+This is actually the source for a small, self-contained program
+used during building.
+@example
+universe.h
+@end example
+This is not currently used.
+@node Basic Lisp Modules
+@section Basic Lisp Modules
+@example
+emacsfns.h
+lisp-disunion.h
+lisp-union.h
+lisp.h
+lrecord.h
+symsinit.h
+@end example
+These are the basic header files for all XEmacs modules.  Each module
+includes @file{lisp.h}, which brings the other header files in.
+@file{lisp.h} contains the definitions of the structures and extractor
+and constructor macros for the basic Lisp objects and various other
+basic definitions for the Lisp environment, as well as some
+general-purpose definitions (e.g. @code{min()} and @code{max()}).
+@file{lisp.h} includes either @file{lisp-disunion.h} or
+@file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
+defined.  These files define the typedef of the Lisp object itself (as
+described above) and the low-level macros that hide the actual
+implementation of the Lisp object.  All extractor and constructor macros
+for particular types of Lisp objects are defined in terms of these
+low-level macros.
+As a general rule, all typedefs should go into the typedefs section of
+@file{lisp.h} rather than into a module-specific header file even if the
+structure is defined elsewhere.  This allows function prototypes that
+use the typedef to be placed into other header files.  Forward structure
+declarations (i.e. a simple declaration like @code{struct foo;} where
+the structure itself is defined elsewhere) should be placed into the
+typedefs section as necessary.
+@file{lrecord.h} contains the basic structures and macros that implement
+all record-type Lisp objects -- i.e. all objects whose type is a field
+in their C structure, which includes all objects except the few most
+basic ones.
+@file{lisp.h} contains prototypes for most of the exported functions in
+the various modules.  Lisp primitives defined using @code{DEFUN} that
+need to be called by C code should be declared using @code{EXFUN}.
+Other function prototypes should be placed either into the appropriate
+section of @file{lisp.h}, or into a module-specific header file,
+depending on how general-purpose the function is and whether it has
+special-purpose argument types requiring definitions not in
+@file{lisp.h}.  All initialization functions are prototyped in
+@file{symsinit.h}.
+@example
+alloc.c
+pure.c
+puresize.h
+@end example
+The large module @file{alloc.c} implements all of the basic allocation and
+garbage collection for Lisp objects.  The most commonly used Lisp
+objects are allocated in chunks, similar to the Blocktype data type
+described above; others are allocated in individually @code{malloc()}ed
+blocks.  This module provides the foundation on which all other aspects
+of the Lisp environment sit, and is the first module initialized at
+startup.
+Note that @file{alloc.c} provides a series of generic functions that are
+not dependent on any particular object type, and interfaces to
+particular types of objects using a standardized interface of
+type-specific methods.  This scheme is a fundamental principle of
+object-oriented programming and is heavily used throughout XEmacs.  The
+great advantage of this is that it allows for a clean separation of
+functionality into different modules -- new classes of Lisp objects, new
+event interfaces, new device types, new stream interfaces, etc. can be
+added transparently without affecting code anywhere else in XEmacs.
+Because the different subsystems are divided into general and specific
+code, adding a new subtype within a subsystem will in general not
+require changes to the generic subsystem code or affect any of the other
+subtypes in the subsystem; this provides a great deal of robustness to
+the XEmacs code.
+@cindex pure space
+@file{pure.c} contains the declaration of the @dfn{purespace} array.
+Pure space is a hack used to place some constant Lisp data into the code
+segment of the XEmacs executable, even though the data needs to be
+initialized through function calls.  (See above in section VIII for more
+info about this.)  During startup, certain sorts of data are
+automatically copied into pure space, and other data is copied manually
+in some of the basic Lisp files by calling the function @code{purecopy},
+which copies the object if possible (this only works in temacs, of
+course) and returns the new object.  In particular, while temacs is
+executing, the Lisp reader automatically copies all compiled-function
+objects that it reads into pure space.  Since compiled-function objects
+are large, are never modified, and typically comprise the majority of
+the contents of a compiled-Lisp file, this works well.  While XEmacs is
+running, any attempt to modify an object that resides in pure space
+causes an error.  Objects in pure space are never garbage collected --
+almost all of the time, they're intended to be permanent, and in any
+case you can't write into pure space to set the mark bits.
+@file{puresize.h} contains the declaration of the size of the pure space
+array.  This depends on the optional features that are compiled in, any
+extra purespace requested by the user at compile time, and certain other
+factors (e.g. 64-bit machines need more pure space because their Lisp
+objects are larger).  The smallest size that suffices should be used, so
+that there's no wasted space.  If there's not enough pure space, you
+will get an error during the build process, specifying how much more
+pure space is needed.
+@example
+eval.c
+backtrace.h
+@end example
+This module contains all of the functions to handle the flow of control.
+This includes the mechanisms of defining functions, calling functions,
+traversing stack frames, and binding variables; the control primitives
+and other special forms such as @code{while}, @code{if}, @code{eval},
+@code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
+non-local exits, unwind-protects, and exception handlers; entering the
+debugger; methods for the subr Lisp object type; etc.  It does
+@emph{not} include the @code{read} function, the @code{print} function,
+or the handling of symbols and obarrays.
+@file{backtrace.h} contains some structures related to stack frames and the
+flow of control.
+@example
+lread.c
+@end example
+This module implements the Lisp reader and the @code{read} function,
+which converts text into Lisp objects, according to the read syntax of
+the objects, as described above.  This is similar to the parser that is
+a part of all compilers.
+@example
+print.c
+@end example
+This module implements the Lisp print mechanism and the @code{print}
+function and related functions.  This is the inverse of the Lisp reader
+-- it converts Lisp objects to a printed, textual representation.
+(Hopefully something that can be read back in using @code{read} to get
+an equivalent object.)
+@example
+general.c
+symbols.c
+symeval.h
+@end example
+@file{symbols.c} implements the handling of symbols, obarrays, and
+retrieving the values of symbols.  Much of the code is devoted to
+handling the special @dfn{symbol-value-magic} objects that define
+special types of variables -- this includes buffer-local variables,
+variable aliases, variables that forward into C variables, etc.  This
+module is initialized extremely early (right after @file{alloc.c}),
+because it is here that the basic symbols @code{t} and @code{nil} are
+created, and those symbols are used everywhere throughout XEmacs.
+@file{symeval.h} contains the definitions of symbol structures and the
+@code{DEFVAR_LISP()} and related macros for declaring variables.
+@example
+data.c
+floatfns.c
+fns.c
+@end example
+These modules implement the methods and standard Lisp primitives for all
+the basic Lisp object types other than symbols (which are described
+above).  @file{data.c} contains all the predicates (primitives that return
+whether an object is of a particular type); the integer arithmetic
+functions; and the basic accessor and mutator primitives for the various
+object types.  @file{fns.c} contains all the standard primitives for working
+with sequences (where, abstractly speaking, a sequence is an ordered set
+of objects, and can be represented by a list, string, vector, or
+bit-vector); it also contains @code{equal}, perhaps on the grounds that
+the bulk of the operation of @code{equal} is comparing sequences.
+@file{floatfns.c} contains methods and primitives for floats and floating-point
+arithmetic.
+@example
+bytecode.c
+bytecode.h
+@end example
+@file{bytecode.c} implements the byte-code interpreter and
+compiled-function objects, and @file{bytecode.h} contains associated
+structures.  Note that the byte-code @emph{compiler} is written in Lisp.
+@node Modules for Standard Editing Operations
+@section Modules for Standard Editing Operations
+@example
+buffer.c
+buffer.h
+bufslots.h
+@end example
+@file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
+includes functions that create and destroy buffers; retrieve buffers by
+name or by other properties; manipulate lists of buffers (remember that
+buffers are permanent objects and stored in various ordered lists);
+retrieve or change buffer properties; etc.  It also contains the
+definitions of all the built-in buffer-local variables (which can be
+viewed as buffer properties).  It does @emph{not} contain code to
+manipulate buffer-local variables (that's in @file{symbols.c}, described
+above); or code to manipulate the text in a buffer.
+@file{buffer.h} defines the structures associated with a buffer and the various
+macros for retrieving text from a buffer and special buffer positions
+(e.g. @code{point}, the default location for text insertion).  It also
+contains macros for working with buffer positions and converting between
+their representations as character offsets and as byte offsets (under
+MULE, they are different, because characters can be multi-byte).  It is
+one of the largest header files.
+@file{bufslots.h} defines the fields in the buffer structure that correspond to
+the built-in buffer-local variables.  It is its own header file because
+it is included many times in @file{buffer.c}, as a way of iterating over all
+the built-in buffer-local variables.
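+The repeated-inclusion trick can be sketched in C as follows.  To stay
+in one file, the sketch uses a list macro rather than a separately
+included header, and all names (@code{DEMO_BUFFER_SLOTS},
+@code{struct demo_buffer}, the three slots) are hypothetical; the real
+slot list lives in @file{bufslots.h}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slot list.  In the real source this role is played by
   bufslots.h, which is included repeatedly with a different
   definition of the per-slot macro each time. */
#define DEMO_BUFFER_SLOTS(SLOT) \
    SLOT(fill_column)           \
    SLOT(tab_width)             \
    SLOT(case_fold_search)

/* Expansion 1: declare one structure field per slot. */
struct demo_buffer {
#define DEMO_DECLARE_FIELD(name) int name;
    DEMO_BUFFER_SLOTS(DEMO_DECLARE_FIELD)
#undef DEMO_DECLARE_FIELD
};

/* Expansion 2: a table of slot names, in the same order. */
const char *const demo_slot_names[] = {
#define DEMO_SLOT_NAME(name) #name,
    DEMO_BUFFER_SLOTS(DEMO_SLOT_NAME)
#undef DEMO_SLOT_NAME
};

enum {
    DEMO_SLOT_COUNT = sizeof demo_slot_names / sizeof demo_slot_names[0]
};

/* Expansion 3: visit every slot, here to give each the same value. */
void demo_reset_slots(struct demo_buffer *b, int value)
{
    assert(b != NULL);
#define DEMO_RESET(name) b->name = value;
    DEMO_BUFFER_SLOTS(DEMO_RESET)
#undef DEMO_RESET
}

/* Expansion 4: find a slot by name; NULL if there is no such slot. */
int *demo_slot_by_name(struct demo_buffer *b, const char *name)
{
    assert(b != NULL && name != NULL);
#define DEMO_MATCH(slot) if (strcmp(name, #slot) == 0) return &b->slot;
    DEMO_BUFFER_SLOTS(DEMO_MATCH)
#undef DEMO_MATCH
    return NULL;
}
```

+Adding a slot to the list updates the structure, the name table, and
+both functions at once, which is the point of keeping the list in a
+single place.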
+@example
+insdel.c
+insdel.h
+@end example
+@file{insdel.c} contains low-level functions for inserting and deleting text in
+a buffer, keeping track of changed regions for use by redisplay, and
+calling any before-change and after-change functions that may have been
+registered for the buffer.  It also contains the actual functions that
+convert between byte offsets and character offsets.
+@file{insdel.h} contains associated headers.
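+As a rough illustration of the byte-offset/character-offset conversion
+mentioned above, here is a C sketch.  It assumes a UTF-8-style encoding
+in which continuation bytes have the bit pattern @code{10xxxxxx}; that
+is an assumption made for the example, not a description of the
+internal MULE encoding, and the function names are hypothetical.  The
+sketch scans linearly and does no caching.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a byte starts a character unless it looks like
   10xxxxxx (the UTF-8 continuation pattern). */
static int demo_lead_byte_p(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Byte offset at which the character with index charpos starts,
   or nbytes if charpos is at or past the end of the text. */
size_t demo_char_to_byte(const unsigned char *text, size_t nbytes,
                         size_t charpos)
{
    size_t i = 0;

    assert(text != NULL);
    while (i < nbytes && charpos > 0) {
        i++;                                    /* skip the lead byte */
        while (i < nbytes && !demo_lead_byte_p(text[i]))
            i++;                                /* skip continuations */
        charpos--;
    }
    return i;
}

/* Number of characters that start before byte offset bytepos. */
size_t demo_byte_to_char(const unsigned char *text, size_t nbytes,
                         size_t bytepos)
{
    size_t chars = 0;
    size_t i;

    assert(text != NULL && bytepos <= nbytes);
    for (i = 0; i < bytepos; i++)
        if (demo_lead_byte_p(text[i]))
            chars++;
    return chars;
}
```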
+@example
+marker.c
+@end example
+This module implements the @dfn{marker} Lisp object type, which
+conceptually is a pointer to a text position in a buffer that moves
+around as text is inserted and deleted, so as to remain in the same
+relative position.  This module doesn't actually move the markers around
+-- that's handled in @file{insdel.c}.  This module just creates them and
+implements the primitives for working with them.  As markers are simple
+objects, this does not entail much.
+Note that the standard arithmetic primitives (e.g. @code{+}) accept
+markers in place of integers and automatically substitute the value of
+@code{marker-position} for the marker, i.e. an integer describing the
+current buffer position of the marker.
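+The relocation of markers during insertion and deletion (the part that
+the text above attributes to @file{insdel.c}) might look roughly like
+the following C sketch.  The structure and function names are
+hypothetical, positions are plain integers, and the sketch assumes that
+a marker sitting exactly at the insertion point stays put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: just a position in the buffer text. */
struct demo_marker {
    long pos;
};

/* Text of length len was inserted at position at.  Markers after the
   insertion point move right; a marker exactly at it stays
   (an assumption of this sketch). */
void demo_adjust_for_insert(struct demo_marker *m, size_t n,
                            long at, long len)
{
    size_t i;

    assert(m != NULL && len >= 0);
    for (i = 0; i < n; i++)
        if (m[i].pos > at)
            m[i].pos += len;
}

/* The text in [from, to) was deleted.  Markers at or after the end of
   the region move left; markers inside it collapse to its start. */
void demo_adjust_for_delete(struct demo_marker *m, size_t n,
                            long from, long to)
{
    size_t i;

    assert(m != NULL && from <= to);
    for (i = 0; i < n; i++) {
        if (m[i].pos >= to)
            m[i].pos -= to - from;
        else if (m[i].pos > from)
            m[i].pos = from;
    }
}
```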
+@example
+extents.c
+extents.h
+@end example
+This module implements the @dfn{extent} Lisp object type, which is like
+a marker that works over a range of text rather than a single position.
+Extents are also much more complex and powerful than markers and have a
+more efficient (and more algorithmically complex) implementation.  The
+implementation is described in detail in comments in @file{extents.c}.
+The code in @file{extents.c} works closely with @file{insdel.c} so that
+extents are properly moved around as text is inserted and deleted.
+There is also code in @file{extents.c} that provides information needed
+by the redisplay mechanism for efficient operation. (Remember that
+extents can have display properties that affect [sometimes drastically,
+as in the @code{invisible} property] the display of the text they
+cover.)
+@example
+editfns.c
+@end example
+@file{editfns.c} contains the standard Lisp primitives for working with
+a buffer's text, and calls the low-level functions in @file{insdel.c}.
+It also contains primitives for working with @code{point} (the default
+buffer insertion location).
+@file{editfns.c} also contains functions for retrieving various
+characteristics from the external environment: the current time, the
+process ID of the running XEmacs process, the name of the user who ran
+this XEmacs process, etc.  It's not clear why this code is in
+@file{editfns.c}.
+@example
+callint.c
+cmds.c
+commands.h
+@end example
+@cindex interactive
+These modules implement the basic @dfn{interactive} commands,
+i.e. user-callable functions.  Commands, as opposed to other functions,
+have special ways of getting their parameters interactively (by querying
+the user), as opposed to having them passed in a normal function
+invocation.  Many commands are not really meant to be called from other
+Lisp functions, because they modify global state in a way that's often
+undesired as part of other Lisp functions.
+@file{callint.c} implements the mechanism for querying the user for
+parameters and calling interactive commands.  The bulk of this module is
+code that parses the interactive spec that is supplied with an
+interactive command.
+@file{cmds.c} implements the basic, most commonly used editing commands:
+commands to move around the current buffer and insert and delete
+characters.  These commands are implemented using the Lisp primitives
+defined in @file{editfns.c}.
+@file{commands.h} contains associated structure definitions and prototypes.
+@example
+regex.c
+regex.h
+search.c
+@end example
+@file{search.c} implements the Lisp primitives for searching for text in
+a buffer, and some of the low-level algorithms for doing this.  In
+particular, the fast fixed-string Boyer-Moore search algorithm is
+implemented in @file{search.c}.  The low-level algorithms for doing
+regular-expression searching, however, are implemented in @file{regex.c}
+and @file{regex.h}.  These two modules are largely independent of
+XEmacs, and are similar to (and based upon) the regular-expression
+routines used in @file{grep} and other GNU utilities.
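+To give a feel for the fixed-string search, here is a C sketch of the
+Horspool simplification of Boyer-Moore, which keeps only the
+bad-character shift table.  It is not the routine in @file{search.c};
+the function name is hypothetical and the sketch works on raw bytes
+with no case folding or translation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return the offset of the first occurrence of pat (m bytes) in text
   (n bytes), or -1 if there is none.  Boyer-Moore-Horspool variant. */
long demo_horspool_search(const char *text, size_t n,
                          const char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    size_t c, i, pos;

    assert(text != NULL && pat != NULL);
    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* A byte that does not occur in the pattern lets us skip m bytes. */
    for (c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    /* Otherwise skip far enough to line up its last occurrence
       (ignoring the final pattern byte). */
    for (i = 0; i + 1 < m; i++)
        shift[(unsigned char) pat[i]] = m - 1 - i;

    pos = 0;
    while (pos <= n - m) {
        size_t j = m;

        /* Compare the window right to left. */
        while (j > 0 && text[pos + j - 1] == pat[j - 1])
            j--;
        if (j == 0)
            return (long) pos;
        /* Shift according to the byte under the window's last cell. */
        pos += shift[(unsigned char) text[pos + m - 1]];
    }
    return -1;
}
```

+The speed comes from the shift table: on a mismatch the window usually
+jumps several bytes at once instead of advancing one byte at a time.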
+@example
+doprnt.c
+@end example
+@file{doprnt.c} implements formatted-string processing, similar to
+the @code{printf()} function in C.
+@example
+undo.c
+@end example
+This module implements the undo mechanism for tracking buffer changes.
+Most of this could be implemented in Lisp.
+@node Editor-Level Control Flow Modules
+@section Editor-Level Control Flow Modules
+@example
+event-Xt.c
+event-stream.c
+event-tty.c
+events.c
+events.h
+@end example
+These implement the handling of events (user input and other system
+notifications).
+@file{events.c} and @file{events.h} define the @dfn{event} Lisp object
+type and primitives for manipulating it.
+@file{event-stream.c} implements the basic functions for working with
+event queues, dispatching an event by looking it up in relevant keymaps
+and such, and handling timeouts; this includes the primitives
+@code{next-event} and @code{dispatch-event}, as well as related
+primitives such as @code{sit-for}, @code{sleep-for}, and
+@code{accept-process-output}. (@file{event-stream.c} is one of the
+hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
+things up here.)
+@file{event-Xt.c} and @file{event-tty.c} implement the low-level
+interfaces onto retrieving events from Xt (the X toolkit) and from TTY's
+(using @code{read()} and @code{select()}), respectively.  The event
+interface enforces a clean separation between the specific code for
+interfacing with the operating system and the generic code for working
+with events, by defining an API of basic, low-level event methods;
+@file{event-Xt.c} and @file{event-tty.c} are two different
+implementations of this API.  To add support for a new operating system
+(e.g. NeXTstep), one merely needs to provide another implementation of
+those API functions.
+Note that the choice of whether to use @file{event-Xt.c} or
+@file{event-tty.c} is made at compile time!  Or at the very latest, it
+is made at startup time.  @file{event-Xt.c} handles events for
+@emph{both} X and TTY frames; @file{event-tty.c} is only used when X
+support is not compiled into XEmacs.  The reason for this is that there
+is only one event loop in XEmacs: thus, it needs to be able to receive
+events from all different kinds of frames.
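+The separation between generic event code and the low-level event
+methods can be sketched in C as a table of function pointers with one
+instance per implementation, chosen once at startup.  Everything named
+below (@code{struct demo_event_methods}, the two toy backends, the
+selection function) is hypothetical and far simpler than the real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event: a type tag and a payload. */
struct demo_event {
    int type;
    int code;
};

/* Hypothetical API of low-level event methods. */
struct demo_event_methods {
    const char *name;
    int (*event_pending_p)(void);
    void (*next_event)(struct demo_event *out);
};

/* Toy backend standing in for a window-system implementation. */
static int demo_x_pending(void) { return 1; }
static void demo_x_next(struct demo_event *out)
{
    out->type = 1;
    out->code = 'x';
}

/* Toy backend standing in for a TTY-only implementation. */
static int demo_tty_pending(void) { return 1; }
static void demo_tty_next(struct demo_event *out)
{
    out->type = 2;
    out->code = 't';
}

const struct demo_event_methods demo_x_methods =
    { "x", demo_x_pending, demo_x_next };
const struct demo_event_methods demo_tty_methods =
    { "tty", demo_tty_pending, demo_tty_next };

/* The single active implementation, fixed at startup. */
static const struct demo_event_methods *demo_current_methods;

void demo_init_event_stream(int have_window_system)
{
    demo_current_methods =
        have_window_system ? &demo_x_methods : &demo_tty_methods;
}

/* Generic code: fetch the next event without knowing which
   implementation supplies it.  Returns 0 if nothing is pending. */
int demo_next_event(struct demo_event *out)
{
    assert(demo_current_methods != NULL && out != NULL);
    if (!demo_current_methods->event_pending_p())
        return 0;
    demo_current_methods->next_event(out);
    return 1;
}
```

+Supporting another system then amounts to supplying one more filled-in
+method table; the generic function is untouched.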
+@example
+keymap.c
+keymap.h
+@end example
+@file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
+type and associated methods and primitives. (Remember that keymaps are
+objects that associate event descriptions with functions to be called to
+``execute'' those events; @code{dispatch-event} looks up events in the
+relevant keymaps.)
+@example
+keyboard.c
+@end example
+@file{keyboard.c} contains functions that implement the actual editor
+command loop -- i.e. the event loop that cyclically retrieves and
+dispatches events.  This code is also rather tricky, just like
+@file{event-stream.c}.
+@example
+macros.c
+macros.h
+@end example
+These two modules contain the basic code for defining keyboard macros.
+These functions don't actually do much; most of the code that handles keyboard
+macros is mixed in with the event-handling code in @file{event-stream.c}.
+@example
+minibuf.c
+@end example
+This contains some miscellaneous code related to the minibuffer (most of
+the minibuffer code was moved into Lisp by Richard Mlynarik).  This
+includes the primitives for completion (although filename completion is
+in @file{dired.c}), the lowest-level interface to the minibuffer (if the
+command loop were cleaned up, this too could be in Lisp), and code for
+dealing with the echo area (this, too, was mostly moved into Lisp, and
+the only code remaining is code to call out to Lisp or provide simple
+bootstrapping implementations early in temacs, before the echo-area Lisp
+code is loaded).
+@node Modules for the Basic Displayable Lisp Objects
+@section Modules for the Basic Displayable Lisp Objects
+@example
+device-ns.h
+device-stream.c
+device-stream.h
+device-tty.c
+device-tty.h
+device-x.c
+device-x.h
+device.c
+device.h
+@end example
+These modules implement the @dfn{device} Lisp object type.  This
+abstracts a particular screen or connection on which frames are
+displayed.  As with Lisp objects, event interfaces, and other
+subsystems, the device code is separated into a generic component that
+contains a standardized interface (in the form of a set of methods) and
+a set of device-type-specific components that implement that interface
+for particular device types.
+The device subsystem defines all the methods and provides method
+services for not only device operations but also for the frame, window,
+menubar, scrollbar, toolbar, and other displayable-object subsystems.
+The reason for this is that all of these subsystems have the same
+subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
+@example
+frame-ns.h
+frame-tty.c
+frame-x.c
+frame-x.h
+frame.c
+frame.h
+@end example
+Each device contains one or more frames in which objects (e.g. text) are
+displayed.  A frame corresponds to a window in the window system;
+usually this is a top-level window but it could potentially be one of a
+number of overlapping child windows within a top-level window, using the
+MDI (Multiple Document Interface) protocol in Microsoft Windows or a
+similar scheme.
+The @file{frame-*} files implement the @dfn{frame} Lisp object type and
+provide the generic and device-type-specific operations on frames
+(e.g. raising, lowering, resizing, moving, etc.).
+@example
+window.c
+window.h
+@end example
+@cindex window (in Emacs)
+@cindex pane
+Each frame consists of one or more non-overlapping @dfn{windows} (better
+known as @dfn{panes} in standard window-system terminology) in which a
+buffer's text can be displayed.  Windows can also have scrollbars
+displayed around their edges.
+@file{window.c} and @file{window.h} implement the @dfn{window} Lisp
+object type and provide code to manage windows.  Since windows have no
+associated resources in the window system (the window system knows only
+about the frame; no child windows or anything are used for XEmacs
+windows), there is no device-type-specific code here; all of that code
+is part of the redisplay mechanism or the code for particular object
+types such as scrollbars.
+@node Modules for other Display-Related Lisp Objects
+@section Modules for other Display-Related Lisp Objects
+@example
+faces.c
+faces.h
+@end example
+@example
+bitmaps.h
+glyphs-ns.h
+glyphs-x.c
+glyphs-x.h
+glyphs.c
+glyphs.h
+@end example
+@example
+objects-ns.h
+objects-tty.c
+objects-tty.h
+objects-x.c
+objects-x.h
+objects.c
+objects.h
+@end example
+@example
+menubar-x.c
+menubar.c
+@end example
+@example
+scrollbar-x.c
+scrollbar-x.h
+scrollbar.c
+scrollbar.h
+@end example
+@example
+toolbar-x.c
+toolbar.c
+toolbar.h
+@end example
+@example
+font-lock.c
+@end example
+This file provides C support for syntax highlighting -- i.e.
+highlighting different syntactic constructs of a source file in
+different colors, for easy reading.  The C support is provided so that
+this is fast.
+@example
+dgif_lib.c
+gif_err.c
+gif_lib.h
+gifalloc.c
+@end example
+These modules decode GIF-format image files, for use with glyphs.
+@node Modules for the Redisplay Mechanism
+@section Modules for the Redisplay Mechanism
+@example
+redisplay-output.c
+redisplay-tty.c
+redisplay-x.c
+redisplay.c
+redisplay.h
+@end example
+These files provide the redisplay mechanism.  As with many other
+subsystems in XEmacs, there is a clean separation between the general
+and device-specific support.
+@file{redisplay.c} contains the bulk of the redisplay engine.  These
+functions update the redisplay structures (which describe how the screen
+is to appear) to reflect any changes made to the state of any
+displayable objects (buffer, frame, window, etc.) since the last time
+that redisplay was called.  These functions are highly optimized to
+avoid doing more work than necessary (since redisplay is called
+extremely often and is potentially a huge time sink), and depend heavily
+on notifications from the objects themselves that changes have occurred,
+so that redisplay doesn't explicitly have to check each possible object.
+The redisplay mechanism also contains a great deal of caching to further
+speed things up; some of this caching is contained within the various
+displayable objects.
+@file{redisplay-output.c} goes through the redisplay structures and converts
+them into calls to device-specific methods to actually output the screen
+changes.
+@file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
+of these redisplay output methods, for X frames and TTY frames,
+respectively.
+@example
+indent.c
+@end example
+This module contains various functions and Lisp primitives for
+converting between buffer positions and screen positions.  These
+functions call the redisplay mechanism to do most of the work, and then
+examine the redisplay structures to get the necessary information.  This
+module needs work.
+@example
+termcap.c
+terminfo.c
+tparam.c
+@end example
+These files contain functions for working with the termcap (BSD-style)
+and terminfo (System V style) databases of terminal capabilities and
+escape sequences, used when XEmacs is displaying in a TTY.
+@example
+cm.c
+cm.h
+@end example
+These files provide some miscellaneous TTY-output functions and should
+probably be merged into @file{redisplay-tty.c}.
+@node Modules for Interfacing with the File System
+@section Modules for Interfacing with the File System
+@example
+lstream.c
+lstream.h
+@end example
+These modules implement the @dfn{stream} Lisp object type.  This is an
+internal-only Lisp object that implements a generic buffering stream.
+The idea is to provide a uniform interface onto all sources and sinks of
+data, including file descriptors, stdio streams, chunks of memory, Lisp
+buffers, Lisp strings, etc.  That way, I/O functions can be written to
+the stream interface and can transparently handle all possible sources
+and sinks.  (For example, the @code{read} function can read data from a
+file, a string, a buffer, or even a function that is called repeatedly
+to return data, without worrying about where the data is coming from or
+what-size chunks it is returned in.)
+@cindex lstream
+Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
+streams'') to distinguish them from other kinds of streams, e.g. stdio
+streams and C++ I/O streams.
+Similar to other subsystems in XEmacs, lstreams are separated into
+generic functions and a set of methods for the different types of
+lstreams.  @file{lstream.c} provides implementations of many different
+types of streams; others are provided, e.g., in @file{mule-coding.c}.
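+The split between generic stream functions and per-type methods can be
+illustrated with a small C sketch.  The names are hypothetical and only
+a reader method is modelled; the second stream type deliberately
+returns one byte per call to show that the generic function hides the
+chunk size from its caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_stream;

/* Hypothetical method table; a fuller sketch would also have
   writer, flusher, and closer methods. */
struct demo_stream_methods {
    /* Read up to n bytes into buf; return the count, 0 at end. */
    size_t (*reader)(struct demo_stream *s, char *buf, size_t n);
};

struct demo_stream {
    const struct demo_stream_methods *methods;
    const char *data;   /* backing memory for both toy stream types */
    size_t len;
    size_t pos;
};

/* Stream type 1: hands back as much of a memory block as requested. */
static size_t demo_mem_reader(struct demo_stream *s, char *buf, size_t n)
{
    size_t left = s->len - s->pos;

    if (n > left)
        n = left;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Stream type 2: an awkward source that yields one byte per call. */
static size_t demo_trickle_reader(struct demo_stream *s, char *buf,
                                  size_t n)
{
    return demo_mem_reader(s, buf, n > 1 ? 1 : n);
}

const struct demo_stream_methods demo_mem_methods =
    { demo_mem_reader };
const struct demo_stream_methods demo_trickle_methods =
    { demo_trickle_reader };

/* Generic function: keep calling the reader method until n bytes
   have arrived or the source is exhausted. */
size_t demo_stream_read(struct demo_stream *s, char *buf, size_t n)
{
    size_t total = 0;

    assert(s != NULL && buf != NULL);
    while (total < n) {
        size_t got = s->methods->reader(s, buf + total, n - total);

        if (got == 0)
            break;
        total += got;
    }
    return total;
}
```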
+@example
+fileio.c
+@end example
+This implements the basic primitives for interfacing with the file
+system.  This includes primitives for reading files into buffers,
+writing buffers into files, checking for the presence or accessibility
+of files, canonicalizing file names, etc.  Note that these primitives
+are usually not invoked directly by the user: There is a great deal of
+higher-level Lisp code that implements the user commands such as
+@code{find-file} and @code{save-buffer}.  This is similar to the
+distinction between the lower-level primitives in @file{editfns.c} and
+the higher-level user commands in @file{cmds.c} and
+@file{simple.el}.
+@example
+filelock.c
+@end example
+This file provides functions for detecting clashes between different
+processes (e.g. XEmacs and some external process, or two different
+XEmacs processes) modifying the same file.  (XEmacs can optionally use
+the @file{lock/} subdirectory to provide a form of ``locking'' between
+different XEmacs processes.)  This module is also used by the low-level
+functions in @file{insdel.c} to ensure that, if the first modification
+is being made to a buffer whose corresponding file has been externally
+modified, the user is made aware of this so that the buffer can be
+synched up with the external changes if necessary.
+@example
+filemode.c
+@end example
+This file provides some miscellaneous functions that construct a
+@samp{rwxr-xr-x}-type permissions string (as might appear in an
+@file{ls}-style directory listing) given the information returned by the
+@code{stat()} system call.
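+A minimal C sketch of building such a permissions string from the
+@code{st_mode} value is shown below.  The function name is hypothetical
+and the sketch covers only the nine read/write/execute bits; it ignores
+the file-type character and the setuid, setgid, and sticky bits.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Fill out (at least 10 bytes) with a string such as "rwxr-xr-x"
   describing the permission bits of mode. */
void demo_mode_string(mode_t mode, char *out)
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);
    memcpy(out, "---------", 10);       /* nine dashes plus the NUL */
    for (i = 0; i < 9; i++)
        if (mode & bits[i])
            out[i] = letters[i];
}
```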
+@example
+dired.c
+ndir.h
+@end example
+These files implement the XEmacs interface to directory searching.  This
+includes a number of primitives for determining the files in a directory
+and for doing filename completion. (Remember that generic completion is
+handled by a different mechanism, in @file{minibuf.c}.)
+@file{ndir.h} is a header file used for the directory-searching
+emulation functions provided in @file{sysdep.c} (see section J below),
+for systems that don't provide any directory-searching functions. (On
+those systems, directories can be read directly as files, and parsed.)
+@example
+realpath.c
+@end example
+This file provides an implementation of the @code{realpath()} function
+for expanding symbolic links, on systems that don't implement it or have
+a broken implementation.
+@node Modules for Other Aspects of the Lisp Interpreter and Object System
+@section Modules for Other Aspects of the Lisp Interpreter and Object System
+@example
+elhash.c
+elhash.h
+hash.c
+hash.h
+@end example
+These files provide two implementations of hash tables.  Files
+@file{hash.c} and @file{hash.h} provide a generic C implementation of
+hash tables which can stand independently of XEmacs.  Files
+@file{elhash.c} and @file{elhash.h} provide a separate implementation of
+hash tables that can store only Lisp objects and knows about Lispy
+things like garbage collection; these files also implement the
+@dfn{hash-table} Lisp object type.
+@example
+specifier.c
+specifier.h
+@end example
+This module implements the @dfn{specifier} Lisp object type.  This is
+primarily used for displayable properties, and allows for values that
+are specific to a particular buffer, window, frame, device, or device
+class, as well as a default value.  This is used, for example,
+to control the height of the horizontal scrollbar or the appearance of
+the @code{default}, @code{bold}, or other faces.  The specifier object
+consists of a number of specifications, each of which maps from a
+buffer, window, etc. to a value.  The function @code{specifier-instance}
+looks up a value given a window (from which a buffer, frame, and device
+can be derived).
+@example
+chartab.c
+chartab.h
+casetab.c
+@end example
+@file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
+Lisp object type, which maps from characters or certain sorts of
+character ranges to Lisp objects.  The implementation of this object
+type is optimized for the internal representation of characters.  Char
+tables come in different types, which affect the allowed object types to
+which a character can be mapped and also dictate certain other
+properties of the char table.
+@cindex case table
+@file{casetab.c} implements one sort of char table, the @dfn{case
+table}, which maps characters to other characters of possibly different
+case.  These are used by XEmacs to implement case-changing primitives
+and to do case-insensitive searching.
+@example
+syntax.c
+syntax.h
+@end example
+@cindex scanner
+This module implements @dfn{syntax tables}, another sort of char table
+that maps characters into syntax classes that define the syntax of these
+characters (e.g. a parenthesis belongs to a class of @samp{open}
+characters that have corresponding @samp{close} characters and can be
+nested).  This module also implements the Lisp @dfn{scanner}, a set of
+primitives for scanning over text based on syntax tables.  This is used,
+for example, to find the matching parenthesis in a command such as
+@code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
+comments, etc.
+@example
+casefiddle.c
+@end example
+This module implements various Lisp primitives for upcasing, downcasing
+and capitalizing strings or regions of buffers.
+@example
+rangetab.c
+@end example
+This module implements the @dfn{range table} Lisp object type, which
+provides for a mapping from ranges of integers to arbitrary Lisp
+objects.
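+One simple way to realize such a mapping is a sorted array of
+non-overlapping ranges searched by bisection, as in the C sketch below.
+This is an assumption made for illustration, not a description of
+@file{rangetab.c}: the names are hypothetical, both range ends are
+treated as inclusive, and a string pointer stands in for the Lisp
+object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: the inclusive range [first, last] maps to value. */
struct demo_range {
    long first;
    long last;
    const char *value;
};

/* Look up key in a table sorted by first, with non-overlapping
   ranges.  Returns the mapped value, or NULL if no range has key. */
const char *demo_range_lookup(const struct demo_range *tab, size_t n,
                              long key)
{
    size_t lo = 0, hi = n;

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (key < tab[mid].first)
            hi = mid;               /* key lies left of this range */
        else if (key > tab[mid].last)
            lo = mid + 1;           /* key lies right of this range */
        else
            return tab[mid].value;  /* first <= key <= last */
    }
    return NULL;
}
```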
+@example
+opaque.c
+opaque.h
+@end example
+This module implements the @dfn{opaque} Lisp object type, an
+internal-only Lisp object that encapsulates an arbitrary block of memory
+so that it can be managed by the Lisp allocation system.  To create an
+opaque object, you call @code{make_opaque()}, passing a pointer to a
+block of memory.  An object is created that is big enough to hold the
+memory, which is copied into the object's storage.  The object will then
+stick around as long as you keep pointers to it, after which it will be
+automatically reclaimed.
+@cindex mark method
+Opaque objects can also have an arbitrary @dfn{mark method} associated
+with them, in case the block of memory contains other Lisp objects that
+need to be marked for garbage-collection purposes. (If you need other
+object methods, such as a finalize method, you should just go ahead and
+create a new Lisp object type -- it's not hard.)
+@example
+abbrev.c
+@end example
+This module provides a few primitives for doing dynamic abbreviation
+expansion.  In XEmacs, most of the code for this has been moved into
+Lisp.  Some C code remains for speed and because the primitive
+@code{self-insert-command} (which is executed for all self-inserting
+characters) hooks into the abbrev mechanism. (@code{self-insert-command}
+is itself in C only for speed.)
+@example
+doc.c
+@end example
+This module provides primitives for retrieving the documentation
+strings of functions and variables.  These documentation strings contain
+certain special markers that get dynamically expanded (e.g. a
+reverse-lookup is performed on some named functions to retrieve their
+current key bindings).  Some documentation strings (in particular, for
+the built-in primitives and pre-loaded Lisp functions) are stored
+externally in a file @file{DOC} in the @file{lib-src/} directory and
+need to be fetched from that file. (Part of the build stage involves
+building this file, and another part involves constructing an index for
+this file and embedding it into the executable, so that the functions in
+@file{doc.c} do not have to search the entire @file{DOC} file to find
+the appropriate documentation string.)
+@example
+md5.c
+@end example
+This module provides a Lisp primitive that implements the MD5 secure
+hashing scheme, used to create a large hash value of a string of data such that
+the data cannot be derived from the hash value.  This is used for
+various security applications on the Internet.
+@node Modules for Interfacing with the Operating System
+@section Modules for Interfacing with the Operating System
+@example
+callproc.c
+process.c
+process.h
+@end example
+These modules allow XEmacs to spawn and communicate with subprocesses
+and network connections.
+@cindex synchronous subprocesses
+@cindex subprocesses, synchronous
+@file{callproc.c} implements (through the @code{call-process}
+primitive) what are called @dfn{synchronous subprocesses}.  This means
+that XEmacs runs a program, waits till it's done, and retrieves its
+output.  A typical example might be calling the @file{ls} program to get
+a directory listing.
+@cindex asynchronous subprocesses
+@cindex subprocesses, asynchronous
+@file{process.c} and @file{process.h} implement @dfn{asynchronous
+subprocesses}.  This means that XEmacs starts a program and then
+continues normally, not waiting for the process to finish.  Data can be
+sent to the process or retrieved from it as it's running.  This is used
+for the @code{shell} command (which provides a front end onto a shell
+program such as @file{csh}), the mail and news readers implemented in
+XEmacs, etc.  The result of calling @code{start-process} to start a
+subprocess is a process object, a particular kind of object used to
+communicate with the subprocess.  You can send data to the process by
+passing the process object and the data to @code{process-send-string}, and you
+can specify what happens to data retrieved from the process by setting
+properties of the process object. (When the process sends data, XEmacs
+receives a process event, which says that there is data ready.  When
+@code{dispatch-event} is called on this event, it reads the data from
+the process and does something with it, as specified by the process
+object's properties.  Typically, this means inserting the data into a
+buffer or calling a function.) Another property of the process object is
+called the @dfn{sentinel}, which is a function that is called when the
+process terminates.
+@cindex network connections
+Process objects are also used for network connections (connections to a
+process running on another machine).  Network connections are started
+with @code{open-network-stream} but otherwise work just like
+subprocesses.
+@example
+sysdep.c
+sysdep.h
+@end example
+These modules implement most of the low-level, messy operating-system
+interface code.  This includes various device control (ioctl) operations
+for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
+is fairly system-dependent; thus the name of this module), and emulation
+of standard library functions and system calls on systems that don't
+provide them or have broken versions.
+@example
+sysdir.h
+sysfile.h
+sysfloat.h
+sysproc.h
+syspwd.h
+syssignal.h
+systime.h
+systty.h
+syswait.h
+@end example
+These header files provide consistent interfaces onto system-dependent
+header files and system calls.  The idea is that, instead of including a
+standard header file like @file{<sys/param.h>} (which may or may not
+exist on various systems) or having to worry about whether all systems
+provide a particular preprocessor constant, or having to deal with the
+four different paradigms for manipulating signals, you just include the
+appropriate @file{sys*.h} header file, which includes all the right
+system header files, defines any missing preprocessor constants,
+provides a uniform interface onto system calls, etc.
+@file{sysdir.h} provides a uniform interface onto directory-querying
+functions. (In some cases, this is in conjunction with emulation
+functions in @file{sysdep.c}.)
+@file{sysfile.h} includes all the necessary header files for standard
+system calls (e.g. @code{read()}), ensures that all necessary
+@code{open()} and @code{stat()} preprocessor constants are defined, and
+possibly (usually) substitutes sugared versions of @code{read()},
+@code{write()}, etc. that automatically restart interrupted I/O
+operations.
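+A wrapper that restarts an interrupted @code{read()} can be sketched in
+C as follows.  The function name is hypothetical; the sketch only
+illustrates the idea of retrying when the call fails with
+@code{EINTR}, and makes no claim about how @file{sysfile.h} actually
+spells its substitutes.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Like read(), but retry when a signal interrupts the call before
   any data has been transferred. */
ssize_t demo_read_restarting(int fd, void *buf, size_t count)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

+Callers can then treat a return value of -1 as a real error, without
+repeating the @code{EINTR} check at every call site.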
+@file{sysfloat.h} includes the necessary header files for floating-point
+operations.
+@file{sysproc.h} includes the necessary header files for calling
+@code{select()}, @code{fork()}, @code{execve()}, socket operations, and
+the like, and ensures that the @code{FD_*()} macros for descriptor-set
+manipulations are available.
+@file{syspwd.h} includes the necessary header files for obtaining
+information from @file{/etc/passwd} (the functions are emulated under
+VMS).
+@file{syssignal.h} includes the necessary header files for
+signal-handling and provides a uniform interface onto the different
+signal-handling and signal-blocking paradigms.
+@file{systime.h} includes the necessary header files and provides
+uniform interfaces for retrieving the time of day, setting file
+access/modification times, getting the amount of time used by the XEmacs
+process, etc.
+@file{systty.h} buffers against the infinitude of different ways of
+controlling TTY's.
+@file{syswait.h} provides a uniform way of retrieving the exit status
+from a @code{wait()}ed-on process (some systems use a union, others use
+an int).
+@example
+hpplay.c
+libsst.c
+libsst.h
+libst.h
+linuxplay.c
+nas.c
+sgiplay.c
+sound.c
+sunplay.c
+@end example
+These files implement the ability to play various sounds on some types
+of computers.  You have to configure your XEmacs with sound support in
+order to get this capability.
+@file{sound.c} provides the generic interface.  It implements various
+Lisp primitives and variables that let you specify which sounds should
+be played in certain conditions. (The conditions are identified by
+symbols, which are passed to @code{ding} to make a sound.  Various
+standard functions call this function at certain times; if sound support
+does not exist, a simple beep results.)
+@cindex native sound
+@cindex sound, native
+@file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
+@file{linuxplay.c} interface to the machine's speaker for various
+different kinds of machines.  This is called @dfn{native} sound.
+@cindex sound, network
+@cindex network sound
+@cindex NAS
+@file{nas.c} interfaces to a computer somewhere else on the network
+using the NAS (Network Audio Server) protocol, playing sounds on that
+machine.  This allows you to run XEmacs on a remote machine, with its
+display set to your local machine, and have the sounds be made on your
+local machine, provided that you have a NAS server running on your local
+machine.
+@file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
+additional functions for playing sound on a Sun SPARC but are not
+currently in use.
+@example
+tooltalk.c
+tooltalk.h
+@end example
+These two modules implement an interface to the ToolTalk protocol, which
+is an interprocess communication protocol implemented on some versions
+of Unix.  ToolTalk is a high-level protocol that allows processes to
+register themselves as providers of particular services; other processes
+can then request a service without knowing or caring exactly who is
+providing the service.  It is similar in spirit to the DDE protocol
+provided under Microsoft Windows.  ToolTalk is a part of the new CDE
+(Common Desktop Environment) specification and is used to connect the
+parts of the SPARCWorks development environment.
+@example
+getloadavg.c
+@end example
+This module provides the ability to retrieve the system's current load
+average. (The way to do this is highly system-specific, unfortunately,
+and requires a lot of special-case code.)
+@example
+sunpro.c
+@end example
+This module provides a small amount of code used internally at Sun to
+keep statistics on the usage of XEmacs.
+@example
+broken-sun.h
+strcmp.c
+strcpy.c
+sunOS-fix.c
+@end example
+These files provide replacement functions and prototypes to fix numerous
+bugs in early releases of SunOS 4.1.
+@example
+hftctl.c
+@end example
+This module provides some terminal-control code necessary on versions of
+AIX prior to 4.1.
+@example
+msdos.c
+msdos.h
+@end example
+These modules are used for MS-DOS support, which does not work in
+XEmacs.
+@node Modules for Interfacing with X Windows
+@section Modules for Interfacing with X Windows
+@example
+Emacs.ad.h
+@end example
+A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
+fallback resources (so that XEmacs has pretty defaults).
+@example
+EmacsFrame.c
+EmacsFrame.h
+EmacsFrameP.h
+@end example
+These modules implement an Xt widget class that encapsulates a frame.
+This is for ease in integrating with Xt.  The EmacsFrame widget covers
+the entire X window except for the menubar; the scrollbars are
+positioned on top of the EmacsFrame widget.
+@strong{Warning:} Abandon hope, all ye who enter here.  This code took
+an ungodly amount of time to get right, and is likely to fall apart
+mercilessly at the slightest change.  Such is life under Xt.
+@example
+EmacsManager.c
+EmacsManager.h
+EmacsManagerP.h
+@end example
+These modules implement a simple Xt manager (i.e. composite) widget
+class that simply lets its children set whatever geometry they want.
+It's amazing that Xt doesn't provide this standardly, but on second
+thought, it makes sense, considering how amazingly broken Xt is.
+@example
+EmacsShell-sub.c
+EmacsShell.c
+EmacsShell.h
+EmacsShellP.h
+@end example
+These modules implement two Xt widget classes that are subclasses of
+the TopLevelShell and TransientShell classes.  This is necessary to deal
+with more brokenness that Xt has sadistically thrust onto the backs of
+developers.
+@example
+xgccache.c
+xgccache.h
+@end example
+These modules provide functions for maintenance and caching of GC's
+(graphics contexts) under the X Window System.  This code is junky and
+needs to be rewritten.
+@example
+xselect.c
+@end example
+@cindex selections
+This module provides an interface to the X Window System's concept of
+@dfn{selections}, the standard way for X applications to communicate
+with each other.
+@example
+xintrinsic.h
+xintrinsicp.h
+xmmanagerp.h
+xmprimitivep.h
+@end example
+These header files are similar in spirit to the @file{sys*.h} files and buffer
+against different implementations of Xt and Motif.
+@itemize @bullet
+@item
+@file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
+@item
+@file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
+@item
+@file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
+@item
+@file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
+@end itemize
+@example
+xmu.c
+xmu.h
+@end example
+These files provide an emulation of the Xmu library for those systems
+(i.e. HPUX) that don't provide it as a standard part of X.
+@example
+ExternalClient-Xlib.c
+ExternalClient.c
+ExternalClient.h
+ExternalClientP.h
+ExternalShell.c
+ExternalShell.h
+ExternalShellP.h
+extw-Xlib.c
+extw-Xlib.h
+extw-Xt.c
+extw-Xt.h
+@end example
+@cindex external widget
+These files provide the @dfn{external widget} interface, which allows an
+XEmacs frame to appear as a widget in another application.  To do this,
+you have to configure with @samp{--external-widget}.
+@file{ExternalShell*} provides the server (XEmacs) side of the
+connection.
+@file{ExternalClient*} provides the client (other application) side of
+the connection.  These files are not compiled into XEmacs but are
+compiled into libraries that are then linked into your application.
+@file{extw-*} is common code that is used for both the client and server.
+Don't touch this code; something is liable to break if you do.
+@node Modules for Internationalization
+@section Modules for Internationalization
+@example
+mule-canna.c
+mule-ccl.c
+mule-charset.c
+mule-charset.h
+mule-coding.c
+mule-coding.h
+mule-mcpath.c
+mule-mcpath.h
+mule-wnnfns.c
+mule.c
+@end example
+These files implement the MULE (Asian-language) support.  Note that MULE
+actually provides a general interface for all sorts of languages, not
+just Asian languages (although they are generally the most complicated
+to support).  This code is still in beta.
+@file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
+XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
+Lisp object type, which encapsulates a character set (an ordered one- or
+two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
+Kanji).
+@file{mule-coding.*} implements the @dfn{coding-system} Lisp object
+type, which encapsulates a method of converting between different
+encodings.  An encoding is a representation of a stream of characters,
+possibly from multiple character sets, using a stream of bytes or words,
+and defines (e.g.) which escape sequences are used to specify particular
+character sets, how the indices for a character are converted into bytes
+(sometimes this involves setting the high bit; sometimes complicated
+rearranging of the values takes place, as in the Shift-JIS encoding),
+etc.
+@file{mule-ccl.c} provides the CCL (Code Conversion Language)
+interpreter.  CCL is similar in spirit to Lisp byte code and is used to
+implement converters for custom encodings.
+@file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
+external programs used to implement the Canna and WNN input methods,
+respectively.  This is currently in beta.
+@file{mule-mcpath.c} provides some functions to allow for pathnames
+containing extended characters.  This code is fragmentary, obsolete, and
+completely non-working.  Instead, @code{pathname-coding-system} is used
+to specify conversions of names of files and directories.  The standard
+C I/O functions like @samp{open()} are wrapped so that conversion occurs
+automatically.
+@file{mule.c} provides a few miscellaneous things that should probably
+be elsewhere.
+@example
+intl.c
+@end example
+This provides some miscellaneous internationalization code for
+implementing message translation and interfacing to the Ximp input
+method.  None of this code is currently working.
+@example
+iso-wide.h
+@end example
+This contains leftover code from an earlier implementation of
+Asian-language support, and is not currently used.
+@node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
+@chapter Allocation of Objects in XEmacs Lisp
+@menu
+* Introduction to Allocation::
+* Garbage Collection::
+* GCPROing::
+* Garbage Collection - Step by Step::
+* Integers and Characters::
+* Allocation from Frob Blocks::
+* lrecords::
+* Low-level allocation::
+* Pure Space::
+* Cons::
+* Vector::
+* Bit Vector::
+* Symbol::
+* Marker::
+* String::
+* Compiled Function::
+@end menu
+@node Introduction to Allocation
+@section Introduction to Allocation
+Emacs Lisp, like all Lisps, has garbage collection.  This means that
+the programmer never has to explicitly free (destroy) an object; it
+happens automatically when the object becomes inaccessible.  Most
+experts agree that garbage collection is a necessity in a modern,
+high-level language.  Its omission from C stems from the fact that C was
+originally designed to be a nice abstract layer on top of assembly
+language, for writing kernels and basic system utilities rather than
+large applications.
+Lisp objects can be created by any of a number of Lisp primitives.
+Most object types have one or a small number of basic primitives
+for creating objects.  For conses, the basic primitive is @code{cons};
+for vectors, the primitives are @code{make-vector} and @code{vector}; for
+symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
+Some Lisp objects, especially those that are primarily used internally,
+have no corresponding Lisp primitives.  Every Lisp object, though,
+has at least one C primitive for creating it.
+Recall from section (VII) that a Lisp object, as stored in a 32-bit
+or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
+occupies the remainder of the bits.  We can separate the different
+Lisp object types into four broad categories:
+@itemize @bullet
+@item
+(a) Those for whom the value directly represents the contents of the
+Lisp object.  Only two types are in this category: integers and
+characters.  No special allocation or garbage collection is necessary
+for such objects.  Lisp objects of these types do not need to be
+@code{GCPRO}ed.
+@end itemize
+In the remaining three categories, the value is a pointer to a
+structure.
+@itemize @bullet
+@item
+@cindex frob block
+(b) Those for whom the tag directly specifies the type.  Recall that
+there are only three tag bits; this means that at most five types can be
+specified this way.  The most commonly-used types are stored in this
+format; this includes conses, strings, vectors, and sometimes symbols.
+With the exception of vectors, objects in this category are allocated in
+@dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
+individual objects.  This saves a lot on malloc overhead, since there
+are typically quite a lot of these objects around, and the objects are
+small.  (A cons, for example, occupies 8 bytes on 32-bit machines -- 4
+bytes for each of the two objects it contains.) Vectors are individually
+@code{malloc()}ed since they are of variable size.  (It would be
+possible, and desirable, to allocate vectors of certain small sizes out
+of frob blocks, but it isn't currently done.) Strings are handled
+specially: Each string is allocated in two parts, a fixed size structure
+containing a length and a data pointer, and the actual data of the
+string.  The former structure is allocated in frob blocks as usual, and
+the latter data is stored in @dfn{string chars blocks} and is relocated
+during garbage collection to eliminate holes.
+@end itemize
+In the remaining two categories, the type is stored in the object
+itself.  The tag for all such objects is the generic @dfn{lrecord}
+(Lisp_Record) tag.  The first four bytes (or eight, for 64-bit machines)
+of the object's structure are a pointer to a structure that describes
+the object's type, which includes method pointers and a pointer to a
+string naming the type.  Note that it's possible to save some space by
+using a one- or two-byte tag, rather than a four- or eight-byte pointer
+to store the type, but it's not clear it's worth making the change.
+@itemize @bullet
+@item
+(c) Those lrecords that are allocated in frob blocks (see above).  This
+includes the objects that are most common and relatively small, and
+includes floats, compiled functions, symbols (when not in category (b)),
+extents, events, and markers.  With the cleanup of frob blocks done in
+19.12, it's not terribly hard to add more objects to this category, but
+it's a bit trickier than adding an object type to type (d) (esp. if the
+object needs a finalization method), and is not likely to save much
+space unless the object is small and there are many of them. (In fact,
+if there are very few of them, it might actually waste space.)
+@item
+(d) Those lrecords that are individually @code{malloc()}ed.  These are
+called @dfn{lcrecords}.  All other types are in this category.  Adding a
+new type to this category is comparatively easy, and all types added
+since 19.8 (when the current allocation scheme was devised, by Richard
+Mlynarik), with the exception of the character type, have been in this
+category.
+@end itemize
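+The frob-block idea used by categories (b) and (c) can be sketched as
+follows.  This is a toy model under stated assumptions, not the code in
+@file{alloc.c}: the names @code{toy_cons}, @code{toy_block},
+@code{toy_alloc()} and @code{toy_free()} and the block size of 64 are
+all invented for illustration.  It shows why few calls to
+@code{malloc()} are needed: one call serves many small objects, and
+freed objects are recycled through a free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of a frob block: one malloc() serves many small objects. */
#define OBJECTS_PER_BLOCK 64

struct toy_cons
{
  void *car;
  void *cdr;    /* doubles as the free-list link while the cell is unused */
};

struct toy_block
{
  struct toy_block *prev;                     /* previously filled block */
  struct toy_cons cells[OBJECTS_PER_BLOCK];
};

static struct toy_block *current_block;       /* most recent block */
static int current_block_index = OBJECTS_PER_BLOCK; /* next unused cell */
static struct toy_cons *free_list;            /* cells given back */
static int blocks_allocated;                  /* number of malloc() calls */

static struct toy_cons *
toy_alloc (void)
{
  struct toy_cons *c;

  if (free_list)
    {
      /* Reuse a previously freed cell. */
      c = free_list;
      free_list = (struct toy_cons *) c->cdr;
    }
  else
    {
      if (current_block_index == OBJECTS_PER_BLOCK)
        {
          /* The current block is full (or none exists yet). */
          struct toy_block *b = (struct toy_block *) malloc (sizeof *b);
          if (!b)
            return NULL;
          b->prev = current_block;
          current_block = b;
          current_block_index = 0;
          blocks_allocated++;
        }
      c = &current_block->cells[current_block_index++];
    }
  c->car = c->cdr = NULL;
  return c;
}

static void
toy_free (struct toy_cons *c)
{
  /* Chain the cell onto the free list instead of returning it to malloc. */
  c->cdr = free_list;
  free_list = c;
}
```

+In this model, 64 allocations cost a single @code{malloc()} call, and a
+freed cell is handed out again before any new block is requested.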
+Note that bit vectors are a bit of a special case.  They are
+simple lrecords as in category (c), but are individually @code{malloc()}ed
+like vectors.  You can basically view them as exactly like vectors
+except that their type is stored in lrecord fashion rather than
+in directly-tagged fashion.
+Note that FSF Emacs redesigned their object system in 19.29 to follow
+a similar scheme.  However, given RMS's expressed dislike for data
+abstraction, the FSF scheme is not nearly as clean or as easy to
+extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
+(d) @code{Lisp_Vectorlike}, with separate tags for each, although
+@code{Lisp_Vectorlike} is also used for vectors.)
+@node Garbage Collection
+@section Garbage Collection
+@cindex garbage collection
+@cindex mark and sweep
+Garbage collection is simple in theory but tricky to implement.
+Emacs Lisp uses the oldest garbage collection method, called
+@dfn{mark and sweep}.  Garbage collection begins by starting with
+all accessible locations (i.e. all variables and other slots where
+Lisp objects might occur) and recursively traversing all objects
+accessible from those slots, marking each one that is found.
+We then go through all of memory, freeing each object that is
+not marked and unmarking each object that is marked.  Note
+that ``all of memory'' means all currently allocated objects.
+Traversing all these objects means traversing all frob blocks,
+all vectors (which are chained in one big list), and all
+lcrecords (which are likewise chained).
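+The two phases can be sketched on a toy heap.  Everything here is an
+assumption made for illustration (a fixed array of cells, integer
+indices instead of pointers, a separate @code{marked} field); the real
+collector stores its marks differently, as described next.

```c
#include <assert.h>

#define HEAP_SIZE 8
#define NIL (-1)

/* A toy heap of cons-like cells that refer to each other by index. */
struct cell
{
  int in_use;
  int marked;
  int car, cdr;                 /* indices of other cells, or NIL */
};

static struct cell heap[HEAP_SIZE];

/* Allocate the first unused cell; return NIL if the heap is full. */
static int
make_cell (int car, int cdr)
{
  int i;

  for (i = 0; i < HEAP_SIZE; i++)
    if (!heap[i].in_use)
      {
        heap[i].in_use = 1;
        heap[i].marked = 0;
        heap[i].car = car;
        heap[i].cdr = cdr;
        return i;
      }
  return NIL;
}

/* Mark phase: recursively mark everything reachable from cell I.
   Checking the mark first also makes this terminate on cycles. */
static void
mark (int i)
{
  if (i == NIL || heap[i].marked)
    return;
  heap[i].marked = 1;
  mark (heap[i].car);
  mark (heap[i].cdr);
}

/* Sweep phase: free unmarked cells, unmark the survivors.
   Returns the number of cells freed. */
static int
sweep (void)
{
  int i, freed = 0;

  for (i = 0; i < HEAP_SIZE; i++)
    {
      if (!heap[i].in_use)
        continue;
      if (heap[i].marked)
        heap[i].marked = 0;
      else
        {
          heap[i].in_use = 0;
          freed++;
        }
    }
  return freed;
}

/* One full collection starting from the given roots. */
static int
collect (const int *roots, int nroots)
{
  int i;

  for (i = 0; i < nroots; i++)
    mark (roots[i]);
  return sweep ();
}
```

+Any cell that cannot be reached from a root is freed, and after the
+sweep no surviving cell is left marked, so the next collection starts
+from a clean state.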
+Note that, when an object is marked, the mark has to occur
+inside of the object's structure, rather than in the 32-bit
+@code{Lisp_Object} holding the object's pointer; i.e. you can't just
+set the pointer's mark bit.  This is because there may be many
+pointers to the same object.  This means that the method of
+marking an object can differ depending on the type.  The
+different marking methods are approximately as follows:
+@enumerate
+@item
+For conses, the mark bit of the car is set.
+@item
+For strings, the mark bit of the string's plist is set.
+@item
+For symbols when not lrecords, the mark bit of the
+symbol's plist is set.
+@item
+For vectors, the length is negated after adding 1.
+@item
+For lrecords, the pointer to the structure describing
+the type is changed (see below).
+@item
+Integers and characters do not need to be marked, since
+no allocation occurs for them.
+@end enumerate
+The details of this are in the @code{mark_object()} function.
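+The vector rule (``the length is negated after adding 1'') is worth a
+short sketch.  The helper names below are hypothetical; the point is
+only the arithmetic.  Adding 1 before negating guarantees that even a
+zero-length vector ends up with a negative, and therefore recognizably
+marked, length.

```c
#include <assert.h>

/* Hypothetical helpers showing the length encoding described above. */

/* Encode "marked" into the length field: LEN >= 0 becomes -(LEN + 1). */
static long
mark_vector_length (long len)
{
  return -(len + 1);
}

/* A negative length field means the vector is currently marked. */
static int
vector_length_marked_p (long len)
{
  return len < 0;
}

/* Recover the original length from a marked length field. */
static long
unmark_vector_length (long len)
{
  return -len - 1;
}
```

+This is also why code that reads a vector's size field during garbage
+collection gets a wrong answer unless it undoes the encoding first.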
+Note that any code that operates during garbage collection has
+to be especially careful because of the fact that some objects
+may be marked and as such may not look like they normally do.
+In particular:
+@itemize @bullet
+@item
+Some object pointers may have their mark bit set.  This will make
+@code{FOOBARP()} predicates fail.  Use @code{GC_FOOBARP()} to deal with
+this.
+@item
+Even if you clear the mark bit, @code{FOOBARP()} will still fail
+for lrecords because the implementation pointer has been
+changed (see below).  @code{GC_FOOBARP()} will correctly deal with
+this.
+@item
+Vectors have their size field munged, so anything that
+looks at this field will fail.
+@item
+Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
+pointers with their mark bit set, because the logical shift operations
+that remove the tag also remove the mark bit.
+@end itemize
+Finally, note that garbage collection can be invoked explicitly
+by calling @code{garbage-collect} but is also called automatically
+by @code{eval}, once a certain amount of memory has been allocated
+since the last garbage collection (according to @code{gc-cons-threshold}).
+@node GCPROing
+@section @code{GCPRO}ing
+@code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
+internals.  The basic idea is that whenever garbage collection
+occurs, all in-use objects must be reachable somehow or
+other from one of the roots of accessibility.  The roots
+of accessibility are:
+@enumerate
+@item
+All objects that have been @code{staticpro()}d.  This is used for
+any global C variables that hold Lisp objects.  A call to
+@code{staticpro()} happens implicitly as a result of any symbols
+declared with @code{defsymbol()} and any variables declared with
+@code{DEFVAR_FOO()}.  You need to explicitly call @code{staticpro()}
+(in the @code{vars_of_foo()} method of a module) for other global
+C variables holding Lisp objects. (This typically includes
+internal lists and such things.)
+Note that @code{obarray} is one of the @code{staticpro()}d things.
+Therefore, all functions and variables get marked through this.
+@item
+Any shadowed bindings that are sitting on the @code{specpdl} stack.
+@item
+Any objects sitting in currently active (Lisp) stack frames,
+catches, and condition cases.
+@item
+A couple of special-case places where active objects are
+located.
+@item
+Anything currently marked with @code{GCPRO}.
+@end enumerate
+Marking with @code{GCPRO} is necessary because some C functions (quite
+a lot, in fact) allocate objects during their operation.  Quite
+frequently, there will be no other pointer to the object while the
+function is running, and if a garbage collection occurs and the object
+needs to be referenced again, bad things will happen.  The solution is
+to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
+forget, and there is basically no way around this problem.  Here are
+some rules, though:
+@enumerate
+@item
+For every @code{GCPRO@var{n}}, there have to be declarations of
+@code{struct gcpro gcpro1, gcpro2}, etc.
+@item
+You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
+@emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
+either of these wrong will lead to crashes, often in completely random
+places unrelated to where the problem lies.
+@item
+The way this actually works is that all currently active @code{GCPRO}s
+are chained through the @code{struct gcpro} local variables, with the
+variable @samp{gcprolist} pointing to the head of the list and the nth
+local @code{gcpro} variable pointing to the first @code{gcpro} variable
+in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
+lvalue, and the @code{struct gcpro} local variable contains a pointer to
+this lvalue.  This is why things will mess up badly if you don't pair up
+the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with
+@code{gcprolist}s containing pointers to @code{struct gcpro}s or local
+@code{Lisp_Object} variables in no-longer-active stack frames.
+@item
+It is actually possible for a single @code{struct gcpro} to
+protect a contiguous array of any number of values, rather than
+just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
+the first object in the array and then set @code{gcpro@var{n}.nvars}.
+@item
+@strong{Strings are relocated.}  What this means in practice is that the
+pointer obtained using @code{XSTRING_DATA()} is liable to change at any
+time, and you should never keep it around past any function call, or
+pass it as an argument to any function that might cause a garbage
+collection.  This is why a number of functions accept either a
+``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
+and only access the Lisp string's data at the very last minute.  In some
+cases, you may end up having to @code{alloca()} some space and copy the
+string's data into it.
+@item
+By convention, if you have to nest @code{GCPRO}'s, use @code{NGCPRO@var{n}}
+(along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
+etc.  This avoids compiler warnings about shadowed locals.
+@item
+It is @emph{always} better to err on the side of extra @code{GCPRO}s
+rather than too few.  The extra cycles spent on this are
+almost never going to make a whit of difference in the
+speed of anything.
+@item
+The general rule to follow is that caller, not callee, @code{GCPRO}s.
+That is, you should not have to explicitly @code{GCPRO} any Lisp objects
+that are passed in as parameters.
+One exception from this rule is if you ever plan to change the parameter
+value, and store a new object in it.  In that case, you @emph{must}
+@code{GCPRO} the parameter, because otherwise the new object will not be
+protected.
+So, if you create any Lisp objects (remember, this happens in all sorts
+of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
+for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
+there's no possibility that a garbage-collection can occur while you
+need to use the object.  Even then, consider @code{GCPRO}ing.
+@item
+A garbage collection can occur whenever anything calls @code{Feval}, or
+whenever a QUIT can occur where execution can continue past
+this. (Remember, this is almost anywhere.)
+@item
+If you have the @emph{least smidgeon of doubt} about whether
+you need to @code{GCPRO}, you should @code{GCPRO}.
+@item
+Beware of @code{GCPRO}ing something that is uninitialized.  If you have
+any shade of doubt about this, initialize all your variables to @code{Qnil}.
+@item
+Be careful of traps, like calling @code{Fcons()} in the argument to
+another function.  By the ``caller protects'' law, you should be
+@code{GCPRO}ing the newly-created cons, but you aren't.  A certain
+number of functions that are commonly called on freshly created stuff
+(e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
+law and go ahead and @code{GCPRO} their arguments so as to simplify
+things, but make sure and check if it's OK whenever doing something like
+this.
+@item
+Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
+@code{GCPRO}ing are intermittent and extremely difficult to track down,
+often showing up in crashes inside of @code{garbage-collect} or in
+weirdly corrupted objects or even in incorrect values in a totally
+different section of code.
+@end enumerate
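+The chaining described in the rules above can be modelled in a few
+lines of C.  This is a simplified sketch, not the real definitions:
+@code{Toy_Object}, @code{struct toy_gcpro}, @code{toy_gcprolist},
+@code{TOY_GCPRO1} and @code{TOY_UNGCPRO} are stand-ins invented here,
+and only the one-variable form is shown.  It illustrates why every
+protect must be paired with an unprotect in the same stack frame, and
+how setting @code{nvars} protects a contiguous array.

```c
#include <assert.h>
#include <stddef.h>

typedef long Toy_Object;        /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;       /* entry in the enclosing protection scope */
  Toy_Object *var;              /* address of the first protected lvalue */
  int nvars;                    /* number of contiguous values protected */
};

/* Head of the chain of currently active protections. */
static struct toy_gcpro *toy_gcprolist;

/* Both macros expect a local `struct toy_gcpro gcpro1' to be in scope. */
#define TOY_GCPRO1(v)                                           \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v),              \
   gcpro1.nvars = 1, toy_gcprolist = &gcpro1)

#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* What a collector would do: walk the chain and visit each value.
   Here we merely count the protected values. */
static int
count_protected (void)
{
  int n = 0;
  struct toy_gcpro *g;

  for (g = toy_gcprolist; g; g = g->next)
    n += g->nvars;
  return n;
}

static int
inner (void)
{
  Toy_Object pair[2] = { 1, 2 };
  struct toy_gcpro gcpro1;
  int seen;

  TOY_GCPRO1 (pair[0]);
  gcpro1.nvars = 2;             /* protect both elements of the array */
  seen = count_protected ();
  TOY_UNGCPRO;                  /* must happen before this frame dies */
  return seen;
}

static int
outer (void)
{
  Toy_Object obj = 42;
  struct toy_gcpro gcpro1;
  int seen;

  TOY_GCPRO1 (obj);
  seen = inner ();              /* the chain now spans two stack frames */
  TOY_UNGCPRO;
  return seen;
}
```

+If @code{inner()} returned without its @code{TOY_UNGCPRO}, the list
+head would keep pointing into a dead stack frame, which is exactly the
+kind of crash-in-a-random-place failure the rules warn about.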
+@cindex garbage collection, conservative
+@cindex conservative garbage collection
+Given the extremely error-prone nature of the @code{GCPRO} scheme, and
+the difficulty of tracking down the resulting bugs, it should be
+considered a deficiency in the XEmacs code.  A solution to this problem
+would involve
+implementing so-called @dfn{conservative} garbage collection for the C
+stack.  That involves looking through all of stack memory and treating
+anything that looks like a reference to an object as a reference.  This
+will result in a few objects not getting collected when they should, but
+it obviates the need for @code{GCPRO}ing, and allows garbage collection
+to happen at any point at all, such as during object allocation.
+@node Garbage Collection - Step by Step
+@section Garbage Collection - Step by Step
+@cindex garbage collection step by step
+@menu
+* Invocation::
+* garbage_collect_1::
+* mark_object::
+* gc_sweep::
+* sweep_lcrecords_1::
+* compact_string_chars::
+* sweep_strings::
+* sweep_bit_vectors_1::
+@end menu
+@node Invocation
+@subsection Invocation
+@cindex garbage collection, invocation
+The first thing that anyone should know about garbage collection is:
+when and how the garbage collector is invoked. One might think that this
+could happen every time new memory is allocated, e.g. new objects are
+created, but this is @emph{not} the case. Instead, we have the following
+situation:
+The entry point of any process of garbage collection is an invocation
+of the function @code{garbage_collect_1} in file @code{alloc.c}. The
+invocation can occur @emph{explicitly} by calling the function
+@code{Fgarbage_collect} (in addition this function provides information
+about the freed memory), or can occur @emph{implicitly} in four different
+situations:
+@enumerate
+@item
+In function @code{main_1} in file @code{emacs.c}. This function is called
+at each startup of xemacs. The garbage collection is invoked after all
+initial creations are completed, but only if a special internal error
+checking-constant @code{ERROR_CHECK_GC} is defined.
+@item
+In function @code{disksave_object_finalization} in file
+@code{alloc.c}. The only purpose of this function is to clear the
+objects from memory which need not be stored with xemacs when we dump out
+an executable. This is only done by @code{Fdump_emacs} or by
+@code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
+actual clearing is accomplished by making these objects unreachable and
+starting a garbage collection. The function is only used while building
+xemacs.
+@item
+In function @code{Feval / eval} in file @code{eval.c}. Each time the
+well known and often used function eval is called to evaluate a form,
+one of the first things that could happen, is a potential call of
+@code{garbage_collect_1}. There exist three global variables,
+@code{consing_since_gc} (counts the created cons-cells since the last
+garbage collection), @code{gc_cons_threshold} (a specified threshold
+after which a garbage collection occurs) and @code{always_gc}. If
+@code{always_gc} is set or if the threshold is exceeded, the garbage
+collection will start.
+@item
+In function @code{Ffuncall / funcall} in file @code{eval.c}. This
+function evaluates calls of elisp functions and works according to
+@code{Feval}.
+@end enumerate
+The upshot is that garbage collection can occur essentially anywhere
+@code{Feval} or @code{Ffuncall} is used, either directly or
+through another function. Since calls to these two functions are
+hidden in various other functions, many calls to
+@code{garbage_collect_1} are not obviously foreseeable, and therefore
+unexpected. Instances worth remembering are
+various elisp commands, for example @code{or},
+@code{and}, @code{if}, @code{cond}, @code{while}, @code{setq}, etc.,
+miscellaneous @code{gui_item_...} functions, everything related to
+@code{eval} (@code{Feval_buffer}, @code{call0}, ...) and
+@code{Fsignal}. The latter is used to handle signals, for example the
+ones raised by every @code{QUIT} macro triggered after pressing Ctrl-g.
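+The implicit trigger in @code{Feval} and @code{Ffuncall} boils down to
+a cheap test on entry.  The following sketch models it with
+hypothetical @code{toy_} names and an arbitrary threshold of 1000; it
+is not the code in @file{eval.c}.  Note that allocation itself only
+bumps a counter and never starts a collection.

```c
#include <assert.h>

/* Toy counterparts of consing_since_gc, gc_cons_threshold and always_gc. */
static long toy_consing_since_gc;
static long toy_gc_cons_threshold = 1000;   /* arbitrary value */
static int toy_always_gc;
static int toy_gc_runs;                     /* collections performed */

/* Stand-in for garbage_collect_1(): count the run, reset the counter. */
static void
toy_garbage_collect (void)
{
  toy_gc_runs++;
  toy_consing_since_gc = 0;
}

/* Allocation only records how much has been consed. */
static void
toy_note_allocation (long amount)
{
  toy_consing_since_gc += amount;
}

/* The test performed on entry to the evaluator. */
static void
toy_eval_entry (void)
{
  if (toy_always_gc || toy_consing_since_gc > toy_gc_cons_threshold)
    toy_garbage_collect ();
}
```

+With these definitions, consing 600 units and entering the evaluator
+does nothing; consing another 600 and entering it again runs one
+collection.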
+@node garbage_collect_1
+@subsection @code{garbage_collect_1}
+@cindex @code{garbage_collect_1}
+We can now describe exactly what happens after the invocation takes
+place.
+@enumerate
+@item
+There are several cases in which the garbage collector is left immediately:
+when we are already garbage collecting (@code{gc_in_progress}), when
+the garbage collection is somehow forbidden
+(@code{gc_currently_forbidden}), when we are currently displaying something
+(@code{in_display}) or when we are preparing for the armageddon of the
+whole system (@code{preparing_for_armageddon}).
+@item
+Next the correct frame in which to put
+all the output occurring during garbage collecting is determined. In
+order to be able to restore the old display's state after displaying the
+message, some data about the current cursor position has to be
+saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
+care of that.
+@item
+The state of @code{gc_currently_forbidden} must be restored after
+the garbage collection, no matter what happens during the process. We
+accomplish this by @code{record_unwind_protect}ing the suitable function
+@code{restore_gc_inhibit} together with the current value of
+@code{gc_currently_forbidden}.
+@item
+If we are concurrently running an interactive xemacs session, the next step
+is simply to show the garbage collector's cursor/message.
+@item
+The following steps are the intrinsic steps of the garbage collector,
+therefore @code{gc_in_progress} is set.
+@item
+For debugging purposes, it is possible to copy the current C stack
+frame. However, this seems to be a currently unused feature.
+@item
+Before actually starting to go over all live objects, references to
+objects that are no longer used are pruned. We only have to do this for events
+(@code{clear_event_resource}) and for specifiers
+(@code{cleanup_specifiers}).
+@item
+Now the mark phase begins and marks all accessible elements. In order to
+start from
+all slots that serve as roots of accessibility, the function
+@code{mark_object} is called for each root individually to go out from
+there to mark all reachable objects. All roots that are traversed are
+shown in their processed order:
+@itemize @bullet
+@item
+all constant symbols and static variables that are registered via
+@code{staticpro}@ in the array @code{staticvec}.
+@xref{Adding Global Lisp Variables}.
+@item
+all Lisp objects that are created in C functions and that must be
+protected from being freed. They are registered in the global
+list @code{gcprolist}.
+@xref{GCPROing}.
+@item
+all local variables (i.e. their name fields @code{symbol} and old
+values @code{old_values}) that are bound during the evaluation by the Lisp
+engine. They are stored in @code{specbinding} structs pushed on a stack
+called @code{specpdl}.
+@xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
+@item
+all catch blocks that the Lisp engine encounters during the evaluation
+cause the creation of structs @code{catchtag} inserted in the list
+@code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
+are freshly created objects and therefore have to be marked.
+@xref{Catch and Throw}.
+@item
+every function application pushes new structs @code{backtrace}
+on the call stack of the Lisp engine (@code{backtrace_list}). The unique
+parts that have to be marked are the fields for each function
+(@code{function}) and all their arguments (@code{args}).
+@xref{Evaluation}.
+@item
+all objects that are used by the redisplay engine that must not be freed
+are marked by a special function called @code{mark_redisplay} (in
+@code{redisplay.c}).
+@item
+all objects created for profiling purposes are allocated by C functions
+instead of using the Lisp allocation mechanisms. So that the sweep phase
+does not reclaim objects that are still needed, they also have to be
+marked manually. That is done by the function @code{mark_profiling_info}.
+@end itemize
+@item
+Hash tables in XEmacs belong to a special kind of object that
+makes use of a concept often called 'weak pointers'.
+To make a long story short, this kind of pointer is not followed
+when determining the live objects during garbage collection.
+Any object referenced only by weak pointers is collected
+anyway, and the reference to it is cleared. In hash tables there are
+different usage patterns of them, manifesting in different types of hash
+tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
+(internally also 'key-car-weak' and 'value-car-weak') hash tables, each
+clearing entries depending on different conditions. More information can
+be found in the documentation to the function @code{make-hash-table}.
+Because there are complicated dependency rules about when and what to
+mark while processing weak hash tables, the standard @code{marker}
+method is only active if it is marking non-weak hash tables. As soon as
+a weak component is in the table, the hash table entries are ignored
+while marking. Instead, they are marked separately by the
+function @code{finish_marking_weak_hash_tables}. This function iterates
+over each hash table entry @code{hentries} for each weak hash table in
+@code{Vall_weak_hash_tables}. Depending on the type of a table, the
+appropriate action is performed.
+If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key is already
+marked, everything reachable from the @code{value} component is marked.
+If it is acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component
+is already marked, marking starts from the @code{key} component only.
+If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
+of the key entry is already marked, we mark both the @code{key} and
+@code{value} components.
+Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
+and the car of the value component is already marked, again both the
+@code{key} and the @code{value} components get marked.
+There are also lists with comparable properties, called weak
+lists. They come in different types, called
+@code{simple}, @code{assoc}, @code{key-assoc} and
+@code{value-assoc}. You can find further details about them in the
+description of the function @code{make-weak-list}. The scheme of their
+marking is similar: all weak lists are listed in @code{Vall_weak_lists},
+therefore we iterate over them. The marking is advanced until we hit an
+already marked pair. Then we know that during a former run all
+the rest has been marked completely. Again, depending on the special
+type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
+and the elem is marked, we mark the @code{cons} part. If it is a
+@code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
+cdr, we mark the @code{cons} and the @code{elem}. If it is a
+@code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
+the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
+a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
+cdr of the elem, we mark both the @code{cons} and the @code{elem}.
+Marking objects reachable from weak hash tables and weak lists can cause
+other objects to become marked, which in turn may require further
+marking of other weak objects. Both finishing functions are therefore
+rerun as long as a pass marks previously unmarked objects.
+@item
+After completing the special marking for the weak hash tables and for the weak
+lists, all entries that point to objects that are going to be swept in
+the further process are useless, and therefore have to be removed from
+the table or the list.
+The function @code{prune_weak_hash_tables} does the job for weak hash
+tables. Totally unmarked hash tables are removed from the list
+@code{Vall_weak_hash_tables}. The other ones are treated more carefully
+by scanning over all entries and removing one as soon as one of
+the components @code{key} and @code{value} is unmarked.
+The same idea applies to the weak lists. It is accomplished by
+@code{prune_weak_lists}: An unmarked list is pruned from
+@code{Vall_weak_lists} immediately. A marked list is treated more
+carefully by going over it and removing just the unmarked pairs.
+@item
+The function @code{prune_specifiers} checks all listed specifiers held
+in @code{Vall_specifiers} and removes the ones from the lists that are
+unmarked.
+@item
+All syntax tables are stored in a list called
+@code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
+through it and unlinks the tables that are unmarked.
+@item
+Next comes the complete sweep, performed by the function
+@code{gc_sweep}, which does the bulk of the work.
+@item
+First, all the variables with respect to garbage collection are
+reset. @code{consing_since_gc} - the counter of the created cells since
+the last garbage collection - is set back to 0, and
+@code{gc_in_progress} is not @code{true} anymore.
+@item
+In case the session is interactive, the displayed cursor and message are
+removed again.
+@item
+The state of @code{gc_currently_forbidden} is restored to its former
+value by unwinding the stack.
+@item
+A small memory reserve is always held back that can be reached by
+@code{breathing_space}. If nothing more is left, we create a new reserve
+and exit.
+@end enumerate
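+The weak hash table steps in the list above (repeat the finishing pass
+until nothing new gets marked, then prune) can be illustrated with a
+minimal sketch. It models only the key-weak case; all names
+(@code{toy_obj}, @code{toy_entry}, @code{toy_process_weak_table}, ...) are
+hypothetical and do not correspond to the real structures in XEmacs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: a mark bit and at most one strong reference. */
struct toy_obj {
    int marked;
    struct toy_obj *ref;
};

/* Hypothetical entry of a key-weak table. */
struct toy_entry {
    struct toy_obj *key;
    struct toy_obj *value;
};

/* Mark an object and everything reachable through its strong references. */
static void toy_mark(struct toy_obj *o)
{
    while (o != NULL && !o->marked) {
        o->marked = 1;
        o = o->ref;
    }
}

/* One finishing pass over a key-weak table: a value is marked only when
   its key is already marked.  Returns nonzero if anything new got marked. */
static int toy_finish_marking_key_weak(struct toy_entry *e, size_t n)
{
    int did_mark = 0;
    for (size_t i = 0; i < n; i++) {
        if (e[i].key->marked && !e[i].value->marked) {
            toy_mark(e[i].value);
            did_mark = 1;
        }
    }
    return did_mark;
}

/* Drop every entry whose key or value stayed unmarked; returns new count. */
static size_t toy_prune(struct toy_entry *e, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (e[i].key->marked && e[i].value->marked)
            e[kept++] = e[i];
    }
    return kept;
}

static size_t toy_process_weak_table(struct toy_entry *e, size_t n)
{
    assert(e != NULL || n == 0);
    /* Marking a value may mark the key of another entry, so repeat the
       pass until it marks nothing new. */
    while (toy_finish_marking_key_weak(e, n))
        ;
    return toy_prune(e, n);
}
```

+The repeated pass matters whenever a value marked late in one pass keeps
+the key of an earlier entry alive; a single pass would wrongly prune that
+entry.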
+@node mark_object
+@subsection @code{mark_object}
+@cindex @code{mark_object}
+The first thing that is checked while marking an object is whether the
+object is a real Lisp object @code{Lisp_Type_Record} or just an integer
+or a character. Integers and characters are the only two types that are
+stored directly - without another level of indirection, and therefore they
+don't have to be marked and collected.
+@xref{How Lisp Objects Are Represented in C}.
+The second case is the one we have to handle: we are
+dealing with a pointer to a Lisp object. Even then, there are three
+situations in which nothing has to be done while marking. The
+object is read only, which prevents it from being garbage collected,
+i.e. marked (@code{C_READONLY_RECORD_HEADER_P}). The object in question is
+already marked, and need not be marked a second time (checked by
+@code{MARKED_RECORD_HEADER_P}). Or it is a special, unmarkable object
+(@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that
+sit in some CONST space, and can therefore not be marked, see
+@code{this_one_is_unmarkable} in @code{alloc.c}).
+Now, the actual marking is feasible. We first use the macro
+@code{MARK_RECORD_HEADER} to mark the object itself (actually the
+special flag in the lrecord header), and then call its special marker
+"method" @code{marker} if available. The marker method marks every
+other object that is in reach from our current object. Note that these
+marker methods should not call @code{mark_object} recursively, but
+instead should return the next object from where further marking has to
+be performed.
+In case another object was returned, as mentioned before, we reiterate
+the whole @code{mark_object} process beginning with this next object.
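+The loop described above can be sketched as follows. This is a
+simplified illustration, not the code in @file{alloc.c}: the struct
+layout and the names @code{toy_object}, @code{toy_mark_object} and
+@code{toy_mark_cons} are assumptions. The cons-like marker marks its
+short branch through a call and hands the long branch back to the caller,
+so a long list is walked by iteration instead of deep recursion.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object;
typedef struct toy_object *(*toy_marker_fn)(struct toy_object *);

/* Hypothetical stand-in for a Lisp object with an lrecord-like header. */
struct toy_object {
    int is_pointer;          /* 0 for immediate integers and characters */
    int read_only;           /* read-only objects are never marked */
    int marked;
    toy_marker_fn marker;    /* may be NULL when there is nothing to mark */
    struct toy_object *car;
    struct toy_object *cdr;
};

static void toy_mark_object(struct toy_object *obj);

/* Marker method for a cons-like object: mark the car through a call and
   return the cdr, which the caller continues with. */
static struct toy_object *toy_mark_cons(struct toy_object *obj)
{
    assert(obj != NULL);
    toy_mark_object(obj->car);
    return obj->cdr;
}

static void toy_mark_object(struct toy_object *obj)
{
    while (obj != NULL) {
        /* Immediate, read-only and already marked objects need no work. */
        if (!obj->is_pointer || obj->read_only || obj->marked)
            return;
        obj->marked = 1;
        if (obj->marker == NULL)
            return;
        /* Continue with whatever object the marker method returns. */
        obj = obj->marker(obj);
    }
}
```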
+@node gc_sweep
+@subsection @code{gc_sweep}
+@cindex @code{gc_sweep}
+The job of this function is to free all unmarked records from memory. As
+we know, there are different types of objects implemented and managed, and
+consequently different ways to free them from memory.
+@xref{Introduction to Allocation}.
+We start with all objects stored through @code{lcrecords}. All
+bulkier objects are allocated and handled using that scheme of
+@code{lcrecords}. Each object is @code{malloc}ed separately
+instead of placing it in one of the contiguous frob blocks. The types
+that are currently stored
+using @code{lcrecords}'s @code{alloc_lcrecord} and
+@code{make_lcrecord_list} are: vectors, buffers,
+char-table, char-table-entry, console, weak-list, database, device,
+ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
+coding-system, frame, image-instance, glyph, popup-data, gui-item,
+keymap, charset, color_instance, font_instance, opaque, opaque-list,
+process, range-table, specifier, symbol-value-buffer-local,
+symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
+tooltalk-message, tooltalk-pattern, window, and window-configuration. We
+take care of them first
+in order to be able to handle and to finalize items stored in them more
+easily. The function @code{sweep_lcrecords_1}, described below,
+does the whole job for us.
+@xref{lrecords}, for a description of the internals.
+Our next candidates are objects that behave quite differently
+from everything else: the strings. They consist of two parts, a
+fixed-size portion (@code{struct Lisp_string}) holding the string's
+length, its property list and a pointer to the second part, and the
+actual string data, which is stored in string-chars blocks comparable to
+frob blocks. In this block, the data is not only freed, but also a
+compression of holes is made, i.e. all strings are relocated together.
+@xref{String}. This compacting phase is performed by the function
+@code{compact_string_chars}; the actual sweeping, done by the function
+@code{sweep_strings}, is described below.
+After that, the other types are swept step by step using functions
+@code{sweep_conses}, @code{sweep_bit_vectors_1},
+@code{sweep_compiled_functions}, @code{sweep_floats},
+@code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
+@code{sweep_events}.  They are the fixed-size types cons, float,
+compiled-function, symbol, marker, extent, and event stored in
+so-called "frob blocks", and therefore we can basically do the same for
+every type of object, using the same macros, defined specifically to
+handle everything with respect to fixed-size blocks. The only fixed-size
+type that is not handled here is the fixed-size portion of strings,
+because we took special care of them earlier.
+The only big exception is bit vectors, which are stored differently and
+therefore treated differently by the function @code{sweep_bit_vectors_1}
+described later.
+First, we need some brief information about how
+these fixed-size types are managed in general, in order to understand
+how the sweeping is done. They all have a fixed size, and are therefore
+stored in big blocks of memory - allocated at once - that can hold a
+certain number of objects of one type. The macro
+@code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
+every type. More precisely, we have the block struct
+(holding a pointer to the previous block @code{prev} and the
+objects in @code{block[]}), a pointer to the current block
+(@code{current_..._block}) and its last index
+(@code{current_..._block_index}), and a pointer to the free list that
+will be created. There is also a macro @code{FIXED_TYPE_FROM_BLOCK}, plus
+some related macros, used to obtain a new object, either from
+the free list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused object
+of that type stored or by allocating a completely new block using
+@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
+The rest works as follows: all of them define a
+macro @code{UNMARK_...} that is used to unmark the object. They define a
+macro @code{ADDITIONAL_FREE_...} that defines additional work that has
+to be done when converting an object from in use to not in use (so far,
+only markers use it in order to unchain them). Then, they all call
+the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
+and their struct name.
+This call in particular does the following: we go over all blocks
+starting with the current one and moving towards the oldest.
+For each block, we look at every object in it. If the object is already
+freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
+object), or if it is
+set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing must be
+done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
+is put in the free list and set free (using the macro
+@code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
+(by @code{UNMARK_...}). While going through one block, we note if the
+whole block is empty. If so, the whole block is freed (using
+@code{xfree}) and the free list state is set to the state it had before
+handling this block.
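+The block walk described above might look roughly like the sketch
+below. It is not the @code{SWEEP_FIXED_TYPE_BLOCK} macro itself: the
+types and names are hypothetical, read-only objects are left out, the
+free list is rebuilt from scratch (so already free cells are simply
+threaded again), and the current block is never released. These are all
+simplifying assumptions of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PER_BLOCK 4

struct toy_cell {
    struct toy_cell *next_free;  /* meaningful only while the cell is free */
    int in_use;                  /* stand-in for the FREE_STRUCT_P test */
    int marked;
};

struct toy_block {
    struct toy_block *prev;      /* next older block */
    struct toy_cell cells[TOY_PER_BLOCK];
};

struct toy_heap {
    struct toy_block *current;   /* newest block */
    int current_index;           /* cells handed out from the newest block */
    struct toy_cell *free_list;
};

/* Sweep all blocks from newest to oldest; returns the number of cells
   that are still in use afterwards. */
static int toy_sweep(struct toy_heap *heap)
{
    struct toy_block **link;
    struct toy_block *blk;
    int limit, live = 0;

    assert(heap != NULL);
    link = &heap->current;
    blk = heap->current;
    limit = heap->current_index;
    heap->free_list = NULL;

    while (blk != NULL) {
        struct toy_cell *saved_free = heap->free_list;
        int block_live = 0;

        for (int i = 0; i < limit; i++) {
            struct toy_cell *c = &blk->cells[i];
            if (c->in_use && c->marked) {
                c->marked = 0;           /* survivor: unmark for next cycle */
                block_live++;
            } else {
                c->in_use = 0;           /* garbage or already free */
                c->next_free = heap->free_list;
                heap->free_list = c;
            }
        }

        if (block_live == 0 && blk != heap->current) {
            /* Whole block is empty: forget its cells again and release it. */
            struct toy_block *dead = blk;
            heap->free_list = saved_free;
            *link = blk->prev;
            blk = blk->prev;
            free(dead);
        } else {
            live += block_live;
            link = &blk->prev;
            blk = blk->prev;
        }
        limit = TOY_PER_BLOCK;           /* older blocks are always full */
    }
    return live;
}
```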
+@node sweep_lcrecords_1
+@subsection @code{sweep_lcrecords_1}
+@cindex @code{sweep_lcrecords_1}
+After nullifying the complete lcrecord statistics, we go over all
+lcrecords two separate times. They are all chained together in a list with
+a head called @code{all_lcrecords}.
+The first loop calls for each object its @code{finalizer} method, but only
+in the case that it is not read only
+(@code{C_READONLY_RECORD_HEADER_P}), it is not already marked
+(@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of
+freed objects, field @code{free}) and finally it owns a finalizer
+method.
+The second loop actually frees the appropriate objects again by iterating
+through the whole list. In case an object is read only or marked, it
+has to persist, otherwise it is manually freed by calling
+@code{xfree}. During this loop, the lcrecord statistics are kept up to
+date by calling @code{tick_lcrecord_stats} with the right arguments.
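+A minimal sketch of the two loops, under simplifying assumptions: the
+record type, the counting finalizer and the function names are
+hypothetical, statistics are omitted, and the sketch clears the mark of
+survivors in the second loop (an assumption about where unmarking
+happens).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an lcrecord chained through "next". */
struct toy_lcrecord {
    struct toy_lcrecord *next;
    int read_only;
    int marked;
    void (*finalizer)(struct toy_lcrecord *);   /* may be NULL */
};

/* Example finalizer that only counts how often it ran. */
static int toy_finalized_count;

static void toy_count_finalizer(struct toy_lcrecord *r)
{
    (void)r;
    toy_finalized_count++;
}

/* Returns the number of records that survive the sweep. */
static int toy_sweep_lcrecords(struct toy_lcrecord **head)
{
    struct toy_lcrecord *r;
    struct toy_lcrecord **link;
    int survivors = 0;

    assert(head != NULL);

    /* First loop: finalize every record that is about to be freed.
       Nothing is freed yet, so finalizers still see an intact list. */
    for (r = *head; r != NULL; r = r->next) {
        if (!r->read_only && !r->marked && r->finalizer != NULL)
            r->finalizer(r);
    }

    /* Second loop: unlink and free the garbage, keep the rest. */
    link = head;
    while ((r = *link) != NULL) {
        if (r->read_only || r->marked) {
            r->marked = 0;
            survivors++;
            link = &r->next;
        } else {
            *link = r->next;
            free(r);
        }
    }
    return survivors;
}
```

+Running all finalizers before freeing anything means a finalizer never
+touches memory that an earlier iteration has already released.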
+@node compact_string_chars
+@subsection @code{compact_string_chars}
+@cindex @code{compact_string_chars}
+The purpose of this function is to compact all the data parts of the
+strings that are held in so-called @code{string_chars_block}, i.e. the
+strings that do not exceed a certain maximal length.
+The procedure with which this is done is as follows. We are keeping two
+positions in the @code{string_chars_block}s using two pointer/integer
+pairs, namely @code{from_sb}/@code{from_pos} and
+@code{to_sb}/@code{to_pos}. They stand for the current positions from
+which and to which the string currently being handled is copied.
+While going over all chained @code{string_chars_block}s and the strings
+they hold, starting at @code{first_string_chars_block}, both pointers
+are advanced and eventually a string is copied from @code{from_sb} to
+@code{to_sb}, depending on the status of the pointed at strings.
+More precisely, we can distinguish between the following actions.
+@itemize @bullet
+@item
+The string at @code{from_sb}'s position could be marked as free, which
+is indicated by an invalid pointer to the pointer that should point back
+to the fixed size string object, and which is checked by
+@code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
+is advanced to the next string, and nothing has to be copied.
+@item
+Also, if a string object itself is unmarked, nothing has to be
+copied. We likewise advance the @code{from_sb}/@code{from_pos}
+pair as described above.
+@item
+In all other cases, we have a marked string at hand. The string data
+must be moved from the from-position to the to-position. In case
+there is not enough space in the current @code{to_sb}-block, we advance
+this pointer to the beginning of the next block before copying. In case the
+from and to positions are different, we perform the
+actual copying using the library function @code{memmove}.
+@end itemize
+After compacting, the pointer to the current
+@code{string_chars_block}, sitting in @code{current_string_chars_block},
+is reset to the last block to which we moved a string,
+i.e. @code{to_sb}, and all remaining blocks (we know that they just
+carry garbage) are explicitly @code{xfree}d.
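+The sliding compaction with the two position pairs can be sketched as
+below. The layout is an assumption made for illustration: each chunk
+starts with a small header holding a back pointer to its owning string
+and the chunk size, blocks have a tiny fixed capacity, and
+@code{toy_append} is a hypothetical helper that exists only to build test
+data. The sketch returns the last block still in use and leaves freeing
+of the remaining blocks to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DATA_BYTES 128

struct toy_string {              /* fixed-size part of a string */
    int marked;
    char *data;                  /* points into some block */
};

struct toy_chunk {               /* header in front of the characters */
    struct toy_string *owner;    /* NULL marks a free chunk */
    size_t size;                 /* whole chunk in bytes, header included */
};

struct toy_block {
    struct toy_block *next;
    size_t pos;                  /* bytes in use */
    union { struct toy_chunk align; char bytes[TOY_DATA_BYTES]; } u;
};

/* Hypothetical helper: store TEXT as a new chunk; returns 0 if full. */
static int toy_append(struct toy_block *blk, struct toy_string *s,
                      const char *text)
{
    size_t unit = sizeof(struct toy_chunk);
    size_t size = unit + (strlen(text) + 1 + unit - 1) / unit * unit;
    struct toy_chunk *c;

    if (blk->pos + size > TOY_DATA_BYTES)
        return 0;
    c = (struct toy_chunk *)(blk->u.bytes + blk->pos);
    c->owner = s;
    c->size = size;
    s->data = (char *)c + unit;
    strcpy(s->data, text);
    blk->pos += size;
    return 1;
}

/* Slide all chunks of marked strings towards the first block. */
static struct toy_block *toy_compact(struct toy_block *first)
{
    struct toy_block *from_sb, *to_sb = first;
    size_t to_pos = 0;

    assert(first != NULL);
    for (from_sb = first; from_sb != NULL; from_sb = from_sb->next) {
        size_t from_pos = 0;
        size_t used = from_sb->pos;

        while (from_pos < used) {
            struct toy_chunk *from =
                (struct toy_chunk *)(from_sb->u.bytes + from_pos);
            size_t size = from->size;
            char *to;

            from_pos += size;
            if (from->owner == NULL || !from->owner->marked)
                continue;                /* free or garbage: skip it */
            if (to_pos + size > TOY_DATA_BYTES) {
                to_sb->pos = to_pos;     /* target block is full */
                to_sb = to_sb->next;
                to_pos = 0;
            }
            to = to_sb->u.bytes + to_pos;
            if (to != (char *)from) {
                memmove(to, from, size);
                /* Fix the owner's pointer to the relocated characters. */
                ((struct toy_chunk *)to)->owner->data =
                    to + sizeof(struct toy_chunk);
            }
            to_pos += size;
        }
    }
    to_sb->pos = to_pos;
    return to_sb;
}
```

+Because the target position never overtakes the source position,
+@code{memmove} is enough to handle chunks that overlap their own
+destination inside one block.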
+@node sweep_strings
+@subsection @code{sweep_strings}
+@cindex @code{sweep_strings}
+The sweeping for the fixed sized string objects is essentially exactly
+the same as it is for all other fixed size types. As before, the freeing
+into the suitable free list is done by using the macro
+@code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros
+@code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
+definitions are a little bit special compared to the ones used
+for the other fixed size types.
+@code{UNMARK_string} is defined the same way except some additional code
+used for updating the bookkeeping information.
+For strings, @code{ADDITIONAL_FREE_string} has to do something in
+addition: in case the string was not allocated in a
+@code{string_chars_block} because it exceeded the maximal length, and
+therefore was @code{malloc}ed separately, we also have to @code{xfree}
+it explicitly.
+@node sweep_bit_vectors_1
+@subsection @code{sweep_bit_vectors_1}
+@cindex @code{sweep_bit_vectors_1}
+Bit vectors are also one of the rare types that are @code{malloc}ed
+individually. Consequently, while sweeping, all bit vectors that are no
+longer needed must be freed by hand. This is done in
+the expected way: since they are all registered in a list called
+@code{all_bit_vectors}, all elements of that list are traversed,
+all unmarked bit vectors are unlinked and freed by calling @code{xfree},
+and all remaining ones become unmarked.
+In addition, the bookkeeping information used for garbage
+collector's output purposes is updated.
+@node Integers and Characters
+@section Integers and Characters
+Integer and character Lisp objects are created from integers using the
+macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
+functions @code{make_int()} and @code{make_char()}. (These are actually
+macros on most systems.)  These functions basically just do some moving
+of bits around, since the integral value of the object is stored
+directly in the @code{Lisp_Object}.
+@code{XSETINT()} and the like will truncate values given to them that
+are too big; i.e. you won't get the value you expected but the tag bits
+will at least be correct.
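+The following sketch shows the general idea of storing an integer
+directly in a tagged machine word, including the truncation mentioned
+above. The layout (a single tag bit in the lowest position) and all names
+are assumptions chosen for illustration; the real layout depends on the
+XEmacs configuration. Decoding relies on an arithmetic right shift of
+signed values, which common compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged word: lowest bit set means "immediate integer". */
typedef uintptr_t toy_lisp_object;

#define TOY_INT_TAG  ((uintptr_t)1)
#define TOY_TAG_BITS 1

static toy_lisp_object toy_make_int(intptr_t value)
{
    /* The shift is done on the unsigned type, so a value that needs all
       bits of the word silently loses its top bit; the tag stays valid. */
    return ((uintptr_t)value << TOY_TAG_BITS) | TOY_INT_TAG;
}

static int toy_intp(toy_lisp_object obj)
{
    return (obj & TOY_INT_TAG) != 0;
}

static intptr_t toy_int_value(toy_lisp_object obj)
{
    assert(toy_intp(obj));
    /* Assumes arithmetic right shift, which restores the sign. */
    return (intptr_t)obj >> TOY_TAG_BITS;
}
```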
+@node Allocation from Frob Blocks
+@section Allocation from Frob Blocks
+The uninitialized memory required by a @code{Lisp_Object} of a particular type
+is allocated using
+@code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
+lowest-level object-creating functions in @file{alloc.c}:
+@code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
+@code{Fmake_symbol()}, @code{allocate_extent()},
+@code{allocate_event()}, @code{Fmake_marker()}, and
+@code{make_uninit_string()}.  The idea is that, for each type, there are
+a number of frob blocks (each 2K in size); each frob block is divided up
+into object-sized chunks.  Each frob block will have some of these
+chunks that are currently assigned to objects, and perhaps some that are
+free. (If a frob block has nothing but free chunks, it is freed at the
+end of the garbage collection cycle.)  The free chunks are stored in a
+free list, which is chained by storing a pointer in the first four bytes
+of the chunk. (Except for the free chunks at the end of the last frob
+block, which are handled using an index which points past the end of the
+last-allocated chunk in the last frob block.)
+@code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
+free list; if that fails, it calls
+@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
+last frob block for space, and creates a new frob block if there is
+none. (There are actually two versions of these macros, one of which is
+more defensive but less efficient and is used for error-checking.)
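+As a rough illustration of this allocation order (free list first, then
+the unused tail of the newest block, then a fresh block), consider the
+sketch below. It is not the @code{ALLOCATE_FIXED_TYPE()} macro: the block
+capacity, the types and the function names are hypothetical, and
+@code{malloc()} stands in for whatever the real code uses to obtain a
+block.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CHUNKS_PER_BLOCK 4

/* A chunk either holds an object or, while free, the free-list link. */
union toy_chunk {
    union toy_chunk *next_free;
    double payload;              /* stand-in for the object itself */
};

struct toy_frob_block {
    struct toy_frob_block *prev;
    union toy_chunk chunks[TOY_CHUNKS_PER_BLOCK];
};

struct toy_allocator {
    struct toy_frob_block *current;  /* newest block */
    int current_index;               /* first never-used chunk in it */
    union toy_chunk *free_list;
};

static union toy_chunk *toy_allocate(struct toy_allocator *a)
{
    assert(a != NULL);

    /* 1. Reuse a chunk from the free list if there is one. */
    if (a->free_list != NULL) {
        union toy_chunk *c = a->free_list;
        a->free_list = c->next_free;
        return c;
    }

    /* 2. Otherwise take the next unused chunk of the newest block,
          creating a new block first when the newest one is full. */
    if (a->current == NULL || a->current_index == TOY_CHUNKS_PER_BLOCK) {
        struct toy_frob_block *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->prev = a->current;
        a->current = b;
        a->current_index = 0;
    }
    return &a->current->chunks[a->current_index++];
}

/* Put a chunk back on the free list (what the sweep does for garbage). */
static void toy_release(struct toy_allocator *a, union toy_chunk *c)
{
    assert(a != NULL && c != NULL);
    c->next_free = a->free_list;
    a->free_list = c;
}
```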
+@node lrecords
+@section lrecords
+[see @file{lrecord.h}]
+All lrecords have at the beginning of their structure a @code{struct
+lrecord_header}.  This just contains a pointer to a @code{struct
+lrecord_implementation}, which is a structure containing method pointers
+and such.  There is one of these for each type, and it is a global,
+constant, statically-declared structure that is declared in the
+@code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually
+declares an array of two @code{struct lrecord_implementation}
+structures.  The first one contains all the standard method pointers,
+and is used in all normal circumstances.  During garbage collection,
+however, the lrecord is @dfn{marked} by bumping its implementation
+pointer by one, so that it points to the second structure in the array.
+This structure contains a special indication in it that it's a
+@dfn{marked-object} structure: the finalize method is the special
+function @code{this_marks_a_marked_record()}, and all other methods are
+null pointers.  At the end of garbage collection, all lrecords will
+either be reclaimed or unmarked by decrementing their implementation
+pointers, so this second structure pointer will never remain past
+garbage collection.)
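+The pointer-bumping trick can be sketched in a few lines. The struct
+fields and all @code{toy_} names are hypothetical; only the idea (an array
+of two implementation structures, with the second one recognizable by a
+sentinel finalizer) is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_header;
typedef void (*toy_finalizer_fn)(struct toy_header *);

struct toy_implementation {
    const char *name;
    toy_finalizer_fn finalizer;
};

struct toy_header {
    const struct toy_implementation *implementation;
};

/* Sentinel: never meant to be called, its address tags the second slot. */
static void toy_this_marks_a_marked_record(struct toy_header *unused)
{
    (void)unused;
}

/* Slot 0 holds the normal methods, slot 1 is the "marked" twin. */
static const struct toy_implementation toy_float_impl[2] = {
    { "float", NULL },
    { NULL, toy_this_marks_a_marked_record }
};

static int toy_marked_p(const struct toy_header *h)
{
    return h->implementation->finalizer == toy_this_marks_a_marked_record;
}

static void toy_mark(struct toy_header *h)
{
    assert(!toy_marked_p(h));
    h->implementation++;         /* now points at slot 1 */
}

static void toy_unmark(struct toy_header *h)
{
    assert(toy_marked_p(h));
    h->implementation--;         /* back to the normal methods */
}
```

+The mark costs no extra bit in the object, at the price that the normal
+methods are unreachable through the header while the object is marked.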
+Simple lrecords (of type (c) above) just have a @code{struct
+lrecord_header} at their beginning.  lcrecords, however, actually have a
+@code{struct lcrecord_header}.  This, in turn, has a @code{struct
+lrecord_header} at its beginning, so sanity is preserved; but it also
+has a pointer used to chain all lcrecords together, and a special ID
+field used to distinguish one lcrecord from another. (This field is used
+only for debugging and could be removed, but the space gain is not
+significant.)
+Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
+like for other frob blocks.  The only change is that the implementation
+pointer must be initialized correctly. (The implementation structure for
+an lrecord, or rather the pointer to it, is named @code{lrecord_float},
+@code{lrecord_extent}, @code{lrecord_buffer}, etc.)
+lcrecords are created using @code{alloc_lcrecord()}.  This takes a
+size to allocate and an implementation pointer. (The size needs to be
+passed because some lcrecords, such as window configurations, are of
+variable size.) This basically just @code{malloc()}s the storage,
+initializes the @code{struct lcrecord_header}, and chains the lcrecord
+onto the head of the list of all lcrecords, which is stored in the
+variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
+generally occur in the lowest-level allocation function for each lrecord
+type.
+Whenever you create an lrecord, you need to call either
+@code{DEFINE_LRECORD_IMPLEMENTATION()} or
+@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
+specified in a C file, at the top level.  What this actually does is
+define and initialize the implementation structure for the lrecord. (And
+possibly declares a function @code{error_check_foo()} that implements
+the @code{XFOO()} macro when error-checking is enabled.)  The arguments
+to the macros are the actual type name (this is used to construct the C
+variable name of the lrecord implementation structure and related
+structures using the @samp{##} macro concatenation operator), a string
+that names the type on the Lisp level (this may not be the same as the C
+type name; typically, the C type name has underscores, while the Lisp
+string has dashes), various method pointers, and the name of the C
+structure that contains the object.  The methods are used to encapsulate
+type-specific information about the object, such as how to print it or
+mark it for garbage collection, so that it's easy to add new object
+types without having to add a specific case for each new type in a bunch
+of different places.
+The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
+@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
+used for fixed-size object types and the latter is for variable-size
+object types.  Most object types are fixed-size; some complex
+types, however (e.g. window configurations), are variable-size.
+Variable-size object types have an extra method, which is called
+to determine the actual size of a particular object of that type.
+(Currently this is only used for keeping allocation statistics.)
+For the purpose of keeping allocation statistics, the allocation
+engine keeps a list of all the different types that exist.  Note that,
+since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
+specified at top-level, there is no way for it to add to the list of all
+existing types.  What happens instead is that each implementation
+structure contains in it a dynamically assigned number that is
+particular to that type. (Or rather, it contains a pointer to another
+structure that contains this number.  This evasiveness is done so that
+the implementation structure can be declared const.) In the sweep stage
+of garbage collection, each lrecord is examined to see if its
+implementation structure has its dynamically-assigned number set.  If
+not, it must be a new type, and it is added to the list of known types
+and a new number assigned.  The number is used to index into an array
+holding the number of objects of each type and the total memory
+allocated for objects of that type.  The statistics in this array are
+also computed during the sweep stage.  These statistics are returned by
+the call to @code{garbage-collect} and are printed out at the end of the
+loadup phase.
+Note that for every type defined with a @code{DEFINE_LRECORD_*()}
+macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
+somewhere in a @file{.h} file, and this @file{.h} file needs to be
+included by @file{inline.c}.
+Furthermore, there should generally be a set of @code{XFOOBAR()},
+@code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
+file.  To create one of these, copy an existing model and modify as
+necessary.
+The various methods in the lrecord implementation structure are:
+@enumerate
+@item
+@cindex mark method
+A @dfn{mark} method.  This is called during the marking stage and passed
+a function pointer (usually the @code{mark_object()} function), which is
+used to mark an object.  All Lisp objects that are contained within the
+object need to be marked by applying this function to them.  The mark
+method should also return a Lisp object, which should be either nil or
+an object to mark. (This can be used in lieu of calling
+@code{mark_object()} on the object, to reduce the recursion depth, and
+consequently should be the most heavily nested sub-object, such as a
+long list.)
+@strong{Please note:} When the mark method is called, garbage collection
+is in progress, and special precautions need to be taken when accessing
+objects; see section (B) above.
+If your mark method does not need to do anything, it can be
+@code{NULL}.
+@item
+A @dfn{print} method.  This is called to create a printed representation
+of the object, whenever @code{princ}, @code{prin1}, or the like is
+called.  It is passed the object, a stream to which the output is to be
+directed, and an @code{escapeflag} which indicates whether the object's
+printed representation should be @dfn{escaped} so that it is
+readable. (This corresponds to the difference between @code{princ} and
+@code{prin1}.) Basically, @dfn{escaped} means that strings will have
+quotes around them and confusing characters in the strings such as
+quotes, backslashes, and newlines will be backslashed; and that special
+care will be taken to make symbols print in a readable fashion
+(e.g. symbols that look like numbers will be backslashed).  Other
+readable objects should perhaps pass @code{escapeflag} on when
+sub-objects are printed, so that readability is preserved when necessary
+(or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
+objects should in general ignore @code{escapeflag}, except that some use
+it as an indication that more verbose output should be given.
+Sub-objects are printed using @code{print_internal()}, which takes
+exactly the same arguments as are passed to the print method.
+Literal C strings should be printed using @code{write_c_string()},
+or @code{write_string_1()} for non-null-terminated strings.
+Functions that do not have a readable representation should check the
+@code{print_readably} flag and signal an error if it is set.
+If you specify NULL for the print method, the
+@code{default_object_printer()} will be used.
+@item
+A @dfn{finalize} method.  This is called at the beginning of the sweep
+stage on lcrecords that are about to be freed, and should be used to
+perform any extra object cleanup.  This typically involves freeing any
+extra @code{malloc()}ed memory associated with the object, releasing any
+operating-system and window-system resources associated with the object
+(e.g. pixmaps, fonts), etc.
+The finalize method can be NULL if nothing needs to be done.
+WARNING #1: The finalize method is also called at the end of the dump
+phase; this time with the for_disksave parameter set to non-zero.  The
+object is @emph{not} about to disappear, so you have to make sure to
+@emph{not} free any extra @code{malloc()}ed memory if you're going to
+need it later.  (Also, signal an error if there are any operating-system
+and window-system resources here, because they can't be dumped.)
+Finalize methods should, as a rule, set to zero any pointers after
+they've been freed, and check to make sure pointers are not zero before
+freeing.  Although I'm pretty sure that finalize methods are not called
+twice on the same object (except for the @code{for_disksave} proviso),
+we've gotten nastily burned in some cases by not doing this.
+WARNING #2: The finalize method is @emph{only} called for
+lcrecords, @emph{not} for simple lrecords.  If you need a
+finalize method for simple lrecords, you have to stick
+it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
+WARNING #3: Things are in an @emph{extremely} bizarre state
+when @code{ADDITIONAL_FREE_foo()} is called, so you have to
+be incredibly careful when writing one of these functions.
+See the comment in @code{gc_sweep()}.  If you ever have to add
+one of these, consider using an lcrecord or dealing with
+the problem in a different fashion.
+@item
+An @dfn{equal} method.  This compares the two objects for similarity,
+when @code{equal} is called.  It should compare the contents of the
+objects in some reasonable fashion.  It is passed the two objects and a
+@dfn{depth} value, which is used to catch circular objects.  To compare
+sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
+by one.  If this value gets too high, a @code{circular-object} error
+will be signaled.
+If this is NULL, objects are @code{equal} only when they are @code{eq},
+i.e. identical.
+@item
+A @dfn{hash} method.  This is used to hash objects when they are to be
+compared with @code{equal}.  The rule here is that if two objects are
+@code{equal}, they @emph{must} hash to the same value; i.e. your hash
+function should use some subset of the sub-fields of the object that are
+compared in the ``equal'' method.  If you specify this method as
+@code{NULL}, the object's pointer will be used as the hash, which will
+@emph{fail} if the object has an @code{equal} method, so don't do this.
+To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
+depth by one, just like in the ``equal'' method.
+To convert a Lisp object directly into a hash value (using
+its pointer), use @code{LISP_HASH()}.  This is what happens when
+the hash method is NULL.
+To hash two or more values together into a single value, use
+@code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
+@item
+@dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
+These are used for object types that have properties.  I don't feel like
+documenting them here.  If you create one of these objects, you have to
+use different macros to define them,
+i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
+@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
+@item
+A @dfn{size_in_bytes} method, when the object is of variable size
+(i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro).  This should
+simply return the object's size in bytes, exactly as you might expect.
+For an example, see the methods for window configurations and opaques.
+@end enumerate
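The rule tying the ``equal'' and ``hash'' methods together can be illustrated with a minimal sketch in plain C.  It does not use the real lrecord machinery: @code{struct point}, @code{point_equal()}, @code{point_hash()} and @code{combine_hash2()} are hypothetical names, and the mixing formula in @code{combine_hash2()} (a stand-in for what @code{HASH2()} is for) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: two compared fields and one ignored field. */
struct point
{
  long x;
  long y;
  long cache;                   /* not part of the object's identity */
};

/* Stand-in for HASH2(): mix two hash values into one.
   The formula is illustrative only. */
unsigned long
combine_hash2 (unsigned long a, unsigned long b)
{
  return a * 31UL + b;
}

/* "equal" method: compares contents, ignoring the cache field. */
int
point_equal (const struct point *p1, const struct point *p2)
{
  return p1->x == p2->x && p1->y == p2->y;
}

/* "hash" method: uses only fields that point_equal() compares,
   so two equal objects always hash to the same value. */
unsigned long
point_hash (const struct point *p)
{
  return combine_hash2 ((unsigned long) p->x, (unsigned long) p->y);
}
```

Had @code{point_hash()} mixed in the @code{cache} field, or fallen back to the object's address, two @code{equal} points could land in different hash buckets, which is the failure the text warns about.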
+@node Low-level allocation
+@section Low-level allocation
+Memory that you want to allocate directly should be allocated using
+@code{xmalloc()} rather than @code{malloc()}.  This implements
+error-checking on the return value, and once upon a time did some more
+vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
+Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
+that @code{xmalloc()} will do a non-local exit if the memory can't be
+allocated. (Many functions, however, do not expect this, and thus XEmacs
+will likely crash if this happens.  @strong{This is a bug.}  If you can,
+you should strive to make your function handle this OK.  However, it's
+difficult in the general circumstance, perhaps requiring extra
+unwind-protects and such.)
+Note that XEmacs provides two separate replacements for the standard
+@code{malloc()} library function.  These are called @dfn{old GNU malloc}
+(@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
+respectively.  New GNU malloc is better in pretty much every way than
+old GNU malloc, and should be used if possible.  (It used to be that on
+some systems, the old one worked but the new one didn't.  I think this
+was due specifically to a bug in SunOS, which the new one now works
+around; so I don't think the old one ever has to be used any more.) The
+primary difference between both of these mallocs and the standard system
+malloc is that they are much faster, at the expense of increased space.
+The basic idea is that memory is allocated in fixed chunks of powers of
+two.  This allows for basically constant malloc time, since the various
+chunks can just be kept on a number of free lists. (The standard system
+malloc typically allocates arbitrary-sized chunks and has to spend some
+time, sometimes a significant amount of time, walking the heap looking
+for a free block to use and cleaning things up.)  The new GNU malloc
+improves on things by allocating large objects in chunks of 4096 bytes
+rather than in ever larger powers of two, which results in ever larger
+wastage.  There is a slight speed loss here, but it's of doubtful
+significance.
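The space trade-off just described can be made concrete with a small sketch.  This is not the code in @file{malloc.c} or @file{gmalloc.c}; the function names are hypothetical, and the assumption that the switch to 4096-byte chunks happens for requests above 4096 bytes is mine.

```c
#include <assert.h>
#include <stddef.h>

#define LARGE_CHUNK 4096        /* chunk size the text gives for large objects */

/* Pure power-of-two policy: round a request up to the next power of two.
   Assumes n is far below the largest size_t value. */
size_t
round_pow2 (size_t n)
{
  size_t size = 1;
  while (size < n)
    size <<= 1;
  return size;
}

/* Mixed policy: powers of two for small requests, whole multiples of
   LARGE_CHUNK for large ones.  The cutoff is an assumption. */
size_t
round_chunked (size_t n)
{
  if (n <= LARGE_CHUNK)
    return round_pow2 (n);
  return (n + LARGE_CHUNK - 1) / LARGE_CHUNK * LARGE_CHUNK;
}
```

For a 9000-byte request the pure power-of-two policy reserves 16384 bytes, while the chunked policy reserves 12288 (three chunks), so the wastage no longer doubles with each size class.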
+NOTE: Apparently there is a third-generation GNU malloc that is
+significantly better than the new GNU malloc, and should probably
+be included in XEmacs.
+There is also the relocating allocator, @file{ralloc.c}.  This actually
+moves blocks of memory around so that the @code{sbrk()} pointer can be
+shrunk and virtual memory released back to the system.  On some systems,
+this is a big win.  On all systems, it causes a noticeable (and
+sometimes huge) speed penalty, so I turn it off by default.
+@file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
+There are also two versions of @file{ralloc.c}, one of which uses @code{mmap()}
+rather than block copies to move data around.  This purports to
+be faster, although that depends on the amount of data that would
+have had to be block copied and the system-call overhead for
+@code{mmap()}.  I don't know exactly how this works, except that the
+relocating-allocation routines are pretty much used only for
+the memory allocated for a buffer, which is the biggest consumer
+of space, esp. of space that may get freed later.
+Note that the GNU mallocs have some ``memory warning'' facilities.
+XEmacs taps into them and issues a warning through the standard
+warning system, when memory gets to 75%, 85%, and 95% full.
+(On some systems, the memory warnings are not functional.)
+Allocated memory that is going to be used to make a Lisp object
+is created using @code{allocate_lisp_storage()}.  This calls @code{xmalloc()}
+but also verifies that the pointer to the memory can fit into
+a Lisp word (remember that some bits are taken away for a type
+tag and a mark bit).  If not, an error is issued through @code{memory_full()}.
+@code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
+@code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
+routines.  These routines also call @code{INCREMENT_CONS_COUNTER()} at the
+appropriate times; this keeps statistics on how much memory is
+allocated, so that garbage-collection can be invoked when the
+threshold is reached.
+@node Pure Space
+@section Pure Space
+Not yet documented.
+@node Cons
+@section Cons
+Conses are allocated in standard frob blocks.  The only thing to
+note is that conses can be explicitly freed using @code{free_cons()}
+and associated functions @code{free_list()} and @code{free_alist()}.  This
+immediately puts the conses onto the cons free list, and decrements
+the statistics on memory allocation appropriately.  This is used
+to good effect by some extremely commonly-used code, to avoid
+generating extra objects and thereby triggering GC sooner.
+However, you have to be @emph{extremely} careful when doing this.
+If you mess this up, you will get BADLY BURNED, and it has happened
+before.
+@node Vector
+@section Vector
+As mentioned above, each vector is @code{malloc()}ed individually, and
+all are threaded through the variable @code{all_vectors}.  Vectors are
+marked strangely during garbage collection, by kludging the size field.
+Note that the @code{struct Lisp_Vector} is declared with its
+@code{contents} field being a @emph{stretchy} array of one element.  It
+is actually @code{malloc()}ed with the right size, however, and access
+to any element through the @code{contents} array works fine.
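The ``stretchy array'' idiom can be sketched as follows.  @code{struct mini_vector} and @code{make_mini_vector()} are hypothetical miniatures, not the real @code{struct Lisp_Vector} or its allocator.  The idiom relies on compilers treating a trailing one-element array leniently; C99 flexible array members are the standardized form of the same idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the layout described above. */
struct mini_vector
{
  long size;
  long contents[1];             /* really `size' elements long */
};

/* Allocate a vector with SIZE elements, each set to INIT.
   Returns NULL if malloc() fails. */
struct mini_vector *
make_mini_vector (long size, long init)
{
  /* The header up to `contents', plus room for every element. */
  size_t bytes = offsetof (struct mini_vector, contents)
    + (size_t) size * sizeof (long);
  struct mini_vector *v;
  long i;

  assert (size >= 1);           /* never smaller than the declared struct */
  v = malloc (bytes);
  if (v == NULL)
    return NULL;
  v->size = size;
  for (i = 0; i < size; i++)
    v->contents[i] = init;
  return v;
}
```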
+@node Bit Vector
+@section Bit Vector
+Bit vectors work exactly like vectors, except for more complicated
+code to access an individual bit, and except for the fact that bit
+vectors are lrecords while vectors are not. (The only difference here is
+that there's an lrecord implementation pointer at the beginning and the
+tag field in bit vector Lisp words is ``lrecord'' rather than
+``vector''.)
+@node Symbol
+@section Symbol
+Symbols are also allocated in frob blocks.  Note that the code
+exists for symbols to be either lrecords (category (c) above)
+or simple types (category (b) above), and are lrecords by
+default (I think), although there is no good reason for this.
+Note that symbols in the awful horrible obarray structure are
+chained through their @code{next} field.
+Remember that @code{intern} looks up a symbol in an obarray, creating
+one if necessary.
+@node Marker
+@section Marker
+Markers are allocated in frob blocks, as usual.  They are kept
+in a buffer unordered, but in a doubly-linked list so that they
+can easily be removed. (Formerly this was a singly-linked list,
+but in some cases garbage collection took an extraordinarily
+long time due to the O(N^2) time required to remove lots of
+markers from a buffer.) Markers are removed from a buffer in
+the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
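A sketch of why the doubly-linked list helps: with a @code{prev} pointer, removing a marker needs no walk from the head of the list, so removing N markers costs O(N) rather than O(N^2).  The structure and function names below are hypothetical, not the ones used in the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

struct mini_marker
{
  struct mini_marker *prev;
  struct mini_marker *next;
  int pos;
};

struct mini_buffer
{
  struct mini_marker *markers;  /* head of an unordered list */
};

/* Push M onto the front of B's marker list. */
void
attach_marker (struct mini_buffer *b, struct mini_marker *m)
{
  m->prev = NULL;
  m->next = b->markers;
  if (b->markers != NULL)
    b->markers->prev = m;
  b->markers = m;
}

/* Unlink M from B in constant time, using its own prev/next links. */
void
detach_marker (struct mini_buffer *b, struct mini_marker *m)
{
  if (m->prev != NULL)
    m->prev->next = m->next;
  else
    b->markers = m->next;       /* M was the head */
  if (m->next != NULL)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```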
+@node String
+@section String
+As mentioned above, strings are a special case.  A string is logically
+two parts, a fixed-size object (containing the length, property list,
+and a pointer to the actual data), and the actual data in the string.
+The fixed-size object is a @code{struct Lisp_String} and is allocated in
+frob blocks, as usual.  The actual data is stored in special
+@dfn{string-chars blocks}, which are 8K blocks of memory.
+Currently-allocated strings are simply laid end to end in these
+string-chars blocks, with a pointer back to the @code{struct Lisp_String}
+stored before each string in the string-chars block.  When a new string
+needs to be allocated, the remaining space at the end of the last
+string-chars block is used if there's enough, and a new string-chars
+block is created otherwise.
+There are never any holes in the string-chars blocks due to the string
+compaction and relocation that happens at the end of garbage collection.
+During the sweep stage of garbage collection, when objects are
+reclaimed, the garbage collector goes through all string-chars blocks,
+looking for unused strings.  Each chunk of string data is preceded by a
+pointer to the corresponding @code{struct Lisp_String}, which indicates
+both whether the string is used and how big the string is, i.e. how to
+get to the next chunk of string data.  Holes are compressed by
+block-copying the next string into the empty space and relocating the
+pointer stored in the corresponding @code{struct Lisp_String}.
+@strong{This means you have to be careful with strings in your code.}
+See the section above on @code{GCPRO}ing.
+Note that there is one situation not handled: a string that is too big
+to fit into a string-chars block.  Such strings, called @dfn{big
+strings}, are all @code{malloc()}ed as their own block. (#### Although it
+would make more sense for the threshold for big strings to be somewhat
+lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
+this was indeed the case formerly -- indeed, the threshold was set at
+1/8 -- but Mly forgot about this when rewriting things for 19.8.)
+Note also that the string data in string-chars blocks is padded as
+necessary so that proper alignment constraints on the @code{struct
+Lisp_String} back pointers are maintained.
+Finally, strings can be resized.  This happens in Mule when a
+character is substituted with a different-length character, or during
+modeline frobbing. (You could also export this to Lisp, but it's not
+done so currently.) Resizing a string is a potentially tricky process.
+If the change is small enough that the padding can absorb it, nothing
+other than a simple memory move needs to be done.  Keep in mind,
+however, that the string can't shrink too much because the offset to the
+next string in the string-chars block is computed by looking at the
+length and rounding to the nearest multiple of four or eight.  If the
+string would shrink or expand beyond the correct padding, new string
+data needs to be allocated at the end of the last string-chars block and
+the data moved appropriately.  This leaves some dead string data, which
+is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
+Lisp_String} pointer before the data (there's no real @code{struct
+Lisp_String} to point to and relocate), and storing the size of the dead
+string data (which would normally be obtained from the now-non-existent
+@code{struct Lisp_String}) at the beginning of the dead string data gap.
+The string compactor recognizes this special 0xFFFFFFFF marker and
+handles it correctly.
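The compaction pass over a string-chars block can be sketched in a few lines.  This is a simplified model under stated assumptions, not the code in @file{alloc.c}: the names are hypothetical, an @code{in_use} flag stands in for the mark bit, padding is always to a multiple of eight, and big strings and the 0xFFFFFFFF dead-data marker are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size part of a string. */
struct mini_string
{
  size_t length;
  char *data;                   /* points into the block */
  int in_use;                   /* stand-in for the GC mark */
};

/* Each chunk in the block starts with a back pointer to its owner. */
struct chunk_header
{
  struct mini_string *owner;
};

/* Bytes taken by a chunk holding LEN characters, padded to 8. */
#define CHUNK_SIZE(len) \
  (sizeof (struct chunk_header) + (((len) + 7) & ~(size_t) 7))

/* Lay TEXT down at offset USED; the caller guarantees there is room.
   Returns the new end-of-data offset. */
size_t
append_string (char *block, size_t used, struct mini_string *s,
               const char *text)
{
  struct chunk_header h;

  h.owner = s;
  s->length = strlen (text);
  s->in_use = 1;
  memcpy (block + used, &h, sizeof h);
  s->data = block + used + sizeof h;
  memcpy (s->data, text, s->length);
  return used + CHUNK_SIZE (s->length);
}

/* Slide live chunks down over dead ones and fix up each owner's data
   pointer.  Returns the new end-of-data offset. */
size_t
compact_block (char *block, size_t used)
{
  size_t from = 0, to = 0;

  while (from < used)
    {
      struct chunk_header h;
      size_t size;

      memcpy (&h, block + from, sizeof h);
      size = CHUNK_SIZE (h.owner->length);  /* owner tells us the size */
      if (h.owner->in_use)
        {
          if (to != from)
            memmove (block + to, block + from, size);
          h.owner->data = block + to + sizeof h;
          to += size;
        }
      from += size;
    }
  return to;
}
```

Because @code{compact_block()} rewrites @code{data} pointers, any raw pointer into string data that was taken before a collection may be stale afterwards, which is the reason for the warning above.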
+@node Compiled Function
+@section Compiled Function
+Not yet documented.
+@node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
+@chapter Events and the Event Loop
+@menu
+* Introduction to Events::
+* Main Loop::
+* Specifics of the Event Gathering Mechanism::
+* Specifics About the Emacs Event::
+* The Event Stream Callback Routines::
+* Other Event Loop Functions::
+* Converting Events::
+* Dispatching Events; The Command Builder::
+@end menu
+@node Introduction to Events
+@section Introduction to Events
+An event is an object that encapsulates information about an
+interesting occurrence in the operating system.  Events are
+generated either by user action, direct (e.g. typing on the
+keyboard or moving the mouse) or indirect (moving another
+window, thereby generating an expose event on an Emacs frame),
+or as a result of some other typically asynchronous action happening,
+such as output from a subprocess being ready or a timer expiring.
+Events come into the system in an asynchronous fashion (typically
+through a callback being called) and are converted into a
+synchronous event queue (first-in, first-out) in a process that
+we will call @dfn{collection}.
+Note that each application has its own event queue. (It is
+immaterial whether the collection process directly puts the
+events in the proper application's queue, or puts them into
+a single system queue, which is later split up.)
+The most basic level of event collection is done by the
+operating system or window system.  Typically, XEmacs does
+its own event collection as well.  Often there are multiple
+layers of collection in XEmacs, with events from various
+sources being collected into a queue, which is then combined
+with other sources to go into another queue (i.e. a second
+level of collection), with perhaps another level on top of
+this, etc.
+XEmacs has its own types of events (called @dfn{Emacs events}),
+which provide an abstract layer on top of the system-dependent
+nature of the most basic events that are received.  Part of the
+complex nature of the XEmacs event collection process involves
+converting from the operating-system events into the proper
+Emacs events -- there may not be a one-to-one correspondence.
+Emacs events are documented in @file{events.h}; I'll discuss them
+later.
+@node Main Loop
+@section Main Loop
+The @dfn{command loop} is the top-level loop that the editor is always
+running.  It loops endlessly, calling @code{next-event} to retrieve an
+event and @code{dispatch-event} to execute it. @code{dispatch-event} does
+the appropriate thing with non-user events (process, timeout,
+magic, eval, mouse motion); this involves calling a Lisp handler
+function, redrawing a newly-exposed part of a frame, reading
+subprocess output, etc.  For user events, @code{dispatch-event}
+looks up the event in relevant keymaps or menubars; when a
+full key sequence or menubar selection is reached, the appropriate
+function is executed. @code{dispatch-event} may have to keep state
+across calls; this is done in the ``command-builder'' structure
+associated with each console (remember, there's usually only
+one console), and the engine that looks up keystrokes and
+constructs full key sequences is called the @dfn{command builder}.
+This is documented elsewhere.
+The guts of the command loop are in @code{command_loop_1()}.  This
+function doesn't catch errors, though -- that's the job of
+@code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
+wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
+returns, but may get thrown out of.
+When an error occurs, @code{cmd_error()} is called, which usually
+invokes the Lisp error handler in @code{command-error}; however, a
+default error handler is provided if @code{command-error} is @code{nil}
+(e.g. during startup).  The purpose of the error handler is simply to
+display the error message and do associated cleanup; it does not need to
+throw anywhere.  When the error handler finishes, the condition-case in
+@code{command_loop_2()} will finish and @code{command_loop_2()} will
+reinvoke @code{command_loop_1()}.
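The relationship between the two functions can be sketched with @code{setjmp()} and @code{longjmp()} standing in for the Lisp condition-case machinery.  Everything here is a hypothetical miniature: commands are plain integers (positive succeeds, negative signals an error, zero ends the sketch so that it can terminate, unlike the real loop).

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf error_handler;   /* where a signaled error lands */
static int next_command;        /* index of the next command to run */
int commands_run;
int errors_handled;

/* Stand-in for signaling a Lisp error: unwind to the handler. */
void
signal_error_sketch (void)
{
  longjmp (error_handler, 1);
}

/* Inner loop: runs commands and makes no attempt to catch errors. */
void
command_loop_1_sketch (const int *commands)
{
  for (;;)
    {
      int cmd = commands[next_command++];
      if (cmd == 0)
        return;                 /* sketch only: the real loop never returns */
      if (cmd < 0)
        signal_error_sketch ();
      commands_run++;
    }
}

/* Outer wrapper: traps the error, "handles" it by counting it, and
   then re-enters the inner loop. */
void
command_loop_2_sketch (const int *commands)
{
  next_command = 0;
  if (setjmp (error_handler) != 0)
    errors_handled++;
  command_loop_1_sketch (commands);
}
```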
+@code{command_loop_2()} is invoked from three places: from
+@code{initial_command_loop()} (called from @code{main()} at the end of
+internal initialization), from the Lisp function @code{recursive-edit},
+and from @code{call_command_loop()}.
+@code{call_command_loop()} is called when a macro is started and when
+the minibuffer is entered; normal termination of the macro or minibuffer
+causes a throw out of the recursive command loop. (The throw is to
+@code{execute-kbd-macro} for macros and to @code{exit} for minibuffers.
+Note also that the low-level minibuffer-entering function,
+@code{read-minibuffer-internal}, provides its own error handling and
+does not need @code{command_loop_2()}'s error encapsulation; so it tells
+@code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
+Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a
+catch for @code{exit}; this is why @code{abort-recursive-edit}, which
+throws to this catch, exits out of either one.
+@code{initial_command_loop()}, called from @code{main()}, sets up a
+catch for @code{top-level} when invoking @code{command_loop_2()},
+allowing functions to throw all the way to the top level if they really
+need to.  Before invoking @code{command_loop_2()},
+@code{initial_command_loop()} calls @code{top_level_1()}, which handles
+all of the startup stuff (creating the initial frame, handling the
+command-line options, loading the user's @file{.emacs} file, etc.).  The
+function that actually does this is in Lisp and is pointed to by the
+variable @code{top-level}; normally this function is
+@code{normal-top-level}.  @code{top_level_1()} is just an error-handling
+wrapper similar to @code{command_loop_2()}.  Note also that
+@code{initial_command_loop()} sets up a catch for @code{top-level} when
+invoking @code{top_level_1()}, just like when it invokes
+@code{command_loop_2()}.
+@node Specifics of the Event Gathering Mechanism
+@section Specifics of the Event Gathering Mechanism
+Here is an approximate diagram of the collection processes
+at work in XEmacs, under TTY's (TTY's are simpler than X
+so we'll look at this first):
+@noindent
+@example
+ asynch.      asynch.    asynch.   asynch.             [Collectors in
+kbd events  kbd events   process   process                the OS]
+      |         |         output    output
+      |         |           |         |
+      |         |           |         |      SIGINT,   [signal handlers
+      |         |           |         |      SIGQUIT,     in XEmacs]
+      V         V           V         V      SIGWINCH,
+     file      file        file      file    SIGALRM
+     desc.     desc.       desc.     desc.     |
+     (TTY)     (TTY)       (pipe)    (pipe)    |
+      |          |          |         |      fake    timeouts
+      |          |          |         |      file        |
+      |          |          |         |      desc.       |
+      |          |          |         |      (pipe)      |
+      |          |          |         |        |         |
+      |          |          |         |        |         |
+      |          |          |         |        |         |
+      V          V          V         V        V         V
+      ------>-----------<----------------<----------------
+                  |
+                  |
+                  | [collected using select() in emacs_tty_next_event()
+                  |  and converted to the appropriate Emacs event]
+                  |
+                  |
+                  V          (above this line is TTY-specific)
+                Emacs -----------------------------------------------
+                event (below this line is the generic event mechanism)
+                  |
+                  |
+was there     if not, call
+a SIGINT?  emacs_tty_next_event()
+    |             |
+    |             |
+    |             |
+    V             V
+    --->------<----
+           |
+           |     [collected in event_stream_next_event();
+           |      SIGINT is converted using maybe_read_quit_event()]
+           V
+         Emacs
+         event
+           |
+           \---->------>----- maybe_kbd_translate() ---->---\
+                                                            |
+                                                            |
+                                                            |
+     command event queue                                    |
+                                               if not from command
+  (contains events that were                   event queue, call
+  read earlier but not processed,              event_stream_next_event()
+  typically when waiting in a                               |
+  sit-for, sleep-for, etc. for                              |
+ a particular event to be received)                         |
+               |                                            |
+               |                                            |
+               V                                            V
+               ---->------------------------------------<----
+                                               |
+                                               | [collected in
+                                               |  next_event_internal()]
+                                               |
+ unread-     unread-       event from          |
+ command-    command-       keyboard       else, call
+ events      event           macro      next_event_internal()
+   |           |               |               |
+   |           |               |               |
+   |           |               |               |
+   V           V               V               V
+   --------->----------------------<------------
+                     |
+                     |      [collected in `next-event', which may loop
+                     |       more than once if the event it gets is on
+                     |       a dead frame, device, etc.]
+                     |
+                     |
+                     V
+            feed into top-level event loop,
+            which repeatedly calls `next-event'
+            and then dispatches the event
+            using `dispatch-event'
+@end example
+Notice the separation between TTY-specific and generic event mechanism.
+When using the Xt-based event loop, the TTY-specific stuff is replaced
+but the rest stays the same.
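The top of the TTY diagram, where several descriptors and a ``fake'' pipe for signals are collected with @code{select()}, can be sketched as below.  This is a POSIX-only illustration with hypothetical function names; it is not the code of @code{emacs_tty_next_event()}, and the way a signal handler notifies the loop (writing one byte to a pipe) is an assumption about the general technique, not a quotation of the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Return the index of the first readable descriptor in FDS, or -1 if
   none becomes readable within TIMEOUT_USEC microseconds. */
int
first_readable (const int *fds, int nfds, long timeout_usec)
{
  fd_set readable;
  struct timeval tv;
  int i, maxfd = -1;

  FD_ZERO (&readable);
  for (i = 0; i < nfds; i++)
    {
      FD_SET (fds[i], &readable);
      if (fds[i] > maxfd)
        maxfd = fds[i];
    }
  tv.tv_sec = timeout_usec / 1000000;
  tv.tv_usec = timeout_usec % 1000000;
  if (select (maxfd + 1, &readable, NULL, NULL, &tv) <= 0)
    return -1;                  /* timeout or error */
  for (i = 0; i < nfds; i++)
    if (FD_ISSET (fds[i], &readable))
      return i;
  return -1;
}

/* What a signal handler could do: make the "fake" descriptor readable
   so that the select() above wakes up.  Returns 1 on success. */
int
note_signal (int write_end)
{
  char byte = 1;
  return write (write_end, &byte, 1) == 1;
}
```

Funneling signals through a descriptor means one blocking call can wait for keyboard input, process output and signals at once, which is what lets a single system-specific loop receive every kind of event.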
+It's also important to realize that only one kind of
+system-specific event loop can be operating at a time, and it must be able
+to receive all kinds of events simultaneously.  For the two existing
+event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
+respectively), the TTY event loop @emph{only} handles TTY consoles,
+while the Xt event loop handles @emph{both} TTY and X consoles.  This
+situation is different from all of the output handlers, where you simply
+have one per console type.
+Here's the Xt Event Loop Diagram (notice that below a certain point,
+it's the same as the above diagram):
+@example
+asynch. asynch. asynch. asynch.                 [Collectors in
+ kbd     kbd    process process                    the OS]
+events  events  output  output
+  |       |       |       |
+  |       |       |       |     asynch. asynch. [Collectors in the
+  |       |       |       |       X        X     OS and X Window System]
+  |       |       |       |     events  events
+  |       |       |       |       |        |
+  |       |       |       |       |        |
+  |       |       |       |       |        |    SIGINT, [signal handlers
+  |       |       |       |       |        |    SIGQUIT,   in XEmacs]
+  |       |       |       |       |        |    SIGWINCH,
+  |       |       |       |       |        |    SIGALRM
+  |       |       |       |       |        |       |
+  |       |       |       |       |        |       |
+  |       |       |       |       |        |       |      timeouts
+  |       |       |       |       |        |       |          |
+  |       |       |       |       |        |       |          |
+  |       |       |       |       |        |       V          |
+  V       V       V       V       V        V      fake        |
+ file    file    file    file    file     file    file        |
+ desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
+ (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
+  |       |       |       |       |        |       |          |
+  |       |       |       |       |        |       |          |
+  |       |       |       |       |        |       |          |
+  V       V       V       V       V        V       V          V
+  --->----------------------------------------<---------<------
+       |              |               |
+       |              |               |[collected using select() in
+       |              |               | _XtWaitForSomething(), called
+       |              |               | from XtAppProcessEvent(), called
+       |              |               | in emacs_Xt_next_event();
+       |              |               | dispatched to various callbacks]
+       |              |               |
+       |              |               |
+  emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
+  event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
+       |           x_u_h_s_callback(),|  callback]
+       |           search_callback()  | [x_update_horizontal_scrollbar_
+       |              |               |  callback]
+       |              |               |
+       |              |               |
+  enqueue_Xt_       signal_special_   |
+  dispatch_event()  Xt_user_event()   |
+  [maybe multiple     |               |
+   times, maybe 0     |               |
+   times]             |               |
+       |            enqueue_Xt_       |
+       |            dispatch_event()  |
+       |              |               |
+       |              |               |
+       V              V               |
+       -->----------<--               |
+              |                       |
+              |                       |
+           dispatch             Xt_what_callback()
+           event                  sets flags
+           queue                      |
+              |                       |
+              |                       |
+              |                       |
+              |                       |
+              ---->-----------<--------
+                   |
+                   |
+                   |     [collected and converted as appropriate in
+                   |            emacs_Xt_next_event()]
+                   |
+                   |
+                   V          (above this line is Xt-specific)
+                 Emacs ------------------------------------------------
+                 event (below this line is the generic event mechanism)
+                   |
+                   |
+was there      if not, call
+a SIGINT?   emacs_Xt_next_event()
+    |              |
+    |              |
+    |              |
+    V              V
+    --->-------<----
+           |
+           |        [collected in event_stream_next_event();
+           |         SIGINT is converted using maybe_read_quit_event()]
+           V
+         Emacs
+         event
+           |
+           \---->------>----- maybe_kbd_translate() -->-----\
+                                                            |
+                                                            |
+                                                            |
+     command event queue                                    |
+                                              if not from command
+  (contains events that were                  event queue, call
+  read earlier but not processed,             event_stream_next_event()
+  typically when waiting in a                               |
+  sit-for, sleep-for, etc. for                              |
+ a particular event to be received)                         |
+               |                                            |
+               |                                            |
+               V                                            V
+               ---->----------------------------------<------
+                                               |
+                                               | [collected in
+                                               |  next_event_internal()]
+                                               |
+ unread-     unread-       event from          |
+ command-    command-       keyboard       else, call
+ events      event           macro      next_event_internal()
+   |           |               |               |
+   |           |               |               |
+   |           |               |               |
+   V           V               V               V
+   --------->----------------------<------------
+                     |
+                     |      [collected in `next-event', which may loop
+                     |       more than once if the event it gets is on
+                     |       a dead frame, device, etc.]
+                     |
+                     |
+                     V
+            feed into top-level event loop,
+            which repeatedly calls `next-event'
+            and then dispatches the event
+            using `dispatch-event'
+@end example
+@node Specifics About the Emacs Event
+@section Specifics About the Emacs Event
+@node The Event Stream Callback Routines
+@section The Event Stream Callback Routines
+@node Other Event Loop Functions
+@section Other Event Loop Functions
+@code{detect_input_pending()} and @code{input-pending-p} look for
+input by calling @code{event_stream->event_pending_p} and looking in
+@code{[V]unread-command-event} and the @code{command_event_queue} (they
+do not check for an executing keyboard macro, though).
+@code{discard-input} cancels any command events pending (and any
+keyboard macros currently executing), and puts the others onto the
+@code{command_event_queue}.  There is a comment about a ``race
+condition'', which is not a good sign.
+@code{next-command-event} and @code{read-char} are higher-level
+interfaces to @code{next-event}.  @code{next-command-event} gets the
+next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
+or scrollbar action), calling @code{dispatch-event} on any others.
+@code{read-char} calls @code{next-command-event} and uses
+@code{event_to_character()} to return the character equivalent.  With
+the right kind of input method support, it is possible for @code{(read-char)}
+to return a Kanji character.
+@node Converting Events
+@section Converting Events
+@code{character_to_event()}, @code{event_to_character()},
+@code{event-to-character}, and @code{character-to-event} convert between
+characters and keypress events corresponding to the characters.  If the
+event was not a keypress, @code{event_to_character()} returns -1 and
+@code{event-to-character} returns @code{nil}.  These functions convert
+between character representation and the split-up event representation
+(keysym plus mod keys).
+@node Dispatching Events; The Command Builder
+@section Dispatching Events; The Command Builder
+Not yet documented.
+@node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
+@chapter Evaluation; Stack Frames; Bindings
+@menu
+* Evaluation::
+* Dynamic Binding; The specbinding Stack; Unwind-Protects::
+* Simple Special Forms::
+* Catch and Throw::
+@end menu
+@node Evaluation
+@section Evaluation
+@code{Feval()} evaluates the form (a Lisp object) that is passed to
+it.  Note that evaluation is only non-trivial for two types of objects:
+symbols and conses.  A symbol is evaluated simply by calling
+@code{symbol-value} on it and returning the value.
+Evaluating a cons means calling a function.  First, @code{eval} checks
+to see if garbage-collection is necessary, and calls
+@code{garbage_collect_1()} if so.  It then increases the evaluation
+depth by 1 (@code{lisp_eval_depth}, which is always less than
+@code{max_lisp_eval_depth}) and adds an element to the linked list of
+@code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
+contains a pointer to the function being called plus a list of the
+function's arguments.  Originally these values are stored unevalled, and
+as they are evaluated, the backtrace structure is updated.  Garbage
+collection pays attention to the objects pointed to in the backtrace
+structures (garbage collection might happen while a function is being
+called or while an argument is being evaluated, and there could easily
+be no other references to the arguments in the argument list; once an
+argument is evaluated, however, the unevalled version is not needed by
+eval, and so the backtrace structure is changed).
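The bookkeeping described above (a depth counter plus a linked list of backtrace records living on the C stack) can be sketched like this.  All names are hypothetical miniatures of @code{lisp_eval_depth}, @code{max_lisp_eval_depth} and @code{backtrace_list}; the limit of 200 is an arbitrary value chosen for the sketch, and a real overflow signals a Lisp error rather than returning -1.

```c
#include <assert.h>
#include <stddef.h>

struct mini_backtrace
{
  struct mini_backtrace *next;  /* the caller's record */
  const char *function;
};

struct mini_backtrace *mini_backtrace_list;
int mini_eval_depth;
int mini_max_eval_depth = 200;

/* Count the records currently on the list. */
int
count_frames (void)
{
  int n = 0;
  struct mini_backtrace *b;
  for (b = mini_backtrace_list; b != NULL; b = b->next)
    n++;
  return n;
}

/* "Evaluate" LEVELS nested calls.  Returns the number of records on
   the list at the deepest point, or -1 if the depth limit was hit. */
int
eval_nested_sketch (int levels)
{
  struct mini_backtrace frame;  /* lives on the C stack */
  int deepest;

  if (mini_eval_depth + 1 >= mini_max_eval_depth)
    return -1;
  mini_eval_depth++;
  frame.function = "nested";
  frame.next = mini_backtrace_list;
  mini_backtrace_list = &frame;

  deepest = count_frames ();
  if (levels > 1)
    deepest = eval_nested_sketch (levels - 1);

  /* On exit, undo both steps in reverse order. */
  mini_backtrace_list = frame.next;
  mini_eval_depth--;
  return deepest;
}
```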
+At this point, the function to be called is determined by looking at
+the car of the cons (if this is a symbol, its function definition is
+retrieved and the process repeated).  The function should then consist
+of either a @code{Lisp_Subr} (built-in function written in C), a
+@code{Lisp_Compiled_Function} object, or a cons whose car is one of the
+symbols @code{autoload}, @code{macro} or @code{lambda}.
+If the function is a @code{Lisp_Subr}, the lisp object points to a
+@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
+pointer to the C function, a minimum and maximum number of arguments
+(or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
+pointer to the symbol referring to that subr, and a couple of other
+things.  If the subr wants its arguments @code{UNEVALLED}, they are
+passed raw as a list.  Otherwise, an array of evaluated arguments is
+created and put into the backtrace structure, and either passed whole
+(@code{MANY}) or each argument is passed as a C argument.
+If the function is a @code{Lisp_Compiled_Function},
+@code{funcall_compiled_function()} is called.  If the function is a
+lambda list, @code{funcall_lambda()} is called.  If the function is a
+macro, [..... fill in] is done.  If the function is an autoload,
+@code{do_autoload()} is called to load the definition and then eval
+starts over [explain this more].
+When @code{Feval()} exits, the evaluation depth is reduced by one, the
+debugger is called if appropriate, and the current backtrace structure
+is removed from the list.
+Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
+to go through the list of formal parameters to the function and bind
+them to the actual arguments, checking for @code{&rest} and
+@code{&optional} symbols in the formal parameters and making sure the
+number of actual arguments is correct.
+@code{funcall_compiled_function()} can do this a little more
+efficiently, since the formal parameter list can be checked for sanity
+when the compiled function object is created.
+@code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
+in the lambda list.
+@code{funcall_compiled_function()} calls the real byte-code interpreter
+@code{execute_optimized_program()} on the byte-code instructions, which
+are converted into an internal form for faster execution.
+When a compiled function is executed for the first time by
+@code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed
+during the dump phase of building XEmacs, the byte-code instructions are
+converted from a @code{Lisp_String} (which is inefficient to access,
+especially in the presence of MULE) into a @code{Lisp_Opaque} object
+containing an array of unsigned char, which can be directly executed by
+the byte-code interpreter.  At this time the byte code is also analyzed
+for validity and transformed into a more optimized form, so that
+@code{execute_optimized_program()} can really fly.
+Here are some of the optimizations performed by the internal byte-code
+transformer:
+@enumerate
+@item
+References to the @code{constants} array are checked for out-of-range
+indices, so that the byte interpreter doesn't have to.
+@item
+References to the @code{constants} array that will be used as a Lisp
+variable are checked for being correct non-constant (i.e. not @code{t},
+@code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
+doesn't have to.
+@item
+The maximum number of variable bindings in the byte-code is
+pre-computed, so that space on the @code{specpdl} stack can be
+pre-reserved once for the whole function execution.
+@item
+All byte-code jumps are relative to the current program counter instead
+of the start of the program, thereby saving a register.
+@item
+One-byte relative jumps are converted from the byte-code form of unsigned
+chars offset by 127 to machine-friendly signed chars.
+@end enumerate
+Of course, this transformation of the @code{instructions} should not be
+visible to the user, so @code{Fcompiled_function_instructions()} needs
+to know how to convert the optimized opaque object back into a Lisp
+string that is identical to the original string from the @file{.elc}
+file.  (Actually, the resulting string may (rarely) contain slightly
+different, yet equivalent, byte code.)
+@code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
+x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
+x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
+the evaluation, however, and is very similar to @code{Feval()}.
+From the performance point of view, it is worth knowing that most of the
+time in Lisp evaluation is spent executing @code{Lisp_Subr} and
+@code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
+@code{Feval()}).
+@code{Fapply()} implements Lisp @code{apply}, which is very similar to
+@code{funcall} except that if the last argument is a list, the result is the
+same as if each of the arguments in the list had been passed separately.
+@code{Fapply()} does some business to expand the last argument if it's a
+list, then calls @code{Ffuncall()} to do the work.
+@code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
+@code{call3()} call a function, passing it the argument(s) given (the
+arguments are given as separate C arguments rather than being passed as
+an array).  @code{apply1()} uses @code{Fapply()} while the others use
+@code{Ffuncall()} to do the real work.
+@node Dynamic Binding; The specbinding Stack; Unwind-Protects
+@section Dynamic Binding; The specbinding Stack; Unwind-Protects
+@example
+struct specbinding
+@{
+Lisp_Object symbol;
+Lisp_Object old_value;
+Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
+@};
+@end example
+@code{struct specbinding} is used for local-variable bindings and
+unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}s,
+@code{specpdl_ptr} points to the beginning of the free bindings in the
+array, @code{specpdl_size} specifies the total number of binding slots
+in the array, and @code{max_specpdl_size} specifies the maximum number
+of bindings the array can be expanded to hold.  @code{grow_specpdl()}
+increases the size of the @code{specpdl} array, multiplying its size by
+2 but never exceeding @code{max_specpdl_size} (except that if this
+number is less than 400, it is first set to 400).
+@code{specbind()} binds a symbol to a value and is used for local
+variables and @code{let} forms.  The symbol and its old value (which
+might be @code{Qunbound}, indicating no prior value) are recorded in the
+specpdl array, and @code{specpdl_ptr} is advanced by one slot.
+@code{record_unwind_protect()} implements an @dfn{unwind-protect},
+which, when placed around a section of code, ensures that some specified
+cleanup routine will be executed even if the code exits abnormally
+(e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
+simply adds a new specbinding to the @code{specpdl} array and stores the
+appropriate information in it.  The cleanup routine can either be a C
+function, which is stored in the @code{func} field, or a @code{progn}
+form, which is stored in the @code{old_value} field.
+@code{unbind_to()} removes specbindings from the @code{specpdl} array
+until the specified position is reached.  Each specbinding can be one of
+three types:
+@enumerate
+@item
+an unwind-protect with a C cleanup function (@code{func} is not 0, and
+@code{old_value} holds an argument to be passed to the function);
+@item
+an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
+is @code{nil}, and @code{old_value} holds the form to be executed with
+@code{Fprogn()}); or
+@item
+a local-variable binding (@code{func} is 0, @code{symbol} is not
+@code{nil}, and @code{old_value} holds the old value, which is stored as
+the symbol's value).
+@end enumerate
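+The way @code{specbind()}, @code{record_unwind_protect()} and
+@code{unbind_to()} cooperate can be illustrated with a small sketch.
+This is not the XEmacs implementation: every @code{toy_} name, the
+fixed-size array, and the use of @code{long} in place of
+@code{Lisp_Object} are assumptions made for illustration, and the
+Lisp-form case of an unwind-protect is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (assumptions): a "symbol" is a struct with one value
   slot, and a Lisp value is a plain long. */
typedef struct toy_symbol { const char *name; long value; } toy_symbol;

typedef struct toy_specbinding
{
  toy_symbol *symbol;      /* NULL plays the role of nil */
  long old_value;          /* saved value, or the cleanup argument */
  void (*func) (long);     /* non-NULL: unwind-protect with a C function */
} toy_specbinding;

enum { TOY_SPECPDL_SIZE = 16 };
static toy_specbinding toy_specpdl[TOY_SPECPDL_SIZE];
static int toy_specpdl_depth;    /* index of the first free slot */

/* Bind SYM to NEW_VALUE, recording the old value so it can be restored. */
static void
toy_specbind (toy_symbol *sym, long new_value)
{
  toy_specbinding *b;
  assert (toy_specpdl_depth < TOY_SPECPDL_SIZE);
  b = &toy_specpdl[toy_specpdl_depth++];
  b->symbol = sym;
  b->old_value = sym->value;
  b->func = NULL;
  sym->value = new_value;
}

/* Record a C cleanup function to be called with ARG on unwinding. */
static void
toy_record_unwind_protect (void (*func) (long), long arg)
{
  toy_specbinding *b;
  assert (func != NULL);
  assert (toy_specpdl_depth < TOY_SPECPDL_SIZE);
  b = &toy_specpdl[toy_specpdl_depth++];
  b->symbol = NULL;
  b->old_value = arg;
  b->func = func;
}

/* Pop entries, newest first, until the stack is back at DEPTH. */
static void
toy_unbind_to (int depth)
{
  while (toy_specpdl_depth > depth)
    {
      toy_specbinding *b = &toy_specpdl[--toy_specpdl_depth];
      if (b->func != NULL)
        b->func (b->old_value);            /* C unwind-protect */
      else if (b->symbol != NULL)
        b->symbol->value = b->old_value;   /* local-variable binding */
      /* The Lisp-form case (func 0, symbol nil) is omitted here. */
    }
}

/* Sample cleanup routine used to observe that unwinding happened. */
static long toy_cleanup_total;
static void toy_add_to_total (long n) { toy_cleanup_total += n; }
```

+A caller saves the current depth, makes its bindings, and later passes
+the saved depth to @code{toy_unbind_to()}; bindings and cleanups are
+undone in the reverse of the order in which they were made.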
+@node Simple Special Forms
+@section Simple Special Forms
+@code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
+@code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
+@code{let*}, @code{let}, @code{while}
+All of these are very simple and work as expected, calling
+@code{Feval()} or @code{Fprogn()} as necessary and (in the case of
+@code{let} and @code{let*}) using @code{specbind()} to create bindings
+and @code{unbind_to()} to undo the bindings when finished.
+Note that, with the exception of @code{Fprogn}, these functions are
+typically called in real life only in interpreted code, since the byte
+compiler knows how to convert calls to these functions directly into
+byte code.
+@node Catch and Throw
+@section Catch and Throw
+@example
+struct catchtag
+@{
+Lisp_Object tag;
+Lisp_Object val;
+struct catchtag *next;
+struct gcpro *gcpro;
+jmp_buf jmp;
+struct backtrace *backlist;
+int lisp_eval_depth;
+int pdlcount;
+@};
+@end example
+@code{catch} is a Lisp function that places a catch around a body of
+code.  A catch is a means of non-local exit from the code.  When a catch
+is created, a tag is specified, and executing a @code{throw} to this tag
+will exit from the body of code caught with this tag, and its value will
+be the value given in the call to @code{throw}.  If there is no such
+call, the code will be executed normally.
+Information pertaining to a catch is held in a @code{struct catchtag},
+which is placed at the head of a linked list pointed to by
+@code{catchlist}.  @code{internal_catch()} is passed a C function to
+call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
+give it, and places a catch around the function.  Each @code{struct
+catchtag} is held in the stack frame of the @code{internal_catch()}
+instance that created the catch.
+@code{internal_catch()} is fairly straightforward.  It stores into the
+@code{struct catchtag} the tag name and the current values of
+@code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
+offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
+(storing the jump point into the @code{struct catchtag}), and calls the
+function.  Control will return to @code{internal_catch()} either when
+the function exits normally or through a @code{_longjmp()} to this jump
+point.  In the latter case, @code{throw} will store the value to be
+returned into the @code{struct catchtag} before jumping.  When it's
+done, @code{internal_catch()} removes the @code{struct catchtag} from
+the catchlist and returns the proper value.
+@code{Fthrow()} goes up through the catchlist until it finds one with
+a matching tag.  It then calls @code{unbind_catch()} to restore
+everything to what it was when the appropriate catch was set, stores the
+return value in the @code{struct catchtag}, and jumps (with
+@code{_longjmp()}) to its jump point.
+@code{unbind_catch()} removes all catches from the catchlist until it
+finds the correct one.  Some of the catches might have been placed for
+error-trapping, and if so, the appropriate entries on the handlerlist
+must be removed (see ``errors'').  @code{unbind_catch()} also restores
+the values of @code{gcprolist}, @code{backtrace_list}, and
+@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
+created since the catch.
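+The interplay between the catch list and the jump points can be
+sketched as follows.  This is a simplified illustration, not the XEmacs
+code: the @code{toy_} names are hypothetical, tags and values are plain
+@code{int}s, standard @code{setjmp()}/@code{longjmp()} are used in place
+of @code{_setjmp()}/@code{_longjmp()}, and none of the backtrace, GC
+protection, or specpdl state is saved or restored.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct toy_catchtag
{
  int tag;
  volatile int val;            /* written by toy_throw before jumping */
  struct toy_catchtag *next;   /* next outer catch */
  jmp_buf jmp;
} toy_catchtag;

static toy_catchtag *toy_catchlist;   /* innermost catch first */

/* Call FUNC (ARG) inside a catch for TAG.  Returns FUNC's result, or
   the value given to a toy_throw to TAG. */
static int
toy_internal_catch (int tag, int (*func) (int), int arg)
{
  toy_catchtag c;              /* lives in this stack frame */
  c.tag = tag;
  c.val = 0;
  c.next = toy_catchlist;
  toy_catchlist = &c;

  if (setjmp (c.jmp) == 0)
    c.val = func (arg);        /* normal exit */
  /* Otherwise we arrived here via longjmp and c.val holds the thrown
     value. */

  toy_catchlist = c.next;      /* remove this catch from the list */
  return c.val;
}

/* Throw VALUE to the innermost catch for TAG.  If there is none, this
   sketch simply returns (the real code signals an error). */
static void
toy_throw (int tag, int value)
{
  toy_catchtag *c;
  for (c = toy_catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        toy_catchlist = c;     /* discard any inner catches */
        c->val = value;
        longjmp (c->jmp, 1);
      }
}

/* Sample bodies for demonstration. */
static int toy_body_normal (int x) { return x + 1; }
static int toy_body_throws (int x) { toy_throw (7, x * 2); return -1; }
static int toy_body_nested (int x)
{
  /* The inner catch uses a different tag, so a throw to 7 skips it. */
  return toy_internal_catch (8, toy_body_throws, x) + 100;
}
```

+Because each @code{toy_catchtag} lives in the stack frame of the
+@code{toy_internal_catch()} call that created it, a throw to an outer
+catch abandons the inner frames together with their catch structures.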
+@node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
+@chapter Symbols and Variables
+@menu
+* Introduction to Symbols::
+* Obarrays::
+* Symbol Values::
+@end menu
+@node Introduction to Symbols
+@section Introduction to Symbols
+A symbol is basically just an object with four fields: a name (a
+string), a value (some Lisp object), a function (some Lisp object), and
+a property list (usually a list of alternating keyword/value pairs).
+What makes symbols special is that there is usually only one symbol with
+a given name, and the symbol is referred to by name.  This makes a
+symbol a convenient way of calling up data by name, i.e. of implementing
+variables. (The variable's value is stored in the @dfn{value slot}.)
+Similarly, functions are referenced by name, and the definition of the
+function is stored in a symbol's @dfn{function slot}.  This means that
+there can be a distinct function and variable with the same name.  The
+property list is used as a more general mechanism of associating
+additional values with particular names, and once again the namespace is
+independent of the function and variable namespaces.
+@node Obarrays
+@section Obarrays
+The identity of symbols with their names is accomplished through a
+structure called an obarray, which is just a poorly-implemented hash
+table mapping from strings to symbols whose name is that string. (I say
+``poorly implemented'' because an obarray appears in Lisp as a vector
+with some hidden fields rather than as its own opaque type.  This is an
+Emacs Lisp artifact that should be fixed.)
+Obarrays are implemented as a vector of some fixed size (which should
+be a prime for best results), where each ``bucket'' of the vector
+contains one or more symbols, threaded through a hidden @code{next}
+field in the symbol.  Lookup of a symbol in an obarray, and adding a
+symbol to an obarray, is accomplished through standard hash-table
+techniques.
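+A minimal sketch of the bucket-and-chain scheme just described is given
+here.  All of the @code{toy_} names, the hash function, and the bucket
+count are assumptions made for illustration; the real obarray is a Lisp
+vector rather than a C struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toy_symbol
{
  char *name;
  struct toy_symbol *next;     /* hidden chain to the next symbol in the bucket */
} toy_symbol;

enum { TOY_OBARRAY_SIZE = 101 };   /* a prime, as the text recommends */

typedef struct toy_obarray
{
  toy_symbol *buckets[TOY_OBARRAY_SIZE];
} toy_obarray;

/* Plays the role of the standard obarray; zero-initialized, so empty. */
static toy_obarray toy_standard_obarray;

/* Hypothetical string hash, reduced to a bucket index. */
static unsigned
toy_hash (const char *s)
{
  unsigned h = 0;
  while (*s != '\0')
    h = h * 31u + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
}

/* Look up NAME without creating anything; NULL if it is not present. */
static toy_symbol *
toy_intern_soft (toy_obarray *ob, const char *name)
{
  toy_symbol *sym;
  for (sym = ob->buckets[toy_hash (name)]; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;
  return NULL;
}

/* Look up NAME, creating and adding a new symbol if necessary.
   Returns NULL only if memory allocation fails. */
static toy_symbol *
toy_intern (toy_obarray *ob, const char *name)
{
  toy_symbol *sym = toy_intern_soft (ob, name);
  unsigned h;
  size_t len;

  if (sym != NULL)
    return sym;
  sym = malloc (sizeof *sym);
  if (sym == NULL)
    return NULL;
  len = strlen (name) + 1;
  sym->name = malloc (len);
  if (sym->name == NULL)
    {
      free (sym);
      return NULL;
    }
  memcpy (sym->name, name, len);
  h = toy_hash (name);
  sym->next = ob->buckets[h];  /* push onto the front of the bucket chain */
  ob->buckets[h] = sym;
  return sym;
}
```

+Interning the same name twice yields the same pointer, which is what
+makes comparing symbols by identity equivalent to comparing their names
+within one obarray.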
+The standard Lisp function for working with symbols and obarrays is
+@code{intern}.  This looks up a symbol in an obarray given its name; if
+it's not found, a new symbol is automatically created with the specified
+name, added to the obarray, and returned.  This is what happens when the
+Lisp reader encounters a symbol (or more precisely, encounters the name
+of a symbol) in some text that it is reading.  There is a standard
+obarray called @code{obarray} that is used for this purpose, although
+the Lisp programmer is free to create his own obarrays and @code{intern}
+symbols in them.
+Note that, once a symbol is in an obarray, it stays there until
+something is done about it, and the standard obarray @code{obarray}
+always stays around, so once you use any particular variable name, a
+corresponding symbol will stay around in @code{obarray} until you exit
+XEmacs.
+Note that @code{obarray} itself is a variable, and as such there is a
+symbol in @code{obarray} whose name is @code{"obarray"} and which
+contains @code{obarray} as its value.
+Note also that this call to @code{intern} occurs only when in the Lisp
+reader, not when the code is executed (at which point the symbol is
+already around, stored as such in the definition of the function).
+You can create your own obarray using @code{make-vector} (this is
+horrible but is an artifact) and intern symbols into that obarray.
+Doing that will result in two or more symbols with the same name.
+However, at most one of these symbols is in the standard @code{obarray}:
+You cannot have two symbols of the same name in any particular obarray.
+Note that you cannot add a symbol to an obarray in any fashion other
+than using @code{intern}: i.e. you can't take an existing symbol and put
+it in an existing obarray.  Nor can you change the name of an existing
+symbol. (Since obarrays are vectors, you can violate the consistency of
+things by storing directly into the vector, but let's ignore that
+possibility.)
+Usually symbols are created by @code{intern}, but if you really want,
+you can explicitly create a symbol using @code{make-symbol}, giving it
+some name.  The resulting symbol is not in any obarray (i.e. it is
+@dfn{uninterned}), and you can't add it to any obarray.  Therefore its
+primary purpose is as a symbol to use in macros to avoid namespace
+pollution.  It can also be used as a carrier of information, but cons
+cells could probably be used just as well.
+You can also use @code{intern-soft} to look up a symbol but not create
+a new one, and @code{unintern} to remove a symbol from an obarray.  This
+returns the removed symbol. (Remember: You can't put the symbol back
+into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
+in an obarray.
+@node Symbol Values
+@section Symbol Values
+The value field of a symbol normally contains a Lisp object.  However,
+a symbol can be @dfn{unbound}, meaning that it logically has no value.
+This is internally indicated by storing a special Lisp object, called
+@dfn{the unbound marker} and stored in the global variable
+@code{Qunbound}.  The unbound marker is of a special Lisp object type
+called @dfn{symbol-value-magic}.  It is impossible for the Lisp
+programmer to directly create or access any object of this type.
+@strong{You must not let any ``symbol-value-magic'' object escape to
+the Lisp level.}  Printing any of these objects will cause the message
+@samp{INTERNAL EMACS BUG} to appear as part of the print representation.
+(You may see this normally when you call @code{debug_print()} from the
+debugger on a Lisp object.) If you let one of these objects escape to
+the Lisp level, you will violate a number of assumptions contained in
+the C code and make the unbound marker not function right.
+When a symbol is created, its value field (and function field) are set
+to @code{Qunbound}.  The Lisp programmer can restore these conditions
+later using @code{makunbound} or @code{fmakunbound}, and can query to
+see whether the value or function fields are @dfn{bound} (i.e. have a
+value other than @code{Qunbound}) using @code{boundp} and
+@code{fboundp}.  The fields are set to a normal Lisp object using
+@code{set} (or @code{setq}) and @code{fset}.
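+The idea of a sentinel object that marks a slot as unbound, and that
+must never be handed out as an ordinary value, can be sketched like
+this.  The @code{toy_} names and the pointer-based object
+representation are assumptions for illustration only; in particular,
+the real code signals an error on an unbound variable instead of
+returning a null pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef const void *toy_object;

/* The address of this variable serves as the unbound marker. */
static const char toy_unbound_marker = 0;
#define TOY_QUNBOUND ((toy_object) &toy_unbound_marker)

typedef struct toy_symbol
{
  const char *name;
  toy_object value;
  toy_object function;
} toy_symbol;

/* A freshly created symbol has both slots unbound. */
static void
toy_init_symbol (toy_symbol *sym, const char *name)
{
  sym->name = name;
  sym->value = TOY_QUNBOUND;
  sym->function = TOY_QUNBOUND;
}

static int
toy_boundp (const toy_symbol *sym)
{
  return sym->value != TOY_QUNBOUND;
}

static void
toy_set (toy_symbol *sym, toy_object value)
{
  assert (value != TOY_QUNBOUND);   /* callers may not store the marker */
  sym->value = value;
}

static void
toy_makunbound (toy_symbol *sym)
{
  sym->value = TOY_QUNBOUND;
}

/* Fetch the value without ever letting the marker escape: an unbound
   symbol yields NULL in this sketch. */
static toy_object
toy_symbol_value (const toy_symbol *sym)
{
  return toy_boundp (sym) ? sym->value : NULL;
}
```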
+Other symbol-value-magic objects are used as special markers to
+indicate variables that have non-normal properties.  This includes any
+variables that are tied into C variables (setting the variable magically
+sets some global variable in the C code, and likewise for retrieving the
+variable's value), variables that magically tie into slots in the
+current buffer, variables that are buffer-local, etc.  The
+symbol-value-magic object is stored in the value cell in place of
+a normal object, and the code to retrieve a symbol's value
+(i.e. @code{symbol-value}) knows how to do special things with them.
+This means that you should not just fetch the value cell directly if you
+want a symbol's value.
+The exact workings of this are rather complex and involved and are
+well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
+@file{lisp.h}.
+@node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
+@chapter Buffers and Textual Representation
+@menu
+* Introduction to Buffers::     A buffer holds a block of text such as a file.
+* The Text in a Buffer::        Representation of the text in a buffer.
+* Buffer Lists::                Keeping track of all buffers.
+* Markers and Extents::         Tagging locations within a buffer.
+* Bufbytes and Emchars::        Representation of individual characters.
+* The Buffer Object::           The Lisp object corresponding to a buffer.
+@end menu
+@node Introduction to Buffers
+@section Introduction to Buffers
+A buffer is logically just a Lisp object that holds some text.
+In this, it is like a string, but a buffer is optimized for
+frequent insertion and deletion, while a string is not.  Furthermore:
+@enumerate
+@item
+Buffers are @dfn{permanent} objects, i.e. once you create them, they
+remain around, and need to be explicitly deleted before they go away.
+@item
+Each buffer has a unique name, which is a string.  Buffers are
+normally referred to by name.  In this respect, they are like
+symbols.
+@item
+Buffers have a default insertion position, called @dfn{point}.
+Inserting text (unless you explicitly give a position) goes at point,
+and moves point forward past the text.  This is what is going on when
+you type text into Emacs.
+@item
+Buffers have lots of extra properties associated with them.
+@item
+Buffers can be @dfn{displayed}.  What this means is that there
+exist a number of @dfn{windows}, which are objects that correspond
+to some visible section of your display, and each window has
+an associated buffer, and the current contents of the buffer
+are shown in that section of the display.  The redisplay mechanism
+(which takes care of doing this) knows how to look at the
+text of a buffer and come up with some reasonable way of displaying
+this.  Many of the properties of a buffer control how the
+buffer's text is displayed.
+@item
+One buffer is distinguished and called the @dfn{current buffer}.  It is
+stored in the variable @code{current_buffer}.  Buffer operations operate
+on this buffer by default.  When you are typing text into a buffer, the
+buffer you are typing into is always @code{current_buffer}.  Switching
+to a different window changes the current buffer.  Note that Lisp code
+can temporarily change the current buffer using @code{set-buffer} (often
+enclosed in a @code{save-excursion} so that the former current buffer
+gets restored when the code is finished).  However, calling
+@code{set-buffer} will NOT cause a permanent change in the current
+buffer.  The reason for this is that the top-level event loop sets
+@code{current_buffer} to the buffer of the selected window, each time
+it finishes executing a user command.
+@end enumerate
+Make sure you understand the distinction between @dfn{current buffer}
+and @dfn{buffer of the selected window}, and the distinction between
+@dfn{point} of the current buffer and @dfn{window-point} of the selected
+window. (This latter distinction is explained in detail in the section
+on windows.)
+@node The Text in a Buffer
+@section The Text in a Buffer
+The text in a buffer consists of a sequence of zero or more
+characters.  A @dfn{character} is an integer that logically represents
+a letter, number, space, or other unit of text.  Most of the characters
+that you will typically encounter belong to the ASCII set of characters,
+but there are also characters for various sorts of accented letters,
+special symbols, Chinese and Japanese ideograms (e.g. Kanji, Katakana,
+etc.), Cyrillic and Greek letters, etc.  The actual number of possible
+characters is quite large.
+For now, we can view a character as some non-negative integer that
+has some shape that defines how it typically appears (e.g. as an
+uppercase A). (The exact way in which a character appears depends on the
+font used to display the character.) The internal type of characters in
+the C code is an @code{Emchar}; this is just an @code{int}, but using a
+symbolic type makes the code clearer.
+Between every character in a buffer is a @dfn{buffer position} or
+@dfn{character position}.  We can speak of the character before or after
+a particular buffer position, and when you insert a character at a
+particular position, all characters after that position end up at new
+positions.  When we speak of the character @dfn{at} a position, we
+really mean the character after the position.  (This schizophrenia
+between a buffer position being ``between'' a character and ``on'' a
+character is rampant in Emacs.)
+Buffer positions are numbered starting at 1.  This means that
+position 1 is before the first character, and position 0 is not
+valid.  If there are N characters in a buffer, then buffer
+position N+1 is after the last one, and position N+2 is not valid.
+The internal makeup of the Emchar integer varies depending on whether
+we have compiled with MULE support.  If not, the Emchar integer is an
+8-bit integer with possible values from 0 - 255.  0 - 127 are the
+standard ASCII characters, while 128 - 255 are the characters from the
+ISO-8859-1 character set.  If we have compiled with MULE support, an
+Emchar is a 19-bit integer, with the various bits having meanings
+according to a complex scheme that will be detailed later.  The
+characters numbered 0 - 255 still have the same meanings as for the
+non-MULE case, though.
+Internally, the text in a buffer is represented in a fairly simple
+fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
+in the middle.  Although the gap is of some substantial size in bytes,
+there is no text contained within it: From the perspective of the text
+in the buffer, it does not exist.  The gap logically sits at some buffer
+position, between two characters (or possibly at the beginning or end of
+the buffer).  Insertion of text in a buffer at a particular position is
+always accomplished by first moving the gap to that position
+(i.e. through some block moving of text), then writing the text into the
+beginning of the gap, thereby shrinking the gap.  If the gap shrinks
+down to nothing, a new gap is created. (What actually happens is that a
+new gap is ``created'' at the end of the buffer's text, which requires
+nothing more than changing a couple of indices; then the gap is
+``moved'' to the position where the insertion needs to take place by
+moving up in memory all the text after that position.)  Similarly,
+deletion occurs by moving the gap to the place where the text is to be
+deleted, and then simply expanding the gap to include the deleted text.
+(@dfn{Expanding} and @dfn{shrinking} the gap as just described means
+just that the internal indices that keep track of where the gap is
+located are changed.)
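+The gap mechanism can be sketched in a few lines of C.  This is an
+illustration under simplifying assumptions, not the code in
+@file{insdel.c}: the @code{toy_} names are hypothetical, the storage is
+a fixed-size array that never grows, positions are 0-based offsets
+rather than 1-based buffer positions, and one character is one byte.

```c
#include <assert.h>
#include <string.h>

enum { TOY_CAPACITY = 32 };

typedef struct toy_buffer
{
  char text[TOY_CAPACITY];   /* text before the gap, the gap, text after it */
  int gap_start;             /* memory offset of the first gap byte */
  int gap_size;              /* number of bytes in the gap */
  int length;                /* number of text bytes, gap excluded */
} toy_buffer;

static void
toy_init (toy_buffer *b)
{
  b->gap_start = 0;
  b->gap_size = TOY_CAPACITY;   /* an empty buffer is all gap */
  b->length = 0;
}

/* Move the gap so that it begins at text offset POS. */
static void
toy_move_gap (toy_buffer *b, int pos)
{
  assert (pos >= 0 && pos <= b->length);
  if (pos < b->gap_start)
    /* Text between POS and the gap slides up to the end of the gap. */
    memmove (b->text + pos + b->gap_size, b->text + pos,
             (size_t) (b->gap_start - pos));
  else if (pos > b->gap_start)
    /* Text just after the gap slides down to the start of the gap. */
    memmove (b->text + b->gap_start,
             b->text + b->gap_start + b->gap_size,
             (size_t) (pos - b->gap_start));
  b->gap_start = pos;
}

/* Insert N bytes at POS; returns 0 if they do not fit in the gap. */
static int
toy_insert (toy_buffer *b, int pos, const char *s, int n)
{
  if (n < 0 || n > b->gap_size || pos < 0 || pos > b->length)
    return 0;
  toy_move_gap (b, pos);
  memcpy (b->text + b->gap_start, s, (size_t) n);
  b->gap_start += n;           /* the gap shrinks from the front */
  b->gap_size -= n;
  b->length += n;
  return 1;
}

/* Delete N bytes starting at POS by letting the gap swallow them. */
static int
toy_delete (toy_buffer *b, int pos, int n)
{
  if (pos < 0 || n < 0 || pos + n > b->length)
    return 0;
  toy_move_gap (b, pos);
  b->gap_size += n;            /* only the bookkeeping changes */
  b->length -= n;
  return 1;
}

/* Copy the text, skipping the gap, into OUT as a C string. */
static void
toy_contents (const toy_buffer *b, char *out)
{
  int i;
  for (i = 0; i < b->length; i++)
    out[i] = b->text[i < b->gap_start ? i : i + b->gap_size];
  out[b->length] = '\0';
}
```

+Note that @code{toy_delete()} copies no text once the gap is in place,
+and that the gap only ever grows through deletion, which mirrors the
+memory behavior described in the following paragraph.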
+Note that the total amount of memory allocated for a buffer text never
+decreases while the buffer is live.  Therefore, if you load up a
+20-megabyte file and then delete all but one character, there will be a
+20-megabyte gap, which won't get any smaller (except by inserting
+characters back again).  Once the buffer is killed, the memory allocated
+for the buffer text will be freed, but it will still be sitting on the
+heap, taking up virtual memory, and will not be released back to the
+operating system. (However, if you have compiled XEmacs with rel-alloc,
+the situation is different.  In this case, the space @emph{will} be
+released back to the operating system.  However, this tends to result in a
+noticeable speed penalty.)
+Astute readers may notice that the text in a buffer is represented as
+an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
+a 19-bit integer, which clearly cannot fit in a byte.  This means (of
+course) that the text in a buffer uses a different representation from
+an Emchar: specifically, the 19-bit Emchar becomes a series of one to
+four bytes.  The conversion between these two representations is complex
+and will be described later.
+In the non-MULE case, everything is very simple: An Emchar
+is an 8-bit value, which fits neatly into one byte.
+If we are given a buffer position and want to retrieve the
+character at that position, we need to follow these steps:
+@enumerate
+@item
+Pretend there's no gap, and convert the buffer position into a @dfn{byte
+index} that indexes to the appropriate byte in the buffer's stream of
+textual bytes.  By convention, byte indices begin at 1, just like buffer
+positions.  In the non-MULE case, byte indices and buffer positions are
+identical, since one character equals one byte.
+@item
+Convert the byte index into a @dfn{memory index}, which takes the gap
+into account.  The memory index is a direct index into the block of
+memory that stores the text of a buffer.  This basically just involves
+checking to see if the byte index is past the gap, and if so, adding the
+size of the gap to it.  By convention, memory indices begin at 1, just
+like buffer positions and byte indices, and when referring to the
+position that is @dfn{at} the gap, we always use the memory position at
+the @emph{beginning}, not at the end, of the gap.
+@item
+Fetch the appropriate bytes at the determined memory position.
+@item
+Convert these bytes into an Emchar.
+@end enumerate
+In the non-MULE case, (3) and (4) boil down to a simple one-byte
+memory access.
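+For the non-MULE case, the four steps can be sketched as below.  The
+typedef names come from this manual, but the @code{toy_} functions and
+their parameters are hypothetical and are not the macros used in the
+XEmacs sources.  All indices are 1-based, as in the text.

```c
#include <assert.h>

typedef int Bufpos;
typedef int Bytind;
typedef int Memind;

/* Step 1 (non-MULE): one character is one byte, so nothing changes. */
static Bytind
toy_bufpos_to_bytind (Bufpos pos)
{
  return pos;
}

/* Step 2: map a byte index naming a position to a memory index.  A
   position exactly at the gap maps to the beginning of the gap. */
static Memind
toy_bytind_to_memind (Bytind ind, Bytind gap_start, int gap_size)
{
  return ind > gap_start ? ind + gap_size : ind;
}

/* Steps 3 and 4: fetch the character after POS.  The byte after a
   position at the gap lies just past the end of the gap, hence the
   comparison with >= here. */
static unsigned char
toy_fetch_char (const unsigned char *text, Bytind gap_start, int gap_size,
                Bufpos pos)
{
  Bytind ind = toy_bufpos_to_bytind (pos);
  Memind mem = ind >= gap_start ? ind + gap_size : ind;
  return text[mem - 1];        /* memory indices are 1-based */
}
```

+For example, with the text @code{abcd} stored as @code{ab}, a
+three-byte gap, then @code{cd}, the gap begins at byte index 3, and the
+character at buffer position 3 is fetched from memory index 6.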
+Note that we have defined three types of positions in a buffer:
+@enumerate
+@item
+@dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
+@item
+@dfn{byte indices}, typedef @code{Bytind}
+@item
+@dfn{memory indices}, typedef @code{Memind}
+@end enumerate
+All three typedefs are just @code{int}s, but defining them this way makes
+things a lot clearer.
+Most code works with buffer positions.  In particular, all Lisp code
+that refers to text in a buffer uses buffer positions.  Lisp code does
+not know that byte indices or memory indices exist.
+Finally, we have a typedef for the bytes in a buffer.  This is a
+@code{Bufbyte}, which is an unsigned char.  Referring to them as
+Bufbytes underscores the fact that we are working with a string of bytes
+in the internal Emacs buffer representation rather than in one of a
+number of possible alternative representations (e.g. EUC-encoded text,
+etc.).
+@node Buffer Lists
+@section Buffer Lists
+Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
+they remain around until explicitly deleted.  This entails that there is
+a list of all the buffers in existence.  This list is actually an
+assoc-list (mapping from the buffer's name to the buffer) and is stored
+in the global variable @code{Vbuffer_alist}.
+The order of the buffers in the list is important: the buffers are
+ordered approximately from most-recently-used to least-recently-used.
+Switching to a buffer using @code{switch-to-buffer},
+@code{pop-to-buffer}, etc. and switching windows using
+@code{other-window}, etc.  usually brings the new current buffer to the
+front of the list.  @code{switch-to-buffer}, @code{other-buffer},
+etc. look at the beginning of the list to find an alternative buffer to
+suggest.  You can also explicitly move a buffer to the end of the list
+using @code{bury-buffer}.
+In addition to the global ordering in @code{Vbuffer_alist}, each frame
+has its own ordering of the list.  These lists always contain the same
+elements as in @code{Vbuffer_alist} although possibly in a different
+order.  @code{buffer-list} normally returns the list for the selected
+frame.  This allows you to work in separate frames without things
+interfering with each other.
+The standard way to look up a buffer given a name is
+@code{get-buffer}, and the standard way to create a new buffer is
+@code{get-buffer-create}, which looks up a buffer with a given name,
+creating a new one if necessary.  These operations correspond exactly
+with the symbol operations @code{intern-soft} and @code{intern},
+respectively.  You can also force a new buffer to be created using
+@code{generate-new-buffer}, which takes a name and (if necessary) makes
+a unique name from this by appending a number, and then creates the
+buffer.  This is basically like the symbol operation @code{gensym}.
+@node Markers and Extents
+@section Markers and Extents
+Among the things associated with a buffer are things that are
+logically attached to certain buffer positions.  This can be used to
+keep track of a buffer position when text is inserted and deleted, so
+that it remains at the same spot relative to the text around it; to
+assign properties to particular sections of text; etc.  There are two
+such objects that are useful in this regard: they are @dfn{markers} and
+@dfn{extents}.
+A @dfn{marker} is simply a flag placed at a particular buffer
+position, which is moved around as text is inserted and deleted.
+Markers are used for all sorts of purposes, such as the @code{mark} that
+is the other end of textual regions to be cut, copied, etc.
+An @dfn{extent} is similar to two markers plus some associated
+properties, and is used to keep track of regions in a buffer as text is
+inserted and deleted, and to add properties (e.g. fonts) to particular
+regions of text.  The external interface of extents is explained
+elsewhere.
+The important thing here is that markers and extents simply contain
+buffer positions in them as integers, and every time text is inserted or
+deleted, these positions must be updated.  In order to minimize the
+amount of shuffling that needs to be done, the positions in markers and
+extents (there's one per marker, two per extent) are stored as Meminds.
+This means that they only need to be moved when the text is physically
+moved in memory; since the gap structure tries to minimize this, it also
+minimizes the number of marker and extent indices that need to be
+adjusted.  Look in @file{insdel.c} for the details of how this works.
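+To see why storing memory indices keeps the adjustment work small,
+consider what happens when the gap moves: only positions inside the
+block of text that was physically moved change.  The following sketch
+makes that concrete.  It is an illustration under assumptions, not the
+code in @file{insdel.c}: the @code{toy_} function is hypothetical,
+markers are a plain array of @code{int}s, and memory indices are
+0-based offsets with a position at the gap recorded as the beginning of
+the gap.

```c
#include <assert.h>

/* Adjust N marker positions (stored as memory indices) after the gap of
   size GAP_SIZE has moved from OLD_GAP_START to NEW_GAP_START.
   Returns the number of markers that had to be changed. */
static int
toy_adjust_markers_for_gap_move (int *markers, int n, int old_gap_start,
                                 int new_gap_start, int gap_size)
{
  int i, changed = 0;

  for (i = 0; i < n; i++)
    {
      int m = markers[i];
      if (new_gap_start < old_gap_start)
        {
          /* Gap moved down: text in between moved up by GAP_SIZE. */
          if (m > new_gap_start && m <= old_gap_start)
            {
              markers[i] = m + gap_size;
              changed++;
            }
        }
      else if (new_gap_start > old_gap_start)
        {
          /* Gap moved up: text in between moved down by GAP_SIZE. */
          if (m > old_gap_start + gap_size
              && m <= new_gap_start + gap_size)
            {
              markers[i] = m - gap_size;
              changed++;
            }
        }
    }
  return changed;
}
```

+Markers outside the moved block are left alone, and an insertion or
+deletion at the gap itself moves no text at all, so markers on the far
+side of the gap need no adjustment in that case.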
+One other important distinction is that markers are @dfn{temporary}
+while extents are @dfn{permanent}.  This means that markers disappear as
+soon as there are no more pointers to them, and correspondingly, there
+is no way to determine what markers are in a buffer if you are just
+given the buffer.  Extents remain in a buffer until they are detached
+(which could happen as a result of text being deleted) or the buffer is
+deleted, and primitives do exist to enumerate the extents in a buffer.
+@node Bufbytes and Emchars
+@section Bufbytes and Emchars
+Not yet documented.
+@node The Buffer Object
+@section The Buffer Object
+Buffers contain fields not directly accessible by the Lisp programmer.
+We describe them here, naming them by the names used in the C code.
+Many are accessible indirectly in Lisp programs via Lisp primitives.
+@table @code
+@item name
+The buffer name is a string that names the buffer.  It is guaranteed to
+be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
+Manual}.
+@item save_modified
+This field contains the time when the buffer was last saved, as an
+integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
+Manual}.
+@item modtime
+This field contains the modification time of the visited file.  It is
+set when the file is written or read.  Every time the buffer is written
+to the file, this field is compared to the modification time of the
+file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
+Manual}.
+@item auto_save_modified
+This field contains the time when the buffer was last auto-saved.
+@item last_window_start
+This field contains the @code{window-start} position in the buffer as of
+the last time the buffer was displayed in a window.
+@item undo_list
+This field points to the buffer's undo list.  @xref{Undo,,, lispref,
+XEmacs Lisp Programmer's Manual}.
+@item syntax_table_v
+This field contains the syntax table for the buffer.  @xref{Syntax
+Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+@item downcase_table
+This field contains the conversion table for converting text to lower
+case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+@item upcase_table
+This field contains the conversion table for converting text to upper
+case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+@item case_canon_table
+This field contains the conversion table for canonicalizing text for
+case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
+Programmer's Manual}.
+@item case_eqv_table
+This field contains the equivalence table for case-folding search.
+@xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+@item display_table
+This field contains the buffer's display table, or @code{nil} if it
+doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
+Programmer's Manual}.
+@item markers
+This field contains the chain of all markers that currently point into
+the buffer.  Deletion of text in the buffer, and motion of the buffer's
+gap, must check each of these markers and perhaps update it.
+@xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
+@item backed_up
+This field is a flag that tells whether a backup file has been made for
+the visited file of this buffer.
+@item mark
+This field contains the mark for the buffer.  The mark is a marker,
+hence it is also included on the list @code{markers}.  @xref{The Mark,,,
+lispref, XEmacs Lisp Programmer's Manual}.
+@item mark_active
+This field is non-@code{nil} if the buffer's mark is active.
+@item local_var_alist
+This field contains the association list describing the variables local
+in this buffer, and their values, with the exception of local variables
+that have special slots in the buffer object.  (Those slots are omitted
+from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
+Programmer's Manual}.
+@item modeline_format
+This field contains a Lisp object which controls how to display the mode
+line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
+Programmer's Manual}.
+@item base_buffer
+This field holds the buffer's base buffer (if it is an indirect buffer),
+or @code{nil}.
+@end table
+@node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
+@chapter MULE Character Sets and Encodings
+Recall that there are two primary ways that text is represented in
+XEmacs.  The @dfn{buffer} representation sees the text as a series of
+bytes (Bufbytes), with a variable number of bytes used per character.
+The @dfn{character} representation sees the text as a series of integers
+(Emchars), one per character.  The character representation is a cleaner
+representation from a theoretical standpoint, and is thus used in many
+cases when lots of manipulations on a string need to be done.  However,
+the buffer representation is the standard representation used in both
+Lisp strings and buffers, and because of this, it is the ``default''
+representation that text comes in.  The reason for using this
+representation is that it's compact and is compatible with ASCII.
+@menu
+* Character Sets::
+* Encodings::
+* Internal Mule Encodings::
+* CCL::
+@end menu
+@node Character Sets
+@section Character Sets
+A character set (or @dfn{charset}) is an ordered set of characters.  A
+particular character in a charset is indexed using one or more
+@dfn{position codes}, which are non-negative integers.  The number of
+position codes needed to identify a particular character in a charset is
+called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
+have dimension 1 or 2, and the size of all charsets (except for a few
+special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
+position codes used to index characters from any of these types of
+character sets is as follows:
+@example
+Charset type            Position code 1         Position code 2
+------------------------------------------------------------
+94                      33 - 126                N/A
+96                      32 - 127                N/A
+94x94                   33 - 126                33 - 126
+96x96                   32 - 127                32 - 127
+@end example
+Note that in the above cases position codes do not start at an
+expected value such as 0 or 1.  The reason for this will become clear
+later.
+For example, Latin-1 is a 96-character charset, and JISX0208 (the
+Japanese national character set) is a 94x94-character charset.
+[Note that, although the ranges above define the @emph{valid} position
+codes for a charset, some of the slots in a particular charset may in
+fact be empty.  This is the case for JISX0208, for example, where
+all the slots whose first position code is in the range 118 - 126 are
+empty.]
+There are three charsets that do not follow the above rules.  All of
+them have one dimension, and have ranges of position codes as follows:
+@example
+Charset name            Position code 1
+------------------------------------
+ASCII                   0 - 127
+Control-1               0 - 31
+Composite               0 - some large number
+@end example
+(The upper bound of the position code for composite characters has not
+yet been determined, but it will probably be at least 16,383).
+ASCII is the union of two subsidiary character sets: Printing-ASCII
+(the printing ASCII character set, consisting of position codes 33 -
+126, like for a standard 94-character charset) and Control-ASCII (the
+non-printing characters that would appear in a binary file with codes 0
+- 32 and 127).
+Control-1 contains the non-printing characters that would appear in a
+binary file with codes 128 - 159.
+Composite contains characters that are generated by overstriking one
+or more characters from other charsets.
+Note that some characters in ASCII, and all characters in Control-1,
+are @dfn{control} (non-printing) characters.  These have no printed
+representation but instead control some other aspect of printing
+(e.g. TAB, code 9, moves the current character position to the next tab
+stop).  All other characters in all charsets are @dfn{graphic}
+(printing) characters.
+When a binary file is read in, the bytes in the file are assigned to
+character sets as follows:
+@example
+Bytes           Character set           Range
+--------------------------------------------------
+0 - 127         ASCII                   0 - 127
+128 - 159       Control-1               0 - 31
+160 - 255       Latin-1                 32 - 127
+@end example
+This is a bit ad-hoc but gets the job done.
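+The table above can be sketched in C as follows.  This is only an
+illustration: the names @code{binary_charset}, @code{charset_pos} and
+@code{classify_binary_byte} are hypothetical and are not taken from the
+XEmacs sources.

```c
#include <assert.h>

/* Hypothetical names for this sketch; not the identifiers used in XEmacs. */
enum binary_charset { CS_ASCII, CS_CONTROL_1, CS_LATIN_1 };

struct charset_pos
{
  enum binary_charset charset;
  int position_code;
};

/* Classify one byte of a binary file according to the table above. */
struct charset_pos
classify_binary_byte (unsigned char byte)
{
  struct charset_pos result;

  if (byte <= 127)
    {
      result.charset = CS_ASCII;
      result.position_code = byte;        /* 0 - 127 */
    }
  else if (byte <= 159)
    {
      result.charset = CS_CONTROL_1;
      result.position_code = byte - 128;  /* 0 - 31 */
    }
  else
    {
      result.charset = CS_LATIN_1;
      result.position_code = byte - 128;  /* 32 - 127 */
    }
  assert (result.position_code >= 0 && result.position_code <= 127);
  return result;
}
```

+For instance, byte 160 comes back as Latin-1 with position code 32, and
+byte 159 as Control-1 with position code 31.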
+@node Encodings
+@section Encodings
+An @dfn{encoding} is a way of numerically representing characters from
+one or more character sets.  If an encoding only encompasses one
+character set, then the position codes for the characters in that
+character set could be used directly.  This is not possible, however, if
+more than one character set is to be used in the encoding.
+For example, the conversion detailed above between bytes in a binary
+file and characters is effectively an encoding that encompasses the
+three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
+bytes.
+Thus, an encoding can be viewed as a way of encoding characters from a
+specified group of character sets using a stream of bytes, each of which
+contains a fixed number of bits (but not necessarily 8, as in the common
+usage of ``byte'').
+Here are descriptions of a couple of common encodings:
+@menu
+* Japanese EUC (Extended Unix Code)::
+* JIS7::
+@end menu
+@node Japanese EUC (Extended Unix Code)
+@subsection Japanese EUC (Extended Unix Code)
+This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
+and Japanese-JISX0201-Kana (half-width katakana, the right half of
+JISX0201).  It uses 8-bit bytes.
+Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
+charsets, while Japanese-JISX0208 is a 94x94-character charset.
+The encoding is as follows:
+@example
+Character set            Representation (PC=position-code)
+-------------            --------------
+Printing-ASCII           PC1
+Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
+Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
+Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
+@end example
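+A minimal sketch of the rules in the table, restricted to
+Printing-ASCII, Japanese-JISX0201-Kana and Japanese-JISX0208.  The
+names @code{euc_charset} and @code{euc_encode} are hypothetical, chosen
+for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charset tags for this sketch. */
enum euc_charset { EUC_PRINTING_ASCII, EUC_JISX0201_KANA, EUC_JISX0208 };

/* Encode one character into OUT, which must hold at least 2 bytes.
   PC2 is ignored for the one-dimensional charsets.
   Return the number of bytes written. */
size_t
euc_encode (enum euc_charset cs, int pc1, int pc2, unsigned char *out)
{
  assert (pc1 >= 33 && pc1 <= 126);
  switch (cs)
    {
    case EUC_PRINTING_ASCII:
      out[0] = (unsigned char) pc1;            /* PC1 */
      return 1;
    case EUC_JISX0201_KANA:
      out[0] = 0x8E;                           /* 0x8E | PC1 + 0x80 */
      out[1] = (unsigned char) (pc1 + 0x80);
      return 2;
    case EUC_JISX0208:
      assert (pc2 >= 33 && pc2 <= 126);
      out[0] = (unsigned char) (pc1 + 0x80);   /* PC1 + 0x80 | PC2 + 0x80 */
      out[1] = (unsigned char) (pc2 + 0x80);
      return 2;
    }
  return 0;
}
```

+With these rules, the JISX0208 character whose position codes are 0x24
+and 0x22 is encoded as the two bytes 0xA4 0xA2.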
+@node JIS7
+@subsection JIS7
+This encompasses the character sets Printing-ASCII,
+Japanese-JISX0201-Roman (the left half of JISX0201; this character set
+is very similar to Printing-ASCII and is a 94-character charset),
+Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
+Unlike Japanese EUC, this is a @dfn{modal} encoding, which
+means that there are multiple states that the encoding can
+be in, which affect how the bytes are to be interpreted.
+Special sequences of bytes (called @dfn{escape sequences})
+are used to change states.
+The encoding is as follows:
+@example
+Character set              Representation (PC=position-code)
+-------------              --------------
+Printing-ASCII             PC1
+Japanese-JISX0201-Roman    PC1
+Japanese-JISX0201-Kana     PC1
+Japanese-JISX0208          PC1 PC2
+Escape sequence   ASCII equivalent   Meaning
+---------------   ----------------   -------
+0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
+0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
+0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
+0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
+@end example
+Initially, Printing-ASCII is invoked.
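+The modal behaviour can be sketched as a small state machine.  This is
+an illustration under simplifying assumptions: every byte other than
+ESC is treated as a position code (control characters such as newline
+get no special handling), and the names @code{jis7_state},
+@code{jis7_char} and @code{jis7_decode} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The four states correspond to the four escape sequences above. */
enum jis7_state { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_JISX0208 };

struct jis7_char
{
  enum jis7_state charset;
  int pc1;
  int pc2;                      /* 0 for one-dimensional charsets */
};

/* Decode LEN bytes of SRC into at most MAX entries of OUT.
   Return the number of characters decoded, or -1 on malformed input
   or if OUT is too small. */
long
jis7_decode (const unsigned char *src, size_t len,
             struct jis7_char *out, size_t max)
{
  enum jis7_state state = JIS7_ASCII;   /* Printing-ASCII is invoked initially */
  size_t i = 0, n = 0;

  assert (src != NULL && out != NULL);
  while (i < len)
    {
      if (src[i] == 0x1B)               /* ESC starts an escape sequence */
        {
          if (len - i < 3)
            return -1;
          if (src[i + 1] == 0x28 && src[i + 2] == 0x4A)
            state = JIS7_ROMAN;
          else if (src[i + 1] == 0x28 && src[i + 2] == 0x49)
            state = JIS7_KANA;
          else if (src[i + 1] == 0x24 && src[i + 2] == 0x42)
            state = JIS7_JISX0208;
          else if (src[i + 1] == 0x28 && src[i + 2] == 0x42)
            state = JIS7_ASCII;
          else
            return -1;                  /* unknown escape sequence */
          i += 3;
          continue;
        }
      if (n == max)
        return -1;
      out[n].charset = state;
      out[n].pc1 = src[i++];
      out[n].pc2 = 0;
      if (state == JIS7_JISX0208)
        {
          if (i == len)
            return -1;                  /* truncated two-byte character */
          out[n].pc2 = src[i++];
        }
      n++;
    }
  return (long) n;
}
```

+Note how the same byte value (0x24, say) means different things
+depending on the state, which is exactly what makes the encoding modal.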
+@node Internal Mule Encodings
+@section Internal Mule Encodings
+In XEmacs/Mule, each character set is assigned a unique number, called a
+@dfn{leading byte}.  This is used in the encodings of a character.
+Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
+a leading byte of 0), although some leading bytes are reserved.
+Charsets whose leading byte is in the range 0x80 - 0x9F are called
+@dfn{official} and are used for built-in charsets.  Other charsets are
+called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
+these are user-defined charsets.
+More specifically:
+@example
+Character set           Leading byte
+-------------           ------------
+ASCII                   0
+Composite               0x80
+Dimension-1 Official    0x81 - 0x8D
+(0x8E is free)
+Control-1               0x8F
+Dimension-2 Official    0x90 - 0x99
+(0x9A - 0x9D are free;
+0x9E and 0x9F are reserved)
+Dimension-1 Private     0xA0 - 0xEF
+Dimension-2 Private     0xF0 - 0xFF
+@end example
+There are two internal encodings for characters in XEmacs/Mule.  One is
+called @dfn{string encoding} and is an 8-bit encoding that is used for
+representing characters in a buffer or string.  It uses 1 to 4 bytes per
+character.  The other is called @dfn{character encoding} and is a 19-bit
+encoding that is used for representing characters individually in a
+variable.
+(In the following descriptions, we'll ignore composite characters for
+the moment.  We also give a general (structural) overview first,
+followed later by the exact details.)
+@menu
+* Internal String Encoding::
+* Internal Character Encoding::
+@end menu
+@node Internal String Encoding
+@subsection Internal String Encoding
+ASCII characters are encoded using their position code directly.  Other
+characters are encoded using their leading byte followed by their
+position code(s) with the high bit set.  Characters in private character
+sets have their leading byte prefixed with a @dfn{leading byte prefix},
+which is either 0x9E or 0x9F. (No character sets are ever assigned these
+leading bytes.) Specifically:
+@example
+Character set           Encoding (PC=position-code, LB=leading-byte)
+-------------           --------
+ASCII                   PC1  |
+Control-1               LB   |  PC1 + 0xA0 |
+Dimension-1 official    LB   |  PC1 + 0x80 |
+Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
+Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
+Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
+@end example
+The basic characteristic of this encoding is that the first byte
+of every character is in the range 0x00 - 0x9F, and the second and
+following bytes of every character are in the range 0xA0 - 0xFF.
+This means that it is impossible to get out of sync, or more
+specifically:
+@enumerate
+@item
+Given any byte position, the beginning of the character it is
+within can be determined in constant time.
+@item
+Given any byte position at the beginning of a character, the
+beginning of the next character can be determined in constant
+time.
+@item
+Given any byte position at the beginning of a character, the
+beginning of the previous character can be determined in constant
+time.
+@item
+Textual searches can simply treat encoded strings as if they
+were encoded in a one-byte-per-character fashion rather than
+the actual multi-byte encoding.
+@end enumerate
+None of the standard non-modal encodings meet all of these
+conditions.  For example, EUC satisfies only (2) and (3), while
+Shift-JIS and Big5 (not yet described) satisfy only (2). (All
+non-modal encodings must satisfy (2), in order to be unambiguous.)
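+Properties (1) through (3) follow directly from the byte ranges: a byte
+below 0xA0 always begins a character, and a character is at most four
+bytes long, so each loop below runs at most three times.  The function
+names are hypothetical; this is a sketch of the idea, not the macros
+XEmacs actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* A byte in 0x00 - 0x9F begins a character; 0xA0 - 0xFF continues one. */
int
is_first_byte (unsigned char b)
{
  return b <= 0x9F;
}

/* Property 1: beginning of the character that contains byte POS. */
size_t
char_start (const unsigned char *text, size_t pos)
{
  while (pos > 0 && !is_first_byte (text[pos]))
    pos--;
  return pos;
}

/* Property 2: beginning of the character after the one starting at POS
   (or LEN if that was the last character). */
size_t
next_char (const unsigned char *text, size_t len, size_t pos)
{
  assert (pos < len && is_first_byte (text[pos]));
  pos++;
  while (pos < len && !is_first_byte (text[pos]))
    pos++;
  return pos;
}

/* Property 3: beginning of the character before the one starting at POS. */
size_t
prev_char (const unsigned char *text, size_t pos)
{
  assert (pos > 0);
  return char_start (text, pos - 1);
}
```

+Property (4) needs no code at all: because a first byte can never be
+mistaken for a continuation byte, an ordinary byte-wise search cannot
+match in the middle of a character.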
+@node Internal Character Encoding
+@subsection Internal Character Encoding
+One 19-bit word represents a single character.  The word is
+separated into three fields:
+@example
+Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+                <------------> <------------------> <------------------>
+Field:                1                  2                    3
+@end example
+Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
+@example
+Character set           Field 1         Field 2         Field 3
+-------------           -------         -------         -------
+ASCII                      0               0              PC1
+range:                                                   (00 - 7F)
+Control-1                  0               1              PC1
+range:                                                   (00 - 1F)
+Dimension-1 official       0            LB - 0x80         PC1
+range:                                    (01 - 0D)      (20 - 7F)
+Dimension-1 private        0            LB - 0x80         PC1
+range:                                    (20 - 6F)      (20 - 7F)
+Dimension-2 official    LB - 0x8F         PC1             PC2
+range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
+Dimension-2 private     LB - 0xE1         PC1             PC2
+range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
+Composite                 0x1F             ?               ?
+@end example
+Note that character codes 0 - 255 are the same as the ``binary encoding''
+described above.
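+The field layout can be sketched as follows.  The function names are
+hypothetical, and the leading-byte arithmetic is taken from the table
+above for the two official cases only.

```c
#include <assert.h>

/* Pack the three fields into one 19-bit character code:
   field 1 occupies bits 18-14, field 2 bits 13-7, field 3 bits 6-0. */
unsigned long
make_char_code (unsigned long f1, unsigned long f2, unsigned long f3)
{
  assert (f1 <= 0x1F && f2 <= 0x7F && f3 <= 0x7F);
  return (f1 << 14) | (f2 << 7) | f3;
}

/* Dimension-1 official charset: field 2 is LB - 0x80, field 3 is PC1. */
unsigned long
official_dim1_char (unsigned long lb, unsigned long pc1)
{
  assert (lb >= 0x81 && lb <= 0x8D);
  return make_char_code (0, lb - 0x80, pc1);
}

/* Dimension-2 official charset: field 1 is LB - 0x8F,
   fields 2 and 3 are PC1 and PC2. */
unsigned long
official_dim2_char (unsigned long lb, unsigned long pc1, unsigned long pc2)
{
  assert (lb >= 0x90 && lb <= 0x99);
  return make_char_code (lb - 0x8F, pc1, pc2);
}
```

+A Control-1 character with position code 0 is
+@code{make_char_code (0, 1, 0)}, i.e. 128, and a dimension-1 official
+charset with leading byte 0x81 covers codes 160 - 255.  That matches
+the ``binary encoding'' if 0x81 is the leading byte of Latin-1, which
+is an inference from the note above rather than something the text
+states.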
+@node CCL
+@section CCL
+@example
+CCL PROGRAM SYNTAX:
+     CCL_PROGRAM := (CCL_MAIN_BLOCK
+                     [ CCL_EOF_BLOCK ])
+     CCL_MAIN_BLOCK := CCL_BLOCK
+     CCL_EOF_BLOCK := CCL_BLOCK
+     CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
+     STATEMENT :=
+             SET | IF | BRANCH | LOOP | REPEAT | BREAK
+             | READ | WRITE
+     SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
+            | INT-OR-CHAR
+     EXPRESSION := ARG | (EXPRESSION OP ARG)
+     IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
+     BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
+     LOOP := (loop STATEMENT [STATEMENT ...])
+     BREAK := (break)
+     REPEAT := (repeat)
+               | (write-repeat [REG | INT-OR-CHAR | string])
+               | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
+     READ := (read REG) | (read REG REG)
+             | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
+             | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
+     WRITE := (write REG) | (write REG REG)
+              | (write INT-OR-CHAR) | (write STRING) | STRING
+              | (write REG ARRAY)
+     END := (end)
+     REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
+     ARG := REG | INT-OR-CHAR
+     OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
+             | < | > | == | <= | >= | !=
+     SELF_OP :=
+             += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
+     ARRAY := '[' INT-OR-CHAR ... ']'
+     INT-OR-CHAR := INT | CHAR
+MACHINE CODE:
+The machine code consists of a vector of 32-bit words.
+The first such word specifies the start of the EOF section of the code;
+this is the code executed to handle any stuff that needs to be done
+(e.g. designating back to ASCII and left-to-right mode) after all
+other encoded/decoded data has been written out.  This is not used for
+charset CCL programs.
+REGISTER: 0..7  -- referred to by RRR or rrr
+OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
+        TTTTT (5-bit): operator type
+        RRR (3-bit): register number
+        XXXXXXXXXXXXXXX (15-bit):
+                CCCCCCCCCCCCCCC: constant or address
+                000000000000rrr: register number
+AAAAA:  00000 +
+        00001 -
+        00010 *
+        00011 /
+        00100 %
+        00101 &
+        00110 |
+        00111 ^
+        01000 <<
+        01001 >>
+        01010 <8
+        01011 >8
+        01100 //
+        01101 not used
+        01110 not used
+        01111 not used
+        10000 <
+        10001 >
+        10010 ==
+        10011 <=
+        10100 >=
+        10101 !=
+OPERATORS:      TTTTT RRR XX..
+SetCS:          00000 RRR C...C      RRR = C...C
+SetCL:          00001 RRR .....      RRR = c...c
+                c.............c
+SetR:           00010 RRR ..rrr      RRR = rrr
+SetA:           00011 RRR ..rrr      RRR = array[rrr]
+                C.............C      size of array = C...C
+                c.............c      contents = c...c
+Jump:           00100 000 c...c      jump to c...c
+JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
+WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
+WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
+WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
+                C...C
+WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
+                C.............C      and jump to c...c
+WriteSJump:     01010 000 c...c      WriteS, jump to c...c
+                C.............C
+                S.............S
+                ...
+WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
+                C.............C
+                S.............S
+                ...
+WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
+                C.............C      size of array = C...C
+                c.............c      contents = c...c
+                ...
+Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
+                c.............c      branch to (RRR+1)th address
+Read1:          01110 RRR ...        read 1-byte to RRR
+Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
+ReadBranch:     10000 RRR C...C      Read1 and Branch
+                c.............c
+                ...
+Write1:         10001 RRR .....      write 1-byte RRR
+Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
+WriteC:         10011 000 .....      write 1-char C...CC
+                C.............C
+WriteS:         10100 000 .....      write C..-byte of string
+                C.............C
+                S.............S
+                ...
+WriteA:         10101 RRR .....      write array[RRR]
+                C.............C      size of array = C...C
+                c.............c      contents = c...c
+                ...
+End:            10110 000 .....      terminate the execution
+SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
+                ..........AAAAA
+SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
+                c.............c
+                ..........AAAAA
+SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
+                ..........AAAAA
+SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
+                c.............c
+                ..........AAAAA
+SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
+                ............Rrr
+                ..........AAAAA
+JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
+                C.............C
+                ..........AAAAA
+JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
+                ............rrr
+                ..........AAAAA
+ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
+                C.............C
+                ..........AAAAA
+ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
+                ............rrr
+                ..........AAAAA
+@end example
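+As a sketch of how the first word of an instruction might be split into
+its fields, assuming that the layout ``XXXXXXXXXXXXXXX RRR TTTTT'' is
+written from most significant to least significant bit (so that the
+operator type sits in the low five bits).  That bit order, and the names
+@code{ccl_fields} and @code{ccl_decode_word}, are assumptions made for
+this example.

```c
#include <assert.h>

/* Hypothetical container for the three fields of an instruction word. */
struct ccl_fields
{
  unsigned int type;      /* TTTTT: operator type, 5 bits */
  unsigned int reg;       /* RRR: register number, 3 bits */
  unsigned int operand;   /* XXXXXXXXXXXXXXX: constant, address or rrr */
};

/* Split WORD into its fields, assuming TTTTT is in bits 0-4,
   RRR in bits 5-7 and the 15-bit operand in bits 8-22. */
struct ccl_fields
ccl_decode_word (unsigned long word)
{
  struct ccl_fields f;

  f.type = (unsigned int) (word & 0x1F);
  f.reg = (unsigned int) ((word >> 5) & 0x7);
  f.operand = (unsigned int) ((word >> 8) & 0x7FFF);
  assert (f.type <= 31 && f.reg <= 7);
  return f;
}
```

+Under this assumption, a SetCS instruction (type 00000) that loads the
+constant 5 into register 3 would be the word
+@code{(5 << 8) | (3 << 5)}, i.e. 1376.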
+@node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
+@chapter The Lisp Reader and Compiler
+Not yet documented.
+@node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
+@chapter Lstreams
+An @dfn{lstream} is an internal Lisp object that provides a generic
+buffering stream implementation.  Conceptually, you send data to the
+stream or read data from the stream, not caring what's on the other end
+of the stream.  The other end could be another stream, a file
+descriptor, a stdio stream, a fixed block of memory, a reallocating
+block of memory, etc.  The main purpose of the stream is to provide a
+standard interface and to do buffering.  Macros are defined to read or
+write characters, so the calling functions do not have to worry about
+blocking data together in order to achieve efficiency.
+@menu
+* Creating an Lstream::         Creating an lstream object.
+* Lstream Types::               Different sorts of things that are streamed.
+* Lstream Functions::           Functions for working with lstreams.
+* Lstream Methods::             Creating new lstream types.
+@end menu
+@node Creating an Lstream
+@section Creating an Lstream
+Lstreams come in different types, depending on what is being interfaced
+to.  Although the primitive for creating new lstreams is
+@code{Lstream_new()}, generally you do not call this directly.  Instead,
+you call some type-specific creation function, which creates the lstream
+and initializes it as appropriate for the particular type.
+All lstream creation functions take a @var{mode} argument, specifying
+what mode the lstream should be opened as.  This controls whether the
+lstream is for input or output, and optionally whether data should be
+blocked up in units of MULE characters.  Note that some types of
+lstreams can only be opened for input; others only for output; and
+others can be opened either way.  #### Richard Mlynarik thinks that
+there should be a strict separation between input and output streams,
+and he's probably right.
+@var{mode} is a string, one of
+@table @code
+@item "r"
+Open for reading.
+@item "w"
+Open for writing.
+@item "rc"
+Open for reading, but ``read'' never returns partial MULE characters.
+@item "wc"
+Open for writing, but never writes partial MULE characters.
+@end table
+@node Lstream Types
+@section Lstream Types
+@table @asis
+@item stdio
+@item filedesc
+@item lisp-string
+@item fixed-buffer
+@item resizing-buffer
+@item dynarr
+@item lisp-buffer
+@item print
+@item decoding
+@item encoding
+@end table
+@node Lstream Functions
+@section Lstream Functions
+@deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
+Allocate and return a new Lstream.  This function is not really meant to
+be called directly; rather, each stream type should provide its own
+stream creation function, which creates the stream and does any other
+necessary creation stuff (e.g. opening a file).
+@end deftypefun
+@deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
+Change the buffering of a stream.  See @file{lstream.h}.  By default the
+buffering is @code{STREAM_BLOCK_BUFFERED}.
+@end deftypefun
+@deftypefun int Lstream_flush (Lstream *@var{lstr})
+Flush out any pending unwritten data in the stream.  Clear any buffered
+input data.  Returns 0 on success, -1 on error.
+@end deftypefun
+@deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
+Write out one byte to the stream.  This is a macro and so it is very
+efficient.  The @var{c} argument is only evaluated once but the @var{stream}
+argument is evaluated more than once.  Returns 0 on success, -1 on
+error.
+@end deftypefn
+@deftypefn Macro int Lstream_getc (Lstream *@var{stream})
+Read one byte from the stream.  This is a macro and so it is very
+efficient.  The @var{stream} argument is evaluated more than once.  Return
+value is -1 for EOF or error.
+@end deftypefn
+@deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
+Push one byte back onto the input queue.  This will be the next byte
+read from the stream.  Any number of bytes can be pushed back and will
+be read in the reverse order they were pushed back -- most recent
+first. (This is necessary for consistency -- if there are a number of
+bytes that have been unread and I read and unread a byte, it needs to be
+the first to be read again.) This is a macro and so it is very
+efficient.  The @var{c} argument is only evaluated once but the @var{stream}
+argument is evaluated more than once.
+@end deftypefn
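+The last-in, first-out behaviour of pushed-back bytes can be shown with
+a toy stream.  Everything here (@code{toy_stream}, @code{toy_getc},
+@code{toy_ungetc} and the fixed-size pushback array) is hypothetical
+and much simpler than a real lstream.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNGET_MAX 16

/* A toy input stream over a fixed block of memory with a pushback stack. */
struct toy_stream
{
  const unsigned char *data;
  size_t len;
  size_t pos;
  unsigned char unget[TOY_UNGET_MAX];
  size_t unget_count;
};

/* Return the next byte, preferring pushed-back bytes; -1 at end of data. */
int
toy_getc (struct toy_stream *s)
{
  if (s->unget_count > 0)
    return s->unget[--s->unget_count];   /* most recently pushed first */
  if (s->pos < s->len)
    return s->data[s->pos++];
  return -1;
}

/* Push one byte back; it will be the next byte read. */
void
toy_ungetc (struct toy_stream *s, int c)
{
  assert (s->unget_count < TOY_UNGET_MAX);
  s->unget[s->unget_count++] = (unsigned char) c;
}
```

+Pushing back @samp{a} and then @samp{b} makes the next two reads return
+@samp{b} and then @samp{a}, after which reading resumes from the
+underlying data.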
+@deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
+@deftypefunx int Lstream_fgetc (Lstream *@var{stream})
+@deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
+Function equivalents of the above macros.
+@end deftypefun
+@deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
+Read @var{size} bytes of @var{data} from the stream.  Return the number
+of bytes read.  0 means EOF. -1 means an error occurred and no bytes
+were read.
+@end deftypefun
+@deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
+Write @var{size} bytes of @var{data} to the stream.  Return the number
+of bytes written.  -1 means an error occurred and no bytes were written.
+@end deftypefun
+@deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
+Push back @var{size} bytes of @var{data} onto the input queue.  The next
+call to @code{Lstream_read()} with the same size will read the same
+bytes back.  Note that this will be the case even if there is other
+pending unread data.
+@end deftypefun
+@deftypefun int Lstream_close (Lstream *@var{stream})
+Close the stream.  All data will be flushed out.
+@end deftypefun
+@deftypefun void Lstream_reopen (Lstream *@var{stream})
+Reopen a closed stream.  This enables I/O on it again.  This is not
+meant to be called except from a wrapper routine that reinitializes
+variables and such -- the close routine may well have freed some
+necessary storage structures, for example.
+@end deftypefun
+@deftypefun void Lstream_rewind (Lstream *@var{stream})
+Rewind the stream to the beginning.
+@end deftypefun
+@node Lstream Methods
+@section Lstream Methods
+@deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
+Read some data from the stream's end and store it into @var{data}, which
+can hold @var{size} bytes.  Return the number of bytes read.  A return
+value of 0 means no bytes can be read at this time.  This may be because
+of an EOF, or because there is a granularity greater than one byte that
+the stream imposes on the returned data, and @var{size} is less than
+this granularity. (This will happen frequently for streams that need to
+return whole characters, because @code{Lstream_read()} calls the reader
+function repeatedly until it has the number of bytes it wants or until 0
+is returned.)  The lstream functions do not treat a 0 return as EOF or
+do anything special; however, the calling function will interpret any 0
+it gets back as EOF.  This will normally not happen unless the caller
+calls @code{Lstream_read()} with a very small size.
+This function can be @code{NULL} if the stream is output-only.
+@end deftypefn
+@deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, size_t @var{size})
+Send some data to the stream's end.  Data to be sent is in @var{data}
+and is @var{size} bytes.  Return the number of bytes sent.  This
+function can send and return fewer bytes than are passed in; in that
+case, the function will just be called again until there is no data left
+or 0 is returned.  A return value of 0 means that no more data can be
+currently stored, but there is no error; the data will be squirreled
+away until the writer can accept data. (This is useful, e.g., if you're
+dealing with a non-blocking file descriptor and are getting
+@code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
+stream is input-only.
+@end deftypefn
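+The calling convention described for the writer method amounts to a
+loop like the one below.  This is a sketch only: @code{toy_writer},
+@code{toy_write_all}, @code{toy_sink} and @code{toy_sink_writer} are
+hypothetical, @code{long} stands in for @code{ssize_t}, and where a
+real lstream would keep the unsent data buffered, this version simply
+reports how much was sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical writer signature: returns bytes accepted, 0 if no more
   data can be stored right now, or a negative value on error. */
typedef long (*toy_writer) (void *sink, const unsigned char *data,
                            size_t size);

/* Call WRITER repeatedly until all SIZE bytes are sent or it returns 0.
   Return the number of bytes sent, or -1 if an error occurred before
   any byte was sent. */
long
toy_write_all (toy_writer writer, void *sink,
               const unsigned char *data, size_t size)
{
  size_t sent = 0;

  assert (writer != NULL);
  while (sent < size)
    {
      long n = writer (sink, data + sent, size - sent);
      if (n < 0)
        return sent > 0 ? (long) sent : -1;
      if (n == 0)
        break;                  /* not an error; try again later */
      sent += (size_t) n;
    }
  return (long) sent;
}

/* Example sink: an 8-byte block that accepts at most 3 bytes per call. */
struct toy_sink
{
  unsigned char buf[8];
  size_t used;
};

long
toy_sink_writer (void *sink, const unsigned char *data, size_t size)
{
  struct toy_sink *s = sink;
  size_t room = sizeof s->buf - s->used;
  size_t n = size < 3 ? size : 3;

  if (n > room)
    n = room;
  memcpy (s->buf + s->used, data, n);
  s->used += n;
  return (long) n;
}
```

+Writing five bytes to an empty @code{toy_sink} takes two writer calls
+(three bytes, then two); once the sink is full the writer returns 0 and
+the loop stops without reporting an error.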
+@deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
+Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
+@end deftypefn
+@deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
+Indicate whether this stream is seekable -- i.e. it can be rewound.
+This method is ignored if the stream does not have a rewind method.  If
+this method is not present, the result is determined by whether a rewind
+method is present.
+@end deftypefn
+@deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
+Perform any additional operations necessary to flush the data in this
+stream.
+@end deftypefn
+@deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
+@end deftypefn
+@deftypefn {Lstream Method} int closer (Lstream *@var{stream})
+Perform any additional operations necessary to close this stream down.
+May be @code{NULL}.  This function is called when @code{Lstream_close()}
+is called or when the stream is garbage-collected.  When this function
+is called, all pending data in the stream will already have been written
+out.
+@end deftypefn
+@deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
+Mark this object for garbage collection.  Same semantics as a standard
+@code{Lisp_Object} marker.  This function can be @code{NULL}.
+@end deftypefn
+@node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
+@chapter Consoles; Devices; Frames; Windows
+@menu
+* Introduction to Consoles; Devices; Frames; Windows::
+* Point::
+* Window Hierarchy::
+* The Window Object::
+@end menu
+@node Introduction to Consoles; Devices; Frames; Windows
+@section Introduction to Consoles; Devices; Frames; Windows
+A window-system window that you see on the screen is called a
+@dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
+more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
+window displays the text of a buffer in it. (See above on Buffers.) Note
+that buffers and windows are independent entities: Two or more windows
+can be displaying the same buffer (potentially in different locations),
+and a buffer can be displayed in no windows.
+A single display screen that contains one or more frames is called
+a @dfn{display} (represented in XEmacs by a @dfn{device} object).  Under
+most circumstances, there is only one display.
+However, more than one display can exist, for example if you have
+a @dfn{multi-headed} console, i.e. one with a single keyboard but
+multiple displays. (Typically in such a situation, the various
+displays act like one large display, in that the mouse is only
+in one of them at a time, and moving the mouse off of one moves
+it into another.) In some cases, the different displays will
+have different characteristics, e.g. one color and one mono.
+XEmacs can display frames on multiple displays.  It can even deal
+simultaneously with frames on multiple keyboards (called @dfn{consoles} in
+XEmacs terminology).  Here is one case where this might be useful: You
+are using XEmacs on your workstation at work, and leave it running.
+Then you go home and dial in on a TTY line, and you can use the
+already-running XEmacs process to display another frame on your local
+TTY.
+Thus, there is a hierarchy console -> display -> frame -> window.
+There is a separate Lisp object type for each of these four concepts.
+Furthermore, there is logically a @dfn{selected console},
+@dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
+Each of these objects is distinguished in various ways, such as being the
+default object for various functions that act on objects of that type.
+Note that every containing object remembers the ``selected'' object
+among the objects that it contains: e.g. not only is there a selected
+window, but every frame remembers the last window in it that was
+selected, and changing the selected frame causes the remembered window
+within it to become the selected window.  Similar relationships apply
+for consoles to devices and devices to frames.
+@node Point
+@section Point
+Recall that every buffer has a current insertion position, called
+@dfn{point}.  Now, two or more windows may be displaying the same buffer,
+and the text cursor in the two windows (i.e. @code{point}) can be in
+two different places.  You may ask, how can that be, since each
+buffer has only one value of @code{point}?  The answer is that each window
+also has a value of @code{point} that is squirreled away in it.  There
+is only one selected window, and the value of ``point'' in that buffer
+corresponds to that window.  When the selected window is changed
+from one window to another displaying the same buffer, the old
+value of @code{point} is stored into the old window's ``point'' and the
+value of @code{point} from the new window is retrieved and made the
+value of @code{point} in the buffer.  This means that @code{window-point}
+for the selected window is potentially inaccurate, and if you
+want to retrieve the correct value of @code{point} for a window,
+you must special-case on the selected window and retrieve the
+buffer's point instead.  This is related to why @code{save-window-excursion}
+does not save the selected window's value of @code{point}.
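+The swap described above can be sketched in C.  The types and the
+functions @code{select_window} and @code{accurate_window_point} are
+hypothetical simplifications for illustration, not the actual XEmacs
+code; only the @code{pointm} field name is taken from the window object.
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration only. */
struct buffer { long point; };
struct window { struct buffer *buffer; long pointm; /* stashed point */ };

struct window *selected_window;

/* Make NEW_WIN the selected window, swapping point values as described:
   stash the buffer's point in the old window, then load the new
   window's stashed point into its buffer. */
void
select_window (struct window *new_win)
{
  assert (new_win != NULL);
  if (selected_window != NULL)
    selected_window->pointm = selected_window->buffer->point;
  selected_window = new_win;
  new_win->buffer->point = new_win->pointm;
}

/* The correct value of point for W.  The stashed value may be stale
   for the selected window, so that case reads the buffer instead. */
long
accurate_window_point (const struct window *w)
{
  return w == selected_window ? w->buffer->point : w->pointm;
}
```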
+@node Window Hierarchy
+@section Window Hierarchy
+@cindex window hierarchy
+@cindex hierarchy of windows
+If a frame contains multiple windows (panes), they are always created
+by splitting an existing window along the horizontal or vertical axis.
+Terminology is a bit confusing here: to @dfn{split a window
+horizontally} means to create two side-by-side windows, i.e. to make a
+@emph{vertical} cut in a window.  Likewise, to @dfn{split a window
+vertically} means to create two windows, one above the other, by making
+a @emph{horizontal} cut.
+If you split a window and then split again along the same axis, you
+will end up with a number of panes all arranged along the same axis.
+The precise way in which the splits were made should not be important,
+and this is reflected internally.  Internally, all windows are arranged
+in a tree, consisting of two types of windows, @dfn{combination} windows
+(which have children, and are covered completely by those children) and
+@dfn{leaf} windows, which have no children and are visible.  Every
+combination window has two or more children, all arranged along the same
+axis.  There are (logically) two subtypes of combination windows, depending
+on whether their children are horizontally or vertically arrayed.  There is
+always one root window, which is either a leaf window (if the frame
+contains only one window) or a combination window (if the frame contains
+more than one window).  In the latter case, the root window will have
+two or more children, either horizontally or vertically arrayed, and
+each of those children will be either a leaf window or another
+combination window.
+Here are some rules:
+@enumerate
+@item
+Horizontal combination windows can never have children that are
+horizontal combination windows; same for vertical.
+@item
+Only leaf windows can be split (obviously) and this splitting does one
+of two things: (a) turns the leaf window into a combination window and
+creates two new leaf children, or (b) turns the leaf window into one of
+the two new leaves and creates the other leaf.  Rule (1) dictates which
+of these two outcomes happens.
+@item
+Every combination window must have at least two children.
+@item
+Leaf windows can never become combination windows.  They can be deleted,
+however.  If this results in a violation of (3), the parent combination
+window also gets deleted.
+@item
+All functions that accept windows must be prepared to accept combination
+windows, and do something sane (e.g. signal an error if so).
+Combination windows @emph{do} escape to the Lisp level.
+@item
+All windows have three fields governing their contents:
+these are @dfn{hchild} (a list of horizontally-arrayed children),
+@dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
+(the buffer contained in a leaf window).  Exactly one of
+these will be non-nil.  Remember that @dfn{horizontally-arrayed}
+means ``side-by-side'' and @dfn{vertically-arrayed} means
+``one above the other''.
+@item
+Leaf windows also have markers in their @code{start} (the
+first buffer position displayed in the window) and @code{pointm}
+(the window's stashed value of @code{point} -- see above) fields,
+while combination windows have nil in these fields.
+@item
+The list of children for a window is threaded through the
+@code{next} and @code{prev} fields of each child window.
+@item
+@strong{Deleted windows can be undeleted}.  This happens as a result of
+restoring a window configuration, and is unlike frames, displays, and
+consoles, which, once deleted, can never be restored.  Deleting a window
+does nothing except set a special @code{dead} bit to 1 and clear out the
+@code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
+GC purposes.
+@item
+Most frames actually have two top-level windows -- one for the
+minibuffer and one (the @dfn{root}) for everything else.  The modeline
+(if present) separates these two.  The @code{next} field of the root
+points to the minibuffer, and the @code{prev} field of the minibuffer
+points to the root.  The other @code{next} and @code{prev} fields are
+@code{nil}, and the frame points to both of these windows.
+Minibuffer-less frames have no minibuffer window, and the @code{next}
+and @code{prev} of the root window are @code{nil}.  Minibuffer-only
+frames have no root window, and the @code{next} of the minibuffer window
+is @code{nil} but the @code{prev} points to itself. (#### This is an
+artifact that should be fixed.)
+@end enumerate
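+The splitting rules can be sketched in C as follows.  This is only an
+illustration under simplifying assumptions: the pared-down
+@code{struct window} and the functions @code{make_leaf} and
+@code{split_window} are hypothetical, and @code{hchild} and
+@code{vchild} here point at the first child of the sibling chain.  When
+the parent is not already arrayed along the requested axis, the sketch
+allocates a fresh combination window that takes over the leaf's place in
+the tree, so that the leaf itself stays a leaf; otherwise it simply adds
+a sibling.
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, pared-down window; the real one has many more fields. */
struct window
{
  struct window *parent, *next, *prev;
  struct window *hchild;   /* first side-by-side child, or NULL */
  struct window *vchild;   /* first stacked child, or NULL */
  int buffer;              /* nonzero buffer id in a leaf, 0 otherwise */
};

struct window *
make_leaf (int buffer)
{
  struct window *w = calloc (1, sizeof *w);
  assert (w != NULL);
  w->buffer = buffer;
  return w;
}

/* Split leaf W side by side if HORIZONTAL is nonzero, else one above
   the other.  Returns the new leaf; *ROOT is updated if W was the root. */
struct window *
split_window (struct window **root, struct window *w, int horizontal)
{
  struct window *p = w->parent;
  struct window *new_leaf;
  int same_axis;

  assert (w->buffer != 0);      /* only leaves can be split */
  same_axis = p != NULL && (horizontal ? p->hchild != NULL
                                       : p->vchild != NULL);
  if (!same_axis)
    {
      /* Wrap W in a new combination window that takes W's place. */
      struct window *comb = calloc (1, sizeof *comb);
      assert (comb != NULL);
      comb->parent = p;
      comb->next = w->next;
      comb->prev = w->prev;
      if (comb->next) comb->next->prev = comb;
      if (comb->prev) comb->prev->next = comb;
      if (p && p->hchild == w) p->hchild = comb;
      if (p && p->vchild == w) p->vchild = comb;
      if (*root == w) *root = comb;
      if (horizontal) comb->hchild = w; else comb->vchild = w;
      w->parent = comb;
      w->next = w->prev = NULL;
    }
  /* Add the new leaf as W's next sibling on the shared axis. */
  new_leaf = make_leaf (w->buffer);
  new_leaf->parent = w->parent;
  new_leaf->prev = w;
  new_leaf->next = w->next;
  if (w->next) w->next->prev = new_leaf;
  w->next = new_leaf;
  return new_leaf;
}
```
+Splitting twice along the same axis therefore yields three siblings
+under one combination window, never a nested combination window of the
+same orientation, which is what rule (1) requires.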
+@node The Window Object
+@section The Window Object
+Windows have the following accessible fields:
+@table @code
+@item frame
+The frame that this window is on.
+@item mini_p
+Non-@code{nil} if this window is a minibuffer window.
+@item buffer
+The buffer that the window is displaying.  This may change often during
+the life of the window.
+@item dedicated
+Non-@code{nil} if this window is dedicated to its buffer.
+@item pointm
+@cindex window point internals
+This is the value of point in the current buffer when this window is
+selected; when it is not selected, it retains its previous value.
+@item start
+The position in the buffer that is the first character to be displayed
+in the window.
+@item force_start
+If this flag is non-@code{nil}, it says that the window has been
+scrolled explicitly by the Lisp program.  This affects what the next
+redisplay does if point is off the screen: instead of scrolling the
+window to show the text around point, it moves point to a location that
+is on the screen.
+@item last_modified
+The @code{modified} field of the window's buffer, as of the last time
+a redisplay completed in this window.
+@item last_point
+The buffer's value of point, as of the last time
+a redisplay completed in this window.
+@item left
+This is the left-hand edge of the window, measured in columns.  (The
+leftmost column on the screen is @w{column 0}.)
+@item top
+This is the top edge of the window, measured in lines.  (The top line on
+the screen is @w{line 0}.)
+@item height
+The height of the window, measured in lines.
+@item width
+The width of the window, measured in columns.
+@item next
+This is the window that is the next in the chain of siblings.  It is
+@code{nil} in a window that is the rightmost or bottommost of a group of
+siblings.
+@item prev
+This is the window that is the previous in the chain of siblings.  It is
+@code{nil} in a window that is the leftmost or topmost of a group of
+siblings.
+@item parent
+Internally, XEmacs arranges windows in a tree; each group of siblings has
+a parent window whose area includes all the siblings.  This field points
+to a window's parent.
+Parent windows do not display buffers, and play little role in display
+except to shape their child windows.  Emacs Lisp programs usually have
+no access to the parent windows; they operate on the windows at the
+leaves of the tree, which actually display buffers.
+@item hscroll
+This is the number of columns that the display in the window is scrolled
+horizontally to the left.  Normally, this is 0.
+@item use_time
+This is the last time that the window was selected.  The function
+@code{get-lru-window} uses this field.
+@item display_table
+The window's display table, or @code{nil} if none is specified for it.
+@item update_mode_line
+Non-@code{nil} means this window's mode line needs to be updated.
+@item base_line_number
+The line number of a certain position in the buffer, or @code{nil}.
+This is used for displaying the line number of point in the mode line.
+@item base_line_pos
+The position in the buffer for which the line number is known, or
+@code{nil} meaning none is known.
+@item region_showing
+If the region (or part of it) is highlighted in this window, this field
+holds the mark position that made one end of that region.  Otherwise,
+this field is @code{nil}.
+@end table
+@node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
+@chapter The Redisplay Mechanism
+The redisplay mechanism is one of the most complicated sections of
+XEmacs, especially from a conceptual standpoint.  This is doubly so
+because, unlike for the basic aspects of the Lisp interpreter, the
+computer science theories of how to efficiently handle redisplay are not
+well-developed.
+When working with the redisplay mechanism, remember the Golden Rules
+of Redisplay:
+@enumerate
+@item
+It Is Better To Be Correct Than Fast.
+@item
+Thou Shalt Not Run Elisp From Within Redisplay.
+@item
+It Is Better To Be Fast Than Not To Be.
+@end enumerate
+@menu
+* Critical Redisplay Sections::
+* Line Start Cache::
+* Redisplay Piece by Piece::
+@end menu
+@node Critical Redisplay Sections
+@section Critical Redisplay Sections
+@cindex critical redisplay sections
+Within this section, we are defenseless and assume that the
+following cannot happen:
+@enumerate
+@item
+garbage collection
+@item
+Lisp code evaluation
+@item
+frame size changes
+@end enumerate
+We ensure (3) by calling @code{hold_frame_size_changes()}, which
+will cause any pending frame size changes to get put on hold
+till after the end of the critical section.  (1) follows
+automatically if (2) is met.  #### Unfortunately, there are
+some places where Lisp code can be called within this section.
+We need to remove them.
+If @code{Fsignal()} is called during this critical section, we
+will @code{abort()}.
+If garbage collection is called during this critical section,
+we simply return. #### We should abort instead.
+#### If a frame-size change does occur we should probably
+actually be preempting redisplay.
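+One way to picture putting frame size changes ``on hold'' is a nesting
+counter plus a deferred request, as in the C sketch below.  It does not
+reproduce the implementation of @code{hold_frame_size_changes()}; the
+structure and the functions @code{request_frame_size},
+@code{hold_size_changes} and @code{unhold_size_changes} are hypothetical
+names chosen for illustration.
```c
#include <assert.h>

/* Hypothetical frame: a current size and at most one deferred resize. */
struct frame
{
  int width, height;
  int pending, pending_width, pending_height;
};

int size_changes_held;   /* nesting depth of critical sections */

/* Apply a resize immediately, or remember it if changes are on hold. */
void
request_frame_size (struct frame *f, int w, int h)
{
  if (size_changes_held > 0)
    {
      f->pending = 1;
      f->pending_width = w;
      f->pending_height = h;
    }
  else
    {
      f->width = w;
      f->height = h;
    }
}

void
hold_size_changes (void)
{
  size_changes_held++;
}

/* Leave a critical section; apply the deferred resize once the
   outermost section has ended. */
void
unhold_size_changes (struct frame *f)
{
  assert (size_changes_held > 0);
  if (--size_changes_held == 0 && f->pending)
    {
      f->pending = 0;
      f->width = f->pending_width;
      f->height = f->pending_height;
    }
}
```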
+@node Line Start Cache
+@section Line Start Cache
+@cindex line start cache
+The traditional scrolling code in Emacs breaks in a variable height
+world.  It depends on the key assumption that the number of lines that
+can be displayed at any given time is fixed.  This led to a complete
+separation of the scrolling code from the redisplay code.  In order to
+fully support variable height lines, the scrolling code must actually be
+tightly integrated with redisplay.  Only redisplay can determine how
+many lines will be displayed on a screen for any given starting point.
+What is ideally wanted is a complete list of the starting buffer
+position for every possible display line of a buffer along with the
+height of that display line.  Maintaining such a full list would be very
+expensive.  We settle for having it include information for all areas
+which we happen to generate anyhow (i.e. the region currently being
+displayed) and for those areas we need to work with.
+In order to ensure that the cache accurately represents what redisplay
+would actually show, it is necessary to invalidate it in many
+situations.  If the buffer changes, the starting positions may no longer
+be correct.  If a face or an extent has changed then the line heights
+may have altered.  These events happen frequently enough that the cache
+can end up being constantly disabled.  With this potentially constant
+invalidation, when is the cache ever useful?
+Even if the cache is invalidated before every single usage, it is
+necessary.  Scrolling often requires knowledge about display lines which
+are actually above or below the visible region.  The cache provides a
+convenient light-weight method of storing this information for multiple
+display regions.  This knowledge is necessary for the scrolling code to
+always obey the First Golden Rule of Redisplay.
+If the cache already contains all of the information that the scrolling
+routines happen to need so that it doesn't have to go generate it, then
+we are able to obey the Third Golden Rule of Redisplay.  The first thing
+we do to help out the cache is to always add the displayed region.  This
+region had to be generated anyway, so the cache ends up getting the
+information basically for free.  In those cases where a user is simply
+scrolling around viewing a buffer there is a high probability that this
+is sufficient to always provide the needed information.  The second
+thing we can do is be smart about invalidating the cache.
+TODO -- Be smart about invalidating the cache.  Potential places:
+@itemize @bullet
+@item
+Insertions at end-of-line which don't cause line-wraps do not alter the
+starting positions of any display lines.  These types of buffer
+modifications should not invalidate the cache.  This is actually a large
+optimization for redisplay speed as well.
+@item
+Buffer modifications frequently only affect the display of lines at and
+below where they occur.  In these situations we should only invalidate
+the part of the cache starting at where the modification occurs.
+@end itemize
+In case you're wondering, the Second Golden Rule of Redisplay is not
+applicable.
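+The second TODO item, invalidating only from the point of modification
+onward, might look like the following C sketch.  The cache layout and the
+function @code{invalidate_from} are assumptions made for illustration
+and are not the existing line start cache code.  The sketch is
+deliberately conservative: a cached line is kept only if the next cached
+line starts strictly before the modification, so the last cached line
+(whose end is unknown) is always dropped.
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: where a display line starts, and its height. */
struct line_start { long start; int height; };

/* A cache covering consecutive display lines, sorted by START. */
struct line_start_cache
{
  struct line_start entries[64];
  size_t count;
};

/* Drop every cached line that a modification at MOD_POS could affect.
   Line I ends where line I+1 starts, so line I is known to lie wholly
   before MOD_POS only if entries[I+1].start < MOD_POS. */
void
invalidate_from (struct line_start_cache *c, long mod_pos)
{
  size_t keep = 0;

  assert (c != NULL);
  while (keep + 1 < c->count && c->entries[keep + 1].start < mod_pos)
    keep++;
  c->count = keep;
}
```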
+@node Redisplay Piece by Piece
+@section Redisplay Piece by Piece
+@cindex Redisplay Piece by Piece
+As you can begin to see, redisplay is complex and also not well
+documented.  Chuck no longer works on XEmacs, so this section is my take
+on the workings of redisplay.
+Redisplay happens in three phases:
+@enumerate
+@item
+Determine the desired display in the area that needs redisplay.
+Implemented by @code{redisplay.c}.
+@item
+Compare the desired display with the current display.
+Implemented by @code{redisplay-output.c}.
+@item
+Output the changes.  Implemented by @code{redisplay-output.c},
+@code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}.
+@end enumerate
+Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
+mostly device-dependent.
+Determining the desired display
+Display attributes are stored in @code{display_line} structures. Each
+@code{display_line} consists of a set of @code{display_block}'s and each
+@code{display_block} contains a number of @code{rune}'s. Generally
+dynarr's of @code{display_line}'s are held by each window representing
+the current display and the desired display.
+The @code{display_line} structures are tightly tied to buffers, which
+presents a problem for redisplay as this connection is bogus for the
+modeline. Hence the @code{display_line} generation routines are
+duplicated for generating the modeline. This means that the modeline
+display code has many bugs that the standard redisplay code does not.
+The guts of @code{display_line} generation are in
+@code{create_text_block}, which creates a single display line for the
+desired locale. This incrementally parses the characters on the current
+line and generates redisplay structures for each.
+Gutter redisplay is different. Because the data to display is stored in
+a string we cannot use @code{create_text_block}. Instead we use
+@code{create_text_string_block} which performs the same function as
+@code{create_text_block} but for strings. Many of the complexities of
+@code{create_text_block} to do with cursor handling and selective
+display have been removed.
+@node Extents, Faces, The Redisplay Mechanism, Top
+@chapter Extents
+@menu
+* Introduction to Extents::     Extents are ranges over text, with properties.
+* Extent Ordering::             How extents are ordered internally.
+* Format of the Extent Info::   The extent information in a buffer or string.
+* Zero-Length Extents::         A weird special case.
+* Mathematics of Extent Ordering::      A rigorous foundation.
+* Extent Fragments::            Cached information useful for redisplay.
+@end menu
+@node Introduction to Extents
+@section Introduction to Extents
+Extents are regions over a buffer, with a start and an end position
+denoting the region of the buffer included in the extent.  In
+addition, either end can be closed or open, meaning that the endpoint
+is or is not logically included in the extent.  Insertion of a character
+at a closed endpoint causes the character to go inside the extent;
+insertion at an open endpoint causes the character to go outside.
+Extent endpoints are stored using memory indices (see @file{insdel.c}),
+to minimize the amount of adjusting that needs to be done when
+characters are inserted or deleted.
+(Formerly, extent endpoints at the gap could be either before or
+after the gap, depending on the open/closedness of the endpoint.
+The intent of this was to make it so that insertions would
+automatically go inside or out of extents as necessary with no
+further work needing to be done.  It didn't work out that way,
+however, and just ended up complexifying and buggifying all the
+rest of the code.)
+@node Extent Ordering
+@section Extent Ordering
+Extents are compared using memory indices.  There are two orderings
+for extents and both orders are kept current at all times.  The normal
+or @dfn{display} order is as follows:
+@example
+Extent A is ``less than'' extent B,
+that is, earlier in the display order,
+if:    A-start < B-start,
+or if: A-start = B-start, and A-end > B-end
+@end example
+So if two extents begin at the same position, the larger of them is the
+earlier one in the display order (@code{EXTENT_LESS} is true).
+For the e-order, the same thing holds:
+@example
+Extent A is ``less than'' extent B in e-order,
+that is, earlier in the e-order,
+if:    A-end < B-end,
+or if: A-end = B-end, and A-start > B-start
+@end example
+So if two extents end at the same position, the smaller of them is the
+earlier one in the e-order (@code{EXTENT_E_LESS} is true).
+The display order and the e-order are complementary orders: any
+theorem about the display order also applies to the e-order if you swap
+all occurrences of ``display order'' and ``e-order'', ``less than'' and
+``greater than'', and ``extent start'' and ``extent end''.
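+The two orderings can be written as a pair of C predicates.  This is a
+sketch under assumptions: the minimal @code{struct extent} and the
+function names @code{extent_less} and @code{extent_e_less} are
+hypothetical stand-ins for what @code{EXTENT_LESS} and
+@code{EXTENT_E_LESS} test, not their actual definitions.
```c
#include <assert.h>

/* Hypothetical minimal extent: just the two memory indices. */
struct extent { long start, end; };

/* Display order: earlier start first; for equal starts, the larger
   extent (later end) comes first. */
int
extent_less (const struct extent *a, const struct extent *b)
{
  assert (a->start <= a->end && b->start <= b->end);
  return a->start < b->start
    || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; for equal ends, the smaller extent
   (later start) comes first. */
int
extent_e_less (const struct extent *a, const struct extent *b)
{
  assert (a->start <= a->end && b->start <= b->end);
  return a->end < b->end
    || (a->end == b->end && a->start > b->start);
}
```
+Note how the second predicate is the first with start and end exchanged,
+which is the symmetry the preceding paragraph describes.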
+@node Format of the Extent Info
+@section Format of the Extent Info
+An extent-info structure consists of a list of the buffer or string's
+extents and a @dfn{stack of extents} that lists all of the extents over
+a particular position.  The stack-of-extents info is used for
+optimization purposes -- it basically caches some info that might
+be expensive to compute.  Certain otherwise hard computations are easy
+given the stack of extents over a particular position, and if the
+stack of extents over a nearby position is known (because it was
+calculated at some prior point in time), it's easy to move the stack
+of extents to the proper position.
+Given that the stack of extents is an optimization, and given that
+it requires memory, a string's stack of extents is wiped out each
+time a garbage collection occurs.  Therefore, any time you retrieve
+the stack of extents, it might not be there.  If you need it to
+be there, use the @code{_force} version.
+Similarly, a string may or may not have an extent_info structure.
+(Generally it won't if there haven't been any extents added to the
+string.) So use the @code{_force} version if you need the extent_info
+structure to be there.
+A list of extents is maintained as a double gap array: one gap array
+is ordered by start index (the @dfn{display order}) and the other is
+ordered by end index (the @dfn{e-order}).  Note that positions in an
+extent list should logically be conceived of as referring @emph{to} a
+particular extent (as is the norm in programs) rather than sitting
+between two extents.  Note also that callers of these functions should
+not be aware of the fact that the extent list is implemented as an
+array, except for the fact that positions are integers (this should be
+generalized to handle integers and linked lists equally well).
+@node Zero-Length Extents
+@section Zero-Length Extents
+Extents can be zero-length, and will end up that way if their endpoints
+are explicitly set that way or if their detachable property is nil
+and all the text in the extent is deleted. (The exception is open-open
+zero-length extents, which are barred from existing because there is
+no sensible way to define their properties.  Deletion of the text in
+an open-open extent causes it to be converted into a closed-open
+extent.)  Zero-length extents are primarily used to represent
+annotations, and behave as follows:
+@enumerate
+@item
+Insertion at the position of a zero-length extent expands the extent
+if both endpoints are closed; goes after the extent if it is closed-open;
+and goes before the extent if it is open-closed.
+@item
+Deletion of a character on a side of a zero-length extent whose
+corresponding endpoint is closed causes the extent to be detached if
+it is detachable; if the extent is not detachable or the corresponding
+endpoint is open, the extent remains in the buffer, moving as necessary.
+@end enumerate
+Note that closed-open, non-detachable zero-length extents behave
+exactly like markers and that open-closed, non-detachable zero-length
+extents behave like the ``point-type'' marker in Mule.
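+Rule (1) above amounts to a small decision table, sketched here in C.
+The enumeration and the function @code{insert_at_zero_length_extent} are
+hypothetical names used only for illustration.
```c
#include <assert.h>

enum insert_result { GOES_INSIDE, GOES_AFTER_EXTENT, GOES_BEFORE_EXTENT };

/* Where text inserted at the position of a zero-length extent ends up,
   given whether each endpoint is open.  Open-open zero-length extents
   are barred from existing, so that combination is rejected. */
enum insert_result
insert_at_zero_length_extent (int start_open, int end_open)
{
  assert (!(start_open && end_open));
  if (!start_open && !end_open)
    return GOES_INSIDE;            /* closed-closed: the extent expands */
  if (end_open)
    return GOES_AFTER_EXTENT;      /* closed-open */
  return GOES_BEFORE_EXTENT;       /* open-closed */
}
```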
+@node Mathematics of Extent Ordering
+@section Mathematics of Extent Ordering
+@cindex extent mathematics
+@cindex mathematics of extents
+@cindex extent ordering
+@cindex display order of extents
+@cindex extents, display order
+The extents in a buffer are ordered by ``display order'' because that
+is the order that the redisplay mechanism needs to process them in.
+The e-order is an auxiliary ordering used to facilitate operations
+over extents.  The operations that can be performed on the ordered
+list of extents in a buffer are
+@enumerate
+@item
+Locate where an extent would go if inserted into the list.
+@item
+Insert an extent into the list.
+@item
+Remove an extent from the list.
+@item
+Map over all the extents that overlap a range.
+@end enumerate
+(4) requires being able to determine the first and last extents
+that overlap a range.
+NOTE: @dfn{overlap} is used as follows:
+@itemize @bullet
+@item
+two ranges overlap if they have at least one point in common.
+Whether the endpoints are open or closed makes a difference here.
+@item
+a point overlaps a range if the point is contained within the
+range; this is equivalent to treating a point @math{P} as the range
+@math{[P, P]}.
+@item
+In the case of an @emph{extent} overlapping a point or range, the extent
+is normally treated as having closed endpoints.  This applies
+consistently in the discussion of stacks of extents and such below.
+Note that this definition of overlap is not necessarily consistent with
+the extents that @code{map-extents} maps over, since @code{map-extents}
+sometimes pays attention to whether the endpoints of an extent are open
+or closed.  But for our purposes, it greatly simplifies things to treat
+all extents as having closed endpoints.
+@end itemize
+First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
+to mean comparison according to the display order.  Comparison between
+an extent @math{E} and an index @math{I} means comparison between
+@math{E} and the range @math{[I, I]}.
+Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
+according to the e-order.
+For any range @math{R}, define @math{R(0)} to be the starting index of
+the range and @math{R(1)} to be the ending index of the range.
+For any extent @math{E}, define @math{E(next)} to be the extent directly
+following @math{E}, and @math{E(prev)} to be the extent directly
+preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
+determined from @math{E} in constant time.  (This is because we store
+the extent list as a doubly linked list.)
+Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
+extents directly following and preceding @math{E} in the e-order.
+Now:
+Let @math{R} be a range.
+Let @math{F} be the first extent overlapping @math{R}.
+Let @math{L} be the last extent overlapping @math{R}.
+Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
+i.e. @math{L <= R(1) < L(next)}.
+This follows easily from the definition of display order.  The
+basic reason that this theorem applies is that the display order
+sorts by increasing starting index.
+Therefore, we can determine @math{L} just by looking at where we would
+insert @math{R(1)} into the list, and if we know @math{F} and are moving
+forward over extents, we can easily determine when we've hit @math{L} by
+comparing the extent we're at to @math{R(1)}.
+@example
+Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
+@end example
+This is the analog of Theorem 1, and applies because the e-order
+sorts by increasing ending index.
+Therefore, @math{F} can be found in the same amount of time as
+operation (1), i.e. the time that it takes to locate where an extent
+would go if inserted into the e-order list.
+If the lists were stored as balanced binary trees, then operation (1)
+would take logarithmic time, which is usually quite fast.  However,
+currently they're stored as simple doubly-linked lists, and instead we
+do some caching to try to speed things up.
+Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
+(ordered in the display order) that overlap an index @math{I}, together
+with the SOE's @dfn{previous} extent, which is an extent that precedes
+@math{I} in the e-order. (Hopefully there will not be very many extents
+between @math{I} and the previous extent.)
+Now:
+Let @math{I} be an index, let @math{S} be the stack of extents on
+@math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
+be @math{S}'s previous extent.
+Theorem 3: The first extent in @math{S} is the first extent that overlaps
+any range @math{[I, J]}.
+Proof: Any extent that overlaps @math{[I, J]} but does not include
+@math{I} must have a start index @math{> I}, and thus be greater than
+any extent in @math{S}.
+Therefore, finding the first extent that overlaps a range @math{R} is
+the same as finding the first extent that overlaps @math{R(0)}.
+Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
+@math{F2} be the first extent that overlaps @math{I2}.  Then, either
+@math{F2} is in @math{S} or @math{F2} is greater than any extent in
+@math{S}.
+Proof: If @math{F2} does not include @math{I} then its start index is
+greater than @math{I} and thus it is greater than any extent in
+@math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
+and thus is in @math{S}, and thus @math{F2 >= F}.
+@node Extent Fragments
+@section Extent Fragments
+@cindex extent fragment
+Imagine that the buffer is divided up into contiguous, non-overlapping
+@dfn{runs} of text such that no extent starts or ends within a run
+(extents that abut the run don't count).
+An extent fragment is a structure that holds data about the run that
+contains a particular buffer position (if the buffer position is at the
+junction of two runs, the run after the position is used) -- the
+beginning and end of the run, a list of all of the extents in that run,
+the @dfn{merged face} that results from merging all of the faces
+corresponding to those extents, the begin and end glyphs at the
+beginning of the run, etc.  This is the information that redisplay needs
+in order to display this run.
+Extent fragments have to be very quick to update to a new buffer
+position when moving linearly through the buffer.  They rely on the
+stack-of-extents code, which does the heavy-duty algorithmic work of
+determining which extents overlie a particular position.
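+To make the notion of a run concrete, here is a naive C sketch that
+finds the run around a position by scanning every extent endpoint.  It
+is an illustration only: the types and the function @code{run_at} are
+hypothetical, and the linear scan is exactly the work that the real
+code avoids by relying on the stack of extents.
```c
#include <assert.h>
#include <stddef.h>

struct extent { long start, end; };
struct run    { long start, end; };

/* Find the run around POS within [BUF_START, BUF_END]: the widest span
   containing POS in which no extent starts or ends.  A position on a
   junction belongs to the run after it. */
struct run
run_at (const struct extent *ext, size_t n,
        long buf_start, long buf_end, long pos)
{
  struct run r = { buf_start, buf_end };
  size_t i;

  assert (buf_start <= pos && pos <= buf_end);
  for (i = 0; i < n; i++)
    {
      long ends[2] = { ext[i].start, ext[i].end };
      int k;
      for (k = 0; k < 2; k++)
        {
          if (ends[k] <= pos && ends[k] > r.start)
            r.start = ends[k];   /* nearest boundary at or before POS */
          if (ends[k] > pos && ends[k] < r.end)
            r.end = ends[k];     /* nearest boundary after POS */
        }
    }
  return r;
}
```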
+@node Faces, Glyphs, Extents, Top
+@chapter Faces
+Not yet documented.
+@node Glyphs, Specifiers, Faces, Top
+@chapter Glyphs
+Glyphs are graphical elements that can be displayed in XEmacs buffers or
+gutters. We use the term graphical element here in the broadest possible
+sense, since glyphs can be as mundane as text or as arcane as a native
+tab widget.
+In XEmacs, glyphs represent the uninstantiated state of graphical
+elements, i.e. they hold all the information necessary to produce an
+image on-screen but the image does not exist at this stage.
+Glyphs are lazily instantiated by calling one of the glyph
+functions. This usually occurs within redisplay when
+@code{Fglyph_height} is called. Instantiation causes an image-instance
+to be created and cached.  This cache is on a device basis for all glyphs
+except widget-glyphs, and on a window basis for widget-glyphs.  The
+caching is done by @code{image_instantiate} and is necessary because it
+is generally possible to display an image-instance in multiple
+domains.  For instance, if we create a Pixmap, we can actually display
+it in multiple windows, even though we only need a single Pixmap
+instance to do this.  If caching were not done, it would be necessary
+to create image-instances for every displayable occurrence of a glyph,
+and for every usage, which would be extremely memory- and CPU-intensive.
+Widget-glyphs (a.k.a. native widgets) are not cached in this way.  This is
+because widget-glyph image-instances on screen are toolkit windows, and
+thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
+cached on a window basis.
+Any action on a glyph first consults the cache before actually
+instantiating a widget.
+@section Widget-Glyphs in the MS-Windows Environment
+To Do
+@section Widget-Glyphs in the X Environment
+Widget-glyphs under X make heavy use of lwlib for manipulating the
+native toolkit objects. This is primarily so that different toolkits can
+be supported for widget-glyphs, just as they are supported for features
+such as menubars etc.
+Lwlib is extremely poorly documented and quite hairy so here is my
+understanding of what goes on.
+Lwlib maintains a set of widget_instances which mirror the hierarchical
+state of Xt widgets. I think this is so that widgets can be updated and
+manipulated generically by the lwlib library. For instance
+update_one_widget_instance can cope with multiple types of widget and
+multiple types of toolkit. Each element in the widget hierarchy is updated
+from its corresponding widget_instance by walking the widget_instance
+tree recursively.
+This has desirable properties such as lw_modify_all_widgets which is
+called from glyphs-x.c and updates all the properties of a widget
+without having to know what the widget is or what toolkit it is from.
+Unfortunately this also has hairy properties such as making the lwlib
+code quite complex. And of course lwlib has to know at some level what
+the widget is and how to set its properties.
+@node Specifiers, Menus, Glyphs, Top
+@chapter Specifiers
+Not yet documented.
+@node Menus, Subprocesses, Specifiers, Top
+@chapter Menus
+A menu is set by setting the value of the variable
+@code{current-menubar} (which may be buffer-local) and then calling
+@code{set-menubar-dirty-flag} to signal a change.  This will cause the
+menu to be redrawn at the next redisplay.  The format of the data in
+@code{current-menubar} is described in @file{menubar.c}.
+Internally the data in current-menubar is parsed into a tree of
+@code{widget_value}'s (defined in @file{lwlib.h}); this is accomplished
+by the recursive function @code{menu_item_descriptor_to_widget_value()},
+called by @code{compute_menubar_data()}.  Such a tree is deallocated
+using @code{free_widget_value()}.
+@code{update_screen_menubars()} is one of the external entry points.
+This checks to see, for each screen, if that screen's menubar needs to
+be updated.  This is the case if
+@enumerate
+@item
+@code{set-menubar-dirty-flag} was called since the last redisplay.  (This
+function sets the C variable @code{menubar_has_changed}.)
+@item
+The buffer displayed in the screen has changed.
+@item
+The screen has no menubar currently displayed.
+@end enumerate
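+The three conditions above amount to a simple predicate.  The following C
+sketch is only an illustration: @code{struct screen_state} and
+@code{menubar_needs_update()} are hypothetical names, and modelling ``the
+buffer has changed'' as a comparison of two remembered pointers is an
+assumption of this example, not something taken from the XEmacs sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-screen bookkeeping for this sketch. */
struct screen_state {
    const void *shown_buffer;    /* buffer currently displayed */
    const void *menubar_buffer;  /* buffer the menubar was last built for */
    bool has_menubar;            /* is a menubar currently displayed? */
};

/* True if any of the three conditions from the list holds. */
static bool menubar_needs_update(const struct screen_state *s,
                                 bool menubar_has_changed)
{
    return menubar_has_changed                      /* dirty flag was set */
        || s->shown_buffer != s->menubar_buffer     /* buffer changed */
        || !s->has_menubar;                         /* nothing displayed yet */
}
```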
+@code{set_screen_menubar()} is called for each such screen.  This
+function calls @code{compute_menubar_data()} to create the tree of
+@code{widget_value}s, then calls @code{lw_create_widget()},
+@code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
+to create the X-Toolkit widget associated with the menu.
+@code{update_psheets()}, the other external entry point, actually
+changes the menus being displayed.  It uses the widgets fixed by
+@code{update_screen_menubars()} and calls various X functions to ensure
+that the menus are displayed properly.
+The menubar widget is set up so that @code{pre_activate_callback()} is
+called when the menu is first selected (i.e. mouse button goes down),
+and @code{menubar_selection_callback()} is called when an item is
+selected.  @code{pre_activate_callback()} calls the function in
+@code{activate-menubar-hook}, which can change the menubar (this is described
+in @file{menubar.c}).  If the menubar is changed,
+@code{set_screen_menubar()} is called.
+@code{menubar_selection_callback()} enqueues a menu event, putting in it
+a function to call (either @code{eval} or @code{call-interactively}) and
+its argument, which is the callback function or form given in the menu's
+description.
+@node Subprocesses, Interface to X Windows, Menus, Top
+@chapter Subprocesses
+The fields of a process are:
+@table @code
+@item name
+A string, the name of the process.
+@item command
+A list containing the command arguments that were used to start this
+process.
+@item filter
+A function used to accept output from the process instead of a buffer,
+or @code{nil}.
+@item sentinel
+A function called whenever the process changes state (for example, when
+it exits or is stopped or killed by a signal), or @code{nil}.
+@item buffer
+The associated buffer of the process.
+@item pid
+An integer, the Unix process @sc{id}.
+@item childp
+A flag, non-@code{nil} if this is really a child process.
+It is @code{nil} for a network connection.
+@item mark
+A marker indicating the position of the end of the last output from this
+process inserted into the buffer.  This is often but not always the end
+of the buffer.
+@item kill_without_query
+If this is non-@code{nil}, killing XEmacs while this process is still
+running does not ask for confirmation about killing the process.
+@item raw_status_low
+@itemx raw_status_high
+These two fields record 16 bits each of the process status returned by
+the @code{wait} system call.
+@item status
+The process status, as @code{process-status} should return it.
+@item tick
+@itemx update_tick
+If these two fields are not equal, a change in the status of the process
+needs to be reported, either by running the sentinel or by inserting a
+message in the process buffer.
+@item pty_flag
+Non-@code{nil} if communication with the subprocess uses a @sc{pty};
+@code{nil} if it uses a pipe.
+@item infd
+The file descriptor for input from the process.
+@item outfd
+The file descriptor for output to the process.
+@item subtty
+The file descriptor for the terminal that the subprocess is using.  (On
+some systems, there is no need to record this, so the value is
+@code{-1}.)
+@item tty_name
+The name of the terminal that the subprocess is using,
+or @code{nil} if it is using pipes.
+@end table
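+Two of the field pairs above lend themselves to a short C sketch: the
+@code{raw_status_low}/@code{raw_status_high} split and the
+@code{tick}/@code{update_tick} comparison.  The structure and function
+names below are hypothetical, the fields are plain C integers rather than
+Lisp objects, and the idea that recording a new status bumps @code{tick}
+is an assumption of this example.  The sketch also assumes an
+@code{unsigned int} of at least 32 bits.

```c
#include <stdbool.h>

/* Hypothetical, cut-down process record for this sketch. */
struct proc_sketch {
    unsigned raw_status_low;   /* low 16 bits of the wait status */
    unsigned raw_status_high;  /* next 16 bits of the wait status */
    int tick;                  /* bumped when the status changes */
    int update_tick;           /* value of tick when last reported */
};

/* Store a wait status as two 16-bit halves and note the change. */
static void record_wait_status(struct proc_sketch *p, int status)
{
    unsigned u = (unsigned) status;
    p->raw_status_low = u & 0xFFFFu;
    p->raw_status_high = (u >> 16) & 0xFFFFu;
    p->tick++;  /* assumption: every recorded status counts as a change */
}

/* Rebuild the original status from the two halves. */
static int recombine_wait_status(const struct proc_sketch *p)
{
    return (int) ((p->raw_status_high << 16) | p->raw_status_low);
}

/* A report is pending exactly when the two counters differ. */
static bool status_report_pending(const struct proc_sketch *p)
{
    return p->tick != p->update_tick;
}

/* Call after running the sentinel or inserting the buffer message. */
static void mark_status_reported(struct proc_sketch *p)
{
    p->update_tick = p->tick;
}
```

+Comparing two counters instead of keeping a boolean flag means several
+status changes between reports still collapse into one pending report,
+and reporting simply copies @code{tick} into @code{update_tick}.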
+@node Interface to X Windows, Index, Subprocesses, Top
+@chapter Interface to X Windows
+Not yet documented.
+@include index.texi
+@c Print the tables of contents
+@summarycontents
+@contents
+@c That's all
+@bye
