xemacs-beta: man/internals/internals.texi comparison

comparison man/internals/internals.texi @ 2365:ce4aa0ef8af1

[xemacs-hg @ 2004-11-04 07:48:14 by ben] Major work on internals manual. Rearranged many chapters so as to lie in coherent divisions. Add tons of stuff to Future Work, Old Future Work, Discussions. Add lots of stuff to Mule section (Multilingual ...). Remove index.texi, incorporate into internals.texi. Section on early history and an introduction. Section on XEmacs split. Lots of new MS Windows docs. Most recently: Windows-I18N docs. Lots of new I18N docs. Loads of other stuff.

author	ben
date	Thu, 04 Nov 2004 07:48:14 +0000
parents	6aa56b089139
children	2d4dd2ef74e7

-:28dea3be3c6c
+:ce4aa0ef8af1
 @ifinfo
 This Info file contains v21.5 of the XEmacs Internals Manual, October 2004.
 @end ifinfo
-@c Don't update this by hand!!!!!!
+@ignore
-@c Use C-u C-c C-u m (aka C-u M-x texinfo-master-list).
+Don't update this by hand!!!!!!
-@c NOTE: This command does not include the Index:: menu entry.
+Use C-u C-c C-u m (aka C-u M-x texinfo-master-list).
-@c You must add it by hand.
+NOTE: This command does not include the Index:: menu entry.
+You must add it by hand.
-@c Here are some useful Lisp routines for quickly Texinfo-izing text that
-@c has been formatted into ASCII lists and tables.  The first routine is
+Here are some useful Lisp routines for quickly Texinfo-izing text that
-@c currently more general and well-developed than the second.
+has been formatted into ASCII lists and tables.
-@c (defun list-to-texinfo (b e)
+(defun list-to-texinfo (b e)
-@c   "Convert the selected region from an ASCII list to a Texinfo list."
+"Convert the selected region from an ASCII list to a Texinfo list."
-@c   (interactive "r")
+(interactive "r")
-@c   (save-restriction
+(save-restriction
-@c     (narrow-to-region b e)
+(narrow-to-region b e)
-@c     (goto-char (point-min))
+(goto-char (point-min))
-@c     (let ((dash-type "^ *-+ +")
+(let ((dash-type "^ *-+ +")
-@c 	  (num-type "^ *[[(]?\\([0-9]+\\|[a-z]\\)[]).] +")
+	  ;; allow single-letter numbering or roman numerals
-@c 	  dash)
+	  (letter-type "^ *[[(]?\\([a-zA-Z]\\|[IVXivx]+\\)[]).] +")
-@c       (save-excursion
+	  (num-type "^ *[[(]?[0-9]+[]).] +")
-@c 	(cond ((re-search-forward num-type nil t))
+	  dash regexp)
-@c 	      ((re-search-forward dash-type nil t) (setq dash t))
+(save-excursion
-@c 	      (t (error "No table entries?"))))
+	(re-search-forward "\\s-*")
-@c       (if dash (insert "@itemize @bullet\n")
+	(cond ((looking-at dash-type) (setq regexp dash-type dash t))
-@c 	(insert "@enumerate\n"))
+	      ((looking-at letter-type) (setq regexp letter-type))
-@c       (while (re-search-forward (if dash dash-type num-type) nil t)
+	      ((looking-at num-type) (setq regexp num-type))
-@c 	(let ((p (point)))
+	      ((re-search-forward num-type nil t) (setq regexp num-type))
-@c 	  (or (re-search-forward (if dash dash-type num-type) nil t)
+	      ((re-search-forward letter-type nil t) (setq regexp letter-type))
-@c 	      (goto-char (point-max)))
+	      ((re-search-forward dash-type nil t)
-@c 	  (beginning-of-line)
+	       (setq regexp dash-type dash t))
-@c 	  (forward-line -1)
+	      (t (error "No table entries?"))))
-@c 	  (let ((q (point)))
+(if dash (insert "@itemize @bullet\n")
-@c 	    (goto-char p)
+	(insert "@enumerate\n"))
-@c 	    (kill-rectangle p q))
+(re-search-forward regexp nil 'limit)
-@c 	  (insert "@item\n")))
+(while (not (eobp))
-@c       (goto-char (point-max))
+	(delete-region (point-at-bol) (point))
-@c       (beginning-of-line)
+	(insert "@item\n")
-@c       (if dash (insert "@end itemize\n")
+	;; move forward over any text following the dash to not screw
-@c 	(insert "@end enumerate\n")))))
+	;; up remove-spacing.
+	(forward-line 1)
-@c (defun table-to-texinfo (b e)
+	(let ((p (point)))
-@c   "Convert the selected region from an ASCII table to a Texinfo table."
+	  (or (re-search-forward regexp nil t)
-@c   (interactive "r")
+	      (goto-char (point-max)))
-@c   (save-restriction
+	  ;; trick to avoid using a marker
-@c     (narrow-to-region b e)
+	  (save-excursion
-@c     (goto-char (point-min))
+	    ;; back up so as not to affect the line we're on (beginning of
-@c     (insert "@table @code\n")
+	    ;; next entry)
-@c     (while (not (eobp))
+	    (forward-line -1)
-@c       (insert "@item ")
+	    (remove-spacing p (point)))))
-@c       (forward-sexp)
+(beginning-of-line)
-@c       (delete-char)
+(if dash (insert "@end itemize\n")
-@c       (insert "\n")
+	(insert "@end enumerate\n")))))
-@c       (or (search-forward "\n\n" nil t)
-@c 	  (goto-char (point-max))))
+(defun remove-spacing (b e)
-@c     (beginning-of-line)
+"Remove leading space from the selected region.
-@c     (insert "@end table\n")))
+This finds the maximum leading blank area common to all lines in the region.
+This includes all lines any part of which are in the region."
-@c A useful Lisp routine for adding markup based on conventions used in plain
+(interactive "r")
-@c text files; see doc string below.
+(save-excursion
+(let ((min 999999)
-@c (defun convert-text-to-texinfo (&optional no-narrow)
+	  seen)
-@c   "Convert text to Texinfo.
+(goto-char e)
-@c If the region is active, do the region; otherwise, go from point to the end
+(end-of-line)
-@c of the buffer.  This query-replaces for various kinds of conventions used
+(setq e (point))
-@c in text: @code{} surrounded by ` and ' or followed by a (); @strong{}
+(goto-char b)
-@c surrounded by *'s; @file{} something that looks like a file name."
+(beginning-of-line)
-@c   (interactive)
+(setq b (point))
-@c   (if (region-active-p)
+(while (< (point) e)
-@c       (save-restriction
+	(cond ((looking-at "^\\s-+")
-@c 	(narrow-to-region (region-beginning) (region-end))
+	       (goto-char (match-end 0))
-@c 	(convert-comments-to-texinfo t))
+	       (setq min (min min (current-column))
-@c     (let ((p (point))
+		     seen t))
-@c 	  (case-replace nil))
+	      ((looking-at "^\\s-*$"))
-@c       (query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil)
+	      (t (setq min 0)))
-@c       (goto-char p)
+	(forward-line 1))
-@c       (query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil)
+(when (and seen (> min 0))
-@c       (goto-char p)
+	(goto-char e)
-@c       (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil)
+	(untabify b e)
-@c       (goto-char p)
+	;; we are at end of line already.
-@c       (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil)
+	(if (not (= (point) (point-at-eol)))
-@c       )))
+	    (error "Logic error"))
+	;; Pad line with spaces if necessary (it may be just a blank line)
+	(if (< (current-column) min)
+	    (insert-char ?\  (- min (current-column)))
+	  (beginning-of-line)
+	  (forward-char min))
+	(kill-rectangle b (point))))))
+(defun table-to-texinfo (b e)
+"Convert the selected region from an ASCII table to a Texinfo table.
+Assumes entries are separated by a blank line, and the first sexp in
+each entry is the table heading."
+(interactive "r")
+(save-restriction
+(narrow-to-region b e)
+(goto-char (point-min))
+(insert "@table @code\n")
+(while (not (eobp))
+;; remember where we want to insert the @item.
+;; delete the spacing first since inserting the @item may create
+;; a line with no spacing, if there is text following the heading on
+;; the same line.
+(let ((beg (point)))
+	;; removing the space and inserting the @item will change the
+	;; position of the end of the region, so to make it easy on us
+	;; leave point at end so it will be adjusted.
+	(forward-line 1)
+	(let ((beg2 (point)))
+	  (or (re-search-forward "^$" nil t)
+	      (goto-char (point-max)))
+	  (backward-char 1)
+	  (remove-spacing beg2 (point)))
+	(ignore-errors (forward-char 2))
+	(save-excursion
+	  (goto-char beg)
+	  (insert "@item ")
+	  (forward-sexp)
+	  (delete-char)
+	  (insert "\n"))))
+(beginning-of-line)
+(insert "@end table\n")))
+A useful Lisp routine for adding markup based on conventions used in plain
+text files; see doc string below.
+(defun convert-text-to-texinfo (&optional no-narrow)
+"Convert text to Texinfo.
+If the region is active, do the region; otherwise, go from point to the end
+of the buffer.  This query-replaces for various kinds of conventions used
+in text: @code{} surrounded by ` and ' or followed by a (); @strong{}
+surrounded by *'s; @file{} something that looks like a file name."
+(interactive)
+(if (region-active-p)
+(save-restriction
+	(narrow-to-region (region-beginning) (region-end))
+	(convert-comments-to-texinfo t))
+(let ((p (point))
+	  (case-replace nil))
+(query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil)
+(goto-char p)
+(query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil)
+(goto-char p)
+(query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil)
+(goto-char p)
+(query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil)
+)))
+Macro to generate the "Future Work" section from a title; put
+point at beginning.
+(defalias 'make-future (read-kbd-macro
+"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Future SPC Work SPC - - SPC <end> RET @cindex SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC future SPC work RET"))
+Similar but generates a "Discussion" section.
+(defalias 'make-discussion (read-kbd-macro
+"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Discussion SPC - - SPC <home> <down> <C-right> <right> Discussion SPC - - SPC <end> RET @cindex SPC discussion, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC discussion RET"))
+Similar but generates an "Old Future Work" section.
+(defalias 'make-old-future (read-kbd-macro
+"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Old SPC Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Old SPC Future SPC Work SPC - - SPC <end> RET @cindex SPC old SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC old SPC future SPC work RET"))
+Similar but generates a general section.
+(defalias 'make-section (read-kbd-macro
+"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> RET @cindex SPC C-SPC C-g <f4> C-x C-x M-l <home> <down>"))
+Similar but generates a general subsection.
+(defalias 'make-subsection (read-kbd-macro
+"<S-end> <f3> <home> @node SPC <end> RET @subsection SPC <f4> RET @cindex SPC C-SPC C-g <f4> C-x C-x M-l <home> <down>"))
+@end ignore
 @menu
 * Introduction::                Overview of this manual.
 * Authorship of XEmacs::
 * A History of Emacs::          Times, dates, important events.
-* XEmacs From the Outside::     A broad conceptual overview.
+* The XEmacs Split::
+* XEmacs from the Outside::     A broad conceptual overview.
 * The Lisp Language::           An overview.
-* XEmacs From the Perspective of Building::
+* XEmacs from the Perspective of Building::
 * Build-Time Dependencies::
-* XEmacs From the Inside::
+* The Modules of XEmacs::
-* The XEmacs Object System (Abstractly Speaking)::
-* How Lisp Objects Are Represented in C::
 * Major Textual Changes::
 * Rules When Writing New C Code::
 * Regression Testing XEmacs::
 * CVS Techniques::
-* The Modules of XEmacs::
+* XEmacs from the Inside::
+* The XEmacs Object System (Abstractly Speaking)::
+* How Lisp Objects Are Represented in C::
 * Allocation of Objects in XEmacs Lisp::
-* Dumping::
+* The Lisp Reader and Compiler::
-* Events and the Event Loop::
-* Asynchronous Events; Quit Checking::
 * Evaluation; Stack Frames; Bindings::
 * Symbols and Variables::
 * Buffers::
 * Text::
 * Multilingual Support::
-* The Lisp Reader and Compiler::
-* Lstreams::
 * Consoles; Devices; Frames; Windows::
 * The Redisplay Mechanism::
 * Extents::
 * Faces::
 * Glyphs::
 * Specifiers::
 * Menus::
+* Events and the Event Loop::
+* Asynchronous Events; Quit Checking::
+* Lstreams::
 * Subprocesses::
 * Interface to MS Windows::
 * Interface to the X Window System::
+* Dumping::
 * Future Work::
 * Future Work Discussion::
 * Old Future Work::
 * Index::
 * Through Version 18::          Unification prevails.
 * Lucid Emacs::                 One version 19 Emacs.
 * GNU Emacs 19::                The other version 19 Emacs.
 * GNU Emacs 20::                The other version 20 Emacs.
 * XEmacs::                      The continuation of Lucid Emacs.
+The Modules of XEmacs
+* A Summary of the Various XEmacs Modules::
+* Low-Level Modules::
+* Basic Lisp Modules::
+* Modules for Standard Editing Operations::
+* Modules for Interfacing with the File System::
+* Modules for Other Aspects of the Lisp Interpreter and Object System::
+* Modules for Interfacing with the Operating System::
 Major Textual Changes
 * Great Integral Type Renaming::
 * Text/Char Type Renaming::
 * Modules for Regression Testing::
 CVS Techniques
 * Merging a Branch into the Trunk::
-The Modules of XEmacs
-* A Summary of the Various XEmacs Modules::
-* Low-Level Modules::
-* Basic Lisp Modules::
-* Modules for Standard Editing Operations::
-* Modules for Interfacing with the File System::
-* Modules for Other Aspects of the Lisp Interpreter and Object System::
-* Modules for Interfacing with the Operating System::
 Allocation of Objects in XEmacs Lisp
 * Introduction to Allocation::
 * Garbage Collection::
 * sweep_lcrecords_1::
 * compact_string_chars::
 * sweep_strings::
 * sweep_bit_vectors_1::
-Dumping
+Evaluation; Stack Frames; Bindings
-* Dumping Justification::
+* Evaluation::
-* Overview::
+* Dynamic Binding; The specbinding Stack; Unwind-Protects::
-* Data descriptions::
+* Simple Special Forms::
-* Dumping phase::
+* Catch and Throw::
-* Reloading phase::
+* Error Trapping::
-* Remaining issues::
+Symbols and Variables
-Dumping phase
+* Introduction to Symbols::
-* Object inventory::
+* Obarrays::
-* Address allocation::
+* Symbol Values::
-* The header::
-* Data dumping::
+Buffers
-* Pointers dumping::
+* Introduction to Buffers::     A buffer holds a block of text such as a file.
+* Buffer Lists::                Keeping track of all buffers.
+* Markers and Extents::         Tagging locations within a buffer.
+* The Buffer Object::           The Lisp object corresponding to a buffer.
+Text
+* The Text in a Buffer::        Representation of the text in a buffer.
+* Ibytes and Ichars::           Representation of individual characters.
+* Byte-Char Position Conversion::
+* Searching and Matching::      Higher-level algorithms.
+Multilingual Support
+* Introduction to Multilingual Issues #1::
+* Introduction to Multilingual Issues #2::
+* Introduction to Multilingual Issues #3::
+* Introduction to Multilingual Issues #4::
+* Character Sets::
+* Encodings::
+* Internal Mule Encodings::
+* Byte/Character Types; Buffer Positions; Other Typedefs::
+* Internal Text API's::
+* Coding for Mule::
+* CCL::
+* Microsoft Windows-Related Multilingual Issues::
+* Modules for Internationalization::
+Encodings
+* Japanese EUC (Extended Unix Code)::
+* JIS7::
+Internal Mule Encodings
+* Internal String Encoding::
+* Internal Character Encoding::
+Byte/Character Types; Buffer Positions; Other Typedefs
+* Byte Types::
+* Different Ways of Seeing Internal Text::
+* Buffer Positions::
+* Other Typedefs::
+* Usage of the Various Representations::
+* Working With the Various Representations::
+Internal Text API's
+* Basic internal-format API's::
+* The DFC API::
+* The Eistring API::
+Coding for Mule
+* Character-Related Data Types::
+* Working With Character and Byte Positions::
+* Conversion to and from External Data::
+* General Guidelines for Writing Mule-Aware Code::
+* An Example of Mule-Aware Code::
+* Mule-izing Code::
+Microsoft Windows-Related Multilingual Issues
+* Microsoft Documentation::
+* Locales::
+* More about code pages::
+* More about locales::
+* Unicode support under Windows::
+* The golden rules of writing Unicode-safe code::
+* The format of the locale in setlocale()::
+* Random other Windows I18N docs::
+Consoles; Devices; Frames; Windows
+* Introduction to Consoles; Devices; Frames; Windows::
+* Point::
+* Window Hierarchy::
+* The Window Object::
+* Modules for the Basic Displayable Lisp Objects::
+The Redisplay Mechanism
+* Critical Redisplay Sections::
+* Line Start Cache::
+* Redisplay Piece by Piece::
+* Modules for the Redisplay Mechanism::
+* Modules for other Display-Related Lisp Objects::
+Extents
+* Introduction to Extents::     Extents are ranges over text, with properties.
+* Extent Ordering::             How extents are ordered internally.
+* Format of the Extent Info::   The extent information in a buffer or string.
+* Zero-Length Extents::         A weird special case.
+* Mathematics of Extent Ordering::  A rigorous foundation.
+* Extent Fragments::            Cached information useful for redisplay.
 Events and the Event Loop
 * Introduction to Events::
 * Main Loop::
 * Control-G (Quit) Checking::
 * Profiling::
 * Asynchronous Timeouts::
 * Exiting::
-Evaluation; Stack Frames; Bindings
-* Evaluation::
-* Dynamic Binding; The specbinding Stack; Unwind-Protects::
-* Simple Special Forms::
-* Catch and Throw::
-Symbols and Variables
-* Introduction to Symbols::
-* Obarrays::
-* Symbol Values::
-Buffers
-* Introduction to Buffers::     A buffer holds a block of text such as a file.
-* Buffer Lists::                Keeping track of all buffers.
-* Markers and Extents::         Tagging locations within a buffer.
-* The Buffer Object::           The Lisp object corresponding to a buffer.
-Text
-* The Text in a Buffer::        Representation of the text in a buffer.
-* Ibytes and Ichars::           Representation of individual characters.
-* Byte-Char Position Conversion::
-* Searching and Matching::      Higher-level algorithms.
-Multilingual Support
-* Introduction to Multilingual Issues #1::
-* Introduction to Multilingual Issues #2::
-* Introduction to Multilingual Issues #3::
-* Introduction to Multilingual Issues #4::
-* Character Sets::
-* Encodings::
-* Internal Mule Encodings::
-* Byte/Character Types; Buffer Positions; Other Typedefs::
-* Internal Text API's::
-* Coding for Mule::
-* CCL::
-* Modules for Internationalization::
-Encodings
-* Japanese EUC (Extended Unix Code)::
-* JIS7::
-Internal Mule Encodings
-* Internal String Encoding::
-* Internal Character Encoding::
-Byte/Character Types; Buffer Positions; Other Typedefs
-* Byte Types::
-* Different Ways of Seeing Internal Text::
-* Buffer Positions::
-* Other Typedefs::
-* Usage of the Various Representations::
-* Working With the Various Representations::
-Internal Text API's
-* Basic internal-format API's::
-* The DFC API::
-* The Eistring API::
-Coding for Mule
-* Character-Related Data Types::
-* Working With Character and Byte Positions::
-* Conversion to and from External Data::
-* General Guidelines for Writing Mule-Aware Code::
-* An Example of Mule-Aware Code::
-* Mule-izing Code::
 Lstreams
 * Creating an Lstream::         Creating an lstream object.
 * Lstream Types::               Different sorts of things that are streamed.
 * Lstream Functions::           Functions for working with lstreams.
 * Lstream Methods::             Creating new lstream types.
-Consoles; Devices; Frames; Windows
-* Introduction to Consoles; Devices; Frames; Windows::
-* Point::
-* Window Hierarchy::
-* The Window Object::
-* Modules for the Basic Displayable Lisp Objects::
-The Redisplay Mechanism
-* Critical Redisplay Sections::
-* Line Start Cache::
-* Redisplay Piece by Piece::
-* Modules for the Redisplay Mechanism::
-* Modules for other Display-Related Lisp Objects::
-Extents
-* Introduction to Extents::     Extents are ranges over text, with properties.
-* Extent Ordering::             How extents are ordered internally.
-* Format of the Extent Info::   The extent information in a buffer or string.
-* Zero-Length Extents::         A weird special case.
-* Mathematics of Extent Ordering::  A rigorous foundation.
-* Extent Fragments::            Cached information useful for redisplay.
 Interface to MS Windows
 * Different kinds of Windows environments::
 * Windows Build Flags::
 * Menubars::
 * Checkboxes and Radio Buttons::
 * Progress Bars::
 * Tab Controls::
+Dumping
+* Dumping Justification::
+* Overview::
+* Data descriptions::
+* Dumping phase::
+* Reloading phase::
+* Remaining issues::
+Dumping phase
+* Object inventory::
+* Address allocation::
+* The header::
+* Data dumping::
+* Pointers dumping::
 Future Work
+* Future Work -- General Suggestions::
 * Future Work -- Elisp Compatibility Package::
 * Future Work -- Drag-n-Drop::
 * Future Work -- Standard Interface for Enabling Extensions::
 * Future Work -- Better Initialization File Scheme::
 * Future Work -- Keyword Parameters::
 Future Work -- Byte Code Snippets
 * Future Work -- Autodetection::
 * Future Work -- Conversion Error Detection::
+* Future Work -- Unicode::
 * Future Work -- BIDI Support::
 * Future Work -- Localized Text/Messages::
 Future Work -- Lisp Engine Replacement
 * Future Work -- Lisp Engine Discussion::
 * Future Work -- Lisp Engine Replacement -- Implementation::
+* Future Work -- Startup File Modification by Packages::
 Future Work Discussion
 * Discussion -- garbage collection::
 * Discussion -- glyphs::
+* Discussion -- Dialog Boxes::
+* Discussion -- Multilingual Issues::
+* Discussion -- Windows External Widget::
+* Discussion -- Packages::
+* Discussion -- Distribution Layout::
 Old Future Work
-* Future Work -- A Portable Unexec Replacement::
+* Old Future Work -- A Portable Unexec Replacement::
-* Future Work -- Indirect Buffers::
+* Old Future Work -- Indirect Buffers::
-* Future Work -- Improvements in support for non-ASCII (European) keysyms under X::
+* Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X::
-* Future Work -- xemacs.org Mailing Address Changes::
+* Old Future Work -- RTF Clipboard Support::
-* Future Work -- Lisp callbacks from critical areas of the C code::
+* Old Future Work -- xemacs.org Mailing Address Changes::
+* Old Future Work -- Lisp callbacks from critical areas of the C code::
 @end detailmenu
 @end menu
 @node Introduction, Authorship of XEmacs, Top, Top
 the snapshot of the code you are looking at, and in the case of
 contradictions between the code comments and the manual, @strong{always}
 assume that the code comments are correct. (Because of the proximity of
 the comments to the code, comments will rarely be out-of-date.)
+The manual is organized in chapters which are broadly grouped into major
+divisions:
+@enumerate
+@item
+First is the introduction, including this chapter and chapters on the
+history and authorship of XEmacs.
+@item
+Next, starting with @ref{XEmacs from the Outside}, are some chapters
+giving a broad overview of the internal workings of XEmacs and
+documenting important information relevant to those working on the code.
+@item
+The remaining divisions document the nitty-gritty details of the
+internal workings.  First, starting with @ref{XEmacs from the Inside},
+is a division on the workings of the Lisp interpreter that drives
+XEmacs.
+@item
+Next, starting with @ref{Buffers}, is a division on the parts of the
+code specifically devoted to text processing, including multilingual
+support (Mule).
+@item
+Afterwards, starting with @ref{Consoles; Devices; Frames; Windows}, is a
+division covering the display mechanism and the objects and modules
+relevant to this.
+@item
+Then, starting with @ref{Events and the Event Loop}, is a division
+covering the interface between XEmacs and the outside world, including
+user interactions, subprocesses, file I/O, interfaces to particular
+windowing systems, and dumping.
+@item
+Finally, starting with @ref{Future Work}, is a division containing
+proposals and discussion relating to future work on XEmacs.
+@end enumerate
 This manual was primarily written by Ben Wing.  Certain sections were
 written by others, including those mentioned on the title page as well
 as other coders.  Some sections were lifted directly from comments in
 the code, and in those cases we may not completely be aware of the
 authorship.  In addition, due to the collaborative nature of XEmacs,
 @table @asis
 @item Stephen Turnbull
 Various cleanup work, mostly post-2000.  Object-Oriented Techniques in
 XEmacs.  A Reader's Guide to XEmacs Coding Conventions.  Searching and
 Matching.  Regression Testing XEmacs.  Modules for Regression Testing.
-Lucid Widget Library.
+Lucid Widget Library.  A number of sections in the Future Work chapter.
 @item Martin Buchholz
 Various cleanup work, mostly pre-2001.  Docs on inline functions.  Docs
 on dfc conversion functions (Conversion to and from External Data).
 Improvements in support for non-ASCII (European) keysyms under X.
+A section or two in the Future Work chapter.
 @item Hrvoje Niksic
 Coding for Mule.
 @item Matthias Neubauer
 Garbage Collection - Step by Step.
 @item Olivier Galibert
 Redisplay Piece by Piece.  Glyphs.
 @item Chuck Thompson
 Line Start Cache.
 @item Kenichi Handa
 CCL.
+@item Jamie Zawinski
+A couple of sections in the Future Work chapter.
 @end table
 @node Authorship of XEmacs, A History of Emacs, Introduction, Top
 @chapter Authorship of XEmacs
 @cindex authorship, XEmacs
 @item alloca.s
 Inherited almost unchanged from FSF; kept in sync up through 19.30;
 basically no changes for XEmacs.
 @end table
-@node A History of Emacs, XEmacs From the Outside, Authorship of XEmacs, Top
+@node A History of Emacs, The XEmacs Split, Authorship of XEmacs, Top
 @chapter A History of Emacs
 @cindex history of Emacs, a
 @cindex Emacs, a history of
 @cindex Hackers (Steven Levy)
 @cindex Levy, Steven
 version 21.2.45 released February 23, 2001.
 @item
 version 21.2.46 released March 21, 2001.
 @end itemize
-@node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
+@node The XEmacs Split, XEmacs from the Outside, A History of Emacs, Top
-@chapter XEmacs From the Outside
+@chapter The XEmacs Split
+@cindex XEmacs split
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
+@strong{NOTE NOTE NOTE}: The following is a @strong{highly} opinionated
+piece written by one of the main authors of XEmacs.  This reflects his
+opinions, and his only!  It is included here because it may help to
+clarify some of the issues that are keeping the two versions of Emacs
+separate.
+Many people look at the split between GNU Emacs and XEmacs and are
+convinced that the XEmacs team is being needlessly divisive and just needs
+to cooperate a bit with RMS, and the two versions of Emacs will merge. In
+fact there have been six to seven major attempts at merging, each running
+hundreds of messages long and all of them coming from the XEmacs side. All
+have failed because they have eventually come to the same conclusion, which
+is that RMS has no real interest in cooperation at all. If you work with
+him, you have to do it his way -- "my way or the highway".  Specifically:
+@enumerate
+@item
+RMS insists on having legal papers signed for every bit of code that goes
+into GNU Emacs. RMS's lawyers have told him that every contribution over
+ten lines long requires legal papers. These papers cannot be filled out
+over the web but must be completed in person and mailed to the FSF.
+Obviously this by itself has a tendency to inhibit contributions because of
+the hassle factor. Furthermore, many people (and especially organizations)
+are either hesitant to or refuse to sign legal papers, for reasons
+mentioned below.  For these reasons, XEmacs has never required signed
+legal papers for the code in it. Such papers are not a part of the GPL and
+are not required by any projects other than those of the FSF (for example,
+Linux does not require such papers). Since we do not know exactly who is
+the author of every bit of code that has been contributed to XEmacs in the
+last nine years, we would essentially have to rewrite large sections of the
+code. The situation, however, is worse than that, because many of the large
+copyright holders of XEmacs (for example Sun Microsystems) refuse to sign
+legal papers. Although they have not stated their reasons, there are quite
+a number of reasons not to sign legal papers:
+@itemize @bullet
+@item
+By doing so you essentially give up all control over your code. You can
+no longer release your code under a different license. If you want to
+use your code that you've contributed to the FSF in a project of your
+own, and that project is not released under the GPL, you are not allowed
+to do this. Obviously, large companies tend to want to reuse their code
+in many different projects and as a result feel very uncomfortable about
+signing legal papers.
+@item
+One of the dangers of assigning copyright to the FSF is that if the FSF
+happens to be taken over by some evil corporate entity or anyone with
+different ideas than RMS, they will own all copyright-assigned code, and
+can revoke the GPL and enforce any license they please.  If the code has
+many different copyright holders, this is a much less likely scenario.
+@end itemize
+@item
+RMS does not like abstract data structures. Abstract data structures are
+the foundation of XEmacs and most other modern programming projects. In
+my opinion, it is difficult to impossible to write maintainable and
+expandable code without using abstract data structures. In merging talks
+with RMS he has said we can have any abstract data structures we want in
+a merged version but must allow direct access to the implementation as
+well, which defeats the primary purpose of having abstract data
+structures.
+@item
+RMS is very unwilling to compromise when it comes to divergent
+implementations of the same functionality, which is very common between
+XEmacs and GNU Emacs. Rather than taking the better interface on
+technical grounds, RMS insists that both interfaces must be implemented
+in C at the same level (rather than implementing one in C and the other
+on top of it), so that code that uses either interface is just as
+fast. This means that the resulting merged Emacs would be filled with a
+lot of very complicated code to simultaneously support two divergent
+interfaces, and would be difficult to maintain in this state.
+@item
+RMS's idea of compromise and cooperation is almost purely political
+rather than technical. The XEmacs maintainers would like to have issues
+resolved by examining them technically and deciding what makes the most
+sense from a technical perspective. RMS, however, wants to proceed on a
+tit-for-tat kind of basis, which is to say, ``If we support this feature
+of yours, we also get to support this other feature of mine.'' The
+result of such a process is typically a big mess, because there is no
+overarching design but instead a great deal of incompatible things
+hodgepodged together.
+@end enumerate
+If only some of the above differences were firmly held by RMS, and if he
+were willing to compromise effectively on the others and to demonstrate
+willingness to work with us on the issues that he is less willing to
+compromise on, we might go ahead with the merge despite misgivings. However,
+RMS has shown no real interest at all in compromising. He has never stated
+how all of the redundant work that would be required to support his
+preconditions would get done. It's unlikely that he would do it all and
+it's certainly not clear that the XEmacs project would be willing to do it
+all, given that it is a tremendous amount of extra work and the XEmacs
+project is already strapped for coding resources. (Not to mention the
+inherent difficulty in convincing people to redo existing work for
+primarily political reasons.) In general, the free software community as a
+whole is quite strapped for coding resources; duplicative efforts
+accomplish very little and have many negative effects, in that they
+take away what few resources we do have from projects that would actually
+be useful.
+RMS, however, does not seem to be bothered by this. He is more interested in
+sticking firm to his principles, though the heavens may fall down, than in
+working forward to create genuinely useful software. It is abundantly clear
+that RMS has no real interest in unity except if it happens to be on his
+own terms and allows him ultimate control over the result. He would rather
+see nothing happen at all than something that is not exactly according to
+his principles.  The fact that few if any people share his principles is
+meaningless to him.
+@node XEmacs from the Outside, The Lisp Language, The XEmacs Split, Top
+@chapter XEmacs from the Outside
 @cindex XEmacs from the outside
 @cindex outside, XEmacs from the
 @cindex read-eval-print
 XEmacs appears to the outside world as an editor, but it is really a
 @cindex pi, calculating
 Note that you do not have to use XEmacs as an editor; you could just
 as well make it do your taxes, compute pi, play bridge, etc.  You'd just
 have to write functions to do those operations in Lisp.
-@node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
+@node The Lisp Language, XEmacs from the Perspective of Building, XEmacs from the Outside, Top
 @chapter The Lisp Language
 @cindex Lisp language, the
 @cindex Lisp vs. C
 @cindex C vs. Lisp
 @cindex Lisp vs. Java
 The word @dfn{application} in the previous paragraph was used
 intentionally.  XEmacs implements an API for programs written in Lisp
 that makes it a full-fledged application platform, very much like an OS
 inside the real OS.
-@node XEmacs From the Perspective of Building, Build-Time Dependencies, The Lisp Language, Top
+@node XEmacs from the Perspective of Building, Build-Time Dependencies, The Lisp Language, Top
-@chapter XEmacs From the Perspective of Building
+@chapter XEmacs from the Perspective of Building
 @cindex XEmacs from the perspective of building
 @cindex building, XEmacs from the perspective of
 The heart of XEmacs is the Lisp environment, which is written in C.
 This is contained in the @file{src/} subdirectory.  Underneath
 This is useful when the dumping procedure described above is broken, or
 when using certain program debugging tools such as Purify.  These tools
 get mighty confused by the tricks played by the XEmacs build process,
 such as allocating memory in one process, and freeing it in the next.
-@node Build-Time Dependencies, XEmacs From the Inside, XEmacs From the Perspective of Building, Top
+@node Build-Time Dependencies, The Modules of XEmacs, XEmacs from the Perspective of Building, Top
 @chapter Build-Time Dependencies
 @cindex build-time dependencies
 @cindex dependencies, build-time
 This is a collection of random notes on build-time dependencies as of
 use any higher-level functionality that might load @file{custom.el}, but
 you do not need @file{subr.el}, you should @samp{defvar}
 @code{custom-declare-variable-list} to prevent the @samp{void-variable}
 error.  (Currently this is only needed for @file{make-docfile.el}.)
-@node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), Build-Time Dependencies, Top
+@node The Modules of XEmacs, Major Textual Changes, Build-Time Dependencies, Top
-@chapter XEmacs From the Inside
-@cindex XEmacs from the inside
-@cindex inside, XEmacs from the
-Internally, XEmacs is quite complex, and can be very confusing.  To
-simplify things, it can be useful to think of XEmacs as containing an
-event loop that ``drives'' everything, and a number of other subsystems,
-such as a Lisp engine and a redisplay mechanism.  Each of these other
-subsystems exists simultaneously in XEmacs, and each has a certain
-state.  The flow of control continually passes in and out of these
-different subsystems in the course of normal operation of the editor.
-It is important to keep in mind that, most of the time, the editor is
-``driven'' by the event loop.  Except during initialization and batch
-mode, all subsystems are entered directly or indirectly through the
-event loop, and ultimately, control exits out of all subsystems back up
-to the event loop.  This cycle of entering a subsystem, exiting back out
-to the event loop, and starting another iteration of the event loop
-occurs once each keystroke, mouse motion, etc.
-If you're trying to understand a particular subsystem (other than the
-event loop), think of it as a ``daemon'' process or ``servant'' that is
-responsible for one particular aspect of a larger system, and
-periodically receives commands or environment changes that cause it to
-do something.  Ultimately, these commands and environment changes are
-always triggered by the event loop.  For example:
-@itemize @bullet
-@item
-The window and frame mechanism is responsible for keeping track of what
-windows and frames exist, what buffers are in them, etc.  It is
-periodically given commands (usually from the user) to make a change to
-the current window/frame state: i.e. create a new frame, delete a
-window, etc.
-@item
-The buffer mechanism is responsible for keeping track of what buffers
-exist and what text is in them.  It is periodically given commands
-(usually from the user) to insert or delete text, create a buffer, etc.
-When it receives a text-change command, it notifies the redisplay
-mechanism.
-@item
-The redisplay mechanism is responsible for making sure that windows and
-frames are displayed correctly.  It is periodically told (by the event
-loop) to actually ``do its job'', i.e. snoop around and see what the
-current state of the environment (mostly of the currently-existing
-windows, frames, and buffers) is, and make sure that state matches
-what's actually displayed.  It keeps lots and lots of information around
-(such as what is actually being displayed currently, and what the
-environment was last time it checked) so that it can minimize the work
-it has to do.  It is also helped along in that whenever a relevant
-change to the environment occurs, the redisplay mechanism is told about
-this, so it has a pretty good idea of where it has to look to find
-possible changes and doesn't have to look everywhere.
-@item
-The Lisp engine is responsible for executing the Lisp code in which most
-user commands are written.  It is entered through a call to @code{eval}
-or @code{funcall}, which occurs as a result of dispatching an event from
-the event loop.  The functions it calls issue commands to the buffer
-mechanism, the window/frame subsystem, etc.
-@item
-The Lisp allocation subsystem is responsible for keeping track of Lisp
-objects.  It is given commands from the Lisp engine to allocate objects,
-garbage collect, etc.
-@end itemize
-etc.
-The important idea here is that there are a number of independent
-subsystems each with its own responsibility and persistent state, just
-like different employees in a company, and each subsystem is
-periodically given commands from other subsystems.  Commands can flow
-from any one subsystem to any other, but there is usually some sort of
-hierarchy, with all commands originating from the event subsystem.
-XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
-this is called the first time (in a properly-invoked @file{temacs}), it
-does the following:
-@enumerate
-@item
-It does some very basic environment initializations, such as determining
-where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
-and setting up signal handlers.
-@item
-It initializes the entire Lisp interpreter.
-@item
-It sets the initial values of many built-in variables (including many
-variables that are visible to Lisp programs), such as the global keymap
-object and the built-in faces (a face is an object that describes the
-display characteristics of text).  This involves creating Lisp objects
-and thus is dependent on step (2).
-@item
-It performs various other initializations that are relevant to the
-particular environment it is running in, such as retrieving environment
-variables, determining the current date and the user who is running the
-program, examining its standard input, creating any necessary file
-descriptors, etc.
-@item
-At this point, the C initialization is complete.  A Lisp program that
-was specified on the command line (usually @file{loadup.el}) is called
-(temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
-@file{loadup.el} loads all of the other Lisp files that are needed for
-the operation of the editor, calls the @code{dump-emacs} function to
-write out @file{xemacs}, and then kills the temacs process.
-@end enumerate
-When @file{xemacs} is then run, it only redoes steps (1) and (4)
-above; all variables already contain the values they were set to when
-the executable was dumped, and all memory that was allocated with
-@code{malloc()} is still around. (XEmacs knows whether it is being run
-as @file{xemacs} or @file{temacs} because it sets the global variable
-@code{initialized} to 1 after step (4) above.) At this point,
-@file{xemacs} calls a Lisp function to do any further initialization,
-which includes parsing the command-line (the C code can only do limited
-command-line parsing, which includes looking for the @samp{-batch} and
-@samp{-l} flags and a few other flags that it needs to know about before
-initialization is complete), creating the first frame (or @dfn{window}
-in standard window-system parlance), running the user's init file
-(usually the file @file{.emacs} in the user's home directory), etc.  The
-function to do this is usually called @code{normal-top-level};
-@file{loadup.el} tells the C code about this function by setting its
-name as the value of the Lisp variable @code{top-level}.
-When the Lisp initialization code is done, the C code enters the event
-loop, and stays there for the duration of the XEmacs process.  The code
-for the event loop is contained in @file{cmdloop.c}, and is called
-@code{Fcommand_loop_1()}.  Note that this event loop could very well be
-written in Lisp, and in fact a Lisp version exists; but apparently,
-doing this makes XEmacs run noticeably slower.
-Notice how much of the initialization is done in Lisp, not in C.
-In general, XEmacs tries to move as much code as is possible
-into Lisp.  Code that remains in C is code that implements the
-Lisp interpreter itself, or code that needs to be very fast, or
-code that needs to do system calls or other such stuff that
-needs to be done in C, or code that needs to have access to
-``forbidden'' structures. (One conscious aspect of the design of
-Lisp under XEmacs is a clean separation between the external
-interface to a Lisp object's functionality and its internal
-implementation.  Part of this design is that Lisp programs
-are forbidden from accessing the contents of the object other
-than through using a standard API.  In this respect, XEmacs Lisp
-is similar to modern Lisp dialects but differs from GNU Emacs,
-which tends to expose the implementation and allow Lisp
-programs to look at it directly.  The major advantage of
-hiding the implementation is that it allows the implementation
-to be redesigned without affecting any Lisp programs, including
-those that might want to be ``clever'' by looking directly at
-the object's contents and possibly manipulating them.)
-Moving code into Lisp makes the code easier to debug and maintain and
-makes it much easier for people who are not XEmacs developers to
-customize XEmacs, because they can make a change with much less chance
-of obscure and unwanted interactions occurring than if they were to
-change the C code.
-@node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
-@chapter The XEmacs Object System (Abstractly Speaking)
-@cindex XEmacs object system (abstractly speaking), the
-@cindex object system (abstractly speaking), the XEmacs
-At the heart of the Lisp interpreter is its management of objects.
-XEmacs Lisp contains many built-in objects, some of which are
-simple and others of which can be very complex; and some of which
-are very common, and others of which are rarely used or are only
-used internally. (Since the Lisp allocation system, with its
-automatic reclamation of unused storage, is so much more convenient
-than @code{malloc()} and @code{free()}, the C code makes extensive use of it
-in its internal operations.)
-The basic Lisp objects are
-@table @code
-@item integer
-31 bits of precision, or 63 bits on 64-bit machines; the
-reason for this is described below when the internal Lisp object
-representation is described.
-@item char
-An object representing a single character of text; chars behave like
-integers in many ways but are logically considered text rather than
-numbers and have a different read syntax. (The read syntax for a char
-contains the char itself or some textual encoding of it---for example,
-a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
-ISO-2022 encoding standard---rather than the numerical representation
-of the char; this way, if the mapping between chars and integers
-changes, which is quite possible for Kanji characters and other extended
-characters, the same character will still be created.  Note that some
-primitives confuse chars and integers.  The worst culprit is @code{eq},
-which makes a special exception and considers a char to be @code{eq} to
-its integer equivalent, even though in no other case are objects of two
-different types @code{eq}.  The reason for this monstrosity is
-compatibility with existing code; the separation of char from integer
-came fairly recently.)
-@item float
-Same precision as a double in C.
-@item bignum
-@itemx ratio
-@itemx bigfloat
-As build-time options, arbitrary-precision numbers are available.
-Bignums are integers, and when available they remove the restriction on
-buffer size.  Ratios are non-integral rational numbers.  Bigfloats are
-arbitrary-precision floating point numbers, with precision specified at
-runtime.
-@item symbol
-An object that contains Lisp objects and is referred to by name;
-symbols are used to implement variables and named functions
-and to provide the equivalent of preprocessor constants in C.
-@item string
-Self-explanatory; behaves much like a vector of chars
-but has a different read syntax and is stored and manipulated
-more compactly.
-@item bit-vector
-A vector of bits; similar to a string in spirit.
-@item vector
-A one-dimensional array of Lisp objects providing constant-time access
-to any of the objects; access to an arbitrary object in a vector is
-faster than for lists, but the operations that can be done on a vector
-are more limited.
-@item compiled-function
-An object containing compiled Lisp code, known as @dfn{byte code}.
-@item subr
-A Lisp primitive, i.e. a Lisp-callable function implemented in C.
-@item cons
-A simple container for two Lisp objects, used to implement lists and
-most other data structures in Lisp.
-@end table
-Objects which are not conses are called atoms.
-@cindex closure
-Note that there is no basic ``function'' type, as in more powerful
-versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
-not provide the closure semantics implemented by Common Lisp and Scheme.
-The guts of a function in XEmacs Lisp are represented in one of four
-ways: a symbol specifying another function (when one function is an
-alias for another), a list (whose first element must be the symbol
-@code{lambda}) containing the function's source code, a
-compiled-function object, or a subr object. (In other words, given a
-symbol specifying the name of a function, calling @code{symbol-function}
-to retrieve the contents of the symbol's function cell will return one
-of these types of objects.)
-XEmacs Lisp also contains numerous specialized objects used to implement
-the editor:
-@table @code
-@item buffer
-Stores text like a string, but is optimized for insertion and deletion
-and has certain other properties that can be set.
-@item frame
-An object with various properties whose displayable representation is a
-@dfn{window} in window-system parlance.
-@item window
-A section of a frame that displays the contents of a buffer;
-often called a @dfn{pane} in window-system parlance.
-@item window-configuration
-An object that represents a saved configuration of windows in a frame.
-@item device
-An object representing a screen on which frames can be displayed;
-equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
-character mode.
-@item face
-An object specifying the appearance of text or graphics; it has
-properties such as font, foreground color, and background color.
-@item marker
-An object that refers to a particular position in a buffer and moves
-around as text is inserted and deleted to stay in the same relative
-position to the text around it.
-@item extent
-Similar to a marker but covers a range of text in a buffer; can also
-specify properties of the text, such as a face in which the text is to
-be displayed, whether the text is invisible or unmodifiable, etc.
-@item event
-Generated by calling @code{next-event} and contains information
-describing a particular event happening in the system, such as the user
-pressing a key or a process terminating.
-@item keymap
-An object that maps from events (described using lists, vectors, and
-symbols rather than with an event object because the mapping is for
-classes of events, rather than individual events) to functions to
-execute or other events to recursively look up; the functions are
-described by name, using a symbol, or using lists to specify the
-function's code.
-@item glyph
-An object that describes the appearance of an image (e.g.  pixmap) on
-the screen; glyphs can be attached to the beginning or end of extents
-and in some future version of XEmacs will be able to be inserted
-directly into a buffer.
-@item process
-An object that describes a connection to an externally-running process.
-@end table
-There are some other, less-commonly-encountered general objects:
-@table @code
-@item hash-table
-An object that maps from an arbitrary Lisp object to another arbitrary
-Lisp object, using hashing for fast lookup.
-@item obarray
-A limited form of hash-table that maps from strings to symbols; obarrays
-are used to look up a symbol given its name and are not actually their
-own object type but are kludgily represented using vectors with hidden
-fields (this representation derives from GNU Emacs).
-@item specifier
-A complex object used to specify the value of a display property; a
-default value is given and different values can be specified for
-particular frames, buffers, windows, devices, or classes of device.
-@item char-table
-An object that maps from chars or classes of chars to arbitrary Lisp
-objects; internally char tables use a complex nested-vector
-representation that is optimized to the way characters are represented
-as integers.
-@item range-table
-An object that maps from ranges of integers to arbitrary Lisp objects.
-@end table
-And some strange special-purpose objects:
-@table @code
-@item charset
-@itemx coding-system
-Objects used when MULE, or multi-lingual/Asian-language, support is
-enabled.
-@item color-instance
-@itemx font-instance
-@itemx image-instance
-An object that encapsulates a window-system resource; instances are
-mostly used internally but are exposed on the Lisp level for cleanness
-of the specifier model and because it's occasionally useful for Lisp
-programs to create or query the properties of instances.
-@item subwindow
-An object that encapsulates a @dfn{subwindow} resource, i.e. a
-window-system child window that is drawn into by an external process;
-this object should be integrated into the glyph system but isn't yet,
-and may change form when this is done.
-@item tooltalk-message
-@itemx tooltalk-pattern
-Objects that represent resources used in the ToolTalk interprocess
-communication protocol.
-@item toolbar-button
-An object used in conjunction with the toolbar.
-@end table
-And objects that are only used internally:
-@table @code
-@item opaque
-A generic object for encapsulating arbitrary memory; this allows you the
-generality of @code{malloc()} and the convenience of the Lisp object
-system.
-@item lstream
-A buffering I/O stream, used to provide a unified interface to anything
-that can accept output or provide input, such as a file descriptor, a
-stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
-it's a Lisp object to make its memory management more convenient.
-@item char-table-entry
-Subsidiary objects in the internal char-table representation.
-@item extent-auxiliary
-@itemx menubar-data
-@itemx toolbar-data
-Various special-purpose objects that are basically just used to
-encapsulate memory for particular subsystems, similar to the more
-general ``opaque'' object.
-@item symbol-value-forward
-@itemx symbol-value-buffer-local
-@itemx symbol-value-varalias
-@itemx symbol-value-lisp-magic
-Special internal-only objects that are placed in the value cell of a
-symbol to indicate that there is something special with this variable --
-e.g. it has no value, it mirrors another variable, or it mirrors some C
-variable; there is really only one kind of object, called a
-@dfn{symbol-value-magic}, but it is sort-of halfway kludged into
-semi-different object types.
-@end table
-@cindex permanent objects
-@cindex temporary objects
-Some types of objects are @dfn{permanent}, meaning that once created,
-they do not disappear until explicitly destroyed, using a function such
-as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
-Others will disappear once they are no longer used, through the garbage
-collection mechanism.  Buffers, frames, windows, devices, and processes
-are among the objects that are permanent.  Note that some objects can go
-both ways: Faces can be created either way; extents are normally
-permanent, but detached extents (extents not referring to any text, as
-happens to some extents when the text they are referring to is deleted)
-are temporary.  Note that some permanent objects, such as faces and
-coding systems, cannot be deleted.  Note also that windows are unique in
-that they can be @emph{undeleted} after having previously been
-deleted. (This happens as a result of restoring a window configuration.)
-@cindex read syntax
-Many types of objects have a @dfn{read syntax}, i.e. a way of
-specifying an object of that type in Lisp code.  When you load a Lisp
-file, or type in code to be evaluated, what really happens is that the
-function @code{read} is called, which reads some text and creates an object
-based on the syntax of that text; then @code{eval} is called, which
-possibly does something special; then this loop repeats until there's
-no more text to read. (@code{eval} only actually does something special
-with symbols, which causes the symbol's value to be returned,
-similar to referencing a variable; and with conses [i.e. lists],
-which cause a function invocation.  All other values are returned
-unchanged.)
-The read syntax
-@example
-17297
-@end example
-converts to an integer whose value is 17297.
-@example
-355/113
-@end example
-converts to a ratio commonly used to approximate @emph{pi} when ratios
-are configured, and otherwise to a symbol whose name is ``355/113'' (for
-backward compatibility).
-@example
-1.983e-4
-@end example
-converts to a float whose value is 1.983e-4, or .0001983.
-@example
-?b
-@end example
-converts to a char that represents the lowercase letter b.
-@example
-?^[$(B#&^[(B
-@end example
-(where @samp{^[} actually is an @samp{ESC} character) converts to a
-particular Kanji character when using an ISO2022-based coding system for
-input. (To decode this goo: @samp{ESC} begins an escape sequence;
-@samp{ESC $ (} is a class of escape sequences meaning ``switch to a
-94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
-Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
-of characters [subtract 33 from the ASCII value of each character to get
-the corresponding index]; @samp{ESC (} is a class of escape sequences
-meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
-to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
-denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
-replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
-from the GB2312 character set.)
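The index arithmetic described above (subtract 33 from the ASCII value of each byte) can be sketched in C as follows.  This is only an illustration: @code{iso2022_index_94} is a hypothetical helper name, not a function in the XEmacs sources, and the range check is an added assumption.

```c
/* Hypothetical helper, not part of XEmacs: map one byte of a 94x94
   character code (ASCII 33..126) to a zero-based index 0..93.
   Returns -1 for bytes outside the 94-character range. */
static int
iso2022_index_94 (unsigned char byte)
{
  if (byte < 33 || byte > 126)
    return -1;
  return byte - 33;
}
```

For the bytes @samp{#} (ASCII 35) and @samp{&} (ASCII 38) used in the example above, this yields the indices 2 and 5.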
-@example
-"foobar"
-@end example
-converts to a string.
-@example
-foobar
-@end example
-converts to a symbol whose name is @code{"foobar"}.  This is done by
-looking up the string equivalent in the global variable
-@code{obarray}, whose contents should be an obarray.  If no symbol
-is found, a new symbol with the name @code{"foobar"} is automatically
-created and added to @code{obarray}; this process is called
-@dfn{interning} the symbol.
-@cindex interning
-@example
-(foo . bar)
-@end example
-converts to a cons cell containing the symbols @code{foo} and @code{bar}.
-@example
-(1 a 2.5)
-@end example
-converts to a three-element list containing the specified objects
-(note that a list is actually a set of nested conses; see the
-XEmacs Lisp Reference).
-@example
-[1 a 2.5]
-@end example
-converts to a three-element vector containing the specified objects.
-@example
-#[... ... ... ...]
-@end example
-converts to a compiled-function object (the actual contents are not
-shown since they are not relevant here; look at a file that ends with
-@file{.elc} for examples).
-@example
-#*01110110
-@end example
-converts to a bit-vector.
-@example
-#s(hash-table ... ...)
-@end example
-converts to a hash table (the actual contents are not shown).
-@example
-#s(range-table ... ...)
-@end example
-converts to a range table (the actual contents are not shown).
-@example
-#s(char-table ... ...)
-@end example
-converts to a char table (the actual contents are not shown).
-Note that the @code{#s()} syntax is the general syntax for structures,
-which are not really implemented in XEmacs Lisp but should be.
-When an object is printed out (using @code{print} or a related
-function), the read syntax is used, so that the same object can be read
-in again.
-The other objects do not have read syntaxes, usually because it does not
-really make sense to create them in this fashion (e.g.  processes, where
-it doesn't make sense to have a subprocess created as a side effect of
-reading some Lisp code), or because they can't be created at all
-(e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
-nor do most complex objects, which contain too much state to be easily
-initialized through a read syntax.
-@node How Lisp Objects Are Represented in C, Major Textual Changes, The XEmacs Object System (Abstractly Speaking), Top
-@chapter How Lisp Objects Are Represented in C
-@cindex Lisp objects are represented in C, how
-@cindex objects are represented in C, how Lisp
-@cindex represented in C, how Lisp objects are
-Lisp objects are represented in C using a 32-bit or 64-bit machine word
-(depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and
-most other processors use 32-bit Lisp objects).  The representation
-stuffs a pointer together with a tag, as follows:
-@example
-[ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
-[ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
-<---------------------------------------------------------> <->
-a pointer to a structure, or an integer            tag
-@end example
-A tag of 00 is used for all pointer object types, a tag of 10 is used
-for characters, and the other two tags 01 and 11 are joined together to
-form the integer object type.  This representation gives us 31-bit
-integers and 30-bit characters, while pointers are represented directly
-without any bit masking or shifting.  This representation, though,
-assumes that pointers to structs are always aligned to multiples of 4,
-so the lower 2 bits are always zero.
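The tag scheme just described can be sketched in C roughly as follows.  All names here (@code{sketch_word}, @code{sketch_classify}, and so on) are hypothetical and chosen for illustration; they are not the actual XEmacs macros, and the sketch assumes, as the text does, that structure pointers are aligned to multiples of 4.

```c
#include <stdint.h>

/* Hypothetical stand-in for the machine word holding a Lisp object. */
typedef uintptr_t sketch_word;

enum sketch_kind { SKETCH_POINTER, SKETCH_CHAR, SKETCH_INT };

/* Classify a word by its low two bits, following the tag layout above. */
static enum sketch_kind
sketch_classify (sketch_word w)
{
  if (w & 1)                 /* tags 01 and 11: integer */
    return SKETCH_INT;
  if ((w & 3) == 2)          /* tag 10: character */
    return SKETCH_CHAR;
  return SKETCH_POINTER;     /* tag 00: pointer, stored unmodified */
}

/* Characters keep their value in the upper 30 bits (c must fit there). */
static sketch_word
sketch_make_char (uint32_t c)
{
  return ((sketch_word) c << 2) | 2;
}

static uint32_t
sketch_char_value (sketch_word w)
{
  return (uint32_t) (w >> 2);
}

/* A 4-byte-aligned pointer already has tag 00, so no masking is needed. */
static sketch_word
sketch_from_pointer (void *p)
{
  return (sketch_word) p;
}
```

Because the integer test looks only at the lowest bit, both odd tags fall into the integer case, which is where the extra bit of integer precision comes from.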
-Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
-used for the Lisp object can vary.  It can be either a simple type
-(@code{long} on the DEC Alpha, @code{int} on other machines) or a
-structure whose fields are bit fields that line up properly (actually, a
-union of structures is used).  Generally the simple integral type is
-preferable because it ensures that the compiler will actually use a
-machine word to represent the object (some compilers will use more
-general and less efficient code for unions and structs even if they can
-fit in a machine word).  The union type, however, has the advantage of
-stricter type checking.  If you accidentally pass an integer where a Lisp
-object is desired, you get a compile error.  The choice of which type
-to use is determined by the preprocessor constant @code{USE_UNION_TYPE}
-which is defined via the @code{--use-union-type} option to
-@code{configure}.
-Various macros are used to convert between Lisp_Objects and the
-corresponding C type.  Macros of the form @code{XINT()}, @code{XCHAR()},
-@code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or
-masking and cast it to the appropriate type.  @code{XINT()} needs to be
-a bit tricky so that negative numbers are properly sign-extended.  Since
-integers are stored left-shifted, if the right-shift operator does an
-arithmetic shift (i.e. it leaves the most-significant bit as-is rather
-than shifting in a zero, so that it mimics a divide-by-two even for
-negative numbers) the shift to remove the tag bit is enough.  This is
-the case on all the systems we support.
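A minimal sketch of the sign-extension point follows.  The function names are hypothetical and are not the definitions of @code{XINT()} or @code{make_int()}; the sketch assumes an integer is stored as its value shifted left by one with the low tag bit set, and that @code{>>} on a negative signed value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical encoder: store N shifted left by one, with the tag bit set.
   The shift is done on an unsigned value to avoid undefined behaviour
   when N is negative. */
static intptr_t
sketch_make_int (intptr_t n)
{
  return (intptr_t) (((uintptr_t) n << 1) | 1);
}

/* Hypothetical decoder relying on an arithmetic right shift: the sign
   bit is replicated, so negative values come back out correctly. */
static intptr_t
sketch_xint (intptr_t w)
{
  return w >> 1;
}

/* Alternative decoder that does not depend on how >> treats negative
   values: W is always odd, so (W - 1) is exactly twice the value. */
static intptr_t
sketch_xint_portable (intptr_t w)
{
  return (w - 1) / 2;
}
```

On a compiler whose right shift is logical rather than arithmetic, only the second decoder would return negative numbers intact.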
-Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
-macros become more complicated---they check the tag bits and/or the
-type field in the first four bytes of a record type to ensure that the
-object is really of the correct type.  This is great for catching places
-where an incorrect type is being dereferenced---this typically results
-in a pointer being dereferenced as the wrong type of structure, with
-unpredictable (and sometimes not easily traceable) results.
-There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
-object.  These macros are of the form @code{XSET@var{TYPE}
-(@var{lvalue}, @var{result})}, i.e. they have to be a statement rather
-than just used in an expression.  The reason for this is that standard C
-doesn't let you ``construct'' a structure (but GCC does).  Granted, this
-sometimes isn't too convenient; for the case of integers, at least, you
-can use the function @code{make_int()}, which constructs and
-@emph{returns} an integer Lisp object.  Note that the
-@code{XSET@var{TYPE}()} macros are also affected by
-@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
-right type in the case of record types, where the type is contained in
-the structure.
-The C programmer is responsible for @strong{guaranteeing} that a
-Lisp_Object is the correct type before using the @code{X@var{TYPE}}
-macros.  This is especially important in the case of lists.  Use
-@code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
-else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
-Lisp code.  On the other hand, if XEmacs has an internal logic error,
-it's better to crash immediately, so sprinkle @code{assert()}s and
-``unreachable'' @code{abort()}s liberally about the source code.  Where
-performance is an issue, use @code{type_checking_assert},
-@code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
-nothing unless the corresponding configure error checking flag was
-specified.
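The difference between an unchecked extractor and a checked accessor can be sketched with a toy model.  Everything below (@code{Toy_Object}, @code{TOY_XCONS}, @code{TOY_ERROR_CHECK_TYPECHECK}, @code{toy_car_checked}) is a hypothetical stand-in invented for illustration; the real definitions in @file{lisp.h} are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; none of these names exist in XEmacs. */
enum toy_type { TOY_CONS, TOY_STRING };

struct toy_header { enum toy_type type; };

struct toy_cons
{
  struct toy_header header;   /* type field comes first, as in a record */
  int car;
  struct toy_cons *cdr;
};

typedef void *Toy_Object;

/* Predicate: may be applied to any object that starts with a header. */
#define TOY_CONSP(obj) (((struct toy_header *) (obj))->type == TOY_CONS)

#ifdef TOY_ERROR_CHECK_TYPECHECK
/* Checking build: the extractor verifies the type field first. */
static struct toy_cons *
toy_xcons_checked (Toy_Object obj)
{
  assert (TOY_CONSP (obj));
  return (struct toy_cons *) obj;
}
# define TOY_XCONS(obj) toy_xcons_checked (obj)
#else
/* Normal build: a bare cast, with no checking at all. */
# define TOY_XCONS(obj) ((struct toy_cons *) (obj))
#endif

/* Checked accessor for untrusted callers.  Stores the car in *out and
   returns 0, or returns -1 if obj is not a cons.  */
int
toy_car_checked (Toy_Object obj, int *out)
{
  if (!TOY_CONSP (obj))
    return -1;
  *out = TOY_XCONS (obj)->car;   /* safe: the type was just verified */
  return 0;
}
```

Code that has already established the type can use the bare extractor; code handed an arbitrary object goes through the checked accessor instead.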
-@node Major Textual Changes, Rules When Writing New C Code, How Lisp Objects Are Represented in C, Top
-@chapter Major Textual Changes
-@cindex textual changes, major
-@cindex major textual changes
-Sometimes major textual changes are made to the source.  This means that
-a search-and-replace is done to change type names and such.  Some people
-disagree with such changes, and certainly, if done without good reason,
-they will just lead to headaches.  But it's important to keep the code clean
-and understandable, and consistent naming goes a long way towards this.
-An example of the right way to do this was the so-called "great integral
-type renaming".
-@menu
-* Great Integral Type Renaming::
-* Text/Char Type Renaming::
-@end menu
-@node Great Integral Type Renaming, Text/Char Type Renaming, Major Textual Changes, Major Textual Changes
-@section Great Integral Type Renaming
-@cindex Great Integral Type Renaming
-@cindex integral type renaming, great
-@cindex type renaming, integral
-@cindex renaming, integral types
-The purpose of this is to rationalize the names used for various
-integral types, so that they match their intended uses and follow
-consistent conventions, and to eliminate types that were not semantically
-different from each other.
-The conventions are:
-@itemize @bullet
-@item
-All integral types that measure quantities of anything are signed.  Some
-people disagree vociferously with this, but their arguments are mostly
-theoretical, and are vastly outweighed by the practical headaches of
-mixing signed and unsigned values, and more importantly by the greatly
-increased likelihood of inadvertent bugs.  Because of the broken "viral"
-nature of unsigned quantities in C (operations involving mixed
-signed/unsigned are done unsigned, when exactly the opposite is nearly
-always wanted), even a single error in declaring a quantity unsigned
-that should be signed, or the more subtle error of comparing
-signed and unsigned values and forgetting the necessary cast, can be
-catastrophic, as comparisons will yield wrong results.  -Wsign-compare
-is turned on specifically to catch this, but this tends to result in a
-great number of warnings when mixing signed and unsigned, and the casts
-are annoying.  More has been written on this elsewhere.
-@item
-All such quantity types just mentioned boil down to EMACS_INT, which is
-32 bits on 32-bit machines and 64 bits on 64-bit machines.  This is
-guaranteed to be the same size as Lisp objects of type @code{int}, and (as
-far as I can tell) of size_t (unsigned!) and ssize_t.  The only type
-below that is not an EMACS_INT is Hashcode, which is an unsigned value
-of the same size as EMACS_INT.
-@item
-Type names should be relatively short (no more than 10 characters or
-so), with the first letter capitalized and no underscores if they can at
-all be avoided.
-@item
-"count" == a zero-based measurement of some quantity.  Includes sizes,
-offsets, and indexes.
-@item
-"bpos" == a one-based measurement of a position in a buffer.  "Charbpos"
-and "Bytebpos" count text in the buffer, rather than bytes in memory;
-thus Bytebpos does not directly correspond to the memory representation.
-Use "Membpos" for this.
-@item
-"Char" refers to internal-format characters, not to the C type "char",
-which is really a byte.
-@end itemize
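The ``viral'' behavior of unsigned quantities described in the first item can be seen in a few lines of C.  This is only a sketch: @code{Toy_Count} and the two helper functions are hypothetical names chosen for this example, with @code{long} standing in for a signed quantity type.

```c
#include <assert.h>

/* Hypothetical stand-in for a signed quantity type such as Bytecount. */
typedef long Toy_Count;

/* Mixed comparison: OFFSET is implicitly converted to unsigned long,
   so a negative offset becomes a huge positive number.  This is the
   kind of comparison that -Wsign-compare warns about.  */
int
in_range_mixed (Toy_Count offset, unsigned long length)
{
  return offset < length;
}

/* All-signed comparison: behaves the way the arithmetic reads. */
int
in_range_signed (Toy_Count offset, Toy_Count length)
{
  assert (length >= 0);   /* a count is never negative */
  return offset < length;
}
```

With an offset of -1 and a length of 5, @code{in_range_mixed} reports that the offset is @emph{not} below the length, while @code{in_range_signed} gives the expected answer.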
-For the actual name changes, see the script below.
-I ran the following script to do the conversion. (NOTE: This script is
-idempotent.  You can safely run it multiple times and it will not screw
-up previous results -- in fact, it will do nothing if nothing has
-changed.  Thus, it can be run repeatedly as necessary to handle patches
-coming in from old workspaces, or old branches.)  There are two tags,
-just before and just after the change: @samp{pre-integral-type-rename}
-and @samp{post-integral-type-rename}.  When merging code from the main
-trunk into a branch, the best thing to do is first merge up to
-@samp{pre-integral-type-rename}, then apply the script and associated
-changes, then merge from @samp{post-integral-type-rename} to the
-present. (Alternatively, just do the merging in one operation; but you
-may then have a lot of conflicts needing to be resolved by hand.)
-Script @samp{fixtypes.sh} follows:
-@example
------------------------------------ cut ------------------------------------
-files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
-gr Memory_Count Bytecount $files
-gr Lstream_Data_Count Bytecount $files
-gr Element_Count Elemcount $files
-gr Hash_Code Hashcode $files
-gr extcount bytecount $files
-gr bufpos charbpos $files
-gr bytind bytebpos $files
-gr memind membpos $files
-gr bufbyte intbyte $files
-gr Extcount Bytecount $files
-gr Bufpos Charbpos $files
-gr Bytind Bytebpos $files
-gr Memind Membpos $files
-gr Bufbyte Intbyte $files
-gr EXTCOUNT BYTECOUNT $files
-gr BUFPOS CHARBPOS $files
-gr BYTIND BYTEBPOS $files
-gr MEMIND MEMBPOS $files
-gr BUFBYTE INTBYTE $files
-gr MEMORY_COUNT BYTECOUNT $files
-gr LSTREAM_DATA_COUNT BYTECOUNT $files
-gr ELEMENT_COUNT ELEMCOUNT $files
-gr HASH_CODE HASHCODE $files
------------------------------------ cut ------------------------------------
-@end example
-The @samp{gr} script, and the scripts it uses, are documented in
-@file{README.global-renaming}, because if placed in this file they would
-need to have their @@ characters doubled, meaning you couldn't easily
-cut and paste from the source.
-In addition to those programs, I needed to fix up a few other
-things, particularly relating to the duplicate definitions of
-types, now that some types merged with others.  Specifically:
-@enumerate
-@item
-in @file{lisp.h}, removed duplicate declarations of Bytecount.  The changed
-code should now look like this: (In each code snippet below, the first
-and last lines are the same as the original, as are all lines outside of
-those lines.  That allows you to locate the section to be replaced, and
-replace the stuff in that section, verifying that there isn't anything
-new added that would need to be kept.)
-@example
---------------------------------- snip -------------------------------------
-/* Counts of bytes or chars */
-typedef EMACS_INT Bytecount;
-typedef EMACS_INT Charcount;
-/* Counts of elements */
-typedef EMACS_INT Elemcount;
-/* Hash codes */
-typedef unsigned long Hashcode;
-/* ------------------------ dynamic arrays ------------------- */
---------------------------------- snip -------------------------------------
-@end example
-@item
-in @file{lstream.h}, removed duplicate declaration of Bytecount.  Rewrote the
-comment about this type.  The changed code should now look like this:
-@example
---------------------------------- snip -------------------------------------
-#endif
-/* There have been some arguments over what the type should be that
-specifies a count of bytes in a data block to be written out or read in,
-using @code{Lstream_read()}, @code{Lstream_write()}, and related functions.
-Originally it was long, which worked fine; Martin "corrected" these to
-size_t and ssize_t on the grounds that this is theoretically cleaner and
-is in keeping with the C standards.  Unfortunately, this practice is
-horribly error-prone due to design flaws in the way that mixed
-signed/unsigned arithmetic happens.  In fact, by doing this change,
-Martin introduced a subtle but fatal error that caused the operation of
-sending large mail messages to the SMTP server under Windows to fail.
-By putting all values back to be signed, avoiding any signed/unsigned
-mixing, the bug immediately went away.  The type then in use was
-Lstream_Data_Count, so that it be reverted cleanly if a vote came to
-that.  Now it is Bytecount.
-Some earlier comments about why the type must be signed: This MUST BE
-SIGNED, since it also is used in functions that return the number of
-bytes actually read or written in an operation, and these
-functions can return -1 to signal error.
-Note that the standard Unix @code{read()} and @code{write()} functions define the
-count going in as a size_t, which is UNSIGNED, and the count going
-out as an ssize_t, which is SIGNED.  This is a horrible design
-flaw.  Not only is it highly likely to lead to logic errors when a
--1 gets interpreted as a large positive number, but operations are
-bound to fail in all sorts of horrible ways when a number in the
-upper-half of the size_t range is passed in -- this number is
-unrepresentable as an ssize_t, so code that checks to see how many
-bytes are actually written (which is mandatory if you are dealing
-with certain types of devices) will get completely screwed up.
---ben
-*/
-typedef enum lstream_buffering
---------------------------------- snip -------------------------------------
-@end example
-@item
-in @file{dumper.c}, there are four places, all inside of @code{switch()} statements,
-where XD_BYTECOUNT appears twice as a case tag.  In each case, the two
-case blocks contain identical code, and you should *REMOVE THE SECOND*
-and leave the first.
-@end enumerate
-@node Text/Char Type Renaming,  , Great Integral Type Renaming, Major Textual Changes
-@section Text/Char Type Renaming
-@cindex Text/Char Type Renaming
-@cindex type renaming, text/char
-@cindex renaming, text/char types
-The purpose of this was
-@enumerate
-@item
-To distinguish between ``charptr'' when it refers to operations on
-the pointer itself and when it refers to operations on text
-@item
-To use consistent naming for everything referring to internal format, i.e.
-@end enumerate
-@example
-	Itext == text in internal format
-	Ibyte == a byte in such text
-	Ichar == a char as represented in internal character format
-@end example
-Thus e.g.
-@example
-	set_charptr_emchar -> set_itext_ichar
-@end example
-This was done using a script like this:
-@example
-files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
-gr Intbyte Ibyte $files
-gr INTBYTE IBYTE $files
-gr intbyte ibyte $files
-gr EMCHAR ICHAR $files
-gr emchar ichar $files
-gr Emchar Ichar $files
-gr INC_CHARPTR INC_IBYTEPTR $files
-gr DEC_CHARPTR DEC_IBYTEPTR $files
-gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files
-gr valid_charptr valid_ibyteptr $files
-gr CHARPTR ITEXT $files
-gr charptr itext $files
-gr Charptr Itext $files
-@end example
-See @file{README.global-renaming}, mentioned above, for the source to @samp{gr}.
-As in the integral-types change, there are pre and post tags before and
-after the change:
-@example
-	pre-internal-format-textual-renaming
-	post-internal-format-textual-renaming
-@end example
-When merging a large branch, follow the same sort of procedure
-documented above, using these tags -- essentially sync up to the pre
-tag, then apply the script yourself, then sync from the post tag to the
-present.  You can probably do the same if you don't have a separate
-workspace, but do have lots of outstanding changes and you'd rather not
-just merge all the textual changes directly.  Use something like this:
-(WARNING: I'm not a CVS guru; before trying this, or any large operation
-that might potentially mess things up, @strong{DEFINITELY} make a backup of
-your existing workspace.)
-@example
-cup -r pre-internal-format-textual-renaming
-<apply script>
-cup -A -j post-internal-format-textual-renaming -j HEAD
-@end example
-This might also work:
-@example
-cup -j pre-internal-format-textual-renaming
-<apply script>
-cup -j post-internal-format-textual-renaming -j HEAD
-@end example
-ben
-The following is a script to go in the opposite direction:
-@example
-files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
-# Evidently Perl considers _ to be a word char ala \b, even though XEmacs
-# doesn't.  We need to be careful here with ibyte/ichar because of words
-# like Richard, @code{eicharlen()}, multibyte, HIBYTE, etc.
-gr Ibyte Intbyte $files
-gr '\bIBYTE' INTBYTE $files
-gr '\bibyte' intbyte $files
-gr '\bICHAR' EMCHAR $files
-gr '\bichar' emchar $files
-gr '\bIchar' Emchar $files
-gr '\bIBYTEPTR' CHARPTR $files
-gr '\bibyteptr' charptr $files
-gr '\bITEXT' CHARPTR $files
-gr '\bitext' charptr $files
-gr '\bItext' Charptr $files
-gr '_IBYTE' _INTBYTE $files
-gr '_ibyte' _intbyte $files
-gr '_ICHAR' _EMCHAR $files
-gr '_ichar' _emchar $files
-gr '_Ichar' _Emchar $files
-gr '_IBYTEPTR' _CHARPTR $files
-gr '_ibyteptr' _charptr $files
-gr '_ITEXT' _CHARPTR $files
-gr '_itext' _charptr $files
-gr '_Itext' _Charptr $files
-@end example
-@node Rules When Writing New C Code, Regression Testing XEmacs, Major Textual Changes, Top
-@chapter Rules When Writing New C Code
-@cindex writing new C code, rules when
-@cindex C code, rules when writing new
-@cindex code, rules when writing new C
-The XEmacs C code is extremely complex and intricate, and there are many
-rules that are more or less consistently followed throughout the code.
-Many of these rules are not obvious, so they are explained here.  It is
-of the utmost importance that you follow them.  If you don't, you may
-get something that appears to work, but which will crash in odd
-situations, often in code far away from where the actual breakage is.
-@menu
-* A Reader's Guide to XEmacs Coding Conventions::
-* General Coding Rules::
-* Object-Oriented Techniques for C::
-* Writing Lisp Primitives::
-* Writing Good Comments::
-* Adding Global Lisp Variables::
-* Writing Macros::
-* Proper Use of Unsigned Types::
-* Techniques for XEmacs Developers::
-@end menu
-See also @ref{Coding for Mule}.
-@node A Reader's Guide to XEmacs Coding Conventions, General Coding Rules, Rules When Writing New C Code, Rules When Writing New C Code
-@section A Reader's Guide to XEmacs Coding Conventions
-@cindex coding conventions
-@cindex reader's guide
-@cindex coding rules, naming
-Of course the low-level implementation language of XEmacs is C, but much
-of that uses the Lisp engine to do its work.  However, because the code
-is ``inside'' of the protective containment shell around the ``reactor
-core,'' you'll see lots of complex ``plumbing'' needed to do the work
-and ``safety mechanisms,'' whose failure results in a meltdown.  This
-section provides a quick overview (or review) of the various components
-of the implementation of Lisp objects.
-Two typographic conventions help to identify C objects that implement
-Lisp objects.  The first is that capitalized identifiers, especially
-beginning with the letters @samp{Q}, @samp{V}, @samp{F}, and @samp{S},
-for C variables and functions, and C macros beginning with the
-letter @samp{X}, are used to implement Lisp.  The second is that where
-Lisp uses the hyphen @samp{-} in symbol names, the corresponding C
-identifiers use the underscore @samp{_}.  Of course, since XEmacs Lisp
-contains interfaces to many external libraries, those external names
-will follow the coding conventions their authors chose, and may overlap
-the ``XEmacs name space.''  However these cases are usually pretty
-obvious.
-All Lisp objects are handled indirectly.  The @code{Lisp_Object}
-type is usually a pointer to a structure, except for a very small number
-of types with immediate representations (currently characters and
-integers).  However, these types cannot be directly operated on in C
-code, either, so they can also be considered indirect.  Types that do
-not have an immediate representation always have a C typedef
-@code{Lisp_@var{type}} for a corresponding structure.
-@c #### mention l(c)records here?
-In older code, it was common practice to pass around pointers to
-@code{Lisp_@var{type}}, but this is now deprecated in favor of using
-@code{Lisp_Object} for all function arguments and return values that are
-Lisp objects.  The @code{X@var{type}} macro is used to extract the
-pointer and cast it to @code{(Lisp_@var{type} *)} for the desired type.
-@strong{Convention}: macros whose names begin with @samp{X} operate on
-@code{Lisp_Object}s and do no type-checking.  Many such macros are type
-extractors, but others implement Lisp operations in C (@emph{e.g.},
-@code{XCAR} implements the Lisp @code{car} function).  These are unsafe,
-and must only be used where types of all data have already been checked.
-Such macros are only applied to @code{Lisp_Object}s.  In internal
-implementations where the pointer has already been converted, the
-structure is operated on directly using the C @code{->} member access
-operator.
-The @code{@var{type}P}, @code{CHECK_@var{type}}, and
-@code{CONCHECK_@var{type}} macros are used to test types.  The first
-returns a Boolean value, and the latter two signal errors.  (The
-@samp{CONCHECK} variety allows execution to be CONtinued under some
-circumstances, thus the name.)  Functions which expect to be passed user
-data invariably call @samp{CHECK} macros on arguments.
-There are many types of specialized Lisp objects implemented in C, but
-the most pervasive type is the @dfn{symbol}.  Symbols are used as
-identifiers, variables, and functions.
-@strong{Convention}: Global variables whose names begin with @samp{Q}
-are constants whose value is a symbol.  The name of the variable should
-be derived from the name of the symbol using the same rules as for Lisp
-primitives.  Such variables allow the C code to check whether a
-particular @code{Lisp_Object} is equal to a given symbol.  Symbols are
-Lisp objects, so these variables may be passed to Lisp primitives.  (An
-alternative to the use of @samp{Q...} variables is to call the
-@code{intern} function at initialization in the
-@code{vars_of_@var{module}} function, which is hardly less efficient.)
-@strong{Convention}: Global variables whose names begin with @samp{V}
-are variables that contain Lisp objects.  The convention here is that
-all global variables of type @code{Lisp_Object} begin with @samp{V}, and
-no others do (not even integer and boolean variables that have Lisp
-equivalents). Most of the time, these variables have equivalents in
-Lisp, which are defined via the @samp{DEFVAR} family of macros, but some
-don't.  Since the variable's value is a @code{Lisp_Object}, it can be
-passed to Lisp primitives.
-The implementation of Lisp primitives is more complex.
-@strong{Convention}: Global variables with names beginning with @samp{S}
-contain a structure that allows the Lisp engine to identify and call a C
-function.  In modern versions of XEmacs, these identifiers are almost
-always completely hidden in the @code{DEFUN} and @code{SUBR} macros, but
-you will encounter them if you look at very old versions of XEmacs or at
-GNU Emacs.  @strong{Convention}: Functions with names beginning with
-@samp{F} implement Lisp primitives.  Of course all their arguments and
-their return values must be Lisp_Objects.  (This is hidden in the
-@code{DEFUN} macro.)
-@node General Coding Rules, Object-Oriented Techniques for C, A Reader's Guide to XEmacs Coding Conventions, Rules When Writing New C Code
-@section General Coding Rules
-@cindex coding rules, general
-The C code is actually written in a dialect of C called @dfn{Clean C},
-meaning that it can be compiled, mostly warning-free, with either a C or
-C++ compiler.  Coding in Clean C has several advantages over plain C.
-C++ compilers are more nit-picking, and a number of coding errors have
-been found by compiling with C++.  The ability to use both C and C++
-tools means that a greater variety of development tools are available to
-the developer.  In addition, the ability to overload operators in C++
-means it is possible, for error-checking purposes, to redefine certain
-simple types (normally defined as aliases for simple built-in types such
-as @code{unsigned char} or @code{long}) as classes, strictly limiting the permissible
-operations and catching illegal implicit casts and such.
-Every module includes @file{<config.h>} (angle brackets so that
-@samp{--srcdir} works correctly; @file{config.h} may or may not be in
-the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
-must always be included before any other header files (including
-system header files) to ensure that certain tricks played by various
-@file{s/} and @file{m/} files work out correctly.
-When including header files, always use angle brackets, not double
-quotes, except when the file to be included is always in the same
-directory as the including file.  If either file is a generated file,
-then that is not likely to be the case.  In order to understand why we
-have this rule, imagine what happens when you do a build in the source
-directory using @samp{./configure} and another build in another
-directory using @samp{../work/configure}.  There will be two different
-@file{config.h} files.  Which one will be used if you @samp{#include
-"config.h"}?
-Almost every module contains a @code{syms_of_*()} function and a
-@code{vars_of_*()} function.  The former declares any Lisp primitives
-you have defined and defines any symbols you will be using.  The latter
-declares any global Lisp variables you have added and initializes global
-C variables in the module.  @strong{Important}: There are stringent
-requirements on exactly what can go into these functions.  See the
-comment in @file{emacs.c}.  The reason for this is to avoid obscure
-unwanted interactions during initialization.  If you don't follow these
-rules, you'll be sorry!  If you want to do anything that isn't allowed,
-create a @code{complex_vars_of_*()} function for it.  Doing this is
-tricky, though: you have to make sure your function is called at the
-right time so that all the initialization dependencies work out.
-Declare each function of these kinds in @file{symsinit.h}.  Make sure
-it's called in the appropriate place in @file{emacs.c}.  You never need
-to include @file{symsinit.h} directly, because it is included by
-@file{lisp.h}.
-@strong{All global and static variables that are to be modifiable must
-be declared uninitialized.}  This means that you may not use the
-``declare with initializer'' form for these variables, such as @code{int
-some_variable = 0;}.  The reason for this has to do with some kludges
-done during the dumping process: If possible, the initialized data
-segment is re-mapped so that it becomes part of the (unmodifiable) code
-segment in the dumped executable.  This allows this memory to be shared
-among multiple running XEmacs processes.  XEmacs is careful to place as
-much constant data as possible into initialized variables during the
-@file{temacs} phase.
-@cindex copy-on-write
-@strong{Please note:} This kludge only works on a few systems nowadays,
-and is rapidly becoming irrelevant because most modern operating systems
-provide @dfn{copy-on-write} semantics.  All data is initially shared
-between processes, and a private copy is automatically made (on a
-page-by-page basis) when a process first attempts to write to a page of
-memory.
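In practice the rule about uninitialized declarations looks roughly like the following sketch.  The module name @samp{toy} and its variables are hypothetical; only the shape (no initializer at the declaration, assignment inside the module's @code{vars_of_*()} function) is taken from the text above.

```c
#include <assert.h>

/* Modifiable globals and statics: declared WITHOUT an initializer,
   i.e. not "int toy_verbose_level = 1;".  */
int toy_verbose_level;
static int toy_cache_size;

/* Hypothetical counterpart of a vars_of_*() function: the initial
   values are assigned here, at startup, rather than in the
   declarations above.  */
void
vars_of_toy (void)
{
  toy_verbose_level = 1;
  toy_cache_size = 64;
}

int
toy_get_cache_size (void)
{
  assert (toy_cache_size > 0);   /* vars_of_toy() must have run first */
  return toy_cache_size;
}
```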
-Formerly, there was a requirement that static variables not be declared
-inside of functions.  This had to do with another hack along the same
-vein as what was just described: old USG systems put statically-declared
-variables in the initialized data space, so those header files had a
-@code{#define static} declaration. (That way, the data-segment remapping
-described above could still work.) This fails badly on static variables
-inside of functions, which suddenly become automatic variables;
-therefore, you weren't supposed to have any of them.  This awful kludge
-has been removed in XEmacs because
-@enumerate
-@item
-almost all of the systems that used this kludge ended up having
-to disable the data-segment remapping anyway;
-@item
-the only systems that didn't were extremely outdated ones;
-@item
-this hack completely messed up inline functions.
-@end enumerate
-The C source code makes heavy use of C preprocessor macros.  One popular
-macro style is:
-@example
-#define FOO(var, value) do @{            \
-Lisp_Object FOO_value = (value);      \
-... /* compute using FOO_value */     \
-(var) = bar;                          \
-@} while (0)
-@end example
-The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
-statement semantics, so that it can safely be used within an @code{if}
-statement in C, for example.  Multiple evaluation is prevented by
-copying a supplied argument into a local variable, so that
-@code{FOO(var,fun(1))} only calls @code{fun} once.
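A concrete instance of this macro style, small enough to compile on its own, might look as follows.  The names @code{TOY_SET_DOUBLE}, @code{toy_next} and @code{toy_calls} are made up for the example; the macro uses its argument twice internally, yet the caller's expression is evaluated only once.

```c
#include <assert.h>

int toy_calls;   /* counts how often toy_next() has run */

int
toy_next (void)
{
  return ++toy_calls;
}

/* Hypothetical macro in the style described above.  VALUE is copied
   into a local first, so it is evaluated exactly once even though the
   body uses it twice.  */
#define TOY_SET_DOUBLE(var, value) do {                         \
  int TOY_SET_DOUBLE_value = (value);                           \
  (var) = TOY_SET_DOUBLE_value + TOY_SET_DOUBLE_value;          \
} while (0)

int
toy_demo (int flag)
{
  int result = 0;

  /* Because of the do ... while (0) wrapper, the macro call plus its
     trailing semicolon is a single statement, so the `else' below
     still pairs with this `if'.  */
  if (flag)
    TOY_SET_DOUBLE (result, toy_next ());
  else
    result = -1;

  assert (toy_calls <= 1 || !flag || result == 2 * toy_calls);
  return result;
}
```

Had the macro body been a bare brace block, the semicolon after the call would have terminated the @code{if} statement and left the @code{else} dangling.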
-Lisp lists are popular data structures in the C code as well as in
-Elisp.  There are two sets of macros that iterate over lists.
-@code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
-supplied by the user, and cannot be trusted to be acyclic and
-@code{nil}-terminated.  A @code{malformed-list} or @code{circular-list} error
-will be generated if the list being iterated over is not entirely
-kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
-safe, and can be used only on trusted lists.
-Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
-@code{GET_LIST_LENGTH}, which calculate the length of a list;
-@code{GET_EXTERNAL_LIST_LENGTH} also validates that the list is
-proper.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
-@code{LIST_LOOP_DELETE_IF} delete the elements of a Lisp list that
-satisfy some predicate.
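The trade-off between the trusting and the validating macros can be illustrated with a plain C linked list.  This is a toy model, not the XEmacs implementation: @code{struct toy_cell} and both functions are invented here, and the cycle check uses the classic tortoise-and-hare technique, which may differ from what the real macros do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy singly linked list; NULL plays the role of nil. */
struct toy_cell
{
  int value;
  struct toy_cell *next;
};

/* Trusting version: fast, but never returns if the list is circular. */
long
toy_list_length (const struct toy_cell *list)
{
  long n = 0;
  for (; list != NULL; list = list->next)
    n++;
  return n;
}

/* Validating version: returns the length, or -1 for a circular list.
   The hare advances one cell per step and the tortoise one cell every
   second step; if the hare ever lands on the tortoise, there is a
   cycle.  */
long
toy_external_list_length (const struct toy_cell *list)
{
  const struct toy_cell *hare = list;
  const struct toy_cell *tortoise = list;
  long n = 0;

  while (hare != NULL)
    {
      hare = hare->next;
      n++;
      if ((n & 1) == 0)
        tortoise = tortoise->next;
      if (hare != NULL && hare == tortoise)
        return -1;
    }
  return n;
}
```

On a three-element proper list both functions return 3; once the last cell is made to point back at the first, only the validating version returns at all, and it returns -1.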
-@node Object-Oriented Techniques for C, Writing Lisp Primitives, General Coding Rules, Rules When Writing New C Code
-@section Object-Oriented Techniques for C
-@cindex coding rules, object-oriented
-@cindex object-oriented techniques
-At the lowest levels, XEmacs makes heavy use of object-oriented
-techniques to promote code-sharing and uniform interfaces for different
-devices and platforms.  Commonly, but not always, such objects are
-``wrapped'' and exported to Lisp as Lisp objects.  Usually they use
-the internal structures developed for Lisp objects (the @samp{lrecord}
-structure) in order to take advantage of Lisp memory management.
-Unfortunately, XEmacs was originally written in C, so these techniques
-are based on heavy use of C macros.
-@c You can't use @var{} for type below, because case is important.
-A module defining a class is likely to use most of the following
-declarations and macros.  In the following, the notation @samp{<type>}
-will stand for the full name of the class, and will be capitalized in
-the way normal for its context.  The notation @samp{<typ>} will stand
-for the abbreviated form commonly used in macro names, while @samp{ty}
-will be used as the typical name for instances of the class.  (See the
-entry for @samp{MAYBE_<TY>METH} below for an example using all three
-notations.)
-In the interface (@file{.h} file), the following declarations are used
-often.  Others may be used in particular modules.  Since they're
-quite short in most cases, the definitions are given as well.  The
-generic macros used are defined in @file{lisp.h} or @file{lrecord.h}.
-@c #### reorganize this table into stuff used in general code, and stuff
-@c used only in declarations or initializations
-@table @samp
-@c #### declaration
-@item typedef struct Lisp_<Type> Lisp_<Type>
-This refers to the internal structure used by C code.  The XEmacs coding
-style now forbids passing pointers to @samp{Lisp_<Type>} structures into
-or out of a function; instead, a @samp{Lisp_Object} should be passed or
-returned (created using @samp{wrap_<type>}, if necessary).
-@c #### declaration
-@item DECLARE_LRECORD (<type>, Lisp_<Type>)
-Declares an @samp{lrecord} for @samp{<Type>}, which is the unit of
-allocation.
-@item #define X<TYPE>(x) XRECORD (x, <type>, Lisp_<Type>)
-Turns a @code{Lisp_Object} into a pointer to @samp{struct Lisp_<Type>}.
-@item #define wrap_<type>(p) wrap_record (p, <type>)
-Turns a pointer to @samp{struct Lisp_<Type>} into a @code{Lisp_Object}.
-@item #define <TYPE>P(x) RECORDP (x, <type>)
-Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>}.
-Returns a C int, not a Lisp Boolean value.
-@item #define CHECK_<TYPE>(x) CHECK_RECORD (x, <type>)
-@itemx #define CONCHECK_<TYPE>(x) CONCHECK_RECORD (x, <type>)
-Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>},
-and signals a Lisp error if not.  The @samp{CHECK} version of the macro
-never returns if the type is wrong, while the @samp{CONCHECK} version
-can return if the user catches it in the debugger and explicitly
-requests a return.
-@item #define RAW_<TYP>METH(ty, m) ((ty)->methods->m##_method)
-Return a function pointer for the method for an object @var{TY} of class
-@samp{Lisp_<Type>}, or @samp{NULL} if there is none for this type.
-@item #define HAS_<TYP>METH_P(ty, m) (!!RAW_<TYP>METH (ty, m))
-Test whether the class that @var{TY} is an instance of has the method.
-@item #define <TYP>METH(ty, m, args) ((RAW_<TYP>METH (ty, m)) args)
-Call the method on @samp{args}.  @samp{args} must be enclosed in
-parentheses in the call.  It is the programmer's responsibility to
-ensure that the method is available.  The standard convenience macro
-@samp{MAYBE_<TYP>METH} is often provided for the common case where a
-void-returning method of @samp{<Type>} is called.
-@item #define MAYBE_<TYP>METH(ty, m, args) do @{ ... @} while (0)
-Call a void-returning @samp{<Type>} method, if it exists.  Note the use
-of the @samp{do ... while (0)} idiom to give the macro call C statement
-semantics.  The full definition is equally idiomatic:
-@example
-#define MAYBE_<TYP>METH(ty, m, args) do @{	\
-Lisp_<Type> *maybe_<typ>meth_ty = (ty);	\
-if (HAS_<TYP>METH_P (maybe_<typ>meth_ty, m))	\
-<TYP>METH (maybe_<typ>meth_ty, m, args);	\
-@} while (0)
-@end example
-@end table
-The use of macros for invoking an object's methods makes life a bit
-difficult for the student or maintainer when browsing the code.  In
-particular, calls are of the form @samp{<TYP>METH (ty, some_method, (x,
-y))}, but definitions typically are for @samp{<subtype>_some_method}.
-Thus, when you are trying to find calls, you need to grep for
-@samp{some_method}, but this will also catch calls and definitions of
-that method for instances of other subtypes of @samp{<Type>}, and there
-may be a rather large number of them.
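To make the macro family concrete, here is a self-contained sketch for an imaginary class.  All names (@code{toy_console}, @code{TOYMETH}, the @samp{tty} and @samp{stream} subtypes) are hypothetical and chosen only to mirror the pattern in the table above; real XEmacs classes carry an @samp{lrecord} header and many more methods.

```c
#include <assert.h>
#include <stddef.h>

struct toy_console;

/* Method table shared by all instances of one subtype. */
struct toy_console_methods
{
  int (*width_method) (struct toy_console *);
  void (*beep_method) (struct toy_console *, int volume);   /* may be NULL */
};

struct toy_console
{
  const struct toy_console_methods *methods;
  int beeps;
};

#define RAW_TOYMETH(c, m) ((c)->methods->m##_method)
#define HAS_TOYMETH_P(c, m) (!!RAW_TOYMETH (c, m))
#define TOYMETH(c, m, args) ((RAW_TOYMETH (c, m)) args)
#define MAYBE_TOYMETH(c, m, args) do {                  \
  struct toy_console *maybe_toymeth_c = (c);            \
  if (HAS_TOYMETH_P (maybe_toymeth_c, m))               \
    TOYMETH (maybe_toymeth_c, m, args);                 \
} while (0)

/* "tty" subtype: implements both methods. */
static int tty_width (struct toy_console *c) { (void) c; return 80; }
static void tty_beep (struct toy_console *c, int volume) { c->beeps += volume; }
const struct toy_console_methods toy_tty_methods = { tty_width, tty_beep };

/* "stream" subtype: has no beep method. */
static int stream_width (struct toy_console *c) { (void) c; return 0; }
const struct toy_console_methods toy_stream_methods = { stream_width, NULL };

/* Generic code: rings the bell if the subtype supports it and returns
   the number of beeps recorded so far.  */
int
toy_ring (struct toy_console *c)
{
  MAYBE_TOYMETH (c, beep, (c, 1));
  return c->beeps;
}
```

Note that the call site names the method as @samp{beep}, while the definitions are called @samp{tty_beep} and so on; this is exactly the grep problem described above.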
-@node Writing Lisp Primitives, Writing Good Comments, Object-Oriented Techniques for C, Rules When Writing New C Code
-@section Writing Lisp Primitives
-@cindex writing Lisp primitives
-@cindex Lisp primitives, writing
-@cindex primitives, writing Lisp
-Lisp primitives are Lisp functions implemented in C.  The details of
-interfacing the C function so that Lisp can call it are handled by a few
-C macros.  The only way to really understand how to write new C code is
-to read the source, but we can explain some things here.
-An example of a special form is the definition of @code{prog1}, from
-@file{eval.c}.  (An ordinary function would have the same general
-appearance.)
-@cindex garbage collection protection
-@smallexample
-@group
-DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
-Similar to `progn', but the value of the first form is returned.
-\(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
-The value of FIRST is saved during evaluation of the remaining args,
-whose values are discarded.
-*/
-(args))
-@{
-/* This function can GC */
-REGISTER Lisp_Object val, form, tail;
-struct gcpro gcpro1;
-val = Feval (XCAR (args));
-GCPRO1 (val);
-LIST_LOOP_3 (form, XCDR (args), tail)
-Feval (form);
-UNGCPRO;
-return val;
-@}
-@end group
-@end smallexample
-Let's start with a precise explanation of the arguments to the
-@code{DEFUN} macro.  Here is a template for them:
-@example
-@group
-DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
-@var{docstring}
-*/
-(@var{arglist}))
-@end group
-@end example
-@table @var
-@item lname
-This string is the name of the Lisp symbol to define as the function
-name; in the example above, it is @code{"prog1"}.
-@item fname
-This is the C function name for this function.  This is the name that is
-used in C code for calling the function.  The name is, by convention,
-@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
-Lisp name changed to underscores.  Thus, to call this function from C
-code, call @code{Fprog1}.  Remember that the arguments are of type
-@code{Lisp_Object}; various macros and functions for creating values of
-type @code{Lisp_Object} are declared in the file @file{lisp.h}.
-Primitives whose names are special characters (e.g. @code{+} or
-@code{<}) are named by spelling out, in some fashion, the special
-character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
-begin with normal alphanumeric characters but also contain special
-characters are spelled out in some creative way, e.g. @code{let*}
-becomes @code{FletX()}.
-Each function also has an associated structure that holds the data for
-the subr object that represents the function in Lisp.  This structure
-conveys the Lisp symbol name to the initialization routine that will
-create the symbol and store the subr object as its definition.  The C
-variable name of this structure is always @samp{S} prepended to the
-@var{fname}.  You hardly ever need to be aware of the existence of this
-structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
-details.
-@item min_args
-This is the minimum number of arguments that the function requires.  The
-function @code{prog1} allows a minimum of one argument.
-@item max_args
-This is the maximum number of arguments that the function accepts, if
-there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
-indicating a special form that receives unevaluated arguments, or
-@code{MANY}, indicating an unlimited number of evaluated arguments (the
-C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
-are macros.  If @var{max_args} is a number, it may not be less than
-@var{min_args} and it may not be greater than 8. (If you need to add a
-function with more than 8 arguments, use the @code{MANY} form.  Resist
-the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
-you do it anyway, make sure to also add another clause to the switch
-statement in @code{primitive_funcall()}.)
-@item interactive
-This is an interactive specification, a string such as might be used as
-the argument of @code{interactive} in a Lisp function.  In the case of
-@code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
-cannot be called interactively.  A value of @code{""} indicates a
-function that should receive no arguments when called interactively.
-@item docstring
-This is the documentation string.  It is written just like a
-documentation string for a function defined in Lisp; in particular, the
-first line should be a single sentence.  Note how the documentation
-string is enclosed in a comment, none of the documentation is placed on
-the same lines as the comment-start and comment-end characters, and the
-comment-start characters are on the same line as the interactive
-specification.  @file{make-docfile}, which scans the C files for
-documentation strings, is very particular about what it looks for, and
-will not properly extract the doc string if it's not in this exact format.
-In order to make both @file{etags} and @file{make-docfile} happy, make
-sure that the @code{DEFUN} line contains the @var{lname} and
-@var{fname}, and that the comment-start characters for the doc string
-are on the same line as the interactive specification, and put a newline
-directly after them (and before the comment-end characters).
-@item arglist
-This is the comma-separated list of arguments to the C function.  For a
-function with a fixed maximum number of arguments, provide a C argument
-for each Lisp argument.  In this case, unlike regular C functions, the
-types of the arguments are not declared; they are simply always of type
-@code{Lisp_Object}.
-The names of the C arguments will be used as the names of the arguments
-to the Lisp primitive as displayed in its documentation, modulo the same
-concerns described above for @code{F...} names (in particular,
-underscores in the C arguments become dashes in the Lisp arguments).
-There is one additional kludge: A trailing @samp{_} on the C argument is
-discarded when forming the Lisp argument.  This allows C language
-reserved words (like @code{default}) or global symbols (like
-@code{dirname}) to be used as argument names without compiler warnings
-or errors.
-A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
-@w{@dfn{special form}}; its arguments are not evaluated.  Instead it
-receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
-unevaluated arguments, conventionally named @code{(args)}.
-When a Lisp function has no upper limit on the number of arguments,
-specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
-C actually receives exactly two arguments: the number of Lisp arguments
-(an @code{int}) and the address of a block containing their values (a
-@w{@code{Lisp_Object *}}).  In this case only are the C types specified
-in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
-@end table
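-The naming convention for @var{fname} described above can be sketched as
-a small standalone C helper.  This is only an illustration:
-@code{lisp_name_to_c_name} is a hypothetical function that does not exist
-in XEmacs, and it covers only the regular case (prepend @samp{F}, turn
-dashes into underscores).  Names containing special characters, such as
-@code{let*}, are still spelled out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of XEmacs: derive the conventional C
   name of a primitive from its Lisp name by prepending 'F' and turning
   each dash into an underscore.  Returns 0 on success, or -1 if OUT is
   too small.  Special characters (as in "let*") are not handled.  */
int
lisp_name_to_c_name (const char *lisp_name, char *out, size_t out_size)
{
  size_t len, i;

  assert (lisp_name != NULL && out != NULL);
  len = strlen (lisp_name);
  if (out_size < len + 2)       /* 'F' + name + terminating NUL */
    return -1;

  out[0] = 'F';
  for (i = 0; i < len; i++)
    out[i + 1] = (lisp_name[i] == '-') ? '_' : lisp_name[i];
  out[len + 1] = '\0';
  return 0;
}
```

-For example, passing @code{"forward-char"} fills the buffer with
-@code{"Fforward_char"}.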
-Within the function @code{Fprog1} itself, note the use of the macros
-@code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
-a variable from garbage collection---to inform the garbage collector
-that it must look in that variable and regard the object pointed at by
-its contents as an accessible object.  This is necessary whenever you
-call @code{Feval} or anything that can directly or indirectly call
-@code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
-any Lisp object that you intend to refer to again must be protected
-somehow.  @code{UNGCPRO} cancels the protection of the variables that
-are protected in the current function.  It is necessary to do this
-explicitly.
-The macro @code{GCPRO1} protects just one local variable.  If you want
-to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
-not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
-These macros implicitly use local variables such as @code{gcpro1}; you
-must declare these explicitly, with type @code{struct gcpro}.  Thus, if
-you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
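-To see why a second @code{GCPRO1} cannot stand in for @code{GCPRO2}, it
-helps to picture the bookkeeping.  The following standalone toy model is
-an assumption about the general shape of the mechanism (a linked list of
-records living in the caller's stack frame), not a copy of the XEmacs
-macros; every @code{toy_} and @code{TOY_} name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int toy_object;          /* stand-in for Lisp_Object */

/* One record per protected variable, stored in the caller's frame.  */
struct toy_gcpro
{
  struct toy_gcpro *next;        /* record registered before this one */
  toy_object *var;               /* address of the protected variable */
};

/* Head of the list the (imaginary) collector would walk.  */
struct toy_gcpro *toy_gcprolist = NULL;

/* Like GCPRO1/GCPRO2, these rely on locals named gcpro1 and gcpro2.  */
#define TOY_GCPRO1(v) do {                      \
  gcpro1.next = toy_gcprolist;                  \
  gcpro1.var = &(v);                            \
  toy_gcprolist = &gcpro1;                      \
} while (0)

#define TOY_GCPRO2(v1, v2) do {                 \
  gcpro1.next = toy_gcprolist;                  \
  gcpro1.var = &(v1);                           \
  gcpro2.next = &gcpro1;                        \
  gcpro2.var = &(v2);                           \
  toy_gcprolist = &gcpro2;                      \
} while (0)

/* Undo everything this frame registered, however many records.  */
#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* Count the variables the collector could currently reach.  */
int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;

  for (p = toy_gcprolist; p != NULL; p = p->next)
    {
      assert (p->var != NULL);
      n++;
    }
  return n;
}

/* Protect two locals, report how many are visible, then unprotect.  */
int
toy_protect_two (void)
{
  toy_object a = 1, b = 2;
  struct toy_gcpro gcpro1, gcpro2;
  int n;

  TOY_GCPRO2 (a, b);
  n = toy_count_protected ();
  TOY_UNGCPRO;
  return n;
}
```

-In this model, a second @code{TOY_GCPRO1} in the same function would
-overwrite @code{gcpro1}, dropping the first variable and making the
-record point at itself, which is one way to see why the numbered
-variants exist.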
-@cindex caller-protects (@code{GCPRO} rule)
-Note also that the general rule is @dfn{caller-protects}; i.e. you are
-only responsible for protecting those Lisp objects that you create.  Any
-objects passed to you as arguments should have been protected by whoever
-created them, so you don't in general have to protect them.
-In particular, the arguments to any Lisp primitive are always
-automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
-bytecode.  So only a few Lisp primitives that are called frequently from
-C code, such as @code{Fprogn}, protect their arguments as a service to
-their caller.  You don't need to protect your arguments when writing a
-new @code{DEFUN}.
-@code{GCPRO}ing is perhaps the trickiest and most error-prone part of
-XEmacs coding.  It is @strong{extremely} important that you get this
-right and use a great deal of discipline when writing this code.
-@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
-What @code{DEFUN} actually does is declare a global structure of type
-@code{Lisp_Subr} whose name begins with capital @samp{SF} and which
-contains information about the primitive (e.g. a pointer to the
-function, its minimum and maximum allowed arguments, a string describing
-its Lisp name); @code{DEFUN} then begins a normal C function declaration
-using the @code{F...} name.  The Lisp subr object that is the function
-definition of a primitive (i.e. the object in the function slot of the
-symbol that names the primitive) actually points to this @samp{SF}
-structure; when @code{Feval} encounters a subr, it looks in the
-structure to find out how to call the C function.
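-The two things @code{DEFUN} does can be imitated in a few lines of
-standalone C.  The sketch below is a toy: @code{TOY_DEFUN},
-@code{struct toy_subr} and @code{toy_call_subr} are hypothetical names,
-the real @code{Lisp_Subr} has a different layout, and the docstring
-comment and @code{Lisp_Object} argument types are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the subr structure; the real Lisp_Subr has more
   fields and a different layout.  */
struct toy_subr
{
  const char *name;             /* Lisp-visible name */
  int min_args, max_args;       /* allowed argument counts */
  int (*fn) (int);              /* pointer to the C implementation */
};

/* Toy stand-in for DEFUN: declare the function, define the "S"
   structure that describes it, then begin the function definition so
   that the argument list and body can follow the macro call.  */
#define TOY_DEFUN(lname, Fname, min_args, max_args)                 \
  int Fname (int);                                                  \
  struct toy_subr S##Fname = { lname, min_args, max_args, Fname };  \
  int Fname

TOY_DEFUN ("double-it", Fdouble_it, 1, 1) (int x)
{
  return 2 * x;
}

/* What an evaluator does with a subr: look in the structure to find
   out how to call the C function.  */
int
toy_call_subr (const struct toy_subr *subr, int arg)
{
  assert (subr != NULL && subr->fn != NULL);
  return subr->fn (arg);
}
```

-The macro call expands to a prototype, the definition of
-@code{SFdouble_it}, and the opening of an ordinary function definition
-named @code{Fdouble_it}.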
-Defining the C function is not enough to make a Lisp primitive
-available; you must also create the Lisp symbol for the primitive (the
-symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
-object in its function cell. (If you don't do this, the primitive won't
-be seen by Lisp code.) The code looks like this:
-@example
-DEFSUBR (@var{fname});
-@end example
-@noindent
-Here @var{fname} is the same name you used as the second argument to
-@code{DEFUN}.
-This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
-at the end of the module.  If no such function exists, create it and
-make sure to also declare it in @file{symsinit.h} and call it from the
-appropriate spot in @code{main()}.  @xref{General Coding Rules}.
-Note that C code cannot call functions by name unless they are defined
-in C.  The way to call a function written in Lisp from C is to use
-@code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
-the Lisp function @code{funcall} accepts an unlimited number of
-arguments, in C it takes two: the number of Lisp-level arguments, and a
-one-dimensional array containing their values.  The first Lisp-level
-argument is the Lisp function to call, and the rest are the arguments to
-pass to it.  Since @code{Ffuncall} can call the evaluator, you must
-protect pointers from garbage collection around the call to
-@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
-its parameters, so you don't have to protect any pointers passed as
-parameters to it.)
-The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
-provide handy ways to call a Lisp function conveniently with a fixed
-number of arguments.  They work by calling @code{Ffuncall}.
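-The calling convention of @code{Ffuncall} and the @code{call@var{N}}
-helpers can be modelled without any XEmacs headers.  In this sketch,
-@code{toy_object}, @code{toy_funcall}, @code{toy_call2} and the other
-@code{toy_} names are hypothetical stand-ins, and garbage-collection
-protection is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Lisp_Object: either a plain integer or a callable.  */
typedef struct toy_object
{
  int value;                                        /* integer payload */
  int (*fn) (int nargs, struct toy_object *args);   /* NULL unless callable */
} toy_object;

toy_object
toy_make_int (int value)
{
  toy_object obj;
  obj.value = value;
  obj.fn = NULL;
  return obj;
}

toy_object
toy_make_function (int (*fn) (int nargs, toy_object *args))
{
  toy_object obj;
  obj.value = 0;
  obj.fn = fn;
  return obj;
}

/* A MANY-style function: argument count plus a block of values.  */
int
toy_add (int nargs, toy_object *args)
{
  int i, total = 0;

  for (i = 0; i < nargs; i++)
    total += args[i].value;
  return total;
}

/* Like funcall: args[0] is the function, the rest are its arguments.  */
int
toy_funcall (int nargs, toy_object *args)
{
  assert (nargs >= 1 && args != NULL && args[0].fn != NULL);
  return args[0].fn (nargs - 1, args + 1);
}

/* Fixed-arity convenience wrapper that packs its arguments into an
   array and delegates to toy_funcall.  */
int
toy_call2 (toy_object fn, toy_object arg1, toy_object arg2)
{
  toy_object args[3];

  args[0] = fn;
  args[1] = arg1;
  args[2] = arg2;
  return toy_funcall (3, args);
}
```

-The wrapper exists purely for convenience: the caller writes three
-ordinary arguments and never has to build the array by hand.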
-@file{eval.c} is a very good file to look through for examples;
-@file{lisp.h} contains the definitions for important macros and
-functions.
-@node Writing Good Comments, Adding Global Lisp Variables, Writing Lisp Primitives, Rules When Writing New C Code
-@section Writing Good Comments
-@cindex writing good comments
-@cindex comments, writing good
-Comments are a lifeline for programmers trying to understand tricky
-code.  In general, the less obvious it is what you are doing, the more
-you need a comment, and the more detailed it needs to be.  You should
-always be on guard when you're writing code for stuff that's tricky, and
-should constantly be putting yourself in someone else's shoes and asking
-if that person could figure out without much difficulty what's going
-on. (Assume they are a competent programmer who understands the
-essentials of how the XEmacs code is structured but doesn't know much
-about the module you're working on or any algorithms you're using.) If
-you're not sure whether they would be able to, add a comment.  Always
-err on the side of more comments, rather than fewer.
-Generally, when making comments, there is no need to attribute them with
-your name or initials.  This especially goes for small,
-easy-to-understand, non-opinionated ones.  Also, comments indicating
-where, when, and by whom a file was changed are @emph{strongly}
-discouraged, and in general will be removed as they are discovered.
-This is exactly what @file{ChangeLogs} are there for.  However, it can
-occasionally be useful to mark exactly where (but not when or by whom)
-changes are made, particularly when making small changes to a file
-imported from elsewhere.  These marks help when later on a newer version
-of the file is imported and the changes need to be merged. (If
-everything were always kept in CVS, there would be no need for this.
-But in practice, this often doesn't happen, or the CVS repository is
-later on lost or unavailable to the person doing the update.)
-When putting in an explicit opinion in a comment, you should
-@emph{always} attribute it with your name and the date.  This also goes
-for long, complex comments explaining in detail the workings of
-something -- by putting your name there, you make it possible for
-someone who has questions about how that thing works to determine who
-wrote the comment so they can write to them.  Use your actual name or
-your alias at xemacs.org, and not your initials or nickname, unless that
-is generally recognized (e.g. @samp{jwz}).  Even then, please consider
-requesting a virtual user at xemacs.org (forwarding address; we can't
-provide an actual mailbox).  Otherwise, give first and last name.  If
-you're not a regular contributor, you might consider putting your email
-address in -- it may be in the ChangeLog, but after a while ChangeLogs
-tend to disappear or get muddled.  (E.g. your comment
-may get copied somewhere else or even into another program, and tracking
-down the proper ChangeLog may be very difficult.)
-If you come across an opinion that is not or is no longer valid, or you
-come across any comment that no longer applies but you want to keep it
-around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment
-afterwards explaining why the preceding comment is no longer valid.  Put
-your name on this comment, as explained above.
-Just as comments are a lifeline to programmers, incorrect comments are
-death.  If you come across an incorrect comment, @strong{immediately}
-correct it or flag it as incorrect, as described in the previous
-paragraph.  Whenever you work on a section of code, @emph{always} make
-sure to update any comments to be correct -- or, at the very least, flag
-them as incorrect.
-To indicate a ``todo'' or other problem, use four pound signs --
-i.e. @samp{####}.
-@node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code
-@section Adding Global Lisp Variables
-@cindex global Lisp variables, adding
-@cindex variables, adding global Lisp
-Global variables whose names begin with @samp{Q} are constants whose
-value is a symbol of a particular name.  The name of the variable should
-be derived from the name of the symbol using the same rules as for Lisp
-primitives.  These variables are initialized using a call to
-@code{defsymbol()} in the @code{syms_of_*()} function. (This call
-interns a symbol, sets the C variable to the resulting Lisp object, and
-calls @code{staticpro()} on the C variable to tell the
-garbage-collection mechanism about this variable.  What
-@code{staticpro()} does is add a pointer to the variable to a large
-global array; when garbage-collection happens, all pointers listed in
-the array are used as starting points for marking Lisp objects.  This is
-important because it's quite possible that the only current reference to
-the object is the C variable.  In the case of symbols, the
-@code{staticpro()} doesn't matter all that much because the symbol is
-contained in @code{obarray}, which is itself @code{staticpro()}ed.
-However, it's possible that a naughty user could do something like
-uninterning the symbol out of @code{obarray} or even setting
-@code{obarray} to a different value [although this is likely to make
-XEmacs crash!].)
-@strong{Please note:} It is potentially deadly if you declare a
-@samp{Q...}  variable in two different modules.  The two calls to
-@code{defsymbol()} are no problem, but some linkers will complain about
-multiply-defined symbols.  The most insidious aspect of this is that
-often the link will succeed anyway, but then the resulting executable
-will sometimes crash in obscure ways during certain operations!
-To avoid this problem, declare any symbols with common names (such as
-@code{text}) that are not obviously associated with this particular
-module in the file @file{general-slots.h}.  The ``-slots'' suffix
-indicates that this is a file that is included multiple times in
-@file{general.c}.  Redefinition of preprocessor macros allows the
-effects to be different in each context, so this is actually more
-convenient and less error-prone than doing it in your module.
-Global variables whose names begin with @samp{V} are variables that
-contain Lisp objects.  The convention here is that all global variables
-of type @code{Lisp_Object} begin with @samp{V}, and all others don't
-(including integer and boolean variables that have Lisp
-equivalents). Most of the time, these variables have equivalents in
-Lisp, but some don't.  Those that do are declared this way by a call to
-@code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
-module.  What this does is create a special @dfn{symbol-value-forward}
-Lisp object that contains a pointer to the C variable, intern a symbol
-whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
-its value to the symbol-value-forward Lisp object; it also calls
-@code{staticpro()} on the C variable to tell the garbage-collection
-mechanism about the variable.  When @code{eval} (or actually
-@code{symbol-value}) encounters this special object in the process of
-retrieving a variable's value, it follows the indirection to the C
-variable and gets its value.  @code{setq} does similar things so that
-the C variable gets changed.
-Whether or not you @code{DEFVAR_LISP()} a variable, you need to
-initialize it in the @code{vars_of_*()} function; otherwise it will end
-up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
-this is probably not what you want.  Also, if the variable is not
-@code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
-C variable in the @code{vars_of_*()} function.  Otherwise, the
-garbage-collection mechanism won't know that the object in this variable
-is in use, and will happily collect it and reuse its storage for another
-Lisp object, and you will be the one who's unhappy when you can't figure
-out how your variable got overwritten.
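-The effect of forgetting @code{staticpro()} can be demonstrated with a
-toy collector.  Everything in this standalone sketch is hypothetical
-(the @code{toy_} functions, the fixed-size heap, objects represented as
-slot indices); it only mirrors the idea that the collector starts
-marking from a global array of registered variable addresses.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HEAP_SIZE 8
#define TOY_MAX_ROOTS 8

int toy_in_use[TOY_HEAP_SIZE];    /* 1 while a heap slot holds a live object */
int *toy_roots[TOY_MAX_ROOTS];    /* addresses of registered C variables */
int toy_nroots = 0;

/* Two "V" variables; each holds a heap slot index, or -1 for none.  */
int Vtoy_registered = -1;
int Vtoy_forgotten = -1;

/* Allocate a fresh object and return its slot index, or -1 if full.  */
int
toy_alloc (void)
{
  int i;

  for (i = 0; i < TOY_HEAP_SIZE; i++)
    if (!toy_in_use[i])
      {
        toy_in_use[i] = 1;
        return i;
      }
  return -1;
}

/* Remember the address of VAR so the collector looks inside it.  */
void
toy_staticpro (int *var)
{
  assert (var != NULL && toy_nroots < TOY_MAX_ROOTS);
  toy_roots[toy_nroots++] = var;
}

/* Mark every object held by a registered variable, then free the rest.  */
void
toy_garbage_collect (void)
{
  int marked[TOY_HEAP_SIZE] = { 0 };
  int i;

  for (i = 0; i < toy_nroots; i++)
    {
      int slot = *toy_roots[i];
      if (slot >= 0 && slot < TOY_HEAP_SIZE)
        marked[slot] = 1;
    }
  for (i = 0; i < TOY_HEAP_SIZE; i++)
    if (!marked[i])
      toy_in_use[i] = 0;
}

/* Plays the role of a vars_of_*() function that registers only one
   of its two variables.  */
void
toy_vars_of_demo (void)
{
  Vtoy_registered = toy_alloc ();
  Vtoy_forgotten = toy_alloc ();
  toy_staticpro (&Vtoy_registered);
}
```

-After a collection, the slot still named by @code{Vtoy_forgotten} is
-free again, and the next allocation hands it out to someone else while
-the variable keeps pointing at it.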
-@node Writing Macros, Proper Use of Unsigned Types, Adding Global Lisp Variables, Rules When Writing New C Code
-@section Writing Macros
-@cindex writing macros
-@cindex macros, writing
-The three golden rules of macros:
-@enumerate
-@item
-Anything that's an lvalue can be evaluated more than once.
-@item
-Macros where anything else can be evaluated more than once should
-have the word "unsafe" in their name (exceptions may be made for
-large sets of macros that evaluate arguments of certain types more
-than once, e.g. struct buffer * arguments, when clearly indicated in
-the macro documentation).  These macros are generally meant to be
-called only by other macros that have already stored the calling
-values in temporary variables.
-@item
-Nothing else can be evaluated more than once.  Use inline
-functions, if necessary, to prevent multiple evaluation.
-@end enumerate
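-The reason for these rules shows up as soon as a macro argument has a
-side effect.  The standalone sketch below uses hypothetical names
-throughout (@code{TOY_SQUARE_UNSAFE}, @code{toy_square},
-@code{toy_next}); it counts how often the argument expression is
-evaluated by a macro and by an inline function.

```c
#include <assert.h>

int toy_eval_count = 0;

/* An argument expression with a visible side effect.  */
int
toy_next (void)
{
  return ++toy_eval_count;
}

/* Evaluates X twice, so by rule 2 it carries "UNSAFE" in its name.  */
#define TOY_SQUARE_UNSAFE(x) ((x) * (x))

/* The inline function evaluates its argument exactly once.  */
static inline int
toy_square (int x)
{
  return x * x;
}

/* Return how many times the argument was evaluated by the macro.  */
int
toy_evaluations_with_macro (void)
{
  int result;

  toy_eval_count = 0;
  result = TOY_SQUARE_UNSAFE (toy_next ());
  assert (result == 2);         /* 1 * 2, not a square at all */
  return toy_eval_count;
}

/* Return how many times the argument was evaluated by the function.  */
int
toy_evaluations_with_function (void)
{
  int result;

  toy_eval_count = 0;
  result = toy_square (toy_next ());
  assert (result == 1);
  return toy_eval_count;
}
```

-The macro version runs the side effect twice and does not even compute
-a square; the function version runs it once.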
-NOTE: The functions and macros below are given full prototypes in their
-docs, even when the implementation is a macro.  In such cases, passing
-an argument of a type other than expected will produce undefined
-results.  Also, given that macros can do things functions can't (in
-particular, directly modify arguments as if they were passed by
-reference), the declaration syntax has been extended to include the
-call-by-reference syntax from C++, where an & after a type indicates
-that the argument is an lvalue and is passed by reference, i.e. the
-function can modify its value. (This is equivalent in C to passing a
-pointer to the argument, but without the need to explicitly worry about
-pointers.)
-When to capitalize macros:
-@itemize @bullet
-@item
-Capitalize macros doing stuff obviously impossible with (C)
-functions, e.g. directly modifying arguments as if they were passed by
-reference.
-@item
-Capitalize macros that evaluate @strong{any} argument more than once regardless
-of whether that's "allowed" (e.g. buffer arguments).
-@item
-Capitalize macros that directly access a field in a @code{Lisp_Object} or
-its equivalent underlying structure.  In such cases, the macro that goes
-through the @code{Lisp_Object} has an @samp{X} prefixed to its name, and
-the macro that goes through the underlying structure doesn't.
-@item
-Capitalize certain other basic macros relating to Lisp_Objects; e.g.
-FRAMEP, CHECK_FRAME, etc.
-@item
-Try to avoid capitalizing any other macros.
-@end itemize
-@node Proper Use of Unsigned Types, Techniques for XEmacs Developers, Writing Macros, Rules When Writing New C Code
-@section Proper Use of Unsigned Types
-@cindex unsigned types, proper use of
-@cindex types, proper use of unsigned
-Avoid using @code{unsigned int} and @code{unsigned long} whenever
-possible.  Unsigned types are viral -- in any arithmetic or comparison
-that mixes signed and unsigned operands of the same width, the signed
-operand is automatically converted to unsigned, which is almost certainly
-not what you want.  Many subtle and
-hard-to-find bugs are created by careless use of unsigned types.  In
-general, you should almost @emph{never} use an unsigned type to hold a
-regular quantity of any sort.  The only exceptions are:
-@enumerate
-@item
-When there's a reasonable possibility you will actually need all 32 or
-64 bits to store the quantity.
-@item
-When calling existing APIs that require unsigned types.  In this case,
-you should still do all manipulation using signed types, and do the
-conversion at the very threshold of the API call.
-@item
-In existing code that you don't want to modify because you don't
-maintain it.
-@item
-In bit-field structures.
-@end enumerate
-Other reasonable uses of @code{unsigned int} and @code{unsigned long}
-are representing non-quantities -- e.g. bit-oriented flags and such.
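-Both the pitfall and the ``convert at the threshold'' advice fit in a
-short standalone sketch.  The function names are hypothetical;
-@code{memset} stands in for an existing API that takes an unsigned
-@code{size_t} length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mixed comparison.  The cast spells out the conversion the compiler
   applies implicitly when an int meets an unsigned int: -1 becomes
   the largest unsigned value.  */
int
toy_less_mixed (int a, unsigned int b)
{
  return (unsigned int) a < b;
}

/* The same comparison done entirely in signed arithmetic.  */
int
toy_less_signed (int a, int b)
{
  return a < b;
}

/* Zero the last COUNT bytes of BUF, which holds LEN bytes.  All the
   arithmetic is signed, so asking for too many bytes shows up as a
   negative start that can be clamped; the conversion to size_t
   happens only at the call to memset.  */
void
toy_clear_tail (char *buf, int len, int count)
{
  int start;

  assert (buf != NULL && len >= 0 && count >= 0);
  start = len - count;
  if (start < 0)
    start = 0;
  memset (buf + start, 0, (size_t) (len - start));
}
```

-Had @code{start} been unsigned, the subtraction would have wrapped
-around to a huge value instead of going negative, and the clamp would
-never have fired.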
-@node Techniques for XEmacs Developers,  , Proper Use of Unsigned Types, Rules When Writing New C Code
-@section Techniques for XEmacs Developers
-@cindex techniques for XEmacs developers
-@cindex developers, techniques for XEmacs
-@cindex Purify
-@cindex Quantify
-To make a purified XEmacs, do: @code{make puremacs}.
-To make a quantified XEmacs, do: @code{make quantmacs}.
-You simply can't dump Quantified and Purified images (unless using the
-portable dumper).  Purify gets confused when XEmacs frees memory in one
-process that was allocated in a @emph{different} process on a different
-machine!  Run it like so:
-@example
-temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
-@end example
-@cindex error checking
-Before you go through the trouble, are you compiling with all
-debugging and error-checking off?  If not, try that first.  Be warned
-that while Quantify is directly responsible for quite a few
-optimizations which have been made to XEmacs, doing a run which
-generates results which can be acted upon is not necessarily a trivial
-task.
-Also, if you're still willing to do some runs, make sure you configure
-with the @samp{--quantify} flag.  That will keep Quantify from starting
-to record data until after the loadup is completed and will shut off
-recording right before it shuts down (which generates enough bogus data
-to throw most results off).  It also enables three additional elisp
-commands: @code{quantify-start-recording-data},
-@code{quantify-stop-recording-data} and @code{quantify-clear-data}.
-If you want to make XEmacs faster, target your favorite slow benchmark,
-run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
-out where the cycles are going.  In many cases you can localize the
-problem (because a particular new feature or even a single patch
-elicited it).  Don't hesitate to use brute force techniques like a
-global counter incremented at strategic places, especially in
-combination with other performance indications (@emph{e.g.}, degree of
-buffer fragmentation into extents).
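-The ``global counter'' technique needs nothing more than a variable and
-an increment.  In this standalone sketch all names are hypothetical and
-the insertion sort merely stands in for whatever routine is under
-suspicion; comparing the counter for two input sizes is a cheap way to
-spot quadratic behavior.

```c
#include <assert.h>

#define TOY_MAX_ITEMS 64

/* Global counter incremented at the strategic place.  */
long toy_less_calls = 0;

int
toy_less (int a, int b)
{
  toy_less_calls++;
  return a < b;
}

void
toy_insertion_sort (int *v, int n)
{
  int i, j;

  for (i = 1; i < n; i++)
    {
      int key = v[i];
      for (j = i; j > 0 && toy_less (key, v[j - 1]); j--)
        v[j] = v[j - 1];
      v[j] = key;
    }
}

/* Sort N items given in descending order; return the comparisons used.  */
long
toy_count_for_reversed_input (int n)
{
  int v[TOY_MAX_ITEMS];
  int i;

  assert (n >= 0 && n <= TOY_MAX_ITEMS);
  for (i = 0; i < n; i++)
    v[i] = n - i;
  toy_less_calls = 0;
  toy_insertion_sort (v, n);
  return toy_less_calls;
}
```

-Doubling the input from 8 to 16 items raises the count from 28 to 120,
-roughly a factor of four, which is the signature of a quadratic
-algorithm.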
-Specific projects:
-@itemize @bullet
-@item
-Make the garbage collector faster.  Figure out how to write an
-incremental garbage collector.
-@item
-Write a compiler that takes bytecode and spits out C code.
-Unfortunately, you will then need a C compiler and a more fully
-developed module system.
-@item
-Speed up redisplay.
-@item
-Speed up syntax highlighting.  It was suggested that ``maybe moving some
-of the syntax highlighting capabilities into C would make a
-difference.''  Wrong idea, I think.  When processing one 400kB file a
-particular low-level routine was being called 40 @emph{million} times
-simply for @emph{one} call to @code{newline-and-indent}.  Syntax
-highlighting needs to be rewritten to use a reliable, fast parser, then
-to trust the pre-parsed structure, and only do re-highlighting locally
-to a text change.  Modern machines are fast enough to implement such
-parsers in Lisp; but no machine will ever be fast enough to deal with
-quadratic (or worse) algorithms!
-@item
-Implement tail recursion in Emacs Lisp (hard!).
-@end itemize
-Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
-calls in elisp are especially expensive.  Iterating over a long list is
-going to be 30 times faster implemented in C than in Elisp.
-Heavily used small code fragments need to be fast.  The traditional way
-to implement such code fragments in C is with macros.  But macros in C
-are known to be broken.
-@cindex macro hygiene
-Macro arguments that are repeatedly evaluated may suffer from repeated
-side effects or suboptimal performance.
-Variable names used in macros may collide with caller's variables,
-causing (at least) unwanted compiler warnings.
-In order to solve these problems, and maintain statement semantics, one
-should use the @code{do @{ ... @} while (0)} trick while trying to
-reference macro arguments exactly once using local variables.
-Let's take a look at this poor macro definition:
-@example
-#define MARK_OBJECT(obj) \
-if (!marked_p (obj)) mark_object (obj), did_mark = 1
-@end example
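-This macro evaluates its argument twice, and also misbehaves if used like
-this, because the @code{else} silently pairs with the @code{if} hidden inside
-the macro instead of with @code{if (flag)}: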
-@example
-if (flag) MARK_OBJECT (obj); else do_something ();
-@end example
-A much better definition is
-@example
-#define MARK_OBJECT(obj) do @{ \
-  Lisp_Object mo_obj = (obj); \
-  if (!marked_p (mo_obj))     \
-    @{                         \
-      mark_object (mo_obj);   \
-      did_mark = 1;           \
-    @}                         \
-@} while (0)
-@end example
-Notice the elimination of double evaluation by using the local variable
-with the obscure name.  Writing safe and efficient macros requires great
-care.  The one problem with macros that cannot be portably worked around
-is, since a C block has no value, a macro used as an expression rather
-than a statement cannot use the techniques just described to avoid
-multiple evaluation.
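-A standalone version of the improved macro makes both properties
-checkable.  The names here are hypothetical stand-ins
-(@code{TOY_MARK_OBJECT}, an @code{int} index in place of a
-@code{Lisp_Object}, a flag array in place of real mark bits), so treat
-it as a sketch of the technique rather than the XEmacs code.

```c
#include <assert.h>

#define TOY_NOBJECTS 4

int toy_marked[TOY_NOBJECTS];   /* toy mark bits, one per object */
int toy_did_mark = 0;
int toy_lookups = 0;            /* counts evaluations of the macro argument */

/* An argument expression with a visible side effect.  */
int
toy_pick (int i)
{
  toy_lookups++;
  return i;
}

/* One statement, one evaluation of OBJ.  */
#define TOY_MARK_OBJECT(obj) do {                       \
  int mo_obj = (obj);                                   \
  assert (mo_obj >= 0 && mo_obj < TOY_NOBJECTS);        \
  if (!toy_marked[mo_obj])                              \
    {                                                   \
      toy_marked[mo_obj] = 1;                           \
      toy_did_mark = 1;                                 \
    }                                                   \
} while (0)

/* Mark object I when FLAG is set.  Return 1 if the marking branch was
   taken and 0 if the else branch was.  The if/else pairs up as written
   because the macro expands to a single statement.  */
int
toy_mark_if (int flag, int i)
{
  if (flag)
    TOY_MARK_OBJECT (toy_pick (i));
  else
    return 0;
  return 1;
}
```

-With the original one-line definition, the @code{else} in
-@code{toy_mark_if} would have attached to the @code{if} inside the
-macro instead.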
-@cindex inline functions
-In most cases where a macro has function semantics, an inline function
-is a better implementation technique.  Modern compiler optimizers tend
-to inline functions even if they have no @code{inline} keyword, and
-configure magic ensures that the @code{inline} keyword can be safely
-used as an additional compiler hint.  Inline functions used in a single
-@file{.c} file are easy.  The function must already be defined to be
-@code{static}.  Just add the @code{inline} keyword to the
-definition.
-@example
-inline static int
-heavily_used_small_function (int arg)
-@{
-...
-@}
-@end example
-Inline functions in header files are trickier, because we would like to
-make the following optimization if the function is @emph{not} inlined
-(for example, because we're compiling for debugging).  We would like the
-function to be defined externally exactly once, and each calling
-translation unit would create an external reference to the function,
-instead of including a definition of the inline function in the object
-code of every translation unit that uses it.  This optimization is
-currently only available for gcc.  But you don't have to worry about the
-trickiness; just define your inline functions in header files using this
-pattern:
-@example
-INLINE_HEADER int
-i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
-INLINE_HEADER int
-i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
-@{
-...
-@}
-@end example
-The declaration right before the definition is to prevent warnings when
-compiling with @code{gcc -Wmissing-declarations}.  I consider issuing
-this warning for inline functions a gcc bug, but the gcc maintainers disagree.
-@cindex inline functions, headers
-@cindex header files, inline functions
-Every header which contains inline functions, either directly by using
-@code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
-be added to @file{inline.c}'s includes to make the optimization
-described above work.  (Optimization note: if all @code{INLINE_HEADER}
-functions are in fact inlined in all translation units, then the linker
-can just discard @code{inline.o}, since it contains only unreferenced code.)
-To get started debugging XEmacs, take a look at the @file{.gdbinit} and
-@file{.dbxrc} files in the @file{src} directory.  See the section in the
-XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
-After making source code changes, run @code{make check} to ensure that
-you haven't introduced any regressions.  If you want to make XEmacs more
-reliable, please improve the test suite in @file{tests/automated}.
-Did you make sure you didn't introduce any new compiler warnings?
-Before submitting a patch, please try compiling at least once with
-@example
-configure --with-mule --use-union-type --error-checking=all
-@end example
-Here are things to know when you create a new source file:
-@itemize @bullet
-@item
-All @file{.c} files should @code{#include <config.h>} first.  Almost all
-@file{.c} files should @code{#include "lisp.h"} second.
-@item
-Generated header files should be included using the @samp{#include <...>}
-syntax, not the @samp{#include "..."} syntax.  The generated headers are
-@file{config.h}, @file{sheap-adjust.h}, @file{paths.h}, and @file{Emacs.ad.h}.
-The basic rule is to assume that builds use @samp{--srcdir}, so the
-@samp{#include <...>} syntax must be used whenever the to-be-included
-generated file is in a potentially different directory
-@emph{at compile time}.  The non-obvious C rule is that
-@samp{#include "..."} means to search for the included file in the same
-directory as the including file, @emph{not} in the current directory.
-Normally this is not a problem, but when building with @samp{--srcdir},
-@file{make} will search the @samp{VPATH} for you, while the C compiler
-knows nothing about it.
-@item
-Header files should @emph{not} include @samp{<config.h>} or
-@samp{"lisp.h"}.  It is the responsibility of the @file{.c} files that
-use them to do so.
-@end itemize
-@cindex Lisp object types, creating
-@cindex creating Lisp object types
-@cindex object types, creating Lisp
-Here is a checklist of things to do when creating a new lisp object type
-named @var{foo}:
-@enumerate
-@item
-create @var{foo}.h
-@item
-create @var{foo}.c
-@item
-add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
-@item
-add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
-@item
-add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
-@item
-add definitions of macros like @code{CHECK_@var{FOO}} and
-@code{@var{FOO}P} to @file{@var{foo}.h}
-@item
-add the new type index to @code{enum lrecord_type}
-@item
-add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
-@item
-add an @code{INIT_LRECORD_IMPLEMENTATION} call to
-@code{syms_of_@var{foo}} in @file{@var{foo}.c}
-@end enumerate
-@node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top
-@chapter Regression Testing XEmacs
-@cindex testing, regression
-@menu
-* How to Regression-Test::
-* Modules for Regression Testing::
-@end menu
-@node How to Regression-Test, Modules for Regression Testing, Regression Testing XEmacs, Regression Testing XEmacs
-@section How to Regression-Test
-@cindex how to regression-test
-@cindex regression-test, how to
-@cindex testing, regression, how to
-The source directory @file{tests/automated} contains XEmacs' automated
-test suite.  The usual way of running all the tests is running
-@code{make check} from the top-level build directory.
-The test suite is unfinished and it's still lacking some essential
-features.  It is nevertheless recommended that you run the tests to
-confirm that XEmacs behaves correctly.
-If you want to run a specific test case, you can do it from the
-command-line like this:
-@example
-$ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE
-@end example
-If a test fails and you need more information, you can run the test
-suite interactively by loading @file{test-harness.el} into a running
-XEmacs and typing @kbd{M-x test-emacs-test-file RET <filename> RET}.
-You will see a log of passed and failed tests, which should allow you to
-investigate the source of the error and ultimately fix the bug.  If you
-are not capable of, or don't have time for, debugging it yourself,
-please do report the failures using @kbd{M-x report-emacs-bug} or
-@kbd{M-x build-report}.
-@deffn Command test-emacs-test-file file
-Runs the tests in @var{file}.  @file{test-harness.el} must be loaded.
-Defines all the macros described in this node, and undefines them when
-done.
-@end deffn
-Adding a new test file is trivial: just create a new file here and it
-will be run.  There is no need to byte-compile any of the files in
-this directory---the test-harness will take care of any necessary
-byte-compilation.
-Look at the existing test cases for examples of how to code test cases.
-It all boils down to your imagination and judicious use of the macros
-@code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and
-@code{Check-Message}.  Note that all of these macros are defined only
-for the duration of the test: they do not exist in the global
-environment.
-@deffn Macro Assert expr
-Check that @var{expr} is non-nil at this point in the test.
-@end deffn
-@deffn Macro Check-Error expected-error body
-Check that execution of @var{body} causes @var{expected-error} to be
-signaled.  @var{body} is a @code{progn}-like body, and may contain
-several expressions.  @var{expected-error} is a symbol defined as
-an error by @code{define-error}.
-@end deffn
-@deffn Macro Check-Error-Message expected-error expected-error-regexp body
-Check that execution of @var{body} causes @var{expected-error} to be
-signaled, and generate a message matching @var{expected-error-regexp}.
-@var{body} is a @code{progn}-like body, and may contain several
-expressions.  @var{expected-error} is a symbol defined as an error
-by @code{define-error}.
-@end deffn
-@deffn Macro Check-Message expected-message body
-Check that execution of @var{body} causes @var{expected-message} to be
-generated (using @code{message} or a similar function).  @var{body} is a
-@code{progn}-like body, and may contain several expressions.
-@end deffn
-Here's a simple example checking case-sensitive and case-insensitive
-comparisons from @file{case-tests.el}.
-@example
-(with-temp-buffer
-(insert "Test Buffer")
-(let ((case-fold-search t))
-(goto-char (point-min))
-(Assert (eq (search-forward "test buffer" nil t) 12))
-(goto-char (point-min))
-(Assert (eq (search-forward "Test buffer" nil t) 12))
-(goto-char (point-min))
-(Assert (eq (search-forward "Test Buffer" nil t) 12))
-(setq case-fold-search nil)
-(goto-char (point-min))
-(Assert (not (search-forward "test buffer" nil t)))
-(goto-char (point-min))
-(Assert (not (search-forward "Test buffer" nil t)))
-(goto-char (point-min))
-(Assert (eq (search-forward "Test Buffer" nil t) 12))))
-@end example
-This example could be saved in a file in @file{tests/automated}, and it
-would constitute a complete test, automatically executed when you run
-@kbd{make check} after building XEmacs.  More complex tests may require
-substantial temporary scaffolding to create the environment that elicits
-the bugs, but the top-level @file{Makefile} and @file{test-harness.el}
-handle the running and collection of results from the @code{Assert},
-@code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message}
-macros.
-Don't suppress tests just because they're due to known bugs not yet
-fixed---use the @code{Known-Bug-Expect-Failure} wrapper macro to mark
-them.
-@deffn Macro Known-Bug-Expect-Failure body
-Arrange for failing tests in @var{body} to generate messages prefixed
-with "KNOWN BUG:" instead of "FAIL:".  @var{body} is a @code{progn}-like
-body, and may contain several tests.
-@end deffn
-A lot of the tests we run push limits; suppress Ebola warning messages
-with the @code{Ignore-Ebola} wrapper macro.
-@deffn Macro Ignore-Ebola body
-Suppress Ebola warning messages while running tests in @var{body}.
-@var{body} is a @code{progn}-like body, and may contain several tests.
-@end deffn
-Both macros are defined temporarily within the test function.  Simple
-examples:
-@example
-;; Apparently Ignore-Ebola is a solution with no problem to address.
-;; There are no examples in 21.5, anyway.
-;; from regexp-tests.el
-(Known-Bug-Expect-Failure
-(Assert (not (string-match "\\b" "")))
-(Assert (not (string-match " \\b" " "))))
-@end example
-In general, you should avoid using functionality from packages in your
-tests, because you can't be sure that everyone will have the required
-package.  However, if you've got a test that works, by all means add it.
-Simply wrap the test in an appropriate test, add a notice that the test
-was skipped, and update the @code{skipped-test-reasons} hashtable.  The
-wrapper macro @code{Skip-Test-Unless} is provided to handle common
-cases.
-@defvar skipped-test-reasons
-Hash table counting the number of times a particular reason is given for
-skipping tests.  This is only defined within @code{test-emacs-test-file}.
-@end defvar
-@deffn Macro Skip-Test-Unless prerequisite reason description body
-@var{prerequisite} is usually a feature test (@code{featurep},
-@code{boundp}, @code{fboundp}).  @var{reason} is a string describing the
-prerequisite; it must be unique because it is used as a hash key in a
-table of reasons for skipping tests.  @var{description} describes the
-tests being skipped, for the test result summary.  @var{body} is a
-@code{progn}-like body, and may contain several tests.
-@end deffn
-@code{Skip-Test-Unless} is defined temporarily within the test function.
-Here's an example of usage from @file{syntax-tests.el}:
-@example
-;; Test forward-comment at buffer boundaries
-(with-temp-buffer
-;; try to use exactly what you need: featurep, boundp, fboundp
-(Skip-Test-Unless (fboundp 'c-mode)
-"c-mode unavailable"
-"comment and parse-partial-sexp tests"
-;; and here's the test code
-(c-mode)
-(insert "// comment\n")
-(forward-comment -2)
-(Assert (eq (point) (point-min)))
-(let ((point (point)))
-(insert "/* comment */")
-(goto-char point)
-(forward-comment 2)
-(Assert (eq (point) (point-max)))
-(parse-partial-sexp point (point-max)))))
-@end example
-@code{Skip-Test-Unless} is intended for use with features that are normally
-present in typical configurations.  For truly optional features, or
-tests that apply to one of several alternative implementations (e.g., to
-GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply
-silently suppress the test if the feature is not available.
-Here are a few general hints for writing tests.
-@enumerate
-@item
-Include related successful cases.  Fixes often break something.
-@item
-Use the Known-Bug-Expect-Failure macro to mark the cases you know
-are going to fail.  We want to be able to distinguish between
-regressions and other unexpected failures, and cases that have
-been (partially) analyzed but not yet repaired.
-@item
-Mark the bug with the date of report.  An ``Unfixed since yyyy-mm-dd''
-gloss for Known-Bug-Expect-Failure is planned to further increase
-developer embarrassment (== incentive to fix the bug), but until then at
-least put a comment about the date so we can easily see when it was
-first reported.
-@item
-It's a matter of your judgement, but you should often use generic tests
-(@emph{e.g.}, @code{eq}) instead of more specific tests (@code{=} for
-numbers) even though you know that arguments ``should'' be of correct
-type.  That is, if the functions used can return generic objects
-(typically @code{nil}), as well as some more specific type that will be
-returned on success.  We don't want failures of those assertions
-reported as ``other failures'' (a wrong-type-arg signal, rather than a
-null return), we want them reported as ``assertion failures.''
-One example is a test that tests @code{(= (string-match this that) 0)},
-expecting a successful match.  Now suppose @code{string-match} is broken
-such that the match fails.  Then it will return @code{nil}, and @code{=}
-will signal ``wrong-type-argument, number-char-or-marker-p, nil'',
-generating an ``other failure'' in the report.  But this should be
-reported as an assertion failure (the test failed in a foreseeable way),
-rather than something else (we don't know what happened because XEmacs
-is broken in a way that we weren't trying to test!)
-@end enumerate
-@node Modules for Regression Testing,  , How to Regression-Test, Regression Testing XEmacs
-@section Modules for Regression Testing
-@cindex modules for regression testing
-@cindex regression testing, modules for
-@example
-@file{test-harness.el}
-@file{base64-tests.el}
-@file{byte-compiler-tests.el}
-@file{case-tests.el}
-@file{ccl-tests.el}
-@file{c-tests.el}
-@file{database-tests.el}
-@file{extent-tests.el}
-@file{hash-table-tests.el}
-@file{lisp-tests.el}
-@file{md5-tests.el}
-@file{mule-tests.el}
-@file{regexp-tests.el}
-@file{symbol-tests.el}
-@file{syntax-tests.el}
-@file{tag-tests.el}
-@file{weak-tests.el}
-@end example
-@file{test-harness.el} defines the macros @code{Assert},
-@code{Check-Error}, @code{Check-Error-Message}, and
-@code{Check-Message}.  The other files are test files, testing various
-XEmacs facilities.  @xref{Regression Testing XEmacs}.
-@node CVS Techniques, The Modules of XEmacs, Regression Testing XEmacs, Top
-@chapter CVS Techniques
-@cindex CVS techniques
-@menu
-* Merging a Branch into the Trunk::
-@end menu
-@node Merging a Branch into the Trunk,  , CVS Techniques, CVS Techniques
-@section Merging a Branch into the Trunk
-@cindex merging a branch into the trunk
-@enumerate
-@item
-If you haven't already done a merge, you will be merging from the branch
-point; otherwise you'll be merging from the last merge point, which
-should be marked by a tag, e.g. @samp{last-sync-ben-mule-21-5}.  In the
-former case, create the last-sync tag, e.g.
-@example
-crw rtag -r ben-mule-21-5-bp last-sync-ben-mule-21-5 xemacs
-@end example
-(You did create a branch point tag when you created the branch, didn't
-you?)
-@item
-Check everything in on your branch.
-@item
-Tag your branch with a pre-sync tag, e.g.
-@example
-crw rtag -r ben-mule-21-5 ben-mule-21-5-pre-feb-20-2002-sync xemacs
-@end example
-Note, you need to use rtag and specify a version with @samp{-r} (use
-@samp{-r HEAD} if necessary) so that removed files are handled correctly
-in some obscure cases.  See section 4.8 of the CVS manual.
-@item
-Tag the trunk so you have a stable place to merge up to in case people
-are asynchronously committing to the trunk, e.g.
-@example
-crw rtag -r HEAD main-branch-ben-mule-21-5-syncpoint-feb-20-2002 xemacs
-crw rtag -F -r main-branch-ben-mule-21-5-syncpoint-feb-20-2002 next-sync-ben-mule-21-5 xemacs
-@end example
-Use -F in the second case because the name might already exist, e.g. if
-you've already done a merge.  We make two tags because one is a
-permanent mark indicating a syncpoint when merging, and the other is a
-symbolic tag to make other operations easier.
-@item
-Make a backup of your source tree (not totally necessary but useful for
-reference and peace of mind): Move one level up from the top directory
-of your branch and do, e.g.
-@example
-cp -a mule mule-backup-2-23-02
-@end example
-@item
-Now, we're ready to merge!  Make sure you're in the top directory of
-your branch and do, e.g.
-@example
-cvs update -j last-sync-ben-mule-21-5 -j next-sync-ben-mule-21-5
-@end example
-@item
-Fix all merge conflicts.  Get the sucker to compile and run.
-@item
-Tag your branch with a post-sync tag, e.g.
-@example
-crw rtag -r ben-mule-21-5 ben-mule-21-5-post-feb-20-2002-sync xemacs
-@end example
-@item
-Update the last-sync tag, e.g.
-@example
-crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs
-@end example
-@end enumerate
-@node The Modules of XEmacs, Allocation of Objects in XEmacs Lisp, CVS Techniques, Top
 @chapter The Modules of XEmacs
 @cindex modules of XEmacs
 @menu
 * A Summary of the Various XEmacs Modules::
 @end example
 This module provides some terminal-control code necessary on versions of
 AIX prior to 4.1.
+@node Major Textual Changes, Rules When Writing New C Code, The Modules of XEmacs, Top
-@node Allocation of Objects in XEmacs Lisp, Dumping, The Modules of XEmacs, Top
+@chapter Major Textual Changes
+@cindex textual changes, major
+@cindex major textual changes
+Sometimes major textual changes are made to the source.  This means that
+a search-and-replace is done to change type names and such.  Some people
+disagree with such changes, and certainly, if done without good reason,
+they will just lead to headaches.  But it's important to keep the code clean
+and understandable, and consistent naming goes a long way towards this.
+An example of the right way to do this was the so-called "great integral
+type renaming".
+@menu
+* Great Integral Type Renaming::
+* Text/Char Type Renaming::
+@end menu
+@node Great Integral Type Renaming, Text/Char Type Renaming, Major Textual Changes, Major Textual Changes
+@section Great Integral Type Renaming
+@cindex Great Integral Type Renaming
+@cindex integral type renaming, great
+@cindex type renaming, integral
+@cindex renaming, integral types
+The purpose of this is to rationalize the names used for various
+integral types, so that they match their intended uses and follow
+consistent conventions, and to eliminate types that were not semantically
+different from each other.
+The conventions are:
+@itemize @bullet
+@item
+All integral types that measure quantities of anything are signed.  Some
+people disagree vociferously with this, but their arguments are mostly
+theoretical, and are vastly outweighed by the practical headaches of
+mixing signed and unsigned values, and more importantly by the far
+greater likelihood of inadvertent bugs: Because of the broken "viral"
+nature of unsigned quantities in C (operations involving mixed
+signed/unsigned are done unsigned, when exactly the opposite is nearly
+always wanted), even a single error in declaring a quantity unsigned
+that should be signed, or the even more subtle error of comparing
+signed and unsigned values and forgetting the necessary cast, can be
+catastrophic, as comparisons will yield wrong results.  -Wsign-compare
+is turned on specifically to catch this, but this tends to result in a
+great number of warnings when mixing signed and unsigned, and the casts
+are annoying.  More has been written on this elsewhere.
+@item
+All such quantity types just mentioned boil down to EMACS_INT, which is
+32 bits on 32-bit machines and 64 bits on 64-bit machines.  This is
+guaranteed to be the same size as Lisp objects of type @code{int}, and (as
+far as I can tell) of size_t (unsigned!) and ssize_t.  The only type
+below that is not an EMACS_INT is Hashcode, which is an unsigned value
+of the same size as EMACS_INT.
+@item
+Type names should be relatively short (no more than 10 characters or
+so), with the first letter capitalized and no underscores if they can at
+all be avoided.
+@item
+"count" == a zero-based measurement of some quantity.  Includes sizes,
+offsets, and indexes.
+@item
+"bpos" == a one-based measurement of a position in a buffer.  "Charbpos"
+and "Bytebpos" count text in the buffer, rather than bytes in memory;
+thus Bytebpos does not directly correspond to the memory representation.
+Use "Membpos" for this.
+@item
+"Char" refers to internal-format characters, not to the C type "char",
+which is really a byte.
+@end itemize
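+The hazard described in the first convention above can be sketched in a
+few lines of C.  This is only an illustration under stated assumptions:
+the names @code{below_limit_mixed}, @code{below_limit_signed}, and
+@code{check_limits} are hypothetical and do not appear in the XEmacs
+sources.

```c
#include <assert.h>

/* Mixed comparison: IDX is implicitly converted to unsigned int before
   the comparison, so a negative IDX becomes a huge positive value.
   Compilers report this when -Wsign-compare is enabled. */
static int
below_limit_mixed (int idx, unsigned int limit)
{
  return idx < limit;
}

/* All-signed comparison: a negative IDX compares as expected. */
static int
below_limit_signed (int idx, int limit)
{
  return idx < limit;
}

/* Self-check documenting the difference between the two versions. */
static void
check_limits (void)
{
  assert (!below_limit_mixed (-1, 10));   /* -1 is treated as UINT_MAX */
  assert (below_limit_signed (-1, 10));   /* -1 < 10, as intended */
}
```

+Keeping every quantity signed, as the conventions above require, removes
+the implicit conversion and with it the need for a cast at each
+comparison.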
+For the actual name changes, see the script below.
+I ran the following script to do the conversion. (NOTE: This script is
+idempotent.  You can safely run it multiple times and it will not screw
+up previous results -- in fact, it will do nothing if nothing has
+changed.  Thus, it can be run repeatedly as necessary to handle patches
+coming in from old workspaces, or old branches.)  There are two tags,
+just before and just after the change: @samp{pre-integral-type-rename}
+and @samp{post-integral-type-rename}.  When merging code from the main
+trunk into a branch, the best thing to do is first merge up to
+@samp{pre-integral-type-rename}, then apply the script and associated
+changes, then merge from @samp{post-integral-type-rename} to the
+present. (Alternatively, just do the merging in one operation; but you
+may then have a lot of conflicts needing to be resolved by hand.)
+Script @samp{fixtypes.sh} follows:
+@example
+----------------------------------- cut ------------------------------------
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+gr Memory_Count Bytecount $files
+gr Lstream_Data_Count Bytecount $files
+gr Element_Count Elemcount $files
+gr Hash_Code Hashcode $files
+gr extcount bytecount $files
+gr bufpos charbpos $files
+gr bytind bytebpos $files
+gr memind membpos $files
+gr bufbyte intbyte $files
+gr Extcount Bytecount $files
+gr Bufpos Charbpos $files
+gr Bytind Bytebpos $files
+gr Memind Membpos $files
+gr Bufbyte Intbyte $files
+gr EXTCOUNT BYTECOUNT $files
+gr BUFPOS CHARBPOS $files
+gr BYTIND BYTEBPOS $files
+gr MEMIND MEMBPOS $files
+gr BUFBYTE INTBYTE $files
+gr MEMORY_COUNT BYTECOUNT $files
+gr LSTREAM_DATA_COUNT BYTECOUNT $files
+gr ELEMENT_COUNT ELEMCOUNT $files
+gr HASH_CODE HASHCODE $files
+----------------------------------- cut ------------------------------------
+@end example
+The @samp{gr} script, and the scripts it uses, are documented in
+@file{README.global-renaming}, because if placed in this file they would
+need to have their @@ characters doubled, meaning you couldn't easily
+cut and paste from the source.
+In addition to those programs, I needed to fix up a few other
+things, particularly relating to the duplicate definitions of
+types, now that some types merged with others.  Specifically:
+@enumerate
+@item
+in @file{lisp.h}, removed duplicate declarations of Bytecount.  The changed
+code should now look like this: (In each code snippet below, the first
+and last lines are the same as the original, as are all lines outside of
+those lines.  That allows you to locate the section to be replaced, and
+replace the stuff in that section, verifying that there isn't anything
+new added that would need to be kept.)
+@example
+--------------------------------- snip -------------------------------------
+/* Counts of bytes or chars */
+typedef EMACS_INT Bytecount;
+typedef EMACS_INT Charcount;
+/* Counts of elements */
+typedef EMACS_INT Elemcount;
+/* Hash codes */
+typedef unsigned long Hashcode;
+/* ------------------------ dynamic arrays ------------------- */
+--------------------------------- snip -------------------------------------
+@end example
+@item
+in @file{lstream.h}, removed duplicate declaration of Bytecount.  Rewrote the
+comment about this type.  The changed code should now look like this:
+@example
+--------------------------------- snip -------------------------------------
+#endif
+/* There have been some arguments over what the type should be that
+specifies a count of bytes in a data block to be written out or read in,
+using @code{Lstream_read()}, @code{Lstream_write()}, and related functions.
+Originally it was long, which worked fine; Martin "corrected" these to
+size_t and ssize_t on the grounds that this is theoretically cleaner and
+is in keeping with the C standards.  Unfortunately, this practice is
+horribly error-prone due to design flaws in the way that mixed
+signed/unsigned arithmetic happens.  In fact, by doing this change,
+Martin introduced a subtle but fatal error that caused the operation of
+sending large mail messages to the SMTP server under Windows to fail.
+By putting all values back to be signed, avoiding any signed/unsigned
+mixing, the bug immediately went away.  The type then in use was
+Lstream_Data_Count, so that it could be reverted cleanly if a vote came to
+that.  Now it is Bytecount.
+Some earlier comments about why the type must be signed: This MUST BE
+SIGNED, since it also is used in functions that return the number of
+bytes actually read or written in an operation, and these
+functions can return -1 to signal error.
+Note that the standard Unix @code{read()} and @code{write()} functions define the
+count going in as a size_t, which is UNSIGNED, and the count going
+out as an ssize_t, which is SIGNED.  This is a horrible design
+flaw.  Not only is it highly likely to lead to logic errors when a
+-1 gets interpreted as a large positive number, but operations are
+bound to fail in all sorts of horrible ways when a number in the
+upper-half of the size_t range is passed in -- this number is
+unrepresentable as an ssize_t, so code that checks to see how many
+bytes are actually written (which is mandatory if you are dealing
+with certain types of devices) will get completely screwed up.
+--ben
+*/
+typedef enum lstream_buffering
+--------------------------------- snip -------------------------------------
+@end example
+@item
+in @file{dumper.c}, there are four places, all inside of @code{switch()} statements,
+where XD_BYTECOUNT appears twice as a case tag.  In each case, the two
+case blocks contain identical code, and you should *REMOVE THE SECOND*
+and leave the first.
+@end enumerate
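+The reasoning in the @file{lstream.h} comment above, that a byte count
+must be signed when the same type also carries a -1 error return, can be
+sketched as follows.  The names @code{toy_read}, @code{toy_read_failed},
+and @code{toy_source} are hypothetical, and the @code{Bytecount} typedef
+here is a stand-in based on @code{ptrdiff_t} rather than the real
+@code{EMACS_INT} definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the real typedef: a signed, pointer-sized count. */
typedef ptrdiff_t Bytecount;

/* Hypothetical data source for the sketch. */
static const char toy_source[] = "hello";

/* Copy up to SIZE bytes into DST.  Returns the number of bytes copied,
   or -1 on error.  Because the count going in and the count coming out
   share one signed type, -1 can never be mistaken for a large count. */
static Bytecount
toy_read (char *dst, Bytecount size)
{
  Bytecount avail = (Bytecount) (sizeof (toy_source) - 1);

  if (dst == NULL || size < 0)
    return -1;
  if (size > avail)
    size = avail;
  /* Invariant: SIZE is now non-negative, so the cast below is safe. */
  assert (size >= 0 && size <= avail);
  memcpy (dst, toy_source, (size_t) size);
  return size;
}

/* A plain signed test is enough to detect failure. */
static int
toy_read_failed (Bytecount result)
{
  return result < 0;
}
```

+Had the result been stored in a @code{size_t}, the error value would have
+become the largest representable count and the test in
+@code{toy_read_failed} could never succeed.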
+@node Text/Char Type Renaming,  , Great Integral Type Renaming, Major Textual Changes
+@section Text/Char Type Renaming
+@cindex Text/Char Type Renaming
+@cindex type renaming, text/char
+@cindex renaming, text/char types
+The purpose of this was
+@enumerate
+@item
+To distinguish between ``charptr'' when it refers to operations on
+the pointer itself and when it refers to operations on text
+@item
+To use consistent naming for everything referring to internal format, i.e.
+@end enumerate
+@example
+	Itext == text in internal format
+	Ibyte == a byte in such text
+	Ichar == a char as represented in internal character format
+@end example
+Thus e.g.
+@example
+	set_charptr_emchar -> set_itext_ichar
+@end example
+This was done using a script like this:
+@example
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+gr Intbyte Ibyte $files
+gr INTBYTE IBYTE $files
+gr intbyte ibyte $files
+gr EMCHAR ICHAR $files
+gr emchar ichar $files
+gr Emchar Ichar $files
+gr INC_CHARPTR INC_IBYTEPTR $files
+gr DEC_CHARPTR DEC_IBYTEPTR $files
+gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files
+gr valid_charptr valid_ibyteptr $files
+gr CHARPTR ITEXT $files
+gr charptr itext $files
+gr Charptr Itext $files
+@end example
+The @samp{gr} script itself is documented in @file{README.global-renaming}, as noted above.
+As in the integral-types change, there are pre and post tags before and
+after the change:
+@example
+	pre-internal-format-textual-renaming
+	post-internal-format-textual-renaming
+@end example
+When merging a large branch, follow the same sort of procedure
+documented above, using these tags -- essentially sync up to the pre
+tag, then apply the script yourself, then sync from the post tag to the
+present.  You can probably do the same if you don't have a separate
+workspace, but do have lots of outstanding changes and you'd rather not
+just merge all the textual changes directly.  Use something like this:
+(WARNING: I'm not a CVS guru; before trying this, or any large operation
+that might potentially mess things up, @strong{DEFINITELY} make a backup of
+your existing workspace.)
+@example
+cup -r pre-internal-format-textual-renaming
+<apply script>
+cup -A -j post-internal-format-textual-renaming -j HEAD
+@end example
+This might also work:
+@example
+cup -j pre-internal-format-textual-renaming
+<apply script>
+cup -j post-internal-format-textual-renaming -j HEAD
+@end example
+ben
+The following is a script to go in the opposite direction:
+@example
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+# Evidently Perl considers _ to be a word char ala \b, even though XEmacs
+# doesn't.  We need to be careful here with ibyte/ichar because of words
+# like Richard, @code{eicharlen()}, multibyte, HIBYTE, etc.
+gr Ibyte Intbyte $files
+gr '\bIBYTE' INTBYTE $files
+gr '\bibyte' intbyte $files
+gr '\bICHAR' EMCHAR $files
+gr '\bichar' emchar $files
+gr '\bIchar' Emchar $files
+gr '\bIBYTEPTR' CHARPTR $files
+gr '\bibyteptr' charptr $files
+gr '\bITEXT' CHARPTR $files
+gr '\bitext' charptr $files
+gr '\bItext' Charptr $files
+gr '_IBYTE' _INTBYTE $files
+gr '_ibyte' _intbyte $files
+gr '_ICHAR' _EMCHAR $files
+gr '_ichar' _emchar $files
+gr '_Ichar' _Emchar $files
+gr '_IBYTEPTR' _CHARPTR $files
+gr '_ibyteptr' _charptr $files
+gr '_ITEXT' _CHARPTR $files
+gr '_itext' _charptr $files
+gr '_Itext' _Charptr $files
+@end example
+@node Rules When Writing New C Code, Regression Testing XEmacs, Major Textual Changes, Top
+@chapter Rules When Writing New C Code
+@cindex writing new C code, rules when
+@cindex C code, rules when writing new
+@cindex code, rules when writing new C
+The XEmacs C code is extremely complex and intricate, and there are many
+rules that are more or less consistently followed throughout the code.
+Many of these rules are not obvious, so they are explained here.  It is
+of the utmost importance that you follow them.  If you don't, you may
+get something that appears to work, but which will crash in odd
+situations, often in code far away from where the actual breakage is.
+@menu
+* A Reader's Guide to XEmacs Coding Conventions::
+* General Coding Rules::
+* Object-Oriented Techniques for C::
+* Writing Lisp Primitives::
+* Writing Good Comments::
+* Adding Global Lisp Variables::
+* Writing Macros::
+* Proper Use of Unsigned Types::
+* Techniques for XEmacs Developers::
+@end menu
+See also @ref{Coding for Mule}.
+@node A Reader's Guide to XEmacs Coding Conventions, General Coding Rules, Rules When Writing New C Code, Rules When Writing New C Code
+@section A Reader's Guide to XEmacs Coding Conventions
+@cindex coding conventions
+@cindex reader's guide
+@cindex coding rules, naming
+Of course the low-level implementation language of XEmacs is C, but much
+of that uses the Lisp engine to do its work.  However, because the code
+is ``inside'' of the protective containment shell around the ``reactor
+core,'' you'll see lots of complex ``plumbing'' needed to do the work
+and ``safety mechanisms,'' whose failure results in a meltdown.  This
+section provides a quick overview (or review) of the various components
+of the implementation of Lisp objects.
+Two typographic conventions help to identify C objects that implement
+Lisp objects.  The first is that capitalized identifiers, especially
+those beginning with the letters @samp{Q}, @samp{V}, @samp{F}, and @samp{S},
+for C variables and functions, and C macros beginning with the
+letter @samp{X}, are used to implement Lisp.  The second is that where
+Lisp uses the hyphen @samp{-} in symbol names, the corresponding C
+identifiers use the underscore @samp{_}.  Of course, since XEmacs Lisp
+contains interfaces to many external libraries, those external names
+will follow the coding conventions their authors chose, and may overlap
+the ``XEmacs name space.''  However these cases are usually pretty
+obvious.
+All Lisp objects are handled indirectly.  The @code{Lisp_Object}
+type is usually a pointer to a structure, except for a very small number
+of types with immediate representations (currently characters and
+integers).  However, these types cannot be directly operated on in C
+code, either, so they can also be considered indirect.  Types that do
+not have an immediate representation always have a C typedef
+@code{Lisp_@var{type}} for a corresponding structure.
+@c #### mention l(c)records here?
+In older code, it was common practice to pass around pointers to
+@code{Lisp_@var{type}}, but this is now deprecated in favor of using
+@code{Lisp_Object} for all function arguments and return values that are
+Lisp objects.  The @code{X@var{type}} macro is used to extract the
+pointer and cast it to @code{(Lisp_@var{type} *)} for the desired type.
+@strong{Convention}: macros whose names begin with @samp{X} operate on
+@code{Lisp_Object}s and do no type-checking.  Many such macros are type
+extractors, but others implement Lisp operations in C (@emph{e.g.},
+@code{XCAR} implements the Lisp @code{car} function).  These are unsafe,
+and must only be used where types of all data have already been checked.
+Such macros are only applied to @code{Lisp_Object}s.  In internal
+implementations where the pointer has already been converted, the
+structure is operated on directly using the C @code{->} member access
+operator.
+The @code{@var{type}P}, @code{CHECK_@var{type}}, and
+@code{CONCHECK_@var{type}} macros are used to test types.  The first
+returns a Boolean value, and the latter two signal errors.  (The
+@samp{CONCHECK} variety allows execution to be CONtinued under some
+circumstances, thus the name.)  Functions which expect to be passed user
+data invariably call @samp{CHECK} macros on arguments.
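+The division of labor between predicates and unchecked @samp{X}
+extractors can be illustrated with a toy object system.  Everything in
+this sketch is hypothetical (the @code{Toy_} and @code{TOY_} names, the
+header layout, and the use of a @code{NULL} return in place of signaling
+an error); it is not how @code{Lisp_Object} is actually represented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags and common header. */
enum toy_type { TOY_CONS, TOY_STRING };
struct toy_header { enum toy_type type; };

/* Stand-in for Lisp_Object: an opaque pointer to some tagged structure. */
typedef struct toy_header *Toy_Object;

typedef struct { struct toy_header header; Toy_Object car, cdr; } Toy_Cons;
typedef struct { struct toy_header header; const char *data; } Toy_String;

/* Predicate in the style of typeP: safe to apply to any object. */
#define TOY_CONSP(obj) ((obj)->type == TOY_CONS)

/* Extractors in the style of Xtype: no type-checking at all. */
#define XTOY_CONS(obj) ((Toy_Cons *) (obj))
#define XTOY_CAR(obj) (XTOY_CONS (obj)->car)

/* Code that handles untrusted data checks first and extracts second.
   Real code would signal an error; this sketch returns NULL instead. */
static Toy_Object
toy_car_checked (Toy_Object obj)
{
  assert (obj != NULL);
  if (!TOY_CONSP (obj))
    return NULL;
  return XTOY_CAR (obj);
}
```

+Applying @code{XTOY_CAR} to a @code{Toy_String} would compile without
+complaint and read the wrong structure member, which is why the unchecked
+macros may only be used after the type is known.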
+There are many types of specialized Lisp objects implemented in C, but
+the most pervasive type is the @dfn{symbol}.  Symbols are used as
+identifiers, variables, and functions.
+@strong{Convention}: Global variables whose names begin with @samp{Q}
+are constants whose value is a symbol.  The name of the variable should
+be derived from the name of the symbol using the same rules as for Lisp
+primitives.  Such variables allow the C code to check whether a
+particular @code{Lisp_Object} is equal to a given symbol.  Symbols are
+Lisp objects, so these variables may be passed to Lisp primitives.  (An
+alternative to the use of @samp{Q...} variables is to call the
+@code{intern} function at initialization in the
+@code{vars_of_@var{module}} function, which is hardly less efficient.)
+@strong{Convention}: Global variables whose names begin with @samp{V}
+are variables that contain Lisp objects.  The convention here is that
+all global variables of type @code{Lisp_Object} begin with @samp{V}, and
+no others do (not even integer and boolean variables that have Lisp
+equivalents). Most of the time, these variables have equivalents in
+Lisp, which are defined via the @samp{DEFVAR} family of macros, but some
+don't.  Since the variable's value is a @code{Lisp_Object}, it can be
+passed to Lisp primitives.
+The implementation of Lisp primitives is more complex.
+@strong{Convention}: Global variables with names beginning with @samp{S}
+contain a structure that allows the Lisp engine to identify and call a C
+function.  In modern versions of XEmacs, these identifiers are almost
+always completely hidden in the @code{DEFUN} and @code{SUBR} macros, but
+you will encounter them if you look at very old versions of XEmacs or at
+GNU Emacs.  @strong{Convention}: Functions with names beginning with
+@samp{F} implement Lisp primitives.  Of course all their arguments and
+their return values must be @code{Lisp_Object}s.  (This is hidden in the
+@code{DEFUN} macro.)
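+The naming conventions above (a one-letter prefix, with hyphens in the
+Lisp name replaced by underscores) can be expressed as a small helper.
+The function @code{lisp_name_to_c} is hypothetical and exists only for
+this sketch; the real identifiers are written by hand and a few do not
+follow the pattern exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the conventional C identifier for LISP_NAME in OUT: PREFIX
   ('F', 'Q', 'V', or 'S') followed by the name with each '-' replaced
   by '_'.  Returns 0 on success, or -1 if OUT is too small. */
static int
lisp_name_to_c (char prefix, const char *lisp_name,
                char *out, size_t out_size)
{
  size_t len, i;

  assert (lisp_name != NULL && out != NULL);
  len = strlen (lisp_name);
  if (out_size < len + 2)       /* prefix + name + terminating NUL */
    return -1;

  out[0] = prefix;
  for (i = 0; i < len; i++)
    out[i + 1] = (lisp_name[i] == '-') ? '_' : lisp_name[i];
  out[len + 1] = '\0';
  return 0;
}
```

+For example, passing @code{'F'} and @code{"forward-comment"} produces
+@code{Fforward_comment}, and passing @code{'Q'} with the same name
+produces @code{Qforward_comment}.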
+@node General Coding Rules, Object-Oriented Techniques for C, A Reader's Guide to XEmacs Coding Conventions, Rules When Writing New C Code
+@section General Coding Rules
+@cindex coding rules, general
+The C code is actually written in a dialect of C called @dfn{Clean C},
+meaning that it can be compiled, mostly warning-free, with either a C or
+C++ compiler.  Coding in Clean C has several advantages over plain C.
+C++ compilers are more nit-picking, and a number of coding errors have
+been found by compiling with C++.  The ability to use both C and C++
+tools means that a greater variety of development tools is available to
+the developer.  In addition, the ability to overload operators in C++
+means it is possible, for error-checking purposes, to redefine certain
+simple types (normally defined as aliases for simple built-in types such
+as @code{unsigned char} or @code{long}) as classes, strictly limiting the permissible
+operations and catching illegal implicit conversions and such.
+Every module includes @file{<config.h>} (angle brackets so that
+@samp{--srcdir} works correctly; @file{config.h} may or may not be in
+the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
+must always be included before any other header files (including
+system header files) to ensure that certain tricks played by various
+@file{s/} and @file{m/} files work out correctly.
+When including header files, always use angle brackets, not double
+quotes, except when the file to be included is always in the same
+directory as the including file.  If either file is a generated file,
+then that is not likely to be the case.  In order to understand why we
+have this rule, imagine what happens when you do a build in the source
+directory using @samp{./configure} and another build in another
+directory using @samp{../work/configure}.  There will be two different
+@file{config.h} files.  Which one will be used if you @samp{#include
+"config.h"}?
+Almost every module contains a @code{syms_of_*()} function and a
+@code{vars_of_*()} function.  The former declares any Lisp primitives
+you have defined and defines any symbols you will be using.  The latter
+declares any global Lisp variables you have added and initializes global
+C variables in the module.  @strong{Important}: There are stringent
+requirements on exactly what can go into these functions.  See the
+comment in @file{emacs.c}.  The reason for this is to avoid obscure
+unwanted interactions during initialization.  If you don't follow these
+rules, you'll be sorry!  If you want to do anything that isn't allowed,
+create a @code{complex_vars_of_*()} function for it.  Doing this is
+tricky, though: you have to make sure your function is called at the
+right time so that all the initialization dependencies work out.
+Declare each function of these kinds in @file{symsinit.h}.  Make sure
+it's called in the appropriate place in @file{emacs.c}.  You never need
+to include @file{symsinit.h} directly, because it is included by
+@file{lisp.h}.
+@strong{All global and static variables that are to be modifiable must
+be declared uninitialized.}  This means that you may not use the
+``declare with initializer'' form for these variables, such as @code{int
+some_variable = 0;}.  The reason for this has to do with some kludges
+done during the dumping process: If possible, the initialized data
+segment is re-mapped so that it becomes part of the (unmodifiable) code
+segment in the dumped executable.  This allows this memory to be shared
+among multiple running XEmacs processes.  XEmacs is careful to place as
+much constant data as possible into initialized variables during the
+@file{temacs} phase.
+@cindex copy-on-write
+@strong{Please note:} This kludge only works on a few systems nowadays,
+and is rapidly becoming irrelevant because most modern operating systems
+provide @dfn{copy-on-write} semantics.  All data is initially shared
+between processes, and a private copy is automatically made (on a
+page-by-page basis) when a process first attempts to write to a page of
+memory.
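+The rule about uninitialized globals can be illustrated with a minimal,
+self-contained sketch.  The names @code{example_counter},
+@code{example_name} and @code{vars_of_example} are hypothetical and
+only stand in for a real module's variables and its
+@code{vars_of_*()} function.

```c
#include <assert.h>

/* Hypothetical module-level state.  Note: no initializers here. */
static int example_counter;        /* not "static int example_counter = 0;" */
static const char *example_name;   /* not "... = \"example\";" */

/* Hypothetical stand-in for a module's vars_of_*() function: all
   run-time initialization of modifiable globals happens here. */
static void
vars_of_example (void)
{
  example_counter = 10;
  example_name = "example";
}

/* Initialize the module, then modify one of its globals. */
static void
example_demo (void)
{
  vars_of_example ();
  example_counter++;
  assert (example_counter == 11);
  assert (example_name[0] == 'e');
}
```

+The point of the sketch is only the shape: declarations carry no
+initializer, and every starting value is assigned in one
+initialization function.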
+Formerly, there was a requirement that static variables not be declared
+inside of functions.  This had to do with another hack in the same
+vein as what was just described: old USG systems put statically-declared
+variables in the initialized data space, so the header files for those
+systems had a @code{#define static} declaration. (That way, the data-segment remapping
+described above could still work.) This fails badly on static variables
+inside of functions, which suddenly become automatic variables;
+therefore, you weren't supposed to have any of them.  This awful kludge
+has been removed in XEmacs because
+@enumerate
+@item
+almost all of the systems that used this kludge ended up having
+to disable the data-segment remapping anyway;
+@item
+the only systems that didn't were extremely outdated ones;
+@item
+this hack completely messed up inline functions.
+@end enumerate
+The C source code makes heavy use of C preprocessor macros.  One popular
+macro style is:
+@example
+#define FOO(var, value) do @{            \
+  Lisp_Object FOO_value = (value);      \
+  ... /* compute using FOO_value */     \
+  (var) = bar;                          \
+@} while (0)
+@end example
+The @code{do @{...@} while (0)} is a standard trick to allow @code{FOO} to have
+statement semantics, so that it can safely be used within an @code{if}
+statement in C, for example.  Multiple evaluation is prevented by
+copying a supplied argument into a local variable, so that
+@code{FOO(var,fun(1))} only calls @code{fun} once.
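+Both properties can be checked with a small stand-alone sketch.  The
+macro @code{SET_SQUARE} and the counter @code{fun_calls} are
+hypothetical names invented for this illustration; they do not appear
+in the XEmacs sources.

```c
#include <assert.h>

static int fun_calls;   /* counts how often fun() runs */

static int
fun (int x)
{
  fun_calls++;
  return x + 1;
}

/* Hypothetical macro in the style described above: the argument is
   copied into a local first, so it is evaluated exactly once even
   though the copy is used twice. */
#define SET_SQUARE(var, value) do {             \
  int SET_SQUARE_value = (value);               \
  (var) = SET_SQUARE_value * SET_SQUARE_value;  \
} while (0)

static void
set_square_demo (void)
{
  int result = 0;

  fun_calls = 0;
  /* Because of do { ... } while (0), the macro call plus its
     semicolon is a single statement, so the else below still pairs
     with this if. */
  if (fun_calls == 0)
    SET_SQUARE (result, fun (1));
  else
    result = -1;

  assert (result == 4);      /* fun (1) returned 2 */
  assert (fun_calls == 1);   /* and was called only once */
}
```

+Had the macro body been a bare @code{@{...@}} block, the semicolon after
+the macro call would have ended the @code{if} statement and the
+@code{else} would not compile.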
+Lisp lists are popular data structures in the C code as well as in
+Elisp.  There are two sets of macros that iterate over lists.
+@code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
+supplied by the user, and cannot be trusted to be acyclic and
+@code{nil}-terminated.  A @code{malformed-list} or @code{circular-list} error
+will be generated if the list being iterated over is not entirely
+kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
+safe, and can be used only on trusted lists.
+Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
+@code{GET_LIST_LENGTH}, which calculate the length of a list;
+@code{GET_EXTERNAL_LIST_LENGTH} also validates that the list is
+proper.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
+@code{LIST_LOOP_DELETE_IF} delete from a Lisp list those elements that
+satisfy some predicate.
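+To see why untrusted lists need the slower, checking macros, consider
+the following stand-alone sketch of a length function that detects
+circular lists.  It is @emph{not} the XEmacs implementation: the
+@code{struct cell} type and @code{checked_list_length} are hypothetical,
+and the tortoise-and-hare technique is simply one common way to perform
+such a check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cons cell; NULL plays the role of nil. */
struct cell
{
  int car;
  struct cell *cdr;
};

/* Return the length of LIST, or -1 if the list is circular.
   TORTOISE advances one cell for every two cells that HARE advances,
   so HARE can only land on TORTOISE if the list loops back on itself. */
static long
checked_list_length (const struct cell *list)
{
  const struct cell *hare = list;
  const struct cell *tortoise = list;
  long len = 0;

  while (hare != NULL)
    {
      hare = hare->cdr;
      len++;
      if ((len & 1) == 0)
        {
          tortoise = tortoise->cdr;
          if (hare != NULL && hare == tortoise)
            return -1;
        }
    }
  return len;
}

static void
checked_list_length_demo (void)
{
  struct cell a, b, c;

  a.car = 1; a.cdr = &b;
  b.car = 2; b.cdr = &c;
  c.car = 3; c.cdr = NULL;
  assert (checked_list_length (&a) == 3);

  c.cdr = &a;   /* make the list circular */
  assert (checked_list_length (&a) == -1);
}
```

+A loop without such a check would simply never terminate on the second
+list, which is why the unchecked macros are reserved for trusted lists.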
+@node Object-Oriented Techniques for C, Writing Lisp Primitives, General Coding Rules, Rules When Writing New C Code
+@section Object-Oriented Techniques for C
+@cindex coding rules, object-oriented
+@cindex object-oriented techniques
+At the lowest levels, XEmacs makes heavy use of object-oriented
+techniques to promote code-sharing and uniform interfaces for different
+devices and platforms.  Commonly, but not always, such objects are
+``wrapped'' and exported to Lisp as Lisp objects.  Usually they use
+the internal structures developed for Lisp objects (the @samp{lrecord}
+structure) in order to take advantage of Lisp memory management.
+Unfortunately, XEmacs was originally written in C, so these techniques
+are based on heavy use of C macros.
+@c You can't use @var{} for type below, because case is important.
+A module defining a class is likely to use most of the following
+declarations and macros.  In the following, the notation @samp{<type>}
+will stand for the full name of the class, and will be capitalized in
+the way normal for its context.  The notation @samp{<typ>} will stand
+for the abbreviated form commonly used in macro names, while @samp{ty}
+will be used as the typical name for instances of the class.  (See the
+entry for @samp{MAYBE_<TYP>METH} below for an example using all three
+notations.)
+In the interface (@file{.h} file), the following declarations are used
+often.  Others may be used for particular modules.  Since they're
+quite short in most cases, the definitions are given as well.  The
+generic macros used are defined in @file{lisp.h} or @file{lrecord.h}.
+@c #### reorganize this table into stuff used in general code, and stuff
+@c used only in declarations or initializations
+@table @samp
+@c #### declaration
+@item typedef struct Lisp_<Type> Lisp_<Type>
+This refers to the internal structure used by C code.  The XEmacs coding
+style now forbids passing pointers to @samp{Lisp_<Type>} structures into
+or out of a function; instead, a @samp{Lisp_Object} should be passed or
+returned (created using @samp{wrap_<type>}, if necessary).
+@c #### declaration
+@item DECLARE_LRECORD (<type>, Lisp_<Type>)
+Declares an @samp{lrecord} for @samp{<Type>}, which is the unit of
+allocation.
+@item #define X<TYPE>(x) XRECORD (x, <type>, Lisp_<Type>)
+Turns a @code{Lisp_Object} into a pointer to @samp{struct Lisp_<Type>}.
+@item #define wrap_<type>(p) wrap_record (p, <type>)
+Turns a pointer to @samp{struct Lisp_<Type>} into a @code{Lisp_Object}.
+@item #define <TYPE>P(x) RECORDP (x, <type>)
+Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>}.
+Returns a C int, not a Lisp Boolean value.
+@item #define CHECK_<TYPE>(x) CHECK_RECORD (x, <type>)
+@itemx #define CONCHECK_<TYPE>(x) CONCHECK_RECORD (x, <type>)
+Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>},
+and signals a Lisp error if not.  The @samp{CHECK} version of the macro
+never returns if the type is wrong, while the @samp{CONCHECK} version
+can return if the user catches it in the debugger and explicitly
+requests a return.
+@item #define RAW_<TYP>METH(ty, m) ((ty)->methods->m##_method)
+Return a function pointer for the method for an object @var{TY} of class
+@samp{Lisp_<Type>}, or @samp{NULL} if there is none for this type.
+@item #define HAS_<TYP>METH_P(ty, m) (!!RAW_<TYP>METH (ty, m))
+Test whether the class that @var{TY} is an instance of has the method.
+@item #define <TYP>METH(ty, m, args) ((RAW_<TYP>METH (ty, m)) args)
+Call the method on @samp{args}.  @samp{args} must be enclosed in
+parentheses in the call.  It is the programmer's responsibility to
+ensure that the method is available.  The standard convenience macro
+@samp{MAYBE_<TYP>METH} is often provided for the common case where a
+void-returning method of @samp{<Type>} is called.
+@item #define MAYBE_<TYP>METH(ty, m, args) do @{ ... @} while (0)
+Call a void-returning @samp{<Type>} method, if it exists.  Note the use
+of the @samp{do ... while (0)} idiom to give the macro call C statement
+semantics.  The full definition is equally idiomatic:
+@example
+#define MAYBE_<TYP>METH(ty, m, args) do @{	\
+Lisp_<Type> *maybe_<typ>meth_ty = (ty);	\
+if (HAS_<TYP>METH_P (maybe_<typ>meth_ty, m))	\
+<TYP>METH (maybe_<typ>meth_ty, m, args);	\
+@} while (0)
+@end example
+@end table
+The use of macros for invoking an object's methods makes life a bit
+difficult for the student or maintainer when browsing the code.  In
+particular, calls are of the form @samp{<TYP>METH (ty, some_method, (x,
+y))}, but definitions typically are for @samp{<subtype>_some_method}.
+Thus, when you are trying to find calls, you need to grep for
+@samp{some_method}, but this will also catch calls and definitions of
+that method for instances of other subtypes of @samp{<Type>}, and there
+may be a rather large number of them.
+@node Writing Lisp Primitives, Writing Good Comments, Object-Oriented Techniques for C, Rules When Writing New C Code
+@section Writing Lisp Primitives
+@cindex writing Lisp primitives
+@cindex Lisp primitives, writing
+@cindex primitives, writing Lisp
+Lisp primitives are Lisp functions implemented in C.  The details of
+interfacing the C function so that Lisp can call it are handled by a few
+C macros.  The only way to really understand how to write new C code is
+to read the source, but we can explain some things here.
+An example of a special form is the definition of @code{prog1}, from
+@file{eval.c}.  (An ordinary function would have the same general
+appearance.)
+@cindex garbage collection protection
+@smallexample
+@group
+DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
+Similar to `progn', but the value of the first form is returned.
+\(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
+The value of FIRST is saved during evaluation of the remaining args,
+whose values are discarded.
+*/
+(args))
+@{
+  /* This function can GC */
+  REGISTER Lisp_Object val, form, tail;
+  struct gcpro gcpro1;
+  val = Feval (XCAR (args));
+  GCPRO1 (val);
+  LIST_LOOP_3 (form, XCDR (args), tail)
+    Feval (form);
+  UNGCPRO;
+  return val;
+@}
+@end group
+@end smallexample
+Let's start with a precise explanation of the arguments to the
+@code{DEFUN} macro.  Here is a template for them:
+@example
+@group
+DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
+@var{docstring}
+*/
+(@var{arglist}))
+@end group
+@end example
+@table @var
+@item lname
+This string is the name of the Lisp symbol to define as the function
+name; in the example above, it is @code{"prog1"}.
+@item fname
+This is the C function name for this function.  This is the name that is
+used in C code for calling the function.  The name is, by convention,
+@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
+Lisp name changed to underscores.  Thus, to call this function from C
+code, call @code{Fprog1}.  Remember that the arguments are of type
+@code{Lisp_Object}; various macros and functions for creating values of
+type @code{Lisp_Object} are declared in the file @file{lisp.h}.
+Primitives whose names are special characters (e.g. @code{+} or
+@code{<}) are named by spelling out, in some fashion, the special
+character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
+begin with normal alphanumeric characters but also contain special
+characters are spelled out in some creative way, e.g. @code{let*}
+becomes @code{FletX()}.
+Each function also has an associated structure that holds the data for
+the subr object that represents the function in Lisp.  This structure
+conveys the Lisp symbol name to the initialization routine that will
+create the symbol and store the subr object as its definition.  The C
+variable name of this structure is always @samp{S} prepended to the
+@var{fname}.  You hardly ever need to be aware of the existence of this
+structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
+details.
+@item min_args
+This is the minimum number of arguments that the function requires.  The
+function @code{prog1} allows a minimum of one argument.
+@item max_args
+This is the maximum number of arguments that the function accepts, if
+there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
+indicating a special form that receives unevaluated arguments, or
+@code{MANY}, indicating an unlimited number of evaluated arguments (the
+C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
+are macros.  If @var{max_args} is a number, it may not be less than
+@var{min_args} and it may not be greater than 8. (If you need to add a
+function with more than 8 arguments, use the @code{MANY} form.  Resist
+the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
+you do it anyway, make sure to also add another clause to the switch
+statement in @code{primitive_funcall()}.)
+@item interactive
+This is an interactive specification, a string such as might be used as
+the argument of @code{interactive} in a Lisp function.  In the case of
+@code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
+cannot be called interactively.  A value of @code{""} indicates a
+function that should receive no arguments when called interactively.
+@item docstring
+This is the documentation string.  It is written just like a
+documentation string for a function defined in Lisp; in particular, the
+first line should be a single sentence.  Note how the documentation
+string is enclosed in a comment, none of the documentation is placed on
+the same lines as the comment-start and comment-end characters, and the
+comment-start characters are on the same line as the interactive
+specification.  @file{make-docfile}, which scans the C files for
+documentation strings, is very particular about what it looks for, and
+will not properly extract the doc string if it's not in this exact format.
+In order to make both @file{etags} and @file{make-docfile} happy, make
+sure that the @code{DEFUN} line contains the @var{lname} and
+@var{fname}, and that the comment-start characters for the doc string
+are on the same line as the interactive specification, and put a newline
+directly after them (and before the comment-end characters).
+@item arglist
+This is the comma-separated list of arguments to the C function.  For a
+function with a fixed maximum number of arguments, provide a C argument
+for each Lisp argument.  In this case, unlike regular C functions, the
+types of the arguments are not declared; they are simply always of type
+@code{Lisp_Object}.
+The names of the C arguments will be used as the names of the arguments
+to the Lisp primitive as displayed in its documentation, modulo the same
+concerns described above for @code{F...} names (in particular,
+underscores in the C arguments become dashes in the Lisp arguments).
+There is one additional kludge: A trailing @samp{_} on the C argument is
+discarded when forming the Lisp argument.  This allows C language
+reserved words (like @code{default}) or global symbols (like
+@code{dirname}) to be used as argument names without compiler warnings
+or errors.
+A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
+@w{@dfn{special form}}; its arguments are not evaluated.  Instead it
+receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
+unevaluated arguments, conventionally named @code{(args)}.
+When a Lisp function has no upper limit on the number of arguments,
+specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
+C actually receives exactly two arguments: the number of Lisp arguments
+(an @code{int}) and the address of a block containing their values (a
+@w{@code{Lisp_Object *}}).  Only in this case are the C types specified
+in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
+@end table
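+The regular part of the naming rules given for @var{fname} and
+@var{arglist} is mechanical enough to write down as code.  The two
+helper functions below are hypothetical and exist only for this
+illustration; they do not handle the irregular cases (such as
+@code{+} or @code{let*}), which are spelled out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the conventional C function name for a primitive: prepend
   'F' and turn dashes into underscores.  Returns 0 on success, -1 if
   OUT is too small. */
static int
lisp_name_to_c_name (const char *lisp, char *out, size_t outsize)
{
  size_t len = strlen (lisp);
  size_t i;

  if (outsize < len + 2)
    return -1;
  out[0] = 'F';
  for (i = 0; i < len; i++)
    out[i + 1] = (lisp[i] == '-') ? '_' : lisp[i];
  out[len + 1] = '\0';
  return 0;
}

/* Derive the documented Lisp argument name from a C argument name:
   drop one trailing underscore, then turn underscores into dashes. */
static int
c_arg_to_lisp_arg (const char *c_arg, char *out, size_t outsize)
{
  size_t len = strlen (c_arg);
  size_t i;

  if (len > 0 && c_arg[len - 1] == '_')
    len--;
  if (outsize < len + 1)
    return -1;
  for (i = 0; i < len; i++)
    out[i] = (c_arg[i] == '_') ? '-' : c_arg[i];
  out[len] = '\0';
  return 0;
}

static void
naming_demo (void)
{
  char buf[32];

  assert (lisp_name_to_c_name ("kill-buffer", buf, sizeof buf) == 0);
  assert (strcmp (buf, "Fkill_buffer") == 0);

  assert (c_arg_to_lisp_arg ("buffer_or_name", buf, sizeof buf) == 0);
  assert (strcmp (buf, "buffer-or-name") == 0);

  /* The trailing underscore lets a reserved word be used in C. */
  assert (c_arg_to_lisp_arg ("default_", buf, sizeof buf) == 0);
  assert (strcmp (buf, "default") == 0);
}
```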
+Within the function @code{Fprog1} itself, note the use of the macros
+@code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
+a variable from garbage collection---to inform the garbage collector
+that it must look in that variable and regard the object pointed at by
+its contents as an accessible object.  This is necessary whenever you
+call @code{Feval} or anything that can directly or indirectly call
+@code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
+any Lisp object that you intend to refer to again must be protected
+somehow.  @code{UNGCPRO} cancels the protection of the variables that
+are protected in the current function.  It is necessary to do this
+explicitly.
+The macro @code{GCPRO1} protects just one local variable.  If you want
+to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
+not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
+These macros implicitly use local variables such as @code{gcpro1}; you
+must declare these explicitly, with type @code{struct gcpro}.  Thus, if
+you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
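+A much simplified model may help explain why the @code{gcpro}@var{n}
+variables must be declared and why @code{GCPRO1} cannot simply be
+repeated.  The model below is an assumption made for illustration, not
+the XEmacs definition: @code{struct gcpro_model}, @code{gcpro_chain}
+and the @samp{MODEL_} macros are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef long Obj;   /* hypothetical stand-in for Lisp_Object */

/* One link in the chain of protected variables. */
struct gcpro_model
{
  struct gcpro_model *next;
  Obj *var;
};

static struct gcpro_model *gcpro_chain;   /* head of the chain */

/* The macros refer to locals named gcpro1 and gcpro2 implicitly, so
   the calling function must declare them. */
#define MODEL_GCPRO1(v) do {                      \
  gcpro1.next = gcpro_chain; gcpro1.var = &(v);   \
  gcpro_chain = &gcpro1;                          \
} while (0)

#define MODEL_GCPRO2(v1, v2) do {                 \
  gcpro1.next = gcpro_chain; gcpro1.var = &(v1);  \
  gcpro2.next = &gcpro1;     gcpro2.var = &(v2);  \
  gcpro_chain = &gcpro2;                          \
} while (0)

/* Restore the chain to what it was before this function's GCPRO. */
#define MODEL_UNGCPRO (gcpro_chain = gcpro1.next)

static int
count_protected (void)
{
  int n = 0;
  const struct gcpro_model *p;

  for (p = gcpro_chain; p != NULL; p = p->next)
    n++;
  return n;
}

static void
gcpro_model_demo (void)
{
  Obj a = 1, b = 2;
  struct gcpro_model gcpro1, gcpro2;

  MODEL_GCPRO2 (a, b);
  assert (count_protected () == 2);
  assert (gcpro_chain->var == &b && gcpro_chain->next->var == &a);

  MODEL_UNGCPRO;
  assert (count_protected () == 0);
}
```

+In this model, a second @code{MODEL_GCPRO1} in the same function would
+overwrite @code{gcpro1}, so the first variable would drop out of the
+chain; that is the reason a separate two-variable macro is needed.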
+@cindex caller-protects (@code{GCPRO} rule)
+Note also that the general rule is @dfn{caller-protects}; i.e. you are
+only responsible for protecting those Lisp objects that you create.  Any
+objects passed to you as arguments should have been protected by whoever
+created them, so you don't in general have to protect them.
+In particular, the arguments to any Lisp primitive are always
+automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
+bytecode.  So only a few Lisp primitives that are called frequently from
+C code, such as @code{Fprogn}, protect their arguments as a service to
+their caller.  You don't need to protect your arguments when writing a
+new @code{DEFUN}.
+@code{GCPRO}ing is perhaps the trickiest and most error-prone part of
+XEmacs coding.  It is @strong{extremely} important that you get this
+right and use a great deal of discipline when writing this code.
+@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
+What @code{DEFUN} actually does is declare a global structure of type
+@code{Lisp_Subr} whose name begins with capital @samp{SF} and which
+contains information about the primitive (e.g. a pointer to the
+function, its minimum and maximum allowed arguments, a string describing
+its Lisp name); @code{DEFUN} then begins a normal C function declaration
+using the @code{F...} name.  The Lisp subr object that is the function
+definition of a primitive (i.e. the object in the function slot of the
+symbol that names the primitive) actually points to this @samp{SF}
+structure; when @code{Feval} encounters a subr, it looks in the
+structure to find out how to call the C function.
+Defining the C function is not enough to make a Lisp primitive
+available; you must also create the Lisp symbol for the primitive (the
+symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
+object in its function cell. (If you don't do this, the primitive won't
+be seen by Lisp code.) The code looks like this:
+@example
+DEFSUBR (@var{fname});
+@end example
+@noindent
+Here @var{fname} is the same name you used as the second argument to
+@code{DEFUN}.
+This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
+at the end of the module.  If no such function exists, create it and
+make sure to also declare it in @file{symsinit.h} and call it from the
+appropriate spot in @code{main()}.  @xref{General Coding Rules}.
+Note that C code cannot call functions by name unless they are defined
+in C.  The way to call a function written in Lisp from C is to use
+@code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
+the Lisp function @code{funcall} accepts an unlimited number of
+arguments, in C it takes two: the number of Lisp-level arguments, and a
+one-dimensional array containing their values.  The first Lisp-level
+argument is the Lisp function to call, and the rest are the arguments to
+pass to it.  Since @code{Ffuncall} can call the evaluator, you must
+protect pointers from garbage collection around the call to
+@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
+its parameters, so you don't have to protect any pointers passed as
+parameters to it.)
+The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
+provide handy ways to call a Lisp function conveniently with a fixed
+number of arguments.  They work by calling @code{Ffuncall}.
+@file{eval.c} is a very good file to look through for examples;
+@file{lisp.h} contains the definitions for important macros and
+functions.
+@node Writing Good Comments, Adding Global Lisp Variables, Writing Lisp Primitives, Rules When Writing New C Code
+@section Writing Good Comments
+@cindex writing good comments
+@cindex comments, writing good
+Comments are a lifeline for programmers trying to understand tricky
+code.  In general, the less obvious it is what you are doing, the more
+you need a comment, and the more detailed it needs to be.  You should
+always be on guard when you're writing code for stuff that's tricky, and
+should constantly be putting yourself in someone else's shoes and asking
+if that person could figure out without much difficulty what's going
+on. (Assume they are a competent programmer who understands the
+essentials of how the XEmacs code is structured but doesn't know much
+about the module you're working on or any algorithms you're using.) If
+you're not sure whether they would be able to, add a comment.  Always
+err on the side of more comments, rather than fewer.
+Generally, when making comments, there is no need to attribute them with
+your name or initials.  This especially goes for small,
+easy-to-understand, non-opinionated ones.  Also, comments indicating
+where, when, and by whom a file was changed are @emph{strongly}
+discouraged, and in general will be removed as they are discovered.
+This is exactly what @file{ChangeLogs} are there for.  However, it can
+occasionally be useful to mark exactly where (but not when or by whom)
+changes are made, particularly when making small changes to a file
+imported from elsewhere.  These marks help when later on a newer version
+of the file is imported and the changes need to be merged. (If
+everything were always kept in CVS, there would be no need for this.
+But in practice, this often doesn't happen, or the CVS repository is
+later on lost or unavailable to the person doing the update.)
+When putting in an explicit opinion in a comment, you should
+@emph{always} attribute it with your name and the date.  This also goes
+for long, complex comments explaining in detail the workings of
+something -- by putting your name there, you make it possible for
+someone who has questions about how that thing works to determine who
+wrote the comment so they can write to them.  Use your actual name or
+your alias at xemacs.org, and not your initials or nickname, unless that
+is generally recognized (e.g. @samp{jwz}).  Even then, please consider
+requesting a virtual user at xemacs.org (forwarding address; we can't
+provide an actual mailbox).  Otherwise, give first and last name.  If
+you're not a regular contributor, you might consider putting your email
+address in -- it may be in the ChangeLog, but after a while ChangeLogs
+tend to disappear or get muddled.  (E.g. your comment
+may get copied somewhere else or even into another program, and tracking
+down the proper ChangeLog may be very difficult.)
+If you come across an opinion that is not or is no longer valid, or you
+come across any comment that no longer applies but you want to keep it
+around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment
+afterwards explaining why the preceding comment is no longer valid.  Put
+your name on this comment, as explained above.
+Just as comments are a lifeline to programmers, incorrect comments are
+death.  If you come across an incorrect comment, @strong{immediately}
+correct it or flag it as incorrect, as described in the previous
+paragraph.  Whenever you work on a section of code, @emph{always} make
+sure to update any comments to be correct -- or, at the very least, flag
+them as incorrect.
+To indicate a ``todo'' or other problem, use four pound signs --
+i.e. @samp{####}.
+@node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code
+@section Adding Global Lisp Variables
+@cindex global Lisp variables, adding
+@cindex variables, adding global Lisp
+Global variables whose names begin with @samp{Q} are constants whose
+value is a symbol of a particular name.  The name of the variable should
+be derived from the name of the symbol using the same rules as for Lisp
+primitives.  These variables are initialized using a call to
+@code{defsymbol()} in the @code{syms_of_*()} function. (This call
+interns a symbol, sets the C variable to the resulting Lisp object, and
+calls @code{staticpro()} on the C variable to tell the
+garbage-collection mechanism about this variable.  What
+@code{staticpro()} does is add a pointer to the variable to a large
+global array; when garbage-collection happens, all pointers listed in
+the array are used as starting points for marking Lisp objects.  This is
+important because it's quite possible that the only current reference to
+the object is the C variable.  In the case of symbols, the
+@code{staticpro()} doesn't matter all that much because the symbol is
+contained in @code{obarray}, which is itself @code{staticpro()}ed.
+However, it's possible that a naughty user could do something like
+uninterning the symbol out of @code{obarray} or even setting
+@code{obarray} to a different value [although this is likely to make
+XEmacs crash!].)
+@strong{Please note:} It is potentially deadly if you declare a
+@samp{Q...}  variable in two different modules.  The two calls to
+@code{defsymbol()} are no problem, but some linkers will complain about
+multiply-defined symbols.  The most insidious aspect of this is that
+often the link will succeed anyway, but then the resulting executable
+will sometimes crash in obscure ways during certain operations!
+To avoid this problem, declare any symbols with common names (such as
+@code{text}) that are not obviously associated with this particular
+module in the file @file{general-slots.h}.  The ``-slots'' suffix
+indicates that this is a file that is included multiple times in
+@file{general.c}.  Redefinition of preprocessor macros allows the
+effects to be different in each context, so this is actually more
+convenient and less error-prone than doing it in your module.
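+The technique of expanding one list of symbols several times under
+different macro definitions can be sketched in a single file.  This is
+a hypothetical variant: the list is held in the macro
+@code{EXAMPLE_SYMBOLS} rather than in a separate header that is
+included repeatedly, and plain C strings stand in for Lisp symbols.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-file variant: the list lives in a macro instead
   of a header that is included several times. */
#define EXAMPLE_SYMBOLS(X)  \
  X (Qtext, "text")         \
  X (Qvalue, "value")

/* First expansion: define one C variable per entry. */
#define DEFINE_VAR(var, name) static const char *var;
EXAMPLE_SYMBOLS (DEFINE_VAR)
#undef DEFINE_VAR

/* Second expansion: give each variable its value. */
static void
init_example_symbols (void)
{
#define INIT_VAR(var, name) var = name;
  EXAMPLE_SYMBOLS (INIT_VAR)
#undef INIT_VAR
}

static void
example_symbols_demo (void)
{
  init_example_symbols ();
  assert (strcmp (Qtext, "text") == 0);
  assert (strcmp (Qvalue, "value") == 0);
}
```

+Because each name is written once in the list, its definition and its
+initialization cannot get out of step, and no second module can end up
+defining the same variable.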
+Global variables whose names begin with @samp{V} are variables that
+contain Lisp objects.  The convention here is that all global variables
+of type @code{Lisp_Object} begin with @samp{V}, and all others don't
+(including integer and boolean variables that have Lisp
+equivalents). Most of the time, these variables have equivalents in
+Lisp, but some don't.  Those that do are declared this way by a call to
+@code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
+module.  What this does is create a special @dfn{symbol-value-forward}
+Lisp object that contains a pointer to the C variable, intern a symbol
+whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
+its value to the symbol-value-forward Lisp object; it also calls
+@code{staticpro()} on the C variable to tell the garbage-collection
+mechanism about the variable.  When @code{eval} (or actually
+@code{symbol-value}) encounters this special object in the process of
+retrieving a variable's value, it follows the indirection to the C
+variable and gets its value.  @code{setq} does similar things so that
+the C variable gets changed.
+Whether or not you @code{DEFVAR_LISP()} a variable, you need to
+initialize it in the @code{vars_of_*()} function; otherwise it will end
+up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
+this is probably not what you want.  Also, if the variable is not
+@code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
+C variable in the @code{vars_of_*()} function.  Otherwise, the
+garbage-collection mechanism won't know that the object in this variable
+is in use, and will happily collect it and reuse its storage for another
+Lisp object, and you will be the one who's unhappy when you can't figure
+out how your variable got overwritten.
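+The reason @code{staticpro()} takes the address of the variable, rather
+than its current value, shows up clearly in a toy model.  The code
+below is an assumption-laden sketch: @code{model_staticpro},
+@code{root_vars}, @code{is_root} and @code{Vexample} are hypothetical,
+and a plain @code{long} stands in for @code{Lisp_Object}.

```c
#include <assert.h>
#include <stddef.h>

typedef long Obj;   /* hypothetical stand-in for Lisp_Object */

#define MAX_ROOTS 16
static Obj *root_vars[MAX_ROOTS];   /* addresses of registered variables */
static int n_root_vars;

/* Register the variable itself, not the value it holds right now. */
static void
model_staticpro (Obj *var)
{
  assert (n_root_vars < MAX_ROOTS);
  root_vars[n_root_vars++] = var;
}

/* Stand-in for the start of the mark phase: report whether OBJ is
   currently held by any registered variable. */
static int
is_root (Obj obj)
{
  int i;

  for (i = 0; i < n_root_vars; i++)
    if (*root_vars[i] == obj)
      return 1;
  return 0;
}

static Obj Vexample;   /* hypothetical global holding an object */

static void
staticpro_model_demo (void)
{
  Vexample = 100;
  model_staticpro (&Vexample);
  Vexample = 200;             /* reassigned after registration */

  assert (is_root (200));     /* the current value is found ... */
  assert (!is_root (100));    /* ... the old one no longer is */
}
```

+A variable that is never registered is invisible to @code{is_root},
+which corresponds to the object being collected while the C variable
+still refers to it.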
+@node Writing Macros, Proper Use of Unsigned Types, Adding Global Lisp Variables, Rules When Writing New C Code
+@section Writing Macros
+@cindex writing macros
+@cindex macros, writing
+The three golden rules of macros:
+@enumerate
+@item
+Anything that's an lvalue can be evaluated more than once.
+@item
+Macros where anything else can be evaluated more than once should
+have the word ``unsafe'' in their name (exceptions may be made for
+large sets of macros that evaluate arguments of certain types more
+than once, e.g. @code{struct buffer *} arguments, when clearly indicated in
+the macro documentation).  These macros are generally meant to be
+called only by other macros that have already stored the calling
+values in temporary variables.
+@item
+Nothing else can be evaluated more than once.  Use inline
+functions, if necessary, to prevent multiple evaluation.
+@end enumerate
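+A short sketch of rules 2 and 3, using hypothetical names
+(@code{MAX_UNSAFE}, @code{max_int}, @code{next_value}) that are not
+taken from the XEmacs sources:

```c
#include <assert.h>

/* Evaluates one of its arguments twice, so the name says "unsafe". */
#define MAX_UNSAFE(a, b) ((a) > (b) ? (a) : (b))

/* Inline wrapper: each argument expression is evaluated exactly once
   at the call, and only plain variables reach the unsafe macro. */
static inline int
max_int (int a, int b)
{
  return MAX_UNSAFE (a, b);
}

static int next_calls;   /* counts calls to next_value() */

static int
next_value (void)
{
  return ++next_calls;
}

static void
max_demo (void)
{
  int r;

  next_calls = 0;
  r = max_int (next_value (), 0);
  assert (r == 1 && next_calls == 1);   /* evaluated once */

  next_calls = 0;
  r = MAX_UNSAFE (next_value (), 0);
  assert (r == 2 && next_calls == 2);   /* evaluated twice */
}
```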
+NOTE: The functions and macros below are given full prototypes in their
+docs, even when the implementation is a macro.  In such cases, passing
+an argument of a type other than expected will produce undefined
+results.  Also, given that macros can do things functions can't (in
+particular, directly modify arguments as if they were passed by
+reference), the declaration syntax has been extended to include the
+call-by-reference syntax from C++, where an @samp{&} after a type indicates
+that the argument is an lvalue and is passed by reference, i.e. the
+function can modify its value. (This is equivalent in C to passing a
+pointer to the argument, but without the need to explicitly worry about
+pointers.)
+When to capitalize macros:
+@itemize @bullet
+@item
+Capitalize macros doing stuff obviously impossible with (C)
+functions, e.g. directly modifying arguments as if they were passed by
+reference.
+@item
+Capitalize macros that evaluate @strong{any} argument more than once regardless
+of whether that's ``allowed'' (e.g. buffer arguments).
+@item
+Capitalize macros that directly access a field in a @code{Lisp_Object} or
+its equivalent underlying structure.  In such cases, the macro that accesses
+the field through the @code{Lisp_Object} has a name beginning with @samp{X},
+and the one that accesses it through the underlying structure doesn't.
+@item
+Capitalize certain other basic macros relating to @code{Lisp_Object}s; e.g.
+@code{FRAMEP}, @code{CHECK_FRAME}, etc.
+@item
+Try to avoid capitalizing any other macros.
+@end itemize
+@node Proper Use of Unsigned Types, Techniques for XEmacs Developers, Writing Macros, Rules When Writing New C Code
+@section Proper Use of Unsigned Types
+@cindex unsigned types, proper use of
+@cindex types, proper use of unsigned
+Avoid using @code{unsigned int} and @code{unsigned long} whenever
+possible.  Unsigned types are viral -- in any arithmetic or comparison
+that mixes signed and unsigned operands of the same width, the signed
+operand is automatically converted to unsigned, which is almost
+certainly not what you want.  Many subtle and
+hard-to-find bugs are created by careless use of unsigned types.  In
+general, you should almost @emph{never} use an unsigned type to hold a
+regular quantity of any sort.  The only exceptions are:
+@enumerate
+@item
+When there's a reasonable possibility you will actually need all 32 or
+64 bits to store the quantity.
+@item
+When calling existing APIs that require unsigned types.  In this case,
+you should still do all manipulation using signed types, and do the
+conversion at the very threshold of the API call.
+@item
+In existing code that you don't want to modify because you don't
+maintain it.
+@item
+In bit-field structures.
+@end enumerate
+Other reasonable uses of @code{unsigned int} and @code{unsigned long}
+are representing non-quantities -- e.g. bit-oriented flags and such.
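+The following small, self-contained program is a sketch of the kind of
+surprise described above, and of converting at the API threshold.  The
+variable names are invented for illustration, and @code{strlen} stands
+in for any existing API that deals in an unsigned type:
+@example
+#include <stdio.h>
+#include <string.h>
+int
+main (void)
+@{
+  int delta = -1;
+  unsigned int limit = 10;
+  const char *text = "hello";
+  int len, remaining;
+  /* delta is converted to unsigned int and becomes a huge value,
+     so this comparison is false even though -1 < 10.  */
+  if (delta < limit)
+    printf ("as expected\n");
+  else
+    printf ("surprise: -1 is not less than 10u\n");
+  /* Convert once, at the API boundary, then stay signed.  */
+  len = (int) strlen (text);
+  remaining = len - 8;        /* -3, not a huge positive number */
+  printf ("remaining = %d\n", remaining);
+  return 0;
+@}
+@end example
+Most compilers will warn about the mixed comparison if asked (for
+example, gcc with @samp{-Wsign-compare}).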
+@node Techniques for XEmacs Developers,  , Proper Use of Unsigned Types, Rules When Writing New C Code
+@section Techniques for XEmacs Developers
+@cindex techniques for XEmacs developers
+@cindex developers, techniques for XEmacs
+@cindex Purify
+@cindex Quantify
+To make a purified XEmacs, do: @code{make puremacs}.
+To make a quantified XEmacs, do: @code{make quantmacs}.
+You simply can't dump Quantified and Purified images (unless using the
+portable dumper).  Purify gets confused when XEmacs frees memory in one
+process that was allocated in a @emph{different} process on a different
+machine!  Instead, run @file{temacs} directly, like so:
+@example
+temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
+@end example
+@cindex error checking
+Before you go through the trouble, are you compiling with all
+debugging and error-checking off?  If not, try that first.  Be warned
+that while Quantify is directly responsible for quite a few
+optimizations which have been made to XEmacs, doing a run which
+generates results which can be acted upon is not necessarily a trivial
+task.
+Also, if you're still willing to do some runs make sure you configure
+with the @samp{--quantify} flag.  That will keep Quantify from starting
+to record data until after the loadup is completed and will shut off
+recording right before it shuts down (which generates enough bogus data
+to throw most results off).  It also enables three additional elisp
+commands: @code{quantify-start-recording-data},
+@code{quantify-stop-recording-data} and @code{quantify-clear-data}.
+If you want to make XEmacs faster, target your favorite slow benchmark,
+run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
+out where the cycles are going.  In many cases you can localize the
+problem (because a particular new feature or even a single patch
+elicited it).  Don't hesitate to use brute force techniques like a
+global counter incremented at strategic places, especially in
+combination with other performance indications (@emph{e.g.}, degree of
+buffer fragmentation into extents).
+Specific projects:
+@itemize @bullet
+@item
+Make the garbage collector faster.  Figure out how to write an
+incremental garbage collector.
+@item
+Write a compiler that takes bytecode and spits out C code.
+Unfortunately, you will then need a C compiler and a more fully
+developed module system.
+@item
+Speed up redisplay.
+@item
+Speed up syntax highlighting.  It was suggested that ``maybe moving some
+of the syntax highlighting capabilities into C would make a
+difference.''  Wrong idea, I think.  When processing one 400kB file a
+particular low-level routine was being called 40 @emph{million} times
+simply for @emph{one} call to @code{newline-and-indent}.  Syntax
+highlighting needs to be rewritten to use a reliable, fast parser, then
+to trust the pre-parsed structure, and only do re-highlighting locally
+to a text change.  Modern machines are fast enough to implement such
+parsers in Lisp; but no machine will ever be fast enough to deal with
+quadratic (or worse) algorithms!
+@item
+Implement tail recursion in Emacs Lisp (hard!).
+@end itemize
+Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
+calls in elisp are especially expensive.  Iterating over a long list is
+going to be 30 times faster when implemented in C than in Elisp.
+Heavily used small code fragments need to be fast.  The traditional way
+to implement such code fragments in C is with macros.  But macros in C
+are known to be broken.
+@cindex macro hygiene
+Macro arguments that are repeatedly evaluated may suffer from repeated
+side effects or suboptimal performance.
+Variable names used in macros may collide with the caller's variables,
+causing (at least) unwanted compiler warnings.
+In order to solve these problems, and maintain statement semantics, one
+should use the @code{do @{ ... @} while (0)} trick while trying to
+reference macro arguments exactly once using local variables.
+Let's take a look at this poor macro definition:
+@example
+#define MARK_OBJECT(obj) \
+if (!marked_p (obj)) mark_object (obj), did_mark = 1
+@end example
+This macro evaluates its argument twice, and also fails if used like
+this, because the @code{else} binds to the @code{if} hidden inside the macro:
+@example
+if (flag) MARK_OBJECT (obj); else do_something ();
+@end example
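+To see why, here is roughly what the compiler sees once the poor
+definition has been expanded (reindented for clarity).  The @code{else}
+pairs with the inner @code{if}, so @code{do_something} runs when
+@code{flag} is true and the object is already marked, and never runs
+when @code{flag} is false:
+@example
+if (flag)
+  if (!marked_p (obj))
+    mark_object (obj), did_mark = 1;
+  else
+    do_something ();
+@end example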
+A much better definition is
+@example
+#define MARK_OBJECT(obj) do @{ \
+Lisp_Object mo_obj = (obj); \
+if (!marked_p (mo_obj))     \
+@{                         \
+mark_object (mo_obj);   \
+did_mark = 1;           \
+@}                         \
+@} while (0)
+@end example
+Notice the elimination of double evaluation by using the local variable
+with the obscure name.  Writing safe and efficient macros requires great
+care.  The one problem with macros that cannot be portably worked around
+is this: since a C block has no value, a macro used as an expression
+rather than as a statement cannot use the techniques just described to
+avoid multiple evaluation.
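+As a sketch of the alternative (the names here are hypothetical, not
+taken from the XEmacs sources), compare an expression macro, which must
+mention its argument twice, with an inline function, which evaluates it
+exactly once:
+@example
+/* Expression macro: (x) appears twice, so SQUARE_UNSAFE (i++)
+   would modify i twice, which is undefined behavior.  */
+#define SQUARE_UNSAFE(x) ((x) * (x))
+/* Inline function: the argument is evaluated once, at the call.  */
+static inline int
+square (int x)
+@{
+  return x * x;
+@}
+@end example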
+@cindex inline functions
+In most cases where a macro has function semantics, an inline function
+is a better implementation technique.  Modern compiler optimizers tend
+to inline functions even if they have no @code{inline} keyword, and
+configure magic ensures that the @code{inline} keyword can be safely
+used as an additional compiler hint.  Inline functions used in a single
+@file{.c} file are easy.  The function must already be defined to be
+@code{static}.  Just add the @code{inline} keyword to the
+definition.
+@example
+static inline int
+heavily_used_small_function (int arg)
+@{
+...
+@}
+@end example
+Inline functions in header files are trickier, because we would like to
+make the following optimization if the function is @emph{not} inlined
+(for example, because we're compiling for debugging).  We would like the
+function to be defined externally exactly once, and each calling
+translation unit would create an external reference to the function,
+instead of including a definition of the inline function in the object
+code of every translation unit that uses it.  This optimization is
+currently only available for gcc.  But you don't have to worry about the
+trickiness; just define your inline functions in header files using this
+pattern:
+@example
+DECLARE_INLINE_HEADER (
+int
+i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
+)
+@{
+...
+@}
+@end example
+We use @code{DECLARE_INLINE_HEADER} rather than just the modifier
+@code{INLINE_HEADER} to prevent warnings when compiling with @code{gcc
+-Wmissing-declarations}.  I consider issuing this warning for inline
+functions a gcc bug, but the gcc maintainers disagree.
+@cindex inline functions, headers
+@cindex header files, inline functions
+Every header which contains inline functions, either directly by using
+@code{DECLARE_INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
+be added to @file{inline.c}'s includes to make the optimization
+described above work.  (Optimization note: if all @code{INLINE_HEADER}
+functions are in fact inlined in all translation units, then the linker
+can just discard @code{inline.o}, since it contains only unreferenced code.)
+To get started debugging XEmacs, take a look at the @file{.gdbinit} and
+@file{.dbxrc} files in the @file{src} directory.  See the section in the
+XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
+After making source code changes, run @code{make check} to ensure that
+you haven't introduced any regressions.  If you want to make XEmacs more
+reliable, please improve the test suite in @file{tests/automated}.
+Did you make sure you didn't introduce any new compiler warnings?
+Before submitting a patch, please try compiling at least once with
+@example
+configure --with-mule --use-union-type --error-checking=all
+@end example
+Here are things to know when you create a new source file:
+@itemize @bullet
+@item
+All @file{.c} files should @code{#include <config.h>} first.  Almost all
+@file{.c} files should @code{#include "lisp.h"} second.
+@item
+Generated header files should be included using the @samp{#include <...>}
+syntax, not the @samp{#include "..."} syntax.  The generated headers are:
+@file{config.h sheap-adjust.h paths.h Emacs.ad.h}
+The basic rule is that you should assume the build may use
+@samp{--srcdir}, so the @samp{#include <...>} syntax needs to be used
+whenever the to-be-included generated file is in a potentially different
+directory @emph{at compile time}.  The non-obvious C rule is that
+@samp{#include "..."} means to search for the included file in the same
+directory as the including file, @emph{not} in the current directory.
+Normally this is not a problem but when building with @samp{--srcdir},
+@file{make} will search the @samp{VPATH} for you, while the C compiler
+knows nothing about it.
+@item
+Header files should @emph{not} include @samp{<config.h>} and
+@samp{"lisp.h"}.  It is the responsibility of the @file{.c} files that
+use them to do so.
+@end itemize
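+Putting these rules together, the top of a new source file might look
+like the following sketch, where @file{foo.c} and @file{foo.h} are
+placeholder names:
+@example
+/* foo.c -- hypothetical new source file.  */
+#include <config.h>   /* generated header: always first, with <...> */
+#include "lisp.h"     /* almost always second */
+#include "foo.h"      /* this file's own header; it includes neither
+                         config.h nor lisp.h itself */
+@end example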
+@cindex Lisp object types, creating
+@cindex creating Lisp object types
+@cindex object types, creating Lisp
+Here is a checklist of things to do when creating a new lisp object type
+named @var{foo}:
+@enumerate
+@item
+create @var{foo}.h
+@item
+create @var{foo}.c
+@item
+add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
+@item
+add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
+@item
+add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
+@item
+add definitions of macros like @code{CHECK_@var{FOO}} and
+@code{@var{FOO}P} to @file{@var{foo}.h}
+@item
+add the new type index to @code{enum lrecord_type}
+@item
+add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
+@item
+add an @code{INIT_LRECORD_IMPLEMENTATION} call to @code{syms_of_@var{foo}}
+@end enumerate
+@node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top
+@chapter Regression Testing XEmacs
+@cindex testing, regression
+@menu
+* How to Regression-Test::
+* Modules for Regression Testing::
+@end menu
+@node How to Regression-Test, Modules for Regression Testing, Regression Testing XEmacs, Regression Testing XEmacs
+@section How to Regression-Test
+@cindex how to regression-test
+@cindex regression-test, how to
+@cindex testing, regression, how to
+The source directory @file{tests/automated} contains XEmacs' automated
+test suite.  The usual way of running all the tests is running
+@code{make check} from the top-level build directory.
+The test suite is unfinished and it's still lacking some essential
+features.  It is nevertheless recommended that you run the tests to
+confirm that XEmacs behaves correctly.
+If you want to run a specific test case, you can do it from the
+command-line like this:
+@example
+$ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE
+@end example
+If a test fails and you need more information, you can run the test
+suite interactively by loading @file{test-harness.el} into a running
+XEmacs and typing @kbd{M-x test-emacs-test-file RET <filename> RET}.
+You will see a log of passed and failed tests, which should allow you to
+investigate the source of the error and ultimately fix the bug.  If you
+are not capable of, or don't have time for, debugging it yourself,
+please do report the failures using @kbd{M-x report-emacs-bug} or
+@kbd{M-x build-report}.
+@deffn Command test-emacs-test-file file
+Runs the tests in @var{file}.  @file{test-harness.el} must be loaded.
+Defines all the macros described in this node, and undefines them when
+done.
+@end deffn
+Adding a new test file is trivial: just create a new file in
+@file{tests/automated} and it will be run.  There is no need to
+byte-compile any of the files in
+this directory---the test-harness will take care of any necessary
+byte-compilation.
+Look at the existing test cases for examples of how to write tests.
+It all boils down to your imagination and judicious use of the macros
+@code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and
+@code{Check-Message}.  Note that all of these macros are defined only
+for the duration of the test: they do not exist in the global
+environment.
+@deffn Macro Assert expr
+Check that @var{expr} is non-nil at this point in the test.
+@end deffn
+@deffn Macro Check-Error expected-error body
+Check that execution of @var{body} causes @var{expected-error} to be
+signaled.  @var{body} is a @code{progn}-like body, and may contain
+several expressions.  @var{expected-error} is a symbol defined as
+an error by @code{define-error}.
+@end deffn
+@deffn Macro Check-Error-Message expected-error expected-error-regexp body
+Check that execution of @var{body} causes @var{expected-error} to be
+signaled, and generates a message matching @var{expected-error-regexp}.
+@var{body} is a @code{progn}-like body, and may contain several
+expressions.  @var{expected-error} is a symbol defined as an error
+by @code{define-error}.
+@end deffn
+@deffn Macro Check-Message expected-message body
+Check that execution of @var{body} causes @var{expected-message} to be
+generated (using @code{message} or a similar function).  @var{body} is a
+@code{progn}-like body, and may contain several expressions.
+@end deffn
+Here's a simple example checking case-sensitive and case-insensitive
+comparisons from @file{case-tests.el}.
+@example
+(with-temp-buffer
+(insert "Test Buffer")
+(let ((case-fold-search t))
+(goto-char (point-min))
+(Assert (eq (search-forward "test buffer" nil t) 12))
+(goto-char (point-min))
+(Assert (eq (search-forward "Test buffer" nil t) 12))
+(goto-char (point-min))
+(Assert (eq (search-forward "Test Buffer" nil t) 12))
+(setq case-fold-search nil)
+(goto-char (point-min))
+(Assert (not (search-forward "test buffer" nil t)))
+(goto-char (point-min))
+(Assert (not (search-forward "Test buffer" nil t)))
+(goto-char (point-min))
+(Assert (eq (search-forward "Test Buffer" nil t) 12))))
+@end example
+This example could be saved in a file in @file{tests/automated}, and it
+would constitute a complete test, automatically executed when you run
+@kbd{make check} after building XEmacs.  More complex tests may require
+substantial temporary scaffolding to create the environment that elicits
+the bugs, but the top-level @file{Makefile} and @file{test-harness.el}
+handle the running and collection of results from the @code{Assert},
+@code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message}
+macros.
+Don't suppress tests just because they're due to known bugs not yet
+fixed---use the @code{Known-Bug-Expect-Failure} wrapper macro to mark
+them.
+@deffn Macro Known-Bug-Expect-Failure body
+Arrange for failing tests in @var{body} to generate messages prefixed
+with "KNOWN BUG:" instead of "FAIL:".  @var{body} is a @code{progn}-like
+body, and may contain several tests.
+@end deffn
+A lot of the tests we run push limits; suppress Ebola warning messages
+with the @code{Ignore-Ebola} wrapper macro.
+@deffn Macro Ignore-Ebola body
+Suppress Ebola warning messages while running tests in @var{body}.
+@var{body} is a @code{progn}-like body, and may contain several tests.
+@end deffn
+Both macros are defined temporarily within the test function.  Simple
+examples:
+@example
+;; Apparently Ignore-Ebola is a solution with no problem to address.
+;; There are no examples in 21.5, anyway.
+;; from regexp-tests.el
+(Known-Bug-Expect-Failure
+(Assert (not (string-match "\\b" "")))
+(Assert (not (string-match " \\b" " "))))
+@end example
+In general, you should avoid using functionality from packages in your
+tests, because you can't be sure that everyone will have the required
+package.  However, if you've got a test that works, by all means add it.
+Simply wrap the test in an appropriate feature check, add a notice that the test
+was skipped, and update the @code{skipped-test-reasons} hashtable.  The
+wrapper macro @code{Skip-Test-Unless} is provided to handle common
+cases.
+@defvar skipped-test-reasons
+Hash table counting the number of times a particular reason is given for
+skipping tests.  This is only defined within @code{test-emacs-test-file}.
+@end defvar
+@deffn Macro Skip-Test-Unless prerequisite reason description body
+@var{prerequisite} is usually a feature test (@code{featurep},
+@code{boundp}, @code{fboundp}).  @var{reason} is a string describing the
+prerequisite; it must be unique because it is used as a hash key in a
+table of reasons for skipping tests.  @var{description} describes the
+tests being skipped, for the test result summary.  @var{body} is a
+@code{progn}-like body, and may contain several tests.
+@end deffn
+@code{Skip-Test-Unless} is defined temporarily within the test function.
+Here's an example of usage from @file{syntax-tests.el}:
+@example
+;; Test forward-comment at buffer boundaries
+(with-temp-buffer
+;; try to use exactly what you need: featurep, boundp, fboundp
+(Skip-Test-Unless (fboundp 'c-mode)
+"c-mode unavailable"
+"comment and parse-partial-sexp tests"
+;; and here's the test code
+(c-mode)
+(insert "// comment\n")
+(forward-comment -2)
+(Assert (eq (point) (point-min)))
+(let ((point (point)))
+(insert "/* comment */")
+(goto-char point)
+(forward-comment 2)
+(Assert (eq (point) (point-max)))
+(parse-partial-sexp point (point-max)))))
+@end example
+@code{Skip-Test-Unless} is intended for use with features that are normally
+present in typical configurations.  For truly optional features, or
+tests that apply to one of several alternative implementations (e.g., to
+GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply
+silently suppress the test if the feature is not available.
+Here are a few general hints for writing tests.
+@enumerate
+@item
+Include related successful cases.  Fixes often break something.
+@item
+Use the @code{Known-Bug-Expect-Failure} macro to mark the cases you know
+are going to fail.  We want to be able to distinguish between
+regressions and other unexpected failures, and cases that have
+been (partially) analyzed but not yet repaired.
+@item
+Mark the bug with the date of report.  An ``Unfixed since yyyy-mm-dd''
+gloss for @code{Known-Bug-Expect-Failure} is planned to further increase
+developer embarrassment (== incentive to fix the bug), but until then at
+least put a comment about the date so we can easily see when it was
+first reported.
+@item
+It's a matter of your judgement, but you should often use generic tests
+(@emph{e.g.}, @code{eq}) instead of more specific tests (@code{=} for
+numbers) even though you know that arguments ``should'' be of correct
+type.  That is, if the functions used can return generic objects
+(typically @code{nil}), as well as some more specific type that will be
+returned on success.  We don't want failures of those assertions
+reported as ``other failures'' (a wrong-type-arg signal, rather than a
+null return), we want them reported as ``assertion failures.''
+One example is a test that tests @code{(= (string-match this that) 0)},
+expecting a successful match.  Now suppose @code{string-match} is broken
+such that the match fails.  Then it will return @code{nil}, and @code{=}
+will signal ``wrong-type-argument, number-char-or-marker-p, nil'',
+generating an ``other failure'' in the report.  But this should be
+reported as an assertion failure (the test failed in a foreseeable way),
+rather than something else (we don't know what happened because XEmacs
+is broken in a way that we weren't trying to test!).
+@end enumerate
+@node Modules for Regression Testing,  , How to Regression-Test, Regression Testing XEmacs
+@section Modules for Regression Testing
+@cindex modules for regression testing
+@cindex regression testing, modules for
+@example
+@file{test-harness.el}
+@file{base64-tests.el}
+@file{byte-compiler-tests.el}
+@file{case-tests.el}
+@file{ccl-tests.el}
+@file{c-tests.el}
+@file{database-tests.el}
+@file{extent-tests.el}
+@file{hash-table-tests.el}
+@file{lisp-tests.el}
+@file{md5-tests.el}
+@file{mule-tests.el}
+@file{regexp-tests.el}
+@file{symbol-tests.el}
+@file{syntax-tests.el}
+@file{tag-tests.el}
+@file{weak-tests.el}
+@end example
+@file{test-harness.el} defines the macros @code{Assert},
+@code{Check-Error}, @code{Check-Error-Message}, and
+@code{Check-Message}.  The other files are test files, testing various
+XEmacs facilities.  @xref{Regression Testing XEmacs}.
+@node CVS Techniques, XEmacs from the Inside, Regression Testing XEmacs, Top
+@chapter CVS Techniques
+@cindex CVS techniques
+@menu
+* Merging a Branch into the Trunk::
+@end menu
+@node Merging a Branch into the Trunk,  , CVS Techniques, CVS Techniques
+@section Merging a Branch into the Trunk
+@cindex merging a branch into the trunk
+@enumerate
+@item
+If you haven't already done a merge, you will be merging from the branch
+point; otherwise you'll be merging from the last merge point, which
+should be marked by a tag, e.g. @samp{last-sync-ben-mule-21-5}.  In the
+former case, create the last-sync tag, e.g.
+@example
+crw rtag -r ben-mule-21-5-bp last-sync-ben-mule-21-5 xemacs
+@end example
+(You did create a branch point tag when you created the branch, didn't
+you?)
+@item
+Check everything in on your branch.
+@item
+Tag your branch with a pre-sync tag, e.g.
+@example
+crw rtag -r ben-mule-21-5 ben-mule-21-5-pre-feb-20-2002-sync xemacs
+@end example
+Note, you need to use rtag and specify a version with @samp{-r} (use
+@samp{-r HEAD} if necessary) so that removed files are handled correctly
+in some obscure cases.  See section 4.8 of the CVS manual.
+@item
+Tag the trunk so you have a stable place to merge up to in case people
+are asynchronously committing to the trunk, e.g.
+@example
+crw rtag -r HEAD main-branch-ben-mule-21-5-syncpoint-feb-20-2002 xemacs
+crw rtag -F -r main-branch-ben-mule-21-5-syncpoint-feb-20-2002 next-sync-ben-mule-21-5 xemacs
+@end example
+Use -F in the second case because the name might already exist, e.g. if
+you've already done a merge.  We make two tags because one is a
+permanent mark indicating a syncpoint when merging, and the other is a
+symbolic tag to make other operations easier.
+@item
+Make a backup of your source tree (not totally necessary but useful for
+reference and peace of mind): Move one level up from the top directory
+of your branch and do, e.g.
+@example
+cp -a mule mule-backup-2-23-02
+@end example
+@item
+Now, we're ready to merge!  Make sure you're in the top directory of
+your branch and do, e.g.
+@example
+cvs update -j last-sync-ben-mule-21-5 -j next-sync-ben-mule-21-5
+@end example
+@item
+Fix all merge conflicts.  Get the sucker to compile and run.
+@item
+Tag your branch with a post-sync tag, e.g.
+@example
+crw rtag -r ben-mule-21-5 ben-mule-21-5-post-feb-20-2002-sync xemacs
+@end example
+@item
+Update the last-sync tag, e.g.
+@example
+crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs
+@end example
+@end enumerate
+@node XEmacs from the Inside, The XEmacs Object System (Abstractly Speaking), CVS Techniques, Top
+@chapter XEmacs from the Inside
+@cindex XEmacs from the inside
+@cindex inside, XEmacs from the
+Internally, XEmacs is quite complex, and can be very confusing.  To
+simplify things, it can be useful to think of XEmacs as containing an
+event loop that ``drives'' everything, and a number of other subsystems,
+such as a Lisp engine and a redisplay mechanism.  Each of these other
+subsystems exists simultaneously in XEmacs, and each has a certain
+state.  The flow of control continually passes in and out of these
+different subsystems in the course of normal operation of the editor.
+It is important to keep in mind that, most of the time, the editor is
+``driven'' by the event loop.  Except during initialization and batch
+mode, all subsystems are entered directly or indirectly through the
+event loop, and ultimately, control exits out of all subsystems back up
+to the event loop.  This cycle of entering a subsystem, exiting back out
+to the event loop, and starting another iteration of the event loop
+occurs once each keystroke, mouse motion, etc.
+If you're trying to understand a particular subsystem (other than the
+event loop), think of it as a ``daemon'' process or ``servant'' that is
+responsible for one particular aspect of a larger system, and
+periodically receives commands or environment changes that cause it to
+do something.  Ultimately, these commands and environment changes are
+always triggered by the event loop.  For example:
+@itemize @bullet
+@item
+The window and frame mechanism is responsible for keeping track of what
+windows and frames exist, what buffers are in them, etc.  It is
+periodically given commands (usually from the user) to make a change to
+the current window/frame state: i.e. create a new frame, delete a
+window, etc.
+@item
+The buffer mechanism is responsible for keeping track of what buffers
+exist and what text is in them.  It is periodically given commands
+(usually from the user) to insert or delete text, create a buffer, etc.
+When it receives a text-change command, it notifies the redisplay
+mechanism.
+@item
+The redisplay mechanism is responsible for making sure that windows and
+frames are displayed correctly.  It is periodically told (by the event
+loop) to actually ``do its job'', i.e. snoop around and see what the
+current state of the environment (mostly of the currently-existing
+windows, frames, and buffers) is, and make sure that state matches
+what's actually displayed.  It keeps lots and lots of information around
+(such as what is actually being displayed currently, and what the
+environment was last time it checked) so that it can minimize the work
+it has to do.  It is also helped along in that whenever a relevant
+change to the environment occurs, the redisplay mechanism is told about
+this, so it has a pretty good idea of where it has to look to find
+possible changes and doesn't have to look everywhere.
+@item
+The Lisp engine is responsible for executing the Lisp code in which most
+user commands are written.  It is entered through a call to @code{eval}
+or @code{funcall}, which occurs as a result of dispatching an event from
+the event loop.  The functions it calls issue commands to the buffer
+mechanism, the window/frame subsystem, etc.
+@item
+The Lisp allocation subsystem is responsible for keeping track of Lisp
+objects.  It is given commands from the Lisp engine to allocate objects,
+garbage collect, etc.
+@end itemize
+etc.
+The important idea here is that there are a number of independent
+subsystems each with its own responsibility and persistent state, just
+like different employees in a company, and each subsystem is
+periodically given commands from other subsystems.  Commands can flow
+from any one subsystem to any other, but there is usually some sort of
+hierarchy, with all commands originating from the event subsystem.
+XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
+this is called the first time (in a properly-invoked @file{temacs}), it
+does the following:
+@enumerate
+@item
+It does some very basic environment initializations, such as determining
+where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
+and setting up signal handlers.
+@item
+It initializes the entire Lisp interpreter.
+@item
+It sets the initial values of many built-in variables (including many
+variables that are visible to Lisp programs), such as the global keymap
+object and the built-in faces (a face is an object that describes the
+display characteristics of text).  This involves creating Lisp objects
+and thus is dependent on step (2).
+@item
+It performs various other initializations that are relevant to the
+particular environment it is running in, such as retrieving environment
+variables, determining the current date and the user who is running the
+program, examining its standard input, creating any necessary file
+descriptors, etc.
+@item
+At this point, the C initialization is complete.  A Lisp program that
+was specified on the command line (usually @file{loadup.el}) is called
+(temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
+@file{loadup.el} loads all of the other Lisp files that are needed for
+the operation of the editor, calls the @code{dump-emacs} function to
+write out @file{xemacs}, and then kills the temacs process.
+@end enumerate
+When @file{xemacs} is then run, it only redoes steps (1) and (4)
+above; all variables already contain the values they were set to when
+the executable was dumped, and all memory that was allocated with
+@code{malloc()} is still around. (XEmacs knows whether it is being run
+as @file{xemacs} or @file{temacs} because it sets the global variable
+@code{initialized} to 1 after step (4) above.) At this point,
+@file{xemacs} calls a Lisp function to do any further initialization,
+which includes parsing the command-line (the C code can only do limited
+command-line parsing, which includes looking for the @samp{-batch} and
+@samp{-l} flags and a few other flags that it needs to know about before
+initialization is complete), creating the first frame (or @dfn{window}
+in standard window-system parlance), running the user's init file
+(usually the file @file{.emacs} in the user's home directory), etc.  The
+function to do this is usually called @code{normal-top-level};
+@file{loadup.el} tells the C code about this function by setting its
+name as the value of the Lisp variable @code{top-level}.
+When the Lisp initialization code is done, the C code enters the event
+loop, and stays there for the duration of the XEmacs process.  The code
+for the event loop is contained in @file{cmdloop.c}, and is called
+@code{Fcommand_loop_1()}.  Note that this event loop could very well be
+written in Lisp, and in fact a Lisp version exists; but apparently,
+doing this makes XEmacs run noticeably slower.
+Notice how much of the initialization is done in Lisp, not in C.
+In general, XEmacs tries to move as much code as is possible
+into Lisp.  Code that remains in C is code that implements the
+Lisp interpreter itself, or code that needs to be very fast, or
+code that needs to do system calls or other such stuff that
+needs to be done in C, or code that needs to have access to
+``forbidden'' structures. (One conscious aspect of the design of
+Lisp under XEmacs is a clean separation between the external
+interface to a Lisp object's functionality and its internal
+implementation.  Part of this design is that Lisp programs
+are forbidden from accessing the contents of the object other
+than through using a standard API.  In this respect, XEmacs Lisp
+is similar to modern Lisp dialects but differs from GNU Emacs,
+which tends to expose the implementation and allow Lisp
+programs to look at it directly.  The major advantage of
+hiding the implementation is that it allows the implementation
+to be redesigned without affecting any Lisp programs, including
+those that might want to be ``clever'' by looking directly at
+the object's contents and possibly manipulating them.)
+Moving code into Lisp makes the code easier to debug and maintain and
+makes it much easier for people who are not XEmacs developers to
+customize XEmacs, because they can make a change with much less chance
+of obscure and unwanted interactions occurring than if they were to
+change the C code.
+@node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs from the Inside, Top
+@chapter The XEmacs Object System (Abstractly Speaking)
+@cindex XEmacs object system (abstractly speaking), the
+@cindex object system (abstractly speaking), the XEmacs
+At the heart of the Lisp interpreter is its management of objects.
+XEmacs Lisp contains many built-in objects, some of which are
+simple and others of which can be very complex; and some of which
+are very common, and others of which are rarely used or are only
+used internally. (Since the Lisp allocation system, with its
+automatic reclamation of unused storage, is so much more convenient
+than @code{malloc()} and @code{free()}, the C code makes extensive use of it
+in its internal operations.)
+The basic Lisp objects are
+@table @code
+@item integer
+31 bits of precision, or 63 bits on 64-bit machines; the
+reason for this is described below when the internal Lisp object
+representation is described.
+@item char
+An object representing a single character of text; chars behave like
+integers in many ways but are logically considered text rather than
+numbers and have a different read syntax. (The read syntax for a char
+contains the char itself or some textual encoding of it---for example,
+a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
+ISO-2022 encoding standard---rather than the numerical representation
+of the char; this way, if the mapping between chars and integers
+changes, which is quite possible for Kanji characters and other extended
+characters, the same character will still be created.  Note that some
+primitives confuse chars and integers.  The worst culprit is @code{eq},
+which makes a special exception and considers a char to be @code{eq} to
+its integer equivalent, even though in no other case are objects of two
+different types @code{eq}.  The reason for this monstrosity is
+compatibility with existing code; the separation of char from integer
+came fairly recently.)
+@item float
+Same precision as a double in C.
+@item bignum
+@itemx ratio
+@itemx bigfloat
+As build-time options, arbitrary-precision numbers are available.
+Bignums are integers, and when available they remove the restriction on
+buffer size.  Ratios are non-integral rational numbers.  Bigfloats are
+arbitrary-precision floating point numbers, with precision specified at
+runtime.
+@item symbol
+An object that contains Lisp objects and is referred to by name;
+symbols are used to implement variables and named functions
+and to provide the equivalent of preprocessor constants in C.
+@item string
+Self-explanatory; behaves much like a vector of chars
+but has a different read syntax and is stored and manipulated
+more compactly.
+@item bit-vector
+A vector of bits; similar to a string in spirit.
+@item vector
+A one-dimensional array of Lisp objects providing constant-time access
+to any of the objects; access to an arbitrary object in a vector is
+faster than for lists, but the operations that can be done on a vector
+are more limited.
+@item compiled-function
+An object containing compiled Lisp code, known as @dfn{byte code}.
+@item subr
+A Lisp primitive, i.e. a Lisp-callable function implemented in C.
+@item cons
+A simple container for two Lisp objects, used to implement lists and
+most other data structures in Lisp.
+@end table
+Objects which are not conses are called atoms.
+@cindex closure
+Note that there is no basic ``function'' type, as in more powerful
+versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
+not provide the closure semantics implemented by Common Lisp and Scheme.
+The guts of a function in XEmacs Lisp are represented in one of four
+ways: a symbol specifying another function (when one function is an
+alias for another), a list (whose first element must be the symbol
+@code{lambda}) containing the function's source code, a
+compiled-function object, or a subr object. (In other words, given a
+symbol specifying the name of a function, calling @code{symbol-function}
+to retrieve the contents of the symbol's function cell will return one
+of these types of objects.)
+XEmacs Lisp also contains numerous specialized objects used to implement
+the editor:
+@table @code
+@item buffer
+Stores text like a string, but is optimized for insertion and deletion
+and has certain other properties that can be set.
+@item frame
+An object with various properties whose displayable representation is a
+@dfn{window} in window-system parlance.
+@item window
+A section of a frame that displays the contents of a buffer;
+often called a @dfn{pane} in window-system parlance.
+@item window-configuration
+An object that represents a saved configuration of windows in a frame.
+@item device
+An object representing a screen on which frames can be displayed;
+equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
+character mode.
+@item face
+An object specifying the appearance of text or graphics; it has
+properties such as font, foreground color, and background color.
+@item marker
+An object that refers to a particular position in a buffer and moves
+around as text is inserted and deleted to stay in the same relative
+position to the text around it.
+@item extent
+Similar to a marker but covers a range of text in a buffer; can also
+specify properties of the text, such as a face in which the text is to
+be displayed, whether the text is invisible or unmodifiable, etc.
+@item event
+Generated by calling @code{next-event} and contains information
+describing a particular event happening in the system, such as the user
+pressing a key or a process terminating.
+@item keymap
+An object that maps from events (described using lists, vectors, and
+symbols rather than with an event object because the mapping is for
+classes of events, rather than individual events) to functions to
+execute or other events to recursively look up; the functions are
+described by name, using a symbol, or using lists to specify the
+function's code.
+@item glyph
+An object that describes the appearance of an image (e.g. a pixmap) on
+the screen; glyphs can be attached to the beginning or end of extents
+and in some future version of XEmacs will be able to be inserted
+directly into a buffer.
+@item process
+An object that describes a connection to an externally-running process.
+@end table
+There are some other, less-commonly-encountered general objects:
+@table @code
+@item hash-table
+An object that maps from an arbitrary Lisp object to another arbitrary
+Lisp object, using hashing for fast lookup.
+@item obarray
+A limited form of hash-table that maps from strings to symbols; obarrays
+are used to look up a symbol given its name and are not actually their
+own object type but are kludgily represented using vectors with hidden
+fields (this representation derives from GNU Emacs).
+@item specifier
+A complex object used to specify the value of a display property; a
+default value is given and different values can be specified for
+particular frames, buffers, windows, devices, or classes of device.
+@item char-table
+An object that maps from chars or classes of chars to arbitrary Lisp
+objects; internally char tables use a complex nested-vector
+representation that is optimized to the way characters are represented
+as integers.
+@item range-table
+An object that maps from ranges of integers to arbitrary Lisp objects.
+@end table
+And some strange special-purpose objects:
+@table @code
+@item charset
+@itemx coding-system
+Objects used when MULE, or multi-lingual/Asian-language, support is
+enabled.
+@item color-instance
+@itemx font-instance
+@itemx image-instance
+An object that encapsulates a window-system resource; instances are
+mostly used internally but are exposed on the Lisp level for cleanness
+of the specifier model and because it's occasionally useful for Lisp
+programs to create or query the properties of instances.
+@item subwindow
+An object that encapsulates a @dfn{subwindow} resource, i.e. a
+window-system child window that is drawn into by an external process;
+this object should be integrated into the glyph system but isn't yet,
+and may change form when this is done.
+@item tooltalk-message
+@itemx tooltalk-pattern
+Objects that represent resources used in the ToolTalk interprocess
+communication protocol.
+@item toolbar-button
+An object used in conjunction with the toolbar.
+@end table
+And objects that are only used internally:
+@table @code
+@item opaque
+A generic object for encapsulating arbitrary memory; this allows you the
+generality of @code{malloc()} and the convenience of the Lisp object
+system.
+@item lstream
+A buffering I/O stream, used to provide a unified interface to anything
+that can accept output or provide input, such as a file descriptor, a
+stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
+it's a Lisp object to make its memory management more convenient.
+@item char-table-entry
+Subsidiary objects in the internal char-table representation.
+@item extent-auxiliary
+@itemx menubar-data
+@itemx toolbar-data
+Various special-purpose objects that are basically just used to
+encapsulate memory for particular subsystems, similar to the more
+general ``opaque'' object.
+@item symbol-value-forward
+@itemx symbol-value-buffer-local
+@itemx symbol-value-varalias
+@itemx symbol-value-lisp-magic
+Special internal-only objects that are placed in the value cell of a
+symbol to indicate that there is something special about this variable
+(e.g. it has no value, it mirrors another variable, or it mirrors some C
+variable); there is really only one kind of object, called a
+@dfn{symbol-value-magic}, but it is sort-of halfway kludged into
+semi-different object types.
+@end table
+@cindex permanent objects
+@cindex temporary objects
+Some types of objects are @dfn{permanent}, meaning that once created,
+they do not disappear until explicitly destroyed, using a function such
+as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
+Others will disappear once they are no longer used, through the garbage
+collection mechanism.  Buffers, frames, windows, devices, and processes
+are among the objects that are permanent.  Note that some objects can go
+both ways: Faces can be created either way; extents are normally
+permanent, but detached extents (extents not referring to any text, as
+happens to some extents when the text they are referring to is deleted)
+are temporary.  Note that some permanent objects, such as faces and
+coding systems, cannot be deleted.  Note also that windows are unique in
+that they can be @emph{undeleted} after having previously been
+deleted. (This happens as a result of restoring a window configuration.)
+@cindex read syntax
+Many types of objects have a @dfn{read syntax}, i.e. a way of
+specifying an object of that type in Lisp code.  When you load a Lisp
+file, or type in code to be evaluated, what really happens is that the
+function @code{read} is called, which reads some text and creates an object
+based on the syntax of that text; then @code{eval} is called, which
+possibly does something special; then this loop repeats until there's
+no more text to read. (@code{eval} only actually does something special
+with symbols, which causes the symbol's value to be returned,
+similar to referencing a variable; and with conses [i.e. lists],
+which cause a function invocation.  All other values are returned
+unchanged.)
+The read syntax
+@example
+17297
+@end example
+converts to an integer whose value is 17297.
+@example
+355/113
+@end example
+converts to a ratio (one commonly used to approximate @emph{pi}) when
+ratios are configured, and otherwise to a symbol whose name is
+``355/113'' (for backward compatibility).
+@example
+1.983e-4
+@end example
+converts to a float whose value is 1.983e-4, or .0001983.
+@example
+?b
+@end example
+converts to a char that represents the lowercase letter b.
+@example
+?^[$(B#&^[(B
+@end example
+(where @samp{^[} actually is an @samp{ESC} character) converts to a
+particular Kanji character when using an ISO2022-based coding system for
+input. (To decode this goo: @samp{ESC} begins an escape sequence;
+@samp{ESC $ (} is a class of escape sequences meaning ``switch to a
+94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
+Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
+of characters [subtract 33 from the ASCII value of each character to get
+the corresponding index]; @samp{ESC (} is a class of escape sequences
+meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
+to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
+denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
+replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
+from the GB2312 character set.)
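+As a worked instance of the index arithmetic just described, the
+following C sketch maps one byte of a two-byte 94x94 character to its
+zero-based index.  The helper name @code{toy_iso2022_index} is
+hypothetical and is not part of XEmacs.

```c
#include <assert.h>

/* Hypothetical helper (not an XEmacs function): map one byte of a
   94x94 ISO-2022 character, which must be a printable ASCII byte in
   the range 33..126, to its zero-based row or column index. */
static int
toy_iso2022_index (unsigned char byte)
{
  assert (byte >= 33 && byte <= 126);
  return byte - 33;
}
```

+For @samp{#} (ASCII 35) and @samp{&} (ASCII 38) this yields the indices
+2 and 5.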
+@example
+"foobar"
+@end example
+converts to a string.
+@example
+foobar
+@end example
+converts to a symbol whose name is @code{"foobar"}.  This is done by
+looking up the string equivalent in the global variable
+@code{obarray}, whose contents should be an obarray.  If no symbol
+is found, a new symbol with the name @code{"foobar"} is automatically
+created and added to @code{obarray}; this process is called
+@dfn{interning} the symbol.
+@cindex interning
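+The lookup-or-create behavior of interning can be sketched in C roughly
+as follows.  This is a toy model only: the names @code{toy_symbol},
+@code{toy_obarray} and @code{toy_intern} are made up for illustration,
+and the chained hash table used here is an assumption, not the actual
+vector-based obarray representation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 31

/* Toy symbol: just a name plus a link to the next symbol in the
   same hash bucket.  All names here are hypothetical. */
struct toy_symbol
{
  char *name;
  struct toy_symbol *next;
};

static struct toy_symbol *toy_obarray[TOY_OBARRAY_SIZE];

static unsigned
toy_hash (const char *name)
{
  unsigned h = 0;
  while (*name)
    h = h * 31 + (unsigned char) *name++;
  return h % TOY_OBARRAY_SIZE;
}

/* Return the unique symbol called NAME, creating and registering it
   the first time the name is seen. */
static struct toy_symbol *
toy_intern (const char *name)
{
  unsigned bucket = toy_hash (name);
  struct toy_symbol *sym;

  for (sym = toy_obarray[bucket]; sym; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;               /* already interned */

  sym = (struct toy_symbol *) malloc (sizeof *sym);
  assert (sym != NULL);
  sym->name = (char *) malloc (strlen (name) + 1);
  assert (sym->name != NULL);
  strcpy (sym->name, name);
  sym->next = toy_obarray[bucket];
  toy_obarray[bucket] = sym;
  return sym;
}
```

+Calling @code{toy_intern ("foobar")} twice returns the same pointer,
+which is the property that lets symbols be compared by identity.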
+@example
+(foo . bar)
+@end example
+converts to a cons cell containing the symbols @code{foo} and @code{bar}.
+@example
+(1 a 2.5)
+@end example
+converts to a three-element list containing the specified objects
+(note that a list is actually a set of nested conses; see the
+XEmacs Lisp Reference).
+@example
+[1 a 2.5]
+@end example
+converts to a three-element vector containing the specified objects.
+@example
+#[... ... ... ...]
+@end example
+converts to a compiled-function object (the actual contents are not
+shown since they are not relevant here; look at a file that ends with
+@file{.elc} for examples).
+@example
+#*01110110
+@end example
+converts to a bit-vector.
+@example
+#s(hash-table ... ...)
+@end example
+converts to a hash table (the actual contents are not shown).
+@example
+#s(range-table ... ...)
+@end example
+converts to a range table (the actual contents are not shown).
+@example
+#s(char-table ... ...)
+@end example
+converts to a char table (the actual contents are not shown).
+Note that the @code{#s()} syntax is the general syntax for structures,
+which are not really implemented in XEmacs Lisp but should be.
+When an object is printed out (using @code{print} or a related
+function), the read syntax is used, so that the same object can be read
+in again.
+The other objects do not have read syntaxes, usually because it does not
+really make sense to create them in this fashion (e.g. processes, where
+it doesn't make sense to have a subprocess created as a side effect of
+reading some Lisp code), or because they can't be created at all
+(e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
+nor do most complex objects, which contain too much state to be easily
+initialized through a read syntax.
+@node How Lisp Objects Are Represented in C, Allocation of Objects in XEmacs Lisp, The XEmacs Object System (Abstractly Speaking), Top
+@chapter How Lisp Objects Are Represented in C
+@cindex Lisp objects are represented in C, how
+@cindex objects are represented in C, how Lisp
+@cindex represented in C, how Lisp objects are
+Lisp objects are represented in C using a 32-bit or 64-bit machine word
+(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
+most other processors use 32-bit Lisp objects).  The representation
+stuffs a pointer together with a tag, as follows:
+@example
+[ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
+[ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
+  <---------------------------------------------------------> <->
+           a pointer to a structure, or an integer            tag
+@end example
+A tag of 00 is used for all pointer object types, a tag of 10 is used
+for characters, and the other two tags 01 and 11 are joined together to
+form the integer object type.  This representation gives us 31-bit
+integers and 30-bit characters, while pointers are represented directly
+without any bit masking or shifting.  This representation, though,
+assumes that pointers to structs are always aligned to multiples of 4,
+so the lower 2 bits are always zero.
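+The tagging scheme described above can be modeled with a few lines of C.
+This is a sketch under stated assumptions, not the XEmacs source: every
+@code{toy_} name is hypothetical, and the word type is @code{intptr_t}
+so that the example also runs on 64-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t toy_object;    /* one machine word (hypothetical type) */

enum toy_kind { TOY_POINTER, TOY_CHAR, TOY_INT };

/* Classify a word by its low two bits: 00 is a pointer, 10 is a
   character, and 01 or 11 (low bit set) is an integer. */
static enum toy_kind
toy_kind_of (toy_object obj)
{
  if (obj & 1)
    return TOY_INT;
  return (obj & 2) ? TOY_CHAR : TOY_POINTER;
}

static toy_object
toy_make_int (intptr_t n)
{
  /* Shift in unsigned arithmetic: left-shifting a negative signed
     value is undefined behavior in C. */
  return (toy_object) (((uintptr_t) n << 1) | 1);
}

static intptr_t
toy_int_value (toy_object obj)
{
  assert (toy_kind_of (obj) == TOY_INT);
  return obj >> 1;              /* assumes an arithmetic right shift */
}

static toy_object
toy_make_char (uint32_t c)
{
  return (toy_object) (((uintptr_t) c << 2) | 2);
}

static uint32_t
toy_char_value (toy_object obj)
{
  assert (toy_kind_of (obj) == TOY_CHAR);
  return (uint32_t) ((uintptr_t) obj >> 2);
}

static toy_object
toy_make_pointer (void *p)
{
  /* Only valid if the pointer is aligned to a multiple of 4. */
  assert (((uintptr_t) p & 3) == 0);
  return (toy_object) (uintptr_t) p;
}
```

+Because an integer uses only one tag bit it keeps one more bit of
+precision than a character, and a pointer needs no decoding at all.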
+Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
+used for the Lisp object can vary.  It can be either a simple type
+(@code{long} on the DEC Alpha, @code{int} on other machines) or a
+structure whose fields are bit fields that line up properly (actually, a
+union of structures is used).  Generally the simple integral type is
+preferable because it ensures that the compiler will actually use a
+machine word to represent the object (some compilers will use more
+general and less efficient code for unions and structs even if they can
+fit in a machine word).  The union type, however, has the advantage of
+stricter type checking.  If you accidentally pass an integer where a Lisp
+object is desired, you get a compile error.  The choice of which type
+to use is determined by the preprocessor constant @code{USE_UNION_TYPE}
+which is defined via the @code{--use-union-type} option to
+@code{configure}.
+Various macros are used to convert between @code{Lisp_Object}s and the
+corresponding C type.  Macros such as @code{XINT()}, @code{XCHAR()},
+@code{XSTRING()} and @code{XSYMBOL()} do any required bit shifting and/or
+masking and cast the result to the appropriate type.  @code{XINT()} needs to be
+a bit tricky so that negative numbers are properly sign-extended.  Since
+integers are stored left-shifted, if the right-shift operator does an
+arithmetic shift (i.e. it leaves the most-significant bit as-is rather
+than shifting in a zero, so that it mimics a divide-by-two even for
+negative numbers) the shift to remove the tag bit is enough.  This is
+the case on all the systems we support.
+Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
+macros become more complicated---they check the tag bits and/or the
+type field in the first four bytes of a record type to ensure that the
+object is really of the correct type.  This is great for catching places
+where an incorrect type is being dereferenced---this typically results
+in a pointer being dereferenced as the wrong type of structure, with
+unpredictable (and sometimes not easily traceable) results.
+There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
+object.  These macros are of the form @code{XSET@var{TYPE}
+(@var{lvalue}, @var{result})}, i.e. they have to be a statement rather
+than just used in an expression.  The reason for this is that standard C
+doesn't let you ``construct'' a structure (but GCC does).  Granted, this
+sometimes isn't too convenient; for the case of integers, at least, you
+can use the function @code{make_int()}, which constructs and
+@emph{returns} an integer Lisp object.  Note that the
+@code{XSET@var{TYPE}()} macros are also affected by
+@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
+right type in the case of record types, where the type is contained in
+the structure.
+The C programmer is responsible for @strong{guaranteeing} that a
+Lisp_Object is the correct type before using the @code{X@var{TYPE}}
+macros.  This is especially important in the case of lists.  Use
+@code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
+else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
+Lisp code.  On the other hand, if XEmacs has an internal logic error,
+it's better to crash immediately, so sprinkle @code{assert()}s and
+``unreachable'' @code{abort()}s liberally about the source code.  Where
+performance is an issue, use @code{type_checking_assert},
+@code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
+nothing unless the corresponding configure error checking flag was
+specified.
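+The division of labor between an unchecked accessor and a checked one
+might look like the following C sketch.  The record layout and the names
+@code{toy_xcar} and @code{toy_checked_car} are hypothetical; in
+particular, the checked version reports failure through a flag, whereas
+the real @code{Fcar()} deals with bad arguments in its own way.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_CONS, TOY_STRING };

/* Hypothetical record: a type field followed by the payload. */
struct toy_record
{
  enum toy_type type;
  struct toy_record *car, *cdr;   /* meaningful only for TOY_CONS */
};

/* Fast accessor for callers that have already established that OBJ
   is a cons.  The assert stands in for an error-checking build and
   costs nothing when compiled with NDEBUG. */
static struct toy_record *
toy_xcar (struct toy_record *obj)
{
  assert (obj->type == TOY_CONS);
  return obj->car;
}

/* Checked accessor for data of unknown type, such as anything that
   came from Lisp code.  Sets *OK to 0 and returns NULL on a type
   mismatch instead of reading the wrong kind of structure. */
static struct toy_record *
toy_checked_car (struct toy_record *obj, int *ok)
{
  *ok = (obj != NULL && obj->type == TOY_CONS);
  return *ok ? obj->car : NULL;
}
```

+The unchecked form is appropriate only after a test or an invariant has
+already ruled out every other type.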
+@node Allocation of Objects in XEmacs Lisp, The Lisp Reader and Compiler, How Lisp Objects Are Represented in C, Top
 @chapter Allocation of Objects in XEmacs Lisp
 @cindex allocation of objects in XEmacs Lisp
 @cindex objects in XEmacs Lisp, allocation of
 @cindex Lisp objects, allocation of in XEmacs
 @cindex function, compiled
 Not yet documented.
-@node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
+@node The Lisp Reader and Compiler, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
-@chapter Dumping
+@chapter The Lisp Reader and Compiler
-@cindex dumping
+@cindex Lisp reader and compiler, the
+@cindex reader and compiler, the Lisp
+@cindex compiler, the Lisp reader and
+Not yet documented.
+@node Evaluation; Stack Frames; Bindings, Symbols and Variables, The Lisp Reader and Compiler, Top
+@chapter Evaluation; Stack Frames; Bindings
+@cindex evaluation; stack frames; bindings
+@cindex stack frames; bindings, evaluation;
+@cindex bindings, evaluation; stack frames;
 @menu
-* Dumping Justification::
+* Evaluation::
-* Overview::
+* Dynamic Binding; The specbinding Stack; Unwind-Protects::
-* Data descriptions::
+* Simple Special Forms::
-* Dumping phase::
+* Catch and Throw::
-* Reloading phase::
+* Error Trapping::
-* Remaining issues::
 @end menu
-@node Dumping Justification, Overview, Dumping, Dumping
+@node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings
-@section Dumping Justification
+@section Evaluation
-@cindex dumping, justification
+@cindex evaluation
-The C code of XEmacs is just a Lisp engine with a lot of built-in
+@code{Feval()} evaluates the form (a Lisp object) that is passed to
-primitives useful for writing an editor.  The editor itself is written
+it.  Note that evaluation is only non-trivial for two types of objects:
-mostly in Lisp, and represents around 100K lines of code.  Loading and
+symbols and conses.  A symbol is evaluated simply by calling
-executing the initialization of all this code takes a bit a time (five
+@code{symbol-value} on it and returning the value.
-to ten times the usual startup time of current xemacs) and requires
-having all the lisp source files around.  Having to reload them each
+Evaluating a cons means calling a function.  First, @code{eval} checks
-time the editor is started would not be acceptable.
+to see if garbage-collection is necessary, and calls
+@code{garbage_collect_1()} if so.  It then increases the evaluation
-The traditional solution to this problem is called dumping: the build
+depth by 1 (@code{lisp_eval_depth}, which is always less than
-process first creates the lisp engine under the name @file{temacs}, then
+@code{max_lisp_eval_depth}) and adds an element to the linked list of
-runs it until it has finished loading and initializing all the lisp
+@code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
-code, and eventually creates a new executable called @file{xemacs}
+contains a pointer to the function being called plus a list of the
-including both the object code in @file{temacs} and all the contents of
+function's arguments.  Originally these values are stored unevalled, and
-the memory after the initialization.
+as they are evaluated, the backtrace structure is updated.  Garbage
+collection pays attention to the objects pointed to in the backtrace
-This solution, while working, has a huge problem: the creation of the
+structures (garbage collection might happen while a function is being
-new executable from the actual contents of memory is an extremely
+called or while an argument is being evaluated, and there could easily
-system-specific process, quite error-prone, and which interferes with a
+be no other references to the arguments in the argument list; once an
-lot of system libraries (like malloc).  It is even getting worse
+argument is evaluated, however, the unevalled version is not needed by
-nowadays with libraries using constructors which are automatically
+eval, and so the backtrace structure is changed).
-called when the program is started (even before @code{main()}) which tend to
-crash when they are called multiple times, once before dumping and once
+At this point, the function to be called is determined by looking at
-after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries thru
+the car of the cons (if this is a symbol, its function definition is
-dependencies which have this problem).  Writing the dumper is also one
+retrieved and the process repeated).  The function should then consist
-of the most difficult parts of porting XEmacs to a new operating system.
+of either a @code{Lisp_Subr} (built-in function written in C), a
-Basically, `dumping' is an operation that is just not officially
+@code{Lisp_Compiled_Function} object, or a cons whose car is one of the
-supported on many operating systems.
+symbols @code{autoload}, @code{macro} or @code{lambda}.
-The aim of the portable dumper is to solve the same problem as the
+If the function is a @code{Lisp_Subr}, the lisp object points to a
-system-specific dumper, that is to be able to reload quickly, using only
+@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
-a small number of files, the fully initialized lisp part of the editor,
+pointer to the C function, a minimum and maximum number of arguments
-without any system-specific hacks.
+(or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
+pointer to the symbol referring to that subr, and a couple of other
-@node Overview, Data descriptions, Dumping Justification, Dumping
+things.  If the subr wants its arguments @code{UNEVALLED}, they are
-@section Overview
+passed raw as a list.  Otherwise, an array of evaluated arguments is
-@cindex dumping overview
+created and put into the backtrace structure, and either passed whole
+(@code{MANY}) or each argument is passed as a C argument.
-The portable dumping system has to:
+If the function is a @code{Lisp_Compiled_Function},
+@code{funcall_compiled_function()} is called.  If the function is a
+lambda list, @code{funcall_lambda()} is called.  If the function is a
+macro, [..... fill in] is done.  If the function is an autoload,
+@code{do_autoload()} is called to load the definition and then eval
+starts over [explain this more].
+When @code{Feval()} exits, the evaluation depth is reduced by one, the
+debugger is called if appropriate, and the current backtrace structure
+is removed from the list.
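+The bookkeeping that brackets each function call (the depth counter and
+the linked list of backtrace structures) can be sketched in C as below.
+All @code{toy_} names, the structure fields, and the limit of 500 are
+assumptions made for illustration; the real @code{struct backtrace}
+holds more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified backtrace record; normally it would live
   on the C stack of the function doing the evaluation. */
struct toy_backtrace
{
  struct toy_backtrace *next;   /* the enclosing (outer) call */
  const char *function;
  int nargs;
};

static struct toy_backtrace *toy_backtrace_list;
static int toy_eval_depth;
static int toy_max_eval_depth = 500;    /* assumed limit */

/* Record a call about to be made.  Returns 0 without changing any
   state if the depth would no longer stay below the limit. */
static int
toy_enter_call (struct toy_backtrace *bt, const char *function, int nargs)
{
  if (toy_eval_depth + 1 >= toy_max_eval_depth)
    return 0;
  toy_eval_depth++;
  bt->function = function;
  bt->nargs = nargs;
  bt->next = toy_backtrace_list;
  toy_backtrace_list = bt;
  return 1;
}

/* Undo toy_enter_call(); calls must be left innermost first. */
static void
toy_leave_call (struct toy_backtrace *bt)
{
  assert (toy_backtrace_list == bt);
  toy_backtrace_list = bt->next;
  toy_eval_depth--;
}
```

+Walking @code{toy_backtrace_list} from its head visits the active calls
+from innermost to outermost, which is the order a garbage collector or
+debugger would want to see them in.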
+Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
+to go through the list of formal parameters to the function and bind
+them to the actual arguments, checking for @code{&rest} and
+@code{&optional} symbols in the formal parameters and making sure the
+number of actual arguments is correct.
+@code{funcall_compiled_function()} can do this a little more
+efficiently, since the formal parameter list can be checked for sanity
+when the compiled function object is created.
+@code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
+in the lambda list.
+@code{funcall_compiled_function()} calls the real byte-code interpreter
+@code{execute_optimized_program()} on the byte-code instructions, which
+are converted into an internal form for faster execution.
+When a compiled function is executed for the first time by
+@code{funcall_compiled_function()}, or during the dump phase of building
+XEmacs, the byte-code instructions are converted from a
+@code{Lisp_String} (which is inefficient to access, especially in the
+presence of MULE) into a @code{Lisp_Opaque} object containing an array
+of unsigned char, which can be directly executed by the byte-code
+interpreter.  At this time the byte code is also analyzed for validity
+and transformed into a more optimized form, so that
+@code{execute_optimized_program()} can really fly.
+Here are some of the optimizations performed by the internal byte-code
+transformer:
 @enumerate
 @item
-At dump time, write all initialized, non-quickly-rebuildable data to a
+References to the @code{constants} array are checked for out-of-range
-file [Note: currently named @file{xemacs.dmp}, but the name will
+indices, so that the byte interpreter doesn't have to.
-change], along with all information needed for the reloading.
+@item
+References to the @code{constants} array that will be used as a Lisp
-@item
+variable are checked for being correct non-constant (i.e. not @code{t},
-When starting xemacs, reload the dump file, relocate it to its new
+@code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
-starting address if needed, and reinitialize all pointers to this
+doesn't have to.
-data.  Also, rebuild all the quickly rebuildable data.
+@item
+The maximum number of variable bindings in the byte-code is
+pre-computed, so that space on the @code{specpdl} stack can be
+pre-reserved once for the whole function execution.
+@item
+All byte-code jumps are relative to the current program counter instead
+of the start of the program, thereby saving a register.
+@item
+One-byte relative jumps are converted from the byte-code form of unsigned
+chars offset by 127 to machine-friendly signed chars.
 @end enumerate
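+The last item in the list above amounts to removing a bias from each
+one-byte jump operand.  The C sketch below assumes that the stored byte
+equals the displacement plus 127; the function name is hypothetical and
+the code is not taken from the byte-code transformer itself.

```c
#include <assert.h>

/* Hypothetical helper: turn a one-byte jump operand, assumed to be
   stored as (displacement + 127) in an unsigned char, into the
   signed displacement it stands for. */
static int
toy_decode_short_jump (unsigned char raw)
{
  int displacement = (int) raw - 127;
  assert (displacement >= -127 && displacement <= 128);
  return displacement;
}
```

+Under that assumption a stored byte of 127 means ``jump by zero'', and a
+stored byte of 0 means ``jump back by 127''.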
-Note: As of 21.5.18, the dump file has been moved inside of the
+Of course, this transformation of the @code{instructions} should not be
-executable, although there are still problems with this on some systems.
+visible to the user, so @code{Fcompiled_function_instructions()} needs
+to know how to convert the optimized opaque object back into a Lisp
-@node Data descriptions, Dumping phase, Overview, Dumping
+string that is identical to the original string from the @file{.elc}
-@section Data descriptions
+file.  (Actually, the resulting string may (rarely) contain slightly
-@cindex dumping data descriptions
+different, yet equivalent, byte code.)
-The more complex task of the dumper is to be able to write memory blocks
+@code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
-on the heap (lisp objects, i.e. lrecords, and C-allocated memory, such
+x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
-as structs and arrays) to disk and reload them at a different address,
+x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
-updating all the pointers they include in the process.  This is done by
+the evaluation, however, and is very similar to @code{Feval()}.
-using external data descriptions that give information about the layout
-of the blocks in memory.
+From the performance point of view, it is worth knowing that most of the
+time in Lisp evaluation is spent executing @code{Lisp_Subr} and
-The specification of these descriptions is in lrecord.h.  A description
+@code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
-of an lrecord is an array of struct memory_description.  Each of these
+@code{Feval()}).
-structs include a type, an offset in the block and some optional
-parameters depending on the type.  For instance, here is the string
+@code{Fapply()} implements Lisp @code{apply}, which is very similar to
-description:
+@code{funcall} except that if the last argument is a list, the result is the
+same as if each of the arguments in the list had been passed separately.
+@code{Fapply()} does some business to expand the last argument if it's a
+list, then calls @code{Ffuncall()} to do the work.
+@code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
+@code{call3()} call a function, passing it the argument(s) given (the
+arguments are given as separate C arguments rather than being passed as
+an array).  @code{apply1()} uses @code{Fapply()} while the others use
+@code{Ffuncall()} to do the real work.
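+The relationship between @code{apply} and @code{funcall} can be sketched
+in a few lines of C.  This is a toy model, not the XEmacs code: arguments
+are plain @code{int}s, and every @code{toy_} name is hypothetical.  It
+only shows the last-argument list being flattened before the call is
+handed to the funcall routine.

```c
#include <assert.h>
#include <stddef.h>

/* A toy "function" takes an argument count and an argument array. */
typedef int (*toy_fn)(int nargs, const int *args);

/* A toy cons cell, used only for the trailing argument list. */
struct toy_cons {
    int car;
    const struct toy_cons *cdr;
};

enum { TOY_MAX_ARGS = 16 };

/* Stand-in for Ffuncall(): call FN on an already-flat argument array. */
static int toy_funcall(toy_fn fn, int nargs, const int *args)
{
    return fn(nargs, args);
}

/* Stand-in for Fapply(): copy the fixed arguments, spread the final
   list after them, then let toy_funcall() do the real work. */
static int toy_apply(toy_fn fn, int nfixed, const int *fixed,
                     const struct toy_cons *last)
{
    int flat[TOY_MAX_ARGS];
    int n = 0;
    int i;

    for (i = 0; i < nfixed; i++) {
        assert(n < TOY_MAX_ARGS);
        flat[n++] = fixed[i];
    }
    for (; last != NULL; last = last->cdr) {
        assert(n < TOY_MAX_ARGS);
        flat[n++] = last->car;
    }
    return toy_funcall(fn, n, flat);
}

/* Example callee: add up all of its arguments. */
static int toy_sum(int nargs, const int *args)
{
    int total = 0;
    int i;
    for (i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

+Calling @code{toy_apply()} with fixed arguments 1 and 2 and the list
+(3 4) reaches @code{toy_sum()} with the same four arguments as a direct
+@code{toy_funcall()} on 1, 2, 3, 4.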
+@node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings
+@section Dynamic Binding; The specbinding Stack; Unwind-Protects
+@cindex dynamic binding; the specbinding stack; unwind-protects
+@cindex binding; the specbinding stack; unwind-protects, dynamic
+@cindex specbinding stack; unwind-protects, dynamic binding; the
+@cindex unwind-protects, dynamic binding; the specbinding stack;
 @example
-static const struct memory_description string_description[] = @{
+struct specbinding
-@{ XD_BYTECOUNT,         offsetof (Lisp_String, size) @},
+@{
-@{ XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
+Lisp_Object symbol;
-@{ XD_LISP_OBJECT,       offsetof (Lisp_String, plist) @},
+Lisp_Object old_value;
-@{ XD_END @}
+Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
 @};
 @end example
-The first line indicates a member of type Bytecount, which is used by
+@code{struct specbinding} is used for local-variable bindings and
-the next, indirect directive.  The second means "there is a pointer to
+unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
-some opaque data in the field @code{data}".  The length of said data is
+@code{specpdl_ptr} points to the beginning of the free bindings in the
-given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value
+array, @code{specpdl_size} specifies the total number of binding slots
-in the 0th line of the description (welcome to C) plus one".  The third
+in the array, and @code{max_specpdl_size} specifies the maximum number
-line means "there is a Lisp_Object member @code{plist} in the Lisp_String
+of bindings the array can be expanded to hold.  @code{grow_specpdl()}
-structure".  @code{XD_END} then ends the description.
+increases the size of the @code{specpdl} array, multiplying its size by
+2 but never exceeding @code{max_specpdl_size} (except that if this
-This gives us all the information we need to move around what is pointed
+number is less than 400, it is first set to 400).
-to by a memory block (C or lrecord) and, by transitivity, everything
-that it points to.  The only missing information for dumping is the size
+@code{specbind()} binds a symbol to a value and is used for local
-of the block.  For lrecords, this is part of the
+variables and @code{let} forms.  The symbol and its old value (which
-lrecord_implementation, so we don't need to duplicate it.  For C blocks
+might be @code{Qunbound}, indicating no prior value) are recorded in the
-we use a struct sized_memory_description, which includes a size field
+specpdl array, and @code{specpdl_ptr} is advanced by one slot.
-and a pointer to an associated array of memory_description.
+@code{record_unwind_protect()} implements an @dfn{unwind-protect},
-@node Dumping phase, Reloading phase, Data descriptions, Dumping
+which, when placed around a section of code, ensures that some specified
-@section Dumping phase
+cleanup routine will be executed even if the code exits abnormally
-@cindex dumping phase
+(e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
+simply adds a new specbinding to the @code{specpdl} array and stores the
-Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is
+appropriate information in it.  The cleanup routine can either be a C
-invoked from Fdump_emacs (in @file{emacs.c}).  This function performs a number
+function, which is stored in the @code{func} field, or a @code{progn}
-of tasks.
+form, which is stored in the @code{old_value} field.
+@code{unbind_to()} removes specbindings from the @code{specpdl} array
+until the specified position is reached.  Each specbinding can be one of
+three types:
+@enumerate
+@item
+an unwind-protect with a C cleanup function (@code{func} is not 0, and
+@code{old_value} holds an argument to be passed to the function);
+@item
+an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
+is @code{nil}, and @code{old_value} holds the form to be executed with
+@code{Fprogn()}); or
+@item
+a local-variable binding (@code{func} is 0, @code{symbol} is not
+@code{nil}, and @code{old_value} holds the old value, which is stored as
+the symbol's value).
+@end enumerate
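+The stack discipline described above can be sketched as a small
+self-contained C model.  It is an illustration under simplifying
+assumptions, not the code in @file{eval.c}: a ``symbol'' is just a
+pointer to an @code{int} value cell, the array has a fixed size, the
+Lisp-form kind of unwind-protect is left out, and all @code{toy_} names
+are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in: a "symbol" is a pointer to its int value cell. */
typedef int *toy_symbol;

struct toy_specbinding {
    toy_symbol symbol;     /* NULL for an unwind-protect entry */
    int old_value;         /* saved value, or the argument for func */
    void (*func)(int);     /* non-NULL: C cleanup function */
};

enum { TOY_SPECPDL_SIZE = 16 };
static struct toy_specbinding toy_specpdl[TOY_SPECPDL_SIZE];
static int toy_specpdl_depth;   /* index of the first free slot */

/* Save the symbol's current value, then install the new one. */
static void toy_specbind(toy_symbol sym, int value)
{
    struct toy_specbinding *b;
    assert(toy_specpdl_depth < TOY_SPECPDL_SIZE);
    b = &toy_specpdl[toy_specpdl_depth++];
    b->symbol = sym;
    b->old_value = *sym;
    b->func = NULL;
    *sym = value;
}

/* Push a cleanup entry; nothing runs until toy_unbind_to(). */
static void toy_record_unwind_protect(void (*func)(int), int arg)
{
    struct toy_specbinding *b;
    assert(toy_specpdl_depth < TOY_SPECPDL_SIZE);
    b = &toy_specpdl[toy_specpdl_depth++];
    b->symbol = NULL;
    b->old_value = arg;
    b->func = func;
}

/* Pop entries, newest first, until only COUNT remain. */
static void toy_unbind_to(int count)
{
    while (toy_specpdl_depth > count) {
        struct toy_specbinding *b = &toy_specpdl[--toy_specpdl_depth];
        if (b->func != NULL)
            b->func(b->old_value);        /* unwind-protect with C function */
        else if (b->symbol != NULL)
            *b->symbol = b->old_value;    /* local-variable binding */
    }
}

/* Example cleanup function: accumulate its argument. */
static int toy_cleanup_total;
static void toy_cleanup(int arg)
{
    toy_cleanup_total += arg;
}
```

+A caller records the current depth, makes its bindings, and later passes
+the recorded depth to @code{toy_unbind_to()}; nested bindings of the same
+symbol are then undone in reverse order, so the outermost old value is
+the one left in place.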
+@node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings
+@section Simple Special Forms
+@cindex special forms, simple
+@code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
+@code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
+@code{let*}, @code{let}, @code{while}
+All of these are very simple and work as expected, calling
+@code{Feval()} or @code{Fprogn()} as necessary and (in the case of
+@code{let} and @code{let*}) using @code{specbind()} to create bindings
+and @code{unbind_to()} to undo the bindings when finished.
+Note that, with the exception of @code{Fprogn}, these functions are
+typically called in real life only in interpreted code, since the byte
+compiler knows how to convert calls to these functions directly into
+byte code.
+@node Catch and Throw, Error Trapping, Simple Special Forms, Evaluation; Stack Frames; Bindings
+@section Catch and Throw
+@cindex catch and throw
+@cindex throw, catch and
+@example
+struct catchtag
+@{
+Lisp_Object tag;
+Lisp_Object val;
+struct catchtag *next;
+struct gcpro *gcpro;
+jmp_buf jmp;
+struct backtrace *backlist;
+int lisp_eval_depth;
+int pdlcount;
+@};
+@end example
+@code{catch} is a Lisp function that places a catch around a body of
+code.  A catch is a means of non-local exit from the code.  When a catch
+is created, a tag is specified, and executing a @code{throw} to this tag
+will exit from the body of code caught with this tag, and its value will
+be the value given in the call to @code{throw}.  If there is no such
+call, the code will be executed normally.
+Information pertaining to a catch is held in a @code{struct catchtag},
+which is placed at the head of a linked list pointed to by
+@code{catchlist}.  @code{internal_catch()} is passed a C function to
+call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
+give it, and places a catch around the function.  Each @code{struct
+catchtag} is held in the stack frame of the @code{internal_catch()}
+instance that created the catch.
+@code{internal_catch()} is fairly straightforward.  It stores into the
+@code{struct catchtag} the tag name and the current values of
+@code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
+offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
+(storing the jump point into the @code{struct catchtag}), and calls the
+function.  Control will return to @code{internal_catch()} either when
+the function exits normally or through a @code{_longjmp()} to this jump
+point.  In the latter case, @code{throw} will store the value to be
+returned into the @code{struct catchtag} before jumping.  When it's
+done, @code{internal_catch()} removes the @code{struct catchtag} from
+the catchlist and returns the proper value.
+@code{Fthrow()} goes up through the catchlist until it finds one with
+a matching tag.  It then calls @code{unbind_catch()} to restore
+everything to what it was when the appropriate catch was set, stores the
+return value in the @code{struct catchtag}, and jumps (with
+@code{_longjmp()}) to its jump point.
+@code{unbind_catch()} removes all catches from the catchlist until it
+finds the correct one.  Some of the catches might have been placed for
+error-trapping, and if so, the appropriate entries on the handlerlist
+must be removed (see ``errors'').  @code{unbind_catch()} also restores
+the values of @code{gcprolist}, @code{backtrace_list}, and
+@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
+created since the catch.
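+The interplay of the catch list with @code{setjmp()} and
+@code{longjmp()} can be illustrated with a reduced C model.  This is a
+sketch under stated assumptions rather than the XEmacs implementation:
+tags and values are plain @code{int}s, the structure omits the gcpro,
+backtrace, eval-depth and specpdl fields, the standard
+@code{setjmp()}/@code{longjmp()} are used instead of the underscore
+variants, and all @code{toy_} names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

struct toy_catchtag {
    int tag;
    volatile int val;            /* volatile: written between setjmp and longjmp */
    struct toy_catchtag *next;
    jmp_buf jmp;
};

/* Head of the linked list of active catches (innermost first). */
static struct toy_catchtag *toy_catchlist;

/* Call func(arg) with a catch for TAG placed around it.  The catchtag
   lives in this function's own stack frame. */
static int toy_internal_catch(int tag, int (*func)(int), int arg)
{
    struct toy_catchtag c;
    c.tag = tag;
    c.val = 0;
    c.next = toy_catchlist;
    toy_catchlist = &c;

    if (setjmp(c.jmp) == 0)
        c.val = func(arg);       /* normal return from the body */
    /* otherwise toy_throw() stored the value before jumping here */

    toy_catchlist = c.next;      /* remove this catch from the list */
    return c.val;
}

/* Find the innermost catch with a matching tag, discard the catches
   inside it, store the value, and jump to its jump point. */
static void toy_throw(int tag, int value)
{
    struct toy_catchtag *c;
    for (c = toy_catchlist; c != NULL; c = c->next) {
        if (c->tag == tag) {
            toy_catchlist = c;   /* the part of unbind_catch() modelled here */
            c->val = value;
            longjmp(c->jmp, 1);
        }
    }
    /* No matching catch: this toy simply returns. */
}

/* Example body: exits non-locally to tag 1 for large arguments. */
static int toy_inner(int arg)
{
    if (arg > 10)
        toy_throw(1, arg * 2);
    return arg;
}

/* Example body with an inner catch for a different tag (2); a throw
   to tag 1 passes straight through it. */
static int toy_middle(int arg)
{
    return toy_internal_catch(2, toy_inner, arg) + 100;
}
```

+With a catch for tag 1 around @code{toy_middle()}, a small argument
+returns normally through both catches, while a large one skips the
+pending addition in @code{toy_middle()} and makes the outer catch return
+the thrown value.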
+@node Error Trapping,  , Catch and Throw, Evaluation; Stack Frames; Bindings
+@section Error Trapping
+@cindex error trapping
+@subheading call_trapping_problems():
+This is equivalent to (*fun) (arg), except that various conditions
+can be trapped or inhibited, according to FLAGS.
+@itemize @bullet
+@item
+If FLAGS does not contain NO_INHIBIT_ERRORS, when an error occurs,
+the error is caught and a warning is issued, specifying the
+specific error that occurred and a backtrace.  In that case,
+WARNING_STRING should be given, and will be printed at the
+beginning of the error to indicate where the error occurred.
+@item
+If FLAGS does not contain NO_INHIBIT_THROWS, all attempts to
+@code{throw} out of the function being called are trapped, and a warning
+issued. (Again, WARNING_STRING should be given.)
+@item
+If FLAGS contains INHIBIT_WARNING_ISSUE, no warnings are issued;
+this applies to recursive invocations of call_trapping_problems, too.
+@item
+If FLAGS contains POSTPONE_WARNING_ISSUE, no warnings are issued;
+but values useful for generating a warning are still computed (in
+particular, the backtrace), so that the calling function can issue
+a warning.
+@item
+If FLAGS contains ISSUE_WARNINGS_AT_DEBUG_LEVEL, warnings will be
+issued, but at level @code{debug}, which normally is below the minimum
+specified by @code{log-warning-minimum-level}, meaning such warnings will
+be ignored entirely.  (The user can change this variable, however,
+to see the warnings.)
+Note: If neither NO_INHIBIT_THROWS nor NO_INHIBIT_ERRORS is
+given, you are @strong{guaranteed} that there will be no non-local exits
+out of this function.
+@item
+If FLAGS contains INHIBIT_QUIT, QUIT using C-g is inhibited.  (This
+is @strong{rarely} a good idea.  Unless you use NO_INHIBIT_ERRORS, QUIT is
+automatically caught as well, and treated as an error; you can
+check for this using EQ (problems->error_conditions, Qquit).)
+@item
+If FLAGS contains UNINHIBIT_QUIT, QUIT checking will be explicitly
+turned on. (It will abort the code being called, but will still be
+trapped and reported as an error, unless NO_INHIBIT_ERRORS is
+given.) This is useful when QUIT checking has been turned off by a
+higher-level caller.
+@item
+If FLAGS contains INHIBIT_GC, garbage collection is inhibited.
+This is useful for Lisp called within redisplay, for example.
+@item
+If FLAGS contains INHIBIT_EXISTING_PERMANENT_DISPLAY_OBJECT_DELETION,
+Lisp code is not allowed to delete any windows, buffers, frames, devices,
+or consoles that were already in existence at the time this function
+was called. (However, it's perfectly legal for code to create a new
+buffer and then delete it.)
+#### It might be useful to have a flag that inhibits deletion of a
+specific permanent display object and everything it's attached to
+(e.g. a window, and the buffer, frame, device, and console it's
+attached to).
+@item
+If FLAGS contains INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION, Lisp
+code is not allowed to modify the text of any buffers that were
+already in existence at the time this function was called.
+(However, it's perfectly legal for code to create a new buffer and
+then modify its text.)
+@quotation
+[These last two flags are implemented using global variables
+Vdeletable_permanent_display_objects and Vmodifiable_buffers,
+which keep track of a list of all buffers or permanent display
+objects created since the last time one of these flags was set.
+The code that deletes buffers, etc. and modifies buffers checks
+@enumerate
+@item
+if the corresponding flag is set (through the global variable
+inhibit_flags or its accessor function get_inhibit_flags()), and
+@item
+if the object to be modified or deleted is not in the
+appropriate list.
+@end enumerate
+If so, it signals an error.
+Recursive calls to call_trapping_problems() are allowed.  In
+the case of the two flags mentioned above, the current values
+of the global variables are stored in an unwind-protect, and
+they're reset to nil.]
+@end quotation
+@item
+If FLAGS contains INHIBIT_ENTERING_DEBUGGER, the debugger will not
+be entered if an error occurs inside the Lisp code being called,
+even when the user has requested that errors enter the debugger.  In that case, a warning
+is issued stating that access to the debugger is denied, unless
+INHIBIT_WARNING_ISSUE has also been supplied.  This is useful when
+calling Lisp code inside redisplay, in menu callbacks, etc. because
+in such cases either the display is in an inconsistent state or
+doing window operations is explicitly forbidden by the OS, and the
+debugger would cause visual changes on the screen and might create
+another frame.
+@item
+If FLAGS contains INHIBIT_ANY_CHANGE_AFFECTING_REDISPLAY, no
+changes of any sort to extents, faces, glyphs, buffer text,
+specifiers relating to display, other variables relating to
+display, splitting, deleting, or resizing windows or frames,
+deleting buffers, windows, frames, devices, or consoles, etc. are
+allowed.  This is for things called absolutely in the middle of
+redisplay, which expects things to be @strong{exactly} the same after the
+call as before.  This isn't completely implemented and needs to be
+thought out some more to determine exactly what its semantics are.
+For the moment, turning on this flag also turns on
+@itemize @minus
+@item
+INHIBIT_EXISTING_PERMANENT_DISPLAY_OBJECT_DELETION
+@item
+INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION
+@item
+INHIBIT_ENTERING_DEBUGGER
+@item
+INHIBIT_WARNING_ISSUE
+@item
+INHIBIT_GC
+@end itemize
+@item
+#### The following five flags are defined, but unimplemented:
+#define INHIBIT_EXISTING_CODING_SYSTEM_DELETION (1<<6)
+#define INHIBIT_EXISTING_CHARSET_DELETION (1<<7)
+#define INHIBIT_PERMANENT_DISPLAY_OBJECT_CREATION (1<<8)
+#define INHIBIT_CODING_SYSTEM_CREATION (1<<9)
+#define INHIBIT_CHARSET_CREATION (1<<10)
+@item
+FLAGS containing CALL_WITH_SUSPENDED_ERRORS is a sign that
+call_with_suspended_errors() was invoked.  This exists only for
+debugging purposes -- often we want to break when a signal happens,
+but ignore signals from call_with_suspended_errors(), because they
+occur often and for legitimate reasons.
+@end itemize
+If PROBLEM is non-zero, it should be a pointer to a structure into
+which exact information about any occurring problems (either an
+error or an attempted throw past this boundary) will be stored.
+If a problem occurred and aborted operation (error, quit, or
+invalid throw), Qunbound is returned.  Otherwise the return value
+from the call to (*fun) (arg) is returned.
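+The two-step check used for
+INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION (is the flag set, and is the
+object missing from the list of objects created since then?) can be
+sketched as follows.  This is a toy model only: buffers are integer ids,
+the bit value of the flag is made up for the example, and every
+@code{toy_} name is hypothetical.

```c
#include <assert.h>

/* Hypothetical bit value, chosen only for this sketch. */
#define TOY_INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION (1 << 0)

enum { TOY_MAX_BUFFERS = 8 };

static int toy_inhibit_flags;
static int toy_next_buffer_id = 1;

/* Ids of buffers created since the flag was last set. */
static int toy_modifiable_buffers[TOY_MAX_BUFFERS];
static int toy_modifiable_count;

/* Creating a buffer always records it as modifiable. */
static int toy_create_buffer(void)
{
    int id = toy_next_buffer_id++;
    assert(toy_modifiable_count < TOY_MAX_BUFFERS);
    toy_modifiable_buffers[toy_modifiable_count++] = id;
    return id;
}

/* Setting the flag resets the list, so every buffer that already
   exists becomes protected. */
static void toy_set_inhibit_flag(void)
{
    toy_inhibit_flags |= TOY_INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION;
    toy_modifiable_count = 0;
}

/* Returns 1 if the text of BUFFER_ID may be modified, 0 where the
   real code would signal an error. */
static int toy_may_modify_text(int buffer_id)
{
    int i;
    if (!(toy_inhibit_flags & TOY_INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION))
        return 1;                       /* check 1: flag not set */
    for (i = 0; i < toy_modifiable_count; i++)
        if (toy_modifiable_buffers[i] == buffer_id)
            return 1;                   /* check 2: created since the flag was set */
    return 0;
}
```

+A buffer created before the flag is set becomes read-only for the
+duration, while one created afterwards remains modifiable.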
+@node Symbols and Variables, Buffers, Evaluation; Stack Frames; Bindings, Top
+@chapter Symbols and Variables
+@cindex symbols and variables
+@cindex variables, symbols and
 @menu
-* Object inventory::
+* Introduction to Symbols::
-* Address allocation::
+* Obarrays::
-* The header::
+* Symbol Values::
-* Data dumping::
-* Pointers dumping::
 @end menu
-@node Object inventory, Address allocation, Dumping phase, Dumping phase
+@node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables
-@subsection Object inventory
+@section Introduction to Symbols
-@cindex dumping object inventory
+@cindex symbols, introduction to
-@cindex memory blocks
+A symbol is basically just an object with four fields: a name (a
-The first task is to build the list of the objects to dump.  This
+string), a value (some Lisp object), a function (some Lisp object), and
-includes:
+a property list (usually a list of alternating keyword/value pairs).
+What makes symbols special is that there is usually only one symbol with
+a given name, and the symbol is referred to by name.  This makes a
+symbol a convenient way of calling up data by name, i.e. of implementing
+variables. (The variable's value is stored in the @dfn{value slot}.)
+Similarly, functions are referenced by name, and the definition of the
+function is stored in a symbol's @dfn{function slot}.  This means that
+there can be a distinct function and variable with the same name.  The
+property list is used as a more general mechanism of associating
+additional values with particular names, and once again the namespace is
+independent of the function and variable namespaces.
+@node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables
+@section Obarrays
+@cindex obarrays
+The identity of symbols with their names is accomplished through a
+structure called an obarray, which is just a poorly-implemented hash
+table mapping from strings to symbols whose name is that string. (I say
+``poorly implemented'' because an obarray appears in Lisp as a vector
+with some hidden fields rather than as its own opaque type.  This is an
+Emacs Lisp artifact that should be fixed.)
+Obarrays are implemented as a vector of some fixed size (which should
+be a prime for best results), where each ``bucket'' of the vector
+contains one or more symbols, threaded through a hidden @code{next}
+field in the symbol.  Lookup of a symbol in an obarray, and adding a
+symbol to an obarray, is accomplished through standard hash-table
+techniques.
+The standard Lisp function for working with symbols and obarrays is
+@code{intern}.  This looks up a symbol in an obarray given its name; if
+it's not found, a new symbol is automatically created with the specified
+name, added to the obarray, and returned.  This is what happens when the
+Lisp reader encounters a symbol (or more precisely, encounters the name
+of a symbol) in some text that it is reading.  There is a standard
+obarray called @code{obarray} that is used for this purpose, although
+the Lisp programmer is free to create his own obarrays and @code{intern}
+symbols in them.
+Note that, once a symbol is in an obarray, it stays there until
+something is done about it, and the standard obarray @code{obarray}
+always stays around, so once you use any particular variable name, a
+corresponding symbol will stay around in @code{obarray} until you exit
+XEmacs.
+Note that @code{obarray} itself is a variable, and as such there is a
+symbol in @code{obarray} whose name is @code{"obarray"} and which
+contains @code{obarray} as its value.
+Note also that this call to @code{intern} occurs only when in the Lisp
+reader, not when the code is executed (at which point the symbol is
+already around, stored as such in the definition of the function).
+You can create your own obarray using @code{make-vector} (this is
+horrible but is an artifact) and intern symbols into that obarray.
+Doing that will result in two or more symbols with the same name.
+However, at most one of these symbols is in the standard @code{obarray}:
+You cannot have two symbols of the same name in any particular obarray.
+Note that you cannot add a symbol to an obarray in any fashion other
+than using @code{intern}: i.e. you can't take an existing symbol and put
+it in an existing obarray.  Nor can you change the name of an existing
+symbol. (Since obarrays are vectors, you can violate the consistency of
+things by storing directly into the vector, but let's ignore that
+possibility.)
+Usually symbols are created by @code{intern}, but if you really want,
+you can explicitly create a symbol using @code{make-symbol}, giving it
+some name.  The resulting symbol is not in any obarray (i.e. it is
+@dfn{uninterned}), and you can't add it to any obarray.  Therefore its
+primary purpose is as a symbol to use in macros to avoid namespace
+pollution.  It can also be used as a carrier of information, but cons
+cells could probably be used just as well.
+You can also use @code{intern-soft} to look up a symbol but not create
+a new one, and @code{unintern} to remove a symbol from an obarray.  This
+returns the removed symbol. (Remember: You can't put the symbol back
+into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
+in an obarray.
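+The lookup-or-create behaviour of @code{intern}, and the lookup-only
+behaviour of @code{intern-soft}, can be sketched with a small chained
+hash table in C.  This is a minimal illustration, not the code in
+@file{symbols.c}: the hash function, the table size of 31, and all
+@code{toy_} names are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_symbol {
    char *name;
    struct toy_symbol *next;    /* hidden chain through one bucket */
};

enum { TOY_OBARRAY_SIZE = 31 }; /* a prime number of buckets */

struct toy_obarray {
    struct toy_symbol *bucket[TOY_OBARRAY_SIZE];
};

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned toy_hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % TOY_OBARRAY_SIZE;
}

/* Like intern-soft: look the name up, returning NULL if absent. */
static struct toy_symbol *toy_intern_soft(struct toy_obarray *ob,
                                          const char *name)
{
    struct toy_symbol *s;
    for (s = ob->bucket[toy_hash(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Like intern: look the name up, creating and adding a new symbol
   when none exists yet. */
static struct toy_symbol *toy_intern(struct toy_obarray *ob,
                                     const char *name)
{
    unsigned h;
    struct toy_symbol *s = toy_intern_soft(ob, name);
    if (s != NULL)
        return s;

    s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = malloc(strlen(name) + 1);
    assert(s->name != NULL);
    strcpy(s->name, name);

    h = toy_hash(name);
    s->next = ob->bucket[h];    /* push onto the front of the bucket chain */
    ob->bucket[h] = s;
    return s;
}
```

+Interning the same name twice in one obarray yields the same symbol,
+whereas interning it in a second obarray yields a distinct symbol with
+the same name.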
+@node Symbol Values,  , Obarrays, Symbols and Variables
+@section Symbol Values
+@cindex symbol values
+@cindex values, symbol
+The value field of a symbol normally contains a Lisp object.  However,
+a symbol can be @dfn{unbound}, meaning that it logically has no value.
+This is internally indicated by storing a special Lisp object, called
+@dfn{the unbound marker} and stored in the global variable
+@code{Qunbound}.  The unbound marker is of a special Lisp object type
+called @dfn{symbol-value-magic}.  It is impossible for the Lisp
+programmer to directly create or access any object of this type.
+@strong{You must not let any ``symbol-value-magic'' object escape to
+the Lisp level.}  Printing any of these objects will cause the message
+@samp{INTERNAL EMACS BUG} to appear as part of the print representation.
+(You may see this normally when you call @code{debug_print()} from the
+debugger on a Lisp object.) If you let one of these objects escape to
+the Lisp level, you will violate a number of assumptions contained in
+the C code and make the unbound marker not function right.
+When a symbol is created, its value field (and function field) are set
+to @code{Qunbound}.  The Lisp programmer can restore these conditions
+later using @code{makunbound} or @code{fmakunbound}, and can query to
+see whether the value or function fields are @dfn{bound} (i.e. have a
+value other than @code{Qunbound}) using @code{boundp} and
+@code{fboundp}.  The fields are set to a normal Lisp object using
+@code{set} (or @code{setq}) and @code{fset}.
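+The role of the unbound marker as a sentinel can be sketched in C as
+follows.  This is a toy model and not the real representation: objects
+are small structs, the marker is recognised by its address, and all
+@code{toy_} names are hypothetical.  Where the real code would signal an
+error for an unbound variable, the sketch returns @code{NULL}.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object {
    int payload;
};

/* The sentinel: only its address matters, never its contents. */
static const struct toy_object toy_unbound_marker = { 0 };
#define TOY_QUNBOUND (&toy_unbound_marker)

struct toy_sym {
    const struct toy_object *value;
    const struct toy_object *function;
};

/* A fresh symbol has both fields unbound. */
static void toy_make_symbol(struct toy_sym *s)
{
    s->value = TOY_QUNBOUND;
    s->function = TOY_QUNBOUND;
}

static int toy_boundp(const struct toy_sym *s)
{
    return s->value != TOY_QUNBOUND;
}

static int toy_fboundp(const struct toy_sym *s)
{
    return s->function != TOY_QUNBOUND;
}

/* Storing the marker through the ordinary setter is forbidden. */
static void toy_set(struct toy_sym *s, const struct toy_object *v)
{
    assert(v != TOY_QUNBOUND);
    s->value = v;
}

static void toy_makunbound(struct toy_sym *s)
{
    s->value = TOY_QUNBOUND;
}

/* The accessor never hands the marker itself to its caller. */
static const struct toy_object *toy_symbol_value(const struct toy_sym *s)
{
    return toy_boundp(s) ? s->value : NULL;
}
```

+Callers go through @code{toy_symbol_value()} instead of reading the
+field directly, which is the same discipline the text asks for with the
+real value cell.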
+Other symbol-value-magic objects are used as special markers to
+indicate variables that have non-normal properties.  This includes any
+variables that are tied into C variables (setting the variable magically
+sets some global variable in the C code, and likewise for retrieving the
+variable's value), variables that magically tie into slots in the
+current buffer, variables that are buffer-local, etc.  The
+symbol-value-magic object is stored in the value cell in place of
+a normal object, and the code to retrieve a symbol's value
+(i.e. @code{symbol-value}) knows how to do special things with them.
+This means that you should not just fetch the value cell directly if you
+want a symbol's value.
+The exact workings of this are rather complex and involved and are
+well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
+@file{lisp.h}.
+@node Buffers, Text, Symbols and Variables, Top
+@chapter Buffers
+@cindex buffers
+@menu
+* Introduction to Buffers::     A buffer holds a block of text such as a file.
+* Buffer Lists::                Keeping track of all buffers.
+* Markers and Extents::         Tagging locations within a buffer.
+* The Buffer Object::           The Lisp object corresponding to a buffer.
+@end menu
+@node Introduction to Buffers, Buffer Lists, Buffers, Buffers
+@section Introduction to Buffers
+@cindex buffers, introduction to
+A buffer is logically just a Lisp object that holds some text.
+In this, it is like a string, but a buffer is optimized for
+frequent insertion and deletion, while a string is not.  Furthermore:
+@enumerate
+@item
+Buffers are @dfn{permanent} objects, i.e. once you create them, they
+remain around, and need to be explicitly deleted before they go away.
+@item
+Each buffer has a unique name, which is a string.  Buffers are
+normally referred to by name.  In this respect, they are like
+symbols.
+@item
+Buffers have a default insertion position, called @dfn{point}.
+Inserting text (unless you explicitly give a position) goes at point,
+and moves point forward past the text.  This is what is going on when
+you type text into Emacs.
+@item
+Buffers have lots of extra properties associated with them.
+@item
+Buffers can be @dfn{displayed}.  What this means is that there
+exist a number of @dfn{windows}, which are objects that correspond
+to some visible section of your display, and each window has
+an associated buffer, and the current contents of the buffer
+are shown in that section of the display.  The redisplay mechanism
+(which takes care of doing this) knows how to look at the
+text of a buffer and come up with some reasonable way of displaying
+this.  Many of the properties of a buffer control how the
+buffer's text is displayed.
+@item
+One buffer is distinguished and called the @dfn{current buffer}.  It is
+stored in the variable @code{current_buffer}.  Buffer operations operate
+on this buffer by default.  When you are typing text into a buffer, the
+buffer you are typing into is always @code{current_buffer}.  Switching
+to a different window changes the current buffer.  Note that Lisp code
+can temporarily change the current buffer using @code{set-buffer} (often
+enclosed in a @code{save-excursion} so that the former current buffer
+gets restored when the code is finished).  However, calling
+@code{set-buffer} will NOT cause a permanent change in the current
+buffer.  The reason for this is that the top-level event loop sets
+@code{current_buffer} to the buffer of the selected window, each time
+it finishes executing a user command.
+@end enumerate
+Make sure you understand the distinction between @dfn{current buffer}
+and @dfn{buffer of the selected window}, and the distinction between
+@dfn{point} of the current buffer and @dfn{window-point} of the selected
+window. (This latter distinction is explained in detail in the section
+on windows.)
+@node Buffer Lists, Markers and Extents, Introduction to Buffers, Buffers
+@section Buffer Lists
+@cindex buffer lists
+Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
+they remain around until explicitly deleted.  This entails that there is
+a list of all the buffers in existence.  This list is actually an
+assoc-list (mapping from the buffer's name to the buffer) and is stored
+in the global variable @code{Vbuffer_alist}.
+The order of the buffers in the list is important: the buffers are
+ordered approximately from most-recently-used to least-recently-used.
+Switching to a buffer using @code{switch-to-buffer},
+@code{pop-to-buffer}, etc. and switching windows using
+@code{other-window}, etc.  usually brings the new current buffer to the
+front of the list.  @code{switch-to-buffer}, @code{other-buffer},
+etc. look at the beginning of the list to find an alternative buffer to
+suggest.  You can also explicitly move a buffer to the end of the list
+using @code{bury-buffer}.
+In addition to the global ordering in @code{Vbuffer_alist}, each frame
+has its own ordering of the list.  These lists always contain the same
+elements as in @code{Vbuffer_alist} although possibly in a different
+order.  @code{buffer-list} normally returns the list for the selected
+frame.  This allows you to work in separate frames without things
+interfering with each other.
+The standard way to look up a buffer given a name is
+@code{get-buffer}, and the standard way to create a new buffer is
+@code{get-buffer-create}, which looks up a buffer with a given name,
+creating a new one if necessary.  These operations correspond exactly
+with the symbol operations @code{intern-soft} and @code{intern},
+respectively.  You can also force a new buffer to be created using
+@code{generate-new-buffer}, which takes a name and (if necessary) makes
+a unique name from this by appending a number, and then creates the
+buffer.  This is basically like the symbol operation @code{gensym}.
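+The three lookup and creation operations can be sketched over a flat
+table of names.  This is a toy model, not the code in @file{buffer.c}:
+a buffer is just an index into an array of names, the most-recently-used
+ordering is not modelled, the @code{<N>} suffix format is an assumption
+made for the example, and all @code{toy_} names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { TOY_MAX_BUFFERS = 8, TOY_NAME_LEN = 32 };

static char toy_buffer_names[TOY_MAX_BUFFERS][TOY_NAME_LEN];
static int toy_buffer_count;

/* Like get-buffer: return the buffer's index, or -1 if none exists. */
static int toy_get_buffer(const char *name)
{
    int i;
    for (i = 0; i < toy_buffer_count; i++)
        if (strcmp(toy_buffer_names[i], name) == 0)
            return i;
    return -1;
}

/* Unconditionally add a buffer with this (already unique) name. */
static int toy_make_buffer(const char *name)
{
    assert(toy_buffer_count < TOY_MAX_BUFFERS);
    assert(strlen(name) < TOY_NAME_LEN);
    strcpy(toy_buffer_names[toy_buffer_count], name);
    return toy_buffer_count++;
}

/* Like get-buffer-create: look up, creating only when necessary. */
static int toy_get_buffer_create(const char *name)
{
    int i = toy_get_buffer(name);
    return i >= 0 ? i : toy_make_buffer(name);
}

/* Like generate-new-buffer: always create, appending a number to the
   name when it is already taken. */
static int toy_generate_new_buffer(const char *name)
{
    char candidate[TOY_NAME_LEN];
    int n;

    if (toy_get_buffer(name) < 0)
        return toy_make_buffer(name);
    for (n = 2; ; n++) {
        snprintf(candidate, sizeof candidate, "%s<%d>", name, n);
        if (toy_get_buffer(candidate) < 0)
            return toy_make_buffer(candidate);
    }
}
```

+Repeated calls to @code{toy_get_buffer_create()} with one name return
+the same buffer, while each call to @code{toy_generate_new_buffer()}
+yields a fresh buffer under a new name.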
+@node Markers and Extents, The Buffer Object, Buffer Lists, Buffers
+@section Markers and Extents
+@cindex markers and extents
+@cindex extents, markers and
+Among the things associated with a buffer are things that are
+logically attached to certain buffer positions.  This can be used to
+keep track of a buffer position when text is inserted and deleted, so
+that it remains at the same spot relative to the text around it; to
+assign properties to particular sections of text; etc.  There are two
+such objects that are useful in this regard: they are @dfn{markers} and
+@dfn{extents}.
+A @dfn{marker} is simply a flag placed at a particular buffer
+position, which is moved around as text is inserted and deleted.
+Markers are used for all sorts of purposes, such as the @code{mark} that
+is the other end of textual regions to be cut, copied, etc.
+An @dfn{extent} is similar to two markers plus some associated
+properties, and is used to keep track of regions in a buffer as text is
+inserted and deleted, and to add properties (e.g. fonts) to particular
+regions of text.  The external interface of extents is explained
+elsewhere.
+The important thing here is that markers and extents simply contain
+buffer positions in them as integers, and every time text is inserted or
+deleted, these positions must be updated.  In order to minimize the
+amount of shuffling that needs to be done, the positions in markers and
+extents (there's one per marker, two per extent) are stored in Membpos's.
+This means that they only need to be moved when the text is physically
+moved in memory; since the gap structure tries to minimize this, it also
+minimizes the number of marker and extent indices that need to be
+adjusted.  Look in @file{insdel.c} for the details of how this works.
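+Why memory-based positions reduce the work can be seen in a toy gap
+buffer.  This sketch is not the code in @file{insdel.c}: it uses
+0-based positions, a fixed-size array, a single marker, and the
+convention that a marker exactly at the gap sits just before it.  All
+@code{toy_} names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { TOY_CAP = 32 };

struct toy_buffer {
    char text[TOY_CAP];
    int gap_start;    /* memory index of the first gap byte */
    int gap_size;
    int marker;       /* one marker, stored as a memory index */
};

static void toy_init(struct toy_buffer *b)
{
    b->gap_start = 0;
    b->gap_size = TOY_CAP;
    b->marker = 0;
}

/* Convert between buffer positions (gap ignored) and memory
   positions (gap included). */
static int toy_buf_to_mem(const struct toy_buffer *b, int pos)
{
    return pos > b->gap_start ? pos + b->gap_size : pos;
}

static int toy_mem_to_buf(const struct toy_buffer *b, int mem)
{
    return mem > b->gap_start ? mem - b->gap_size : mem;
}

static char toy_char_at(const struct toy_buffer *b, int pos)
{
    return b->text[pos < b->gap_start ? pos : pos + b->gap_size];
}

/* Move the gap to buffer position POS.  Only here does text move in
   memory, so only here does the marker need adjusting. */
static void toy_move_gap(struct toy_buffer *b, int pos)
{
    int old = b->gap_start;
    int gs = b->gap_size;

    if (pos < old) {
        memmove(b->text + pos + gs, b->text + pos, (size_t)(old - pos));
        if (b->marker > pos && b->marker <= old)
            b->marker += gs;
    } else if (pos > old) {
        memmove(b->text + old, b->text + old + gs, (size_t)(pos - old));
        if (b->marker > old + gs && b->marker <= pos + gs)
            b->marker -= gs;
    }
    b->gap_start = pos;
}

/* Insert C at buffer position POS by filling one byte of the gap. */
static void toy_insert(struct toy_buffer *b, int pos, char c)
{
    assert(b->gap_size > 0);
    toy_move_gap(b, pos);
    b->text[b->gap_start++] = c;
    b->gap_size--;
}
```

+When an insertion happens at the current gap, the marker's stored
+memory index is left untouched, yet converting it back with
+@code{toy_mem_to_buf()} gives the correctly shifted buffer position.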
+One other important distinction is that markers are @dfn{temporary}
+while extents are @dfn{permanent}.  This means that markers disappear as
+soon as there are no more pointers to them, and correspondingly, there
+is no way to determine what markers are in a buffer if you are just
+given the buffer.  Extents remain in a buffer until they are detached
+(which could happen as a result of text being deleted) or the buffer is
+deleted, and primitives do exist to enumerate the extents in a buffer.
+@node The Buffer Object,  , Markers and Extents, Buffers
+@section The Buffer Object
+@cindex buffer object, the
+@cindex object, the buffer
+Buffers contain fields not directly accessible by the Lisp programmer.
+We describe them here, naming them by the names used in the C code.
+Many are accessible indirectly in Lisp programs via Lisp primitives.
+@table @code
+@item name
+The buffer name is a string that names the buffer.  It is guaranteed to
+be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
+Manual}.
+@item save_modified
+This field contains the time when the buffer was last saved, as an
+integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
+Manual}.
+@item modtime
+This field contains the modification time of the visited file.  It is
+set when the file is written or read.  Every time the buffer is written
+to the file, this field is compared to the modification time of the
+file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
+Manual}.
+@item auto_save_modified
+This field contains the time when the buffer was last auto-saved.
+@item last_window_start
+This field contains the @code{window-start} position in the buffer as of
+the last time the buffer was displayed in a window.
+@item undo_list
+This field points to the buffer's undo list.  @xref{Undo,,, lispref,
+XEmacs Lisp Reference Manual}.
+@item syntax_table_v
+This field contains the syntax table for the buffer.  @xref{Syntax
+Tables,,, lispref, XEmacs Lisp Reference Manual}.
+@item downcase_table
+This field contains the conversion table for converting text to lower
+case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
+@item upcase_table
+This field contains the conversion table for converting text to upper
+case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
+@item case_canon_table
+This field contains the conversion table for canonicalizing text for
+case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
+Reference Manual}.
+@item case_eqv_table
+This field contains the equivalence table for case-folding search.
+@xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
+@item display_table
+This field contains the buffer's display table, or @code{nil} if it
+doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
+Reference Manual}.
+@item markers
+This field contains the chain of all markers that currently point into
+the buffer.  Deletion of text in the buffer, and motion of the buffer's
+gap, must check each of these markers and perhaps update it.
+@xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
+@item backed_up
+This field is a flag that tells whether a backup file has been made for
+the visited file of this buffer.
+@item mark
+This field contains the mark for the buffer.  The mark is a marker,
+hence it is also included on the list @code{markers}.  @xref{The Mark,,,
+lispref, XEmacs Lisp Reference Manual}.
+@item mark_active
+This field is non-@code{nil} if the buffer's mark is active.
+@item local_var_alist
+This field contains the association list describing the variables local
+in this buffer, and their values, with the exception of local variables
+that have special slots in the buffer object.  (Those slots are omitted
+from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
+Reference Manual}.
+@item modeline_format
+This field contains a Lisp object which controls how to display the mode
+line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
+Reference Manual}.
+@item base_buffer
+This field holds the buffer's base buffer (if it is an indirect buffer),
+or @code{nil}.
+@end table
+@node Text, Multilingual Support, Buffers, Top
+@chapter Text
+@cindex text
+@menu
+* The Text in a Buffer::        Representation of the text in a buffer.
+* Ibytes and Ichars::           Representation of individual characters.
+* Byte-Char Position Conversion::
+* Searching and Matching::      Higher-level algorithms.
+@end menu
+@node The Text in a Buffer, Ibytes and Ichars, Text, Text
+@section The Text in a Buffer
+@cindex text in a buffer, the
+@cindex buffer, the text in a
+The text in a buffer consists of a sequence of zero or more
+characters.  A @dfn{character} is an integer that logically represents
+a letter, number, space, or other unit of text.  Most of the characters
+that you will typically encounter belong to the ASCII set of characters,
+but there are also characters for various sorts of accented letters,
+special symbols, Chinese and Japanese characters (e.g. Kanji, Katakana,
+etc.), Cyrillic and Greek letters, etc.  The actual number of possible
+characters is quite large.
+For now, we can view a character as some non-negative integer that
+has some shape that defines how it typically appears (e.g. as an
+uppercase A). (The exact way in which a character appears depends on the
+font used to display the character.) The internal type of characters in
+the C code is an @code{Ichar}; this is just an @code{int}, but using a
+symbolic type makes the code clearer.
+Between every character in a buffer is a @dfn{buffer position} or
+@dfn{character position}.  We can speak of the character before or after
+a particular buffer position, and when you insert a character at a
+particular position, all characters after that position end up at new
+positions.  When we speak of the character @dfn{at} a position, we
+really mean the character after the position.  (This schizophrenia
+between a buffer position being ``between'' two characters and ``on'' a
+character is rampant in Emacs.)
+Buffer positions are numbered starting at 1.  This means that
+position 1 is before the first character, and position 0 is not
+valid.  If there are N characters in a buffer, then buffer
+position N+1 is after the last one, and position N+2 is not valid.
+The internal makeup of the Ichar integer varies depending on whether
+we have compiled with MULE support.  If not, the Ichar integer is an
+8-bit integer with possible values from 0 - 255.  0 - 127 are the
+standard ASCII characters, while 128 - 255 are the characters from the
+ISO-8859-1 character set.  If we have compiled with MULE support, an
+Ichar is a 19-bit integer, with the various bits having meanings
+according to a complex scheme that will be detailed later.  The
+characters numbered 0 - 255 still have the same meanings as for the
+non-MULE case, though.
+Internally, the text in a buffer is represented in a fairly simple
+fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
+in the middle.  Although the gap is of some substantial size in bytes,
+there is no text contained within it: From the perspective of the text
+in the buffer, it does not exist.  The gap logically sits at some buffer
+position, between two characters (or possibly at the beginning or end of
+the buffer).  Insertion of text in a buffer at a particular position is
+always accomplished by first moving the gap to that position
+(i.e. through some block moving of text), then writing the text into the
+beginning of the gap, thereby shrinking the gap.  If the gap shrinks
+down to nothing, a new gap is created. (What actually happens is that a
+new gap is ``created'' at the end of the buffer's text, which requires
+nothing more than changing a couple of indices; then the gap is
+``moved'' to the position where the insertion needs to take place by
+moving up in memory all the text after that position.)  Similarly,
+deletion occurs by moving the gap to the place where the text is to be
+deleted, and then simply expanding the gap to include the deleted text.
+(@dfn{Expanding} and @dfn{shrinking} the gap as just described means
+just that the internal indices that keep track of where the gap is
+located are changed.)
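+The gap mechanics just described can be sketched in a few dozen lines of
+C.  This is a simplified illustration only, not the XEmacs code: the
+names @code{gap_buffer}, @code{gb_move_gap}, @code{gb_insert}, and
+@code{gb_delete} are hypothetical, offsets are 0-based for brevity, and
+the growth increment of 64 bytes is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified gap buffer; all names are illustrative. */
struct gap_buffer
{
  unsigned char *mem;   /* block: text before gap, gap, text after gap */
  size_t gap_start;     /* 0-based memory offset of the first gap byte */
  size_t gap_size;      /* number of unused bytes in the gap */
  size_t text_len;      /* number of bytes of real text */
};

static void
gb_init (struct gap_buffer *gb, size_t capacity)
{
  gb->mem = malloc (capacity);
  assert (gb->mem != NULL);
  gb->gap_start = 0;
  gb->gap_size = capacity;
  gb->text_len = 0;
}

/* Move the gap so that it begins at text offset POS (0-based). */
static void
gb_move_gap (struct gap_buffer *gb, size_t pos)
{
  assert (pos <= gb->text_len);
  if (pos < gb->gap_start)
    /* Slide the text between POS and the gap to the far side of the gap. */
    memmove (gb->mem + pos + gb->gap_size, gb->mem + pos,
             gb->gap_start - pos);
  else if (pos > gb->gap_start)
    /* Slide the text between the gap and POS down to the gap's old start. */
    memmove (gb->mem + gb->gap_start,
             gb->mem + gb->gap_start + gb->gap_size, pos - gb->gap_start);
  gb->gap_start = pos;
}

/* Insert LEN bytes at text offset POS; the gap shrinks by LEN. */
static void
gb_insert (struct gap_buffer *gb, size_t pos, const char *s, size_t len)
{
  if (len > gb->gap_size)
    {
      /* Gap too small: move it to the end of the text and enlarge the
         block there, which only extends the gap. */
      size_t extra = len + 64;
      gb_move_gap (gb, gb->text_len);
      gb->mem = realloc (gb->mem, gb->text_len + gb->gap_size + extra);
      assert (gb->mem != NULL);
      gb->gap_size += extra;
    }
  gb_move_gap (gb, pos);
  memcpy (gb->mem + gb->gap_start, s, len);
  gb->gap_start += len;
  gb->gap_size -= len;
  gb->text_len += len;
}

/* Delete LEN bytes starting at text offset POS; the gap grows by LEN. */
static void
gb_delete (struct gap_buffer *gb, size_t pos, size_t len)
{
  assert (pos + len <= gb->text_len);
  gb_move_gap (gb, pos);
  gb->gap_size += len;  /* the deleted bytes simply become part of the gap */
  gb->text_len -= len;
}

/* Copy the text, skipping the gap, into OUT (room for text_len + 1). */
static void
gb_contents (const struct gap_buffer *gb, char *out)
{
  memcpy (out, gb->mem, gb->gap_start);
  memcpy (out + gb->gap_start, gb->mem + gb->gap_start + gb->gap_size,
          gb->text_len - gb->gap_start);
  out[gb->text_len] = '\0';
}
```

+Note that @code{gb_delete} copies no text at all once the gap is in
+place; only the two bookkeeping fields change, which is the sense of
+``expanding'' the gap used above.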
+Note that the total amount of memory allocated for a buffer's text never
+decreases while the buffer is live.  Therefore, if you load up a
+20-megabyte file and then delete all but one character, there will be a
+20-megabyte gap, which won't get any smaller (except by inserting
+characters back again).  Once the buffer is killed, the memory allocated
+for the buffer text will be freed, but it will still be sitting on the
+heap, taking up virtual memory, and will not be released back to the
+operating system. (However, if you have compiled XEmacs with rel-alloc,
+the situation is different.  In this case, the space @emph{will} be
+released back to the operating system.  However, this tends to result in a
+noticeable speed penalty.)
+Astute readers may notice that the text in a buffer is represented as
+an array of @emph{bytes}, while (at least in the MULE case) an Ichar is
+a 19-bit integer, which clearly cannot fit in a byte.  This means (of
+course) that the text in a buffer uses a different representation from
+an Ichar: specifically, the 19-bit Ichar becomes a series of one to
+four bytes.  The conversion between these two representations is complex
+and will be described later.
+In the non-MULE case, everything is very simple: An Ichar
+is an 8-bit value, which fits neatly into one byte.
+If we are given a buffer position and want to retrieve the
+character at that position, we need to follow these steps:
+@enumerate
+@item
+Pretend there's no gap, and convert the buffer position into a @dfn{byte
+index} that indexes to the appropriate byte in the buffer's stream of
+textual bytes.  By convention, byte indices begin at 1, just like buffer
+positions.  In the non-MULE case, byte indices and buffer positions are
+identical, since one character equals one byte.
+@item
+Convert the byte index into a @dfn{memory index}, which takes the gap
+into account.  The memory index is a direct index into the block of
+memory that stores the text of a buffer.  This basically just involves
+checking to see if the byte index is past the gap, and if so, adding the
+size of the gap to it.  By convention, memory indices begin at 1, just
+like buffer positions and byte indices, and when referring to the
+position that is @dfn{at} the gap, we always use the memory position at
+the @emph{beginning}, not at the end, of the gap.
+@item
+Fetch the appropriate bytes at the determined memory position.
+@item
+Convert these bytes into an Ichar.
+@end enumerate
+In the non-MULE case, steps 3 and 4 boil down to a simple one-byte
+memory access.
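+For the non-MULE case, steps 2 through 4 might look roughly like the
+following.  This is a sketch under stated assumptions, not the actual
+macros: the type and function names are hypothetical, and the use of
+@code{>=} when fetching is inferred from the convention that the
+character ``at'' a position is the one @emph{after} it.

```c
#include <assert.h>
#include <stddef.h>

typedef long Bytebpos_sketch; /* 1-based index into the text, gap ignored */
typedef long Membpos_sketch;  /* 1-based index into the memory block */

/* Hypothetical view of a buffer's text storage. */
struct text_sketch
{
  const unsigned char *mem;   /* memory block, gap included */
  Bytebpos_sketch gap_pos;    /* byte index at which the gap sits */
  long gap_size;              /* size of the gap in bytes */
};

/* Step 2: account for the gap.  A position at the gap maps to the
   beginning of the gap, so only positions strictly past it shift. */
static Membpos_sketch
bytepos_to_mempos (const struct text_sketch *t, Bytebpos_sketch pos)
{
  return pos > t->gap_pos ? pos + t->gap_size : pos;
}

/* Steps 3 and 4, non-MULE: fetch the byte *after* position POS.  The
   character after a position at the gap is stored just past the gap,
   so the comparison here is >= rather than >. */
static int
char_at (const struct text_sketch *t, Bytebpos_sketch pos)
{
  long offset = (pos >= t->gap_pos ? pos + t->gap_size : pos) - 1;
  return t->mem[offset];      /* offset is 0-based into the C array */
}
```

+With the text @samp{abcdef} and a three-byte gap sitting at byte index
+3 (between @samp{b} and @samp{c}), byte index 4 maps to memory index 7.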
+Note that we have defined three types of positions in a buffer:
+@enumerate
+@item
+@dfn{buffer positions} or @dfn{character positions}, typedef @code{Charbpos}
+@item
+@dfn{byte indices}, typedef @code{Bytebpos}
+@item
+@dfn{memory indices}, typedef @code{Membpos}
+@end enumerate
+All three typedefs are just @code{int}s, but defining them this way makes
+things a lot clearer.
+Most code works with buffer positions.  In particular, all Lisp code
+that refers to text in a buffer uses buffer positions.  Lisp code does
+not know that byte indices or memory indices exist.
+Finally, we have a typedef for the bytes in a buffer.  This is an
+@code{Ibyte}, which is an @code{unsigned char}.  Referring to them as
+Ibytes underscores the fact that we are working with a string of bytes
+in the internal Emacs buffer representation rather than in one of a
+number of possible alternative representations (e.g. EUC-encoded text,
+etc.).
+@node Ibytes and Ichars, Byte-Char Position Conversion, The Text in a Buffer, Text
+@section Ibytes and Ichars
+@cindex Ibytes and Ichars
+@cindex Ichars, Ibytes and
+Not yet documented.
+@node Byte-Char Position Conversion, Searching and Matching, Ibytes and Ichars, Text
+@section Byte-Char Position Conversion
+@cindex byte-char position conversion
+@cindex position conversion, byte-char
+@cindex conversion, byte-char position
+Oct 2004:
+This is what I wrote when describing the previous algorithm:
+@quotation
+The basic algorithm we use is to keep track of a known region of
+characters in each buffer, all of which are of the same width.  We keep
+track of the boundaries of the region in both Charbpos and Bytebpos
+coordinates and also keep track of the char width, which is 1 - 4 bytes.
+If the position we're translating is not in the known region, then we
+invoke a function to update the known region to surround the position in
+question.  This assumes locality of reference, which is usually the
+case.
+Note that the function to update the known region can be simple or
+complicated depending on how much information we cache.  In addition to
+the known region, we always cache the correct conversions for point,
+BEGV, and ZV, and in addition to this we cache 16 positions where the
+conversion is known.  We only look in the cache or update it when we
+need to move the known region more than a certain amount (currently 50
+chars), and then we throw away a "random" value and replace it with the
+newly calculated value.
+Finally, we maintain an extra flag that tracks whether the buffer is
+entirely ASCII, to speed up the conversions even more.  This flag is
+actually of dubious value because in an entirely-ASCII buffer the known
+region will always span the entire buffer (in fact, we update the flag
+based on this fact), and so all we're saving is a few machine cycles.
+A potentially smarter method than what we do with known regions and
+cached positions would be to keep some sort of pseudo-extent layer over
+the buffer; maybe keep track of the charbpos/bytebpos correspondence at
+the beginning of each line, which would allow us to do a binary search
+over the pseudo-extents to narrow things down to the correct line, at
+which point you could use a linear movement method.  This would also
+mesh well with efficiently implementing a line-numbering scheme.
+However, you have to weigh the amount of time spent updating the cache
+vs. the savings that result from it.  In reality, we modify the buffer
+far less often than we access it, so a cache of this sort that provides
+guaranteed LOG (N) performance (or perhaps N * LOG (N), if we set a
+maximum on the cache size) would indeed be a win, particularly in very
+large buffers.  If we ever implement this, we should probably set a
+reasonably high minimum below which we use the old method, because the
+time spent updating the fancy cache would likely become dominant when
+making buffer modifications in smaller buffers.
+Note also that we have to multiply or divide by the char width in order
+to convert the positions.  We do some tricks to avoid ever actually
+having to do a multiply or divide, because that is typically an
+expensive operation (esp. divide).  Multiplying or dividing by 1, 2, or
+4 can be implemented simply as a shift left or shift right, and we keep
+track of a shifter value (0, 1, or 2) indicating how much to shift.
+Multiplying by 3 can be implemented by doubling and then adding the
+original value.  Dividing by 3, alas, cannot be implemented in any
+simple shift/subtract method, as far as I know; so we just do a table
+lookup.  For simplicity, we use a table of size 128K, which indexes the
+"divide-by-3" values for the first 64K non-negative numbers. (Note that
+we can increase the size up to 384K, i.e. indexing the first 192K
+non-negative numbers, while still using shorts in the array.) This also
+means that the size of the known region can be at most 64K for
+width-three characters.
+@end quotation
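+The multiply and divide tricks mentioned in the quotation can be
+illustrated as follows.  This is only a sketch of the idea, with
+hypothetical names (@code{div3_table}, @code{chars_to_bytes},
+@code{bytes_to_chars}); it uses a @code{switch} on the width where the
+text describes a stored shifter value, and it is not taken from the old
+implementation.

```c
#include <assert.h>
#include <stddef.h>

#define DIV3_TABLE_SIZE 65536  /* the first 64K non-negative numbers */

/* 64K entries of type short: 128K bytes where short is two bytes. */
static unsigned short div3_table[DIV3_TABLE_SIZE];

/* Fill the lookup table once: div3_table[n] == n / 3. */
static void
init_div3_table (void)
{
  size_t n;
  for (n = 0; n < DIV3_TABLE_SIZE; n++)
    div3_table[n] = (unsigned short) (n / 3);
}

/* Characters to bytes for a run of WIDTH-byte characters (WIDTH 1-4). */
static long
chars_to_bytes (long nchars, int width)
{
  switch (width)
    {
    case 1: return nchars;
    case 2: return nchars << 1;
    case 3: return (nchars << 1) + nchars;  /* double, then add original */
    default: return nchars << 2;
    }
}

/* Bytes to characters for a run of WIDTH-byte characters (WIDTH 1-4). */
static long
bytes_to_chars (long nbytes, int width)
{
  switch (width)
    {
    case 1: return nbytes;
    case 2: return nbytes >> 1;
    case 3:
      /* No cheap shift sequence for division by 3: use the table,
         which limits the size of a width-3 run. */
      assert (nbytes >= 0 && nbytes < DIV3_TABLE_SIZE);
      return div3_table[nbytes];
    default: return nbytes >> 2;
    }
}
```

+The assertion in the width-3 case corresponds to the size limit on the
+known region mentioned at the end of the quotation.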
+Unfortunately, it turned out that the implementation had serious problems
+which had never been corrected.  In particular, the known region had a
+large tendency to become zero-length and stay that way.
+So I decided to port the algorithm from FSF 21.3, in markers.c.
+This algorithm is fairly simple.  Instead of using markers I kept the cache
+array of known positions from the previous implementation.
+Basically, we keep a number of positions cached:
 @itemize @bullet
-@item lisp objects
+@item
-@item other memory blocks (C structures, arrays. etc)
+the actual end of the buffer
+@item
+the beginning and end of the accessible region
+@item
+the value of point
+@item
+the position of the gap
+@item
+the last value we computed
+@item
+a set of positions that are "far away" from previously computed positions
+(5000 chars currently; #### perhaps should be smaller)
 @end itemize
-We end up with one @code{pdump_block_list_elt} per object group (arrays
+For each position, we @code{CONSIDER()} it.  This means:
-of C structs are kept together) which includes a pointer to the first
-object of the group, the per-object size and the count of objects in the
+@itemize @bullet
-group, along with some other information which is initialized later.
+@item
+If the position is what we're looking for, return it directly.
-These entries are linked together in @code{pdump_block_list} structures
+@item
-and can be enumerated thru either:
+Starting with the beginning and end of the buffer, we successively
+compute the smallest enclosing range of known positions.  If at any
+point we discover that this range has the same byte and char length
+(i.e. is entirely single-byte), then our computation is trivial.
+@item
+If at any point we get a small enough range (50 chars currently),
+stop considering further positions.
+@end itemize
+Otherwise, once we have an enclosing range, see which side is closer, and
+iterate until we find the desired value.  As an optimization, I replaced
+the simple loop in FSF with the use of @code{bytecount_to_charcount()},
+@code{charcount_to_bytecount()}, @code{bytecount_to_charcount_down()}, or
+@code{charcount_to_bytecount_down()}. (The latter two I added for this purpose.)
+These scan 4 or 8 bytes at a time through purely single-byte characters.
+If the amount we had to scan was more than our "far away" distance (5000
+characters, see above), then cache the new position.
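+A much reduced sketch of this procedure for the char-to-byte direction
+is shown below.  Everything in it is illustrative: the names
+@code{known_pos} and @code{char_to_byte} are hypothetical, offsets are
+0-based, the scan is a plain byte-at-a-time loop rather than the
+optimized counting functions, and updating the cache afterwards is
+omitted.  It also assumes, for the sake of the example, an encoding in
+which every byte of a character after the first is 0xA0 or greater.

```c
#include <assert.h>
#include <stddef.h>

struct known_pos { long charpos; long bytepos; };  /* 0-based offsets */

/* Assumption for this sketch: bytes >= 0xA0 continue a character,
   all other bytes begin one. */
static int
starts_char (unsigned char b)
{
  return b < 0xA0;
}

/* Convert character offset TARGET to a byte offset in TEXT (NBYTES
   bytes, NCHARS chars), using CACHE to narrow the range to scan. */
static long
char_to_byte (const unsigned char *text, long nbytes, long nchars,
              const struct known_pos *cache, int ncache, long target)
{
  struct known_pos lo = { 0, 0 };
  struct known_pos hi;
  long c, b;
  int i;

  assert (target >= 0 && target <= nchars);
  hi.charpos = nchars;
  hi.bytepos = nbytes;

  for (i = 0; ; i++)
    {
      /* Same length in bytes and chars: the range is all single-byte. */
      if (hi.bytepos - lo.bytepos == hi.charpos - lo.charpos)
        return lo.bytepos + (target - lo.charpos);
      /* Out of candidates, or the range is already small enough. */
      if (i == ncache || hi.charpos - lo.charpos <= 50)
        break;
      /* "Consider" one cached position. */
      if (cache[i].charpos == target)
        return cache[i].bytepos;
      if (cache[i].charpos < target && cache[i].charpos > lo.charpos)
        lo = cache[i];
      else if (cache[i].charpos > target && cache[i].charpos < hi.charpos)
        hi = cache[i];
    }

  if (target - lo.charpos <= hi.charpos - target)
    {
      /* The lower bound is closer: scan forward from it. */
      c = lo.charpos, b = lo.bytepos;
      while (c < target)
        {
          b++;
          while (b < nbytes && !starts_char (text[b]))
            b++;
          c++;
        }
    }
  else
    {
      /* The upper bound is closer: scan backward from it. */
      c = hi.charpos, b = hi.bytepos;
      while (c > target)
        {
          b--;
          while (!starts_char (text[b]))
            b--;
          c--;
        }
    }
  return b;
}
```

+A fuller version would record the result in the cache whenever the scan
+covered more than the ``far away'' distance, as described above.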
+#### Things to do:
+@itemize @bullet
+@item
+Look at the most recent GNU Emacs to see whether anything has changed.
+@item
+Think about whether it makes sense to try to implement some sort of
+known region or list of "known regions", like we had before.  This would
+be a region of entirely single-byte characters that we can check very
+quickly. (Previously I used a range of same-width characters of any
+size; but this adds extra complexity and slows down the scanning, and is
+probably not worth it.) As part of the scanning process in
+@code{bytecount_to_charcount()} et al, we skip over chunks of entirely
+single-byte chars, so it should be easy to remember the last one.
+Presumably what we should do is keep track of the largest known surrounding
+entirely-single-byte region for each of the cache positions as well as
+perhaps the last-cached position.  We want to be careful not to get bitten
+by the previous problem of having the known region getting reset too
+often.  If we implement this, we might well want to continue scanning
+some distance past the desired position (maybe 300-1000 bytes) if we are
+in a single-byte range so that we won't end up expanding the known range
+one position at a time and entering the function each time.
+@item
+Think about whether it makes sense to keep the position cache sorted.
+This would allow it to be larger and finer-grained in its positions.
+Note that with FSF's use of markers, they were sorted, but this
+was not really made good use of.  With an array, we can do binary searching
+to quickly find the smallest range.  We would probably want to make use of
+the gap-array code in extents.c.
+@end itemize
+Note that FSF's algorithm checked @strong{ALL} markers, not just the ones cached
+by this algorithm.  This includes markers created by the user as well as
+both ends of any overlays.  We could do similarly, and our extents could
+keep both byte and character positions rather than just the former.  (But
+this would probably be overkill.  We should just use our cache instead.
+Any place an extent was set was surely already visited by the char<-->byte
+conversion routines.)
+@node Searching and Matching,  , Byte-Char Position Conversion, Text
+@section Searching and Matching
+@cindex searching
+@cindex matching
+Very incomplete, limited to a brief introduction.
+People find the searching and matching code difficult to understand.
+And indeed, the details are hard.  However, the basic structures are not
+so complex.  First, there's a hard question with a simple answer.  What
+about Mule?  The answer here is that it turns out that Mule characters
+can be matched byte by byte, so neither the search code nor the regular
+expression code need take much notice of it at all!  Of course, we add
+some special features (such as regular expressions that match only
+certain charsets), but these do not require new concepts.  The main
+exception is that wild-card matches in Mule have to be careful to
+swallow whole characters.  This is handled using the same basic macros
+that are used for buffer and string movements.
+This will also be true if a UTF-8 representation is used for the
+internal encoding.
+The complex algorithms for searching are for simple string searches.  In
+particular, the algorithm used for fast string searching is Boyer-Moore.
+This algorithm is based on the idea that if you have a mismatch at a
+given position, you can precompute where to restart the search.  This
+means that you can often make far fewer than N character
+comparisons, where N is the position at which the match is found, or the
+size of the text if it contains no match.  That's fast!  But it's not
+easy.  You must ``compile'' the search string into a jump table.  See
+the source, @file{search.c}, for more information.
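+To give a feel for the ``compile into a jump table'' step, here is a
+sketch of the Horspool simplification of Boyer-Moore, which keeps only
+a single skip table indexed by byte value.  This is @emph{not} the code
+in @file{search.c}, which is more elaborate and also handles case
+folding; the function name @code{horspool_search} is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT (M bytes) in TEXT
   (N bytes), or -1 if there is none. */
static long
horspool_search (const unsigned char *text, size_t n,
                 const unsigned char *pat, size_t m)
{
  size_t skip[UCHAR_MAX + 1];
  size_t i, pos;

  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* "Compile" the pattern.  By default a failed attempt lets us jump a
     whole pattern length; bytes that occur in the pattern (other than
     in its last position) allow only a shorter jump. */
  for (i = 0; i <= UCHAR_MAX; i++)
    skip[i] = m;
  for (i = 0; i + 1 < m; i++)
    skip[pat[i]] = m - 1 - i;

  pos = 0;
  while (pos <= n - m)
    {
      if (memcmp (text + pos, pat, m) == 0)
        return (long) pos;
      /* Shift by the skip of the text byte aligned with the pattern's
         last position. */
      pos += skip[text[pos + m - 1]];
    }
  return -1;
}
```

+When the byte aligned with the end of the pattern does not occur in the
+pattern at all, the search jumps ahead by the full pattern length, which
+is where the ``fewer than N comparisons'' behavior comes from.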
+Emacs changes the basic algorithms somewhat in order to handle
+case-insensitive searches without a full-blown regular expression.
+Regular expressions, on the other hand, have a trivial search
+implementation: try a match at each position.  (Under POSIX rules, it's
+a bit more complex, because POSIX requires that you find the
+@emph{longest} match in the text.  This means you keep a record of the
+best match so far, and find all the matches.)
+The matching code for regular expressions is quite complex.  First, the
+regular expression itself is compiled.  There are two basic approaches
+that could be taken.  The first is to compile the expression into tables
+to drive a generic finite automaton emulator.  This is the approach
+given in many textbooks (Sedgewick's @emph{Algorithms} and Aho, Sethi,
+and Ullman's @emph{Compilers: Principles, Techniques, and Tools}, aka
+``The Dragon Book'') as well as being used by the @file{lex} family of
+lexical analysis engines.
+Emacs uses a somewhat different technique.  The expression is compiled
+into a form of bytecode, which is interpreted by a special interpreter.
+The interpreter itself basically amounts to an inline implementation of
+the finite automaton emulator.  The advantage of this technique is that
+it's easier to add special features, such as control of case-sensitivity
+via a global variable.
+The compiler is not treated here.  See the source, @file{regex.c}.  The
+interpreter, although it is divided into several functions and looks
+fearsomely complex, is actually quite simple in concept.  After all,
+basically what you're doing there is a strcmp on steroids, right?
+@example
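+int
+strcmp (const char *p,      /* pattern pointer */
+        const char *b)      /* buffer pointer  */
+@{
+  /* advance while the characters agree and the pattern has not ended */
+  while (*p && *p == *b)
+    p++, b++;
+  /* difference at the first mismatch, or 0 if both strings ended */
+  return (unsigned char) *p - (unsigned char) *b;
+@}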
+@end example
+Really, it's no harder than that.  (A bit of a white lie, OK?)
+How does the regexp code generalize this?
 @enumerate
 @item
-the @code{pdump_object_table}, an array of @code{pdump_block_list}, one
+Depending on the pattern, @code{*b} may have a general relationship to
-per lrecord type, indexed by type number.
+@code{*p}.  @emph{I.e.}, direct comparison against @code{*p} is
+generalized to include checks for set membership, and context dependent
-@item
+properties.  This depends on @code{&*b}.  Of course that's meaningless
-the @code{pdump_opaque_data_list}, used for the opaque data which does
+in C, so we use @code{b} directly, instead.
-not include pointers, and hence does not need descriptions.
+@item
-@item
+Although to ensure the algorithm terminates, @code{b} must advance step
-the @code{pdump_desc_table}, which is a vector of
+by step, @code{p} can branch and jump.
-@code{memory_description}/@code{pdump_block_list} pairs, used for
-non-opaque C memory blocks.
+@item
+The information returned is much greater, including information about
+subexpressions.
 @end enumerate
-This uses a marking strategy similar to the garbage collector.  Some
+We'll ignore (3).  (2) is mostly interesting when compiling the regular
-differences though:
+expression.  Now we have
+@example
+@group
+enum operator_t @{
+  accept = 0,
+  exact,
+  any,
+  range,
+  group,       /* actually, these are probably */
+  repeat,      /* turned into conditional code */
+  /* etc */
+@};
+@end group
+@group
+enum status_t @{
+  working = 0,
+  matched,
+  mismatch,
+  end_of_buffer,
+  error
+@};
+@end group
+@group
+struct pattern @{
+  enum operator_t operator;
+  char char_value;
+  char range_table[256];      /* nonzero if the character is in the range */
+  /* etc, etc */
+@};
+@end group
+@group
+enum status_t
+match (struct pattern *p,     /* pattern pointer */
+       char *b)               /* buffer pointer */
+@{
+  enum status_t done = working;
+  while (!(done = match_1_operator (p, b)))
+    @{
+      struct pattern *p1 = p;
+      p = next_p (p, b);
+      b = next_b (p1, b);
+    @}
+  return done;
+@}
+@end group
+@end example
+This format exposes the underlying finite automaton.
+The helper functions @code{match_1_operator}, @code{next_p}, and
+@code{next_b} all have the following structure, except that the
+@samp{next_*} functions decide where to jump (for @samp{p}) and whether
+or not to increment (for @samp{b}), rather than checking for satisfaction
+of a matching condition.
+@example
+enum status_t
+match_1_operator (struct pattern *p, char *b)
+@{
+  if (! *b) return end_of_buffer;
+  switch (p->operator)
+    @{
+    case accept:
+      return matched;
+    case exact:
+      if (*b != p->char_value) return mismatch; else break;
+    case any:
+      break;
+    case range:
+      /* range_table is computed in the regexp_compile function */
+      if (! p->range_table[(unsigned char) *b]) return mismatch;
+      break;
+    /* etc, etc */
+    @}
+  return working;
+@}
+@end example
+Grouping, repetition, and alternation are handled by compiling the
+subexpression and calling @code{match (p->subpattern, b)} recursively.
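+For readers who want something they can compile, here is a
+self-contained variant of the fragments above.  It is a sketch, not the
+code in @file{regex.c}: the names carry a @samp{_sketch} suffix or
+@samp{OP_} prefix to mark them as hypothetical, grouping, repetition,
+and alternation are left out, @code{next_p} and @code{next_b} collapse
+to simple increments, and the accept operator is tested before the
+end-of-buffer check so that a pattern can match up to the very end of
+the buffer.

```c
#include <assert.h>
#include <stddef.h>

enum op_sketch { OP_ACCEPT = 0, OP_EXACT, OP_ANY, OP_RANGE };
enum status_sketch { WORKING = 0, MATCHED, MISMATCH, END_OF_BUFFER };

struct pattern_sketch
{
  enum op_sketch op;
  char char_value;                 /* used by OP_EXACT */
  unsigned char range_table[256];  /* used by OP_RANGE: nonzero = member */
};

/* Test one operator against the character at B. */
static enum status_sketch
match_1_operator (const struct pattern_sketch *p, const char *b)
{
  if (p->op == OP_ACCEPT)
    return MATCHED;                /* accept needs no more input */
  if (!*b)
    return END_OF_BUFFER;
  switch (p->op)
    {
    case OP_EXACT:
      return *b == p->char_value ? WORKING : MISMATCH;
    case OP_ANY:
      return WORKING;
    case OP_RANGE:
      return p->range_table[(unsigned char) *b] ? WORKING : MISMATCH;
    default:
      return MISMATCH;
    }
}

/* Match the operator sequence P (terminated by OP_ACCEPT) at B. */
static enum status_sketch
match (const struct pattern_sketch *p, const char *b)
{
  enum status_sketch done;
  while ((done = match_1_operator (p, b)) == WORKING)
    {
      p++;   /* next_p: no branching in this sketch */
      b++;   /* next_b: every operator here consumes one character */
    }
  return done;
}

/* Helper: mark the characters FROM..TO as members of P's range. */
static void
set_range (struct pattern_sketch *p, char from, char to)
{
  int c;
  for (c = (unsigned char) from; c <= (unsigned char) to; c++)
    p->range_table[c] = 1;
}
```

+A pattern equivalent to the regular expression @samp{a.[0-9]} is then an
+array of four operators: exact @samp{a}, any, a range covering
+@samp{0} through @samp{9}, and accept.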
+In terms of reading the actual code, there are five optimizations
+(obfuscations, if you like) that have been done.
 @enumerate
 @item
-We do not use the mark bit (which does not exist for generic memory blocks
+An explicit "failure stack" has been substituted for recursion.
-anyway); we use a big hash table instead.
+@item
-@item
+The @code{match_1_operator}, @code{next_p}, and @code{next_b} functions
-We do not use the mark function of lrecords but instead rely on the
+are actually inlined into the @code{match} function for efficiency.
-external descriptions.  This happens essentially because we need to
+Then the pointer movement is interspersed with the matching operations.
-follow pointers to generic memory blocks and opaque data in addition to
-Lisp_Object members.
+@item
+If the operator uses buffer context, the buffer pointer movement is
+sometimes implicit in the operations retrieving the context.
+@item
+Some cases are combined into short preparation for individual cases, and
+a "fall-through" into combined code for several cases.
+@item
+The @code{pattern} type is not an explicit @samp{struct}.  Instead, the
+data (including, @emph{e.g.}, @samp{range_table}) is inlined into the
+compiled bytecode.  This leads to bizarre code in the interpreter like
+@example
+case range:
+p += *(p + 1); break;
+@end example
+in @code{next_p}, because the compiled pattern is laid out
+@example
+..., 'range', count, first_8_flags, second_8_flags, ..., next_op, ...
+@end example
 @end enumerate
-This is done by @code{pdump_register_object()}, which handles
+But if you keep your eye on the "switch in a loop" structure, you
-Lisp_Object variables, and @code{pdump_register_block()} which handles
+should be able to understand the parts you need.
-generic memory blocks (C structures, arrays, etc.), which both delegate
-the description management to @code{pdump_register_sub()}.
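+The ``switch in a loop'' shape, with the pattern data inlined into the
+bytecode, can be sketched as follows.  The opcode names, the exact byte
+layout, and the function @code{bc_match} are assumptions made for this
+illustration and do not correspond to the real opcodes in
+@file{regex.c}; there is no failure stack, so nothing here can
+backtrack.

```c
#include <assert.h>

/* Hypothetical opcodes for this sketch. */
enum { BC_ACCEPT = 0, BC_EXACT, BC_ANY, BC_RANGE };

/* Return 1 if the compiled pattern P matches at the start of B, else 0. */
static int
bc_match (const unsigned char *p, const char *b)
{
  for (;;)
    {
      unsigned char c = (unsigned char) *b;
      switch (*p)
        {
        case BC_ACCEPT:
          return 1;
        case BC_EXACT:       /* layout: BC_EXACT, char */
          if (c == 0 || c != p[1])
            return 0;
          p += 2;
          break;
        case BC_ANY:         /* layout: BC_ANY */
          if (c == 0)
            return 0;
          p += 1;
          break;
        case BC_RANGE:       /* layout: BC_RANGE, count, COUNT flag bytes */
          {
            unsigned count = p[1];
            /* Bit (c % 8) of flag byte (c / 8) says whether C is a member. */
            if (c == 0 || c / 8 >= count
                || !(p[2 + c / 8] & (1 << (c % 8))))
              return 0;
            p += 2 + count;  /* skip the inlined table to reach next_op */
          }
          break;
        default:
          return 0;
        }
      b++;                   /* each operator above consumed one character */
    }
}
```

+Here the pattern pointer advance for the range operator depends on the
+count byte stored in the pattern itself, which is the kind of code the
+last item in the list above refers to.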
+@node Multilingual Support, Consoles; Devices; Frames; Windows, Text, Top
+@chapter Multilingual Support
-The hash table doubles as a map object to pdump_block_list_elmt (i.e.
+@cindex Mule character sets and encodings
-allows us to look up a pdump_block_list_elmt with the object it points
+@cindex character sets and encodings, Mule
-to).  Entries are added with @code{pdump_add_block()} and looked up with
+@cindex encodings, Mule character sets and
-@code{pdump_get_block()}.  There is no need for entry removal.  The hash
-value is computed quite simply from the object pointer by
+@emph{NOTE}: There is a great deal of overlapping and redundant
-@code{pdump_make_hash()}.
+information in this chapter.  Ben wrote introductions to Mule issues a
+number of times, each time not realizing that he had already written
-The roots for the marking are:
+another introduction previously.  Hopefully, in time these will all be
+integrated.
+@emph{NOTE}: The information at the top of the source file
+@file{text.c} is more complete than the following, and there is also a
+list of all other places to look for text/I18N-related info.  Also look in
+@file{text.h} for info about the DFC and Eistring APIs.
+Recall that there are two primary ways that text is represented in
+XEmacs.  The @dfn{buffer} representation sees the text as a series of
+bytes (Ibytes), with a variable number of bytes used per character.
+The @dfn{character} representation sees the text as a series of integers
+(Ichars), one per character.  The character representation is a cleaner
+representation from a theoretical standpoint, and is thus used in many
+cases when lots of manipulations on a string need to be done.  However,
+the buffer representation is the standard representation used in both
+Lisp strings and buffers, and because of this, it is the ``default''
+representation that text comes in.  The reason for using this
+representation is that it's compact and is compatible with ASCII.
+@menu
+* Introduction to Multilingual Issues #1::
+* Introduction to Multilingual Issues #2::
+* Introduction to Multilingual Issues #3::
+* Introduction to Multilingual Issues #4::
+* Character Sets::
+* Encodings::
+* Internal Mule Encodings::
+* Byte/Character Types; Buffer Positions; Other Typedefs::
+* Internal Text API's::
+* Coding for Mule::
+* CCL::
+* Microsoft Windows-Related Multilingual Issues::
+* Modules for Internationalization::
+@end menu
+@node Introduction to Multilingual Issues #1, Introduction to Multilingual Issues #2, Multilingual Support, Multilingual Support
+@section Introduction to Multilingual Issues #1
+@cindex introduction to multilingual issues #1
+There is an introduction to these issues in the Lisp Reference manual.
+@xref{Internationalization Terminology,,, lispref, XEmacs Lisp Reference
+Manual}.  Among other documentation that may be of interest to internals
+programmers is ISO-2022 (@pxref{ISO 2022,,, lispref, XEmacs Lisp
+Reference Manual}) and CCL (@pxref{CCL,,, lispref, XEmacs Lisp Reference
+Manual}).
+@node Introduction to Multilingual Issues #2, Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #1, Multilingual Support
+@section Introduction to Multilingual Issues #2
+@cindex introduction to multilingual issues #2
+@subheading Introduction
+This document covers a number of design issues, problems and proposals
+with regard to XEmacs MULE.  First we present some definitions and
+some aspects of the design that have been agreed upon.  Then we present
+some issues and problems that need to be addressed, and then I include a
+proposal of mine to address some of these issues.  When there are other
+proposals, for example from Olivier, these will be appended to the end
+of this document.
+@subheading Definitions and Design Basics
+First, @dfn{text} is defined to be a series of characters which together
+define an utterance or partial utterance in some language.
+Generally, this language is a human language, but it may also be a
+computer language if the computer language uses a representation close
+enough to that of human languages for it to also make sense to call its
+representation text.  Text is opposed to @dfn{binary}, which is a sequence
+of bytes, representing machine-readable but not human-readable data.
+A @dfn{byte} is merely a number within a predefined range, which nowadays is
+nearly always zero to 255.  A @dfn{character} is a unit of text.  What makes
+one character different from another is not always clear-cut.  It is
+generally related to the appearance of the character, although perhaps
+not any possible appearance of that character, but some sort of ideal
+appearance that is assigned to a character.  Whether two characters
+that look very similar are actually the same depends on various
+factors, including political ones and whether the characters are
+used to mean similar sorts of things or behave similarly in similar
+contexts.  In any case, it is not always clearly defined whether two
+characters are actually the same or not.  In practice, however, this
+is more or less agreed upon.
+A @dfn{character set} is just that, a set of one or more characters.
+The set is unique in that there will not be more than one instance of
+the same character in a character set, and logically is unordered,
+although an order is often imposed or suggested for the characters in
+the character set.  We can also define an @dfn{order} on a character
+set, which is a way of assigning a unique number, or possibly a pair of
+numbers, or a triplet of numbers, or even a set of four or more numbers
+to each character in the character set.  Combining a character set
+with an order results in an @dfn{ordered character set}.  In an
+ordered character set, there is an upper limit and a lower limit on the
+possible values that a character, or that any number within the set of
+numbers assigned to a character, can take.  However, the lower limit
+does not have to start at zero or one, or anywhere else in particular,
+nor does the upper limit have to end anywhere particular, and there may
+be gaps within these ranges such that particular numbers or sets of
+numbers do not have a corresponding character, even though they are
+within the upper and lower limits.  For example, @dfn{ASCII} defines a
+very standard ordered character set.  It is normally defined to be 94
+characters in the range 33 through 126 inclusive on both ends, with
+every possible character within this range being actually present in the
+character set.
+Sometimes the ASCII character set is extended to include what are called
+@dfn{non-printing characters}.  Non-printing characters are characters
+which instead of really being displayed in a more or less rectangular
+block, like all other characters, instead indicate certain functions
+typically related to either control of the display upon which the
+characters are being displayed, or have some effect on a communications
+channel that may be currently open and transmitting characters, or may
+change the meaning of future characters as they are being decoded, or
+some other similar function.  You might say that non-printing characters
+are somewhat of a hack because they are a special exception to the
+standard concept of a character as being a printed glyph that has some
+direct correspondence in the non-computer world.
+With non-printing characters in mind, the 94-character ordered character
+set called ASCII is often extended into a 96-character ordered character
+set, also often called ASCII, which includes in addition to the 94
+characters already mentioned, two non-printing characters, one called
+space and assigned the number 32, just below the bottom of the previous
+range, and another called @dfn{delete} or @dfn{rubout}, which is given
+number 127 just above the end of the previous range.  Thus to reiterate,
+the result is a 96-character ordered character set, whose characters
+take the values from 32 to 127 inclusive.  Sometimes ASCII is further
+extended to contain 32 more non-printing characters, which are given the
+numbers zero through 31 so that the result is a 128-character ordered
+character set with characters numbered zero through 127, and with many
+non-printing characters.  Another way to look at this, and the way that
+is normally taken by XEmacs MULE, is that the characters that would be
+in the range zero through 31 in the most extended definition of ASCII,
+instead form their own ordered character set, which is called
+@dfn{control zero}, and consists of 32 characters in the range zero
+through 31.  A similar ordered character set called @dfn{control one} is
+also created, and it contains 32 more non-printing characters in the
+range 128 through 159.  Note that none of these three ordered character
+sets overlaps in any of the numbers assigned to their
+characters, so they can all be used at once.  Note further that the same
+character can occur in more than one character set.  This was shown
+above, for example, in two different ordered character sets we defined,
+one of which we could have called @dfn{ASCII}, and the other
+@dfn{ASCII-extended}, to show that it had been extended by two non-printing
+characters.  Most of the characters in these two character sets are
+shared and present in both of them.
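+The claim that these three ordered character sets do not overlap can be
+checked mechanically.  The following C fragment is only an illustrative
+sketch: the structure @code{ordered_charset}, the table @code{charsets},
+and the helper functions are names invented here, not XEmacs code, and
+each set is reduced to its lower and upper limits.

```c
#include <stdbool.h>
#include <stddef.h>

/* An ordered character set reduced to its numeric range.
   Hypothetical type, for illustration only. */
struct ordered_charset {
    const char *name;
    int low;   /* lowest number assigned to a character */
    int high;  /* highest number assigned to a character */
};

/* The three sets described in the text. */
static const struct ordered_charset charsets[] = {
    { "control-0",      0,   31  },
    { "ascii-extended", 32,  127 },
    { "control-1",      128, 159 },
};

#define NUM_CHARSETS (sizeof charsets / sizeof charsets[0])

/* True if the ranges of A and B share at least one number. */
bool ranges_overlap(const struct ordered_charset *a,
                    const struct ordered_charset *b)
{
    return a->low <= b->high && b->low <= a->high;
}

/* True if no two sets in the table overlap. */
bool all_disjoint(void)
{
    for (size_t i = 0; i < NUM_CHARSETS; i++)
        for (size_t j = i + 1; j < NUM_CHARSETS; j++)
            if (ranges_overlap(&charsets[i], &charsets[j]))
                return false;
    return true;
}

/* Name of the set that owns NUMBER, or NULL if no set does. */
const char *charset_of(int number)
{
    for (size_t i = 0; i < NUM_CHARSETS; i++)
        if (number >= charsets[i].low && number <= charsets[i].high)
            return charsets[i].name;
    return NULL;
}
```

+Because the ranges are disjoint, a bare number is enough to tell which
+of the three sets a character belongs to; a number such as 200 belongs
+to none of them.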
+Note that there is no restriction on the size of the character set, or
+on the numbers that are assigned to characters in an ordered character
+set.  It is often extremely useful to represent a sequence of characters
+as a sequence of bytes, where a byte as defined above is a number in the
+range zero to 255.  An @dfn{encoding} does precisely this.  It is simply
+a mapping from a sequence of characters, possibly augmented with
+information indicating the character set that each of these characters
+belongs to, to a sequence of bytes which represents that sequence of
+characters and no other -- which is to say the mapping is reversible.
+A @dfn{coding system} is a set of rules for encoding a sequence of
+characters augmented with character set information into a sequence of
+bytes, and later performing the reverse operation.  It is frequently
+possible to group coding systems into classes or types based on common
+features.  Typically, for example, a particular coding system class
+may contain a base coding system which specifies some of the rules,
+but leaves the rest unspecified.  Individual members of the coding
+system class are formed by starting with the base coding system, and
+augmenting it with additional rules to produce a particular coding
+system, what you might think of as a sort of variation within a
+theme.
+@subheading XEmacs Specific Definitions
+First of all, in XEmacs, the concept of character is a little different
+from the general definition given above.  For one thing, the character
+set that a character belongs to may or may not be an inherent part of
+the character itself.  In other words, the same character occurring in
+two different character sets may appear in XEmacs as two different
+characters.  This is generally the case now, but we are attempting to
+move in the other direction.  Different proposals may have different
+ideas about exactly the extent to which this change will be carried out.
+The general trend, though, is to represent all information about a
+character other than the character itself, using text properties
+attached to the character.  That way two instances of the same character
+will look the same to lisp code that merely retrieves the character, and
+does not also look at the text properties of that character.  Everyone
+involved is in agreement in doing it this way with all Latin characters,
+and in fact for all characters other than Chinese, Japanese, and Korean
+ideographs.  For those, there may be a difference of opinion.
+A second difference between the general definition of character and the
+XEmacs usage of character is that each character is assigned a unique
+number that distinguishes it from all other characters in the world, or
+at the very least, from all other characters currently existing anywhere
+inside the current XEmacs invocation.  (If there is a case where the
+weaker statement applies, but not the stronger statement, it would
+possibly be with composite characters and any other such characters that
+are created on the sly.)
+This unique number is called the @dfn{character representation} of the
+character, and its particular details are a matter of debate.  There is
+a current standard in use, but it is undoubtedly going to change.
+What has definitely been agreed upon is that it will be an integer, more
+specifically a positive integer, represented with less than or equal to
+31 bits on a 32-bit architecture, and possibly up to 63 bits on a 64-bit
+architecture, with the proviso that any characters whose
+representation would fit on a 64-bit architecture, but not on a 32-bit
+architecture, would be used only for composite characters, and others
+that would satisfy the weak uniqueness property mentioned above, but not
+the strong uniqueness property.
+At this point, it is useful to talk about the different representations
+that a sequence of characters can take.  The simplest representation is
+simply as a sequence of characters, and this is called the @dfn{Lisp
+representation} of text, because it is the representation that Lisp
+programs see.  Other representations include the external
+representation, which refers to any encoding of the sequence of
+characters, using the definition of encoding mentioned above.
+Typically, text in the external representation is used outside of
+XEmacs, for example in files, e-mail messages, web sites, and the like.
+Another representation for a sequence of characters is what I will call
+the @dfn{byte representation}, and it represents the way that XEmacs
+internally represents text in a buffer, or in a string.  Potentially,
+the representation could be different between a buffer and a string, and
+then the terms @dfn{buffer byte representation} and @dfn{string byte
+representation} would be used, but in practice I don't think this will
+occur.  It will be possible, of course, for buffers and strings, or
+particular buffers and particular strings, to contain different
+sub-representations of a single representation.  For example, Olivier's
+1-2-4 proposal allows for three sub-representations of his internal byte
+representation, allowing for characters 1, 2, and 4 bytes wide,
+respectively.  A particular string may be in one
+sub-representation, and a particular buffer in another
+sub-representation, but overall both are following the same byte
+representation.  I do not use the term @dfn{internal representation}
+here, as many people have, because it is potentially ambiguous.
+Another representation is called the @dfn{array of characters
+representation}.  This is a representation on the C-level in which the
+sequence of text is represented, not using the byte representation, but
+by using an array of characters, each represented using the character
+representation.  This sort of representation is often used by redisplay
+because it is more convenient to work with than any of the other
+internal representations.
+The term @dfn{binary representation} may also be heard.  Binary
+representation is used to represent binary data.  When binary data is
+represented in the lisp representation, an equivalence is simply set up
+between bytes zero through 255, and characters zero through 255.  These
+characters come from four character sets, which are from bottom to top,
+control zero, ASCII, control 1, and Latin 1.  Together, they comprise
+256 characters, and are a good mapping for the 256 possible bytes in a
+binary representation.  Binary representation could also be used to
+refer to an external representation of the binary data, which is a
+simple direct byte-to-byte representation.  No internal representation
+should ever be referred to as a binary representation because of
+ambiguity.  The terms character set and coding system were defined
+generally, above.  In XEmacs, the equivalent concepts exist, although
+character set has been shortened to charset, and in fact represents
+specifically an ordered character set.  For each possible charset, and
+for each possible coding system, there is an associated object in
+XEmacs.  These objects will be of type charset and coding system,
+respectively.  Charsets and coding systems are divided into classes, or
+@dfn{types}, the normal term under XEmacs, and all possible charsets and
+coding systems that may be defined must be in one of these types.  If
+you need to create a charset or coding system that is not one of these
+types, you will have to modify the C code to support this new type.
+Some of the existing or soon-to-be-created types are, or will be,
+generic enough so that this shouldn't be an issue.  Note also that the
+byte representation of text and the character representation of a character are
+closely related.  You might say that ideally each is the simplest
+equivalent of the other given the general constraints on each
+representation.
+To be specific, in the current MULE representation,
 @enumerate
 @item
-the @code{staticpro}'ed variables (there is a special
+Characters encode both the character itself and the character set
-@code{staticpro_nodump()} call for protected variables we do not want to
+that it comes from.  These character sets are always assumed to be
-dump).
+representable as an ordered character set of size 96 or of size 96
+by 96, or the trivially-related sizes 94 and 94 by 94.  The only
-@item
+allowable exceptions are the control zero and control one character
-the Lisp_Object variables registered via @code{dump_add_root_lisp_object}
+sets, which are of size 32.  Character sets which do not naturally
-(@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
+have a compatible ordering such as this are shoehorned into an
-@code{dump_add_root_lisp_object()}).
+ordered character set, or possibly two ordered character sets of a
+compatible size.
 @item
-the data-segment memory blocks registered via @code{dump_add_root_block}
+The variable width byte representation was deliberately chosen to
-(for blocks with relocatable pointers), or @code{dump_add_opaque} (for
+allow scanning text forwards and backwards efficiently.  This
-"opaque" blocks with no relocatable pointers; this is just a shortcut
+necessitated dividing the possible bytes into three ranges which
-for calling @code{dump_add_root_block} with a NULL description).
+we shall call A, B, and C.  Range A is used exclusively for
+single-byte characters, which is to say characters that are
-@item
+represented using only one byte.  Multi-byte
-the pointer variables registered via @code{dump_add_root_block_ptr},
+characters are always represented by using one byte from Range B,
-each of which points to a block of heap memory (generally a C structure
+followed by one or more bytes from Range C.  What this means is
-or array).  Note that @code{dump_add_root_block_ptr} is not technically
+that bytes that begin a character are unequivocally distinguished
-necessary, as a pointer variable can be seen as a special case of a
+from bytes that do not begin a character, and therefore there is
-data-segment memory block and registered using
+never a problem scanning backwards and finding the beginning of a
-@code{dump_add_root_block}.  Doing it this way, however, would require
+character.  Note that UTF-8 adopts an approach that is very similar
-another level of static structures declared.  Since pointer variables
+in spirit in that it uses separate ranges for the first byte of a
-are quite common, @code{dump_add_root_block_ptr} is provided for
+multi-byte sequence and for the following bytes in a multi-byte
-convenience.  Note also that internally we have to treat it separately
+sequence.
-from @code{dump_add_root_block} rather than writing the former as a call
+@item
-to the latter, since we don't have support for creating and using memory
+Given the fact that all ordered character sets allowed were
-descriptions on the fly -- they must all be statically declared in the
+essentially 96 characters per dimension, it made perfect sense to
-data-segment.
+make Range C comprise 96 bytes.  With a little more tweaking, the
+currently-standard MULE byte representation was created, and was
+drafted from this.
+@item
+The MULE byte representation defined four basic representations for
+characters, which would take up from one to four bytes,
+respectively.  The MULE character representation thus had the
+following constraints:
+@enumerate
+@item
+Character numbers zero through 255 should represent the
+characters that binary values zero through 255 would be
+mapped onto.  (Note: this was not the case in Kenichi Handa's
+version of this representation, but I changed it.)
+@item
+The four sub-classes of representation in the MULE byte
+representation should correspond to four contiguous
+non-overlapping ranges of characters.
+@item
+The algorithmic conversion between the single character
+represented in the byte representation and in the character
+representation should be as easy as possible.
+@item
+Given the previous constraints, the character representation
+should be as compact as possible, which is to say it should
+use the least number of bits possible.
 @end enumerate
+@end enumerate
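+The backward scan that the three byte ranges make possible can be
+sketched in C as follows.  The numeric boundaries are an assumption made
+for illustration (Range A as 0x00 through 0x7F, Range B as 0x80 through
+0x9F, and a 96-byte Range C as 0xA0 through 0xFF); the text above fixes
+only the size of Range C.  The function names are likewise invented
+here and are not XEmacs macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed ranges, for illustration only:
     A (single-byte characters)               0x00..0x7F
     B (first byte of a multi-byte character) 0x80..0x9F
     C (following bytes, 96 values)           0xA0..0xFF  */
bool byte_in_range_c(unsigned char b)
{
    return b >= 0xA0;
}

/* Return the index of the first byte of the character that contains
   buf[pos].  Bytes from Range C never begin a character, so we step
   back over them until we reach a byte from Range A or Range B. */
size_t char_start(const unsigned char *buf, size_t len, size_t pos)
{
    assert(pos < len);
    while (pos > 0 && byte_in_range_c(buf[pos]))
        pos--;
    return pos;
}
```

+No state has to be carried along and no table has to be consulted: the
+value of each byte alone says whether it begins a character.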
-This does not include the GCPRO'ed variables, the specbinds, the
-catchtags, the backlist, the redisplay or the profiling info, since we
+So you see that the entire structure of the byte and character
-do not want to rebuild the actual chain of lisp calls which end up to
+representations stemmed from a very small number of basic choices,
-the dump-emacs call, only the global variables.
+which were
-Weak lists and weak hash tables are dumped as if they were their
-non-weak equivalent (without changing their type, of course).  This has
-not yet been a problem.
-@node Address allocation, The header, Object inventory, Dumping phase
-@subsection Address allocation
-@cindex dumping address allocation
-The next step is to allocate the offsets of each of the objects in the
-final dump file.  This is done by @code{pdump_allocate_offset()} which
-is called indirectly by @code{pdump_scan_by_alignment()}.
-The strategy to deal with alignment problems uses these facts:
 @enumerate
 @item
-real world alignment requirements are powers of two.
+the choice to encode character set information in a character
+@item
-@item
+the choice to assume that all character sets would have an order
-the C compiler is required to adjust the size of a struct so that you
+imposed upon them with 96 characters per one or two
-can have an array of them next to each other.  This means you can have an
+dimensions. (This is less arbitrary than it seems--it follows
-upper bound of the alignment requirements of a given structure by
+ISO-2022.)
-looking at which power of two its size is a multiple.
+@item
+the choice to use a variable width byte representation.
-@item
-the non-variant part of variable size lrecords has an alignment
-requirement of 4.
 @end enumerate
-Hence, for each lrecord type, C struct type or opaque data block the
+What this means is that you cannot really separate the byte
-alignment requirement is computed as a power of two, with a minimum of
+representation, the character representation, and the assumptions made
-2^2 for lrecords.  @code{pdump_scan_by_alignment()} then scans all the
+about characters and whether they encode character-set information from each
-@code{pdump_block_list_elmt}'s, the ones with the highest requirements
+other.  All of these are closely intertwined, and for purposes of
-first.  This ensures the best packing.
+simplicity, they should be designed together.  If you change one
+representation without changing another, you are in essence creating a
-The maximum alignment requirement we take into account is 2^8.
+completely new design with its own attendant problems--since your new
+design is likely to be quite complex and not very coherent with
-@code{pdump_allocate_offset()} only has to do a linear allocation,
+regards to the translation between the character and byte
-starting at offset 256 (this leaves room for the header and keeps the
+representations, you are likely to run into problems.
-alignments happy).
+@node Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #4, Introduction to Multilingual Issues #2, Multilingual Support
-@node The header, Data dumping, Address allocation, Dumping phase
+@section Introduction to Multilingual Issues #3
-@subsection The header
+@cindex introduction to multilingual issues #3
-@cindex dumping, the header
+In XEmacs, Mule is a code word for the support for input handling and
-The next step creates the file and writes a header with a signature and
+display of multi-lingual text.  This section provides an overview of how
-some random information in it.  The @code{reloc_address} field, which
+this support impacts the C and Lisp code in XEmacs.  It is important for
-indicates at which address the file should be loaded if we want to avoid
+anyone who works on the C or the Lisp code, especially on the C code, to
-post-reload relocation, is set to 0.  It then seeks to offset 256 (base
+be aware of these issues, even if they don't work directly on code that
-offset for the objects).
+implements multi-lingual features, because there are various general
+procedures that need to be followed in order to write Mule-compliant
-@node Data dumping, Pointers dumping, The header, Dumping phase
+code.  (The specifics of these procedures are documented elsewhere in
-@subsection Data dumping
+this manual.)
-@cindex data dumping
-@cindex dumping, data
+There are four primary aspects of Mule support:
-The data is dumped in the same order as the addresses were allocated by
-@code{pdump_dump_data()}, called from @code{pdump_scan_by_alignment()}.
-This function copies the data to a temporary buffer, relocates all
-pointers in the object to the addresses allocated in step Address
-Allocation, and writes it to the file.  Using the same order means that,
-if we are careful with lrecords whose size is not a multiple of 4, we
-are ensured that the object is always written at the offset in the file
-allocated in step Address Allocation.
-@node Pointers dumping,  , Data dumping, Dumping phase
-@subsection Pointers dumping
-@cindex pointers dumping
-@cindex dumping, pointers
-A bunch of tables needed to reassign properly the global pointers are
-then written.  They are:
 @enumerate
 @item
-the pdump_root_block_ptrs dynarr
+Internal handling and representation of multi-lingual text.
 @item
-the pdump_opaques dynarr
+Conversion between the internal representation of text and the various
-@item
+external representations in which multi-lingual text is encoded, such as
-a vector of all the offsets to the objects in the file that include a
+Unicode representations (including mostly fixed width encodings such as
-description (for faster relocation at reload time)
+UCS-2/UTF-16 and UCS-4 and variable width ASCII conformant encodings,
-@item
+such as UTF-7 and UTF-8); the various ISO2022 representations, which
-the pdump_root_objects and pdump_weak_object_chains dynarrs.
+typically use escape sequences to switch between different character
+sets (such as Compound Text, used under X Windows; JIS, used
+specifically for encoding Japanese; and EUC, a non-modal encoding used
+for Japanese, Korean, and certain other languages); Microsoft's
+multi-byte encodings (such as Shift-JIS); various simple encodings for
+particular 8-bit character sets (such as Latin-1 and Latin-2, and
+encodings (such as koi8 and Alternativny) for Cyrillic); and others.
+This conversion needs to happen both for text in files and text sent to
+or retrieved from system API calls.  It even needs to happen for
+external binary data because the internal representation does not
+represent binary data simply as a sequence of bytes as it is represented
+externally.
+@item
+Proper display of multi-lingual characters.
+@item
+Input of multi-lingual text using the keyboard.
 @end enumerate
-For each of the dynarrs we write both the pointer to the variables and
+These four aspects are for the most part independent of each other.
-the relocated offset of the object they point to.  Since these variables
-are global, the pointers are still valid when restarting the program and
+@subheading Characters, Character Sets, and Encodings
-are used to regenerate the global pointers.
+A @dfn{character} (which is, BTW, a surprisingly complex concept) is, in
-The @code{pdump_weak_object_chains} dynarr is a special case.  The
+a written representation of text, the most basic written unit that has a
-variables it points to are the head of weak linked lists of lisp objects
+meaning of its own.  It's comparable to a phoneme when analyzing words
-of the same type.  Not all objects of this list are dumped so the
+in spoken speech (for example, the sound of @samp{t} in English, which
-relocated pointer we associate with them points to the first dumped
+in fact has different pronunciations in different words -- aspirated in
-object of the list, or Qnil if none is available.  This is also the
+@samp{time}, unaspirated in @samp{stop}, unreleased or even pronounced
-reason why they are not used as roots for the purpose of object
+as a glottal stop in @samp{button}, etc. -- but logically is a single
-enumeration.
+concept).  Like a phoneme, a character is an abstract concept defined by
+its @emph{meaning}.  The character @samp{lowercase f}, for example, can
-Some very important information like the @code{staticpros} and
+always be used to represent the first letter in the word @samp{fill},
-@code{lrecord_implementations_table} are handled indirectly using
+regardless of whether it's drawn upright or italic, whether the
-@code{dump_add_opaque} or @code{dump_add_root_block_ptr}.
+@samp{fi} combination is drawn as a single ligature, whether there are
+serifs on the bottom of the vertical stroke, etc. (These different
-This is the end of the dumping part.
+appearances of a single character are often called @dfn{graphs} or
+@dfn{glyphs}.) Our concern when representing text is on representing the
-@node Reloading phase, Remaining issues, Dumping phase, Dumping
+abstract characters, and not on their exact appearance.
-@section Reloading phase
-@cindex reloading phase
+A @dfn{character set} (or @dfn{charset}), as we define it, is a set of
-@cindex dumping, reloading phase
+characters, each with an associated number (or set of numbers -- see
+below), called a @dfn{code point}.  It's important to understand that a
-@subsection File loading
+character is not defined by any number attached to it, but by its
-@cindex dumping, file loading
+meaning.  For example, ASCII and EBCDIC are two charsets containing
+exactly the same characters (lowercase and uppercase letters, numbers 0
-The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
+through 9, particular punctuation marks) but with different
-least 4096), or if mmap is unavailable or fails, a 256-bytes aligned
+numberings. The `comma' character in ASCII and EBCDIC, for instance, is
-malloc is done and the file is loaded.
+the same character despite having a different numbering.  Conversely,
+when comparing ASCII and JIS-Roman, which look the same except that the
-Some variables are reinitialized from the values found in the header.
+latter has a yen sign substituted for the backslash, we would say that
+the backslash and yen sign are @strong{not} the same characters, despite having
-The difference between the actual loading address and the reloc_address
+the same number (92) and despite the fact that all other characters are
-is computed and will be used for all the relocations.
+present in both charsets, with the same numbering.  ASCII and JIS-Roman,
+then, do @emph{not} have exactly the same characters in them (ASCII has
+a backslash character but no yen-sign character, and vice-versa for
-@subsection Putting back the pdump_opaques
+JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII
-@cindex dumping, putting back the pdump_opaques
+and JIS-Roman are closer.
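+The distinction between a character and its number can be made concrete
+with a small C sketch.  Characters are identified here by descriptive
+names, and each charset is a tiny excerpt holding only the entries
+needed for the comparison; the type @code{charset_entry} and the two
+lookup functions are invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a charset: an abstract character, identified by a
   descriptive name, paired with its code point in that charset. */
struct charset_entry {
    const char *character;
    int code_point;
};

/* Tiny excerpts; the real charsets have many more entries. */
static const struct charset_entry ascii_excerpt[] = {
    { "comma",     0x2C },
    { "backslash", 0x5C },   /* decimal 92 */
};
static const struct charset_entry ebcdic_excerpt[] = {
    { "comma",     0x6B },
};
static const struct charset_entry jis_roman_excerpt[] = {
    { "comma",     0x2C },
    { "yen sign",  0x5C },   /* same number as the ASCII backslash */
};

/* Code point of CHARACTER in SET (N entries), or -1 if it is absent. */
int code_point_of(const struct charset_entry *set, size_t n,
                  const char *character)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i].character, character) == 0)
            return set[i].code_point;
    return -1;
}

/* Character found at CODE_POINT in SET (N entries), or NULL if none. */
const char *character_at(const struct charset_entry *set, size_t n,
                         int code_point)
{
    for (size_t i = 0; i < n; i++)
        if (set[i].code_point == code_point)
            return set[i].character;
    return NULL;
}
```

+Looking up the comma gives 0x2C in ASCII but 0x6B in EBCDIC, which is
+one character with two numbers.  Looking up code point 0x5C gives the
+backslash in ASCII but the yen sign in JIS-Roman, which is one number
+for two characters.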
-The memory contents are restored in the obvious and trivial way.
+It's also important to distinguish between charsets and encodings.  For
+a simple charset like ASCII, there is only one encoding normally used --
+each character is represented by a single byte, with the same value as
-@subsection Putting back the pdump_root_block_ptrs
+its code point.  For more complicated charsets, however, things are not
-@cindex dumping, putting back the pdump_root_block_ptrs
+so obvious.  Unicode version 2, for example, is a large charset with
+thousands of characters, each indexed by a 16-bit number, often
-The variables pointed to by pdump_root_block_ptrs in the dump phase are
+represented in hex, e.g. 0x05D0 for the Hebrew letter "aleph".  One
-reset to the right relocated object addresses.
+obvious encoding uses two bytes per character (actually two encodings,
+depending on which of the two possible byte orderings is chosen).  This
+encoding is convenient for internal processing of Unicode text; however,
-@subsection Object relocation
+it's incompatible with ASCII, so a different encoding, e.g. UTF-8, is
-@cindex dumping, object relocation
+usually used for external text, for example files or e-mail.  UTF-8
+represents Unicode characters with one to three bytes (often extended to
-All the objects are relocated using their description and their offset
+six bytes to handle characters with up to 31-bit indices).  Unicode
-by @code{pdump_reloc_one}.  This step is unnecessary if the
+characters 00 to 7F (identical with ASCII) are directly represented with
-reloc_address is equal to the file loading address.
+one byte, and other characters with two or more bytes, each in the range
+80 to FF.
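+As a worked example of one charset with two encodings, the C sketch
+below encodes a 16-bit code point both as two bytes (most significant
+byte first) and as UTF-8.  The function names are invented for this
+illustration, and the UTF-8 routine covers only 16-bit code points,
+which need one to three bytes; it does not reject surrogate values.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the 16-bit code point CP in OUT as two bytes, most significant
   byte first.  Always writes 2 bytes and returns 2. */
size_t encode_two_byte_be(uint16_t cp, unsigned char out[2])
{
    out[0] = (unsigned char) (cp >> 8);
    out[1] = (unsigned char) (cp & 0xFF);
    return 2;
}

/* Encode the 16-bit code point CP as UTF-8 in OUT, which must have
   room for 3 bytes.  Returns the number of bytes written. */
size_t encode_utf8_16(uint16_t cp, unsigned char out[3])
{
    if (cp <= 0x7F) {
        /* Identical to ASCII: a single byte below 0x80. */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}
```

+For the Hebrew letter aleph, 0x05D0, the two-byte encoding gives the
+bytes 05 D0 and UTF-8 gives D7 90.  An ASCII character such as 0x41
+stays a single byte in UTF-8 but becomes 00 41 in the two-byte
+encoding, which is why the latter is incompatible with ASCII.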
-@subsection Putting back the pdump_root_objects and pdump_weak_object_chains
+In general, a single encoding may be able to represent more than one
-@cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
+charset.
-Same as Putting back the pdump_root_block_ptrs.
+@subheading Internal Representation of Text
+In an ASCII or single-European-character-set world, life is very simple.
-@subsection Reorganize the hash tables
+There are 256 characters, and each character is represented using the
-@cindex dumping, reorganize the hash tables
+numbers 0 through 255, which fit into a single byte.  With a few
+exceptions (such as case-changing operations or syntax classes like
-Since some of the hash values in the lisp hash tables are
+'whitespace'), "text" is simply an array of indices into a font.  You
-address-dependent, their layout is now wrong.  So we go through each of
+can get different languages simply by choosing fonts with different
-them and have them resorted by calling @code{pdump_reorganize_hash_table}.
+8-bit character sets (ISO-8859-1, -2, special-symbol fonts, etc.), and
+everything will "just work" as long as anyone else receiving your text
-@node Remaining issues,  , Reloading phase, Dumping
+uses a compatible font.
-@section Remaining issues
-@cindex dumping, remaining issues
+In the multi-lingual world, however, it is much more complicated.  There
+are a great number of different characters which are organized in a
-The build process will have to start a post-dump xemacs, ask it the
+complex fashion into various character sets.  The representation to use
-loading address (which will, hopefully, be always the same between
+is not obvious because there are issues of size versus speed to
-different xemacs invocations) [[unfortunately, not true on Linux with
+consider.  In fact, there are in general two kinds of representations to
-the ExecShield feature]] and relocate the file to the new address.
+work with: one that represents a single character using an integer
-This way the object relocation phase will not have to be done, which
+(possibly a byte), and the other representing a single character as a
-means no writes in the objects and that, because of the use of mmap, the
+sequence of bytes.  The former representation is normally called fixed
-dumped data will be shared between all the xemacs running on the
+width, and the other variable width. Both representations represent
-computer.
+exactly the same characters, and the conversion from one representation
+to the other is governed by a specific formula (rather than by table
-Some executable signature will be necessary to ensure that a given dump
+lookup) but it may not be simple.  Most C code need not, and in fact
-file is really associated with a given executable, or random crashes
+should not, know the specifics of exactly how the representations work.
-will occur.  Maybe a random number set at compile or configure time thru
+In fact, the code must not make assumptions about the representations.
-a define.  This will also allow for having differently-compiled xemacsen
+This means in particular that it must use the proper macros for
-on the same system (mule and no-mule comes to mind).
+retrieving the character at a particular memory location, determining
+how many characters are present in a particular stretch of text, and
-The DOC file contents should probably end up in the dump file.
+incrementing a pointer to a particular character to point to the
+following character, and so on.  It must not assume that one character
+is stored using one byte, or even using any particular number of bytes.
-@node Events and the Event Loop, Asynchronous Events; Quit Checking, Dumping, Top
+It must not assume that the number of characters in a stretch of text
+bears any particular relation to a number of bytes in that stretch.  It
+must not assume that the character at a particular memory location can
+be retrieved simply by dereferencing the memory location, even if a
+character is known to be ASCII or is being compared with an ASCII
+character, etc.  Careful coding is required to be Mule clean.  The
+biggest work of adding Mule support, in fact, is converting all of the
+existing code to be Mule clean.
+Lisp code is mostly unaffected by these concerns.  Text in strings and
+buffers appears simply as a sequence of characters regardless of
+whether Mule support is present.  The biggest difference with older
+versions of Emacs, as well as current versions of GNU Emacs, is that
+integers and characters are no longer equivalent, but are separate
+Lisp Object types.
+@subheading Conversion Between Internal and External Representations
+All text needs to be converted to an external representation before being
+sent to a function or file, and all text retrieved from a function or
+file needs to be converted to the internal representation.  This
+conversion needs to happen as close to the source or destination of the
+text as possible.  No operations should ever be performed on text encoded
+in an external representation other than simple copying, because no
+assumptions can reliably be made about the format of this text.  You
+cannot assume, for example, that the end of text is terminated by a null
+byte. (For example, if the text is Unicode, it will have many null bytes
+in it.)  You cannot find the next "slash" character by searching through
+the bytes until you find a byte that looks like a "slash" character,
+because it might actually be the second byte of a Kanji character.
+Furthermore, all text in the internal representation must be converted,
+even if it is known to be completely ASCII, because the external
+representation may not be ASCII compatible (for example, if it is
+Unicode).
+The place where C code needs to be the most careful is when calling
+external API functions.  It is easy to forget that all text passed to or
+retrieved from these functions needs to be converted.  This includes text
+in structures passed to or retrieved from these functions and all text
+that is passed to a callback function that is called by the system.
+Macros are provided to perform conversions to or from external text.
+These macros are called @code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}
+respectively.  These macros accept input in various forms, for example,
+Lisp strings, buffers, lstreams, raw data, and can return data in
+multiple formats, including both @code{malloc()}ed and @code{alloca()}ed data.  The use
+of @code{alloca()}ed data here is particularly important because, in general,
+the returned data will not be used after making the API call, and as a
+result, using @code{alloca()}ed data provides a very cheap and easy to use
+method of allocation.
+These macros take a coding system argument which indicates the nature of
+the external encoding.  A coding system is an object that encapsulates
+the structures of a particular external encoding and the methods required
+to convert to and from this encoding.  A facility exists to create coding
+system aliases, which in essence gives a single coding system two
+different names.  It is effectively used in XEmacs to provide a layer of
+abstraction on top of the actual coding systems.  For example, the coding
+system alias "file-name" points to whichever coding system is currently
+used for encoding and decoding file names as passed to or retrieved from
+system calls.  In general, the actual encoding will differ from system to
+system, and also with the particular locale that the user is in.  The use
+of the file-name alias effectively hides that implementation detail
+behind an abstract interface layer, which provides a unified set of
+coding systems that are consistent across all operating environments.
+The choice of which coding system to use in a particular conversion macro
+requires some thought.  In general, you should choose a lower-level
+actual coding system when the very design of the APIs you are working
+with calls for that particular coding system.  In all other cases, you
+should find the least general abstract coding system (i.e. coding system
+alias) that applies to your specific situation.  Only use the most
+general coding systems, such as native, when there is simply nothing else
+that is more appropriate.  By doing things this way, you allow the user
+more control over how the encoding actually works, because the user is
+free to map the abstracted coding system names onto different actual
+coding systems.
+Some common coding systems are:
+@table @code
+@item ctext
+Compound Text, which is the standard encoding under X Windows, which is
+used for clipboard data and possibly other data.  (ctext is a coding
+system of type ISO2022.)
+@item mswindows-unicode
+this is used for representing text passed to MS Windows API calls with
+arguments that need to be in Unicode format.  (mswindows-unicode is a
+coding system of type UTF-16.)
+@item mswindows-multi-byte
+this is used for representing text passed to MS Windows API calls with
+arguments that need to be in multi-byte format.  Note that there are
+very few if any examples of such calls.
+@item mswindows-tstr
+this is used for representing text passed to any MS Windows API calls
+that declare their argument as LPTSTR, or LPCTSTR.  This is the vast
+majority of system calls and automatically translates either to
+mswindows-unicode or mswindows-multi-byte, depending on the presence or
+absence of the UNICODE preprocessor constant.  (If we compile XEmacs
+with this preprocessor constant, then all API calls use Unicode for all
+text passed to or received from these API calls.)
+@item terminal
+used for text sent to or read from a text terminal in the absence of a
+more specific coding system (calls to window-system specific APIs should
+use the appropriate window-specific coding system if it makes sense to
+do so.)
+@item file-name
+used when specifying the names of files in the absence of a more
+specific encoding, such as mswindows-tstr.
+@item native
+the most general coding system for specifying text passed to system
+calls.  This generally translates to whatever coding system is specified
+by the current locale.  This should only be used when none of the coding
+systems mentioned above are appropriate.
+@end table
+@subheading Proper Display of Multilingual Text
+There are two things required to get this working correctly.  One is
+selecting the correct font, and the other is encoding the text according
+to the encoding used for that specific font, or the window-system
+specific text display API.  Generally each separate character set has a
+different font associated with it, which is specified by name and each
+font has an associated encoding into which the characters must be
+translated.  (This is the case on X Windows, at least; on Windows there
+is a more general mechanism.)  Both the specific font for a charset and
+the encoding of that font are system dependent.  Currently there is a
+way of specifying these two properties under X Windows (using the
+registry and ccl properties of a character set) but not for other window
+systems.  A more general system needs to be implemented to allow these
+characteristics to be specified for all window systems.
+Another issue is making sure that the necessary fonts for displaying
+various character sets are installed on the system.  Currently, XEmacs
+provides, on its web site, X Windows fonts for a number of different
+character sets that can be installed by users.  This isn't done yet for
+Windows, but it should be.
+@subheading Inputting of Multilingual Text
+This is a rather complicated issue because there are many paradigms
+defined for inputting multilingual text, some of which are specific to
+particular languages, and any particular language may have many
+different paradigms defined for inputting its text.  These paradigms are
+encoded in input methods and there is a standard API for defining an
+input method in XEmacs called LEIM, or Library of Emacs Input Methods.
+Some of these input methods are written entirely in Elisp, and thus are
+system-independent, while others require the aid either of an external
+process, or of C level support that ties into a particular
+system-specific input method API, for example, XIM under X Windows, or
+the active keyboard layout and IME support under Windows.  Currently,
+there is no support for any system-specific input methods under
+Microsoft Windows, although this will change.
+@node Introduction to Multilingual Issues #4, Character Sets, Introduction to Multilingual Issues #3, Multilingual Support
+@section Introduction to Multilingual Issues #4
+@cindex introduction to multilingual issues #4
+The rest of the sections in this chapter consist of yet another
+introduction to multilingual issues, duplicating the information in the
+previous sections.
+@node Character Sets, Encodings, Introduction to Multilingual Issues #4, Multilingual Support
+@section Character Sets
+@cindex character sets
+A @dfn{character set} (or @dfn{charset}) is an ordered set of
+characters.  A particular character in a charset is indexed using one or
+more @dfn{position codes}, which are non-negative integers.  The number
+of position codes needed to identify a particular character in a charset
+is called the @dfn{dimension} of the charset.  In XEmacs/Mule, all
+charsets have dimension 1 or 2, and the size of all charsets (except for
+a few special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range
+of position codes used to index characters from any of these types of
+character sets is as follows:
+@example
+Charset type            Position code 1         Position code 2
+------------------------------------------------------------
+94                      33 - 126                N/A
+96                      32 - 127                N/A
+94x94                   33 - 126                33 - 126
+96x96                   32 - 127                32 - 127
+@end example
+Note that in the above cases position codes do not start at an
+expected value such as 0 or 1.  The reason for this will become clear
+later.
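+The ranges in the table above can be expressed as a small validity
+check.  The following is only a sketch: the names
+@code{charset_type} and @code{valid_position_codes} are hypothetical and
+are not taken from the XEmacs sources.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum charset_type { CS_94, CS_96, CS_94x94, CS_96x96 };

/* True if c lies in the position-code range of a 94-character
   (33 - 126) or 96-character (32 - 127) row. */
static bool pc_in_range(int c, bool is96)
{
    return is96 ? (c >= 32 && c <= 127) : (c >= 33 && c <= 126);
}

/* Check the position codes against the table above.  pc2 is ignored
   for dimension-1 charsets. */
bool valid_position_codes(enum charset_type t, int pc1, int pc2)
{
    switch (t) {
    case CS_94:    return pc_in_range(pc1, false);
    case CS_96:    return pc_in_range(pc1, true);
    case CS_94x94: return pc_in_range(pc1, false) && pc_in_range(pc2, false);
    case CS_96x96: return pc_in_range(pc1, true) && pc_in_range(pc2, true);
    }
    return false;
}
```

+Note that a valid pair of position codes may still name an empty slot
+in a particular charset; the check only covers the ranges.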
+For example, Latin-1 is a 96-character charset, and JISX0208 (the
+Japanese national character set) is a 94x94-character charset.
+[Note that, although the ranges above define the @emph{valid} position
+codes for a charset, some of the slots in a particular charset may in
+fact be empty.  This is the case for JISX0208, for example, where (e.g.)
+all the slots whose first position code is in the range 117 - 126 are
+empty.]
+There are three charsets that do not follow the above rules.  All of
+them have one dimension, and have ranges of position codes as follows:
+@example
+Charset name            Position code 1
+------------------------------------
+ASCII                   0 - 127
+Control-1               0 - 31
+Composite               0 - some large number
+@end example
+(The upper bound of the position code for composite characters has not
+yet been determined, but it will probably be at least 16,383).
+ASCII is the union of two subsidiary character sets: Printing-ASCII
+(the printing ASCII character set, consisting of position codes 33 -
+126, like for a standard 94-character charset) and Control-ASCII (the
+non-printing characters that would appear in a binary file with codes 0
+- 32 and 127).
+Control-1 contains the non-printing characters that would appear in a
+binary file with codes 128 - 159.
+Composite contains characters that are generated by overstriking one
+or more characters from other charsets.
+Note that some characters in ASCII, and all characters in Control-1,
+are @dfn{control} (non-printing) characters.  These have no printed
+representation but instead control some other function of the printing
+(e.g. TAB or 9 moves the current character position to the next tab
+stop).  All other characters in all charsets are @dfn{graphic}
+(printing) characters.
+When a binary file is read in, the bytes in the file are assigned to
+character sets as follows:
+@example
+Bytes           Character set           Range
+--------------------------------------------------
+0 - 127         ASCII                   0 - 127
+128 - 159       Control-1               0 - 31
+160 - 255       Latin-1                 32 - 127
+@end example
+This is a bit ad-hoc but gets the job done.
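+As a sketch of the mapping in the table above (the type and function
+names here are hypothetical, chosen only for this example), a byte read
+from a binary file can be classified as follows:

```c
/* Hypothetical names, for illustration only. */
enum bin_charset { BIN_ASCII, BIN_CONTROL_1, BIN_LATIN_1 };

struct bin_char {
    enum bin_charset charset;
    int position_code;
};

/* Map one byte of a binary file to a charset and position code,
   following the table: 0 - 127 is ASCII, 128 - 159 is Control-1
   (position codes 0 - 31), 160 - 255 is Latin-1 (32 - 127). */
struct bin_char classify_binary_byte(unsigned char b)
{
    struct bin_char r;
    if (b < 128) {
        r.charset = BIN_ASCII;
        r.position_code = b;
    } else if (b < 160) {
        r.charset = BIN_CONTROL_1;
        r.position_code = b - 128;
    } else {
        r.charset = BIN_LATIN_1;
        r.position_code = b - 128;
    }
    return r;
}
```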
+@node Encodings, Internal Mule Encodings, Character Sets, Multilingual Support
+@section Encodings
+@cindex encodings, Mule
+@cindex Mule encodings
+An @dfn{encoding} is a way of numerically representing characters from
+one or more character sets.  If an encoding only encompasses one
+character set, then the position codes for the characters in that
+character set could be used directly.  This is not possible, however, if
+more than one character set is to be used in the encoding.
+For example, the conversion detailed above between bytes in a binary
+file and characters is effectively an encoding that encompasses the
+three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
+bytes.
+Thus, an encoding can be viewed as a way of encoding characters from a
+specified group of character sets using a stream of bytes, each of which
+contains a fixed number of bits (but not necessarily 8, as in the common
+usage of ``byte'').
+Here are descriptions of a couple of common
+encodings:
+@menu
+* Japanese EUC (Extended Unix Code)::
+* JIS7::
+@end menu
+@node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings
+@subsection Japanese EUC (Extended Unix Code)
+@cindex Japanese EUC (Extended Unix Code)
+@cindex EUC (Extended Unix Code), Japanese
+@cindex Extended Unix Code, Japanese EUC
+This encompasses the character sets Printing-ASCII, Katakana-JISX0201
+(half-width katakana, the right half of JISX0201), Japanese-JISX0208,
+and Japanese-JISX0212.
+Note that Printing-ASCII and Katakana-JISX0201 are 94-character
+charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
+94x94-character charsets.
+The encoding is as follows:
+@example
+Character set            Representation (PC=position-code)
+-------------            --------------
+Printing-ASCII           PC1
+Katakana-JISX0201        0x8E       | PC1 + 0x80
+Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
+Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
+@end example
+Note that there are other versions of EUC for other Asian languages.
+EUC in general is characterized by
+@enumerate
+@item
+row-column encoding,
+@item
+big-endian (row-first) ordering, and
+@item
+ASCII compatibility in variable width forms.
+@end enumerate
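+The encoding can be sketched in C as below.  This is an illustration
+rather than XEmacs code: the names are hypothetical, and it assumes the
+usual EUC-JP convention in which Katakana-JISX0201 is introduced by the
+byte 0x8E and Japanese-JISX0212 by the byte 0x8F.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum euc_charset {
    EUC_ASCII, EUC_KATAKANA_JISX0201, EUC_JISX0208, EUC_JISX0212
};

/* Write the Japanese EUC bytes for one character into out, which must
   have room for 3 bytes.  pc2 is ignored for dimension-1 charsets.
   Returns the number of bytes written. */
size_t euc_jp_encode(enum euc_charset cs, int pc1, int pc2,
                     unsigned char *out)
{
    size_t n = 0;
    switch (cs) {
    case EUC_ASCII:
        out[n++] = (unsigned char) pc1;
        break;
    case EUC_KATAKANA_JISX0201:
        out[n++] = 0x8E;                       /* assumed prefix byte */
        out[n++] = (unsigned char) (pc1 + 0x80);
        break;
    case EUC_JISX0208:
        out[n++] = (unsigned char) (pc1 + 0x80);
        out[n++] = (unsigned char) (pc2 + 0x80);
        break;
    case EUC_JISX0212:
        out[n++] = 0x8F;                       /* assumed prefix byte */
        out[n++] = (unsigned char) (pc1 + 0x80);
        out[n++] = (unsigned char) (pc2 + 0x80);
        break;
    }
    return n;
}
```

+For example, the JISX0208 character with position codes 0x24 and 0x22
+comes out as the two bytes 0xA4 0xA2.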
+@node JIS7,  , Japanese EUC (Extended Unix Code), Encodings
+@subsection JIS7
+@cindex JIS7
+This encompasses the character sets Printing-ASCII,
+Latin-JISX0201 (the left half of JISX0201; this character set
+is very similar to Printing-ASCII and is a 94-character charset),
+Japanese-JISX0208, and Katakana-JISX0201.  It uses 7-bit bytes.
+Unlike EUC, this is a @dfn{modal} encoding, which means that there are
+multiple states that the encoding can be in, which affect how the bytes
+are to be interpreted.  Special sequences of bytes (called @dfn{escape
+sequences}) are used to change states.
+The encoding is as follows:
+@example
+Character set              Representation (PC=position-code)
+-------------              --------------
+Printing-ASCII             PC1
+Latin-JISX0201             PC1
+Katakana-JISX0201          PC1
+Japanese-JISX0208          PC1 | PC2
+Escape sequence   ASCII equivalent   Meaning
+---------------   ----------------   -------
+0x1B 0x28 0x4A    ESC ( J            invoke Latin-JISX0201
+0x1B 0x28 0x49    ESC ( I            invoke Katakana-JISX0201
+0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
+0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
+@end example
+Initially, Printing-ASCII is invoked.
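+To show what ``modal'' means in practice, here is a minimal sketch of a
+character counter for JIS7 text.  The names are hypothetical, only the
+four escape sequences listed above are recognized, and bytes outside
+the printing range 0x21 - 0x7E are assumed to be single-byte control
+characters even while Japanese-JISX0208 is invoked.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum jis7_state {
    JIS7_ASCII, JIS7_LATIN_JISX0201, JIS7_KATAKANA_JISX0201, JIS7_JISX0208
};

/* Count the characters in a JIS7 byte stream.  Returns -1 on an
   unknown or truncated escape sequence or a truncated 2-byte
   character. */
long jis7_count_chars(const unsigned char *s, size_t len)
{
    enum jis7_state state = JIS7_ASCII;   /* initial state */
    long count = 0;
    size_t i = 0;

    while (i < len) {
        if (s[i] == 0x1B) {
            /* Escape sequence: changes state, produces no character. */
            if (i + 2 >= len)
                return -1;
            if (s[i + 1] == 0x28 && s[i + 2] == 0x4A)
                state = JIS7_LATIN_JISX0201;
            else if (s[i + 1] == 0x28 && s[i + 2] == 0x49)
                state = JIS7_KATAKANA_JISX0201;
            else if (s[i + 1] == 0x24 && s[i + 2] == 0x42)
                state = JIS7_JISX0208;
            else if (s[i + 1] == 0x28 && s[i + 2] == 0x42)
                state = JIS7_ASCII;
            else
                return -1;
            i += 3;
        } else if (state == JIS7_JISX0208 && s[i] >= 0x21 && s[i] <= 0x7E) {
            /* Two position codes make up one character. */
            if (i + 1 >= len)
                return -1;
            i += 2;
            count++;
        } else {
            i++;
            count++;
        }
    }
    return count;
}
```

+The same byte value (say 0x24) is one ASCII character in one state and
+half of a Kanji character in another, which is why such text cannot be
+scanned without tracking the state from the beginning.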
+@node Internal Mule Encodings, Byte/Character Types; Buffer Positions; Other Typedefs, Encodings, Multilingual Support
+@section Internal Mule Encodings
+@cindex internal Mule encodings
+@cindex Mule encodings, internal
+@cindex encodings, internal Mule
+In XEmacs/Mule, each character set is assigned a unique number, called a
+@dfn{leading byte}.  This is used in the encodings of a character.
+Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
+a leading byte of 0), although some leading bytes are reserved.
+Charsets whose leading byte is in the range 0x80 - 0x9F are called
+@dfn{official} and are used for built-in charsets.  Other charsets are
+called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
+these are user-defined charsets.
+More specifically:
+@example
+Character set                Leading byte
+-------------                ------------
+ASCII                        0 (0x7F in arrays indexed by leading byte)
+Composite                    0x8D
+Dimension-1 Official         0x80 - 0x8C/0x8D
+(0x8E is free)
+Control                      0x8F
+Dimension-2 Official         0x90 - 0x99
+(0x9A - 0x9D are free)
+Dimension-1 Private Marker   0x9E
+Dimension-2 Private Marker   0x9F
+Dimension-1 Private          0xA0 - 0xEF
+Dimension-2 Private          0xF0 - 0xFF
+@end example
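+The table can be read as a classification function over leading
+bytes.  The sketch below uses hypothetical names and makes one
+assumption where the table is ambiguous: 0x8D is treated as the
+Composite leading byte, so Dimension-1 Official covers 0x80 - 0x8C.

```c
/* Hypothetical names, for illustration only. */
enum lb_class {
    LB_ASCII, LB_OFFICIAL_DIM1, LB_COMPOSITE, LB_CONTROL_1,
    LB_OFFICIAL_DIM2, LB_PRIVATE_DIM1_MARKER, LB_PRIVATE_DIM2_MARKER,
    LB_PRIVATE_DIM1, LB_PRIVATE_DIM2, LB_UNASSIGNED
};

/* Classify a leading byte according to the table above. */
enum lb_class classify_leading_byte(unsigned int lb)
{
    if (lb == 0x00)               return LB_ASCII;
    if (lb >= 0x80 && lb <= 0x8C) return LB_OFFICIAL_DIM1;
    if (lb == 0x8D)               return LB_COMPOSITE;   /* assumption */
    if (lb == 0x8F)               return LB_CONTROL_1;
    if (lb >= 0x90 && lb <= 0x99) return LB_OFFICIAL_DIM2;
    if (lb == 0x9E)               return LB_PRIVATE_DIM1_MARKER;
    if (lb == 0x9F)               return LB_PRIVATE_DIM2_MARKER;
    if (lb >= 0xA0 && lb <= 0xEF) return LB_PRIVATE_DIM1;
    if (lb >= 0xF0 && lb <= 0xFF) return LB_PRIVATE_DIM2;
    return LB_UNASSIGNED;         /* 0x8E, 0x9A - 0x9D, and the rest */
}
```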
+There are two internal encodings for characters in XEmacs/Mule.  One is
+called @dfn{string encoding} and is an 8-bit encoding that is used for
+representing characters in a buffer or string.  It uses 1 to 4 bytes per
+character.  The other is called @dfn{character encoding} and is a 19-bit
+encoding that is used for representing characters individually in a
+variable.
+(In the following descriptions, we'll ignore composite characters for
+the moment.  We also give a general (structural) overview first,
+followed later by the exact details.)
+@menu
+* Internal String Encoding::
+* Internal Character Encoding::
+@end menu
+@node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings
+@subsection Internal String Encoding
+@cindex internal string encoding
+@cindex string encoding, internal
+@cindex encoding, internal string
+ASCII characters are encoded using their position code directly.  Other
+characters are encoded using their leading byte followed by their
+position code(s) with the high bit set.  Characters in private character
+sets have their leading byte prefixed with a @dfn{leading byte prefix},
+which is either 0x9E or 0x9F. (No character sets are ever assigned these
+leading bytes.) Specifically:
+@example
+Character set           Encoding (PC=position-code, LB=leading-byte)
+-------------           --------
+ASCII                   PC1  |
+Control-1               LB   |  PC1 + 0xA0 |
+Dimension-1 official    LB   |  PC1 + 0x80 |
+Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
+Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
+Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
+@end example
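+As an illustration of the table, the following sketch produces the
+string encoding of one character from its leading byte and position
+codes.  The function name is hypothetical, the leading-byte ranges are
+those listed earlier in this section, and composite characters are
+left out, as in the text.

```c
#include <stddef.h>

/* Write the internal string encoding of one character into out, which
   must have room for 4 bytes.  lb is the leading byte (0 for ASCII);
   pc2 is ignored for dimension-1 charsets.  Returns the number of
   bytes written, or 0 if lb is not handled by this sketch. */
size_t mule_string_encode(unsigned int lb, unsigned int pc1,
                          unsigned int pc2, unsigned char *out)
{
    size_t n = 0;

    if (lb == 0x00) {                          /* ASCII */
        out[n++] = (unsigned char) pc1;
    } else if (lb == 0x8F) {                   /* Control-1 */
        out[n++] = (unsigned char) lb;
        out[n++] = (unsigned char) (pc1 + 0xA0);
    } else if (lb >= 0x80 && lb <= 0x8C) {     /* dimension-1 official */
        out[n++] = (unsigned char) lb;
        out[n++] = (unsigned char) (pc1 + 0x80);
    } else if (lb >= 0x90 && lb <= 0x99) {     /* dimension-2 official */
        out[n++] = (unsigned char) lb;
        out[n++] = (unsigned char) (pc1 + 0x80);
        out[n++] = (unsigned char) (pc2 + 0x80);
    } else if (lb >= 0xA0 && lb <= 0xEF) {     /* dimension-1 private */
        out[n++] = 0x9E;
        out[n++] = (unsigned char) lb;
        out[n++] = (unsigned char) (pc1 + 0x80);
    } else if (lb >= 0xF0 && lb <= 0xFF) {     /* dimension-2 private */
        out[n++] = 0x9F;
        out[n++] = (unsigned char) lb;
        out[n++] = (unsigned char) (pc1 + 0x80);
        out[n++] = (unsigned char) (pc2 + 0x80);
    }
    return n;
}
```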
+The basic characteristic of this encoding is that the first byte
+of every character is in the range 0x00 - 0x9F, and the second and
+following bytes of every character are in the range 0xA0 - 0xFF.
+This means that it is impossible to get out of sync, or more
+specifically:
+@enumerate
+@item
+Given any byte position, the beginning of the character it is
+within can be determined in constant time.
+@item
+Given any byte position at the beginning of a character, the
+beginning of the next character can be determined in constant
+time.
+@item
+Given any byte position at the beginning of a character, the
+beginning of the previous character can be determined in constant
+time.
+@item
+Textual searches can simply treat encoded strings as if they
+were encoded in a one-byte-per-character fashion rather than
+the actual multi-byte encoding.
+@end enumerate
+None of the standard non-modal encodings meet all of these
+conditions.  For example, EUC satisfies only (2) and (3), while
+Shift-JIS and Big5 (not yet described) satisfy only (2). (All
+non-modal encodings must satisfy (2), in order to be unambiguous.)
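+Properties (1) through (3) follow directly from the byte ranges.  The
+sketch below, with hypothetical function names, relies only on the
+rule that a byte below 0xA0 starts a character; because a character
+occupies at most 4 bytes, each loop runs a bounded number of times.

```c
#include <stddef.h>

/* A byte in the range 0x00 - 0x9F begins a character. */
static int is_first_byte(unsigned char b)
{
    return b < 0xA0;
}

/* Property (1): back up from any byte to the start of its character.
   base is the start of the text. */
const unsigned char *mule_char_start(const unsigned char *base,
                                     const unsigned char *p)
{
    while (p > base && !is_first_byte(*p))
        p--;
    return p;
}

/* Property (2): from the start of a character, find the start of the
   next one.  end points one past the last byte of the text. */
const unsigned char *mule_next_char(const unsigned char *p,
                                    const unsigned char *end)
{
    p++;
    while (p < end && !is_first_byte(*p))
        p++;
    return p;
}

/* Counting characters means counting first bytes, not bytes. */
size_t mule_count_chars(const unsigned char *s, size_t len)
{
    size_t i, count = 0;
    for (i = 0; i < len; i++)
        if (is_first_byte(s[i]))
            count++;
    return count;
}
```

+Property (3) is @code{mule_char_start} applied to the byte just before
+the current character.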
+@node Internal Character Encoding,  , Internal String Encoding, Internal Mule Encodings
+@subsection Internal Character Encoding
+@cindex internal character encoding
+@cindex character encoding, internal
+@cindex encoding, internal character
+One 19-bit word represents a single character.  The word is
+separated into three fields:
+@example
+Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+                <------------> <------------------> <------------------>
+Field:                1                  2                    3
+@end example
+Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
+@example
+Character set           Field 1         Field 2         Field 3
+-------------           -------         -------         -------
+ASCII                      0               0              PC1
+range:                                                   (00 - 7F)
+Control-1                  0               1              PC1
+range:                                                   (00 - 1F)
+Dimension-1 official       0            LB - 0x7F         PC1
+range:                                    (01 - 0D)      (20 - 7F)
+Dimension-1 private        0            LB - 0x80         PC1
+range:                                    (20 - 6F)      (20 - 7F)
+Dimension-2 official    LB - 0x8F         PC1             PC2
+range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
+Dimension-2 private     LB - 0xE1         PC1             PC2
+range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
+Composite                 0x1F             ?               ?
+@end example
+Note that character codes 0 - 255 are the same as the ``binary
+encoding'' described above.
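+A sketch of how the three fields combine into one 19-bit value
+follows.  The function name is hypothetical, the offsets are taken
+from the table exactly as printed, the leading-byte ranges are those
+listed earlier in this chapter, and composite characters are not
+handled.

```c
/* Pack three fields: field 1 is bits 18 - 14, field 2 is bits 13 - 7,
   field 3 is bits 6 - 0. */
static long pack_fields(unsigned int f1, unsigned int f2, unsigned int f3)
{
    return (long) ((f1 << 14) | (f2 << 7) | f3);
}

/* Build the character encoding from a leading byte and position
   codes.  pc2 is ignored for dimension-1 charsets.  Returns -1 for
   leading bytes this sketch does not handle. */
long mule_char_encode(unsigned int lb, unsigned int pc1, unsigned int pc2)
{
    if (lb == 0x00)                         /* ASCII */
        return pack_fields(0, 0, pc1);
    if (lb == 0x8F)                         /* Control-1 */
        return pack_fields(0, 1, pc1);
    if (lb >= 0x80 && lb <= 0x8C)           /* dimension-1 official */
        return pack_fields(0, lb - 0x7F, pc1);
    if (lb >= 0xA0 && lb <= 0xEF)           /* dimension-1 private */
        return pack_fields(0, lb - 0x80, pc1);
    if (lb >= 0x90 && lb <= 0x99)           /* dimension-2 official */
        return pack_fields(lb - 0x8F, pc1, pc2);
    if (lb >= 0xF0 && lb <= 0xFF)           /* dimension-2 private */
        return pack_fields(lb - 0xE1, pc1, pc2);
    return -1;
}
```

+With these offsets, ASCII characters come out as 0x00 - 0x7F and
+Control-1 characters as 0x80 - 0x9F, and every result fits in 19 bits.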
+Most of the code in XEmacs knows nothing of the representation of a
+character other than that values 0 - 255 represent ASCII, Control 1,
+and Latin 1.
+@strong{WARNING WARNING WARNING}: The Boyer-Moore code in
+@file{search.c}, and the code in @code{search_buffer()} that determines
+whether that code can be used, knows that ``field 3'' in a character
+always corresponds to the last byte in the textual representation of the
+character. (This is important because the Boyer-Moore algorithm works by
+looking at the last byte of the search string and &&#### finish this.
+@node Byte/Character Types; Buffer Positions; Other Typedefs, Internal Text API's, Internal Mule Encodings, Multilingual Support
+@section Byte/Character Types; Buffer Positions; Other Typedefs
+@cindex byte/character types; buffer positions; other typedefs
+@cindex byte/character types
+@cindex character types
+@cindex buffer positions
+@cindex typedefs, other
+@menu
+* Byte Types::
+* Different Ways of Seeing Internal Text::
+* Buffer Positions::
+* Other Typedefs::
+* Usage of the Various Representations::
+* Working With the Various Representations::
+@end menu
+@node Byte Types, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs
+@subsection Byte Types
+@cindex byte types
+Stuff pointed to by a char * or unsigned char * will nearly always be
+one of the following types:
+@itemize @minus
+@item
+a) [Ibyte] pointer to internally-formatted text
+@item
+b) [Extbyte] pointer to text in some external format, which can be
+defined as all formats other than the internal one
+@item
+c) [Ascbyte] pure ASCII text
+@item
+d) [Binbyte] binary data that is not meant to be interpreted as text
+@item
+e) [Rawbyte] general data in memory, where we don't care about whether
+it's text or binary
+@item
+f) [Boolbyte] a zero or a one
+@item
+g) [Bitbyte] a byte used for bit fields
+@item
+h) [Chbyte] null-semantics @code{char *}; used when casting an argument to
+an external API where the other types may not be
+appropriate
+@end itemize
+Types (b), (c), (f) and (h) are defined as @code{char}, while the others are
+@code{unsigned char}.  This is for maximum safety (signed characters are
+dangerous to work with) while maintaining as much compatibility with
+external API's and string constants as possible.
+We also provide versions of the above types defined with different
+underlying C types, for API compatibility.  These use the following
+prefixes:
+@example
+C = plain char, when the base type is unsigned
+U = unsigned
+S = signed
+@end example
+(Formerly I had a comment saying that type (e) "should be replaced with
+void *".  However, there are in fact many places where an unsigned char
+* might be used -- e.g. for ease in pointer computation, since void *
+doesn't allow this, and for compatibility with external API's.)
+Note that these typedefs are purely for documentation purposes; from
+the C code's perspective, they are exactly equivalent to @code{char *},
+@code{unsigned char *}, etc., so you can freely use them with library
+functions declared as such.
+Using these more specific types rather than the general ones helps avoid
+the confusions that occur when the semantics of a char * or unsigned
+char * argument being studied are unclear.  Furthermore, by requiring
+that ALL uses of @code{char} be replaced with some other type as part of the
+Mule-ization process, we can use a search for @code{char} as a way of finding
+code that has not been properly Mule-ized yet.
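+For orientation, the typedefs described above might be declared along
+the following lines.  This is a sketch reconstructed from the
+description in this section, not a copy of the actual header; the
+prefixed variants shown and the helper @code{ascii_length} are included
+only to illustrate the naming scheme and the interoperability with
+library functions.

```c
#include <string.h>

/* Types (b), (c), (f) and (h) are plain char; the rest are unsigned. */
typedef unsigned char Ibyte;     /* (a) internally-formatted text */
typedef char          Extbyte;   /* (b) externally-formatted text */
typedef char          Ascbyte;   /* (c) pure ASCII text */
typedef unsigned char Binbyte;   /* (d) binary data */
typedef unsigned char Rawbyte;   /* (e) general data in memory */
typedef char          Boolbyte;  /* (f) a zero or a one */
typedef unsigned char Bitbyte;   /* (g) a byte used for bit fields */
typedef char          Chbyte;    /* (h) null-semantics char */

/* Variants with a different underlying C type, named with the C, U
   and S prefixes. */
typedef char          CIbyte;    /* C: plain char, base type unsigned */
typedef unsigned char UExtbyte;  /* U: unsigned */
typedef signed char   SExtbyte;  /* S: signed */

/* Because these are plain typedefs, values pass directly to library
   functions declared with char *. */
size_t ascii_length(const Ascbyte *s)
{
    return strlen(s);
}
```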
+@node Different Ways of Seeing Internal Text, Buffer Positions, Byte Types, Byte/Character Types; Buffer Positions; Other Typedefs
+@subsection Different Ways of Seeing Internal Text
+@cindex different ways of seeing internal text
+There are various ways of representing internal text.  The two primary
+ways are as an "array" of individual characters and as a "stream" of
+bytes.  In the ASCII world, where there are at most 256 characters,
+things are easy because each character fits into a
+byte.  In general, however, this is not true -- see the above discussion
+of characters vs. encodings.
+In some cases, it's also important to distinguish between a stream
+representation as a series of bytes and as a series of textual units.
+This is particularly important wrt Unicode.  The UTF-16 representation
+(sometimes referred to, rather sloppily, as simply the "Unicode" format)
+represents text as a series of 16-bit units.  Mostly, each unit
+corresponds to a single character, but not necessarily, as characters
+outside of the range 0-65535 (the BMP or "Basic Multilingual Plane" of
+Unicode) require two 16-bit units, through the mechanism of
+"surrogates".  When a series of 16-bit units is serialized into a byte
+stream, there are at least two possible representations, little-endian
+and big-endian, and which one is used may depend on the native format of
+16-bit integers in the CPU of the machine that XEmacs is running
+on. (Similarly, UTF-32 is logically a representation with 32-bit textual
+units.)
+Specifically:
+@itemize @minus
+@item
+UTF-8 has 1-byte (8-bit) units.
+@item
+UTF-16 has 2-byte (16-bit) units.
+@item
+UTF-32 has 4-byte (32-bit) units.
+@item
+XEmacs-internal encoding (the old "Mule" encoding) has 1-byte (8-bit)
+units.
+@item
+UTF-7 technically has 7-bit units that are within the "mail-safe" range
+(ASCII 32 - 126 plus a few control characters), but normally is encoded
+in an 8-bit stream. (UTF-7 is also a modal encoding, since it has a
+normal mode where printable ASCII characters represent themselves and a
+shifted mode, introduced with a plus sign, where a base-64 encoding is
+used.)
+@item
+UTF-5 technically has 7-bit units (normally encoded in an 8-bit stream,
+like UTF-7), but only uses uppercase A-V and 0-9, and only encodes 4
+bits worth of data per character.  UTF-5 is meant for encoding Unicode
+inside of DNS names.
+@end itemize
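+The difference between characters, textual units and bytes can be
+seen by taking UTF-16 as an example.  The sketch below (hypothetical
+function names, standard UTF-16 arithmetic) first turns one character
+into one or two 16-bit units, then serializes units into bytes in
+either byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Character -> textual units.  Returns the number of 16-bit units
   written to out (1 or 2), or 0 if cp is not an encodable character
   (a surrogate value or a value above 0x10FFFF). */
size_t utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t) cp;              /* inside the BMP */
        return 1;
    }
    cp -= 0x10000;                           /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Textual units -> bytes.  out must have room for 2 * n bytes.
   Returns the number of bytes written. */
size_t utf16_serialize(const uint16_t *units, size_t n, int big_endian,
                       unsigned char *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned char hi = (unsigned char) (units[i] >> 8);
        unsigned char lo = (unsigned char) (units[i] & 0xFF);
        out[2 * i]     = big_endian ? hi : lo;
        out[2 * i + 1] = big_endian ? lo : hi;
    }
    return 2 * n;
}
```

+One character outside the BMP thus becomes two textual units and four
+bytes, and the byte order only matters at the last step.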
+Thus, we can imagine three levels in the representation of textual data:
+@example
+series of characters -> series of textual units -> series of bytes
+[Ichar]                 [Itext]                 [Ibyte]
+@end example
+XEmacs has three corresponding typedefs:
+@itemize @minus
+@item
+An Ichar is an integer (at least 32-bit), representing a 31-bit
+character.
+@item
+An Itext is an unsigned value, either 8, 16 or 32 bits, depending
+on the nature of the internal representation, and corresponding to
+a single textual unit.
+@item
+An Ibyte is an @code{unsigned char}, representing a single byte in a
+textual byte stream.
+@end itemize
+Internal text in stream format can be simultaneously viewed as either
+@code{Itext *} or @code{Ibyte *}.  The @code{Ibyte *} representation is convenient for
+copying data from one place to another, because such routines usually
+expect byte counts.  However, @code{Itext *} is much better for actually
+working with the data.
+From a text-unit perspective, units 0 through 127 will always be ASCII
+compatible, and data in Lisp strings (and other textual data generated
+as a whole, e.g. from external conversion) will be followed by a
+null-unit terminator.  From an @code{Ibyte *} perspective, however, the
+encoding is only ASCII-compatible if it uses 1-byte units.
+Similarly to the different text representations, three integral count
+types exist -- Charcount, Textcount and Bytecount.
+NOTE: Despite the presence of the terminator, internal text itself can
+have nulls in it! (Null text units, not just the null bytes present in
+any UTF-16 encoding.) The terminator is present because in many cases
+internal text is passed to routines that will ultimately pass the text
+to library functions that cannot handle embedded nulls, e.g. functions
+manipulating filenames, and it is a real hassle to have to pass the
+length around constantly.  But this can lead to sloppy coding!  We need
+to be careful about watching for nulls in places that are important,
+e.g. manipulating string objects or passing data to/from the clipboard.
+@table @code
+@item Ibyte
+The data in a buffer or string is logically made up of Ibyte objects,
+where an Ibyte takes up the same amount of space as a char. (It is
+declared differently, though, to catch invalid usages.) Strings stored
+using Ibytes are said to be in "internal format".  The important
+characteristics of internal format are
+@itemize @minus
+@item
+ASCII characters are represented as a single Ibyte, in the range 0 -
+0x7f.
+@item
+All other characters are represented as an Ibyte in the range 0x80 - 0x9f
+followed by one or more Ibytes in the range 0xa0 to 0xff.
+@end itemize
+This leads to a number of desirable properties:
+@itemize @minus
+@item
+Given the position of the beginning of a character, you can find the
+beginning of the next or previous character in constant time.
+@item
+When searching for a substring or an ASCII character within the string,
+you need merely use standard searching routines.
+@end itemize
+@item Itext
+#### Document me.
+@item Ichar
+This typedef represents a single Emacs character, which can be ASCII,
+ISO-8859, or some extended character, as would typically be used for
+Kanji.  Note that the representation of a character as an Ichar is @strong{not}
+the same as the representation of that same character in a string; thus,
+you cannot do the standard C trick of passing a pointer to a character
+to a function that expects a string.
+An Ichar takes up 19 bits of representation and (for code compatibility
+and such) is compatible with an int.  This representation is visible on
+the Lisp level.  The important characteristics of the Ichar
+representation are
+@itemize @minus
+@item
+values 0x00 - 0x7f represent ASCII.
+@item
+values 0x80 - 0xff represent the right half of ISO-8859-1.
+@item
+values 0x100 and up represent all other characters.
+@end itemize
+This means that Ichar values are upwardly compatible with the standard
+8-bit representation of ASCII/ISO-8859-1.
+@item Extbyte
+Strings that go in or out of Emacs are in "external format", typedef'ed
+as an array of char or a char *.  There is more than one external format
+(JIS, EUC, etc.) but they all have similar properties.  They are modal
+encodings, which is to say that the meaning of particular bytes is not
+fixed but depends on what "mode" the string is currently in (e.g. bytes
+in the range 0 - 0x7f might be interpreted as ASCII, or as Hiragana, or
+as 2-byte Kanji, depending on the current mode).  The mode starts out in
+ASCII/ISO-8859-1 and is switched using escape sequences -- for example,
+in the JIS encoding, 'ESC $ B' switches to a mode where pairs of bytes
+in the range 0 - 0x7f are interpreted as Kanji characters.
+External-formatted data is generally desirable for passing data between
+programs because it is upwardly compatible with standard
+ASCII/ISO-8859-1 strings and may require less space than internal
+encodings such as the one described above.  In addition, some encodings
+(e.g. JIS) keep all characters (except the ESC used to switch modes) in
+the printing ASCII range 0x20 - 0x7e, which results in a much higher
+probability that the data will avoid being garbled in transmission.
+Externally-formatted data is generally not very convenient to work with,
+however, and for this reason is usually converted to internal format
+before any work is done on the string.
+NOTE: filenames need to be in external format so that ISO-8859-1
+characters come out correctly.
+@end table
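The two properties claimed for internal format in the Ibyte entry above follow directly from the byte ranges. The following is a minimal sketch rather than the actual XEmacs code: the @code{Ibyte} typedef here is a stand-in, and the helper names (@code{starts_character}, @code{next_character}, @code{previous_character}, @code{find_ascii}) are hypothetical. It assumes the text is null-terminated and that the caller never steps back past the first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the real Ibyte typedef (assumption for this sketch). */
typedef unsigned char Ibyte;

/* True if B begins a character: ASCII (0x00 - 0x7f) or a leading
   byte (0x80 - 0x9f).  Bytes 0xa0 - 0xff only continue a character. */
static int
starts_character (Ibyte b)
{
  return b < 0xa0;
}

/* P points at the first byte of a character.  Return the first byte
   of the next character.  The terminating zero stops the loop. */
static const Ibyte *
next_character (const Ibyte *p)
{
  p++;
  while (!starts_character (*p))
    p++;
  return p;
}

/* P points at the first byte of a character that is not the first
   character of the text.  Return the first byte of the previous one. */
static const Ibyte *
previous_character (const Ibyte *p)
{
  p--;
  while (!starts_character (*p))
    p--;
  return p;
}

/* An ASCII byte never occurs inside a multi-byte character, so a
   plain memchr() is enough to search for an ASCII character. */
static const Ibyte *
find_ascii (const Ibyte *text, size_t len, char c)
{
  return (const Ibyte *) memchr (text, c, len);
}
```

Because a character occupies only a small, bounded number of bytes, each loop runs a bounded number of times, which is why the step is constant-time.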
+@node Buffer Positions, Other Typedefs, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs
+@subsection Buffer Positions
+@cindex buffer positions
+There are three possible ways to specify positions in a buffer.  All
+of these are one-based: the beginning of the buffer is position or
+index 1, and 0 is not a valid position.
+As a "buffer position" (typedef Charbpos):
+This is an index specifying an offset in characters from the
+beginning of the buffer.  Note that buffer positions are
+logically @strong{between} characters, not on a character.  The
+difference between two buffer positions specifies the number of
+characters between those positions.  Buffer positions are the
+only kind of position externally visible to the user.
+As a "byte index" (typedef Bytebpos):
+This is an index over the bytes used to represent the characters
+in the buffer.  If there is no Mule support, this is identical
+to a buffer position, because each character is represented
+using one byte.  However, with Mule support, many characters
+require two or more bytes for their representation, and so a
+byte index may be greater than the corresponding buffer
+position.
+As a "memory index" (typedef Membpos):
+This is the byte index adjusted for the gap.  For positions
+before the gap, this is identical to the byte index.  For
+positions after the gap, this is the byte index plus the gap
+size.  There are two possible memory indices for the gap
+position; the memory index at the beginning of the gap should
+always be used, except in code that deals with manipulating the
+gap, where both indices may be seen.  The address of the
+character "at" (i.e. following) a particular position can be
+obtained from the formula
+buffer_start_address + memory_index(position) - 1
+except in the case of characters at the gap position.
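The relationship between byte indices, memory indices and addresses can be sketched as follows. This is an illustration under stated assumptions, not the real buffer code: @code{struct gap_buffer}, its field names and the two helper functions are hypothetical, and the typedefs are simplified stand-ins for the real ones.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t Bytebpos;  /* one-based byte index (stand-in typedef) */
typedef ptrdiff_t Membpos;   /* one-based memory index (stand-in typedef) */

/* Hypothetical, simplified view of a buffer with a gap. */
struct gap_buffer
{
  unsigned char *start;  /* address of the first byte of buffer memory */
  Bytebpos gap_start;    /* byte index of the gap position */
  ptrdiff_t gap_size;    /* size of the gap in bytes */
};

/* Positions up to and including the gap position map to themselves
   (the beginning-of-gap memory index is used for the gap position);
   positions after the gap are shifted by the gap size. */
static Membpos
bytebpos_to_membpos (const struct gap_buffer *buf, Bytebpos pos)
{
  return pos > buf->gap_start ? pos + buf->gap_size : pos;
}

/* Address of the byte "at" (i.e. following) POS.  The gap position is
   the special case: the byte following it lives past the gap. */
static unsigned char *
byte_address (const struct gap_buffer *buf, Bytebpos pos)
{
  Membpos mem = bytebpos_to_membpos (buf, pos);
  if (pos == buf->gap_start)
    mem += buf->gap_size;
  return buf->start + mem - 1;
}
```

For example, with memory laid out as @code{a b _ _ _ c d} (a three-byte gap at position 3), position 4 has memory index 7, and the byte at the gap position 3 is @code{c}.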
+@node Other Typedefs, Usage of the Various Representations, Buffer Positions, Byte/Character Types; Buffer Positions; Other Typedefs
+@subsection Other Typedefs
+@cindex other typedefs
+Charcount:
+----------
+This typedef represents a count of characters, such as
+a character offset into a string or the number of
+characters between two positions in a buffer.  The
+difference between two Charbpos's is a Charcount, and
+character positions in a string are represented using
+a Charcount.
+Textcount:
+----------
+#### Document me.
+Bytecount:
+----------
+Similar to a Charcount but represents a count of bytes.
+The difference between two Bytebpos's is a Bytecount.
+@node Usage of the Various Representations, Working With the Various Representations, Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs
+@subsection Usage of the Various Representations
+@cindex usage of the various representations
+Memory indices are used in low-level functions in insdel.c and for
+extent endpoints and marker positions.  The reason for this is that
+this way, the extents and markers don't need to be updated for most
+insertions, which merely shrink the gap and don't move any
+characters around in memory.
+(The beginning-of-gap memory index simplifies insertions w.r.t.
+markers, because text usually gets inserted after markers.  For
+extents, it is merely for consistency, because text can get
+inserted either before or after an extent's endpoint depending on
+the open/closedness of the endpoint.)
+Byte indices are used in other code that needs to be fast,
+such as the searching, redisplay, and extent-manipulation code.
+Buffer positions are used in all other code.  This is because this
+representation is easiest to work with (especially since Lisp
+code always uses buffer positions), necessitates the fewest
+changes to existing code, and is the safest (e.g. if the text gets
+shifted underneath a buffer position, it will still point to a
+character; if text is shifted under a byte index, it might point
+to the middle of a character, which would be bad).
+Similarly, Charcounts are used in all code that deals with strings
+except for code that needs to be fast, which uses Bytecounts.
+Strings are always passed around internally using internal format.
+Conversions to and from external format are performed at the time
+that the data goes in or out of Emacs.
+@node Working With the Various Representations,  , Usage of the Various Representations, Byte/Character Types; Buffer Positions; Other Typedefs
+@subsection Working With the Various Representations
+@cindex working with the various representations
+We write things this way because it's very important that
+MAX_BYTEBPOS_GAP_SIZE_3 is a multiple of 3. (As it happens,
+65535 is a multiple of 3, but this may not always be the
+case.) #### unfinished
+@node Internal Text API's, Coding for Mule, Byte/Character Types; Buffer Positions; Other Typedefs, Multilingual Support
+@section Internal Text API's
+@cindex internal text API's
+@cindex text API's, internal
+@cindex API's, text, internal
+@strong{NOTE}: The most current documentation for these API's is in
+@file{text.h}.  In case of error, assume that file is correct and this
+one wrong.
+@menu
+* Basic internal-format API's::
+* The DFC API::
+* The Eistring API::
+@end menu
+@node Basic internal-format API's, The DFC API, Internal Text API's, Internal Text API's
+@subsection Basic internal-format API's
+@cindex basic internal-format API's
+@cindex internal-format API's, basic
+@cindex API's, basic internal-format
+These are simple functions and macros to convert between text
+representation and characters, move forward and back in text, etc.
+#### Finish the rest of this.
+Use the following functions/macros on contiguous text in any of the
+internal formats.  Those that take a format arg work on all internal
+formats; the others work only on the default (variable-width under Mule)
+format.  If the text you're operating on is known to come from a buffer,
+use the buffer-level functions in buffer.h, which automatically know the
+correct format and handle the gap.
+Some terminology:
+"itext" appearing in the macros means "internal-format text" -- type
+@code{Ibyte *}.  Operations on such pointers themselves, rather than on the
+text being pointed to, have "itext" instead of "itext" in the macro
+name.  "ichar" in the macro names means an Ichar -- the representation
+of a character as a single integer rather than a series of bytes, as part
+of "itext".  Many of the macros below are for converting between the
+two representations of characters.
+Note also that we try to consistently distinguish between an "Ichar" and
+a Lisp character.  Stuff working with Lisp characters often just says
+"char", so we consistently use "Ichar" when that's what we're working
+with.
+@node The DFC API, The Eistring API, Basic internal-format API's, Internal Text API's
+@subsection The DFC API
+@cindex DFC API
+@cindex API, DFC
+This is for conversion between internal and external text.  Note that
+there is also the "new DFC" API, which @strong{returns} a pointer to the
+converted text (in alloca space), rather than storing it into a
+variable.
+The macros below are used for converting data between different formats.
+Generally, the data is textual, and the formats are related to
+internationalization (e.g. converting between internal-format text and
+UTF-8) -- but the mechanism is general, and could be used for anything,
+e.g. decoding gzipped data.
+In general, conversion involves a source of data, a sink, the existing
+format of the source data, and the desired format of the sink.  The
+macros below, however, always require that either the source or sink is
+internal-format text.  Therefore, in practice the conversions below
+involve source, sink, an external format (specified by a coding system),
+and the direction of conversion (internal->external or vice-versa).
+Sources and sinks can be raw data (sized or unsized -- when unsized,
+input data is assumed to be null-terminated [double null-terminated for
+Unicode-format data], and on output the length is not stored anywhere),
+Lisp strings, Lisp buffers, lstreams, and opaque data objects.  When the
+output is raw data, the result can be allocated either with @code{alloca()} or
+@code{malloc()}. (There is currently no provision for writing into a fixed
+buffer.  If you want this, use @code{alloca()} output and then copy the data --
+but be careful with the size!  Unless you are very sure of the encoding
+being used, upper bounds for the size are not in general computable.)
+The obvious restrictions on source and sink types apply (e.g. Lisp
+strings are a source and sink only for internal data).
+All raw data outputted will contain an extra null byte (two bytes for
+Unicode -- currently, in fact, all output data, whether internal or
+external, is double-null-terminated, but you can't count on this; see
+below).  This means that enough space is allocated to contain the extra
+nulls; however, these nulls are not reflected in the returned output
+size.
+The most basic macros are TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT.
+These can be used to convert between any kinds of sources or sinks.
+However, 99% of conversions involve raw data or Lisp strings as both
+source and sink, and usually data is output as @code{alloca()} rather than
+@code{malloc()}.  For this reason, convenience macros are defined for many types
+of conversions involving raw data and/or Lisp strings, especially when
+the output is an @code{alloca()}ed string. (When the destination is a
+Lisp_String, there are other functions that should be used instead --
+@code{build_ext_string()} and @code{make_ext_string()}, for example.) The convenience
+macros are of two types -- the older kind that store the result into a
+specified variable, and the newer kind that return the result.  The newer
+kind of macros don't exist when the output is sized data, because that
+would have two return values.  NOTE: All convenience macros are
+ultimately defined in terms of TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT.
+Thus, any comments below about the workings of these macros also apply to
+all convenience macros.
+@example
+TO_EXTERNAL_FORMAT (source_type, source, sink_type, sink, codesys)
+TO_INTERNAL_FORMAT (source_type, source, sink_type, sink, codesys)
+@end example
+Typical use is
+@example
+TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
+@end example
+which means that the contents of the lisp string @var{str} are written
+to a malloc'ed memory area which will be pointed to by @var{ptr}, after the
+function returns.  The conversion will be done using the @code{file-name}
+coding system (which will be controlled by the user indirectly by
+setting or binding the variable @code{file-name-coding-system}).
+Some sources and sinks require two C variables to specify.  We use
+some preprocessor magic to allow different source and sink types, and
+even different numbers of arguments to specify different types of
+sources and sinks.
+So we can have a call that looks like
+@example
+TO_INTERNAL_FORMAT (DATA, (ptr, len),
+MALLOC, (ptr, len),
+coding_system);
+@end example
+The parenthesized argument pairs are required to make the
+preprocessor magic work.
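The general technique of letting one macro accept either a single argument or a parenthesized pair can be shown in isolation. The sketch below is a simplified illustration of that trick only. It is not the implementation in @file{text.h}, and every name in it (@code{struct source}, @code{SRC_DATA}, @code{SRC_C_STRING}, @code{DECLARE_SOURCE}) is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical description of a data source: pointer plus length. */
struct source
{
  const char *ptr;
  size_t len;
};

/* One helper macro per source type.  SRC_DATA receives a single
   argument that is itself a parenthesized pair; placing that pair
   right after SRC_DATA_1 turns it into a two-argument macro call. */
#define SRC_DATA(pair)     SRC_DATA_1 pair
#define SRC_DATA_1(p, n)   { (p), (n) }
#define SRC_C_STRING(p)    { (p), strlen (p) }

/* Token pasting selects the helper from the type name, so the same
   macro handles source types with different numbers of arguments. */
#define DECLARE_SOURCE(var, type, arg) \
  struct source var = SRC_##type (arg)
```

With these definitions, @code{DECLARE_SOURCE (a, DATA, (buf, 5));} and @code{DECLARE_SOURCE (b, C_STRING, buf);} both compile. Without the parentheses around the pair, the preprocessor would see four arguments in the first call and reject it.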
+NOTE: GC is inhibited during the entire operation of these macros.  This
+is because frequently the data to be converted comes from strings but
+gets passed in as just DATA, and GC may move around the string data.  If
+we didn't inhibit GC, there'd have to be a lot of messy recoding,
+alloca-copying of strings and other annoying stuff.
+The source or sink can be specified in one of these ways:
+@example
+DATA,   (ptr, len),    // input data is a fixed buffer of size len
+ALLOCA, (ptr, len),    // output data is in an @code{ALLOCA()}ed buffer of size len
+MALLOC, (ptr, len),    // output data is in a @code{malloc()}ed buffer of size len
+C_STRING_ALLOCA, ptr,  // equivalent to ALLOCA (ptr, len_ignored) on output
+C_STRING_MALLOC, ptr,  // equivalent to MALLOC (ptr, len_ignored) on output
+C_STRING,     ptr,     // equivalent to DATA, (ptr, strlen/wcslen (ptr))
+// on input (the Unicode version is used when correct)
+LISP_STRING,  string,  // input or output is a Lisp_Object of type string
+LISP_BUFFER,  buffer,  // output is written to (point) in lisp buffer
+LISP_LSTREAM, lstream, // input or output is a Lisp_Object of type lstream
+LISP_OPAQUE,  object,  // input or output is a Lisp_Object of type opaque
+@end example
+When specifying the sink, use lvalues, since the macro will assign to them,
+except when the sink is an lstream or a lisp buffer.
+For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the resulting text is
+stored in a stack-allocated buffer, which is automatically freed on
+returning from the function.  However, the sink types @code{MALLOC} and
+@code{C_STRING_MALLOC} return @code{xmalloc()}ed memory.  The caller is responsible
+for freeing this memory using @code{xfree()}.
+The macros accept the kinds of sources and sinks appropriate for
+internal and external data representation.  See the type_checking_assert
+macros below for the actual allowed types.
+Since some sources and sinks use one argument (a Lisp_Object) to
+specify them, while others take a (pointer, length) pair, we use
+some C preprocessor trickery to allow pair arguments to be specified
+by parenthesizing them, as in the examples above.
+Anything prefixed by dfc_ (`data format conversion') is private.
+They are only used to implement these macros.
+[[Using C_STRING* is appropriate for using with external APIs that
+take null-terminated strings.  For internal data, we should try to
+be '\0'-clean - i.e. allow arbitrary data to contain embedded '\0'.
+Sometime in the future we might allow output to C_STRING_ALLOCA or
+C_STRING_MALLOC _only_ with @code{TO_EXTERNAL_FORMAT()}, not
+@code{TO_INTERNAL_FORMAT()}.]]
+The above comments are not true.  Frequently (most of the time, in
+fact), external strings come as zero-terminated entities, where the
+zero-termination is the only way to find out the length.  Even in
+cases where you can get the length, most of the time the system will
+still use the null to signal the end of the string, and there will
+still be no way to either send in or receive a string with embedded
+nulls.  In such situations, it's pointless to track the length
+because null bytes can never be in the string.  We have a lot of
+operations that make it easy to operate on zero-terminated strings,
+and forcing the user to deal with the length everywhere would only
+make the code uglier and more complicated, for no gain. --ben
+There is no problem using the same lvalue for source and sink.
+Also, when pointers are required, the code (currently at least) is
+lax and allows any pointer types, either in the source or the sink.
+This makes it possible, e.g., to deal with internal format data held
+in char *'s or external format data held in WCHAR * (i.e. Unicode).
+Finally, whenever storage allocation is called for, extra space is
+allocated for a terminating zero, and such a zero is stored in the
+appropriate place, regardless of whether the source data was
+specified using a length or was specified as zero-terminated.  This
+allows you to freely pass the resulting data, no matter how
+obtained, to a routine that expects zero termination (modulo, of
+course, that any embedded zeros in the resulting text will cause
+truncation).  In fact, currently two terminating zeros are allocated
+and stored after the data result.  This is to allow for the
+possibility of storing a Unicode value on output, which needs the
+two zeros.  Currently, however, the two zeros are stored regardless
+of whether the conversion is internal or external and regardless of
+whether the external coding system is in fact Unicode.  This
+behavior may change in the future, and you cannot rely on this --
+the most you can rely on is that sink data in Unicode format will
+have two terminating nulls, which combine to form one Unicode null
+character.
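The termination convention described above (two zero bytes stored after the data, neither counted in the reported length) can be sketched with plain C library calls. This is only an illustration of the convention. The function name @code{copy_with_terminator} is hypothetical, and it uses @code{malloc()} and @code{free()} directly rather than the XEmacs allocation wrappers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy LEN bytes of DATA into freshly malloc()ed storage and store
   two zero bytes after them, enough to form one 16-bit Unicode null.
   The zeros are not counted in the length stored through LEN_OUT
   (which may be NULL).  Returns NULL if allocation fails; otherwise
   the caller must release the result with free(). */
static unsigned char *
copy_with_terminator (const void *data, size_t len, size_t *len_out)
{
  unsigned char *out = malloc (len + 2);
  if (!out)
    return NULL;
  memcpy (out, data, len);
  out[len] = 0;
  out[len + 1] = 0;
  if (len_out)
    *len_out = len;
  return out;
}
```

Note that if the data itself contains a zero byte, a routine such as @code{strlen()} stops there, which is the truncation caveat mentioned above.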
+NOTE: You might ask, why are these not written as functions that
+@strong{RETURN} the converted string, since that would allow them to be used
+much more conveniently, without having to constantly declare temporary
+variables?  The answer is that in fact I originally did write the
+routines that way, but that required either
+@itemize @bullet
+@item
+(a) calling @code{alloca()} inside of a function call, or
+@item
+(b) using expressions separated by commas and a global temporary variable, or
+@item
+(c) using the GCC extension (@{ ... @}).
+@end itemize
+Turned out that all of the above had bugs, all caused by GCC (hence the
+comments about "those GCC wankers" and "ream gcc up the ass").  As for
+(a), some versions of GCC (especially on Intel platforms) had
+buggy implementations of @code{alloca()} that couldn't handle being called
+inside of a function call -- they just decremented the stack right in the
+middle of pushing args.  Oops, crash with stack trashing, very bad.  (b)
+was an attempt to fix (a), and that led to further GCC crashes, esp. when
+you had two such calls in a single subexpression, because GCC couldn't be
+counted upon to follow even a minimally reasonable order of execution.
+True, you can't count on one argument being evaluated before another, but
+GCC would actually interleave them so that the temp var got stomped on by
+one while the other was accessing it.  So I tried (c), which was
+problematic because that GCC extension has more bugs in it than a
+termite's nest.
+So reluctantly I converted to the current way.  Now, that was a while ago
+(c. 1994), and it appears that the bug involving alloca in function calls
+has long since been fixed.  More recently, I defined the new-dfc routines
+down below, which DO allow exactly such convenience of returning your
+args rather than store them in temp variables, and I also wrote a
+configure check to see whether @code{alloca()} causes crashes inside of function
+calls, and if so use the portable @code{alloca()} implementation in alloca.c.
+If you define TEST_NEW_DFC, the old routines get written in terms of the
+new ones, and I've had a beta put out with this on, and it appeared
+to cause no problems -- so we should consider
+switching, and feel no compunctions about writing further such function-
+like @code{alloca()} routines in lieu of statement-like ones. --ben
+@node The Eistring API,  , The DFC API, Internal Text API's
+@subsection The Eistring API
+@cindex Eistring API
+@cindex API, Eistring
+(This API is currently under-used.) When doing simple things with
+internal text, the basic internal-format API's are enough.  But to do
+things like delete or replace a substring, concatenate various strings,
+etc. is difficult to do cleanly because of the allocation issues.
+The Eistring API is designed to deal with this, and provides a clean
+way of modifying and building up internal text. (Note that the former
+lack of this API has meant that some code uses Lisp strings to do
+similar manipulations, resulting in excess garbage and increased
+garbage collection.)
+NOTE: The Eistring API is (or should be) Mule-correct even without
+an ASCII-compatible internal representation.
+@example
+#### NOTE: This is a work in progress.  Neither the API nor especially
+the implementation is finished.
+NOTE: An Eistring is a structure that makes it easy to work with
+internally-formatted strings of data.  It provides operations similar
+in feel to the standard @code{strcpy()}, @code{strcat()}, @code{strlen()}, etc., but
+(a) it is Mule-correct
+(b) it does dynamic allocation so you never have to worry about size
+restrictions
+(c) it comes in an @code{ALLOCA()} variety (all allocation is stack-local,
+so there is no need to explicitly clean up) as well as a @code{malloc()}
+variety
+(d) it knows its own length, so it does not suffer from standard null
+byte brain-damage -- but it null-terminates the data anyway, so
+it can be passed to standard routines
+(e) it provides a much more powerful set of operations and knows about
+all the standard places where string data might reside: Lisp_Objects,
+other Eistrings, Ibyte * data with or without an explicit length,
+ASCII strings, Ichars, etc.
+(f) it provides easy operations to convert to/from externally-formatted
+data, and is easier to use than the standard TO_INTERNAL_FORMAT
+and TO_EXTERNAL_FORMAT macros. (An Eistring can store both the internal
+and external version of its data, but the external version is only
+initialized or changed when you call @code{eito_external()}.)
+The idea is to make it as easy to write Mule-correct string manipulation
+code as it is to write normal string manipulation code.  We also make
+the API sufficiently general that it can handle multiple internal data
+formats (e.g. some fixed-width optimizing formats and a default variable
+width format) and allows for @strong{ANY} data format we might choose in the
+future for the default format, including UCS2. (In other words, we can't
+assume that the internal format is ASCII-compatible and we can't assume
+it doesn't have embedded null bytes.  We do assume, however, that any
+chosen format will have the concept of null-termination.) All of this is
+hidden from the user.
+#### It is really too bad that we don't have a real object-oriented
+language, or at least a language with polymorphism!
+**********************************************
+*                 Declaration                *
+**********************************************
+To declare an Eistring, either put one of the following in the local
+variable section:
+DECLARE_EISTRING (name);
+Declare a new Eistring and initialize it to the empty string.  This
+is a standard local variable declaration and can go anywhere in the
+variable declaration section.  NAME itself is declared as an
+Eistring *, and its storage declared on the stack.
+DECLARE_EISTRING_MALLOC (name);
+Declare and initialize a new Eistring, which uses @code{malloc()}ed
+instead of @code{ALLOCA()}ed data.  This is a standard local variable
+declaration and can go anywhere in the variable declaration
+section.  Once you initialize the Eistring, you will have to free
+it using @code{eifree()} to avoid memory leaks.  You will need to use this
+form if you are passing an Eistring to any function that modifies
+it (otherwise, the modified data may be in stack space and get
+overwritten when the function returns).
+or use
+Eistring ei;
+void eiinit (Eistring *ei);
+void eiinit_malloc (Eistring *einame);
+If you need to put an Eistring elsewhere than in a local variable
+declaration (e.g. in a structure), declare it as shown and then
+call one of the init macros.
+Also note:
+void eifree (Eistring *ei);
+If you declared an Eistring to use @code{malloc()} to hold its data,
+or converted it to the heap using @code{eito_malloc()}, then this
+releases any data in it and afterwards resets the Eistring
+using @code{eiinit_malloc()}.  Otherwise, it just resets the Eistring
+using @code{eiinit()}.
+**********************************************
+*                 Conventions                *
+**********************************************
+- The names of the functions have been chosen, where possible, to
+match the names of @code{str*()} functions in the standard C API.
+-
+**********************************************
+*               Initialization               *
+**********************************************
+void eireset (Eistring *eistr);
+Initialize the Eistring to the empty string.
+void eicpy_* (Eistring *eistr, ...);
+Initialize the Eistring from somewhere:
+void eicpy_ei (Eistring *eistr, Eistring *eistr2);
+... from another Eistring.
+void eicpy_lstr (Eistring *eistr, Lisp_Object lisp_string);
+... from a Lisp_Object string.
+void eicpy_ch (Eistring *eistr, Ichar ch);
+... from an Ichar (this can be a conventional C character).
+void eicpy_lstr_off (Eistring *eistr, Lisp_Object lisp_string,
+Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen);
+... from a section of a Lisp_Object string.
+void eicpy_lbuf (Eistring *eistr, Lisp_Object lisp_buf,
+	    Bytecount off, Charcount charoff,
+	    Bytecount len, Charcount charlen);
+... from a section of a Lisp_Object buffer.
+void eicpy_raw (Eistring *eistr, const Ibyte *data, Bytecount len);
+... from raw internal-format data in the default internal format.
+void eicpy_rawz (Eistring *eistr, const Ibyte *data);
+... from raw internal-format data in the default internal format
+that is "null-terminated" (the meaning of this depends on the nature
+of the default internal format).
+void eicpy_raw_fmt (Eistring *eistr, const Ibyte *data, Bytecount len,
+Internal_Format intfmt, Lisp_Object object);
+... from raw internal-format data in the specified format.
+void eicpy_rawz_fmt (Eistring *eistr, const Ibyte *data,
+Internal_Format intfmt, Lisp_Object object);
+... from raw internal-format data in the specified format that is
+"null-terminated" (the meaning of this depends on the nature of
+the specific format).
+void eicpy_c (Eistring *eistr, const Ascbyte *c_string);
+... from an ASCII null-terminated string.  Non-ASCII characters in
+the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
+void eicpy_c_len (Eistring *eistr, const Ascbyte *c_string, Bytecount len);
+... from an ASCII string, with length specified.  Non-ASCII characters
+in the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
+void eicpy_ext (Eistring *eistr, const Extbyte *extdata,
+Lisp_Object codesys);
+... from external null-terminated data, with coding system specified.
+void eicpy_ext_len (Eistring *eistr, const Extbyte *extdata,
+Bytecount extlen, Lisp_Object codesys);
+... from external data, with length and coding system specified.
+void eicpy_lstream (Eistring *eistr, Lisp_Object lstream);
+... from an lstream; reads data till eof.  Data must be in default
+internal format; otherwise, interpose a decoding lstream.
+**********************************************
+*    Getting the data out of the Eistring    *
+**********************************************
+Ibyte *eidata (Eistring *eistr);
+Return a pointer to the raw data in an Eistring.  This is NOT
+a copy.
+Lisp_Object eimake_string (Eistring *eistr);
+Make a Lisp string out of the Eistring.
+Lisp_Object eimake_string_off (Eistring *eistr,
+Bytecount off, Charcount charoff,
+			  Bytecount len, Charcount charlen);
+Make a Lisp string out of a section of the Eistring.
+void eicpyout_alloca (Eistring *eistr, LVALUE: Ibyte *ptr_out,
+LVALUE: Bytecount len_out);
+Make an @code{ALLOCA()} copy of the data in the Eistring, using the
+default internal format.  Due to the nature of @code{ALLOCA()}, this
+must be a macro, with all lvalues passed in as parameters.
+(More specifically, not all compilers correctly handle using
+@code{ALLOCA()} as the argument to a function call -- GCC on x86
+didn't used to, for example.) A pointer to the @code{ALLOCA()}ed data
+is stored in PTR_OUT, and the length of the data (not including
+the terminating zero) is stored in LEN_OUT.
+void eicpyout_alloca_fmt (Eistring *eistr, LVALUE: Ibyte *ptr_out,
+LVALUE: Bytecount len_out,
+Internal_Format intfmt, Lisp_Object object);
+Like @code{eicpyout_alloca()}, but converts to the specified internal
+format. (No formats other than FORMAT_DEFAULT are currently
+implemented, and you get an assertion failure if you try.)
+Ibyte *eicpyout_malloc (Eistring *eistr, Bytecount *intlen_out);
+Make a @code{malloc()} copy of the data in the Eistring, using the
+default internal format.  This is a real function.  No lvalues
+passed in.  Returns the new data, and stores the length (not
+including the terminating zero) using INTLEN_OUT, unless it's
+a NULL pointer.
+Ibyte *eicpyout_malloc_fmt (Eistring *eistr, Internal_Format intfmt,
+Bytecount *intlen_out, Lisp_Object object);
+Like @code{eicpyout_malloc()}, but converts to the specified internal
+format. (No formats other than FORMAT_DEFAULT are currently
+implemented, and you get an assertion failure if you try.)
+**********************************************
+*             Moving to the heap             *
+**********************************************
+void eito_malloc (Eistring *eistr);
+Move this Eistring to the heap.  Its data will be stored in a
+@code{malloc()}ed block rather than the stack.  Subsequent changes to
+this Eistring will @code{realloc()} the block as necessary.  Use this
+when you want the Eistring to remain in scope past the end of
+this function call.  You will have to manually free the data
+in the Eistring using @code{eifree()}.
+void eito_alloca (Eistring *eistr);
+Move this Eistring back to the stack, if it was moved to the
+heap with @code{eito_malloc()}.  This will automatically free any
+heap-allocated data.
+**********************************************
+*            Retrieving the length           *
+**********************************************
+Bytecount eilen (Eistring *eistr);
+Return the length of the internal data, in bytes.  See also
+@code{eiextlen()}, below.
+Charcount eicharlen (Eistring *eistr);
+Return the length of the internal data, in characters.
+**********************************************
+*           Working with positions           *
+**********************************************
+Bytecount eicharpos_to_bytepos (Eistring *eistr, Charcount charpos);
+Convert a char offset to a byte offset.
+Charcount eibytepos_to_charpos (Eistring *eistr, Bytecount bytepos);
+Convert a byte offset to a char offset.
+Bytecount eiincpos (Eistring *eistr, Bytecount bytepos);
+Increment the given position by one character.
+Bytecount eiincpos_n (Eistring *eistr, Bytecount bytepos, Charcount n);
+Increment the given position by N characters.
+Bytecount eidecpos (Eistring *eistr, Bytecount bytepos);
+Decrement the given position by one character.
+Bytecount eidecpos_n (Eistring *eistr, Bytecount bytepos, Charcount n);
+Decrement the given position by N characters.
+**********************************************
+*    Getting the character at a position     *
+**********************************************
+Ichar eigetch (Eistring *eistr, Bytecount bytepos);
+Return the character at a particular byte offset.
+Ichar eigetch_char (Eistring *eistr, Charcount charpos);
+Return the character at a particular character offset.
+**********************************************
+*    Setting the character at a position     *
+**********************************************
+Ichar eisetch (Eistring *eistr, Bytecount bytepos, Ichar chr);
+Set the character at a particular byte offset.
+Ichar eisetch_char (Eistring *eistr, Charcount charpos, Ichar chr);
+Set the character at a particular character offset.
+**********************************************
+*               Concatenation                *
+**********************************************
+void eicat_* (Eistring *eistr, ...);
+Concatenate onto the end of the Eistring, with data coming from the
+same places as above:
+void eicat_ei (Eistring *eistr, Eistring *eistr2);
+... from another Eistring.
+void eicat_c (Eistring *eistr, Ascbyte *c_string);
+... from an ASCII null-terminated string.  Non-ASCII characters in
+the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
+void eicat_raw (ei, const Ibyte *data, Bytecount len);
+... from raw internal-format data in the default internal format.
+void eicat_rawz (ei, const Ibyte *data);
+... from raw internal-format data in the default internal format
+that is "null-terminated" (the meaning of this depends on the nature
+of the default internal format).
+void eicat_lstr (ei, Lisp_Object lisp_string);
+... from a Lisp_Object string.
+void eicat_ch (ei, Ichar ch);
+... from an Ichar.
+(All except the first variety are convenience functions.
+In the general case, create another Eistring from the source.)
+**********************************************
+*                Replacement                 *
+**********************************************
+void eisub_* (Eistring *eistr, Bytecount off, Charcount charoff,
+			  Bytecount len, Charcount charlen, ...);
+Replace a section of the Eistring, specifically:
+void eisub_ei (Eistring *eistr, Bytecount off, Charcount charoff,
+	  Bytecount len, Charcount charlen, Eistring *eistr2);
+... with another Eistring.
+void eisub_c (Eistring *eistr, Bytecount off, Charcount charoff,
+	 Bytecount len, Charcount charlen, Ascbyte *c_string);
+... with an ASCII null-terminated string.  Non-ASCII characters in
+the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
+void eisub_ch (Eistring *eistr, Bytecount off, Charcount charoff,
+	  Bytecount len, Charcount charlen, Ichar ch);
+... with an Ichar.
+void eidel (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen);
+Delete a section of the Eistring.
+**********************************************
+*      Converting to an external format      *
+**********************************************
+void eito_external (Eistring *eistr, Lisp_Object codesys);
+Convert the Eistring to an external format and store the result
+in the string.  NOTE: Further changes to the Eistring will @strong{NOT}
+change the external data stored in the string.  You will have to
+call @code{eito_external()} again in such a case if you want the external
+data.
+Extbyte *eiextdata (Eistring *eistr);
+Return a pointer to the external data stored in the Eistring as
+a result of a prior call to @code{eito_external()}.
+Bytecount eiextlen (Eistring *eistr);
+Return the length in bytes of the external data stored in the
+Eistring as a result of a prior call to @code{eito_external()}.
+**********************************************
+* Searching in the Eistring for a character  *
+**********************************************
+Bytecount eichr (Eistring *eistr, Ichar chr);
+Charcount eichr_char (Eistring *eistr, Ichar chr);
+Bytecount eichr_off (Eistring *eistr, Ichar chr, Bytecount off,
+		Charcount charoff);
+Charcount eichr_off_char (Eistring *eistr, Ichar chr, Bytecount off,
+		     Charcount charoff);
+Bytecount eirchr (Eistring *eistr, Ichar chr);
+Charcount eirchr_char (Eistring *eistr, Ichar chr);
+Bytecount eirchr_off (Eistring *eistr, Ichar chr, Bytecount off,
+		 Charcount charoff);
+Charcount eirchr_off_char (Eistring *eistr, Ichar chr, Bytecount off,
+		      Charcount charoff);
+**********************************************
+*   Searching in the Eistring for a string   *
+**********************************************
+Bytecount eistr_ei (Eistring *eistr, Eistring *eistr2);
+Charcount eistr_ei_char (Eistring *eistr, Eistring *eistr2);
+Bytecount eistr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off,
+		   Charcount charoff);
+Charcount eistr_ei_off_char (Eistring *eistr, Eistring *eistr2,
+			Bytecount off, Charcount charoff);
+Bytecount eirstr_ei (Eistring *eistr, Eistring *eistr2);
+Charcount eirstr_ei_char (Eistring *eistr, Eistring *eistr2);
+Bytecount eirstr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off,
+		    Charcount charoff);
+Charcount eirstr_ei_off_char (Eistring *eistr, Eistring *eistr2,
+			 Bytecount off, Charcount charoff);
+Bytecount eistr_c (Eistring *eistr, Ascbyte *c_string);
+Charcount eistr_c_char (Eistring *eistr, Ascbyte *c_string);
+Bytecount eistr_c_off (Eistring *eistr, Ascbyte *c_string, Bytecount off,
+		   Charcount charoff);
+Charcount eistr_c_off_char (Eistring *eistr, Ascbyte *c_string,
+		       Bytecount off, Charcount charoff);
+Bytecount eirstr_c (Eistring *eistr, Ascbyte *c_string);
+Charcount eirstr_c_char (Eistring *eistr, Ascbyte *c_string);
+Bytecount eirstr_c_off (Eistring *eistr, Ascbyte *c_string,
+		   Bytecount off, Charcount charoff);
+Charcount eirstr_c_off_char (Eistring *eistr, Ascbyte *c_string,
+			Bytecount off, Charcount charoff);
+**********************************************
+*                 Comparison                 *
+**********************************************
+int eicmp_* (Eistring *eistr, ...);
+int eicmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen, ...);
+int eicasecmp_* (Eistring *eistr, ...);
+int eicasecmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen, ...);
+int eicasecmp_i18n_* (Eistring *eistr, ...);
+int eicasecmp_i18n_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen, ...);
+Compare the Eistring with the other data.  Return value same as
+from strcmp.  The @code{*} is either @code{ei} for another Eistring (in
+which case @code{...} is an Eistring), or @code{c} for a pure-ASCII string
+(in which case @code{...} is a pointer to that string).  For anything
+more complex, first create an Eistring out of the source.
+Comparison is either simple (@code{eicmp_...}), ASCII case-folding
+(@code{eicasecmp_...}), or multilingual case-folding
+(@code{eicasecmp_i18n_...}).
+More specifically, the prototypes are:
+int eicmp_ei (Eistring *eistr, Eistring *eistr2);
+int eicmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen, Eistring *eistr2);
+int eicasecmp_ei (Eistring *eistr, Eistring *eistr2);
+int eicasecmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen, Eistring *eistr2);
+int eicasecmp_i18n_ei (Eistring *eistr, Eistring *eistr2);
+int eicasecmp_i18n_off_ei (Eistring *eistr, Bytecount off,
+		      Charcount charoff, Bytecount len,
+		      Charcount charlen, Eistring *eistr2);
+int eicmp_c (Eistring *eistr, Ascbyte *c_string);
+int eicmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen, Ascbyte *c_string);
+int eicasecmp_c (Eistring *eistr, Ascbyte *c_string);
+int eicasecmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen,
+Ascbyte *c_string);
+int eicasecmp_i18n_c (Eistring *eistr, Ascbyte *c_string);
+int eicasecmp_i18n_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
+Bytecount len, Charcount charlen,
+Ascbyte *c_string);
+**********************************************
+*         Case-changing the Eistring         *
+**********************************************
+void eilwr (Eistring *eistr);
+Convert all characters in the Eistring to lowercase.
+void eiupr (Eistring *eistr);
+Convert all characters in the Eistring to uppercase.
+@end example
+@node Coding for Mule, CCL, Internal Text API's, Multilingual Support
+@section Coding for Mule
+@cindex coding for Mule
+@cindex Mule, coding for
+Although Mule support is not compiled by default in XEmacs, many people
+are using it, and we consider it crucial that new code works correctly
+with multibyte characters.  This is not hard; it is only a matter of
+following several simple user-interface guidelines.  Even if you never
+compile with Mule, with a little practice you will find it quite easy
+to code Mule-correctly.
+Note that these guidelines are not necessarily tied to the current Mule
+implementation; they are also a good idea to follow on the grounds of
+code generalization for future I18N work.
+@menu
+* Character-Related Data Types::
+* Working With Character and Byte Positions::
+* Conversion to and from External Data::
+* General Guidelines for Writing Mule-Aware Code::
+* An Example of Mule-Aware Code::
+* Mule-izing Code::
+@end menu
+@node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule
+@subsection Character-Related Data Types
+@cindex character-related data types
+@cindex data types, character-related
+First, let's review the basic character-related datatypes used by
+XEmacs.  Note that some of the separate @code{typedef}s are not
+mandatory, but they improve clarity of code a great deal, because one
+glance at the declaration can tell the intended use of the variable.
+@table @code
+@item Ichar
+@cindex Ichar
+An @code{Ichar} holds a single Emacs character.
+Obviously, the equality between characters and bytes is lost in the Mule
+world.  Characters can be represented by one or more bytes in the
+buffer, and @code{Ichar} is a C type large enough to hold any
+character.  (This currently isn't quite true for ISO 10646, which
+defines a character as a 31-bit non-negative quantity, while XEmacs
+characters are only 30 bits.  This is irrelevant, unless you are
+considering using the ISO 10646 private groups to support really large
+private character sets---in particular, the Mule character set!---in
+a version of XEmacs using Unicode internally.)
+Without Mule support, an @code{Ichar} is equivalent to an
+@code{unsigned char}.  [[This doesn't seem to be true; @file{lisp.h}
+unconditionally @samp{typedef}s @code{Ichar} to @code{int}.]]
+@item Ibyte
+@cindex Ibyte
+The data representing the text in a buffer or string is logically a set
+of @code{Ibyte}s.
+XEmacs does not work with the same character formats all the time; when
+reading characters from the outside, it decodes them to an internal
+format, and likewise encodes them when writing.  @code{Ibyte} (in fact
+@code{unsigned char}) is the basic unit of the internal format of XEmacs
+buffers and strings.  An @code{Ibyte *} is the type that points at text
+encoded in the variable-width internal encoding.
+One character can correspond to one or more @code{Ibyte}s.  In the
+current Mule implementation, an ASCII character is represented by a
+single @code{Ibyte} of the same value, and other characters are
+represented by a sequence
+of two or more @code{Ibyte}s.  (This will also be true of an
+implementation using UTF-8 as the internal encoding.  In fact, only code
+that implements character code conversions and a very few macros used to
+implement motion by whole characters will notice the difference between
+UTF-8 and the Mule encoding.)
+Without Mule support, there are exactly 256 characters, implicitly
+Latin-1, and each character is represented using one @code{Ibyte}, and
+there is a one-to-one correspondence between @code{Ibyte}s and
+@code{Ichar}s.
+@item Charxpos
+@itemx Charbpos
+@itemx Charcount
+@cindex Charxpos
+@cindex Charbpos
+@cindex Charcount
+A @code{Charbpos} represents a character position in a buffer.  A
+@code{Charcount} represents a number (count) of characters.  Logically,
+subtracting two @code{Charbpos} values yields a @code{Charcount} value.
+When representing a character position in a string, we just use
+@code{Charcount} directly.  The reason for having a separate typedef for
+buffer positions is that they are 1-based, whereas string positions are
+0-based and hence string counts and positions can be freely intermixed (a
+string position is equivalent to the count of characters from the
+beginning).  When representing a character position that could be either
+in a buffer or string (for example, in the extent code), @code{Charxpos}
+is used.  Although all of these are @code{typedef}ed to
+@code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
+it clear what sort of position is being used.
+@code{Charxpos}, @code{Charbpos} and @code{Charcount} values are the
+only ones that are ever visible to Lisp.
+@item Bytexpos
+@itemx Bytebpos
+@itemx Bytecount
+@cindex Bytexpos
+@cindex Bytebpos
+@cindex Bytecount
+A @code{Bytebpos} represents a byte position in a buffer.  A
+@code{Bytecount} represents the distance between two positions, in
+bytes.  Byte positions in strings use @code{Bytecount}, and for byte
+positions that can be either in a buffer or string, @code{Bytexpos} is
+used.  The relationship between @code{Bytexpos}, @code{Bytebpos} and
+@code{Bytecount} is the same as the relationship between
+@code{Charxpos}, @code{Charbpos} and @code{Charcount}.
+@item Extbyte
+@cindex Extbyte
+When dealing with the outside world, XEmacs works with @code{Extbyte}s,
+which are equivalent to @code{char}.  The distance between two
+@code{Extbyte}s is a @code{Bytecount}, since external text is a
+byte-by-byte encoding.  Extbytes occur mainly at the transition point
+between internal text and external functions.  XEmacs code should not,
+if it can possibly avoid it, do any actual manipulation using external
+text, since its format is completely unpredictable (it might not even be
+ASCII-compatible).
+@end table
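+The position and count types above can be illustrated with a minimal
+standalone sketch.  The typedefs below are stand-ins declared as
+@code{long} purely for illustration (the real ones are @code{EMACS_INT}),
+and the two helper functions are hypothetical, not part of XEmacs.  The
+sketch only shows that subtracting two buffer positions gives a count,
+and that a 1-based buffer position differs by one from a 0-based offset.

```c
/* Stand-in typedefs, assumed for illustration only. */
typedef long Charbpos;   /* 1-based character position in a buffer */
typedef long Charcount;  /* count of characters, or 0-based string position */

/* Subtracting two buffer positions yields a count of characters. */
Charcount
sketch_chars_between (Charbpos from, Charbpos to)
{
  return to - from;
}

/* A buffer position is 1-based, so the 0-based offset from the start
   of the buffer is one less.  (Hypothetical helper.) */
Charcount
sketch_bufpos_to_offset (Charbpos pos)
{
  return pos - 1;
}
```

+Keeping the two typedefs apart makes the off-by-one conversion an
+explicit step instead of something hidden in plain integer arithmetic.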
+@node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule
+@subsection Working With Character and Byte Positions
+@cindex character and byte positions, working with
+@cindex byte positions, working with character and
+@cindex positions, working with character and byte
+Now that we have defined the basic character-related types, we can look
+at the macros and functions designed for working with them and for
+conversion between them.  Most of these macros are defined in
+@file{buffer.h}, and we don't discuss all of them here, but only the
+most important ones.  Examining the existing code is the best way to
+learn about them.
+@table @code
+@item MAX_ICHAR_LEN
+@cindex MAX_ICHAR_LEN
+This preprocessor constant is the maximum number of buffer bytes needed
+to represent an Emacs character in the variable-width internal encoding.
+It is useful when allocating temporary strings to hold a known number of
+characters.  For instance:
+@example
+@group
+@{
+Charcount cclen;
+...
+@{
+/* Allocate place for @var{cclen} characters. */
+Ibyte *buf = (Ibyte *) alloca (cclen * MAX_ICHAR_LEN);
+...
+@end group
+@end example
+If you followed the previous section, you can guess that, logically,
+multiplying a @code{Charcount} value by @code{MAX_ICHAR_LEN} produces
+a @code{Bytecount} value.
+In the current Mule implementation, @code{MAX_ICHAR_LEN} equals 4.
+Without Mule, it is 1.  In a mature Unicode-based XEmacs, it will also
+be 4 (since all Unicode characters can be encoded in UTF-8 in 4 bytes or
+less), but some versions may use up to 6, in order to use the large
+private space provided by ISO 10646 to ``mirror'' the Mule code space.
+@item itext_ichar
+@itemx set_itext_ichar
+@cindex itext_ichar
+@cindex set_itext_ichar
+The @code{itext_ichar} macro takes an @code{Ibyte} pointer and
+returns the @code{Ichar} stored at that position.  If it were a
+function, its prototype would be:
+@example
+Ichar itext_ichar (Ibyte *p);
+@end example
+@code{set_itext_ichar} stores an @code{Ichar} to the specified byte
+position.  It returns the number of bytes stored:
+@example
+Bytecount set_itext_ichar (Ibyte *p, Ichar c);
+@end example
+It is important to note that @code{set_itext_ichar} is safe only for
+appending a character at the end of a buffer, not for overwriting a
+character in the middle.  This is because the width of characters
+varies, and @code{set_itext_ichar} cannot resize the string if it
+writes, say, a two-byte character where a single-byte character used to
+reside.
+A typical use of @code{set_itext_ichar} can be demonstrated by this
+example, which copies characters from buffer @var{buf} to a temporary
+string of Ibytes.
+@example
+@group
+@{
+Charbpos pos;
+for (pos = beg; pos < end; pos++)
+@{
+Ichar c = BUF_FETCH_CHAR (buf, pos);
+p += set_itext_ichar (p, c);
+@}
+@}
+@end group
+@end example
+Note how @code{set_itext_ichar} is used to store the @code{Ichar}
+and advance the pointer @code{p} at the same time.
+@item INC_IBYTEPTR
+@itemx DEC_IBYTEPTR
+@cindex INC_IBYTEPTR
+@cindex DEC_IBYTEPTR
+These two macros increment and decrement an @code{Ibyte} pointer,
+respectively.  They will adjust the pointer by the appropriate number of
+bytes according to the byte length of the character stored there.  Both
+macros assume that the memory address is located at the beginning of a
+valid character.
+Without Mule support, @code{INC_IBYTEPTR (p)} and @code{DEC_IBYTEPTR (p)}
+simply expand to @code{p++} and @code{p--}, respectively.
+@item bytecount_to_charcount
+@cindex bytecount_to_charcount
+Given a pointer to a text string and a length in bytes, return the
+equivalent length in characters.
+@example
+Charcount bytecount_to_charcount (Ibyte *p, Bytecount bc);
+@end example
+@item charcount_to_bytecount
+@cindex charcount_to_bytecount
+Given a pointer to a text string and a length in characters, return the
+equivalent length in bytes.
+@example
+Bytecount charcount_to_bytecount (Ibyte *p, Charcount cc);
+@end example
+@item itext_n_addr
+@cindex itext_n_addr
+Return a pointer to the beginning of the character offset @var{cc} (in
+characters) from @var{p}.
+@example
+Ibyte *itext_n_addr (Ibyte *p, Charcount cc);
+@end example
+@end table
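+The idea behind the pointer-motion and count-conversion macros in this
+table can be sketched in standalone C.  The sketch assumes UTF-8 as a
+stand-in for the internal encoding, and every @code{sketch_} name is
+hypothetical; the real macros live in the XEmacs headers and may be
+implemented quite differently.

```c
/* Stand-in typedefs, assumed for illustration only. */
typedef unsigned char Ibyte;
typedef long Bytecount;
typedef long Charcount;

/* Byte length of the UTF-8 character whose first byte is LEAD.
   Assumes LEAD really is the first byte of a valid character. */
int
sketch_char_len (Ibyte lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* In the spirit of bytecount_to_charcount: count the characters in
   the BC bytes starting at P. */
Charcount
sketch_bytecount_to_charcount (const Ibyte *p, Bytecount bc)
{
  const Ibyte *end = p + bc;
  Charcount cc = 0;

  while (p < end)
    {
      p += sketch_char_len (*p);  /* step over one whole character */
      cc++;
    }
  return cc;
}

/* In the spirit of charcount_to_bytecount: bytes occupied by the
   first CC characters starting at P. */
Bytecount
sketch_charcount_to_bytecount (const Ibyte *p, Charcount cc)
{
  const Ibyte *start = p;

  while (cc-- > 0)
    p += sketch_char_len (*p);
  return (Bytecount) (p - start);
}

/* In the spirit of itext_n_addr: address of the character that is CC
   characters past P. */
const Ibyte *
sketch_n_addr (const Ibyte *p, Charcount cc)
{
  return p + sketch_charcount_to_bytecount (p, cc);
}
```

+All three helpers walk the text one whole character at a time, which is
+why each of them requires that the starting pointer sit at the beginning
+of a character.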
+@node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule
+@subsection Conversion to and from External Data
+@cindex conversion to and from external data
+@cindex external data, conversion to and from
+When an external function, such as a C library function, returns a
+@code{char} pointer, you should almost never treat it as @code{Ibyte}.
+This is because these returned strings may contain 8-bit characters which
+can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
+exporting a piece of internal text to the outside world, you should
+always convert it to an appropriate external encoding, lest the internal
+stuff (such as the infamous \201 characters) leak out.
+The interface for conversion between the internal and external
+representations of text is the set of conversion macros defined in
+@file{buffer.h}.  There used to be a fixed set of external formats
+supported by these macros, but now any coding system can be used with
+them.  The coding system alias mechanism is used to create the
+following logical coding systems, which replace the fixed external
+formats.  The @code{dontusethis-set-symbol-value-handler} mechanism was
+enhanced to make this possible (more work on that is needed).
+Often useful coding systems:
+@table @code
+@item Qbinary
+This is the simplest format and is what we use in the absence of a more
+appropriate format.  This converts according to the @code{binary} coding
+system:
+@enumerate a
+@item
+On input, bytes 0--255 are converted into (implicitly Latin-1)
+characters 0--255.  A non-Mule XEmacs doesn't really know about
+different character sets and the fonts to display them, so the bytes can
+be treated as text in different 1-byte encodings by simply setting the
+appropriate fonts.  So in a sense, non-Mule XEmacs is a multi-lingual
+editor if, for example, different fonts are used to display text in
+different buffers, faces, or windows.  The specifier mechanism gives the
+user complete control over this kind of behavior.
+@item
+On output, characters 0--255 are converted into bytes 0--255 and other
+characters are converted into @samp{~}.
+@end enumerate
+@item Qnative
+Format used for the external Unix environment---@code{argv[]}, stuff
+from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
+This is encoded according to the encoding specified by the current locale.
+[[This is dangerous; current locale is user preference, and the system
+is probably going to be something else.  Is there anything we can do
+about it?]]
+@item Qfile_name
+Format used for filenames.  This is normally the same as @code{Qnative},
+but the two should be distinguished for clarity and possible future
+separation -- and also because @code{Qfile_name} can be changed using either
+the @code{file-name-coding-system} or @code{pathname-coding-system} (now
+obsolete) variables.
+@item Qctext
+Compound-text format.  This is the standard X11 format used for data
+stored in properties, selections, and the like.  This is an 8-bit
+no-lock-shift ISO2022 coding system.  This is a real coding system,
+unlike @code{Qfile_name}, which is user-definable.
+@item Qmswindows_tstr
+Used for external data in all MS Windows functions that are declared to
+accept data of type @code{LPTSTR} or @code{LPCTSTR}.  This maps to either
+@code{Qmswindows_multibyte} (a locale-specific encoding, same as
+@code{Qnative}) or @code{Qmswindows_unicode}, depending on whether
+XEmacs is being run under Windows 9X or Windows NT/2000/XP.
+@end table
+Many other coding systems are provided by default.
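+The behavior described above for @code{Qbinary} can be sketched in
+standalone C.  This is an illustration only: it treats a character as a
+plain integer, the @code{Ichar} typedef is a stand-in, and the two
+@code{sketch_} functions are hypothetical rather than the actual coding
+system implementation.

```c
#include <stddef.h>

typedef int Ichar;  /* stand-in typedef, assumed for illustration */

/* Input direction: each external byte 0..255 becomes character 0..255. */
void
sketch_binary_decode (const unsigned char *src, size_t len, Ichar *dst)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Output direction: characters 0..255 become the same byte; any other
   character is replaced by '~'. */
void
sketch_binary_encode (const Ichar *src, size_t len, unsigned char *dst)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = (src[i] >= 0 && src[i] <= 255) ? (unsigned char) src[i] : '~';
}
```

+The encode direction is lossy for any character outside 0--255, so a
+round trip is only guaranteed for text that started out as bytes.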
+There are two fundamental macros to convert between external and
+internal format, as well as various convenience macros to simplify the
+most common operations.
+@code{TO_INTERNAL_FORMAT} converts external data to internal format, and
+@code{TO_EXTERNAL_FORMAT} converts the other way around.  The arguments
+each of these receives are a source type, a source, a sink type, a sink,
+and a coding system (or a symbol naming a coding system).
+A typical call looks like
+@example
+TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
+@end example
+which means that the contents of the lisp string @code{str} are written
+to a malloc'ed memory area which will be pointed to by @code{ptr} once
+the macro has completed.  The conversion will be done using the
+@code{file-name} coding system, which will be controlled by the user
+indirectly by setting or binding the variable
+@code{file-name-coding-system}.
+Some sources and sinks require two C variables to specify.  We use some
+preprocessor magic to allow different source and sink types, and even
+different numbers of arguments to specify different types of sources and
+sinks.
+So we can have a call that looks like
+@example
+TO_INTERNAL_FORMAT (DATA, (ptr, len),
+MALLOC, (ptr, len),
+coding_system);
+@end example
+The parenthesized argument pairs are required to make the preprocessor
+magic work.
+Here are the different source and sink types:
+@table @code
+@item @code{DATA, (ptr, len),}
+input data is a fixed buffer of size @var{len} at address @var{ptr}
+@item @code{ALLOCA, (ptr, len),}
+output data is placed in an @code{alloca()}ed buffer of size @var{len} pointed to by @var{ptr}
+@item @code{MALLOC, (ptr, len),}
+output data is in a @code{malloc()}ed buffer of size @var{len} pointed to by @var{ptr}
+@item @code{C_STRING_ALLOCA, ptr,}
+equivalent to @code{ALLOCA (ptr, len_ignored)} on output.
+@item @code{C_STRING_MALLOC, ptr,}
+equivalent to @code{MALLOC (ptr, len_ignored)} on output
+@item @code{C_STRING, ptr,}
+equivalent to @code{DATA, (ptr, strlen/wcslen (ptr))} on input
+@item @code{LISP_STRING, string,}
+input or output is a Lisp_Object of type string
+@item @code{LISP_BUFFER, buffer,}
+output is written to @code{(point)} in lisp buffer @var{buffer}
+@item @code{LISP_LSTREAM, lstream,}
+input or output is a Lisp_Object of type lstream
+@item @code{LISP_OPAQUE, object,}
+input or output is a Lisp_Object of type opaque
+@end table
+A source type of @code{C_STRING} or a sink type of
+@code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate where
+the external API is not '\0'-byte-clean -- i.e. it expects strings to be
+terminated with a null byte.  For external API's that are in fact
+'\0'-byte-clean, we should of course not use these.
+The sinks to be specified must be lvalues, unless they are the lisp
+object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
+There is no problem using the same lvalue for source and sink.
+Garbage collection is inhibited during these conversion operations, so
+it is OK to pass in data from Lisp strings using @code{XSTRING_DATA}.
+For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
+resulting text is stored in a stack-allocated buffer, which is
+automatically freed on returning from the function.  However, the sink
+types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
+memory.  The caller is responsible for freeing this memory using
+@code{xfree()}.
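+The ownership rule for the @code{MALLOC}-style sinks can be sketched in
+standalone C.  The function below is hypothetical and not part of
+XEmacs; it uses a Latin-1 to UTF-8 conversion as a stand-in for a real
+coding system, and plain @code{malloc()} and @code{free()} in place of
+@code{xmalloc()} and @code{xfree()}.  What it shows is the
+pointer-and-length output pair and the caller's duty to free the result.

```c
#include <stdlib.h>
#include <stddef.h>

/* Convert LEN bytes of Latin-1 at SRC to UTF-8.  On success, store a
   malloc()ed, NUL-terminated buffer in *OUT and its length (not
   counting the NUL) in *OUTLEN, and return 0.  The caller owns *OUT
   and must free() it.  Return -1 if allocation fails. */
int
sketch_latin1_to_utf8_malloc (const unsigned char *src, size_t len,
                              unsigned char **out, size_t *outlen)
{
  /* Worst case: every input byte expands to two output bytes. */
  unsigned char *buf = (unsigned char *) malloc (2 * len + 1);
  unsigned char *p = buf;
  size_t i;

  if (!buf)
    return -1;
  for (i = 0; i < len; i++)
    {
      if (src[i] < 0x80)
        *p++ = src[i];
      else
        {
          *p++ = (unsigned char) (0xC0 | (src[i] >> 6));
          *p++ = (unsigned char) (0x80 | (src[i] & 0x3F));
        }
    }
  *p = '\0';
  *out = buf;
  *outlen = (size_t) (p - buf);
  return 0;
}
```

+A stack-allocated sink avoids the explicit free, but the result then
+cannot outlive the calling function, which is the trade-off between the
+two families of sink types.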
+Note that it doesn't make sense for @code{LISP_STRING} to be a source
+for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
+You'll get an assertion failure if you try.
+99% of conversions involve raw data or Lisp strings as both source and
+sink, and the output usually goes to @code{alloca()}ed memory, or sometimes
+to @code{xmalloc()}ed memory.  For this reason, convenience macros are defined for
+many types of conversions involving raw data and/or Lisp strings,
+especially when the output is an @code{alloca()}ed string. (When the
+destination is a Lisp string, there are other functions that should be
+used instead -- @code{build_ext_string()} and @code{make_ext_string()},
+for example.) The convenience macros are of two types -- the older kind
+that store the result into a specified variable, and the newer kind that
+return the result.  The newer kind of macros don't exist when the output
+is sized data, because that would have two return values.  NOTE: All
+convenience macros are ultimately defined in terms of
+@code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}.  Thus, any
+comments above about the workings of these macros also apply to all
+convenience macros.
+A typical old-style convenience macro is
+@example
+C_STRING_TO_EXTERNAL (in, out, codesys);
+@end example
+This is equivalent to
+@example
+TO_EXTERNAL_FORMAT (C_STRING, in, C_STRING_ALLOCA, out, codesys);
+@end example
+but is easier to write and somewhat clearer, since it clearly identifies
+the arguments without the clutter of having the preprocessor types mixed
+in.
+The new-style equivalent is @code{NEW_C_STRING_TO_EXTERNAL (src,
+codesys)}, which @emph{returns} the converted data (still in
+@code{alloca()} space).  This is far more convenient for most
+operations.
+@node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
+@subsection General Guidelines for Writing Mule-Aware Code
+@cindex writing Mule-aware code, general guidelines for
+@cindex Mule-aware code, general guidelines for writing
+@cindex code, general guidelines for writing Mule-aware
+This section contains some general guidance on how to write Mule-aware
+code, as well as some pitfalls you should avoid.
+@table @emph
+@item Never use @code{char} and @code{char *}.
+In XEmacs, the use of @code{char} and @code{char *} is almost always a
+mistake.  If you want to manipulate an Emacs character from ``C'', use
+@code{Ichar}.  If you want to examine a specific octet in the internal
+format, use @code{Ibyte}.  If you want a Lisp-visible character, use a
+@code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
+through the internal text, use @code{Ibyte *}.  Also note that you
+almost certainly do not need @code{Ichar *}.  Other typedefs to clarify
+the use of @code{char} are @code{Char_ASCII}, @code{Char_Binary},
+@code{UChar_Binary}, and @code{CIbyte}.
+@item Be careful not to confuse @code{Charcount}, @code{Bytecount}, @code{Charbpos} and @code{Bytebpos}.
+The whole point of using different types is to avoid confusion about the
+use of certain variables.  Lest this effect be nullified, you need to be
+careful about using the right types.
+@item Always convert external data
+It is extremely important to always convert external data, because
+XEmacs can crash if unexpected 8-bit sequences are copied to its internal
+buffers literally.
+This means that when a system function, such as @code{readdir}, returns
+a string, you normally need to convert it using one of the conversion macros
+described in the previous section, before passing it further to Lisp.
+Actually, most of the basic system functions that accept '\0'-terminated
+string arguments, like @code{stat()} and @code{open()}, have
+@strong{encapsulated} equivalents that do the internal to external
+conversion themselves.  The encapsulated equivalents have a @code{qxe_}
+prefix and have string arguments of type @code{Ibyte *}, and you can
+pass internally encoded data to them, often from a Lisp string using
+@code{XSTRING_DATA}. (A better design might be to provide versions that
+accept Lisp strings directly.)  [[Really?  Then they'd either take
+@code{Lisp_Object}s and need to check type, or they'd take
+@code{Lisp_String}s, and violate the rules about passing any of the
+specific Lisp types.]]
+Also note that many internal functions, such as @code{make_string},
+accept Ibytes, which removes the need for them to convert the data they
+receive.  This increases efficiency because that way external data needs
+to be decoded only once, when it is read.  After that, it is passed
+around in internal format.
+@item Do all work in internal format
+External-formatted data is completely unpredictable in its format.  It
+may be fixed-width Unicode (not even ASCII compatible); it may be a
+modal encoding, in
+which case some occurrences of (e.g.) the slash character may be part of
+two-byte Asian-language characters, and a naive attempt to split apart a
+pathname by slashes will fail; etc.  Internal-format text should be
+converted to external format only at the point where an external API is
+actually called, and the first thing done after receiving
+external-format text from an external API should be to convert it to
+internal text.
+@end table
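+The hazard of manipulating external-format text directly can be shown
+with a small standalone sketch.  It assumes an ISO-2022-JP-style modal
+encoding and recognizes only two escape sequences, @code{ESC $ B}
+(switch to two-byte characters) and @code{ESC ( B} (switch back to
+ASCII).  Both functions are hypothetical and exist only for this
+illustration.

```c
#include <stddef.h>

/* Count bytes equal to '/' with no regard for the encoding state. */
size_t
sketch_naive_slash_count (const unsigned char *s, size_t len)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    if (s[i] == '/')
      n++;
  return n;
}

/* Count real '/' characters, tracking whether the text is currently
   in ASCII mode or in two-byte mode. */
size_t
sketch_modal_slash_count (const unsigned char *s, size_t len)
{
  size_t i = 0, n = 0;
  int two_byte = 0;

  while (i < len)
    {
      if (s[i] == 0x1B && i + 2 < len && s[i + 1] == '$' && s[i + 2] == 'B')
        {
          two_byte = 1;           /* ESC $ B: two-byte characters follow */
          i += 3;
        }
      else if (s[i] == 0x1B && i + 2 < len
               && s[i + 1] == '(' && s[i + 2] == 'B')
        {
          two_byte = 0;           /* ESC ( B: back to ASCII */
          i += 3;
        }
      else if (two_byte)
        i += 2;                   /* skip one two-byte character whole */
      else
        {
          if (s[i] == '/')
            n++;
          i++;
        }
    }
  return n;
}
```

+For the byte sequence @code{a / ESC $ B 0x25 0x2F ESC ( B / b}, the
+naive count is 3 while the mode-aware count is 2, because the byte 0x2F
+inside the two-byte section is half of a character and not a separator.
+Converting to internal format first removes the need for this kind of
+per-encoding state tracking.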
+@node An Example of Mule-Aware Code, Mule-izing Code, General Guidelines for Writing Mule-Aware Code, Coding for Mule
+@subsection An Example of Mule-Aware Code
+@cindex code, an example of Mule-aware
+@cindex Mule-aware code, an example of
+As an example of Mule-aware code, we will analyze the @code{string}
+function, which conses up a Lisp string from the character arguments it
+receives.  Here is the definition, pasted from @file{alloc.c}:
+@example
+@group
+DEFUN ("string", Fstring, 0, MANY, 0, /*
+Concatenate all the argument characters and make the result a string.
+*/
+(int nargs, Lisp_Object *args))
+@{
+Ibyte *storage = alloca_array (Ibyte, nargs * MAX_ICHAR_LEN);
+Ibyte *p = storage;
+for (; nargs; nargs--, args++)
+@{
+Lisp_Object lisp_char = *args;
+CHECK_CHAR_COERCE_INT (lisp_char);
+p += set_itext_ichar (p, XCHAR (lisp_char));
+@}
+return make_string (storage, p - storage);
+@}
+@end group
+@end example
+Now we can analyze the source line by line.
+Obviously, the string will contain as many characters as there are
+arguments to the function.  This is why we allocate
+@code{MAX_ICHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
+worst-case number of bytes needed for @var{nargs} @code{Ichar}s to fit
+in the string.
+Then, the loop checks that each element is a character, converting
+integers in the process.  Like many other functions in XEmacs, this
+function silently accepts integers where characters are expected, for
+historical and compatibility reasons.  In new code, unless you have a
+specific reason to accept integers as well, @code{CHECK_CHAR} is
+sufficient.  @code{XCHAR (lisp_char)} extracts the @code{Ichar} from the
+@code{Lisp_Object}, and @code{set_itext_ichar} stores it at @code{p} and
+returns the number of bytes written, which is used to advance @code{p}.
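+The same allocate-for-the-worst-case pattern can be tried outside
+XEmacs.  The following is a minimal sketch, not the XEmacs
+implementation: it assumes UTF-8 as a stand-in for the internal text
+format (so the worst case is 4 bytes per character), uses
+@code{malloc} instead of @code{alloca_array}, and does not reject
+surrogate code points.  All names beginning with @code{demo_} or
+@code{DEMO_} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_CHAR_LEN 4     /* worst-case bytes per character in UTF-8 */

/* Store CH at P in UTF-8 and return the number of bytes written,
   in the same spirit as set_itext_ichar() in the text above. */
static size_t
demo_set_text_char (unsigned char *p, unsigned long ch)
{
  assert (ch <= 0x10FFFFUL);
  if (ch < 0x80)
    {
      p[0] = (unsigned char) ch;
      return 1;
    }
  if (ch < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (ch >> 6));
      p[1] = (unsigned char) (0x80 | (ch & 0x3F));
      return 2;
    }
  if (ch < 0x10000UL)
    {
      p[0] = (unsigned char) (0xE0 | (ch >> 12));
      p[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (ch & 0x3F));
      return 3;
    }
  p[0] = (unsigned char) (0xF0 | (ch >> 18));
  p[1] = (unsigned char) (0x80 | ((ch >> 12) & 0x3F));
  p[2] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
  p[3] = (unsigned char) (0x80 | (ch & 0x3F));
  return 4;
}

/* Concatenate N characters into a newly allocated buffer sized for the
   worst case.  Stores the number of bytes actually used in *LEN_OUT.
   Returns NULL on allocation failure; the caller frees the result. */
static unsigned char *
demo_string (const unsigned long *chars, size_t n, size_t *len_out)
{
  unsigned char *storage = malloc (n * DEMO_MAX_CHAR_LEN + 1);
  unsigned char *p = storage;
  size_t i;

  assert (len_out != NULL);
  if (storage == NULL)
    return NULL;
  for (i = 0; i < n; i++)
    p += demo_set_text_char (p, chars[i]);
  *len_out = (size_t) (p - storage);
  return storage;
}
```

+As in @code{Fstring}, the final length is simply the distance the write
+pointer has travelled, which is usually less than the worst-case
+allocation.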
+Other instructive examples of correct coding under Mule can be found all
+over the XEmacs code.  For starters, I recommend
+@code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
+understood this section of the manual and studied the examples, you can
+proceed to write new Mule-aware code.
+@node Mule-izing Code,  , An Example of Mule-Aware Code, Coding for Mule
+@subsection Mule-izing Code
+A lot of code is written without Mule in mind, and needs to be made
+Mule-correct or "Mule-ized".  There is really no substitute for
+line-by-line analysis when doing this, but the following checklist can
+help:
+@itemize @bullet
+@item
+Check all uses of @code{XSTRING_DATA}.
+@item
+Check all uses of @code{build_string} and @code{make_string}.
+@item
+Check all uses of @code{tolower} and @code{toupper}.
+@item
+Check object print methods.
+@item
+Check for use of functions such as @code{write_c_string},
+@code{write_fmt_string}, @code{stderr_out}, @code{stdout_out}.
+@item
+Check all occurrences of @code{char} and correct to one of the other
+typedefs described above.
+@item
+Check all existing uses of @code{TO_EXTERNAL_FORMAT},
+@code{TO_INTERNAL_FORMAT}, and any convenience macros (grep for
+@samp{EXTERNAL_TO}, @samp{TO_EXTERNAL}, and @samp{TO_SIZED_EXTERNAL}).
+@item
+In Windows code, string literals may need to be encapsulated with @code{XETEXT}.
+@end itemize
+@node CCL, Microsoft Windows-Related Multilingual Issues, Coding for Mule, Multilingual Support
+@section CCL
+@cindex CCL
+@example
+MACHINE CODE:
+The machine code consists of a vector of 32-bit words.
+The first such word specifies the start of the EOF section of the code;
+this is the code executed to handle any stuff that needs to be done
+(e.g. designating back to ASCII and left-to-right mode) after all
+other encoded/decoded data has been written out.  This is not used for
+charset CCL programs.
+REGISTER: 0..7  -- referred to by RRR or rrr
+OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
+        TTTTT (5-bit): operator type
+        RRR (3-bit): register number
+        XXXXXXXXXXXXXXX (15-bit):
+                CCCCCCCCCCCCCCC: constant or address
+                000000000000rrr: register number
+AAAAA:  00000 +
+        00001 -
+        00010 *
+        00011 /
+        00100 %
+        00101 &
+        00110 |
+        00111 ~
+        01000 <<
+        01001 >>
+        01010 <8
+        01011 >8
+        01100 //
+        01101 not used
+        01110 not used
+        01111 not used
+        10000 <
+        10001 >
+        10010 ==
+        10011 <=
+        10100 >=
+        10101 !=
+OPERATORS:      TTTTT RRR XX..
+SetCS:          00000 RRR C...C      RRR = C...C
+SetCL:          00001 RRR .....      RRR = c...c
+                c.............c
+SetR:           00010 RRR ..rrr      RRR = rrr
+SetA:           00011 RRR ..rrr      RRR = array[rrr]
+                C.............C      size of array = C...C
+                c.............c      contents = c...c
+Jump:           00100 000 c...c      jump to c...c
+JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
+WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
+WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
+WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
+                C...C
+WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
+                C.............C      and jump to c...c
+WriteSJump:     01010 000 c...c      WriteS, jump to c...c
+                C.............C
+                S.............S
+                ...
+WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
+                C.............C
+                S.............S
+                ...
+WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
+                C.............C      size of array = C...C
+                c.............c      contents = c...c
+                ...
+Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
+                c.............c      branch to (RRR+1)th address
+Read1:          01110 RRR ...        read 1-byte to RRR
+Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
+ReadBranch:     10000 RRR C...C      Read1 and Branch
+                c.............c
+                ...
+Write1:         10001 RRR .....      write 1-byte RRR
+Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
+WriteC:         10011 000 .....      write 1-char C...CC
+                C.............C
+WriteS:         10100 000 .....      write C..-byte of string
+                C.............C
+                S.............S
+                ...
+WriteA:         10101 RRR .....      write array[RRR]
+                C.............C      size of array = C...C
+                c.............c      contents = c...c
+                ...
+End:            10110 000 .....      terminate the execution
+SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
+                ..........AAAAA
+SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
+                c.............c
+                ..........AAAAA
+SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
+                ..........AAAAA
+SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
+                c.............c
+                ..........AAAAA
+SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
+                ............Rrr
+                ..........AAAAA
+JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
+                C.............C
+                ..........AAAAA
+JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
+                ............rrr
+                ..........AAAAA
+ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
+                C.............C
+                ..........AAAAA
+ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
+                ............rrr
+                ..........AAAAA
+@end example
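+To make the bit-field description above more tangible, here is a small
+sketch that packs and unpacks the first word of an instruction.  It
+assumes that the diagram is written most-significant field first, so
+that the 5-bit operator type occupies the lowest bits, the 3-bit
+register number the next three, and the 15-bit constant, address or
+register field the bits above that.  This layout is an assumption drawn
+from the diagram alone, and the @code{demo_} names are hypothetical;
+consult @file{mule-ccl.c} for the authoritative definitions.

```c
#include <assert.h>
#include <stdint.h>

struct demo_ccl_fields
{
  unsigned op;       /* TTTTT: operator type, 0..31 */
  unsigned reg;      /* RRR: register number, 0..7 */
  unsigned operand;  /* X...X: 15-bit constant, address, or register */
};

/* Split the first word of an instruction into its three fields
   (assumed layout: operand | reg | op, from high bits to low bits). */
static struct demo_ccl_fields
demo_ccl_decode_word (uint32_t word)
{
  struct demo_ccl_fields f;

  f.op = (unsigned) (word & 0x1F);
  f.reg = (unsigned) ((word >> 5) & 0x7);
  f.operand = (unsigned) ((word >> 8) & 0x7FFF);
  return f;
}

/* Inverse of demo_ccl_decode_word(). */
static uint32_t
demo_ccl_encode_word (unsigned op, unsigned reg, unsigned operand)
{
  assert (op < 32 && reg < 8 && operand < 0x8000);
  return (uint32_t) op | ((uint32_t) reg << 5) | ((uint32_t) operand << 8);
}
```

+Under these assumptions, a SetCS instruction (operator 00000) that
+loads the constant 42 into register 3 is encoded as
+@code{demo_ccl_encode_word (0, 3, 42)}, and decoding that word yields
+the same three fields again.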
+@node Microsoft Windows-Related Multilingual Issues, Modules for Internationalization, CCL, Multilingual Support
+@section Microsoft Windows-Related Multilingual Issues
+@cindex Microsoft Windows-related multilingual issues
+@cindex Windows-related multilingual issues
+@cindex multilingual issues, Windows-related
+@menu
+* Microsoft Documentation::
+* Locales::
+* More about code pages::
+* More about locales::
+* Unicode support under Windows::
+* The golden rules of writing Unicode-safe code::
+* The format of the locale in setlocale()::
+* Random other Windows I18N docs::
+@end menu
+@node Microsoft Documentation, Locales, Microsoft Windows-Related Multilingual Issues, Microsoft Windows-Related Multilingual Issues
+@subsection Microsoft Documentation
+@cindex Microsoft documentation
+Documentation on international support in Windows is scattered throughout MSDN.
+Here are some good places to look:
+@enumerate
+@item
+C Runtime (CRT) intl support
+@enumerate
+@item
+Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Internationalization
+@item
+Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Global Constants -> Locale Categories
+@item
+Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Appendixes -> Language and Country/Region Strings
+@item
+Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Appendixes -> Generic-Text Mappings
+@item
+Function documentation for various functions:
+Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Alphabetic Function Reference
+e.g. _setmbcp(), setlocale(), strcoll functions
+@end enumerate
+@item
+Win32 API intl support
+@enumerate
+@item
+Platform SDK Documentation -> Base Services -> International Features
+@item
+Platform SDK Documentation -> User Interface Services -> Windows User Interface -> User Input -> Keyboard Input -> Character Messages -> International Features
+@item
+Backgrounders -> Windows Platform -> Windows 2000 -> International Support in Microsoft Windows 2000
+@end enumerate
+@item
+Microsoft Layer for Unicode
+Platform SDK Documentation -> Windows API -> Windows 95/98/Me Programming -> Windows 95/98/Me Overviews -> Microsoft Layer for Unicode on Windows 95/98/Me Systems
+@item
+Look in the CRT sources!  They come with VC++.  See win32.c.
+@end enumerate
+@node Locales, More about code pages, Microsoft Documentation, Microsoft Windows-Related Multilingual Issues
+@subsection Locales, code pages, and other concepts of "language"
+@cindex locales, code pages, and other concepts of "language"
+First, make sure you clearly understand the difference between the C
+runtime library (CRT) and the Win32 API!  See win32.c.
+There are various different ways of representing the vague concept
+of "language", and it can be very confusing.  So:
+@itemize @bullet
+@item
+The CRT library has the concept of "locale", which is a
+combination of language and country, and which controls the way
+currency and dates are displayed, the encoding of data, etc.
+@item
+XEmacs has the concept of "language environment", more or less
+like a locale; although currently in most cases it just refers to
+the language, and no sub-language distinctions are
+made. (Exceptions are with Chinese, which has different language
+environments for Taiwan and mainland China, due to the different
+encodings and writing systems.)
+@item
+Windows has a number of different language concepts:
+@enumerate
+@item
+There are "languages" and "sublanguages", which correspond to
+the languages and countries of the C library -- e.g. LANG_ENGLISH
+and SUBLANG_ENGLISH_US.  These are identified by small integers
+(10 bits and 6 bits wide, respectively), called the "primary
+language identifier" and "sublanguage identifier".  These are
+combined into a 16-bit integer or "language identifier" by MAKELANGID().
+@item
+The language identifier in turn is combined with a "sort
+identifier" (and optionally a "sort version") to yield a 32-bit
+integer called a "locale identifier" (type LCID), which identifies
+locales -- the primary means of distinguishing language/regional
+settings and similar to C library locales.
+@item
+A "code page" combines the XEmacs concepts of "charset" and "coding
+system".  It logically encompasses
+@itemize @minus
+@item
+a set of supported characters
+@item
+an enumeration associating each character with a code point, which
+is a number or number pair; there may be disjoint ranges of numbers
+supported
+@item
+a way of encoding a series of characters into a string of bytes
+@end itemize
+Note that the first two properties correspond to an XEmacs "charset"
+and the last to an XEmacs "coding system".
+Traditional encodings are either simple one-byte encodings, or
+combination one-byte/two-byte encodings (aka MBCS encodings, where MBCS
+stands for "Multibyte Character Set") with the following properties:
+@itemize @minus
+@item
+all characters are encoded as a one-byte or two-byte sequence
+@item
+the encoding is stateless (non-modal)
+@item
+the lower 128 bytes are compatible with ASCII
+@item
+in the higher bytes, the value of the first byte ("lead byte")
+determines whether a second byte follows
+@item
+the values used for second bytes may overlap those used for first
+bytes, and (in some encodings) include values in the low half; thus,
+moving backwards is hard, and pure-ASCII algorithms (e.g. finding the
+next slash) will fail unless rewritten to be MBCS-aware (neither of
+these problems exists in UTF-8 or in the XEmacs internal string
+encoding)
+@end itemize
+Recent code pages, however, do not necessarily follow these properties --
+code pages have been expanded to include arbitrary encodings, such as
+UTF-8 (may have more than two bytes per character) and ISO-2022-JP
+(complex modal encoding).
+@item
+Every Windows locale has four associated code pages: ANSI (an
+international standard or some Microsoft-created approximation; the
+native code page under Windows), OEM (a DOS encoding, still used in the
+FAT file system), Mac (an encoding used on the Macintosh) and EBCDIC (a
+non-ASCII-compatible encoding used on IBM mainframes, originally based
+on the BCD or "binary-coded decimal" encoding of numbers).  All code
+pages associated with a locale follow (as far as I know) the properties
+listed above for traditional code pages.  More than one locale can share
+a code page -- e.g. all the Western European languages, including
+English, do.
+@item
+Windows also has an "input locale identifier" (aka "keyboard
+layout id") or HKL, which is a 32-bit integer composed of the
+16-bit language identifier and a 16-bit "device identifier", which
+originally specified a particular keyboard layout (e.g. the locale
+"US English" can have the QWERTY layout, the Dvorak layout, etc.),
+but has been expanded to include speech-to-text converters and
+other non-keyboard ways of inputting text.  Note that both the HKL
+and LCID share the language identifier in the lower 16 bits, and in
+both cases a 0 in the upper 16 bits means "default" (sort order or
+device), providing a way to convert between HKL's, LCID's, and
+language identifiers (i.e. language/sublanguage pairs).  The
+default keyboard layout for a language is (as far as I can
+determine) established using the Regional Settings control panel
+applet, where you can add input locales as combinations of language
+(actually language/sublanguage) and layout; presumably if you list
+only one input locale with a particular language, the corresponding
+layout is the default for that language.  But what if you list more
+than one?  You can specify a single default input locale, but there
+appears to be no way to do so on a per-language basis.
+@end enumerate
+@end itemize
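+The way these identifiers nest can be illustrated without a Windows
+compiler.  The sketch below mirrors the arithmetic of the SDK macros
+MAKELANGID(), PRIMARYLANGID(), SUBLANGID() and MAKELCID() as defined in
+@file{winnt.h}: the sublanguage occupies the upper 6 bits and the
+primary language the lower 10 bits of the language identifier, and the
+sort identifier sits directly above the language identifier in an LCID.
+The @code{demo_} functions are hypothetical stand-ins, and the numeric
+values used in the usage note (0x09 for LANG_ENGLISH, 0x01 for
+SUBLANG_ENGLISH_US) are the SDK's.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors MAKELANGID(): sublanguage in the upper 6 bits, primary
   language in the lower 10 bits. */
static uint16_t
demo_make_langid (unsigned primary, unsigned sub)
{
  assert (primary < 0x400 && sub < 0x40);
  return (uint16_t) ((sub << 10) | primary);
}

/* Mirrors PRIMARYLANGID(). */
static unsigned
demo_primary_langid (uint16_t langid)
{
  return langid & 0x3FFu;
}

/* Mirrors SUBLANGID(). */
static unsigned
demo_sub_langid (uint16_t langid)
{
  return (unsigned) langid >> 10;
}

/* Mirrors MAKELCID(): sort identifier above the 16-bit language
   identifier. */
static uint32_t
demo_make_lcid (uint16_t langid, unsigned sort_id)
{
  assert (sort_id < 0x10);
  return ((uint32_t) sort_id << 16) | langid;
}

/* Both an LCID and (per the text above) an HKL carry the language
   identifier in their lower 16 bits. */
static uint16_t
demo_langid_from_low_word (uint32_t lcid_or_hkl_bits)
{
  return (uint16_t) (lcid_or_hkl_bits & 0xFFFFu);
}
```

+For example, @code{demo_make_langid (0x09, 0x01)} yields 0x0409, the
+familiar identifier for US English, and combining it with sort
+identifier 0 (the default sort order) gives the LCID 0x00000409.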
+@node More about code pages, More about locales, Locales, Microsoft Windows-Related Multilingual Issues
+@subsection More about code pages
+@cindex more about code pages
+Here is what MSDN says about code pages (article "Code Pages"):
+@quotation
+A code page is a character set, which can include numbers,
+punctuation marks, and other glyphs. Different languages and locales
+may use different code pages. For example, ANSI code page 1252 is
+used for American English and most European languages; OEM code page
+932 is used for Japanese Kanji.
+A code page can be represented in a table as a mapping of characters
+to single-byte values or multibyte values. Many code pages share the
+ASCII character set for characters in the range 0x00 - 0x7F.
+The Microsoft run-time library uses the following types of code pages:
+-- System-default ANSI code page. By default, at startup the run-time
+system automatically sets the multibyte code page to the
+system-default ANSI code page, which is obtained from the operating
+system. The call
+setlocale ( LC_ALL, "" );
+also sets the locale to the system-default ANSI code page.
+-- Locale code page. The behavior of a number of run-time routines is
+dependent on the current locale setting, which includes the locale
+code page. (For more information, see Locale-Dependent Routines.) By
+default, all locale-dependent routines in the Microsoft run-time
+library use the code page that corresponds to the "C" locale. At
+run-time you can change or query the locale code page in use with a
+call to setlocale.
+-- Multibyte code page. The behavior of most of the multibyte-character
+routines in the run-time library depends on the current multibyte
+code page setting. By default, these routines use the system-default
+ANSI code page. At run-time you can query and change the multibyte
+code page with _getmbcp and _setmbcp, respectively.
+-- The "C" locale is defined by ANSI to correspond to the locale in
+which C programs have traditionally executed. The code page for the
+"C" locale ("C" code page) corresponds to the ASCII character
+set. For example, in the "C" locale, islower returns true for the
+values 0x61 - 0x7A only. In another locale, islower may return true
+for these as well as other values, as defined by that locale.
+Under "Locale-Dependent Routines" we notice the following setlocale
+dependencies:
+atof, atoi, atol (LC_NUMERIC)
+is Routines (LC_CTYPE)
+isleadbyte (LC_CTYPE)
+localeconv (LC_MONETARY, LC_NUMERIC)
+MB_CUR_MAX (LC_CTYPE)
+_mbccpy (LC_CTYPE)
+_mbclen (LC_CTYPE)
+mblen (LC_CTYPE)
+_mbstrlen (LC_CTYPE)
+mbstowcs (LC_CTYPE)
+mbtowc (LC_CTYPE)
+printf (LC_NUMERIC, for radix character output)
+scanf (LC_NUMERIC, for radix character recognition)
+setlocale/_wsetlocale (Not applicable)
+strcoll (LC_COLLATE)
+_stricoll/_wcsicoll (LC_COLLATE)
+_strncoll/_wcsncoll (LC_COLLATE)
+_strnicoll/_wcsnicoll (LC_COLLATE)
+strftime, wcsftime (LC_TIME)
+_strlwr (LC_CTYPE)
+strtod/wcstod/strtol/wcstol/strtoul/wcstoul (LC_NUMERIC, for radix character recognition)
+_strupr (LC_CTYPE)
+strxfrm/wcsxfrm (LC_COLLATE)
+tolower/towlower (LC_CTYPE)
+toupper/towupper (LC_CTYPE)
+wcstombs (LC_CTYPE)
+wctomb (LC_CTYPE)
+_wtoi/_wtol (LC_NUMERIC)
+@end quotation
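+The statement about @code{islower} in the "C" locale can be checked on
+any hosted C implementation, not just with the Microsoft CRT.  The
+following sketch uses only standard library calls; the function name is
+hypothetical.  It switches LC_CTYPE to the named locale and counts the
+byte values that @code{islower} accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Set LC_CTYPE to LOCALE_NAME and return how many byte values in
   0..255 islower() classifies as lowercase there.  Returns -1 if the
   locale cannot be set. */
static int
demo_count_lowercase_bytes (const char *locale_name)
{
  int c, count = 0;

  assert (locale_name != NULL);
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;
  for (c = 0; c < 256; c++)
    if (islower (c))
      count++;
  return count;
}
```

+In the "C" locale the count is exactly the 26 ASCII lowercase letters;
+in a locale with a Western European code page it would typically be
+larger, which is the locale dependency the quoted list attributes to
+LC_CTYPE.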
+NOTE: The above documentation doesn't clearly explain the "locale code
+page" and "multibyte code page".  These are two different values,
+maintained respectively in the CRT global variables __lc_codepage and
+__mbcodepage.  Calling e.g. setlocale (LC_ALL, "JAPANESE") sets @strong{ONLY}
+__lc_codepage to 932 (the code page for Japanese), and leaves
+__mbcodepage unchanged (usually 1252, i.e. Windows-ANSI).  You'd have to
+call _setmbcp() to change __mbcodepage.  Figuring out from the
+documentation which routines use which code page is not so obvious.  But:
+@itemize @bullet
+@item
+from "Interpretation of Multibyte-Character Sequences" it appears that
+all "multibyte-character routines" use the multibyte code page except for
+mblen(), _mbstrlen(), mbstowcs(), mbtowc(), wcstombs(), and wctomb().
+@item
+from "_setmbcp": "The multibyte code page also affects
+multibyte-character processing by the following run-time library
+routines: _exec functions _mktemp _stat _fullpath _spawn functions
+_tempnam _makepath _splitpath tmpnam.  In addition, all run-time library
+routines that receive multibyte-character argv or envp program arguments
+as parameters (such as the _exec and _spawn families) process these
+strings according to the multibyte code page. Hence these routines are
+also affected by a call to _setmbcp that changes the multibyte code
+page."
+@end itemize
+Summary: from looking at the CRT source (which comes with VC++) and
+carefully looking through the docs, it appears that:
+@itemize @bullet
+@item
+the "locale code page" is used by all of the routines listed above
+under "Locale-Dependent Routines" (EXCEPT _mbccpy() and _mbclen()),
+as well as any other place that converts between multibyte and Unicode
+strings, e.g. the startup code.
+@item
+the "multibyte code page" is used in all of the *mb*() routines
+except mblen(), _mbstrlen(), mbstowcs(), mbtowc(), wcstombs(),
+and wctomb(); also _exec*(), _spawn*(), _mktemp(), _stat(), _fullpath(),
+_tempnam(), _makepath(), _splitpath(), tmpnam(), and similar functions
+without the leading underscore.
+@end itemize
+@node More about locales, Unicode support under Windows, More about code pages, Microsoft Windows-Related Multilingual Issues
+@subsection More about locales
+@cindex more about locales
+In addition to the locale defined by the CRT, Windows (i.e. the Win32 API)
+defines various locales:
+@itemize @bullet
+@item
+The system-default locale is the locale defined under "Language
+settings for the system" in the "Regional Options" control panel.  This
+is NOT user-specific, and changing it requires a reboot (at least under
+Windows 2000).  The ANSI code page of the system-default locale is
+returned by GetACP(), and you can specify this code page in calls
+e.g. to MultiByteToWideChar with the constant CP_ACP.
+@item
+The user-default locale is the locale defined under "Settings for the
+current user" in the "Regional Options" control panel.
+@item
+There is a thread-local locale set by SetThreadLocale. #### What is this
+used for?
+@end itemize
+The Win32 API has a bunch of multibyte functions -- all of those that
+end with ...A(), and on which we spend so much effort in
+intl-encap-win32.c.  These appear to ALWAYS use the ANSI code page of
+the system-default locale (GetACP(), CP_ACP).  Note that this applies
+also, for example, to the encoding of filenames in all file-handling
+routines, including the CRT ones such as open(), because they pass their
+args unchanged to the Win32 API.
+@node Unicode support under Windows, The golden rules of writing Unicode-safe code, More about locales, Microsoft Windows-Related Multilingual Issues
+@subsection Unicode support under Windows
+@cindex unicode support under windows
+Basically, the whole concept of locales and code pages is broken, because
+it is extremely messy to support and does not allow for documents that use
+multiple languages simultaneously.  Unicode was designed in response to
+this, the idea being to create a single character set that could be used to
+encode all the world's languages.  Windows has supported Unicode since the
+beginning of the Win32 API.  Internally, every code page has an associated
+table to convert the characters of that code page to and from Unicode, and
+the Win32 API itself probably (perhaps always) uses Unicode internally.
+Under Windows there are two different versions of all library routines that
+accept or return text, those that handle Unicode text and those handling
+"multibyte" text, i.e. variable-width ASCII-compatible text in some
+national format such as EUC or Shift-JIS.  Because Windows 95 basically
+doesn't support Unicode but Windows NT does, and Microsoft doesn't provide
+any way of writing a single binary that will work on both systems and still
+use Unicode when it's available (although see below, Microsoft Layer for
+Unicode), we need to provide a way of run-time conditionalizing so you
+could have one binary for both systems.  "Unicode-splitting" refers to
+writing code that will handle this properly.  This means using
+Qmswindows_tstr as the external conversion format, calling the appropriate
+qxe...() Unicode-split version of library functions, and doing other things
+in certain cases, e.g. when a qxe() function is not present.
+Unicode support also requires that the various Windows API's be
+"Unicode-encapsulated", so that they automatically call the ANSI or
+Unicode version of the API call appropriately and handle the size
+differences in structures.  What this means is:
+@itemize @bullet
+@item
+First, note that Windows already provides a sort of encapsulation
+of all API's that deal with text.  All such API's are underlyingly
+provided in two versions, with an A or W suffix (ANSI or "wide"
+i.e. Unicode), and the compile-time constant UNICODE controls which is
+selected by the unsuffixed API.  Same thing happens with structures, and
+also with types, where the generic types have names beginning with T --
+TCHAR, LPTSTR, etc.  Unfortunately, this is compile-time only, not
+run-time, so not sufficient. (Creating the necessary run-time encoding
+is not conceptually difficult, but very time-consuming to write.  It
+adds no significant overhead, and the only reason it's not standard in
+Windows is conscious marketing attempts by Microsoft to cripple Windows
+95.  FUCK MICROSOFT!  They even describe in a KnowledgeBase article
+exactly how to create such an API [although we don't exactly follow
+their procedure], and point out its usefulness; the procedure is also
+described more generally in Nadine Kano's book on Win32
+internationalization -- written SIX YEARS AGO!  Obviously Microsoft has
+such an API available internally.)
+@item
+What we do is provide an encapsulation of each standard Windows API call
+that is split into A and W versions.  The current theory is to avoid all
+preprocessor games; so we name the function with a prefix -- "qxe"
+currently -- and require callers to use the prefixed name.  Callers need
+to explicitly use the W version of all structures, and convert text
+themselves using Qmswindows_tstr.  The qxe encapsulated version will
+automatically call the appropriate A or W version depending on whether
+we're running on 9x or NT (you can force use of the A calls on NT,
+e.g. for testing purposes, using the command-line switch -nuni aka
+-no-unicode-lib-calls), and copy data between W and A versions of the
+structures as necessary.
+@item
+We require the caller to handle the actual translation of text to
+avoid possible overflow when dealing with fixed-size Windows
+structures.  There are no such problems when copying data between
+the A and W versions because ANSI text is never larger than its
+equivalent Unicode representation.
+@end itemize
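+The run-time dispatch described above can be sketched in portable C.
+This is only an illustration of the idea, not the code generated in
+@file{intl-encap-win32.c}: the two @code{demo_TitleLength} functions
+stand in for the A and W entry points of some Unicode-split API, the
+flag stands in for the 9x-versus-NT test, and every @code{demo_} name is
+hypothetical.  As in the real scheme, the wrapper does no text
+conversion; the caller passes text that is already in the external
+format matching the current mode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Stand-ins for the ...A() and ...W() versions of one API call. */
static size_t
demo_TitleLengthA (const char *title)
{
  return strlen (title);
}

static size_t
demo_TitleLengthW (const wchar_t *title)
{
  return wcslen (title);
}

/* Decided once at startup in a real program (NT: nonzero, 9x: zero). */
static int demo_use_unicode_calls = 1;

/* "qxe"-style wrapper: one name for callers, with the A or W version
   chosen at run time.  TITLE must already be wide-character text if
   demo_use_unicode_calls is nonzero, and multibyte text otherwise. */
static size_t
demo_qxeTitleLength (const void *title)
{
  assert (title != NULL);
  if (demo_use_unicode_calls)
    return demo_TitleLengthW ((const wchar_t *) title);
  return demo_TitleLengthA ((const char *) title);
}
```

+Because the choice is made by an ordinary run-time test rather than by
+the UNICODE preprocessor constant, a single binary can take either
+path, which is the property the compile-time T-types cannot provide.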
+NOTE NOTE NOTE: As of August 2001, Microsoft (finally!  See my nasty
+comment above) released their own Unicode-encapsulation library, called
+Microsoft Layer for Unicode on Windows 95/98/Me Systems.  It tries to be
+more transparent than we are, in that
+@itemize @bullet
+@item
+its routines do ANSI/Unicode string translation, while we don't, for
+efficiency (we already have to do internal/external conversion so it's
+no extra burden to do the proper conversion directly rather than always
+converting to Unicode and then doing a second conversion to ANSI as
+necessary)
+@item
+rather than requiring separately-named routines (qxeFooBar), they
+physically override the existing routines at the link level.  It also
+appears that they do this BADLY, in that if you link with the MLU, you
+get an application that runs ONLY on Win9x!!! (hint -- use
+GetProcAddress()).  There's still no way to create a single binary!
+fucking losers.
+@item
+they assume you compile with UNICODE defined, so there's no need for the
+application to explicitly use ...W structures, as we require.
+@item
+they also intercept windows procedures to deal with notify messages as
+necessary, which we don't do yet.
+@item
+they (of course) don't use Extbyte.
+@end itemize
+At some point (especially when they fix the single-binary problem!), we
+should consider switching.  For the meantime, we'll stick with what I've
+already written.  Perhaps we should think about adopting some of the
+greater transparency they have; but I opted against transparency on
+purpose, to make the code easier to follow for someone who's not familiar
+with it.  Until our library is really complete and bug-free, we should
+think twice before doing this.
+According to Microsoft documentation, only the following functions are
+provided under Windows 9x to support Unicode (see MSDN page "Windows
+95/98/Me General Limitations"):
+EnumResourceLanguages
+EnumResourceNames
+EnumResourceTypes
+ExtTextOut
+FindResource
+FindResourceEx
+GetCharWidth
+GetCommandLine
+GetTextExtentPoint
+GetTextExtentPoint32
+lstrcat
+lstrcpy
+lstrlen
+MessageBox
+MessageBoxEx
+MultiByteToWideChar
+TextOut
+WideCharToMultiByte
+also maybe GetTextExtentExPoint? (KB Q125671 "Unicode Functions Supported
+by Windows 95")
+However, the C runtime library provides some additional support (according
+to the CRT sources, as the docs are not very clear on this):
+@itemize @bullet
+@item
+wmain() is completely supported, and appropriate Unicode-formatted argv
+and envp will always be passed.
+@item
+Likewise, wWinMain() is completely supported. (NOTE: The docs are not at
+all clear on how these various entry points interact, and imply that
+a windows-subsystem program "must" use WinMain(), while a
+console-subsystem program "must" use main(), and a program compiled with
+UNICODE (which we don't do, see above) "must" use the w*() versions, while
+a program not compiled this way "must" use the plain versions.  In fact it
+appears that the CRT provides four different compiler entry points, namely
+w?(main|WinMain)CRTStartup, and we simply choose the one we like using
+the appropriate link flag.)
+@item
+_wenviron, _wputenv
+@end itemize
+NOTE:
+@itemize @bullet
+@item
+wsetargv.obj uses routines that were buggily left out of MSVCRT; anyway,
+from looking at the source, it does NOT correctly work under Win 9x as
+it blindly calls the Unicode version of Unicode-split API's such as
+FindFirstFile.
+@item
+the w*() file routines are @strong{NOT} supported -- or at least, they blindly
+call the ...W() versions of the Win32 API calls.
+@end itemize
+@node The golden rules of writing Unicode-safe code, The format of the locale in setlocale(), Unicode support under Windows, Microsoft Windows-Related Multilingual Issues
+@subsection The golden rules of writing Unicode-safe code
+@cindex the golden rules of writing unicode-safe code
+@itemize @bullet
+@item
+There are no preprocessor games going on.
+@item
+Do not set the UNICODE constant.
+@item
+You need to change your code to call the "qxe"-prefixed versions of the
+Windows API functions (when they exist) and use the ...W structs instead
+of the generic ones.  String arguments in the qxe functions are of type
+Extbyte *.
+@item
+Your code is responsible for conversion of text arguments.  We try to
+handle everything else -- the argument differences, the copying back and
+forth of structures, etc.  Use Qmswindows_tstr and macros such as
+C_STRING_TO_TSTR.  You are also responsible for interpreting and
+specifying string sizes, which have not been changed.  Usually these are
+in characters, meaning you need to divide by XETCHAR_SIZE. (But, some
+functions want sizes in bytes, even with Unicode strings.  Look in the
+documentation.) Use XETEXT when specifying string constants, so that
+they show up in Unicode as necessary.
+@item
+If you need to process external strings (in general you should not do
+this; do all your manipulations in internal format and convert at the
+point of entry into or exit from the function), use the xet...()
+functions.
+@item
+If you have to declare a fixed array to hold a string coming from
+Windows (and hence either multibyte or Unicode), declare it of type
+Extbyte[] and multiply the size by MAX_XETCHAR_SIZE.
+@end itemize
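+The two sizing rules above (divide byte counts by XETCHAR_SIZE, and
+multiply fixed array sizes by MAX_XETCHAR_SIZE) are easy to get wrong,
+so here is a small portable sketch of the arithmetic.  It assumes that
+an external character is 1 byte wide in ANSI mode and 2 bytes wide in
+Unicode mode, and that the mode is only known at run time.  The
+@code{demo_} and @code{DEMO_} names are hypothetical stand-ins for
+Extbyte, XETCHAR_SIZE and MAX_XETCHAR_SIZE, not the XEmacs definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef char Demo_Extbyte;          /* stand-in for Extbyte */

#define DEMO_MAX_XETCHAR_SIZE 2     /* assumed worst case: Unicode mode */
#define DEMO_NAME_CHARS 260         /* capacity wanted, in characters */

/* Nonzero when the W versions of the API are in use. */
static int demo_unicode_mode = 1;

/* Fixed buffer that is large enough in either mode. */
static Demo_Extbyte demo_name_buf[DEMO_NAME_CHARS * DEMO_MAX_XETCHAR_SIZE];

/* Bytes per external character in the current mode. */
static size_t
demo_xetchar_size (void)
{
  return demo_unicode_mode ? 2 : 1;
}

/* Convert a size in bytes into the character count that most API
   calls expect. */
static size_t
demo_bytes_to_tchars (size_t nbytes)
{
  assert (nbytes % demo_xetchar_size () == 0);
  return nbytes / demo_xetchar_size ();
}
```

+In Unicode mode, @code{demo_bytes_to_tchars (sizeof (demo_name_buf))}
+is 260; in ANSI mode the same buffer holds 520 one-byte characters, so
+sizing for the worst case never under-allocates.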
+@node The format of the locale in setlocale(), Random other Windows I18N docs, The golden rules of writing Unicode-safe code, Microsoft Windows-Related Multilingual Issues
+@subsection The format of the locale in setlocale()
+@cindex the format of the locale in setlocale()
+It appears that under Unix the standard format for the string in
+setlocale() involves two-letter language and country abbreviations, e.g.
+ja or ja_jp or ja_jp.euc for Japanese.  Windows (MSDN article "Language
+Strings" in the run-time reference appendix, see doc list above) speaks
+of "(primary) language" and "sublanguage" (usually a country, but in the
+case of Chinese the sublanguage is "simplified" or "traditional").  It
+is highly flexible in what it takes, and thankfully it canonicalizes the
+result to a unique form "Language_Country.Encoding".  It allows (note
+that all specifications can be in any case):
+@itemize @bullet
+@item
+the full "language_country.encoding" specification or just
+"language_country", in which case the default encoding will be chosen.
+@item
+a three-letter acronym, consisting of the ISO-standard two-letter
+language abbreviation followed by a third letter indicating the
+sublanguage.
+@item
+just a language name, e.g. "dutch", standing for the combination of
+the language with "default" as sublanguage, referring to the default
+(often "prototypical") country for that language (in this case the
+Netherlands).  You can abbreviate the name by removing any number of
+letters from the end.  Ambiguity is not a problem: Even specifying
+just a single letter is valid providing any language starting with
+that letter exists, but the result may not be what you want (e.g. "c"
+maps to "catalan", not "chinese", "czech", etc.).  The way of
+resolving ambiguity appears fairly random -- it's not alphabetical
+("a" maps to "arabic" not "albanian").
+@item
+a combination of language and sublanguage separated by a hyphen,
+e.g. "dutch-belgian"; note that the sublanguage designator in this
+case is NOT necessarily the same as the country, e.g. "belgian" vs.
+"belgium".  "dutch-belgium" (or even "dutch-belg") does @strong{NOT} get you
+the right result, but returns "Dutch_Netherlands.1252" instead!  This
+is because, although you may not abbreviate the result, Windows
+accepts any unknown value in the sublanguage field and treats it as
+equivalent to "default".  Note also that if the sublanguage name
+has underscores in it, you need to change them to spaces, e.g.
+"spanish-dominican republic".
+@item
+sometimes, just a sublanguage name, e.g. "belgian", standing for
+the combination of one of the languages spoken in that region and
+the sublanguage of the region -- in this case Dutch.  Note that
+there is no guarantee of "prototypicality" in this case in choice of
+language!  You could hardly say that Dutch (aka Flemish) is more
+prototypical of Belgium than French.  You cannot abbreviate this
+form, if it's allowed at all.
+@end itemize
+In addition:
+@itemize @bullet
+@item
+note further that you are not limited to the language/sublanguage
+combinations predefined by Windows.  You can set weird combinations
+like "Chinese_Kenya.1255" (Chinese spoken in Kenya, represented by
+Windows-1255, i.e. Hebrew!) and Windows doesn't complain, despite the
+language-encoding inconsistency.  You can also make up a weird
+combination and leave out the encoding, e.g. "Chinese_Qatar", which
+maps to "Chinese_Qatar.1256", where Windows-1256 is Arabic -- i.e. it
+appears to be choosing the encoding based on a default for the
+country.
+@item
+note also that the names for countries are often not what you expect.
+"urdu_pakistan" fails, and just "urdu" shows why, as it maps to
+"Urdu_Islamic Republic of Pakistan.1256".  That is, some countries
+exist in their full name, and the canonicalized form with underscore
+is not very forgiving in its handling of country specifications.
+Similarly, Uzbekistan is "Republic of Uzbekistan", and "China" is
+"People's Republic of China" -- but in this latter case, unlike the
+other two, just "China" works as an alias, e.g. "uzbek_china" maps
+to "Uzbek_People's Republic of China.936".
+@item
+note that just the two-letter ISO language code is NOT allowed.
+Sometimes you'll get lucky (e.g. "fr" does map to "france"), but
+sometimes you'll get no match (e.g. "pl"), and sometimes you'll get
+really unlucky in that the call will succeed but with the wrong
+language (e.g. "es" maps to "estonian", not "spanish").
+@end itemize
+As an example, MSDN article "Language Strings" indicates that German
+(default) can be specified using "deu" or "german"; German (Austrian)
+with "dea" or "german-austrian"; German (Swiss) with "des",
+"german-swiss", or "swiss"; French (Swiss) with "french-swiss" or "frs";
+and English (USA) with "american", "american english",
+"american-english", "english-american", "english-us", "english-usa",
+"enu", "us", or "usa".  This is not, of course, an exhaustive list even
+for just the given locales -- just "english" works in practice because
+English (Default) maps to English (USA). (#### Is this always the case?)
+Given the canonicalization, we don't have to worry too much about the
+different kinds of inputs to setlocale() -- unlike for Unix, where no
+canonicalization is usually performed, the particular locales that
+exist vary tremendously from OS to OS, and we need to parse the
+uncanonicalized locale spec, directly from the user, to figure out the
+encoding to use, making various guesses if not enough information is
+present.  Yuck!  The tricky thing under Windows is figuring out how to
+deal with the sublang.  It appears that the trick of simply passing the
+text of the manifest constant itself of the sublang, with appropriate
+hacking (e.g. of underscore to space), works most of the time.
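+The underscore-to-space hack mentioned above can be sketched as follows.
+This is only an illustration: @code{demo_locale_name} is a hypothetical
+helper, not an XEmacs function, and it assumes the caller already has
+the language and sublanguage names (for instance, taken from the text
+of the manifest constants) as plain C strings.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a "language-sublanguage" string of the
   kind described above.  Underscores are turned into spaces; case is
   left alone, since specifications can be in any case.
   Returns 0 on success, -1 if OUT is too small. */
static int
demo_locale_name (const char *lang, const char *sublang,
                  char *out, size_t size)
{
  int n = snprintf (out, size, "%s-%s", lang, sublang);
  char *p;

  if (n < 0 || (size_t) n >= size)
    return -1;
  for (p = out; *p != '\0'; p++)
    if (*p == '_')
      *p = ' ';
  return 0;
}
```

+Given @code{"SPANISH"} and @code{"DOMINICAN_REPUBLIC"}, the helper
+produces @code{"SPANISH-DOMINICAN REPUBLIC"}.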
+@node Random other Windows I18N docs,  , The format of the locale in setlocale(), Microsoft Windows-Related Multilingual Issues
+@subsection Random other Windows I18N docs
+@cindex random other windows i18n docs
+Introduction to Internationalization Issues in the Win32 API
+Abstract: This page provides an overview of the aspects of the Win32
+internationalization API that are relevant to XEmacs, including the
+basic distinction between multibyte and Unicode encodings. Also
+included are pointers to how XEmacs should make use of this API.
+The Win32 API is quite well-designed in its handling of strings
+encoded for various character sets. The API is geared around the idea
+that two different methods of encoding strings should be
+supported. These methods are called multibyte and Unicode,
+respectively. The multibyte encoding is compatible with ASCII strings
+and is a more efficient representation when dealing with strings
+containing primarily ASCII characters, but it has a great number of
+serious deficiencies and limitations, including that it is very
+difficult and error-prone to work with strings in this encoding, and
+any particular string in a multibyte encoding can only contain
+characters from a very limited number of character sets. The Unicode
+encoding rectifies all of these deficiencies, but it is not compatible
+with ASCII strings (in other words, an existing program will not be
+able to handle the encoded strings unless it is explicitly modified to
+do so), and it takes up twice as much memory space as multibyte
+encodings when encoding a purely ASCII string.
+Multibyte encodings use a variable number of bytes (either one or two)
+to represent characters. ASCII characters are always represented by a
+single byte with its high bit not set, and non-ASCII characters are
+represented by one or two bytes, the first of which always has its
+high bit set. (The second byte, when it exists, may or may not have
+its high bit set.) There is no single multibyte encoding. Instead,
+there is generally one encoding per non-ASCII character set. Such an
+encoding is capable of representing (besides ASCII characters, of
+course) only characters from one (or possibly two) particular
+character sets.
+Multibyte encoding makes processing of strings very difficult. For
+example, given a pointer to the beginning of a character within a
+string, finding the pointer to the beginning of the previous character
+may require backing up all the way to the beginning of the string, and
+then moving forward. Also, an operation such as separating out the
+components of a path by searching for backslashes will fail if it's
+implemented in the simplest (but not multibyte-aware) fashion, because
+it may find what appears to be a backslash, but which is actually the
+second byte of a two-byte character. Also, the limited number of
+character sets that any particular multibyte encoding can represent
+means that loss of data is likely if a string is converted from the
+XEmacs internal format into a multibyte format.
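+The backslash problem can be illustrated with a small sketch.  It
+assumes a Shift-JIS-like encoding whose lead bytes fall in the ranges
+0x81-0x9F and 0xE0-0xFC; @code{demo_is_lead_byte} and both search
+functions are hypothetical names used only for illustration.  (Real
+Win32 code would ask the system, e.g. with @code{IsDBCSLeadByte()},
+rather than hard-code the ranges.)

```c
#include <stddef.h>
#include <string.h>

/* Assumed lead-byte ranges for a Shift-JIS-like code page. */
static int
demo_is_lead_byte (unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Simplest approach: not multibyte-aware, so it can be fooled by a
   trail byte that happens to equal '\\'. */
static const char *
demo_naive_last_backslash (const char *s)
{
  return strrchr (s, '\\');
}

/* Multibyte-aware approach: walk forward from the start, skipping the
   trail byte of every two-byte character. */
static const char *
demo_mb_last_backslash (const char *s)
{
  const char *last = NULL;

  while (*s != '\0')
    {
      if (demo_is_lead_byte ((unsigned char) *s) && s[1] != '\0')
        s += 2;                 /* skip lead byte and trail byte */
      else
        {
          if (*s == '\\')
            last = s;
          s++;
        }
    }
  return last;
}
```

+For a path whose last component begins with the two-byte character
+0x95 0x5C, the naive search stops inside that character, while the
+multibyte-aware search returns the real separator before it.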
+For these reasons, the C code in XEmacs should never do any sort of
+work with multibyte encoded strings (or with strings in any external
+encoding for that matter). Strings should always be maintained in the
+internal encoding, which is predictable, and converted to an external
+encoding only at the point where the string moves from the XEmacs C
+code and enters a system library function. Similarly, when a string is
+returned from a system library function, it should be immediately
+converted into the internal coding before any operations are done on
+it.
+Unicode, unlike multibyte encodings, is a fixed-width encoding where
+every character is represented using 16 bits. It is also capable of
+encoding all the characters from all the character sets in common use
+in the world. The predictability and completeness of the Unicode
+encoding makes it a very good encoding for strings that may contain
+characters from many character sets mixed up with each other. At the
+same time, of course, it is incompatible with routines that expect
+ASCII characters and also incompatible with general string
+manipulation routines, which will encounter a great number of what
+would appear to be embedded nulls in the string. It also takes twice
+as much room to encode strings containing primarily ASCII
+characters. This is why XEmacs does not use Unicode or a similar
+encoding internally for buffers.
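+The embedded-null problem is easy to demonstrate.  The sketch below
+assumes a little-endian 16-bit encoding of the ASCII text "Hi", written
+out byte by byte; @code{demo_byte_view_length} is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* "Hi" as 16-bit little-endian code units, followed by a 16-bit
   terminator.  Every ASCII character carries a zero high byte. */
static const unsigned char demo_hi_16bit[] = { 'H', 0, 'i', 0, 0, 0 };

/* A byte-oriented routine such as strlen() stops at the first zero
   byte, i.e. in the middle of the first 16-bit character. */
static size_t
demo_byte_view_length (void)
{
  return strlen ((const char *) demo_hi_16bit);
}
```

+Here the byte-oriented view sees a string of length 1, although the
+text has two characters and occupies four bytes before its terminator.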
+The Win32 API cleverly deals with the issue of 8 bit vs. 16 bit
+characters by declaring a type called TCHAR which specifies a generic
+character, either 8 bits or 16 bits. Generally TCHAR is defined to be
+the same as the simple C type char, unless the preprocessor constant
+UNICODE is defined, in which case TCHAR is defined to be WCHAR, which
+is a 16 bit type. Nearly all functions in the Win32 API that take
+strings are defined to take strings that are actually arrays of
+TCHARs. There is a type LPTSTR which is defined to be a string of
+TCHARs and another type LPCTSTR which is a const string of TCHARs. The
+theory is that any program that uses TCHARs exclusively to represent
+characters and does not make assumptions about the size of a TCHAR or
+the way that the characters are encoded should work transparently
+regardless of whether the UNICODE preprocessor constant is defined,
+which is to say, regardless of whether 8 bit multibyte or 16 bit
+Unicode characters are being used. The way that this is actually
+implemented is that every Win32 API function that takes a string as an
+argument actually maps to one of two functions which are suffixed with
+an A (which stands for ANSI, and means multibyte strings) or W (which
+stands for wide, and means Unicode strings). The mapping is, of
+course, controlled by the same UNICODE preprocessor
+constant. Generally all structures containing strings in them actually
+map to one of two different kinds of structures, with either an A or a
+W suffix after the structure name.
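+The mapping scheme can be imitated outside of Windows to show how it
+works.  In the following sketch every name beginning with @code{DEMO_}
+or @code{demo_} is hypothetical and merely stands in for the real
+@code{UNICODE}, @code{TCHAR} and A/W function pairs; note also that
+@code{wchar_t} is 16 bits under Windows but is often 32 bits elsewhere.

```c
#include <stddef.h>
#include <wchar.h>

/* Uncomment to select the wide variants, as UNICODE would. */
/* #define DEMO_UNICODE */

/* The "A" variant: takes multibyte (char) strings. */
static size_t
demo_strlen_a (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

/* The "W" variant: takes wide strings. */
static size_t
demo_strlen_w (const wchar_t *s)
{
  size_t n = 0;
  while (s[n] != L'\0')
    n++;
  return n;
}

#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_TEXT(s) L##s        /* prepend L: wide string literal */
#define demo_strlen demo_strlen_w
#else
typedef char DEMO_TCHAR;
#define DEMO_TEXT(s) s           /* leave the literal alone */
#define demo_strlen demo_strlen_a
#endif

/* Code written only in terms of DEMO_TCHAR, DEMO_TEXT and demo_strlen
   compiles unchanged in either configuration. */
static size_t
demo_class_name_length (void)
{
  const DEMO_TCHAR *name = DEMO_TEXT ("XEmacs");
  return demo_strlen (name);
}
```

+The caller never names the A or W variant directly; the preprocessor
+constant alone decides which one is used.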
+Unfortunately, not all of the implementations of the Win32 API
+implement all of the functionality described above. In particular,
+Windows 95 does not implement very much Unicode functionality. It does
+implement functions to convert multibyte-encoded strings to and from
+Unicode strings, and provides Unicode versions of certain low-level
+functions like ExtTextOut(). In fact, all of the rest of the Unicode
+versions of API functions are just stubs that return an
+error. Conversely, all versions of Windows NT completely implement all
+the Unicode functionality, but some versions (especially versions
+before Windows NT 4.0) don't implement much of the multibyte
+functionality. For this reason, as well as for general code
+cleanliness, XEmacs needs to be written in such a way that it works
+with or without the UNICODE preprocessor constant being defined.
+Getting XEmacs to run when all strings are Unicode primarily involves
+removing any assumptions made about the size of characters. Remember
+what I said earlier about how the point of conversion between
+internally and externally encoded strings should occur at the point of
+entry or exit into or out of a library function. With this in mind, an
+externally encoded string in XEmacs can be treated simply as an
+arbitrary sequence of bytes of some length which has no particular
+relationship to the length of the string in the internal encoding.
+Use Qnative for Unix conversion, Qmswindows_tstr for Windows ...
+String constants that are to be passed directly to Win32 API functions,
+such as the names of window classes, need to be bracketed in their
+definition with a call to the macro XETEXT. This appropriately makes a
+string of either regular or wide chars, which is to say this string may be
+prepended with an L (causing it to be a wide string) depending on
+XEUNICODE_P.
+@node Modules for Internationalization,  , Microsoft Windows-Related Multilingual Issues, Multilingual Support
+@section Modules for Internationalization
+@cindex modules for internationalization
+@cindex internationalization, modules for
+@example
+@file{mule-canna.c}
+@file{mule-ccl.c}
+@file{mule-charset.c}
+@file{mule-charset.h}
+@file{file-coding.c}
+@file{file-coding.h}
+@file{mule-coding.c}
+@file{mule-mcpath.c}
+@file{mule-mcpath.h}
+@file{mule-wnnfns.c}
+@file{mule.c}
+@end example
+These files implement the MULE (Asian-language) support.  Note that MULE
+actually provides a general interface for all sorts of languages, not
+just Asian languages (although they are generally the most complicated
+to support).  This code is still in beta.
+@file{mule-charset.*} and @file{file-coding.*} provide the heart of the
+XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
+Lisp object type, which encapsulates a character set (an ordered one- or
+two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
+Kanji).
+@file{file-coding.*} implements the @dfn{coding-system} Lisp object
+type, which encapsulates a method of converting between different
+encodings.  An encoding is a representation of a stream of characters,
+possibly from multiple character sets, using a stream of bytes or words,
+and defines (e.g.) which escape sequences are used to specify particular
+character sets, how the indices for a character are converted into bytes
+(sometimes this involves setting the high bit; sometimes complicated
+rearranging of the values takes place, as in the Shift-JIS encoding),
+etc.  It also contains some generic coding system implementations, such
+as the binary (no-conversion) coding system and a sample gzip coding system.
+@file{mule-coding.c} contains the implementations of text coding systems.
+@file{mule-ccl.c} provides the CCL (Code Conversion Language)
+interpreter.  CCL is similar in spirit to Lisp byte code and is used to
+implement converters for custom encodings.
+@file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
+external programs used to implement the Canna and WNN input methods,
+respectively.  This is currently in beta.
+@file{mule-mcpath.c} provides some functions to allow for pathnames
+containing extended characters.  This code is fragmentary, obsolete, and
+completely non-working.  Instead, @code{pathname-coding-system} is used
+to specify conversions of names of files and directories.  The standard
+C I/O functions like @samp{open()} are wrapped so that conversion occurs
+automatically.
+@file{mule.c} contains a few miscellaneous things.  It currently seems
+to be unused and probably should be removed.
+@example
+@file{intl.c}
+@end example
+This provides some miscellaneous internationalization code for
+implementing message translation and interfacing to the Ximp input
+method.  None of this code is currently working.
+@example
+@file{iso-wide.h}
+@end example
+This contains leftover code from an earlier implementation of
+Asian-language support, and is not currently used.
+@node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Multilingual Support, Top
+@chapter Consoles; Devices; Frames; Windows
+@cindex consoles; devices; frames; windows
+@cindex devices; frames; windows, consoles;
+@cindex frames; windows, consoles; devices;
+@cindex windows, consoles; devices; frames;
+@menu
+* Introduction to Consoles; Devices; Frames; Windows::
+* Point::
+* Window Hierarchy::
+* The Window Object::
+* Modules for the Basic Displayable Lisp Objects::
+@end menu
+@node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
+@section Introduction to Consoles; Devices; Frames; Windows
+@cindex consoles; devices; frames; windows, introduction to
+@cindex devices; frames; windows, introduction to consoles;
+@cindex frames; windows, introduction to consoles; devices;
+@cindex windows, introduction to consoles; devices; frames;
+A window-system window that you see on the screen is called a
+@dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
+more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
+window displays the text of a buffer in it. (See above on Buffers.) Note
+that buffers and windows are independent entities: Two or more windows
+can be displaying the same buffer (potentially in different locations),
+and a buffer can be displayed in no windows.
+A single display screen that contains one or more frames is called
+a @dfn{display}.  Under most circumstances, there is only one display.
+However, more than one display can exist, for example if you have
+a @dfn{multi-headed} console, i.e. one with a single keyboard but
+multiple displays. (Typically in such a situation, the various
+displays act like one large display, in that the mouse is only
+in one of them at a time, and moving the mouse off of one moves
+it into another.) In some cases, the different displays will
+have different characteristics, e.g. one color and one mono.
+XEmacs can display frames on multiple displays.  It can even deal
+simultaneously with frames on multiple keyboards (called @dfn{consoles} in
+XEmacs terminology).  Here is one case where this might be useful: You
+are using XEmacs on your workstation at work, and leave it running.
+Then you go home and dial in on a TTY line, and you can use the
+already-running XEmacs process to display another frame on your local
+TTY.
+Thus, there is a hierarchy console -> display -> frame -> window.
+There is a separate Lisp object type for each of these four concepts.
+Furthermore, there is logically a @dfn{selected console},
+@dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
+Each of these objects is distinguished in various ways, such as being the
+default object for various functions that act on objects of that type.
+Note that every containing object remembers the ``selected'' object
+among the objects that it contains: e.g. not only is there a selected
+window, but every frame remembers the last window in it that was
+selected, and changing the selected frame causes the remembered window
+within it to become the selected window.  Similar relationships apply
+for consoles to devices and devices to frames.
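+The way each container remembers its own selected object can be
+sketched as a chain of pointers.  The structures and functions below
+are hypothetical simplifications for illustration, not the actual
+XEmacs definitions.

```c
#include <stddef.h>

struct demo_window  { int id; };
struct demo_frame   { struct demo_window *selected_window; };
struct demo_device  { struct demo_frame *selected_frame; };
struct demo_console { struct demo_device *selected_device; };

static struct demo_console *demo_selected_console;

/* The selected window is found by following the remembered selection
   at each level, starting from the selected console. */
static struct demo_window *
demo_selected_window (void)
{
  struct demo_console *con = demo_selected_console;
  struct demo_device *dev = con ? con->selected_device : NULL;
  struct demo_frame *frm = dev ? dev->selected_frame : NULL;

  return frm ? frm->selected_window : NULL;
}

/* Selecting a frame implicitly selects the window it remembers. */
static void
demo_select_frame (struct demo_device *dev, struct demo_frame *frm)
{
  dev->selected_frame = frm;
}
```

+Because only one pointer changes when a frame is selected, the window
+that was last selected in that frame becomes the selected window
+without any further bookkeeping.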
+@node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
+@section Point
+@cindex point
+Recall that every buffer has a current insertion position, called
+@dfn{point}.  Now, two or more windows may be displaying the same buffer,
+and the text cursor in the two windows (i.e. @code{point}) can be in
+two different places.  You may ask, how can that be, since each
+buffer has only one value of @code{point}?  The answer is that each window
+also has a value of @code{point} that is squirreled away in it.  There
+is only one selected window, and the value of ``point'' in that buffer
+corresponds to that window.  When the selected window is changed
+from one window to another displaying the same buffer, the old
+value of @code{point} is stored into the old window's ``point'' and the
+value of @code{point} from the new window is retrieved and made the
+value of @code{point} in the buffer.  This means that @code{window-point}
+for the selected window is potentially inaccurate, and if you
+want to retrieve the correct value of @code{point} for a window,
+you must special-case on the selected window and retrieve the
+buffer's point instead.  This is related to why @code{save-window-excursion}
+does not save the selected window's value of @code{point}.
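+The exchange of @code{point} between a buffer and its windows can be
+sketched as follows.  This is a simplified model under stated
+assumptions: plain integers stand in for the markers that are actually
+stored, and all @code{demo_} names are hypothetical.

```c
#include <stddef.h>

struct demo_buffer { long point; };

struct demo_window
{
  struct demo_buffer *buffer;
  long pointm;                  /* the window's stashed value of point */
};

static struct demo_window *demo_selected_window;

/* Switch the selected window: stash the buffer's point in the old
   window, then load the new window's stashed point into its buffer. */
static void
demo_select_window (struct demo_window *w)
{
  struct demo_window *old = demo_selected_window;

  if (old == w)
    return;
  if (old != NULL)
    old->pointm = old->buffer->point;
  w->buffer->point = w->pointm;
  demo_selected_window = w;
}

/* Accurate point for any window: special-case the selected window,
   whose stashed value may be stale. */
static long
demo_window_point (const struct demo_window *w)
{
  return w == demo_selected_window ? w->buffer->point : w->pointm;
}
```

+While a window is selected, movement of point updates only the buffer,
+which is why the stashed value cannot be trusted for that window.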
+@node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows
+@section Window Hierarchy
+@cindex window hierarchy
+@cindex hierarchy of windows
+If a frame contains multiple windows (panes), they are always created
+by splitting an existing window along the horizontal or vertical axis.
+Terminology is a bit confusing here: to @dfn{split a window
+horizontally} means to create two side-by-side windows, i.e. to make a
+@emph{vertical} cut in a window.  Likewise, to @dfn{split a window
+vertically} means to create two windows, one above the other, by making
+a @emph{horizontal} cut.
+If you split a window and then split again along the same axis, you
+will end up with a number of panes all arranged along the same axis.
+The precise way in which the splits were made should not be important,
+and this is reflected internally.  Internally, all windows are arranged
+in a tree, consisting of two types of windows, @dfn{combination} windows
+(which have children, and are covered completely by those children) and
+@dfn{leaf} windows, which have no children and are visible.  Every
+combination window has two or more children, all arranged along the same
+axis.  There are (logically) two subtypes of combination windows, depending on
+whether their children are horizontally or vertically arrayed.  There is
+always one root window, which is either a leaf window (if the frame
+contains only one window) or a combination window (if the frame contains
+more than one window).  In the latter case, the root window will have
+two or more children, either horizontally or vertically arrayed, and
+each of those children will be either a leaf window or another
+combination window.
+Here are some rules:
+@enumerate
+@item
+Horizontal combination windows can never have children that are
+horizontal combination windows; same for vertical.
+@item
+Only leaf windows can be split (obviously) and this splitting does one
+of two things: (a) turns the leaf window into a combination window and
+creates two new leaf children, or (b) turns the leaf window into one of
+the two new leaves and creates the other leaf.  Rule (1) dictates which
+of these two outcomes happens.
+@item
+Every combination window must have at least two children.
+@item
+Leaf windows can never become combination windows.  They can be deleted,
+however.  If this results in a violation of (3), the parent combination
+window also gets deleted.
+@item
+All functions that accept windows must be prepared to accept combination
+windows, and do something sane (e.g. signal an error if so).
+Combination windows @emph{do} escape to the Lisp level.
+@item
+All windows have three fields governing their contents:
+these are @dfn{hchild} (a list of horizontally-arrayed children),
+@dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
+(the buffer contained in a leaf window).  Exactly one of
+these will be non-@code{nil}.  Remember that @dfn{horizontally-arrayed}
+means ``side-by-side'' and @dfn{vertically-arrayed} means
+``one above the other''.
+@item
+Leaf windows also have markers in their @code{start} (the
+first buffer position displayed in the window) and @code{pointm}
+(the window's stashed value of @code{point}---see above) fields,
+while combination windows have @code{nil} in these fields.
+@item
+The list of children for a window is threaded through the
+@code{next} and @code{prev} fields of each child window.
+@item
+@strong{Deleted windows can be undeleted}.  This happens as a result of
+restoring a window configuration, and is unlike frames, displays, and
+consoles, which, once deleted, can never be restored.  Deleting a window
+does nothing except set a special @code{dead} bit to 1 and clear out the
+@code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
+GC purposes.
+@item
+Most frames actually have two top-level windows---one for the
+minibuffer and one (the @dfn{root}) for everything else.  The modeline
+(if present) separates these two.  The @code{next} field of the root
+points to the minibuffer, and the @code{prev} field of the minibuffer
+points to the root.  The other @code{next} and @code{prev} fields are
+@code{nil}, and the frame points to both of these windows.
+Minibuffer-less frames have no minibuffer window, and the @code{next}
+and @code{prev} of the root window are @code{nil}.  Minibuffer-only
+frames have no root window, and the @code{next} of the minibuffer window
+is @code{nil} but the @code{prev} points to itself. (#### This is an
+artifact that should be fixed.)
+@end enumerate
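+Several of the rules above are structural invariants and can be
+expressed as a checker.  The sketch below tests rules (1), (3), (6)
+and (8) on a simplified, hypothetical @code{struct demo_window}; the
+field names follow the text, but the types are assumptions (a string
+stands in for the buffer object, and @code{NULL} for @code{nil}).

```c
#include <stddef.h>

struct demo_window
{
  struct demo_window *parent, *next, *prev;
  struct demo_window *hchild;   /* first side-by-side child, or NULL */
  struct demo_window *vchild;   /* first stacked child, or NULL */
  const char *buffer;           /* set only in a leaf window */
};

/* Return 1 if the subtree rooted at W obeys rules (1), (3), (6)
   and (8), else 0. */
static int
demo_window_tree_ok (const struct demo_window *w)
{
  int set = (w->hchild != NULL) + (w->vchild != NULL)
            + (w->buffer != NULL);
  const struct demo_window *c, *prev = NULL;
  int count = 0;

  if (set != 1)
    return 0;                   /* rule (6): exactly one is non-nil */
  if (w->buffer != NULL)
    return 1;                   /* a leaf has nothing below it */

  for (c = w->hchild ? w->hchild : w->vchild; c != NULL;
       prev = c, c = c->next)
    {
      if (c->parent != w || c->prev != prev)
        return 0;               /* rule (8): next/prev threading */
      if (w->hchild != NULL && c->hchild != NULL)
        return 0;               /* rule (1): no horizontal in horizontal */
      if (w->vchild != NULL && c->vchild != NULL)
        return 0;               /* rule (1): no vertical in vertical */
      if (!demo_window_tree_ok (c))
        return 0;
      count++;
    }
  return count >= 2;            /* rule (3): at least two children */
}
```

+A checker of this kind is mainly useful as a debugging aid after
+operations that restructure the tree, such as splitting or deleting
+windows.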
+@node The Window Object, Modules for the Basic Displayable Lisp Objects, Window Hierarchy, Consoles; Devices; Frames; Windows
+@section The Window Object
+@cindex window object, the
+@cindex object, the window
+Windows have the following accessible fields:
+@table @code
+@item frame
+The frame that this window is on.
+@item mini_p
+Non-@code{nil} if this window is a minibuffer window.
+@item buffer
+The buffer that the window is displaying.  This may change often during
+the life of the window.
+@item dedicated
+Non-@code{nil} if this window is dedicated to its buffer.
+@item pointm
+@cindex window point internals
+This is the value of point in the current buffer when this window is
+selected; when it is not selected, it retains its previous value.
+@item start
+The position in the buffer that is the first character to be displayed
+in the window.
+@item force_start
+If this flag is non-@code{nil}, it says that the window has been
+scrolled explicitly by the Lisp program.  This affects what the next
+redisplay does if point is off the screen: instead of scrolling the
+window to show the text around point, it moves point to a location that
+is on the screen.
+@item last_modified
+The @code{modified} field of the window's buffer, as of the last time
+a redisplay completed in this window.
+@item last_point
+The buffer's value of point, as of the last time
+a redisplay completed in this window.
+@item left
+This is the left-hand edge of the window, measured in columns.  (The
+leftmost column on the screen is @w{column 0}.)
+@item top
+This is the top edge of the window, measured in lines.  (The top line on
+the screen is @w{line 0}.)
+@item height
+The height of the window, measured in lines.
+@item width
+The width of the window, measured in columns.
+@item next
+This is the window that is the next in the chain of siblings.  It is
+@code{nil} in a window that is the rightmost or bottommost of a group of
+siblings.
+@item prev
+This is the window that is the previous in the chain of siblings.  It is
+@code{nil} in a window that is the leftmost or topmost of a group of
+siblings.
+@item parent
+Internally, XEmacs arranges windows in a tree; each group of siblings has
+a parent window whose area includes all the siblings.  This field points
+to a window's parent.
+Parent windows do not display buffers, and play little role in display
+except to shape their child windows.  Emacs Lisp programs usually have
+no access to the parent windows; they operate on the windows at the
+leaves of the tree, which actually display buffers.
+@item hscroll
+This is the number of columns that the display in the window is scrolled
+horizontally to the left.  Normally, this is 0.
+@item use_time
+This is the last time that the window was selected.  The function
+@code{get-lru-window} uses this field.
+@item display_table
+The window's display table, or @code{nil} if none is specified for it.
+@item update_mode_line
+Non-@code{nil} means this window's mode line needs to be updated.
+@item base_line_number
+The line number of a certain position in the buffer, or @code{nil}.
+This is used for displaying the line number of point in the mode line.
+@item base_line_pos
+The position in the buffer for which the line number is known, or
+@code{nil} meaning none is known.
+@item region_showing
+If the region (or part of it) is highlighted in this window, this field
+holds the mark position that made one end of that region.  Otherwise,
+this field is @code{nil}.
+@end table
+@node Modules for the Basic Displayable Lisp Objects,  , The Window Object, Consoles; Devices; Frames; Windows
+@section Modules for the Basic Displayable Lisp Objects
+@cindex modules for the basic displayable Lisp objects
+@cindex displayable Lisp objects, modules for the basic
+@cindex Lisp objects, modules for the basic displayable
+@cindex objects, modules for the basic displayable Lisp
+@example
+@file{console-msw.c}
+@file{console-msw.h}
+@file{console-stream.c}
+@file{console-stream.h}
+@file{console-tty.c}
+@file{console-tty.h}
+@file{console-x.c}
+@file{console-x.h}
+@file{console.c}
+@file{console.h}
+@end example
+These modules implement the @dfn{console} Lisp object type.  A console
+contains multiple display devices, but only one keyboard and mouse.
+Most of the time, a console will contain exactly one device.
+Consoles are the top of a Lisp object inclusion hierarchy.  Consoles
+contain devices, which contain frames, which contain windows.
+@example
+@file{device-msw.c}
+@file{device-tty.c}
+@file{device-x.c}
+@file{device.c}
+@file{device.h}
+@end example
+These modules implement the @dfn{device} Lisp object type.  This
+abstracts a particular screen or connection on which frames are
+displayed.  As with Lisp objects, event interfaces, and other
+subsystems, the device code is separated into a generic component and
+device-type-specific components; the generic component presents a
+standardized interface (in the form of a set of methods) onto the
+particular device types.
+The device subsystem defines all the methods and provides method
+services for not only device operations but also for the frame, window,
+menubar, scrollbar, toolbar, and other displayable-object subsystems.
+The reason for this is that all of these subsystems have the same
+subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
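+The split between generic code and device-type-specific methods can be
+sketched with a table of function pointers.  Everything here is
+hypothetical: the @code{demo_} names, the single @code{text_width}
+method, and its semantics are chosen only to show the dispatch
+pattern, not to mirror the real method tables.

```c
#include <stddef.h>
#include <string.h>

struct demo_device;

/* One method table per device type. */
struct demo_device_methods
{
  const char *type_name;
  int (*text_width) (const struct demo_device *d, const char *text);
};

struct demo_device
{
  const struct demo_device_methods *methods;
  int char_width;               /* assumed: pixels per character */
};

/* TTY-like type: width is counted in character cells. */
static int
demo_tty_text_width (const struct demo_device *d, const char *text)
{
  (void) d;
  return (int) strlen (text);
}

/* Window-system-like type: width is counted in pixels. */
static int
demo_x_text_width (const struct demo_device *d, const char *text)
{
  return (int) strlen (text) * d->char_width;
}

static const struct demo_device_methods demo_tty_methods =
  { "tty", demo_tty_text_width };
static const struct demo_device_methods demo_x_methods =
  { "x", demo_x_text_width };

/* Generic entry point: callers never mention the device type. */
static int
demo_device_text_width (const struct demo_device *d, const char *text)
{
  return d->methods->text_width (d, text);
}
```

+Because frames, windows, and the other displayable objects share the
+same subtypes, one such table per type can serve all of them.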
+@example
+@file{frame-msw.c}
+@file{frame-tty.c}
+@file{frame-x.c}
+@file{frame.c}
+@file{frame.h}
+@end example
+Each device contains one or more frames in which objects (e.g. text) are
+displayed.  A frame corresponds to a window in the window system;
+usually this is a top-level window but it could potentially be one of a
+number of overlapping child windows within a top-level window, using the
+MDI (Multiple Document Interface) protocol in Microsoft Windows or a
+similar scheme.
+The @file{frame-*} files implement the @dfn{frame} Lisp object type and
+provide the generic and device-type-specific operations on frames
+(e.g. raising, lowering, resizing, moving, etc.).
+@example
+@file{window.c}
+@file{window.h}
+@end example
+@cindex window (in Emacs)
+@cindex pane
+Each frame consists of one or more non-overlapping @dfn{windows} (better
+known as @dfn{panes} in standard window-system terminology) in which a
+buffer's text can be displayed.  Windows can also have scrollbars
+displayed around their edges.
+@file{window.c} and @file{window.h} implement the @dfn{window} Lisp
+object type and provide code to manage windows.  Since windows have no
+associated resources in the window system (the window system knows only
+about the frame; no child windows or anything are used for XEmacs
+windows), there is no device-type-specific code here; all of that code
+is part of the redisplay mechanism or the code for particular object
+types such as scrollbars.
+@node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
+@chapter The Redisplay Mechanism
+@cindex redisplay mechanism, the
+The redisplay mechanism is one of the most complicated sections of
+XEmacs, especially from a conceptual standpoint.  This is doubly so
+because, unlike for the basic aspects of the Lisp interpreter, the
+computer science theories of how to efficiently handle redisplay are not
+well-developed.
+When working with the redisplay mechanism, remember the Golden Rules
+of Redisplay:
+@enumerate
+@item
+It Is Better To Be Correct Than Fast.
+@item
+Thou Shalt Not Run Elisp From Within Redisplay.
+@item
+It Is Better To Be Fast Than Not To Be.
+@end enumerate
+@menu
+* Critical Redisplay Sections::
+* Line Start Cache::
+* Redisplay Piece by Piece::
+* Modules for the Redisplay Mechanism::
+* Modules for other Display-Related Lisp Objects::
+@end menu
+@node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism
+@section Critical Redisplay Sections
+@cindex redisplay sections, critical
+@cindex critical redisplay sections
+Within this section, we are defenseless and assume that the
+following cannot happen:
+@enumerate
+@item
+garbage collection
+@item
+Lisp code evaluation
+@item
+frame size changes
+@end enumerate
+We ensure (3) by calling @code{hold_frame_size_changes()}, which
+will cause any pending frame size changes to get put on hold
+till after the end of the critical section.  (1) follows
+automatically if (2) is met.  #### Unfortunately, there are
+some places where Lisp code can be called within this section.
+We need to remove them.
+If @code{Fsignal()} is called during this critical section, we
+will @code{abort()}.
+If garbage collection is called during this critical section,
+we simply return. #### We should abort instead.
+#### If a frame-size change does occur we should probably
+actually be preempting redisplay.
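+The deferral of frame size changes described above can be sketched as
+follows.  This is a minimal illustration only: the structure and the
+function names are hypothetical stand-ins, not the actual implementation
+behind @code{hold_frame_size_changes()}.

```c
/* Hypothetical frame model: only the fields needed for the sketch. */
struct frame_sketch {
  int width, height;
  int hold_count;                    /* > 0 while size changes are held */
  int pending;                       /* nonzero if a change is waiting */
  int pending_width, pending_height;
};

/* Enter a critical section: size changes are deferred from now on. */
static void
hold_size_changes (struct frame_sketch *f)
{
  f->hold_count++;
}

/* Request a new size; it is only recorded while changes are held. */
static void
request_size (struct frame_sketch *f, int w, int h)
{
  if (f->hold_count > 0)
    {
      f->pending = 1;
      f->pending_width = w;
      f->pending_height = h;
      return;
    }
  f->width = w;
  f->height = h;
}

/* Leave the critical section and apply any change that was deferred. */
static void
unhold_size_changes (struct frame_sketch *f)
{
  if (f->hold_count > 0 && --f->hold_count == 0 && f->pending)
    {
      f->pending = 0;
      f->width = f->pending_width;
      f->height = f->pending_height;
    }
}
```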
+@node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism
+@section Line Start Cache
+@cindex line start cache
+The traditional scrolling code in Emacs breaks in a variable height
+world.  It depends on the key assumption that the number of lines that
+can be displayed at any given time is fixed.  This led to a complete
+separation of the scrolling code from the redisplay code.  In order to
+fully support variable height lines, the scrolling code must actually be
+tightly integrated with redisplay.  Only redisplay can determine how
+many lines will be displayed on a screen for any given starting point.
+What is ideally wanted is a complete list of the starting buffer
+position for every possible display line of a buffer along with the
+height of that display line.  Maintaining such a full list would be very
+expensive.  We settle for having it include information for all areas
+which we happen to generate anyhow (i.e. the region currently being
+displayed) and for those areas we need to work with.
+In order to ensure that the cache accurately represents what redisplay
+would actually show, it is necessary to invalidate it in many
+situations.  If the buffer changes, the starting positions may no longer
+be correct.  If a face or an extent has changed then the line heights
+may have altered.  These events happen frequently enough that the cache
+can end up being constantly disabled.  With this potentially constant
+invalidation, when is the cache ever useful?
+Even if the cache is invalidated before every single usage, it is
+necessary.  Scrolling often requires knowledge about display lines which
+are actually above or below the visible region.  The cache provides a
+convenient light-weight method of storing this information for multiple
+display regions.  This knowledge is necessary for the scrolling code to
+always obey the First Golden Rule of Redisplay.
+If the cache already contains all of the information that the scrolling
+routines happen to need so that it doesn't have to go generate it, then
+we are able to obey the Third Golden Rule of Redisplay.  The first thing
+we do to help out the cache is to always add the displayed region.  This
+region had to be generated anyway, so the cache ends up getting the
+information basically for free.  In those cases where a user is simply
+scrolling around viewing a buffer there is a high probability that this
+is sufficient to always provide the needed information.  The second
+thing we can do is be smart about invalidating the cache.
+TODO---Be smart about invalidating the cache.  Potential places:
+@itemize @bullet
+@item
+Insertions at end-of-line which don't cause line-wraps do not alter the
+starting positions of any display lines.  These types of buffer
+modifications should not invalidate the cache.  This is actually a large
+optimization for redisplay speed as well.
+@item
+Buffer modifications frequently only affect the display of lines at and
+below where they occur.  In these situations we should only invalidate
+the part of the cache starting at where the modification occurs.
+@end itemize
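+The second idea in the list above, invalidating only from the point of
+modification onward, might look roughly like the sketch below.  The
+entry layout and all names are assumptions made for illustration; the
+real cache stores more information.

```c
#include <stddef.h>

/* Hypothetical cache entry: one per display line. */
struct line_start_entry {
  long start;   /* buffer position where the display line starts */
  long end;     /* buffer position where the display line ends */
  int height;   /* pixel height of the display line */
};

struct line_start_cache {
  struct line_start_entry entries[64];
  size_t count; /* number of valid entries, sorted by start */
};

/* Drop every cached line that ends at or after MODPOS, keeping only the
   lines that lie entirely above the modification.  Returns the number
   of entries kept. */
static size_t
cache_invalidate_from (struct line_start_cache *cache, long modpos)
{
  size_t kept = 0;
  while (kept < cache->count && cache->entries[kept].end < modpos)
    kept++;
  cache->count = kept;
  return kept;
}
```

+Lines above the modification keep their cached start positions and
+heights, so scrolling backward from the change does not need to
+regenerate them.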
+In case you're wondering, the Second Golden Rule of Redisplay is not
+applicable.
+@node Redisplay Piece by Piece, Modules for the Redisplay Mechanism, Line Start Cache, The Redisplay Mechanism
+@section Redisplay Piece by Piece
+@cindex redisplay piece by piece
+As you can begin to see, redisplay is complex and also not well
+documented.  Chuck no longer works on XEmacs, so this section is my take
+on the workings of redisplay.
+Redisplay happens in three phases:
+@enumerate
+@item
+Determine the desired display in the area that needs redisplay.
+Implemented by @code{redisplay.c}.
+@item
+Compare the desired display with the current display.
+Implemented by @code{redisplay-output.c}.
+@item
+Output the changes.  Implemented by @code{redisplay-output.c},
+@code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}.
+@end enumerate
+Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
+mostly device-dependent.
+Determining the desired display
+Display attributes are stored in @code{display_line} structures. Each
+@code{display_line} consists of a set of @code{display_block}'s and each
+@code{display_block} contains a number of @code{rune}'s. Generally
+dynarr's of @code{display_line}'s are held by each window representing
+the current display and the desired display.
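+The nesting of lines, blocks and runes can be pictured with the
+simplified sketch below.  The field names are assumptions chosen for
+illustration; the real structures carry many more fields and use
+dynarrs rather than plain pointers.

```c
#include <stddef.h>

/* Simplified, hypothetical versions of the redisplay structures. */
struct rune {
  int xpos;    /* horizontal pixel position of the rune */
  int width;   /* pixel width of the rune */
};

struct display_block {
  struct rune *runes;
  size_t num_runes;
};

struct display_line {
  struct display_block *blocks;
  size_t num_blocks;
};

/* Total pixel width of a line: the sum of the widths of all runes in
   all of its blocks. */
static int
display_line_width (const struct display_line *dl)
{
  int width = 0;
  for (size_t b = 0; b < dl->num_blocks; b++)
    for (size_t r = 0; r < dl->blocks[b].num_runes; r++)
      width += dl->blocks[b].runes[r].width;
  return width;
}
```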
+The @code{display_line} structures are tightly tied to buffers which
+presents a problem for redisplay as this connection is bogus for the
+modeline. Hence the @code{display_line} generation routines are
+duplicated for generating the modeline. This means that the modeline
+display code has many bugs that the standard redisplay code does not.
+The guts of @code{display_line} generation are in
+@code{create_text_block}, which creates a single display line for the
+desired locale. This incrementally parses the characters on the current
+line and generates redisplay structures for each.
+Gutter redisplay is different. Because the data to display is stored in
+a string we cannot use @code{create_text_block}. Instead we use
+@code{create_text_string_block} which performs the same function as
+@code{create_text_block} but for strings. Many of the complexities of
+@code{create_text_block} to do with cursor handling and selective
+display have been removed.
+@node Modules for the Redisplay Mechanism, Modules for other Display-Related Lisp Objects, Redisplay Piece by Piece, The Redisplay Mechanism
+@section Modules for the Redisplay Mechanism
+@cindex modules for the redisplay mechanism
+@cindex redisplay mechanism, modules for the
+@example
+@file{redisplay-output.c}
+@file{redisplay-msw.c}
+@file{redisplay-tty.c}
+@file{redisplay-x.c}
+@file{redisplay.c}
+@file{redisplay.h}
+@end example
+These files provide the redisplay mechanism.  As with many other
+subsystems in XEmacs, there is a clean separation between the general
+and device-specific support.
+@file{redisplay.c} contains the bulk of the redisplay engine.  These
+functions update the redisplay structures (which describe how the screen
+is to appear) to reflect any changes made to the state of any
+displayable objects (buffer, frame, window, etc.) since the last time
+that redisplay was called.  These functions are highly optimized to
+avoid doing more work than necessary (since redisplay is called
+extremely often and is potentially a huge time sink), and depend heavily
+on notifications from the objects themselves that changes have occurred,
+so that redisplay doesn't explicitly have to check each possible object.
+The redisplay mechanism also contains a great deal of caching to further
+speed things up; some of this caching is contained within the various
+displayable objects.
+@file{redisplay-output.c} goes through the redisplay structures and converts
+them into calls to device-specific methods to actually output the screen
+changes.
+@file{redisplay-x.c}, @file{redisplay-msw.c} and @file{redisplay-tty.c}
+are implementations of these redisplay output methods, for X frames,
+MS Windows frames and TTY frames, respectively.
+@example
+@file{indent.c}
+@end example
+This module contains various functions and Lisp primitives for
+converting between buffer positions and screen positions.  These
+functions call the redisplay mechanism to do most of the work, and then
+examine the redisplay structures to get the necessary information.  This
+module needs work.
+@example
+@file{termcap.c}
+@file{terminfo.c}
+@file{tparam.c}
+@end example
+These files contain functions for working with the termcap (BSD-style)
+and terminfo (System V style) databases of terminal capabilities and
+escape sequences, used when XEmacs is displaying in a TTY.
+@example
+@file{cm.c}
+@file{cm.h}
+@end example
+These files provide some miscellaneous TTY-output functions and should
+probably be merged into @file{redisplay-tty.c}.
+@node Modules for other Display-Related Lisp Objects,  , Modules for the Redisplay Mechanism, The Redisplay Mechanism
+@section Modules for other Display-Related Lisp Objects
+@cindex modules for other display-related Lisp objects
+@cindex display-related Lisp objects, modules for other
+@cindex Lisp objects, modules for other display-related
+@example
+@file{faces.c}
+@file{faces.h}
+@end example
+@example
+@file{bitmaps.h}
+@file{glyphs-eimage.c}
+@file{glyphs-msw.c}
+@file{glyphs-msw.h}
+@file{glyphs-widget.c}
+@file{glyphs-x.c}
+@file{glyphs-x.h}
+@file{glyphs.c}
+@file{glyphs.h}
+@end example
+@example
+@file{objects-msw.c}
+@file{objects-msw.h}
+@file{objects-tty.c}
+@file{objects-tty.h}
+@file{objects-x.c}
+@file{objects-x.h}
+@file{objects.c}
+@file{objects.h}
+@end example
+@example
+@file{menubar-msw.c}
+@file{menubar-msw.h}
+@file{menubar-x.c}
+@file{menubar.c}
+@file{menubar.h}
+@end example
+@example
+@file{scrollbar-msw.c}
+@file{scrollbar-msw.h}
+@file{scrollbar-x.c}
+@file{scrollbar-x.h}
+@file{scrollbar.c}
+@file{scrollbar.h}
+@end example
+@example
+@file{toolbar-msw.c}
+@file{toolbar-x.c}
+@file{toolbar.c}
+@file{toolbar.h}
+@end example
+@example
+@file{font-lock.c}
+@end example
+This file provides C support for syntax highlighting---i.e.
+highlighting different syntactic constructs of a source file in
+different colors, for easy reading.  The C support is provided so that
+this is fast.
+@example
+@file{dgif_lib.c}
+@file{gif_err.c}
+@file{gif_lib.h}
+@file{gifalloc.c}
+@end example
+These modules decode GIF-format image files, for use with glyphs.
+These files were removed due to Unisys patent infringement concerns.
+@node Extents, Faces, The Redisplay Mechanism, Top
+@chapter Extents
+@cindex extents
+@menu
+* Introduction to Extents::     Extents are ranges over text, with properties.
+* Extent Ordering::             How extents are ordered internally.
+* Format of the Extent Info::   The extent information in a buffer or string.
+* Zero-Length Extents::         A weird special case.
+* Mathematics of Extent Ordering::  A rigorous foundation.
+* Extent Fragments::            Cached information useful for redisplay.
+@end menu
+@node Introduction to Extents, Extent Ordering, Extents, Extents
+@section Introduction to Extents
+@cindex extents, introduction to
+Extents are regions over a buffer, with a start and an end position
+denoting the region of the buffer included in the extent.  In
+addition, either end can be closed or open, meaning that the endpoint
+is or is not logically included in the extent.  Insertion of a character
+at a closed endpoint causes the character to go inside the extent;
+insertion at an open endpoint causes the character to go outside.
+Extent endpoints are stored using memory indices (see @file{insdel.c}),
+to minimize the amount of adjusting that needs to be done when
+characters are inserted or deleted.
+(Formerly, extent endpoints at the gap could be either before or
+after the gap, depending on the open/closedness of the endpoint.
+The intent of this was to make it so that insertions would
+automatically go inside or out of extents as necessary with no
+further work needing to be done.  It didn't work out that way,
+however, and just ended up complexifying and buggifying all the
+rest of the code.)
+@node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents
+@section Extent Ordering
+@cindex extent ordering
+Extents are compared using memory indices.  There are two orderings
+for extents and both orders are kept current at all times.  The normal
+or @dfn{display} order is as follows:
+@example
+Extent A is ``less than'' extent B,
+that is, earlier in the display order,
+if:    A-start < B-start,
+or if: A-start = B-start, and A-end > B-end
+@end example
+So if two extents begin at the same position, the larger of them is the
+earlier one in the display order (@code{EXTENT_LESS} is true).
+For the e-order, the same thing holds:
+@example
+Extent A is ``less than'' extent B in e-order,
+that is, earlier in the e-order,
+if:    A-end < B-end,
+or if: A-end = B-end, and A-start > B-start
+@end example
+So if two extents end at the same position, the smaller of them is the
+earlier one in the e-order (@code{EXTENT_E_LESS} is true).
+The display order and the e-order are complementary orders: any
+theorem about the display order also applies to the e-order if you swap
+all occurrences of ``display order'' and ``e-order'', ``less than'' and
+``greater than'', and ``extent start'' and ``extent end''.
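+The two orderings translate directly into comparison functions.  The
+sketch below uses a hypothetical two-field extent and hypothetical
+function names; it mirrors the definitions given above rather than the
+actual @code{EXTENT_LESS} and @code{EXTENT_E_LESS} macros.

```c
/* Hypothetical minimal extent: only the two endpoints matter here. */
struct ext {
  long start;
  long end;
};

/* Display order: earlier start first; on equal starts, the longer
   extent (greater end) comes first. */
static int
ext_less (const struct ext *a, const struct ext *b)
{
  return a->start < b->start
    || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; on equal ends, the shorter extent
   (greater start) comes first. */
static int
ext_e_less (const struct ext *a, const struct ext *b)
{
  return a->end < b->end
    || (a->end == b->end && a->start > b->start);
}
```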
+@node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents
+@section Format of the Extent Info
+@cindex extent info, format of the
+An extent-info structure consists of a list of the buffer or string's
+extents and a @dfn{stack of extents} that lists all of the extents over
+a particular position.  The stack-of-extents info is used for
+optimization purposes---it basically caches some info that might
+be expensive to compute.  Certain otherwise hard computations are easy
+given the stack of extents over a particular position, and if the
+stack of extents over a nearby position is known (because it was
+calculated at some prior point in time), it's easy to move the stack
+of extents to the proper position.
+Given that the stack of extents is an optimization, and given that
+it requires memory, a string's stack of extents is wiped out each
+time a garbage collection occurs.  Therefore, any time you retrieve
+the stack of extents, it might not be there.  If you need it to
+be there, use the @code{_force} version.
+Similarly, a string may or may not have an extent_info structure.
+(Generally it won't if there haven't been any extents added to the
+string.) So use the @code{_force} version if you need the extent_info
+structure to be there.
+A list of extents is maintained as a double gap array.  One gap array
+is ordered by start index (the @dfn{display order}) and the other is
+ordered by end index (the @dfn{e-order}).  Note that positions in an
+extent list should logically be conceived of as referring @emph{to} a
+particular extent (as is the norm in programs) rather than sitting
+between two extents.  Note also that callers of these functions should
+not be aware of the fact that the extent list is implemented as an
+array, except for the fact that positions are integers (this should be
+generalized to handle integers and linked lists equally well).
+A gap array is the same structure used by buffer text: an array of
+elements with a ``gap'' somewhere in the middle.  Insertion and deletion
+happen by moving the gap to the insertion/deletion point, and then
+expanding/contracting as necessary.  Gap arrays have a number of
+useful properties:
+@enumerate
+@item
+They are space efficient, as there is no need for next/previous pointers.
+@item
+If the items in them are sorted, locating an item is fast -- @math{O(log N)}.
+@item
+Insertion and deletion are very fast (constant time, essentially) if the
+gap is near (which favors localized operations, as will usually be the
+case).  Even if not, it requires only a block move of memory, which is
+generally a highly optimized operation on modern processors.
+@item
+Code to manipulate them is relatively simple to write.
+@end enumerate
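+A minimal gap array can be sketched as below.  This is a fixed-capacity
+toy holding integers, with hypothetical names throughout; it is meant
+only to show why insertion near the gap is cheap and why moving the gap
+is a single block move.

```c
#include <string.h>

#define GAP_CAPACITY 16

/* Fixed-capacity gap array of ints, for illustration only. */
struct gap_array {
  int data[GAP_CAPACITY];
  int gap_start;  /* index of the first free slot */
  int gap_end;    /* index just past the last free slot */
};

static void
gap_init (struct gap_array *ga)
{
  ga->gap_start = 0;
  ga->gap_end = GAP_CAPACITY;
}

/* Number of elements currently stored. */
static int
gap_length (const struct gap_array *ga)
{
  return GAP_CAPACITY - (ga->gap_end - ga->gap_start);
}

/* Move the gap so that it begins at logical position POS. */
static void
gap_move (struct gap_array *ga, int pos)
{
  if (pos < ga->gap_start)
    {
      int n = ga->gap_start - pos;
      memmove (ga->data + ga->gap_end - n, ga->data + pos,
               n * sizeof (int));
      ga->gap_start -= n;
      ga->gap_end -= n;
    }
  else if (pos > ga->gap_start)
    {
      int n = pos - ga->gap_start;
      memmove (ga->data + ga->gap_start, ga->data + ga->gap_end,
               n * sizeof (int));
      ga->gap_start += n;
      ga->gap_end += n;
    }
}

/* Insert VALUE at logical position POS.  Returns 0 on success, or -1
   if the array is full or POS is out of range. */
static int
gap_insert (struct gap_array *ga, int pos, int value)
{
  if (ga->gap_start == ga->gap_end || pos < 0 || pos > gap_length (ga))
    return -1;
  gap_move (ga, pos);
  ga->data[ga->gap_start++] = value;
  return 0;
}

/* Fetch the element at logical position POS (caller ensures validity). */
static int
gap_get (const struct gap_array *ga, int pos)
{
  return pos < ga->gap_start
    ? ga->data[pos]
    : ga->data[pos + (ga->gap_end - ga->gap_start)];
}
```

+When the gap is already at the insertion point, @code{gap_move} does
+nothing and the insertion is a single store; otherwise the cost is one
+@code{memmove} over the elements between the old and new gap positions.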
+An alternative would be balanced binary trees, which have guaranteed
+@math{O(log N)} time for all operations (although the constant factors
+are not as good, and repeated localized operations will be slower than
+for a gap array).  Such code is quite tricky to write, however.
+@node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents
+@section Zero-Length Extents
+@cindex zero-length extents
+@cindex extents, zero-length
+Extents can be zero-length, and will end up that way if their endpoints
+are explicitly set that way or if their detachable property is @code{nil}
+and all the text in the extent is deleted. (The exception is open-open
+zero-length extents, which are barred from existing because there is
+no sensible way to define their properties.  Deletion of the text in
+an open-open extent causes it to be converted into a closed-open
+extent.)  Zero-length extents are primarily used to represent
+annotations, and behave as follows:
+@enumerate
+@item
+Insertion at the position of a zero-length extent expands the extent
+if both endpoints are closed; goes after the extent if it is closed-open;
+and goes before the extent if it is open-closed.
+@item
+Deletion of a character on a side of a zero-length extent whose
+corresponding endpoint is closed causes the extent to be detached if
+it is detachable; if the extent is not detachable or the corresponding
+endpoint is open, the extent remains in the buffer, moving as necessary.
+@end enumerate
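+The insertion rule for zero-length extents can be written as a small
+decision function.  The enum and function names are hypothetical and
+exist only to restate the rule above in code.

```c
enum insert_result {
  INSERT_INSIDE,          /* the extent expands to hold the new text */
  INSERT_AFTER_EXTENT,    /* the new text goes after the extent */
  INSERT_BEFORE_EXTENT    /* the new text goes before the extent */
};

/* Outcome of inserting text at the position of a zero-length extent,
   given which endpoints are closed.  The open-open case is not handled
   because such zero-length extents are barred from existing. */
static enum insert_result
zero_length_insert (int start_closed, int end_closed)
{
  if (start_closed && end_closed)
    return INSERT_INSIDE;
  if (start_closed)
    return INSERT_AFTER_EXTENT;   /* closed-open */
  return INSERT_BEFORE_EXTENT;    /* open-closed */
}
```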
+Note that closed-open, non-detachable zero-length extents behave
+exactly like markers and that open-closed, non-detachable zero-length
+extents behave like the ``point-type'' marker in Mule.
+@node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents
+@section Mathematics of Extent Ordering
+@cindex mathematics of extent ordering
+@cindex extent mathematics
+@cindex extent ordering
+@cindex display order of extents
+@cindex extents, display order
+The extents in a buffer are ordered by ``display order'' because that
+is the order in which the redisplay mechanism needs to process them.
+The e-order is an auxiliary ordering used to facilitate operations
+over extents.  The operations that can be performed on the ordered
+list of extents in a buffer are
+@enumerate
+@item
+Locate where an extent would go if inserted into the list.
+@item
+Insert an extent into the list.
+@item
+Remove an extent from the list.
+@item
+Map over all the extents that overlap a range.
+@end enumerate
+(4) requires being able to determine the first and last extents
+that overlap a range.
+NOTE: @dfn{overlap} is used as follows:
+@itemize @bullet
+@item
+two ranges overlap if they have at least one point in common.
+Whether the endpoints are open or closed makes a difference here.
+@item
+a point overlaps a range if the point is contained within the
+range; this is equivalent to treating a point @math{P} as the range
+@math{[P, P]}.
+@item
+In the case of an @emph{extent} overlapping a point or range, the extent
+is normally treated as having closed endpoints.  This applies
+consistently in the discussion of stacks of extents and such below.
+Note that this definition of overlap is not necessarily consistent with
+the extents that @code{map-extents} maps over, since @code{map-extents}
+sometimes pays attention to whether the endpoints of an extent are open
+or closed.  But for our purposes, it greatly simplifies things to treat
+all extents as having closed endpoints.
+@end itemize
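+Under the closed-endpoint convention adopted here, the overlap tests
+reduce to two comparisons.  The function names below are hypothetical;
+the sketch simply restates the definitions above.

```c
/* Closed ranges [a0, a1] and [b0, b1] overlap when they share at least
   one point. */
static int
ranges_overlap (long a0, long a1, long b0, long b1)
{
  return a0 <= b1 && b0 <= a1;
}

/* A point P is treated as the range [P, P]. */
static int
point_overlaps (long p, long r0, long r1)
{
  return ranges_overlap (p, p, r0, r1);
}
```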
+First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
+to mean comparison according to the display order.  Comparison between
+an extent @math{E} and an index @math{I} means comparison between
+@math{E} and the range @math{[I, I]}.
+Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
+according to the e-order.
+For any range @math{R}, define @math{R(0)} to be the starting index of
+the range and @math{R(1)} to be the ending index of the range.
+For any extent @math{E}, define @math{E(next)} to be the extent directly
+following @math{E}, and @math{E(prev)} to be the extent directly
+preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
+determined from @math{E} in constant time.  (With the gap-array
+representation described earlier, they are the adjacent array elements.)
+Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
+extents directly following and preceding @math{E} in the e-order.
+Now:
+Let @math{R} be a range.
+Let @math{F} be the first extent overlapping @math{R}.
+Let @math{L} be the last extent overlapping @math{R}.
+Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
+i.e. @math{L <= R(1) < L(next)}.
+This follows easily from the definition of display order.  The
+basic reason that this theorem applies is that the display order
+sorts by increasing starting index.
+Therefore, we can determine @math{L} just by looking at where we would
+insert @math{R(1)} into the list, and if we know @math{F} and are moving
+forward over extents, we can easily determine when we've hit @math{L} by
+comparing the extent we're at to @math{R(1)}.
+@example
+Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
+@end example
+This is the analog of Theorem 1, and applies because the e-order
+sorts by increasing ending index.
+Therefore, @math{F} can be found in the same amount of time as
+operation (1), i.e. the time that it takes to locate where an extent
+would go if inserted into the e-order list.  This is @math{O(log N)},
+since we are using gap arrays to manage extents.
+Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
+(ordered in display order and e-order, just like for normal extent
+lists) that overlap an index @math{I}.
+Now:
+Let @math{I} be an index, let @math{S} be the stack of extents on
+@math{I} and let @math{F} be the first extent in @math{S}.
+Theorem 3: The first extent in @math{S} is the first extent that overlaps
+any range @math{[I, J]}.
+Proof: Any extent that overlaps @math{[I, J]} but does not include
+@math{I} must have a start index @math{> I}, and thus be greater than
+any extent in @math{S}.
+Therefore, finding the first extent that overlaps a range @math{R} is
+the same as finding the first extent that overlaps @math{R(0)}.
+Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
+@math{F2} be the first extent that overlaps @math{I2}.  Then, either
+@math{F2} is in @math{S} or @math{F2} is greater than any extent in
+@math{S}.
+Proof: If @math{F2} does not include @math{I} then its start index is
+greater than @math{I} and thus it is greater than any extent in
+@math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
+and thus is in @math{S}, and thus @math{F2 >= F}.
+@node Extent Fragments,  , Mathematics of Extent Ordering, Extents
+@section Extent Fragments
+@cindex extent fragments
+@cindex fragments, extent
+Imagine that the buffer is divided up into contiguous, non-overlapping
+@dfn{runs} of text such that no extent starts or ends within a run
+(extents that abut the run don't count).
+An extent fragment is a structure that holds data about the run that
+contains a particular buffer position (if the buffer position is at the
+junction of two runs, the run after the position is used)---the
+beginning and end of the run, a list of all of the extents in that run,
+the @dfn{merged face} that results from merging all of the faces
+corresponding to those extents, the begin and end glyphs at the
+beginning of the run, etc.  This is the information that redisplay needs
+in order to display this run.
+Extent fragments have to be very quick to update to a new buffer
+position when moving linearly through the buffer.  They rely on the
+stack-of-extents code, which does the heavy-duty algorithmic work of
+determining which extents overlie a particular position.
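+The notion of a run can be made concrete with the sketch below, which
+finds the run around a position by a linear scan over all extents.  It
+is an illustration under simplifying assumptions (hypothetical names, a
+plain array of extents, no stack-of-extents optimization), not the way
+the extent-fragment code actually works.

```c
#include <stddef.h>

struct ext_span {
  long start;
  long end;
};

/* Compute the run [*run_start, *run_end) containing POS: the widest
   stretch around POS in which no extent starts or ends.  If POS sits on
   a junction, the run after POS is chosen.  BUF_START and BUF_END bound
   the buffer. */
static void
find_run (const struct ext_span *exts, size_t n, long pos,
          long buf_start, long buf_end, long *run_start, long *run_end)
{
  long lo = buf_start, hi = buf_end;
  for (size_t i = 0; i < n; i++)
    {
      long edges[2] = { exts[i].start, exts[i].end };
      for (int k = 0; k < 2; k++)
        {
          if (edges[k] <= pos && edges[k] > lo)
            lo = edges[k];   /* nearest boundary at or before POS */
          if (edges[k] > pos && edges[k] < hi)
            hi = edges[k];   /* nearest boundary after POS */
        }
    }
  *run_start = lo;
  *run_end = hi;
}
```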
+@node Faces, Glyphs, Extents, Top
+@chapter Faces
+@cindex faces
+Not yet documented.
+@node Glyphs, Specifiers, Faces, Top
+@chapter Glyphs
+@cindex glyphs
+Glyphs are graphical elements that can be displayed in XEmacs buffers or
+gutters. We use the term graphical element here in the broadest possible
+sense since glyphs can be as mundane as text or as arcane as a native
+tab widget.
+In XEmacs, glyphs represent the uninstantiated state of graphical
+elements, i.e. they hold all the information necessary to produce an
+image on-screen but the image need not exist at this stage, and multiple
+screen images can be instantiated from a single glyph.
+@c #### find a place for this discussion
+@c The decision to make image specifiers a separate type is debatable.
+@c In fact, the design decision to create a separate image specifier
+@c type, rather than make glyphs themselves be specifiers, is
+@c debatable---the other properties of glyphs are rarely used and could
+@c conceivably have been incorporated into the glyph's instantiator.
+@c The rarely used glyph types (buffer, pointer, icon) could also have
+@c been incorporated into the instantiator.
+Glyphs are lazily instantiated by calling one of the glyph
+functions.  This usually occurs within redisplay when
+@code{Fglyph_height} is called.  Instantiation causes an image-instance
+to be created and cached.  This cache is on a per-device basis for all glyphs
+except widget-glyphs, and on a per-window basis for widget-glyphs.  The
+caching is done by @code{image_instantiate} and is necessary because it
+is generally possible to display an image-instance in multiple
+domains.  For instance, if we create a Pixmap, we can display
+it in multiple windows even though we only need a single Pixmap
+instance to do this.  If caching were not done, it would be necessary
+to create image-instances for every displayable occurrence of a glyph,
+and for every usage, which would be extremely memory- and CPU-intensive.
+Widget-glyphs (also known as native widgets) are not cached in this way.  This is
+because widget-glyph image-instances on screen are toolkit windows, and
+thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
+cached on an XEmacs window basis.
+Any action on a glyph first consults the cache before actually
+instantiating a widget.
+@section Glyph Instantiation
+@cindex glyph instantiation
+@cindex instantiation, glyph
+Glyph instantiation is a hairy topic and requires some explanation.  The
+guts of glyph instantiation are contained within
+@code{image_instantiate}.  A glyph contains an image, which is a
+specifier.  When a glyph function (for instance @code{Fglyph_height})
+asks for a property of the glyph that can only be determined from its
+instantiated state, the glyph image is instantiated and an image
+instance created.  The instantiation process is governed by the specifier
+code and goes through a series of steps:
+@itemize @bullet
+@item
+Validation. Instantiation of image instances happens dynamically - often
+within the guts of redisplay. Thus it is often not feasible to catch
+instantiator errors at instantiation time. Instead the instantiator is
+validated at the time it is added to the image specifier. This function
+is defined by @code{image_validate} and at a simple level validates
+keyword value pairs.
+@item
+Duplication. The specifier code by default takes a copy of the
+instantiator. This is reasonable for most specifiers but in the case of
+widget-glyphs can be problematic, since some of the properties in the
+instantiator - for instance callbacks - could cause infinite recursion
+in the copying process. Thus the image code defines a function -
+@code{image_copy_instantiator} - which will selectively copy values.
+This is controlled by the way that a keyword is defined either using
+@code{IIFORMAT_VALID_KEYWORD} or
+@code{IIFORMAT_VALID_NONCOPY_KEYWORD}. Note that the image caching and
+redisplay code relies on instantiator copying to ensure that current and
+new instantiators are actually different rather than referring to the
+same thing.
+@item
+Normalization. Once the instantiator has been copied it must be
+converted into a form that is viable at instantiation time. This can
+involve no changes at all, but typically involves things like converting
+file names to the actual data. This function is defined by
+@code{image_going_to_add} and @code{normalize_image_instantiator}.
+@item
+Instantiation. When an image instance is actually required for display
+it is instantiated using @code{image_instantiate}. This involves calling
+instantiate methods that are specific to the type of image being
+instantiated.
+@end itemize
+The final instantiation phase also involves a number of steps. In order
+to understand these we need to describe a number of concepts.
+An image is instantiated in a @dfn{domain}, where a domain can be any
+one of a device, frame, window or image-instance.  The domain gives the
+image-instance context and identity.  Properties that affect the
+appearance of the image-instance may differ for the same glyph
+instantiated in different domains.  An example is the face used to
+display the image-instance.
+Although an image is instantiated in a particular domain, the
+instantiation domain is not necessarily the domain in which the
+image-instance is cached.  For example, a pixmap can be instantiated in a
+window but actually be cached on a per-device basis.  The domain in which
+the image-instance is actually cached is called the
+@dfn{governing-domain}.  A governing-domain is currently either a device
+or a window.  Widget-glyphs and text-glyphs have a window as a
+governing-domain; all other image-instances have a device as the
+governing-domain.  The governing domain for an image-instance is
+determined using the @code{governing_domain} image-instance method.
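+The choice of governing domain described above amounts to a simple
+mapping from image type to domain type.  The enums and the function
+name below are hypothetical and only restate that rule; the actual
+selection happens through the image-instance method.

```c
/* Hypothetical classification of image-instance types. */
enum image_kind { IMAGE_TEXT, IMAGE_PIXMAP, IMAGE_WIDGET, IMAGE_OTHER };

/* The two kinds of governing domain named in the text. */
enum domain_kind { DOMAIN_DEVICE, DOMAIN_WINDOW };

/* Widget-glyphs and text-glyphs are cached per window; everything else
   is cached per device. */
static enum domain_kind
governing_domain_for (enum image_kind kind)
{
  switch (kind)
    {
    case IMAGE_TEXT:
    case IMAGE_WIDGET:
      return DOMAIN_WINDOW;
    default:
      return DOMAIN_DEVICE;
    }
}
```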
+@section Widget-Glyphs
+@cindex widget-glyphs
+@section Widget-Glyphs in the MS-Windows Environment
+@cindex widget-glyphs in the MS-Windows environment
+@cindex MS-Windows environment, widget-glyphs in the
+To Do
+@section Widget-Glyphs in the X Environment
+@cindex widget-glyphs in the X environment
+@cindex X environment, widget-glyphs in the
+Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
+Library}) for manipulating the native toolkit objects. This is primarily
+so that different toolkits can be supported for widget-glyphs, just as
+they are supported for features such as menubars etc.
+Lwlib is extremely poorly documented and quite hairy so here is my
+understanding of what goes on.
+Lwlib maintains a set of widget_instances which mirror the hierarchical
+state of Xt widgets. I think this is so that widgets can be updated and
+manipulated generically by the lwlib library. For instance
+@code{update_one_widget_instance} can cope with multiple types of widget and
+multiple types of toolkit. Each element in the widget hierarchy is updated
+from its corresponding widget_instance by walking the widget_instance
+tree recursively.
+This has desirable properties such as @code{lw_modify_all_widgets}, which is
+called from @file{glyphs-x.c} and updates all the properties of a widget
+without having to know what the widget is or what toolkit it is from.
+Unfortunately this also has hairy properties such as making the lwlib
+code quite complex. And of course lwlib has to know at some level what
+the widget is and how to set its properties.
+@node Specifiers, Menus, Glyphs, Top
+@chapter Specifiers
+@cindex specifiers
+Not yet documented.
+Specifiers are documented in depth in the Lisp Reference manual.
+@xref{Specifiers,,, lispref, XEmacs Lisp Reference Manual}.  The code in
+@file{specifier.c} is pretty straightforward.
+@node Menus, Events and the Event Loop, Specifiers, Top
+@chapter Menus
+@cindex menus
+A menu is set by setting the value of the variable
+@code{current-menubar} (which may be buffer-local) and then calling
+@code{set-menubar-dirty-flag} to signal a change.  This will cause the
+menu to be redrawn at the next redisplay.  The format of the data in
+@code{current-menubar} is described in @file{menubar.c}.
+Internally the data in @code{current-menubar} is parsed into a tree of
+@code{widget_value}'s (defined in @file{lwlib.h}); this is accomplished
+by the recursive function @code{menu_item_descriptor_to_widget_value()},
+called by @code{compute_menubar_data()}.  Such a tree is deallocated
+using @code{free_widget_value()}.
+@code{update_screen_menubars()} is one of the external entry points.
+This checks to see, for each screen, if that screen's menubar needs to
+be updated.  This is the case if
+@enumerate
+@item
+@code{set-menubar-dirty-flag} was called since the last redisplay.  (This
+function sets the C variable @code{menubar_has_changed}.)
+@item
+The buffer displayed in the screen has changed.
+@item
+The screen has no menubar currently displayed.
+@end enumerate
+@code{set_screen_menubar()} is called for each such screen.  This
+function calls @code{compute_menubar_data()} to create the tree of
+@code{widget_value}'s, then calls @code{lw_create_widget()},
+@code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
+to create the X-Toolkit widget associated with the menu.
+@code{update_psheets()}, the other external entry point, actually
+changes the menus being displayed.  It uses the widgets fixed by
+@code{update_screen_menubars()} and calls various X functions to ensure
+that the menus are displayed properly.
+The menubar widget is set up so that @code{pre_activate_callback()} is
+called when the menu is first selected (i.e. mouse button goes down),
+and @code{menubar_selection_callback()} is called when an item is
+selected.  @code{pre_activate_callback()} calls the function in
+@code{activate-menubar-hook}, which can change the menubar (this is described
+in @file{menubar.c}).  If the menubar is changed,
+@code{set_screen_menubars()} is called.
+@code{menubar_selection_callback()} enqueues a menu event, putting in it
+a function to call (either @code{eval} or @code{call-interactively}) and
+its argument, which is the callback function or form given in the menu's
+description.
+@node Events and the Event Loop, Asynchronous Events; Quit Checking, Menus, Top
 @chapter Events and the Event Loop
 @cindex events and the event loop
 @cindex event loop, events and the
 @menu
 the only code remaining is code to call out to Lisp or provide simple
 bootstrapping implementations early in temacs, before the echo-area Lisp
 code is loaded).
-@node Asynchronous Events; Quit Checking, Evaluation; Stack Frames; Bindings, Events and the Event Loop, Top
+@node Asynchronous Events; Quit Checking, Lstreams, Events and the Event Loop, Top
 @chapter Asynchronous Events; Quit Checking
 @cindex asynchronous events; quit checking
 @cindex asynchronous events
 @menu
 @item
 printing code does not do code conversion or gettext when
 printing to stdout/stderr.
 @end itemize
-@node Evaluation; Stack Frames; Bindings, Symbols and Variables, Asynchronous Events; Quit Checking, Top
+@node Lstreams, Subprocesses, Asynchronous Events; Quit Checking, Top
-@chapter Evaluation; Stack Frames; Bindings
-@cindex evaluation; stack frames; bindings
-@cindex stack frames; bindings, evaluation;
-@cindex bindings, evaluation; stack frames;
-@menu
-* Evaluation::
-* Dynamic Binding; The specbinding Stack; Unwind-Protects::
-* Simple Special Forms::
-* Catch and Throw::
-@end menu
-@node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings
-@section Evaluation
-@cindex evaluation
-@code{Feval()} evaluates the form (a Lisp object) that is passed to
-it.  Note that evaluation is only non-trivial for two types of objects:
-symbols and conses.  A symbol is evaluated simply by calling
-@code{symbol-value} on it and returning the value.
-Evaluating a cons means calling a function.  First, @code{eval} checks
-to see if garbage-collection is necessary, and calls
-@code{garbage_collect_1()} if so.  It then increases the evaluation
-depth by 1 (@code{lisp_eval_depth}, which is always less than
-@code{max_lisp_eval_depth}) and adds an element to the linked list of
-@code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
-contains a pointer to the function being called plus a list of the
-function's arguments.  Originally these values are stored unevalled, and
-as they are evaluated, the backtrace structure is updated.  Garbage
-collection pays attention to the objects pointed to in the backtrace
-structures (garbage collection might happen while a function is being
-called or while an argument is being evaluated, and there could easily
-be no other references to the arguments in the argument list; once an
-argument is evaluated, however, the unevalled version is not needed by
-eval, and so the backtrace structure is changed).
-At this point, the function to be called is determined by looking at
-the car of the cons (if this is a symbol, its function definition is
-retrieved and the process repeated).  The function should then consist
-of either a @code{Lisp_Subr} (built-in function written in C), a
-@code{Lisp_Compiled_Function} object, or a cons whose car is one of the
-symbols @code{autoload}, @code{macro} or @code{lambda}.
-If the function is a @code{Lisp_Subr}, the lisp object points to a
-@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
-pointer to the C function, a minimum and maximum number of arguments
-(or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
-pointer to the symbol referring to that subr, and a couple of other
-things.  If the subr wants its arguments @code{UNEVALLED}, they are
-passed raw as a list.  Otherwise, an array of evaluated arguments is
-created and put into the backtrace structure, and either passed whole
-(@code{MANY}) or each argument is passed as a C argument.
-If the function is a @code{Lisp_Compiled_Function},
-@code{funcall_compiled_function()} is called.  If the function is a
-lambda list, @code{funcall_lambda()} is called.  If the function is a
-macro, [..... fill in] is done.  If the function is an autoload,
-@code{do_autoload()} is called to load the definition and then eval
-starts over [explain this more].
-When @code{Feval()} exits, the evaluation depth is reduced by one, the
-debugger is called if appropriate, and the current backtrace structure
-is removed from the list.
-Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
-to go through the list of formal parameters to the function and bind
-them to the actual arguments, checking for @code{&rest} and
-@code{&optional} symbols in the formal parameters and making sure the
-number of actual arguments is correct.
-@code{funcall_compiled_function()} can do this a little more
-efficiently, since the formal parameter list can be checked for sanity
-when the compiled function object is created.
-@code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
-in the lambda list.
-@code{funcall_compiled_function()} calls the real byte-code interpreter
-@code{execute_optimized_program()} on the byte-code instructions, which
-are converted into an internal form for faster execution.
-When a compiled function is executed for the first time by
-@code{funcall_compiled_function()}, or during the dump phase of building
-XEmacs, the byte-code instructions are converted from a
-@code{Lisp_String} (which is inefficient to access, especially in the
-presence of MULE) into a @code{Lisp_Opaque} object containing an array
-of unsigned char, which can be directly executed by the byte-code
-interpreter.  At this time the byte code is also analyzed for validity
-and transformed into a more optimized form, so that
-@code{execute_optimized_program()} can really fly.
-Here are some of the optimizations performed by the internal byte-code
-transformer:
-@enumerate
-@item
-References to the @code{constants} array are checked for out-of-range
-indices, so that the byte interpreter doesn't have to.
-@item
-References to the @code{constants} array that will be used as a Lisp
-variable are checked for being correct non-constant (i.e. not @code{t},
-@code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
-doesn't have to.
-@item
-The maximum number of variable bindings in the byte-code is
-pre-computed, so that space on the @code{specpdl} stack can be
-pre-reserved once for the whole function execution.
-@item
-All byte-code jumps are relative to the current program counter instead
-of the start of the program, thereby saving a register.
-@item
-One-byte relative jumps are converted from the byte-code form of unsigned
-chars offset by 127 to machine-friendly signed chars.
-@end enumerate
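The last transformation in the list can be illustrated with a minimal sketch.
This is not the XEmacs transformer itself: the helper names
decode_relative_jump and convert_jump_operands are hypothetical, and the sketch
assumes the operand offsets are already known and that no biased operand
exceeds 254.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the XEmacs function: decode a one-byte jump
   operand stored as an unsigned char biased by 127.  Assumes the biased
   value is at most 254, so the result fits in a signed char. */
static signed char
decode_relative_jump (unsigned char biased)
{
  assert (biased <= 254);
  return (signed char) ((int) biased - 127);
}

/* Rewrite, in place, the operand bytes found at the given offsets so that
   an interpreter can later read each one directly as a signed char. */
static void
convert_jump_operands (unsigned char *code, const size_t *operand_offsets,
                       size_t count)
{
  size_t i;
  for (i = 0; i < count; i++)
    {
      unsigned char *operand = &code[operand_offsets[i]];
      *operand = (unsigned char) decode_relative_jump (*operand);
    }
}
```

Doing the subtraction once, at conversion time, removes it from every execution
of the jump.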
-Of course, this transformation of the @code{instructions} should not be
-visible to the user, so @code{Fcompiled_function_instructions()} needs
-to know how to convert the optimized opaque object back into a Lisp
-string that is identical to the original string from the @file{.elc}
-file.  (Actually, the resulting string may (rarely) contain slightly
-different, yet equivalent, byte code.)
-@code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
-x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
-x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
-the evaluation, however, and is very similar to @code{Feval()}.
-From the performance point of view, it is worth knowing that most of the
-time in Lisp evaluation is spent executing @code{Lisp_Subr} and
-@code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
-@code{Feval()}).
-@code{Fapply()} implements Lisp @code{apply}, which is very similar to
-@code{funcall} except that if the last argument is a list, the result is the
-same as if each of the arguments in the list had been passed separately.
-@code{Fapply()} does some business to expand the last argument if it's a
-list, then calls @code{Ffuncall()} to do the work.
-@code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
-@code{call3()} call a function, passing it the argument(s) given (the
-arguments are given as separate C arguments rather than being passed as
-an array).  @code{apply1()} uses @code{Fapply()} while the others use
-@code{Ffuncall()} to do the real work.
-@node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings
-@section Dynamic Binding; The specbinding Stack; Unwind-Protects
-@cindex dynamic binding; the specbinding stack; unwind-protects
-@cindex binding; the specbinding stack; unwind-protects, dynamic
-@cindex specbinding stack; unwind-protects, dynamic binding; the
-@cindex unwind-protects, dynamic binding; the specbinding stack;
-@example
-struct specbinding
-@{
-Lisp_Object symbol;
-Lisp_Object old_value;
-Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
-@};
-@end example
-@code{struct specbinding} is used for local-variable bindings and
-unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
-@code{specpdl_ptr} points to the beginning of the free bindings in the
-array, @code{specpdl_size} specifies the total number of binding slots
-in the array, and @code{max_specpdl_size} specifies the maximum number
-of bindings the array can be expanded to hold.  @code{grow_specpdl()}
-increases the size of the @code{specpdl} array, multiplying its size by
-2 but never exceeding @code{max_specpdl_size} (except that if this
-number is less than 400, it is first set to 400).
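The growth rule just described can be sketched as a pure function. The name
next_specpdl_size is hypothetical, and returning 0 when the array is already at
its limit is an assumption made for this sketch; the text above does not say
what the real code does in that case.

```c
#include <assert.h>

/* Hypothetical sketch of the sizing rule only; no memory is reallocated.
   Returns the new number of slots, or 0 if the array cannot grow. */
static int
next_specpdl_size (int size, int *max_size)
{
  assert (size > 0);

  /* A limit below 400 is first raised to 400. */
  if (*max_size < 400)
    *max_size = 400;

  /* Already at the limit: assumed here to be reported as failure. */
  if (size >= *max_size)
    return 0;

  /* Double, but never exceed the limit. */
  size *= 2;
  if (size > *max_size)
    size = *max_size;
  return size;
}
```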
-@code{specbind()} binds a symbol to a value and is used for local
-variables and @code{let} forms.  The symbol and its old value (which
-might be @code{Qunbound}, indicating no prior value) are recorded in the
-@code{specpdl} array, and @code{specpdl_ptr} is advanced by one slot.
-@code{record_unwind_protect()} implements an @dfn{unwind-protect},
-which, when placed around a section of code, ensures that some specified
-cleanup routine will be executed even if the code exits abnormally
-(e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
-simply adds a new specbinding to the @code{specpdl} array and stores the
-appropriate information in it.  The cleanup routine can either be a C
-function, which is stored in the @code{func} field, or a @code{progn}
-form, which is stored in the @code{old_value} field.
-@code{unbind_to()} removes specbindings from the @code{specpdl} array
-until the specified position is reached.  Each specbinding can be one of
-three types:
-@enumerate
-@item
-an unwind-protect with a C cleanup function (@code{func} is not 0, and
-@code{old_value} holds an argument to be passed to the function);
-@item
-an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
-is @code{nil}, and @code{old_value} holds the form to be executed with
-@code{Fprogn()}); or
-@item
-a local-variable binding (@code{func} is 0, @code{symbol} is not
-@code{nil}, and @code{old_value} holds the old value, which is stored as
-the symbol's value).
-@end enumerate
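The interplay of specbind(), record_unwind_protect() and unbind_to() can be
sketched with a toy model. Everything here is a simplification under stated
assumptions: Lisp_Object is a plain long, a symbol is a hypothetical struct
toy_symbol holding only a value, a NULL symbol pointer plays the role of nil,
the array has a fixed size, and the Lisp-form kind of unwind-protect is left
out because the sketch has no evaluator.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;                 /* stand-in for the real type */

struct toy_symbol { Lisp_Object value; }; /* hypothetical, value slot only */

struct specbinding
{
  struct toy_symbol *symbol;              /* NULL plays the role of nil */
  Lisp_Object old_value;
  Lisp_Object (*func) (Lisp_Object);      /* for unwind-protect */
};

#define TOY_SPECPDL_SIZE 64
static struct specbinding specpdl[TOY_SPECPDL_SIZE];
static struct specbinding *specpdl_ptr = specpdl;  /* first free slot */

static int
specpdl_depth (void)
{
  return (int) (specpdl_ptr - specpdl);
}

/* Record the symbol's old value, then install the new one. */
static void
specbind (struct toy_symbol *sym, Lisp_Object value)
{
  assert (specpdl_depth () < TOY_SPECPDL_SIZE);
  specpdl_ptr->symbol = sym;
  specpdl_ptr->old_value = sym->value;
  specpdl_ptr->func = NULL;
  specpdl_ptr++;
  sym->value = value;
}

/* Record a C cleanup function and the argument to pass to it. */
static void
record_unwind_protect (Lisp_Object (*func) (Lisp_Object), Lisp_Object arg)
{
  assert (specpdl_depth () < TOY_SPECPDL_SIZE);
  specpdl_ptr->symbol = NULL;
  specpdl_ptr->old_value = arg;
  specpdl_ptr->func = func;
  specpdl_ptr++;
}

/* Pop entries, newest first, until the stack is back at DEPTH. */
static void
unbind_to (int depth)
{
  while (specpdl_depth () > depth)
    {
      --specpdl_ptr;
      if (specpdl_ptr->func)
        specpdl_ptr->func (specpdl_ptr->old_value);          /* C cleanup */
      else if (specpdl_ptr->symbol)
        specpdl_ptr->symbol->value = specpdl_ptr->old_value; /* restore */
    }
}

/* Example cleanup used to observe that unwinding ran. */
static Lisp_Object cleanup_total;
static Lisp_Object
add_to_cleanup_total (Lisp_Object arg)
{
  cleanup_total += arg;
  return 0;
}
```

A caller saves specpdl_depth() before making bindings and passes that value to
unbind_to() afterwards, which undoes everything recorded in between in reverse
order.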
-@node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings
-@section Simple Special Forms
-@cindex special forms, simple
-@code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
-@code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
-@code{let*}, @code{let}, @code{while}
-All of these are very simple and work as expected, calling
-@code{Feval()} or @code{Fprogn()} as necessary and (in the case of
-@code{let} and @code{let*}) using @code{specbind()} to create bindings
-and @code{unbind_to()} to undo the bindings when finished.
-Note that, with the exception of @code{Fprogn}, these functions are
-typically called in real life only in interpreted code, since the byte
-compiler knows how to convert calls to these functions directly into
-byte code.
-@node Catch and Throw,  , Simple Special Forms, Evaluation; Stack Frames; Bindings
-@section Catch and Throw
-@cindex catch and throw
-@cindex throw, catch and
-@example
-struct catchtag
-@{
-Lisp_Object tag;
-Lisp_Object val;
-struct catchtag *next;
-struct gcpro *gcpro;
-jmp_buf jmp;
-struct backtrace *backlist;
-int lisp_eval_depth;
-int pdlcount;
-@};
-@end example
-@code{catch} is a Lisp function that places a catch around a body of
-code.  A catch is a means of non-local exit from the code.  When a catch
-is created, a tag is specified, and executing a @code{throw} to this tag
-will exit from the body of code caught with this tag, and its value will
-be the value given in the call to @code{throw}.  If there is no such
-call, the code will be executed normally.
-Information pertaining to a catch is held in a @code{struct catchtag},
-which is placed at the head of a linked list pointed to by
-@code{catchlist}.  @code{internal_catch()} is passed a C function to
-call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
-give it, and places a catch around the function.  Each @code{struct
-catchtag} is held in the stack frame of the @code{internal_catch()}
-instance that created the catch.
-@code{internal_catch()} is fairly straightforward.  It stores into the
-@code{struct catchtag} the tag name and the current values of
-@code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
-offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
-(storing the jump point into the @code{struct catchtag}), and calls the
-function.  Control will return to @code{internal_catch()} either when
-the function exits normally or through a @code{_longjmp()} to this jump
-point.  In the latter case, @code{throw} will store the value to be
-returned into the @code{struct catchtag} before jumping.  When it's
-done, @code{internal_catch()} removes the @code{struct catchtag} from
-the catchlist and returns the proper value.
-@code{Fthrow()} goes up through the catchlist until it finds one with
-a matching tag.  It then calls @code{unbind_catch()} to restore
-everything to what it was when the appropriate catch was set, stores the
-return value in the @code{struct catchtag}, and jumps (with
-@code{_longjmp()}) to its jump point.
-@code{unbind_catch()} removes all catches from the catchlist until it
-finds the correct one.  Some of the catches might have been placed for
-error-trapping, and if so, the appropriate entries on the handlerlist
-must be removed (see ``errors'').  @code{unbind_catch()} also restores
-the values of @code{gcprolist}, @code{backtrace_list}, and
-@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
-created since the catch.
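The control flow of internal_catch() and a throw can be sketched in portable C.
This is a reduced model, not the XEmacs code: Lisp_Object is a plain long, the
struct keeps only the fields needed for the jump, standard setjmp()/longjmp()
are used instead of the underscore variants, the function toy_throw is
hypothetical, and a throw with no matching catch simply returns (an assumption
made to keep the sketch short).

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef long Lisp_Object;            /* stand-in for the real type */

struct catchtag
{
  Lisp_Object tag;
  volatile Lisp_Object val;          /* written by the thrower before longjmp */
  struct catchtag *next;
  jmp_buf jmp;
};

static struct catchtag *catchlist;   /* innermost catch first */

/* Call FUNC (ARG) with a catch for TAG established around it.  The
   catchtag lives in this function's own stack frame. */
static Lisp_Object
internal_catch (Lisp_Object tag, Lisp_Object (*func) (Lisp_Object),
                Lisp_Object arg)
{
  struct catchtag c;
  c.tag = tag;
  c.val = 0;
  c.next = catchlist;
  catchlist = &c;

  if (setjmp (c.jmp) == 0)
    c.val = func (arg);              /* normal exit */
  /* Otherwise toy_throw() stored c.val and jumped back here. */

  catchlist = c.next;
  return c.val;
}

/* Find the innermost catch for TAG, drop every catch inside it, store
   the value and jump.  Returns only if no catch matches. */
static void
toy_throw (Lisp_Object tag, Lisp_Object val)
{
  struct catchtag *c;
  for (c = catchlist; c; c = c->next)
    if (c->tag == tag)
      {
        catchlist = c;
        c->val = val;
        longjmp (c->jmp, 1);
      }
}

/* Example bodies: one throws past an inner catch, one returns normally. */
static Lisp_Object
throw_double_to_tag_1 (Lisp_Object arg)
{
  toy_throw (1, arg * 2);
  return -1;                         /* not reached when tag 1 is caught */
}

static Lisp_Object
inner_catch_body (Lisp_Object arg)
{
  return internal_catch (2, throw_double_to_tag_1, arg);
}

static Lisp_Object
add_one (Lisp_Object arg)
{
  return arg + 1;
}
```

The sketch leaves out the restoration of the backtrace list, evaluation depth,
GC protection list and specpdl offset, which the real code saves in the
catchtag and restores on a throw.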
-@node Symbols and Variables, Buffers, Evaluation; Stack Frames; Bindings, Top
-@chapter Symbols and Variables
-@cindex symbols and variables
-@cindex variables, symbols and
-@menu
-* Introduction to Symbols::
-* Obarrays::
-* Symbol Values::
-@end menu
-@node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables
-@section Introduction to Symbols
-@cindex symbols, introduction to
-A symbol is basically just an object with four fields: a name (a
-string), a value (some Lisp object), a function (some Lisp object), and
-a property list (usually a list of alternating keyword/value pairs).
-What makes symbols special is that there is usually only one symbol with
-a given name, and the symbol is referred to by name.  This makes a
-symbol a convenient way of calling up data by name, i.e. of implementing
-variables. (The variable's value is stored in the @dfn{value slot}.)
-Similarly, functions are referenced by name, and the definition of the
-function is stored in a symbol's @dfn{function slot}.  This means that
-there can be a distinct function and variable with the same name.  The
-property list is used as a more general mechanism of associating
-additional values with particular names, and once again the namespace is
-independent of the function and variable namespaces.
-@node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables
-@section Obarrays
-@cindex obarrays
-The identity of symbols with their names is accomplished through a
-structure called an obarray, which is just a poorly-implemented hash
-table mapping from strings to symbols whose name is that string. (I say
-``poorly implemented'' because an obarray appears in Lisp as a vector
-with some hidden fields rather than as its own opaque type.  This is an
-Emacs Lisp artifact that should be fixed.)
-Obarrays are implemented as a vector of some fixed size (which should
-be a prime for best results), where each ``bucket'' of the vector
-contains one or more symbols, threaded through a hidden @code{next}
-field in the symbol.  Lookup of a symbol in an obarray, and adding a
-symbol to an obarray, is accomplished through standard hash-table
-techniques.
-The standard Lisp function for working with symbols and obarrays is
-@code{intern}.  This looks up a symbol in an obarray given its name; if
-it's not found, a new symbol is automatically created with the specified
-name, added to the obarray, and returned.  This is what happens when the
-Lisp reader encounters a symbol (or more precisely, encounters the name
-of a symbol) in some text that it is reading.  There is a standard
-obarray called @code{obarray} that is used for this purpose, although
-the Lisp programmer is free to create his own obarrays and @code{intern}
-symbols in them.
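The bucket-and-chain layout and the behaviour of intern and intern-soft can be
sketched as follows. This is a toy model under stated assumptions: the types
toy_symbol and toy_obarray, the function names, the bucket count and the hash
function are all hypothetical, and the obarray is a plain C struct rather than
a Lisp vector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_symbol
{
  char *name;
  struct toy_symbol *next;     /* the hidden chain through one bucket */
};

#define TOY_OBARRAY_SIZE 509   /* a prime, as the text recommends */

struct toy_obarray
{
  struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

/* Any reasonable string hash will do for the sketch. */
static unsigned long
toy_hash (const char *name)
{
  unsigned long h = 5381;
  while (*name)
    h = h * 33 + (unsigned char) *name++;
  return h;
}

/* Like intern-soft: look the name up, but never create a symbol. */
static struct toy_symbol *
toy_intern_soft (struct toy_obarray *ob, const char *name)
{
  struct toy_symbol *s = ob->buckets[toy_hash (name) % TOY_OBARRAY_SIZE];
  for (; s; s = s->next)
    if (strcmp (s->name, name) == 0)
      return s;
  return NULL;
}

/* Like intern: return the existing symbol, or create and add a new one.
   Returns NULL only if memory allocation fails. */
static struct toy_symbol *
toy_intern (struct toy_obarray *ob, const char *name)
{
  struct toy_symbol **bucket;
  struct toy_symbol *s = toy_intern_soft (ob, name);
  if (s)
    return s;

  s = malloc (sizeof *s);
  if (!s)
    return NULL;
  s->name = malloc (strlen (name) + 1);
  if (!s->name)
    {
      free (s);
      return NULL;
    }
  strcpy (s->name, name);

  bucket = &ob->buckets[toy_hash (name) % TOY_OBARRAY_SIZE];
  s->next = *bucket;
  *bucket = s;
  return s;
}
```

Interning the same name twice yields the same pointer, which is what gives
symbols their identity-by-name within one obarray.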
-Note that, once a symbol is in an obarray, it stays there until
-something is done about it, and the standard obarray @code{obarray}
-always stays around, so once you use any particular variable name, a
-corresponding symbol will stay around in @code{obarray} until you exit
-XEmacs.
-Note that @code{obarray} itself is a variable, and as such there is a
-symbol in @code{obarray} whose name is @code{"obarray"} and which
-contains @code{obarray} as its value.
-Note also that this call to @code{intern} occurs only when in the Lisp
-reader, not when the code is executed (at which point the symbol is
-already around, stored as such in the definition of the function).
-You can create your own obarray using @code{make-vector} (this is
-horrible but is an artifact) and intern symbols into that obarray.
-Doing that will result in two or more symbols with the same name.
-However, at most one of these symbols is in the standard @code{obarray}:
-You cannot have two symbols of the same name in any particular obarray.
-Note that you cannot add a symbol to an obarray in any fashion other
-than using @code{intern}: i.e. you can't take an existing symbol and put
-it in an existing obarray.  Nor can you change the name of an existing
-symbol. (Since obarrays are vectors, you can violate the consistency of
-things by storing directly into the vector, but let's ignore that
-possibility.)
-Usually symbols are created by @code{intern}, but if you really want,
-you can explicitly create a symbol using @code{make-symbol}, giving it
-some name.  The resulting symbol is not in any obarray (i.e. it is
-@dfn{uninterned}), and you can't add it to any obarray.  Therefore its
-primary purpose is as a symbol to use in macros to avoid namespace
-pollution.  It can also be used as a carrier of information, but cons
-cells could probably be used just as well.
-You can also use @code{intern-soft} to look up a symbol but not create
-a new one, and @code{unintern} to remove a symbol from an obarray.  This
-returns the removed symbol. (Remember: You can't put the symbol back
-into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
-in an obarray.
-@node Symbol Values,  , Obarrays, Symbols and Variables
-@section Symbol Values
-@cindex symbol values
-@cindex values, symbol
-The value field of a symbol normally contains a Lisp object.  However,
-a symbol can be @dfn{unbound}, meaning that it logically has no value.
-This is internally indicated by storing a special Lisp object, called
-@dfn{the unbound marker} and stored in the global variable
-@code{Qunbound}.  The unbound marker is of a special Lisp object type
-called @dfn{symbol-value-magic}.  It is impossible for the Lisp
-programmer to directly create or access any object of this type.
-@strong{You must not let any ``symbol-value-magic'' object escape to
-the Lisp level.}  Printing any of these objects will cause the message
-@samp{INTERNAL EMACS BUG} to appear as part of the print representation.
-(You may see this normally when you call @code{debug_print()} from the
-debugger on a Lisp object.) If you let one of these objects escape to
-the Lisp level, you will violate a number of assumptions contained in
-the C code and make the unbound marker not function right.
-When a symbol is created, its value field (and function field) are set
-to @code{Qunbound}.  The Lisp programmer can restore these conditions
-later using @code{makunbound} or @code{fmakunbound}, and can query to
-see whether the value or function fields are @dfn{bound} (i.e. have a
-value other than @code{Qunbound}) using @code{boundp} and
-@code{fboundp}.  The fields are set to a normal Lisp object using
-@code{set} (or @code{setq}) and @code{fset}.
-Other symbol-value-magic objects are used as special markers to
-indicate variables that have non-normal properties.  This includes any
-variables that are tied into C variables (setting the variable magically
-sets some global variable in the C code, and likewise for retrieving the
-variable's value), variables that magically tie into slots in the
-current buffer, variables that are buffer-local, etc.  The
-symbol-value-magic object is stored in the value cell in place of
-a normal object, and the code to retrieve a symbol's value
-(i.e. @code{symbol-value}) knows how to do special things with them.
-This means that you should not just fetch the value cell directly if you
-want a symbol's value.
-The exact workings of this are rather complex and involved and are
-well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
-@file{lisp.h}.
-@node Buffers, Text, Symbols and Variables, Top
-@chapter Buffers
-@cindex buffers
-@menu
-* Introduction to Buffers::     A buffer holds a block of text such as a file.
-* Buffer Lists::                Keeping track of all buffers.
-* Markers and Extents::         Tagging locations within a buffer.
-* The Buffer Object::           The Lisp object corresponding to a buffer.
-@end menu
-@node Introduction to Buffers, Buffer Lists, Buffers, Buffers
-@section Introduction to Buffers
-@cindex buffers, introduction to
-A buffer is logically just a Lisp object that holds some text.
-In this, it is like a string, but a buffer is optimized for
-frequent insertion and deletion, while a string is not.  Furthermore:
-@enumerate
-@item
-Buffers are @dfn{permanent} objects, i.e. once you create them, they
-remain around, and need to be explicitly deleted before they go away.
-@item
-Each buffer has a unique name, which is a string.  Buffers are
-normally referred to by name.  In this respect, they are like
-symbols.
-@item
-Buffers have a default insertion position, called @dfn{point}.
-Inserting text (unless you explicitly give a position) goes at point,
-and moves point forward past the text.  This is what is going on when
-you type text into Emacs.
-@item
-Buffers have lots of extra properties associated with them.
-@item
-Buffers can be @dfn{displayed}.  What this means is that there
-exist a number of @dfn{windows}, which are objects that correspond
-to some visible section of your display, and each window has
-an associated buffer, and the current contents of the buffer
-are shown in that section of the display.  The redisplay mechanism
-(which takes care of doing this) knows how to look at the
-text of a buffer and come up with some reasonable way of displaying
-this.  Many of the properties of a buffer control how the
-buffer's text is displayed.
-@item
-One buffer is distinguished and called the @dfn{current buffer}.  It is
-stored in the variable @code{current_buffer}.  Buffer operations operate
-on this buffer by default.  When you are typing text into a buffer, the
-buffer you are typing into is always @code{current_buffer}.  Switching
-to a different window changes the current buffer.  Note that Lisp code
-can temporarily change the current buffer using @code{set-buffer} (often
-enclosed in a @code{save-excursion} so that the former current buffer
-gets restored when the code is finished).  However, calling
-@code{set-buffer} will NOT cause a permanent change in the current
-buffer.  The reason for this is that the top-level event loop sets
-@code{current_buffer} to the buffer of the selected window, each time
-it finishes executing a user command.
-@end enumerate
-Make sure you understand the distinction between @dfn{current buffer}
-and @dfn{buffer of the selected window}, and the distinction between
-@dfn{point} of the current buffer and @dfn{window-point} of the selected
-window. (This latter distinction is explained in detail in the section
-on windows.)
-@node Buffer Lists, Markers and Extents, Introduction to Buffers, Buffers
-@section Buffer Lists
-@cindex buffer lists
-Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
-they remain around until explicitly deleted.  This entails that there is
-a list of all the buffers in existence.  This list is actually an
-assoc-list (mapping from the buffer's name to the buffer) and is stored
-in the global variable @code{Vbuffer_alist}.
-The order of the buffers in the list is important: the buffers are
-ordered approximately from most-recently-used to least-recently-used.
-Switching to a buffer using @code{switch-to-buffer},
-@code{pop-to-buffer}, etc. and switching windows using
-@code{other-window}, etc.  usually brings the new current buffer to the
-front of the list.  @code{switch-to-buffer}, @code{other-buffer},
-etc. look at the beginning of the list to find an alternative buffer to
-suggest.  You can also explicitly move a buffer to the end of the list
-using @code{bury-buffer}.
-In addition to the global ordering in @code{Vbuffer_alist}, each frame
-has its own ordering of the list.  These lists always contain the same
-elements as in @code{Vbuffer_alist} although possibly in a different
-order.  @code{buffer-list} normally returns the list for the selected
-frame.  This allows you to work in separate frames without things
-interfering with each other.
-The standard way to look up a buffer given a name is
-@code{get-buffer}, and the standard way to create a new buffer is
-@code{get-buffer-create}, which looks up a buffer with a given name,
-creating a new one if necessary.  These operations correspond exactly
-with the symbol operations @code{intern-soft} and @code{intern},
-respectively.  You can also force a new buffer to be created using
-@code{generate-new-buffer}, which takes a name and (if necessary) makes
-a unique name from this by appending a number, and then creates the
-buffer.  This is basically like the symbol operation @code{gensym}.
-@node Markers and Extents, The Buffer Object, Buffer Lists, Buffers
-@section Markers and Extents
-@cindex markers and extents
-@cindex extents, markers and
-Among the things associated with a buffer are things that are
-logically attached to certain buffer positions.  This can be used to
-keep track of a buffer position when text is inserted and deleted, so
-that it remains at the same spot relative to the text around it; to
-assign properties to particular sections of text; etc.  There are two
-such objects that are useful in this regard: they are @dfn{markers} and
-@dfn{extents}.
-A @dfn{marker} is simply a flag placed at a particular buffer
-position, which is moved around as text is inserted and deleted.
-Markers are used for all sorts of purposes, such as the @code{mark} that
-is the other end of textual regions to be cut, copied, etc.
-An @dfn{extent} is similar to two markers plus some associated
-properties, and is used to keep track of regions in a buffer as text is
-inserted and deleted, and to add properties (e.g. fonts) to particular
-regions of text.  The external interface of extents is explained
-elsewhere.
-The important thing here is that markers and extents simply contain
-buffer positions in them as integers, and every time text is inserted or
-deleted, these positions must be updated.  In order to minimize the
-amount of shuffling that needs to be done, the positions in markers and
-extents (there's one per marker, two per extent) are stored in Membpos's.
-This means that they only need to be moved when the text is physically
-moved in memory; since the gap structure tries to minimize this, it also
-minimizes the number of marker and extent indices that need to be
-adjusted.  Look in @file{insdel.c} for the details of how this works.
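Why storing memory indices saves work can be seen in a small model of a gap
buffer. The sketch is hypothetical throughout: positions are zero-based plain
longs, the struct and function names are invented for illustration, and a
position equal to the gap start is treated as lying after the gap. The real
conversions are in @file{insdel.c} and the buffer headers.

```c
#include <assert.h>

/* Toy gap buffer bookkeeping; the text itself is not stored. */
struct toy_gap_buffer
{
  long gap_start;   /* memory index where the gap begins */
  long gap_size;    /* number of unused slots in the gap */
};

/* Buffer position to memory index: positions at or after the gap are
   shifted by the size of the gap. */
static long
toy_bufpos_to_membpos (const struct toy_gap_buffer *b, long bufpos)
{
  return bufpos < b->gap_start ? bufpos : bufpos + b->gap_size;
}

/* Memory index to buffer position; the index must not lie in the gap. */
static long
toy_membpos_to_bufpos (const struct toy_gap_buffer *b, long membpos)
{
  assert (membpos < b->gap_start
          || membpos >= b->gap_start + b->gap_size);
  return membpos < b->gap_start ? membpos : membpos - b->gap_size;
}

/* Insert COUNT characters at the gap: the gap shrinks from the front
   and no text after it moves in memory. */
static void
toy_insert_at_gap (struct toy_gap_buffer *b, long count)
{
  assert (count >= 0 && count <= b->gap_size);
  b->gap_start += count;
  b->gap_size -= count;
}
```

A marker that stores a memory index after the gap keeps that index across an
insertion at the gap, yet converting it back gives a buffer position that has
advanced by the inserted length, so no per-marker update is needed until the
gap itself moves.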
-One other important distinction is that markers are @dfn{temporary}
-while extents are @dfn{permanent}.  This means that markers disappear as
-soon as there are no more pointers to them, and correspondingly, there
-is no way to determine what markers are in a buffer if you are just
-given the buffer.  Extents remain in a buffer until they are detached
-(which could happen as a result of text being deleted) or the buffer is
-deleted, and primitives do exist to enumerate the extents in a buffer.
-@node The Buffer Object,  , Markers and Extents, Buffers
-@section The Buffer Object
-@cindex buffer object, the
-@cindex object, the buffer
-Buffers contain fields not directly accessible by the Lisp programmer.
-We describe them here, naming them by the names used in the C code.
-Many are accessible indirectly in Lisp programs via Lisp primitives.
-@table @code
-@item name
-The buffer name is a string that names the buffer.  It is guaranteed to
-be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
-Manual}.
-@item save_modified
-This field contains the time when the buffer was last saved, as an
-integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
-Manual}.
-@item modtime
-This field contains the modification time of the visited file.  It is
-set when the file is written or read.  Every time the buffer is written
-to the file, this field is compared to the modification time of the
-file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
-Manual}.
-@item auto_save_modified
-This field contains the time when the buffer was last auto-saved.
-@item last_window_start
-This field contains the @code{window-start} position in the buffer as of
-the last time the buffer was displayed in a window.
-@item undo_list
-This field points to the buffer's undo list.  @xref{Undo,,, lispref,
-XEmacs Lisp Reference Manual}.
-@item syntax_table_v
-This field contains the syntax table for the buffer.  @xref{Syntax
-Tables,,, lispref, XEmacs Lisp Reference Manual}.
-@item downcase_table
-This field contains the conversion table for converting text to lower
-case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
-@item upcase_table
-This field contains the conversion table for converting text to upper
-case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
-@item case_canon_table
-This field contains the conversion table for canonicalizing text for
-case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
-Reference Manual}.
-@item case_eqv_table
-This field contains the equivalence table for case-folding search.
-@xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
-@item display_table
-This field contains the buffer's display table, or @code{nil} if it
-doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
-Reference Manual}.
-@item markers
-This field contains the chain of all markers that currently point into
-the buffer.  Deletion of text in the buffer, and motion of the buffer's
-gap, must check each of these markers and perhaps update it.
-@xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
-@item backed_up
-This field is a flag that tells whether a backup file has been made for
-the visited file of this buffer.
-@item mark
-This field contains the mark for the buffer.  The mark is a marker,
-hence it is also included on the list @code{markers}.  @xref{The Mark,,,
-lispref, XEmacs Lisp Reference Manual}.
-@item mark_active
-This field is non-@code{nil} if the buffer's mark is active.
-@item local_var_alist
-This field contains the association list describing the variables local
-in this buffer, and their values, with the exception of local variables
-that have special slots in the buffer object.  (Those slots are omitted
-from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
-Reference Manual}.
-@item modeline_format
-This field contains a Lisp object which controls how to display the mode
-line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
-Reference Manual}.
-@item base_buffer
-This field holds the buffer's base buffer (if it is an indirect buffer),
-or @code{nil}.
-@end table
-@node Text, Multilingual Support, Buffers, Top
-@chapter Text
-@cindex text
-@menu
-* The Text in a Buffer::        Representation of the text in a buffer.
-* Ibytes and Ichars::           Representation of individual characters.
-* Byte-Char Position Conversion::
-* Searching and Matching::      Higher-level algorithms.
-@end menu
-@node The Text in a Buffer, Ibytes and Ichars, Text, Text
-@section The Text in a Buffer
-@cindex text in a buffer, the
-@cindex buffer, the text in a
-The text in a buffer consists of a sequence of zero or more
-characters.  A @dfn{character} is an integer that logically represents
-a letter, number, space, or other unit of text.  Most of the characters
-that you will typically encounter belong to the ASCII set of characters,
-but there are also characters for various sorts of accented letters,
-special symbols, Chinese and Japanese characters (e.g. Kanji, Katakana,
-etc.), Cyrillic and Greek letters, etc.  The actual number of possible
-characters is quite large.
-For now, we can view a character as some non-negative integer that
-has some shape that defines how it typically appears (e.g. as an
-uppercase A). (The exact way in which a character appears depends on the
-font used to display the character.) The internal type of characters in
-the C code is an @code{Ichar}; this is just an @code{int}, but using a
-symbolic type makes the code clearer.
-Between every character in a buffer is a @dfn{buffer position} or
-@dfn{character position}.  We can speak of the character before or after
-a particular buffer position, and when you insert a character at a
-particular position, all characters after that position end up at new
-positions.  When we speak of the character @dfn{at} a position, we
-really mean the character after the position.  (This schizophrenia
-between a buffer position being ``between'' two characters and ``on'' a
-character is rampant in Emacs.)
-Buffer positions are numbered starting at 1.  This means that
-position 1 is before the first character, and position 0 is not
-valid.  If there are N characters in a buffer, then buffer
-position N+1 is after the last one, and position N+2 is not valid.
-The internal makeup of the Ichar integer varies depending on whether
-we have compiled with MULE support.  If not, the Ichar integer is an
-8-bit integer with possible values from 0 - 255.  0 - 127 are the
-standard ASCII characters, while 128 - 255 are the characters from the
-ISO-8859-1 character set.  If we have compiled with MULE support, an
-Ichar is a 19-bit integer, with the various bits having meanings
-according to a complex scheme that will be detailed later.  The
-characters numbered 0 - 255 still have the same meanings as for the
-non-MULE case, though.
-Internally, the text in a buffer is represented in a fairly simple
-fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
-in the middle.  Although the gap is of some substantial size in bytes,
-there is no text contained within it: From the perspective of the text
-in the buffer, it does not exist.  The gap logically sits at some buffer
-position, between two characters (or possibly at the beginning or end of
-the buffer).  Insertion of text in a buffer at a particular position is
-always accomplished by first moving the gap to that position
-(i.e. through some block moving of text), then writing the text into the
-beginning of the gap, thereby shrinking the gap.  If the gap shrinks
-down to nothing, a new gap is created. (What actually happens is that a
-new gap is ``created'' at the end of the buffer's text, which requires
-nothing more than changing a couple of indices; then the gap is
-``moved'' to the position where the insertion needs to take place by
-moving up in memory all the text after that position.)  Similarly,
-deletion occurs by moving the gap to the place where the text is to be
-deleted, and then simply expanding the gap to include the deleted text.
-(@dfn{Expanding} and @dfn{shrinking} the gap as just described means
-just that the internal indices that keep track of where the gap is
-located are changed.)
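The gap mechanics described above can be sketched in a few lines of C.  This is only an illustration under stated assumptions: the @code{struct gapbuf} type, its field names, and the @code{gapbuf_*} functions are hypothetical and are not the names used in the XEmacs sources, offsets are 0-based for brevity, and the growth policy (64 spare bytes) is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer; names are illustrative, not XEmacs's own. */
struct gapbuf
{
  unsigned char *mem;   /* text before the gap, the gap, text after it */
  size_t gap_start;     /* 0-based offset of the first gap byte */
  size_t gap_size;      /* number of bytes in the gap */
  size_t text_len;      /* bytes of real text, gap excluded */
};

static void
gapbuf_init (struct gapbuf *g, size_t capacity)
{
  g->mem = malloc (capacity);
  assert (g->mem != NULL);
  g->gap_start = 0;
  g->gap_size = capacity;
  g->text_len = 0;
}

/* Move the gap so that it starts at 0-based text offset POS. */
static void
gapbuf_move_gap (struct gapbuf *g, size_t pos)
{
  assert (pos <= g->text_len);
  if (pos < g->gap_start)
    /* Slide the text between POS and the gap to the far side of the gap. */
    memmove (g->mem + pos + g->gap_size, g->mem + pos, g->gap_start - pos);
  else if (pos > g->gap_start)
    /* Slide the text between the gap and POS down to the gap start. */
    memmove (g->mem + g->gap_start, g->mem + g->gap_start + g->gap_size,
             pos - g->gap_start);
  g->gap_start = pos;
}

/* Insert LEN bytes at POS: move the gap, then write into its beginning. */
static void
gapbuf_insert (struct gapbuf *g, size_t pos, const char *s, size_t len)
{
  if (len > g->gap_size)
    {
      /* Gap too small: put it at the end of the text and enlarge the block. */
      size_t extra = len + 64;
      gapbuf_move_gap (g, g->text_len);
      g->mem = realloc (g->mem, g->text_len + g->gap_size + extra);
      assert (g->mem != NULL);
      g->gap_size += extra;
    }
  gapbuf_move_gap (g, pos);
  memcpy (g->mem + g->gap_start, s, len);
  g->gap_start += len;          /* the gap shrinks from its front */
  g->gap_size -= len;
  g->text_len += len;
}

/* Delete LEN bytes starting at POS: the gap simply swallows them. */
static void
gapbuf_delete (struct gapbuf *g, size_t pos, size_t len)
{
  assert (pos + len <= g->text_len);
  gapbuf_move_gap (g, pos);
  g->gap_size += len;           /* only the indices change */
  g->text_len -= len;
}

/* Copy the text, skipping the gap, into OUT as a NUL-terminated string. */
static void
gapbuf_copy_text (const struct gapbuf *g, char *out)
{
  memcpy (out, g->mem, g->gap_start);
  memcpy (out + g->gap_start, g->mem + g->gap_start + g->gap_size,
          g->text_len - g->gap_start);
  out[g->text_len] = '\0';
}
```

Note that @code{gapbuf_delete} never touches the text bytes once the gap is in place, which is why deletion is cheap and why the allocated block never shrinks.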
-Note that the total amount of memory allocated for a buffer text never
-decreases while the buffer is live.  Therefore, if you load up a
-20-megabyte file and then delete all but one character, there will be a
-20-megabyte gap, which won't get any smaller (except by inserting
-characters back again).  Once the buffer is killed, the memory allocated
-for the buffer text will be freed, but it will still be sitting on the
-heap, taking up virtual memory, and will not be released back to the
-operating system. (However, if you have compiled XEmacs with rel-alloc,
-the situation is different.  In this case, the space @emph{will} be
-released back to the operating system.  However, this tends to result in a
-noticeable speed penalty.)
-Astute readers may notice that the text in a buffer is represented as
-an array of @emph{bytes}, while (at least in the MULE case) an Ichar is
-a 19-bit integer, which clearly cannot fit in a byte.  This means (of
-course) that the text in a buffer uses a different representation from
-an Ichar: specifically, the 19-bit Ichar becomes a series of one to
-four bytes.  The conversion between these two representations is complex
-and will be described later.
-In the non-MULE case, everything is very simple: An Ichar
-is an 8-bit value, which fits neatly into one byte.
-If we are given a buffer position and want to retrieve the
-character at that position, we need to follow these steps:
-@enumerate
-@item
-Pretend there's no gap, and convert the buffer position into a @dfn{byte
-index} that indexes to the appropriate byte in the buffer's stream of
-textual bytes.  By convention, byte indices begin at 1, just like buffer
-positions.  In the non-MULE case, byte indices and buffer positions are
-identical, since one character equals one byte.
-@item
-Convert the byte index into a @dfn{memory index}, which takes the gap
-into account.  The memory index is a direct index into the block of
-memory that stores the text of a buffer.  This basically just involves
-checking to see if the byte index is past the gap, and if so, adding the
-size of the gap to it.  By convention, memory indices begin at 1, just
-like buffer positions and byte indices, and when referring to the
-position that is @dfn{at} the gap, we always use the memory position at
-the @emph{beginning}, not at the end, of the gap.
-@item
-Fetch the appropriate bytes at the determined memory position.
-@item
-Convert these bytes into an Ichar.
-@end enumerate
-In the non-Mule case, (3) and (4) boil down to a simple one-byte
-memory access.
-Note that we have defined three types of positions in a buffer:
-@enumerate
-@item
-@dfn{buffer positions} or @dfn{character positions}, typedef @code{Charbpos}
-@item
-@dfn{byte indices}, typedef @code{Bytebpos}
-@item
-@dfn{memory indices}, typedef @code{Membpos}
-@end enumerate
-All three typedefs are just @code{int}s, but defining them this way makes
-things a lot clearer.
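For the non-MULE case, the four retrieval steps and the three position types can be sketched as follows.  The typedef names come from the text above; @code{struct text_view} and the @code{*_sketch} functions are hypothetical and exist only for this illustration.  The sketch assumes the conventions just described: all indices are 1-based, and a position lying exactly at the gap maps to the memory index at the beginning of the gap, while the character after that position is stored beyond the gap.

```c
#include <assert.h>

typedef int Charbpos;           /* buffer (character) position, 1-based */
typedef int Bytebpos;           /* byte index, 1-based, gap ignored */
typedef int Membpos;            /* memory index, 1-based, gap included */
typedef unsigned char Ibyte;
typedef int Ichar;

/* Hypothetical, minimal view of a buffer's text. */
struct text_view
{
  const Ibyte *mem;             /* block of memory, gap included */
  Bytebpos gap_pos;             /* byte index at which the gap sits */
  int gap_size;                 /* size of the gap in bytes */
  int nbytes;                   /* bytes of text, gap excluded */
};

/* Step 2 for a position: only indices past the gap are shifted, so the
   position at the gap maps to the beginning of the gap. */
static Membpos
bytebpos_to_membpos_sketch (const struct text_view *t, Bytebpos bi)
{
  return bi > t->gap_pos ? bi + t->gap_size : bi;
}

/* Steps 1 to 4 for fetching the character "at" (i.e. after) POS. */
static Ichar
fetch_char_sketch (const struct text_view *t, Charbpos pos)
{
  Bytebpos bi = pos;            /* step 1: one character is one byte */
  Membpos mi;

  assert (pos >= 1 && pos <= t->nbytes);
  /* Step 2: the character after a position at the gap lives beyond it. */
  mi = bi >= t->gap_pos ? bi + t->gap_size : bi;
  /* Steps 3 and 4: a single byte access; C arrays are 0-based. */
  return t->mem[mi - 1];
}
```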
-Most code works with buffer positions.  In particular, all Lisp code
-that refers to text in a buffer uses buffer positions.  Lisp code does
-not know that byte indices or memory indices exist.
-Finally, we have a typedef for the bytes in a buffer.  This is an
-@code{Ibyte}, which is an unsigned char.  Referring to them as
-Ibytes underscores the fact that we are working with a string of bytes
-in the internal Emacs buffer representation rather than in one of a
-number of possible alternative representations (e.g. EUC-encoded text,
-etc.).
-@node Ibytes and Ichars, Byte-Char Position Conversion, The Text in a Buffer, Text
-@section Ibytes and Ichars
-@cindex Ibytes and Ichars
-@cindex Ichars, Ibytes and
-Not yet documented.
-@node Byte-Char Position Conversion, Searching and Matching, Ibytes and Ichars, Text
-@section Byte-Char Position Conversion
-@cindex byte-char position conversion
-@cindex position conversion, byte-char
-@cindex conversion, byte-char position
-Oct 2004:
-This is what I wrote when describing the previous algorithm:
-@quotation
-The basic algorithm we use is to keep track of a known region of
-characters in each buffer, all of which are of the same width.  We keep
-track of the boundaries of the region in both Charbpos and Bytebpos
-coordinates and also keep track of the char width, which is 1 - 4 bytes.
-If the position we're translating is not in the known region, then we
-invoke a function to update the known region to surround the position in
-question.  This assumes locality of reference, which is usually the
-case.
-Note that the function to update the known region can be simple or
-complicated depending on how much information we cache.  In addition to
-the known region, we always cache the correct conversions for point,
-BEGV, and ZV, and in addition to this we cache 16 positions where the
-conversion is known.  We only look in the cache or update it when we
-need to move the known region more than a certain amount (currently 50
-chars), and then we throw away a "random" value and replace it with the
-newly calculated value.
-Finally, we maintain an extra flag that tracks whether the buffer is
-entirely ASCII, to speed up the conversions even more.  This flag is
-actually of dubious value because in an entirely-ASCII buffer the known
-region will always span the entire buffer (in fact, we update the flag
-based on this fact), and so all we're saving is a few machine cycles.
-A potentially smarter method than what we do with known regions and
-cached positions would be to keep some sort of pseudo-extent layer over
-the buffer; maybe keep track of the charbpos/bytebpos correspondence at
-the beginning of each line, which would allow us to do a binary search
-over the pseudo-extents to narrow things down to the correct line, at
-which point you could use a linear movement method.  This would also
-mesh well with efficiently implementing a line-numbering scheme.
-However, you have to weigh the amount of time spent updating the cache
-vs. the savings that result from it.  In reality, we modify the buffer
-far less often than we access it, so a cache of this sort that provides
-guaranteed LOG (N) performance (or perhaps N * LOG (N), if we set a
-maximum on the cache size) would indeed be a win, particularly in very
-large buffers.  If we ever implement this, we should probably set a
-reasonably high minimum below which we use the old method, because the
-time spent updating the fancy cache would likely become dominant when
-making buffer modifications in smaller buffers.
-Note also that we have to multiply or divide by the char width in order
-to convert the positions.  We do some tricks to avoid ever actually
-having to do a multiply or divide, because that is typically an
-expensive operation (esp. divide).  Multiplying or dividing by 1, 2, or
-4 can be implemented simply as a shift left or shift right, and we keep
-track of a shifter value (0, 1, or 2) indicating how much to shift.
-Multiplying by 3 can be implemented by doubling and then adding the
-original value.  Dividing by 3, alas, cannot be implemented in any
-simple shift/subtract method, as far as I know; so we just do a table
-lookup.  For simplicity, we use a table of size 128K, which indexes the
-"divide-by-3" values for the first 64K non-negative numbers. (Note that
-we can increase the size up to 384K, i.e. indexing the first 192K
-non-negative numbers, while still using shorts in the array.) This also
-means that the size of the known region can be at most 64K for
-width-three characters.
-@end quotation
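The multiply and divide tricks mentioned in the quoted description can be sketched as below.  This is not the removed implementation: the function and table names are hypothetical, and a @code{switch} on the width stands in for the stored shifter value.  The table holds 64K @code{unsigned short} entries (128K bytes), matching the size given in the quotation.

```c
#include <assert.h>

#define DIV3_TABLE_SIZE 65536

/* Hypothetical lookup table: div3_table[n] == n / 3 for n < 64K. */
static unsigned short div3_table[DIV3_TABLE_SIZE];

static void
init_div3_table (void)
{
  unsigned int n, q = 0, r = 0;
  /* Fill incrementally so that not even the setup needs a division. */
  for (n = 0; n < DIV3_TABLE_SIZE; n++)
    {
      div3_table[n] = (unsigned short) q;
      if (++r == 3)
        {
          r = 0;
          q++;
        }
    }
}

/* Chars -> bytes for a run of characters that are all WIDTH bytes wide. */
static unsigned int
chars_to_bytes_sketch (unsigned int nchars, int width)
{
  switch (width)
    {
    case 1: return nchars;
    case 2: return nchars << 1;
    case 3: return (nchars << 1) + nchars;      /* double, then add */
    default:
      assert (width == 4);
      return nchars << 2;
    }
}

/* Bytes -> chars for the same kind of run; width 3 uses the table. */
static unsigned int
bytes_to_chars_sketch (unsigned int nbytes, int width)
{
  switch (width)
    {
    case 1: return nbytes;
    case 2: return nbytes >> 1;
    case 3:
      assert (nbytes < DIV3_TABLE_SIZE);        /* limit on the region size */
      return div3_table[nbytes];
    default:
      assert (width == 4);
      return nbytes >> 2;
    }
}
```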
-Unfortunately, it turned out that the implementation had serious problems
-which had never been corrected.  In particular, the known region had a
-large tendency to become zero-length and stay that way.
-So I decided to port the algorithm from FSF 21.3, in @file{marker.c}.
-This algorithm is fairly simple.  Instead of using markers I kept the cache
-array of known positions from the previous implementation.
-Basically, we keep a number of positions cached:
-@itemize @bullet
-@item
-the actual end of the buffer
-@item
-the beginning and end of the accessible region
-@item
-the value of point
-@item
-the position of the gap
-@item
-the last value we computed
-@item
-a set of positions that are "far away" from previously computed positions
-(5000 chars currently; #### perhaps should be smaller)
-@end itemize
-For each position, we @code{CONSIDER()} it.  This means:
-@itemize @bullet
-@item
-If the position is what we're looking for, return it directly.
-@item
-Starting with the beginning and end of the buffer, we successively
-compute the smallest enclosing range of known positions.  If at any
-point we discover that this range has the same byte and char length
-(i.e. is entirely single-byte), then our computation is trivial.
-@item
-If at any point we get a small enough range (50 chars currently),
-stop considering further positions.
-@end itemize
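The narrowing performed while considering each cached position can be sketched as follows, for the character-to-byte direction.  All names here (@code{struct known_pos}, @code{struct narrowing}, @code{consider}, the result enumeration) are hypothetical and chosen for the illustration; the real code is a macro operating on local variables.  The threshold of 50 characters is the value quoted above, and treating it as a strict bound on the width of the enclosing range is an assumption of the sketch.

```c
#include <assert.h>

/* A known correspondence between a character and a byte position. */
struct known_pos
{
  long charpos;
  long bytepos;
};

/* Hypothetical state for one character -> byte conversion. */
struct narrowing
{
  long target;                  /* character position being converted */
  struct known_pos below;       /* closest known position <= target */
  struct known_pos above;       /* closest known position >= target */
};

enum consider_result { KEEP_GOING, EXACT, SINGLE_BYTE_RANGE, CLOSE_ENOUGH };

#define CLOSE_ENOUGH_CHARS 50

/* Take one cached position K into account and report what we learned. */
static enum consider_result
consider (struct narrowing *n, struct known_pos k)
{
  assert (n->below.charpos <= n->target && n->target <= n->above.charpos);

  if (k.charpos == n->target)
    {
      n->below = n->above = k;  /* exactly what we were looking for */
      return EXACT;
    }
  if (k.charpos < n->target)
    {
      if (k.charpos > n->below.charpos)
        n->below = k;           /* tighter lower bound */
    }
  else if (k.charpos < n->above.charpos)
    n->above = k;               /* tighter upper bound */

  /* Same length in bytes and in chars: everything here is single-byte. */
  if (n->above.bytepos - n->below.bytepos
      == n->above.charpos - n->below.charpos)
    return SINGLE_BYTE_RANGE;
  if (n->above.charpos - n->below.charpos < CLOSE_ENOUGH_CHARS)
    return CLOSE_ENOUGH;        /* stop considering, scan directly */
  return KEEP_GOING;
}

/* Valid after EXACT or SINGLE_BYTE_RANGE: the conversion is trivial. */
static long
single_byte_answer (const struct narrowing *n)
{
  return n->below.bytepos + (n->target - n->below.charpos);
}
```

A caller would initialize @code{below} and @code{above} with the beginning and end of the buffer and then feed in the cached positions one by one until the result is something other than @code{KEEP_GOING}.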
-Otherwise, once we have an enclosing range, see which side is closer, and
-iterate until we find the desired value.  As an optimization, I replaced
-the simple loop in FSF with the use of @code{bytecount_to_charcount()},
-@code{charcount_to_bytecount()}, @code{bytecount_to_charcount_down()}, or
-@code{charcount_to_bytecount_down()}. (The latter two I added for this purpose.)
-These scan 4 or 8 bytes at a time through purely single-byte characters.
-If the amount we had to scan was more than our "far away" distance (5000
-characters, see above), then cache the new position.
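The idea of scanning several bytes at a time through single-byte characters can be sketched as below.  This is not the code of @code{bytecount_to_charcount()}: the function name is hypothetical, the word size is fixed at 8 bytes, and for the slow path the sketch assumes a UTF-8 internal encoding (continuation bytes of the form @code{10xxxxxx}) rather than the actual Mule encoding.  The only property the fast path relies on is that single-byte characters have the high bit clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A set high bit in any byte of the word means a non-ASCII byte. */
#define HIGH_BITS UINT64_C (0x8080808080808080)

/* Count the characters in the NBYTES bytes at P, which must hold only
   whole characters.  Assumes UTF-8 for multi-byte characters. */
static size_t
bytecount_to_charcount_sketch (const unsigned char *p, size_t nbytes)
{
  size_t i = 0, nchars = 0;

  assert (p != NULL || nbytes == 0);
  while (i < nbytes)
    {
      uint64_t word;
      if (nbytes - i >= sizeof word)
        {
          memcpy (&word, p + i, sizeof word);   /* alignment-safe load */
          if ((word & HIGH_BITS) == 0)
            {
              /* Eight single-byte characters at once. */
              i += sizeof word;
              nchars += sizeof word;
              continue;
            }
        }
      /* Slow path: count each byte that starts a character. */
      if ((p[i] & 0xC0) != 0x80)
        nchars++;
      i++;
    }
  return nchars;
}
```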
-#### Things to do:
-@itemize @bullet
-@item
-Look at the most recent GNU Emacs to see whether anything has changed.
-@item
-Think about whether it makes sense to try to implement some sort of
-known region or list of "known regions", like we had before.  This would
-be a region of entirely single-byte characters that we can check very
-quickly. (Previously I used a range of same-width characters of any
-size; but this adds extra complexity and slows down the scanning, and is
-probably not worth it.) As part of the scanning process in
-@code{bytecount_to_charcount()} et al, we skip over chunks of entirely
-single-byte chars, so it should be easy to remember the last one.
-Presumably what we should do is keep track of the largest known surrounding
-entirely-single-byte region for each of the cache positions as well as
-perhaps the last-cached position.  We want to be careful not to get bitten
-by the previous problem of having the known region getting reset too
-often.  If we implement this, we might well want to continue scanning
-some distance past the desired position (maybe 300-1000 bytes) if we are
-in a single-byte range so that we won't end up expanding the known range
-one position at a time and entering the function each time.
-@item
-Think about whether it makes sense to keep the position cache sorted.
-This would allow it to be larger and finer-grained in its positions.
-Note that with FSF's use of markers, they were sorted, but this
-was not really made good use of.  With an array, we can do binary searching
-to quickly find the smallest range.  We would probably want to make use of
-the gap-array code in extents.c.
-@end itemize
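As a rough illustration of the last item, a sorted position cache would let the smallest enclosing range be found by binary search.  The sketch below is speculative, like the proposal itself: @code{struct cached_pos} and @code{cache_floor} are hypothetical names, a plain array stands in for a gap array, and the cache is assumed always to contain the beginning of the buffer as its first entry.

```c
#include <assert.h>
#include <stddef.h>

struct cached_pos
{
  long charpos;
  long bytepos;
};

/* Return the index of the last entry whose charpos is <= TARGET.
   CACHE must be sorted by charpos, with cache[0].charpos <= TARGET. */
static size_t
cache_floor (const struct cached_pos *cache, size_t n, long target)
{
  size_t lo = 0, hi = n;        /* the answer always lies in [lo, hi) */

  assert (n > 0 && cache[0].charpos <= target);
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (cache[mid].charpos <= target)
        lo = mid;
      else
        hi = mid;
    }
  return lo;
}
```

The enclosing range is then entry @code{i} and, if it exists, entry @code{i + 1}, found in a logarithmic number of steps instead of by visiting every cached position.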
-Note that FSF's algorithm checked @strong{ALL} markers, not just the ones cached
-by this algorithm.  This includes markers created by the user as well as
-both ends of any overlays.  We could do similarly, and our extents could
-keep both byte and character positions rather than just the former.  (But
-this would probably be overkill.  We should just use our cache instead.
-Any place an extent was set was surely already visited by the char<-->byte
-conversion routines.)
-@node Searching and Matching,  , Byte-Char Position Conversion, Text
-@section Searching and Matching
-@cindex searching
-@cindex matching
-Very incomplete, limited to a brief introduction.
-People find the searching and matching code difficult to understand.
-And indeed, the details are hard.  However, the basic structures are not
-so complex.  First, there's a hard question with a simple answer.  What
-about Mule?  The answer here is that it turns out that Mule characters
-can be matched byte by byte, so neither the search code nor the regular
-expression code need take much notice of it at all!  Of course, we add
-some special features (such as regular expressions that match only
-certain charsets), but these do not require new concepts.  The main
-exception is that wild-card matches in Mule have to be careful to
-swallow whole characters.  This is handled using the same basic macros
-that are used for buffer and string movements.
-This will also be true if a UTF-8 representation is used for the
-internal encoding.
-The complex algorithms for searching are for simple string searches.  In
-particular, the algorithm used for fast string searching is Boyer-Moore.
-This algorithm is based on the idea that if you have a mismatch at a
-given position, you can precompute where to restart the search.  As a
-result, you can often make many fewer than N character
-comparisons, where N is the position at which the match is found, or the
-size of the text if it contains no match.  That's fast!  But it's not
-easy.  You must ``compile'' the search string into a jump table.  See
-the source, @file{search.c}, for more information.
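To give a feel for the ``compile the search string into a jump table'' step, here is a sketch of the Horspool simplification of Boyer-Moore, which uses only a bad-character table.  It is a swapped-in, simpler relative of the algorithm and not the code in @file{search.c}; the function name is hypothetical, and the sketch works on raw bytes with no case folding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of the M bytes at PAT in
   the N bytes at TEXT, or -1 if there is none. */
static long
horspool_search (const unsigned char *text, size_t n,
                 const unsigned char *pat, size_t m)
{
  size_t skip[UCHAR_MAX + 1];
  size_t i, pos;

  assert (text != NULL && pat != NULL);
  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* "Compile" the pattern: for each byte value, how far the window may
     slide when that byte is aligned with the pattern's last position. */
  for (i = 0; i <= UCHAR_MAX; i++)
    skip[i] = m;
  for (i = 0; i + 1 < m; i++)
    skip[pat[i]] = m - 1 - i;

  pos = 0;
  while (pos <= n - m)
    {
      /* Compare the last byte first, then the rest of the window. */
      if (text[pos + m - 1] == pat[m - 1]
          && memcmp (text + pos, pat, m - 1) == 0)
        return (long) pos;
      pos += skip[text[pos + m - 1]];
    }
  return -1;
}
```

A byte that does not occur in the pattern lets the window jump by the full pattern length, which is where the ``fewer than N comparisons'' behaviour comes from.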
-Emacs changes the basic algorithms somewhat in order to handle
-case-insensitive searches without a full-blown regular expression.
-Regular expressions, on the other hand, have a trivial search
-implementation: try a match at each position.  (Under POSIX rules, it's
-a bit more complex, because POSIX requires that you find the
-@emph{longest} match in the text.  This means you keep a record of the
-best match so far, and find all the matches.)
-The matching code for regular expressions is quite complex.  First, the
-regular expression itself is compiled.  There are two basic approaches
-that could be taken.  The first is to compile the expression into tables
-to drive a generic finite automaton emulator.  This is the approach
-given in many textbooks (Sedgewick's @emph{Algorithms} and Aho, Sethi,
-and Ullman's @emph{Compilers: Principles, Techniques, and Tools}, aka
-``The Dragon Book'') as well as being used by the @file{lex} family of
-lexical analysis engines.
-Emacs uses a somewhat different technique.  The expression is compiled
-into a form of bytecode, which is interpreted by a special interpreter.
-The interpreter itself basically amounts to an inline implementation of
-the finite automaton emulator.  The advantage of this technique is that
-it's easier to add special features, such as control of case-sensitivity
-via a global variable.
-The compiler is not treated here.  See the source, @file{regex.c}.  The
-interpreter, although it is divided into several functions and looks
-fearsomely complex, is actually quite simple in concept.  Basically,
-what you're doing there is a strcmp on steroids, right?
-@example
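-int
-strcmp (const char *p,      /* pattern pointer */
-        const char *b)      /* buffer pointer  */
-@{
-  /* advance both pointers until a mismatch or the end of the pattern */
-  while (*p && *p == *b)
-    @{
-      p++;
-      b++;
-    @}
-  return (unsigned char) *p - (unsigned char) *b;
-@}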
-@end example
-Really, it's no harder than that.  (A bit of a white lie, OK?)
-How does the regexp code generalize this?
-@enumerate
-@item
-Depending on the pattern, @code{*b} may have a general relationship to
-@code{*p}.  @emph{I.e.}, direct comparison against @code{*p} is
-generalized to include checks for set membership, and context dependent
-properties.  This depends on @code{&*b}.  Of course that's meaningless
-in C, so we use @code{b} directly, instead.
-@item
-Although to ensure the algorithm terminates, @code{b} must advance step
-by step, @code{p} can branch and jump.
-@item
-The information returned is much greater, including information about
-subexpressions.
-@end enumerate
-We'll ignore (3).  (2) is mostly interesting when compiling the regular
-expression.  Now we have
-@example
-@group
-enum operator_t @{
-  accept = 0,
-  exact,
-  any,
-  range,
-  group,       /* actually, these are probably */
-  repeat,      /* turned into conditional code */
-  /* etc */
-@};
-@end group
-@group
-enum status_t @{
-  working = 0,
-  matched,
-  mismatch,
-  end_of_buffer,
-  error
-@};
-@end group
-@group
-struct pattern @{
-  enum operator_t operator;
-  char char_value;
-  unsigned char range_table[256];   /* nonzero if the byte is in the range */
-  /* etc, etc */
-@};
-@end group
-@group
-/* p is the pattern pointer, b is the buffer pointer */
-enum status_t
-match (struct pattern *p, char *b)
-@{
-  enum status_t done = working;
-  while (!(done = match_1_operator (p, b)))
-    @{
-      struct pattern *p1 = p;
-      p = next_p (p, b);
-      b = next_b (p1, b);
-    @}
-  return done;
-@}
-@end group
-@end example
-This format exposes the underlying finite automaton.
-The functions @code{match_1_operator}, @code{next_p}, and @code{next_b}
-all have the following structure, except that the @samp{next_*}
-functions decide where to jump (for @samp{p}) and whether or not to
-increment (for @samp{b}), rather than checking for satisfaction of a
-matching condition.
-@example
-enum status_t
-match_1_operator (struct pattern *p, char *b)
-@{
-  if (! *b) return end_of_buffer;
-  switch (p->operator)
-    @{
-    case accept:
-      return matched;
-    case exact:
-      if (*b != p->char_value) return mismatch; else break;
-    case any:
-      break;
-    case range:
-      /* range_table is computed in the regexp_compile function */
-      if (! p->range_table[(unsigned char) *b]) return mismatch;
-      break;
-    /* etc, etc */
-    @}
-  return working;
-@}
-@end example
-Grouping, repetition, and alternation are handled by compiling the
-subexpression and calling @code{match (p->subpattern, b)} recursively.
-In terms of reading the actual code, there are five optimizations
-(obfuscations, if you like) that have been done.
-@enumerate
-@item
-An explicit "failure stack" has been substituted for recursion.
-@item
-The @code{match_1_operator}, @code{next_p}, and @code{next_b} functions
-are actually inlined into the @code{match} function for efficiency.
-Then the pointer movement is interspersed with the matching operations.
-@item
-If the operator uses buffer context, the buffer pointer movement is
-sometimes implicit in the operations retrieving the context.
-@item
-Some cases are combined into short preparation for individual cases, and
-a "fall-through" into combined code for several cases.
-@item
-The @code{pattern} type is not an explicit @samp{struct}.  Instead, the
-data (including, @emph{e.g.}, @samp{range_table}) is inlined into the
-compiled bytecode.  This leads to bizarre code in the interpreter like
-@example
-case range:
-p += *(p + 1); break;
-@end example
-in @code{next_p}, because the compiled pattern is laid out
-@example
-..., 'range', count, first_8_flags, second_8_flags, ..., next_op, ...
-@end example
-@end enumerate
-But if you keep your eye on the "switch in a loop" structure, you
-should be able to understand the parts you need.
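As an illustration of that ``switch in a loop'' structure with the pattern data inlined into the bytecode, here is a toy interpreter.  It is a sketch under stated assumptions, not the code in @file{regex.c}: the opcode names and values are hypothetical, only four operators exist, there is no failure stack, and @code{count} is taken to be the number of flag bytes that follow it (so the operator is skipped with @code{p += 2 + count}).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy compiled pattern. */
enum { OP_ACCEPT, OP_EXACT, OP_ANY, OP_RANGE };

/* Match the compiled pattern P against the NUL-terminated string B,
   anchored at its start.  Returns 1 on a match, 0 otherwise. */
static int
match_bytecode (const unsigned char *p, const unsigned char *b)
{
  assert (p != NULL && b != NULL);
  for (;;)
    {
      switch (*p)
        {
        case OP_ACCEPT:
          return 1;
        case OP_EXACT:          /* layout: OP_EXACT, char */
          if (*b == '\0' || *b != p[1])
            return 0;
          p += 2;
          break;
        case OP_ANY:            /* layout: OP_ANY */
          if (*b == '\0')
            return 0;
          p += 1;
          break;
        case OP_RANGE:          /* layout: OP_RANGE, count, flag bytes... */
          {
            unsigned int count = p[1], c = *b;
            /* Bit (c % 8) of flag byte (c / 8) says whether c is in
               the set; bytes beyond COUNT are implicitly all zero. */
            if (c == '\0' || c / 8 >= count
                || !(p[2 + c / 8] & (1u << (c % 8))))
              return 0;
            p += 2 + count;     /* skip the inlined table */
          }
          break;
        default:
          assert (!"unknown opcode");
          return 0;
        }
      b++;                      /* every operator here consumes one byte */
    }
}
```

For example, the pattern @samp{h[a-c].t} compiles under these assumptions to @code{OP_EXACT 'h'}, @code{OP_RANGE 13} followed by thirteen flag bytes of which only the last is nonzero (@code{0x0E}), @code{OP_ANY}, @code{OP_EXACT 't'}, @code{OP_ACCEPT}.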
-@node Multilingual Support, The Lisp Reader and Compiler, Text, Top
-@chapter Multilingual Support
-@cindex Mule character sets and encodings
-@cindex character sets and encodings, Mule
-@cindex encodings, Mule character sets and
-@emph{NOTE}: There is a great deal of overlapping and redundant
-information in this chapter.  Ben wrote introductions to Mule issues a
-number of times, each time not realizing that he had already written
-another introduction previously.  Hopefully, in time these will all be
-integrated.
-@emph{NOTE}: The information at the top of the source file
-@file{text.c} is more complete than the following, and there is also a
-list of all other places to look for text/I18N-related info.  Also look in
-@file{text.h} for info about the DFC and Eistring API's.
-Recall that there are two primary ways that text is represented in
-XEmacs.  The @dfn{buffer} representation sees the text as a series of
-bytes (Ibytes), with a variable number of bytes used per character.
-The @dfn{character} representation sees the text as a series of integers
-(Ichars), one per character.  The character representation is a cleaner
-representation from a theoretical standpoint, and is thus used in many
-cases when lots of manipulations on a string need to be done.  However,
-the buffer representation is the standard representation used in both
-Lisp strings and buffers, and because of this, it is the ``default''
-representation that text comes in.  The reason for using this
-representation is that it's compact and is compatible with ASCII.
-@menu
-* Introduction to Multilingual Issues #1::
-* Introduction to Multilingual Issues #2::
-* Introduction to Multilingual Issues #3::
-* Introduction to Multilingual Issues #4::
-* Character Sets::
-* Encodings::
-* Internal Mule Encodings::
-* Byte/Character Types; Buffer Positions; Other Typedefs::
-* Internal Text API's::
-* Coding for Mule::
-* CCL::
-* Modules for Internationalization::
-@end menu
-@node Introduction to Multilingual Issues #1, Introduction to Multilingual Issues #2, Multilingual Support, Multilingual Support
-@section Introduction to Multilingual Issues #1
-@cindex introduction to multilingual issues #1
-There is an introduction to these issues in the Lisp Reference manual.
-@xref{Internationalization Terminology,,, lispref, XEmacs Lisp Reference
-Manual}.  Among other documentation that may be of interest to internals
-programmers is ISO-2022 (@pxref{ISO 2022,,, lispref, XEmacs Lisp
-Reference Manual}) and CCL (@pxref{CCL,,, lispref, XEmacs Lisp Reference
-Manual}).
-@node Introduction to Multilingual Issues #2, Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #1, Multilingual Support
-@section Introduction to Multilingual Issues #2
-@cindex introduction to multilingual issues #2
-@subheading Introduction
-This document covers a number of design issues, problems and proposals
-with regard to XEmacs MULE.  First we present some definitions and
-some aspects of the design that have been agreed upon.  Then we present
-some issues and problems that need to be addressed, and then I include a
-proposal of mine to address some of these issues.  When there are other
-proposals, for example from Olivier, these will be appended to the end
-of this document.
-@subheading Definitions and Design Basics
-First, @dfn{text} is defined to be a series of characters which together
-defines an utterance or partial utterance in some language.
-Generally, this language is a human language, but it may also be a
-computer language if the computer language uses a representation close
-enough to that of human languages for it to also make sense to call its
-representation text.  Text is opposed to @dfn{binary}, which is a sequence
-of bytes, representing machine-readable but not human-readable data.
-A @dfn{byte} is merely a number within a predefined range, which nowadays is
-nearly always zero to 255.  A @dfn{character} is a unit of text.  What makes
-one character different from another is not always clear-cut.  It is
-generally related to the appearance of the character, although perhaps
-not any possible appearance of that character, but some sort of ideal
-appearance that is assigned to a character.  Whether two characters
-that look very similar are actually the same depends on various
-factors, including political ones, and on whether the characters are
-used to mean similar sorts of things or behave similarly in similar
-contexts.  In any case, it is not always clearly defined whether two
-characters are actually the same or not.  In practice, however, this
-is more or less agreed upon.
-A @dfn{character set} is just that, a set of one or more characters.
-The set is unique in that there will not be more than one instance of
-the same character in a character set, and logically is unordered,
-although an order is often imposed or suggested for the characters in
-the character set.  We can also define an @dfn{order} on a character
-set, which is a way of assigning a unique number, or possibly a pair of
-numbers, or a triplet of numbers, or even a set of four or more numbers
-to each character in the character set.  The combination of a character
-set and an order on it results in an @dfn{ordered character set}.  In an
-ordered character set, there is an upper limit and a lower limit on the
-possible values that a character, or that any number within the set of
-numbers assigned to a character, can take.  However, the lower limit
-does not have to start at zero or one, or anywhere else in particular,
-nor does the upper limit have to end anywhere particular, and there may
-be gaps within these ranges such that particular numbers or sets of
-numbers do not have a corresponding character, even though they are
-within the upper and lower limits.  For example, @dfn{ASCII} defines a
-very standard ordered character set.  It is normally defined to be 94
-characters in the range 33 through 126 inclusive on both ends, with
-every possible character within this range being actually present in the
-character set.
-Sometimes the ASCII character set is extended to include what are called
-@dfn{non-printing characters}.  Non-printing characters are characters
-which instead of really being displayed in a more or less rectangular
-block, like all other characters, instead indicate certain functions
-typically related to either control of the display upon which the
-characters are being displayed, or have some effect on a communications
-channel that may be currently open and transmitting characters, or may
-change the meaning of future characters as they are being decoded, or
-some other similar function.  You might say that non-printing characters
-are somewhat of a hack because they are a special exception to the
-standard concept of a character as being a printed glyph that has some
-direct correspondence in the non-computer world.
-With non-printing characters in mind, the 94-character ordered character
-set called ASCII is often extended into a 96-character ordered character
-set, also often called ASCII, which includes in addition to the 94
-characters already mentioned, two non-printing characters, one called
-space and assigned the number 32, just below the bottom of the previous
-range, and another called @dfn{delete} or @dfn{rubout}, which is given
-number 127 just above the end of the previous range.  Thus to reiterate,
-the result is a 96-character ordered character set, whose characters
-take the values from 32 to 127 inclusive.  Sometimes ASCII is further
-extended to contain 32 more non-printing characters, which are given the
-numbers zero through 31 so that the result is a 128-character ordered
-character set with characters numbered zero through 127, and with many
-non-printing characters.  Another way to look at this, and the way that
-is normally taken by XEmacs MULE, is that the characters that would be
-in the range zero through 31 in the most extended definition of ASCII,
-instead form their own ordered character set, which is called
-@dfn{control zero}, and consists of 32 characters in the range zero
-through 31.  A similar ordered character set called @dfn{control one} is
-also created, and it contains 32 more non-printing characters in the
-range 128 through 159.  Note that none of these three ordered character
-sets overlaps with another in the numbers assigned to its characters,
-so they can all be used at once.  Note further that the same
-character can occur in more than one character set.  This was shown
-above, for example, in two different ordered character sets we defined,
-one of which we could have called @dfn{ASCII}, and the other
-@dfn{ASCII-extended}, to show that it had been extended by two non-printable
-characters.  Most of the characters in these two character sets are
-shared and present in both of them.
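The three non-overlapping ranges described above can be sketched in C as a small lookup.  This is only an illustration of the text, not code from XEmacs; the names @code{ordered_set}, @code{set_ranges} and @code{set_for_number} are hypothetical.

```c
#include <assert.h>

/* Hypothetical names for the three ordered character sets described
   in the text, plus a marker for numbers that belong to none of them. */
enum ordered_set { CONTROL_ZERO, ASCII_96, CONTROL_ONE, NO_SET };

struct set_range { int low, high; };

/* Ranges taken from the text; indexed by enum ordered_set. */
static const struct set_range set_ranges[] = {
  { 0, 31 },     /* control zero */
  { 32, 127 },   /* 96-character ASCII */
  { 128, 159 },  /* control one */
};

/* Return the set whose range contains NUMBER.  Because the ranges do
   not overlap, at most one set can match. */
static enum ordered_set
set_for_number (int number)
{
  int i;

  for (i = 0; i < 3; i++)
    {
      assert (set_ranges[i].low <= set_ranges[i].high);
      if (number >= set_ranges[i].low && number <= set_ranges[i].high)
        return (enum ordered_set) i;
    }
  return NO_SET;
}
```

Since no number falls into two ranges, a number alone is enough to identify both the set and the character within it, which is why the three sets can be used at once.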
-Note that there is no restriction on the size of the character set, or
-on the numbers that are assigned to characters in an ordered character
-set.  It is often extremely useful to represent a sequence of characters
-as a sequence of bytes, where a byte as defined above is a number in the
-range zero to 255.  An @dfn{encoding} does precisely this.  It is simply
-a mapping from a sequence of characters, possibly augmented with
-information indicating the character set that each of these characters
-belongs to, to a sequence of bytes which represents that sequence of
-characters and no other -- which is to say the mapping is reversible.
-A @dfn{coding system} is a set of rules for encoding a sequence of
-characters augmented with character set information into a sequence of
-bytes, and later performing the reverse operation.  It is frequently
-possible to group coding systems into classes or types based on common
-features.  Typically, for example, a particular coding system class
-may contain a base coding system which specifies some of the rules,
-but leaves the rest unspecified.  Individual members of the coding
-system class are formed by starting with the base coding system, and
-augmenting it with additional rules to produce a particular coding
-system, what you might think of as a sort of variation within a
-theme.
-@subheading XEmacs Specific Definitions
-First of all, in XEmacs, the concept of character is a little different
-from the general definition given above.  For one thing, the character
-set that a character belongs to may or may not be an inherent part of
-the character itself.  In other words, the same character occurring in
-two different character sets may appear in XEmacs as two different
-characters.  This is generally the case now, but we are attempting to
-move in the other direction.  Different proposals may have different
-ideas about exactly the extent to which this change will be carried out.
-The general trend, though, is to represent all information about a
-character other than the character itself, using text properties
-attached to the character.  That way two instances of the same character
-will look the same to lisp code that merely retrieves the character, and
-does not also look at the text properties of that character.  Everyone
-involved is in agreement in doing it this way with all Latin characters,
-and in fact for all characters other than Chinese, Japanese, and Korean
-ideographs.  For those, there may be a difference of opinion.
-A second difference between the general definition of character and the
-XEmacs usage of character is that each character is assigned a unique
-number that distinguishes it from all other characters in the world, or
-at the very least, from all other characters currently existing anywhere
-inside the current XEmacs invocation.  (If there is a case where the
-weaker statement applies, but not the stronger statement, it would
-possibly be with composite characters and any other such characters that
-are created on the sly.)
-This unique number is called the @dfn{character representation} of the
-character, and its particular details are a matter of debate.  There is
-a current standard in use, but it is undoubtedly going to change.
-What has definitely been agreed upon is that it will be an integer, more
-specifically a positive integer, represented with less than or equal to
-31 bits on a 32-bit architecture, and possibly up to 63 bits on a 64-bit
-architecture, with the proviso that any characters whose
-representation would fit in a 64-bit architecture, but not on a 32-bit
-architecture, would be used only for composite characters, and others
-that would satisfy the weak uniqueness property mentioned above, but not
-the strong uniqueness property.
-At this point, it is useful to talk about the different representations
-that a sequence of characters can take.  The simplest representation is
-simply as a sequence of characters, and this is called the @dfn{Lisp
-representation} of text, because it is the representation that Lisp
-programs see.  Other representations include the external
-representation, which refers to any encoding of the sequence of
-characters, using the definition of encoding mentioned above.
-Typically, text in the external representation is used outside of
-XEmacs, for example in files, e-mail messages, web sites, and the like.
-Another representation for a sequence of characters is what I will call
-the @dfn{byte representation}, and it represents the way that XEmacs
-internally represents text in a buffer, or in a string.  Potentially,
-the representation could be different between a buffer and a string, and
-then the terms @dfn{buffer byte representation} and @dfn{string byte
-representation} would be used, but in practice I don't think this will
-occur.  It will be possible, of course, for buffers and strings, or
-particular buffers and particular strings, to contain different
-sub-representations of a single representation.  For example, Olivier's
-1-2-4 proposal allows for three sub-representations of his internal byte
-representation, allowing for 1-byte, 2-byte, and 4-byte width
-characters respectively.  A particular string may be in one
-sub-representation, and a particular buffer in another
-sub-representation, but overall both are following the same byte
-representation.  I do not use the term @dfn{internal representation}
-here, as many people have, because it is potentially ambiguous.
-Another representation is called the @dfn{array of characters
-representation}.  This is a representation on the C-level in which the
-sequence of text is represented, not using the byte representation, but
-by using an array of characters, each represented using the character
-representation.  This sort of representation is often used by redisplay
-because it is more convenient to work with than any of the other
-internal representations.
-The term @dfn{binary representation} may also be heard.  Binary
-representation is used to represent binary data.  When binary data is
-represented in the lisp representation, an equivalence is simply set up
-between bytes zero through 255, and characters zero through 255.  These
-characters come from four character sets, which are from bottom to top,
-control zero, ASCII, control 1, and Latin 1.  Together, they comprise
-256 characters, and are a good mapping for the 256 possible bytes in a
-binary representation.  Binary representation could also be used to
-refer to an external representation of the binary data, which is a
-simple direct byte-to-byte representation.  No internal representation
-should ever be referred to as a binary representation because of
-ambiguity.  The terms character set/encoding system were defined
-generally, above.  In XEmacs, the equivalent concepts exist, although
-character set has been shortened to charset, and in fact represents
-specifically an ordered character set.  For each possible charset, and
-for each possible coding system, there is an associated object in
-XEmacs.  These objects will be of type charset and coding system,
-respectively.  Charsets and coding systems are divided into classes, or
-@dfn{types}, the normal term under XEmacs, and all possible charsets and
-coding systems that may be defined must be in one of these types.  If
-you need to create a charset or coding system that is not one of these
-types, you will have to modify the C code to support this new type.
-Some of the existing or soon-to-be-created types are, or will be,
-generic enough so that this shouldn't be an issue.  Note also that the
-byte encoding for text and the character coding of a character are
-closely related.  You might say that ideally each is the simplest
-equivalent of the other given the general constraints on each
-representation.
-To be specific, in the current MULE representation,
-@enumerate
-@item
-Characters encode both the character itself and the character set
-that it comes from.  These character sets are always assumed to be
-representable as an ordered character set of size 96 or of size 96
-by 96, or the trivially-related sizes 94 and 94 by 94.  The only
-allowable exceptions are the control zero and control one character
-sets, which are of size 32.  Character sets which do not naturally
-have a compatible ordering such as this are shoehorned into an
-ordered character set, or possibly two ordered character sets of a
-compatible size.
-@item
-The variable width byte representation was deliberately chosen to
-allow scanning text forwards and backwards efficiently.  This
-necessitated defining the possible bytes into three ranges which
-we shall call A, B, and C.  Range A is used exclusively for
-single-byte characters, which is to say characters that are
-represented using only one byte.  Multi-byte
-characters are always represented by using one byte from Range B,
-followed by one or more bytes from Range C.  What this means is
-that bytes that begin a character are unequivocally distinguished
-from bytes that do not begin a character, and therefore there is
-never a problem scanning backwards and finding the beginning of a
-character.  Note that UTF-8 adopts a scheme that is very similar
-in spirit in that it uses separate ranges for the first byte of a
-multi-byte sequence, and the following bytes in a multi-byte
-sequence.
-@item
-Given the fact that all ordered character sets allowed were
-essentially 96 characters per dimension, it made perfect sense to
-make Range C comprise 96 bytes.  With a little more tweaking, the
-currently-standard MULE byte representation was derived from
-this.
-@item
-The MULE byte representation defined four basic representations for
-characters, which would take up from one to four bytes,
-respectively.  The MULE character representation thus had the
-following constraints:
-@enumerate
-@item
-Character numbers zero through 255 should represent the
-characters that binary values zero through 255 would be
-mapped onto.  (Note: this was not the case in Kenichi Handa's
-version of this representation, but I changed it.)
-@item
-The four sub-classes of representation in the MULE byte
-representation should correspond to four contiguous
-non-overlapping ranges of characters.
-@item
-The algorithmic conversion between the single character
-represented in the byte representation and in the character
-representation should be as easy as possible.
-@item
-Given the previous constraints, the character representation
-should be as compact as possible, which is to say it should
-use the least number of bits possible.
-@end enumerate
-@end enumerate
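The backward-scanning property described above can be sketched as follows.  The concrete byte values chosen here for Ranges A, B and C (0x00-0x7F, 0x80-0x9F and the 96 bytes 0xA0-0xFF) are assumptions made for illustration, and the function names are hypothetical; consult the source for the actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED ranges, for illustration only:
     Range A: 0x00 - 0x7F   single-byte characters
     Range B: 0x80 - 0x9F   first byte of a multi-byte character
     Range C: 0xA0 - 0xFF   non-first bytes (96 values)  */
static int
begins_character (unsigned char b)
{
  return b < 0xA0;              /* Range A or Range B */
}

/* Return the index of the first byte of the character that contains
   TEXT[POS], scanning backwards over Range C bytes. */
static size_t
char_start (const unsigned char *text, size_t pos)
{
  while (pos > 0 && !begins_character (text[pos]))
    pos--;
  return pos;
}

/* Count characters in LEN bytes: every character contributes exactly
   one byte from Range A or Range B. */
static size_t
count_chars (const unsigned char *text, size_t len)
{
  size_t i, n = 0;

  assert (text != NULL || len == 0);
  for (i = 0; i < len; i++)
    if (begins_character (text[i]))
      n++;
  return n;
}

/* Sample: 'a', a two-byte character, 'b', a three-byte character. */
static const unsigned char sample_text[] =
  { 'a', 0x81, 0xA1, 'b', 0x92, 0xA3, 0xA4 };
```

Because a byte's value alone says whether it begins a character, neither function needs to look at any context beyond the bytes it visits.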
-So you see that the entire structure of the byte and character
-representations stemmed from a very small number of basic choices,
-which were
-@enumerate
-@item
-the choice to encode character set information in a character
-@item
-the choice to assume that all character sets would have an order
-imposed upon them with 96 characters per one or two
-dimensions. (This is less arbitrary than it seems--it follows
-ISO-2022)
-@item
-the choice to use a variable width byte representation.
-@end enumerate
-What this means is that you cannot really separate the byte
-representation, the character representation, and the assumptions made
-about characters and whether they represent character sets from each
-other.  All of these are closely intertwined, and for purposes of
-simplicity, they should be designed together.  If you change one
-representation without changing another, you are in essence creating a
-completely new design with its own attendant problems--since your new
-design is likely to be quite complex and not very coherent with
-regard to the translation between the character and byte
-representations, you are likely to run into problems.
-@node Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #4, Introduction to Multilingual Issues #2, Multilingual Support
-@section Introduction to Multilingual Issues #3
-@cindex introduction to multilingual issues #3
-In XEmacs, Mule is a code word for the support for input handling and
-display of multi-lingual text.  This section provides an overview of how
-this support impacts the C and Lisp code in XEmacs.  It is important for
-anyone who works on the C or the Lisp code, especially on the C code, to
-be aware of these issues, even if they don't work directly on code that
-implements multi-lingual features, because there are various general
-procedures that need to be followed in order to write Mule-compliant
-code.  (The specifics of these procedures are documented elsewhere in
-this manual.)
-There are four primary aspects of Mule support:
-@enumerate
-@item
-internal handling and representation of multi-lingual text.
-@item
-conversion between the internal representation of text and the various
-external representations in which multi-lingual text is encoded, such as
-Unicode representations (including mostly fixed width encodings such as
-UCS-2/UTF-16 and UCS-4 and variable width ASCII conformant encodings,
-such as UTF-7 and UTF-8); the various ISO2022 representations, which
-typically use escape sequences to switch between different character
-sets (such as Compound Text, used under X Windows; JIS, used
-specifically for encoding Japanese; and EUC, a non-modal encoding used
-for Japanese, Korean, and certain other languages); Microsoft's
-multi-byte encodings (such as Shift-JIS); various simple encodings for
-particular 8-bit character sets (such as Latin-1 and Latin-2, and
-encodings (such as koi8 and Alternativny) for Cyrillic); and others.
-This conversion needs to happen both for text in files and text sent to
-or retrieved from system API calls.  It even needs to happen for
-external binary data because the internal representation does not
-represent binary data simply as a sequence of bytes as it is represented
-externally.
-@item
-Proper display of multi-lingual characters.
-@item
-Input of multi-lingual text using the keyboard.
-@end enumerate
-These four aspects are for the most part independent of each other.
-@subheading Characters, Character Sets, and Encodings
-A @dfn{character} (which is, BTW, a surprisingly complex concept) is, in
-a written representation of text, the most basic written unit that has a
-meaning of its own.  It's comparable to a phoneme when analyzing words
-in spoken speech (for example, the sound of @samp{t} in English, which
-in fact has different pronunciations in different words -- aspirated in
-@samp{time}, unaspirated in @samp{stop}, unreleased or even pronounced
-as a glottal stop in @samp{button}, etc. -- but logically is a single
-concept).  Like a phoneme, a character is an abstract concept defined by
-its @emph{meaning}.  The character @samp{lowercase f}, for example, can
-always be used to represent the first letter in the word @samp{fill},
-regardless of whether it's drawn upright or italic, whether the
-@samp{fi} combination is drawn as a single ligature, whether there are
-serifs on the bottom of the vertical stroke, etc. (These different
-appearances of a single character are often called @dfn{graphs} or
-@dfn{glyphs}.) Our concern when representing text is on representing the
-abstract characters, and not on their exact appearance.
-A @dfn{character set} (or @dfn{charset}), as we define it, is a set of
-characters, each with an associated number (or set of numbers -- see
-below), called a @dfn{code point}.  It's important to understand that a
-character is not defined by any number attached to it, but by its
-meaning.  For example, ASCII and EBCDIC are two charsets containing
-exactly the same characters (lowercase and uppercase letters, numbers 0
-through 9, particular punctuation marks) but with different
-numberings. The `comma' character in ASCII and EBCDIC, for instance, is
-the same character despite having a different numbering.  Conversely,
-when comparing ASCII and JIS-Roman, which look the same except that the
-latter has a yen sign substituted for the backslash, we would say that
-the backslash and yen sign are @strong{not} the same characters, despite having
-the same number (92) and despite the fact that all other characters are
-present in both charsets, with the same numbering.  ASCII and JIS-Roman,
-then, do @emph{not} have exactly the same characters in them (ASCII has
-a backslash character but no yen-sign character, and vice-versa for
-JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII
-and JIS-Roman are closer.
-It's also important to distinguish between charsets and encodings.  For
-a simple charset like ASCII, there is only one encoding normally used --
-each character is represented by a single byte, with the same value as
-its code point.  For more complicated charsets, however, things are not
-so obvious.  Unicode version 2, for example, is a large charset with
-thousands of characters, each indexed by a 16-bit number, often
-represented in hex, e.g. 0x05D0 for the Hebrew letter "aleph".  One
-obvious encoding uses two bytes per character (actually two encodings,
-depending on which of the two possible byte orderings is chosen).  This
-encoding is convenient for internal processing of Unicode text; however,
-it's incompatible with ASCII, so a different encoding, e.g. UTF-8, is
-usually used for external text, for example files or e-mail.  UTF-8
-represents Unicode characters with one to three bytes (often extended to
-six bytes to handle characters with up to 31-bit indices).  Unicode
-characters 00 to 7F (identical with ASCII) are directly represented with
-one byte, and other characters with two or more bytes, each in the range
-80 to FF.
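As a rough sketch of the UTF-8 rules just described, the following C function encodes a 16-bit code point into one to three bytes.  The name @code{utf8_encode_16} is hypothetical, the function is not taken from XEmacs, and it deliberately ignores details such as surrogate code points and the longer forms for indices above 16 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point CP (0 - 0xFFFF) as UTF-8 into OUT, which must have
   room for 3 bytes.  Return the number of bytes written.  Surrogates
   and code points above 0xFFFF are outside the scope of this sketch. */
static size_t
utf8_encode_16 (unsigned int cp, unsigned char *out)
{
  assert (cp <= 0xFFFF);

  if (cp <= 0x7F)
    {
      /* Identical with ASCII: one byte, same value as the code point. */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp <= 0x7FF)
    {
      /* 110xxxxx 10xxxxxx */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  /* 1110xxxx 10xxxxxx 10xxxxxx */
  out[0] = (unsigned char) (0xE0 | (cp >> 12));
  out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}
```

For the Hebrew letter aleph mentioned above (0x05D0), this yields the two bytes 0xD7 0x90, both in the range 80 to FF.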
-In general, a single encoding may be able to represent more than one
-charset.
-@subheading Internal Representation of Text
-In an ASCII or single-European-character-set world, life is very simple.
-There are 256 characters, and each character is represented using the
-numbers 0 through 255, which fit into a single byte.  With a few
-exceptions (such as case-changing operations or syntax classes like
-'whitespace'), "text" is simply an array of indices into a font.  You
-can get different languages simply by choosing fonts with different
-8-bit character sets (ISO-8859-1, -2, special-symbol fonts, etc.), and
-everything will "just work" as long as anyone else receiving your text
-uses a compatible font.
-In the multi-lingual world, however, it is much more complicated.  There
-are a great number of different characters which are organized in a
-complex fashion into various character sets.  The representation to use
-is not obvious because there are issues of size versus speed to
-consider.  In fact, there are in general two kinds of representations to
-work with: one that represents a single character using an integer
-(possibly a byte), and the other representing a single character as a
-sequence of bytes.  The former representation is normally called fixed
-width, and the other variable width. Both representations represent
-exactly the same characters, and the conversion from one representation
-to the other is governed by a specific formula (rather than by table
-lookup) but it may not be simple.  Most C code need not, and in fact
-should not, know the specifics of exactly how the representations work.
-In fact, the code must not make assumptions about the representations.
-This means in particular that it must use the proper macros for
-retrieving the character at a particular memory location, determining
-how many characters are present in a particular stretch of text, and
-incrementing a pointer to a particular character to point to the
-following character, and so on.  It must not assume that one character
-is stored using one byte, or even using any particular number of bytes.
-It must not assume that the number of characters in a stretch of text
-bears any particular relation to a number of bytes in that stretch.  It
-must not assume that the character at a particular memory location can
-be retrieved simply by dereferencing the memory location, even if a
-character is known to be ASCII or is being compared with an ASCII
-character, etc.  Careful coding is required to be Mule clean.  The
-biggest work of adding Mule support, in fact, is converting all of the
-existing code to be Mule clean.
-Lisp code is mostly unaffected by these concerns.  Text in strings and
-buffers appears simply as a sequence of characters regardless of
-whether Mule support is present.  The biggest difference with older
-versions of Emacs, as well as current versions of GNU Emacs, is that
-integers and characters are no longer equivalent, but are separate
-Lisp Object types.
-@subheading Conversion Between Internal and External Representations
-All text needs to be converted to an external representation before being
-sent to a function or file, and all text retrieved from a function or
-file needs to be converted to the internal representation.  This
-conversion needs to happen as close to the source or destination of the
-text as possible.  No operations should ever be performed on text encoded
-in an external representation other than simple copying, because no
-assumptions can reliably be made about the format of this text.  You
-cannot assume, for example, that the end of text is terminated by a null
-byte. (For example, if the text is Unicode, it will have many null bytes
-in it.)  You cannot find the next "slash" character by searching through
-the bytes until you find a byte that looks like a "slash" character,
-because it might actually be the second byte of a Kanji character.
-Furthermore, all text in the internal representation must be converted,
-even if it is known to be completely ASCII, because the external
-representation may not be ASCII compatible (for example, if it is
-Unicode).
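The byte-searching pitfall can be made concrete with Shift-JIS, where the byte 0x5C (the ASCII backslash) may also occur as the second byte of a two-byte character.  The sketch below is illustrative only and uses backslash rather than slash because that is the byte value for which the collision actually arises in Shift-JIS; the function names are hypothetical, and in XEmacs itself the right answer is to convert to the internal representation first rather than to parse external text.

```c
#include <assert.h>
#include <stddef.h>

#define NOT_FOUND ((size_t) -1)

/* Naive search: treats every byte as a complete character. */
static size_t
naive_find_byte (const unsigned char *text, size_t len, unsigned char target)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (text[i] == target)
      return i;
  return NOT_FOUND;
}

/* True if B is the first byte of a two-byte Shift-JIS character. */
static int
sjis_lead_byte_p (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Encoding-aware search: skips the second byte of two-byte characters,
   so TARGET is matched only when it stands alone. */
static size_t
sjis_find_byte (const unsigned char *text, size_t len, unsigned char target)
{
  size_t i = 0;

  assert (text != NULL || len == 0);
  while (i < len)
    {
      if (sjis_lead_byte_p (text[i]))
        i += 2;                 /* skip lead byte and trail byte */
      else
        {
          if (text[i] == target)
            return i;
          i++;
        }
    }
  return NOT_FOUND;
}

/* One two-byte character (0x95 0x5C), then a real backslash, then 'a'. */
static const unsigned char sjis_sample[] = { 0x95, 0x5C, 0x5C, 'a' };
```

The naive search reports the backslash at index 1, inside the two-byte character, while the encoding-aware search finds the real one at index 2.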
-The place where C code needs to be the most careful is when calling
-external API functions.  It is easy to forget that all text passed to or
-retrieved from these functions needs to be converted.  This includes text
-in structures passed to or retrieved from these functions and all text
-that is passed to a callback function that is called by the system.
-Macros are provided to perform conversions to or from external text.
-These macros are called TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT
-respectively.  These macros accept input in various forms, for example,
-Lisp strings, buffers, lstreams, raw data, and can return data in
-multiple formats, including both @code{malloc()}ed and @code{alloca()}ed data.  The use
-of @code{alloca()}ed data here is particularly important because, in general,
-the returned data will not be used after making the API call, and as a
-result, using @code{alloca()}ed data provides a very cheap and easy to use
-method of allocation.
-These macros take a coding system argument which indicates the nature of
-the external encoding.  A coding system is an object that encapsulates
-the structures of a particular external encoding and the methods required
-to convert to and from this encoding.  A facility exists to create coding
-system aliases, which in essence gives a single coding system two
-different names.  It is effectively used in XEmacs to provide a layer of
-abstraction on top of the actual coding systems.  For example, the coding
-system alias "file-name" points to whichever coding system is currently
-used for encoding and decoding file names as passed to or retrieved from
-system calls.  In general, the actual encoding will differ from system to
-system, and will also depend on the locale that the user is in.  The use
-of the file-name alias hides that implementation detail behind an
-abstract interface layer, which provides a unified set of coding systems
-that are consistent across all operating environments.
-The choice of which coding system to use in a particular conversion macro
-requires some thought.  In general, you should choose a lower-level
-actual coding system when the very design of the APIs you are working
-with calls for that particular coding system.  In all other cases, you
-should find the least general abstract coding system (i.e. coding system
-alias) that applies to your specific situation.  Only use the most
-general coding systems, such as native, when there is simply nothing else
-that is more appropriate.  By doing things this way, you allow the user
-more control over how the encoding actually works, because the user is
-free to map the abstracted coding system names onto different actual
-coding systems.
-Some common coding systems are:
-@table @code
-@item ctext
-Compound Text, which is the standard encoding under X Windows, which is
-used for clipboard data and possibly other data.  (ctext is a coding
-system of type ISO2022.)
-@item mswindows-unicode
-this is used for representing text passed to MS Windows API calls with
-arguments that need to be in Unicode format.  (mswindows-unicode is a
-coding system of type UTF-16)
-@item mswindows-multi-byte
-this is used for representing text passed to MS Windows API calls with
-arguments that need to be in multi-byte format.  Note that there are
-very few if any examples of such calls.
-@item mswindows-tstr
-this is used for representing text passed to any MS Windows API calls
-that declare their argument as LPTSTR, or LPCTSTR.  This is the vast
-majority of system calls and automatically translates either to
-mswindows-unicode or mswindows-multi-byte, depending on the presence or
-absence of the UNICODE preprocessor constant.  (If we compile XEmacs
-with this preprocessor constant, then all API calls use Unicode for all
-text passed to or received from these API calls.)
-@item terminal
-used for text sent to or read from a text terminal in the absence of a
-more specific coding system (calls to window-system specific APIs should
-use the appropriate window-specific coding system if it makes sense to
-do so.)
-@item file-name
-used when specifying the names of files in the absence of a more
-specific encoding, such as mswindows-tstr.
-@item native
-the most general coding system for specifying text passed to system
-calls.  This generally translates to whatever coding system is specified
-by the current locale.  This should only be used when none of the coding
-systems mentioned above are appropriate.
-@end table
-@subheading Proper Display of Multilingual Text
-There are two things required to get this working correctly.  One is
-selecting the correct font, and the other is encoding the text according
-to the encoding used for that specific font, or the window-system
-specific text display API.  Generally each separate character set has a
-different font associated with it, which is specified by name and each
-font has an associated encoding into which the characters must be
-translated.  (This is the case on X Windows, at least; on Windows there
-is a more general mechanism.)  Both the specific font for a charset and
-the encoding of that font are system dependent.  Currently there is a
-way of specifying these two properties under X Windows (using the
-registry and ccl properties of a character set) but not for other window
-systems.  A more general system needs to be implemented to allow these
-characteristics to be specified for all window systems.
-Another issue is making sure that the necessary fonts for displaying
-various character sets are installed on the system.  Currently, XEmacs
-provides, on its web site, X Windows fonts for a number of different
-character sets that can be installed by users.  This isn't done yet for
-Windows, but it should be.
-@subheading Inputting of Multilingual Text
-This is a rather complicated issue because there are many paradigms
-defined for inputting multi-lingual text, some of which are specific to
-particular languages, and any particular language may have many
-different paradigms defined for inputting its text.  These paradigms are
-encoded in input methods and there is a standard API for defining an
-input method in XEmacs called LEIM, or Library of Emacs Input Methods.
-Some of these input methods are written entirely in Elisp, and thus are
-system-independent, while others require the aid either of an external
-process, or of C level support that ties into a particular
-system-specific input method API, for example, XIM under X Windows, or
-the active keyboard layout and IME support under Windows.  Currently,
-there is no support for any system-specific input methods under
-Microsoft Windows, although this will change.
-@node Introduction to Multilingual Issues #4, Character Sets, Introduction to Multilingual Issues #3, Multilingual Support
-@section Introduction to Multilingual Issues #4
-@cindex introduction to multilingual issues #4
-The rest of the sections in this chapter consist of yet another
-introduction to multilingual issues, duplicating the information in the
-previous sections.
-@node Character Sets, Encodings, Introduction to Multilingual Issues #4, Multilingual Support
-@section Character Sets
-@cindex character sets
-A @dfn{character set} (or @dfn{charset}) is an ordered set of
-characters.  A particular character in a charset is indexed using one or
-more @dfn{position codes}, which are non-negative integers.  The number
-of position codes needed to identify a particular character in a charset
-is called the @dfn{dimension} of the charset.  In XEmacs/Mule, all
-charsets have dimension 1 or 2, and the size of all charsets (except for
-a few special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range
-of position codes used to index characters from any of these types of
-character sets is as follows:
-@example
-Charset type            Position code 1         Position code 2
-------------------------------------------------------------
-94                      33 - 126                N/A
-96                      32 - 127                N/A
-94x94                   33 - 126                33 - 126
-96x96                   32 - 127                32 - 127
-@end example
-Note that in the above cases position codes do not start at an
-expected value such as 0 or 1.  The reason for this will become clear
-later.
-For example, Latin-1 is a 96-character charset, and JISX0208 (the
-Japanese national character set) is a 94x94-character charset.
-[Note that, although the ranges above define the @emph{valid} position
-codes for a charset, some of the slots in a particular charset may in
-fact be empty.  This is the case for JISX0208, for example, where (e.g.)
-all the slots whose first position code is in the range 118 - 127 are
-empty.]
-There are three charsets that do not follow the above rules.  All of
-them have one dimension, and have ranges of position codes as follows:
-@example
-Charset name            Position code 1
-------------------------------------
-ASCII                   0 - 127
-Control-1               0 - 31
-Composite               0 - some large number
-@end example
-(The upper bound of the position code for composite characters has not
-yet been determined, but it will probably be at least 16,383).
-ASCII is the union of two subsidiary character sets: Printing-ASCII
-(the printing ASCII character set, consisting of position codes 33 -
-126, like for a standard 94-character charset) and Control-ASCII (the
-non-printing characters that would appear in a binary file with codes 0
-- 32 and 127).
-Control-1 contains the non-printing characters that would appear in a
-binary file with codes 128 - 159.
-Composite contains characters that are generated by overstriking one
-or more characters from other charsets.
-Note that some characters in ASCII, and all characters in Control-1,
-are @dfn{control} (non-printing) characters.  These have no printed
-representation but instead control some other function of the printing
-(e.g. TAB, i.e. code 9, moves the current character position to the next tab
-stop).  All other characters in all charsets are @dfn{graphic}
-(printing) characters.
-When a binary file is read in, the bytes in the file are assigned to
-character sets as follows:
-@example
-Bytes           Character set           Range
---------------------------------------------------
-0 - 127         ASCII                   0 - 127
-128 - 159       Control-1               0 - 31
-160 - 255       Latin-1                 32 - 127
-@end example
-This is a bit ad-hoc but gets the job done.
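The byte-to-charset table above can be sketched as a small function. This is only an illustration of the mapping as tabulated; the name `classify_byte` is hypothetical and does not correspond to any XEmacs routine.

```python
def classify_byte(b):
    """Map one byte of a binary file to (charset name, position code).

    Follows the table above: ASCII keeps its value, Control-1 is
    rebased to 0..31, and Latin-1 is rebased to 32..127.
    """
    if not 0 <= b <= 255:
        raise ValueError("not a byte: %r" % (b,))
    if b <= 127:
        return ("ASCII", b)
    if b <= 159:
        return ("Control-1", b - 128)
    return ("Latin-1", b - 128)  # bytes 160..255 become 32..127
```

Note that the Control-1 and Latin-1 branches both subtract 128; they differ only in which charset the resulting position code belongs to.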
-@node Encodings, Internal Mule Encodings, Character Sets, Multilingual Support
-@section Encodings
-@cindex encodings, Mule
-@cindex Mule encodings
-An @dfn{encoding} is a way of numerically representing characters from
-one or more character sets.  If an encoding only encompasses one
-character set, then the position codes for the characters in that
-character set could be used directly.  This is not possible, however, if
-more than one character set is to be used in the encoding.
-For example, the conversion detailed above between bytes in a binary
-file and characters is effectively an encoding that encompasses the
-three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
-bytes.
-Thus, an encoding can be viewed as a way of encoding characters from a
-specified group of character sets using a stream of bytes, each of which
-contains a fixed number of bits (but not necessarily 8, as in the common
-usage of ``byte'').
-Here are descriptions of a couple of common
-encodings:
-@menu
-* Japanese EUC (Extended Unix Code)::
-* JIS7::
-@end menu
-@node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings
-@subsection Japanese EUC (Extended Unix Code)
-@cindex Japanese EUC (Extended Unix Code)
-@cindex EUC (Extended Unix Code), Japanese
-@cindex Extended Unix Code, Japanese EUC
-This encompasses the character sets Printing-ASCII, Katakana-JISX0201
-(half-width katakana, the right half of JISX0201), Japanese-JISX0208,
-and Japanese-JISX0212.
-Note that Printing-ASCII and Katakana-JISX0201 are 94-character
-charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
-94x94-character charsets.
-The encoding is as follows:
-@example
-Character set            Representation (PC=position-code)
--------------            --------------
-Printing-ASCII           PC1
-Katakana-JISX0201        0x8E       | PC1 + 0x80
-Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
-Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
-@end example
-Note that there are other versions of EUC for other Asian languages.
-EUC in general is characterized by
-@enumerate
-@item
-row-column encoding,
-@item
-big-endian (row-first) ordering, and
-@item
-ASCII compatibility in variable width forms.
-@end enumerate
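As a rough sketch of the Japanese EUC representation, the function below builds the byte sequence for one character from its charset and position codes. The function name and the string charset labels are hypothetical. The 0x8F prefix for Japanese-JISX0212 is the one used by standard EUC-JP (without it, JISX0212 characters could not be told apart from JISX0208 ones).

```python
def euc_jp_encode(charset, pc1, pc2=None):
    """Return the EUC-JP bytes for one character given its position codes."""
    if charset == "Printing-ASCII":
        return bytes([pc1])
    if charset == "Katakana-JISX0201":
        return bytes([0x8E, pc1 + 0x80])
    if charset == "Japanese-JISX0208":
        return bytes([pc1 + 0x80, pc2 + 0x80])
    if charset == "Japanese-JISX0212":
        return bytes([0x8F, pc1 + 0x80, pc2 + 0x80])
    raise ValueError("charset not covered by Japanese EUC: %s" % charset)


# Cross-check against Python's own EUC-JP codec.
hiragana_a = euc_jp_encode("Japanese-JISX0208", 0x24, 0x22)
halfwidth_a = euc_jp_encode("Katakana-JISX0201", 0x31)
```

Because every non-ASCII byte has its high bit set, ASCII bytes in an EUC stream always stand for themselves, which is the "ASCII compatibility" property listed above.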
-@node JIS7,  , Japanese EUC (Extended Unix Code), Encodings
-@subsection JIS7
-@cindex JIS7
-This encompasses the character sets Printing-ASCII,
-Latin-JISX0201 (the left half of JISX0201; this character set
-is very similar to Printing-ASCII and is a 94-character charset),
-Japanese-JISX0208, and Katakana-JISX0201.  It uses 7-bit bytes.
-Unlike EUC, this is a @dfn{modal} encoding, which means that there are
-multiple states that the encoding can be in, which affect how the bytes
-are to be interpreted.  Special sequences of bytes (called @dfn{escape
-sequences}) are used to change states.
-The encoding is as follows:
-@example
-Character set              Representation (PC=position-code)
--------------              --------------
-Printing-ASCII             PC1
-Latin-JISX0201             PC1
-Katakana-JISX0201          PC1
-Japanese-JISX0208          PC1 | PC2
-Escape sequence   ASCII equivalent   Meaning
----------------   ----------------   -------
-0x1B 0x28 0x4A    ESC ( J            invoke Latin-JISX0201
-0x1B 0x28 0x49    ESC ( I            invoke Katakana-JISX0201
-0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
-0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
-@end example
-Initially, Printing-ASCII is invoked.
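To make the idea of a modal encoding concrete, here is a minimal decoder sketch that tracks the invoked charset and splits a JIS7 byte stream into (charset, position codes) tuples. The names `ESCAPES` and `jis7_tokens` are hypothetical, and the sketch deliberately ignores details such as control characters appearing inside a two-byte run.

```python
# Escape sequences from the table above, mapped to the charset they invoke.
ESCAPES = {
    b"\x1b(J": "Latin-JISX0201",
    b"\x1b(I": "Katakana-JISX0201",
    b"\x1b$B": "Japanese-JISX0208",
    b"\x1b(B": "Printing-ASCII",
}
TWO_BYTE_CHARSETS = {"Japanese-JISX0208"}


def jis7_tokens(data):
    """Split JIS7 bytes into tuples of (charset, PC1[, PC2])."""
    state = "Printing-ASCII"  # the initial state
    tokens = []
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in ESCAPES:
            # A mode switch: consumes three bytes, yields no character.
            state = ESCAPES[seq]
            i += 3
        elif state in TWO_BYTE_CHARSETS:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            tokens.append((state, data[i], data[i + 1]))
            i += 2
        else:
            tokens.append((state, data[i]))
            i += 1
    return tokens
```

The same byte value (e.g. 0x24) yields different characters depending on `state`, which is exactly why one cannot start decoding such a stream at an arbitrary offset.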
-@node Internal Mule Encodings, Byte/Character Types; Buffer Positions; Other Typedefs, Encodings, Multilingual Support
-@section Internal Mule Encodings
-@cindex internal Mule encodings
-@cindex Mule encodings, internal
-@cindex encodings, internal Mule
-In XEmacs/Mule, each character set is assigned a unique number, called a
-@dfn{leading byte}.  This is used in the encodings of a character.
-Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
-a leading byte of 0), although some leading bytes are reserved.
-Charsets whose leading byte is in the range 0x80 - 0x9F are called
-@dfn{official} and are used for built-in charsets.  Other charsets are
-called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
-these are user-defined charsets.
-More specifically:
-@example
-Character set                Leading byte
--------------                ------------
-ASCII                        0 (0x7F in arrays indexed by leading byte)
-Composite                    0x8D
-Dimension-1 Official         0x80 - 0x8C/0x8D
-(0x8E is free)
-Control                      0x8F
-Dimension-2 Official         0x90 - 0x99
-(0x9A - 0x9D are free)
-Dimension-1 Private Marker   0x9E
-Dimension-2 Private Marker   0x9F
-Dimension-1 Private          0xA0 - 0xEF
-Dimension-2 Private          0xF0 - 0xFF
-@end example
-There are two internal encodings for characters in XEmacs/Mule.  One is
-called @dfn{string encoding} and is an 8-bit encoding that is used for
-representing characters in a buffer or string.  It uses 1 to 4 bytes per
-character.  The other is called @dfn{character encoding} and is a 19-bit
-encoding that is used for representing characters individually in a
-variable.
-(In the following descriptions, we'll ignore composite characters for
-the moment.  We also give a general (structural) overview first,
-followed later by the exact details.)
-@menu
-* Internal String Encoding::
-* Internal Character Encoding::
-@end menu
-@node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings
-@subsection Internal String Encoding
-@cindex internal string encoding
-@cindex string encoding, internal
-@cindex encoding, internal string
-ASCII characters are encoded using their position code directly.  Other
-characters are encoded using their leading byte followed by their
-position code(s) with the high bit set.  Characters in private character
-sets have their leading byte prefixed with a @dfn{leading byte prefix},
-which is either 0x9E or 0x9F. (No character sets are ever assigned these
-leading bytes.) Specifically:
-@example
-Character set           Encoding (PC=position-code, LB=leading-byte)
--------------           --------
-ASCII                   PC1  |
-Control-1               LB   |  PC1 + 0xA0 |
-Dimension-1 official    LB   |  PC1 + 0x80 |
-Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
-Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
-Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
-@end example
-The basic characteristic of this encoding is that the first byte
-of all characters is in the range 0x00 - 0x9F, and the second and
-following bytes of all characters are in the range 0xA0 - 0xFF.
-This means that it is impossible to get out of sync, or more
-specifically:
-@enumerate
-@item
-Given any byte position, the beginning of the character it is
-within can be determined in constant time.
-@item
-Given any byte position at the beginning of a character, the
-beginning of the next character can be determined in constant
-time.
-@item
-Given any byte position at the beginning of a character, the
-beginning of the previous character can be determined in constant
-time.
-@item
-Textual searches can simply treat encoded strings as if they
-were encoded in a one-byte-per-character fashion rather than
-the actual multi-byte encoding.
-@end enumerate
-None of the standard non-modal encodings meet all of these
-conditions.  For example, EUC satisfies only (2) and (3), while
-Shift-JIS and Big5 (not yet described) satisfy only (2). (All
-non-modal encodings must satisfy (2), in order to be unambiguous.)
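The following sketch illustrates the string encoding and the resynchronization properties (1) and (2). All names are hypothetical, and the leading-byte values used in the usage note are picked from the ranges in the table above purely for illustration; they are not the values assigned to any particular charset.

```python
def encode_internal(kind, pcs, lb=0):
    """Encode one character in the internal string encoding.

    kind: "ascii", "control-1", "official" or "private"
    pcs:  tuple of one or two position codes
    lb:   leading byte of the charset (unused for ASCII)
    """
    if kind == "ascii":
        return bytes([pcs[0]])
    if kind == "control-1":
        return bytes([lb, pcs[0] + 0xA0])
    body = bytes(pc + 0x80 for pc in pcs)
    if kind == "official":
        return bytes([lb]) + body
    if kind == "private":
        prefix = 0x9E if len(pcs) == 1 else 0x9F
        return bytes([prefix, lb]) + body
    raise ValueError("unknown kind: %s" % kind)


def char_start(buf, i):
    """Index of the first byte of the character containing byte i."""
    while buf[i] >= 0xA0:  # non-first bytes are always 0xA0 - 0xFF
        i -= 1
    return i


def next_char(buf, i):
    """Index of the character after the one starting at i."""
    i += 1
    while i < len(buf) and buf[i] >= 0xA0:
        i += 1
    return i


text = (encode_internal("ascii", (0x61,))
        + encode_internal("official", (0x24, 0x22), lb=0x90)
        + encode_internal("private", (0x21,), lb=0xA0)
        + encode_internal("ascii", (0x62,)))
```

Both loops run at most three times, since no character is longer than four bytes; that bound is what makes the operations constant-time.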
-@node Internal Character Encoding,  , Internal String Encoding, Internal Mule Encodings
-@subsection Internal Character Encoding
-@cindex internal character encoding
-@cindex character encoding, internal
-@cindex encoding, internal character
-One 19-bit word represents a single character.  The word is
-separated into three fields:
-@example
-Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
-                <------------> <------------------> <------------------>
-Field:                1                  2                    3
-@end example
-Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
-@example
-Character set           Field 1         Field 2         Field 3
--------------           -------         -------         -------
-ASCII                      0               0              PC1
-range:                                                   (00 - 7F)
-Control-1                  0               1              PC1
-range:                                                   (00 - 1F)
-Dimension-1 official       0            LB - 0x7F         PC1
-range:                                    (01 - 0D)      (20 - 7F)
-Dimension-1 private        0            LB - 0x80         PC1
-range:                                    (20 - 6F)      (20 - 7F)
-Dimension-2 official    LB - 0x8F         PC1             PC2
-range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
-Dimension-2 private     LB - 0xE1         PC1             PC2
-range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
-Composite                 0x1F             ?               ?
-@end example
-Note that character codes 0 - 255 are the same as the ``binary
-encoding'' described above.
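The field layout can be sketched as plain bit packing: field 1 occupies bits 18-14, field 2 bits 13-7, and field 3 bits 6-0. The helper names below are hypothetical. The assumption that Latin-1 uses field 2 = 1 is inferred from the statement that character codes 0 - 255 match the binary encoding, not taken from the table directly.

```python
def make_char(f1, f2, f3):
    """Pack the three fields into a 19-bit character code."""
    if not (0 <= f1 < 32 and 0 <= f2 < 128 and 0 <= f3 < 128):
        raise ValueError("field out of range")
    return (f1 << 14) | (f2 << 7) | f3


def split_char(ch):
    """Unpack a 19-bit character code into (field1, field2, field3)."""
    return (ch >> 14) & 0x1F, (ch >> 7) & 0x7F, ch & 0x7F


ascii_a = make_char(0, 0, 0x41)          # ASCII: fields 1 and 2 are zero
control_1 = make_char(0, 1, 0x05)        # Control-1: field 2 is 1
latin_1 = make_char(0, 1, 0x69)          # assumed: Latin-1 also has field 2 = 1
kanji = make_char(0x01, 0x24, 0x22)      # a dimension-2 official character
```

With these values, the first three codes come out as 0x41, 0x85 and 0xE9, i.e. the same numbers as the corresponding bytes in the binary encoding.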
-Most of the code in XEmacs knows nothing of the representation of a
-character other than that values 0 - 255 represent ASCII, Control-1,
-and Latin-1.
-@strong{WARNING WARNING WARNING}: The Boyer-Moore code in
-@file{search.c}, and the code in @code{search_buffer()} that determines
-whether that code can be used, knows that ``field 3'' in a character
-always corresponds to the last byte in the textual representation of the
-character. (This is important because the Boyer-Moore algorithm works by
-looking at the last byte of the search string and &&#### finish this.
-@node Byte/Character Types; Buffer Positions; Other Typedefs, Internal Text API's, Internal Mule Encodings, Multilingual Support
-@section Byte/Character Types; Buffer Positions; Other Typedefs
-@cindex byte/character types; buffer positions; other typedefs
-@cindex byte/character types
-@cindex character types
-@cindex buffer positions
-@cindex typedefs, other
-@menu
-* Byte Types::
-* Different Ways of Seeing Internal Text::
-* Buffer Positions::
-* Other Typedefs::
-* Usage of the Various Representations::
-* Working With the Various Representations::
-@end menu
-@node Byte Types, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs
-@subsection Byte Types
-@cindex byte types
-Stuff pointed to by a char * or unsigned char * will nearly always be
-one of the following types:
-@itemize @minus
-@item
-a) [Ibyte] pointer to internally-formatted text
-@item
-b) [Extbyte] pointer to text in some external format, which can be
-defined as all formats other than the internal one
-@item
-c) [Ascbyte] pure ASCII text
-@item
-d) [Binbyte] binary data that is not meant to be interpreted as text
-@item
-e) [Rawbyte] general data in memory, where we don't care about whether
-it's text or binary
-@item
-f) [Boolbyte] a zero or a one
-@item
-g) [Bitbyte] a byte used for bit fields
-@item
-h) [Chbyte] null-semantics @code{char *}; used when casting an argument to
-an external API where the other types may not be
-appropriate
-@end itemize
-Types (b), (c), (f) and (h) are defined as @code{char}, while the others are
-@code{unsigned char}.  This is for maximum safety (signed characters are
-dangerous to work with) while maintaining as much compatibility with
-external API's and string constants as possible.
-We also provide versions of the above types defined with different
-underlying C types, for API compatibility.  These use the following
-prefixes:
-@example
-C = plain char, when the base type is unsigned
-U = unsigned
-S = signed
-@end example
-(Formerly I had a comment saying that type (e) "should be replaced with
-void *".  However, there are in fact many places where an unsigned char
-* might be used -- e.g. for ease in pointer computation, since void *
-doesn't allow this, and for compatibility with external API's.)
-Note that these typedefs are purely for documentation purposes; from
-the C code's perspective, they are exactly equivalent to @code{char *},
-@code{unsigned char *}, etc., so you can freely use them with library
-functions declared as such.
-Using these more specific types rather than the general ones helps avoid
-the confusions that occur when the semantics of a char * or unsigned
-char * argument being studied are unclear.  Furthermore, by requiring
-that ALL uses of @code{char} be replaced with some other type as part of the
-Mule-ization process, we can use a search for @code{char} as a way of finding
-code that has not been properly Mule-ized yet.
-@node Different Ways of Seeing Internal Text, Buffer Positions, Byte Types, Byte/Character Types; Buffer Positions; Other Typedefs
-@subsection Different Ways of Seeing Internal Text
-@cindex different ways of seeing internal text
-There are various ways of representing internal text.  The two primary
-ways are as an "array" of individual characters and as a
-"stream" of bytes.  In the ASCII world, where there are only 256
-characters at most, things are easy because each character fits into a
-byte.  In general, however, this is not true -- see the above discussion
-of characters vs. encodings.
-In some cases, it's also important to distinguish between a stream
-representation as a series of bytes and as a series of textual units.
-This is particularly important wrt Unicode.  The UTF-16 representation
-(sometimes referred to, rather sloppily, as simply the "Unicode" format)
-represents text as a series of 16-bit units.  Mostly, each unit
-corresponds to a single character, but not necessarily, as characters
-outside of the range 0-65535 (the BMP or "Basic Multilingual Plane" of
-Unicode) require two 16-bit units, through the mechanism of
-"surrogates".  When a series of 16-bit units is serialized into a byte
-stream, there are at least two possible representations, little-endian
-and big-endian, and which one is used may depend on the native format of
-16-bit integers in the CPU of the machine that XEmacs is running
-on. (Similarly, UTF-32 is logically a representation with 32-bit textual
-units.)
-Specifically:
-@itemize @minus
-@item
-UTF-8 has 1-byte (8-bit) units.
-@item
-UTF-16 has 2-byte (16-bit) units.
-@item
-UTF-32 has 4-byte (32-bit) units.
-@item
-XEmacs-internal encoding (the old "Mule" encoding) has 1-byte (8-bit)
-units.
-@item
-UTF-7 technically has 7-bit units that are within the "mail-safe" range
-(ASCII 32 - 126 plus a few control characters), but normally is encoded
-in an 8-bit stream. (UTF-7 is also a modal encoding, since it has a
-normal mode where printable ASCII characters represent themselves and a
-shifted mode, introduced with a plus sign, where a base-64 encoding is
-used.)
-@item
-UTF-5 technically has 7-bit units (normally encoded in an 8-bit stream,
-like UTF-7), but only uses uppercase A-V and 0-9, and only encodes 4
-bits worth of data per character.  UTF-5 is meant for encoding Unicode
-inside of DNS names.
-@end itemize
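The difference between characters, textual units and bytes can be checked with Python's built-in codecs. The helper `unit_count` is a hypothetical convenience, not an XEmacs function; the sample string contains one ASCII character, one Latin-1 character and one character outside the BMP.

```python
def unit_count(text, encoding, unit_bytes):
    """Number of textual units needed to represent text in encoding."""
    return len(text.encode(encoding)) // unit_bytes


sample = "a\u00e9\U00010400"  # three characters

utf8_units = unit_count(sample, "utf-8", 1)       # 1 + 2 + 4 units
utf16_units = unit_count(sample, "utf-16-le", 2)  # 1 + 1 + 2 (surrogate pair)
utf32_units = unit_count(sample, "utf-32-le", 4)  # one unit per character

# The same 16-bit unit serializes to different byte streams.
little_endian = "a".encode("utf-16-le")
big_endian = "a".encode("utf-16-be")
```

Only UTF-32 has a unit count equal to the character count here; for the other formats, moving by characters requires inspecting the units.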
-Thus, we can imagine three levels in the representation of textual data:
-@example
-series of characters -> series of textual units -> series of bytes
-[Ichar]                 [Itext]                 [Ibyte]
-@end example
-XEmacs has three corresponding typedefs:
-@itemize @minus
-@item
-An Ichar is an integer (at least 32-bit), representing a 31-bit
-character.
-@item
-An Itext is an unsigned value, either 8, 16 or 32 bits, depending
-on the nature of the internal representation, and corresponding to
-a single textual unit.
-@item
-An Ibyte is an @code{unsigned char}, representing a single byte in a
-textual byte stream.
-@end itemize
-Internal text in stream format can be simultaneously viewed as either
-@code{Itext *} or @code{Ibyte *}.  The @code{Ibyte *} representation is convenient for
-copying data from one place to another, because such routines usually
-expect byte counts.  However, @code{Itext *} is much better for actually
-working with the data.
-From a text-unit perspective, units 0 through 127 will always be ASCII
-compatible, and data in Lisp strings (and other textual data generated
-as a whole, e.g. from external conversion) will be followed by a
-null-unit terminator.  From an @code{Ibyte *} perspective, however, the
-encoding is only ASCII-compatible if it uses 1-byte units.
-Similarly to the different text representations, three integral count
-types exist -- Charcount, Textcount and Bytecount.
-NOTE: Despite the presence of the terminator, internal text itself can
-have nulls in it! (Null text units, not just the null bytes present in
-any UTF-16 encoding.) The terminator is present because in many cases
-internal text is passed to routines that will ultimately pass the text
-to library functions that cannot handle embedded nulls, e.g. functions
-manipulating filenames, and it is a real hassle to have to pass the
-length around constantly.  But this can lead to sloppy coding!  We need
-to be careful about watching for nulls in places that are important,
-e.g. manipulating string objects or passing data to/from the clipboard.
-@table @code
-@item Ibyte
-The data in a buffer or string is logically made up of Ibyte objects,
-where an Ibyte takes up the same amount of space as a char. (It is
-declared differently, though, to catch invalid usages.) Strings stored
-using Ibytes are said to be in "internal format".  The important
-characteristics of internal format are
-@itemize @minus
-@item
-ASCII characters are represented as a single Ibyte, in the range 0 -
-0x7f.
-@item
-All other characters are represented as an Ibyte in the range 0x80 - 0x9f
-followed by one or more Ibytes in the range 0xa0 to 0xff.
-@end itemize
-This leads to a number of desirable properties:
-@itemize @minus
-@item
-Given the position of the beginning of a character, you can find the
-beginning of the next or previous character in constant time.
-@item
-When searching for a substring or an ASCII character within the string,
-you need merely use standard searching routines.
-@end itemize
-@item Itext
-#### Document me.
-@item Ichar
-This typedef represents a single Emacs character, which can be ASCII,
-ISO-8859, or some extended character, as would typically be used for
-Kanji.  Note that the representation of a character as an Ichar is @strong{not}
-the same as the representation of that same character in a string; thus,
-you cannot do the standard C trick of passing a pointer to a character
-to a function that expects a string.
-An Ichar takes up 19 bits of representation and (for code compatibility
-and such) is compatible with an int.  This representation is visible on
-the Lisp level.  The important characteristics of the Ichar
-representation are
-@itemize @minus
-@item
-values 0x00 - 0x7f represent ASCII.
-@item
-values 0x80 - 0xff represent the right half of ISO-8859-1.
-@item
-values 0x100 and up represent all other characters.
-@end itemize
-This means that Ichar values are upwardly compatible with the standard
-8-bit representation of ASCII/ISO-8859-1.
-@item Extbyte
-Strings that go in or out of Emacs are in "external format", typedef'ed
-as an array of char or a char *.  There is more than one external format
-(JIS, EUC, etc.) with broadly similar properties.  Many (e.g. JIS) are modal
-encodings, which is to say that the meaning of particular bytes is not
-fixed but depends on what "mode" the string is currently in (e.g. bytes
-in the range 0 - 0x7f might be interpreted as ASCII, or as Hiragana, or
-as 2-byte Kanji, depending on the current mode).  The mode starts out in
-ASCII/ISO-8859-1 and is switched using escape sequences -- for example,
-in the JIS encoding, 'ESC $ B' switches to a mode where pairs of bytes
-in the range 0 - 0x7f are interpreted as Kanji characters.
-External-formatted data is generally desirable for passing data between
-programs because it is upwardly compatible with standard
-ASCII/ISO-8859-1 strings and may require less space than internal
-encodings such as the one described above.  In addition, some encodings
-(e.g. JIS) keep all characters (except the ESC used to switch modes) in
-the printing ASCII range 0x20 - 0x7e, which results in a much higher
-probability that the data will avoid being garbled in transmission.
-Externally-formatted data is generally not very convenient to work with,
-however, and for this reason is usually converted to internal format
-before any work is done on the string.
-NOTE: filenames need to be in external format so that ISO-8859-1
-characters come out correctly.
-@end table
-@node Buffer Positions, Other Typedefs, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs
-@subsection Buffer Positions
-@cindex buffer positions
-There are three possible ways to specify positions in a buffer.  All
-of these are one-based: the beginning of the buffer is position or
-index 1, and 0 is not a valid position.
-As a "buffer position" (typedef Charbpos):
-This is an index specifying an offset in characters from the
-beginning of the buffer.  Note that buffer positions are
-logically @strong{between} characters, not on a character.  The
-difference between two buffer positions specifies the number of
-characters between those positions.  Buffer positions are the
-only kind of position externally visible to the user.
-As a "byte index" (typedef Bytebpos):
-This is an index over the bytes used to represent the characters
-in the buffer.  If there is no Mule support, this is identical
-to a buffer position, because each character is represented
-using one byte.  However, with Mule support, many characters
-require two or more bytes for their representation, and so a
-byte index may be greater than the corresponding buffer
-position.
-As a "memory index" (typedef Membpos):
-This is the byte index adjusted for the gap.  For positions
-before the gap, this is identical to the byte index.  For
-positions after the gap, this is the byte index plus the gap
-size.  There are two possible memory indices for the gap
-position; the memory index at the beginning of the gap should
-always be used, except in code that deals with manipulating the
-gap, where both indices may be seen.  The address of the
-character "at" (i.e. following) a particular position can be
-obtained from the formula
-buffer_start_address + memory_index(position) - 1
-except in the case of characters at the gap position.
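The relationship between byte indices and memory indices can be sketched with a toy gap buffer. The class and method names are hypothetical and the layout is simplified (single-byte characters, no Mule); it is meant only to illustrate the rules just described, including the special case at the gap position.

```python
class GapBuffer:
    """Toy buffer: text before the gap, the gap itself, text after it."""

    def __init__(self, before, after, gap_size):
        self.storage = bytearray(before) + bytearray(gap_size) + bytearray(after)
        self.gap_pos = len(before) + 1  # one-based byte index of the gap
        self.gap_size = gap_size

    def membpos(self, bytebpos):
        """Memory index for a byte index (beginning-of-gap convention)."""
        if bytebpos <= self.gap_pos:
            return bytebpos
        return bytebpos + self.gap_size

    def byte_at(self, bytebpos):
        """Byte "at" (i.e. following) a position, skipping over the gap."""
        if bytebpos >= self.gap_pos:
            # At the gap position the character lives after the gap, so the
            # end-of-gap memory index is needed here.
            mem = bytebpos + self.gap_size
        else:
            mem = bytebpos
        return self.storage[mem - 1]  # start + memory_index - 1


buf = GapBuffer(b"Hello", b"World", gap_size=3)
```

Here `buf.membpos(6)` stays at 6 (the beginning of the gap), while `buf.byte_at(6)` has to look three bytes further on to find the `W`; this is the exception mentioned for characters at the gap position.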
-@node Other Typedefs, Usage of the Various Representations, Buffer Positions, Byte/Character Types; Buffer Positions; Other Typedefs
-@subsection Other Typedefs
-@cindex other typedefs
-Charcount:
-----------
-This typedef represents a count of characters, such as
-a character offset into a string or the number of
-characters between two positions in a buffer.  The
-difference between two Charbpos's is a Charcount, and
-character positions in a string are represented using
-a Charcount.
-Textcount:
-----------
-#### Document me.
-Bytecount:
-----------
-Similar to a Charcount but represents a count of bytes.
-The difference between two Bytebpos's is a Bytecount.
-@node Usage of the Various Representations, Working With the Various Representations, Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs
-@subsection Usage of the Various Representations
-@cindex usage of the various representations
-Memory indices are used in low-level functions in insdel.c and for
-extent endpoints and marker positions.  The reason for this is that
-this way, the extents and markers don't need to be updated for most
-insertions, which merely shrink the gap and don't move any
-characters around in memory.
-(The beginning-of-gap memory index simplifies insertions w.r.t.
-markers, because text usually gets inserted after markers.  For
-extents, it is merely for consistency, because text can get
-inserted either before or after an extent's endpoint depending on
-the open/closedness of the endpoint.)
-Byte indices are used in other code that needs to be fast,
-such as the searching, redisplay, and extent-manipulation code.
-Buffer positions are used in all other code.  This is because this
-representation is easiest to work with (especially since Lisp
-code always uses buffer positions), necessitates the fewest
-changes to existing code, and is the safest (e.g. if the text gets
-shifted underneath a buffer position, it will still point to a
-character; if text is shifted under a byte index, it might point
-to the middle of a character, which would be bad).
-Similarly, Charcounts are used in all code that deals with strings
-except for code that needs to be fast, which uses Bytecounts.
-Strings are always passed around internally using internal format.
-Conversions to and from external format are performed at the time
-that the data goes in or out of Emacs.
-@node Working With the Various Representations,  , Usage of the Various Representations, Byte/Character Types; Buffer Positions; Other Typedefs
-@subsection Working With the Various Representations
-@cindex working with the various representations
-We write things this way because it's very important that
-MAX_BYTEBPOS_GAP_SIZE_3 is a multiple of 3. (As it happens,
-65535 is a multiple of 3, but this may not always be the
-case. #### unfinished
-@node Internal Text API's, Coding for Mule, Byte/Character Types; Buffer Positions; Other Typedefs, Multilingual Support
-@section Internal Text API's
-@cindex internal text API's
-@cindex text API's, internal
-@cindex API's, text, internal
-@strong{NOTE}: The most current documentation for these API's is in
-@file{text.h}.  In case of error, assume that file is correct and this
-one wrong.
-@menu
-* Basic internal-format API's::
-* The DFC API::
-* The Eistring API::
-@end menu
-@node Basic internal-format API's, The DFC API, Internal Text API's, Internal Text API's
-@subsection Basic internal-format API's
-@cindex basic internal-format API's
-@cindex internal-format API's, basic
-@cindex API's, basic internal-format
-These are simple functions and macros to convert between text
-representation and characters, move forward and back in text, etc.
-#### Finish the rest of this.
-Use the following functions/macros on contiguous text in any of the
-internal formats.  Those that take a format arg work on all internal
-formats; the others work only on the default (variable-width under Mule)
-format.  If the text you're operating on is known to come from a buffer,
-use the buffer-level functions in buffer.h, which automatically know the
-correct format and handle the gap.
-Some terminology:
-"itext" appearing in the macros means "internal-format text" -- type
-@code{Ibyte *}.  Operations on such pointers themselves, rather than on the
-text being pointed to, have "itextptr" instead of "itext" in the macro
-name.  "ichar" in the macro names means an Ichar -- the representation
-of a character as a single integer rather than a series of bytes, as part
-of "itext".  Many of the macros below are for converting between the
-two representations of characters.
-Note also that we try to consistently distinguish between an "Ichar" and
-a Lisp character.  Stuff working with Lisp characters often just says
-"char", so we consistently use "Ichar" when that's what we're working
-with.
-@node The DFC API, The Eistring API, Basic internal-format API's, Internal Text API's
-@subsection The DFC API
-@cindex DFC API
-@cindex API, DFC
-This is for conversion between internal and external text.  Note that
-there is also the "new DFC" API, which @strong{returns} a pointer to the
-converted text (in alloca space), rather than storing it into a
-variable.
-The macros below are used for converting data between different formats.
-Generally, the data is textual, and the formats are related to
-internationalization (e.g. converting between internal-format text and
-UTF-8) -- but the mechanism is general, and could be used for anything,
-e.g. decoding gzipped data.
-In general, conversion involves a source of data, a sink, the existing
-format of the source data, and the desired format of the sink.  The
-macros below, however, always require that either the source or sink is
-internal-format text.  Therefore, in practice the conversions below
-involve source, sink, an external format (specified by a coding system),
-and the direction of conversion (internal->external or vice-versa).
-Sources and sinks can be raw data (sized or unsized -- when unsized,
-input data is assumed to be null-terminated [double null-terminated for
-Unicode-format data], and on output the length is not stored anywhere),
-Lisp strings, Lisp buffers, lstreams, and opaque data objects.  When the
-output is raw data, the result can be allocated either with @code{alloca()} or
-@code{malloc()}. (There is currently no provision for writing into a fixed
-buffer.  If you want this, use @code{alloca()} output and then copy the data --
-but be careful with the size!  Unless you are very sure of the encoding
-being used, upper bounds for the size are not in general computable.)
-The obvious restrictions on source and sink types apply (e.g. Lisp
-strings are a source and sink only for internal data).
-All raw data outputted will contain an extra null byte (two bytes for
-Unicode -- currently, in fact, all output data, whether internal or
-external, is double-null-terminated, but you can't count on this; see
-below).  This means that enough space is allocated to contain the extra
-nulls; however, these nulls are not reflected in the returned output
-size.
-The most basic macros are TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT.
-These can be used to convert between any kinds of sources or sinks.
-However, 99% of conversions involve raw data or Lisp strings as both
-source and sink, and usually data is output as @code{alloca()} rather than
-@code{malloc()}.  For this reason, convenience macros are defined for many types
-of conversions involving raw data and/or Lisp strings, especially when
-the output is an @code{alloca()}ed string. (When the destination is a
-Lisp_String, there are other functions that should be used instead --
-@code{build_ext_string()} and @code{make_ext_string()}, for example.) The convenience
-macros are of two types -- the older kind that store the result into a
-specified variable, and the newer kind that return the result.  The newer
-kind of macros don't exist when the output is sized data, because that
-would have two return values.  NOTE: All convenience macros are
-ultimately defined in terms of TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT.
-Thus, any comments below about the workings of these macros also apply to
-all convenience macros.
-@example
-TO_EXTERNAL_FORMAT (source_type, source, sink_type, sink, codesys)
-TO_INTERNAL_FORMAT (source_type, source, sink_type, sink, codesys)
-@end example
-Typical use is
-@example
-TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
-@end example
-which means that the contents of the lisp string @var{str} are written
-to a malloc'ed memory area which will be pointed to by @var{ptr}, after the
-function returns.  The conversion will be done using the @code{file-name}
-coding system (which will be controlled by the user indirectly by
-setting or binding the variable @code{file-name-coding-system}).
-Some sources and sinks require two C variables to specify.  We use
-some preprocessor magic to allow different source and sink types, and
-even different numbers of arguments to specify different types of
-sources and sinks.
-So we can have a call that looks like
-@example
-TO_INTERNAL_FORMAT (DATA, (ptr, len),
-MALLOC, (ptr, len),
-coding_system);
-@end example
-The parenthesized argument pairs are required to make the
-preprocessor magic work.
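-As a rough illustration of how the preprocessor can take apart a
-parenthesized pair, here is a minimal, self-contained sketch.  It is not
-the XEmacs implementation: @code{MAKE_SPAN}, @code{PAIR_FIRST},
-@code{PAIR_SECOND}, @code{struct span} and @code{make_span()} are
-hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a data source: pointer plus length. */
struct span
{
  const char *ptr;
  size_t len;
};

static struct span
make_span (const char *ptr, size_t len)
{
  struct span s;
  assert (ptr != NULL);
  s.ptr = ptr;
  s.len = len;
  return s;
}

/* When VAL is the token sequence "(a, b)", the text "PAIR_FIRST VAL"
   rescans as the macro call PAIR_FIRST (a, b). */
#define PAIR_FIRST(a, b) (a)
#define PAIR_SECOND(a, b) (b)

/* One helper per source type; the DATA helper expects a pair. */
#define SPAN_FROM_DATA(val) make_span (PAIR_FIRST val, PAIR_SECOND val)
#define SPAN_FROM_C_STRING(val) make_span ((val), strlen (val))

/* Paste the type name onto a prefix to select the right helper. */
#define MAKE_SPAN(type, val) SPAN_FROM_##type (val)
```

-With these definitions, @code{MAKE_SPAN (DATA, (buf, 3))} and
-@code{MAKE_SPAN (C_STRING, buf)} both expand to a call to
-@code{make_span()}.  Without the extra parentheses, the comma inside the
-pair would be read as separating two macro arguments.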
-NOTE: GC is inhibited during the entire operation of these macros.  This
-is because frequently the data to be converted comes from strings but
-gets passed in as just DATA, and GC may move around the string data.  If
-we didn't inhibit GC, there'd have to be a lot of messy recoding,
-alloca-copying of strings and other annoying stuff.
-The source or sink can be specified in one of these ways:
-@example
-DATA,   (ptr, len),    // input data is a fixed buffer of size len
-ALLOCA, (ptr, len),    // output data is in an @code{ALLOCA()}ed buffer of size len
-MALLOC, (ptr, len),    // output data is in a @code{malloc()}ed buffer of size len
-C_STRING_ALLOCA, ptr,  // equivalent to ALLOCA (ptr, len_ignored) on output
-C_STRING_MALLOC, ptr,  // equivalent to MALLOC (ptr, len_ignored) on output
-C_STRING,     ptr,     // equivalent to DATA, (ptr, strlen/wcslen (ptr))
-// on input (the Unicode version is used when correct)
-LISP_STRING,  string,  // input or output is a Lisp_Object of type string
-LISP_BUFFER,  buffer,  // output is written to (point) in lisp buffer
-LISP_LSTREAM, lstream, // input or output is a Lisp_Object of type lstream
-LISP_OPAQUE,  object,  // input or output is a Lisp_Object of type opaque
-@end example
-When specifying the sink, use lvalues, since the macro will assign to them,
-except when the sink is an lstream or a lisp buffer.
-For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the resulting text is
-stored in a stack-allocated buffer, which is automatically freed on
-returning from the function.  However, the sink types @code{MALLOC} and
-@code{C_STRING_MALLOC} return @code{xmalloc()}ed memory.  The caller is responsible
-for freeing this memory using @code{xfree()}.
-The macros accept the kinds of sources and sinks appropriate for
-internal and external data representation.  See the type_checking_assert
-macros below for the actual allowed types.
-Since some sources and sinks use one argument (a Lisp_Object) to
-specify them, while others take a (pointer, length) pair, we use
-some C preprocessor trickery to allow pair arguments to be specified
-by parenthesizing them, as in the examples above.
-Anything prefixed by dfc_ (`data format conversion') is private.
-They are only used to implement these macros.
-[[Using C_STRING* is appropriate for using with external APIs that
-take null-terminated strings.  For internal data, we should try to
-be '\0'-clean - i.e. allow arbitrary data to contain embedded '\0'.
-Sometime in the future we might allow output to C_STRING_ALLOCA or
-C_STRING_MALLOC _only_ with @code{TO_EXTERNAL_FORMAT()}, not
-@code{TO_INTERNAL_FORMAT()}.]]
-The above comments are not true.  Frequently (most of the time, in
-fact), external strings come as zero-terminated entities, where the
-zero-termination is the only way to find out the length.  Even in
-cases where you can get the length, most of the time the system will
-still use the null to signal the end of the string, and there will
-still be no way to either send in or receive a string with embedded
-nulls.  In such situations, it's pointless to track the length
-because null bytes can never be in the string.  We have a lot of
-operations that make it easy to operate on zero-terminated strings,
-and forcing the user to deal with the length everywhere would only
-make the code uglier and more complicated, for no gain. --ben
-There is no problem using the same lvalue for source and sink.
-Also, when pointers are required, the code (currently at least) is
-lax and allows any pointer types, either in the source or the sink.
-This makes it possible, e.g., to deal with internal format data held
-in char *'s or external format data held in WCHAR * (i.e. Unicode).
-Finally, whenever storage allocation is called for, extra space is
-allocated for a terminating zero, and such a zero is stored in the
-appropriate place, regardless of whether the source data was
-specified using a length or was specified as zero-terminated.  This
-allows you to freely pass the resulting data, no matter how
-obtained, to a routine that expects zero termination (modulo, of
-course, that any embedded zeros in the resulting text will cause
-truncation).  In fact, currently two terminating zeros are allocated
-and stored after the data result.  This is to allow for the
-possibility of storing a Unicode value on output, which needs the
-two zeros.  Currently, however, the two zeros are stored regardless
-of whether the conversion is internal or external and regardless of
-whether the external coding system is in fact Unicode.  This
-behavior may change in the future, and you cannot rely on this --
-the most you can rely on is that sink data in Unicode format will
-have two terminating nulls, which combine to form one Unicode null
-character.
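-The termination convention described above can be sketched in plain C.
-This is only an illustration under stated assumptions, not the DFC code:
-@code{copy_with_terminator()} is a hypothetical helper that mimics a
-@code{malloc()}ed sink by appending two zero bytes that are not counted
-in the reported length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy LEN bytes from SRC into freshly malloc()ed storage and append two
   zero bytes, enough to terminate 8-bit or 16-bit text.  The length
   stored through LEN_OUT (if non-NULL) excludes the terminator.
   Returns NULL if allocation fails; the caller frees the result. */
static unsigned char *
copy_with_terminator (const void *src, size_t len, size_t *len_out)
{
  unsigned char *dst;

  assert (src != NULL || len == 0);
  dst = malloc (len + 2);
  if (dst == NULL)
    return NULL;
  if (len > 0)
    memcpy (dst, src, len);
  dst[len] = 0;
  dst[len + 1] = 0;
  if (len_out != NULL)
    *len_out = len;
  return dst;
}
```

-If the copied data contains an embedded zero, the reported length still
-covers all of it, but a routine that relies on zero termination, such as
-@code{strlen()}, stops at the first zero.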
-NOTE: You might ask, why are these not written as functions that
-@strong{RETURN} the converted string, since that would allow them to be used
-much more conveniently, without having to constantly declare temporary
-variables?  The answer is that in fact I originally did write the
-routines that way, but that required either
-@itemize @bullet
-@item
-(a) calling @code{alloca()} inside of a function call, or
-@item
-(b) using expressions separated by commas and a global temporary variable, or
-@item
-(c) using the GCC extension (@{ ... @}).
-@end itemize
-Turned out that all of the above had bugs, all caused by GCC (hence the
-comments about "those GCC wankers" and "ream gcc up the ass").  As for
-(a), some versions of GCC (especially on Intel platforms) had
-buggy implementations of @code{alloca()} that couldn't handle being called
-inside of a function call -- they just decremented the stack right in the
-middle of pushing args.  Oops, crash with stack trashing, very bad.  (b)
-was an attempt to fix (a), and that led to further GCC crashes, esp. when
-you had two such calls in a single subexpression, because GCC couldn't be
-counted upon to follow even a minimally reasonable order of execution.
-True, you can't count on one argument being evaluated before another, but
-GCC would actually interleave them so that the temp var got stomped on by
-one while the other was accessing it.  So I tried (c), which was
-problematic because that GCC extension has more bugs in it than a
-termite's nest.
-So reluctantly I converted to the current way.  Now, that was a while ago
-(c. 1994), and it appears that the bug involving alloca in function calls
-has long since been fixed.  More recently, I defined the new-dfc routines
-down below, which DO allow exactly such convenience of returning your
-args rather than store them in temp variables, and I also wrote a
-configure check to see whether @code{alloca()} causes crashes inside of function
-calls, and if so use the portable @code{alloca()} implementation in alloca.c.
-If you define TEST_NEW_DFC, the old routines get written in terms of the
-new ones, and I've had a beta put out with this on and it appeared to
-cause no problems -- so we should consider
-switching, and feel no compunctions about writing further such function-
-like @code{alloca()} routines in lieu of statement-like ones. --ben
-@node The Eistring API,  , The DFC API, Internal Text API's
-@subsection The Eistring API
-@cindex Eistring API
-@cindex API, Eistring
-(This API is currently under-used.) When doing simple things with
-internal text, the basic internal-format API's are enough.  But to do
-things like delete or replace a substring, concatenate various strings,
-etc. is difficult to do cleanly because of the allocation issues.
-The Eistring API is designed to deal with this, and provides a clean
-way of modifying and building up internal text. (Note that the former
-lack of this API has meant that some code uses Lisp strings to do
-similar manipulations, resulting in excess garbage and increased
-garbage collection.)
-NOTE: The Eistring API is (or should be) Mule-correct even without
-an ASCII-compatible internal representation.
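-To make the allocation issue concrete, here is a minimal sketch of a
-string buffer that tracks its own length, grows on demand, and still
-keeps a terminating zero.  It is not the Eistring implementation:
-@code{Sbuf}, @code{sbuf_init()}, @code{sbuf_cat_raw()} and
-@code{sbuf_free()} are hypothetical names, and only a heap-allocated
-variety is shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap-allocated string object. */
typedef struct
{
  unsigned char *data;  /* always followed by a terminating zero */
  size_t len;           /* length in bytes, excluding the terminator */
} Sbuf;

/* Initialize S to the empty string. */
static void
sbuf_init (Sbuf *s)
{
  s->data = malloc (1);
  if (s->data == NULL)
    abort ();
  s->data[0] = '\0';
  s->len = 0;
}

/* Append LEN bytes to S.  The bytes may themselves contain zeros,
   because the length is tracked explicitly. */
static void
sbuf_cat_raw (Sbuf *s, const void *bytes, size_t len)
{
  unsigned char *grown;

  assert (bytes != NULL || len == 0);
  grown = realloc (s->data, s->len + len + 1);
  if (grown == NULL)
    abort ();
  if (len > 0)
    memcpy (grown + s->len, bytes, len);
  s->data = grown;
  s->len += len;
  s->data[s->len] = '\0';
}

/* Release the storage held by S. */
static void
sbuf_free (Sbuf *s)
{
  free (s->data);
  s->data = NULL;
  s->len = 0;
}
```

-The caller never computes a buffer size by hand, which is the property
-that makes substring replacement and concatenation awkward with bare
-pointers.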
-@example
-#### NOTE: This is a work in progress.  Neither the API nor especially
-the implementation is finished.
-NOTE: An Eistring is a structure that makes it easy to work with
-internally-formatted strings of data.  It provides operations similar
-in feel to the standard @code{strcpy()}, @code{strcat()}, @code{strlen()}, etc., but
-(a) it is Mule-correct
-(b) it does dynamic allocation so you never have to worry about size
-restrictions
-(c) it comes in an @code{ALLOCA()} variety (all allocation is stack-local,
-so there is no need to explicitly clean up) as well as a @code{malloc()}
-variety
-(d) it knows its own length, so it does not suffer from standard null
-byte brain-damage -- but it null-terminates the data anyway, so
-it can be passed to standard routines
-(e) it provides a much more powerful set of operations and knows about
-all the standard places where string data might reside: Lisp_Objects,
-other Eistrings, Ibyte * data with or without an explicit length,
-ASCII strings, Ichars, etc.
-(f) it provides easy operations to convert to/from externally-formatted
-data, and is easier to use than the standard TO_INTERNAL_FORMAT
-and TO_EXTERNAL_FORMAT macros. (An Eistring can store both the internal
-and external version of its data, but the external version is only
-initialized or changed when you call @code{eito_external()}.)
-The idea is to make it as easy to write Mule-correct string manipulation
-code as it is to write normal string manipulation code.  We also make
-the API sufficiently general that it can handle multiple internal data
-formats (e.g. some fixed-width optimizing formats and a default variable
-width format) and allows for @strong{ANY} data format we might choose in the
-future for the default format, including UCS2. (In other words, we can't
-assume that the internal format is ASCII-compatible and we can't assume
-it doesn't have embedded null bytes.  We do assume, however, that any
-chosen format will have the concept of null-termination.) All of this is
-hidden from the user.
-#### It is really too bad that we don't have a real object-oriented
-language, or at least a language with polymorphism!
-**********************************************
-*                 Declaration                *
-**********************************************
-To declare an Eistring, either put one of the following in the local
-variable section:
-DECLARE_EISTRING (name);
-Declare a new Eistring and initialize it to the empty string.  This
-is a standard local variable declaration and can go anywhere in the
-variable declaration section.  NAME itself is declared as an
-Eistring *, and its storage declared on the stack.
-DECLARE_EISTRING_MALLOC (name);
-Declare and initialize a new Eistring, which uses @code{malloc()}ed
-instead of @code{ALLOCA()}ed data.  This is a standard local variable
-declaration and can go anywhere in the variable declaration
-section.  Once you initialize the Eistring, you will have to free
-it using @code{eifree()} to avoid memory leaks.  You will need to use this
-form if you are passing an Eistring to any function that modifies
-it (otherwise, the modified data may be in stack space and get
-overwritten when the function returns).
-or use
-Eistring ei;
-void eiinit (Eistring *ei);
-void eiinit_malloc (Eistring *einame);
-If you need to put an Eistring elsewhere than in a local variable
-declaration (e.g. in a structure), declare it as shown and then
-call one of the init macros.
-Also note:
-void eifree (Eistring *ei);
-If you declared an Eistring to use @code{malloc()} to hold its data,
-or converted it to the heap using @code{eito_malloc()}, then this
-releases any data in it and afterwards resets the Eistring
-using @code{eiinit_malloc()}.  Otherwise, it just resets the Eistring
-using @code{eiinit()}.
-**********************************************
-*                 Conventions                *
-**********************************************
-- The names of the functions have been chosen, where possible, to
-match the names of @code{str*()} functions in the standard C API.
--
-**********************************************
-*               Initialization               *
-**********************************************
-void eireset (Eistring *eistr);
-Initialize the Eistring to the empty string.
-void eicpy_* (Eistring *eistr, ...);
-Initialize the Eistring from somewhere:
-void eicpy_ei (Eistring *eistr, Eistring *eistr2);
-... from another Eistring.
-void eicpy_lstr (Eistring *eistr, Lisp_Object lisp_string);
-... from a Lisp_Object string.
-void eicpy_ch (Eistring *eistr, Ichar ch);
-... from an Ichar (this can be a conventional C character).
-void eicpy_lstr_off (Eistring *eistr, Lisp_Object lisp_string,
-Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen);
-... from a section of a Lisp_Object string.
-void eicpy_lbuf (Eistring *eistr, Lisp_Object lisp_buf,
-	    Bytecount off, Charcount charoff,
-	    Bytecount len, Charcount charlen);
-... from a section of a Lisp_Object buffer.
-void eicpy_raw (Eistring *eistr, const Ibyte *data, Bytecount len);
-... from raw internal-format data in the default internal format.
-void eicpy_rawz (Eistring *eistr, const Ibyte *data);
-... from raw internal-format data in the default internal format
-that is "null-terminated" (the meaning of this depends on the nature
-of the default internal format).
-void eicpy_raw_fmt (Eistring *eistr, const Ibyte *data, Bytecount len,
-Internal_Format intfmt, Lisp_Object object);
-... from raw internal-format data in the specified format.
-void eicpy_rawz_fmt (Eistring *eistr, const Ibyte *data,
-Internal_Format intfmt, Lisp_Object object);
-... from raw internal-format data in the specified format that is
-"null-terminated" (the meaning of this depends on the nature of
-the specific format).
-void eicpy_c (Eistring *eistr, const Ascbyte *c_string);
-... from an ASCII null-terminated string.  Non-ASCII characters in
-the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
-void eicpy_c_len (Eistring *eistr, const Ascbyte *c_string, len);
-... from an ASCII string, with length specified.  Non-ASCII characters
-in the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
-void eicpy_ext (Eistring *eistr, const Extbyte *extdata,
-Lisp_Object codesys);
-... from external null-terminated data, with coding system specified.
-void eicpy_ext_len (Eistring *eistr, const Extbyte *extdata,
-Bytecount extlen, Lisp_Object codesys);
-... from external data, with length and coding system specified.
-void eicpy_lstream (Eistring *eistr, Lisp_Object lstream);
-... from an lstream; reads data till eof.  Data must be in default
-internal format; otherwise, interpose a decoding lstream.
-**********************************************
-*    Getting the data out of the Eistring    *
-**********************************************
-Ibyte *eidata (Eistring *eistr);
-Return a pointer to the raw data in an Eistring.  This is NOT
-a copy.
-Lisp_Object eimake_string (Eistring *eistr);
-Make a Lisp string out of the Eistring.
-Lisp_Object eimake_string_off (Eistring *eistr,
-Bytecount off, Charcount charoff,
-			  Bytecount len, Charcount charlen);
-Make a Lisp string out of a section of the Eistring.
-void eicpyout_alloca (Eistring *eistr, LVALUE: Ibyte *ptr_out,
-LVALUE: Bytecount len_out);
-Make an @code{ALLOCA()} copy of the data in the Eistring, using the
-default internal format.  Due to the nature of @code{ALLOCA()}, this
-must be a macro, with all lvalues passed in as parameters.
-(More specifically, not all compilers correctly handle using
-@code{ALLOCA()} as the argument to a function call -- GCC on x86
-didn't used to, for example.) A pointer to the @code{ALLOCA()}ed data
-is stored in PTR_OUT, and the length of the data (not including
-the terminating zero) is stored in LEN_OUT.
-void eicpyout_alloca_fmt (Eistring *eistr, LVALUE: Ibyte *ptr_out,
-LVALUE: Bytecount len_out,
-Internal_Format intfmt, Lisp_Object object);
-Like @code{eicpyout_alloca()}, but converts to the specified internal
-format. (No formats other than FORMAT_DEFAULT are currently
-implemented, and you get an assertion failure if you try.)
-Ibyte *eicpyout_malloc (Eistring *eistr, Bytecount *intlen_out);
-Make a @code{malloc()} copy of the data in the Eistring, using the
-default internal format.  This is a real function.  No lvalues
-passed in.  Returns the new data, and stores the length (not
-including the terminating zero) using INTLEN_OUT, unless it's
-a NULL pointer.
-Ibyte *eicpyout_malloc_fmt (Eistring *eistr, Internal_Format intfmt,
-Bytecount *intlen_out, Lisp_Object object);
-Like @code{eicpyout_malloc()}, but converts to the specified internal
-format. (No formats other than FORMAT_DEFAULT are currently
-implemented, and you get an assertion failure if you try.)
-**********************************************
-*             Moving to the heap             *
-**********************************************
-void eito_malloc (Eistring *eistr);
-Move this Eistring to the heap.  Its data will be stored in a
-@code{malloc()}ed block rather than the stack.  Subsequent changes to
-this Eistring will @code{realloc()} the block as necessary.  Use this
-when you want the Eistring to remain in scope past the end of
-this function call.  You will have to manually free the data
-in the Eistring using @code{eifree()}.
-void eito_alloca (Eistring *eistr);
-Move this Eistring back to the stack, if it was moved to the
-heap with @code{eito_malloc()}.  This will automatically free any
-heap-allocated data.
-**********************************************
-*            Retrieving the length           *
-**********************************************
-Bytecount eilen (Eistring *eistr);
-Return the length of the internal data, in bytes.  See also
-@code{eiextlen()}, below.
-Charcount eicharlen (Eistring *eistr);
-Return the length of the internal data, in characters.
-**********************************************
-*           Working with positions           *
-**********************************************
-Bytecount eicharpos_to_bytepos (Eistring *eistr, Charcount charpos);
-Convert a char offset to a byte offset.
-Charcount eibytepos_to_charpos (Eistring *eistr, Bytecount bytepos);
-Convert a byte offset to a char offset.
-Bytecount eiincpos (Eistring *eistr, Bytecount bytepos);
-Increment the given position by one character.
-Bytecount eiincpos_n (Eistring *eistr, Bytecount bytepos, Charcount n);
-Increment the given position by N characters.
-Bytecount eidecpos (Eistring *eistr, Bytecount bytepos);
-Decrement the given position by one character.
-Bytecount eidecpos_n (Eistring *eistr, Bytecount bytepos, Charcount n);
-Decrement the given position by N characters.
-**********************************************
-*    Getting the character at a position     *
-**********************************************
-Ichar eigetch (Eistring *eistr, Bytecount bytepos);
-Return the character at a particular byte offset.
-Ichar eigetch_char (Eistring *eistr, Charcount charpos);
-Return the character at a particular character offset.
-**********************************************
-*    Setting the character at a position     *
-**********************************************
-Ichar eisetch (Eistring *eistr, Bytecount bytepos, Ichar chr);
-Set the character at a particular byte offset.
-Ichar eisetch_char (Eistring *eistr, Charcount charpos, Ichar chr);
-Set the character at a particular character offset.
-**********************************************
-*               Concatenation                *
-**********************************************
-void eicat_* (Eistring *eistr, ...);
-Concatenate onto the end of the Eistring, with data coming from the
-same places as above:
-void eicat_ei (Eistring *eistr, Eistring *eistr2);
-... from another Eistring.
-void eicat_c (Eistring *eistr, Ascbyte *c_string);
-... from an ASCII null-terminated string.  Non-ASCII characters in
-the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
-void eicat_raw (ei, const Ibyte *data, Bytecount len);
-... from raw internal-format data in the default internal format.
-void eicat_rawz (ei, const Ibyte *data);
-... from raw internal-format data in the default internal format
-that is "null-terminated" (the meaning of this depends on the nature
-of the default internal format).
-void eicat_lstr (ei, Lisp_Object lisp_string);
-... from a Lisp_Object string.
-void eicat_ch (ei, Ichar ch);
-... from an Ichar.
-(All except the first variety are convenience functions.
-In the general case, create another Eistring from the source.)
-**********************************************
-*                Replacement                 *
-**********************************************
-void eisub_* (Eistring *eistr, Bytecount off, Charcount charoff,
-			  Bytecount len, Charcount charlen, ...);
-Replace a section of the Eistring, specifically:
-void eisub_ei (Eistring *eistr, Bytecount off, Charcount charoff,
-	  Bytecount len, Charcount charlen, Eistring *eistr2);
-... with another Eistring.
-void eisub_c (Eistring *eistr, Bytecount off, Charcount charoff,
-	 Bytecount len, Charcount charlen, Ascbyte *c_string);
-... with an ASCII null-terminated string.  Non-ASCII characters in
-the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
-void eisub_ch (Eistring *eistr, Bytecount off, Charcount charoff,
-	  Bytecount len, Charcount charlen, Ichar ch);
-... with an Ichar.
-void eidel (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen);
-Delete a section of the Eistring.
-**********************************************
-*      Converting to an external format      *
-**********************************************
-void eito_external (Eistring *eistr, Lisp_Object codesys);
-Convert the Eistring to an external format and store the result
-in the string.  NOTE: Further changes to the Eistring will @strong{NOT}
-change the external data stored in the string.  You will have to
-call @code{eito_external()} again in such a case if you want the external
-data.
-Extbyte *eiextdata (Eistring *eistr);
-Return a pointer to the external data stored in the Eistring as
-a result of a prior call to @code{eito_external()}.
-Bytecount eiextlen (Eistring *eistr);
-Return the length in bytes of the external data stored in the
-Eistring as a result of a prior call to @code{eito_external()}.
-**********************************************
-* Searching in the Eistring for a character  *
-**********************************************
-Bytecount eichr (Eistring *eistr, Ichar chr);
-Charcount eichr_char (Eistring *eistr, Ichar chr);
-Bytecount eichr_off (Eistring *eistr, Ichar chr, Bytecount off,
-		Charcount charoff);
-Charcount eichr_off_char (Eistring *eistr, Ichar chr, Bytecount off,
-		     Charcount charoff);
-Bytecount eirchr (Eistring *eistr, Ichar chr);
-Charcount eirchr_char (Eistring *eistr, Ichar chr);
-Bytecount eirchr_off (Eistring *eistr, Ichar chr, Bytecount off,
-		 Charcount charoff);
-Charcount eirchr_off_char (Eistring *eistr, Ichar chr, Bytecount off,
-		      Charcount charoff);
-**********************************************
-*   Searching in the Eistring for a string   *
-**********************************************
-Bytecount eistr_ei (Eistring *eistr, Eistring *eistr2);
-Charcount eistr_ei_char (Eistring *eistr, Eistring *eistr2);
-Bytecount eistr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off,
-		   Charcount charoff);
-Charcount eistr_ei_off_char (Eistring *eistr, Eistring *eistr2,
-			Bytecount off, Charcount charoff);
-Bytecount eirstr_ei (Eistring *eistr, Eistring *eistr2);
-Charcount eirstr_ei_char (Eistring *eistr, Eistring *eistr2);
-Bytecount eirstr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off,
-		    Charcount charoff);
-Charcount eirstr_ei_off_char (Eistring *eistr, Eistring *eistr2,
-			 Bytecount off, Charcount charoff);
-Bytecount eistr_c (Eistring *eistr, Ascbyte *c_string);
-Charcount eistr_c_char (Eistring *eistr, Ascbyte *c_string);
-Bytecount eistr_c_off (Eistring *eistr, Ascbyte *c_string, Bytecount off,
-		   Charcount charoff);
-Charcount eistr_c_off_char (Eistring *eistr, Ascbyte *c_string,
-		       Bytecount off, Charcount charoff);
-Bytecount eirstr_c (Eistring *eistr, Ascbyte *c_string);
-Charcount eirstr_c_char (Eistring *eistr, Ascbyte *c_string);
-Bytecount eirstr_c_off (Eistring *eistr, Ascbyte *c_string,
-		   Bytecount off, Charcount charoff);
-Charcount eirstr_c_off_char (Eistring *eistr, Ascbyte *c_string,
-			Bytecount off, Charcount charoff);
-**********************************************
-*                 Comparison                 *
-**********************************************
-int eicmp_* (Eistring *eistr, ...);
-int eicmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen, ...);
-int eicasecmp_* (Eistring *eistr, ...);
-int eicasecmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen, ...);
-int eicasecmp_i18n_* (Eistring *eistr, ...);
-int eicasecmp_i18n_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen, ...);
-Compare the Eistring with the other data.  Return value same as
-from strcmp.  The `*' is either `ei' for another Eistring (in
-which case `...' is an Eistring), or `c' for a pure-ASCII string
-(in which case `...' is a pointer to that string).  For anything
-more complex, first create an Eistring out of the source.
-Comparison is either simple (`eicmp_...'), ASCII case-folding
-(`eicasecmp_...'), or multilingual case-folding
-(`eicasecmp_i18n_...').
-More specifically, the prototypes are:
-int eicmp_ei (Eistring *eistr, Eistring *eistr2);
-int eicmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen, Eistring *eistr2);
-int eicasecmp_ei (Eistring *eistr, Eistring *eistr2);
-int eicasecmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen, Eistring *eistr2);
-int eicasecmp_i18n_ei (Eistring *eistr, Eistring *eistr2);
-int eicasecmp_i18n_off_ei (Eistring *eistr, Bytecount off,
-		      Charcount charoff, Bytecount len,
-		      Charcount charlen, Eistring *eistr2);
-int eicmp_c (Eistring *eistr, Ascbyte *c_string);
-int eicmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen, Ascbyte *c_string);
-int eicasecmp_c (Eistring *eistr, Ascbyte *c_string);
-int eicasecmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen,
-Ascbyte *c_string);
-int eicasecmp_i18n_c (Eistring *eistr, Ascbyte *c_string);
-int eicasecmp_i18n_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
-Bytecount len, Charcount charlen,
-Ascbyte *c_string);
-**********************************************
-*         Case-changing the Eistring         *
-**********************************************
-void eilwr (Eistring *eistr);
-Convert all characters in the Eistring to lowercase.
-void eiupr (Eistring *eistr);
-Convert all characters in the Eistring to uppercase.
-@end example
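-The position operations listed above distinguish byte offsets from
-character offsets.  The following sketch illustrates that distinction,
-assuming UTF-8 as a stand-in for a variable-width internal format.  The
-function names are hypothetical and merely echo @code{eicharlen()},
-@code{eicharpos_to_bytepos()} and @code{eiincpos()}; the real functions
-operate on an Eistring, not on a raw pointer.

```c
#include <assert.h>
#include <stddef.h>

/* In UTF-8, every byte except a continuation byte (10xxxxxx) starts a
   character. */
static int
is_lead_byte (unsigned char b)
{
  return (b & 0xC0) != 0x80;
}

/* Number of characters in the LEN bytes at DATA. */
static size_t
char_length (const unsigned char *data, size_t len)
{
  size_t i, chars = 0;

  for (i = 0; i < len; i++)
    if (is_lead_byte (data[i]))
      chars++;
  return chars;
}

/* Advance BYTEPOS by one character; BYTEPOS must be inside the data. */
static size_t
inc_pos (const unsigned char *data, size_t len, size_t bytepos)
{
  assert (bytepos < len);
  do
    bytepos++;
  while (bytepos < len && !is_lead_byte (data[bytepos]));
  return bytepos;
}

/* Byte offset of the character at 0-based character offset CHARPOS.
   Returns LEN if CHARPOS is at or past the end. */
static size_t
charpos_to_bytepos (const unsigned char *data, size_t len, size_t charpos)
{
  size_t bytepos = 0;

  while (bytepos < len && charpos > 0)
    {
      bytepos = inc_pos (data, len, bytepos);
      charpos--;
    }
  return bytepos;
}
```

-For the four bytes @code{'a'}, @code{0xC3}, @code{0xA9}, @code{'z'}, the
-byte length is 4 but the character length is 3, and the third character
-starts at byte offset 3.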
-@node Coding for Mule, CCL, Internal Text API's, Multilingual Support
-@section Coding for Mule
-@cindex coding for Mule
-@cindex Mule, coding for
-Although Mule support is not compiled by default in XEmacs, many people
-are using it, and we consider it crucial that new code works correctly
-with multibyte characters.  This is not hard; it is only a matter of
-following several simple user-interface guidelines.  Even if you never
-compile with Mule, with a little practice you will find it quite easy
-to code Mule-correctly.
-Note that these guidelines are not necessarily tied to the current Mule
-implementation; they are also a good idea to follow on the grounds of
-code generalization for future I18N work.
-@menu
-* Character-Related Data Types::
-* Working With Character and Byte Positions::
-* Conversion to and from External Data::
-* General Guidelines for Writing Mule-Aware Code::
-* An Example of Mule-Aware Code::
-* Mule-izing Code::
-@end menu
-@node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule
-@subsection Character-Related Data Types
-@cindex character-related data types
-@cindex data types, character-related
-First, let's review the basic character-related datatypes used by
-XEmacs.  Note that some of the separate @code{typedef}s are not
-mandatory, but they improve clarity of code a great deal, because one
-glance at the declaration can tell the intended use of the variable.
-@table @code
-@item Ichar
-@cindex Ichar
-An @code{Ichar} holds a single Emacs character.
-Obviously, the equality between characters and bytes is lost in the Mule
-world.  Characters can be represented by one or more bytes in the
-buffer, and @code{Ichar} is a C type large enough to hold any
-character.  (This currently isn't quite true for ISO 10646, which
-defines a character as a 31-bit non-negative quantity, while XEmacs
-characters are only 30 bits.  This is irrelevant, unless you are
-considering using the ISO 10646 private groups to support really large
-private character sets---in particular, the Mule character set!---in
-a version of XEmacs using Unicode internally.)
-Without Mule support, an @code{Ichar} is equivalent to an
-@code{unsigned char}.  [[This doesn't seem to be true; @file{lisp.h}
-unconditionally @samp{typedef}s @code{Ichar} to @code{int}.]]
-@item Ibyte
-@cindex Ibyte
-The data representing the text in a buffer or string is logically a set
-of @code{Ibyte}s.
-XEmacs does not work with the same character formats all the time; when
-reading characters from the outside, it decodes them to an internal
-format, and likewise encodes them when writing.  @code{Ibyte} (in fact
-@code{unsigned char}) is the basic unit of the internal format used for
-XEmacs buffers and strings.  An @code{Ibyte *} is the type that points at
-text encoded in the variable-width internal encoding.
-One character can correspond to one or more @code{Ibyte}s.  In the
-current Mule implementation, an ASCII character is represented by a
-single @code{Ibyte} of the same value, and other characters are
-represented by a sequence of two or more @code{Ibyte}s.  (This will also
-be true of an
-implementation using UTF-8 as the internal encoding.  In fact, only code
-that implements character code conversions and a very few macros used to
-implement motion by whole characters will notice the difference between
-UTF-8 and the Mule encoding.)
-Without Mule support, there are exactly 256 characters, implicitly
-Latin-1, and each character is represented using one @code{Ibyte}, and
-there is a one-to-one correspondence between @code{Ibyte}s and
-@code{Ichar}s.
-@item Charxpos
-@itemx Charbpos
-@itemx Charcount
-@cindex Charxpos
-@cindex Charbpos
-@cindex Charcount
-A @code{Charbpos} represents a character position in a buffer.  A
-@code{Charcount} represents a number (count) of characters.  Logically,
-subtracting two @code{Charbpos} values yields a @code{Charcount} value.
-When representing a character position in a string, we just use
-@code{Charcount} directly.  The reason for having a separate typedef for
-buffer positions is that they are 1-based, whereas string positions are
-0-based and hence string counts and positions can be freely intermixed (a
-string position is equivalent to the count of characters from the
-beginning).  When representing a character position that could be either
-in a buffer or string (for example, in the extent code), @code{Charxpos}
-is used.  Although all of these are @code{typedef}ed to
-@code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
-it clear what sort of position is being used.
-@code{Charxpos}, @code{Charbpos} and @code{Charcount} values are the
-only ones that are ever visible to Lisp.
-@item Bytexpos
-@itemx Bytebpos
-@itemx Bytecount
-@cindex Bytexpos
-@cindex Bytebpos
-@cindex Bytecount
-A @code{Bytebpos} represents a byte position in a buffer.  A
-@code{Bytecount} represents the distance between two positions, in
-bytes.  Byte positions in strings use @code{Bytecount}, and for byte
-positions that can be either in a buffer or string, @code{Bytexpos} is
-used.  The relationship between @code{Bytexpos}, @code{Bytebpos} and
-@code{Bytecount} is the same as the relationship between
-@code{Charxpos}, @code{Charbpos} and @code{Charcount}.
-@item Extbyte
-@cindex Extbyte
-When dealing with the outside world, XEmacs works with @code{Extbyte}s,
-which are equivalent to @code{char}.  The distance between two
-@code{Extbyte}s is a @code{Bytecount}, since external text is a
-byte-by-byte encoding.  Extbytes occur mainly at the transition point
-between internal text and external functions.  XEmacs code should not,
-if it can possibly avoid it, do any actual manipulation using external
-text, since its format is completely unpredictable (it might not even be
-ASCII-compatible).
-@end table
-@node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule
-@subsection Working With Character and Byte Positions
-@cindex character and byte positions, working with
-@cindex byte positions, working with character and
-@cindex positions, working with character and byte
-Now that we have defined the basic character-related types, we can look
-at the macros and functions designed for working with them and for
-conversion between them.  Most of these macros are defined in
-@file{buffer.h}, and we don't discuss all of them here, but only the
-most important ones.  Examining the existing code is the best way to
-learn about them.
-@table @code
-@item MAX_ICHAR_LEN
-@cindex MAX_ICHAR_LEN
-This preprocessor constant is the maximum number of buffer bytes needed to
-represent an Emacs character in the variable-width internal encoding.
-It is useful when allocating temporary strings to hold a known number of
-characters.  For instance:
-@example
-@group
-@{
-Charcount cclen;
-...
-@{
-/* Allocate place for @var{cclen} characters. */
-Ibyte *buf = (Ibyte *) alloca (cclen * MAX_ICHAR_LEN);
-...
-@end group
-@end example
-If you followed the previous section, you can guess that, logically,
-multiplying a @code{Charcount} value with @code{MAX_ICHAR_LEN} produces
-a @code{Bytecount} value.
-In the current Mule implementation, @code{MAX_ICHAR_LEN} equals 4.
-Without Mule, it is 1.  In a mature Unicode-based XEmacs, it will also
-be 4 (since all Unicode characters can be encoded in UTF-8 in 4 bytes or
-less), but some versions may use up to 6, in order to use the large
-private space provided by ISO 10646 to ``mirror'' the Mule code space.
-@item itext_ichar
-@itemx set_itext_ichar
-@cindex itext_ichar
-@cindex set_itext_ichar
-The @code{itext_ichar} macro takes an @code{Ibyte} pointer and
-returns the @code{Ichar} stored at that position.  If it were a
-function, its prototype would be:
-@example
-Ichar itext_ichar (Ibyte *p);
-@end example
-@code{set_itext_ichar} stores an @code{Ichar} to the specified byte
-position.  It returns the number of bytes stored:
-@example
-Bytecount set_itext_ichar (Ibyte *p, Ichar c);
-@end example
-It is important to note that @code{set_itext_ichar} is safe only for
-appending a character at the end of a buffer, not for overwriting a
-character in the middle.  This is because the width of characters
-varies, and @code{set_itext_ichar} cannot resize the string if it
-writes, say, a two-byte character where a single-byte character used to
-reside.
-A typical use of @code{set_itext_ichar} can be demonstrated by this
-example, which copies characters from buffer @var{buf} to a temporary
-string of Ibytes.
-@example
-@group
-@{
-Charbpos pos;
-for (pos = beg; pos < end; pos++)
-@{
-Ichar c = BUF_FETCH_CHAR (buf, pos);
-p += set_itext_ichar (p, c);
-@}
-@}
-@end group
-@end example
-Note how @code{set_itext_ichar} is used to store the @code{Ichar}
-and advance the pointer @code{p} at the same time.
-@item INC_IBYTEPTR
-@itemx DEC_IBYTEPTR
-@cindex INC_IBYTEPTR
-@cindex DEC_IBYTEPTR
-These two macros increment and decrement an @code{Ibyte} pointer,
-respectively.  They will adjust the pointer by the appropriate number of
-bytes according to the byte length of the character stored there.  Both
-macros assume that the memory address is located at the beginning of a
-valid character.
-Without Mule support, @code{INC_IBYTEPTR (p)} and @code{DEC_IBYTEPTR (p)}
-simply expand to @code{p++} and @code{p--}, respectively.
-@item bytecount_to_charcount
-@cindex bytecount_to_charcount
-Given a pointer to a text string and a length in bytes, return the
-equivalent length in characters.
-@example
-Charcount bytecount_to_charcount (Ibyte *p, Bytecount bc);
-@end example
-@item charcount_to_bytecount
-@cindex charcount_to_bytecount
-Given a pointer to a text string and a length in characters, return the
-equivalent length in bytes.
-@example
-Bytecount charcount_to_bytecount (Ibyte *p, Charcount cc);
-@end example
-@item itext_n_addr
-@cindex itext_n_addr
-Return a pointer to the beginning of the character offset @var{cc} (in
-characters) from @var{p}.
-@example
-Ibyte *itext_n_addr (Ibyte *p, Charcount cc);
-@end example
-@end table
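The counting and pointer-motion macros above can be illustrated with a small self-contained sketch.  It is not the XEmacs implementation: it uses UTF-8 as a stand-in for the internal encoding, the typedefs merely imitate the XEmacs ones, and every @code{sketch_} name is hypothetical.

```c
#include <stddef.h>

/* Stand-ins for the XEmacs typedefs (assumption: the real ones come
   from the XEmacs headers and may differ in width). */
typedef unsigned char Ibyte;
typedef long Bytecount;
typedef long Charcount;

/* Byte length of the character starting at P, judged from its lead
   byte.  UTF-8 rules; P must point at the start of a valid character. */
Bytecount sketch_char_len (const Ibyte *p)
{
  if (*p < 0x80) return 1;
  if ((*p & 0xE0) == 0xC0) return 2;
  if ((*p & 0xF0) == 0xE0) return 3;
  return 4;
}

/* Analogue of bytecount_to_charcount: walk BC bytes, one whole
   character at a time, and count the characters. */
Charcount sketch_bytecount_to_charcount (const Ibyte *p, Bytecount bc)
{
  const Ibyte *end = p + bc;
  Charcount cc = 0;
  while (p < end)
    {
      p += sketch_char_len (p);   /* the step INC_IBYTEPTR performs */
      cc++;
    }
  return cc;
}

/* Analogue of charcount_to_bytecount: skip CC characters and report
   how many bytes they occupied. */
Bytecount sketch_charcount_to_bytecount (const Ibyte *p, Charcount cc)
{
  const Ibyte *start = p;
  while (cc-- > 0)
    p += sketch_char_len (p);
  return (Bytecount) (p - start);
}

/* Analogue of itext_n_addr: address of the character CC characters
   past P. */
const Ibyte *sketch_n_addr (const Ibyte *p, Charcount cc)
{
  return p + sketch_charcount_to_bytecount (p, cc);
}
```

All three conversions cost time proportional to the length of the text they scan, which is one reason to keep track of whether a value is a byte count or a character count rather than converting back and forth.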
-@node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule
-@subsection Conversion to and from External Data
-@cindex conversion to and from external data
-@cindex external data, conversion to and from
-When an external function, such as a C library function, returns a
-@code{char} pointer, you should almost never treat it as @code{Ibyte}.
-This is because these returned strings may contain 8-bit characters that
-can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
-exporting a piece of internal text to the outside world, you should
-always convert it to an appropriate external encoding, lest the internal
-stuff (such as the infamous \201 characters) leak out.
-Conversion between the internal and external representations of text is
-done through the numerous conversion macros defined in
-@file{buffer.h}.  There used to be a fixed set of external formats
-supported by these macros, but now any coding system can be used with
-them.  The coding system alias mechanism is used to create the
-following logical coding systems, which replace the fixed external
-formats.  The @code{dontusethis-set-symbol-value-handler} mechanism was
-enhanced to make this possible (more work on that is needed).
-Often useful coding systems:
-@table @code
-@item Qbinary
-This is the simplest format and is what we use in the absence of a more
-appropriate format.  This converts according to the @code{binary} coding
-system:
-@enumerate a
-@item
-On input, bytes 0--255 are converted into (implicitly Latin-1)
-characters 0--255.  A non-Mule XEmacs doesn't really know about
-different character sets and the fonts to display them, so the bytes can
-be treated as text in different 1-byte encodings by simply setting the
-appropriate fonts.  So in a sense, non-Mule XEmacs is a multi-lingual
-editor if, for example, different fonts are used to display text in
-different buffers, faces, or windows.  The specifier mechanism gives the
-user complete control over this kind of behavior.
-@item
-On output, characters 0--255 are converted into bytes 0--255 and other
-characters are converted into @samp{~}.
-@end enumerate
-@item Qnative
-Format used for the external Unix environment---@code{argv[]}, stuff
-from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
-This is encoded according to the encoding specified by the current locale.
-[[This is dangerous; current locale is user preference, and the system
-is probably going to be something else.  Is there anything we can do
-about it?]]
-@item Qfile_name
-Format used for filenames.  This is normally the same as @code{Qnative},
-but the two should be distinguished for clarity and possible future
-separation -- and also because @code{Qfile_name} can be changed using either
-the @code{file-name-coding-system} or @code{pathname-coding-system} (now
-obsolete) variables.
-@item Qctext
-Compound-text format.  This is the standard X11 format used for data
-stored in properties, selections, and the like.  This is an 8-bit
-no-lock-shift ISO2022 coding system.  This is a real coding system,
-unlike @code{Qfile_name}, which is user-definable.
-@item Qmswindows_tstr
-Used for external data in all MS Windows functions that are declared to
-accept data of type @code{LPTSTR} or @code{LPCTSTR}.  This maps to either
-@code{Qmswindows_multibyte} (a locale-specific encoding, same as
-@code{Qnative}) or @code{Qmswindows_unicode}, depending on whether
-XEmacs is being run under Windows 9X or Windows NT/2000/XP.
-@end table
-Many other coding systems are provided by default.
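The behavior described for @code{Qbinary} can be sketched in a few lines of C.  This is only an illustration of the mapping rule stated above (bytes 0--255 become characters 0--255 on input; characters outside that range become @samp{~} on output).  The @code{sketch_} functions are hypothetical and operate on plain arrays of character codes rather than on real XEmacs internal text.

```c
#include <stddef.h>

typedef int Ichar;   /* stand-in for the XEmacs character type */

/* Input direction: every external byte becomes the character with the
   same code (implicitly Latin-1). */
void sketch_binary_decode (const unsigned char *src, size_t len, Ichar *dst)
{
  size_t i;
  for (i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Output direction: characters 0..255 map to the same byte; anything
   else cannot be represented and is written as '~'. */
void sketch_binary_encode (const Ichar *src, size_t len, unsigned char *dst)
{
  size_t i;
  for (i = 0; i < len; i++)
    dst[i] = (src[i] >= 0 && src[i] <= 255) ? (unsigned char) src[i] : '~';
}
```

Note that the output direction loses information for characters above 255, so a round trip through this conversion is only faithful for text that was binary (or Latin-1) to begin with.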
-There are two fundamental macros to convert between external and
-internal format, as well as various convenience macros to simplify the
-most common operations.
-@code{TO_INTERNAL_FORMAT} converts external data to internal format, and
-@code{TO_EXTERNAL_FORMAT} converts the other way around.  The arguments
-each of these receives are a source type, a source, a sink type, a sink,
-and a coding system (or a symbol naming a coding system).
-A typical call looks like
-@example
-TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
-@end example
-which means that the contents of the lisp string @code{str} are written
-to a malloc'ed memory area which will be pointed to by @code{ptr}, after
-the function returns.  The conversion will be done using the
-@code{file-name} coding system, which will be controlled by the user
-indirectly by setting or binding the variable
-@code{file-name-coding-system}.
-Some sources and sinks require two C variables to specify.  We use some
-preprocessor magic to allow different source and sink types, and even
-different numbers of arguments to specify different types of sources and
-sinks.
-So we can have a call that looks like
-@example
-TO_INTERNAL_FORMAT (DATA, (ptr, len),
-MALLOC, (ptr, len),
-coding_system);
-@end example
-The parenthesized argument pairs are required to make the preprocessor
-magic work.
-Here are the different source and sink types:
-@table @code
-@item @code{DATA, (ptr, len),}
-input data is a fixed buffer of size @var{len} at address @var{ptr}
-@item @code{ALLOCA, (ptr, len),}
-output data is placed in an @code{alloca()}ed buffer of size @var{len} pointed to by @var{ptr}
-@item @code{MALLOC, (ptr, len),}
-output data is in a @code{malloc()}ed buffer of size @var{len} pointed to by @var{ptr}
-@item @code{C_STRING_ALLOCA, ptr,}
-equivalent to @code{ALLOCA, (ptr, len_ignored)} on output
-@item @code{C_STRING_MALLOC, ptr,}
-equivalent to @code{MALLOC, (ptr, len_ignored)} on output
-@item @code{C_STRING, ptr,}
-equivalent to @code{DATA, (ptr, strlen/wcslen (ptr))} on input
-@item @code{LISP_STRING, string,}
-input or output is a Lisp_Object of type string
-@item @code{LISP_BUFFER, buffer,}
-output is written to @code{(point)} in lisp buffer @var{buffer}
-@item @code{LISP_LSTREAM, lstream,}
-input or output is a Lisp_Object of type lstream
-@item @code{LISP_OPAQUE, object,}
-input or output is a Lisp_Object of type opaque
-@end table
-A source type of @code{C_STRING} or a sink type of
-@code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate where
-the external API is not '\0'-byte-clean -- i.e. it expects strings to be
-terminated with a null byte.  For external API's that are in fact
-'\0'-byte-clean, we should of course not use these.
-The sinks to be specified must be lvalues, unless they are the lisp
-object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
-There is no problem using the same lvalue for source and sink.
-Garbage collection is inhibited during these conversion operations, so
-it is OK to pass in data from Lisp strings using @code{XSTRING_DATA}.
-For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
-resulting text is stored in a stack-allocated buffer, which is
-automatically freed on returning from the function.  However, the sink
-types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
-memory.  The caller is responsible for freeing this memory using
-@code{xfree()}.
-Note that it doesn't make sense for @code{LISP_STRING} to be a source
-for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
-You'll get an assertion failure if you try.
-99% of conversions involve raw data or Lisp strings as both source and
-sink, and the output usually goes into @code{alloca()}ed, or sometimes
-@code{xmalloc()}ed, memory.  For this reason, convenience macros are defined for
-many types of conversions involving raw data and/or Lisp strings,
-especially when the output is an @code{alloca()}ed string. (When the
-destination is a Lisp string, there are other functions that should be
-used instead -- @code{build_ext_string()} and @code{make_ext_string()},
-for example.) The convenience macros are of two types -- the older kind
-that store the result into a specified variable, and the newer kind that
-return the result.  The newer kind of macros don't exist when the output
-is sized data, because that would have two return values.  NOTE: All
-convenience macros are ultimately defined in terms of
-@code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}.  Thus, any
-comments above about the workings of these macros also apply to all
-convenience macros.
-A typical old-style convenience macro is
-@example
-C_STRING_TO_EXTERNAL (in, out, codesys);
-@end example
-This is equivalent to
-@example
-TO_EXTERNAL_FORMAT (C_STRING, in, C_STRING_ALLOCA, out, codesys);
-@end example
-but is easier to write and somewhat clearer, since it clearly identifies
-the arguments without the clutter of having the preprocessor types mixed
-in.
-The new-style equivalent is @code{NEW_C_STRING_TO_EXTERNAL (src,
-codesys)}, which @emph{returns} the converted data (still in
-@code{alloca()} space).  This is far more convenient for most
-operations.
-@node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
-@subsection General Guidelines for Writing Mule-Aware Code
-@cindex writing Mule-aware code, general guidelines for
-@cindex Mule-aware code, general guidelines for writing
-@cindex code, general guidelines for writing Mule-aware
-This section contains some general guidance on how to write Mule-aware
-code, as well as some pitfalls you should avoid.
-@table @emph
-@item Never use @code{char} and @code{char *}.
-In XEmacs, the use of @code{char} and @code{char *} is almost always a
-mistake.  If you want to manipulate an Emacs character from ``C'', use
-@code{Ichar}.  If you want to examine a specific octet in the internal
-format, use @code{Ibyte}.  If you want a Lisp-visible character, use a
-@code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
-through the internal text, use @code{Ibyte *}.  Also note that you
-almost certainly do not need @code{Ichar *}.  Other typedefs to clarify
-the use of @code{char} are @code{Char_ASCII}, @code{Char_Binary},
-@code{UChar_Binary}, and @code{CIbyte}.
-@item Be careful not to confuse @code{Charcount}, @code{Bytecount}, @code{Charbpos} and @code{Bytebpos}.
-The whole point of using different types is to avoid confusion about the
-use of certain variables.  Lest this effect be nullified, you need to be
-careful about using the right types.
-@item Always convert external data
-It is extremely important to always convert external data, because
-XEmacs can crash if unexpected 8-bit sequences are copied to its internal
-buffers literally.
-This means that when a system function, such as @code{readdir}, returns
-a string, you normally need to convert it using one of the conversion macros
-described in the previous section, before passing it further to Lisp.
-Actually, most of the basic system functions that accept '\0'-terminated
-string arguments, like @code{stat()} and @code{open()}, have
-@strong{encapsulated} equivalents that do the internal to external
-conversion themselves.  The encapsulated equivalents have a @code{qxe_}
-prefix and have string arguments of type @code{Ibyte *}, and you can
-pass internally encoded data to them, often from a Lisp string using
-@code{XSTRING_DATA}. (A better design might be to provide versions that
-accept Lisp strings directly.)  [[Really?  Then they'd either take
-@code{Lisp_Object}s and need to check type, or they'd take
-@code{Lisp_String}s, and violate the rules about passing any of the
-specific Lisp types.]]
-Also note that many internal functions, such as @code{make_string},
-accept Ibytes, which removes the need for them to convert the data they
-receive.  This increases efficiency because that way external data needs
-to be decoded only once, when it is read.  After that, it is passed
-around in internal format.
-@item Do all work in internal format
-External-formatted data is completely unpredictable in its format.  It
-may be fixed-width Unicode (not even ASCII compatible); it may be a
-modal encoding, in
-which case some occurrences of (e.g.) the slash character may be part of
-two-byte Asian-language characters, and a naive attempt to split apart a
-pathname by slashes will fail; etc.  Internal-format text should be
-converted to external format only at the point where an external API is
-actually called, and the first thing done after receiving
-external-format text from an external API should be to convert it to
-internal text.
-@end table
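The danger of manipulating external-format text directly can be shown with a concrete case.  The sketch below is illustrative only and makes two assumptions that are not in the text above: it uses Shift-JIS as the external encoding, where the well-known collision is with the backslash byte 0x5C (the MS Windows path separator) rather than the forward slash, and it uses UTF-8 as a stand-in for an internal encoding in which ASCII bytes never occur inside a multibyte character.  The names are hypothetical.

```c
#include <stddef.h>

/* Count occurrences of the byte SEP in the first LEN bytes of TEXT,
   the way a naive path splitter would look for separators. */
size_t sketch_count_byte (const unsigned char *text, size_t len,
                          unsigned char sep)
{
  size_t i, n = 0;
  for (i = 0; i < len; i++)
    if (text[i] == sep)
      n++;
  return n;
}

/* The path  dir \ U+8868 \ file  in Shift-JIS.  The character U+8868
   is the byte pair 0x95 0x5C, so its second byte looks like a
   backslash. */
const unsigned char sketch_sjis_path[] =
  { 'd', 'i', 'r', 0x5C, 0x95, 0x5C, 0x5C, 'f', 'i', 'l', 'e' };

/* The same path in UTF-8, where U+8868 is 0xE8 0xA1 0xA8 and no byte
   of a multibyte character falls in the ASCII range. */
const unsigned char sketch_utf8_path[] =
  { 'd', 'i', 'r', 0x5C, 0xE8, 0xA1, 0xA8, 0x5C, 'f', 'i', 'l', 'e' };
```

Scanning @code{sketch_sjis_path} for 0x5C finds three bytes although the path has only two separators, while scanning @code{sketch_utf8_path} finds exactly two.  This is the kind of error that converting to internal format first is meant to rule out.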
-@node An Example of Mule-Aware Code, Mule-izing Code, General Guidelines for Writing Mule-Aware Code, Coding for Mule
-@subsection An Example of Mule-Aware Code
-@cindex code, an example of Mule-aware
-@cindex Mule-aware code, an example of
-As an example of Mule-aware code, we will analyze the @code{string}
-function, which conses up a Lisp string from the character arguments it
-receives.  Here is the definition, pasted from @file{alloc.c}:
-@example
-@group
-DEFUN ("string", Fstring, 0, MANY, 0, /*
-Concatenate all the argument characters and make the result a string.
-*/
-(int nargs, Lisp_Object *args))
-@{
-Ibyte *storage = alloca_array (Ibyte, nargs * MAX_ICHAR_LEN);
-Ibyte *p = storage;
-for (; nargs; nargs--, args++)
-@{
-Lisp_Object lisp_char = *args;
-CHECK_CHAR_COERCE_INT (lisp_char);
-p += set_itext_ichar (p, XCHAR (lisp_char));
-@}
-return make_string (storage, p - storage);
-@}
-@end group
-@end example
-Now we can analyze the source line by line.
-The resulting string contains exactly one character per argument to the
-function.  This is why we allocate @code{MAX_ICHAR_LEN} * @var{nargs}
-bytes on the stack, i.e. the worst-case number of bytes for @var{nargs}
-@code{Ichar}s to fit in the string.
-Then, the loop checks that each element is a character, converting
-integers in the process.  Like many other functions in XEmacs, this
-function silently accepts integers where characters are expected, for
-historical and compatibility reasons.  Unless you specifically need that
-behavior, @code{CHECK_CHAR} will also suffice.  @code{XCHAR (lisp_char)}
-extracts the @code{Ichar} from the @code{Lisp_Object}, and
-@code{set_itext_ichar} stores it in @code{storage}, advancing @code{p} in
-the process.
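The allocate-worst-case, store, and advance pattern used by @code{Fstring} can be tried outside XEmacs with the following sketch.  It assumes UTF-8 as a stand-in for the internal encoding; @code{SKETCH_MAX_LEN}, @code{sketch_set_char} and @code{sketch_build} are hypothetical names that only imitate @code{MAX_ICHAR_LEN}, @code{set_itext_ichar} and the loop above.

```c
#include <stddef.h>

typedef unsigned char Ibyte;   /* stand-ins for the XEmacs typedefs */
typedef long Bytecount;
typedef int Ichar;

#define SKETCH_MAX_LEN 4       /* worst-case bytes per character in UTF-8 */

/* Store C at P in UTF-8 and return the number of bytes written,
   following the calling convention described for set_itext_ichar. */
Bytecount sketch_set_char (Ibyte *p, Ichar c)
{
  if (c < 0x80)
    {
      p[0] = (Ibyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Ibyte) (0xC0 | (c >> 6));
      p[1] = (Ibyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Ibyte) (0xE0 | (c >> 12));
      p[1] = (Ibyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Ibyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Ibyte) (0xF0 | (c >> 18));
  p[1] = (Ibyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Ibyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Ibyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Append N characters to STORAGE, which the caller must size as at
   least N * SKETCH_MAX_LEN bytes, and return the bytes actually used. */
Bytecount sketch_build (Ibyte *storage, const Ichar *chars, size_t n)
{
  Ibyte *p = storage;
  size_t i;
  for (i = 0; i < n; i++)
    p += sketch_set_char (p, chars[i]);   /* store and advance together */
  return (Bytecount) (p - storage);
}
```

As in @code{Fstring}, the byte length of the result is not known until the loop finishes; it is recovered as the distance between the write pointer and the start of the storage.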
-Other instructive examples of correct coding under Mule can be found all
-over the XEmacs code.  For starters, I recommend
-@code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
-understood this section of the manual and studied the examples, you can
-proceed to writing new Mule-aware code.
-@node Mule-izing Code,  , An Example of Mule-Aware Code, Coding for Mule
-@subsection Mule-izing Code
-A lot of code is written without Mule in mind, and needs to be made
-Mule-correct or ``Mule-ized''.  There is really no substitute for
-line-by-line analysis when doing this, but the following checklist can
-help:
-@itemize @bullet
-@item
-Check all uses of @code{XSTRING_DATA}.
-@item
-Check all uses of @code{build_string} and @code{make_string}.
-@item
-Check all uses of @code{tolower} and @code{toupper}.
-@item
-Check object print methods.
-@item
-Check for use of functions such as @code{write_c_string},
-@code{write_fmt_string}, @code{stderr_out}, @code{stdout_out}.
-@item
-Check all occurrences of @code{char} and correct to one of the other
-typedefs described above.
-@item
-Check all existing uses of @code{TO_EXTERNAL_FORMAT},
-@code{TO_INTERNAL_FORMAT}, and any convenience macros (grep for
-@samp{EXTERNAL_TO}, @samp{TO_EXTERNAL}, and @samp{TO_SIZED_EXTERNAL}).
-@item
-In Windows code, string literals may need to be encapsulated with @code{XETEXT}.
-@end itemize
-@node CCL, Modules for Internationalization, Coding for Mule, Multilingual Support
-@section CCL
-@cindex CCL
-@example
-MACHINE CODE:
-The machine code consists of a vector of 32-bit words.
-The first such word specifies the start of the EOF section of the code;
-this is the code executed to handle any stuff that needs to be done
-(e.g. designating back to ASCII and left-to-right mode) after all
-other encoded/decoded data has been written out.  This is not used for
-charset CCL programs.
-REGISTER: 0..7  -- referred by RRR or rrr
-OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
-TTTTT (5-bit): operator type
-RRR (3-bit): register number
-XXXXXXXXXXXXXXX (15-bit):
-CCCCCCCCCCCCCCC: constant or address
-000000000000rrr: register number
-AAAAA:  00000 +
-00001 -
-00010 *
-00011 /
-00100 %
-00101 &
-00110 |
-00111 ~
-01000 <<
-01001 >>
-01010 <8
-01011 >8
-01100 //
-01101 not used
-01110 not used
-01111 not used
-10000 <
-10001 >
-10010 ==
-10011 <=
-10100 >=
-10101 !=
-OPERATORS:      TTTTT RRR XX..
-SetCS:          00000 RRR C...C      RRR = C...C
-SetCL:          00001 RRR .....      RRR = c...c
-c.............c
-SetR:           00010 RRR ..rrr      RRR = rrr
-SetA:           00011 RRR ..rrr      RRR = array[rrr]
-C.............C      size of array = C...C
-c.............c      contents = c...c
-Jump:           00100 000 c...c      jump to c...c
-JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
-WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
-WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
-WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
-C...C
-WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
-C.............C      and jump to c...c
-WriteSJump:     01010 000 c...c      WriteS, jump to c...c
-C.............C
-S.............S
-...
-WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
-C.............C
-S.............S
-...
-WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
-C.............C      size of array = C...C
-c.............c      contents = c...c
-...
-Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
-c.............c      branch to (RRR+1)th address
-Read1:          01110 RRR ...        read 1-byte to RRR
-Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
-ReadBranch:     10000 RRR C...C      Read1 and Branch
-c.............c
-...
-Write1:         10001 RRR .....      write 1-byte RRR
-Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
-WriteC:         10011 000 .....      write 1-char C...C
-C.............C
-WriteS:         10100 000 .....      write C..-byte of string
-C.............C
-S.............S
-...
-WriteA:         10101 RRR .....      write array[RRR]
-C.............C      size of array = C...C
-c.............c      contents = c...c
-...
-End:            10110 000 .....      terminate the execution
-SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
-..........AAAAA
-SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
-c.............c
-..........AAAAA
-SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
-..........AAAAA
-SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
-c.............c
-..........AAAAA
-SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
-............Rrr
-..........AAAAA
-JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
-C.............C
-..........AAAAA
-JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
-............rrr
-..........AAAAA
-ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
-C.............C
-..........AAAAA
-ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
-............rrr
-..........AAAAA
-@end example
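The field layout given at the top of the listing can be unpacked with a few shifts and masks.  The sketch below is a reading of that layout line, not code from @file{mule-ccl.c}: it assumes that the rightmost field (@code{TTTTT}) occupies the least significant bits, followed by @code{RRR} and then the 15-bit @code{X} field.  The type and function names are hypothetical.

```c
/* One decoded CCL instruction word, following the layout
   XXXXXXXXXXXXXXX RRR TTTTT from the listing. */
typedef struct
{
  unsigned op;    /* TTTTT: operator type (5 bits) */
  unsigned reg;   /* RRR: register number (3 bits) */
  unsigned arg;   /* X...X: constant, address, or register (15 bits) */
} sketch_ccl_insn;

/* Split WORD into its three fields.  Assumption: TTTTT is in the low
   5 bits, RRR in the next 3, and the X field in the 15 bits above. */
sketch_ccl_insn sketch_ccl_decode (unsigned long word)
{
  sketch_ccl_insn insn;
  insn.op  = (unsigned) (word & 0x1F);
  insn.reg = (unsigned) ((word >> 5) & 0x7);
  insn.arg = (unsigned) ((word >> 8) & 0x7FFF);
  return insn;
}
```

Under that assumption, a @code{JumpCond} (operator @code{00101}) on register 2 with target address 300 would be the word @code{(300 << 8) | (2 << 5) | 5}, and decoding it yields those three values back.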
-@node Modules for Internationalization,  , CCL, Multilingual Support
-@section Modules for Internationalization
-@cindex modules for internationalization
-@cindex internationalization, modules for
-@example
-@file{mule-canna.c}
-@file{mule-ccl.c}
-@file{mule-charset.c}
-@file{mule-charset.h}
-@file{file-coding.c}
-@file{file-coding.h}
-@file{mule-coding.c}
-@file{mule-mcpath.c}
-@file{mule-mcpath.h}
-@file{mule-wnnfns.c}
-@file{mule.c}
-@end example
-These files implement the MULE (Asian-language) support.  Note that MULE
-actually provides a general interface for all sorts of languages, not
-just Asian languages (although they are generally the most complicated
-to support).  This code is still in beta.
-@file{mule-charset.*} and @file{file-coding.*} provide the heart of the
-XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
-Lisp object type, which encapsulates a character set (an ordered one- or
-two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
-Kanji).
-@file{file-coding.*} implements the @dfn{coding-system} Lisp object
-type, which encapsulates a method of converting between different
-encodings.  An encoding is a representation of a stream of characters,
-possibly from multiple character sets, using a stream of bytes or words,
-and defines (e.g.) which escape sequences are used to specify particular
-character sets, how the indices for a character are converted into bytes
-(sometimes this involves setting the high bit; sometimes complicated
-rearranging of the values takes place, as in the Shift-JIS encoding),
-etc.  It also contains some generic coding system implementations, such
-as the binary (no-conversion) coding system and a sample gzip coding system.
-@file{mule-coding.c} contains the implementations of text coding systems.
-@file{mule-ccl.c} provides the CCL (Code Conversion Language)
-interpreter.  CCL is similar in spirit to Lisp byte code and is used to
-implement converters for custom encodings.
-@file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
-external programs used to implement the Canna and WNN input methods,
-respectively.  This is currently in beta.
-@file{mule-mcpath.c} provides some functions to allow for pathnames
-containing extended characters.  This code is fragmentary, obsolete, and
-completely non-working.  Instead, @code{file-name-coding-system}
-(formerly @code{pathname-coding-system}) is used
-to specify conversions of names of files and directories.  The standard
-C I/O functions like @samp{open()} are wrapped so that conversion occurs
-automatically.
-@file{mule.c} contains a few miscellaneous things.  It currently seems
-to be unused and probably should be removed.
-@example
-@file{intl.c}
-@end example
-This provides some miscellaneous internationalization code for
-implementing message translation and interfacing to the Ximp input
-method.  None of this code is currently working.
-@example
-@file{iso-wide.h}
-@end example
-This contains leftover code from an earlier implementation of
-Asian-language support, and is not currently used.
-@node The Lisp Reader and Compiler, Lstreams, Multilingual Support, Top
-@chapter The Lisp Reader and Compiler
-@cindex Lisp reader and compiler, the
-@cindex reader and compiler, the Lisp
-@cindex compiler, the Lisp reader and
-Not yet documented.
-@node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
 @chapter Lstreams
 @cindex lstreams
 An @dfn{lstream} is an internal Lisp object that provides a generic
 buffering stream implementation.  Conceptually, you send data to the
 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
 Mark this object for garbage collection.  Same semantics as a standard
 @code{Lisp_Object} marker.  This function can be @code{NULL}.
 @end deftypefn
-@node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
+@node Subprocesses, Interface to MS Windows, Lstreams, Top
-@chapter Consoles; Devices; Frames; Windows
-@cindex consoles; devices; frames; windows
-@cindex devices; frames; windows, consoles;
-@cindex frames; windows, consoles; devices;
-@cindex windows, consoles; devices; frames;
-@menu
-* Introduction to Consoles; Devices; Frames; Windows::
-* Point::
-* Window Hierarchy::
-* The Window Object::
-* Modules for the Basic Displayable Lisp Objects::
-@end menu
-@node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
-@section Introduction to Consoles; Devices; Frames; Windows
-@cindex consoles; devices; frames; windows, introduction to
-@cindex devices; frames; windows, introduction to consoles;
-@cindex frames; windows, introduction to consoles; devices;
-@cindex windows, introduction to consoles; devices; frames;
-A window-system window that you see on the screen is called a
-@dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
-more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
-window displays the text of a buffer in it. (See above on Buffers.) Note
-that buffers and windows are independent entities: Two or more windows
-can be displaying the same buffer (potentially in different locations),
-and a buffer can be displayed in no windows.
-A single display screen that contains one or more frames is called
-a @dfn{display} (the corresponding Lisp object type is the @dfn{device}).
-Under most circumstances, there is only one display.
-However, more than one display can exist, for example if you have
-a @dfn{multi-headed} console, i.e. one with a single keyboard but
-multiple displays. (Typically in such a situation, the various
-displays act like one large display, in that the mouse is only
-in one of them at a time, and moving the mouse off of one moves
-it into another.) In some cases, the different displays will
-have different characteristics, e.g. one color and one mono.
-XEmacs can display frames on multiple displays.  It can even deal
-simultaneously with frames on multiple keyboards (called @dfn{consoles} in
-XEmacs terminology).  Here is one case where this might be useful: You
-are using XEmacs on your workstation at work, and leave it running.
-Then you go home and dial in on a TTY line, and you can use the
-already-running XEmacs process to display another frame on your local
-TTY.
-Thus, there is a hierarchy console -> display -> frame -> window.
-There is a separate Lisp object type for each of these four concepts.
-Furthermore, there is logically a @dfn{selected console},
-@dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
-Each of these objects is distinguished in various ways, such as being the
-default object for various functions that act on objects of that type.
-Note that every containing object remembers the ``selected'' object
-among the objects that it contains: e.g. not only is there a selected
-window, but every frame remembers the last window in it that was
-selected, and changing the selected frame causes the remembered window
-within it to become the selected window.  Similar relationships apply
-for consoles to devices and devices to frames.
-@node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
-@section Point
-@cindex point
-Recall that every buffer has a current insertion position, called
-@dfn{point}.  Now, two or more windows may be displaying the same buffer,
-and the text cursor in the two windows (i.e. @code{point}) can be in
-two different places.  You may ask, how can that be, since each
-buffer has only one value of @code{point}?  The answer is that each window
-also has a value of @code{point} that is squirreled away in it.  There
-is only one selected window, and the buffer's own value of @code{point}
-always corresponds to that window.  When the selected window is changed
-from one window to another displaying the same buffer, the old
-value of @code{point} is stored into the old window's ``point'' and the
-value of @code{point} from the new window is retrieved and made the
-value of @code{point} in the buffer.  This means that @code{window-point}
-for the selected window is potentially inaccurate, and if you
-want to retrieve the correct value of @code{point} for a window,
-you must special-case on the selected window and retrieve the
-buffer's point instead.  This is related to why @code{save-window-excursion}
-does not save the selected window's value of @code{point}.
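The special case described above can be sketched in C. This is a simplified model and not XEmacs source: the struct layouts and the names `toy_buffer`, `toy_window`, `toy_window_point`, and `toy_select_window` are hypothetical, and positions are plain integers rather than markers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real buffer and window objects. */
struct toy_buffer { long point; };

struct toy_window {
  struct toy_buffer *buffer;
  long pointm;   /* stashed point; only trustworthy while NOT selected */
};

static struct toy_window *toy_selected_window;

/* Return the up-to-date value of point for W.  The selected window's
   stashed value may be stale, so the buffer's point is used instead. */
long toy_window_point(const struct toy_window *w)
{
  assert(w != NULL && w->buffer != NULL);
  if (w == toy_selected_window)
    return w->buffer->point;
  return w->pointm;
}

/* Make W the selected window, swapping point values as described above. */
void toy_select_window(struct toy_window *w)
{
  struct toy_window *old = toy_selected_window;

  assert(w != NULL && w->buffer != NULL);
  if (old != NULL)
    old->pointm = old->buffer->point;  /* stash buffer point in old window */
  toy_selected_window = w;
  w->buffer->point = w->pointm;        /* new window's point becomes buffer point */
}
```

In this model, reading `pointm` directly for the selected window returns whatever was stashed at the last switch, which is the inaccuracy the text warns about.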
-@node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows
-@section Window Hierarchy
-@cindex window hierarchy
-@cindex hierarchy of windows
-If a frame contains multiple windows (panes), they are always created
-by splitting an existing window along the horizontal or vertical axis.
-Terminology is a bit confusing here: to @dfn{split a window
-horizontally} means to create two side-by-side windows, i.e. to make a
-@emph{vertical} cut in a window.  Likewise, to @dfn{split a window
-vertically} means to create two windows, one above the other, by making
-a @emph{horizontal} cut.
-If you split a window and then split again along the same axis, you
-will end up with a number of panes all arranged along the same axis.
-The precise way in which the splits were made should not be important,
-and this is reflected internally.  Internally, all windows are arranged
-in a tree, consisting of two types of windows, @dfn{combination} windows
-(which have children, and are covered completely by those children) and
-@dfn{leaf} windows, which have no children and are visible.  Every
-combination window has two or more children, all arranged along the same
-axis.  There are (logically) two subtypes of windows, depending on
-whether their children are horizontally or vertically arrayed.  There is
-always one root window, which is either a leaf window (if the frame
-contains only one window) or a combination window (if the frame contains
-more than one window).  In the latter case, the root window will have
-two or more children, either horizontally or vertically arrayed, and
-each of those children will be either a leaf window or another
-combination window.
-Here are some rules:
-@enumerate
-@item
-Horizontal combination windows can never have children that are
-horizontal combination windows; same for vertical.
-@item
-Only leaf windows can be split (obviously) and this splitting does one
-of two things: (a) turns the leaf window into a combination window and
-creates two new leaf children, or (b) turns the leaf window into one of
-the two new leaves and creates the other leaf.  Rule (1) dictates which
-of these two outcomes happens.
-@item
-Every combination window must have at least two children.
-@item
-Leaf windows can never become combination windows.  They can be deleted,
-however.  If this results in a violation of (3), the parent combination
-window also gets deleted.
-@item
-All functions that accept windows must be prepared to accept combination
-windows, and do something sane (e.g. signal an error if so).
-Combination windows @emph{do} escape to the Lisp level.
-@item
-All windows have three fields governing their contents:
-these are @dfn{hchild} (a list of horizontally-arrayed children),
-@dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
-(the buffer contained in a leaf window).  Exactly one of
-these will be non-@code{nil}.  Remember that @dfn{horizontally-arrayed}
-means ``side-by-side'' and @dfn{vertically-arrayed} means
-``one above the other''.
-@item
-Leaf windows also have markers in their @code{start} (the
-first buffer position displayed in the window) and @code{pointm}
-(the window's stashed value of @code{point}---see above) fields,
-while combination windows have @code{nil} in these fields.
-@item
-The list of children for a window is threaded through the
-@code{next} and @code{prev} fields of each child window.
-@item
-@strong{Deleted windows can be undeleted}.  This happens as a result of
-restoring a window configuration, and is unlike frames, displays, and
-consoles, which, once deleted, can never be restored.  Deleting a window
-does nothing except set a special @code{dead} bit to 1 and clear out the
-@code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
-GC purposes.
-@item
-Most frames actually have two top-level windows---one for the
-minibuffer and one (the @dfn{root}) for everything else.  The modeline
-(if present) separates these two.  The @code{next} field of the root
-points to the minibuffer, and the @code{prev} field of the minibuffer
-points to the root.  The other @code{next} and @code{prev} fields are
-@code{nil}, and the frame points to both of these windows.
-Minibuffer-less frames have no minibuffer window, and the @code{next}
-and @code{prev} of the root window are @code{nil}.  Minibuffer-only
-frames have no root window, and the @code{next} of the minibuffer window
-is @code{nil} but the @code{prev} points to itself. (#### This is an
-artifact that should be fixed.)
-@end enumerate
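Rules 1, 3, and 6 above can be expressed as a small consistency check. The sketch below is illustrative only: `tree_window` and `tree_window_valid` are hypothetical names, the buffer is reduced to a string, and only live (undeleted) windows are modeled. The field names `hchild`, `vchild`, `buffer`, `next`, and `prev` follow the text.

```c
#include <stddef.h>

/* Hypothetical simplified window node; field names follow the text. */
struct tree_window {
  struct tree_window *hchild;  /* first of the side-by-side children, or NULL */
  struct tree_window *vchild;  /* first of the stacked children, or NULL */
  const char *buffer;          /* stand-in for a leaf's buffer, or NULL */
  struct tree_window *next;    /* next sibling */
  struct tree_window *prev;    /* previous sibling */
};

/* Return 1 if the subtree rooted at W obeys rules 1, 3, and 6, else 0. */
int tree_window_valid(const struct tree_window *w)
{
  int set = (w->hchild != NULL) + (w->vchild != NULL) + (w->buffer != NULL);
  const struct tree_window *child;
  int count = 0;

  if (set != 1)
    return 0;                /* rule 6: exactly one of the three is set */
  if (w->buffer != NULL)
    return 1;                /* leaf window: nothing more to check */

  child = w->hchild != NULL ? w->hchild : w->vchild;
  for (; child != NULL; child = child->next) {
    /* rule 1: a child may not be arrayed along its parent's axis */
    if (w->hchild != NULL && child->hchild != NULL)
      return 0;
    if (w->vchild != NULL && child->vchild != NULL)
      return 0;
    if (!tree_window_valid(child))
      return 0;
    count++;
  }
  return count >= 2;         /* rule 3: at least two children */
}
```

The children are walked through the `next` field only, matching rule 8; a fuller check would also verify that each `prev` pointer mirrors the corresponding `next`.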
-@node The Window Object, Modules for the Basic Displayable Lisp Objects, Window Hierarchy, Consoles; Devices; Frames; Windows
-@section The Window Object
-@cindex window object, the
-@cindex object, the window
-Windows have the following accessible fields:
-@table @code
-@item frame
-The frame that this window is on.
-@item mini_p
-Non-@code{nil} if this window is a minibuffer window.
-@item buffer
-The buffer that the window is displaying.  This may change often during
-the life of the window.
-@item dedicated
-Non-@code{nil} if this window is dedicated to its buffer.
-@item pointm
-@cindex window point internals
-This is the value of point in the current buffer when this window is
-selected; when it is not selected, it retains its previous value.
-@item start
-The position in the buffer that is the first character to be displayed
-in the window.
-@item force_start
-If this flag is non-@code{nil}, it says that the window has been
-scrolled explicitly by the Lisp program.  This affects what the next
-redisplay does if point is off the screen: instead of scrolling the
-window to show the text around point, it moves point to a location that
-is on the screen.
-@item last_modified
-The @code{modified} field of the window's buffer, as of the last time
-a redisplay completed in this window.
-@item last_point
-The buffer's value of point, as of the last time
-a redisplay completed in this window.
-@item left
-This is the left-hand edge of the window, measured in columns.  (The
-leftmost column on the screen is @w{column 0}.)
-@item top
-This is the top edge of the window, measured in lines.  (The top line on
-the screen is @w{line 0}.)
-@item height
-The height of the window, measured in lines.
-@item width
-The width of the window, measured in columns.
-@item next
-This is the window that is the next in the chain of siblings.  It is
-@code{nil} in a window that is the rightmost or bottommost of a group of
-siblings.
-@item prev
-This is the window that is the previous in the chain of siblings.  It is
-@code{nil} in a window that is the leftmost or topmost of a group of
-siblings.
-@item parent
-Internally, XEmacs arranges windows in a tree; each group of siblings has
-a parent window whose area includes all the siblings.  This field points
-to a window's parent.
-Parent windows do not display buffers, and play little role in display
-except to shape their child windows.  Emacs Lisp programs usually have
-no access to the parent windows; they operate on the windows at the
-leaves of the tree, which actually display buffers.
-@item hscroll
-This is the number of columns that the display in the window is scrolled
-horizontally to the left.  Normally, this is 0.
-@item use_time
-This is the last time that the window was selected.  The function
-@code{get-lru-window} uses this field.
-@item display_table
-The window's display table, or @code{nil} if none is specified for it.
-@item update_mode_line
-Non-@code{nil} means this window's mode line needs to be updated.
-@item base_line_number
-The line number of a certain position in the buffer, or @code{nil}.
-This is used for displaying the line number of point in the mode line.
-@item base_line_pos
-The position in the buffer for which the line number is known, or
-@code{nil} meaning none is known.
-@item region_showing
-If the region (or part of it) is highlighted in this window, this field
-holds the mark position that made one end of that region.  Otherwise,
-this field is @code{nil}.
-@end table
-@node Modules for the Basic Displayable Lisp Objects,  , The Window Object, Consoles; Devices; Frames; Windows
-@section Modules for the Basic Displayable Lisp Objects
-@cindex modules for the basic displayable Lisp objects
-@cindex displayable Lisp objects, modules for the basic
-@cindex Lisp objects, modules for the basic displayable
-@cindex objects, modules for the basic displayable Lisp
-@example
-@file{console-msw.c}
-@file{console-msw.h}
-@file{console-stream.c}
-@file{console-stream.h}
-@file{console-tty.c}
-@file{console-tty.h}
-@file{console-x.c}
-@file{console-x.h}
-@file{console.c}
-@file{console.h}
-@end example
-These modules implement the @dfn{console} Lisp object type.  A console
-contains multiple display devices, but only one keyboard and mouse.
-Most of the time, a console will contain exactly one device.
-Consoles are the top of a Lisp object inclusion hierarchy.  Consoles
-contain devices, which contain frames, which contain windows.
-@example
-@file{device-msw.c}
-@file{device-tty.c}
-@file{device-x.c}
-@file{device.c}
-@file{device.h}
-@end example
-These modules implement the @dfn{device} Lisp object type.  This
-abstracts a particular screen or connection on which frames are
-displayed.  As with Lisp objects, event interfaces, and other
-subsystems, the device code is separated into a generic component that
-contains a standardized interface (in the form of a set of methods) onto
-particular device types.
-The device subsystem defines all the methods and provides method
-services for not only device operations but also for the frame, window,
-menubar, scrollbar, toolbar, and other displayable-object subsystems.
-The reason for this is that all of these subsystems have the same
-subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
-@example
-@file{frame-msw.c}
-@file{frame-tty.c}
-@file{frame-x.c}
-@file{frame.c}
-@file{frame.h}
-@end example
-Each device contains one or more frames in which objects (e.g. text) are
-displayed.  A frame corresponds to a window in the window system;
-usually this is a top-level window but it could potentially be one of a
-number of overlapping child windows within a top-level window, using the
-MDI (Multiple Document Interface) protocol in Microsoft Windows or a
-similar scheme.
-The @file{frame-*} files implement the @dfn{frame} Lisp object type and
-provide the generic and device-type-specific operations on frames
-(e.g. raising, lowering, resizing, moving, etc.).
-@example
-@file{window.c}
-@file{window.h}
-@end example
-@cindex window (in Emacs)
-@cindex pane
-Each frame consists of one or more non-overlapping @dfn{windows} (better
-known as @dfn{panes} in standard window-system terminology) in which a
-buffer's text can be displayed.  Windows can also have scrollbars
-displayed around their edges.
-@file{window.c} and @file{window.h} implement the @dfn{window} Lisp
-object type and provide code to manage windows.  Since windows have no
-associated resources in the window system (the window system knows only
-about the frame; no child windows or anything are used for XEmacs
-windows), there is no device-type-specific code here; all of that code
-is part of the redisplay mechanism or the code for particular object
-types such as scrollbars.
-@node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
-@chapter The Redisplay Mechanism
-@cindex redisplay mechanism, the
-The redisplay mechanism is one of the most complicated sections of
-XEmacs, especially from a conceptual standpoint.  This is doubly so
-because, unlike for the basic aspects of the Lisp interpreter, the
-computer science theories of how to efficiently handle redisplay are not
-well-developed.
-When working with the redisplay mechanism, remember the Golden Rules
-of Redisplay:
-@enumerate
-@item
-It Is Better To Be Correct Than Fast.
-@item
-Thou Shalt Not Run Elisp From Within Redisplay.
-@item
-It Is Better To Be Fast Than Not To Be.
-@end enumerate
-@menu
-* Critical Redisplay Sections::
-* Line Start Cache::
-* Redisplay Piece by Piece::
-* Modules for the Redisplay Mechanism::
-* Modules for other Display-Related Lisp Objects::
-@end menu
-@node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism
-@section Critical Redisplay Sections
-@cindex redisplay sections, critical
-@cindex critical redisplay sections
-Within this section, we are defenseless and assume that the
-following cannot happen:
-@enumerate
-@item
-garbage collection
-@item
-Lisp code evaluation
-@item
-frame size changes
-@end enumerate
-We ensure (3) by calling @code{hold_frame_size_changes()}, which
-will cause any pending frame size changes to get put on hold
-till after the end of the critical section.  (1) follows
-automatically if (2) is met.  #### Unfortunately, there are
-some places where Lisp code can be called within this section.
-We need to remove them.
-If @code{Fsignal()} is called during this critical section, we
-will @code{abort()}.
-If garbage collection is called during this critical section,
-we simply return. #### We should abort instead.
-#### If a frame-size change does occur we should probably
-actually be preempting redisplay.
-@node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism
-@section Line Start Cache
-@cindex line start cache
-The traditional scrolling code in Emacs breaks in a variable height
-world.  It depends on the key assumption that the number of lines that
-can be displayed at any given time is fixed.  This led to a complete
-separation of the scrolling code from the redisplay code.  In order to
-fully support variable height lines, the scrolling code must actually be
-tightly integrated with redisplay.  Only redisplay can determine how
-many lines will be displayed on a screen for any given starting point.
-What is ideally wanted is a complete list of the starting buffer
-position for every possible display line of a buffer along with the
-height of that display line.  Maintaining such a full list would be very
-expensive.  We settle for having it include information for all areas
-which we happen to generate anyhow (i.e. the region currently being
-displayed) and for those areas we need to work with.
-In order to ensure that the cache accurately represents what redisplay
-would actually show, it is necessary to invalidate it in many
-situations.  If the buffer changes, the starting positions may no longer
-be correct.  If a face or an extent has changed then the line heights
-may have altered.  These events happen frequently enough that the cache
-can end up being constantly disabled.  With this potentially constant
-invalidation, when is the cache ever useful?
-Even if the cache is invalidated before every single usage, it is
-necessary.  Scrolling often requires knowledge about display lines which
-are actually above or below the visible region.  The cache provides a
-convenient light-weight method of storing this information for multiple
-display regions.  This knowledge is necessary for the scrolling code to
-always obey the First Golden Rule of Redisplay.
-If the cache already contains all of the information that the scrolling
-routines happen to need so that it doesn't have to go generate it, then
-we are able to obey the Third Golden Rule of Redisplay.  The first thing
-we do to help out the cache is to always add the displayed region.  This
-region had to be generated anyway, so the cache ends up getting the
-information basically for free.  In those cases where a user is simply
-scrolling around viewing a buffer there is a high probability that this
-is sufficient to always provide the needed information.  The second
-thing we can do is be smart about invalidating the cache.
-TODO---Be smart about invalidating the cache.  Potential places:
-@itemize @bullet
-@item
-Insertions at end-of-line which don't cause line-wraps do not alter the
-starting positions of any display lines.  These types of buffer
-modifications should not invalidate the cache.  This is actually a large
-optimization for redisplay speed as well.
-@item
-Buffer modifications frequently only affect the display of lines at and
-below where they occur.  In these situations we should only invalidate
-the part of the cache starting at where the modification occurs.
-@end itemize
-In case you're wondering, the Second Golden Rule of Redisplay is not
-applicable.
-@node Redisplay Piece by Piece, Modules for the Redisplay Mechanism, Line Start Cache, The Redisplay Mechanism
-@section Redisplay Piece by Piece
-@cindex redisplay piece by piece
-As you can begin to see, redisplay is complex and also not well
-documented.  Chuck no longer works on XEmacs, so this section is my take
-on the workings of redisplay.
-Redisplay happens in three phases:
-@enumerate
-@item
-Determine the desired display in the area that needs redisplay.
-Implemented by @file{redisplay.c}.
-@item
-Compare the desired display with the current display.
-Implemented by @file{redisplay-output.c}.
-@item
-Output the changes.  Implemented by @file{redisplay-output.c},
-@file{redisplay-x.c}, @file{redisplay-msw.c} and @file{redisplay-tty.c}.
-@end enumerate
-Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
-mostly device-dependent.
-Determining the desired display
-Display attributes are stored in @code{display_line} structures.  Each
-@code{display_line} consists of a set of @code{display_block}s, and each
-@code{display_block} contains a number of @code{rune}s.  Generally,
-each window holds dynarrs of @code{display_line}s representing
-the current display and the desired display.
-The @code{display_line} structures are tightly tied to buffers, which
-presents a problem for redisplay, as this connection is bogus for the
-modeline.  Hence the @code{display_line} generation routines are
-duplicated for generating the modeline.  This means that the modeline
-display code has many bugs that the standard redisplay code does not.
-The guts of @code{display_line} generation are in
-@code{create_text_block}, which creates a single display line for the
-desired locale. This incrementally parses the characters on the current
-line and generates redisplay structures for each.
-Gutter redisplay is different.  Because the data to display is stored in
-a string, we cannot use @code{create_text_block}.  Instead we use
-@code{create_text_string_block} which performs the same function as
-@code{create_text_block} but for strings. Many of the complexities of
-@code{create_text_block} to do with cursor handling and selective
-display have been removed.
-@node Modules for the Redisplay Mechanism, Modules for other Display-Related Lisp Objects, Redisplay Piece by Piece, The Redisplay Mechanism
-@section Modules for the Redisplay Mechanism
-@cindex modules for the redisplay mechanism
-@cindex redisplay mechanism, modules for the
-@example
-@file{redisplay-output.c}
-@file{redisplay-msw.c}
-@file{redisplay-tty.c}
-@file{redisplay-x.c}
-@file{redisplay.c}
-@file{redisplay.h}
-@end example
-These files provide the redisplay mechanism.  As with many other
-subsystems in XEmacs, there is a clean separation between the general
-and device-specific support.
-@file{redisplay.c} contains the bulk of the redisplay engine.  These
-functions update the redisplay structures (which describe how the screen
-is to appear) to reflect any changes made to the state of any
-displayable objects (buffer, frame, window, etc.) since the last time
-that redisplay was called.  These functions are highly optimized to
-avoid doing more work than necessary (since redisplay is called
-extremely often and is potentially a huge time sink), and depend heavily
-on notifications from the objects themselves that changes have occurred,
-so that redisplay doesn't explicitly have to check each possible object.
-The redisplay mechanism also contains a great deal of caching to further
-speed things up; some of this caching is contained within the various
-displayable objects.
-@file{redisplay-output.c} goes through the redisplay structures and converts
-them into calls to device-specific methods to actually output the screen
-changes.
-@file{redisplay-x.c}, @file{redisplay-msw.c}, and @file{redisplay-tty.c}
-implement these redisplay output methods for X, MS Windows, and TTY
-frames, respectively.
-@example
-@file{indent.c}
-@end example
-This module contains various functions and Lisp primitives for
-converting between buffer positions and screen positions.  These
-functions call the redisplay mechanism to do most of the work, and then
-examine the redisplay structures to get the necessary information.  This
-module needs work.
-@example
-@file{termcap.c}
-@file{terminfo.c}
-@file{tparam.c}
-@end example
-These files contain functions for working with the termcap (BSD-style)
-and terminfo (System V style) databases of terminal capabilities and
-escape sequences, used when XEmacs is displaying in a TTY.
-@example
-@file{cm.c}
-@file{cm.h}
-@end example
-These files provide some miscellaneous TTY-output functions and should
-probably be merged into @file{redisplay-tty.c}.
-@node Modules for other Display-Related Lisp Objects,  , Modules for the Redisplay Mechanism, The Redisplay Mechanism
-@section Modules for other Display-Related Lisp Objects
-@cindex modules for other display-related Lisp objects
-@cindex display-related Lisp objects, modules for other
-@cindex Lisp objects, modules for other display-related
-@example
-@file{faces.c}
-@file{faces.h}
-@end example
-@example
-@file{bitmaps.h}
-@file{glyphs-eimage.c}
-@file{glyphs-msw.c}
-@file{glyphs-msw.h}
-@file{glyphs-widget.c}
-@file{glyphs-x.c}
-@file{glyphs-x.h}
-@file{glyphs.c}
-@file{glyphs.h}
-@end example
-@example
-@file{objects-msw.c}
-@file{objects-msw.h}
-@file{objects-tty.c}
-@file{objects-tty.h}
-@file{objects-x.c}
-@file{objects-x.h}
-@file{objects.c}
-@file{objects.h}
-@end example
-@example
-@file{menubar-msw.c}
-@file{menubar-msw.h}
-@file{menubar-x.c}
-@file{menubar.c}
-@file{menubar.h}
-@end example
-@example
-@file{scrollbar-msw.c}
-@file{scrollbar-msw.h}
-@file{scrollbar-x.c}
-@file{scrollbar-x.h}
-@file{scrollbar.c}
-@file{scrollbar.h}
-@end example
-@example
-@file{toolbar-msw.c}
-@file{toolbar-x.c}
-@file{toolbar.c}
-@file{toolbar.h}
-@end example
-@example
-@file{font-lock.c}
-@end example
-This file provides C support for syntax highlighting---i.e.
-highlighting different syntactic constructs of a source file in
-different colors, for easy reading.  The C support is provided so that
-this is fast.
-@example
-@file{dgif_lib.c}
-@file{gif_err.c}
-@file{gif_lib.h}
-@file{gifalloc.c}
-@end example
-These modules decode GIF-format image files, for use with glyphs.
-These files were removed due to Unisys patent infringement concerns.
-@node Extents, Faces, The Redisplay Mechanism, Top
-@chapter Extents
-@cindex extents
-@menu
-* Introduction to Extents::     Extents are ranges over text, with properties.
-* Extent Ordering::             How extents are ordered internally.
-* Format of the Extent Info::   The extent information in a buffer or string.
-* Zero-Length Extents::         A weird special case.
-* Mathematics of Extent Ordering::  A rigorous foundation.
-* Extent Fragments::            Cached information useful for redisplay.
-@end menu
-@node Introduction to Extents, Extent Ordering, Extents, Extents
-@section Introduction to Extents
-@cindex extents, introduction to
-Extents are regions over a buffer, with a start and an end position
-denoting the region of the buffer included in the extent.  In
-addition, either end can be closed or open, meaning that the endpoint
-is or is not logically included in the extent.  Insertion of a character
-at a closed endpoint causes the character to go inside the extent;
-insertion at an open endpoint causes the character to go outside.
-Extent endpoints are stored using memory indices (see @file{insdel.c}),
-to minimize the amount of adjusting that needs to be done when
-characters are inserted or deleted.
-(Formerly, extent endpoints at the gap could be either before or
-after the gap, depending on the open/closedness of the endpoint.
-The intent of this was to make it so that insertions would
-automatically go inside or out of extents as necessary with no
-further work needing to be done.  It didn't work out that way,
-however, and just ended up complexifying and buggifying all the
-rest of the code.)
-@node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents
-@section Extent Ordering
-@cindex extent ordering
-Extents are compared using memory indices.  There are two orderings
-for extents and both orders are kept current at all times.  The normal
-or @dfn{display} order is as follows:
-@example
-Extent A is ``less than'' extent B,
-that is, earlier in the display order,
-if:    A-start < B-start,
-or if: A-start = B-start, and A-end > B-end
-@end example
-So if two extents begin at the same position, the larger of them is the
-earlier one in the display order (@code{EXTENT_LESS} is true).
-For the e-order, an analogous rule holds:
-@example
-Extent A is ``less than'' extent B,
-that is, earlier in the e-order,
-if:    A-end < B-end,
-or if: A-end = B-end, and A-start > B-start
-@end example
-So if two extents end at the same position, the smaller of them is the
-earlier one in the e-order (@code{EXTENT_E_LESS} is true).
-The display order and the e-order are complementary orders: any
-theorem about the display order also applies to the e-order if you swap
-all occurrences of ``display order'' and ``e-order'', ``less than'' and
-``greater than'', and ``extent start'' and ``extent end''.
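The two orderings can be written as a pair of comparison functions. This is a minimal sketch under stated assumptions: `toy_extent`, `toy_extent_less`, and `toy_extent_e_less` are hypothetical names that mirror the rules given for @code{EXTENT_LESS} and @code{EXTENT_E_LESS}, and endpoints are plain integers rather than memory indices.

```c
/* Hypothetical minimal extent: only the two endpoints matter here. */
struct toy_extent { long start; long end; };

/* Nonzero if A precedes B in display order: earlier start first;
   on equal starts, the extent that ends later (the larger one) first. */
int toy_extent_less(const struct toy_extent *a, const struct toy_extent *b)
{
  return a->start < b->start
    || (a->start == b->start && a->end > b->end);
}

/* Nonzero if A precedes B in e-order: earlier end first;
   on equal ends, the extent that starts later (the smaller one) first. */
int toy_extent_e_less(const struct toy_extent *a, const struct toy_extent *b)
{
  return a->end < b->end
    || (a->end == b->end && a->start > b->start);
}
```

Swapping `start` with `end` and `<` with `>` turns one function into the mirror image of the other, which is the complementarity the text describes.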
-@node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents
-@section Format of the Extent Info
-@cindex extent info, format of the
-An extent-info structure consists of a list of the buffer or string's
-extents and a @dfn{stack of extents} that lists all of the extents over
-a particular position.  The stack-of-extents info is used for
-optimization purposes---it basically caches some info that might
-be expensive to compute.  Certain otherwise hard computations are easy
-given the stack of extents over a particular position, and if the
-stack of extents over a nearby position is known (because it was
-calculated at some prior point in time), it's easy to move the stack
-of extents to the proper position.
-Given that the stack of extents is an optimization, and given that
-it requires memory, a string's stack of extents is wiped out each
-time a garbage collection occurs.  Therefore, any time you retrieve
-the stack of extents, it might not be there.  If you need it to
-be there, use the @code{_force} version.
-Similarly, a string may or may not have an extent_info structure.
-(Generally it won't if there haven't been any extents added to the
-string.) So use the @code{_force} version if you need the extent_info
-structure to be there.
-A list of extents is maintained as a double gap array.  One gap array
-is ordered by start index (the @dfn{display order}) and the other is
-ordered by end index (the @dfn{e-order}).  Note that positions in an
-extent list should logically be conceived of as referring @emph{to} a
-particular extent (as is the norm in programs) rather than sitting
-between two extents.  Note also that callers of these functions should
-not be aware of the fact that the extent list is implemented as an
-array, except for the fact that positions are integers (this should be
-generalized to handle integers and linked lists equally well).
-A gap array is the same structure used by buffer text: an array of
-elements with a ``gap'' somewhere in the middle.  Insertion and deletion
-happen by moving the gap to the insertion/deletion point, and then
-expanding/contracting as necessary.  Gap arrays have a number of
-useful properties:
-@enumerate
-@item
-They are space efficient, as there is no need for next/previous pointers.
-@item
-If the items in them are sorted, locating an item is fast (@math{O(log N)}).
-@item
-Insertion and deletion are very fast (constant time, essentially) if the
-gap is near (which favors localized operations, as will usually be the
-case).  Even if not, it requires only a block move of memory, which is
-generally a highly optimized operation on modern processors.
-@item
-Code to manipulate them is relatively simple to write.
-@end enumerate
-An alternative would be balanced binary trees, which have guaranteed
-@math{O(log N)} time for all operations (although the constant factors
-are not as good, and repeated localized operations will be slower than
-for a gap array).  Such code is quite tricky to write, however.
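To make the gap-array mechanics concrete, here is a small sketch in C. It is not the XEmacs implementation: the names (`gap_array`, `gap_insert`, and so on) are hypothetical, the element type is `int`, and the capacity is fixed so that no reallocation logic is needed. Callers are assumed to pass valid positions.

```c
#include <string.h>

#define GAP_CAPACITY 16   /* fixed size keeps the sketch short */

/* Hypothetical gap array of ints: elements occupy [0, gap_start) and
   [gap_end, GAP_CAPACITY); the unused gap lies in between. */
struct gap_array {
  int data[GAP_CAPACITY];
  int gap_start;
  int gap_end;
};

void gap_init(struct gap_array *g)
{
  g->gap_start = 0;
  g->gap_end = GAP_CAPACITY;
}

int gap_length(const struct gap_array *g)
{
  return GAP_CAPACITY - (g->gap_end - g->gap_start);
}

/* Element at logical position POS (0 <= POS < gap_length). */
int gap_get(const struct gap_array *g, int pos)
{
  return pos < g->gap_start ? g->data[pos]
                            : g->data[pos + (g->gap_end - g->gap_start)];
}

/* Move the gap so it starts at logical position POS: one block move. */
static void gap_move(struct gap_array *g, int pos)
{
  if (pos < g->gap_start) {
    int n = g->gap_start - pos;
    memmove(g->data + g->gap_end - n, g->data + pos, n * sizeof(int));
    g->gap_start -= n;
    g->gap_end -= n;
  } else if (pos > g->gap_start) {
    int n = pos - g->gap_start;
    memmove(g->data + g->gap_start, g->data + g->gap_end, n * sizeof(int));
    g->gap_start += n;
    g->gap_end += n;
  }
}

/* Insert VALUE at logical position POS; returns 0 if the array is full. */
int gap_insert(struct gap_array *g, int pos, int value)
{
  if (g->gap_start == g->gap_end)
    return 0;
  gap_move(g, pos);
  g->data[g->gap_start++] = value;
  return 1;
}

/* Delete the element at logical position POS by widening the gap. */
void gap_delete(struct gap_array *g, int pos)
{
  gap_move(g, pos);
  g->gap_end++;
}
```

When successive operations hit the same position, `gap_move` does nothing and each insertion or deletion is a single store or increment, which is the constant-time behavior claimed for localized operations.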
-@node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents
-@section Zero-Length Extents
-@cindex zero-length extents
-@cindex extents, zero-length
-Extents can be zero-length, and will end up that way if their endpoints
-are explicitly set that way or if their detachable property is @code{nil}
-and all the text in the extent is deleted. (The exception is open-open
-zero-length extents, which are barred from existing because there is
-no sensible way to define their properties.  Deletion of the text in
-an open-open extent causes it to be converted into a closed-open
-extent.)  Zero-length extents are primarily used to represent
-annotations, and behave as follows:
-@enumerate
-@item
-Insertion at the position of a zero-length extent expands the extent
-if both endpoints are closed; goes after the extent if it is closed-open;
-and goes before the extent if it is open-closed.
-@item
-Deletion of a character on a side of a zero-length extent whose
-corresponding endpoint is closed causes the extent to be detached if
-it is detachable; if the extent is not detachable or the corresponding
-endpoint is open, the extent remains in the buffer, moving as necessary.
-@end enumerate
-Note that closed-open, non-detachable zero-length extents behave
-exactly like markers and that open-closed, non-detachable zero-length
-extents behave like the ``point-type'' marker in Mule.
-@node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents
-@section Mathematics of Extent Ordering
-@cindex mathematics of extent ordering
-@cindex extent mathematics
-@cindex extent ordering
-@cindex display order of extents
-@cindex extents, display order
-The extents in a buffer are ordered by ``display order'' because that
-is the order in which the redisplay mechanism needs to process them.
-The e-order is an auxiliary ordering used to facilitate operations
-over extents.  The operations that can be performed on the ordered
-list of extents in a buffer are
-@enumerate
-@item
-Locate where an extent would go if inserted into the list.
-@item
-Insert an extent into the list.
-@item
-Remove an extent from the list.
-@item
-Map over all the extents that overlap a range.
-@end enumerate
-(4) requires being able to determine the first and last extents
-that overlap a range.
-NOTE: @dfn{overlap} is used as follows:
-@itemize @bullet
-@item
-two ranges overlap if they have at least one point in common.
-Whether the endpoints are open or closed makes a difference here.
-@item
-a point overlaps a range if the point is contained within the
-range; this is equivalent to treating a point @math{P} as the range
-@math{[P, P]}.
-@item
-In the case of an @emph{extent} overlapping a point or range, the extent
-is normally treated as having closed endpoints.  This applies
-consistently in the discussion of stacks of extents and such below.
-Note that this definition of overlap is not necessarily consistent with
-the extents that @code{map-extents} maps over, since @code{map-extents}
-sometimes pays attention to whether the endpoints of an extent are open
-or closed.  But for our purposes, it greatly simplifies things to treat
-all extents as having closed endpoints.
-@end itemize
-First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
-to mean comparison according to the display order.  Comparison between
-an extent @math{E} and an index @math{I} means comparison between
-@math{E} and the range @math{[I, I]}.
-Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
-according to the e-order.
-For any range @math{R}, define @math{R(0)} to be the starting index of
-the range and @math{R(1)} to be the ending index of the range.
-For any extent @math{E}, define @math{E(next)} to be the extent directly
-following @math{E}, and @math{E(prev)} to be the extent directly
-preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
-determined from @math{E} in constant time.  (This is because we store
-the extent list as a doubly linked list.)
-Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
-extents directly following and preceding @math{E} in the e-order.
-Now:
-Let @math{R} be a range.
-Let @math{F} be the first extent overlapping @math{R}.
-Let @math{L} be the last extent overlapping @math{R}.
-Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
-i.e. @math{L <= R(1) < L(next)}.
-This follows easily from the definition of display order.  The
-basic reason that this theorem applies is that the display order
-sorts by increasing starting index.
-Therefore, we can determine @math{L} just by looking at where we would
-insert @math{R(1)} into the list, and if we know @math{F} and are moving
-forward over extents, we can easily determine when we've hit @math{L} by
-comparing the extent we're at to @math{R(1)}.
-Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
-This is the analog of Theorem 1, and applies because the e-order
-sorts by increasing ending index.
-Therefore, @math{F} can be found in the same amount of time as
-operation (1), i.e. the time that it takes to locate where an extent
-would go if inserted into the e-order list.  This is @math{O(log N)},
-since we are using gap arrays to manage extents.
-Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
-(ordered in display order and e-order, just like for normal extent
-lists) that overlap an index @math{I}.
-Now:
-Let @math{I} be an index, let @math{S} be the stack of extents on
-@math{I} and let @math{F} be the first extent in @math{S}.
-Theorem 3: The first extent in @math{S} is the first extent that overlaps
-any range @math{[I, J]}.
-Proof: Any extent that overlaps @math{[I, J]} but does not include
-@math{I} must have a start index @math{> I}, and thus be greater than
-any extent in @math{S}.
-Therefore, finding the first extent that overlaps a range @math{R} is
-the same as finding the first extent that overlaps @math{R(0)}.
-Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
-@math{F2} be the first extent that overlaps @math{I2}.  Then, either
-@math{F2} is in @math{S} or @math{F2} is greater than any extent in
-@math{S}.
-Proof: If @math{F2} does not include @math{I} then its start index is
-greater than @math{I} and thus it is greater than any extent in
-@math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
-and thus is in @math{S}, and thus @math{F2 >= F}.
-@node Extent Fragments,  , Mathematics of Extent Ordering, Extents
-@section Extent Fragments
-@cindex extent fragments
-@cindex fragments, extent
-Imagine that the buffer is divided up into contiguous, non-overlapping
-@dfn{runs} of text such that no extent starts or ends within a run
-(extents that abut the run don't count).
-An extent fragment is a structure that holds data about the run that
-contains a particular buffer position (if the buffer position is at the
-junction of two runs, the run after the position is used)---the
-beginning and end of the run, a list of all of the extents in that run,
-the @dfn{merged face} that results from merging all of the faces
-corresponding to those extents, the begin and end glyphs at the
-beginning of the run, etc.  This is the information that redisplay needs
-in order to display this run.
-Extent fragments have to be very quick to update to a new buffer
-position when moving linearly through the buffer.  They rely on the
-stack-of-extents code, which does the heavy-duty algorithmic work of
-determining which extents overlie a particular position.
-@node Faces, Glyphs, Extents, Top
-@chapter Faces
-@cindex faces
-Not yet documented.
-@node Glyphs, Specifiers, Faces, Top
-@chapter Glyphs
-@cindex glyphs
-Glyphs are graphical elements that can be displayed in XEmacs buffers or
-gutters. We use the term graphical element here in the broadest possible
-sense since glyphs can be as mundane as text or as arcane as a native
-tab widget.
-In XEmacs, glyphs represent the uninstantiated state of graphical
-elements, i.e. they hold all the information necessary to produce an
-image on-screen but the image need not exist at this stage, and multiple
-screen images can be instantiated from a single glyph.
-@c #### find a place for this discussion
-@c The decision to make image specifiers a separate type is debatable.
-@c In fact, the design decision to create a separate image specifier
-@c type, rather than make glyphs themselves be specifiers, is
-@c debatable---the other properties of glyphs are rarely used and could
-@c conceivably have been incorporated into the glyph's instantiator.
-@c The rarely used glyph types (buffer, pointer, icon) could also have
-@c been incorporated into the instantiator.
-Glyphs are lazily instantiated by calling one of the glyph
-functions. This usually occurs within redisplay when
-@code{Fglyph_height} is called. Instantiation causes an image-instance
-to be created and cached. This cache is on a per-device basis for all glyphs
-except widget-glyphs, and on a per-window basis for widget-glyphs.  The
-caching is done by @code{image_instantiate} and is necessary because it
-is generally possible to display an image-instance in multiple
-domains. For instance if we create a Pixmap, we can actually display
-this on multiple windows - even though we only need a single Pixmap
-instance to do this. If caching wasn't done then it would be necessary
-to create image-instances for every displayable occurrence of a glyph -
-and every usage - and this would be extremely memory and cpu intensive.
-Widget-glyphs (a.k.a. native widgets) are not cached in this way. This is
-because widget-glyph image-instances on screen are toolkit windows, and
-thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
-cached on an XEmacs window basis.
-Any action on a glyph first consults the cache before actually
-instantiating a widget.
-@section Glyph Instantiation
-@cindex glyph instantiation
-@cindex instantiation, glyph
-Glyph instantiation is a hairy topic and requires some explanation. The
-guts of glyph instantiation is contained within
-@code{image_instantiate}. A glyph contains an image which is a
-specifier. When a glyph function - for instance @code{Fglyph_height} -
-asks for a property of the glyph that can only be determined from its
-instantiated state, then the glyph image is instantiated and an image
-instance created. The instantiation process is governed by the specifier
-code and goes through a series of steps:
-@itemize @bullet
-@item
-Validation. Instantiation of image instances happens dynamically - often
-within the guts of redisplay. Thus it is often not feasible to catch
-instantiator errors at instantiation time. Instead the instantiator is
-validated at the time it is added to the image specifier. This function
-is defined by @code{image_validate} and at a simple level validates
-keyword value pairs.
-@item
-Duplication. The specifier code by default takes a copy of the
-instantiator. This is reasonable for most specifiers but in the case of
-widget-glyphs can be problematic, since some of the properties in the
-instantiator - for instance callbacks - could cause infinite recursion
-in the copying process. Thus the image code defines a function -
-@code{image_copy_instantiator} - which will selectively copy values.
-This is controlled by the way that a keyword is defined either using
-@code{IIFORMAT_VALID_KEYWORD} or
-@code{IIFORMAT_VALID_NONCOPY_KEYWORD}. Note that the image caching and
-redisplay code relies on instantiator copying to ensure that current and
-new instantiators are actually different rather than referring to the
-same thing.
-@item
-Normalization. Once the instantiator has been copied it must be
-converted into a form that is viable at instantiation time. This can
-involve no changes at all, but typically involves things like converting
-file names to the actual data. This function is defined by
-@code{image_going_to_add} and @code{normalize_image_instantiator}.
-@item
-Instantiation. When an image instance is actually required for display
-it is instantiated using @code{image_instantiate}. This involves calling
-instantiate methods that are specific to the type of image being
-instantiated.
-@end itemize
-The final instantiation phase also involves a number of steps. In order
-to understand these we need to describe a number of concepts.
-An image is instantiated in a @dfn{domain}, where a domain can be any
-one of a device, frame, window or image-instance. The domain gives the
-image-instance context and identity, and properties that affect the
-appearance of the image-instance may be different for the same glyph
-instantiated in different domains. An example is the face used to
-display the image-instance.
-Although an image is instantiated in a particular domain the
-instantiation domain is not necessarily the domain in which the
-image-instance is cached. For example a pixmap can be instantiated in a
-window but actually be cached on a per-device basis. The domain in which
-the image-instance is actually cached is called the
-@dfn{governing-domain}. A governing-domain is currently either a device
-or a window. Widget-glyphs and text-glyphs have a window as a
-governing-domain, all other image-instances have a device as the
-governing-domain. The governing domain for an image-instance is
-determined using the governing_domain image-instance method.
-@section Widget-Glyphs
-@cindex widget-glyphs
-@section Widget-Glyphs in the MS-Windows Environment
-@cindex widget-glyphs in the MS-Windows environment
-@cindex MS-Windows environment, widget-glyphs in the
-To Do
-@section Widget-Glyphs in the X Environment
-@cindex widget-glyphs in the X environment
-@cindex X environment, widget-glyphs in the
-Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
-Library}) for manipulating the native toolkit objects. This is primarily
-so that different toolkits can be supported for widget-glyphs, just as
-they are supported for features such as menubars etc.
-Lwlib is extremely poorly documented and quite hairy so here is my
-understanding of what goes on.
-Lwlib maintains a set of widget_instances which mirror the hierarchical
-state of Xt widgets. I think this is so that widgets can be updated and
-manipulated generically by the lwlib library. For instance
-update_one_widget_instance can cope with multiple types of widget and
-multiple types of toolkit. Each element in the widget hierarchy is updated
-from its corresponding widget_instance by walking the widget_instance
-tree recursively.
-This has desirable properties such as lw_modify_all_widgets which is
-called from @file{glyphs-x.c} and updates all the properties of a widget
-without having to know what the widget is or what toolkit it is from.
-Unfortunately this also has hairy properties such as making the lwlib
-code quite complex. And of course lwlib has to know at some level what
-the widget is and how to set its properties.
-@node Specifiers, Menus, Glyphs, Top
-@chapter Specifiers
-@cindex specifiers
-Not yet documented.
-Specifiers are documented in depth in the Lisp Reference manual.
-@xref{Specifiers,,, lispref, XEmacs Lisp Reference Manual}.  The code in
-@file{specifier.c} is pretty straightforward.
-@node Menus, Subprocesses, Specifiers, Top
-@chapter Menus
-@cindex menus
-A menu is set by setting the value of the variable
-@code{current-menubar} (which may be buffer-local) and then calling
-@code{set-menubar-dirty-flag} to signal a change.  This will cause the
-menu to be redrawn at the next redisplay.  The format of the data in
-@code{current-menubar} is described in @file{menubar.c}.
-Internally the data in current-menubar is parsed into a tree of
-@code{widget_value}s (defined in @file{lwlib.h}); this is accomplished
-by the recursive function @code{menu_item_descriptor_to_widget_value()},
-called by @code{compute_menubar_data()}.  Such a tree is deallocated
-using @code{free_widget_value()}.
-@code{update_screen_menubars()} is one of the external entry points.
-This checks to see, for each screen, if that screen's menubar needs to
-be updated.  This is the case if
-@enumerate
-@item
-@code{set-menubar-dirty-flag} was called since the last redisplay.  (This
-function sets the C variable menubar_has_changed.)
-@item
-The buffer displayed in the screen has changed.
-@item
-The screen has no menubar currently displayed.
-@end enumerate
-@code{set_screen_menubar()} is called for each such screen.  This
-function calls @code{compute_menubar_data()} to create the tree of
-@code{widget_value}s, then calls @code{lw_create_widget()},
-@code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
-to create the X-Toolkit widget associated with the menu.
-@code{update_psheets()}, the other external entry point, actually
-changes the menus being displayed.  It uses the widgets fixed by
-@code{update_screen_menubars()} and calls various X functions to ensure
-that the menus are displayed properly.
-The menubar widget is set up so that @code{pre_activate_callback()} is
-called when the menu is first selected (i.e. mouse button goes down),
-and @code{menubar_selection_callback()} is called when an item is
-selected.  @code{pre_activate_callback()} calls the function in
-activate-menubar-hook, which can change the menubar (this is described
-in @file{menubar.c}).  If the menubar is changed,
-@code{set_screen_menubars()} is called.
-@code{menubar_selection_callback()} enqueues a menu event, putting in it
-a function to call (either @code{eval} or @code{call-interactively}) and
-its argument, which is the callback function or form given in the menu's
-description.
-@node Subprocesses, Interface to MS Windows, Menus, Top
 @chapter Subprocesses
 @cindex subprocesses
 The fields of a process are:
 Auto-generated Unicode encapsulation functions
 @item intl-auto-encap-win32.h
 Auto-generated Unicode encapsulation headers
 @end table
-@node Interface to the X Window System, Future Work, Interface to MS Windows, Top
+@node Interface to the X Window System, Dumping, Interface to MS Windows, Top
 @chapter Interface to the X Window System
 @cindex X Window System, interface to the
 Mostly undocumented.
 @file{extw-*} is common code that is used for both the client and server.
 Don't touch this code; something is liable to break if you do.
-@node Future Work, Future Work Discussion, Interface to the X Window System, Top
+@node Dumping, Future Work, Interface to the X Window System, Top
+@chapter Dumping
+@cindex dumping
+@menu
+* Dumping Justification::
+* Overview::
+* Data descriptions::
+* Dumping phase::
+* Reloading phase::
+* Remaining issues::
+@end menu
+@node Dumping Justification, Overview, Dumping, Dumping
+@section Dumping Justification
+@cindex dumping, justification
+The C code of XEmacs is just a Lisp engine with a lot of built-in
+primitives useful for writing an editor.  The editor itself is written
+mostly in Lisp, and represents around 100K lines of code.  Loading and
+executing the initialization of all this code takes a bit of time (five
+to ten times the usual startup time of current xemacs) and requires
+having all the lisp source files around.  Having to reload them each
+time the editor is started would not be acceptable.
+The traditional solution to this problem is called dumping: the build
+process first creates the lisp engine under the name @file{temacs}, then
+runs it until it has finished loading and initializing all the lisp
+code, and eventually creates a new executable called @file{xemacs}
+including both the object code in @file{temacs} and all the contents of
+the memory after the initialization.
+This solution, while working, has a huge problem: the creation of the
+new executable from the actual contents of memory is an extremely
+system-specific process, quite error-prone, and which interferes with a
+lot of system libraries (like malloc).  It is even getting worse
+nowadays with libraries using constructors which are automatically
+called when the program is started (even before @code{main()}) and which
+tend to crash when they are called multiple times, once before dumping
+and once after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries
+through dependencies which have this problem).  Writing the dumper is also one
+of the most difficult parts of porting XEmacs to a new operating system.
+Basically, `dumping' is an operation that is just not officially
+supported on many operating systems.
+The aim of the portable dumper is to solve the same problem as the
+system-specific dumper, that is to be able to reload quickly, using only
+a small number of files, the fully initialized lisp part of the editor,
+without any system-specific hacks.
+@node Overview, Data descriptions, Dumping Justification, Dumping
+@section Overview
+@cindex dumping overview
+The portable dumping system has to:
+@enumerate
+@item
+At dump time, write all initialized, non-quickly-rebuildable data to a
+file [Note: currently named @file{xemacs.dmp}, but the name will
+change], along with all information needed for the reloading.
+@item
+When starting xemacs, reload the dump file, relocate it to its new
+starting address if needed, and reinitialize all pointers to this
+data.  Also, rebuild all the quickly rebuildable data.
+@end enumerate
+Note: As of 21.5.18, the dump file has been moved inside of the
+executable, although there are still problems with this on some systems.
+@node Data descriptions, Dumping phase, Overview, Dumping
+@section Data descriptions
+@cindex dumping data descriptions
+The more complex task of the dumper is to be able to write memory blocks
+on the heap (lisp objects, i.e. lrecords, and C-allocated memory, such
+as structs and arrays) to disk and reload them at a different address,
+updating all the pointers they include in the process.  This is done by
+using external data descriptions that give information about the layout
+of the blocks in memory.
+The specification of these descriptions is in @file{lrecord.h}.  A description
+of an lrecord is an array of struct memory_description.  Each of these
+structs includes a type, an offset in the block and some optional
+parameters depending on the type.  For instance, here is the string
+description:
+@example
+static const struct memory_description string_description[] = @{
+  @{ XD_BYTECOUNT,         offsetof (Lisp_String, size) @},
+  @{ XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
+  @{ XD_LISP_OBJECT,       offsetof (Lisp_String, plist) @},
+  @{ XD_END @}
+@};
+@end example
+The first line indicates a member of type Bytecount, which is used by
+the next, indirect directive.  The second means "there is a pointer to
+some opaque data in the field @code{data}".  The length of said data is
+given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value
+in the 0th line of the description (welcome to C) plus one".  The third
+line means "there is a Lisp_Object member @code{plist} in the Lisp_String
+structure".  @code{XD_END} then ends the description.
+This gives us all the information we need to move around what is pointed
+to by a memory block (C or lrecord) and, by transitivity, everything
+that it points to.  The only missing information for dumping is the size
+of the block.  For lrecords, this is part of the
+lrecord_implementation, so we don't need to duplicate it.  For C blocks
+we use a struct sized_memory_description, which includes a size field
+and a pointer to an associated array of memory_description.
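To make the role of a description concrete, here is a minimal sketch of how a table of this kind can be walked to find everything a block points to.  All names in it (@code{desc_line}, @code{my_string}, @code{collect_refs} and the @code{D_*} constants) are hypothetical, simplified stand-ins invented for this example; they are not the real definitions from @file{lrecord.h}, and the real dumper handles many more description types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified analogs of the XD_* description types. */
enum desc_type { D_BYTECOUNT, D_OPAQUE_DATA_PTR, D_OBJECT_PTR, D_END };

struct desc_line {
  enum desc_type type;
  size_t offset;     /* offset of the member within the block */
  int count_line;    /* D_OPAQUE_DATA_PTR: line holding the length */
  long count_delta;  /* D_OPAQUE_DATA_PTR: constant added to it */
};

/* A toy structure shaped like the string example above. */
struct my_string {
  long size;
  char *data;
  void *plist;
};

static const struct desc_line my_string_description[] = {
  { D_BYTECOUNT,       offsetof(struct my_string, size),  0, 0 },
  { D_OPAQUE_DATA_PTR, offsetof(struct my_string, data),  0, 1 },
  { D_OBJECT_PTR,      offsetof(struct my_string, plist), 0, 0 },
  { D_END, 0, 0, 0 }
};

/* One outgoing reference found in a block. */
struct ref {
  const void *target;
  long nbytes;  /* length of opaque data, or -1 if the target is
                   described by its own type */
};

/* Read the integer member described by the given line. */
static long read_count(const void *block, const struct desc_line *desc,
                       int line)
{
  long n;
  assert(desc[line].type == D_BYTECOUNT);
  memcpy(&n, (const char *)block + desc[line].offset, sizeof n);
  return n;
}

/* Walk the description and record every non-null pointer member.
   Returns the number of references stored in out. */
static int collect_refs(const void *block, const struct desc_line *desc,
                        struct ref *out, int max)
{
  int found = 0;
  for (int i = 0; desc[i].type != D_END; i++) {
    const void *target;
    if (desc[i].type == D_BYTECOUNT)
      continue;  /* plain data, nothing to follow */
    memcpy(&target, (const char *)block + desc[i].offset, sizeof target);
    if (target == NULL)
      continue;
    assert(found < max);
    out[found].target = target;
    out[found].nbytes = desc[i].type == D_OPAQUE_DATA_PTR
      ? read_count(block, desc, desc[i].count_line) + desc[i].count_delta
      : -1;
    found++;
  }
  return found;
}
```

For a @code{my_string} whose @code{size} is 5, the walk reports the @code{data} pointer with a length of 6 (the indirect count plus one) and the @code{plist} pointer with no length, which mirrors the reading of @code{XD_INDIRECT(0, 1)} given above.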
+@node Dumping phase, Reloading phase, Data descriptions, Dumping
+@section Dumping phase
+@cindex dumping phase
+Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is
+invoked from Fdump_emacs (in @file{emacs.c}).  This function performs a number
+of tasks.
+@menu
+* Object inventory::
+* Address allocation::
+* The header::
+* Data dumping::
+* Pointers dumping::
+@end menu
+@node Object inventory, Address allocation, Dumping phase, Dumping phase
+@subsection Object inventory
+@cindex dumping object inventory
+@cindex memory blocks
+The first task is to build the list of the objects to dump.  This
+includes:
+@itemize @bullet
+@item lisp objects
+@item other memory blocks (C structures, arrays, etc.)
+@end itemize
+We end up with one @code{pdump_block_list_elt} per object group (arrays
+of C structs are kept together) which includes a pointer to the first
+object of the group, the per-object size and the count of objects in the
+group, along with some other information which is initialized later.
+These entries are linked together in @code{pdump_block_list} structures
+and can be enumerated through any of:
+@enumerate
+@item
+the @code{pdump_object_table}, an array of @code{pdump_block_list}, one
+per lrecord type, indexed by type number.
+@item
+the @code{pdump_opaque_data_list}, used for the opaque data which does
+not include pointers, and hence does not need descriptions.
+@item
+the @code{pdump_desc_table}, which is a vector of
+@code{memory_description}/@code{pdump_block_list} pairs, used for
+non-opaque C memory blocks.
+@end enumerate
+This uses a marking strategy similar to the garbage collector.  Some
+differences though:
+@enumerate
+@item
+We do not use the mark bit (which does not exist for generic memory blocks
+anyway); we use a big hash table instead.
+@item
+We do not use the mark function of lrecords but instead rely on the
+external descriptions.  This happens essentially because we need to
+follow pointers to generic memory blocks and opaque data in addition to
+Lisp_Object members.
+@end enumerate
+This is done by @code{pdump_register_object()}, which handles
+Lisp_Object variables, and @code{pdump_register_block()} which handles
+generic memory blocks (C structures, arrays, etc.), which both delegate
+the description management to @code{pdump_register_sub()}.
+The hash table doubles as a map from objects to
+@code{pdump_block_list_elt} entries (i.e. it allows us to look up a
+@code{pdump_block_list_elt} from the object it points
+to).  Entries are added with @code{pdump_add_block()} and looked up with
+@code{pdump_get_block()}.  There is no need for entry removal.  The hash
+value is computed quite simply from the object pointer by
+@code{pdump_make_hash()}.
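The following sketch shows the general idea of a pointer-keyed table used as a ``seen'' set during marking.  It is an assumption-laden illustration, not the code in @file{dumper.c}: the names @code{entry}, @code{hash_ptr}, @code{get_entry} and @code{add_entry}, the table size, the shift by three bits and the use of bucket chaining are all choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 1024  /* hypothetical size for this sketch */

/* Hypothetical entry: the block's address plus whatever the dumper
   wants to remember about it (only the size here). */
struct entry {
  const void *obj;
  size_t size;
  struct entry *next_in_bucket;
};

static struct entry *table[TABLE_SIZE];

/* Hash directly from the address.  The low bits are dropped because
   alignment makes them mostly zero. */
static size_t hash_ptr(const void *obj)
{
  return (size_t)(((uintptr_t)obj >> 3) % TABLE_SIZE);
}

/* Look up the entry for obj, or return NULL if it was never added. */
static struct entry *get_entry(const void *obj)
{
  for (struct entry *e = table[hash_ptr(obj)]; e; e = e->next_in_bucket)
    if (e->obj == obj)
      return e;
  return NULL;
}

/* Register obj.  Returns 1 if it is new (the caller should go on to
   scan it), 0 if it was already seen.  Entries are never removed. */
static int add_entry(const void *obj, size_t size)
{
  struct entry *e;
  size_t h;
  if (get_entry(obj))
    return 0;
  e = malloc(sizeof *e);
  assert(e != NULL);
  h = hash_ptr(obj);
  e->obj = obj;
  e->size = size;
  e->next_in_bucket = table[h];
  table[h] = e;
  return 1;
}
```

Because the return value of @code{add_entry} says whether the block was already known, a recursive walk over descriptions can use it both to stop at cycles and to find the bookkeeping entry again later.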
+The roots for the marking are:
+@enumerate
+@item
+the @code{staticpro}'ed variables (there is a special
+@code{staticpro_nodump()} call for protected variables we do not want to
+dump).
+@item
+the Lisp_Object variables registered via @code{dump_add_root_lisp_object}
+(@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
+@code{dump_add_root_lisp_object()}).
+@item
+the data-segment memory blocks registered via @code{dump_add_root_block}
+(for blocks with relocatable pointers), or @code{dump_add_opaque} (for
+"opaque" blocks with no relocatable pointers; this is just a shortcut
+for calling @code{dump_add_root_block} with a NULL description).
+@item
+the pointer variables registered via @code{dump_add_root_block_ptr},
+each of which points to a block of heap memory (generally a C structure
+or array).  Note that @code{dump_add_root_block_ptr} is not technically
+necessary, as a pointer variable can be seen as a special case of a
+data-segment memory block and registered using
+@code{dump_add_root_block}.  Doing it this way, however, would require
+another level of static structures declared.  Since pointer variables
+are quite common, @code{dump_add_root_block_ptr} is provided for
+convenience.  Note also that internally we have to treat it separately
+from @code{dump_add_root_block} rather than writing the former as a call
+to the latter, since we don't have support for creating and using memory
+descriptions on the fly -- they must all be statically declared in the
+data-segment.
+@end enumerate
+This does not include the GCPRO'ed variables, the specbinds, the
+catchtags, the backlist, the redisplay or the profiling info, since we
+do not want to rebuild the actual chain of lisp calls which lead up to
+the dump-emacs call, only the global variables.
+Weak lists and weak hash tables are dumped as if they were their
+non-weak equivalent (without changing their type, of course).  This has
+not yet been a problem.
+@node Address allocation, The header, Object inventory, Dumping phase
+@subsection Address allocation
+@cindex dumping address allocation
+The next step is to allocate the offsets of each of the objects in the
+final dump file.  This is done by @code{pdump_allocate_offset()} which
+is called indirectly by @code{pdump_scan_by_alignment()}.
+The strategy to deal with alignment problems uses these facts:
+@enumerate
+@item
+real world alignment requirements are powers of two.
+@item
+the C compiler is required to adjust the size of a struct so that you
+can have an array of them next to each other.  This means you can get an
+upper bound on the alignment requirement of a given structure by
+finding the largest power of two of which its size is a multiple.
+@item
+the non-variant part of variable size lrecords has an alignment
+requirement of 4.
+@end enumerate
+Hence, for each lrecord type, C struct type or opaque data block the
+alignment requirement is computed as a power of two, with a minimum of
+2^2 for lrecords.  @code{pdump_scan_by_alignment()} then scans all the
+@code{pdump_block_list_elt}s, the ones with the highest requirements
+first.  This ensures the best packing.
+The maximum alignment requirement we take into account is 2^8.
+@code{pdump_allocate_offset()} only has to do a linear allocation,
+starting at offset 256 (this leaves room for the header and keeps the
+alignments happy).
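The alignment rule and the linear allocation can be sketched as follows.  This is a simplified illustration under stated assumptions: @code{align_bound}, @code{allocate_offsets} and @code{struct group} are hypothetical names, and the explicit rounding step is a safety net added for this example rather than something the text above describes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ALIGN   256  /* 2^8, the largest alignment considered */
#define BASE_OFFSET 256  /* room for the header */

/* Upper bound on the alignment of an object of the given size: the
   largest power of two dividing size, capped at MAX_ALIGN and raised
   to min_align (4 for lrecords, 1 otherwise in this sketch). */
static size_t align_bound(size_t size, size_t min_align)
{
  size_t a;
  assert(size > 0);
  a = size & (~size + 1);  /* isolate the lowest set bit */
  if (a > MAX_ALIGN)
    a = MAX_ALIGN;
  if (a < min_align)
    a = min_align;
  return a;
}

/* Hypothetical group of count objects of the same size. */
struct group {
  size_t size;
  size_t count;
  size_t min_align;
  size_t offset;  /* filled in by allocate_offsets */
};

/* Assign file offsets, highest alignment first.  Returns the offset
   just past the last object. */
static size_t allocate_offsets(struct group *g, size_t n)
{
  size_t cur = BASE_OFFSET;
  for (size_t align = MAX_ALIGN; align >= 1; align /= 2) {
    for (size_t i = 0; i < n; i++) {
      if (align_bound(g[i].size, g[i].min_align) != align)
        continue;
      /* Round up; a no-op unless an earlier size was not a multiple
         of its own alignment. */
      cur = (cur + align - 1) & ~(align - 1);
      g[i].offset = cur;
      cur += g[i].size * g[i].count;
    }
  }
  return cur;
}
```

Since every size is a multiple of its own alignment bound, placing the most strictly aligned groups first means each later group starts at an offset that already satisfies its weaker requirement, so almost no padding is needed.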
+@node The header, Data dumping, Address allocation, Dumping phase
+@subsection The header
+@cindex dumping, the header
+The next step creates the file and writes a header with a signature and
+some miscellaneous information in it.  The @code{reloc_address} field, which
+indicates at which address the file should be loaded if we want to avoid
+post-reload relocation, is set to 0.  It then seeks to offset 256 (base
+offset for the objects).
+@node Data dumping, Pointers dumping, The header, Dumping phase
+@subsection Data dumping
+@cindex data dumping
+@cindex dumping, data
+The data is dumped by @code{pdump_dump_data()}, called from
+@code{pdump_scan_by_alignment()}, in the same order as the addresses
+were allocated.  This function copies the data to a temporary buffer,
+relocates all pointers in the object to the addresses allocated in the
+address allocation step, and writes it to the file.  Using the same
+order means that, if we are careful with lrecords whose size is not a
+multiple of 4, we are assured that the object is always written at the
+offset in the file allocated in the address allocation step.
+@node Pointers dumping,  , Data dumping, Dumping phase
+@subsection Pointers dumping
+@cindex pointers dumping
+@cindex dumping, pointers
+A bunch of tables needed to properly reassign the global pointers are
+then written.  They are:
+@enumerate
+@item
+the pdump_root_block_ptrs dynarr
+@item
+the pdump_opaques dynarr
+@item
+a vector of all the offsets to the objects in the file that include a
+description (for faster relocation at reload time)
+@item
+the pdump_root_objects and pdump_weak_object_chains dynarrs.
+@end enumerate
+For each of the dynarrs we write both the pointer to the variables and
+the relocated offset of the object they point to.  Since these variables
+are global, the pointers are still valid when restarting the program and
+are used to regenerate the global pointers.
+The @code{pdump_weak_object_chains} dynarr is a special case.  The
+variables it points to are the heads of weak linked lists of lisp objects
+of the same type.  Not all objects of this list are dumped so the
+relocated pointer we associate with them points to the first dumped
+object of the list, or Qnil if none is available.  This is also the
+reason why they are not used as roots for the purpose of object
+enumeration.
+Some very important information, like the @code{staticpros} and
+@code{lrecord_implementations_table}, is handled indirectly using
+@code{dump_add_opaque} or @code{dump_add_root_block_ptr}.
+This is the end of the dumping part.
+@node Reloading phase, Remaining issues, Dumping phase, Dumping
+@section Reloading phase
+@cindex reloading phase
+@cindex dumping, reloading phase
+@subsection File loading
+@cindex dumping, file loading
+The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
+least 4096), or if mmap is unavailable or fails, a 256-byte-aligned
+malloc is done and the file is loaded.
+Some variables are reinitialized from the values found in the header.
+The difference between the actual loading address and the reloc_address
+is computed and will be used for all the relocations.
+@subsection Putting back the pdump_opaques
+@cindex dumping, putting back the pdump_opaques
+The memory contents are restored in the obvious and trivial way.
+@subsection Putting back the pdump_root_block_ptrs
+@cindex dumping, putting back the pdump_root_block_ptrs
+The variables pointed to by pdump_root_block_ptrs in the dump phase are
+reset to the right relocated object addresses.
+@subsection Object relocation
+@cindex dumping, object relocation
+All the objects are relocated using their description and their offset
+by @code{pdump_reloc_one}.  This step is unnecessary if the
+reloc_address is equal to the file loading address.
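+The arithmetic behind this step can be sketched in C as follows.  The
+function names are hypothetical and this is not the code of
+@code{pdump_reloc_one}; the sketch also assumes that null pointers are
+left untouched.  Unsigned arithmetic is used so that a load address
+below the reloc_address still works.

```c
#include <stdint.h>

/* Difference between the actual load address and the address the
   file was prepared for (modular arithmetic on uintptr_t). */
uintptr_t
reloc_delta (uintptr_t load_address, uintptr_t reloc_address)
{
  return load_address - reloc_address;
}

/* Adjust one stored pointer value; a null pointer stays null. */
uintptr_t
reloc_pointer (uintptr_t stored, uintptr_t delta)
{
  return stored ? stored + delta : 0;
}

/* The whole relocation pass can be skipped when the delta is zero. */
int
relocation_needed (uintptr_t delta)
{
  return delta != 0;
}
```

+A zero delta is the case described above, where the relocation pass is
+unnecessary and the objects never need to be written to.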
+@subsection Putting back the pdump_root_objects and pdump_weak_object_chains
+@cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
+Same as Putting back the pdump_root_block_ptrs.
+@subsection Reorganize the hash tables
+@cindex dumping, reorganize the hash tables
+Since some of the hash values in the lisp hash tables are
+address-dependent, their layout is now wrong.  So we go through each of
+them and have them resorted by calling @code{pdump_reorganize_hash_table}.
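+To see why this is needed, consider a toy open-addressing table whose
+slot is derived from the key's address.  The sketch below is only an
+illustration under that assumption; @code{slot_for},
+@code{rehash_by_address} and @code{lookup} are hypothetical names and do
+not reflect how @code{pdump_reorganize_hash_table} is written.

```c
#include <stddef.h>
#include <stdint.h>

/* Slot chosen from the key's address; N must be a power of two. */
size_t
slot_for (const void *key, size_t n)
{
  return (size_t) (((uintptr_t) key >> 3) & (n - 1));
}

/* Re-insert every key from OLD_SLOTS into NEW_SLOTS using the keys'
   current addresses, with linear probing on collisions.  The table
   must have at least one free slot. */
void
rehash_by_address (void *old_slots[], void *new_slots[], size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    new_slots[i] = NULL;
  for (i = 0; i < n; i++)
    if (old_slots[i] != NULL)
      {
        size_t s = slot_for (old_slots[i], n);
        while (new_slots[s] != NULL)
          s = (s + 1) & (n - 1);
        new_slots[s] = old_slots[i];
      }
}

/* Return 1 if KEY is present in SLOTS, 0 otherwise. */
int
lookup (void *slots[], size_t n, const void *key)
{
  size_t s = slot_for (key, n);
  size_t probes;

  for (probes = 0; probes < n; probes++)
    {
      if (slots[s] == key)
        return 1;
      if (slots[s] == NULL)
        return 0;
      s = (s + 1) & (n - 1);
    }
  return 0;
}
```

+A table laid out for the old addresses would make @code{lookup} probe
+the wrong slots; re-inserting every key restores the invariant.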
+@node Remaining issues,  , Reloading phase, Dumping
+@section Remaining issues
+@cindex dumping, remaining issues
+The build process will have to start a post-dump xemacs, ask it the
+loading address (which will, hopefully, be always the same between
+different xemacs invocations) [[unfortunately, not true on Linux with
+the ExecShield feature]] and relocate the file to the new address.
+This way the object relocation phase will not have to be done, which
+means no writes in the objects and that, because of the use of mmap, the
+dumped data will be shared between all the xemacs running on the
+computer.
+Some executable signature will be necessary to ensure that a given dump
+file is really associated with a given executable, or random crashes
+will occur.  Maybe a random number set at compile or configure time through
+a define.  This will also allow for having differently-compiled xemacsen
+on the same system (mule and no-mule come to mind).
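+One way such a check might look is sketched below.  Everything here is
+an assumption made for illustration: the macro @code{SKETCH_BUILD_ID},
+the structure @code{dump_header_sketch} and the function
+@code{dump_matches_executable} are hypothetical and are not part of the
+existing dump file format.

```c
/* Hypothetical build identifier; in practice it would be generated at
   compile or configure time and passed in through a define. */
#ifndef SKETCH_BUILD_ID
#define SKETCH_BUILD_ID 0x2f6a91c3u
#endif

/* Hypothetical header field carrying the identifier of the executable
   that wrote the dump file. */
struct dump_header_sketch
{
  unsigned int build_id;
};

/* Return 1 if the dump file was written by this executable, else 0. */
int
dump_matches_executable (const struct dump_header_sketch *header)
{
  return header->build_id == SKETCH_BUILD_ID;
}
```

+An executable that finds a mismatch could then refuse to load the dump
+file instead of crashing later.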
+The DOC file contents should probably end up in the dump file.
+@node Future Work, Future Work Discussion, Dumping, Top
 @chapter Future Work
 @cindex future work
 @menu
+* Future Work -- General Suggestions::
 * Future Work -- Elisp Compatibility Package::
 * Future Work -- Drag-n-Drop::
 * Future Work -- Standard Interface for Enabling Extensions::
 * Future Work -- Better Initialization File Scheme::
 * Future Work -- Keyword Parameters::
 * Future Work -- Display Tables::
 * Future Work -- Making Elisp Function Calls Faster::
 * Future Work -- Lisp Engine Replacement::
 @end menu
-@ignore
+@node Future Work -- General Suggestions, Future Work -- Elisp Compatibility Package, Future Work, Future Work
-Macro to convert a single line containing a heading into the format of
+@section Future Work -- General Suggestions
-all headings in the Future Work section.
+@cindex future work, general suggestions
+@cindex general suggestions, future work
-(setq last-kbd-macro (read-kbd-macro
-"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Future SPC Work SPC - - SPC <end> RET @cindex SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC future SPC work RET"))
+@subheading Jamie Zawinski's XEmacs Wishlist
-@end ignore
+This document is based on Jamie Zawinski's
-@node Future Work -- Elisp Compatibility Package, Future Work -- Drag-n-Drop, Future Work, Future Work
+@uref{http://www.jwz.org/doc/xemacs-wishlist.html,xemacs wishlist}.
+Throughout this page, ``I'' refers to Jamie.
+The list has been substantially reformatted and edited to fit the needs
+of this site. If you have any soul at all, you'll go check out the
+original. OK? You should also check out some other
+@uref{http://www.xemacs.org/Releases/Public-21.2/execution.html#wishlists,wishlists}.
+@subsubheading About the List
+I've ranked these (roughly) from easiest to hardest; though of all of
+them, I think the debugger improvements would be the most useful. I think
+the combination of emacs+gdb is the best Unix development environment
+currently available, but it's still lamentably primitive and extremely
+frustrating (much like Unix itself), especially if you know what kinds of
+features more modern integrated debuggers have.
+@subsubheading XEmacs Wishlist
+@table @strong
+@item Improve the keyboard macro system.
+Keyboard macros are one of the most useful concepts that emacs has to
+offer, but there's room for improvement.
+@table @strong
+@item Make it possible to embed one macro inside of another.
+Often, I'll define a keyboard macro, and then realize that I've
+left something out, or that there's more that I need to do; for
+example, I may define a macro that does something to the current line,
+and then realize that I want to apply it to a lot of lines. So, I'd
+like this to work:
+@example
+@kbd{C-x ( }
+; start macro #1
+@kbd{... }
+; (do stuff)
+@kbd{C-x ) }
+; done with macro #1
+@kbd{... }
+; (do stuff)
+@kbd{C-x ( }
+; start macro #2
+@kbd{C-x e }
+; execute macro #1 (splice it into macro #2)
+@kbd{C-s foo }
+; move forward to the next spot
+@kbd{C-x ) }
+; done with macro #2
+@kbd{C-u 1000 C-x e }
+; apply the new macro
+@end example
+That is, simply, one should be able to wrap new text around an
+existing macro. I can't tell you how many times I've defined a complex
+macro but left out the ``@kbd{C-n C-a}'' at the end...
+Yes, you can accomplish this with M-x name-last-kbd-macro, but
+that's a pain. And it's also more permanent than I'd often like.
+@item Make it possible to correct errors when defining a macro.
+Right now, the act of defining a macro stops if you get an error
+while defining it, and all of the characters you've already typed into
+the macro are gone. It needn't be that way. I think that, when that
+first error occurs, the user should be given the option of taking the
+last command off of the macro and trying again.
+The macro-reader knows where the bounds of multi-character command
+sequences are, and it could even keep track of the corresponding undo
+records; rubbing out the previous entry on the macro could also undo
+any changes that command had made. (This should also work if the macro
+spans multiple buffers, and should restore window configurations as
+well.)
+You'd want multi-level undo for this as well, so maybe the way to
+go would be to add some new key sequence which was used only as the
+back-up-inside-a-keyboard-macro-definition command.
+I'm not totally sure that this would end up being very usable;
+maybe it would be too hard to deal with. Which brings us to:
+@item Make it possible to edit a keyboard macro after it has been defined.
+I only just discovered @code{edit-kbd-macro} (@kbd{C-x C-k}).
+It is very, very cool.
+The trick it does of showing the command which will be executed is
+somewhat error-prone, as it can only look up things in the current map
+or the global map; if the macro changed buffers, it wouldn't be
+displaying the right commands. (One of the things I often use macros
+for is operating on many files at once, by bringing up a dired buffer
+of those files, editing them, and then moving on to the next.)
+However, if the act of recording a macro also kept track of the
+actual commands that had gotten executed, it could make use of that
+info as well.
+Another way of editing a macro, other than as text in a buffer,
+would be to have a command which single-steps a macro: you would lean
+on the space bar to watch the macro execute one character (command?)
+at a time, and then when you reached the point you wanted to change,
+you could do some gesture to either: insert some keystrokes into the
+middle of the macro and then continue; or to replace the rest of the
+macro from here to the end; or something.
+Another similar hack might be to convert a macro to the equivalent
+lisp code, so that one could tweak it later in ways that would be too
+hard to do from the keyboard (wrapping parts of it in @code{while} loops or
+something.) (@kbd{M-x insert-kbd-macro} isn't really what I'm
+talking about here: I mean insert the list of commands, not the list
+of keystrokes.)
+@end table
+@item Save my wrists!
+In the spirit of the `@code{teach-extended-commands-p}' variable,
+it would be interesting if emacs would keep track of what are the
+commands I use most often, perhaps grouped by proximity or mode -- it
+would then be more obvious which commands were most likely candidates
+for placement on a toolbar, or popup menu, or just a more convenient key
+binding.
+Bonus points if it figures out that I type ``@kbd{bt\n}'' and
+``@kbd{ret\ny\n}'' into my @samp{*gdb*} buffer about a hundred
+thousand times a day.
+@item XmCreateFileSelectionBox
+The thing that ``File/Open...'' pops up has excellent @emph{hack}
+value, but as a user interface, it's an abomination. Isn't it time
+someone added a real file selection dialog already? (For the
+Motifly-challenged, the Athena-based file selector that GhostView uses
+seems adequate.)
+@item Improve the toolbar system.
+It's great that XEmacs has a toolbar, but it's damn near impossible
+to customize it.
+@table @strong
+@item Make it easy to define new toolbar buttons.
+Currently, to define a toolbar button that has a text equivalent,
+one must edit a pixmap, and put the text there! That's prohibitive.
+One should be able to add some kind of generic toolbar button, with a
+plain icon or none at all, but which has a text label, without having
+to use a paint program.
+@item Make it easy to have customized, mode-local toolbars.
+In my @code{c-mode-hook}, for example, I can add a couple of new
+keybindings, and delete a few others, and to do that, I don't have to
+duplicate the entire definition of the @code{c-mode-map}. Making
+mode-local additions and subtractions to the toolbars should be as
+easy.
+@item Make it easy to have customized, mode-local popup menus.
+The same situation holds for the right-mouse-button popup menu; one
+should be able to add new commands to those menus without difficulty.
+One problem is that each mode which does have a popup menu implements
+it in a different way...
+@end table
+@item Make the External Widget work.
+About half of the work is done to make a replacement for the
+@code{XmText} widget which offloads editing responsibility to an
+external Emacs process. Someone should finish that. The benefit here
+would be that then, any Motif program could be linked such that all
+editing happened with a real Emacs behind it. (If you're Athena-minded,
+flavor with @code{Text} instead of @code{XmText} -- it's probably
+easy to make it work with both.)
+The part of this that is done already is the ability to run an Emacs
+screen on a Window object that has been created by another process (this
+is what the @file{ExternalClient.c} and @file{ExternalShell.c} stuff
+is.) What is left to be done is, adding the text-widget-editor aspects
+of this.
+First, the emacs screen being displayed on that window would have to
+be one without a modeline, and one which behaved sensibly in the context
+of ``I am a small multi-line text area embedded in a dialog box'' as
+opposed to ``I am a full-on text editor and lord of all that I survey.''
+Second, the API that the (non-emacs-aware) user of the
+@code{XmText} widget expects would need to be implemented: give the
+caller the ability to pull the edited text string back out, and so on.
+The idea here being, hooking up emacs as the widget editor should be as
+transparent as possible.
+@item Bring the debugger interface into the eighties.
+Some of you may have seen my @file{gdb-highlight.el}
+package, that I posted to gnu.emacs.sources last month. I think
+it's really cool, but there should be a lot more work in that direction.
+For those of you who haven't seen it, what it does is watch text that
+gets inserted into the @samp{*gdb*} buffer and make very nearly
+everything be clickable and have a context-sensitive menu. Generally,
+the types that are noticed are:
+@itemize
+@item function names;
+@item variable and parameter names;
+@item structure slots;
+@item source file names;
+@item type names;
+@item breakpoint numbers;
+@item stack frame numbers.
+@end itemize
+Any time one of those objects is presented in the @samp{*gdb*}
+buffer, it is mousable. Clicking middle button on it takes some default
+action (edits the function, selects the stack frame, disables the
+breakpoint, ...) Clicking the right button pops up a menu of commands,
+including commands specific to the object under the mouse, and/or other
+objects on the same line.
+So that's all well and good, and I get far more joy out of what this
+code does for me than I expected, but there are still a bunch of
+limitations. The debugger interface needs to do much, much more.
+@table @strong
+@item Make gdbsrc-mode not suck.
+The idea behind @code{gdbsrc-mode} is on the side of the angels:
+one should be able to focus on the source code and not on the debugger
+buffer, absolutely. But the implementation is just awful.
+First and foremost, it should not change ``modes'' (in the more
+general sense). Any commands that it defines should be on keys which
+are exclusively used for that purpose, not keys which are normally
+self-inserting. I can't be the only person who usually has occasion to
+actually @emph{edit} the sources which the debugger has chosen to
+display! Switching into and out of @code{gdbsrc-mode} is
+prohibitive.
+I want to be looking at my sources at all times, yet I don't want
+to have to give up my source-editing gestures. I think the right way
+to accomplish this is to put the gdbsrc commands on the toolbar and on
+popup menus; or to let the user define their own keys (I could see
+devoting my @key{kp_enter} key to ``step'', or something common
+like that.)
+Also it's extremely frustrating that one can't turn off gdbsrc mode
+once it has been loaded, without exiting and restarting emacs; that
+alone means that I'd probably never take the time to learn how to use
+it, without first having taken the time to repair it...
+@item Make it easier to access variable values.
+I want to be able to double-click on a variable name to highlight
+it, and then drag it to the debugger window to have its value printed.
+I want gestures that let me write as well as read: for example, to
+store value A into slot B.
+@item Make all breakpoints visible.
+Any time there is a running gdb which has breakpoints, the buffers
+holding the lines on which those breakpoints are set should have icons
+in them. These icons should be context-sensitive: I should be able to
+pop up a menu to enable or disable them, to delete them, to change
+their commands or conditions.
+I should also be able to @emph{move} them. It's
+annoying when you have a breakpoint with a complex condition or
+command on it, and then you realize that you really want it to be at a
+different location. I want to be able to drag-and-drop the icon to its
+new home.
+@item Make a debugger status display window.
+@itemize
+@item
+I want a window off to the side that shows persistent information
+-- it should have a pane which is a drag-editable, drag-reorderable
+representation of the elements on gdb's ``display'' list; they
+should be displayed here instead of being just dumped in with the
+rest of the output in the @samp{*gdb*} buffer.
+@item
+I want a pane that displays the current call-stack and nothing
+else. I want a pane that displays the arguments and locals of the
+currently-selected frame and nothing else. I want these both to
+update as I move around on the stack.
+@item
+Since the unfortunate reality is that excavating this information
+from gdb can be slow, it would be a good idea for these panes to
+have a toggle button on them which meant ``stop updating'', so that
+when I want to move fast, I can, but I can easily get the display
+back when I need it again.
+@end itemize
+The reason for all of this is that I spend entirely too much time
+scrolling around in the @samp{*gdb*} buffer; with gdb-highlight, I
+can just click on a line in the backtrace output to go to that frame,
+but I find that I spend a lot of time @emph{looking} for that
+backtrace: since it's mixed in with all the other random output, I
+waste time looking around for things (and usually just give up and
+type ``@kbd{bt}'' again, then thrash around as the buffer scrolls,
+and I try to find the lower frames that I'm interested in, as they
+have invariably scrolled off the window already...)
+@item Save and restore breakpoints across emacs/debugger sessions.
+This would be especially handy given that gdb leaks like a sieve,
+and with a big program, I only get a few dozen relink-and-rerun
+attempts before gdb has blown my swap space.
+@item Keep breakpoints in sync with source lines.
+When a program is recompiled and then reloaded into gdb, the
+breakpoints often end up in less-than-useful places. For example, when
+I edit text which occurs in a file anywhere before a breakpoint, emacs
+is aware that the line of the bp hasn't changed, but just that it is
+in a different place relative to the top of the file. Gdb doesn't know
+this, so your breakpoints end up getting set in the wrong places
+(usually the maximally inconvenient places, like @emph{after} a loop
+instead of @emph{inside} it). But emacs knows, so emacs should
+inform the debugger, and move the breakpoints back to the places they
+were intended to be.
+@end table
+(Possibly the OOBR stuff does some of this, but I can't tell, because
+I've never been able to get it to do anything but beep at me and mumble
+about environments. I find it pretty funny that the manual keeps
+explaining to me how intuitive it is, without actually giving me a clue
+how to launch it...)
+@item Add better dialog box features.
+It'd be nice to be able to create more complex dialog boxes from
+emacs-lisp: ones with checkboxes, radio button groups, text fields, and
+popup menus.
+@item Add embeddable dialog boxes.
+One of the things that the now-defunct Energize code (the C side of
+it, that is) could do was embed a dialog box between the toolbar and the
+main text area -- buffers could have control panels associated with
+them, that had all kinds of complex behavior.
+@item Make the mark-stack be visible.
+You know, I've encountered people who have been using emacs for
+years, and never use the mark stack for navigation. I can't live without
+it; ``@kbd{C-u C-SPC}'' is among my most common gestures.
+@enumerate
+@item
+It would be a lot easier to realize what's going to happen if the
+marks on the mark stack were visible. They could be displayed as small
+``caret'' glyphs, for example; something large enough to be visible,
+but not easily mistaken for a character or for the cursor.
+@item
+The marks and the selected region should be visible in the
+scrollbar as well -- I don't remember where I first saw this idea, but
+it's very cool: there's a second, less-strongly-rendered ``thumb'' in
+the scrollbar which indicates the position and size of the selection;
+and there are tiny tick-marks which indicate the positions of the
+saved points.
+@item
+Markers which are in registers (@code{point-to-register}, @kbd{C-x
+/}) should be displayed differently (more prominently.)
+@item
+It'd be cool if you could pick up markers and move them around, to
+adjust the points you'll be coming back to later.
+@end enumerate
+@item Write a new garbage collector.
+The emacs GC is very primitive; it is also, fortunately, a
+rather well isolated module, and it would not be a very big task to swap
+it with a new one (once that new one was written, that is.) Someone
+should go bone up on modern GC techniques, and then just dive right
+in...
+@item Add support for lexical scope to the emacs-lisp runtime.
+Yadda yadda, this list goes to eleven.
+@end table
+@*
+Subject:
+@strong{Re: XEmacs wishlist}
+Date: Wed, 14 May 1997 16:18:23 -0700
+From: Jamie Zawinski <jwz@@netscape.com>
+Newsgroups: comp.emacs.xemacs, comp.emacs
+Andreas Schwab wrote:
+@quotation
+@emph{Use `C-u C-x (': }
+@emph{start-kbd-macro:@*Non-nil arg (prefix arg) means append to last
+macro defined; This begins by re-executing that macro as if you typed it
+again. }
+@end quotation
+Cool, I didn't know it did that...
+But it only lets you append. I often want to prepend, or embed the
+macro multiple times (motion 1, C-x e, motion 2, C-x e, motion 3.)
+@subheading 21.2 Showstoppers
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
+I. DISTRIBUTION ISSUES
+A. Unified Source Tarball.
+Packages go under root/lib/xemacs/xemacs-packages and no one ever has
+to mess with --package-path and the result can be moved from one
+directory to another pre- or post-install.
+B. Unified Binary Tarballs with Packages.
+Same principles as above.
+If people complain, we can also provide split binary tarballs
+(architecture dependent and independent) and place these files in a
+subdirectory so as not to confuse the majority just looking for one
+tarball.
+Under Windows, we need to provide a WISE-style GUI setup program. It's
+already there but needs some work so you can select "all" packages
+easily (should be the default).
+C. Parallel Root and Package Trees.
+If the user downloads the main source and the packages separately, he
+will naturally untar them into the same directory. This results in the
+parallel root and package structure. We should support this as a "last
+resort," i.e., if we find no packages anywhere and are about to resign
+ourselves to not having packages, then look for a parallel package
+tree. The user who sets things up like this should be able to either
+run in place or "make install" and get a proper installed
+XEmacs. Never should the user have to touch --package-path.
+II. WINDOWS PRINTING
+Looks like the internals are done but not the GUI. This must be
+working in 21.2.
+III. WINDOWS MULE
+Basic support should be there. There's already a patch to get things
+started and I'll be doing more work to make this real.
+IV. GUTTER ETC.
+This stuff needs to be "stable" and generally free from bugs. Any
+API's we create need to be well-reviewed or marked clearly as
+experimental.
+V. PORTABLE DUMPER
+Last bits need to be cleaned up. This should be made the "default" for
+a while to flush out problems. Under Microsoft Windows, Portable
+Dumper must be the default in 21.2 because of the problems with the
+existing dump process.
+COMMENT: I'd like to feature freeze this pretty soon and create a 21.3
+tree where all of my major overhauls of Mule-related stuff will go
+in. At or around the same time, we need to do the move-around in the
+repository (or create a new one) and "upgrade" to the latest CVS
+server.
+@node Future Work -- Elisp Compatibility Package, Future Work -- Drag-n-Drop, Future Work -- General Suggestions, Future Work
 @section Future Work -- Elisp Compatibility Package
 @cindex future work, elisp compatibility package
 @cindex elisp compatibility package, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 A while ago I created a package called Sysdep, which aimed to be a
 forward compatibility package for Elisp.  The idea was that instead of
 having to write your package using the oldest version of Emacs that you
 wanted to support, you could use the newest XEmacs API, and then simply
 where a function is called using @code{funcall} or @code{apply}.
 However, such uses of functions would not be affected by the surrounding
 macrolet call, and so there doesn't appear to be any point in extracting
 them).
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Drag-n-Drop, Future Work -- Standard Interface for Enabling Extensions, Future Work -- Elisp Compatibility Package, Future Work
 @section Future Work -- Drag-n-Drop
 @cindex future work, drag-n-drop
 @cindex drag-n-drop, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} I propose completely redoing the drag-n-drop
 interface to make it powerful and extensible enough to support such
 concepts as drag over and drag under visuals and context menus invoked
 when a drag is done with the right mouse button, to allow drop handlers
 drop, etc.  This event is always passed to any function that is invoked
 as a result of the drag or drop.  There should never be any need to
 refer to the @code{current-mouse-event} variable, and in fact, this
 variable should not be changed at all during a drag or a drop.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Standard Interface for Enabling Extensions, Future Work -- Better Initialization File Scheme, Future Work -- Drag-n-Drop, Future Work
 @section Future Work -- Standard Interface for Enabling Extensions
 @cindex future work, standard interface for enabling extensions
 @cindex standard interface for enabling extensions, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} Apparently, if you know the name of a package (for
 example, @code{fusion}), you can load it using the @code{require}
 function, but there's no standard way to turn it on or turn it off.  The
 only way to figure out how to do that is to go read the source file,
 extensions and a judgment on first of all, how commonly a user might
 want this extension, and second of all, how well written and bug-free
 the package is.  Both of these sorts of judgments could be obtained by
 doing user surveys if need be.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Better Initialization File Scheme, Future Work -- Keyword Parameters, Future Work -- Standard Interface for Enabling Extensions, Future Work
 @section Future Work -- Better Initialization File Scheme
 @cindex future work, better initialization file scheme
 @cindex better initialization file scheme, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} A proposal is outlined for converting XEmacs to use
 the @code{.xemacs} subdirectory for its initialization files instead of
 putting them in the user's home directory.  In the process, a general
 pre-initialization scheme is created whereby all of the initialization
 @code{init.el} or @code{pre-init.el}, or if neither of those files is
 present, then it doesn't contain any sub-directories or files that look
 like what would be in a package root), then it becomes the value of the
 init file directory.  Otherwise the user's home directory is used.
 @item
 If the init file directory is the user's home directory, then the init
 file is called @code{.emacs}.  Otherwise, it's called @code{init.el}.
 @item
 If the init file directory is the user's home directory, then the
 pre-init file is called @code{.xemacs-pre-init.el}.  Otherwise it's
 called @code{pre-init.el}. (One of the reasons for this rule has to do
 with the dialog box that might be displayed at startup.  This will be
 described below.)
 @item
 If the init file directory is the user's home directory, then the custom
 init file is called @code{.xemacs-custom-init.el}.  Otherwise, it's
 called @code{custom-init.el}.
 If an error occurs in the init file, then the initial frame should
 always be created and mapped at that time so that the error is displayed
 and the debugger has a place to be invoked.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Keyword Parameters, Future Work -- Property Interface Changes, Future Work -- Better Initialization File Scheme, Future Work
 @section Future Work -- Keyword Parameters
 @cindex future work, keyword parameters
 @cindex keyword parameters, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 NOTE: These changes are partly motivated by the various user-interface
 changes elsewhere in this document, and partly for Mule support.  In
 general the various API's in this document would benefit greatly from
 built-in keywords.
 @item
 The subr object type needs to be modified to contain additional slots
 for the number and names of any keyword parameters.
 @item
 The implementation of the @code{funcall} function needs to be modified
 so that it knows how to process keyword parameters.  This is the only
 place that will require very much intricate coding, and much of the
 logic that would need to be added can be lifted directly from the
 @code{cl} code.
 @item
 A new macro, similar to the @code{DEFUN} macro, and probably called
 @code{DEFUN_WITH_KEYWORDS}, needs to be defined so that built-in Lisp
 primitives containing keywords can be created.  Now, the
 @code{DEFUN_WITH_KEYWORDS} macro should take an additional parameter
 that specifies the number of keyword parameters.  However, this would
 require some additional complexity in the preprocessor definition of the
 @code{DEFUN_WITH_KEYWORDS} macro, and probably isn't worth
 implementing.
 @item
 The byte compiler would have to be modified slightly so that it knows
 about keyword parameters when it parses the parameter declaration of a
 function.  For example, so that it issues the correct warnings
 concerning calls to that function with incorrect arguments.
 @item
 The @code{make-docfile} program would have to be modified so that it
 generates the correct parameter lists for primitives defined using the
 @code{DEFUN_WITH_KEYWORDS} macro.
 @item
 Possibly other aspects of the help system that deal with function
 descriptions might have to be modified.
 @item
 A helper function might need to be defined to make it easier for
 primitives that use both the @code{&rest} and @code{&key}
 specifiers to parse their argument lists.
 @node Future Work -- Property Interface Changes, Future Work -- Toolbars, Future Work -- Keyword Parameters, Future Work
 @section Future Work -- Property Interface Changes
 @cindex future work, property interface changes
 @cindex property interface changes, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 In my past work on XEmacs, I already expanded the standard property
 functions of @code{get}, @code{put}, and @code{remprop} to work on
 objects other than symbols and defined an additional function
 @code{object-plist} for this interface.  I'd like to expand this
 interface further and advertise it as the standard way to make property
 @dfn{unbound}, which is to say that its value has not been explicitly
 specified. Note: the way to make a property unbound is to call
 @code{remprop}.  Note also that for some built-in properties, setting
 the property to its default value is equivalent to making it unbound.
 @item
 The behavior of the @code{get} function is modified.  If the @code{get}
 function is called on a property that is unbound and the third, optional
 @var{default} argument is @code{nil}, then the default value of the
 property is returned.  If the @var{default} argument is not @code{nil},
 initial default value of @code{nil}.  Code that calls the @code{get}
 function and specifies @code{nil} for the @var{default} argument, and
 expects to get @code{nil} returned if the property is unbound, is almost
 certainly wrong anyway.
 @item
 A new function, @code{get1}, is defined.  This function does not take a
 default argument like the @code{get} function.  Instead, if the property
 is unbound, an error is signaled.  Note: @code{get} can be implemented
 in terms of @code{get1}.
 @item
 New functions @code{property-default-value} and @code{property-bound-p}
 are defined with the obvious semantics.
 @item
 An additional function @code{property-built-in-p} is defined which takes
 two arguments, the first one being a symbol naming an object type, and
 the second one specifying a property, and indicates whether the property
 name has a built-in meaning for objects of that type.
 @item
 It is not necessary, or even desirable, for all object types to allow
 user-defined properties.  It is always possible to simulate user-defined
 properties for an object by using a weak hash table.  Therefore, whether
 an object allows a user to define properties or not should depend on the
 meaning of the object.  If an object does not allow user-defined
 properties, the @code{put} function should signal an error, such as
 @code{undefined-property}, when given any property other than those that
 are predefined.
 @item
 A function called @code{user-defined-properties-allowed-p} should be
 defined with the obvious semantics.  (See the previous item.)
 @item
 Three more functions should be defined, called
 @code{built-in-property-name-list}, @code{property-name-list}, and
 @code{user-defined-property-name-list}.
 e.g. (define-property-method 'hash-table
 :put #'(lambda (obj key value) (puthash key value obj)))
 @end example
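The proposed semantics of get, get1 and property-bound-p can be sketched in C as follows. This is only an illustration, not existing XEmacs code: the struct prop_slot layout, the function names, and the use of NULL for "unbound" are all assumptions, and a non-nil default argument is assumed to be returned as-is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one property slot on an object.  A NULL value
   means the property is unbound (the state remprop would leave). */
struct prop_slot {
  const char *name;          /* property name */
  const char *value;         /* bound value, or NULL if unbound */
  const char *default_value; /* built-in default, or NULL if none */
};

/* Find a slot by name; returns NULL if there is no such property. */
static struct prop_slot *
find_slot (struct prop_slot *slots, size_t n, const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < n; i++)
    if (strcmp (slots[i].name, name) == 0)
      return &slots[i];
  return NULL;
}

/* property-bound-p: nonzero if the property has an explicit value. */
static int
prop_bound_p (struct prop_slot *slots, size_t n, const char *name)
{
  struct prop_slot *s = find_slot (slots, n, name);
  return s != NULL && s->value != NULL;
}

/* get1: store the bound value in *out and return 0, or return -1
   (standing in for a signaled error) if the property is unbound. */
static int
prop_get1 (struct prop_slot *slots, size_t n, const char *name,
           const char **out)
{
  struct prop_slot *s = find_slot (slots, n, name);
  if (s == NULL || s->value == NULL)
    return -1;
  *out = s->value;
  return 0;
}

/* get, written in terms of get1: a bound value wins; otherwise a
   non-NULL DEFLT is returned (assumption); otherwise the built-in
   default of the property is returned. */
static const char *
prop_get (struct prop_slot *slots, size_t n, const char *name,
          const char *deflt)
{
  const char *v;
  if (prop_get1 (slots, n, name, &v) == 0)
    return v;
  if (deflt != NULL)
    return deflt;
  struct prop_slot *s = find_slot (slots, n, name);
  return s != NULL ? s->default_value : NULL;
}
```

Writing get on top of get1 keeps the "unbound" test in one place, which is the relationship the proposal points out.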
 @node Future Work -- Toolbars, Future Work -- Menu API Changes, Future Work -- Property Interface Changes, Future Work
 @section Future Work -- Toolbars
 @cindex future work, toolbars
 @cindex toolbars
 @node Future Work -- Easier Toolbar Customization, Future Work -- Toolbar Interface Changes, Future Work -- Toolbars, Future Work -- Toolbars
 @subsection Future Work -- Easier Toolbar Customization
 @cindex future work, easier toolbar customization
 @cindex easier toolbar customization, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} One of XEmacs' greatest strengths is its ability to
 be customized endlessly.  Unfortunately, it is often too difficult to
 figure out how to do this.  There has been some recent work like the
 Custom package, which helps in this regard, but I think there's a lot
 ones, would be the ability to change the font size of the captions.  I'm
 sure that Kyle, for one, would appreciate this.
 (This is incomplete.....)
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Toolbar Interface Changes,  , Future Work -- Easier Toolbar Customization, Future Work -- Toolbars
 @subsection Future Work -- Toolbar Interface Changes
 @cindex future work, toolbar interface changes
 @cindex toolbar interface changes, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 I propose changing the way that toolbars are specified to make them more
 flexible.
 @enumerate
 @node Future Work -- Menu API Changes, Future Work -- Removal of Misc-User Event Type, Future Work -- Toolbars, Future Work
 @section Future Work -- Menu API Changes
 @cindex future work, menu API changes
 @cindex menu API changes, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @enumerate
 @item
 I propose making a specifier for the menubar associated with the frame.
 properties may not actually be implemented at first, but at least the
 keywords for them should be defined.
 @end enumerate
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Removal of Misc-User Event Type, Future Work -- Mouse Pointer, Future Work -- Menu API Changes, Future Work
 @section Future Work -- Removal of Misc-User Event Type
 @cindex future work, removal of misc-user event type
 @cindex removal of misc-user event type, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} This page describes why the misc-user event type
 should be split up into a number of different event types, and how to do
 this.
 @node Future Work -- Abstracted Mouse Pointer Interface, Future Work -- Busy Pointer, Future Work -- Mouse Pointer, Future Work -- Mouse Pointer
 @subsection Future Work -- Abstracted Mouse Pointer Interface
 @cindex future work, abstracted mouse pointer interface
 @cindex abstracted mouse pointer interface, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} We need to create a new image format that allows
 standard pointer shapes to be specified in a way that works on all
 Windows systems.  I suggest that this be called @code{pointer}, which
 has one tag associated with it, named @code{:data}, and whose value is a
 string.  The possible strings that can be specified here are predefined
 be @code{mswindows-resource}.  At least in the case of
 @code{cursor-font}, the old value should be maintained for compatibility
 as an obsolete alias.  The @code{resource} format was added so recently
 that it's possible that we can just change it.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Busy Pointer,  , Future Work -- Abstracted Mouse Pointer Interface, Future Work -- Mouse Pointer
 @subsection Future Work -- Busy Pointer
 @cindex future work, busy pointer
 @cindex busy pointer, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 Automatically make the mouse pointer switch to a busy shape (watch
 signal) when XEmacs has been "busy" for more than, e.g. 2 seconds.
 Define the @dfn{busy time} as the time since the last time that XEmacs was
 ready to receive input from the user.  An implementation might be:
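The busy-time definition above could be tracked with a small amount of state. The following C sketch is hypothetical (struct busy_state and both function names are invented for illustration) and assumes times come from some monotonic clock in milliseconds and that something, such as a timer, calls the check periodically.

```c
#include <assert.h>

/* Hypothetical bookkeeping for the busy pointer.  Times are in
   milliseconds from any monotonic clock. */
struct busy_state {
  long last_ready_ms;   /* last time XEmacs was ready to receive input */
  long threshold_ms;    /* busy time after which the pointer changes */
  int busy_shown;       /* nonzero while the busy shape is displayed */
};

/* Call whenever the event loop becomes ready for user input. */
static void
busy_note_ready (struct busy_state *st, long now_ms)
{
  st->last_ready_ms = now_ms;
  st->busy_shown = 0;          /* back to the normal pointer */
}

/* Call periodically; returns nonzero if the busy shape should be
   displayed because the busy time exceeds the threshold. */
static int
busy_check (struct busy_state *st, long now_ms)
{
  assert (now_ms >= st->last_ready_ms);
  if (now_ms - st->last_ready_ms > st->threshold_ms)
    st->busy_shown = 1;
  return st->busy_shown;
}
```

With a threshold of 2000 ms, a check made 1.5 seconds after the last ready point leaves the pointer alone, while one made just over 2 seconds after it switches to the busy shape.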
 @node Future Work -- Everything should obey duplicable extents,  , Future Work -- Extents, Future Work -- Extents
 @subsection Future Work -- Everything should obey duplicable extents
 @cindex future work, everything should obey duplicable extents
 @cindex everything should obey duplicable extents, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 A lot of functions don't properly track duplicable extents.  For
 example, the @code{concat} function does, but the @code{format} function
 does not, and extents in keymap prompts are not displayed either.  All
 of the functions that generate strings or string-like entities should
 track the extents that are associated with the strings.  Currently this
 a Lisp string into a @code{lisp_string_struct}.  However, there is
 already a function @code{copy_string_extents()} that does basically this
 exact thing, and it should be easy to create a modified version of this
 function.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Version Number and Development Tree Organization, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Extents, Future Work
 @section Future Work -- Version Number and Development Tree Organization
 @cindex future work, version number and development tree organization
 @cindex version number and development tree organization, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} The purpose of this proposal is to present a coherent
 plan for how development branches in XEmacs are managed.  This will
 cover such issues as stable versus experimental branches, creating new
 branches, synchronizing patches between branches, and how version
 without the diff getting cluttered up by these code cleanliness changes
 that don't change any actual behavior.
 @end enumerate
-@uref{../../www.666.com/ben,Ben Wing}
 @node Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Keybindings, Future Work -- Version Number and Development Tree Organization, Future Work
 @section Future Work -- Improvements to the @code{xemacs.org} Website
 @cindex future work, improvements to the @code{xemacs.org} website
 @cindex improvements to the @code{xemacs.org} website, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 The @code{xemacs.org} web site is the face that XEmacs presents to the
 outside world.  In my opinion, its most important function is to present
 information about XEmacs in such a way that solicits new XEmacs users
 and co-contributors.  Existing members of the XEmacs community can
 at @uref{http://www.freshmeat.net,http://www.freshmeat.net},
 the various announcement news groups (for example,
 @uref{news:comp.os.linux.announce,comp.os.linux.announce}, and the
 Windows announcement news group) etc.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Keybindings, Future Work -- Byte Code Snippets, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work
 @section Future Work -- Keybindings
 @cindex future work, keybindings
 @cindex keybindings, future work
 @node Future Work -- Keybinding Schemes, Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Keybindings, Future Work -- Keybindings
 @subsection Future Work -- Keybinding Schemes
 @cindex future work, keybinding schemes
 @cindex keybinding schemes, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} We need a standard mechanism that allows different
 global key binding schemes to be defined.  Ideally, this would be the
 @uref{keyboard-actions.html,keyboard action interface} that I have
 proposed; however, this would require a lot of work on the part of mode
 @node Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Misc Key Binding Ideas, Future Work -- Keybinding Schemes, Future Work -- Keybindings
 @subsection Future Work -- Better Support for Windows Style Key Bindings
 @cindex future work, better support for windows style key bindings
 @cindex better support for windows style key bindings, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} This page describes how we could create an XEmacs
 extension that modifies the global key bindings so that a Windows user
 would feel at home when using the keyboard in XEmacs.  Some of these
 bindings don't conflict with standard XEmacs keybindings and should be
 allows the user to make a selection of which key binding scheme they
 would prefer as the default, either the XEmacs standard bindings, Vi
 bindings (which would be Viper mode), Windows-style bindings, Brief,
 CodeWright, Visual C++, or whatever we manage to implement.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Misc Key Binding Ideas,  , Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Keybindings
 @subsection Future Work -- Misc Key Binding Ideas
 @cindex future work, misc key binding ideas
 @cindex misc key binding ideas, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @itemize
 @item
 M-123 ... do digit arg
 @node Future Work -- Byte Code Snippets, Future Work -- Lisp Stream API, Future Work -- Keybindings, Future Work
 @section Future Work -- Byte Code Snippets
 @cindex future work, byte code snippets
 @cindex byte code snippets, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @itemize
 @item
 For use in time critical (e.g. redisplay) places such as display
 tables - a simple piece of code is evalled, e.g.
 @example
 @end itemize
 @menu
 * Future Work -- Autodetection::
 * Future Work -- Conversion Error Detection::
+* Future Work -- Unicode::
 * Future Work -- BIDI Support::
 * Future Work -- Localized Text/Messages::
 @end menu
 @node Future Work -- Autodetection, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets, Future Work -- Byte Code Snippets
 @cindex future work, autodetection
 @cindex autodetection, future work
 There are various proposals contained here.
-@subsection New Implementation of Autodetection Mechanism
+@subheading New Implementation of Autodetection Mechanism
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 The current auto detection mechanism in XEmacs Mule has many
 problems. For one thing, it is wrong too much of the time. Another
 problem, although easily fixed, is that priority lists are fixed rather
 than varying, depending on the particular locale; and finally, it
 As part of the "are you sure" dialog box or question, the user can
 display the results of the decoding to make sure it's correct. If the
 user says "no, they're not sure," then the same list of choices as
 previously mentioned will be presented.
-@subheading Implementation of Coding System Priority Lists in Various Locales
+@subheading RFC: Autodetection
+Also appeared under heading "Implementation of Coding System Priority
+Lists in Various Locales" ?
+Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull}
+Date: 11/1/1999 2:48 AM
 @example
+>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@@srce.hr> writes:
+[Ben sez:]
+>> You are perfectly free to set up your XEmacs like this, but
+>> XEmacs/Mule @strong{will} autodetect by default if there is no
+>> Content-Type: info and no reason to believe we are dealing with
+>> binary files.
+Hrvoje> In that case, it will be a serious mistake to make
+Hrvoje> --with-mule the default, ever.  I think more care should
+Hrvoje> be shown in meeting the need of European users.
+@end example
+Hrvoje, I don't understand what you are worrying about.  I suspect you
+are worrying about Handa's hyperactive and obstinate Mule, not what
+Ben has in mind.  Yes, Ben has said "better guessing," but that's
+simply not reasonable without substantial language environment
+information.  I think trying to detect Latin-1 vs Latin-2 in the POSIX
+locale would be a big mistake; I think trying to guess Big 5 v. Shift
+JIS in a European locale would be a big mistake.
+If Ben doesn't mean "more appropriate use of language environment
+information" when he writes "better guessing," I, as much as you, want
+to see how he plans to do that.  Ben?  ("Yes/no/oops I need to think
+about it" is good enough if you have specifics you intend to put in
+the RFC you're planning to present.)
+Let me give a formal proposal of what I would like to see in the
+autodetection specification.
 @enumerate
+@item
+Definitions
+@enumerate
+@item
+@dfn{Autodetection} means detecting and making available to Mule
+the external file's encoding.  See (5), below.  It doesn't
+imply any specific actions based on that information.
+@item
+The @dfn{default} case is POSIX locale, and no environment
+information in ~/.emacs.
+N.B.  This @strong{will} cause breakage for all 1-byte users because
+the default case can no longer assume Latin-1.  You @strong{may} be
+able to use the TTY font or the Xt -font option to fake this,
+and default to iso8859-1; I would hope that we would not use
+such a kludge in the beta versions, although it might be
+satisfactory for general use.  In particular, encodings like
+VISCII (Vietnamese) and I believe KOI-8 (Cyrillic) are not
+ISO-2022-clean, but using C1 control characters as a heuristic
+for detecting binary files is useful.
+If we do allow it, I think that XEmacs should bitch and warn
+that the practices of implicitly specifying language
+environment by -font and defaulting on TTYs is deprecated and
+likely to be obsoleted.
+@item
+The @dfn{European} case is any Latin-* locale, either implied by
+setlocale() and friends or set in ~/.emacs.  Latin-1 is
+specifically not given precedence over other Latin-*, or
+non-Latin or non-ISO-8859 for that matter.  I suspect but am
+not sure that this case extends to all ISO-8859 encodings, and
+possibly to non-ISO-8859 single-byte encodings like KOI-8r (in
+particular when combined in a class with ISO-8859 encodings).
+@item
+The @dfn{CJK} case is any CJK locale.  Japanese is specifically
+not given precedence over other Asian locales.
+@item
+For completeness, define the @dfn{Unicode} case (Unicode
+unfortunately has lots of junk such as precomposed characters,
+language tags, and directionality indicators in it; we
+probably don't care yet, but we should also not claim
+compliance) and the @dfn{general} case (which has a lot of
+features similar to Unicode, but lacks the advantage of a
+unified encoding).  This proposal has no idea how to handle
+the special features of these, or even if that matters.  The
+general case includes stuff that nobody here really knows how
+it works, like Tibetan and Ethiopic.
+@end enumerate
+Each of the following cases is given in the order of priority of
+detection.  I'm not sure I'm serious about the top priority given the
+(optional) Unicode detection.  This may be appropriate if Ben is
+right that ISO-2022 is going to disappear, but possibly not until then
+(two two-byte sequences out of 65536 is probably 1.99 too many).  It
+probably isn't too risky if (6)(c) is taken pretty seriously; a Unicode
+file should contain _no_ private use characters unless the encoding is
+explicitly specified, and that's a block of 1/10 of the code space,
+which should help a lot in detecting binary files.
 @item
 Default locale
 @enumerate
 @item
 Newlines will be detected in text files.
 @end enumerate
 @item
 Unicode and general locales; multilingual use
-@end enumerate
 @enumerate
 @item
 Hopefully a system general enough to handle (2)--(4) will
 handle these, too, but we should watch out for gotchas like
 would involve (eg) heuristics like picking a set of code
 points that are frequent in Shift JIS and uncommon in Big 5
 and betting that a file containing many characters from that
 set is Shift JIS.
 @end enumerate
-@end example
+@item
+Relationship to decoding semantics
+@enumerate
+@item
+Autodetection should be run on every input stream unless the
+user explicitly disables it.
+@item
+The (conceptual) default procedure is
+@enumerate
+@item
+Read the file into the buffer.
+@item
+Announce the result of autodetection to the user.
+@item
+User may request decoding, with autodetected encoding(s)
+given priority in a list of available encodings.
+@end enumerate
+Optimizations (see (e) below) should avoid introducing data
+corruption that this default procedure would avoid.
+Obviously, it can't be perfect if any autodecoding is done;
+users like Hrvoje should have an easily available option to
+return to this default (or an optimized approximation which
+doesn't actually read the whole file into a buffer) or simply
+display everything as binary (with the "font" for binary files
+being a user option).
+@item
+This implies that we should be detecting conditions in the
+tail of the file which violate the implicit assumptions of the
+autodetected coding system (eg, in UTF-8, illegal UTF-8
+sequences, including those corresponding to surrogates).  These
+should raise a warning; the buffer should probably be made
+read-only and the user prompted.
+This could be taken to extremes, like checking by table
+whether all characters in a Japanese file are actually
+legitimate JIS codes; that's insane (and would cause corporate
+encodings to be recognized as binary).  But we should think
+about the idea that autodetection shouldn't mean XEmacs can't
+change its mind.
+@item
+A flexible means for the user to delegate the decision
+(conditional on the result of autodetection) to decode or not
+to XEmacs or a Lisp program should be provided (eg, the
+coding priority list and/or a file-coding-alist).
+@item
+Optimized operations (eg, the current lstreams) should be
+provided, with the recognition that if they depend on sampling
+the file they are risky.
+@item
+Mule should provide a reasonable set of default delegations
+(as in (d) above) for as many locales as possible.
+@end enumerate
+@item
+Implementation
+@enumerate
+@item
+I think all the decision logic suggested above can be
+accomplished through a coding-priority-list and appropriate
+initializations for different language environments, and a
+file-coding-alist.
+@item
+Many of the tests on the file's tail shouldn't be very
+expensive; in particular, all of the ones I've suggested are
+O(n) although they might involve moderate-sized auxiliary
+tables for efficiency (eg, 64kB for a single Unicode-oriented
+test).
+@end enumerate
+@end enumerate
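The tail check mentioned under "Relationship to decoding semantics" (illegal UTF-8 sequences, including those corresponding to surrogates) is a single O(n) pass. A minimal C sketch follows; the function name is hypothetical, and treating a sequence truncated at the end of the buffer as an error is an assumption that a sampling detector might want to relax.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first byte that starts an illegal UTF-8
   sequence in buf[0..len), or len if the whole buffer is well formed.
   Overlong forms, encoded surrogates (U+D800..U+DFFF), values above
   U+10FFFF, and sequences cut off by the end of the buffer are all
   treated as illegal. */
static size_t
utf8_first_error (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  assert (buf != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need;              /* number of continuation bytes */
      unsigned long cp, min;    /* decoded value, smallest legal value */

      if (c < 0x80)
        {
          i++;                  /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return i;               /* stray continuation or bad lead byte */

      if (len - i <= need)
        return i;               /* sequence runs past the buffer */
      for (size_t k = 1; k <= need; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return i;           /* not a continuation byte */
          cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return i;               /* overlong, out of range, or surrogate */
      i += need + 1;
    }
  return len;
}
```

Returning an offset rather than a yes/no answer would let the caller mark the buffer read-only and move point to the offending position before prompting the user.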
+Other comments:
+It might be reasonable given Hrvoje's objections to require that any
+autodetection that could cause data loss (any coding system that
+involves escape sequences, and only those AFAIK: by design translation
+to Unicode is invertible) by default prompt the user (presumably with
+a novice-like ability to retain the prompt, always default to binary,
+or always default to the autodetected encoding) in the future, at
+least in locales that don't need it (POSIX, Latin-any).
+Ben thinks that we can remember the input data; I think it's going to
+be hard to comprehensively test that a highly optimized version works.
+Good design will help, but ISO-2022 is enormously complex, and there
+are many encodings that violate even its lax assumptions.  On the
+other hand, memory is the only way to get non-rewindable streams right.
+Hrvoje himself said he would like to have an XEmacs that distinguishes
+between Latin-1 and Latin-2 text.  Where it is possible to do that,
+this is exactly what autodetection of ISO-2022 and Unicode gives you.
+Many people would want that, even at some risk of binary corruption.
+>> Once again I remind you that XEmacs is a @strong{text} editor.  There
+>> are lots of files that potentially may have Japanese etc. in
+>> them without this marked, e.g. C or Elisp files in the XEmacs
+>> source.  Surely you're not arguing that we interpret even these
+>> files as binary by default?
+Hrvoje> I am.  If I want to see Japanese, I'll setup my
+Hrvoje> environment that way.  But I don't, and neither do 99% of
+Hrvoje> Croatian users.  I can't speak for French, Italian, and
+Hrvoje> others, but I'd assume similar.
+Hrvoje> If there is Japanese in the source files, I will see it as
+Hrvoje> escape sequences, which is perfectly fine, because I don't
+Hrvoje> read Japanese.
+And some (European) people will have their terminals scrambled,
+because Shift-JIS contains sequences that can change the state of
+XTerm (as do fixed-width Unicode and Big5).  This may also be a
+problem with some Windows-12xx encodings; I'm not sure they all are
+ISO-2022-clean.  (This isn't a problem for XEmacs native X11 frames or
+native MS-Windows frames, and the XEmacs sources themselves are all in
+7-bit ISO-2022 now IIRC.  But it is a potential source of great
+frustration for many users.)
+I think that should be considered too, although it is presumably lower
+priority than the data corruption of binary files.
+@subheading Response to RFC: Autodetection
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
+Date: 11/1/1999 7:24 AM
+Stephen, thank you very much for writing this up.  I think it is a good start,
+and definitely moving in the direction I would like to see things going: more
+proposals, less arguing. (aka "more light, less heat") However, I have some
+suggestions for cleaning this up:
+You should try to make it more layered.  For example, you might have one
+section devoted to the workings of autodetection, which starts out like this
+(the section numbers below are totally arbitrary):
+@subsubheading Section 5
+@code{autodetect()} is a function whose arguments are (1) a readable stream, (2) some
+hints indicating how the autodetection is to proceed, and (3) a value
+indicating the maximum number of characters to examine at the beginning of the
+stream.  (Possibly, the value in (3) may be some special symbol indicating
+that we only go as far as the next line, or a certain number of lines ahead;
+this would be used as part of "continuous autodetection", e.g. we are decoding
+the results of an interactive terminal session, where the user may
+periodically switch encodings, line terminations, etc. as different programs
+get run and/or telnet or similar sessions are entered into and exited.) We
+assume the stream is rewindable; if not, insert a "rewinding" stream in front
+of the non-rewinding stream; this kind of stream automatically buffers the
+data as necessary.
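The "rewinding" stream mentioned above can be sketched as a wrapper that remembers every byte it has pulled from the underlying source. All names below (struct rewind_stream, rs_read, rs_rewind, the source_read_fn callback, and the string source used to exercise it) are hypothetical and are not the XEmacs lstream API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read callback for a non-rewindable source: store up to
   n bytes in dst and return the number stored (0 at end of data). */
typedef size_t (*source_read_fn) (void *ctx, unsigned char *dst, size_t n);

struct rewind_stream {
  source_read_fn read;   /* underlying source */
  void *ctx;             /* opaque state passed to read */
  unsigned char *buf;    /* every byte pulled from the source so far */
  size_t filled;         /* number of valid bytes in buf */
  size_t cap;            /* allocated size of buf */
  size_t pos;            /* current read position within buf */
};

static void
rs_init (struct rewind_stream *rs, source_read_fn read, void *ctx)
{
  assert (read != NULL);
  rs->read = read;
  rs->ctx = ctx;
  rs->buf = NULL;
  rs->filled = rs->cap = rs->pos = 0;
}

/* Read up to n bytes: replay buffered data first, then pull fresh data
   from the source and remember it.  Returns the number of bytes read. */
static size_t
rs_read (struct rewind_stream *rs, unsigned char *dst, size_t n)
{
  size_t got = 0;
  while (got < n)
    {
      if (rs->pos == rs->filled)
        {
          /* Nothing left to replay: make room and fetch more. */
          if (rs->filled == rs->cap)
            {
              size_t ncap = rs->cap ? rs->cap * 2 : 64;
              unsigned char *nbuf = realloc (rs->buf, ncap);
              if (nbuf == NULL)
                break;
              rs->buf = nbuf;
              rs->cap = ncap;
            }
          size_t want = n - got;
          if (want > rs->cap - rs->filled)
            want = rs->cap - rs->filled;
          size_t r = rs->read (rs->ctx, rs->buf + rs->filled, want);
          if (r == 0)
            break;              /* source exhausted */
          rs->filled += r;
        }
      size_t avail = rs->filled - rs->pos;
      size_t take = avail < n - got ? avail : n - got;
      memcpy (dst + got, rs->buf + rs->pos, take);
      rs->pos += take;
      got += take;
    }
  return got;
}

/* Go back to the first byte ever read from the source. */
static void
rs_rewind (struct rewind_stream *rs)
{
  rs->pos = 0;
}

static void
rs_free (struct rewind_stream *rs)
{
  free (rs->buf);
  rs->buf = NULL;
  rs->filled = rs->cap = rs->pos = 0;
}

/* Example source: hands out a fixed string once, front to back. */
struct string_source { const char *data; size_t len, off; };

static size_t
string_source_read (void *ctx, unsigned char *dst, size_t n)
{
  struct string_source *s = ctx;
  size_t left = s->len - s->off;
  if (n > left)
    n = left;
  memcpy (dst, s->data + s->off, n);
  s->off += n;
  return n;
}
```

A detector would read its sample through rs_read, call rs_rewind, and hand the same stream to the decoder. The cost is that the buffer grows with everything read so far, so a real implementation would probably stop buffering once detection has finished.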
+[You can use pseudo-code terminology here.  No need for straight C or ELisp.]
+[Then proceed to describe what the hints look like -- e.g. you could portray
+it as a property list or whatever.  The idea is that, for each locale, there
+is a corresponding hints value that is used at least by default.  The hints
+structure also has to be set up to allow for two or more competing hints
+specifications to be merged together.  For example, the extension of a file
+might provide an additional hint or hints about how to interpret the data of
+that file, and the caller of @code{autodetect()}, when calling @code{autodetect()} on such a
+file, would need to have a way of gracefully merging the default hints
+corresponding to the locale with the more specific hints provided by the
+extension.  Furthermore, users like Hrvoje might well want to provide their
+own hints to supplement and override parts of the generic hints -- e.g. "I
+don't ever want to see non-European encodings decoded; treat them as binary
+instead".]
+[Then describe algorithmically how the autodetection works.  First, you could
+describe it more generally, i.e. presenting an algorithmic overview, then you
+could discuss in detail exactly how autodetection of a particular type of
+external encoding works -- e.g. "for iso2022, we first look for an escape
+character, followed by a byte in this range [. ... .] etc."]
+@subsubheading Section 6
+This section describes the concept of a locale in XEmacs, and how it is
+derived from the user's environment.  A locale in XEmacs is a pair, a country
+and a language, together determining the handling of locale-specific areas of
+XEmacs.  All locale-specific areas in XEmacs make use of this XEmacs locale,
+and do not attempt to derive the locale from any other sources.  The user is
+free to change the current locale at any time; accessor and mutator functions
+are provided to do this so that various locale-specific areas can optionally
+be changed together with it.
+[Then you describe how the XEmacs locale is extracted from .emacs, from
+@code{setlocale()}, from the LANG environment variables, from -font, or wherever
+else.  All other sections assume this dirty work is done and never even
+mention it]
+@subsubheading Section 7
+[Here you describe the default @code{autodetect()} hints value corresponding to each
+possible locale.  You should probably use a schematic description here, e.g.
+an actual Lisp property list, liberally commented.]
+@subsubheading Section 8 etc.
+[Other sections cover anything I've missed.  By being very careful to separate
+out the layers, you simultaneously introduce more rigor (easier to catch bugs)
+and make it easier for someone else to understand it completely.]
 @subheading Better Algorithm, More Flexibility, Different Levels of Certainty
 @subheading Much More Flexible Coding System Priority List, per-Language Environment
 @subheading User Ability to Select Encoding when System Unsure or Encounters Errors
 @subheading Another Autodetection Proposal
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 however, in general the detection code has major problems and needs lots
 of work:
 @itemize @bullet
 @item
 instead of merely "yes" or "no" for particular categories, we need a
 more flexible system, with various levels of likelihood.  Currently
 I've created a system with six levels, as follows:
-[see file-coding.h]
+[see @file{file-coding.h}]
 Let's consider what this might mean for an ASCII text detector.  (In
 order to have accurate detection, especially given the iteration I
 proposed below, we need active detectors for @strong{all} types of data we
 might reasonably encounter, such as ASCII text files, binary files,
 ben [at least that's what sjt thinks]
 *****
+Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull}
 While this is clearly something of an improvement over earlier designs,
 it doesn't deal with the most important issue: to do better than categories
 (which in the medium term is mostly going to mean "which flavor of Unicode
 is this?"), we need to look at statistical behavior rather than ruling out
 categories via presence of specific sequences.  This means the stream
 and "magic" like Unicode signatures or file(1) magic.
 @end enumerate
 --sjt
-@node Future Work -- Conversion Error Detection, Future Work -- BIDI Support, Future Work -- Autodetection, Future Work -- Byte Code Snippets
+@node Future Work -- Conversion Error Detection, Future Work -- Unicode, Future Work -- Autodetection, Future Work -- Byte Code Snippets
 @subsection Future Work -- Conversion Error Detection
 @cindex future work, conversion error detection
 @cindex conversion error detection, future work
 @subheading "No Corruption" Scheme for Preserving External Encoding when Non-Invertible Transformation Applied
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 A preliminary and simple implementation is:
 @quotation
 But you could implement it much more simply and usefully by just
 correspondences to get the internal state right.
 @end enumerate
 @end quotation
 @subheading Another Error-Catching Idea
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 Nov 4, 1999
 Finally, I don't think "save the input" is as hard as you make it out to
 be.  Conceptually, in fact, it's simple: for each minimal group of bytes
 cases.  The hardest part, in fact, is making all the string/text
 handling in XEmacs be robust w.r.t. text properties.
 @subheading Strategies for Error Annotation and Coding Orthogonalization
-From sjt (?):
+Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull}
 We really want to separate out a number of things.  Conceptually,
 there is a nested syntax.
 At the top level is the ISO 2022 extension syntax, including charset
 It's possible that, by doing the processing with tables of functions or
 the like, the parser can be used for both detection and translation.
 @subheading Handling Writing a File Safely, Without Data Loss
-From ben:
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @quotation
 When writing a file, we need error detection; otherwise somebody
 will create a Unicode file without realizing the coding system
 of the buffer is Raw, and then lose all the non-ASCII/Latin-1
 same thing (error checking, list of alternatives, etc.) needs
 to happen when reading!  all of this will be a lot of work!
 @end enumerate
 @end quotation
---ben
+Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull}
 I don't much like Ben's scheme.  First, this isn't an issue of I/O,
 it's a coding issue.  It can happen in many places, not just on stream
 I/O.  Error checking should take place on all translations.  Second,
 the two-pass algorithm should be avoided if possible.  In some cases
 characters.  So (up to some maximum) we should keep a list of unsafe
 text positions, and provide a convenient function for traversing them.
 --sjt
-@node Future Work -- BIDI Support, Future Work -- Localized Text/Messages, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets
+@node Future Work -- Unicode, Future Work -- BIDI Support, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets
+@subsection Future Work -- Unicode
+@cindex future work, unicode
+@cindex unicode, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
+Following is an old proposal.  Unicode has been implemented already, in
+a different fashion; but there are some ideas here for more general
+support, e.g. properties of Unicode characters other than their mappings
+to particular charsets.
+We recognize 128, [256], 128x128, [256x256] for source charsets;
+for Unicode, 256x256 or 16x256x256.
+In all cases, use tables of tables and substitute a default subtable
+if entire row is empty.
+If destination is Unicode, either 16 or 32 bits.
+If destination is charset, either 8 or 16 bits.
+For the moment, since we only do 94, 96, 94x94 or 96x96, only do 128
+or 128x128 for source charsets and use the range 33-126 or 32-127.
+(Except ASCII - we special case that and have no table because we can
+algorithmically translate)
+Also have a 16x256x256 table -> 32 bits of Unicode char properties.
+A particular charset contains two associated mapping tables, for both
+directions.
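The table-of-tables layout with a shared default subtable can be sketched in C as follows.  This is only an illustration under stated assumptions, not the XEmacs implementation: the names @code{mapping_table}, @code{table_get} and @code{table_set} are hypothetical, entries are assumed to be 16 bits wide, and a value of 0 is assumed to mean "unmapped".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ROW_SIZE 256

/* One shared, all-zero row stands in for every empty row.  It is never
   written to; 0 is assumed to mean "unmapped". */
static uint16_t default_row[ROW_SIZE];

typedef struct {
    uint16_t *rows[ROW_SIZE];   /* each entry is default_row or a private row */
} mapping_table;

void table_init(mapping_table *t)
{
    for (int i = 0; i < ROW_SIZE; i++)
        t->rows[i] = default_row;
}

uint16_t table_get(const mapping_table *t, unsigned code)
{
    assert(code <= 0xFFFF);
    return t->rows[code >> 8][code & 0xFF];
}

/* Returns 0 on success, -1 if a private row could not be allocated. */
int table_set(mapping_table *t, unsigned code, uint16_t value)
{
    assert(code <= 0xFFFF);
    unsigned hi = code >> 8;
    if (t->rows[hi] == default_row) {
        /* First write into this row: replace the shared row with a copy
           that belongs to this table alone. */
        uint16_t *row = calloc(ROW_SIZE, sizeof *row);
        if (row == NULL)
            return -1;
        t->rows[hi] = row;
    }
    t->rows[hi][code & 0xFF] = value;
    return 0;
}
```

A lookup never needs to test for an empty row, since the default subtable answers with the "unmapped" value, and a sparsely populated charset pays only for the rows it actually uses.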
+API is set-unicode-mapping:
+@example
+(set-unicode-mapping
+unicode char
+unicode charset-code charset-offset
+unicode vector of char
+unicode list of char
+unicode string of char
+unicode vector or list of codes charset-offset
+@end example
+Establishes a mapping between a unicode codepoint (an integer) and
+one or more chars in a charset.  The mapping is automatically
+established in both directions.  Chars in a charset can be specified
+either with an actual character or a codepoint (i.e. an integer)
+and the charset it's within.  If a sequence of chars or charset
+codepoints is given, multiple mappings are established for consecutive
+unicode codepoints starting with the given one.  Charset codepoints
+are specified as most-significant x 256 + least significant, with
+both bytes in the range 33-126 (for 94 or 94x94) or 32-127 (for 96
+or 96x96), unless an offset is given, which will be subtracted from
+each byte.  (Most common values are 128, for codepoints given with
+the high bit set, or -32, for codepoints given as 1-94 or 0-95.)
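The offset arithmetic described above can be sketched as a small C helper.  The function name @code{canonical_charset_code} and its explicit dimension argument are assumptions made for this illustration; the proposal itself does not say how the dimension would be determined.

```c
#include <assert.h>

/* Hypothetical helper: convert a caller-supplied charset code to the
   canonical form (most-significant x 256 + least-significant).
   DIMENSION is 1 or 2; OFFSET is subtracted from each byte.  Returns -1
   if a resulting byte falls outside 32..127, the widest range allowed
   (that of 96 and 96x96 charsets). */
int canonical_charset_code(int code, int offset, int dimension)
{
    assert(dimension == 1 || dimension == 2);
    int low = (code & 0xFF) - offset;
    if (low < 32 || low > 127)
        return -1;
    if (dimension == 1)
        return low;
    int high = ((code >> 8) & 0xFF) - offset;
    if (high < 32 || high > 127)
        return -1;
    return high * 256 + low;
}
```

With an offset of 128, a code given with the high bits set, such as 0xB0A1, becomes 0x3021; with an offset of -32, a code given by index, such as row 1 and column 1, becomes 0x2121.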
+Other APIs:
+@example
+(write-unicode-mapping file charset)
+@end example
+Write the mapping table for a particular charset to the specified
+file.  The tables are written in an internal format that allows for
+efficient loading, for portability across platforms and XEmacs
+invocations, for conserving space, for appending multiple tables one
+directly after another with no need for a directory anywhere in the
+file, and for recognizing a file as being in this format (with a magic
+sequence at the beginning).  The data will be appended at the end of
+a file, so that multiple tables can be written to a file; remove the
+file first to avoid this.
+@example
+(write-unicode-properties file unicode-codepoint length)
+@end example
+Write the Unicode properties (not including charset mappings) for
+the specified range of contiguous Unicode codepoints to the end of
+the file (i.e. append mode) in a binary format similar to what was
+mentioned in the write-unicode-mapping description and with the same
+features.
+Extension to set-unicode-mapping:
+@example
+(set-unicode-mapping
+list-or-vector-of-unicode-codepoints char
+""                                   charset-code charset-offset
+""                                   sequence of char
+""                                   list-or-vector-of-codes
+charset-offset
+@end example
+The first two forms are conceptually the inverse of the forms above,
+which specify characters for a contiguous range of Unicode codepoints.
+These new forms let you specify the Unicode codepoints for a
+contiguous range of chars in a charset.  "Contiguous" here means
+that if we run off the end of a row, we go to the first entry of the
+next row, rather than to an invalid code point.  For example, in a
+94x94 charset, valid rows and columns are in the range 0x21-0x7e;
+after 0x457c, 0x457d, 0x457e goes 0x4621, not something like 0x457f,
+which is invalid.
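The notion of "contiguous" used here amounts to a successor function over valid codes.  A minimal C sketch for the 94x94 case follows; the name @code{next_code_94x94} is hypothetical.

```c
#include <assert.h>

enum { FIRST_94 = 0x21, LAST_94 = 0x7E };

/* Returns the code following CODE in a 94x94 charset, moving from the
   last column of a row to the first column of the next row.  Returns -1
   after the final code 0x7E7E.  CODE must itself be valid. */
int next_code_94x94(int code)
{
    int row = code >> 8, col = code & 0xFF;
    assert(row >= FIRST_94 && row <= LAST_94);
    assert(col >= FIRST_94 && col <= LAST_94);
    if (col < LAST_94)
        return (row << 8) | (col + 1);
    if (row < LAST_94)
        return ((row + 1) << 8) | FIRST_94;
    return -1;
}
```

Applied repeatedly from 0x457c, this yields 0x457d, 0x457e and then 0x4621, never 0x457f.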
+The final two forms are the most general, letting you specify an
+arbitrary set of both Unicode points and charset chars, and the two
+are matched up just like a series of individual calls.  However, if
+the lists or vectors do not have the same length, an error is
+signaled.
+@example
+(load-unicode-mapping file &optional charset)
+@end example
+If charset is omitted, loads all charset mapping tables found and
+returns a list of the charsets found.  If charset is specified,
+searches through the file for the appropriate mapping tables.  (This
+is extremely fast because each entry in the file gives an offset to
+the next one.)  Returns t if found.
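The reason the search is fast can be seen in a sketch of walking such a chain of entries.  The record layout below is purely an assumption for illustration (a 4-byte big-endian offset to the next entry, then a 1-byte charset id, then the payload); the proposal does not define the real format, and the magic sequence at the start of the file is left out.

```c
#include <assert.h>
#include <stddef.h>

#define HEADER_SIZE 5u   /* assumed: 4-byte offset to next entry + 1-byte id */

static unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16)
         | ((unsigned long) p[2] << 8)  |  (unsigned long) p[3];
}

/* Returns the offset of the payload of the first entry whose id is
   WANTED, or -1 if there is none or an entry is malformed. */
long find_entry(const unsigned char *buf, size_t len, unsigned char wanted)
{
    size_t pos = 0;
    assert(buf != NULL);
    while (len - pos >= HEADER_SIZE) {
        unsigned long step = read_be32(buf + pos);
        if (step < HEADER_SIZE || step > len - pos)
            return -1;          /* corrupt offset: stop rather than loop */
        if (buf[pos + 4] == wanted)
            return (long) (pos + HEADER_SIZE);
        pos += step;            /* skip the payload without reading it */
    }
    return -1;
}
```

Only the headers are examined, so the cost of locating a table depends on the number of entries rather than on the size of the tables.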
+@example
+(load-unicode-properties file unicode-codepoint)
+@end example
+@example
+(list-unicode-entries file)
+@end example
+@example
+(autoload-unicode-mapping charset)
+@end example
+...
+(unfinished)
+@node Future Work -- BIDI Support, Future Work -- Localized Text/Messages, Future Work -- Unicode, Future Work -- Byte Code Snippets
 @subsection Future Work -- BIDI Support
 @cindex future work, bidi support
 @cindex bidi support, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @enumerate
 @item
 Use text properties to handle nesting levels, overrides
 BIDI-specific text properties (as per Unicode BIDI algorithm)
 (much of this comment is outdated, and a lot of it is actually
 implemented)
 @subsection Proposal for How This All Ought to Work
+Author: @uref{mailto:jwz@@jwz.org,Jamie Zawinski}
 this isn't implemented yet, but this is the plan-in-progress
 In general, it's accepted that the best way to internationalize is for all
 messages to be referred to by a symbolic name (or number) and come out of a
 one we know how to translate, then we translate it?  I think this is a
 worthy goal.  It remains to be seen how well it will work in practice.
 So, we should endeavor to minimize the impact on the lisp code.  Certain
 primitive lisp routines (the stuff in lisp/prim/, and especially in
-cmdloop.el and minibuf.el) may need to be changed to know about translation,
+@file{cmdloop.el} and @file{minibuf.el}) may need to be changed to know about translation,
 but that's an ideologically clean thing to do because those are considered
 a part of the emacs substrate.
 However, if we find ourselves wanting to make changes to, say, RMAIL, then
 something has gone wrong.  (Except to do things like remove assumptions
 the translation.  The new plan is to separate these two things more: the
 tags that we search for to build the catalog will be stuff that was in there
 already, and the translation will get done in some more centralized, lower
 level place.
-This program (make-msgfile.c) addresses the first part, extracting the
+This program (@file{make-msgfile.c}) addresses the first part, extracting the
 strings.
 For the emacs C code, we need to recognize the following patterns:
 @example
 I expect there will be a lot like the above; basically, any function which
 is a commonly used wrapper around an eventual call to @code{message} or
 @code{read-from-minibuffer} needs to be recognized by this program.
 @example
 (dgettext "domain-name" "string")		#### do we still need this?
 things that should probably be restructured:
-@code{princ} in cmdloop.el
+@code{princ} in @file{cmdloop.el}
-@code{insert} in debug.el
+@code{insert} in @file{debug.el}
 face-interactive
-help.el, syntax.el all messed up
+@file{help.el}, @file{syntax.el} all messed up
 @end example
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 ben: (format) is a tricky case.  If I use format to create a string
 that I then send to a file, I probably don't want the string translated.
 On the other hand, if the string gets used as an argument to (y-or-n-p)
 or some such function, I do want it translated, and it needs to be
 translated before the %s and such are replaced.  The proper solution
 We can solve this by adding a bit to Lisp_String objects which identifies
 them as having been read as literal constants from a .el or .elc file (as
 opposed to having been constructed at run time, as it would be in the above
 case).  To solve this:
-@example
+@itemize @bullet
-- @code{Fmessage()} takes a lisp string as its first argument.
+@item
-If that string is a constant, that is, was read from a source file
+@code{Fmessage()} takes a lisp string as its first argument.
-as a literal, then it calls @code{message()} with it, which translates.
+If that string is a constant, that is, was read from a source file
-Otherwise, it calls @code{message_no_translate()}, which does not translate.
+as a literal, then it calls @code{message()} with it, which translates.
+Otherwise, it calls @code{message_no_translate()}, which does not translate.
-- @code{Ferror()} (actually, @code{Fsignal()} when condition is Qerror) works similarly.
-@end example
+@item
+@code{Ferror()} (actually, @code{Fsignal()} when condition is Qerror) works similarly.
+@end itemize
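The dispatch on the "literal constant" bit can be sketched in C as below.  Everything here is a stand-in for illustration: @code{sketch_string} models a Lisp string carrying the proposed bit, and @code{sketch_translate} is a toy one-entry catalog, not the real translation machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a Lisp string.  IS_LITERAL models the proposed bit that
   the reader would set on strings read as constants from a .el or .elc
   file. */
typedef struct {
    const char *data;
    int is_literal;
} sketch_string;

/* Toy catalog lookup standing in for the real translation step. */
const char *sketch_translate(const char *msg)
{
    if (strcmp(msg, "File not found") == 0)
        return "Fichier introuvable";
    return msg;                 /* no catalog entry: leave unchanged */
}

/* Mirrors the dispatch described above: only literal constants are
   translated; strings built at run time pass through untouched. */
const char *sketch_message_text(sketch_string s)
{
    assert(s.data != NULL);
    return s.is_literal ? sketch_translate(s.data) : s.data;
}
```

Two strings with identical contents are thus treated differently depending only on where they came from, which is the point of the proposal.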
 More specifically, we do:
 @quotation
 Scan specified C and Lisp files, extracting the following messages:
 it might run into problems if Arg is used for other sorts
 of functions.
 @item
 @code{snarf()} should be modified so that it doesn't output null
 strings and non-textual strings (see the comment at the top
-of make-msgfile.c).
+of @file{make-msgfile.c}).
 @item
 parsing of (insert) should snarf all of the arguments.
 @item
 need to add set-keymap-prompt and deal with gettext of that.
 @item
 @node Future Work -- Lisp Stream API, Future Work -- Multiple Values, Future Work -- Byte Code Snippets, Future Work
 @section Future Work -- Lisp Stream API
 @cindex future work, Lisp stream API
 @cindex Lisp stream API, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 Expose XEmacs internal lstreams to Lisp as stream objects.  (In
 addition to the functions given below, each stream object has
 properties that can be associated with it using the standard put, get
 etc. API.  For GNU Emacs, where put and get have not been extended to
 @node Future Work -- Multiple Values, Future Work -- Macros, Future Work -- Lisp Stream API, Future Work
 @section Future Work -- Multiple Values
 @cindex future work, multiple values
 @cindex multiple values, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 On low level, all funs that can return multiple values are defined
 with DEFUN_MULTIPLE_VALUES and have an extra parameter, a struct
 mv_context *.
 It has to be this way to ensure that only the fun itself, and no called
 @node Future Work -- Macros, Future Work -- Specifiers, Future Work -- Multiple Values, Future Work
 @section Future Work -- Macros
 @cindex future work, macros
 @cindex macros, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @enumerate
 @item
 Option to control whether beep really kills a macro execution.
 @item
 @node Future Work -- Specifiers, Future Work -- Display Tables, Future Work -- Macros, Future Work
 @section Future Work -- Specifiers
 @cindex future work, specifiers
 @cindex specifiers, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @subheading Ideas To Work On When Their Time Has Come
 @itemize
 @item
 @node Future Work -- Display Tables, Future Work -- Making Elisp Function Calls Faster, Future Work -- Specifiers, Future Work
 @section Future Work -- Display Tables
 @cindex future work, display tables
 @cindex display tables, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 #### It would also be really nice if you could specify that the
 characters come out in hex instead of in octal.  Mule does that by
 adding a @code{ctl-hexa} variable similar to @code{ctl-arrow}, but
 that's bogus -- we need a more general solution.  I think you need to
 extend the concept of display tables into a more general conversion
 @end example
 Since more than one display table is possible, you have
 great flexibility in mapping ranges of characters.
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Making Elisp Function Calls Faster, Future Work -- Lisp Engine Replacement, Future Work -- Display Tables, Future Work
 @section Future Work -- Making Elisp Function Calls Faster
 @cindex future work, making Elisp function calls faster
 @cindex making Elisp function calls faster, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract: }This page describes many optimizations that can be
 made to the existing Elisp function call mechanism without too much
 effort.  The most important optimizations can probably be implemented
 with only a day or two of work.  I think it's important to do this work
 Calling @code{Fset()} to change the variable's value.
 @end enumerate
 @end enumerate
 The entire series of calls to @code{specbind()} should be inline and
 merged into the argument processing code as a single tight loop, with no
 function calls in the vast majority of cases.  The @code{specbind()}
 logic should be streamlined as follows:
 issue here is with symbols whose names begin with a colon.  These
 symbols should simply be disallowed completely as parameter names.)
 @end enumerate
 @end enumerate
 Other optimizations that could be done are:
 @itemize
 @item
 true and is false.  (Note: the optimization detailed in this item is
 probably not worth doing on the first pass.)
 @end itemize
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
 @node Future Work -- Lisp Engine Replacement,  , Future Work -- Making Elisp Function Calls Faster, Future Work
 @section Future Work -- Lisp Engine Replacement
 @cindex future work, lisp engine replacement
 @cindex lisp engine replacement, future work
 @menu
 * Future Work -- Lisp Engine Discussion::
 * Future Work -- Lisp Engine Replacement -- Implementation::
+* Future Work -- Startup File Modification by Packages::
 @end menu
 @node Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Lisp Engine Replacement, Future Work -- Lisp Engine Replacement
 @subsection Future Work -- Lisp Engine Discussion
 @cindex future work, lisp engine discussion
 @cindex lisp engine discussion, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract: }Recently there has been a great deal of talk on the
 XEmacs mailing lists about potential changes to the XEmacs Lisp engine.
 Usually the discussion has centered around the question which is better,
 Common Lisp or Scheme?  This is certainly an interesting debate topic,
 to make this safe would be to do conservative garbage collection over
 the C stack and to eliminate the GCPRO declarations entirely.  But how
 many of the Lisp engines that are being considered have such a mechanism
 built into them?
 @subsubheading Maintainability.
 A new Lisp engine might well improve the maintainability of XEmacs by
 offloading the maintenance of the Lisp engine.  However, we need to make
 very sure that this is, in fact, the case before embarking on a project
 naturally in an object-oriented system.  However, neither Scheme nor
 Common Lisp has been designed with object orientation in mind.  There is
 a standard object system for Common Lisp, but it is extremely complex
 and difficult to understand.
+@node Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Startup File Modification by Packages, Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
-@node Future Work -- Lisp Engine Replacement -- Implementation,  , Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement
 @subsection Future Work -- Lisp Engine Replacement -- Implementation
 @cindex future work, lisp engine replacement, implementation
 @cindex lisp engine replacement, implementation, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 Let's take a look at the sort of work that would be required if we were
 to replace the existing Elisp engine in XEmacs with some other engine,
 for example, the Clisp engine.  I'm assuming here, of course, that we
 are not going to be changing the interface here at the same time, which
 something special needs to happen when this is done.  This could be
 handled fairly easily by having our new and improved @code{DEFUN} macro
 define a new macro for use when calling a primitive.
 @end enumerate
 @subsubheading Make the Existing Lisp Engine be Self-contained.
 The goal of this stage is to gradually build up a self-contained Lisp
 engine out of the existing XEmacs core, which has no dependencies on any
 of the code elsewhere in the XEmacs core, and has a well-defined and
 again on the old and buggy interfaced Lisp engine, it would note the
 bug.
 @end enumerate
+@node Future Work -- Startup File Modification by Packages,  , Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Lisp Engine Replacement
-@uref{../../www.666.com/ben/default.htm,Ben Wing}
+@subsection Future Work -- Startup File Modification by Packages
+@cindex future work, startup file modification by packages
+@cindex startup file modification by packages, future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
+OK, we need to create a design document for all of this, including:
+PRINCIPLE #1: Whenever you have auto-generated stuff, @strong{CLEARLY}
+indicate this in comments around the stuff.  These comments get
+searched for, and used to locate the existing generated stuff to
+replace.  Custom currently doesn't do this.
+PRINCIPLE #2: Currently, lots of functions want to add code to the
+@file{.emacs} (e.g. I get prompted for my mail address from
+add-change-log-entry, and then prompted if I want to make this
+permanent).  There needs to be a Lisp API for working with arbitrary
+code to be added to a user's startup.  This API hides all the details
+of which file to put the fragment in, where in it, how to mark it with
+magical comments of the right kind so that previous fragments can be
+replaced, etc.
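The core of such an API, locating a previously generated fragment by its magic comments and replacing it, can be sketched as follows.  The proposal calls for a Lisp API; this C version only illustrates the replace-or-append logic.  The marker text and the name @code{set_generated_fragment} are invented for the example, and both the existing text and the new body are assumed to end in a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes into OUT (capacity OUTSIZE) a copy of INIT_TEXT in which the
   fragment marked with TAG is replaced by BODY.  If no such fragment
   exists, a new marked fragment is appended.  Returns 0 on success, -1
   if OUT is too small or the existing markers are malformed. */
int set_generated_fragment(const char *init_text, const char *tag,
                           const char *body, char *out, size_t outsize)
{
    char begin[128], end[128];
    int n;

    assert(init_text != NULL && tag != NULL && body != NULL && out != NULL);
    n = snprintf(begin, sizeof begin, ";;; BEGIN generated: %s\n", tag);
    if (n < 0 || (size_t) n >= sizeof begin)
        return -1;
    n = snprintf(end, sizeof end, ";;; END generated: %s\n", tag);
    if (n < 0 || (size_t) n >= sizeof end)
        return -1;

    const char *start = strstr(init_text, begin);
    if (start == NULL) {
        /* No earlier fragment for this tag: append a freshly marked one. */
        n = snprintf(out, outsize, "%s%s%s%s", init_text, begin, body, end);
    } else {
        const char *stop = strstr(start, end);
        if (stop == NULL)
            return -1;          /* BEGIN marker without a matching END */
        /* Keep the text before the fragment, swap in the new body, then
           keep the END marker and everything after it. */
        n = snprintf(out, outsize, "%.*s%s%s%s",
                     (int) (start - init_text), init_text, begin, body, stop);
    }
    return (n >= 0 && (size_t) n < outsize) ? 0 : -1;
}
```

Calling the function twice with the same tag leaves a single fragment holding the most recent body, so a package can update its settings without accumulating stale copies.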
+PRINCIPLE #3: @strong{ALL} generated stuff should be loaded before any
+user-written init stuff.  This way the user can override the generated
+settings.  Although in the case of customize, it may work when the
+custom stuff is at the end of the init file, it surely won't work for
+arbitrary code fragments (which typically do @code{setq} or the like).
+PRINCIPLE #4: As much as possible, generated stuff should be placed in
+separate files from non-generated stuff.  Otherwise it's inevitable
+that some corruption is going to result.
+PRINCIPLE #5: Packages are encouraged, as much as possible, to work
+within the customize model and store all their customizations there.
+However, if they really need to have their own init files, these files
+should be placed in .xemacs/, given normal names
+(e.g. @file{saved-abbrevs.el} not .abbrevs), and there should be some magic
+comment at the top of the file that causes it to get automatically
+loaded while loading a user's init file. (Alternatively, the
+above-named API could specify a function that lets a package specify
+that they want such-and-such file loaded from the init file, and have
+the specifics of this get handled correctly.)
+OVERARCHING GOAL: The overarching goal is to provide a unified
+mechanism for packages to store state and setting information about
+the user and what they were doing when XEmacs exited, so that the same
+or a similar environment can be automatically set up the next time.
+In general, we are working more and more towards being a truly GUI app
+where users' settings are easy to change and get remembered correctly
+and consistently from one session to the next, rather than requiring
+nasty hacking in elisp.
+Hrvoje, do you have any interest in this?  How about you, Martin?
+This seems like it might be up your alley.  This stuff has been
+ad-hocked since kingdom come, and it's high time that we make this
+work properly so that it could be relied upon, and a lot of things
+could "just work".
 @node Future Work Discussion, Old Future Work, Future Work, Top
 @chapter Future Work Discussion
 @cindex future work, discussion
 @cindex discussion, future work
 into the normal Future Work section.
 @menu
 * Discussion -- garbage collection::
 * Discussion -- glyphs::
+* Discussion -- Dialog Boxes::
+* Discussion -- Multilingual Issues::
+* Discussion -- Windows External Widget::
+* Discussion -- Packages::
+* Discussion -- Distribution Layout::
 @end menu
 @node Discussion -- garbage collection, Discussion -- glyphs, Future Work Discussion, Future Work Discussion
 @section Discussion -- garbage collection
 @cindex discussion, garbage collection
 @cindex garbage collection, discussion
-@example
 On Tue, Oct 12, 1999 at 03:36:59AM -0700, Ben Wing wrote:
-@end example
 So what am I missing here?
-@example
 In response, Olivier Galibert wrote:
-@end example
 Two things:
 @enumerate
 @item
 The purespace is gone
 was used.
 @item
 move the markbit outside of the lrecord.
 @end itemize
 The second solution is more appealing to me for a bunch of reasons:
 @itemize @bullet
 @item
 more things are shared than only what is purecopied (not yet used
 functions come to mind)
 So no, it's not a _necessity_.  But it helps.  And the automatic
 sharing of all objects until you write to them explicitly is, I
 think, really cool.
 @end enumerate
-@example
 On 10/12/1999 5:49 PM Ben Wing wrote:
 Subject: Re: hashtable-based marking and cleanups
-@end example
 OK, I can see the advantages.  But:
 @enumerate
 @item
 @example
 http://www.amazon.com/exec/obidos/ASIN/0471941484/qid=939775572/sr=1-1/002-3092633-2509405
 @end example
-@node Discussion -- glyphs,  , Discussion -- garbage collection, Future Work Discussion
+@node Discussion -- glyphs, Discussion -- Dialog Boxes, Discussion -- garbage collection, Future Work Discussion
 @section Discussion -- glyphs
 @cindex discussion, glyphs
 @cindex glyphs, discussion
 Some comments (not always pretty!) by Ben:
-@example
 March 20, 2000
 Andy, I use the tab widgets but I've been having lots of problems.
 1] Sometimes clicking on them does nothing.
 to the front of the buffer list, like it should.  It looks like you're
 doing this to avoid having the order of the tabs change, but this is
 wrong: If you don't reorder the buffer list, everything else gets
 screwed up.  If you want the order of the tabs not to change, you need
 to decouple this order from the buffer list order.
-@end example
-@example
 March 23, 2000
 I'm very confused.  The SIGIO timer is used @strong{only} for C-g.  It has
 nothing to do with any other events.  (sit-for 0) ought to
 leery of introducing new Lisp functions to deal with specific problems.
 Pretty soon we end up with a whole bevy of such ill-defined functions,
 like we already have.  I think instead, you should introduce the
 following primitive:
+@example
 (wait-for-event redisplay &rest event-specs)
+@end example
 Waits for one of the event specifications specified to happen.  Returns
 something about what happened.
 REDISPLAY controls the behavior of redisplay during waiting.  Something
 like
-- nil (never redisplay),
+@itemize @bullet
-- t (redisplay when it seems appropriate), etc.
+@item
+nil (never redisplay),
+@item
+t (redisplay when it seems appropriate), etc.
+@end itemize
 EVENT-SPECS could be
+@example
 t                     -- drain all non-user events, and then return
 any-process           -- wait till input or state change on any process
 process               -- wait till input or state change on process
 time                  -- wait till such-and-such time has elapsed
 'user                 -- wait till user event has happened
 '(user predicate)     -- wait till user event matching the predicate has
 happened
 'event                -- wait till any event has happened
 '(event predicate)    -- wait till event matching the predicate has happened
+@end example
 The existing functions @code{next-event}, @code{next-command-event},
 @code{accept-process-output}, @code{sit-for}, @code{sleep-for}, etc. could all be
 written in terms of this new command.  You could use this command inside
 of your glyph code to ensure that the events get processed that need to be
 in order for widget updates to happen.
 But you said something about needing a magic event to invoke redisplay?
 Why is that?
-@end example
-@example
 April 2, 2000
 the internal distinction between "widget" and "layout" is bogus.  there
 exist widgets that do drawing and do layout of their children,
 e.g. group-box widgets and proper tab widgets.  the only sensible
 distinction is between widgets with children and those without children.
-@end example
-@example
 April 5, 2000
 andy, i'm not sure i really believe that you need to cycle the event
 code to get widgets to redisplay, but in any case you should
 @end enumerate
 in other words, dispatch-non-command-events must go, and i am proposing
 a general function (redisplay OBJECT) to replace the existing ad-hoc
 functions.
-@end example
-@example
 April 6, 2000
 the tab widget code should simply be able to create a whole lot of tabs
 without regard to the size of the gutter, and the surrounding layout
 widget (please please make layouts be proper widgets!) should
 automatically map and unmap them as necessary, to fill up the available
 space.  perhaps this already works and what you're doing is just for
 optimization?  but i get the feeling this is not the case.
-@end example
-@example
 April 6, 2000
 the function make-gutter-only-dialog-frame is bogus.  the use of the
 gutter here to hold widgets is an implementation detail and should not
 be exposed in the interface.  similarly, make-search-dialog should not
 hidden.  you should have a simple function make-dialog-frame that takes
 a dialog specification, and that's all you need to do.
 also, these dialog boxes, and this function make-dialog-frame, should
-a] be in dialog.el, not gutter-items.el.
+@enumerate
-b] when possible, be placed in the interactive spec of standard lisp
+@item
-functions rather than accessed directly from menubar-items.el
+be in @file{dialog.el}, not @file{gutter-items.el}.
-c] wrapped in calls to should-use-dialog-box-p, so the user has control
+@item
+when possible, be placed in the interactive spec of standard lisp
+functions rather than accessed directly from @file{menubar-items.el}
+@item
+wrapped in calls to should-use-dialog-box-p, so the user has control
 over when dialog boxes appear.
-@end example
+@end enumerate
-@example
 April 7, 2000
 hmmm ...  in that case, the whitespace absolutely needs to be specified
 as properties of the layout widget (e.g. :border-width and
 :border-height), rather than setting an overall size.  you have no idea
 what the correct size should be if the user changes font size or uses
 translations in a different language.
 Your modus operandi should be "hardcoded pixel sizes are @strong{always} bad."
-@end example
-@example
 April 7, 2000
 you mean the number of tabs adjusts, or the size of each tab adjusts (by
 making the font smaller or something)?  if the size of a single tab is
 not related to the total space the tabs can fit into, then it should be
 a maximum width (which should be done in 'n' sizes, not in pixels!).
 i won't stop complaining until i see nearly every one of those
 pixel-width and pixel-height parameters gone, and the remaining ones
 there for a very, very good reason.
-@end example
+April 7, 2000
+Andy Piper wrote:
 @example
-April 7, 2000
-Andy Piper wrote:
 > At 03:51 PM 4/6/00 -0700, Ben Wing wrote:
 > >[the function make-gutter-only-dialog-frame is bogus]
 >
 > The problem is that some of the callbacks and such need access to the
 > @strong{created} frame, so you end up in a catch 22 unless you do what I've done.
+@end example
 [Ben proposes other ways to avoid exposing all the guts, as in
 @code{make-gutter-only-dialog-frame}:]
 @enumerate
 (depending on where the glyph is) where the invocation actually
 happened.  That way, the callbacks can easily figure out the dialog
 box and its parent, and not have to worry about embedding it in at
 creation time.
 @end enumerate
-@end example
-@example
 April 15, 2000
 I don't understand when you say "the various types of callback".  Are
 you using the callback for various different purposes?
 Your widget callbacks should work just like any other callback: they
 take two arguments, one indicating the object to which the callback was
 attached (an image instance, i think), and the event that caused the
 callback to be invoked.
-@end example
-@example
 April 17, 2000
 I am completely vetoing widget-callback-current-channel.  How about you
 create a new keyword, :new-callback, that is a function of two args,
 like i specified before.
 result as widget-callback-current-channel.
 the problem with this and everything you've proposed is that there's no
 way, of course, to get at the actual widget that you were invoked from.
 would you propose adding widget-callback-current-widget?
+@node Discussion -- Dialog Boxes, Discussion -- Multilingual Issues, Discussion -- glyphs, Future Work Discussion
+@section Discussion -- Dialog Boxes
+@cindex discussion, dialog boxes
+@cindex dialog boxes, discussion
+@example
+From:
+Ben Wing <ben@@666.com>
+10/7/1999 5:57 PM
+Subject:
+Re: Animated gif patch (2)
+To:
+Andy Piper <andy@@xemacs.org>
+CC:
+xemacs-review@@xemacs.org, xemacs-beta@@xemacs.org
+The distinction between layouts and widgets makes no sense, so you should combine
+the different data required.  Consider a grouping widget.  Is this a layout or a
+widget?  It draws, like a widget, but has children, like a layout.  Same for a tab
+widget, properly implemented.  It draws, handles input, has children, and makes
+choices about how to lay them out.
+ben
+From:
+Ben Wing <ben@@666.com>
+9/7/1999 8:50 PM
+Subject:
+Re: Layouts done
+To:
+Andy Piper <andyp@@beasys.com>
+this sounds great!  where can i see the code?
+as for user-defined layouts, you must certainly have some sort of abstraction
+layer for layouts, with DEFINE_LAYOUT_TYPE or something similar just like device
+types and such.  If not, you should certainly make one ...  it would have methods
+such as query-geometry and do-layout.  It should be easy to create a user-defined
+layout if you have such an abstraction.
+with a user-defined layout, complex built-in layouts such as grid should not be
+necessary because it's so easy to write snippets of lisp.
+as for the "redisplay too much" problem, perhaps you could put a dirty flag in
+each glyph indicating whether it needs to be redisplayed, recalculated, etc.?
+Andy Piper wrote:
+> You may want to check them out. I haven't done the user-defined layout
+> callback - I'm not sure what sort of API this could have.  Keywords I've done:
+>
+> :orientation - vertical or horizontal
+> :justify - left, center or right
+> :border - etch-in, etch-out, bevel-in, bevel-out or text (which gives you
+> etch-in with a title)
+>
+> You can embed any glyph type in a layout.
+>
+> There is probably room for improvements for justify to do grid-type layouts
+> as per java.
+>
+> The only annoying thing is that I've hacked up font-lock support to do a
+> progress gauge in the gutter area. I've used a layout to set things out
+> correctly. The problem is if you change one of the sub-widgets, the whole
+> layout gets redisplayed because it is treated as a single glyph by redisplay.
+>
+> Oh, and I've done line based scrolling so that glyphs scroll off the page
+> in units of the average display line height rather than the whole line at
+> once. This could easily be converted to pixel scrolling but would be very
+> slow I fear.
+>
+> andy
+> --------------------------------------------------------------
+> Dr Andy Piper
+> Senior Consultant Architect, BEA Systems Ltd
+From:
+Ben Wing <ben@@666.com>
+8/10/1999 11:11 PM
+Subject:
+Re: Widgets
+To:
+Andy Piper <andy@@xemacs.org>
+I think you might have misinterpreted what i meant.  I meant to say that XEmacs should
+implement the @strong{concept} of a hierarchy of nested child "widgets" or "gui items" or
+whatever we want to call them -- this includes container "widgets" such as grouping
+widgets (which draw a border around the children, like in Windows), tab widgets, simple
+layout widgets (invisible, but lay out their children appropriately), etc, plus leaf
+"widgets" (buttons, sliders, etc., also standard Emacs windows).  The layout calculations
+for these widgets would be handled entirely by XEmacs in a window-system-independent way.
+There is no need to create a corresponding hierarchy of window-system
+widgets/controls/whatever if it's not required, and certainly no need to try to use the
+window-system-supplied geometry management routines.  It's absolutely necessary to support
+this nesting concept in XEmacs, however, or it's impossible to have easily-designable
+dialog boxes.  On the other hand, I think it @strong{is} required to create much of this
+hierarchy within the actual window system, at the very least for non-invisible container
+widgets (tab, grouping, etc.), otherwise we will have very bogus, non-native-looking
+containers like your current tab-widget implementation.  It's critical for XEmacs to be
+able to create dialog boxes in Windows or Motif that look just like those in any other
+standard application.  Otherwise people will continue to think that XEmacs is a
+backwards-looking, badly implemented piece of software, which in many ways it is,
+particularly in regards to its user interface.
+Perhaps we should talk on the phone?  This typing is quite hard for me still.  What hours
+are you at work?  My hours are approx. 2pm - 2am Pacific time (GMT - 7 hours currently).
+ben
+From:
+Ben Wing <ben@@666.com>
+7/21/1999 2:44 AM
+Subject:
+Re: Tabs 'n widgets screenshot
+To:
+Andy Piper <andy@@xemacs.org>
+CC:
+xemacs-beta@@xemacs.org, wmperry@@aventail.com
+This is real cool, but looking at this, it's clear that it doesn't look the
+way tab widgets are supposed to work.  In particular, of course, they should
+have the proper borders around the stuff displayed.  I've attached a screen
+shot of a typical Windows dialog box with a tab widget in it.  The problem
+lies with this "expanded gutter" concept.  Tabs are @strong{NOT} extra graphical junk
+placed in the gutters of a buffer but are GUI objects with @strong{children} inside
+of them.  This is the right way to do things, and you would need no extra
+gutter functionality at all for this.  You just need to implement the concept
+of GUI objects containing other GUI objects within them.  One such GUI object
+needs to be an "Emacs-text" GUI object, which is an Emacs window and contains a
+buffer within it.  At this level, you need not be concerned with the
+complexities of geometry layout.  The only change that needs to be made in the
+overall strategy of frames, windows, etc. is that windows need not be exactly
+contiguous and tiled, as long as they are contained within a frame.  Or more
+specifically: Given that you could always split a window contained inside a
+GUI object, we just need to expand things so that each frame has @strong{multiple}
+hierarchies of windows in it, rather than just one.  A hierarchy of windows
+can nest inside of another window -- e.g. I put a tab widget or a text widget
+inside of a buffer.  This should be easy to implement -- just change things so
+there are multiple hierarchies of windows where there is now one, each (except
+the top-level one) being rooted inside some other window.
+Anyone willing to implement this? Andy?
+From:
+Ben Wing <ben@@666.com>
+6/30/1999 3:30 PM
+Subject:
+Re: Focus Help!
+To:
+Andy Piper <andy@@xemacs.org>
+CC:
+Ben Wing <ben@@xemacs.org>, martin@@xemacs.org, andyp@@beasys.com
+It sounds like you're doing very good work.  It also sounds like the approach
+you have followed is the correct one.  Now, it seems like there isn't really
+that much work left to get dialog boxes working.  What you really just need to
+do is implement container widgets, that is to say, subwindows that can contain
+other subwindows.  For example, the tab widget works this way. (It sounds like
+you have already implemented tab widgets, so I don't quite see how you've done
+this without the concept of container widgets.) So you might just try adding a
+framework for container widgets and then implementing very simple container
+widgets.  The basic container widgets are:
+1. A vertical-layout widget, which draws nothing itself and lays out its
+children one above the next.
+2. A horizontal-layout widget, which draws nothing itself and lays out its
+children side-to-side.
+3. A box (or "grouping") widget, which draws a rectangle around its single child
+and optionally draws some text on the top or bottom line of the rectangle.
+4. A tab widget, which displays a series of tabs horizontally at the top of its
+area, and then below it places one of its children,
+corresponding to the selected tab.
+5. A user widget, which draws nothing itself and does no layout at all on its
+children, except that it has a "layout callback"
+property, a Lisp function, so that the programmer can control the layout.
+The framework is as follows:
+1. Every widget has at least the following properties:
+a) a size, whose value can be "unspecified", which might be implemented
+using the value -1.  The default value should be "unspecified".
+b) whether it's mapped, i.e. whether it will be displayed. (Some container
+widgets, such as the tab widget, set the mapped
+property themselves on their children.  Others, such as the vertical and
+horizontal layout widgets, don't change this property but pay attention to it,
+and ignore completely all children marked as unmapped.) The default value should
+be "true".
+c) whether its size can be changed by another widget's layout routine. The
+default value should be "true".
+d) a layout procedure, which (potentially at least) determines the size of
+the widget as well as the position, size and mappedness of its child widgets.
+The layout procedure is inherent in the widget and is not an external property
+of the widget (except in the case of the "user widget"): it is instead more like
+the redisplay callback that each widget has.
+2. Every container widget contains a property which is a list of child widgets.
+3. Every child widget contains the following properties:
+a) a position indicating where the child is located relative to the top
+left corner of its parent.  The position's value can be "unspecified", which
+might be implemented using the value -1.  The default value should be
+"unspecified".
+b) whether its position can be changed by another widget's layout routine.
+The default value should be "true".
+4. All of the properties just listed (except possibly the layout procedure) can
+be modified directly by the programmer, and there are no proscriptions against
+doing so.  However, if the programmer wants to resize, reposition, map or unmap
+a widget in such a way that the layout of all the other widgets in the tree
+changes appropriately, he should use a special function to change the property,
+as described below.
+The redisplay mechanism pays attention to the position, size, and mappedness
+properties and to the hierarchy of widgets, mapping, resizing and repositioning
+the corresponding subwindows (the "real representation" of the widgets) as
+necessary.  It also pays attention to the hierarchy of the widgets, making sure
+that container subwindows get drawn before their child subwindows.  When it
+encounters widgets with an unspecified size, it should not draw them, and should
+issue a warning.  When it encounters widgets with an unspecified position, it
+should draw them at position (0, 0) and should issue a warning.
+The above framework should be fairly simple to implement and is basically
+universal across all high-level windowing system toolkits.  The stickiness comes
+with what procedures you follow for getting the layout done.
+Andy, I understand that implementing this may seem like a daunting task.
+Therefore, I propose that at first you implement the above framework but don't
+implement any of the layout procedures, or any of the functions that call them:
+Just make them stubs that do nothing.  This way, the Lisp programmer can still
+create any dialog boxes he wants, he just has to set the sizes and positions of
+all the widgets explicitly, and then recompute them whenever the widget tree is
+resized (once you get around to allowing this).  I have a lot more to write
+about exactly how the layout procedures work, but I'll send that to you later
+once you're ready.
+You should also think about making a way to have widget trees as top-level
+windows rather than just glyphs in a buffer.  There's already the concept of
+"popup" frames.  You could provide an easy way to create a popup frame with no
+menu, toolbars, scrollbars, modeline or minibuffer, and put a single glyph in
+the displayed buffer that takes up the whole Emacs window.
+Ben
+March 20, 2000
+You wrote to me awhile ago about this and asked about documentation, and I
+dictated a response but never got it sent, so here it is:
+I don't think there's any more documentation on how things work under Xt but it
+should be clear. The EmacsFrame widget is the widget corresponding to the X
+window that Emacs draws into and there is a handler for expose events called
+from Xt which arranges for the invalidated areas to get redrawn. I think this
+used to happen as part of the handler itself but now it is delayed until the
+next call to redisplay.
+However, one thing that you absolutely must not do is remove the Xt support.
+This would be an incredibly unfriendly thing to do as it would prevent people
+from using any widget set other than Qt or GTK. Keep in mind that people run
+XEmacs on all sorts of different versions of X in Unix, and Xt is the standard
+and the only toolkit that probably exists on all of these systems.
+Pardon me if I've misunderstood your intentions w.r.t. this.
+As for how you would implement GTK support, it will not be very hard to convert
+redisplay to draw into a GTK window instead of an Xt window. In fact redisplay
+basically doesn't know about Xt at all, except in the portion that handles
+updating menubars and scrollbars and stuff that's directly related to Xt.
+What you'd probably want to do is create a new set of event routines to replace
+the ones in event-Xt.c. On the display side you could conceivably create a new
+device type but you probably wouldn't want to do that because it would be an
+externally visible change at the Lisp level. You might simply want to put a
+flag on each frame indicating what sort of toolkit the frame was created under
+and put conditions in the redisplay code and the code to update toolbars and
+menubars and so forth to test this flag and do the appropriate thing.
+April 12, 2000
+This is way cool, buuuuutttttttt .............
+what we @strong{really} need is the GUI interface on top of it.  I've taken a shot at
+it with generic-print-buffer (print-buffer is taken by lpr, which is such a
+total mess that it needs to be trashed; or at least, the generic stuff in this
+package needs to be taken out and properly genericized).  For the moment,
+generic-print-buffer just does something like what Kirill's been posting if
+we're running Windows, and uses lpr otherwise.  However, what we
+absofuckinglutely need is a Lisp interface onto @code{EnumPrinters()} so that
+we can get the list of printers and have a nice menu listing the available
+printers, and you can check the one you want.  People in the Windows world
+don't normally even know the names of their local printers!
+Kirill, given what I've done in @file{simple.el} and @file{menubar-items.el}, do you think
+you could add the @code{EnumPrinters()} support and fix up the GUI?  If you
+don't feel comfortable with the GUI, at least do the @code{EnumPrinters()}.
+But ...  Kirill, I tried your formula for printing and nothing happened.
+Perhaps I didn't call redisplay-frame or something?  You need to fix this up
+and make it work for multi-page documents. (Again, this is in
+generic-print-buffer.)  Nothing special, it just needs to fucking work!  There
+are zillions and zillions of postings every day on xemacs-nt about how to get
+printing working, and none seem to refer to the built-in support.
+ben
+April 19, 2000
+Kirill 'Big K' Katsnelson wrote:
+> Some time ago, Ben Wing wrote...
+> >kirill, the interface i created is more general, like this:
+>
+> [snip]
+>
+> >Unfortunately I haven't implemented much of this; just some of the file
+> >dialog box.  but i think this is better than creating new
+> >mswindows-specific primitives.  if you are interested in working on this,
+> >i'll send you the code i have.
+>
+> Sure. Can you just commit it for my starting point?
+>
+> >also, the dialogs shouldn't have anything directly to do with the printer
+> >device.  all they should do is return a set of values.  it's the caller's
+> >responsibility to interpret them and set device properties accordingly.
+> >this way, there's a complete separation between the underlying
+> >functionality and the gui.
+>
+> Unfortunately. I thought about doing it this way, but we then lose a lot of
+> printer-specific setup in this case. The DEVMODE structure contains two
+> parts: printer independent, as defined by SDK typedef DEVMODE, and
+> some trailing bytes, of unknown structure, used by a driver. The driver
+> only returns the extra length it wants. Such options as PCL ReT resolution
+> enhancement options or PostScript negative output are not available
+> through the standard part of the devmode structure, and stored in the
+> driver part (printer dialogs are driver-specific).
+>
+> So we have a total of three options:
+> - Not to implement options beyond standard DEVMODE
+> - Make DEVMODE a Lisp object.
+> - Hide DEVMODE inside the device object.
+>
+> First case looks cheesy. Letting DEVMODE fall off the printer is no good
+> either, since one needs both the device and the devmode to edit the
+> devmode, and they must match. I am still convinced that the devmode and
+> the printer should not be separated.
+hmm, i see ...  this completely breaks abstraction though.  it fails in various
+scenarios, e.g. a program wants to initialize the dialog box with certain
+non-driver-specific properties, without caring about the particular printer.
+i think you should create a new print-properties object that encapsulates all
+printer properties (which can be changed using get/put), including the printer
+name, and contains a DEVMODE in it.  if the printer name gets changed, the
+DEVMODE might change too, but the print-properties object itself stays the
+same.  you pass this object as a parameter to the dialog box, and it gets
+changed accordingly.  you can call something like set-device-print-properties to
+stick everything in this structure into the device. (you could imagine a case
+where someone wanted to keep multiple print configurations around ...)
+>
+>
+> Big K
+--
+Ben
+@end example
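The container-widget framework proposed in the 6/30/1999 message above can be
illustrated with a small sketch.  This is a minimal, hypothetical model in
Python, not XEmacs code: the names Widget, VerticalLayout, movable and
UNSPECIFIED are invented here for illustration, and skipping children whose
size is still unspecified is an assumption (the message only says that
redisplay should not draw such widgets).

```python
UNSPECIFIED = -1  # one possible encoding of "unspecified", as the message suggests


class Widget:
    """Hypothetical widget record holding the properties listed in the message."""

    def __init__(self, width=UNSPECIFIED, height=UNSPECIFIED, mapped=True):
        self.width = width
        self.height = height
        self.mapped = mapped      # whether the widget will be displayed
        self.movable = True       # may a parent's layout routine reposition it?
        self.x = UNSPECIFIED      # position relative to the parent's top-left corner
        self.y = UNSPECIFIED
        self.children = []

    def layout(self):
        """Leaf widgets have nothing to lay out."""


class VerticalLayout(Widget):
    """Draws nothing itself; stacks its mapped children one above the next."""

    def __init__(self, children):
        super().__init__()
        self.children = list(children)

    def layout(self):
        y = 0
        widest = 0
        for child in self.children:
            if not child.mapped:
                continue          # unmapped children are ignored completely
            child.layout()        # let nested containers size themselves first
            if child.width == UNSPECIFIED or child.height == UNSPECIFIED:
                continue          # assumption: skip children that still have no size
            if child.movable:
                child.x, child.y = 0, y
            y += child.height
            widest = max(widest, child.width)
        # The container's own size follows from its laid-out children.
        self.width, self.height = widest, y


button = Widget(width=80, height=20)
hidden = Widget(width=50, height=50, mapped=False)
label = Widget(width=120, height=15)
dialog = VerticalLayout([button, hidden, label])
dialog.layout()
print(dialog.width, dialog.height, label.y)   # 120 35 20
```

A horizontal-layout widget would differ only in accumulating x instead of y.
Keeping the layout procedure as a method of the widget class mirrors the
message's point that layout is inherent in the widget rather than an external
property.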
+@node Discussion -- Multilingual Issues, Discussion -- Windows External Widget, Discussion -- Dialog Boxes, Future Work Discussion
+@section Discussion -- Multilingual Issues
+@cindex discussion, multilingual issues
+@cindex multilingual issues, discussion
+@example
+4/10/2000 4:13 AM
+BTW I am planning on adding some more powerful font-mapping capabilities to
+XEmacs (i.e. how do we map particular characters to the proper fonts that can
+display them, and how do we map the character's codes to the indices into the
+font).  These will replace the hackish charset-registry/charset-ccl-program stuff
+we currently have, and will [a] be much more powerful, [b] be designed in a
+window-system-independent way, [c] work with specifiers so you can control the
+mapping of individual buffers, and [d] work on a character rather than charset
+level, to correctly handle Unicode.  One possible usage would be to declare that
+all latin1 in a particular buffer be displayed with latin2 fonts; I bet
+Hrvoje would really appreciate that.
+---------------------------------------------------------------------------
+April 10, 2000
+[info from "creation of generic macros for accessing internally formatted data"]
+Hmm, so there I just wrote a detailed design for the macros.  I would be
+@strong{THRILLED} and overjoyed if you went ahead and implemented this mechanism, or
+parts of it.
+I've just finished arranging for a new transcriptionist, and soon I should be
+able to send off and get back my dictation of my (a) exposing streams to lisp,
+and (b) allowing for proper lisp-created coding systems, which define their
+reading, writing, and detecting methods in lisp.
+BTW How's it going wrt your Unicode and decode-priority stuff?
+And ...  you sent me mail asking what it was you had promised me, and listed
+only one thing, which was profiling of vm and certain other operations that
+you found showed tremendous slowdown with Japanese characters.  The other main
+thing I want from you is
+-- Your priorities, as an actual Japanese user and XEmacs developer,
+concerning what MULE work should be done, how it should be done, in what
+order, etc.
+I'm sure there's something else, but it's been awhile since I took my sleeping
+dose and my brain can barely function anymore.  Just let me know how you're
+going to proceed with the above macro changes.
+BTW there are some nice Perl scripts written by Martin and fixed by me to make
+global-search-and-replace much, much easier.  I've attached them.  The first
+one is a shell script that works like
+gr foo bar *.[ch]
+and replaces foo with bar in all of the files.  For each modified file, a
+backup is created in the backup/ directory, which is created as necessary.
+This shell script is a fairly trivial front end onto global-replace2, which is
+a Perl script that takes one argument (a Perl expression such as s/foo/bar/g)
+and a list of files obtained by reading stdin, and does the same global
+replacement.  This means that the regexp syntax used here has to be Perl-style
+rather than standard Emacs/grep style.
+ben
+---------------------------------------------------------------------
+From:
+Ben Wing <ben@@666.com>
+12/23/1999 3:34 AM
+Subject:
+Re: check process state before accessing coding_stream (fix PR#1061)
+To:
+"Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp>
+CC:
+XEmacs Developers <xemacs-beta@@xemacs.org>
+Thankfully, nearly all of this horridity you bring up is irrelevant.  In
+XEmacs, "gettext" does not refer to any standard API, but is merely a stand-in
+for a translation routine (presumably written by us).  We may as well call it
+something else.  We define our own concept of "current language".  We also
+allow for a function that needs a different version for each language, which
+handles all cases where simple translation isn't sufficient, e.g. when you
+have to pluralize some noun given to you or insert the correct form of the
+definite article.  No weird hacks needed.  No interaction problems with other
+pieces of software.
+What I wrote "awhile ago" is (unfortunately) not anywhere public currently,
+but it's on my list to put it on the web site.  "There you go again" is
+usually not true; most of what I quote was indeed put out publicly at some
+point, but I'll try to be more explicit about this in the future.
+ben
+"Stephen J. Turnbull" wrote:
+> >>>>> "Ben" == Ben Wing <ben@@666.com> writes:
+>
+>     Ben> "Stephen J. Turnbull" wrote:
+>
+>     >> What I have in mind is not just gettext-izing everything in the
+>     >> XEmacs core sources.  I currently believe that to be
+>     >> unacceptable
+>
+>     Ben> I don't quite understand.  Could you elaborate and give some
+>     Ben> examples?
+>
+> Examples?  Hmm.
+>
+> First, there's the surface of Jan's y-or-n-p example.  You have to
+> coordinate the translation of the message string and the response
+> prompt.  This is handled by y-or-n-p itself (I see that we already do
+> have gettext for Emacs Lisp, that's nice to know).
+>
+> Except that it's not really handled by y-or-n-p.  There's no reason to
+> suppose that somebody writing a Lisp package would necessarily use the
+> XEmacs domain (in fact, due to the way gettext binds text domains---if
+> I understand that correctly---we don't want that to be the case,
+> because it means that every time a Lisp package is updated the whole
+> XEmacs catalog must also be updated).  So which domain gets used for
+> the message string?
+>
+> In the current implementation, it is the domain of y-or-n-p.  So
+> packages with their own domain won't get y-or-n-p prompts correctly
+> translated.  But that means that the package should do its own
+> translation.  But now you're applying gettext to the same string
+> twice; you just have to pray that the translator upstream doesn't
+> collide with an English string that's in the XEmacs domain.  (The
+> gettext docs mention the similar problem of English words with
+> multiple meanings that must map to different words in the target
+> language; this can be disambiguated by various trickeries in forming
+> the strings ... but only if you "own" them, which in the multi-domain,
+> iterated gettext example you do not.)  AFAICT this means that you
+> must never pass untranslated strings across public APIs, but this may
+> or may not be reasonable, and certainly is inconvenient.
+>
+> Next, we have to translate the possible answer strings to match the
+> language being passed by the user.  This is presumably OK here,
+> because it's done by y-or-n-p.  But what if y-or-n-p returned a string
+> rather than a boolean?  Then we would need to coordinate the
+> presentation of the prompt (done by y-or-n-p) and the translation of
+> the possible answer strings (done by the caller).  This can in fact be
+> done using dgettext with the XEmacs domain, but you must know that
+> y-or-n-p is in the XEmacs domain.  This is not necessarily going to be
+> obvious, and it might very well be that sets of related packages might
+> have the same domain, so you wouldn't necessarily know which domain is
+> appropriate by looking at the requires.
+>
+> And what happens if one domain does supply translations for a language
+> and the other does not?  AFAIK, gettext has no way to find out if this
+> is the case.  But you might very well prefer a global fallback to
+> English if substantial phrases are drawn from both domains, while you
+> might prefer string-by-string fallback if the main text is translated
+> and only a few words are left to fallback to English.
+>
+> Aside from confusing users, this puts a great burden on programmers.
+> Programmers need to know about the status of the domains of packages
+> they use as well as the XEmacs domain; they need to program
+> defensively against the possibility that some package they use will
+> become gettext-ized, or the translation projects will be out of synch
+> (some teams will do the calling package first, others will do the
+> called package first).
+>
+> I don't think anybody will use gettext in these circumstances.  At
+> least not after they get the first bug report that "XEmacs is stuck in
+> an infinite y-or-n-p loop and I can't get out."
+>
+>     Ben> I wrote this awhile ago:
+>
+> "There you go again."  Not anywhere I could see it!  (At least, it
+> doesn't look familiar and grepping the archives doesn't turn it up.)
+>
+> OK, you win.  Subscribe me to xemacs-review.  Or whatever seems
+> appropriate.
+>
+> --
+> University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
+> Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
+> _________________  _________________  _________________  _________________
+> What are those straight lines for?  "XEmacs rules."
+--
+In order to save my hands, I am cutting back on my responses, especially
+to XEmacs-related mail.  You _will_ get a response, but please be patient.
+If you need an immediate response and it is not apparent in your message,
+please say so.  Thanks for your understanding.
+--------------------------------------------------------------------
+From:
+Ben Wing <ben@@666.com>
+12/21/1999 2:22 AM
+Subject:
+Re: check process state before accessing coding_stream (fix PR#1061)
+To:
+"Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp>
+CC:
+XEmacs Developers <xemacs-beta@@xemacs.org>
+"Stephen J. Turnbull" wrote:
+> >>>>> "Ben" == Ben Wing <ben@@666.com> writes:
+>
+>     Ben> Implementing message translation is not that hard.
+>
+> What I have in mind is not just gettext-izing everything in the XEmacs
+> core sources.  I currently believe that to be unacceptable (see Jan's
+> message for the pitfalls in I18N; it's worse for M17N).  I think
+> really solving this problem needs a specifier-like fallback mechanism
+> (this would solve Jan's example because you could query the
+> text-specifier presenting the question for the affirmative and
+> negative responses, and the catalog-building mechanism would have
+> checks to make sure they were properly set, perhaps a locale
+> (language) argument), and gettext is just not sufficient for that.
+I don't quite understand.  Could you elaborate and give some examples?
+>
+>
+> At a minimum, we need to implement gettext for Lisp packages.
+> (Currently, gettext is only implemented for C AFAIK.)  But this could
+> potentially cause more trouble than it's worth.
+>
+>     Ben> A lot depends on priority: How important do you think this
+>     Ben> issue is to your average Japanese/Chinese/etc. user?
+>
+> Which average Japanese (etc) user?  The English-skilled (relatively)
+> programmer in the free software movement, or my not-at-all-competent
+> undergrad students who I would love to have using an Emacs?  This is a
+> really important ease-of-use issue.
+>
+> Realistically, for Japanese, it's low priority.  The Japanese team in
+> the GNU Translation Project is doing very little AFAIK, so even if the
+> capability were there, I doubt the message catalog would soon be done.
+>
+> But I think that many non-English speakers would find it very
+> attractive, and for many languages there are well-organized and
+> productive translation teams.  I suspect that if the I18N facility
+> were well-designed, many Western European languages would have full
+> catalogs within a year (granted, they are the ones where it's least
+> needed :-( ).
+>
+> Personally, I think doing it well is hard, and of little benefit to
+> _current_ core XEmacs constituency.  I think doing a good job, with
+> catalogs, would be very attractive to many non-English-speaking
+> _potential_ users.
+>
+>     Ben> How does it compare to some of the other important Mule
+>     Ben> issues that Martin and I are (trying to work) on?
+>
+> I don't know what you guys are _trying_ to work on.  Everything in the
+> I18N section of "Architecting XEmacs" is red-flagged.  OTOH, it's
+> clear from your posts that you are overburdened, so I can't read
+> priority into the fact that you've responded to specific issues in the
+> past.
+I wrote this awhile ago:
+>
+>     Ben> The big question is, would you be willing to help do the
+>     Ben> actual implementation, to "be my hands"?
+>
+> Sure, subject to the usual caveat that I'd need to be convinced it's
+> worth doing and a secondary caveat that I am not an experienced coder.
+If you'll implement it, I'll design it.  It's more a case of will on your part
+than anything else.  I can give you instructions sufficient enough to match
+your level of expertise.
+ben
+>
+>
+> --
+> University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
+> Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
+> _________________  _________________  _________________  _________________
+> What are those straight lines for?  "XEmacs rules."
+-----------------------------------------------------------------------------
+Dec 20, 1999
+Implementing message translation is not that hard.  I've already done a lot of
+preliminary work in places such as @file{make-msgfile.lex} in lib-src/.  Finishing up
+the work is not that big a task; I already know exactly how it should be
+done.  Perhaps I'll write up detailed design instructions for this, as I'm
+doing for other things.  A lot depends on priority: How important do you think
+this issue is to your average Japanese/Chinese/etc. user?  How does it compare
+to some of the other important Mule issues that Martin and I are (trying to
+work) on?  If I did the design document, would you be willing to do the
+necessary bit of C hackery to implement the document?  If the design document
+is not specific enough for you, I can give you an "implementation document"
+which will definitely be specific enough: i.e. I'll show you exactly where the
+code needs to be modified, and how.  The big question is, would you be willing
+to help do the actual implementation, to "be my hands"?
+---------------------------------------------------------------------------
+From:
+Ben Wing <ben@@666.com>
+12/14/1999 11:00 PM
+Subject:
+Re: Mule UI disaster: displaying character tables
+To:
+Hrvoje Niksic <hniksic@@iskon.hr>
+CC:
+XEmacs vs Mule <xemacs-mule@@xemacs.org>
+What I mean is, please put my name in the header, as well as xemacs-mule.
+That way I'll see it in my personal box.
+I agree that Mule has problems, but:
+Brokenness can be fixed.
+Slowness can be fixed.
+Limitations can be fixed.
+The design limitation you mention below, for example, is not really very
+hard to change.
+Keep in mind that I pretty much rewrote Mule from scratch, and did it
+@strong{all} in 6-7 months.  In comparison with that, the changes below are
+pretty minor, and each could be done by a good (and able-bodied!)
+programmer familiar with the Mule code in less than a week -- to the
+XEmacs code, at least.  The problem is, everyone who could do this work is
+spending their time complaining about Mule problems instead of
+doing things.
+I'll gladly help out anyone who wants to do Mule coding by explaining all
+the details; I'll even write a "Mule internals manual", if that will
+help.  I can also make international phone calls -- they're cheap here in
+the US due to the long distance wars.  But so far no one has asked me for
+help or shown any willingness to do any work on Mule.
+Perhaps people are daunted by the seeming vastness of the problems.  But I
+wager that if I had another 6 months to work on nothing but Mule, it would
+be nearly perfect.  The basic design of the XEmacs C code is good;
+incremental changes, without over-much concern for compatibility, could
+make huge strides in a short amount of time (as was the case the whole
+time I worked on it, esp. towards the end -- it didn't even @strong{compile} for
+4 months!).  A "total rewrite" would be an incredible waste of time.
+Again, I'm completely willing to provide help, documentation, design
+improvement suggestions (ala Architecting XEmacs -- which seems to have
+been completely ignored, alas), etc.
+ben
+Hrvoje Niksic wrote:
+> Ben Wing <ben@@666.com> writes:
+>
+> > I'm the one who did most of the Mule work in XEmacs, so if you have
+> > any questions about the core, please address them to me directly.  I
+> > can probably give you a very clear and detailed answer.
+>
+> Thanks.  I think it still makes sense to ask here, so that other
+> developers have a chance to chime in.
+>
+> > However, I need some explanation.  What's misdesigned that you're
+> > complaining about?  And what's the coding-system disaster?
+>
+> It's been spoken of a lot.  Basically:
+>
+> * Unlike XEmacs/no-Mule, XEmacs/Mule doesn't preserve binary files in
+>   Latin 2 locales by default.  This is annoying for users who are used
+>   to XEmacs/no-Mule.
+>
+> * XEmacs/Mule is much slower than XEmacs, and not only because of
+>   character/byte conversions.  It seems that font lookups etc. are
+>   slower.
+>
+> * The "coding-system disaster" refers to inherent limitations of the
+>   coding-system model.  If I understand things correctly,
+>   coding-systems convert streams of bytes to streams of Emchars.  It
+>   does not appear to be possible to create a "gzip" coding system for
+>   handling gzipped files.  Even EOL conversions look kludgish:
+>
+>     iso-2022-8
+>     iso-2022-8-dos
+>     iso-2022-8-mac
+>     iso-2022-8-unix
+>     iso-2022-8bit-ss2
+>     iso-2022-8bit-ss2-dos
+>     iso-2022-8bit-ss2-mac
+>     iso-2022-8bit-ss2-unix
+>     iso-2022-int-1
+>     iso-2022-int-1-dos
+>     iso-2022-int-1-mac
+>     iso-2022-int-1-unix
+>
+>   Ideally, it should be possible to specify a stream of
+>   coding-systems, where only the last one converts to actual Emchars.
+>
+> There are more problems I don't remember right now.  Many many usage
+> problems become apparent when I stand and look over the shoulders of
+> an XEmacs user who tries to use Mule.
+--
+In order to save my hands, I am cutting back on my responses, especially
+to XEmacs-related mail.  You _will_ get a response, but please be patient.
+If you need an immediate response and it is not apparent in your message,
+please say so.  Thanks for your understanding.
+-----------------------------------------------------------------------
+From:
+Ben Wing <ben@@666.com>
+12/14/1999 12:20 AM
+Subject:
+Re: Mule UI disaster: displaying character tables
+To:
+"Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp>
+CC:
+XEmacs vs Mule <xemacs-mule@@xemacs.org>
+I think you should go ahead with your proposal, and assume it will get
+implemented.  I don't think Martin is really suggesting that API changes not
+be allowed, but just that they proceed in a somewhat orderly fashion; and in
+any case, I imagine I have final say in cases of Mule-related conflicts.
+ben
+"Stephen J. Turnbull" wrote:
+> >>>>> "Hrvoje" == Hrvoje Niksic <hniksic@@iskon.hr> writes:
+>
+>     Hrvoje> So next I tried the "Mule" menu.  That's right, boys and
+>     Hrvoje> girls, I've never looked at it before.
+>
+> For quite a while, it didn't work at all, led to crashes and other
+> warm/fuzzy things.  IIRC there used to be a top level menu item
+> pointing to information about the current language environment but it
+> got removed.
+>
+>     Hrvoje> Wow.  Seeing shift_jis, iso-2022 variants and (above all
+>     Hrvoje> things) big5 makes me really warm and fuzzy.
+>
+> We've been through this recently---you were there.  We know what to do
+> about it, basically (Ben liked my proposal, and it would fix this
+> silliness as well as the binary file breakage).  But given that Ben
+> and Martin seem to have different ideas about where to go with Mule
+> (Ben seemed to be supporting API and implementation revisions, Martin
+> evidently wants to keep the current Mule), working on that proposal is
+> possibly a waste of time.  I've got other stuff on my plate and I'll
+> get back to it one of these days (not tomorrow but sooner than Real
+> Soon Now).
+>
+>     Hrvoje> The items it presents (leading to further submenus) are:
+>
+>     Hrvoje>     94 character set
+>     Hrvoje>     94 x 94 character set
+>     Hrvoje>     96 character set
+>
+> This _is_ bad UI, now that you point it out.  But it is quite natural
+> for a coding system lawyer (as all Japanese users have to be), I never
+> noticed it before.  Easy enough to fix ("raise my karma").
+>
+>     Hrvoje> But I do bear some Mule scars, so I happily select "96
+>     Hrvoje> character sets", then ISO8859-2.  And I get this:
+>
+> [Table omitted]
+>
+>     Hrvoje> So me wonders: what the hell is this?
+>
+> Huh?  That is the standard table that you see over and over again in
+> references.  I'll believe you if you say you've never seen one before,
+> but every Japanese users' manual has dozens of pages of those, using
+> exactly that format.
+>
+> The presentation in the range 00--7F is not unreasonable for Latin 2;
+> ISO-8859 is a version of ISO-2022, so the high bit should not be
+> interpreted as "+ x80" (technically speaking), it should be
+> interpreted as a character set shift.
+>
+> Of course, this doesn't make sense to anybody but a character set
+> lawyer, and so should be changed.  Especially since the header refers
+> to ISO-8859-2 which everybody these days thinks of as _one, 8-bit_
+> character set, not two 7-bit ones.
+>
+> As for the "Japanese" in the table, that's just a really stupid
+> "optimization": those happen to be line-drawing characters available
+> in JIS X 0208, to make pretty borders.  Substitute "-", "+", and "|"
+> in appropriate places to make ugly but portable borders.
+>
+>     Hrvoje> Mule is just broken.  Warn your friends.
+>
+> Hrvoje is on the rampage again.  Warn your friends ;-)
+>
+> --
+> University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
+> Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
+> _________________  _________________  _________________  _________________
+> What are those straight lines for?  "XEmacs rules."
+--
+In order to save my hands, I am cutting back on my responses, especially
+to XEmacs-related mail.  You _will_ get a response, but please be patient.
+If you need an immediate response and it is not apparent in your message,
+please say so.  Thanks for your understanding.
+---------------------------------------------------------------------------
+From:
+Ben Wing <ben@@666.com>
+12/14/1999 10:28 PM
+Subject:
+Re: Autodetect proposal; specifier questions/suggestions
+To:
+"Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp>
+I've always thought the specifier API is too complicated (and too
+"write-only"), but I went back at one point well after I designed it and I
+couldn't figure out an obvious way to simplify it that still kept reasonable
+functionality.  Perhaps that's what Custom did, and why it turned out bad.
+Inefficiency is a stupid reason not to use them.  They seem efficient enough
+for redisplay.  Changing them might be inefficient, but Emacs Lisp is in
+general, right?
+Can you propose an API or functionality change that will make them more used?
+"Stephen J. Turnbull" wrote:
+> >>>>> "Ben" == Ben Wing <ben@@666.com> writes:
+>
+>     Ben> I think you should go ahead with your proposal, and assume it
+>     Ben> will get implemented.
+>
+> OK.  "yas baas" ;-)
+>
+> On something totally different.  I'm really bothered by the fact that
+> specifiers are so little used (eg, Custom reimplements them badly),
+> and the fact that every package seems to define its own set of faces
+> (or whatever), rather than use the specifier mechanism to inherit from
+> existing ones, or add new specifications to existing ones.  API problem?
+>
+> Also, faces (maybe specifiers in general?) should have an autoload
+> mechanism, and a @file{<package>-faces.el} (or @file{<package>-specifiers.el})
+> convention.  There are a number of faces in (eg) Custom that I like to
+> use, but I have to load Custom to get them.  And Custom should be able
+> to somehow see all the faces in various packages available, even when
+> they are not loaded.
+>
+> I've seen claims that specifiers aren't very efficient.
+>
+> Opinions?
+>
+> --
+> University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
+> Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
+> _________________  _________________  _________________  _________________
+> What are those straight lines for?  "XEmacs rules."
+--
+In order to save my hands, I am cutting back on my responses, especially
+to XEmacs-related mail.  You _will_ get a response, but please be patient.
+If you need an immediate response and it is not apparent in your message,
+please say so.  Thanks for your understanding.
+-----------------------------------------------------------------------------
+From:
+Ben Wing <ben@@666.com>
+11/18/1999 9:02 PM
+Subject:
+Re: Char-related crashes (hopefully) fixed
+To:
+"Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp>
+CC:
+XEmacs Beta List <xemacs-beta@@xemacs.org>
+OK, in summation:
+1. C-q is a user-level function and should do whatever makes the most sense.
+2. int-char is a low-level primitive and should never depend on high-level
+settings like language environment.
+3. Everything you can do with int-char can and should be done with make-char
+-- representation-independent, much less likelihood of bugs, etc.  Therefore
+int-char should be removed.
+4. Note that CLTL2 also removes int-char.
+5. Your statement
+> In one-byte buffers (either Olivier's 1/2/4 extension or `xemacs -font
+> *-iso8859-2') it implicitly will have dependence whatever you say.
+is confusing internal and external representations.
+ben
+"Stephen J. Turnbull" wrote:
+> Can somebody give a bunch of examples where using integers as
+> characters is useful?  For that matter, where they are actually used?
+> Ben said "backward compatibility," but I haven't seen this used, and I
+> don't really know how to grep for it.  I have grepped for int-char,
+> int-to-char, char-int, and char-to-int and they're pretty rare in the
+> core and package code (2/3 of it) that I have.
+>
+> The only one that I ever use is the C-q hack for inserting characters
+> by code value at the keyboard, and that could arguably be (and in
+> Japanese invariably is) delegated to an input method which would know
+> about language environment (and return a true character).
+>
+> For iterating over a character set in "natural" order, only ASCII
+> satisfies the requirement of having one, and even that's shaky.  AFAIK
+> the Swedes and the Norwegians, or is it the Danes, disagree on
+> ordering the _letters_ in ISO-8859-1 character set.  This really
+> should be table-driven, and will have to be for everything except
+> ASCII and ISO-8859-1 if we go to a Unicode internal representation.
+>
+> We already have primitives for efficient case conversion and the like.
+>
+> The only example I can think of offhand where you would really really
+> want the facility is to iterate over a code space where you don't know
+> which points are legal characters.  Eg, to print out tables of fonts.
+> Pretty specialized.  And this can be done through make-char, anyway.
+>
+> According to CLtL1, the main portable use for char-int is for hashing.
+> But that doesn't square with the kind of usage we've been talking
+> about (in loops and the like).
+>
+> What else am I missing?
+>
+> Ben's desiderata have some problems.
+>
+> >>>>> "Ben" == Ben Wing <ben@@666.com> writes:
+>
+>     Ben> Either int-char should be the mirror opposite of char-int
+>     Ben> (i.e. accept all legal char integers), or it should be
+>     Ben> removed entirely.
+>
+> OK.  I agree with this.
+>
+>     Ben> int-char should @strong{never} have any dependence on the language
+>     Ben> environment.
+>
+> In one-byte buffers (either Olivier's 1/2/4 extension or `xemacs -font
+> *-iso8859-2') it implicitly will have dependence whatever you say.
+> Even without Mule, people can always use external encoders to change
+> raw ISO-8859-2 to ISO-2022 (not that anybody sane ever would, OK,
+> Hrvoje?).  Then the two files will be interpreted differently in a
+> Latin-1 locale Mule; the ISO-8859-2 file will be recognized as
+> ISO-8859-1, and the ISO-2022 file will be internally interpreted as
+> ISO-8859-2.
+>
+> The point is that people normally assume that int-char should accept
+> their "natural" integer to character map.  For Americans, that's
+> ASCII, for Germans, that's ISO-8859-1, for Croatians, that's
+> ISO-8859-2.  And it works "correctly" in a no-mule XEmacs with `-font
+> *-iso8859-2'!  Japanese usually use ku-ten or JIS, and there's a
+> "natural" map from byte-sized integer pairs to shorts, but it's full
+> of holes.  So language environments don't agree on what a legal char
+> integer is, and where they do (eg, ISO-8859-1 and ISO-8859-2), they
+> don't agree on the map.  To satisfy your dictum (with which I agree,
+> but I take to mean we should get rid of these functions) we can take
+> the intersection where they agree
+>
+> ==> legal char integers == ASCII
+>
+> which is what I prefer, or pick something arbitrary and efficient
+>
+> ==> char-int returns the internal representation
+>
+> which I really hate, or something else.  Suggestions?
+>
+>     Ben> I don't think C-q should either.  If Hrvoje wants to insert
+>     Ben> Latin-2 characters by number, then make C-u C-q work so that
+>     Ben> it also prompts for a character set, with a default chosen
+>     Ben> from the language environment.
+>
+> And restrict this to ASCII?  Or assume Latin-1 in GR if there is no
+> prefix argument?
+>
+> This is a useful feature.  C-q currently inserts Latin-2 characters
+> for Hrvoje in no-mule XEmacs (stretching the point only a little); I
+> think it should continue to do so in Mule.  This really is an input
+> method issue, not a keyboard issue.  In XEmacs, inserting an integer
+> into a buffer has no meaning.  Users insert characters.  So this is a
+> completely different issue from the programming API, and should not be
+> considered analogous.
+>
+> Maybe we could have C-q insert according to the Unicode standard, and
+> treat C-u C-q as part of the input method.  But I think most users
+> would prefer to have C-q insert according to their locale-standard
+> tables, and select Unicode explicitly using the C-u C-q idiom.  In
+> fact (again this points to the input method idea), Japanese users
+> would probably like to have the alternatives of using kuten (pairs
+> from 1--94 x 1--94) or JIS (pairs from 0x21--0x7E x 0x21--0x7E) as
+> options since both indexing systems are common in tables.
+>
+> --
+> University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
+> Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
+> __________________________________________________________________________
+> __________________________________________________________________________
+> What are those two straight lines for?  "Free software rules."
+--
+ben
+--
+In order to save my hands, I am cutting back on my responses, especially to
+XEmacs-related mail.  You _will_ get a response, but please be patient.  If
+you need an immediate response and it's not apparent in your message, please
+say so.  Thanks for your understanding.
+-----------------------------------------------------------------------------
+From:
+Ben Wing <ben@@666.com>
+11/16/1999 11:03 PM
+Subject:
+Re: Char-related crashes (hopefully) fixed
+To:
+Yoshiki Hayashi <t90553@@m.ecc.u-tokyo.ac.jp>
+CC:
+Hrvoje Niksic <hniksic@@iskon.hr>,
+XEmacs Beta List <xemacs-beta@@xemacs.org>
+Either int-char should be the mirror opposite of char-int (i.e. accept all
+legal char integers), or it should be removed entirely.
+int-char should @strong{never} have any dependence on the language environment.
+I don't think C-q should either.  If Hrvoje wants to insert Latin-2
+characters by number, then make C-u C-q work so that it also prompts for a
+character set, with a default chosen from the language environment.
+ben
+Yoshiki Hayashi wrote:
+> Hrvoje Niksic <hniksic@@iskon.hr> writes:
+>
+> > As Ben said, now that we've fixed the actual bugs, we can think about
+> > changing the behaviour for int-char conversions for 21.2.
+>
+> The following have been proposed as the integers to accept
+> where characters are expected:
+>
+> 1) Don't allow anything
+> 2) Accept 0-127
+> 3) Accept 0-256
+> 4) Accept everything
+>
+> Other things proposed are:
+>
+> a) When doing C-q, treat 128-256 as Latin-2 in Latin 2
+>    language environment.
+>
+> So far, most of the proposals are intended to apply to every
+> int-char conversion; I'd like to make only some functions
+> accept integers.
+>
+> My plan is:
+> Accept only 0-256 in every place except int-to-char.
+> int-to-char accepts every valid integer.
+> Make a new function which does int-to-char conversion
+> correctly according to the language environment.
+>
+> This way, most of the code which does (insert (1+ ?a)) or
+> something continues working. Now internal representation is
+> changed a little bit, so disabling > 256 characters will
+> warn those who are dealing with internal representation
+> directly, which is bad. Still, you can do
+> (let ((i 1442))
+>   (while (< i 2000)
+>     (insert (int-to-char i))
+>     (setq i (1+ i))))
+> to achieve old behaviour.
+>
+> For C-q, I'm not for changing its original definition,
+> since it might confuse people who are expecting Latin-1 in
+> other language environments, and typing just one integer doesn't
+> make sense in a multibyte world. It's cleaner to make a new
+> function, which does make-char according to the charset of
+> language-info-alist so that people who use that often can
+> bind it to C-q or some other keys.
+>
+> --
+> Yoshiki Hayashi
+--
+ben
+--
+In order to save my hands, I am cutting back on my responses, especially to
+XEmacs-related mail.  You _will_ get a response, but please be patient.  If
+you need an immediate response and it's not apparent in your message, please
+say so.  Thanks for your understanding.
+@end example
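The suggestion in the discussion above, that it should be possible to specify a stream of coding systems where only the last one produces actual characters, can be illustrated with a small sketch.  This is Python rather than XEmacs code, and every name in it (gunzip, dos_to_unix_eol, decode_with_chain) is a hypothetical helper invented for the illustration; it does not correspond to any existing coding-system API.  The point is only that decompression and EOL conversion are byte-to-byte stages, so they compose freely, and a single final stage maps bytes to characters.

```python
import gzip


def gunzip(data: bytes) -> bytes:
    """Byte-to-byte stage: undo gzip compression."""
    return gzip.decompress(data)


def dos_to_unix_eol(data: bytes) -> bytes:
    """Byte-to-byte stage: turn CRLF line endings into LF."""
    return data.replace(b"\r\n", b"\n")


def decode_with_chain(data: bytes, byte_stages, charset: str) -> str:
    """Run every byte-to-byte stage in order, then decode to characters.

    Only the last step (the charset decode) produces characters.
    """
    for stage in byte_stages:
        data = stage(data)
    return data.decode(charset)


# A gzipped Latin-2 file with DOS line endings, built in memory.
raw = gzip.compress("\u010daj\r\nmlijeko\r\n".encode("iso8859_2"))
text = decode_with_chain(raw, [gunzip, dos_to_unix_eol], "iso8859_2")
```

With stages expressed this way, the EOL variants need not be separate named coding systems: the same charset decode can be paired with whichever EOL stage applies, or with none.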
+@node Discussion -- Windows External Widget, Discussion -- Packages, Discussion -- Multilingual Issues, Future Work Discussion
+@section Discussion -- Windows External Widget
+@cindex discussion, windows external widget
+@cindex windows external widget, discussion
+@example
+Subject:
+Re: External Widget Support for Xemacs on nt
+Date:
+Sat, 08 Jul 2000 01:47:14 -0700
+From:
+Ben Wing <ben@@666.com>
+To:
+Timothy.Fowler@@msdw.com
+CC:
+xemacs-nt@@xemacs.org
+Nothing is currently done for external widget support under XEmacs but it should
+not be too hard to do and would be a great addition to XEmacs. What you would
+probably want to do is create an XEmacs control that has an interface something
+like the built-in edit control and which communicates to an existing XEmacs
+process using DDE. (Basically you would modify XEmacs so that it registered
+itself as a DDE server accepting external widget requests, and then the external
+edit control would simply send a DDE request and the result would be a handle of
+some sort used for future communication with that particular XEmacs process.)
+There are two basic issues in getting the external widget to work, which are
+display and input. Although I am not completely sure, I have a feeling that it
+is possible for one process to write into the window of another process, simply
+by using that window's HWND handle. If so it should be extremely easy to get the
+output working (this is exactly the approach used under Xt). For input, you
+would probably again want to do what is done under Xt, which is that the client
+widget simply passes all of the appropriate messages to the XEmacs server
+process using whatever communication channel was set up, e.g. DDE, and the
+XEmacs server processes them normally. Very few modifications would be needed to
+the XEmacs source code and all of the necessary modifications could be done
+simply by looking for existing external widget code in XEmacs.
+If you are interested in continuing this, I will certainly give you any support
+you need along the way. This would be a great project to be added to XEmacs.
+Timothy Fowler wrote:
+> I am looking into external widget support for xemacs nt similar to that
+> existing in xemacs for X
+> Have any development efforts been made in this direction in the past?
+> Is there any current effort?
+> Any insight into the complexity of achieving this?
+> Any comments would be greatly appreciated
+> Thanks
+> Tim Fowler
+--
+Ben
+In order to save my hands, I am cutting back on my mail.  I also write
+as succinctly as possible -- please don't be offended.  If you send me
+mail, you _will_ get a response, but please be patient, especially for
+XEmacs-related mail.  If you need an immediate response and it is not
+apparent in your message, please say so.  Thanks for your understanding.
+See also http://www.666.com/ben/chronic-pain/
+Subject:
+RE: External Widget Support for Xemacs on nt
+Date:
+Mon, 10 Jul 2000 12:40:01 +0100
+From:
+"Alastair J. Houghton" <ajhoughton@@lineone.net>
+To:
+"Ben Wing" <ben@@666.com>, <xemacs-nt@@xemacs.org>
+CC:
+<Timothy.Fowler@@msdw.com>
+> -----Original Message-----
+> From: owner-xemacs-nt@@xemacs.org [mailto:owner-xemacs-nt@@xemacs.org]On
+> Behalf Of Ben Wing
+> Sent: 08 July 2000 09:47
+> To: Timothy.Fowler@@msdw.com
+> Cc: xemacs-nt@@xemacs.org
+> Subject: Re: External Widget Support for Xemacs on nt
+>
+> Nothing is currently done for external widget support under
+> XEmacs but it should
+> not be too hard to do and would be a great addition to XEmacs.
+> What you would
+> probably want to do is create an XEmacs control that has an
+> interface something
+> like the built-in edit control and which communicates to an
+> existing XEmacs
+> process using DDE.
+It would be @strong{much} better to use RPC or COM rather than DDE - and
+also it would provide a more useful interface to XEmacs (like the
+Microsoft rich text edit control that is used by Wordpad). It
+would probably also be easier...
+> If you are interested in continuing this, I will certainly give
+> you any support
+> you need along the way. This would be a great project to be added
+> to XEmacs.
+I agree. This would be a *really useful* thing to do...
+Regards,
+Alastair.
+____________________________________________________________
+Alastair Houghton                     ajhoughton@@lineone.net
+Subject:
+Re: External Widget Support for Xemacs on nt
+Date:
+Mon, 10 Jul 2000 22:56:06 -0700
+From:
+Ben Wing <ben@@666.com>
+To:
+"Alastair J. Houghton" <ajhoughton@@lineone.net>
+CC:
+xemacs-nt@@xemacs.org, Timothy.Fowler@@msdw.com
+sounds good.  i don't know too much about windows ipc methods, so i suggested
+dde just as an example.
+"Alastair J. Houghton" wrote:
+> > -----Original Message-----
+> > From: owner-xemacs-nt@@xemacs.org [mailto:owner-xemacs-nt@@xemacs.org]On
+> > Behalf Of Ben Wing
+> > Sent: 08 July 2000 09:47
+> > To: Timothy.Fowler@@msdw.com
+> > Cc: xemacs-nt@@xemacs.org
+> > Subject: Re: External Widget Support for Xemacs on nt
+> >
+> > Nothing is currently done for external widget support under
+> > XEmacs but it should
+> > not be too hard to do and would be a great addition to XEmacs.
+> > What you would
+> > probably want to do is create an XEmacs control that has an
+> > interface something
+> > like the built-in edit control and which communicates to an
+> > existing XEmacs
+> > process using DDE.
+>
+> It would be @strong{much} better to use RPC or COM rather than DDE - and
+> also it would provide a more useful interface to XEmacs (like the
+> Microsoft rich text edit control that is used by Wordpad). It
+> would probably also be easier...
+>
+> > If you are interested in continuing this, I will certainly give
+> > you any support
+> > you need along the way. This would be a great project to be added
+> > to XEmacs.
+>
+> I agree. This would be a *really useful* thing to do...
+>
+> Regards,
+>
+> Alastair.
+>
+> ____________________________________________________________
+> Alastair Houghton                     ajhoughton@@lineone.net
+--
+Ben
+In order to save my hands, I am cutting back on my mail.  I also write
+as succinctly as possible -- please don't be offended.  If you send me
+mail, you _will_ get a response, but please be patient, especially for
+XEmacs-related mail.  If you need an immediate response and it is not
+apparent in your message, please say so.  Thanks for your understanding.
+See also http://www.666.com/ben/chronic-pain/
+@end example
+@node Discussion -- Packages, Discussion -- Distribution Layout, Discussion -- Windows External Widget, Future Work Discussion
+@section Discussion -- Packages
+@cindex discussion, packages
+@cindex packages, discussion
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
+@subheading Important package-related changes
+This file details changes that make the package system no longer an
+unmitigated disaster.  This way, at the very least, people can
+essentially ignore the package system and not get bitten horribly the
+way they currently do.
+@enumerate
+@item
+A single tarball containing absolutely everything and named
+xemacs-21.2.68.tar.gz.  This must contain absolutely everything,
+including all of the packages, and in the proper directory
+structure, so that the paradigm for
+untar; configure; make; make install
+just works.
+@item
+Fix the startup slowdown when all packages are installed so that
+there is absolutely no penalty to having them all installed.  This
+may be hard.
+@item
+All files on the ftp site should be accessible through http.
+@item
+Put symlinks into the distribution directory to the appropriate
+files in the package directory.
+@item
+Eliminate the confusing SUMO name, choosing a much more obvious
+name such as all-packages.
+@item
+There should be no separation of mule and non-mule packages.
+@item
+Having 2 packages that conflict with each other should be
+completely disallowed.
+@item
+Fix vc and ps-print so that there is only ONE version.
+@item
+Fix up all of the READMEs on the distribution site to make it
+abundantly clear what needs to be obtained, where to get it, and
+how to install it, especially with regard to packages.
+@end enumerate
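As a rough illustration of the first item, a single everything-included tarball could be assembled mechanically from separately built component archives.  The sketch below uses Python's standard tarfile module; the function name combine_tarballs and any file names passed to it are assumptions made for this example, not part of any existing XEmacs release tooling.

```python
import tarfile


def combine_tarballs(component_paths, combined_path):
    """Copy every member of each component .tar.gz into one new .tar.gz.

    Member names are kept as-is, so each component's top-level
    directory ends up side by side in the combined archive.
    """
    with tarfile.open(combined_path, "w:gz") as combined:
        for path in component_paths:
            with tarfile.open(path, "r:gz") as component:
                for member in component.getmembers():
                    # Only regular files carry data; directories and
                    # links are copied from their header alone.
                    data = component.extractfile(member) if member.isfile() else None
                    combined.addfile(member, data)
```

A call such as combine_tarballs(["core.tar.gz", "packages.tar.gz"], "everything.tar.gz"), with placeholder file names, would yield one archive that unpacks both trees in a single step.  Making configure; make; make install work from that layout is a separate problem that this sketch does not address.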
+@node Discussion -- Distribution Layout,  , Discussion -- Packages, Future Work Discussion
+@section Discussion -- Distribution Layout
+@cindex discussion, distribution layout
+@cindex distribution layout, discussion
+@example
+From:
+Ben Wing <ben@@666.com>
+10/15/1999 8:50 PM
+Subject:
+VOTE: Absolutely necessary changes to file naming in releases
+To:
+SL Baur <steve@@xemacs.org>,
+XEmacs Reviews <xemacs-review@@xemacs.org>
+Everybody except Steve seems to agree that we need to provide a single
+tar file containing the entire XEmacs tree whenever we release a new
+version of XEmacs (beta or not).  Therefore I propose the following
+simple changes, and ask for a vote.  If it is the general will of the
+developers, then Steve @strong{WILL} make these changes.  This is the
+definition of cooperative development -- no one, not even the
+maintainer, can assert absolute power over anything.
+I propose (assuming, for example, release 21.2.20):
+1. xemacs-21.2.20.tar.gz -> xemacs-21.2.20-core.tar.gz
+2. xemacs-sumo.tar.gz -> xemacs-packages.tar.gz
+3. xemacs-mule-sumo.tar.gz -> xemacs-mule-packages.tar.gz
+4. Symlinks to the files mentioned in #2 and #3 get created in the SAME
+directory as xemacs-21.2.20-*.tar.gz.
+5. MOST IMPORTANTLY, a new file xemacs-21.2.20.tar.gz gets created,
+which is the combination of the 5 files xemacs-21.2.20-core.tar.gz,
+xemacs-21.2.20-elc.tar.gz, xemacs-21.2.20-info.tar.gz,
+xemacs-packages.tar.gz, and xemacs-mule-packages.tar.gz.
+The directory structure of the new combined file xemacs-21.2.20.tar.gz
+would look like this:
+xemacs-21.2.20/
+xemacs-packages/
+xemacs-mule-packages/
+I am sorry to shout, but the current situation is just completely
+insane.
+ben
+From:
+Ben Wing <ben@@666.com>
+10/16/1999 3:12 AM
+Subject:
+Re: VOTE: Absolutely necessary changes to file naming in releases
+To:
+SL Baur <steve@@xemacs.org>,
+XEmacs Reviews <xemacs-review@@xemacs.org>,
+"Michael Sperber [Mr. Preprocessor]" <sperber@@informatik.uni-tuebingen.de>
+Something went wrong with my mail program while I was responding, so
+Michael's response is not quoted here.
+Let me rephrase my proposal, stressing the important points in order of
+importance:
+1. MOST IMPORTANT: There MUST be a SINGLE tar file containing the complete
+XEmacs sources, packages, etc.  The name of this tar file must have a
+format like this:
+xemacs-21.2.10.tar.gz
+The directory layout of the packages within it is not important as long as
+it works: The user who downloads the tar file MUST be able to apply the
+'configure; make; make install' paradigm at the top-level directory and
+have it work properly.
+2. All the pieces of XEmacs must be in the @strong{same} subdirectory on the FTP
+site.
+3. The names need to be obvious and standard.  Naming the core files
+"xemacs-21.2.20.tar.gz" is non-standard because those are only the core
+files.  The standard followed by everybody in the world is that a name like
+this refers to the entire product, with all ancillary files.  Also, "sumo",
+although a nice in-joke, is extremely confusing and needs to go.
+Referring to Michael's point about the layout I proposed, I also think that
+the package system needs to be modified to accept a layout produced by the
+"obvious" way of obtaining and untarring the parts, which leaves you with a
+directory consisting of
+xemacs-21.2.19/
+xemacs-packages/
+mule-packages/
+All at the same level.  However, this is an independent issue from the vote
+at hand.
+Consider the current insanity.  The new XEmacs user or beta tester goes to
+the FTP site, looks around, finds the file xemacs-21.2.19.tar.gz, and
+downloads it, because it looks like the obvious one to get.  But it doesn't
+work.  Oops ...  He looks some more and finds the other two -elc and -info
+parts, grabs them, and then tries again.  But it still doesn't work.  He
+manages to overhear something about packages, so he looks for them, but
+doesn't find them immediately (they're not even in the beta tree, though
+they obviously contain beta-level code, especially in xemacs-base and
+mule-base).  Eventually he discovers the package/ subdirectory, but what
+the hell does he do there?  There's no README at all there giving any
+clues, so he downloads everything.  Along with this, he gets some files
+called "sumo", which he doesn't understand, but he notices that some of
+them are extremely large.  "sumo" ... "large" ...  hehe, I get it.  Some
+silly developer's joke.  But then he tries again to compile things, and
+just can't figure things out.  He still doesn't know:
+-- "sumo" is not just some large file, but is a tar file of all the
+packages.
+-- The packages can't be placed in any subdirectory in any obvious relation
+to the XEmacs directory ("straight out of the box" if you manage to grok
+the significance of the sumo files, you get a layout like
+xemacs-21.2.19/
+xemacs-packages/
+mule-packages/
+which naturally doesn't work!  He needs to put them underneath
+xemacs-21.2.19/lib/xemacs/ or something.)
+At this point, he gives up, and (if he was a user of a pre-packagized
+XEmacs) wonders in despair how things got so messed up, when all older
+XEmacs releases, including all the betas, followed the standard "configure;
+make; make install" paradigm.
+Soooooo .........  PLEASE vote on issues #1-3 above, and add any comments
+you feel like adding.
+ben
+Ben Wing wrote:
+> Everybody except Steve seems to agree that we need to provide a single
+> tar file containing the entire XEmacs tree whenever we release a new
+> version of XEmacs (beta or not).  Therefore I propose the following
+> simple changes, and ask for a vote.  If it is the general will of the
+> developers, then Steve @strong{WILL} make these changes.  This is the
+> definition of cooperative development -- no one, not even the
+> maintainer, can assert absolute power over anything.
+>
+> I propose (assuming, for example, release 21.2.20):
+>
+> 1. xemacs-21.2.20.tar.gz -> xemacs-21.2.20-core.tar.gz
+>
+> 2. xemacs-sumo.tar.gz -> xemacs-packages.tar.gz
+>
+> 3. xemacs-mule-sumo.tar.gz -> xemacs-mule-packages.tar.gz
+>
+> 4. Symlinks to the files mentioned in #2 and #3 get created in the SAME
+> directory as xemacs-21.2.20-*.tar.gz.
+>
+> 5. MOST IMPORTANTLY, a new file xemacs-21.2.20.tar.gz gets created,
+> which is the combination of the 5 files xemacs-21.2.20-core.tar.gz,
+> xemacs-21.2.20-elc.tar.gz, xemacs-21.2.20-info.tar.gz,
+> xemacs-packages.tar.gz, and xemacs-mule-packages.tar.gz.
+>
+> The directory structure of the new combined file xemacs-21.2.20.tar.gz
+> would look like this:
+>
+> xemacs-21.2.20/
+> xemacs-packages/
+> xemacs-mule-packages/
+>
+> I am sorry to shout, but the current situation is just completely
+> insane.
+>
+> ben
+From:
+Ben Wing <ben@@666.com>
+12/6/1999 4:19 AM
+Subject:
+Re: Please Vote on Proposals
+To:
+Kyle Jones <kyle_jones@@wonderworks.com>
+CC:
+XEmacs Review <xemacs-review@@xemacs.org>
+OK Kyle, how about a different proposal:
+1. The distribution consists of the following three parts (let's assume
+v21.2.25):
+-- xemacs-21.2.25-core.tar.gz
+The same as would currently be in xemacs-21.2.25.tar.gz.  You can
+run this editor and edit in fundamental mode, but not do anything
+else.
+-- xemacs-21.2.25-core-packages.tar.gz
+A useful and complete subset of all the possible packages.  Selection of
+what goes in and what goes out is based partially on consensus, partially
+on vote, and partially on these criteria:
+-- commonly-used packages go in.
+-- unmaintained or out-of-date packages go out.
+-- buggy, poorly-written packages go out.
+-- really obscure packages that hardly anybody could possibly care
+about go out.
+-- when there are two or three packages implementing basically the
+same functionality, pick only one to go in unless there are two that
+both are really commonly-used.
+-- if a package can be loaded implicitly as a result of something in the
+core, it needs to go in, regardless of whether it's been maintained.
+This applies, for example, to the mode files -- @strong{all} mode
+packages must go in (or more properly, every mode must have a
+corresponding package that's in, although if there are two or more
+packages implementing a particular mode, e.g. html, we are free to
+choose just one).
+-- xemacs-21.2.25-aux-packages.tar.gz
+All of the packages not in the previous file.  Generally crappy-quality,
+poorly-maintained code.
+Note, we do not make distinctions between Mule and non-Mule in our
+packaging scheme -- this is a bug and XEmacs and/or the packages should
+be fixed up so that this goes away.
+2. The distribution also contains two combination files:
+-- xemacs-21.2.25.tar.gz
+This is the "default" file that a naive user ought to retrieve, and
+he'll get a running XEmacs, just like he wants, and comfortable, too,
+because all of the common packages are there.  This file is a combination
+of xemacs-21.2.25-core.tar.gz and xemacs-21.2.25-core-packages.tar.gz.
+-- xemacs-21.2.25-everything.tar.gz
+This file contains absolutely everything, like it advertises --
+including the aux packages and all of their associated crappy-quality,
+unmaintained code.  This file is a combination of
+xemacs-21.2.25-core.tar.gz, xemacs-21.2.25-core-packages.tar.gz, and
+xemacs-21.2.25-aux-packages.tar.gz.
+I like this proposal better than the previous one I advocated, because it
+follows your good suggestion of separating the wheat from the chaff in
+the packages, so to speak.  People will grab xemacs-21.2.25.tar.gz by
+default, just like they should, and they'll get something they're quite
+happy with, and we're happy because we can exercise quality control over
+the packages and exclude the crappy ones most likely to cause grief
+later on.
+What say y'all?
+ben
+Kyle Jones wrote:
+> Ben Wing writes:
+>  > Disagree.  Please let's follow everyone else's convention, and not
+>  > introduce yet another randomness.
+>
+> It is not randomness! I think this is a semantic issue and an
+> important one.  The issue is: What do we consider part of XEmacs
+> and what is considered external to XEmacs.  If you put all the
+> packages in xemacs.tar.gz, then users can reasonably and wrongly
+> assume that all this random Lisp code is maintained by us.  We
+> are trying to stay away from that model because in the past it has
+> left us with piles and piles of orphaned code.  Even if every one
+> of us were paid to maintain XEmacs, it is just not practical for
+> us to continue to maintain all that code, let alone any new code.
+> So I think the naming distinction Jan is making is worth doing.
+>
+> Also, I don't consider the current situation broken, except
+> perhaps the sumo tarball being out of date.  I never, ever,
+> thought it was a great idea to ship all the stuff that XEmacs
+> shipped in the old days.  Because this pile of code was always
+> around in the distribution, an enormous web of undocumented
+> dependencies was constructed.  Eventually, you HAD to install
+> everything because if you left something out or removed something
+> you never knew when XEmacs would throw an error.  Thus the Cult
+> of the Cargo was born.
+>
+> One of the best things that came out of the package system was
+> the month or two we spent running XEmacs without all the assorted
+> Lisp installed.  Dependencies were removed or documented, some
+> stuff got retired, and for the first time we actually had a full
+> accounting of what we were shipping.  I currently run XEmacs with
+> 7 packages and I don't miss the other stuff.
+>
+> Having come this far, I do not think we should go back to
+> advocating that everyone just install everything and not
+> think about what they are doing.  Besides saving space and startup
+> time, another reason to not install everything is that you
+> won't bloat your XEmacs process nearly as much if you go
+> exploring in the Custom menus, because there won't be as much
+> Lisp loaded as Custom sets up its groups and whatnot.
+--
+In order to save my hands, I am cutting back on my responses, especially
+to XEmacs-related mail.  You _will_ get a response, but please be
+patient.
+If you need an immediate response and it is not apparent in your message,
+please say so.  Thanks for your understanding.
 @end example
 @node Old Future Work, Index, Future Work Discussion, Top
 @chapter Old Future Work
 @cindex old future work
 implemented.  These proposals are included because they may describe to
 some extent the actual workings of the implemented code, and because
 they may discuss relevant design issues, alternative implementations, or
 work still to be done.
 @menu
-* Future Work -- A Portable Unexec Replacement::
+* Old Future Work -- A Portable Unexec Replacement::
-* Future Work -- Indirect Buffers::
+* Old Future Work -- Indirect Buffers::
-* Future Work -- Improvements in support for non-ASCII (European) keysyms under X::
+* Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X::
-* Future Work -- xemacs.org Mailing Address Changes::
+* Old Future Work -- RTF Clipboard Support::
-* Future Work -- Lisp callbacks from critical areas of the C code::
+* Old Future Work -- xemacs.org Mailing Address Changes::
+* Old Future Work -- Lisp callbacks from critical areas of the C code::
 @end menu
-@node Future Work -- A Portable Unexec Replacement, Future Work -- Indirect Buffers, Old Future Work, Old Future Work
+@node Old Future Work -- A Portable Unexec Replacement, Old Future Work -- Indirect Buffers, Old Future Work, Old Future Work
-@section Future Work -- A Portable Unexec Replacement
+@section Old Future Work -- A Portable Unexec Replacement
-@cindex future work, a portable unexec replacement
+@cindex old future work, a portable unexec replacement
-@cindex a portable unexec replacement, future work
+@cindex a portable unexec replacement, old future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @strong{Abstract:} Currently, during the build stage of XEmacs, a bare
 version of the program (called @dfn{temacs}) is run, which loads up a
 bunch of Lisp data and then writes out a modified executable file.  This
 process is very tricky to implement and highly system-dependent.  It can
 preprocessor, or by simply using a different name, such as
 @code{xmalloc}.  It's also very important that we use the correct
 @code{free} function when freeing dynamically-allocated data, depending
 on whether this data was allocated by us or by the
-@node Future Work -- Indirect Buffers, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- A Portable Unexec Replacement, Old Future Work
+@node Old Future Work -- Indirect Buffers, Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work -- A Portable Unexec Replacement, Old Future Work
-@section Future Work -- Indirect Buffers
+@section Old Future Work -- Indirect Buffers
-@cindex future work, indirect buffers
+@cindex old future work, indirect buffers
-@cindex indirect buffers, future work
+@cindex indirect buffers, old future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 An indirect buffer is a buffer that shares its text with some other
 buffer, but has its own version of all of the buffer properties,
 including markers, extents, buffer local variables, etc.  Indirect
 buffers are not currently implemented in XEmacs, but they are in GNU
 done only once, rather than on each buffer.  I imagine it would be
 significantly easier to implement this, if a macro were created for
 iterating over a buffer, and then all of the indirect children of that
 buffer.
-@node Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- xemacs.org Mailing Address Changes, Future Work -- Indirect Buffers, Old Future Work
+@node Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work -- RTF Clipboard Support, Old Future Work -- Indirect Buffers, Old Future Work
-@section Future Work -- Improvements in support for non-ASCII (European) keysyms under X
+@section Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X
-@cindex future work, improvements in support for non-ascii (european) keysyms under x
+@cindex old future work, improvements in support for non-ascii (european) keysyms under x
-@cindex improvements in support for non-ascii (european) keysyms under x, future work
+@cindex improvements in support for non-ascii (european) keysyms under x, old future work
-From Martin Buchholz.
+Author: @uref{mailto:martin@@xemacs.org,Martin Buchholz}
 If a user has a keyboard with known standard non-ASCII character
 equivalents, typically for European users, then Emacs' default
 binding should be self-insert-command, with the obvious character
 inserted.   For example, if a user has a keyboard with
 even be bound to anything by a user trying to customize it.
 This is implemented by maintaining a table of translations between all
 the known X keysym names and the corresponding (charset, octet) pairs.
+@quotation
 For every key on the keyboard that has a known character correspondence,
 we define the ascii-character property of the keysym, and make the
 default binding for the key be self-insert-command.
 The following magic is basically intimate knowledge of X11/keysymdef.h.
 except for Cyrillic and Greek.
 In a non-Mule world, a user can still have a multi-lingual editor, by doing
 (set-face-font "...-iso8859-2" (current-buffer))
 for all their Latin-2 buffers, etc.
+@end quotation
-@node Future Work -- xemacs.org Mailing Address Changes, Future Work -- Lisp callbacks from critical areas of the C code, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work
-@section Future Work -- xemacs.org Mailing Address Changes
+@node Old Future Work -- RTF Clipboard Support, Old Future Work -- xemacs.org Mailing Address Changes, Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work
-@cindex future work, xemacs.org mailing address changes
+@section Old Future Work -- RTF Clipboard Support
-@cindex xemacs.org mailing address changes, future work
+@cindex old future work, RTF clipboard support
+@cindex RTF clipboard support, old future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
+in fact, i merged the windows stuff with the already-existing generic code.
+what i'd like to see is something like this:
+@enumerate
+@item
+The current function
+@example
+(defun own-selection (data &optional type append)
+@end example
+should become
+@example
+(defun own-selection (data &optional type how-to-add data-type)
+@end example
+where data-type is the mswindows format, and how-to-add is
+@example
+'replace-all or nil -- remove data for all formats
+'replace-existing -- remove data for DATA-TYPE, but leave other formats alone
+'append or t -- append data to existing data in DATA-TYPE, and leave other
+formats alone
+@end example
+@item
+the function
+@example
+(get-selection &optional TYPE DATA-TYPE)
+@end example
+already has a data-type so you don't need to change it.
+@item
+the existing function
+@example
+(selection-exists-p &optional SELECTION DEVICE)
+@end example
+should become
+@example
+(selection-exists-p &optional SELECTION DEVICE DATA-TYPE)
+@end example
+@item
+a new function
+@example
+(register-selection-data-type DATA-TYPE)
+@end example
+like your mswindows-register-clipboard-format.
+@item
+there's already a selection-converter-alist, but that's only for data out.
+you should alias it to selection-conversion-out-alist, and create
+selection-conversion-in-alist.  these alists contain entries for CF_TEXT, which
+handles CR/LF conversion, and rtf, which does rtf in/out conversion -- no need
+for separate functions to do this.
+this may seem daunting, but it's much less hard to add stuff like this than it
+seems, and i and others will certainly give you lots of support if you run into
+problems.  it would be way cool to have a more powerful clipboard mechanism in
+XEmacs.
+@end enumerate
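The three @code{how-to-add} modes proposed above can be illustrated with a small model.  This is only a sketch in C, not XEmacs code: the fixed-size table, the enum constants, and the helper @code{find_entry} are all assumptions made for illustration, and the real selection code stores Lisp objects rather than C strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C counterparts of 'replace-all, 'replace-existing, 'append. */
enum how_to_add { REPLACE_ALL, REPLACE_EXISTING, APPEND };

#define MAX_FORMATS 8
#define MAX_DATA 256

/* One slot per data type (e.g. "CF_TEXT", "rtf"); sizes are arbitrary. */
struct selection_entry
{
  int used;
  char data_type[32];
  char data[MAX_DATA];
};

static struct selection_entry selection[MAX_FORMATS];

/* Find the slot for DATA_TYPE; if CREATE is nonzero, claim a free slot. */
static struct selection_entry *
find_entry (const char *data_type, int create)
{
  struct selection_entry *free_slot = NULL;
  for (int i = 0; i < MAX_FORMATS; i++)
    {
      if (selection[i].used && strcmp (selection[i].data_type, data_type) == 0)
        return &selection[i];
      if (!selection[i].used && free_slot == NULL)
        free_slot = &selection[i];
    }
  if (create && free_slot != NULL)
    {
      free_slot->used = 1;
      snprintf (free_slot->data_type, sizeof free_slot->data_type,
                "%s", data_type);
      free_slot->data[0] = '\0';
      return free_slot;
    }
  return NULL;
}

/* Store DATA under DATA_TYPE according to HOW; return 0 on success. */
static int
own_selection (const char *data, enum how_to_add how, const char *data_type)
{
  if (how == REPLACE_ALL)
    memset (selection, 0, sizeof selection);   /* drop every format */

  struct selection_entry *e = find_entry (data_type, 1);
  if (e == NULL)
    return -1;                                 /* table is full */

  if (how != APPEND)
    e->data[0] = '\0';                         /* discard old data for this type */

  size_t len = strlen (e->data);
  snprintf (e->data + len, sizeof e->data - len, "%s", data);
  return 0;
}

/* Return the data stored for DATA_TYPE, or NULL if there is none. */
static const char *
get_selection (const char *data_type)
{
  struct selection_entry *e = find_entry (data_type, 0);
  return e != NULL ? e->data : NULL;
}
```

In this model only REPLACE_ALL touches formats other than the one being written, which is the distinction the proposal draws between the three modes.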
+@node Old Future Work -- xemacs.org Mailing Address Changes, Old Future Work -- Lisp callbacks from critical areas of the C code, Old Future Work -- RTF Clipboard Support, Old Future Work
+@section Old Future Work -- xemacs.org Mailing Address Changes
+@cindex old future work, xemacs.org mailing address changes
+@cindex xemacs.org mailing address changes, old future work
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 @subheading Personal addresses
 @enumerate
 @item
 addresses set up will make it much easier for this momentum to be built
 up and to remain.
 @uref{../../www.666.com/ben/default.htm,Ben Wing}
-@node Future Work -- Lisp callbacks from critical areas of the C code,  , Future Work -- xemacs.org Mailing Address Changes, Old Future Work
+@node Old Future Work -- Lisp callbacks from critical areas of the C code,  , Old Future Work -- xemacs.org Mailing Address Changes, Old Future Work
-@section Future Work -- Lisp callbacks from critical areas of the C code
+@section Old Future Work -- Lisp callbacks from critical areas of the C code
-@cindex future work, lisp callbacks from critical areas of the c code
+@cindex old future work, lisp callbacks from critical areas of the c code
-@cindex lisp callbacks from critical areas of the c code, future work
+@cindex lisp callbacks from critical areas of the c code, old future work
-@example
+Author: @uref{mailto:ben@@xemacs.org,Ben Wing}
 There are many places in the XEmacs C code where Lisp functions are
 called, usually because the Lisp function is acting as a callback,
 hook, process filter, or the like.  The lisp code is often called in
 places where some lisp operations are dangerous.  Currently there are
 a lot of ad-hoc schemes implemented to try to prevent these dangerous
 Corresponding to each of these entries is the C name of the bit flag.
 The sets of dangerous operations which can be prohibited are:
-OPERATION_GC_PROHIBITED
+@table @code
-1. garbage collection.  When this flag is set, and the garbage
+@item OPERATION_GC_PROHIBITED
-collection threshold is reached, garbage collection simply doesn't
+garbage collection.  When this flag is set, and the garbage
-happen.  It will happen at the next opportunity that it is allowed.
+collection threshold is reached, garbage collection simply doesn't
-Similarly, explicitly calling the Lisp function garbage-collect
+happen.  It will happen at the next opportunity that it is allowed.
-simply does nothing.
+Similarly, explicitly calling the Lisp function garbage-collect
+simply does nothing.
-OPERATION_CATCH_ERRORS
-2. signalling an error.  When @code{enter_sensitive_code_section()} is
+@item OPERATION_CATCH_ERRORS
-called, with the bit flag corresponding to this prohibited
+signalling an error.  When this bit flag is passed to
-operation.  When this bit flag is passed to
-@code{enter_sensitive_code_section()}, a catch is set up which catches all
-errors, signals a warning with @code{warn_when_safe()}, and then simply
+@code{enter_sensitive_code_section()}, a catch is set up which catches all
-continues.  This is exactly the same behavior you now get with the
+errors, signals a warning with @code{warn_when_safe()}, and then simply
-@code{call_*_trapping_errors()} functions.  (there should also be some way
+continues.  This is exactly the same behavior you now get with the
-of specifying a warning level and class here, similar to the
+@code{call_*_trapping_errors()} functions.  (there should also be some way
-@code{call_*_trapping_errors()} functions.  This is not completely
+of specifying a warning level and class here, similar to the
-important, however, because a standard warning level and class
+@code{call_*_trapping_errors()} functions.  This is not completely
-could simply be chosen.)
+important, however, because a standard warning level and class
+could simply be chosen.)
-OPERATION_NO_UNSAFE_OBJECT_DELETION
-3. This flag prohibits deletion of any permanent object (i.e. any
+@item OPERATION_NO_UNSAFE_OBJECT_DELETION
-object that does not automatically disappear when created, such as
+This flag prohibits deletion of any permanent object (i.e. any
-buffers, frames, devices, windows, etc...) unless they were created
+object that does not automatically disappear when created, such as
-after this bit flag was set.  This would be implemented using a
+buffers, frames, devices, windows, etc...) unless they were created
-list which stores all of the permanent objects created after this
+after this bit flag was set.  This would be implemented using a
-bit flag was set.  This list is reset to its previous value when
+list which stores all of the permanent objects created after this
-the call to @code{exit_sensitive_code_section()} occurs.  The motivation
+bit flag was set.  This list is reset to its previous value when
-here is to allow Lisp callbacks to create their own temporary
+the call to @code{exit_sensitive_code_section()} occurs.  The motivation
-buffers or frames, and later delete them, but not allow any other
+here is to allow Lisp callbacks to create their own temporary
-permanent objects to be deleted, because C code might be working
+buffers or frames, and later delete them, but not allow any other
-with them, and not expect them to change.
+permanent objects to be deleted, because C code might be working
+with them, and not expect them to change.
-OPERATION_NO_BUFFER_MODIFICATION
-4. This flag disallows modifications to the text, extent or any other
+@item OPERATION_NO_BUFFER_MODIFICATION
-properties of any buffers except those created after this flag was
+This flag disallows modifications to the text, extent or any other
-set, just like in the previous entry.
+properties of any buffers except those created after this flag was
+set, just like in the previous entry.
-OPERATION_NO_REDISPLAY
-5. This bit flag inhibits any redisplay-related operations from
+@item OPERATION_NO_REDISPLAY
-happening, more specifically, any entry into the redisplay-related
+This bit flag inhibits any redisplay-related operations from
-code.  This includes, for example, the Lisp functions sit-for,
+happening, more specifically, any entry into the redisplay-related
-force-redisplay, force-cursor-redisplay, window-end with certain
+code.  This includes, for example, the Lisp functions sit-for,
-arguments to it, and various other functions. When this flag is
+force-redisplay, force-cursor-redisplay, window-end with certain
-set, instead of entering the redisplay code, the calling function
+arguments to it, and various other functions. When this flag is
-should simply make sure not to enter the redisplay code, (for
+set, instead of entering the redisplay code, the calling function
-example, in the case of window-end), or postpone the redisplay
+should simply make sure not to enter the redisplay code, (for
-until such a time when it's safe (for example, with sit-for and
+example, in the case of window-end), or postpone the redisplay
-force-redisplay).
+until such a time when it's safe (for example, with sit-for and
+force-redisplay).
-OPERATION_NO_REDISPLAY_SETTINGS_CHANGE
-6. This flag prohibits any modifications to faces, glyphs, specifiers,
+@item OPERATION_NO_REDISPLAY_SETTINGS_CHANGE
-extents, or any other settings that will affect the way that any
+This flag prohibits any modifications to faces, glyphs, specifiers,
-window is displayed.
+extents, or any other settings that will affect the way that any
+window is displayed.
+@end table
 The idea here is that it will finally be safe to call Lisp code from
 nearly any part of the C code, simply by setting any combination of
 restricted operation bit flags.  This even includes from within
 redisplay (in which case all of the bit flags need to be set).  The
 reason that I thought of this is that some coding system translations
 might cause Lisp code to be invoked, and C code often invokes these
 translations in sensitive places.
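The flag scheme described above might be sketched roughly as follows.  The flag names come from the proposal; everything else is an assumption made for illustration: the numeric values, the global variable, the convention that the enter function returns the previous flag set, and the helpers operation_prohibited_p and maybe_garbage_collect.  The proposal does not specify these signatures.

```c
#include <assert.h>

/* Flag names are from the proposal; the bit values are assumptions. */
enum sensitive_operation
{
  OPERATION_GC_PROHIBITED                = 1 << 0,
  OPERATION_CATCH_ERRORS                 = 1 << 1,
  OPERATION_NO_UNSAFE_OBJECT_DELETION    = 1 << 2,
  OPERATION_NO_BUFFER_MODIFICATION       = 1 << 3,
  OPERATION_NO_REDISPLAY                 = 1 << 4,
  OPERATION_NO_REDISPLAY_SETTINGS_CHANGE = 1 << 5
};

/* Hypothetical global: union of all flags currently in force. */
static int prohibited_operations;

/* Add FLAGS to the active set and return the previous set, so that
   nested sections can restore exactly what was in force before.  */
static int
enter_sensitive_code_section (int flags)
{
  int previous = prohibited_operations;
  prohibited_operations |= flags;
  return previous;
}

/* Restore the flag set that was saved by the matching enter call. */
static void
exit_sensitive_code_section (int previous)
{
  prohibited_operations = previous;
}

/* Hypothetical predicate that guarded primitives would consult. */
static int
operation_prohibited_p (int flag)
{
  return (prohibited_operations & flag) != 0;
}

/* Stand-in for the GC entry point: returns 1 if a collection would
   run now, 0 if it is postponed until the flag is cleared.  */
static int
maybe_garbage_collect (void)
{
  if (operation_prohibited_p (OPERATION_GC_PROHIBITED))
    return 0;
  return 1;
}
```

Returning the previous flag set from the enter function is one way to make sections nest: an inner section can add restrictions, and leaving it removes only what it added.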
-@end example
 @c Indexing guidelines
 @c I assume that all indexes will be combined.
 @c Therefore, if a generated findex and permutations
