diff man/internals/internals.texi @ 2365:ce4aa0ef8af1
[xemacs-hg @ 2004-11-04 07:48:14 by ben]
Major work on internals manual. Rearranged many chapters so as to lie in coherent divisions.
Add tons of stuff to Future Work, Old Future Work, Discussions.
Add lots of stuff to Mule section (Multilingual ...).
Remove index.texi, incorporate into internals.texi.
Section on early history and an introduction.
Section on XEmacs split. Lots of new MS Windows docs
Most recently: Windows-I18N docs. Lots of new I18N docs.
Loads of other stuff.
author | ben |
---|---|
date | Thu, 04 Nov 2004 07:48:14 +0000 |
parents | 6aa56b089139 |
children | 2d4dd2ef74e7 |
--- a/man/internals/internals.texi Wed Nov 03 22:52:00 2004 +0000 +++ b/man/internals/internals.texi Thu Nov 04 07:48:14 2004 +0000 @@ -128,114 +128,203 @@ This Info file contains v21.5 of the XEmacs Internals Manual, October 2004. @end ifinfo -@c Don't update this by hand!!!!!! -@c Use C-u C-c C-u m (aka C-u M-x texinfo-master-list). -@c NOTE: This command does not include the Index:: menu entry. -@c You must add it by hand. - -@c Here are some useful Lisp routines for quickly Texinfo-izing text that -@c has been formatted into ASCII lists and tables. The first routine is -@c currently more general and well-developed than the second. - -@c (defun list-to-texinfo (b e) -@c "Convert the selected region from an ASCII list to a Texinfo list." -@c (interactive "r") -@c (save-restriction -@c (narrow-to-region b e) -@c (goto-char (point-min)) -@c (let ((dash-type "^ *-+ +") -@c (num-type "^ *[[(]?\\([0-9]+\\|[a-z]\\)[]).] +") -@c dash) -@c (save-excursion -@c (cond ((re-search-forward num-type nil t)) -@c ((re-search-forward dash-type nil t) (setq dash t)) -@c (t (error "No table entries?")))) -@c (if dash (insert "@itemize @bullet\n") -@c (insert "@enumerate\n")) -@c (while (re-search-forward (if dash dash-type num-type) nil t) -@c (let ((p (point))) -@c (or (re-search-forward (if dash dash-type num-type) nil t) -@c (goto-char (point-max))) -@c (beginning-of-line) -@c (forward-line -1) -@c (let ((q (point))) -@c (goto-char p) -@c (kill-rectangle p q)) -@c (insert "@item\n"))) -@c (goto-char (point-max)) -@c (beginning-of-line) -@c (if dash (insert "@end itemize\n") -@c (insert "@end enumerate\n"))))) - -@c (defun table-to-texinfo (b e) -@c "Convert the selected region from an ASCII table to a Texinfo table." 
-@c (interactive "r") -@c (save-restriction -@c (narrow-to-region b e) -@c (goto-char (point-min)) -@c (insert "@table @code\n") -@c (while (not (eobp)) -@c (insert "@item ") -@c (forward-sexp) -@c (delete-char) -@c (insert "\n") -@c (or (search-forward "\n\n" nil t) -@c (goto-char (point-max)))) -@c (beginning-of-line) -@c (insert "@end table\n"))) - -@c A useful Lisp routine for adding markup based on conventions used in plain -@c text files; see doc string below. - -@c (defun convert-text-to-texinfo (&optional no-narrow) -@c "Convert text to Texinfo. -@c If the region is active, do the region; otherwise, go from point to the end -@c of the buffer. This query-replaces for various kinds of conventions used -@c in text: @code{} surrounded by ` and ' or followed by a (); @strong{} -@c surrounded by *'s; @file{} something that looks like a file name." -@c (interactive) -@c (if (region-active-p) -@c (save-restriction -@c (narrow-to-region (region-beginning) (region-end)) -@c (convert-comments-to-texinfo t)) -@c (let ((p (point)) -@c (case-replace nil)) -@c (query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil) -@c (goto-char p) -@c (query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil) -@c (goto-char p) -@c (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil) -@c (goto-char p) -@c (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil) -@c ))) +@ignore +Don't update this by hand!!!!!! +Use C-u C-c C-u m (aka C-u M-x texinfo-master-list). +NOTE: This command does not include the Index:: menu entry. +You must add it by hand. + +Here are some useful Lisp routines for quickly Texinfo-izing text that +has been formatted into ASCII lists and tables. + +(defun list-to-texinfo (b e) + "Convert the selected region from an ASCII list to a Texinfo list." 
+ (interactive "r") + (save-restriction + (narrow-to-region b e) + (goto-char (point-min)) + (let ((dash-type "^ *-+ +") + ;; allow single-letter numbering or roman numerals + (letter-type "^ *[[(]?\\([a-zA-Z]\\|[IVXivx]+\\)[]).] +") + (num-type "^ *[[(]?[0-9]+[]).] +") + dash regexp) + (save-excursion + (re-search-forward "\\s-*") + (cond ((looking-at dash-type) (setq regexp dash-type dash t)) + ((looking-at letter-type) (setq regexp letter-type)) + ((looking-at num-type) (setq regexp num-type)) + ((re-search-forward num-type nil t) (setq regexp num-type)) + ((re-search-forward letter-type nil t) (setq regexp letter-type)) + ((re-search-forward dash-type nil t) + (setq regexp dash-type dash t)) + (t (error "No table entries?")))) + (if dash (insert "@itemize @bullet\n") + (insert "@enumerate\n")) + (re-search-forward regexp nil 'limit) + (while (not (eobp)) + (delete-region (point-at-bol) (point)) + (insert "@item\n") + ;; move forward over any text following the dash to not screw + ;; up remove-spacing. + (forward-line 1) + (let ((p (point))) + (or (re-search-forward regexp nil t) + (goto-char (point-max))) + ;; trick to avoid using a marker + (save-excursion + ;; back up so as not to affect the line we're on (beginning of + ;; next entry) + (forward-line -1) + (remove-spacing p (point))))) + (beginning-of-line) + (if dash (insert "@end itemize\n") + (insert "@end enumerate\n"))))) + +(defun remove-spacing (b e) + "Remove leading space from the selected region. +This finds the maximum leading blank area common to all lines in the region. +This includes all lines any part of which are in the region." 
+ (interactive "r") + (save-excursion + (let ((min 999999) + seen) + (goto-char e) + (end-of-line) + (setq e (point)) + (goto-char b) + (beginning-of-line) + (setq b (point)) + (while (< (point) e) + (cond ((looking-at "^\\s-+") + (goto-char (match-end 0)) + (setq min (min min (current-column)) + seen t)) + ((looking-at "^\\s-*$")) + (t (setq min 0))) + (forward-line 1)) + (when (and seen (> min 0)) + (goto-char e) + (untabify b e) + ;; we are at end of line already. + (if (not (= (point) (point-at-eol))) + (error "Logic error")) + ;; Pad line with spaces if necessary (it may be just a blank line) + (if (< (current-column) min) + (insert-char ?\ (- min (current-column))) + (beginning-of-line) + (forward-char min)) + (kill-rectangle b (point)))))) + +(defun table-to-texinfo (b e) + "Convert the selected region from an ASCII table to a Texinfo table. +Assumes entries are separated by a blank line, and the first sexp in +each entry is the table heading." + (interactive "r") + (save-restriction + (narrow-to-region b e) + (goto-char (point-min)) + (insert "@table @code\n") + (while (not (eobp)) + ;; remember where we want to insert the @item. + ;; delete the spacing first since inserting the @item may create + ;; a line with no spacing, if there is text following the heading on + ;; the same line. + (let ((beg (point))) + ;; removing the space and inserting the @item will change the + ;; position of the end of the region, so to make it easy on us + ;; leave point at end so it will be adjusted. + (forward-line 1) + (let ((beg2 (point))) + (or (re-search-forward "^$" nil t) + (goto-char (point-max))) + (backward-char 1) + (remove-spacing beg2 (point))) + (ignore-errors (forward-char 2)) + (save-excursion + (goto-char beg) + (insert "@item ") + (forward-sexp) + (delete-char) + (insert "\n")))) + (beginning-of-line) + (insert "@end table\n"))) + +A useful Lisp routine for adding markup based on conventions used in plain +text files; see doc string below. 
+ +(defun convert-text-to-texinfo (&optional no-narrow) + "Convert text to Texinfo. +If the region is active, do the region; otherwise, go from point to the end +of the buffer. This query-replaces for various kinds of conventions used +in text: @code{} surrounded by ` and ' or followed by a (); @strong{} +surrounded by *'s; @file{} something that looks like a file name." + (interactive) + (if (region-active-p) + (save-restriction + (narrow-to-region (region-beginning) (region-end)) + (convert-comments-to-texinfo t)) + (let ((p (point)) + (case-replace nil)) + (query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil) + (goto-char p) + (query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil) + (goto-char p) + (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil) + (goto-char p) + (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil) + ))) + +Macro the generate the "Future Work" section from a title; put +point at beginning. + +(defalias 'make-future (read-kbd-macro +"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Future SPC Work SPC - - SPC <end> RET @cindex SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC future SPC work RET")) + +Similar but generates a "Discussion" section. + +(defalias 'make-discussion (read-kbd-macro +"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Discussion SPC - - SPC <home> <down> <C-right> <right> Discussion SPC - - SPC <end> RET @cindex SPC discussion, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC discussion RET")) + +Similar but generates an "Old Future Work" section. 
+ +(defalias 'make-old-future (read-kbd-macro +"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Old SPC Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Old SPC Future SPC Work SPC - - SPC <end> RET @cindex SPC old SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC old SPC future SPC work RET")) + +Similar but generates a general section. + +(defalias 'make-section (read-kbd-macro +"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> RET @cindex SPC C-SPC C-g <f4> C-x C-x M-l <home> <down>")) + +Similar but generates a general subsection. + +(defalias 'make-subsection (read-kbd-macro +"<S-end> <f3> <home> @node SPC <end> RET @subsection SPC <f4> RET @cindex SPC C-SPC C-g <f4> C-x C-x M-l <home> <down>")) +@end ignore @menu * Introduction:: Overview of this manual. * Authorship of XEmacs:: * A History of Emacs:: Times, dates, important events. -* XEmacs From the Outside:: A broad conceptual overview. +* The XEmacs Split:: +* XEmacs from the Outside:: A broad conceptual overview. * The Lisp Language:: An overview. 
-* XEmacs From the Perspective of Building:: +* XEmacs from the Perspective of Building:: * Build-Time Dependencies:: -* XEmacs From the Inside:: -* The XEmacs Object System (Abstractly Speaking):: -* How Lisp Objects Are Represented in C:: +* The Modules of XEmacs:: * Major Textual Changes:: * Rules When Writing New C Code:: * Regression Testing XEmacs:: * CVS Techniques:: -* The Modules of XEmacs:: +* XEmacs from the Inside:: +* The XEmacs Object System (Abstractly Speaking):: +* How Lisp Objects Are Represented in C:: * Allocation of Objects in XEmacs Lisp:: -* Dumping:: -* Events and the Event Loop:: -* Asynchronous Events; Quit Checking:: +* The Lisp Reader and Compiler:: * Evaluation; Stack Frames; Bindings:: * Symbols and Variables:: * Buffers:: * Text:: * Multilingual Support:: -* The Lisp Reader and Compiler:: -* Lstreams:: * Consoles; Devices; Frames; Windows:: * The Redisplay Mechanism:: * Extents:: @@ -243,9 +332,13 @@ * Glyphs:: * Specifiers:: * Menus:: +* Events and the Event Loop:: +* Asynchronous Events; Quit Checking:: +* Lstreams:: * Subprocesses:: * Interface to MS Windows:: * Interface to the X Window System:: +* Dumping:: * Future Work:: * Future Work Discussion:: * Old Future Work:: @@ -262,6 +355,16 @@ * GNU Emacs 20:: The other version 20 Emacs. * XEmacs:: The continuation of Lucid Emacs. 
+The Modules of XEmacs + +* A Summary of the Various XEmacs Modules:: +* Low-Level Modules:: +* Basic Lisp Modules:: +* Modules for Standard Editing Operations:: +* Modules for Interfacing with the File System:: +* Modules for Other Aspects of the Lisp Interpreter and Object System:: +* Modules for Interfacing with the Operating System:: + Major Textual Changes * Great Integral Type Renaming:: @@ -288,16 +391,6 @@ * Merging a Branch into the Trunk:: -The Modules of XEmacs - -* A Summary of the Various XEmacs Modules:: -* Low-Level Modules:: -* Basic Lisp Modules:: -* Modules for Standard Editing Operations:: -* Modules for Interfacing with the File System:: -* Modules for Other Aspects of the Lisp Interpreter and Object System:: -* Modules for Interfacing with the Operating System:: - Allocation of Objects in XEmacs Lisp * Introduction to Allocation:: @@ -327,52 +420,13 @@ * sweep_strings:: * sweep_bit_vectors_1:: -Dumping - -* Dumping Justification:: -* Overview:: -* Data descriptions:: -* Dumping phase:: -* Reloading phase:: -* Remaining issues:: - -Dumping phase - -* Object inventory:: -* Address allocation:: -* The header:: -* Data dumping:: -* Pointers dumping:: - -Events and the Event Loop - -* Introduction to Events:: -* Main Loop:: -* Specifics of the Event Gathering Mechanism:: -* Specifics About the Emacs Event:: -* Event Queues:: -* Event Stream Callback Routines:: -* Other Event Loop Functions:: -* Stream Pairs:: -* Converting Events:: -* Dispatching Events; The Command Builder:: -* Focus Handling:: -* Editor-Level Control Flow Modules:: - -Asynchronous Events; Quit Checking - -* Signal Handling:: -* Control-G (Quit) Checking:: -* Profiling:: -* Asynchronous Timeouts:: -* Exiting:: - Evaluation; Stack Frames; Bindings * Evaluation:: * Dynamic Binding; The specbinding Stack; Unwind-Protects:: * Simple Special Forms:: * Catch and Throw:: +* Error Trapping:: Symbols and Variables @@ -407,6 +461,7 @@ * Internal Text API's:: * Coding for Mule:: * CCL:: +* 
Microsoft Windows-Related Multilingual Issues:: * Modules for Internationalization:: Encodings @@ -443,12 +498,16 @@ * An Example of Mule-Aware Code:: * Mule-izing Code:: -Lstreams - -* Creating an Lstream:: Creating an lstream object. -* Lstream Types:: Different sorts of things that are streamed. -* Lstream Functions:: Functions for working with lstreams. -* Lstream Methods:: Creating new lstream types. +Microsoft Windows-Related Multilingual Issues + +* Microsoft Documentation:: +* Locales:: +* More about code pages:: +* More about locales:: +* Unicode support under Windows:: +* The golden rules of writing Unicode-safe code:: +* The format of the locale in setlocale():: +* Random other Windows I18N docs:: Consoles; Devices; Frames; Windows @@ -475,6 +534,36 @@ * Mathematics of Extent Ordering:: A rigorous foundation. * Extent Fragments:: Cached information useful for redisplay. +Events and the Event Loop + +* Introduction to Events:: +* Main Loop:: +* Specifics of the Event Gathering Mechanism:: +* Specifics About the Emacs Event:: +* Event Queues:: +* Event Stream Callback Routines:: +* Other Event Loop Functions:: +* Stream Pairs:: +* Converting Events:: +* Dispatching Events; The Command Builder:: +* Focus Handling:: +* Editor-Level Control Flow Modules:: + +Asynchronous Events; Quit Checking + +* Signal Handling:: +* Control-G (Quit) Checking:: +* Profiling:: +* Asynchronous Timeouts:: +* Exiting:: + +Lstreams + +* Creating an Lstream:: Creating an lstream object. +* Lstream Types:: Different sorts of things that are streamed. +* Lstream Functions:: Functions for working with lstreams. +* Lstream Methods:: Creating new lstream types. 
+ Interface to MS Windows * Different kinds of Windows environments:: @@ -496,8 +585,26 @@ * Progress Bars:: * Tab Controls:: +Dumping + +* Dumping Justification:: +* Overview:: +* Data descriptions:: +* Dumping phase:: +* Reloading phase:: +* Remaining issues:: + +Dumping phase + +* Object inventory:: +* Address allocation:: +* The header:: +* Data dumping:: +* Pointers dumping:: + Future Work +* Future Work -- General Suggestions:: * Future Work -- Elisp Compatibility Package:: * Future Work -- Drag-n-Drop:: * Future Work -- Standard Interface for Enabling Extensions:: @@ -545,6 +652,7 @@ * Future Work -- Autodetection:: * Future Work -- Conversion Error Detection:: +* Future Work -- Unicode:: * Future Work -- BIDI Support:: * Future Work -- Localized Text/Messages:: @@ -552,19 +660,26 @@ * Future Work -- Lisp Engine Discussion:: * Future Work -- Lisp Engine Replacement -- Implementation:: +* Future Work -- Startup File Modification by Packages:: Future Work Discussion * Discussion -- garbage collection:: * Discussion -- glyphs:: +* Discussion -- Dialog Boxes:: +* Discussion -- Multilingual Issues:: +* Discussion -- Windows External Widget:: +* Discussion -- Packages:: +* Discussion -- Distribution Layout:: Old Future Work -* Future Work -- A Portable Unexec Replacement:: -* Future Work -- Indirect Buffers:: -* Future Work -- Improvements in support for non-ASCII (European) keysyms under X:: -* Future Work -- xemacs.org Mailing Address Changes:: -* Future Work -- Lisp callbacks from critical areas of the C code:: +* Old Future Work -- A Portable Unexec Replacement:: +* Old Future Work -- Indirect Buffers:: +* Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X:: +* Old Future Work -- RTF Clipboard Support:: +* Old Future Work -- xemacs.org Mailing Address Changes:: +* Old Future Work -- Lisp callbacks from critical areas of the C code:: @end detailmenu @end menu @@ -597,6 +712,40 @@ assume that the code comments are correct. 
(Because of the proximity of the comments to the code, comments will rarely be out-of-date.) +The manual is organized in chapters which are broadly grouped into major +divisions: + +@enumerate +@item +First is the introduction, including this chapter and chapters on the +history and authorship of XEmacs. +@item +Next, starting with @ref{XEmacs from the Outside}, are some chapters +giving a broad overview of the internal workings of XEmacs and +documenting important information relevant to those working on the code. +@item +The remaining divisions document the nitty-gritty details of the +internal workings. First, starting with @ref{XEmacs from the Outside}, +is a division on the workings of the Lisp interpreter that drives +XEmacs. +@item +Next, starting with @ref{Buffers}, is a division on the parts of the +code specifically devoted to text processing, including multilingual +support (Mule). +@item +Afterwards, starting with @ref{Consoles; Devices; Frames; Windows}, is a +division covering the display mechanism and the objects and modules +relevant to this. +@item +Then, starting with @ref{Events and the Event Loop}, is a division +covering the interface between XEmacs and the outside world, including +user interactions, subprocesses, file I/O, interfaces to particular +windowing systems, and dumping. +@item +Finally, starting with @ref{Future Work}, is a division containing +proposals and discussion relating to future work on XEmacs. +@end enumerate + This manual was primarily written by Ben Wing. Certain sections were written by others, including those mentioned on the title page as well as other coders. Some sections were lifted directly from comments in @@ -615,11 +764,12 @@ Various cleanup work, mostly post-2000. Object-Oriented Techniques in XEmacs. A Reader's Guide to XEmacs Coding Conventions. Searching and Matching. Regression Testing XEmacs. Modules for Regression Testing. -Lucid Widget Library. +Lucid Widget Library. 
A number of sections in the Future Work chapter. @item Martin Buchholz Various cleanup work, mostly pre-2001. Docs on inline functions. Docs on dfc conversion functions (Conversion to and from External Data). Improvements in support for non-ASCII (European) keysyms under X. +A section or two in the Future Work chapter. @item Hrvoje Niksic Coding for Mule. @item Matthias Neubauer @@ -632,6 +782,8 @@ Line Start Cache. @item Kenichi Handa CCL. +@item Jamie Zawinski +A couple of sections in the Future Work chapter. @end table @node Authorship of XEmacs, A History of Emacs, Introduction, Top @@ -774,7 +926,7 @@ basically no changes for Xemacs. @end table -@node A History of Emacs, XEmacs From the Outside, Authorship of XEmacs, Top +@node A History of Emacs, The XEmacs Split, Authorship of XEmacs, Top @chapter A History of Emacs @cindex history of Emacs, a @cindex Emacs, a history of @@ -1344,8 +1496,127 @@ version 21.2.46 released March 21, 2001. @end itemize -@node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top -@chapter XEmacs From the Outside +@node The XEmacs Split, XEmacs from the Outside, A History of Emacs, Top +@chapter The XEmacs Split +@cindex XEmacs split + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + +@strong{NOTE NOTE NOTE}: The following is a @strong{highly} opinionated +piece written by one of the main authors of XEmacs. This reflects his +opinions, and his only! It is included here because it may help to +clarify some of the issues that are keeping the two versions of Emacs +separate. + +Many people look at the split between GNU Emacs and XEmacs and are +convinced that the XEmacs team is being needlessly divisive and just needs +to cooperate a bit with RMS, and the two versions of Emacs will merge. In +fact there have been six to seven major attempts at merging, each running +hundreds of messages long and all of them coming from the XEmacs side. 
All +have failed because they have eventually come to the same conclusion, which +is that RMS has no real interest in cooperation at all. If you work with +him, you have to do it his way -- "my way or the highway". Specifically: + +@enumerate +@item + +RMS insists on having legal papers signed for every bit of code that goes +into GNU Emacs. RMS's lawyers have told him that every contribution over +ten lines long requires legal papers. These papers cannot be filled out +over to the web but must be done so in person and mailed to the FSF. +Obviously this by itself has a tendency to inhibit contributions because of +the hassle factor. Furthermore, many people (and especially organizations) +are either hesitant to or refuse to sign legal papers, for reasons +mentioned below. Because of these reasons, XEmacs has never enforced legal +signed papers for the code in it. Such papers are not a part of the GPL and +are not required by any projects other than those of the FSF (for example, +Linux does not require such papers). Since we do not know exactly who is +the author of every bit of code that has been contributed to XEmacs in the +last nine years, we would essentially have to rewrite large sections of the +code. The situation however, is worse than that because many of the large +copyright holders of XEmacs (for example Sun Microsystems) refuse to sign +legal papers. Although they have not stated their reasons, there are quite +a number of reasons not to sign legal papers: + +@itemize @bullet +@item +By doing so you essentially give up all control over your code. You can +no longer release your code under a different license. If you want to +use your code that you've contributed to the FSF in a project of your +own, and that project is not released under the GPL, you are not allowed +to do this. Obviously, large companies tend to want to reuse their code +in many different projects and as a result feel very uncomfortable about +signing legal papers. 
+@item +One of the dangers of assigning copyright to the FSF is that if the FSF +happens to be taken over by some evil corporate identity or anyone with +different ideas than RMS, they will own all copyright-assigned code, and +can revoke the GPL and enforce any license they please. If the code has +many different copyright holders, this is much less likely of a +scenario. +@end itemize + +@item +RMS does not like abstract data structures. Abstract data structures are +the foundation of XEmacs and most other modern programming projects. In +my opinion, is difficult to impossible to write maintainable and +expandable code without using abstract data structures. In merging talks +with RMS he has said we can have any abstract data structures we want in +a merged version but must allow direct access to the implementation as +well, which defeats the primary purpose of having abstract data +structures. + +@item +RMS is very unwilling to compromise when it comes to divergent +implementations of the same functionality, which is very common between +XEmacs and GNU Emacs. Rather than taking the better interface on +technical grounds, RMS insists that both interfaces must be implemented +in C at the same level (rather than implementing one in C and the other +on top if it), so that code that uses either interface is just as +fast. This means that the resulting merged Emacs would be filled with a +lot of very complicated code to simultaneously support two divergent +interfaces, and would be difficult to maintain in this state. + +@item +RMS's idea of compromise and cooperation is almost purely political +rather than technical. The XEmacs maintainers would like to have issues +resolved by examining them technically and deciding what makes the most +sense from a technical prospective. 
RMS however, wants to proceed on a +tit for tat kind of basis, which is to say, “If we support this feature +of yours, we also get to support this other feature of mine.” The +result of such a process is typically a big mess, because there is no +overarching design but instead a great deal of incompatible things +hodgepodged together. +@end enumerate + +If only some of the above differences were firmly held by RMS, and if he +were willing to compromise effectively on the others and to demonstrate +willingness to work with us on the issues that he is less willing to +compromise on, we might go ahead with the merge despite misgivings. However +RMS has shown no real interest at all in compromising. He has never stated +how all of the redundant work that would be required to support his +preconditions would get done. It's unlikely that he would do it all and +it's certainly not clear that the XEmacs project would be willing to do it +all, given that it is a tremendous amount of extra work and the XEmacs +project is already strapped for coding resources. (Not to mention the +inherent difficulty in convincing people to redo existing work for +primarily political reasons.) In general the free software community is +quite strapped as a whole for coding resources; duplicative efforts amount +to very little positively and have a lot of negative effects in that they +take away what few resources we do have from projects that would actually +be useful. + +RMS however, does not seem to be bothered by this. He is more interested in +sticking firm to his principles, though the heavens may fall down, than in +working forward to create genuinely useful software. It is abundantly clear +that RMS has no real interest in unity except if it happens to be on his +own terms and allows him ultimate control over the result. He would rather +see nothing happen at all than something that is not exactly according to +his principles. 
The fact that few if any people share his principles is +meaningless to him. + +@node XEmacs from the Outside, The Lisp Language, The XEmacs Split, Top +@chapter XEmacs from the Outside @cindex XEmacs from the outside @cindex outside, XEmacs from the @cindex read-eval-print @@ -1388,7 +1659,7 @@ as well make it do your taxes, compute pi, play bridge, etc. You'd just have to write functions to do those operations in Lisp. -@node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top +@node The Lisp Language, XEmacs from the Perspective of Building, XEmacs from the Outside, Top @chapter The Lisp Language @cindex Lisp language, the @cindex Lisp vs. C @@ -1610,8 +1881,8 @@ that makes it a full-fledged application platform, very much like an OS inside the real OS. -@node XEmacs From the Perspective of Building, Build-Time Dependencies, The Lisp Language, Top -@chapter XEmacs From the Perspective of Building +@node XEmacs from the Perspective of Building, Build-Time Dependencies, The Lisp Language, Top +@chapter XEmacs from the Perspective of Building @cindex XEmacs from the perspective of building @cindex building, XEmacs from the perspective of @@ -1721,7 +1992,7 @@ get mighty confused by the tricks played by the XEmacs build process, such as allocating memory in one process, and freeing it in the next. -@node Build-Time Dependencies, XEmacs From the Inside, XEmacs From the Perspective of Building, Top +@node Build-Time Dependencies, The Modules of XEmacs, XEmacs from the Perspective of Building, Top @chapter Build-Time Dependencies @cindex build-time dependencies @cindex dependencies, build-time @@ -1785,2375 +2056,7 @@ @code{custom-declare-variable-list} to prevent the @samp{void-variable} error. (Currently this is only needed for @file{make-docfile.el}.) 
-@node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), Build-Time Dependencies, Top -@chapter XEmacs From the Inside -@cindex XEmacs from the inside -@cindex inside, XEmacs from the - -Internally, XEmacs is quite complex, and can be very confusing. To -simplify things, it can be useful to think of XEmacs as containing an -event loop that ``drives'' everything, and a number of other subsystems, -such as a Lisp engine and a redisplay mechanism. Each of these other -subsystems exists simultaneously in XEmacs, and each has a certain -state. The flow of control continually passes in and out of these -different subsystems in the course of normal operation of the editor. - -It is important to keep in mind that, most of the time, the editor is -``driven'' by the event loop. Except during initialization and batch -mode, all subsystems are entered directly or indirectly through the -event loop, and ultimately, control exits out of all subsystems back up -to the event loop. This cycle of entering a subsystem, exiting back out -to the event loop, and starting another iteration of the event loop -occurs once each keystroke, mouse motion, etc. - -If you're trying to understand a particular subsystem (other than the -event loop), think of it as a ``daemon'' process or ``servant'' that is -responsible for one particular aspect of a larger system, and -periodically receives commands or environment changes that cause it to -do something. Ultimately, these commands and environment changes are -always triggered by the event loop. For example: - -@itemize @bullet -@item -The window and frame mechanism is responsible for keeping track of what -windows and frames exist, what buffers are in them, etc. It is -periodically given commands (usually from the user) to make a change to -the current window/frame state: i.e. create a new frame, delete a -window, etc. 
- -@item -The buffer mechanism is responsible for keeping track of what buffers -exist and what text is in them. It is periodically given commands -(usually from the user) to insert or delete text, create a buffer, etc. -When it receives a text-change command, it notifies the redisplay -mechanism. - -@item -The redisplay mechanism is responsible for making sure that windows and -frames are displayed correctly. It is periodically told (by the event -loop) to actually ``do its job'', i.e. snoop around and see what the -current state of the environment (mostly of the currently-existing -windows, frames, and buffers) is, and make sure that state matches -what's actually displayed. It keeps lots and lots of information around -(such as what is actually being displayed currently, and what the -environment was last time it checked) so that it can minimize the work -it has to do. It is also helped along in that whenever a relevant -change to the environment occurs, the redisplay mechanism is told about -this, so it has a pretty good idea of where it has to look to find -possible changes and doesn't have to look everywhere. - -@item -The Lisp engine is responsible for executing the Lisp code in which most -user commands are written. It is entered through a call to @code{eval} -or @code{funcall}, which occurs as a result of dispatching an event from -the event loop. The functions it calls issue commands to the buffer -mechanism, the window/frame subsystem, etc. - -@item -The Lisp allocation subsystem is responsible for keeping track of Lisp -objects. It is given commands from the Lisp engine to allocate objects, -garbage collect, etc. -@end itemize - -etc. - - The important idea here is that there are a number of independent -subsystems each with its own responsibility and persistent state, just -like different employees in a company, and each subsystem is -periodically given commands from other subsystems. 
Commands can flow -from any one subsystem to any other, but there is usually some sort of -hierarchy, with all commands originating from the event subsystem. - - XEmacs is entered in @code{main()}, which is in @file{emacs.c}. When -this is called the first time (in a properly-invoked @file{temacs}), it -does the following: - -@enumerate -@item -It does some very basic environment initializations, such as determining -where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside -and setting up signal handlers. -@item -It initializes the entire Lisp interpreter. -@item -It sets the initial values of many built-in variables (including many -variables that are visible to Lisp programs), such as the global keymap -object and the built-in faces (a face is an object that describes the -display characteristics of text). This involves creating Lisp objects -and thus is dependent on step (2). -@item -It performs various other initializations that are relevant to the -particular environment it is running in, such as retrieving environment -variables, determining the current date and the user who is running the -program, examining its standard input, creating any necessary file -descriptors, etc. -@item -At this point, the C initialization is complete. A Lisp program that -was specified on the command line (usually @file{loadup.el}) is called -(temacs is normally invoked as @code{temacs -batch -l loadup.el dump}). -@file{loadup.el} loads all of the other Lisp files that are needed for -the operation of the editor, calls the @code{dump-emacs} function to -write out @file{xemacs}, and then kills the temacs process. -@end enumerate - - When @file{xemacs} is then run, it only redoes steps (1) and (4) -above; all variables already contain the values they were set to when -the executable was dumped, and all memory that was allocated with -@code{malloc()} is still around. 
(XEmacs knows whether it is being run -as @file{xemacs} or @file{temacs} because it sets the global variable -@code{initialized} to 1 after step (4) above.) At this point, -@file{xemacs} calls a Lisp function to do any further initialization, -which includes parsing the command-line (the C code can only do limited -command-line parsing, which includes looking for the @samp{-batch} and -@samp{-l} flags and a few other flags that it needs to know about before -initialization is complete), creating the first frame (or @dfn{window} -in standard window-system parlance), running the user's init file -(usually the file @file{.emacs} in the user's home directory), etc. The -function to do this is usually called @code{normal-top-level}; -@file{loadup.el} tells the C code about this function by setting its -name as the value of the Lisp variable @code{top-level}. - - When the Lisp initialization code is done, the C code enters the event -loop, and stays there for the duration of the XEmacs process. The code -for the event loop is contained in @file{cmdloop.c}, and is called -@code{Fcommand_loop_1()}. Note that this event loop could very well be -written in Lisp, and in fact a Lisp version exists; but apparently, -doing this makes XEmacs run noticeably slower. - - Notice how much of the initialization is done in Lisp, not in C. -In general, XEmacs tries to move as much code as is possible -into Lisp. Code that remains in C is code that implements the -Lisp interpreter itself, or code that needs to be very fast, or -code that needs to do system calls or other such stuff that -needs to be done in C, or code that needs to have access to -``forbidden'' structures. (One conscious aspect of the design of -Lisp under XEmacs is a clean separation between the external -interface to a Lisp object's functionality and its internal -implementation. Part of this design is that Lisp programs -are forbidden from accessing the contents of the object other -than through using a standard API. 
In this respect, XEmacs Lisp -is similar to modern Lisp dialects but differs from GNU Emacs, -which tends to expose the implementation and allow Lisp -programs to look at it directly. The major advantage of -hiding the implementation is that it allows the implementation -to be redesigned without affecting any Lisp programs, including -those that might want to be ``clever'' by looking directly at -the object's contents and possibly manipulating them.) - - Moving code into Lisp makes the code easier to debug and maintain and -makes it much easier for people who are not XEmacs developers to -customize XEmacs, because they can make a change with much less chance -of obscure and unwanted interactions occurring than if they were to -change the C code. - -@node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top -@chapter The XEmacs Object System (Abstractly Speaking) -@cindex XEmacs object system (abstractly speaking), the -@cindex object system (abstractly speaking), the XEmacs - - At the heart of the Lisp interpreter is its management of objects. -XEmacs Lisp contains many built-in objects, some of which are -simple and others of which can be very complex; and some of which -are very common, and others of which are rarely used or are only -used internally. (Since the Lisp allocation system, with its -automatic reclamation of unused storage, is so much more convenient -than @code{malloc()} and @code{free()}, the C code makes extensive use of it -in its internal operations.) - - The basic Lisp objects are - -@table @code -@item integer -31 bits of precision, or 63 bits on 64-bit machines; the -reason for this is described below when the internal Lisp object -representation is described. -@item char -An object representing a single character of text; chars behave like -integers in many ways but are logically considered text rather than -numbers and have a different read syntax. 
(The read syntax for a char -contains the char itself or some textual encoding of it---for example, -a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the -ISO-2022 encoding standard---rather than the numerical representation -of the char; this way, if the mapping between chars and integers -changes, which is quite possible for Kanji characters and other extended -characters, the same character will still be created. Note that some -primitives confuse chars and integers. The worst culprit is @code{eq}, -which makes a special exception and considers a char to be @code{eq} to -its integer equivalent, even though in no other case are objects of two -different types @code{eq}. The reason for this monstrosity is -compatibility with existing code; the separation of char from integer -came fairly recently.) -@item float -Same precision as a double in C. -@item bignum -@itemx ratio -@itemx bigfloat -As build-time options, arbitrary-precision numbers are available. -Bignums are integers, and when available they remove the restriction on -buffer size. Ratios are non-integral rational numbers. Bigfloats are -arbitrary-precision floating point numbers, with precision specified at -runtime. -@item symbol -An object that contains Lisp objects and is referred to by name; -symbols are used to implement variables and named functions -and to provide the equivalent of preprocessor constants in C. -@item string -Self-explanatory; behaves much like a vector of chars -but has a different read syntax and is stored and manipulated -more compactly. -@item bit-vector -A vector of bits; similar to a string in spirit. -@item vector -A one-dimensional array of Lisp objects providing constant-time access -to any of the objects; access to an arbitrary object in a vector is -faster than for lists, but the operations that can be done on a vector -are more limited. -@item compiled-function -An object containing compiled Lisp code, known as @dfn{byte code}.
-@item subr -A Lisp primitive, i.e. a Lisp-callable function implemented in C. -@item cons -A simple container for two Lisp objects, used to implement lists and -most other data structures in Lisp. -@end table - -Objects which are not conses are called atoms. - -@cindex closure -Note that there is no basic ``function'' type, as in more powerful -versions of Lisp (where it's called a @dfn{closure}). XEmacs Lisp does -not provide the closure semantics implemented by Common Lisp and Scheme. -The guts of a function in XEmacs Lisp are represented in one of four -ways: a symbol specifying another function (when one function is an -alias for another), a list (whose first element must be the symbol -@code{lambda}) containing the function's source code, a -compiled-function object, or a subr object. (In other words, given a -symbol specifying the name of a function, calling @code{symbol-function} -to retrieve the contents of the symbol's function cell will return one -of these types of objects.) - -XEmacs Lisp also contains numerous specialized objects used to implement -the editor: - -@table @code -@item buffer -Stores text like a string, but is optimized for insertion and deletion -and has certain other properties that can be set. -@item frame -An object with various properties whose displayable representation is a -@dfn{window} in window-system parlance. -@item window -A section of a frame that displays the contents of a buffer; -often called a @dfn{pane} in window-system parlance. -@item window-configuration -An object that represents a saved configuration of windows in a frame. -@item device -An object representing a screen on which frames can be displayed; -equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in -character mode. -@item face -An object specifying the appearance of text or graphics; it has -properties such as font, foreground color, and background color. 
-@item marker -An object that refers to a particular position in a buffer and moves -around as text is inserted and deleted to stay in the same relative -position to the text around it. -@item extent -Similar to a marker but covers a range of text in a buffer; can also -specify properties of the text, such as a face in which the text is to -be displayed, whether the text is invisible or unmodifiable, etc. -@item event -Generated by calling @code{next-event} and contains information -describing a particular event happening in the system, such as the user -pressing a key or a process terminating. -@item keymap -An object that maps from events (described using lists, vectors, and -symbols rather than with an event object because the mapping is for -classes of events, rather than individual events) to functions to -execute or other events to recursively look up; the functions are -described by name, using a symbol, or using lists to specify the -function's code. -@item glyph -An object that describes the appearance of an image (e.g. pixmap) on -the screen; glyphs can be attached to the beginning or end of extents -and in some future version of XEmacs will be able to be inserted -directly into a buffer. -@item process -An object that describes a connection to an externally-running process. -@end table - - There are some other, less-commonly-encountered general objects: - -@table @code -@item hash-table -An object that maps from an arbitrary Lisp object to another arbitrary -Lisp object, using hashing for fast lookup. -@item obarray -A limited form of hash-table that maps from strings to symbols; obarrays -are used to look up a symbol given its name and are not actually their -own object type but are kludgily represented using vectors with hidden -fields (this representation derives from GNU Emacs). 
-@item specifier -A complex object used to specify the value of a display property; a -default value is given and different values can be specified for -particular frames, buffers, windows, devices, or classes of device. -@item char-table -An object that maps from chars or classes of chars to arbitrary Lisp -objects; internally char tables use a complex nested-vector -representation that is optimized to the way characters are represented -as integers. -@item range-table -An object that maps from ranges of integers to arbitrary Lisp objects. -@end table - - And some strange special-purpose objects: - -@table @code -@item charset -@itemx coding-system -Objects used when MULE, or multi-lingual/Asian-language, support is -enabled. -@item color-instance -@itemx font-instance -@itemx image-instance -An object that encapsulates a window-system resource; instances are -mostly used internally but are exposed on the Lisp level for cleanness -of the specifier model and because it's occasionally useful for a Lisp -program to create or query the properties of instances. -@item subwindow -An object that encapsulates a @dfn{subwindow} resource, i.e. a -window-system child window that is drawn into by an external process; -this object should be integrated into the glyph system but isn't yet, -and may change form when this is done. -@item tooltalk-message -@itemx tooltalk-pattern -Objects that represent resources used in the ToolTalk interprocess -communication protocol. -@item toolbar-button -An object used in conjunction with the toolbar. -@end table - - And objects that are only used internally: - -@table @code -@item opaque -A generic object for encapsulating arbitrary memory; this allows you the -generality of @code{malloc()} and the convenience of the Lisp object -system.
-@item lstream -A buffering I/O stream, used to provide a unified interface to anything -that can accept output or provide input, such as a file descriptor, a -stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.; -it's a Lisp object to make its memory management more convenient. -@item char-table-entry -Subsidiary objects in the internal char-table representation. -@item extent-auxiliary -@itemx menubar-data -@itemx toolbar-data -Various special-purpose objects that are basically just used to -encapsulate memory for particular subsystems, similar to the more -general ``opaque'' object. -@item symbol-value-forward -@itemx symbol-value-buffer-local -@itemx symbol-value-varalias -@itemx symbol-value-lisp-magic -Special internal-only objects that are placed in the value cell of a -symbol to indicate that there is something special with this variable -- -e.g. it has no value, it mirrors another variable, or it mirrors some C -variable; there is really only one kind of object, called a -@dfn{symbol-value-magic}, but it is sort-of halfway kludged into -semi-different object types. -@end table - -@cindex permanent objects -@cindex temporary objects - Some types of objects are @dfn{permanent}, meaning that once created, -they do not disappear until explicitly destroyed, using a function such -as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc. -Others will disappear once they are no longer used, through the garbage -collection mechanism. Buffers, frames, windows, devices, and processes -are among the objects that are permanent. Note that some objects can go -both ways: Faces can be created either way; extents are normally -permanent, but detached extents (extents not referring to any text, as -happens to some extents when the text they are referring to is deleted) -are temporary. Note that some permanent objects, such as faces and -coding systems, cannot be deleted.
Note also that windows are unique in -that they can be @emph{undeleted} after having previously been -deleted. (This happens as a result of restoring a window configuration.) - -@cindex read syntax - Many types of objects have a @dfn{read syntax}, i.e. a way of -specifying an object of that type in Lisp code. When you load a Lisp -file, or type in code to be evaluated, what really happens is that the -function @code{read} is called, which reads some text and creates an object -based on the syntax of that text; then @code{eval} is called, which -possibly does something special; then this loop repeats until there's -no more text to read. (@code{eval} only actually does something special -with symbols, which causes the symbol's value to be returned, -similar to referencing a variable; and with conses [i.e. lists], -which cause a function invocation. All other values are returned -unchanged.) - - The read syntax - -@example -17297 -@end example - -converts to an integer whose value is 17297. - -@example -355/113 -@end example - -converts to a ratio commonly used to approximate @emph{pi} when ratios -are configured, and otherwise to a symbol whose name is ``355/113'' (for -backward compatibility). - -@example -1.983e-4 -@end example - -converts to a float whose value is 1.983e-4, or .0001983. - -@example -?b -@end example - -converts to a char that represents the lowercase letter b. - -@example -?^[$(B#&^[(B -@end example - -(where @samp{^[} actually is an @samp{ESC} character) converts to a -particular Kanji character when using an ISO2022-based coding system for -input. 
(To decode this goo: @samp{ESC} begins an escape sequence; -@samp{ESC $ (} is a class of escape sequences meaning ``switch to a -94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese -Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array -of characters [subtract 33 from the ASCII value of each character to get -the corresponding index]; @samp{ESC (} is a class of escape sequences -meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch -to US ASCII''. It is a coincidence that the letter @samp{B} is used to -denote both Japanese Kanji and US ASCII. If the first @samp{B} were -replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character -from the GB2312 character set.) - -@example -"foobar" -@end example - -converts to a string. - -@example -foobar -@end example - -converts to a symbol whose name is @code{"foobar"}. This is done by -looking up the string equivalent in the global variable -@code{obarray}, whose contents should be an obarray. If no symbol -is found, a new symbol with the name @code{"foobar"} is automatically -created and added to @code{obarray}; this process is called -@dfn{interning} the symbol. -@cindex interning - -@example -(foo . bar) -@end example - -converts to a cons cell containing the symbols @code{foo} and @code{bar}. - -@example -(1 a 2.5) -@end example - -converts to a three-element list containing the specified objects -(note that a list is actually a set of nested conses; see the -XEmacs Lisp Reference). - -@example -[1 a 2.5] -@end example - -converts to a three-element vector containing the specified objects. - -@example -#[... ... ... ...] -@end example - -converts to a compiled-function object (the actual contents are not -shown since they are not relevant here; look at a file that ends with -@file{.elc} for examples). - -@example -#*01110110 -@end example - -converts to a bit-vector. - -@example -#s(hash-table ... ...) 
-@end example - -converts to a hash table (the actual contents are not shown). - -@example -#s(range-table ... ...) -@end example - -converts to a range table (the actual contents are not shown). - -@example -#s(char-table ... ...) -@end example - -converts to a char table (the actual contents are not shown). - -Note that the @code{#s()} syntax is the general syntax for structures, -which are not really implemented in XEmacs Lisp but should be. - -When an object is printed out (using @code{print} or a related -function), the read syntax is used, so that the same object can be read -in again. - -The other objects do not have read syntaxes, usually because it does not -really make sense to create them in this fashion (e.g. processes, where -it doesn't make sense to have a subprocess created as a side effect of -reading some Lisp code), or because they can't be created at all -(e.g. subrs). Permanent objects, as a rule, do not have a read syntax; -nor do most complex objects, which contain too much state to be easily -initialized through a read syntax. - -@node How Lisp Objects Are Represented in C, Major Textual Changes, The XEmacs Object System (Abstractly Speaking), Top -@chapter How Lisp Objects Are Represented in C -@cindex Lisp objects are represented in C, how -@cindex objects are represented in C, how Lisp -@cindex represented in C, how Lisp objects are - -Lisp objects are represented in C using a 32-bit or 64-bit machine word -(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and -most other processors use 32-bit Lisp objects).
The representation -stuffs a pointer together with a tag, as follows: - -@example - [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] - [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] - - <---------------------------------------------------------> <-> - a pointer to a structure, or an integer tag -@end example - -A tag of 00 is used for all pointer object types, a tag of 10 is used -for characters, and the other two tags 01 and 11 are joined together to -form the integer object type. This representation gives us 31 bit -integers and 30 bit characters, while pointers are represented directly -without any bit masking or shifting. This representation, though, -assumes that pointers to structs are always aligned to multiples of 4, -so the lower 2 bits are always zero. - -Lisp objects use the typedef @code{Lisp_Object}, but the actual C type -used for the Lisp object can vary. It can be either a simple type -(@code{long} on the DEC Alpha, @code{int} on other machines) or a -structure whose fields are bit fields that line up properly (actually, a -union of structures is used). Generally the simple integral type is -preferable because it ensures that the compiler will actually use a -machine word to represent the object (some compilers will use more -general and less efficient code for unions and structs even if they can -fit in a machine word). The union type, however, has the advantage of -stricter type checking. If you accidentally pass an integer where a Lisp -object is desired, you get a compile error. The choice of which type -to use is determined by the preprocessor constant @code{USE_UNION_TYPE} -which is defined via the @code{--use-union-type} option to -@code{configure}. - -Various macros are used to convert between Lisp_Objects and the -corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()}, -@code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or -masking and cast it to the appropriate type. 
@code{XINT()} needs to be -a bit tricky so that negative numbers are properly sign-extended. Since -integers are stored left-shifted, if the right-shift operator does an -arithmetic shift (i.e. it leaves the most-significant bit as-is rather -than shifting in a zero, so that it mimics a divide-by-two even for -negative numbers) the shift to remove the tag bit is enough. This is -the case on all the systems we support. - -Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter -macros become more complicated---they check the tag bits and/or the -type field in the first four bytes of a record type to ensure that the -object is really of the correct type. This is great for catching places -where an incorrect type is being dereferenced---this typically results -in a pointer being dereferenced as the wrong type of structure, with -unpredictable (and sometimes not easily traceable) results. - -There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp -object. These macros are of the form @code{XSET@var{TYPE} -(@var{lvalue}, @var{result})}, i.e. they have to be a statement rather -than just used in an expression. The reason for this is that standard C -doesn't let you ``construct'' a structure (but GCC does). Granted, this -sometimes isn't too convenient; for the case of integers, at least, you -can use the function @code{make_int()}, which constructs and -@emph{returns} an integer Lisp object. Note that the -@code{XSET@var{TYPE}()} macros are also affected by -@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the -right type in the case of record types, where the type is contained in -the structure. - -The C programmer is responsible for @strong{guaranteeing} that a -Lisp_Object is the correct type before using the @code{X@var{TYPE}} -macros. This is especially important in the case of lists. Use -@code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, -else use @code{Fcar()} and @code{Fcdr()}. 
Trust other C code, but not -Lisp code. On the other hand, if XEmacs has an internal logic error, -it's better to crash immediately, so sprinkle @code{assert()}s and -``unreachable'' @code{abort()}s liberally about the source code. Where -performance is an issue, use @code{type_checking_assert}, -@code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do -nothing unless the corresponding configure error checking flag was -specified. - -@node Major Textual Changes, Rules When Writing New C Code, How Lisp Objects Are Represented in C, Top -@chapter Major Textual Changes -@cindex textual changes, major -@cindex major textual changes - -Sometimes major textual changes are made to the source. This means that -a search-and-replace is done to change type names and such. Some people -disagree with such changes, and certainly, if done without good reason, -they will just lead to headaches. But it's important to keep the code clean -and understandable, and consistent naming goes a long way towards this. - -An example of the right way to do this was the so-called "great integral -type renaming". - -@menu -* Great Integral Type Renaming:: -* Text/Char Type Renaming:: -@end menu - -@node Great Integral Type Renaming, Text/Char Type Renaming, Major Textual Changes, Major Textual Changes -@section Great Integral Type Renaming -@cindex Great Integral Type Renaming -@cindex integral type renaming, great -@cindex type renaming, integral -@cindex renaming, integral types - -The purpose of this is to rationalize the names used for various -integral types, so that they match their intended uses and follow -consistent conventions, and eliminate types that were not semantically -different from each other. - -The conventions are: - -@itemize @bullet -@item -All integral types that measure quantities of anything are signed.
Some -people disagree vociferously with this, but their arguments are mostly -theoretical, and are vastly outweighed by the practical headaches of -mixing signed and unsigned values, and more importantly by the far -increased likelihood of inadvertent bugs: Because of the broken "viral" -nature of unsigned quantities in C (operations involving mixed -signed/unsigned are done unsigned, when exactly the opposite is nearly -always wanted), even a single error in declaring a quantity unsigned -that should be signed, or even the even more subtle error of comparing -signed and unsigned values and forgetting the necessary cast, can be -catastrophic, as comparisons will yield wrong results. -Wsign-compare -is turned on specifically to catch this, but this tends to result in a -great number of warnings when mixing signed and unsigned, and the casts -are annoying. More has been written on this elsewhere. - -@item -All such quantity types just mentioned boil down to EMACS_INT, which is -32 bits on 32-bit machines and 64 bits on 64-bit machines. This is -guaranteed to be the same size as Lisp objects of type @code{int}, and (as -far as I can tell) of size_t (unsigned!) and ssize_t. The only type -below that is not an EMACS_INT is Hashcode, which is an unsigned value -of the same size as EMACS_INT. - -@item -Type names should be relatively short (no more than 10 characters or -so), with the first letter capitalized and no underscores if they can at -all be avoided. - -@item -"count" == a zero-based measurement of some quantity. Includes sizes, -offsets, and indexes. - -@item -"bpos" == a one-based measurement of a position in a buffer. "Charbpos" -and "Bytebpos" count text in the buffer, rather than bytes in memory; -thus Bytebpos does not directly correspond to the memory representation. -Use "Membpos" for this. - -@item -"Char" refers to internal-format characters, not to the C type "char", -which is really a byte. 
-@end itemize - -For the actual name changes, see the script below. - -I ran the following script to do the conversion. (NOTE: This script is -idempotent. You can safely run it multiple times and it will not screw -up previous results -- in fact, it will do nothing if nothing has -changed. Thus, it can be run repeatedly as necessary to handle patches -coming in from old workspaces, or old branches.) There are two tags, -just before and just after the change: @samp{pre-integral-type-rename} -and @samp{post-integral-type-rename}. When merging code from the main -trunk into a branch, the best thing to do is first merge up to -@samp{pre-integral-type-rename}, then apply the script and associated -changes, then merge from @samp{post-integral-type-rename} to the -present. (Alternatively, just do the merging in one operation; but you -may then have a lot of conflicts needing to be resolved by hand.) - -Script @samp{fixtypes.sh} follows: - -@example ------------------------------------ cut ------------------------------------ -files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" -gr Memory_Count Bytecount $files -gr Lstream_Data_Count Bytecount $files -gr Element_Count Elemcount $files -gr Hash_Code Hashcode $files -gr extcount bytecount $files -gr bufpos charbpos $files -gr bytind bytebpos $files -gr memind membpos $files -gr bufbyte intbyte $files -gr Extcount Bytecount $files -gr Bufpos Charbpos $files -gr Bytind Bytebpos $files -gr Memind Membpos $files -gr Bufbyte Intbyte $files -gr EXTCOUNT BYTECOUNT $files -gr BUFPOS CHARBPOS $files -gr BYTIND BYTEBPOS $files -gr MEMIND MEMBPOS $files -gr BUFBYTE INTBYTE $files -gr MEMORY_COUNT BYTECOUNT $files -gr LSTREAM_DATA_COUNT BYTECOUNT $files -gr ELEMENT_COUNT ELEMCOUNT $files -gr HASH_CODE HASHCODE $files ------------------------------------ cut ------------------------------------ -@end example - -The @samp{gr} script, and the scripts it uses, are documented in
-@file{README.global-renaming}, because if placed in this file they would -need to have their @@ characters doubled, meaning you couldn't easily -cut and paste from the source. - -In addition to those programs, I needed to fix up a few other -things, particularly relating to the duplicate definitions of -types, now that some types merged with others. Specifically: - -@enumerate -@item -in @file{lisp.h}, removed duplicate declarations of Bytecount. The changed -code should now look like this: (In each code snippet below, the first -and last lines are the same as the original, as are all lines outside of -those lines. That allows you to locate the section to be replaced, and -replace the stuff in that section, verifying that there isn't anything -new added that would need to be kept.) - -@example ---------------------------------- snip ------------------------------------- -/* Counts of bytes or chars */ -typedef EMACS_INT Bytecount; -typedef EMACS_INT Charcount; - -/* Counts of elements */ -typedef EMACS_INT Elemcount; - -/* Hash codes */ -typedef unsigned long Hashcode; - -/* ------------------------ dynamic arrays ------------------- */ ---------------------------------- snip ------------------------------------- -@end example - -@item -in @file{lstream.h}, removed duplicate declaration of Bytecount. Rewrote the -comment about this type. The changed code should now look like this: - -@example ---------------------------------- snip ------------------------------------- -#endif - -/* The have been some arguments over the what the type should be that - specifies a count of bytes in a data block to be written out or read in, - using @code{Lstream_read()}, @code{Lstream_write()}, and related functions. - Originally it was long, which worked fine; Martin "corrected" these to - size_t and ssize_t on the grounds that this is theoretically cleaner and - is in keeping with the C standards. 
Unfortunately, this practice is - horribly error-prone due to design flaws in the way that mixed - signed/unsigned arithmetic happens. In fact, by doing this change, - Martin introduced a subtle but fatal error that caused the operation of - sending large mail messages to the SMTP server under Windows to fail. - By putting all values back to be signed, avoiding any signed/unsigned - mixing, the bug immediately went away. The type then in use was - Lstream_Data_Count, so that it be reverted cleanly if a vote came to - that. Now it is Bytecount. - - Some earlier comments about why the type must be signed: This MUST BE - SIGNED, since it also is used in functions that return the number of - bytes actually read to or written from in an operation, and these - functions can return -1 to signal error. - - Note that the standard Unix @code{read()} and @code{write()} functions define the - count going in as a size_t, which is UNSIGNED, and the count going - out as an ssize_t, which is SIGNED. This is a horrible design - flaw. Not only is it highly likely to lead to logic errors when a - -1 gets interpreted as a large positive number, but operations are - bound to fail in all sorts of horrible ways when a number in the - upper-half of the size_t range is passed in -- this number is - unrepresentable as an ssize_t, so code that checks to see how many - bytes are actually written (which is mandatory if you are dealing - with certain types of devices) will get completely screwed up. - - --ben -*/ - -typedef enum lstream_buffering ---------------------------------- snip ------------------------------------- -@end example - -@item -in @file{dumper.c}, there are four places, all inside of @code{switch()} statements, -where XD_BYTECOUNT appears twice as a case tag. In each case, the two -case blocks contain identical code, and you should *REMOVE THE SECOND* -and leave the first. 
-@end enumerate - -@node Text/Char Type Renaming, , Great Integral Type Renaming, Major Textual Changes -@section Text/Char Type Renaming -@cindex Text/Char Type Renaming -@cindex type renaming, text/char -@cindex renaming, text/char types - -The purpose of this was - -@enumerate -@item -To distinguish between ``charptr'' when it refers to operations on -the pointer itself and when it refers to operations on text -@item -To use consistent naming for everything referring to internal format, i.e. -@end enumerate - -@example - Itext == text in internal format - Ibyte == a byte in such text - Ichar == a char as represented in internal character format -@end example - -Thus e.g. - -@example - set_charptr_emchar -> set_itext_ichar -@end example - -This was done using a script like this: - -@example -files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" -gr Intbyte Ibyte $files -gr INTBYTE IBYTE $files -gr intbyte ibyte $files -gr EMCHAR ICHAR $files -gr emchar ichar $files -gr Emchar Ichar $files -gr INC_CHARPTR INC_IBYTEPTR $files -gr DEC_CHARPTR DEC_IBYTEPTR $files -gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files -gr valid_charptr valid_ibyteptr $files -gr CHARPTR ITEXT $files -gr charptr itext $files -gr Charptr Itext $files -@end example - -See above for the source to @samp{gr}. - -As in the integral-types change, there are pre and post tags before and -after the change: - -@example - pre-internal-format-textual-renaming - post-internal-format-textual-renaming -@end example - -When merging a large branch, follow the same sort of procedure -documented above, using these tags -- essentially sync up to the pre -tag, then apply the script yourself, then sync from the post tag to the -present. You can probably do the same if you don't have a separate -workspace, but do have lots of outstanding changes and you'd rather not -just merge all the textual changes directly. 
Use something like this: - -(WARNING: I'm not a CVS guru; before trying this, or any large operation -that might potentially mess things up, @strong{DEFINITELY} make a backup of -your existing workspace.) - -@example -cup -r pre-internal-format-textual-renaming -<apply script> -cup -A -j post-internal-format-textual-renaming -j HEAD -@end example - -This might also work: - -@example -cup -j pre-internal-format-textual-renaming -<apply script> -cup -j post-internal-format-textual-renaming -j HEAD -@end example - -ben - -The following is a script to go in the opposite direction: - -@example -files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" - -# Evidently Perl considers _ to be a word char ala \b, even though XEmacs -# doesn't. We need to be careful here with ibyte/ichar because of words -# like Richard, @code{eicharlen()}, multibyte, HIBYTE, etc. - -gr Ibyte Intbyte $files -gr '\bIBYTE' INTBYTE $files -gr '\bibyte' intbyte $files -gr '\bICHAR' EMCHAR $files -gr '\bichar' emchar $files -gr '\bIchar' Emchar $files -gr '\bIBYTEPTR' CHARPTR $files -gr '\bibyteptr' charptr $files -gr '\bITEXT' CHARPTR $files -gr '\bitext' charptr $files -gr '\bItext' CHARPTR $files - -gr '_IBYTE' _INTBYTE $files -gr '_ibyte' _intbyte $files -gr '_ICHAR' _EMCHAR $files -gr '_ichar' _emchar $files -gr '_Ichar' _Emchar $files -gr '_IBYTEPTR' _CHARPTR $files -gr '_ibyteptr' _charptr $files -gr '_ITEXT' _CHARPTR $files -gr '_itext' _charptr $files -gr '_Itext' _CHARPTR $files -@end example - -@node Rules When Writing New C Code, Regression Testing XEmacs, Major Textual Changes, Top -@chapter Rules When Writing New C Code -@cindex writing new C code, rules when -@cindex C code, rules when writing new -@cindex code, rules when writing new C - -The XEmacs C Code is extremely complex and intricate, and there are many -rules that are more or less consistently followed throughout the code. 
-Many of these rules are not obvious, so they are explained here. It is -of the utmost importance that you follow them. If you don't, you may -get something that appears to work, but which will crash in odd -situations, often in code far away from where the actual breakage is. - -@menu -* A Reader's Guide to XEmacs Coding Conventions:: -* General Coding Rules:: -* Object-Oriented Techniques for C:: -* Writing Lisp Primitives:: -* Writing Good Comments:: -* Adding Global Lisp Variables:: -* Writing Macros:: -* Proper Use of Unsigned Types:: -* Techniques for XEmacs Developers:: -@end menu - -See also @ref{Coding for Mule}. - -@node A Reader's Guide to XEmacs Coding Conventions, General Coding Rules, Rules When Writing New C Code, Rules When Writing New C Code -@section A Reader's Guide to XEmacs Coding Conventions -@cindex coding conventions -@cindex reader's guide -@cindex coding rules, naming - -Of course the low-level implementation language of XEmacs is C, but much -of that uses the Lisp engine to do its work. However, because the code -is ``inside'' of the protective containment shell around the ``reactor -core,'' you'll see lots of complex ``plumbing'' needed to do the work -and ``safety mechanisms,'' whose failure results in a meltdown. This -section provides a quick overview (or review) of the various components -of the implementation of Lisp objects. - - Two typographic conventions help to identify C objects that implement -Lisp objects. The first is that capitalized identifiers, especially -those beginning with the letters @samp{Q}, @samp{V}, @samp{F}, and @samp{S}, -for C variables and functions, and C macros beginning with the -letter @samp{X}, are used to implement Lisp. The second is that where -Lisp uses the hyphen @samp{-} in symbol names, the corresponding C -identifiers use the underscore @samp{_}. 
Of course, since XEmacs Lisp -contains interfaces to many external libraries, those external names -will follow the coding conventions their authors chose, and may overlap -the ``XEmacs name space.'' However these cases are usually pretty -obvious. - - All Lisp objects are handled indirectly. The @code{Lisp_Object} -type is usually a pointer to a structure, except for a very small number -of types with immediate representations (currently characters and -integers). However, these types cannot be directly operated on in C -code, either, so they can also be considered indirect. Types that do -not have an immediate representation always have a C typedef -@code{Lisp_@var{type}} for a corresponding structure. -@c #### mention l(c)records here? - - In older code, it was common practice to pass around pointers to -@code{Lisp_@var{type}}, but this is now deprecated in favor of using -@code{Lisp_Object} for all function arguments and return values that are -Lisp objects. The @code{X@var{type}} macro is used to extract the -pointer and cast it to @code{(Lisp_@var{type} *)} for the desired type. - - @strong{Convention}: macros whose names begin with @samp{X} operate on -@code{Lisp_Object}s and do no type-checking. Many such macros are type -extractors, but others implement Lisp operations in C (@emph{e.g.}, -@code{XCAR} implements the Lisp @code{car} function). These are unsafe, -and must only be used where types of all data have already been checked. -Such macros are only applied to @code{Lisp_Object}s. In internal -implementations where the pointer has already been converted, the -structure is operated on directly using the C @code{->} member access -operator. - - The @code{@var{type}P}, @code{CHECK_@var{type}}, and -@code{CONCHECK_@var{type}} macros are used to test types. The first -returns a Boolean value, and the latter signal errors. (The -@samp{CONCHECK} variety allows execution to be CONtinued under some -circumstances, thus the name.) 
Functions which expect to be passed user -data invariably call @samp{CHECK} macros on arguments. - - There are many types of specialized Lisp objects implemented in C, but -the most pervasive type is the @dfn{symbol}. Symbols are used as -identifiers, variables, and functions. - - @strong{Convention}: Global variables whose names begin with @samp{Q} -are constants whose value is a symbol. The name of the variable should -be derived from the name of the symbol using the same rules as for Lisp -primitives. Such variables allow the C code to check whether a -particular @code{Lisp_Object} is equal to a given symbol. Symbols are -Lisp objects, so these variables may be passed to Lisp primitives. (An -alternative to the use of @samp{Q...} variables is to call the -@code{intern} function at initialization in the -@code{vars_of_@var{module}} function, which is hardly less efficient.) - - @strong{Convention}: Global variables whose names begin with @samp{V} -are variables that contain Lisp objects. The convention here is that -all global variables of type @code{Lisp_Object} begin with @samp{V}, and -no others do (not even integer and boolean variables that have Lisp -equivalents). Most of the time, these variables have equivalents in -Lisp, which are defined via the @samp{DEFVAR} family of macros, but some -don't. Since the variable's value is a @code{Lisp_Object}, it can be -passed to Lisp primitives. - - The implementation of Lisp primitives is more complex. -@strong{Convention}: Global variables with names beginning with @samp{S} -contain a structure that allows the Lisp engine to identify and call a C -function. In modern versions of XEmacs, these identifiers are almost -always completely hidden in the @code{DEFUN} and @code{SUBR} macros, but -you will encounter them if you look at very old versions of XEmacs or at -GNU Emacs. @strong{Convention}: Functions with names beginning with -@samp{F} implement Lisp primitives. 
Of course all their arguments and -their return values must be Lisp_Objects. (This is hidden in the -@code{DEFUN} macro.) - - -@node General Coding Rules, Object-Oriented Techniques for C, A Reader's Guide to XEmacs Coding Conventions, Rules When Writing New C Code -@section General Coding Rules -@cindex coding rules, general - -The C code is actually written in a dialect of C called @dfn{Clean C}, -meaning that it can be compiled, mostly warning-free, with either a C or -C++ compiler. Coding in Clean C has several advantages over plain C. -C++ compilers are more nit-picking, and a number of coding errors have -been found by compiling with C++. The ability to use both C and C++ -tools means that a greater variety of development tools are available to -the developer. In addition, the ability to overload operators in C++ -means it is possible, for error-checking purposes, to redefine certain -simple types (normally defined as aliases for simple built-in types such -as @code{unsigned char} or @code{long}) as classes, strictly limiting the permissible -operations and catching illegal implicit casts and such. - -Every module includes @file{<config.h>} (angle brackets so that -@samp{--srcdir} works correctly; @file{config.h} may or may not be in -the same directory as the C sources) and @file{lisp.h}. @file{config.h} -must always be included before any other header files (including -system header files) to ensure that certain tricks played by various -@file{s/} and @file{m/} files work out correctly. - -When including header files, always use angle brackets, not double -quotes, except when the file to be included is always in the same -directory as the including file. If either file is a generated file, -then that is not likely to be the case. In order to understand why we -have this rule, imagine what happens when you do a build in the source -directory using @samp{./configure} and another build in another -directory using @samp{../work/configure}. 
There will be two different -@file{config.h} files. Which one will be used if you @samp{#include -"config.h"}? - -Almost every module contains a @code{syms_of_*()} function and a -@code{vars_of_*()} function. The former declares any Lisp primitives -you have defined and defines any symbols you will be using. The latter -declares any global Lisp variables you have added and initializes global -C variables in the module. @strong{Important}: There are stringent -requirements on exactly what can go into these functions. See the -comment in @file{emacs.c}. The reason for this is to avoid obscure -unwanted interactions during initialization. If you don't follow these -rules, you'll be sorry! If you want to do anything that isn't allowed, -create a @code{complex_vars_of_*()} function for it. Doing this is -tricky, though: you have to make sure your function is called at the -right time so that all the initialization dependencies work out. - -Declare each function of these kinds in @file{symsinit.h}. Make sure -it's called in the appropriate place in @file{emacs.c}. You never need -to include @file{symsinit.h} directly, because it is included by -@file{lisp.h}. - -@strong{All global and static variables that are to be modifiable must -be declared uninitialized.} This means that you may not use the -``declare with initializer'' form for these variables, such as @code{int -some_variable = 0;}. The reason for this has to do with some kludges -done during the dumping process: If possible, the initialized data -segment is re-mapped so that it becomes part of the (unmodifiable) code -segment in the dumped executable. This allows this memory to be shared -among multiple running XEmacs processes. XEmacs is careful to place as -much constant data as possible into initialized variables during the -@file{temacs} phase. 
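A minimal sketch of the rule above, using a hypothetical module @file{foo.c}. The names @code{foo_counter}, @code{foo_bump} and @code{vars_of_foo} are illustrative assumptions and do not come from the XEmacs sources. The modifiable variable is declared without an initializer and receives its starting value at run time from the module's @code{vars_of_*()} function:

```c
#include <assert.h>

/* Modifiable global state: declared WITHOUT an initializer, so that it is
   not placed in the initialized data segment, which may be remapped as
   read-only in the dumped executable. */
static int foo_counter;

/* Hypothetical vars_of_*() function for this module.  It assigns the
   starting value at run time instead of "static int foo_counter = 0;". */
void
vars_of_foo (void)
{
  foo_counter = 0;
}

/* Ordinary code may then modify the variable freely. */
int
foo_bump (void)
{
  assert (foo_counter >= 0);
  return ++foo_counter;
}
```

In a real module, @code{vars_of_foo()} would additionally have to be declared in @file{symsinit.h} and called from the appropriate place in @file{emacs.c}, as described earlier in this section.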
- -@cindex copy-on-write -@strong{Please note:} This kludge only works on a few systems nowadays, -and is rapidly becoming irrelevant because most modern operating systems -provide @dfn{copy-on-write} semantics. All data is initially shared -between processes, and a private copy is automatically made (on a -page-by-page basis) when a process first attempts to write to a page of -memory. - -Formerly, there was a requirement that static variables not be declared -inside of functions. This had to do with another hack along the same -vein as what was just described: old USG systems put statically-declared -variables in the initialized data space, so those header files had a -@code{#define static} declaration. (That way, the data-segment remapping -described above could still work.) This fails badly on static variables -inside of functions, which suddenly become automatic variables; -therefore, you weren't supposed to have any of them. This awful kludge -has been removed in XEmacs because - -@enumerate -@item -almost all of the systems that used this kludge ended up having -to disable the data-segment remapping anyway; -@item -the only systems that didn't were extremely outdated ones; -@item -this hack completely messed up inline functions. -@end enumerate - -The C source code makes heavy use of C preprocessor macros. One popular -macro style is: - -@example -#define FOO(var, value) do @{ \ - Lisp_Object FOO_value = (value); \ - ... /* compute using FOO_value */ \ - (var) = bar; \ -@} while (0) -@end example - -The @code{do @{...@} while (0)} is a standard trick to allow FOO to have -statement semantics, so that it can safely be used within an @code{if} -statement in C, for example. Multiple evaluation is prevented by -copying a supplied argument into a local variable, so that -@code{FOO(var,fun(1))} only calls @code{fun} once. - -Lisp lists are popular data structures in the C code as well as in -Elisp. There are two sets of macros that iterate over lists. 
-@code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been -supplied by the user, and cannot be trusted to be acyclic and -@code{nil}-terminated. A @code{malformed-list} or @code{circular-list} error -will be generated if the list being iterated over is not entirely -kosher. @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less -safe, and can be used only on trusted lists. - -Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and -@code{GET_LIST_LENGTH}, which calculate the length of a list, and in the -case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of -the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and -@code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some -predicate. - -@node Object-Oriented Techniques for C, Writing Lisp Primitives, General Coding Rules, Rules When Writing New C Code -@section Object-Oriented Techniques for C -@cindex coding rules, object-oriented -@cindex object-oriented techniques - -At the lowest levels, XEmacs makes heavy use of object-oriented -techniques to promote code-sharing and uniform interfaces for different -devices and platforms. Commonly, but not always, such objects are -``wrapped'' and exported to Lisp as Lisp objects. Usually they use -the internal structures developed for Lisp objects (the @samp{lrecord} -structure) in order to take advantage of Lisp memory management. -Unfortunately, XEmacs was originally written in C, so these techniques -are based on heavy use of C macros. - -@c You can't use @var{} for type below, because case is important. -A module defining a class is likely to use most of the following -declarations and macros. In the following, the notation @samp{<type>} -will stand for the full name of the class, and will be capitalized in -the way normal for its context. The notation @samp{<typ>} will stand -for the abbreviated form commonly used in macro names, while @samp{ty} -will be used as the typical name for instances of the class. 
(See the -entry for @samp{MAYBE_<TYP>METH} below for an example using all three -notations.) - -In the interface (@file{.h} file), the following declarations are used -often. Others may be used in particular modules. Since they're -quite short in most cases, the definitions are given as well. The -generic macros used are defined in @file{lisp.h} or @file{lrecord.h}. - -@c #### reorganize this table into stuff used in general code, and stuff -@c used only in declarations or initializations -@table @samp -@c #### declaration -@item typedef struct Lisp_<Type> Lisp_<Type> -This refers to the internal structure used by C code. The XEmacs coding -style now forbids passing pointers to @samp{Lisp_<Type>} structures into -or out of a function; instead, a @samp{Lisp_Object} should be passed or -returned (created using @samp{wrap_<type>}, if necessary). - -@c #### declaration -@item DECLARE_LRECORD (<type>, Lisp_<Type>) -Declares an @samp{lrecord} for @samp{<Type>}, which is the unit of -allocation. - -@item #define X<TYPE>(x) XRECORD (x, <type>, Lisp_<Type>) -Turns a @code{Lisp_Object} into a pointer to @samp{struct Lisp_<Type>}. - -@item #define wrap_<type>(p) wrap_record (p, <type>) -Turns a pointer to @samp{struct Lisp_<Type>} into a @code{Lisp_Object}. - -@item #define <TYPE>P(x) RECORDP (x, <type>) -Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>}. -Returns a C int, not a Lisp Boolean value. - -@item #define CHECK_<TYPE>(x) CHECK_RECORD (x, <type>) -@itemx #define CONCHECK_<TYPE>(x) CONCHECK_RECORD (x, <type>) -Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>}, -and signals a Lisp error if not. The @samp{CHECK} version of the macro -never returns if the type is wrong, while the @samp{CONCHECK} version -can return if the user catches it in the debugger and explicitly -requests a return. 
- -@item #define RAW_<TYP>METH(ty, m) ((ty)->methods->m##_method) -Return a function pointer for the method for an object @var{TY} of class -@samp{Lisp_<Type>}, or @samp{NULL} if there is none for this type. - -@item #define HAS_<TYP>METH_P(ty, m) (!!RAW_<TYP>METH (ty, m)) -Test whether the class that @var{TY} is an instance of has the method. - -@item #define <TYP>METH(ty, m, args) ((RAW_<TYP>METH (ty, m)) args) -Call the method on @samp{args}. @samp{args} must be enclosed in -parentheses in the call. It is the programmer's responsibility to -ensure that the method is available. The standard convenience macro -@samp{MAYBE_<TYP>METH} is often provided for the common case where a -void-returning method of @samp{Type} is called. - -@item #define MAYBE_<TYP>METH(ty, m, args) do @{ ... @} while (0) -Call a void-returning @samp{<Type>} method, if it exists. Note the use -of the @samp{do ... while (0)} idiom to give the macro call C statement -semantics. The full definition is equally idiomatic: - -@example -#define MAYBE_<TYP>METH(ty, m, args) do @{ \ - Lisp_<Type> *maybe_<typ>meth_ty = (ty); \ - if (HAS_<TYP>METH_P (maybe_<typ>meth_ty, m)) \ - <TYP>METH (maybe_<typ>meth_ty, m, args); \ -@} while (0) -@end example -@end table - -The use of macros for invoking an object's methods makes life a bit -difficult for the student or maintainer when browsing the code. In -particular, calls are of the form @samp{<TYP>METH (ty, some_method, (x, -y))}, but definitions typically are for @samp{<subtype>_some_method}. -Thus, when you are trying to find calls, you need to grep for -@samp{some_method}, but this will also catch calls and definitions of -that method for instances of other subtypes of @samp{<Type>}, and there -may be a rather large number of them. 
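The method macros in the table above can be tried outside XEmacs with a small stand-alone sketch. The class @code{gadget}, its @code{tty} and @code{mute} subtypes, and every identifier below are hypothetical. Only the shape of the four macros follows the definitions given in the table, and plain structures are used instead of @samp{lrecord}s:

```c
#include <assert.h>
#include <stddef.h>

struct gadget;                        /* hypothetical class, not an lrecord */

struct gadget_methods
{
  /* Each slot is named <method>_method so the macros can paste the name. */
  void (*beep_method) (struct gadget *g, int *beeps);
  int (*width_method) (struct gadget *g);
};

struct gadget
{
  const struct gadget_methods *methods;
  int columns;
};

#define RAW_GADGETMETH(ty, m) ((ty)->methods->m##_method)
#define HAS_GADGETMETH_P(ty, m) (!!RAW_GADGETMETH (ty, m))
#define GADGETMETH(ty, m, args) ((RAW_GADGETMETH (ty, m)) args)
#define MAYBE_GADGETMETH(ty, m, args) do {		\
  struct gadget *maybe_gadgetmeth_ty = (ty);		\
  if (HAS_GADGETMETH_P (maybe_gadgetmeth_ty, m))	\
    GADGETMETH (maybe_gadgetmeth_ty, m, args);		\
} while (0)

/* Method definitions follow the <subtype>_<method> naming pattern. */
void tty_beep (struct gadget *g, int *beeps) { (void) g; ++*beeps; }
int tty_width (struct gadget *g) { return g->columns; }

/* One method table per subtype; "mute" deliberately has no beep method. */
const struct gadget_methods tty_methods = { tty_beep, tty_width };
const struct gadget_methods mute_methods = { NULL, tty_width };

/* Calls the optional void-returning method twice; returns the beep count. */
int
count_beeps (const struct gadget_methods *methods)
{
  struct gadget g;
  int beeps = 0;

  g.methods = methods;
  g.columns = 80;
  MAYBE_GADGETMETH (&g, beep, (&g, &beeps));
  MAYBE_GADGETMETH (&g, beep, (&g, &beeps));
  return beeps;
}

/* A value-returning method: the caller must know the method exists. */
int
gadget_width (struct gadget *g)
{
  assert (HAS_GADGETMETH_P (g, width));
  return GADGETMETH (g, width, (g));
}
```

Note that the call sites mention only @samp{beep} and @samp{width}, while the definitions are named @samp{tty_beep} and @samp{tty_width}, which is the grep problem described above.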
- - -@node Writing Lisp Primitives, Writing Good Comments, Object-Oriented Techniques for C, Rules When Writing New C Code -@section Writing Lisp Primitives -@cindex writing Lisp primitives -@cindex Lisp primitives, writing -@cindex primitives, writing Lisp - -Lisp primitives are Lisp functions implemented in C. The details of -interfacing the C function so that Lisp can call it are handled by a few -C macros. The only way to really understand how to write new C code is -to read the source, but we can explain some things here. - -An example of a special form is the definition of @code{prog1}, from -@file{eval.c}. (An ordinary function would have the same general -appearance.) - -@cindex garbage collection protection -@smallexample -@group -DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /* -Similar to `progn', but the value of the first form is returned. -\(prog1 FIRST BODY...): All the arguments are evaluated sequentially. -The value of FIRST is saved during evaluation of the remaining args, -whose values are discarded. -*/ - (args)) -@{ - /* This function can GC */ - REGISTER Lisp_Object val, form, tail; - struct gcpro gcpro1; - - val = Feval (XCAR (args)); - - GCPRO1 (val); - - LIST_LOOP_3 (form, XCDR (args), tail) - Feval (form); - - UNGCPRO; - return val; -@} -@end group -@end smallexample - - Let's start with a precise explanation of the arguments to the -@code{DEFUN} macro. Here is a template for them: - -@example -@group -DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /* -@var{docstring} -*/ - (@var{arglist})) -@end group -@end example - -@table @var -@item lname -This string is the name of the Lisp symbol to define as the function -name; in the example above, it is @code{"prog1"}. - -@item fname -This is the C function name for this function. This is the name that is -used in C code for calling the function. 
The name is, by convention, -@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the -Lisp name changed to underscores. Thus, to call this function from C -code, call @code{Fprog1}. Remember that the arguments are of type -@code{Lisp_Object}; various macros and functions for creating values of -type @code{Lisp_Object} are declared in the file @file{lisp.h}. - -Primitives whose names are special characters (e.g. @code{+} or -@code{<}) are named by spelling out, in some fashion, the special -character: e.g. @code{Fplus()} or @code{Flss()}. Primitives whose names -begin with normal alphanumeric characters but also contain special -characters are spelled out in some creative way, e.g. @code{let*} -becomes @code{FletX()}. - -Each function also has an associated structure that holds the data for -the subr object that represents the function in Lisp. This structure -conveys the Lisp symbol name to the initialization routine that will -create the symbol and store the subr object as its definition. The C -variable name of this structure is always @samp{S} prepended to the -@var{fname}. You hardly ever need to be aware of the existence of this -structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the -details. - -@item min_args -This is the minimum number of arguments that the function requires. The -function @code{prog1} allows a minimum of one argument. - -@item max_args -This is the maximum number of arguments that the function accepts, if -there is a fixed maximum. Alternatively, it can be @code{UNEVALLED}, -indicating a special form that receives unevaluated arguments, or -@code{MANY}, indicating an unlimited number of evaluated arguments (the -C equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY} -are macros. If @var{max_args} is a number, it may not be less than -@var{min_args} and it may not be greater than 8. (If you need to add a -function with more than 8 arguments, use the @code{MANY} form. 
Resist -the urge to edit the definition of @code{DEFUN} in @file{lisp.h}. If -you do it anyways, make sure to also add another clause to the switch -statement in @code{primitive_funcall().}) - -@item interactive -This is an interactive specification, a string such as might be used as -the argument of @code{interactive} in a Lisp function. In the case of -@code{prog1}, it is 0 (a null pointer), indicating that @code{prog1} -cannot be called interactively. A value of @code{""} indicates a -function that should receive no arguments when called interactively. - -@item docstring -This is the documentation string. It is written just like a -documentation string for a function defined in Lisp; in particular, the -first line should be a single sentence. Note how the documentation -string is enclosed in a comment, none of the documentation is placed on -the same lines as the comment-start and comment-end characters, and the -comment-start characters are on the same line as the interactive -specification. @file{make-docfile}, which scans the C files for -documentation strings, is very particular about what it looks for, and -will not properly extract the doc string if it's not in this exact format. - -In order to make both @file{etags} and @file{make-docfile} happy, make -sure that the @code{DEFUN} line contains the @var{lname} and -@var{fname}, and that the comment-start characters for the doc string -are on the same line as the interactive specification, and put a newline -directly after them (and before the comment-end characters). - -@item arglist -This is the comma-separated list of arguments to the C function. For a -function with a fixed maximum number of arguments, provide a C argument -for each Lisp argument. In this case, unlike regular C functions, the -types of the arguments are not declared; they are simply always of type -@code{Lisp_Object}. 
- -The names of the C arguments will be used as the names of the arguments -to the Lisp primitive as displayed in its documentation, modulo the same -concerns described above for @code{F...} names (in particular, -underscores in the C arguments become dashes in the Lisp arguments). - -There is one additional kludge: A trailing @samp{_} on the C argument is -discarded when forming the Lisp argument. This allows C language -reserved words (like @code{default}) or global symbols (like -@code{dirname}) to be used as argument names without compiler warnings -or errors. - -A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a -@w{@dfn{special form}}; its arguments are not evaluated. Instead it -receives one argument of type @code{Lisp_Object}, a (Lisp) list of the -unevaluated arguments, conventionally named @code{(args)}. - -When a Lisp function has no upper limit on the number of arguments, -specify @w{@var{max_args} = @code{MANY}}. In this case its implementation in -C actually receives exactly two arguments: the number of Lisp arguments -(an @code{int}) and the address of a block containing their values (a -@w{@code{Lisp_Object *}}). In this case only are the C types specified -in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}. - -@end table - -Within the function @code{Fprog1} itself, note the use of the macros -@code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect'' -a variable from garbage collection---to inform the garbage collector -that it must look in that variable and regard the object pointed at by -its contents as an accessible object. This is necessary whenever you -call @code{Feval} or anything that can directly or indirectly call -@code{Feval} (this includes the @code{QUIT} macro!). At such a time, -any Lisp object that you intend to refer to again must be protected -somehow. @code{UNGCPRO} cancels the protection of the variables that -are protected in the current function. It is necessary to do this -explicitly. 
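To make the idea of ``protecting'' a variable concrete, here is a toy model that can be compiled on its own. It is a sketch under stated assumptions, not the XEmacs implementation: @code{Toy_Object}, @code{toy_gcprolist}, @code{TOY_GCPRO1} and @code{TOY_UNGCPRO} are hypothetical stand-ins, and the model assumes that protection works by linking a record for the variable's address into a chain that a collector could walk.

```c
#include <assert.h>
#include <stddef.h>

typedef long Toy_Object;            /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;           /* previously registered record */
  Toy_Object *var;                  /* address of the protected variable */
};

/* Head of the chain of protected variables (empty at program start). */
struct toy_gcpro *toy_gcprolist;

/* Like the real macros, these implicitly use a local named gcpro1. */
#define TOY_GCPRO1(v) do {		\
  gcpro1.next = toy_gcprolist;		\
  gcpro1.var = &(v);			\
  toy_gcprolist = &gcpro1;		\
} while (0)

#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* Counts the variables a collector would currently treat as roots. */
int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;

  for (p = toy_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}

/* Protects one local, then reports how many roots were visible meanwhile. */
int
toy_protected_during_call (void)
{
  Toy_Object val = 42;
  struct toy_gcpro gcpro1;          /* must be declared explicitly */
  int seen;

  TOY_GCPRO1 (val);
  seen = toy_count_protected ();    /* a real function might call Feval here */
  TOY_UNGCPRO;                      /* protection must be cancelled by hand */
  return seen;
}
```

The record lives in the protecting function's own stack frame, which is one way to see why the protection has to be cancelled explicitly before that function returns.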
- -The macro @code{GCPRO1} protects just one local variable. If you want -to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will -not work. Macros @code{GCPRO3} and @code{GCPRO4} also exist. - -These macros implicitly use local variables such as @code{gcpro1}; you -must declare these explicitly, with type @code{struct gcpro}. Thus, if -you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}. - -@cindex caller-protects (@code{GCPRO} rule) -Note also that the general rule is @dfn{caller-protects}; i.e. you are -only responsible for protecting those Lisp objects that you create. Any -objects passed to you as arguments should have been protected by whoever -created them, so you don't in general have to protect them. - -In particular, the arguments to any Lisp primitive are always -automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or -bytecode. So only a few Lisp primitives that are called frequently from -C code, such as @code{Fprogn} protect their arguments as a service to -their caller. You don't need to protect your arguments when writing a -new @code{DEFUN}. - -@code{GCPRO}ing is perhaps the trickiest and most error-prone part of -XEmacs coding. It is @strong{extremely} important that you get this -right and use a great deal of discipline when writing this code. -@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this. - -What @code{DEFUN} actually does is declare a global structure of type -@code{Lisp_Subr} whose name begins with capital @samp{SF} and which -contains information about the primitive (e.g. a pointer to the -function, its minimum and maximum allowed arguments, a string describing -its Lisp name); @code{DEFUN} then begins a normal C function declaration -using the @code{F...} name. The Lisp subr object that is the function -definition of a primitive (i.e. 
the object in the function slot of the -symbol that names the primitive) actually points to this @samp{SF} -structure; when @code{Feval} encounters a subr, it looks in the -structure to find out how to call the C function. - -Defining the C function is not enough to make a Lisp primitive -available; you must also create the Lisp symbol for the primitive (the -symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr -object in its function cell. (If you don't do this, the primitive won't -be seen by Lisp code.) The code looks like this: - -@example -DEFSUBR (@var{fname}); -@end example - -@noindent -Here @var{fname} is the same name you used as the second argument to -@code{DEFUN}. - -This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function -at the end of the module. If no such function exists, create it and -make sure to also declare it in @file{symsinit.h} and call it from the -appropriate spot in @code{main()}. @xref{General Coding Rules}. - -Note that C code cannot call functions by name unless they are defined -in C. The way to call a function written in Lisp from C is to use -@code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since -the Lisp function @code{funcall} accepts an unlimited number of -arguments, in C it takes two: the number of Lisp-level arguments, and a -one-dimensional array containing their values. The first Lisp-level -argument is the Lisp function to call, and the rest are the arguments to -pass to it. Since @code{Ffuncall} can call the evaluator, you must -protect pointers from garbage collection around the call to -@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of -its parameters, so you don't have to protect any pointers passed as -parameters to it.) - -The C functions @code{call0}, @code{call1}, @code{call2}, and so on, -provide handy ways to call a Lisp function conveniently with a fixed -number of arguments. They work by calling @code{Ffuncall}. 
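The calling convention described above (an argument count plus an array whose first element is the function to call) can be sketched with a small self-contained mock. This is not XEmacs code: @code{mock_object}, @code{mock_funcall}, @code{mock_call2}, @code{mock_plus} and the other names are hypothetical stand-ins for @code{Lisp_Object}, @code{Ffuncall}, @code{call2} and a Lisp-level function, and the sketch ignores garbage-collection protection entirely.

```c
/* Mock of the Ffuncall calling convention; NOT the XEmacs implementation.
   Every name in this block is hypothetical. */

typedef struct mock_object mock_object;
typedef mock_object (*mock_fn) (int nargs, mock_object *args);

/* Stand-in for Lisp_Object: either an integer or a callable. */
struct mock_object
{
  enum { MOCK_INT, MOCK_FUNCTION } tag;
  union
  {
    long integer;
    mock_fn function;
  } u;
};

static mock_object
make_int (long n)
{
  mock_object o;
  o.tag = MOCK_INT;
  o.u.integer = n;
  return o;
}

static mock_object
make_function (mock_fn fn)
{
  mock_object o;
  o.tag = MOCK_FUNCTION;
  o.u.function = fn;
  return o;
}

/* Stand-in for a function that accepts any number of integer arguments. */
static mock_object
mock_plus (int nargs, mock_object *args)
{
  long sum = 0;
  int i;
  for (i = 0; i < nargs; i++)
    sum += args[i].u.integer;
  return make_int (sum);
}

/* args[0] is the function to call; args[1] .. args[nargs - 1] are the
   arguments passed on to it. */
static mock_object
mock_funcall (int nargs, mock_object *args)
{
  return args[0].u.function (nargs - 1, args + 1);
}

/* Fixed-arity convenience wrapper: packs its parameters into an array
   and hands them to mock_funcall. */
static mock_object
mock_call2 (mock_object fn, mock_object arg0, mock_object arg1)
{
  mock_object args[3];
  args[0] = fn;
  args[1] = arg0;
  args[2] = arg1;
  return mock_funcall (3, args);
}

/* Usage: add two numbers by going through the wrapper. */
static long
add_via_call2 (long a, long b)
{
  return mock_call2 (make_function (mock_plus),
                     make_int (a), make_int (b)).u.integer;
}
```

In the mock, the fixed-arity wrapper only builds the argument block; the dispatch itself happens in one place.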
- -@file{eval.c} is a very good file to look through for examples; -@file{lisp.h} contains the definitions for important macros and -functions. - -@node Writing Good Comments, Adding Global Lisp Variables, Writing Lisp Primitives, Rules When Writing New C Code -@section Writing Good Comments -@cindex writing good comments -@cindex comments, writing good - -Comments are a lifeline for programmers trying to understand tricky -code. In general, the less obvious it is what you are doing, the more -you need a comment, and the more detailed it needs to be. You should -always be on guard when you're writing code for stuff that's tricky, and -should constantly be putting yourself in someone else's shoes and asking -if that person could figure out without much difficulty what's going -on. (Assume they are a competent programmer who understands the -essentials of how the XEmacs code is structured but doesn't know much -about the module you're working on or any algorithms you're using.) If -you're not sure whether they would be able to, add a comment. Always -err on the side of more comments, rather than less. - -Generally, when making comments, there is no need to attribute them with -your name or initials. This especially goes for small, -easy-to-understand, non-opinionated ones. Also, comments indicating -where, when, and by whom a file was changed are @emph{strongly} -discouraged, and in general will be removed as they are discovered. -This is exactly what @file{ChangeLogs} are there for. However, it can -occasionally be useful to mark exactly where (but not when or by whom) -changes are made, particularly when making small changes to a file -imported from elsewhere. These marks help when later on a newer version -of the file is imported and the changes need to be merged. (If -everything were always kept in CVS, there would be no need for this. 
-But in practice, this often doesn't happen, or the CVS repository is -later lost or unavailable to the person doing the update.) - -When putting an explicit opinion in a comment, you should -@emph{always} attribute it with your name and the date. This also goes -for long, complex comments explaining in detail the workings of -something -- by putting your name there, you make it possible for -someone who has questions about how that thing works to determine who -wrote the comment so they can write to them. Use your actual name or -your alias at xemacs.org, and not your initials or nickname, unless that -is generally recognized (e.g. @samp{jwz}). Even then, please consider -requesting a virtual user at xemacs.org (forwarding address; we can't -provide an actual mailbox). Otherwise, give first and last name. If -you're not a regular contributor, you might consider putting your email -address in -- it may be in the ChangeLog, but after a while ChangeLogs -have a tendency to disappear or get muddled. (E.g. your comment -may get copied somewhere else or even into another program, and tracking -down the proper ChangeLog may be very difficult.) - -If you come across an opinion that is not valid, or is no longer valid, or you -come across any comment that no longer applies but you want to keep it -around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment -afterwards explaining why the preceding comment is no longer valid. Put -your name on this comment, as explained above. - -Just as comments are a lifeline to programmers, incorrect comments are -death. If you come across an incorrect comment, @strong{immediately} -correct it or flag it as incorrect, as described in the previous -paragraph. Whenever you work on a section of code, @emph{always} make -sure to update any comments to be correct -- or, at the very least, flag -them as incorrect. - -To indicate a "todo" or other problem, use four pound signs -- -i.e. @samp{####}. 
- -@node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code -@section Adding Global Lisp Variables -@cindex global Lisp variables, adding -@cindex variables, adding global Lisp - -Global variables whose names begin with @samp{Q} are constants whose -value is a symbol of a particular name. The name of the variable should -be derived from the name of the symbol using the same rules as for Lisp -primitives. These variables are initialized using a call to -@code{defsymbol()} in the @code{syms_of_*()} function. (This call -interns a symbol, sets the C variable to the resulting Lisp object, and -calls @code{staticpro()} on the C variable to tell the -garbage-collection mechanism about this variable. What -@code{staticpro()} does is add a pointer to the variable to a large -global array; when garbage-collection happens, all pointers listed in -the array are used as starting points for marking Lisp objects. This is -important because it's quite possible that the only current reference to -the object is the C variable. In the case of symbols, the -@code{staticpro()} doesn't matter all that much because the symbol is -contained in @code{obarray}, which is itself @code{staticpro()}ed. -However, it's possible that a naughty user could do something like -uninterning the symbol out of @code{obarray} or even setting -@code{obarray} to a different value [although this is likely to make -XEmacs crash!].) - - @strong{Please note:} It is potentially deadly if you declare a -@samp{Q...} variable in two different modules. The two calls to -@code{defsymbol()} are no problem, but some linkers will complain about -multiply-defined symbols. The most insidious aspect of this is that -often the link will succeed anyway, but then the resulting executable -will sometimes crash in obscure ways during certain operations! 
- -To avoid this problem, declare any symbols with common names (such as -@code{text}) that are not obviously associated with this particular -module in the file @file{general-slots.h}. The ``-slots'' suffix -indicates that this is a file that is included multiple times in -@file{general.c}. Redefinition of preprocessor macros allows the -effects to be different in each context, so this is actually more -convenient and less error-prone than doing it in your module. - - Global variables whose names begin with @samp{V} are variables that -contain Lisp objects. The convention here is that all global variables -of type @code{Lisp_Object} begin with @samp{V}, and all others don't -(including integer and boolean variables that have Lisp -equivalents). Most of the time, these variables have equivalents in -Lisp, but some don't. Those that do are declared this way by a call to -@code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the -module. What this does is create a special @dfn{symbol-value-forward} -Lisp object that contains a pointer to the C variable, intern a symbol -whose name is as specified in the call to @code{DEFVAR_LISP()}, and set -its value to the symbol-value-forward Lisp object; it also calls -@code{staticpro()} on the C variable to tell the garbage-collection -mechanism about the variable. When @code{eval} (or actually -@code{symbol-value}) encounters this special object in the process of -retrieving a variable's value, it follows the indirection to the C -variable and gets its value. @code{setq} does similar things so that -the C variable gets changed. - - Whether or not you @code{DEFVAR_LISP()} a variable, you need to -initialize it in the @code{vars_of_*()} function; otherwise it will end -up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and -this is probably not what you want. 
Also, if the variable is not -@code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the -C variable in the @code{vars_of_*()} function. Otherwise, the -garbage-collection mechanism won't know that the object in this variable -is in use, and will happily collect it and reuse its storage for another -Lisp object, and you will be the one who's unhappy when you can't figure -out how your variable got overwritten. - -@node Writing Macros, Proper Use of Unsigned Types, Adding Global Lisp Variables, Rules When Writing New C Code -@section Writing Macros -@cindex writing macros -@cindex macros, writing - -The three golden rules of macros: - -@enumerate -@item -Anything that's an lvalue can be evaluated more than once. -@item -Macros where anything else can be evaluated more than once should -have the word "unsafe" in their name (exceptions may be made for -large sets of macros that evaluate arguments of certain types more -than once, e.g. struct buffer * arguments, when clearly indicated in -the macro documentation). These macros are generally meant to be -called only by other macros that have already stored the calling -values in temporary variables. -@item -Nothing else can be evaluated more than once. Use inline -functions, if necessary, to prevent multiple evaluation. -@end enumerate - -NOTE: The functions and macros below are given full prototypes in their -docs, even when the implementation is a macro. In such cases, passing -an argument of a type other than expected will produce undefined -results. Also, given that macros can do things functions can't (in -particular, directly modify arguments as if they were passed by -reference), the declaration syntax has been extended to include the -call-by-reference syntax from C++, where an & after a type indicates -that the argument is an lvalue and is passed by reference, i.e. the -function can modify its value. 
(This is equivalent in C to passing a -pointer to the argument, but without the need to explicitly worry about -pointers.) - -When to capitalize macros: - -@itemize @bullet -@item -Capitalize macros doing stuff obviously impossible with (C) -functions, e.g. directly modifying arguments as if they were passed by -reference. -@item -Capitalize macros that evaluate @strong{any} argument more than once regardless -of whether that's "allowed" (e.g. buffer arguments). -@item -Capitalize macros that directly access a field in a Lisp_Object or -its equivalent underlying structure. In such cases, access through the -Lisp_Object precedes the macro with an X, and access through the underlying -structure doesn't. -@item -Capitalize certain other basic macros relating to Lisp_Objects; e.g. -FRAMEP, CHECK_FRAME, etc. -@item -Try to avoid capitalizing any other macros. -@end itemize - -@node Proper Use of Unsigned Types, Techniques for XEmacs Developers, Writing Macros, Rules When Writing New C Code -@section Proper Use of Unsigned Types -@cindex unsigned types, proper use of -@cindex types, proper use of unsigned - -Avoid using @code{unsigned int} and @code{unsigned long} whenever -possible. Unsigned types are viral -- any arithmetic or comparisons -involving mixed signed and unsigned types are automatically converted to -unsigned, which is almost certainly not what you want. Many subtle and -hard-to-find bugs are created by careless use of unsigned types. In -general, you should almost @emph{never} use an unsigned type to hold a -regular quantity of any sort. The only exceptions are - -@enumerate -@item -When there's a reasonable possibility you will actually need all 32 or -64 bits to store the quantity. -@item -When calling existing API's that require unsigned types. In this case, -you should still do all manipulation using signed types, and do the -conversion at the very threshold of the API call. 
-@item -In existing code that you don't want to modify because you don't -maintain it. -@item -In bit-field structures. -@end enumerate - -Other reasonable uses of @code{unsigned int} and @code{unsigned long} -are representing non-quantities -- e.g. bit-oriented flags and such. - -@node Techniques for XEmacs Developers, , Proper Use of Unsigned Types, Rules When Writing New C Code -@section Techniques for XEmacs Developers -@cindex techniques for XEmacs developers -@cindex developers, techniques for XEmacs - -@cindex Purify -@cindex Quantify -To make a purified XEmacs, do: @code{make puremacs}. -To make a quantified XEmacs, do: @code{make quantmacs}. - -You simply can't dump Quantified and Purified images (unless using the -portable dumper). Purify gets confused when xemacs frees memory in one -process that was allocated in a @emph{different} process on a different -machine! Run it like so: -@example -temacs -batch -l loadup.el run-temacs @var{xemacs-args...} -@end example - -@cindex error checking -Before you go through the trouble, are you compiling with all -debugging and error-checking off? If not, try that first. Be warned -that while Quantify is directly responsible for quite a few -optimizations which have been made to XEmacs, doing a run which -generates results which can be acted upon is not necessarily a trivial -task. - -Also, if you're still willing to do some runs make sure you configure -with the @samp{--quantify} flag. That will keep Quantify from starting -to record data until after the loadup is completed and will shut off -recording right before it shuts down (which generates enough bogus data -to throw most results off). It also enables three additional elisp -commands: @code{quantify-start-recording-data}, -@code{quantify-stop-recording-data} and @code{quantify-clear-data}. 
- -If you want to make XEmacs faster, target your favorite slow benchmark, -run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure -out where the cycles are going. In many cases you can localize the -problem (because a particular new feature or even a single patch -elicited it). Don't hesitate to use brute force techniques like a -global counter incremented at strategic places, especially in -combination with other performance indications (@emph{e.g.}, degree of -buffer fragmentation into extents). - -Specific projects: - -@itemize @bullet -@item -Make the garbage collector faster. Figure out how to write an -incremental garbage collector. -@item -Write a compiler that takes bytecode and spits out C code. -Unfortunately, you will then need a C compiler and a more fully -developed module system. -@item -Speed up redisplay. -@item -Speed up syntax highlighting. It was suggested that ``maybe moving some -of the syntax highlighting capabilities into C would make a -difference.'' Wrong idea, I think. When processing one 400kB file a -particular low-level routine was being called 40 @emph{million} times -simply for @emph{one} call to @code{newline-and-indent}. Syntax -highlighting needs to be rewritten to use a reliable, fast parser, then -to trust the pre-parsed structure, and only do re-highlighting locally -to a text change. Modern machines are fast enough to implement such -parsers in Lisp; but no machine will ever be fast enough to deal with -quadratic (or worse) algorithms! -@item -Implement tail recursion in Emacs Lisp (hard!). -@end itemize - -Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function -calls in elisp are especially expensive. Iterating over a long list is -going to be 30 times faster implemented in C than in Elisp. - -Heavily used small code fragments need to be fast. The traditional way -to implement such code fragments in C is with macros. But macros in C -are known to be broken. 
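A minimal sketch of one way such macros misbehave: an argument with a side effect is evaluated more than once. The names @code{MAX_UNSAFE}, @code{max_int}, @code{next_value} and @code{calls} are hypothetical, chosen for illustration, and are not taken from the XEmacs sources.

```c
/* Hypothetical illustration; none of these names come from XEmacs. */

static int calls = 0;           /* counts how often next_value() runs */

static int
next_value (void)
{
  return ++calls;               /* side effect: bumps the counter */
}

/* Classic unsafe macro: whichever argument wins the comparison is
   evaluated a second time to produce the result. */
#define MAX_UNSAFE(a, b) ((a) > (b) ? (a) : (b))

/* Function version: each argument is evaluated exactly once. */
static int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* Returns how many times next_value() ran for one use of the macro. */
static int
calls_made_by_macro (void)
{
  calls = 0;
  (void) MAX_UNSAFE (next_value (), 0);
  return calls;
}

/* Returns how many times next_value() ran for one use of the function. */
static int
calls_made_by_function (void)
{
  calls = 0;
  (void) max_int (next_value (), 0);
  return calls;
}
```

Here a single use of the macro runs @code{next_value} twice, while the function version runs it once.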
- -@cindex macro hygiene -Macro arguments that are repeatedly evaluated may suffer from repeated -side effects or suboptimal performance. - -Variable names used in macros may collide with caller's variables, -causing (at least) unwanted compiler warnings. - -In order to solve these problems, and maintain statement semantics, one -should use the @code{do @{ ... @} while (0)} trick while trying to -reference macro arguments exactly once using local variables. - -Let's take a look at this poor macro definition: - -@example -#define MARK_OBJECT(obj) \ - if (!marked_p (obj)) mark_object (obj), did_mark = 1 -@end example - -This macro evaluates its argument twice, and also fails if used like this: -@example - if (flag) MARK_OBJECT (obj); else @code{do_something()}; -@end example - -A much better definition is - -@example -#define MARK_OBJECT(obj) do @{ \ - Lisp_Object mo_obj = (obj); \ - if (!marked_p (mo_obj)) \ - @{ \ - mark_object (mo_obj); \ - did_mark = 1; \ - @} \ -@} while (0) -@end example - -Notice the elimination of double evaluation by using the local variable -with the obscure name. Writing safe and efficient macros requires great -care. The one problem with macros that cannot be portably worked around -is that, since a C block has no value, a macro used as an expression rather -than a statement cannot use the techniques just described to avoid -multiple evaluation. - -@cindex inline functions -In most cases where a macro has function semantics, an inline function -is a better implementation technique. Modern compiler optimizers tend -to inline functions even if they have no @code{inline} keyword, and -configure magic ensures that the @code{inline} keyword can be safely -used as an additional compiler hint. Inline functions used in a single -.c file are easy. The function must already be defined to be -@code{static}. Just add the @code{inline} keyword to the -definition. - -@example -inline static int -heavily_used_small_function (int arg) -@{ - ... 
-@} -@end example - -Inline functions in header files are trickier, because we would like to -make the following optimization if the function is @emph{not} inlined -(for example, because we're compiling for debugging). We would like the -function to be defined externally exactly once, and each calling -translation unit would create an external reference to the function, -instead of including a definition of the inline function in the object -code of every translation unit that uses it. This optimization is -currently only available for gcc. But you don't have to worry about the -trickiness; just define your inline functions in header files using this -pattern: - -@example -INLINE_HEADER int -i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg); -INLINE_HEADER int -i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg) -@{ - ... -@} -@end example - -The declaration right before the definition is to prevent warnings when -compiling with @code{gcc -Wmissing-declarations}. I consider issuing -this warning for inline functions a gcc bug, but the gcc maintainers disagree. - -@cindex inline functions, headers -@cindex header files, inline functions -Every header which contains inline functions, either directly by using -@code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must -be added to @file{inline.c}'s includes to make the optimization -described above work. (Optimization note: if all INLINE_HEADER -functions are in fact inlined in all translation units, then the linker -can just discard @code{inline.o}, since it contains only unreferenced code). - -To get started debugging XEmacs, take a look at the @file{.gdbinit} and -@file{.dbxrc} files in the @file{src} directory. See the section in the -XEmacs FAQ on How to Debug an XEmacs problem with a debugger. - -After making source code changes, run @code{make check} to ensure that -you haven't introduced any regressions. 
If you want to make xemacs more -reliable, please improve the test suite in @file{tests/automated}. - -Did you make sure you didn't introduce any new compiler warnings? - -Before submitting a patch, please try compiling at least once with - -@example -configure --with-mule --use-union-type --error-checking=all -@end example - -Here are things to know when you create a new source file: - -@itemize @bullet -@item -All @file{.c} files should @code{#include <config.h>} first. Almost all -@file{.c} files should @code{#include "lisp.h"} second. - -@item -Generated header files should be included using the @samp{#include <...>} -syntax, not the @samp{#include "..."} syntax. The generated headers are: - -@file{config.h sheap-adjust.h paths.h Emacs.ad.h} - -The basic rule is that you should assume builds using @samp{--srcdir} -and the @samp{#include <...>} syntax needs to be used when the -to-be-included generated file is in a potentially different directory -@emph{at compile time}. The non-obvious C rule is that -@samp{#include "..."} means to search for the included file in the same -directory as the including file, @emph{not} in the current directory. -Normally this is not a problem but when building with @samp{--srcdir}, -@file{make} will search the @samp{VPATH} for you, while the C compiler -knows nothing about it. - -@item -Header files should @emph{not} include @samp{<config.h>} and -@samp{"lisp.h"}. It is the responsibility of the @file{.c} files that -use it to do so. - -@end itemize - -@cindex Lisp object types, creating -@cindex creating Lisp object types -@cindex object types, creating Lisp -Here is a checklist of things to do when creating a new lisp object type -named @var{foo}: - -@enumerate -@item -create @var{foo}.h -@item -create @var{foo}.c -@item -add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c} -@item -add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h} -@item -add calls to @code{syms_of_@var{foo}}, etc. 
to @file{emacs.c} -@item -add definitions of macros like @code{CHECK_@var{FOO}} and -@code{@var{FOO}P} to @file{@var{foo}.h} -@item -add the new type index to @code{enum lrecord_type} -@item -add a DEFINE_LRECORD_IMPLEMENTATION call to @file{@var{foo}.c} -@item -add an INIT_LRECORD_IMPLEMENTATION call to @code{syms_of_@var{foo}.c} -@end enumerate - -@node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top -@chapter Regression Testing XEmacs -@cindex testing, regression - -@menu -* How to Regression-Test:: -* Modules for Regression Testing:: -@end menu - -@node How to Regression-Test, Modules for Regression Testing, Regression Testing XEmacs, Regression Testing XEmacs -@section How to Regression-Test -@cindex how to regression-test -@cindex regression-test, how to -@cindex testing, regression, how to - -The source directory @file{tests/automated} contains XEmacs' automated -test suite. The usual way of running all the tests is running -@code{make check} from the top-level build directory. - -The test suite is unfinished and it's still lacking some essential -features. It is nevertheless recommended that you run the tests to -confirm that XEmacs behaves correctly. - -If you want to run a specific test case, you can do it from the -command-line like this: - -@example -$ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE -@end example - -If a test fails and you need more information, you can run the test -suite interactively by loading @file{test-harness.el} into a running -XEmacs and typing @kbd{M-x test-emacs-test-file RET <filename> RET}. -You will see a log of passed and failed tests, which should allow you to -investigate the source of the error and ultimately fix the bug. If you -are not capable of, or don't have time for, debugging it yourself, -please do report the failures using @kbd{M-x report-emacs-bug} or -@kbd{M-x build-report}. - -@deffn Command test-emacs-test-file file -Runs the tests in @var{file}. 
@file{test-harness.el} must be loaded. -Defines all the macros described in this node, and undefines them when -done. -@end deffn - -Adding a new test file is trivial: just create a new file here and it -will be run. There is no need to byte-compile any of the files in -this directory---the test-harness will take care of any necessary -byte-compilation. - -Look at the existing test cases for the examples of coding test cases. -It all boils down to your imagination and judicious use of the macros -@code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and -@code{Check-Message}. Note that all of these macros are defined only -for the duration of the test: they do not exist in the global -environment. - -@deffn Macro Assert expr -Check that @var{expr} is non-nil at this point in the test. -@end deffn - -@deffn Macro Check-Error expected-error body -Check that execution of @var{body} causes @var{expected-error} to be -signaled. @var{body} is a @code{progn}-like body, and may contain -several expressions. @var{expected-error} is a symbol defined as -an error by @code{define-error}. -@end deffn - -@deffn Macro Check-Error-Message expected-error expected-error-regexp body -Check that execution of @var{body} causes @var{expected-error} to be -signaled, and generate a message matching @var{expected-error-regexp}. -@var{body} is a @code{progn}-like body, and may contain several -expressions. @var{expected-error} is a symbol defined as an error -by @code{define-error}. -@end deffn - -@deffn Macro Check-Message expected-message body -Check that execution of @var{body} causes @var{expected-message} to be -generated (using @code{message} or a similar function). @var{body} is a -@code{progn}-like body, and may contain several expressions. -@end deffn - -Here's a simple example checking case-sensitive and case-insensitive -comparisons from @file{case-tests.el}. 
- -@example -(with-temp-buffer - (insert "Test Buffer") - (let ((case-fold-search t)) - (goto-char (point-min)) - (Assert (eq (search-forward "test buffer" nil t) 12)) - (goto-char (point-min)) - (Assert (eq (search-forward "Test buffer" nil t) 12)) - (goto-char (point-min)) - (Assert (eq (search-forward "Test Buffer" nil t) 12)) - - (setq case-fold-search nil) - (goto-char (point-min)) - (Assert (not (search-forward "test buffer" nil t))) - (goto-char (point-min)) - (Assert (not (search-forward "Test buffer" nil t))) - (goto-char (point-min)) - (Assert (eq (search-forward "Test Buffer" nil t) 12)))) -@end example - -This example could be saved in a file in @file{tests/automated}, and it -would constitute a complete test, automatically executed when you run -@kbd{make check} after building XEmacs. More complex tests may require -substantial temporary scaffolding to create the environment that elicits -the bugs, but the top-level @file{Makefile} and @file{test-harness.el} -handle the running and collection of results from the @code{Assert}, -@code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message} -macros. - -Don't suppress tests just because they're due to known bugs not yet -fixed---use the @code{Known-Bug-Expect-Failure} wrapper macro to mark -them. - -@deffn Macro Known-Bug-Expect-Failure body -Arrange for failing tests in @var{body} to generate messages prefixed -with "KNOWN BUG:" instead of "FAIL:". @var{body} is a @code{progn}-like -body, and may contain several tests. -@end deffn - -A lot of the tests we run push limits; suppress Ebola warning messages -with the @code{Ignore-Ebola} wrapper macro. - -@deffn Macro Ignore-Ebola body -Suppress Ebola warning messages while running tests in @var{body}. -@var{body} is a @code{progn}-like body, and may contain several tests. -@end deffn - -Both macros are defined temporarily within the test function. Simple -examples: - -@example -;; Apparently Ignore-Ebola is a solution with no problem to address. 
-;; There are no examples in 21.5, anyway. - -;; from regexp-tests.el -(Known-Bug-Expect-Failure - (Assert (not (string-match "\\b" ""))) - (Assert (not (string-match " \\b" " ")))) -@end example - -In general, you should avoid using functionality from packages in your -tests, because you can't be sure that everyone will have the required -package. However, if you've got a test that works, by all means add it. -Simply wrap the test in an appropriate test, add a notice that the test -was skipped, and update the @code{skipped-test-reasons} hashtable. The -wrapper macro @code{Skip-Test-Unless} is provided to handle common -cases. - -@defvar skipped-test-reasons -Hash table counting the number of times a particular reason is given for -skipping tests. This is only defined within @code{test-emacs-test-file}. -@end defvar - -@deffn Macro Skip-Test-Unless prerequisite reason description body -@var{prerequisite} is usually a feature test (@code{featurep}, -@code{boundp}, @code{fboundp}). @var{reason} is a string describing the -prerequisite; it must be unique because it is used as a hash key in a -table of reasons for skipping tests. @var{description} describes the -tests being skipped, for the test result summary. @var{body} is a -@code{progn}-like body, and may contain several tests. -@end deffn - -@code{Skip-Test-Unless} is defined temporarily within the test function. 
-Here's an example of usage from @file{syntax-tests.el}: - -@example -;; Test forward-comment at buffer boundaries -(with-temp-buffer - ;; try to use exactly what you need: featurep, boundp, fboundp - (Skip-Test-Unless (fboundp 'c-mode) - "c-mode unavailable" - "comment and parse-partial-sexp tests" - ;; and here's the test code - (c-mode) - (insert "// comment\n") - (forward-comment -2) - (Assert (eq (point) (point-min))) - (let ((point (point))) - (insert "/* comment */") - (goto-char point) - (forward-comment 2) - (Assert (eq (point) (point-max))) - (parse-partial-sexp point (point-max))))) -@end example - -@code{Skip-Test-Unless} is intended for use with features that are normally -present in typical configurations. For truly optional features, or -tests that apply to one of several alternative implementations (e.g., to -GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply -suppress the test silently if the feature is not available. - -Here are a few general hints for writing tests. - -@enumerate -@item -Include related successful cases. Fixes often break something. - -@item -Use the @code{Known-Bug-Expect-Failure} macro to mark the cases you know -are going to fail. We want to be able to distinguish between -regressions and other unexpected failures, and cases that have -been (partially) analyzed but not yet repaired. - -@item -Mark the bug with the date of report. An ``Unfixed since yyyy-mm-dd'' -gloss for @code{Known-Bug-Expect-Failure} is planned to further increase -developer embarrassment (== incentive to fix the bug), but until then at -least put a comment about the date so we can easily see when it was -first reported. - -@item -It's a matter of your judgement, but you should often use generic tests -(@emph{e.g.}, @code{eq}) instead of more specific tests (@code{=} for -numbers) even though you know that arguments ``should'' be of the correct -type. 
That is, if the functions used can return generic objects -(typically @code{nil}), as well as some more specific type that will be -returned on success. We don't want failures of those assertions -reported as ``other failures'' (a wrong-type-arg signal, rather than a -null return), we want them reported as ``assertion failures.'' - -One example is a test that tests @code{(= (string-match this that) 0)}, -expecting a successful match. Now suppose @code{string-match} is broken -such that the match fails. Then it will return @code{nil}, and @code{=} -will signal ``wrong-type-argument, number-char-or-marker-p, nil'', -generating an ``other failure'' in the report. But this should be -reported as an assertion failure (the test failed in a foreseeable way), -rather than something else (we don't know what happened because XEmacs -is broken in a way that we weren't trying to test!) -@end enumerate - -@node Modules for Regression Testing, , How to Regression-Test, Regression Testing XEmacs -@section Modules for Regression Testing -@cindex modules for regression testing -@cindex regression testing, modules for - -@example -@file{test-harness.el} -@file{base64-tests.el} -@file{byte-compiler-tests.el} -@file{case-tests.el} -@file{ccl-tests.el} -@file{c-tests.el} -@file{database-tests.el} -@file{extent-tests.el} -@file{hash-table-tests.el} -@file{lisp-tests.el} -@file{md5-tests.el} -@file{mule-tests.el} -@file{regexp-tests.el} -@file{symbol-tests.el} -@file{syntax-tests.el} -@file{tag-tests.el} -@file{weak-tests.el} -@end example - -@file{test-harness.el} defines the macros @code{Assert}, -@code{Check-Error}, @code{Check-Error-Message}, and -@code{Check-Message}. The other files are test files, testing various -XEmacs facilities. @xref{Regression Testing XEmacs}. 
- - -@node CVS Techniques, The Modules of XEmacs, Regression Testing XEmacs, Top -@chapter CVS Techniques -@cindex CVS techniques - -@menu -* Merging a Branch into the Trunk:: -@end menu - -@node Merging a Branch into the Trunk, , CVS Techniques, CVS Techniques -@section Merging a Branch into the Trunk -@cindex merging a branch into the trunk - -@enumerate -@item -If you haven't already done a merge, you will be merging from the branch -point; otherwise you'll be merging from the last merge point, which -should be marked by a tag, e.g. @samp{last-sync-ben-mule-21-5}. In the -former case, create the last-sync tag, e.g. - -@example -crw rtag -r ben-mule-21-5-bp last-sync-ben-mule-21-5 xemacs -@end example - -(You did create a branch point tag when you created the branch, didn't -you?) - -@item -Check everything in on your branch. - -@item -Tag your branch with a pre-sync tag, e.g. - -@example -crw rtag -r ben-mule-21-5 ben-mule-21-5-pre-feb-20-2002-sync xemacs -@end example - -Note, you need to use rtag and specify a version with @samp{-r} (use -@samp{-r HEAD} if necessary) so that removed files are handled correctly -in some obscure cases. See section 4.8 of the CVS manual. - -@item -Tag the trunk so you have a stable place to merge up to in case people -are asynchronously committing to the trunk, e.g. - -@example -crw rtag -r HEAD main-branch-ben-mule-21-5-syncpoint-feb-20-2002 xemacs -crw rtag -F -r main-branch-ben-mule-21-5-syncpoint-feb-20-2002 next-sync-ben-mule-21-5 xemacs -@end example - -Use -F in the second case because the name might already exist, e.g. if -you've already done a merge. We make two tags because one is a -permanent mark indicating a syncpoint when merging, and the other is a -symbolic tag to make other operations easier. - -@item -Make a backup of your source tree (not totally necessary but useful for -reference and peace of mind): Move one level up from the top directory -of your branch and do, e.g. 
- -@example -cp -a mule mule-backup-2-23-02 -@end example - -@item -Now, we're ready to merge! Make sure you're in the top directory of -your branch and do, e.g. - -@example -cvs update -j last-sync-ben-mule-21-5 -j next-sync-ben-mule-21-5 -@end example - -@item -Fix all merge conflicts. Get the sucker to compile and run. - -@item -Tag your branch with a post-sync tag, e.g. - -@example -crw rtag -r ben-mule-21-5 ben-mule-21-5-post-feb-20-2002-sync xemacs -@end example - -@item -Update the last-sync tag, e.g. - -@example -crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs -@end example -@end enumerate - - -@node The Modules of XEmacs, Allocation of Objects in XEmacs Lisp, CVS Techniques, Top +@node The Modules of XEmacs, Major Textual Changes, Build-Time Dependencies, Top @chapter The Modules of XEmacs @cindex modules of XEmacs @@ -5779,8 +3682,2376 @@ This module provides some terminal-control code necessary on versions of AIX prior to 4.1. - -@node Allocation of Objects in XEmacs Lisp, Dumping, The Modules of XEmacs, Top +@node Major Textual Changes, Rules When Writing New C Code, The Modules of XEmacs, Top +@chapter Major Textual Changes +@cindex textual changes, major +@cindex major textual changes + +Sometimes major textual changes are made to the source. This means that +a search-and-replace is done to change type names and such. Some people +disagree with such changes, and certainly, if done without good reason, +they will just lead to headaches. But it's important to keep the code clean +and understandable, and consistent naming goes a long way towards this. + +An example of the right way to do this was the so-called "great integral +type renaming". 
+ +@menu +* Great Integral Type Renaming:: +* Text/Char Type Renaming:: +@end menu + +@node Great Integral Type Renaming, Text/Char Type Renaming, Major Textual Changes, Major Textual Changes +@section Great Integral Type Renaming +@cindex Great Integral Type Renaming +@cindex integral type renaming, great +@cindex type renaming, integral +@cindex renaming, integral types + +The purpose of this is to rationalize the names used for various +integral types, so that they match their intended uses and follow +consistent conventions, and eliminate types that were not semantically +different from each other. + +The conventions are: + +@itemize @bullet +@item +All integral types that measure quantities of anything are signed. Some +people disagree vociferously with this, but their arguments are mostly +theoretical, and are vastly outweighed by the practical headaches of +mixing signed and unsigned values, and more importantly by the greatly +increased likelihood of inadvertent bugs: Because of the broken "viral" +nature of unsigned quantities in C (operations involving mixed +signed/unsigned are done unsigned, when exactly the opposite is nearly +always wanted), even a single error in declaring a quantity unsigned +that should be signed, or the even more subtle error of comparing +signed and unsigned values and forgetting the necessary cast, can be +catastrophic, as comparisons will yield wrong results. -Wsign-compare +is turned on specifically to catch this, but this tends to result in a +great number of warnings when mixing signed and unsigned, and the casts +are annoying. More has been written on this elsewhere. + +@item +All such quantity types just mentioned boil down to EMACS_INT, which is +32 bits on 32-bit machines and 64 bits on 64-bit machines. This is +guaranteed to be the same size as Lisp objects of type @code{int}, and (as +far as I can tell) of size_t (unsigned!) and ssize_t. 
The only type +below that is not an EMACS_INT is Hashcode, which is an unsigned value +of the same size as EMACS_INT. + +@item +Type names should be relatively short (no more than 10 characters or +so), with the first letter capitalized and no underscores if they can at +all be avoided. + +@item +"count" == a zero-based measurement of some quantity. Includes sizes, +offsets, and indexes. + +@item +"bpos" == a one-based measurement of a position in a buffer. "Charbpos" +and "Bytebpos" count text in the buffer, rather than bytes in memory; +thus Bytebpos does not directly correspond to the memory representation. +Use "Membpos" for this. + +@item +"Char" refers to internal-format characters, not to the C type "char", +which is really a byte. +@end itemize + +For the actual name changes, see the script below. + +I ran the following script to do the conversion. (NOTE: This script is +idempotent. You can safely run it multiple times and it will not screw +up previous results -- in fact, it will do nothing if nothing has +changed. Thus, it can be run repeatedly as necessary to handle patches +coming in from old workspaces, or old branches.) There are two tags, +just before and just after the change: @samp{pre-integral-type-rename} +and @samp{post-integral-type-rename}. When merging code from the main +trunk into a branch, the best thing to do is first merge up to +@samp{pre-integral-type-rename}, then apply the script and associated +changes, then merge from @samp{post-integral-type-rename} to the +present. (Alternatively, just do the merging in one operation; but you +may then have a lot of conflicts needing to be resolved by hand.) 
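The signed/unsigned hazard that motivates the first convention above can be reproduced in a few lines of C. What follows is only a minimal sketch: the function names are hypothetical and do not appear in the XEmacs sources.

```c
#include <assert.h>

/* Hypothetical helper: is COUNT below LIMIT?  LIMIT is unsigned, so the
   usual arithmetic conversions turn COUNT into an unsigned long before
   the comparison; a COUNT of -1 becomes ULONG_MAX.  This is the kind of
   line that -Wsign-compare warns about. */
static int
below_limit_mixed (long count, unsigned long limit)
{
  return count < limit;
}

/* The same test with both operands signed, as the conventions require. */
static int
below_limit_signed (long count, long limit)
{
  return count < limit;
}

/* Usage sketch: -1 is reported as "not below 10" by the mixed version. */
static void
signedness_demo (void)
{
  assert (below_limit_mixed (-1, 10) == 0);   /* the surprising answer */
  assert (below_limit_signed (-1, 10) == 1);  /* the intended answer */
}
```

Declaring every quantity type as a signed @code{EMACS_INT} removes the implicit conversion altogether, so there is no cast to forget.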
+ +Script @samp{fixtypes.sh} follows: + +@example +----------------------------------- cut ------------------------------------ +files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" +gr Memory_Count Bytecount $files +gr Lstream_Data_Count Bytecount $files +gr Element_Count Elemcount $files +gr Hash_Code Hashcode $files +gr extcount bytecount $files +gr bufpos charbpos $files +gr bytind bytebpos $files +gr memind membpos $files +gr bufbyte intbyte $files +gr Extcount Bytecount $files +gr Bufpos Charbpos $files +gr Bytind Bytebpos $files +gr Memind Membpos $files +gr Bufbyte Intbyte $files +gr EXTCOUNT BYTECOUNT $files +gr BUFPOS CHARBPOS $files +gr BYTIND BYTEBPOS $files +gr MEMIND MEMBPOS $files +gr BUFBYTE INTBYTE $files +gr MEMORY_COUNT BYTECOUNT $files +gr LSTREAM_DATA_COUNT BYTECOUNT $files +gr ELEMENT_COUNT ELEMCOUNT $files +gr HASH_CODE HASHCODE $files +----------------------------------- cut ------------------------------------ +@end example + +The @samp{gr} script, and the scripts it uses, are documented in +@file{README.global-renaming}, because if placed in this file they would +need to have their @@ characters doubled, meaning you couldn't easily +cut and paste from the source. + +In addition to those programs, I needed to fix up a few other +things, particularly relating to the duplicate definitions of +types, now that some types merged with others. Specifically: + +@enumerate +@item +in @file{lisp.h}, removed duplicate declarations of Bytecount. The changed +code should now look like this: (In each code snippet below, the first +and last lines are the same as the original, as are all lines outside of +those lines. That allows you to locate the section to be replaced, and +replace the stuff in that section, verifying that there isn't anything +new added that would need to be kept.) 
+ +@example +--------------------------------- snip ------------------------------------- +/* Counts of bytes or chars */ +typedef EMACS_INT Bytecount; +typedef EMACS_INT Charcount; + +/* Counts of elements */ +typedef EMACS_INT Elemcount; + +/* Hash codes */ +typedef unsigned long Hashcode; + +/* ------------------------ dynamic arrays ------------------- */ +--------------------------------- snip ------------------------------------- +@end example + +@item +in @file{lstream.h}, removed duplicate declaration of Bytecount. Rewrote the +comment about this type. The changed code should now look like this: + +@example +--------------------------------- snip ------------------------------------- +#endif + +/* The have been some arguments over the what the type should be that + specifies a count of bytes in a data block to be written out or read in, + using @code{Lstream_read()}, @code{Lstream_write()}, and related functions. + Originally it was long, which worked fine; Martin "corrected" these to + size_t and ssize_t on the grounds that this is theoretically cleaner and + is in keeping with the C standards. Unfortunately, this practice is + horribly error-prone due to design flaws in the way that mixed + signed/unsigned arithmetic happens. In fact, by doing this change, + Martin introduced a subtle but fatal error that caused the operation of + sending large mail messages to the SMTP server under Windows to fail. + By putting all values back to be signed, avoiding any signed/unsigned + mixing, the bug immediately went away. The type then in use was + Lstream_Data_Count, so that it be reverted cleanly if a vote came to + that. Now it is Bytecount. + + Some earlier comments about why the type must be signed: This MUST BE + SIGNED, since it also is used in functions that return the number of + bytes actually read to or written from in an operation, and these + functions can return -1 to signal error. 
+ + Note that the standard Unix @code{read()} and @code{write()} functions define the + count going in as a size_t, which is UNSIGNED, and the count going + out as an ssize_t, which is SIGNED. This is a horrible design + flaw. Not only is it highly likely to lead to logic errors when a + -1 gets interpreted as a large positive number, but operations are + bound to fail in all sorts of horrible ways when a number in the + upper-half of the size_t range is passed in -- this number is + unrepresentable as an ssize_t, so code that checks to see how many + bytes are actually written (which is mandatory if you are dealing + with certain types of devices) will get completely screwed up. + + --ben +*/ + +typedef enum lstream_buffering +--------------------------------- snip ------------------------------------- +@end example + +@item +in @file{dumper.c}, there are four places, all inside of @code{switch()} statements, +where XD_BYTECOUNT appears twice as a case tag. In each case, the two +case blocks contain identical code, and you should *REMOVE THE SECOND* +and leave the first. +@end enumerate + +@node Text/Char Type Renaming, , Great Integral Type Renaming, Major Textual Changes +@section Text/Char Type Renaming +@cindex Text/Char Type Renaming +@cindex type renaming, text/char +@cindex renaming, text/char types + +The purpose of this was + +@enumerate +@item +To distinguish between ``charptr'' when it refers to operations on +the pointer itself and when it refers to operations on text +@item +To use consistent naming for everything referring to internal format, i.e. +@end enumerate + +@example + Itext == text in internal format + Ibyte == a byte in such text + Ichar == a char as represented in internal character format +@end example + +Thus e.g. 
+ +@example + set_charptr_emchar -> set_itext_ichar +@end example + +This was done using a script like this: + +@example +files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" +gr Intbyte Ibyte $files +gr INTBYTE IBYTE $files +gr intbyte ibyte $files +gr EMCHAR ICHAR $files +gr emchar ichar $files +gr Emchar Ichar $files +gr INC_CHARPTR INC_IBYTEPTR $files +gr DEC_CHARPTR DEC_IBYTEPTR $files +gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files +gr valid_charptr valid_ibyteptr $files +gr CHARPTR ITEXT $files +gr charptr itext $files +gr Charptr Itext $files +@end example + +See above for the source to @samp{gr}. + +As in the integral-types change, there are pre and post tags before and +after the change: + +@example + pre-internal-format-textual-renaming + post-internal-format-textual-renaming +@end example + +When merging a large branch, follow the same sort of procedure +documented above, using these tags -- essentially sync up to the pre +tag, then apply the script yourself, then sync from the post tag to the +present. You can probably do the same if you don't have a separate +workspace, but do have lots of outstanding changes and you'd rather not +just merge all the textual changes directly. Use something like this: + +(WARNING: I'm not a CVS guru; before trying this, or any large operation +that might potentially mess things up, @strong{DEFINITELY} make a backup of +your existing workspace.) 
+ +@example +cup -r pre-internal-format-textual-renaming +<apply script> +cup -A -j post-internal-format-textual-renaming -j HEAD +@end example + +This might also work: + +@example +cup -j pre-internal-format-textual-renaming +<apply script> +cup -j post-internal-format-textual-renaming -j HEAD +@end example + +ben + +The following is a script to go in the opposite direction: + +@example +files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" + +# Evidently Perl considers _ to be a word char ala \b, even though XEmacs +# doesn't. We need to be careful here with ibyte/ichar because of words +# like Richard, @code{eicharlen()}, multibyte, HIBYTE, etc. + +gr Ibyte Intbyte $files +gr '\bIBYTE' INTBYTE $files +gr '\bibyte' intbyte $files +gr '\bICHAR' EMCHAR $files +gr '\bichar' emchar $files +gr '\bIchar' Emchar $files +gr '\bIBYTEPTR' CHARPTR $files +gr '\bibyteptr' charptr $files +gr '\bITEXT' CHARPTR $files +gr '\bitext' charptr $files +gr '\bItext' CHARPTR $files + +gr '_IBYTE' _INTBYTE $files +gr '_ibyte' _intbyte $files +gr '_ICHAR' _EMCHAR $files +gr '_ichar' _emchar $files +gr '_Ichar' _Emchar $files +gr '_IBYTEPTR' _CHARPTR $files +gr '_ibyteptr' _charptr $files +gr '_ITEXT' _CHARPTR $files +gr '_itext' _charptr $files +gr '_Itext' _CHARPTR $files +@end example + +@node Rules When Writing New C Code, Regression Testing XEmacs, Major Textual Changes, Top +@chapter Rules When Writing New C Code +@cindex writing new C code, rules when +@cindex C code, rules when writing new +@cindex code, rules when writing new C + +The XEmacs C Code is extremely complex and intricate, and there are many +rules that are more or less consistently followed throughout the code. +Many of these rules are not obvious, so they are explained here. It is +of the utmost importance that you follow them. 
If you don't, you may +get something that appears to work, but which will crash in odd +situations, often in code far away from where the actual breakage is. + +@menu +* A Reader's Guide to XEmacs Coding Conventions:: +* General Coding Rules:: +* Object-Oriented Techniques for C:: +* Writing Lisp Primitives:: +* Writing Good Comments:: +* Adding Global Lisp Variables:: +* Writing Macros:: +* Proper Use of Unsigned Types:: +* Techniques for XEmacs Developers:: +@end menu + +See also @ref{Coding for Mule}. + +@node A Reader's Guide to XEmacs Coding Conventions, General Coding Rules, Rules When Writing New C Code, Rules When Writing New C Code +@section A Reader's Guide to XEmacs Coding Conventions +@cindex coding conventions +@cindex reader's guide +@cindex coding rules, naming + +Of course the low-level implementation language of XEmacs is C, but much +of that uses the Lisp engine to do its work. However, because the code +is ``inside'' of the protective containment shell around the ``reactor +core,'' you'll see lots of complex ``plumbing'' needed to do the work +and ``safety mechanisms,'' whose failure results in a meltdown. This +section provides a quick overview (or review) of the various components +of the implementation of Lisp objects. + + Two typographic conventions help to identify C objects that implement +Lisp objects. The first is that capitalized identifiers, especially +those beginning with the letters @samp{Q}, @samp{V}, @samp{F}, and @samp{S}, +for C variables and functions, and C macros beginning with the +letter @samp{X}, are used to implement Lisp. The second is that where +Lisp uses the hyphen @samp{-} in symbol names, the corresponding C +identifiers use the underscore @samp{_}. Of course, since XEmacs Lisp +contains interfaces to many external libraries, those external names +will follow the coding conventions their authors chose, and may overlap +the ``XEmacs name space.'' However, these cases are usually pretty +obvious. 
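The two typographic conventions just described can be stated mechanically. The helper below is a hypothetical illustration, not an XEmacs function; it only shows how a prefix letter and the hyphen-to-underscore rule combine to give the C identifier for a Lisp name. Names made of special characters are spelled out by hand instead and are not covered by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an XEmacs function): build the conventional C
   identifier for a Lisp name by writing PREFIX ('Q', 'V', 'F', ...)
   followed by LISP_NAME with every '-' turned into '_'.
   Returns 0 on success, or -1 if OUT is too small. */
static int
c_name_for_lisp_name (char prefix, const char *lisp_name,
                      char *out, size_t out_size)
{
  size_t len = strlen (lisp_name);
  size_t i;

  if (out_size < len + 2)  /* prefix + name + terminating NUL */
    return -1;
  out[0] = prefix;
  for (i = 0; i < len; i++)
    out[i + 1] = (lisp_name[i] == '-') ? '_' : lisp_name[i];
  out[len + 1] = '\0';
  return 0;
}

/* Usage sketch: the Lisp function forward-comment maps to Fforward_comment. */
static void
naming_demo (void)
{
  char buf[32];

  assert (c_name_for_lisp_name ('F', "forward-comment", buf, sizeof (buf)) == 0);
  assert (strcmp (buf, "Fforward_comment") == 0);
  /* A buffer that cannot hold the result is rejected rather than overrun. */
  assert (c_name_for_lisp_name ('Q', "foreground", buf, 4) == -1);
}
```

Going the other way is equally mechanical, which is what makes it practical to grep the C sources for the implementation of a Lisp name.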
+ + All Lisp objects are handled indirectly. The @code{Lisp_Object} +type is usually a pointer to a structure, except for a very small number +of types with immediate representations (currently characters and +integers). However, these types cannot be directly operated on in C +code, either, so they can also be considered indirect. Types that do +not have an immediate representation always have a C typedef +@code{Lisp_@var{type}} for a corresponding structure. +@c #### mention l(c)records here? + + In older code, it was common practice to pass around pointers to +@code{Lisp_@var{type}}, but this is now deprecated in favor of using +@code{Lisp_Object} for all function arguments and return values that are +Lisp objects. The @code{X@var{type}} macro is used to extract the +pointer and cast it to @code{(Lisp_@var{type} *)} for the desired type. + + @strong{Convention}: macros whose names begin with @samp{X} operate on +@code{Lisp_Object}s and do no type-checking. Many such macros are type +extractors, but others implement Lisp operations in C (@emph{e.g.}, +@code{XCAR} implements the Lisp @code{car} function). These are unsafe, +and must only be used where types of all data have already been checked. +Such macros are only applied to @code{Lisp_Object}s. In internal +implementations where the pointer has already been converted, the +structure is operated on directly using the C @code{->} member access +operator. + + The @code{@var{type}P}, @code{CHECK_@var{type}}, and +@code{CONCHECK_@var{type}} macros are used to test types. The first +returns a Boolean value, and the latter two signal errors. (The +@samp{CONCHECK} variety allows execution to be CONtinued under some +circumstances, thus the name.) Functions which expect to be passed user +data invariably call @samp{CHECK} macros on arguments. + + There are many types of specialized Lisp objects implemented in C, but +the most pervasive type is the @dfn{symbol}. Symbols are used as +identifiers, variables, and functions. 
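The division of labor between the predicate macros and the unchecked @samp{X} macros can be shown with a toy model. Everything below is an assumption made for illustration: the @samp{Toy_} and @samp{TOY_} names are invented, and the real @code{Lisp_Object} representation and the real @samp{CHECK} macros (which signal a Lisp error instead of returning @code{NULL}) are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Lisp object types; none of these names exist in
   XEmacs. */
enum toy_type { TOY_CONS, TOY_STRING };

struct toy_header { enum toy_type type; };
typedef struct toy_header *Toy_Object;   /* plays the role of Lisp_Object */

struct Toy_Cons
{
  struct toy_header header;  /* must be first so the cast below is valid */
  Toy_Object car, cdr;
};

/* Predicate: returns a C truth value and never signals. */
#define TOY_CONSP(obj) ((obj) != NULL && (obj)->type == TOY_CONS)
/* Extractor: no type-checking at all, like the X macros. */
#define TOY_XCONS(obj) ((struct Toy_Cons *) (obj))
#define TOY_XCAR(obj) (TOY_XCONS (obj)->car)

/* Check first, then use the unchecked accessor.  This toy reports a type
   error by returning NULL. */
static Toy_Object
toy_checked_car (Toy_Object obj)
{
  if (!TOY_CONSP (obj))
    return NULL;
  return TOY_XCAR (obj);
}

static void
toy_type_demo (void)
{
  struct toy_header str;
  struct Toy_Cons cell;

  str.type = TOY_STRING;
  cell.header.type = TOY_CONS;
  cell.car = &str;
  cell.cdr = NULL;

  assert (toy_checked_car (&cell.header) == &str);
  assert (toy_checked_car (&str) == NULL);  /* a string is not a cons */
}
```

Calling @code{TOY_XCAR} on the string object would read memory that is not there, which is the toy equivalent of the crashes that unchecked @samp{X} macros cause when applied to unverified data.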
+ + @strong{Convention}: Global variables whose names begin with @samp{Q} +are constants whose value is a symbol. The name of the variable should +be derived from the name of the symbol using the same rules as for Lisp +primitives. Such variables allow the C code to check whether a +particular @code{Lisp_Object} is equal to a given symbol. Symbols are +Lisp objects, so these variables may be passed to Lisp primitives. (An +alternative to the use of @samp{Q...} variables is to call the +@code{intern} function at initialization in the +@code{vars_of_@var{module}} function, which is hardly less efficient.) + + @strong{Convention}: Global variables whose names begin with @samp{V} +are variables that contain Lisp objects. The convention here is that +all global variables of type @code{Lisp_Object} begin with @samp{V}, and +no others do (not even integer and boolean variables that have Lisp +equivalents). Most of the time, these variables have equivalents in +Lisp, which are defined via the @samp{DEFVAR} family of macros, but some +don't. Since the variable's value is a @code{Lisp_Object}, it can be +passed to Lisp primitives. + + The implementation of Lisp primitives is more complex. +@strong{Convention}: Global variables with names beginning with @samp{S} +contain a structure that allows the Lisp engine to identify and call a C +function. In modern versions of XEmacs, these identifiers are almost +always completely hidden in the @code{DEFUN} and @code{SUBR} macros, but +you will encounter them if you look at very old versions of XEmacs or at +GNU Emacs. @strong{Convention}: Functions with names beginning with +@samp{F} implement Lisp primitives. Of course all their arguments and +their return values must be Lisp_Objects. (This is hidden in the +@code{DEFUN} macro.) 
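The point of the @samp{Q} convention is that a symbol is looked up by name once, at initialization, and compared by identity afterwards. The following toy sketch shows that idea under stated assumptions: the table, the @samp{toy_} functions, and @code{Qtoy_foreground} are all invented for illustration and are not the XEmacs obarray or its @code{intern}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy symbol table; all names here are hypothetical. */
struct toy_symbol { const char *name; };

#define TOY_MAX_SYMBOLS 16
static struct toy_symbol toy_obarray[TOY_MAX_SYMBOLS];
static size_t toy_symbol_count;

/* Return the unique symbol called NAME, creating it on first use.
   NAME is stored, not copied, so it must outlive the table (string
   literals do).  Returns NULL if the table is full. */
static const struct toy_symbol *
toy_intern (const char *name)
{
  size_t i;

  for (i = 0; i < toy_symbol_count; i++)
    if (strcmp (toy_obarray[i].name, name) == 0)
      return &toy_obarray[i];
  if (toy_symbol_count == TOY_MAX_SYMBOLS)
    return NULL;
  toy_obarray[toy_symbol_count].name = name;
  return &toy_obarray[toy_symbol_count++];
}

/* A Q-style variable: set once during initialization, constant afterwards. */
static const struct toy_symbol *Qtoy_foreground;

static void
toy_syms_of_module (void)
{
  Qtoy_foreground = toy_intern ("foreground");
}

/* After initialization, symbol tests are plain pointer comparisons. */
static int
toy_is_foreground (const struct toy_symbol *sym)
{
  return sym == Qtoy_foreground;
}

static void
toy_symbol_demo (void)
{
  toy_syms_of_module ();
  assert (toy_is_foreground (toy_intern ("foreground")));
  assert (!toy_is_foreground (toy_intern ("background")));
}
```

The string comparison happens only inside @code{toy_intern}; code that tests against @code{Qtoy_foreground} never looks at the name again.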
+ + +@node General Coding Rules, Object-Oriented Techniques for C, A Reader's Guide to XEmacs Coding Conventions, Rules When Writing New C Code +@section General Coding Rules +@cindex coding rules, general + +The C code is actually written in a dialect of C called @dfn{Clean C}, +meaning that it can be compiled, mostly warning-free, with either a C or +C++ compiler. Coding in Clean C has several advantages over plain C. +C++ compilers are more nit-picking, and a number of coding errors have +been found by compiling with C++. The ability to use both C and C++ +tools means that a greater variety of development tools are available to +the developer. In addition, the ability to overload operators in C++ +means it is possible, for error-checking purposes, to redefine certain +simple types (normally defined as aliases for simple built-in types such +as @code{unsigned char} or @code{long}) as classes, strictly limiting the permissible +operations and catching illegal implicit casts and such. + +Every module includes @file{<config.h>} (angle brackets so that +@samp{--srcdir} works correctly; @file{config.h} may or may not be in +the same directory as the C sources) and @file{lisp.h}. @file{config.h} +must always be included before any other header files (including +system header files) to ensure that certain tricks played by various +@file{s/} and @file{m/} files work out correctly. + +When including header files, always use angle brackets, not double +quotes, except when the file to be included is always in the same +directory as the including file. If either file is a generated file, +then that is not likely to be the case. In order to understand why we +have this rule, imagine what happens when you do a build in the source +directory using @samp{./configure} and another build in another +directory using @samp{../work/configure}. There will be two different +@file{config.h} files. Which one will be used if you @samp{#include +"config.h"}? 
+ +Almost every module contains a @code{syms_of_*()} function and a +@code{vars_of_*()} function. The former declares any Lisp primitives +you have defined and defines any symbols you will be using. The latter +declares any global Lisp variables you have added and initializes global +C variables in the module. @strong{Important}: There are stringent +requirements on exactly what can go into these functions. See the +comment in @file{emacs.c}. The reason for this is to avoid obscure +unwanted interactions during initialization. If you don't follow these +rules, you'll be sorry! If you want to do anything that isn't allowed, +create a @code{complex_vars_of_*()} function for it. Doing this is +tricky, though: you have to make sure your function is called at the +right time so that all the initialization dependencies work out. + +Declare each function of these kinds in @file{symsinit.h}. Make sure +it's called in the appropriate place in @file{emacs.c}. You never need +to include @file{symsinit.h} directly, because it is included by +@file{lisp.h}. + +@strong{All global and static variables that are to be modifiable must +be declared uninitialized.} This means that you may not use the +``declare with initializer'' form for these variables, such as @code{int +some_variable = 0;}. The reason for this has to do with some kludges +done during the dumping process: If possible, the initialized data +segment is re-mapped so that it becomes part of the (unmodifiable) code +segment in the dumped executable. This allows this memory to be shared +among multiple running XEmacs processes. XEmacs is careful to place as +much constant data as possible into initialized variables during the +@file{temacs} phase. + +@cindex copy-on-write +@strong{Please note:} This kludge only works on a few systems nowadays, +and is rapidly becoming irrelevant because most modern operating systems +provide @dfn{copy-on-write} semantics. 
All data is initially shared +between processes, and a private copy is automatically made (on a +page-by-page basis) when a process first attempts to write to a page of +memory. + +Formerly, there was a requirement that static variables not be declared +inside of functions. This had to do with another hack along the same +vein as what was just described: old USG systems put statically-declared +variables in the initialized data space, so those header files had a +@code{#define static} declaration. (That way, the data-segment remapping +described above could still work.) This fails badly on static variables +inside of functions, which suddenly become automatic variables; +therefore, you weren't supposed to have any of them. This awful kludge +has been removed in XEmacs because + +@enumerate +@item +almost all of the systems that used this kludge ended up having +to disable the data-segment remapping anyway; +@item +the only systems that didn't were extremely outdated ones; +@item +this hack completely messed up inline functions. +@end enumerate + +The C source code makes heavy use of C preprocessor macros. One popular +macro style is: + +@example +#define FOO(var, value) do @{ \ + Lisp_Object FOO_value = (value); \ + ... /* compute using FOO_value */ \ + (var) = bar; \ +@} while (0) +@end example + +The @code{do @{...@} while (0)} is a standard trick to allow FOO to have +statement semantics, so that it can safely be used within an @code{if} +statement in C, for example. Multiple evaluation is prevented by +copying a supplied argument into a local variable, so that +@code{FOO(var,fun(1))} only calls @code{fun} once. + +Lisp lists are popular data structures in the C code as well as in +Elisp. There are two sets of macros that iterate over lists. +@code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been +supplied by the user, and cannot be trusted to be acyclic and +@code{nil}-terminated. 
A @code{malformed-list} or @code{circular-list} error +will be generated if the list being iterated over is not entirely +kosher. @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less +safe, and can be used only on trusted lists. + +Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and +@code{GET_LIST_LENGTH}, which calculate the length of a list and, in the +case of @code{GET_EXTERNAL_LIST_LENGTH}, validate the properness of +the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and +@code{LIST_LOOP_DELETE_IF} delete from a Lisp list the elements satisfying some +predicate. + +@node Object-Oriented Techniques for C, Writing Lisp Primitives, General Coding Rules, Rules When Writing New C Code +@section Object-Oriented Techniques for C +@cindex coding rules, object-oriented +@cindex object-oriented techniques + +At the lowest levels, XEmacs makes heavy use of object-oriented +techniques to promote code-sharing and uniform interfaces for different +devices and platforms. Commonly, but not always, such objects are +``wrapped'' and exported to Lisp as Lisp objects. Usually they use +the internal structures developed for Lisp objects (the @samp{lrecord} +structure) in order to take advantage of Lisp memory management. +Unfortunately, XEmacs was originally written in C, so these techniques +are based on heavy use of C macros. + +@c You can't use @var{} for type below, because case is important. +A module defining a class is likely to use most of the following +declarations and macros. In the following, the notation @samp{<type>} +will stand for the full name of the class, and will be capitalized in +the way normal for its context. The notation @samp{<typ>} will stand +for the abbreviated form commonly used in macro names, while @samp{ty} +will be used as the typical name for instances of the class. (See the +entry for @samp{MAYBE_<TYP>METH} below for an example using all three +notations.) 
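As a rough picture of what ``object-oriented techniques based on C macros'' means in practice, here is a self-contained toy: a per-class structure of function pointers, reached through macros that paste the method name together. The @samp{toy_} and @samp{TOY_} names are invented for this sketch and are not XEmacs identifiers; only the general shape (a @code{methods} pointer, a @samp{_method} suffix, and a @code{do ... while (0)} wrapper) follows the style used in this manual.

```c
#include <assert.h>
#include <stddef.h>

struct toy_console;

/* One structure of function pointers per class; a NULL slot means the
   class does not implement that method. */
struct toy_console_methods
{
  void (*beep_method) (struct toy_console *con);
};

struct toy_console
{
  const struct toy_console_methods *methods;
  int beep_count;
};

#define TOY_HAS_CONMETH_P(c, m) ((c)->methods->m##_method != NULL)
#define TOY_CONMETH(c, m, args) (((c)->methods->m##_method) args)
/* Copy the object into a local first so that C is evaluated only once. */
#define TOY_MAYBE_CONMETH(c, m, args) do {              \
  struct toy_console *toy_maybe_con = (c);              \
  if (TOY_HAS_CONMETH_P (toy_maybe_con, m))             \
    TOY_CONMETH (toy_maybe_con, m, args);               \
} while (0)

/* The "tty" class implements beep; the "stream" class does not. */
static void
toy_tty_beep (struct toy_console *con)
{
  con->beep_count++;
}

static const struct toy_console_methods toy_tty_methods = { toy_tty_beep };
static const struct toy_console_methods toy_stream_methods = { NULL };

/* Generic code calls the method without knowing the class. */
static void
toy_ring_bell (struct toy_console *con)
{
  TOY_MAYBE_CONMETH (con, beep, (con));
}

static void
toy_method_demo (void)
{
  struct toy_console tty = { &toy_tty_methods, 0 };
  struct toy_console stream = { &toy_stream_methods, 0 };

  toy_ring_bell (&tty);
  toy_ring_bell (&stream);   /* silently does nothing */
  assert (tty.beep_count == 1);
  assert (stream.beep_count == 0);
}
```

Note that the method is defined as @code{toy_tty_beep} but invoked as @code{beep}, so searching for a call site means searching for the short method name rather than the name of the function that implements it.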
+ +In the interface (@file{.h} file), the following declarations are used +often. Others may be used in particular modules. Since they're +quite short in most cases, the definitions are given as well. The +generic macros used are defined in @file{lisp.h} or @file{lrecord.h}. + +@c #### reorganize this table into stuff used in general code, and stuff +@c used only in declarations or initializations +@table @samp +@c #### declaration +@item typedef struct Lisp_<Type> Lisp_<Type> +This refers to the internal structure used by C code. The XEmacs coding +style now forbids passing pointers to @samp{Lisp_<Type>} structures into +or out of a function; instead, a @samp{Lisp_Object} should be passed or +returned (created using @samp{wrap_<type>}, if necessary). + +@c #### declaration +@item DECLARE_LRECORD (<type>, Lisp_<Type>) +Declares an @samp{lrecord} for @samp{<Type>}, which is the unit of +allocation. + +@item #define X<TYPE>(x) XRECORD (x, <type>, Lisp_<Type>) +Turns a @code{Lisp_Object} into a pointer to @samp{struct Lisp_<Type>}. + +@item #define wrap_<type>(p) wrap_record (p, <type>) +Turns a pointer to @samp{struct Lisp_<Type>} into a @code{Lisp_Object}. + +@item #define <TYPE>P(x) RECORDP (x, <type>) +Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>}. +Returns a C int, not a Lisp Boolean value. + +@item #define CHECK_<TYPE>(x) CHECK_RECORD (x, <type>) +@itemx #define CONCHECK_<TYPE>(x) CONCHECK_RECORD (x, <type>) +Tests whether a given @code{Lisp_Object} is of type @samp{Lisp_<Type>}, +and signals a Lisp error if not. The @samp{CHECK} version of the macro +never returns if the type is wrong, while the @samp{CONCHECK} version +can return if the user catches it in the debugger and explicitly +requests a return. + +@item #define RAW_<TYP>METH(ty, m) ((ty)->methods->m##_method) +Return a function pointer for the method for an object @var{TY} of class +@samp{Lisp_<Type>}, or @samp{NULL} if there is none for this type. 
+ +@item #define HAS_<TYP>METH_P(ty, m) (!!RAW_<TYP>METH (ty, m)) +Test whether the class that @var{TY} is an instance of has the method. + +@item #define <TYP>METH(ty, m, args) ((RAW_<TYP>METH (ty, m)) args) +Call the method on @samp{args}. @samp{args} must be enclosed in +parentheses in the call. It is the programmer's responsibility to +ensure that the method is available. The standard convenience macro +@samp{MAYBE_<TYP>METH} is often provided for the common case where a +void-returning method of @samp{Type} is called. + +@item #define MAYBE_<TYP>METH(ty, m, args) do @{ ... @} while (0) +Call a void-returning @samp{<Type>} method, if it exists. Note the use +of the @samp{do ... while (0)} idiom to give the macro call C statement +semantics. The full definition is equally idiomatic: + +@example +#define MAYBE_<TYP>METH(ty, m, args) do @{ \ + Lisp_<Type> *maybe_<typ>meth_ty = (ty); \ + if (HAS_<TYP>METH_P (maybe_<typ>meth_ty, m)) \ + <TYP>METH (maybe_<typ>meth_ty, m, args); \ +@} while (0) +@end example +@end table + +The use of macros for invoking an object's methods makes life a bit +difficult for the student or maintainer when browsing the code. In +particular, calls are of the form @samp{<TYP>METH (ty, some_method, (x, +y))}, but definitions typically are for @samp{<subtype>_some_method}. +Thus, when you are trying to find calls, you need to grep for +@samp{some_method}, but this will also catch calls and definitions of +that method for instances of other subtypes of @samp{<Type>}, and there +may be a rather large number of them. + + +@node Writing Lisp Primitives, Writing Good Comments, Object-Oriented Techniques for C, Rules When Writing New C Code +@section Writing Lisp Primitives +@cindex writing Lisp primitives +@cindex Lisp primitives, writing +@cindex primitives, writing Lisp + +Lisp primitives are Lisp functions implemented in C. The details of +interfacing the C function so that Lisp can call it are handled by a few +C macros. 
The only way to really understand how to write new C code is +to read the source, but we can explain some things here. + +An example of a special form is the definition of @code{prog1}, from +@file{eval.c}. (An ordinary function would have the same general +appearance.) + +@cindex garbage collection protection +@smallexample +@group +DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /* +Similar to `progn', but the value of the first form is returned. +\(prog1 FIRST BODY...): All the arguments are evaluated sequentially. +The value of FIRST is saved during evaluation of the remaining args, +whose values are discarded. +*/ + (args)) +@{ + /* This function can GC */ + REGISTER Lisp_Object val, form, tail; + struct gcpro gcpro1; + + val = Feval (XCAR (args)); + + GCPRO1 (val); + + LIST_LOOP_3 (form, XCDR (args), tail) + Feval (form); + + UNGCPRO; + return val; +@} +@end group +@end smallexample + + Let's start with a precise explanation of the arguments to the +@code{DEFUN} macro. Here is a template for them: + +@example +@group +DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /* +@var{docstring} +*/ + (@var{arglist})) +@end group +@end example + +@table @var +@item lname +This string is the name of the Lisp symbol to define as the function +name; in the example above, it is @code{"prog1"}. + +@item fname +This is the C function name for this function. This is the name that is +used in C code for calling the function. The name is, by convention, +@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the +Lisp name changed to underscores. Thus, to call this function from C +code, call @code{Fprog1}. Remember that the arguments are of type +@code{Lisp_Object}; various macros and functions for creating values of +type @code{Lisp_Object} are declared in the file @file{lisp.h}. + +Primitives whose names are special characters (e.g. @code{+} or +@code{<}) are named by spelling out, in some fashion, the special +character: e.g. 
@code{Fplus()} or @code{Flss()}. Primitives whose names +begin with normal alphanumeric characters but also contain special +characters are spelled out in some creative way, e.g. @code{let*} +becomes @code{FletX()}. + +Each function also has an associated structure that holds the data for +the subr object that represents the function in Lisp. This structure +conveys the Lisp symbol name to the initialization routine that will +create the symbol and store the subr object as its definition. The C +variable name of this structure is always @samp{S} prepended to the +@var{fname}. You hardly ever need to be aware of the existence of this +structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the +details. + +@item min_args +This is the minimum number of arguments that the function requires. The +function @code{prog1} allows a minimum of one argument. + +@item max_args +This is the maximum number of arguments that the function accepts, if +there is a fixed maximum. Alternatively, it can be @code{UNEVALLED}, +indicating a special form that receives unevaluated arguments, or +@code{MANY}, indicating an unlimited number of evaluated arguments (the +C equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY} +are macros. If @var{max_args} is a number, it may not be less than +@var{min_args} and it may not be greater than 8. (If you need to add a +function with more than 8 arguments, use the @code{MANY} form. Resist +the urge to edit the definition of @code{DEFUN} in @file{lisp.h}. If +you do it anyway, make sure to also add another clause to the switch +statement in @code{primitive_funcall()}.) + +@item interactive +This is an interactive specification, a string such as might be used as +the argument of @code{interactive} in a Lisp function. In the case of +@code{prog1}, it is 0 (a null pointer), indicating that @code{prog1} +cannot be called interactively. 
A value of @code{""} indicates a +function that should receive no arguments when called interactively. + +@item docstring +This is the documentation string. It is written just like a +documentation string for a function defined in Lisp; in particular, the +first line should be a single sentence. Note how the documentation +string is enclosed in a comment, none of the documentation is placed on +the same lines as the comment-start and comment-end characters, and the +comment-start characters are on the same line as the interactive +specification. @file{make-docfile}, which scans the C files for +documentation strings, is very particular about what it looks for, and +will not properly extract the doc string if it's not in this exact format. + +In order to make both @file{etags} and @file{make-docfile} happy, make +sure that the @code{DEFUN} line contains the @var{lname} and +@var{fname}, and that the comment-start characters for the doc string +are on the same line as the interactive specification, and put a newline +directly after them (and before the comment-end characters). + +@item arglist +This is the comma-separated list of arguments to the C function. For a +function with a fixed maximum number of arguments, provide a C argument +for each Lisp argument. In this case, unlike regular C functions, the +types of the arguments are not declared; they are simply always of type +@code{Lisp_Object}. + +The names of the C arguments will be used as the names of the arguments +to the Lisp primitive as displayed in its documentation, modulo the same +concerns described above for @code{F...} names (in particular, +underscores in the C arguments become dashes in the Lisp arguments). + +There is one additional kludge: A trailing @samp{_} on the C argument is +discarded when forming the Lisp argument. This allows C language +reserved words (like @code{default}) or global symbols (like +@code{dirname}) to be used as argument names without compiler warnings +or errors. 
+ +A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a +@w{@dfn{special form}}; its arguments are not evaluated. Instead it +receives one argument of type @code{Lisp_Object}, a (Lisp) list of the +unevaluated arguments, conventionally named @code{(args)}. + +When a Lisp function has no upper limit on the number of arguments, +specify @w{@var{max_args} = @code{MANY}}. In this case its implementation in +C actually receives exactly two arguments: the number of Lisp arguments +(an @code{int}) and the address of a block containing their values (a +@w{@code{Lisp_Object *}}). In this case only are the C types specified +in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}. + +@end table + +Within the function @code{Fprog1} itself, note the use of the macros +@code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect'' +a variable from garbage collection---to inform the garbage collector +that it must look in that variable and regard the object pointed at by +its contents as an accessible object. This is necessary whenever you +call @code{Feval} or anything that can directly or indirectly call +@code{Feval} (this includes the @code{QUIT} macro!). At such a time, +any Lisp object that you intend to refer to again must be protected +somehow. @code{UNGCPRO} cancels the protection of the variables that +are protected in the current function. It is necessary to do this +explicitly. + +The macro @code{GCPRO1} protects just one local variable. If you want +to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will +not work. Macros @code{GCPRO3} and @code{GCPRO4} also exist. + +These macros implicitly use local variables such as @code{gcpro1}; you +must declare these explicitly, with type @code{struct gcpro}. Thus, if +you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}. + +@cindex caller-protects (@code{GCPRO} rule) +Note also that the general rule is @dfn{caller-protects}; i.e. 
you are +only responsible for protecting those Lisp objects that you create. Any +objects passed to you as arguments should have been protected by whoever +created them, so you don't in general have to protect them. + +In particular, the arguments to any Lisp primitive are always +automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or +bytecode. So only a few Lisp primitives that are called frequently from +C code, such as @code{Fprogn} protect their arguments as a service to +their caller. You don't need to protect your arguments when writing a +new @code{DEFUN}. + +@code{GCPRO}ing is perhaps the trickiest and most error-prone part of +XEmacs coding. It is @strong{extremely} important that you get this +right and use a great deal of discipline when writing this code. +@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this. + +What @code{DEFUN} actually does is declare a global structure of type +@code{Lisp_Subr} whose name begins with capital @samp{SF} and which +contains information about the primitive (e.g. a pointer to the +function, its minimum and maximum allowed arguments, a string describing +its Lisp name); @code{DEFUN} then begins a normal C function declaration +using the @code{F...} name. The Lisp subr object that is the function +definition of a primitive (i.e. the object in the function slot of the +symbol that names the primitive) actually points to this @samp{SF} +structure; when @code{Feval} encounters a subr, it looks in the +structure to find out how to call the C function. + +Defining the C function is not enough to make a Lisp primitive +available; you must also create the Lisp symbol for the primitive (the +symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr +object in its function cell. (If you don't do this, the primitive won't +be seen by Lisp code.) 
The code looks like this: + +@example +DEFSUBR (@var{fname}); +@end example + +@noindent +Here @var{fname} is the same name you used as the second argument to +@code{DEFUN}. + +This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function +at the end of the module. If no such function exists, create it and +make sure to also declare it in @file{symsinit.h} and call it from the +appropriate spot in @code{main()}. @xref{General Coding Rules}. + +Note that C code cannot call functions by name unless they are defined +in C. The way to call a function written in Lisp from C is to use +@code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since +the Lisp function @code{funcall} accepts an unlimited number of +arguments, in C it takes two: the number of Lisp-level arguments, and a +one-dimensional array containing their values. The first Lisp-level +argument is the Lisp function to call, and the rest are the arguments to +pass to it. Since @code{Ffuncall} can call the evaluator, you must +protect pointers from garbage collection around the call to +@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of +its parameters, so you don't have to protect any pointers passed as +parameters to it.) + +The C functions @code{call0}, @code{call1}, @code{call2}, and so on, +provide handy ways to call a Lisp function conveniently with a fixed +number of arguments. They work by calling @code{Ffuncall}. + +@file{eval.c} is a very good file to look through for examples; +@file{lisp.h} contains the definitions for important macros and +functions. + +@node Writing Good Comments, Adding Global Lisp Variables, Writing Lisp Primitives, Rules When Writing New C Code +@section Writing Good Comments +@cindex writing good comments +@cindex comments, writing good + +Comments are a lifeline for programmers trying to understand tricky +code. In general, the less obvious it is what you are doing, the more +you need a comment, and the more detailed it needs to be. 
You should +always be on guard when you're writing code for stuff that's tricky, and +should constantly be putting yourself in someone else's shoes and asking +if that person could figure out without much difficulty what's going +on. (Assume they are a competent programmer who understands the +essentials of how the XEmacs code is structured but doesn't know much +about the module you're working on or any algorithms you're using.) If +you're not sure whether they would be able to, add a comment. Always +err on the side of more comments, rather than less. + +Generally, when making comments, there is no need to attribute them with +your name or initials. This especially goes for small, +easy-to-understand, non-opinionated ones. Also, comments indicating +where, when, and by whom a file was changed are @emph{strongly} +discouraged, and in general will be removed as they are discovered. +This is exactly what @file{ChangeLogs} are there for. However, it can +occasionally be useful to mark exactly where (but not when or by whom) +changes are made, particularly when making small changes to a file +imported from elsewhere. These marks help when later on a newer version +of the file is imported and the changes need to be merged. (If +everything were always kept in CVS, there would be no need for this. +But in practice, this often doesn't happen, or the CVS repository is +later on lost or unavailable to the person doing the update.) + +When putting in an explicit opinion in a comment, you should +@emph{always} attribute it with your name and the date. This also goes +for long, complex comments explaining in detail the workings of +something -- by putting your name there, you make it possible for +someone who has questions about how that thing works to determine who +wrote the comment so they can write to them. Use your actual name or +your alias at xemacs.org, and not your initials or nickname, unless that +is generally recognized (e.g. @samp{jwz}). 
Even then, please consider +requesting a virtual user at xemacs.org (forwarding address; we can't +provide an actual mailbox). Otherwise, give first and last name. If +you're not a regular contributor, you might consider putting your email +address in -- it may be in the ChangeLog, but after a while ChangeLogs +tend to disappear or get muddled. (E.g. your comment +may get copied somewhere else or even into another program, and tracking +down the proper ChangeLog may be very difficult.) + +If you come across an opinion that is not or is no longer valid, or you +come across any comment that no longer applies but you want to keep it +around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment +afterwards explaining why the preceding comment is no longer valid. Put +your name on this comment, as explained above. + +Just as comments are a lifeline to programmers, incorrect comments are +death. If you come across an incorrect comment, @strong{immediately} +correct it or flag it as incorrect, as described in the previous +paragraph. Whenever you work on a section of code, @emph{always} make +sure to update any comments to be correct -- or, at the very least, flag +them as incorrect. + +To indicate a "todo" or other problem, use four pound signs -- +i.e. @samp{####}. + +@node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code +@section Adding Global Lisp Variables +@cindex global Lisp variables, adding +@cindex variables, adding global Lisp + +Global variables whose names begin with @samp{Q} are constants whose +value is a symbol of a particular name. The name of the variable should +be derived from the name of the symbol using the same rules as for Lisp +primitives. These variables are initialized using a call to +@code{defsymbol()} in the @code{syms_of_*()} function. 
(This call +interns a symbol, sets the C variable to the resulting Lisp object, and +calls @code{staticpro()} on the C variable to tell the +garbage-collection mechanism about this variable. What +@code{staticpro()} does is add a pointer to the variable to a large +global array; when garbage-collection happens, all pointers listed in +the array are used as starting points for marking Lisp objects. This is +important because it's quite possible that the only current reference to +the object is the C variable. In the case of symbols, the +@code{staticpro()} doesn't matter all that much because the symbol is +contained in @code{obarray}, which is itself @code{staticpro()}ed. +However, it's possible that a naughty user could do something like +uninterning the symbol out of @code{obarray} or even setting +@code{obarray} to a different value [although this is likely to make +XEmacs crash!].) + + @strong{Please note:} It is potentially deadly if you declare a +@samp{Q...} variable in two different modules. The two calls to +@code{defsymbol()} are no problem, but some linkers will complain about +multiply-defined symbols. The most insidious aspect of this is that +often the link will succeed anyway, but then the resulting executable +will sometimes crash in obscure ways during certain operations! + +To avoid this problem, declare any symbols with common names (such as +@code{text}) that are not obviously associated with this particular +module in the file @file{general-slots.h}. The ``-slots'' suffix +indicates that this is a file that is included multiple times in +@file{general.c}. Redefinition of preprocessor macros allows the +effects to be different in each context, so this is actually more +convenient and less error-prone than doing it in your module. + + Global variables whose names begin with @samp{V} are variables that +contain Lisp objects. 
The convention here is that all global variables +of type @code{Lisp_Object} begin with @samp{V}, and all others don't +(including integer and boolean variables that have Lisp +equivalents). Most of the time, these variables have equivalents in +Lisp, but some don't. Those that do are declared this way by a call to +@code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the +module. What this does is create a special @dfn{symbol-value-forward} +Lisp object that contains a pointer to the C variable, intern a symbol +whose name is as specified in the call to @code{DEFVAR_LISP()}, and set +its value to the symbol-value-forward Lisp object; it also calls +@code{staticpro()} on the C variable to tell the garbage-collection +mechanism about the variable. When @code{eval} (or actually +@code{symbol-value}) encounters this special object in the process of +retrieving a variable's value, it follows the indirection to the C +variable and gets its value. @code{setq} does similar things so that +the C variable gets changed. + + Whether or not you @code{DEFVAR_LISP()} a variable, you need to +initialize it in the @code{vars_of_*()} function; otherwise it will end +up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and +this is probably not what you want. Also, if the variable is not +@code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the +C variable in the @code{vars_of_*()} function. Otherwise, the +garbage-collection mechanism won't know that the object in this variable +is in use, and will happily collect it and reuse its storage for another +Lisp object, and you will be the one who's unhappy when you can't figure +out how your variable got overwritten. 
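The reason @code{staticpro()} takes the @emph{address} of the C variable, rather than its current value, can be seen in a toy model. The sketch below is an assumption-laden illustration, not the XEmacs implementation: @code{toy_object}, @code{toy_staticpro}, @code{toy_mark_roots} and the two @samp{Vtoy_} variables are all made-up names, and the ``marking'' is reduced to setting a flag.

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not the XEmacs implementation.  */
typedef struct toy_object { int marked; } toy_object;

#define TOY_MAX_ROOTS 64
static toy_object **toy_roots[TOY_MAX_ROOTS];  /* addresses of C variables */
static int toy_root_count;

/* Register the address of a global variable as a marking root.
   Returns 0 on success, -1 if the table is full.  */
static int
toy_staticpro (toy_object **varaddr)
{
  if (toy_root_count >= TOY_MAX_ROOTS)
    return -1;
  toy_roots[toy_root_count++] = varaddr;
  return 0;
}

/* Mark phase: follow each registered variable to whatever object it
   holds right now.  Returns the number of objects newly marked.  */
static int
toy_mark_roots (void)
{
  int i, n = 0;
  for (i = 0; i < toy_root_count; i++)
    {
      toy_object *obj = *toy_roots[i];
      if (obj != NULL && !obj->marked)
        {
          obj->marked = 1;
          n++;
        }
    }
  return n;
}

static toy_object *Vtoy_registered;  /* registered below: its object survives */
static toy_object *Vtoy_forgotten;   /* never registered: its object looks dead */
```

Because the table stores @code{&Vtoy_registered}, the variable may be assigned after registration and the collector still finds whatever object it holds at marking time. An object reachable only through @code{Vtoy_forgotten} is never marked, which is the toy equivalent of the ``variable got overwritten'' failure described above.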
+ +@node Writing Macros, Proper Use of Unsigned Types, Adding Global Lisp Variables, Rules When Writing New C Code +@section Writing Macros +@cindex writing macros +@cindex macros, writing + +The three golden rules of macros: + +@enumerate +@item +Anything that's an lvalue can be evaluated more than once. +@item +Macros where anything else can be evaluated more than once should +have the word "unsafe" in their name (exceptions may be made for +large sets of macros that evaluate arguments of certain types more +than once, e.g. struct buffer * arguments, when clearly indicated in +the macro documentation). These macros are generally meant to be +called only by other macros that have already stored the calling +values in temporary variables. +@item +Nothing else can be evaluated more than once. Use inline +functions, if necessary, to prevent multiple evaluation. +@end enumerate + +NOTE: The functions and macros below are given full prototypes in their +docs, even when the implementation is a macro. In such cases, passing +an argument of a type other than expected will produce undefined +results. Also, given that macros can do things functions can't (in +particular, directly modify arguments as if they were passed by +reference), the declaration syntax has been extended to include the +call-by-reference syntax from C++, where an & after a type indicates +that the argument is an lvalue and is passed by reference, i.e. the +function can modify its value. (This is equivalent in C to passing a +pointer to the argument, but without the need to explicitly worry about +pointers.) + +When to capitalize macros: + +@itemize @bullet +@item +Capitalize macros doing stuff obviously impossible with (C) +functions, e.g. directly modifying arguments as if they were passed by +reference. +@item +Capitalize macros that evaluate @strong{any} argument more than once regardless +of whether that's "allowed" (e.g. buffer arguments). 
+@item +Capitalize macros that directly access a field in a Lisp_Object or +its equivalent underlying structure. In such cases, access through the +Lisp_Object precedes the macro with an X, and access through the underlying +structure doesn't. +@item +Capitalize certain other basic macros relating to Lisp_Objects; e.g. +FRAMEP, CHECK_FRAME, etc. +@item +Try to avoid capitalizing any other macros. +@end itemize + +@node Proper Use of Unsigned Types, Techniques for XEmacs Developers, Writing Macros, Rules When Writing New C Code +@section Proper Use of Unsigned Types +@cindex unsigned types, proper use of +@cindex types, proper use of unsigned + +Avoid using @code{unsigned int} and @code{unsigned long} whenever +possible. Unsigned types are viral -- in arithmetic or comparisons that +mix signed and unsigned types of the same width, the signed operand is +automatically converted to unsigned, which is almost certainly not what +you want. Many subtle and +hard-to-find bugs are created by careless use of unsigned types. In +general, you should almost @emph{never} use an unsigned type to hold a +regular quantity of any sort. The only exceptions are + +@enumerate +@item +When there's a reasonable possibility you will actually need all 32 or +64 bits to store the quantity. +@item +When calling existing APIs that require unsigned types. In this case, +you should still do all manipulation using signed types, and do the +conversion at the very threshold of the API call. +@item +In existing code that you don't want to modify because you don't +maintain it. +@item +In bit-field structures. +@end enumerate + +Other reasonable uses of @code{unsigned int} and @code{unsigned long} +are representing non-quantities -- e.g. bit-oriented flags and such. 
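Two small cases show how unsigned quantities go wrong. The helper names below are hypothetical and exist only for this illustration; the behavior relies on the standard C rule that an @code{int} operand is converted to @code{unsigned int} when the two are mixed, and that unsigned subtraction wraps around instead of going negative.

```c
/* Hypothetical helpers, for illustration only.  */

/* Mixed comparison: INDEX is converted to unsigned int, so -1 becomes
   a huge value and the test is false for negative indices.  */
static int
below_limit_mixed (int index, unsigned int limit)
{
  return index < limit;
}

/* All-signed comparison: behaves the way the arithmetic reads.  */
static int
below_limit_signed (int index, int limit)
{
  return index < limit;
}

/* Unsigned subtraction: when USED exceeds CAPACITY the difference
   wraps around to a large positive number, so the check passes.  */
static int
has_room_unsigned (unsigned int capacity, unsigned int used,
                   unsigned int need)
{
  return capacity - used >= need;
}

/* Signed subtraction: the difference is negative, so the check fails.  */
static int
has_room_signed (int capacity, int used, int need)
{
  return capacity - used >= need;
}
```

For example, @code{below_limit_mixed (-1, 10)} is false although -1 is less than 10, and @code{has_room_unsigned (10, 12, 1)} reports room in a buffer that is already overfull; the signed versions give the intuitive answers.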
+ +@node Techniques for XEmacs Developers, , Proper Use of Unsigned Types, Rules When Writing New C Code +@section Techniques for XEmacs Developers +@cindex techniques for XEmacs developers +@cindex developers, techniques for XEmacs + +@cindex Purify +@cindex Quantify +To make a purified XEmacs, do: @code{make puremacs}. +To make a quantified XEmacs, do: @code{make quantmacs}. + +You simply can't dump Quantified and Purified images (unless using the +portable dumper). Purify gets confused when xemacs frees memory in one +process that was allocated in a @emph{different} process on a different +machine! Run it like so: +@example +temacs -batch -l loadup.el run-temacs @var{xemacs-args...} +@end example + +@cindex error checking +Before you go through the trouble, are you compiling with all +debugging and error-checking off? If not, try that first. Be warned +that while Quantify is directly responsible for quite a few +optimizations which have been made to XEmacs, doing a run which +generates results which can be acted upon is not necessarily a trivial +task. + +Also, if you're still willing to do some runs make sure you configure +with the @samp{--quantify} flag. That will keep Quantify from starting +to record data until after the loadup is completed and will shut off +recording right before it shuts down (which generates enough bogus data +to throw most results off). It also enables three additional elisp +commands: @code{quantify-start-recording-data}, +@code{quantify-stop-recording-data} and @code{quantify-clear-data}. + +If you want to make XEmacs faster, target your favorite slow benchmark, +run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure +out where the cycles are going. In many cases you can localize the +problem (because a particular new feature or even a single patch +elicited it). 
Don't hesitate to use brute force techniques like a +global counter incremented at strategic places, especially in +combination with other performance indications (@emph{e.g.}, degree of +buffer fragmentation into extents). + +Specific projects: + +@itemize @bullet +@item +Make the garbage collector faster. Figure out how to write an +incremental garbage collector. +@item +Write a compiler that takes bytecode and spits out C code. +Unfortunately, you will then need a C compiler and a more fully +developed module system. +@item +Speed up redisplay. +@item +Speed up syntax highlighting. It was suggested that ``maybe moving some +of the syntax highlighting capabilities into C would make a +difference.'' Wrong idea, I think. When processing one 400kB file a +particular low-level routine was being called 40 @emph{million} times +simply for @emph{one} call to @code{newline-and-indent}. Syntax +highlighting needs to be rewritten to use a reliable, fast parser, then +to trust the pre-parsed structure, and only do re-highlighting locally +to a text change. Modern machines are fast enough to implement such +parsers in Lisp; but no machine will ever be fast enough to deal with +quadratic (or worse) algorithms! +@item +Implement tail recursion in Emacs Lisp (hard!). +@end itemize + +Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function +calls in elisp are especially expensive. Iterating over a long list is +going to be 30 times faster implemented in C than in Elisp. + +Heavily used small code fragments need to be fast. The traditional way +to implement such code fragments in C is with macros. But macros in C +are known to be broken. + +@cindex macro hygiene +Macro arguments that are repeatedly evaluated may suffer from repeated +side effects or suboptimal performance. + +Variable names used in macros may collide with caller's variables, +causing (at least) unwanted compiler warnings. 
+ +In order to solve these problems, and maintain statement semantics, one +should use the @code{do @{ ... @} while (0)} trick while trying to +reference macro arguments exactly once using local variables. + +Let's take a look at this poor macro definition: + +@example +#define MARK_OBJECT(obj) \ + if (!marked_p (obj)) mark_object (obj), did_mark = 1 +@end example + +This macro evaluates its argument twice, and also misbehaves if used like +this, because the @code{else} binds to the macro's own @code{if}: +@example + if (flag) MARK_OBJECT (obj); else do_something (); +@end example + +A much better definition is + +@example +#define MARK_OBJECT(obj) do @{ \ + Lisp_Object mo_obj = (obj); \ + if (!marked_p (mo_obj)) \ + @{ \ + mark_object (mo_obj); \ + did_mark = 1; \ + @} \ +@} while (0) +@end example + +Notice the elimination of double evaluation by using the local variable +with the obscure name. Writing safe and efficient macros requires great +care. The one problem with macros that cannot be portably worked around +is that, since a C block has no value, a macro used as an expression rather +than a statement cannot use the techniques just described to avoid +multiple evaluation. + +@cindex inline functions +In most cases where a macro has function semantics, an inline function +is a better implementation technique. Modern compiler optimizers tend +to inline functions even if they have no @code{inline} keyword, and +configure magic ensures that the @code{inline} keyword can be safely +used as an additional compiler hint. Inline functions used in a single +@file{.c} file are easy. The function must already be defined to be +@code{static}. Just add another @code{inline} keyword to the +definition. + +@example +inline static int +heavily_used_small_function (int arg) +@{ + ... 
+@} +@end example + +Inline functions in header files are trickier, because we would like to +make the following optimization if the function is @emph{not} inlined +(for example, because we're compiling for debugging). 
We would like the +function to be defined externally exactly once, and each calling +translation unit would create an external reference to the function, +instead of including a definition of the inline function in the object +code of every translation unit that uses it. This optimization is +currently only available for gcc. But you don't have to worry about the +trickiness; just define your inline functions in header files using this +pattern: + +@example +DECLARE_INLINE_HEADER ( +int +i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg) +) +@{ + ... +@} +@end example + +We use @code{DECLARE_INLINE_HEADER} rather than just the modifier +@code{INLINE_HEADER} to prevent warnings when compiling with @code{gcc +-Wmissing-declarations}. I consider issuing this warning for inline +functions a gcc bug, but the gcc maintainers disagree. + +@cindex inline functions, headers +@cindex header files, inline functions +Every header which contains inline functions, either directly by using +@code{DECLARE_INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must +be added to @file{inline.c}'s includes to make the optimization +described above work. (Optimization note: if all INLINE_HEADER +functions are in fact inlined in all translation units, then the linker +can just discard @code{inline.o}, since it contains only unreferenced code). + +To get started debugging XEmacs, take a look at the @file{.gdbinit} and +@file{.dbxrc} files in the @file{src} directory. See the section in the +XEmacs FAQ on How to Debug an XEmacs problem with a debugger. + +After making source code changes, run @code{make check} to ensure that +you haven't introduced any regressions. If you want to make xemacs more +reliable, please improve the test suite in @file{tests/automated}. + +Did you make sure you didn't introduce any new compiler warnings? 
+ +Before submitting a patch, please try compiling at least once with + +@example +configure --with-mule --use-union-type --error-checking=all +@end example + +Here are things to know when you create a new source file: + +@itemize @bullet +@item +All @file{.c} files should @code{#include <config.h>} first. Almost all +@file{.c} files should @code{#include "lisp.h"} second. + +@item +Generated header files should be included using the @samp{#include <...>} +syntax, not the @samp{#include "..."} syntax. The generated headers are: + +@file{config.h sheap-adjust.h paths.h Emacs.ad.h} + +The basic rule is that you should assume builds using @samp{--srcdir}, +and that the @samp{#include <...>} syntax needs to be used when the +to-be-included generated file is in a potentially different directory +@emph{at compile time}. The non-obvious C rule is that +@samp{#include "..."} means to search for the included file in the same +directory as the including file, @emph{not} in the current directory. +Normally this is not a problem, but when building with @samp{--srcdir}, +@file{make} will search the @samp{VPATH} for you, while the C compiler +knows nothing about it. + +@item +Header files should @emph{not} include @samp{<config.h>} or +@samp{"lisp.h"}. It is the responsibility of the @file{.c} files that +use them to do so. + +@end itemize + +@cindex Lisp object types, creating +@cindex creating Lisp object types +@cindex object types, creating Lisp +Here is a checklist of things to do when creating a new Lisp object type +named @var{foo}: + +@enumerate +@item +create @var{foo}.h +@item +create @var{foo}.c +@item +add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c} +@item +add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h} +@item +add calls to @code{syms_of_@var{foo}}, etc. 
to @file{emacs.c} +@item +add definitions of macros like @code{CHECK_@var{FOO}} and +@code{@var{FOO}P} to @file{@var{foo}.h} +@item +add the new type index to @code{enum lrecord_type} +@item +add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c} +@item +add an @code{INIT_LRECORD_IMPLEMENTATION} call to @code{syms_of_@var{foo}} in @file{@var{foo}.c} +@end enumerate + +@node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top +@chapter Regression Testing XEmacs +@cindex testing, regression + +@menu +* How to Regression-Test:: +* Modules for Regression Testing:: +@end menu + +@node How to Regression-Test, Modules for Regression Testing, Regression Testing XEmacs, Regression Testing XEmacs +@section How to Regression-Test +@cindex how to regression-test +@cindex regression-test, how to +@cindex testing, regression, how to + +The source directory @file{tests/automated} contains XEmacs' automated +test suite. The usual way of running all the tests is running +@code{make check} from the top-level build directory. + +The test suite is unfinished and it's still lacking some essential +features. It is nevertheless recommended that you run the tests to +confirm that XEmacs behaves correctly. + +If you want to run a specific test case, you can do it from the +command line like this: + +@example +$ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE +@end example + +If a test fails and you need more information, you can run the test +suite interactively by loading @file{test-harness.el} into a running +XEmacs and typing @kbd{M-x test-emacs-test-file RET <filename> RET}. +You will see a log of passed and failed tests, which should allow you to +investigate the source of the error and ultimately fix the bug. If you +are not capable of, or don't have time for, debugging it yourself, +please do report the failures using @kbd{M-x report-emacs-bug} or +@kbd{M-x build-report}. + +@deffn Command test-emacs-test-file file +Runs the tests in @var{file}. 
@file{test-harness.el} must be loaded. +Defines all the macros described in this node, and undefines them when +done. +@end deffn + +Adding a new test file is trivial: just create a new file here and it +will be run. There is no need to byte-compile any of the files in +this directory---the test-harness will take care of any necessary +byte-compilation. + +Look at the existing test cases for the examples of coding test cases. +It all boils down to your imagination and judicious use of the macros +@code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and +@code{Check-Message}. Note that all of these macros are defined only +for the duration of the test: they do not exist in the global +environment. + +@deffn Macro Assert expr +Check that @var{expr} is non-nil at this point in the test. +@end deffn + +@deffn Macro Check-Error expected-error body +Check that execution of @var{body} causes @var{expected-error} to be +signaled. @var{body} is a @code{progn}-like body, and may contain +several expressions. @var{expected-error} is a symbol defined as +an error by @code{define-error}. +@end deffn + +@deffn Macro Check-Error-Message expected-error expected-error-regexp body +Check that execution of @var{body} causes @var{expected-error} to be +signaled, and generate a message matching @var{expected-error-regexp}. +@var{body} is a @code{progn}-like body, and may contain several +expressions. @var{expected-error} is a symbol defined as an error +by @code{define-error}. +@end deffn + +@deffn Macro Check-Message expected-message body +Check that execution of @var{body} causes @var{expected-message} to be +generated (using @code{message} or a similar function). @var{body} is a +@code{progn}-like body, and may contain several expressions. +@end deffn + +Here's a simple example checking case-sensitive and case-insensitive +comparisons from @file{case-tests.el}. 
+ +@example +(with-temp-buffer + (insert "Test Buffer") + (let ((case-fold-search t)) + (goto-char (point-min)) + (Assert (eq (search-forward "test buffer" nil t) 12)) + (goto-char (point-min)) + (Assert (eq (search-forward "Test buffer" nil t) 12)) + (goto-char (point-min)) + (Assert (eq (search-forward "Test Buffer" nil t) 12)) + + (setq case-fold-search nil) + (goto-char (point-min)) + (Assert (not (search-forward "test buffer" nil t))) + (goto-char (point-min)) + (Assert (not (search-forward "Test buffer" nil t))) + (goto-char (point-min)) + (Assert (eq (search-forward "Test Buffer" nil t) 12)))) +@end example + +This example could be saved in a file in @file{tests/automated}, and it +would constitute a complete test, automatically executed when you run +@kbd{make check} after building XEmacs. More complex tests may require +substantial temporary scaffolding to create the environment that elicits +the bugs, but the top-level @file{Makefile} and @file{test-harness.el} +handle the running and collection of results from the @code{Assert}, +@code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message} +macros. + +Don't suppress tests just because they're due to known bugs not yet +fixed---use the @code{Known-Bug-Expect-Failure} wrapper macro to mark +them. + +@deffn Macro Known-Bug-Expect-Failure body +Arrange for failing tests in @var{body} to generate messages prefixed +with "KNOWN BUG:" instead of "FAIL:". @var{body} is a @code{progn}-like +body, and may contain several tests. +@end deffn + +A lot of the tests we run push limits; suppress Ebola warning messages +with the @code{Ignore-Ebola} wrapper macro. + +@deffn Macro Ignore-Ebola body +Suppress Ebola warning messages while running tests in @var{body}. +@var{body} is a @code{progn}-like body, and may contain several tests. +@end deffn + +Both macros are defined temporarily within the test function. Simple +examples: + +@example +;; Apparently Ignore-Ebola is a solution with no problem to address. 
+;; There are no examples in 21.5, anyway. + +;; from regexp-tests.el +(Known-Bug-Expect-Failure + (Assert (not (string-match "\\b" ""))) + (Assert (not (string-match " \\b" " ")))) +@end example + +In general, you should avoid using functionality from packages in your +tests, because you can't be sure that everyone will have the required +package. However, if you've got a test that works, by all means add it. +Simply wrap the test in an appropriate test, add a notice that the test +was skipped, and update the @code{skipped-test-reasons} hashtable. The +wrapper macro @code{Skip-Test-Unless} is provided to handle common +cases. + +@defvar skipped-test-reasons +Hash table counting the number of times a particular reason is given for +skipping tests. This is only defined within @code{test-emacs-test-file}. +@end defvar + +@deffn Macro Skip-Test-Unless prerequisite reason description body +@var{prerequisite} is usually a feature test (@code{featurep}, +@code{boundp}, @code{fboundp}). @var{reason} is a string describing the +prerequisite; it must be unique because it is used as a hash key in a +table of reasons for skipping tests. @var{description} describes the +tests being skipped, for the test result summary. @var{body} is a +@code{progn}-like body, and may contain several tests. +@end deffn + +@code{Skip-Test-Unless} is defined temporarily within the test function. 
+Here's an example of usage from @file{syntax-tests.el}: + +@example +;; Test forward-comment at buffer boundaries +(with-temp-buffer + ;; try to use exactly what you need: featurep, boundp, fboundp + (Skip-Test-Unless (fboundp 'c-mode) + "c-mode unavailable" + "comment and parse-partial-sexp tests" + ;; and here's the test code + (c-mode) + (insert "// comment\n") + (forward-comment -2) + (Assert (eq (point) (point-min))) + (let ((point (point))) + (insert "/* comment */") + (goto-char point) + (forward-comment 2) + (Assert (eq (point) (point-max))) + (parse-partial-sexp point (point-max))))) +@end example + +@code{Skip-Test-Unless} is intended for use with features that are normally +present in typical configurations. For truly optional features, or +tests that apply to one of several alternative implementations (e.g., to +GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply +silently suppress the test if the feature is not available. + +Here are a few general hints for writing tests. + +@enumerate +@item +Include related successful cases. Fixes often break something. + +@item +Use the @code{Known-Bug-Expect-Failure} macro to mark the cases you know +are going to fail. We want to be able to distinguish between +regressions and other unexpected failures, and cases that have +been (partially) analyzed but not yet repaired. + +@item +Mark the bug with the date of report. An ``Unfixed since yyyy-mm-dd'' +gloss for @code{Known-Bug-Expect-Failure} is planned to further increase +developer embarrassment (== incentive to fix the bug), but until then at +least put a comment about the date so we can easily see when it was +first reported. + +@item +It's a matter of your judgement, but you should often use generic tests +(@emph{e.g.}, @code{eq}) instead of more specific tests (@code{=} for +numbers) even though you know that arguments ``should'' be of the correct +type. 
That is, if the functions used can return generic objects +(typically @code{nil}), as well as some more specific type that will be +returned on success. We don't want failures of those assertions +reported as ``other failures'' (a wrong-type-arg signal, rather than a +null return), we want them reported as ``assertion failures.'' + +One example is a test that tests @code{(= (string-match this that) 0)}, +expecting a successful match. Now suppose @code{string-match} is broken +such that the match fails. Then it will return @code{nil}, and @code{=} +will signal ``wrong-type-argument, number-char-or-marker-p, nil'', +generating an ``other failure'' in the report. But this should be +reported as an assertion failure (the test failed in a foreseeable way), +rather than something else (we don't know what happened because XEmacs +is broken in a way that we weren't trying to test!) +@end enumerate + +@node Modules for Regression Testing, , How to Regression-Test, Regression Testing XEmacs +@section Modules for Regression Testing +@cindex modules for regression testing +@cindex regression testing, modules for + +@example +@file{test-harness.el} +@file{base64-tests.el} +@file{byte-compiler-tests.el} +@file{case-tests.el} +@file{ccl-tests.el} +@file{c-tests.el} +@file{database-tests.el} +@file{extent-tests.el} +@file{hash-table-tests.el} +@file{lisp-tests.el} +@file{md5-tests.el} +@file{mule-tests.el} +@file{regexp-tests.el} +@file{symbol-tests.el} +@file{syntax-tests.el} +@file{tag-tests.el} +@file{weak-tests.el} +@end example + +@file{test-harness.el} defines the macros @code{Assert}, +@code{Check-Error}, @code{Check-Error-Message}, and +@code{Check-Message}. The other files are test files, testing various +XEmacs facilities. @xref{Regression Testing XEmacs}. 
+ + +@node CVS Techniques, XEmacs from the Inside, Regression Testing XEmacs, Top +@chapter CVS Techniques +@cindex CVS techniques + +@menu +* Merging a Branch into the Trunk:: +@end menu + +@node Merging a Branch into the Trunk, , CVS Techniques, CVS Techniques +@section Merging a Branch into the Trunk +@cindex merging a branch into the trunk + +@enumerate +@item +If you haven't already done a merge, you will be merging from the branch +point; otherwise you'll be merging from the last merge point, which +should be marked by a tag, e.g. @samp{last-sync-ben-mule-21-5}. In the +former case, create the last-sync tag, e.g. + +@example +crw rtag -r ben-mule-21-5-bp last-sync-ben-mule-21-5 xemacs +@end example + +(You did create a branch point tag when you created the branch, didn't +you?) + +@item +Check everything in on your branch. + +@item +Tag your branch with a pre-sync tag, e.g. + +@example +crw rtag -r ben-mule-21-5 ben-mule-21-5-pre-feb-20-2002-sync xemacs +@end example + +Note, you need to use rtag and specify a version with @samp{-r} (use +@samp{-r HEAD} if necessary) so that removed files are handled correctly +in some obscure cases. See section 4.8 of the CVS manual. + +@item +Tag the trunk so you have a stable place to merge up to in case people +are asynchronously committing to the trunk, e.g. + +@example +crw rtag -r HEAD main-branch-ben-mule-21-5-syncpoint-feb-20-2002 xemacs +crw rtag -F -r main-branch-ben-mule-21-5-syncpoint-feb-20-2002 next-sync-ben-mule-21-5 xemacs +@end example + +Use -F in the second case because the name might already exist, e.g. if +you've already done a merge. We make two tags because one is a +permanent mark indicating a syncpoint when merging, and the other is a +symbolic tag to make other operations easier. + +@item +Make a backup of your source tree (not totally necessary but useful for +reference and peace of mind): Move one level up from the top directory +of your branch and do, e.g. 
+ +@example +cp -a mule mule-backup-2-23-02 +@end example + +@item +Now, we're ready to merge! Make sure you're in the top directory of +your branch and do, e.g. + +@example +cvs update -j last-sync-ben-mule-21-5 -j next-sync-ben-mule-21-5 +@end example + +@item +Fix all merge conflicts. Get the sucker to compile and run. + +@item +Tag your branch with a post-sync tag, e.g. + +@example +crw rtag -r ben-mule-21-5 ben-mule-21-5-post-feb-20-2002-sync xemacs +@end example + +@item +Update the last-sync tag, e.g. + +@example +crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs +@end example +@end enumerate + + +@node XEmacs from the Inside, The XEmacs Object System (Abstractly Speaking), CVS Techniques, Top +@chapter XEmacs from the Inside +@cindex XEmacs from the inside +@cindex inside, XEmacs from the + +Internally, XEmacs is quite complex, and can be very confusing. To +simplify things, it can be useful to think of XEmacs as containing an +event loop that ``drives'' everything, and a number of other subsystems, +such as a Lisp engine and a redisplay mechanism. Each of these other +subsystems exists simultaneously in XEmacs, and each has a certain +state. The flow of control continually passes in and out of these +different subsystems in the course of normal operation of the editor. + +It is important to keep in mind that, most of the time, the editor is +``driven'' by the event loop. Except during initialization and batch +mode, all subsystems are entered directly or indirectly through the +event loop, and ultimately, control exits out of all subsystems back up +to the event loop. This cycle of entering a subsystem, exiting back out +to the event loop, and starting another iteration of the event loop +occurs once each keystroke, mouse motion, etc. 
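As a rough illustration of this cycle, here is a toy model in C of an event loop that hands each event to a subsystem and then regains control. It is only a sketch, not XEmacs code: the names @code{toy_event}, @code{toy_state}, @code{toy_dispatch}, and @code{toy_event_loop} are hypothetical and do not appear in the XEmacs sources, and real XEmacs events are Lisp objects rather than plain structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, for illustration only. */
enum toy_event_type { TOY_KEYSTROKE, TOY_MOUSE_MOTION, TOY_QUIT };

struct toy_event
{
  enum toy_event_type type;
  int data;
};

/* Persistent state of the "subsystems", reduced to counters. */
struct toy_state
{
  int keys_handled;
  int motions_handled;
  int iterations;
};

/* Enter the subsystem responsible for EV, then return to the loop. */
static void
toy_dispatch (struct toy_state *st, const struct toy_event *ev)
{
  switch (ev->type)
    {
    case TOY_KEYSTROKE:
      st->keys_handled++;
      break;
    case TOY_MOUSE_MOTION:
      st->motions_handled++;
      break;
    case TOY_QUIT:
      break;
    }
}

/* One iteration per event: fetch it, dispatch it, come back.
   Stops at TOY_QUIT or when the queue is exhausted. */
static void
toy_event_loop (struct toy_state *st, const struct toy_event *queue,
                size_t n)
{
  size_t i;

  assert (st != NULL && queue != NULL);
  for (i = 0; i < n; i++)
    {
      st->iterations++;
      if (queue[i].type == TOY_QUIT)
        break;
      toy_dispatch (st, &queue[i]);
    }
}
```

Control never stays inside a subsystem here: every call to @code{toy_dispatch} returns to @code{toy_event_loop} before the next event is examined, which is the property the paragraph above describes.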
+ +If you're trying to understand a particular subsystem (other than the +event loop), think of it as a ``daemon'' process or ``servant'' that is +responsible for one particular aspect of a larger system, and +periodically receives commands or environment changes that cause it to +do something. Ultimately, these commands and environment changes are +always triggered by the event loop. For example: + +@itemize @bullet +@item +The window and frame mechanism is responsible for keeping track of what +windows and frames exist, what buffers are in them, etc. It is +periodically given commands (usually from the user) to make a change to +the current window/frame state: i.e. create a new frame, delete a +window, etc. + +@item +The buffer mechanism is responsible for keeping track of what buffers +exist and what text is in them. It is periodically given commands +(usually from the user) to insert or delete text, create a buffer, etc. +When it receives a text-change command, it notifies the redisplay +mechanism. + +@item +The redisplay mechanism is responsible for making sure that windows and +frames are displayed correctly. It is periodically told (by the event +loop) to actually ``do its job'', i.e. snoop around and see what the +current state of the environment (mostly of the currently-existing +windows, frames, and buffers) is, and make sure that state matches +what's actually displayed. It keeps lots and lots of information around +(such as what is actually being displayed currently, and what the +environment was last time it checked) so that it can minimize the work +it has to do. It is also helped along in that whenever a relevant +change to the environment occurs, the redisplay mechanism is told about +this, so it has a pretty good idea of where it has to look to find +possible changes and doesn't have to look everywhere. + +@item +The Lisp engine is responsible for executing the Lisp code in which most +user commands are written. 
It is entered through a call to @code{eval} +or @code{funcall}, which occurs as a result of dispatching an event from +the event loop. The functions it calls issue commands to the buffer +mechanism, the window/frame subsystem, etc. + +@item +The Lisp allocation subsystem is responsible for keeping track of Lisp +objects. It is given commands from the Lisp engine to allocate objects, +garbage collect, etc. +@end itemize + +etc. + + The important idea here is that there are a number of independent +subsystems each with its own responsibility and persistent state, just +like different employees in a company, and each subsystem is +periodically given commands from other subsystems. Commands can flow +from any one subsystem to any other, but there is usually some sort of +hierarchy, with all commands originating from the event subsystem. + + XEmacs is entered in @code{main()}, which is in @file{emacs.c}. When +this is called the first time (in a properly-invoked @file{temacs}), it +does the following: + +@enumerate +@item +It does some very basic environment initializations, such as determining +where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside +and setting up signal handlers. +@item +It initializes the entire Lisp interpreter. +@item +It sets the initial values of many built-in variables (including many +variables that are visible to Lisp programs), such as the global keymap +object and the built-in faces (a face is an object that describes the +display characteristics of text). This involves creating Lisp objects +and thus is dependent on step (2). +@item +It performs various other initializations that are relevant to the +particular environment it is running in, such as retrieving environment +variables, determining the current date and the user who is running the +program, examining its standard input, creating any necessary file +descriptors, etc. +@item +At this point, the C initialization is complete. 
A Lisp program that +was specified on the command line (usually @file{loadup.el}) is called +(temacs is normally invoked as @code{temacs -batch -l loadup.el dump}). +@file{loadup.el} loads all of the other Lisp files that are needed for +the operation of the editor, calls the @code{dump-emacs} function to +write out @file{xemacs}, and then kills the temacs process. +@end enumerate + + When @file{xemacs} is then run, it only redoes steps (1) and (4) +above; all variables already contain the values they were set to when +the executable was dumped, and all memory that was allocated with +@code{malloc()} is still around. (XEmacs knows whether it is being run +as @file{xemacs} or @file{temacs} because it sets the global variable +@code{initialized} to 1 after step (4) above.) At this point, +@file{xemacs} calls a Lisp function to do any further initialization, +which includes parsing the command-line (the C code can only do limited +command-line parsing, which includes looking for the @samp{-batch} and +@samp{-l} flags and a few other flags that it needs to know about before +initialization is complete), creating the first frame (or @dfn{window} +in standard window-system parlance), running the user's init file +(usually the file @file{.emacs} in the user's home directory), etc. The +function to do this is usually called @code{normal-top-level}; +@file{loadup.el} tells the C code about this function by setting its +name as the value of the Lisp variable @code{top-level}. + + When the Lisp initialization code is done, the C code enters the event +loop, and stays there for the duration of the XEmacs process. The code +for the event loop is contained in @file{cmdloop.c}, and is called +@code{Fcommand_loop_1()}. Note that this event loop could very well be +written in Lisp, and in fact a Lisp version exists; but apparently, +doing this makes XEmacs run noticeably slower. + + Notice how much of the initialization is done in Lisp, not in C. 
+In general, XEmacs tries to move as much code as is possible +into Lisp. Code that remains in C is code that implements the +Lisp interpreter itself, or code that needs to be very fast, or +code that needs to do system calls or other such stuff that +needs to be done in C, or code that needs to have access to +``forbidden'' structures. (One conscious aspect of the design of +Lisp under XEmacs is a clean separation between the external +interface to a Lisp object's functionality and its internal +implementation. Part of this design is that Lisp programs +are forbidden from accessing the contents of the object other +than through using a standard API. In this respect, XEmacs Lisp +is similar to modern Lisp dialects but differs from GNU Emacs, +which tends to expose the implementation and allow Lisp +programs to look at it directly. The major advantage of +hiding the implementation is that it allows the implementation +to be redesigned without affecting any Lisp programs, including +those that might want to be ``clever'' by looking directly at +the object's contents and possibly manipulating them.) + + Moving code into Lisp makes the code easier to debug and maintain and +makes it much easier for people who are not XEmacs developers to +customize XEmacs, because they can make a change with much less chance +of obscure and unwanted interactions occurring than if they were to +change the C code. + +@node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs from the Inside, Top +@chapter The XEmacs Object System (Abstractly Speaking) +@cindex XEmacs object system (abstractly speaking), the +@cindex object system (abstractly speaking), the XEmacs + + At the heart of the Lisp interpreter is its management of objects. +XEmacs Lisp contains many built-in objects, some of which are +simple and others of which can be very complex; and some of which +are very common, and others of which are rarely used or are only +used internally. 
(Since the Lisp allocation system, with its +automatic reclamation of unused storage, is so much more convenient +than @code{malloc()} and @code{free()}, the C code makes extensive use of it +in its internal operations.) + + The basic Lisp objects are + +@table @code +@item integer +31 bits of precision, or 63 bits on 64-bit machines; the +reason for this is described below when the internal Lisp object +representation is described. +@item char +An object representing a single character of text; chars behave like +integers in many ways but are logically considered text rather than +numbers and have a different read syntax. (the read syntax for a char +contains the char itself or some textual encoding of it---for example, +a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the +ISO-2022 encoding standard---rather than the numerical representation +of the char; this way, if the mapping between chars and integers +changes, which is quite possible for Kanji characters and other extended +characters, the same character will still be created. Note that some +primitives confuse chars and integers. The worst culprit is @code{eq}, +which makes a special exception and considers a char to be @code{eq} to +its integer equivalent, even though in no other case are objects of two +different types @code{eq}. The reason for this monstrosity is +compatibility with existing code; the separation of char from integer +came fairly recently.) +@item float +Same precision as a double in C. +@item bignum +@itemx ratio +@itemx bigfloat +As build-time options, arbitrary-precision numbers are available. +Bignums are integers, and when available they remove the restriction on +buffer size. Ratios are non-integral rational numbers. Bigfloats are +arbitrary-precision floating point numbers, with precision specified at +runtime. 
+@item symbol +An object that contains Lisp objects and is referred to by name; +symbols are used to implement variables and named functions +and to provide the equivalent of preprocessor constants in C. +@item string +Self-explanatory; behaves much like a vector of chars +but has a different read syntax and is stored and manipulated +more compactly. +@item bit-vector +A vector of bits; similar to a string in spirit. +@item vector +A one-dimensional array of Lisp objects providing constant-time access +to any of the objects; access to an arbitrary object in a vector is +faster than for lists, but the operations that can be done on a vector +are more limited. +@item compiled-function +An object containing compiled Lisp code, known as @dfn{byte code}. +@item subr +A Lisp primitive, i.e. a Lisp-callable function implemented in C. +@item cons +A simple container for two Lisp objects, used to implement lists and +most other data structures in Lisp. +@end table + +Objects which are not conses are called atoms. + +@cindex closure +Note that there is no basic ``function'' type, as in more powerful +versions of Lisp (where it's called a @dfn{closure}). XEmacs Lisp does +not provide the closure semantics implemented by Common Lisp and Scheme. +The guts of a function in XEmacs Lisp are represented in one of four +ways: a symbol specifying another function (when one function is an +alias for another), a list (whose first element must be the symbol +@code{lambda}) containing the function's source code, a +compiled-function object, or a subr object. (In other words, given a +symbol specifying the name of a function, calling @code{symbol-function} +to retrieve the contents of the symbol's function cell will return one +of these types of objects.) 
+ +XEmacs Lisp also contains numerous specialized objects used to implement +the editor: + +@table @code +@item buffer +Stores text like a string, but is optimized for insertion and deletion +and has certain other properties that can be set. +@item frame +An object with various properties whose displayable representation is a +@dfn{window} in window-system parlance. +@item window +A section of a frame that displays the contents of a buffer; +often called a @dfn{pane} in window-system parlance. +@item window-configuration +An object that represents a saved configuration of windows in a frame. +@item device +An object representing a screen on which frames can be displayed; +equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in +character mode. +@item face +An object specifying the appearance of text or graphics; it has +properties such as font, foreground color, and background color. +@item marker +An object that refers to a particular position in a buffer and moves +around as text is inserted and deleted to stay in the same relative +position to the text around it. +@item extent +Similar to a marker but covers a range of text in a buffer; can also +specify properties of the text, such as a face in which the text is to +be displayed, whether the text is invisible or unmodifiable, etc. +@item event +Generated by calling @code{next-event} and contains information +describing a particular event happening in the system, such as the user +pressing a key or a process terminating. +@item keymap +An object that maps from events (described using lists, vectors, and +symbols rather than with an event object because the mapping is for +classes of events, rather than individual events) to functions to +execute or other events to recursively look up; the functions are +described by name, using a symbol, or using lists to specify the +function's code. +@item glyph +An object that describes the appearance of an image (e.g. 
pixmap) on +the screen; glyphs can be attached to the beginning or end of extents +and in some future version of XEmacs will be able to be inserted +directly into a buffer. +@item process +An object that describes a connection to an externally-running process. +@end table + + There are some other, less-commonly-encountered general objects: + +@table @code +@item hash-table +An object that maps from an arbitrary Lisp object to another arbitrary +Lisp object, using hashing for fast lookup. +@item obarray +A limited form of hash-table that maps from strings to symbols; obarrays +are used to look up a symbol given its name and are not actually their +own object type but are kludgily represented using vectors with hidden +fields (this representation derives from GNU Emacs). +@item specifier +A complex object used to specify the value of a display property; a +default value is given and different values can be specified for +particular frames, buffers, windows, devices, or classes of device. +@item char-table +An object that maps from chars or classes of chars to arbitrary Lisp +objects; internally char tables use a complex nested-vector +representation that is optimized to the way characters are represented +as integers. +@item range-table +An object that maps from ranges of integers to arbitrary Lisp objects. +@end table + + And some strange special-purpose objects: + +@table @code +@item charset +@itemx coding-system +Objects used when MULE, or multi-lingual/Asian-language, support is +enabled. +@item color-instance +@itemx font-instance +@itemx image-instance +An object that encapsulates a window-system resource; instances are +mostly used internally but are exposed on the Lisp level for cleanness +of the specifier model and because it's occasionally useful for Lisp +programs to create or query the properties of instances. +@item subwindow +An object that encapsulates a @dfn{subwindow} resource, i.e. 
a +window-system child window that is drawn into by an external process; +this object should be integrated into the glyph system but isn't yet, +and may change form when this is done. +@item tooltalk-message +@itemx tooltalk-pattern +Objects that represent resources used in the ToolTalk interprocess +communication protocol. +@item toolbar-button +An object used in conjunction with the toolbar. +@end table + + And objects that are only used internally: + +@table @code +@item opaque +A generic object for encapsulating arbitrary memory; this allows you the +generality of @code{malloc()} and the convenience of the Lisp object +system. +@item lstream +A buffering I/O stream, used to provide a unified interface to anything +that can accept output or provide input, such as a file descriptor, a +stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.; +it's a Lisp object to make its memory management more convenient. +@item char-table-entry +Subsidiary objects in the internal char-table representation. +@item extent-auxiliary +@itemx menubar-data +@itemx toolbar-data +Various special-purpose objects that are basically just used to +encapsulate memory for particular subsystems, similar to the more +general ``opaque'' object. +@item symbol-value-forward +@itemx symbol-value-buffer-local +@itemx symbol-value-varalias +@itemx symbol-value-lisp-magic +Special internal-only objects that are placed in the value cell of a +symbol to indicate that there is something special with this variable -- +e.g. it has no value, it mirrors another variable, or it mirrors some C +variable; there is really only one kind of object, called a +@dfn{symbol-value-magic}, but it is sort-of halfway kludged into +semi-different object types. 
+@end table + +@cindex permanent objects +@cindex temporary objects + Some types of objects are @dfn{permanent}, meaning that once created, +they do not disappear until explicitly destroyed, using a function such +as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc. +Others will disappear once they are no longer used, through the garbage +collection mechanism. Buffers, frames, windows, devices, and processes +are among the objects that are permanent. Note that some objects can go +both ways: Faces can be created either way; extents are normally +permanent, but detached extents (extents not referring to any text, as +happens to some extents when the text they are referring to is deleted) +are temporary. Note that some permanent objects, such as faces and +coding systems, cannot be deleted. Note also that windows are unique in +that they can be @emph{undeleted} after having previously been +deleted. (This happens as a result of restoring a window configuration.) + +@cindex read syntax + Many types of objects have a @dfn{read syntax}, i.e. a way of +specifying an object of that type in Lisp code. When you load a Lisp +file, or type in code to be evaluated, what really happens is that the +function @code{read} is called, which reads some text and creates an object +based on the syntax of that text; then @code{eval} is called, which +possibly does something special; then this loop repeats until there's +no more text to read. (@code{eval} only actually does something special +with symbols, which causes the symbol's value to be returned, +similar to referencing a variable; and with conses [i.e. lists], +which cause a function invocation. All other values are returned +unchanged.) + + The read syntax + +@example +17297 +@end example + +converts to an integer whose value is 17297. 
+ +@example +355/113 +@end example + +converts to a ratio commonly used to approximate @emph{pi} when ratios +are configured, and otherwise to a symbol whose name is ``355/113'' (for +backward compatibility). + +@example +1.983e-4 +@end example + +converts to a float whose value is 1.983e-4, or .0001983. + +@example +?b +@end example + +converts to a char that represents the lowercase letter b. + +@example +?^[$(B#&^[(B +@end example + +(where @samp{^[} actually is an @samp{ESC} character) converts to a +particular Kanji character when using an ISO2022-based coding system for +input. (To decode this goo: @samp{ESC} begins an escape sequence; +@samp{ESC $ (} is a class of escape sequences meaning ``switch to a +94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese +Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array +of characters [subtract 33 from the ASCII value of each character to get +the corresponding index]; @samp{ESC (} is a class of escape sequences +meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch +to US ASCII''. It is a coincidence that the letter @samp{B} is used to +denote both Japanese Kanji and US ASCII. If the first @samp{B} were +replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character +from the GB2312 character set.) + +@example +"foobar" +@end example + +converts to a string. + +@example +foobar +@end example + +converts to a symbol whose name is @code{"foobar"}. This is done by +looking up the string equivalent in the global variable +@code{obarray}, whose contents should be an obarray. If no symbol +is found, a new symbol with the name @code{"foobar"} is automatically +created and added to @code{obarray}; this process is called +@dfn{interning} the symbol. +@cindex interning + +@example +(foo . bar) +@end example + +converts to a cons cell containing the symbols @code{foo} and @code{bar}. 
+ +@example +(1 a 2.5) +@end example + +converts to a three-element list containing the specified objects +(note that a list is actually a set of nested conses; see the +XEmacs Lisp Reference). + +@example +[1 a 2.5] +@end example + +converts to a three-element vector containing the specified objects. + +@example +#[... ... ... ...] +@end example + +converts to a compiled-function object (the actual contents are not +shown since they are not relevant here; look at a file that ends with +@file{.elc} for examples). + +@example +#*01110110 +@end example + +converts to a bit-vector. + +@example +#s(hash-table ... ...) +@end example + +converts to a hash table (the actual contents are not shown). + +@example +#s(range-table ... ...) +@end example + +converts to a range table (the actual contents are not shown). + +@example +#s(char-table ... ...) +@end example + +converts to a char table (the actual contents are not shown). + +Note that the @code{#s()} syntax is the general syntax for structures, +which are not really implemented in XEmacs Lisp but should be. + +When an object is printed out (using @code{print} or a related +function), the read syntax is used, so that the same object can be read +in again. + +The other objects do not have read syntaxes, usually because it does not +really make sense to create them in this fashion (i.e. processes, where +it doesn't make sense to have a subprocess created as a side effect of +reading some Lisp code), or because they can't be created at all +(e.g. subrs). Permanent objects, as a rule, do not have a read syntax; +nor do most complex objects, which contain too much state to be easily +initialized through a read syntax. 
+ +@node How Lisp Objects Are Represented in C, Allocation of Objects in XEmacs Lisp, The XEmacs Object System (Abstractly Speaking), Top +@chapter How Lisp Objects Are Represented in C +@cindex Lisp objects are represented in C, how +@cindex objects are represented in C, how Lisp +@cindex represented in C, how Lisp objects are + +Lisp objects are represented in C using a 32-bit or 64-bit machine word +(depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and +most other processors use 32-bit Lisp objects). The representation +stuffs a pointer together with a tag, as follows: + +@example + [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] + [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] + + <---------------------------------------------------------> <-> + a pointer to a structure, or an integer tag +@end example + +A tag of 00 is used for all pointer object types, a tag of 10 is used +for characters, and the other two tags 01 and 11 are joined together to +form the integer object type. This representation gives us 31 bit +integers and 30 bit characters, while pointers are represented directly +without any bit masking or shifting. This representation, though, +assumes that pointers to structs are always aligned to multiples of 4, +so the lower 2 bits are always zero. + +Lisp objects use the typedef @code{Lisp_Object}, but the actual C type +used for the Lisp object can vary. It can be either a simple type +(@code{long} on the DEC Alpha, @code{int} on other machines) or a +structure whose fields are bit fields that line up properly (actually, a +union of structures is used). Generally the simple integral type is +preferable because it ensures that the compiler will actually use a +machine word to represent the object (some compilers will use more +general and less efficient code for unions and structs even if they can +fit in a machine word). The union type, however, has the advantage of +stricter type checking. 
If you accidentally pass an integer where a Lisp +object is desired, you get a compile error. The choice of which type +to use is determined by the preprocessor constant @code{USE_UNION_TYPE} +which is defined via the @code{--use-union-type} option to +@code{configure}. + +Various macros are used to convert between Lisp_Objects and the +corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()}, +@code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or +masking and cast it to the appropriate type. @code{XINT()} needs to be +a bit tricky so that negative numbers are properly sign-extended. Since +integers are stored left-shifted, if the right-shift operator does an +arithmetic shift (i.e. it leaves the most-significant bit as-is rather +than shifting in a zero, so that it mimics a divide-by-two even for +negative numbers) the shift to remove the tag bit is enough. This is +the case on all the systems we support. + +Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter +macros become more complicated---they check the tag bits and/or the +type field in the first four bytes of a record type to ensure that the +object is really of the correct type. This is great for catching places +where an incorrect type is being dereferenced---this typically results +in a pointer being dereferenced as the wrong type of structure, with +unpredictable (and sometimes not easily traceable) results. + +There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp +object. These macros are of the form @code{XSET@var{TYPE} +(@var{lvalue}, @var{result})}, i.e. they have to be a statement rather +than just used in an expression. The reason for this is that standard C +doesn't let you ``construct'' a structure (but GCC does). Granted, this +sometimes isn't too convenient; for the case of integers, at least, you +can use the function @code{make_int()}, which constructs and +@emph{returns} an integer Lisp object. 
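As a rough illustration of the tagging scheme and the converter macros described in this chapter, here is a minimal C sketch that packs and unpacks integers and characters in a 32-bit word. All names in it (@code{toy_object}, @code{toy_make_int}, @code{toy_xint}, and so on) are hypothetical stand-ins and not the actual XEmacs macros. The sketch follows the layout given above (low two bits 00 for pointers, 10 for characters, 01 and 11 for integers) and assumes that the right-shift operator performs an arithmetic shift on signed values, as the text says is the case on the supported systems.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the simple-integer form of Lisp_Object. */
typedef int32_t toy_object;

/* Integers: value shifted left by one, low bit set (tags 01 and 11). */
static toy_object toy_make_int(int32_t n)
{
    return (toy_object)(((uint32_t)n << 1) | 1u);
}

static int toy_intp(toy_object obj)
{
    return (obj & 1) != 0;
}

/* Assumes >> on a negative value is an arithmetic shift, so the
   sign is preserved when the tag bit is shifted out. */
static int32_t toy_xint(toy_object obj)
{
    return obj >> 1;
}

/* Characters: value shifted left by two, tag bits 10. */
static toy_object toy_make_char(int32_t c)
{
    return (toy_object)(((uint32_t)c << 2) | 2u);
}

static int toy_charp(toy_object obj)
{
    return (obj & 3) == 2;
}

static int32_t toy_xchar(toy_object obj)
{
    return obj >> 2;
}

/* Pointers: tag bits 00, so a 4-byte-aligned address is stored as-is. */
static int toy_pointerp(toy_object obj)
{
    return (obj & 3) == 0;
}
```

In this sketch, decoding a negative integer relies only on the shift, which mirrors the remark above that the shift to remove the tag bit is enough when the shift is arithmetic.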
Note that the +@code{XSET@var{TYPE}()} macros are also affected by +@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the +right type in the case of record types, where the type is contained in +the structure. + +The C programmer is responsible for @strong{guaranteeing} that a +Lisp_Object is the correct type before using the @code{X@var{TYPE}} +macros. This is especially important in the case of lists. Use +@code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell, +else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not +Lisp code. On the other hand, if XEmacs has an internal logic error, +it's better to crash immediately, so sprinkle @code{assert()}s and +``unreachable'' @code{abort()}s liberally about the source code. Where +performance is an issue, use @code{type_checking_assert}, +@code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do +nothing unless the corresponding configure error checking flag was +specified. + +@node Allocation of Objects in XEmacs Lisp, The Lisp Reader and Compiler, How Lisp Objects Are Represented in C, Top @chapter Allocation of Objects in XEmacs Lisp @cindex allocation of objects in XEmacs Lisp @cindex objects in XEmacs Lisp, allocation of @@ -7060,1559 +7331,15 @@ Not yet documented. -@node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top -@chapter Dumping -@cindex dumping - -@menu -* Dumping Justification:: -* Overview:: -* Data descriptions:: -* Dumping phase:: -* Reloading phase:: -* Remaining issues:: -@end menu - -@node Dumping Justification, Overview, Dumping, Dumping -@section Dumping Justification -@cindex dumping, justification - -The C code of XEmacs is just a Lisp engine with a lot of built-in -primitives useful for writing an editor. The editor itself is written -mostly in Lisp, and represents around 100K lines of code. 
Loading and -executing the initialization of all this code takes a bit of time (five -to ten times the usual startup time of current xemacs) and requires -having all the lisp source files around. Having to reload them each -time the editor is started would not be acceptable. - -The traditional solution to this problem is called dumping: the build -process first creates the lisp engine under the name @file{temacs}, then -runs it until it has finished loading and initializing all the lisp -code, and eventually creates a new executable called @file{xemacs} -including both the object code in @file{temacs} and all the contents of -the memory after the initialization. - -This solution, while working, has a huge problem: the creation of the -new executable from the actual contents of memory is an extremely -system-specific process, quite error-prone, and which interferes with a -lot of system libraries (like malloc). It is even getting worse -nowadays with libraries using constructors which are automatically -called when the program is started (even before @code{main()}) which tend to -crash when they are called multiple times, once before dumping and once -after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries thru -dependencies which have this problem). Writing the dumper is also one -of the most difficult parts of porting XEmacs to a new operating system. -Basically, `dumping' is an operation that is just not officially -supported on many operating systems. - -The aim of the portable dumper is to solve the same problem as the -system-specific dumper, that is to be able to reload quickly, using only -a small number of files, the fully initialized lisp part of the editor, -without any system-specific hacks. 
- -@node Overview, Data descriptions, Dumping Justification, Dumping -@section Overview -@cindex dumping overview - -The portable dumping system has to: - -@enumerate -@item -At dump time, write all initialized, non-quickly-rebuildable data to a -file [Note: currently named @file{xemacs.dmp}, but the name will -change], along with all information needed for the reloading. - -@item -When starting xemacs, reload the dump file, relocate it to its new -starting address if needed, and reinitialize all pointers to this -data. Also, rebuild all the quickly rebuildable data. -@end enumerate - -Note: As of 21.5.18, the dump file has been moved inside of the -executable, although there are still problems with this on some systems. - -@node Data descriptions, Dumping phase, Overview, Dumping -@section Data descriptions -@cindex dumping data descriptions - -The more complex task of the dumper is to be able to write memory blocks -on the heap (lisp objects, i.e. lrecords, and C-allocated memory, such -as structs and arrays) to disk and reload them at a different address, -updating all the pointers they include in the process. This is done by -using external data descriptions that give information about the layout -of the blocks in memory. - -The specification of these descriptions is in lrecord.h. A description -of an lrecord is an array of struct memory_description. Each of these -structs include a type, an offset in the block and some optional -parameters depending on the type. For instance, here is the string -description: - -@example -static const struct memory_description string_description[] = @{ - @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @}, - @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @}, - @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @}, - @{ XD_END @} -@}; -@end example - -The first line indicates a member of type Bytecount, which is used by -the next, indirect directive. 
The second means "there is a pointer to -some opaque data in the field @code{data}". The length of said data is -given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value -in the 0th line of the description (welcome to C) plus one". The third -line means "there is a Lisp_Object member @code{plist} in the Lisp_String -structure". @code{XD_END} then ends the description. - -This gives us all the information we need to move around what is pointed -to by a memory block (C or lrecord) and, by transitivity, everything -that it points to. The only missing information for dumping is the size -of the block. For lrecords, this is part of the -lrecord_implementation, so we don't need to duplicate it. For C blocks -we use a struct sized_memory_description, which includes a size field -and a pointer to an associated array of memory_description. - -@node Dumping phase, Reloading phase, Data descriptions, Dumping -@section Dumping phase -@cindex dumping phase - -Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is -invoked from Fdump_emacs (in @file{emacs.c}). This function performs a number -of tasks. - -@menu -* Object inventory:: -* Address allocation:: -* The header:: -* Data dumping:: -* Pointers dumping:: -@end menu - -@node Object inventory, Address allocation, Dumping phase, Dumping phase -@subsection Object inventory -@cindex dumping object inventory -@cindex memory blocks - -The first task is to build the list of the objects to dump. This -includes: - -@itemize @bullet -@item lisp objects -@item other memory blocks (C structures, arrays. etc) -@end itemize - -We end up with one @code{pdump_block_list_elt} per object group (arrays -of C structs are kept together) which includes a pointer to the first -object of the group, the per-object size and the count of objects in the -group, along with some other information which is initialized later. 
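To make the data-description format from the ``Data descriptions'' section more concrete, here is a small C sketch of how an indirect length such as @code{XD_INDIRECT(0, 1)} could be evaluated. Everything in it is an assumption made for illustration: the @code{toy_} types, the field names and the helper @code{toy_opaque_length} are hypothetical and do not reproduce the real definitions in @file{lrecord.h}. It only shows the idea stated above, namely "the value in the 0th line of the description plus one".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced set of description types. */
enum toy_xd_type { TOY_XD_BYTECOUNT, TOY_XD_OPAQUE_DATA_PTR, TOY_XD_END };

/* One line of a description: a type, an offset in the block, and
   (for indirect lengths) which line to read and what to add to it. */
struct toy_description {
    enum toy_xd_type type;
    size_t offset;
    int indirect_line;
    long indirect_delta;
};

/* Hypothetical stand-in for a string object: a byte count and data. */
struct toy_string {
    long size;
    char *data;
};

static const struct toy_description toy_string_description[] = {
    { TOY_XD_BYTECOUNT,       offsetof(struct toy_string, size), 0, 0 },
    { TOY_XD_OPAQUE_DATA_PTR, offsetof(struct toy_string, data), 0, 1 },
    { TOY_XD_END,             0,                                 0, 0 }
};

/* Length in bytes of the opaque data described by line `line`:
   read the count stored at the offset of the referenced line,
   then add the delta. */
static long toy_opaque_length(const void *obj,
                              const struct toy_description *desc,
                              int line)
{
    const struct toy_description *src = &desc[desc[line].indirect_line];
    long value = *(const long *)((const char *)obj + src->offset);
    return value + desc[line].indirect_delta;
}
```

For a @code{toy_string} whose @code{size} is 6, @code{toy_opaque_length} on line 1 yields 7, i.e. the six bytes of text plus one more byte (presumably a terminator in the real string layout; the sketch does not depend on that).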
- -These entries are linked together in @code{pdump_block_list} structures -and can be enumerated thru either: - -@enumerate -@item -the @code{pdump_object_table}, an array of @code{pdump_block_list}, one -per lrecord type, indexed by type number. - -@item -the @code{pdump_opaque_data_list}, used for the opaque data which does -not include pointers, and hence does not need descriptions. - -@item -the @code{pdump_desc_table}, which is a vector of -@code{memory_description}/@code{pdump_block_list} pairs, used for -non-opaque C memory blocks. -@end enumerate - -This uses a marking strategy similar to the garbage collector. Some -differences though: - -@enumerate -@item -We do not use the mark bit (which does not exist for generic memory blocks -anyway); we use a big hash table instead. - -@item -We do not use the mark function of lrecords but instead rely on the -external descriptions. This happens essentially because we need to -follow pointers to generic memory blocks and opaque data in addition to -Lisp_Object members. -@end enumerate - -This is done by @code{pdump_register_object()}, which handles -Lisp_Object variables, and @code{pdump_register_block()} which handles -generic memory blocks (C structures, arrays, etc.), which both delegate -the description management to @code{pdump_register_sub()}. - -The hash table doubles as a map object to pdump_block_list_elmt (i.e. -allows us to look up a pdump_block_list_elmt with the object it points -to). Entries are added with @code{pdump_add_block()} and looked up with -@code{pdump_get_block()}. There is no need for entry removal. The hash -value is computed quite simply from the object pointer by -@code{pdump_make_hash()}. - -The roots for the marking are: - -@enumerate -@item -the @code{staticpro}'ed variables (there is a special -@code{staticpro_nodump()} call for protected variables we do not want to -dump). 
- -@item -the Lisp_Object variables registered via @code{dump_add_root_lisp_object} -(@code{staticpro()} is equivalent to @code{staticpro_nodump()} + -@code{dump_add_root_lisp_object()}). - -@item -the data-segment memory blocks registered via @code{dump_add_root_block} -(for blocks with relocatable pointers), or @code{dump_add_opaque} (for -"opaque" blocks with no relocatable pointers; this is just a shortcut -for calling @code{dump_add_root_block} with a NULL description). - -@item -the pointer variables registered via @code{dump_add_root_block_ptr}, -each of which points to a block of heap memory (generally a C structure -or array). Note that @code{dump_add_root_block_ptr} is not technically -necessary, as a pointer variable can be seen as a special case of a -data-segment memory block and registered using -@code{dump_add_root_block}. Doing it this way, however, would require -another level of static structures declared. Since pointer variables -are quite common, @code{dump_add_root_block_ptr} is provided for -convenience. Note also that internally we have to treat it separately -from @code{dump_add_root_block} rather than writing the former as a call -to the latter, since we don't have support for creating and using memory -descriptions on the fly -- they must all be statically declared in the -data-segment. -@end enumerate - -This does not include the GCPRO'ed variables, the specbinds, the -catchtags, the backlist, the redisplay or the profiling info, since we -do not want to rebuild the actual chain of lisp calls which end up to -the dump-emacs call, only the global variables. - -Weak lists and weak hash tables are dumped as if they were their -non-weak equivalent (without changing their type, of course). This has -not yet been a problem. 
- -@node Address allocation, The header, Object inventory, Dumping phase -@subsection Address allocation -@cindex dumping address allocation - - -The next step is to allocate the offsets of each of the objects in the -final dump file. This is done by @code{pdump_allocate_offset()} which -is called indirectly by @code{pdump_scan_by_alignment()}. - -The strategy to deal with alignment problems uses these facts: - -@enumerate -@item -real world alignment requirements are powers of two. - -@item -the C compiler is required to adjust the size of a struct so that you -can have an array of them next to each other. This means you can have an -upper bound of the alignment requirements of a given structure by -looking at which power of two its size is a multiple. - -@item -the non-variant part of variable size lrecords has an alignment -requirement of 4. -@end enumerate - -Hence, for each lrecord type, C struct type or opaque data block the -alignment requirement is computed as a power of two, with a minimum of -2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the -@code{pdump_block_list_elmt}'s, the ones with the highest requirements -first. This ensures the best packing. - -The maximum alignment requirement we take into account is 2^8. - -@code{pdump_allocate_offset()} only has to do a linear allocation, -starting at offset 256 (this leaves room for the header and keeps the -alignments happy). - -@node The header, Data dumping, Address allocation, Dumping phase -@subsection The header -@cindex dumping, the header - -The next step creates the file and writes a header with a signature and -some random information in it. The @code{reloc_address} field, which -indicates at which address the file should be loaded if we want to avoid -post-reload relocation, is set to 0. It then seeks to offset 256 (base -offset for the objects). 
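The address-allocation strategy described in the ``Address allocation'' subsection can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the code in @file{dumper.c}: the names (@code{toy_block}, @code{toy_alignment_for_size}, @code{toy_allocate_offsets}) are hypothetical, the minimum alignment of 2^2 is applied to every block rather than only to lrecords, and the current offset is rounded up before each block as a conservative way of handling sizes that are not a multiple of their alignment.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MIN_ALIGN   4      /* 2^2, the minimum given for lrecords */
#define TOY_MAX_ALIGN   256    /* 2^8, the largest alignment considered */
#define TOY_BASE_OFFSET 256    /* leaves room for the header */

/* One group of same-sized objects (hypothetical layout). */
struct toy_block {
    size_t size;    /* per-object size in bytes */
    size_t count;   /* number of objects in the group */
    size_t offset;  /* filled in by toy_allocate_offsets() */
};

/* Upper bound on the alignment requirement: the largest power of two
   that divides the size, clamped to [TOY_MIN_ALIGN, TOY_MAX_ALIGN]. */
static size_t toy_alignment_for_size(size_t size)
{
    size_t align = 1;
    if (size == 0)
        return TOY_MIN_ALIGN;
    while (align < TOY_MAX_ALIGN && size % (align * 2) == 0)
        align *= 2;
    return align < TOY_MIN_ALIGN ? TOY_MIN_ALIGN : align;
}

/* Linear allocation, highest alignment requirement first.
   Returns the offset just past the last allocated block. */
static size_t toy_allocate_offsets(struct toy_block *blocks, size_t n)
{
    size_t cur = TOY_BASE_OFFSET;
    size_t align, i;

    for (align = TOY_MAX_ALIGN; align >= TOY_MIN_ALIGN; align >>= 1) {
        for (i = 0; i < n; i++) {
            if (toy_alignment_for_size(blocks[i].size) != align)
                continue;
            cur = (cur + align - 1) & ~(align - 1);  /* round up */
            blocks[i].offset = cur;
            cur += blocks[i].size * blocks[i].count;
        }
    }
    return cur;
}
```

Because blocks with the strictest requirements are placed first, and each such block has a size that is a multiple of its alignment, the rounding step inserts padding only for blocks whose alignment was raised to the minimum.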
- -@node Data dumping, Pointers dumping, The header, Dumping phase -@subsection Data dumping -@cindex data dumping -@cindex dumping, data - -The data is dumped in the same order as the addresses were allocated by -@code{pdump_dump_data()}, called from @code{pdump_scan_by_alignment()}. -This function copies the data to a temporary buffer, relocates all -pointers in the object to the addresses allocated in step Address -Allocation, and writes it to the file. Using the same order means that, -if we are careful with lrecords whose size is not a multiple of 4, we -are ensured that the object is always written at the offset in the file -allocated in step Address Allocation. - -@node Pointers dumping, , Data dumping, Dumping phase -@subsection Pointers dumping -@cindex pointers dumping -@cindex dumping, pointers - -A bunch of tables needed to reassign properly the global pointers are -then written. They are: - -@enumerate -@item -the pdump_root_block_ptrs dynarr -@item -the pdump_opaques dynarr -@item -a vector of all the offsets to the objects in the file that include a -description (for faster relocation at reload time) -@item -the pdump_root_objects and pdump_weak_object_chains dynarrs. -@end enumerate - -For each of the dynarrs we write both the pointer to the variables and -the relocated offset of the object they point to. Since these variables -are global, the pointers are still valid when restarting the program and -are used to regenerate the global pointers. - -The @code{pdump_weak_object_chains} dynarr is a special case. The -variables it points to are the head of weak linked lists of lisp objects -of the same type. Not all objects of this list are dumped so the -relocated pointer we associate with them points to the first dumped -object of the list, or Qnil if none is available. This is also the -reason why they are not used as roots for the purpose of object -enumeration. 
- -Some very important information like the @code{staticpros} and -@code{lrecord_implementations_table} are handled indirectly using -@code{dump_add_opaque} or @code{dump_add_root_block_ptr}. - -This is the end of the dumping part. - -@node Reloading phase, Remaining issues, Dumping phase, Dumping -@section Reloading phase -@cindex reloading phase -@cindex dumping, reloading phase - -@subsection File loading -@cindex dumping, file loading - -The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at -least 4096), or if mmap is unavailable or fails, a 256-byte-aligned -malloc is done and the file is loaded. - -Some variables are reinitialized from the values found in the header. - -The difference between the actual loading address and the reloc_address -is computed and will be used for all the relocations. - - -@subsection Putting back the pdump_opaques -@cindex dumping, putting back the pdump_opaques - -The memory contents are restored in the obvious and trivial way. - - -@subsection Putting back the pdump_root_block_ptrs -@cindex dumping, putting back the pdump_root_block_ptrs - -The variables pointed to by pdump_root_block_ptrs in the dump phase are -reset to the right relocated object addresses. - - -@subsection Object relocation -@cindex dumping, object relocation - -All the objects are relocated using their description and their offset -by @code{pdump_reloc_one}. This step is unnecessary if the -reloc_address is equal to the file loading address. - - -@subsection Putting back the pdump_root_objects and pdump_weak_object_chains -@cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains - -Same as Putting back the pdump_root_block_ptrs. - - -@subsection Reorganize the hash tables -@cindex dumping, reorganize the hash tables - -Since some of the hash values in the lisp hash tables are -address-dependent, their layout is now wrong. 
So we go through each of -them and have them resorted by calling @code{pdump_reorganize_hash_table}. - -@node Remaining issues, , Reloading phase, Dumping -@section Remaining issues -@cindex dumping, remaining issues - -The build process will have to start a post-dump xemacs, ask it the -loading address (which will, hopefully, be always the same between -different xemacs invocations) [[unfortunately, not true on Linux with -the ExecShield feature]] and relocate the file to the new address. -This way the object relocation phase will not have to be done, which -means no writes in the objects and that, because of the use of mmap, the -dumped data will be shared between all the xemacs running on the -computer. - -Some executable signature will be necessary to ensure that a given dump -file is really associated with a given executable, or random crashes -will occur. Maybe a random number set at compile or configure time thru -a define. This will also allow for having differently-compiled xemacsen -on the same system (mule and no-mule comes to mind). - -The DOC file contents should probably end up in the dump file. - - -@node Events and the Event Loop, Asynchronous Events; Quit Checking, Dumping, Top -@chapter Events and the Event Loop -@cindex events and the event loop -@cindex event loop, events and the - -@menu -* Introduction to Events:: -* Main Loop:: -* Specifics of the Event Gathering Mechanism:: -* Specifics About the Emacs Event:: -* Event Queues:: -* Event Stream Callback Routines:: -* Other Event Loop Functions:: -* Stream Pairs:: -* Converting Events:: -* Dispatching Events; The Command Builder:: -* Focus Handling:: -* Editor-Level Control Flow Modules:: -@end menu - -@node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop -@section Introduction to Events -@cindex events, introduction to - - An event is an object that encapsulates information about an -interesting occurrence in the operating system. 
Events are -generated either by user action, direct (e.g. typing on the -keyboard or moving the mouse) or indirect (moving another -window, thereby generating an expose event on an Emacs frame), -or as a result of some other typically asynchronous action happening, -such as output from a subprocess being ready or a timer expiring. -Events come into the system in an asynchronous fashion (typically -through a callback being called) and are converted into a -synchronous event queue (first-in, first-out) in a process that -we will call @dfn{collection}. - - Note that each application has its own event queue. (It is -immaterial whether the collection process directly puts the -events in the proper application's queue, or puts them into -a single system queue, which is later split up.) - - The most basic level of event collection is done by the -operating system or window system. Typically, XEmacs does -its own event collection as well. Often there are multiple -layers of collection in XEmacs, with events from various -sources being collected into a queue, which is then combined -with other sources to go into another queue (i.e. a second -level of collection), with perhaps another level on top of -this, etc. - - XEmacs has its own types of events (called @dfn{Emacs events}), -which provides an abstract layer on top of the system-dependent -nature of the most basic events that are received. Part of the -complex nature of the XEmacs event collection process involves -converting from the operating-system events into the proper -Emacs events---there may not be a one-to-one correspondence. - - Emacs events are documented in @file{events.h}; I'll discuss them -later. - -@node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop -@section Main Loop -@cindex main loop -@cindex events, main loop - - The @dfn{command loop} is the top-level loop that the editor is always -running. 
It loops endlessly, calling @code{next-event} to retrieve an -event and @code{dispatch-event} to execute it. @code{dispatch-event} does -the appropriate thing with non-user events (process, timeout, -magic, eval, mouse motion); this involves calling a Lisp handler -function, redrawing a newly-exposed part of a frame, reading -subprocess output, etc. For user events, @code{dispatch-event} -looks up the event in relevant keymaps or menubars; when a -full key sequence or menubar selection is reached, the appropriate -function is executed. @code{dispatch-event} may have to keep state -across calls; this is done in the ``command-builder'' structure -associated with each console (remember, there's usually only -one console), and the engine that looks up keystrokes and -constructs full key sequences is called the @dfn{command builder}. -This is documented elsewhere. - - The guts of the command loop are in @code{command_loop_1()}. This -function doesn't catch errors, though---that's the job of -@code{command_loop_2()}, which is a condition-case (i.e. error-trapping) -wrapper around @code{command_loop_1()}. @code{command_loop_1()} never -returns, but may get thrown out of. - - When an error occurs, @code{cmd_error()} is called, which usually -invokes the Lisp error handler in @code{command-error}; however, a -default error handler is provided if @code{command-error} is @code{nil} -(e.g. during startup). The purpose of the error handler is simply to -display the error message and do associated cleanup; it does not need to -throw anywhere. When the error handler finishes, the condition-case in -@code{command_loop_2()} will finish and @code{command_loop_2()} will -reinvoke @code{command_loop_1()}. - - @code{command_loop_2()} is invoked from three places: from -@code{initial_command_loop()} (called from @code{main()} at the end of -internal initialization), from the Lisp function @code{recursive-edit}, -and from @code{call_command_loop()}. 
- - @code{call_command_loop()} is called when a macro is started and when -the minibuffer is entered; normal termination of the macro or minibuffer -causes a throw out of the recursive command loop. (To -@code{execute-kbd-macro} for macros and @code{exit} for minibuffers. -Note also that the low-level minibuffer-entering function, -@code{read-minibuffer-internal}, provides its own error handling and -does not need @code{command_loop_2()}'s error encapsulation; so it tells -@code{call_command_loop()} to invoke @code{command_loop_1()} directly.) - - Note that both read-minibuffer-internal and recursive-edit set up a -catch for @code{exit}; this is why @code{abort-recursive-edit}, which -throws to this catch, exits out of either one. - - @code{initial_command_loop()}, called from @code{main()}, sets up a -catch for @code{top-level} when invoking @code{command_loop_2()}, -allowing functions to throw all the way to the top level if they really -need to. Before invoking @code{command_loop_2()}, -@code{initial_command_loop()} calls @code{top_level_1()}, which handles -all of the startup stuff (creating the initial frame, handling the -command-line options, loading the user's @file{.emacs} file, etc.). The -function that actually does this is in Lisp and is pointed to by the -variable @code{top-level}; normally this function is -@code{normal-top-level}. @code{top_level_1()} is just an error-handling -wrapper similar to @code{command_loop_2()}. Note also that -@code{initial_command_loop()} sets up a catch for @code{top-level} when -invoking @code{top_level_1()}, just like when it invokes -@code{command_loop_2()}. 
- -@node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop -@section Specifics of the Event Gathering Mechanism -@cindex event gathering mechanism, specifics of the - - Here is an approximate diagram of the collection processes -at work in XEmacs, under TTY's (TTY's are simpler than X -so we'll look at this first): - -@noindent -@example - asynch. asynch. asynch. asynch. [Collectors in -kbd events kbd events process process the OS] - | | output output - | | | | - | | | | SIGINT, [signal handlers - | | | | SIGQUIT, in XEmacs] - V V V V SIGWINCH, - file file file file SIGALRM - desc. desc. desc. desc. | - (TTY) (TTY) (pipe) (pipe) | - | | | | fake timeouts - | | | | file | - | | | | desc. | - | | | | (pipe) | - | | | | | | - | | | | | | - | | | | | | - V V V V V V - ------>-----------<----------------<---------------- - | - | - | [collected using @code{select()} in @code{emacs_tty_next_event()} - | and converted to the appropriate Emacs event] - | - | - V (above this line is TTY-specific) - Emacs ----------------------------------------------- - event (below this line is the generic event mechanism) - | - | -was there if not, call -a SIGINT? @code{emacs_tty_next_event()} - | | - | | - | | - V V - --->------<---- - | - | [collected in @code{event_stream_next_event()}; - | SIGINT is converted using @code{maybe_read_quit_event()}] - V - Emacs - event - | - \---->------>----- maybe_kbd_translate() ---->---\ - | - | - | - command event queue | - if not from command - (contains events that were event queue, call - read earlier but not processed, @code{event_stream_next_event()} - typically when waiting in a | - sit-for, sleep-for, etc. 
for | - a particular event to be received) | - | | - | | - V V - ---->------------------------------------<---- - | - | [collected in - | @code{next_event_internal()}] - | - unread- unread- event from | - command- command- keyboard else, call - events event macro @code{next_event_internal()} - | | | | - | | | | - | | | | - V V V V - --------->----------------------<------------ - | - | [collected in @code{next-event}, which may loop - | more than once if the event it gets is on - | a dead frame, device, etc.] - | - | - V - feed into top-level event loop, - which repeatedly calls @code{next-event} - and then dispatches the event - using @code{dispatch-event} -@end example - -Notice the separation between TTY-specific and generic event mechanism. -When using the Xt-based event loop, the TTY-specific stuff is replaced -but the rest stays the same. - -It's also important to realize that only one different kind of -system-specific event loop can be operating at a time, and must be able -to receive all kinds of events simultaneously. For the two existing -event loops (implemented in @file{event-tty.c} and @file{event-Xt.c}, -respectively), the TTY event loop @emph{only} handles TTY consoles, -while the Xt event loop handles @emph{both} TTY and X consoles. This -situation is different from all of the output handlers, where you simply -have one per console type. - - Here's the Xt Event Loop Diagram (notice that below a certain point, -it's the same as the above diagram): - -@example -asynch. asynch. asynch. asynch. [Collectors in - kbd kbd process process the OS] -events events output output - | | | | - | | | | asynch. asynch. 
[Collectors in the - | | | | X X OS and X Window System] - | | | | events events - | | | | | | - | | | | | | - | | | | | | SIGINT, [signal handlers - | | | | | | SIGQUIT, in XEmacs] - | | | | | | SIGWINCH, - | | | | | | SIGALRM - | | | | | | | - | | | | | | | - | | | | | | | timeouts - | | | | | | | | - | | | | | | | | - | | | | | | V | - V V V V V V fake | - file file file file file file file | - desc. desc. desc. desc. desc. desc. desc. | - (TTY) (TTY) (pipe) (pipe) (socket) (socket) (pipe) | - | | | | | | | | - | | | | | | | | - | | | | | | | | - V V V V V V V V - --->----------------------------------------<---------<------ - | | | - | | |[collected using @code{select()} in - | | | @code{_XtWaitForSomething()}, called - | | | from @code{XtAppProcessEvent()}, called - | | | in @code{emacs_Xt_next_event()}; - | | | dispatched to various callbacks] - | | | - | | | - emacs_Xt_ p_s_callback(), | [popup_selection_callback] - event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_ - | x_u_h_s_callback(),| callback] - | search_callback() | [x_update_horizontal_scrollbar_ - | | | callback] - | | | - | | | - enqueue_Xt_ signal_special_ | - dispatch_event() Xt_user_event() | - [maybe multiple | | - times, maybe 0 | | - times] | | - | enqueue_Xt_ | - | dispatch_event() | - | | | - | | | - V V | - -->----------<-- | - | | - | | - dispatch @code{Xt_what_callback()} - event sets flags - queue | - | | - | | - | | - | | - ---->-----------<-------- - | - | - | [collected and converted as appropriate in - | @code{emacs_Xt_next_event()}] - | - | - V (above this line is Xt-specific) - Emacs ------------------------------------------------ - event (below this line is the generic event mechanism) - | - | -was there if not, call -a SIGINT? 
@code{emacs_Xt_next_event()} - | | - | | - | | - V V - --->-------<---- - | - | [collected in @code{event_stream_next_event()}; - | SIGINT is converted using @code{maybe_read_quit_event()}] - V - Emacs - event - | - \---->------>----- maybe_kbd_translate() -->-----\ - | - | - | - command event queue | - if not from command - (contains events that were event queue, call - read earlier but not processed, @code{event_stream_next_event()} - typically when waiting in a | - sit-for, sleep-for, etc. for | - a particular event to be received) | - | | - | | - V V - ---->----------------------------------<------ - | - | [collected in - | @code{next_event_internal()}] - | - unread- unread- event from | - command- command- keyboard else, call - events event macro @code{next_event_internal()} - | | | | - | | | | - | | | | - V V V V - --------->----------------------<------------ - | - | [collected in @code{next-event}, which may loop - | more than once if the event it gets is on - | a dead frame, device, etc.] - | - | - V - feed into top-level event loop, - which repeatedly calls @code{next-event} - and then dispatches the event - using @code{dispatch-event} -@end example - -@node Specifics About the Emacs Event, Event Queues, Specifics of the Event Gathering Mechanism, Events and the Event Loop -@section Specifics About the Emacs Event -@cindex event, specifics about the Lisp object - -@node Event Queues, Event Stream Callback Routines, Specifics About the Emacs Event, Events and the Event Loop -@section Event Queues -@cindex event queues -@cindex queues, event - -There are two event queues here -- the command event queue (#### which -should be called "deferred event queue" and is in my glyph ws) and the -dispatch event queue. (MS Windows actually has an extra dispatch queue -for non-user events and uses the generic one only for user events. 
This -is because user and non-user events in Windows come through the same -place -- the window procedure -- but under X, it's possible to -selectively process events such that we take all the user events before -the non-user ones. #### In fact, given the way we now drain the queue, -we might need two separate queues, like under Windows. Need to think -carefully exactly how this works, and should certainly generalize the -two different queues.) - -The dispatch queue (which used to occur duplicated inside of each event -implementation) is used for events that have been read from the -window-system event queue(s) and not yet processed by -@code{next_event_internal()}. It exists for two reasons: (1) because in many -implementations, events often come from the window system by way of -callbacks, which need to push the event to be returned onto a queue; (2) -in order to handle QUIT in a guaranteed correct fashion without -resorting to weird implementation-specific hacks that may or may not -work well, we need to drain the window-system event queues and then look -through to see if there's an event matching quit-char (usually ^G). The -drained events need to go onto a queue. (There are other, similar cases -where we need to drain the pending events so we can look ahead -- for -example, checking for pending expose events under X to avoid excessive -server activity.) - -The command event queue is used @strong{AFTER} an event has been read from -@code{next_event_internal()}, when it needs to be pushed back. This -includes, for example, @code{accept-process-output}, @code{sleep-for} -and @code{wait_delaying_user_input()}. Eval events and the like, -generated by @code{enqueue-eval-event}, -@code{enqueue_magic_eval_event()}, etc. are also pushed onto this queue. -Some events generated by callbacks are also pushed onto this queue, #### -although maybe they shouldn't be. - -The command queue takes precedence over the dispatch queue.
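The precedence rule can be pictured with a small sketch. This is only an illustration under assumptions: the queues here are hypothetical fixed-size FIFOs of integers named @code{toy_queue}, whereas the real queues hold Lisp event objects, and @code{toy_next_queued_event()} is an invented name, not a function in @file{event-stream.c}.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 16

/* Hypothetical fixed-size FIFO of integer "events". */
struct toy_queue {
    int items[TOY_QUEUE_CAP];
    size_t head;    /* index of the oldest item */
    size_t count;   /* number of queued items */
};

/* Append at the tail; returns 0 if the queue is full. */
int toy_enqueue(struct toy_queue *q, int ev)
{
    assert(q != NULL);
    if (q->count == TOY_QUEUE_CAP)
        return 0;
    q->items[(q->head + q->count) % TOY_QUEUE_CAP] = ev;
    q->count++;
    return 1;
}

/* Remove from the head (first-in, first-out); returns 0 if empty. */
int toy_dequeue(struct toy_queue *q, int *ev)
{
    assert(q != NULL && ev != NULL);
    if (q->count == 0)
        return 0;
    *ev = q->items[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_CAP;
    q->count--;
    return 1;
}

/* The command queue is consulted first; the dispatch queue is used
   only when the command queue is empty.  Returns 0 if both are empty. */
int toy_next_queued_event(struct toy_queue *command_q,
                          struct toy_queue *dispatch_q, int *ev)
{
    if (toy_dequeue(command_q, ev))
        return 1;
    return toy_dequeue(dispatch_q, ev);
}
```

With two events already sitting on the dispatch queue, an event pushed back onto the command queue afterwards is still the first one returned.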
- -#### It is worth investigating to see whether both queues are really -needed, and how exactly they should be used. @code{enqueue-eval-event}, -for example, could certainly push onto the dispatch queue, and all -callbacks maybe should. @code{wait_delaying_user_input()} seems to need -both queues, since it can take events from the dispatch queue and push -them onto the command queue; but it perhaps could be rewritten to avoid -this. #### In general we need to review the handling of these two -queues, figure out exactly what ought to be happening, and document it. - - -@node Event Stream Callback Routines, Other Event Loop Functions, Event Queues, Events and the Event Loop -@section Event Stream Callback Routines -@cindex event stream callback routines -@cindex callback routines, event stream - -There is one object called an event_stream. This object contains -callback functions for doing the window-system-dependent operations -that XEmacs requires. - -If XEmacs is compiled with support for X11 and the X Toolkit, then this -event_stream structure will contain functions that can cope with input -on XEmacs windows on multiple displays, as well as input from dumb tty -frames. - -If it is desired to have XEmacs able to open frames on the displays of -multiple heterogeneous machines, X11 and SunView, or X11 and NeXT, for -example, then it will be necessary to construct an event_stream structure -that can cope with the given types. Currently, the only implemented -event_streams are for dumb-ttys, and for X11 plus dumb-ttys, -and for mswindows. - -To implement this for one window system is relatively simple. -To implement this for multiple window systems is trickier and may -not be possible in all situations, but it's been done for X and TTY. - -Note that these callbacks are @strong{NOT} console methods; that's because -the routines are not specific to a particular console type but must -be able to simultaneously cope with all allowable console types. 
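One way to picture an event_stream is as a table of function pointers that the generic code calls through without knowing which backend is behind it. The following is a simplified sketch, not the real structure: only two slots are shown, their signatures are assumptions, the meaning given to the @code{event_pending_cb} argument is reduced to a plain user-events-only flag, and the @code{toy_} backend is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified event record. */
struct toy_event {
    int is_user_event;   /* 1 for keypress-like events, 0 for process/timer */
    int code;
};

/* Table of callbacks.  Slot names follow the manual; signatures are assumed. */
struct toy_event_stream {
    void (*next_event_cb)(struct toy_event *out);
    int (*event_pending_cb)(int user_only);
};

/* Toy backend: a small array of pending events. */
#define TOY_MAX_PENDING 8
struct toy_event toy_pending[TOY_MAX_PENDING];
size_t toy_pending_count;

void toy_backend_push(int is_user_event, int code)
{
    assert(toy_pending_count < TOY_MAX_PENDING);
    toy_pending[toy_pending_count].is_user_event = is_user_event;
    toy_pending[toy_pending_count].code = code;
    toy_pending_count++;
}

/* Return the oldest pending event.  A real backend would block when
   nothing is available; this toy one just asserts. */
void toy_tty_next_event(struct toy_event *out)
{
    size_t i;
    assert(toy_pending_count > 0);
    *out = toy_pending[0];
    for (i = 1; i < toy_pending_count; i++)
        toy_pending[i - 1] = toy_pending[i];
    toy_pending_count--;
}

/* user_only == 0: is any event pending?  Otherwise: any user event? */
int toy_tty_event_pending(int user_only)
{
    size_t i;
    if (!user_only)
        return toy_pending_count > 0;
    for (i = 0; i < toy_pending_count; i++)
        if (toy_pending[i].is_user_event)
            return 1;
    return 0;
}

const struct toy_event_stream toy_tty_stream = {
    toy_tty_next_event,
    toy_tty_event_pending
};

/* Generic code: goes through the table only, never the backend directly. */
int toy_read_if_pending(const struct toy_event_stream *es,
                        struct toy_event *out)
{
    if (!es->event_pending_cb(0))
        return 0;
    es->next_event_cb(out);
    return 1;
}
```

Supporting another window system in this picture means supplying another filled-in table; the generic caller stays unchanged.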
- -The slots of the event_stream structure: - -@table @code -@item next_event_cb -A function which fills in an XEmacs_event structure with the next event -available. If there is no event available, then this should block. - -IMPORTANT: timer events and especially process events *must not* be -returned if there are events of other types available; otherwise you can -end up with an infinite loop in @code{Fdiscard_input()}. - -@item event_pending_cb -A function which says whether there are events to be read. If called -with an argument of 0, then this should say whether calling the -@code{next_event_cb} will block. If called with a non-zero argument, -then this should say whether there are that many user-generated events -pending (that is, keypresses, mouse-clicks, dialog-box selection events, -etc.). (This is used for redisplay optimization, among other things.) -The difference is that the former includes process events and timer -events, but the latter doesn't. - -If this function is not sure whether there are events to be read, it -@strong{must} return 0. Otherwise various undesirable effects will -occur, such as redisplay not occurring until the next event occurs. - -@item handle_magic_event_cb -XEmacs calls this with an event structure which contains window-system -dependent information that XEmacs doesn't need to know about, but which -must happen in order. If the @code{next_event_cb} never returns an -event of type "magic", this will never be used. - -@item format_magic_event_cb -Called with a magic event; print a representation of the innards of the -event to @var{PSTREAM}. - -@item compare_magic_event_cb -Called with two magic events; return non-zero if the innards of the two -are equal, zero otherwise. - -@item hash_magic_event_cb -Called with a magic event; return a hash of the innards of the event. 
- -@item add_timeout_cb -Called with an @var{EMACS_TIME}, the absolute time at which a wakeup event -should be generated; and a void *, which is an arbitrary value that will -be returned in the timeout event. The timeouts generated by this -function should be one-shots: they fire once and then disappear. This -callback should return an int id-number which uniquely identifies this -wakeup. If an implementation doesn't have microseconds or millisecond -granularity, it should round up to the closest value it can deal with. - -@item remove_timeout_cb -Called with an int, the id number of a wakeup to discard. This id -number must have been returned by the @code{add_timeout_cb}. If the given -wakeup has already expired, this should do nothing. - -@item select_process_cb -@item unselect_process_cb -These callbacks tell the underlying implementation to add or remove a -file descriptor from the list of fds which are polled for -inferior-process input. When input becomes available on the given -process connection, an event of type "process" should be generated. - -@item select_console_cb -@item unselect_console_cb -These callbacks tell the underlying implementation to add or remove a -console from the list of consoles which are polled for user-input. - -@item select_device_cb -@item unselect_device_cb -These callbacks are used by Unixoid event loops (those that use @code{select()} -and file descriptors and have a separate input fd per device). - -@item create_io_streams_cb -@item delete_io_streams_cb -These callbacks are called by process code to create the input and -output lstreams which are used for subprocess I/O. - -@item quitp_cb -A handler function called from the @code{QUIT} macro which should check -whether the quit character has been typed. On systems with SIGIO, this -will not be called unless the @code{sigio_happened} flag is true (it is set -from the SIGIO handler). 
-@end table - -XEmacs has its own event structures, which are distinct from the event -structures used by X or any other window system. It is the job of the -event_stream layer to translate to this format. - -@node Other Event Loop Functions, Stream Pairs, Event Stream Callback Routines, Events and the Event Loop -@section Other Event Loop Functions -@cindex event loop functions, other - - @code{detect_input_pending()} and @code{input-pending-p} look for -input by calling @code{event_stream->event_pending_p} and looking in -@code{[V]unread-command-event} and the @code{command_event_queue} (they -do not check for an executing keyboard macro, though). - - @code{discard-input} cancels any command events pending (and any -keyboard macros currently executing), and puts the others onto the -@code{command_event_queue}. There is a comment about a ``race -condition'', which is not a good sign. - - @code{next-command-event} and @code{read-char} are higher-level -interfaces to @code{next-event}. @code{next-command-event} gets the -next @dfn{command} event (i.e. keypress, mouse event, menu selection, -or scrollbar action), calling @code{dispatch-event} on any others. -@code{read-char} calls @code{next-command-event} and uses -@code{event_to_character()} to return the character equivalent. With -the right kind of input method support, it is possible for (read-char) -to return a Kanji character. - -@node Stream Pairs, Converting Events, Other Event Loop Functions, Events and the Event Loop -@section Stream Pairs -@cindex stream pairs -@cindex pairs, stream - -Since there are many possible processes/event loop combinations, the -event code is responsible for creating an appropriate lstream type. The -process implementation does not care about that implementation. - -The Create stream pair function is passed two void* values, which -identify process-dependent 'handles'. The process implementation uses -these handles to communicate with child processes. 
The function must be -prepared to receive handle types of any process implementation. Since -only one process implementation exists in a particular XEmacs -configuration, preprocessing is used to compile in support for -the code which deals with particular handle types. - -For example, a unixoid type loop, which relies on file descriptors, may be -asked to create a pair of streams by a unix-style process implementation. -In this case, the handles passed are unix file descriptors, and the code -may deal with these directly. However, the same code may be used on a Win32 -system with X-Windows. In this case, the Win32 process implementation passes -handles of type HANDLE, and the @code{create_io_streams} function must call -the appropriate function to get file descriptors given HANDLEs, so that these -descriptors may be passed to @code{XtAddInput}. - -The handle given may have a special denying value, in which case the -corresponding lstream should not be created. - -The return value of the function is a unique stream identifier (USID). It is used -by the process implementation, in its platform-independent part. There is -the @code{get_process_from_usid} function, which returns the process object given its -USID. The event stream is responsible for converting its internal handle -type into a USID. - -An example is the TTY event stream. When a file descriptor signals input, the -event loop must determine the process for which the input is destined. Thus, -the implementation uses the process input stream file descriptor as the USID, by -simply casting the fd value to the USID type. - -There are two special USID values. One, @code{USID_ERROR}, indicates -that the stream pair cannot be created. The second, -@code{USID_DONTHASH}, indicates that streams are created, but the event -stream does not wish to be able to find the process by its -USID.
Specifically, if an event stream implementation never calls -@code{get_process_from_usid}, this value should always be returned, to -prevent accumulating useless information on USID to process -relationship. - -@node Converting Events, Dispatching Events; The Command Builder, Stream Pairs, Events and the Event Loop -@section Converting Events -@cindex converting events -@cindex events, converting - - @code{character_to_event()}, @code{event_to_character()}, -@code{event-to-character}, and @code{character-to-event} convert between -characters and keypress events corresponding to the characters. If the -event was not a keypress, @code{event_to_character()} returns -1 and -@code{event-to-character} returns @code{nil}. These functions convert -between character representation and the split-up event representation -(keysym plus mod keys). - -@node Dispatching Events; The Command Builder, Focus Handling, Converting Events, Events and the Event Loop -@section Dispatching Events; The Command Builder -@cindex dispatching events; the command builder -@cindex events; the command builder, dispatching -@cindex command builder, dispatching events; the +@node The Lisp Reader and Compiler, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top +@chapter The Lisp Reader and Compiler +@cindex Lisp reader and compiler, the +@cindex reader and compiler, the Lisp +@cindex compiler, the Lisp reader and Not yet documented. -@node Focus Handling, Editor-Level Control Flow Modules, Dispatching Events; The Command Builder, Events and the Event Loop -@section Focus Handling -@cindex focus handling - -Ben's capsule lecture on focus: - -In GNU Emacs @code{select-frame} never changes the window-manager frame -focus. All it does is change the "selected frame". This is similar to -what happens when we call @code{select-device} or @code{select-console}. 
-Whenever an event comes in (including a keyboard event), its frame is -selected; therefore, evaluating @code{select-frame} in @samp{*scratch*} -won't cause any effects because the next received event (in the same -frame) will cause a switch back to the frame displaying -@samp{*scratch*}. - -Whenever a focus-change event is received from the window manager, it -generates a @code{switch-frame} event, which causes the Lisp function -@code{handle-switch-frame} to get run. This basically just runs -@code{select-frame} (see below, however). - -In GNU Emacs, if you want to have an operation run when a frame is -selected, you supply an event binding for @code{switch-frame} (and then -maybe call @code{handle-switch-frame}, or something ...). - -In XEmacs, we @strong{do} change the window-manager frame focus as a -result of @code{select-frame}, but not until the next time an event is -received, so that a function that momentarily changes the selected frame -won't cause WM focus flashing. (#### There's something not quite right -here; this is causing the wrong-cursor-focus problems that you -occasionally see. But the general idea is correct.) This approach is -winning for people who use the explicit-focus model, but is trickier to -implement. - -We also don't make the @code{switch-frame} event visible but instead have -@code{select-frame-hook}, which is a better approach. - -There is the problem of surrogate minibuffers, where when we enter the -minibuffer, you essentially want to temporarily switch the WM focus to -the frame with the minibuffer, and switch it back when you exit the -minibuffer. - -GNU Emacs solves this with the crockish @code{redirect-frame-focus}, -which says "for keyboard events received from FRAME, act like they're -coming from FOCUS-FRAME". I think what this means is that, when a -keyboard event comes in and the event manager is about to select the -event's frame, if that frame has its focus redirected, the redirected-to -frame is selected instead. 
That way, if you're in a minibufferless -frame and enter the minibuffer, then all Lisp functions that run see the -selected frame as the minibuffer's frame rather than the minibufferless -frame you came from, so that (e.g.) your typing actually appears in the -minibuffer's frame and things behave sanely. - -There's also some weird logic that switches the redirected frame focus -from one frame to another if Lisp code explicitly calls -@code{select-frame} (but not if @code{handle-switch-frame} is called), -and saves and restores the frame focus in window configurations, -etc. etc. All of this logic is heavily @code{#if 0}'d, with lots of -comments saying "No, this approach doesn't seem to work, so I'm trying -this ... is it reasonable? Well, I'm not sure ..." that are a red flag -indicating crockishness. - -Because of our way of doing things, we can avoid all this crock. -Keyboard events never cause a select-frame (who cares what frame they're -associated with? They come from a console, only). We change the actual -WM focus to a surrogate minibuffer frame, so we don't have to do any -internal redirection. In order to get the focus back, I took the -approach in @file{minibuf.el} of just checking to see if the frame we moved to -is still the selected frame, and move back to the old one if so. -Conceivably we might have to do the weird "tracking" that GNU Emacs does -when @code{select-frame} is called, but I don't think so. If the -selected frame moved from the minibuffer frame, then we just leave it -there, figuring that someone knows what they're doing. Because we don't -have any redirection recorded anywhere, it's safe to do this, and we -don't end up with unwanted redirection. 
- -@node Editor-Level Control Flow Modules, , Focus Handling, Events and the Event Loop -@section Editor-Level Control Flow Modules -@cindex control flow modules, editor-level -@cindex modules, editor-level control flow - -@example -@file{event-Xt.c} -@file{event-msw.c} -@file{event-stream.c} -@file{event-tty.c} -@file{events-mod.h} -@file{gpmevent.c} -@file{gpmevent.h} -@file{events.c} -@file{events.h} -@end example - -These implement the handling of events (user input and other system -notifications). - -@file{events.c} and @file{events.h} define the @dfn{event} Lisp object -type and primitives for manipulating it. - -@file{event-stream.c} implements the basic functions for working with -event queues, dispatching an event by looking it up in relevant keymaps -and such, and handling timeouts; this includes the primitives -@code{next-event} and @code{dispatch-event}, as well as related -primitives such as @code{sit-for}, @code{sleep-for}, and -@code{accept-process-output}. (@file{event-stream.c} is one of the -hairiest and trickiest modules in XEmacs. Beware! You can easily mess -things up here.) - -@file{event-Xt.c} and @file{event-tty.c} implement the low-level -interfaces onto retrieving events from Xt (the X toolkit) and from TTY's -(using @code{read()} and @code{select()}), respectively. The event -interface enforces a clean separation between the specific code for -interfacing with the operating system and the generic code for working -with events, by defining an API of basic, low-level event methods; -@file{event-Xt.c} and @file{event-tty.c} are two different -implementations of this API. To add support for a new operating system -(e.g. NeXTstep), one merely needs to provide another implementation of -those API functions. - -Note that the choice of whether to use @file{event-Xt.c} or -@file{event-tty.c} is made at compile time! Or at the very latest, it -is made at startup time. 
@file{event-Xt.c} handles events for -@emph{both} X and TTY frames; @file{event-tty.c} is only used when X -support is not compiled into XEmacs. The reason for this is that there -is only one event loop in XEmacs: thus, it needs to be able to receive -events from all different kinds of frames. - - - -@example -@file{keymap.c} -@file{keymap.h} -@end example - -@file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object -type and associated methods and primitives. (Remember that keymaps are -objects that associate event descriptions with functions to be called to -``execute'' those events; @code{dispatch-event} looks up events in the -relevant keymaps.) - - - -@example -@file{cmdloop.c} -@end example - -@file{cmdloop.c} contains functions that implement the actual editor -command loop---i.e. the event loop that cyclically retrieves and -dispatches events. This code is also rather tricky, just like -@file{event-stream.c}. - - - -@example -@file{macros.c} -@file{macros.h} -@end example - -These two modules contain the basic code for defining keyboard macros. -These functions don't actually do much; most of the code that handles keyboard -macros is mixed in with the event-handling code in @file{event-stream.c}. - - - -@example -@file{minibuf.c} -@end example - -This contains some miscellaneous code related to the minibuffer (most of -the minibuffer code was moved into Lisp by Richard Mlynarik). This -includes the primitives for completion (although filename completion is -in @file{dired.c}), the lowest-level interface to the minibuffer (if the -command loop were cleaned up, this too could be in Lisp), and code for -dealing with the echo area (this, too, was mostly moved into Lisp, and -the only code remaining is code to call out to Lisp or provide simple -bootstrapping implementations early in temacs, before the echo-area Lisp -code is loaded). 
- - -@node Asynchronous Events; Quit Checking, Evaluation; Stack Frames; Bindings, Events and the Event Loop, Top -@chapter Asynchronous Events; Quit Checking -@cindex asynchronous events; quit checking -@cindex asynchronous events - -@menu -* Signal Handling:: -* Control-G (Quit) Checking:: -* Profiling:: -* Asynchronous Timeouts:: -* Exiting:: -@end menu - -@node Signal Handling, Control-G (Quit) Checking, Asynchronous Events; Quit Checking, Asynchronous Events; Quit Checking -@section Signal Handling -@cindex signal handling - -@node Control-G (Quit) Checking, Profiling, Signal Handling, Asynchronous Events; Quit Checking -@section Control-G (Quit) Checking -@cindex Control-g checking -@cindex C-g checking -@cindex quit checking -@cindex QUIT checking -@cindex critical quit - -@emph{Note}: The code to handle QUIT is divided between @file{lisp.h} -and @file{signal.c}. There is also some special-case code in the async -timer code in @file{event-stream.c} to notice when the poll-for-quit -(and poll-for-sigchld) timers have gone off. - -Here's an overview of how this convoluted stuff works: - -@enumerate -@item - -Scattered throughout the XEmacs core code are calls to the macro QUIT; -This macro checks to see whether a @kbd{C-g} has recently been pressed -and not yet handled, and if so, it handles the @kbd{C-g} by calling -@code{signal_quit()}, which invokes the standard @code{Fsignal()} code, -with the error being @code{Qquit}. Lisp code can establish handlers -for this (using @code{condition-case}), but normally there is no -handler, and so execution is thrown back to the innermost enclosing -event loop. (One of the things that happens when entering an event loop -is that a @code{condition-case} is established that catches @strong{all} calls -to @code{signal}, including this one.) - -@item -How does the QUIT macro check to see whether @kbd{C-g} has been pressed; -obviously this needs to be extremely fast. Now for some history. 
-In early Lemacs as inherited from the FSF going back 15 years or -more, there was a great fondness for using SIGIO (which is sent -whenever there is I/O available on a given socket, tty, etc.). -In fact, in GNU Emacs, perhaps even today, all reading of events -from the X server occurs inside the SIGIO handler! This is crazy, -but not completely relevant. What is relevant is that similar -stuff happened inside the SIGIO handler for @kbd{C-g}: it searched -through all the pending (i.e. not yet delivered to XEmacs) -X events for one that matched @kbd{C-g}. When it saw a match, it set -@code{Vquit_flag} to Qt. On TTY's, @kbd{C-g} is actually mapped to be the -interrupt character (i.e. it generates SIGINT), and XEmacs's -handler for this signal sets @code{Vquit_flag} to Qt. Then, sometime -later after the signal handlers finished and a QUIT macro was -called, the macro noticed the setting of @code{Vquit_flag} and used -this as an indication to call @code{signal_quit()}. What @code{signal_quit()} -actually does is set @code{Vquit_flag} to Qnil (so that we won't get -repeated interruptions from a single @kbd{C-g} press) and then call -the equivalent of (signal 'quit nil). - -@item -Another complication is introduced in that @code{Vquit_flag} is actually -exported to Lisp as @code{quit-flag}. This allows users some level of -control over whether and when @kbd{C-g} is processed as quit, esp. in -combination with @code{inhibit-quit}. This is another Lisp variable, -and if set to non-nil, it inhibits @code{signal_quit()} from getting -called, meaning that the @kbd{C-g} gets essentially ignored. But not -completely: Because the resetting of @code{quit-flag} happens only -in @code{signal_quit()}, which isn't getting called, the @kbd{C-g} press is -still noticed, and as soon as @code{inhibit-quit} is set back to nil, -a quit will be signalled at the next QUIT macro. Thus, what -@code{inhibit-quit} really does is defer quits until after the quit- -inhibited period.
- -@item -Another consideration, introduced by XEmacs, is critical quitting. If -you press @kbd{Control-Shift-G} instead of just @kbd{C-g}, -@code{quit-flag} is set to @code{critical} instead of to t. When QUIT -processes this value, it @strong{ignores} the value of -@code{inhibit-quit}. This allows you to quit even out of a -quit-inhibited section of code! Furthermore, when @code{signal_quit()} -notices that it was invoked as a result of a critical quit, it -automatically invokes the debugger (which otherwise would only happen -when @code{debug-on-quit} is set to t). - -@item -Well, I explained above how @code{quit-flag} gets set correctly, -but I began with a disclaimer stating that this was the old way -of doing things. What's done now? Well, first of all, the SIGIO -handler (which formerly checked all pending events to see if there's -a @kbd{C-g}) now does nothing but set a flag -- or actually two flags, -something_happened and quit_check_signal_happened. There are two -flags because the QUIT macro is now used for more than just handling -QUIT; it's also used for running asynchronous timeout handlers that -have recently expired, and perhaps other things. The idea here is -that the QUIT macros occur extremely often in the code, but only occur -at places that are relatively safe -- in particular, if an error occurs, -nothing will get completely trashed. - -@item -Now, let's look at QUIT again. - -@item - -UNFINISHED. Note, however, that as of the point when this comment got -committed to CVS (mid-2001), the interaction between reading @kbd{C-g} -as an event and processing it as QUIT was overhauled to (for the first -time) be understandable and actually work correctly. Now, the way -things work is that if @kbd{C-g} is pressed while XEmacs is blocking at -the top level, waiting for a user event, it will be read as an event; -otherwise, it will cause QUIT. (This includes times when XEmacs is -blocking, but not waiting for a user event, -e.g. 
@code{accept-process-output} and -@code{wait_delaying_user_events()}.) Formerly, this was supposed to -happen, but didn't always due to a bizarre and broken scheme, documented -in @code{next_event_internal} like this: - -@quotation -If we read a @kbd{C-g}, then set @code{quit-flag} but do not discard the -@kbd{C-g}. The callers of @code{next_event_internal()} will do one of -two things: - -@enumerate -@item -set @code{Vquit_flag} to Qnil. (@code{next-event} does this.) This will -cause the ^G to be treated as a normal keystroke. - -@item -not change @code{Vquit_flag} but attempt to enqueue the ^G, at which -point it will be discarded. The next time QUIT is called, it will -notice that @code{Vquit_flag} was set. -@end enumerate -@end quotation - -This required weirdness in @code{enqueue_command_event_1} like this: - -@quotation -put the event on the typeahead queue, unless the event is the quit char, -in which case the @code{QUIT} which will occur on the next trip through this -loop is all the processing we should do - leaving it on the queue would -cause the quit to be processed twice. -@end quotation - -And further weirdness elsewhere, none of which made any sense, and -didn't work, because (e.g.) it required that QUIT never happen anywhere -inside @code{next_event_internal()} or any callers when @kbd{C-g} should -be read as a user event, which was impossible to implement in practice. - -Now what we do is fairly simple. Callers of -@code{next_event_internal()} that want @kbd{C-g} read as a user event -call @code{begin_dont_check_for_quit()}. @code{next_event_internal()}, -when it gets a @kbd{C-g}, simply sets @code{Vquit_flag} (just as when a -@kbd{C-g} is detected during the operation of @code{QUIT} or -@code{QUITP}), and then tries to @code{QUIT}. This will fail if blocked -by the previous call, at which point @code{next_event_internal()} will -return the @kbd{C-g} as an event. 
To unblock things, first set -@code{Vquit_flag} to nil (it was set to t when the @kbd{C-g} was read, -and if we don't reset it, the next call to @code{QUIT} will quit), and -then @code{unbind_to()} the depth returned by -@code{begin_dont_check_for_quit()}. It makes no difference if -@code{QUIT} is called a zillion times in @code{next_event_internal()} or -anywhere else, because it's blocked and will never signal. -@end enumerate - -@node Profiling, Asynchronous Timeouts, Control-G (Quit) Checking, Asynchronous Events; Quit Checking -@section Profiling -@cindex profiling -@cindex SIGPROF - -We implement our own profiling scheme so that we can determine -things like which Lisp functions are occupying the most time. Any -standard OS-provided profiling works on C functions, which is -not always that useful -- and inconvenient, since it requires compiling -with profile info and can't be retrieved dynamically, as XEmacs is -running. - -The basic idea is simple. We set a profiling timer using setitimer -(ITIMER_PROF), which generates a SIGPROF every so often. (This runs not -in real time but rather when the process is executing or the system is -running on behalf of the process -- at least, that is the case under -Unix. Under MS Windows and Cygwin, there is no @code{setitimer()}, so we -simulate it using multimedia timers, which run in real time. To make -the results a bit more realistic, we ignore ticks that go off while -blocking on an event wait. Note that Cygwin does provide a simulation -of @code{setitimer()}, but it's in real time anyway, since Windows doesn't -provide a way to have process-time timers, and furthermore, it's broken, -so we don't use it.) When the signal goes off, we see what function we're in, and -add 1 to the count associated with that function. - -It would be nice to use the Lisp allocation mechanism etc. to keep track -of the profiling information (i.e. 
to use Lisp hash tables), but we -can't because that's not safe -- updating the timing information happens -inside of a signal handler, so we can't rely on not being in the middle -of Lisp allocation, garbage collection, @code{malloc()}, etc. Trying to make -it work would be much more work than it's worth. Instead we use a basic -(non-Lisp) hash table, which will not conflict with garbage collection -or anything else as long as it doesn't try to resize itself. Resizing -itself, however (which happens as a result of a @code{puthash()}), could be -deadly. To avoid this, we make sure, at points where it's safe -(e.g. @code{profile_record_about_to_call()} -- recording the entry into a -function call), that the table always has some breathing room in it so -that no resizes will occur until at least that many items are added. -This is safe because any new item to be added in the sigprof would -likely have the @code{profile_record_about_to_call()} called just before it, -and the breathing room is checked. - -In general: any entry that the sigprof handler puts into the table comes -from a backtrace frame (except "Processing Events at Top Level", and -there's only one of those). Either that backtrace frame was added when -profiling was on (in which case @code{profile_record_about_to_call()} was -called and the breathing space updated), or when it was off -- and in -this case, no such frames can have been added since the last time -@code{start-profile} was called, so when @code{start-profile} is called we make -sure there is sufficient breathing room to account for all entries -currently on the stack. - -Jan 1998: In addition to timing info, I have added code to remember call -counts of Lisp funcalls. The @code{profile_increase_call_count()} -function is called from @code{Ffuncall()}, and serves to add data to -Vcall_count_profile_table. This mechanism is much simpler and -independent of the SIGPROF-driven one. 
It uses the Lisp allocation -mechanism normally, since it is not called from a handler. It may -even be useful to provide a way to turn on only one profiling -mechanism, but I haven't done so yet. --hniksic - -Dec 2002: Total overhaul of the interface, making it sane and easier to -use. --ben - -Feb 2003: Lots of rewriting of the internal code. Add GC-consing-usage, -total GC usage, and total timing to the information tracked. Track -profiling overhead and allow the ability to have internal sections -(e.g. internal-external conversion, byte-char conversion) that are -treated like Lisp functions for the purpose of profiling. --ben - -BEWARE: If you are modifying this file, be @strong{very} careful. Correctly -implementing the "total" values is very tricky due to the possibility of -recursion and of functions already on the stack when starting to -profile/still on the stack when stopping. - -@node Asynchronous Timeouts, Exiting, Profiling, Asynchronous Events; Quit Checking -@section Asynchronous Timeouts -@cindex asynchronous timeouts - -@node Exiting, , Asynchronous Timeouts, Asynchronous Events; Quit Checking -@section Exiting -@cindex exiting -@cindex crash -@cindex hang -@cindex core dump -@cindex Armageddon -@cindex exits, expected and unexpected -@cindex unexpected exits -@cindex expected exits - -Ben's capsule summary about expected and unexpected exits from XEmacs. - -Expected exits occur when the user directs XEmacs to exit, for example -by pressing the close button on the only frame in XEmacs, or by typing -@kbd{C-x C-c}. This runs @code{save-buffers-kill-emacs}, which saves -any necessary buffers, and then exits using the primitive -@code{kill-emacs}. - -However, unexpected exits occur in a few different ways: - -@itemize @bullet -@item -A memory access violation or other hardware-generated exception occurs. -This is the worst possible problem to deal with, because the fault can -occur while XEmacs is in any state whatsoever, even quite unstable ones. 
-As a result, we need to be @strong{extremely} careful what we do. - -@item -We are using one X display (or if we've used more, we've closed the -others already), and some hardware or other problem happens and -suddenly we've lost our connection to the display. In this situation, -things are not so dire as in the last one; our code itself isn't -trashed, so we can continue execution as normal, after having set -things up so that we can exit at the appropriate time. Our exit -still needs to be of the emergency nature; we have no displays, so -any attempts to use them will fail. We simply want to auto-save -(the single most important thing to do during shut-down), do minimal -cleanup of stuff that has an independent existence outside of XEmacs, -and exit. -@end itemize - -Currently, both unexpected exit scenarios described above set -@code{preparing_for_armageddon} to indicate that nonessential and possibly -dangerous things should not be done, specifically: - -@itemize @minus -@item -no garbage collection. -@item -no hooks are run. -@item -no messages of any sort from autosaving. -@item -autosaving tries harder, ignoring certain failures. -@item -existing frames are not deleted. -@end itemize - -(Also, all places that set @code{preparing_for_armageddon} also -set @code{dont_check_for_quit}. This happens separately because it's -also necessary to set other variables to make absolutely sure -no quitting happens.) - -In the first scenario above (the access violation), we also set -@code{fatal_error_in_progress}. This causes more things to not happen: - -@itemize @minus -@item -assertion failures do not abort. -@item -printing code does not do code conversion or gettext when -printing to stdout/stderr. 
-@end itemize - -@node Evaluation; Stack Frames; Bindings, Symbols and Variables, Asynchronous Events; Quit Checking, Top +@node Evaluation; Stack Frames; Bindings, Symbols and Variables, The Lisp Reader and Compiler, Top @chapter Evaluation; Stack Frames; Bindings @cindex evaluation; stack frames; bindings @cindex stack frames; bindings, evaluation; @@ -8623,6 +7350,7 @@ * Dynamic Binding; The specbinding Stack; Unwind-Protects:: * Simple Special Forms:: * Catch and Throw:: +* Error Trapping:: @end menu @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings @@ -8832,7 +7560,7 @@ compiler knows how to convert calls to these functions directly into byte code. -@node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings +@node Catch and Throw, Error Trapping, Simple Special Forms, Evaluation; Stack Frames; Bindings @section Catch and Throw @cindex catch and throw @cindex throw, catch and @@ -8892,6 +7620,171 @@ @code{lisp_eval}, and calls @code{unbind_to()} to undo any specbindings created since the catch. +@node Error Trapping, , Catch and Throw, Evaluation; Stack Frames; Bindings +@section Error Trapping +@cindex error trapping + +@subheading call_trapping_problems(): + +This is equivalent to (*fun) (arg), except that various conditions +can be trapped or inhibited, according to FLAGS. + +@itemize @bullet +@item +If FLAGS does not contain NO_INHIBIT_ERRORS, when an error occurs, +the error is caught and a warning is issued, specifying the +specific error that occurred and a backtrace. In that case, +WARNING_STRING should be given, and will be printed at the +beginning of the error to indicate where the error occurred. + +@item +If FLAGS does not contain NO_INHIBIT_THROWS, all attempts to +@code{throw} out of the function being called are trapped, and a warning +issued. (Again, WARNING_STRING should be given.) 
+ +@item +If FLAGS contains INHIBIT_WARNING_ISSUE, no warnings are issued; +this applies to recursive invocations of call_trapping_problems, too. + +@item +If FLAGS contains POSTPONE_WARNING_ISSUE, no warnings are issued, +but values useful for generating a warning are still computed (in +particular, the backtrace), so that the calling function can issue +a warning. + +@item +If FLAGS contains ISSUE_WARNINGS_AT_DEBUG_LEVEL, warnings will be +issued, but at level @code{debug}, which normally is below the minimum +specified by @code{log-warning-minimum-level}, meaning such warnings will +be ignored entirely. The user can change this variable, however, +to see the warnings. + +Note: If neither NO_INHIBIT_THROWS nor NO_INHIBIT_ERRORS is +given, you are @strong{guaranteed} that there will be no non-local exits +out of this function. + +@item +If FLAGS contains INHIBIT_QUIT, QUIT using C-g is inhibited. (This +is @strong{rarely} a good idea. Unless you use NO_INHIBIT_ERRORS, QUIT is +automatically caught as well, and treated as an error; you can +check for this using EQ (problems->error_conditions, Qquit).) + +@item +If FLAGS contains UNINHIBIT_QUIT, QUIT checking will be explicitly +turned on. (It will abort the code being called, but will still be +trapped and reported as an error, unless NO_INHIBIT_ERRORS is +given.) This is useful when QUIT checking has been turned off by a +higher-level caller. + +@item +If FLAGS contains INHIBIT_GC, garbage collection is inhibited. +This is useful for Lisp called within redisplay, for example. + +@item +If FLAGS contains INHIBIT_EXISTING_PERMANENT_DISPLAY_OBJECT_DELETION, +Lisp code is not allowed to delete any windows, buffers, frames, devices, +or consoles that were already in existence at the time this function +was called. (However, it's perfectly legal for code to create a new +buffer and then delete it.) 
+ +#### It might be useful to have a flag that inhibits deletion of a +specific permanent display object and everything it's attached to +(e.g. a window, and the buffer, frame, device, and console it's +attached to). + +@item +If FLAGS contains INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION, Lisp +code is not allowed to modify the text of any buffers that were +already in existence at the time this function was called. +(However, it's perfectly legal for code to create a new buffer and +then modify its text.) + +@quotation +[These last two flags are implemented using global variables +Vdeletable_permanent_display_objects and Vmodifiable_buffers, +which keep track of a list of all buffers or permanent display +objects created since the last time one of these flags was set. +The code that deletes buffers, etc. and modifies buffers checks + +@enumerate +@item +if the corresponding flag is set (through the global variable +inhibit_flags or its accessor function get_inhibit_flags()), and + +@item +if the object to be modified or deleted is not in the +appropriate list. +@end enumerate + +If so, it signals an error. + +Recursive calls to call_trapping_problems() are allowed. In +the case of the two flags mentioned above, the current values +of the global variables are stored in an unwind-protect, and +they're reset to nil.] +@end quotation + +@item +If FLAGS contains INHIBIT_ENTERING_DEBUGGER, the debugger will not +be entered if an error occurs inside the Lisp code being called, +even when the user has requested an error. In such a case, a warning +is issued stating that access to the debugger is denied, unless +INHIBIT_WARNING_ISSUE has also been supplied. This is useful when +calling Lisp code inside redisplay, in menu callbacks, etc. because +in such cases either the display is in an inconsistent state or +doing window operations is explicitly forbidden by the OS, and the +debugger would cause visual changes on the screen and might create +another frame. 
+ +@item +If FLAGS contains INHIBIT_ANY_CHANGE_AFFECTING_REDISPLAY, no +changes of any sort to extents, faces, glyphs, buffer text, +specifiers relating to display, other variables relating to +display, splitting, deleting, or resizing windows or frames, +deleting buffers, windows, frames, devices, or consoles, etc. are +allowed. This is for things called absolutely in the middle of +redisplay, which expects things to be @strong{exactly} the same after the +call as before. This isn't completely implemented and needs to be +thought out some more to determine exactly what its semantics are. +For the moment, turning on this flag also turns on + +@itemize @minus +@item +INHIBIT_EXISTING_PERMANENT_DISPLAY_OBJECT_DELETION +@item +INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION +@item +INHIBIT_ENTERING_DEBUGGER +@item +INHIBIT_WARNING_ISSUE +@item +INHIBIT_GC +@end itemize + +@item +#### The following five flags are defined, but unimplemented: + +#define INHIBIT_EXISTING_CODING_SYSTEM_DELETION (1<<6) +#define INHIBIT_EXISTING_CHARSET_DELETION (1<<7) +#define INHIBIT_PERMANENT_DISPLAY_OBJECT_CREATION (1<<8) +#define INHIBIT_CODING_SYSTEM_CREATION (1<<9) +#define INHIBIT_CHARSET_CREATION (1<<10) + +@item +FLAGS containing CALL_WITH_SUSPENDED_ERRORS is a sign that +call_with_suspended_errors() was invoked. This exists only for +debugging purposes -- often we want to break when a signal happens, +but ignore signals from call_with_suspended_errors(), because they +occur often and for legitimate reasons. +@end itemize + +If PROBLEM is non-zero, it should be a pointer to a structure into +which exact information about any occurring problems (either an +error or an attempted throw past this boundary) is stored. + +If a problem occurred and aborted operation (error, quit, or +invalid throw), Qunbound is returned. Otherwise the return value +from the call to (*fun) (arg) is returned. 
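The return-value and PROBLEM conventions described above can be illustrated with a small self-contained C model. This is only a sketch under stated assumptions, not the XEmacs implementation: every name prefixed with sk_ or SK_ is hypothetical, the flag bit values are made up (chosen so as not to overlap the five values listed above), errors are modeled with setjmp()/longjmp() rather than Fsignal(), and plain int values stand in for Lisp_Object, with SK_UNBOUND playing the role of Qunbound.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical flag values; only the names mirror the text above. */
enum {
  SK_NO_INHIBIT_ERRORS                                  = 1 << 11,
  SK_INHIBIT_WARNING_ISSUE                              = 1 << 12,
  SK_INHIBIT_GC                                         = 1 << 13,
  SK_INHIBIT_ENTERING_DEBUGGER                          = 1 << 14,
  SK_INHIBIT_EXISTING_PERMANENT_DISPLAY_OBJECT_DELETION = 1 << 15,
  SK_INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION          = 1 << 16,
  SK_INHIBIT_ANY_CHANGE_AFFECTING_REDISPLAY             = 1 << 17
};

/* Stand-in for Qunbound; ordinary results in this sketch are >= 0. */
#define SK_UNBOUND (-1)

/* Stand-in for the structure that PROBLEM points to. */
struct sk_problems {
  int caught_error;   /* nonzero if an error was trapped */
  int error_code;     /* which error it was */
};

static jmp_buf *sk_innermost_trap;  /* innermost active trap, or NULL */
static int sk_pending_error;        /* error code in flight during unwinding */

/* The redisplay flag also turns on the five flags listed in the text. */
int sk_expand_flags(int flags)
{
  if (flags & SK_INHIBIT_ANY_CHANGE_AFFECTING_REDISPLAY)
    flags |= SK_INHIBIT_EXISTING_PERMANENT_DISPLAY_OBJECT_DELETION
           | SK_INHIBIT_EXISTING_BUFFER_TEXT_MODIFICATION
           | SK_INHIBIT_ENTERING_DEBUGGER
           | SK_INHIBIT_WARNING_ISSUE
           | SK_INHIBIT_GC;
  return flags;
}

/* Model of signalling an error: unwind to the innermost trap. */
void sk_signal_error(int code)
{
  assert(sk_innermost_trap != NULL);
  sk_pending_error = code;
  longjmp(*sk_innermost_trap, 1);
}

/* Model of the calling convention: equivalent to (*fun) (arg), except
   that errors are trapped unless SK_NO_INHIBIT_ERRORS is given. */
int sk_call_trapping_problems(int flags, struct sk_problems *problems,
                              int (*fun)(void *), void *arg)
{
  jmp_buf trap;
  jmp_buf *outer = sk_innermost_trap;
  int result;

  assert(fun != NULL);
  flags = sk_expand_flags(flags);
  if (problems) {
    problems->caught_error = 0;
    problems->error_code = 0;
  }

  if (flags & SK_NO_INHIBIT_ERRORS)
    return fun(arg);            /* no trap here: errors reach an outer trap */

  if (setjmp(trap) != 0) {
    /* An error unwound to this boundary: record it, return the sentinel. */
    sk_innermost_trap = outer;
    if (problems) {
      problems->caught_error = 1;
      problems->error_code = sk_pending_error;
    }
    return SK_UNBOUND;
  }

  sk_innermost_trap = &trap;
  result = fun(arg);
  sk_innermost_trap = outer;
  return result;
}

/* Demo callees (hypothetical). */
int sk_demo_ok(void *arg)   { (void) arg; return 42; }
int sk_demo_fail(void *arg) { (void) arg; sk_signal_error(7); return 0; }
int sk_demo_nested(void *arg)
{
  /* The inner call installs no trap, so the error reaches the caller's. */
  return sk_call_trapping_problems(SK_NO_INHIBIT_ERRORS, NULL,
                                   sk_demo_fail, arg);
}
```

In this model, calling sk_call_trapping_problems (0, &p, sk_demo_fail, NULL) returns SK_UNBOUND and records error code 7 in p, while the same call with sk_demo_ok returns the callee's own value and leaves p.caught_error at 0. Passing SK_NO_INHIBIT_ERRORS installs no trap, so an error unwinds to whichever enclosing call did install one, which is the situation the "no non-local exits" guarantee in the text excludes.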
@node Symbols and Variables, Buffers, Evaluation; Stack Frames; Bindings, Top @chapter Symbols and Variables @@ -9819,7 +8712,7 @@ But if you keep your eye on the "switch in a loop" structure, you should be able to understand the parts you need. -@node Multilingual Support, The Lisp Reader and Compiler, Text, Top +@node Multilingual Support, Consoles; Devices; Frames; Windows, Text, Top @chapter Multilingual Support @cindex Mule character sets and encodings @cindex character sets and encodings, Mule @@ -9860,6 +8753,7 @@ * Internal Text API's:: * Coding for Mule:: * CCL:: +* Microsoft Windows-Related Multilingual Issues:: * Modules for Internationalization:: @end menu @@ -11888,13 +10782,13 @@ Bytecount len, Charcount charlen, ...); Compare the Eistring with the other data. Return value same as - from strcmp. The `*' is either `ei' for another Eistring (in - which case `...' is an Eistring), or `c' for a pure-ASCII string - (in which case `...' is a pointer to that string). For anything + from strcmp. The @code{*} is either @code{ei} for another Eistring (in + which case @code{...} is an Eistring), or @code{c} for a pure-ASCII string + (in which case @code{...} is a pointer to that string). For anything more complex, first create an Eistring out of the source. - Comparison is either simple (`eicmp_...'), ASCII case-folding - (`eicasecmp_...'), or multilingual case-folding - (`eicasecmp_i18n_...). + Comparison is either simple (@code{eicmp_...}), ASCII case-folding + (@code{eicasecmp_...}), or multilingual case-folding + (@code{eicasecmp_i18n_...}). More specifically, the prototypes are: @@ -12538,7 +11432,7 @@ In Windows code, string literals may need to be encapsulated with @code{XETEXT}. 
@end itemize -@node CCL, Modules for Internationalization, Coding for Mule, Multilingual Support +@node CCL, Microsoft Windows-Related Multilingual Issues, Coding for Mule, Multilingual Support @section CCL @cindex CCL @@ -12664,7 +11558,813 @@ ..........AAAAA @end example -@node Modules for Internationalization, , CCL, Multilingual Support +@node Microsoft Windows-Related Multilingual Issues, Modules for Internationalization, CCL, Multilingual Support +@section Microsoft Windows-Related Multilingual Issues +@cindex Microsoft Windows-related multilingual issues +@cindex Windows-related multilingual issues +@cindex multilingual issues, Windows-related + +@menu +* Microsoft Documentation:: +* Locales:: +* More about code pages:: +* More about locales:: +* Unicode support under Windows:: +* The golden rules of writing Unicode-safe code:: +* The format of the locale in setlocale():: +* Random other Windows I18N docs:: +@end menu + +@node Microsoft Documentation, Locales, Microsoft Windows-Related Multilingual Issues, Microsoft Windows-Related Multilingual Issues +@subsection Microsoft Documentation +@cindex Microsoft documentation + +Documentation on international support in Windows is scattered throughout MSDN. 
+Here are some good places to look: + +@enumerate +@item +C Runtime (CRT) intl support + +@enumerate +@item +Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Internationalization +@item +Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Global Constants -> Locale Categories +@item +Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Appendixes -> Language and Country/Region Strings +@item +Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Appendixes -> Generic-Text Mappings +@item +Function documentation for various functions: +Visual Tools and Languages -> Visual Studio 6.0 Documentation -> Visual C++ Documentation -> Using Visual C++ -> Run-Time Library Reference -> Alphabetic Function Reference +e.g. _setmbcp(), setlocale(), strcoll functions +@end enumerate + +@item +Win32 API intl support + +@enumerate +@item +Platform SDK Documentation -> Base Services -> International Features +@item +Platform SDK Documentation -> User Interface Services -> Windows User Interface -> User Input -> Keyboard Input -> Character Messages -> International Features +@item +Backgrounders -> Windows Platform -> Windows 2000 -> International Support in Microsoft Windows 2000 +@end enumerate + +@item +Microsoft Layer for Unicode + +Platform SDK Documentation -> Windows API -> Windows 95/98/Me Programming -> Windows 95/98/Me Overviews -> Microsoft Layer for Unicode on Windows 95/98/Me Systems + +@item +Look in the CRT sources! They come with VC++. See win32.c. 
+@end enumerate + +@node Locales, More about code pages, Microsoft Documentation, Microsoft Windows-Related Multilingual Issues +@subsection Locales, code pages, and other concepts of "language" +@cindex locales, code pages, and other concepts of "language" + +First, make sure you clearly understand the difference between the C +runtime library (CRT) and the Win32 API! See win32.c. + +There are various different ways of representing the vague concept +of "language", and it can be very confusing. So: + +@itemize @bullet +@item +The CRT library has the concept of "locale", which is a +combination of language and country, and which controls the way +currency and dates are displayed, the encoding of data, etc. + +@item +XEmacs has the concept of "language environment", more or less +like a locale; although currently in most cases it just refers to +the language, and no sub-language distinctions are +made. (Exceptions are with Chinese, which has different language +environments for Taiwan and mainland China, due to the different +encodings and writing systems.) + +@item +Windows has a number of different language concepts: + +@enumerate +@item +There are "languages" and "sublanguages", which correspond to +the languages and countries of the C library -- e.g. LANG_ENGLISH +and SUBLANG_ENGLISH_US. These are identified by small integers +(10 bits for the language, 6 bits for the sublanguage), +called the "primary language identifier" and "sublanguage +identifier", respectively. These are combined into a 16-bit +integer or "language identifier" by MAKELANGID(). + +@item +The language identifier in turn is combined with a "sort +identifier" (and optionally a "sort version") to yield a 32-bit +integer called a "locale identifier" (type LCID), which identifies +locales -- the primary means of distinguishing language/regional +settings and similar to C library locales. + +@item +A "code page" combines the XEmacs concepts of "charset" and "coding +system". 
It logically encompasses + +@itemize @minus +@item +a set of supported characters +@item +an enumeration associating each character with a code point, which +is a number or number pair; there may be disjoint ranges of numbers +supported +@item +a way of encoding a series of characters into a string of bytes +@end itemize + +Note that the first two properties correspond to an XEmacs "charset" +and the latter an XEmacs "coding system". + +Traditional encodings are either simple one-byte encodings, or +combination one-byte/two-byte encodings (aka MBCS encodings, where MBCS +stands for "Multibyte Character Set") with the following properties: + +@itemize @minus +@item +all characters are encoded as a one-byte or two-byte sequence +@item +the encoding is stateless (non-modal) +@item +the lower 128 bytes are compatible with ASCII +@item +in the higher bytes, the value of the first byte ("lead byte") +determines whether a second byte follows +@item +the values used for second bytes may overlap those used for first +bytes, and (in some encodings) include values in the low half; thus, +moving backwards is hard, and pure-ASCII algorithms (e.g. finding the +next slash) will fail unless rewritten to be MBCS-aware (neither of +these problems exist in UTF-8 or in the XEmacs internal string +encoding) +@end itemize + +Recent code pages, however, do not necessarily follow these properties -- +code pages have been expanded to include arbitrary encodings, such as +UTF-8 (may have more than two bytes per character) and ISO-2022-JP +(complex modal encoding). + +@item +Every Windows locale has four associated code pages: ANSI (an +international standard or some Microsoft-created approximation; the +native code page under Windows), OEM (a DOS encoding, still used in the +FAT file system), Mac (an encoding used on the Macintosh) and EBCDIC (a +non-ASCII-compatible encoding used on IBM mainframes, originally based +on the BCD or "binary-coded decimal" encoding of numbers). 
All code +pages associated with a locale follow (as far as I know) the properties +listed above for traditional code pages. More than one locale can share +a code page -- e.g. all the Western European languages, including +English, do. + +@item +Windows also has an "input locale identifier" (aka "keyboard +layout id") or HKL, which is a 32-bit integer composed of the +16-bit language identifier and a 16-bit "device identifier", which +originally specified a particular keyboard layout (e.g. the locale +"US English" can have the QWERTY layout, the Dvorak layout, etc.), +but has been expanded to include speech-to-text converters and +other non-keyboard ways of inputting text. Note that both the HKL +and LCID share the language identifier in the lower 16 bits, and in +both cases a 0 in the upper 16 bits means "default" (sort order or +device), providing a way to convert between HKL's, LCID's, and +language identifiers (i.e. language/sublanguage pairs). The +default keyboard layout for a language is (as far as I can +determine) established using the Regional Settings control panel +applet, where you can add input locales as combinations of language +(actually language/sublanguage) and layout; presumably if you list +only one input locale with a particular language, the corresponding +layout is the default for that language. But what if you list more +than one? You can specify a single default input locale, but there +appears to be no way to do so on a per-language basis. +@end enumerate +@end itemize + +@node More about code pages, More about locales, Locales, Microsoft Windows-Related Multilingual Issues +@subsection More about code pages +@cindex more about code pages + +Here is what MSDN says about code pages (article "Code Pages"): + +@quotation +A code page is a character set, which can include numbers, +punctuation marks, and other glyphs. Different languages and locales +may use different code pages. 
For example, ANSI code page 1252 is +used for American English and most European languages; OEM code page +932 is used for Japanese Kanji. + +A code page can be represented in a table as a mapping of characters +to single-byte values or multibyte values. Many code pages share the +ASCII character set for characters in the range 0x00 - 0x7F. + +The Microsoft run-time library uses the following types of code pages: + +-- System-default ANSI code page. By default, at startup the run-time +system automatically sets the multibyte code page to the +system-default ANSI code page, which is obtained from the operating +system. The call + +setlocale ( LC_ALL, "" ); + +also sets the locale to the system-default ANSI code page. + +-- Locale code page. The behavior of a number of run-time routines is +dependent on the current locale setting, which includes the locale +code page. (For more information, see Locale-Dependent Routines.) By +default, all locale-dependent routines in the Microsoft run-time +library use the code page that corresponds to the "C" locale. At +run-time you can change or query the locale code page in use with a +call to setlocale. + +-- Multibyte code page. The behavior of most of the multibyte-character +routines in the run-time library depends on the current multibyte +code page setting. By default, these routines use the system-default +ANSI code page. At run-time you can query and change the multibyte +code page with _getmbcp and _setmbcp, respectively. + +-- The "C" locale is defined by ANSI to correspond to the locale in +which C programs have traditionally executed. The code page for the +"C" locale ("C" code page) corresponds to the ASCII character +set. For example, in the "C" locale, islower returns true for the +values 0x61 - 0x7A only. In another locale, islower may return true +for these as well as other values, as defined by that locale. 
+ +Under "Locale-Dependent Routines" we notice the following setlocale +dependencies: + +atof, atoi, atol (LC_NUMERIC) +is Routines (LC_CTYPE) +isleadbyte (LC_CTYPE) +localeconv (LC_MONETARY, LC_NUMERIC) +MB_CUR_MAX (LC_CTYPE) +_mbccpy (LC_CTYPE) +_mbclen (LC_CTYPE) +mblen (LC_CTYPE) +_mbstrlen (LC_CTYPE) +mbstowcs (LC_CTYPE) +mbtowc (LC_CTYPE) +printf (LC_NUMERIC, for radix character output) +scanf (LC_NUMERIC, for radix character recognition) +setlocale/_wsetlocale (Not applicable) +strcoll (LC_COLLATE) +_stricoll/_wcsicoll (LC_COLLATE) +_strncoll/_wcsncoll (LC_COLLATE) +_strnicoll/_wcsnicoll (LC_COLLATE) +strftime, wcsftime (LC_TIME) +_strlwr (LC_CTYPE) +strtod/wcstod/strtol/wcstol/strtoul/wcstoul (LC_NUMERIC, for radix character recognition) +_strupr (LC_CTYPE) +strxfrm/wcsxfrm (LC_COLLATE) +tolower/towlower (LC_CTYPE) +toupper/towupper (LC_CTYPE) +wcstombs (LC_CTYPE) +wctomb (LC_CTYPE) +_wtoi/_wtol (LC_NUMERIC) +@end quotation + +NOTE: The above documentation doesn't clearly explain the "locale code +page" and "multibyte code page". These are two different values, +maintained respectively in the CRT global variables __lc_codepage and +__mbcodepage. Calling e.g. setlocale (LC_ALL, "JAPANESE") sets @strong{ONLY} +__lc_codepage to 932 (the code page for Japanese), and leaves +__mbcodepage unchanged (usually 1252, i.e. Windows-ANSI). You'd have to +call _setmbcp() to change __mbcodepage. Figuring out from the +documentation which routines use which code page is not so obvious. But: + +@itemize @bullet +@item +from "Interpretation of Multibyte-Character Sequences" it appears that +all "multibyte-character routines" use the multibyte code page except for +mblen(), _mbstrlen(), mbstowcs(), mbtowc(), wcstombs(), and wctomb(). + +@item +from "_setmbcp": "The multibyte code page also affects +multibyte-character processing by the following run-time library +routines: _exec functions _mktemp _stat _fullpath _spawn functions +_tempnam _makepath _splitpath tmpnam. 
In addition, all run-time library +routines that receive multibyte-character argv or envp program arguments +as parameters (such as the _exec and _spawn families) process these +strings according to the multibyte code page. Hence these routines are +also affected by a call to _setmbcp that changes the multibyte code +page." +@end itemize + +Summary: from looking at the CRT source (which comes with VC++) and +carefully looking through the docs, it appears that: + +@itemize @bullet +@item +the "locale code page" is used by all of the routines listed above +under "Locale-Dependent Routines" (EXCEPT _mbccpy() and _mbclen()), +as well as any other place that converts between multibyte and Unicode +strings, e.g. the startup code. +@item +the "multibyte code page" is used in all of the *mb*() routines +except mblen(), _mbstrlen(), mbstowcs(), mbtowc(), wcstombs(), +and wctomb(); also _exec*(), _spawn*(), _mktemp(), _stat(), _fullpath(), +_tempnam(), _makepath(), _splitpath(), tmpnam(), and similar functions +without the leading underscore. +@end itemize + +@node More about locales, Unicode support under Windows, More about code pages, Microsoft Windows-Related Multilingual Issues +@subsection More about locales +@cindex more about locales + +In addition to the locale defined by the CRT, Windows (i.e. the Win32 API) +defines various locales: + +@itemize @bullet +@item +The system-default locale is the locale defined under "Language +settings for the system" in the "Regional Options" control panel. This +is NOT user-specific, and changing it requires a reboot (at least under +Windows 2000). The ANSI code page of the system-default locale is +returned by GetACP(), and you can specify this code page in calls +e.g. to MultiByteToWideChar with the constant CP_ACP. + +@item +The user-default locale is the locale defined under "Settings for the +current user" in the "Regional Options" control panel. + +@item +There is a thread-local locale set by SetThreadLocale. 
#### What is this +used for? +@end itemize + +The Win32 API has a bunch of multibyte functions -- all of those that +end with ...A(), and on which we spend so much effort in +intl-encap-win32.c. These appear to ALWAYS use the ANSI code page of +the system-default locale (GetACP(), CP_ACP). Note that this applies +also, for example, to the encoding of filenames in all file-handling +routines, including the CRT ones such as open(), because they pass their +args unchanged to the Win32 API. + +@node Unicode support under Windows, The golden rules of writing Unicode-safe code, More about locales, Microsoft Windows-Related Multilingual Issues +@subsection Unicode support under Windows +@cindex unicode support under windows + +Basically, the whole concept of locales and code pages is broken, because +it is extremely messy to support and does not allow for documents that use +multiple languages simultaneously. Unicode was designed in response to +this, the idea being to create a single character set that could be used to +encode all the world's languages. Windows has supported Unicode since the +beginning of the Win32 API. Internally, every code page has an associated +table to convert the characters of that code page to and from Unicode, and +the Win32 API itself probably (perhaps always) uses Unicode internally. + +Under Windows there are two different versions of all library routines that +accept or return text, those that handle Unicode text and those handling +"multibyte" text, i.e. variable-width ASCII-compatible text in some +national format such as EUC or Shift-JIS. Because Windows 95 basically +doesn't support Unicode but Windows NT does, and Microsoft doesn't provide +any way of writing a single binary that will work on both systems and still +use Unicode when it's available (although see below, Microsoft Layer for +Unicode), we need to provide a way of run-time conditionalizing so you +could have one binary for both systems. 
"Unicode-splitting" refers to +writing code that will handle this properly. This means using +Qmswindows_tstr as the external conversion format, calling the appropriate +qxe...() Unicode-split version of library functions, and doing other things +in certain cases, e.g. when a qxe() function is not present. + +Unicode support also requires that the various Windows API's be +"Unicode-encapsulated", so that they automatically call the ANSI or +Unicode version of the API call appropriately and handle the size +differences in structures. What this means is: + +@itemize @bullet +@item +first, note that Windows already provides a sort of encapsulation +of all API's that deal with text. All such API's are underlyingly +provided in two versions, with an A or W suffix (ANSI or "wide" +i.e. Unicode), and the compile-time constant UNICODE controls which is +selected by the unsuffixed API. Same thing happens with structures, and +also with types, where the generic types have names beginning with T -- +TCHAR, LPTSTR, etc.. Unfortunately, this is compile-time only, not +run-time, so not sufficient. (Creating the necessary run-time encoding +is not conceptually difficult, but very time-consuming to write. It +adds no significant overhead, and the only reason it's not standard in +Windows is conscious marketing attempts by Microsoft to cripple Windows +95. FUCK MICROSOFT! They even describe in a KnowledgeBase article +exactly how to create such an API [although we don't exactly follow +their procedure], and point out its usefulness; the procedure is also +described more generally in Nadine Kano's book on Win32 +internationalization -- written SIX YEARS AGO! Obviously Microsoft has +such an API available internally.) + +@item +what we do is provide an encapsulation of each standard Windows API call +that is split into A and W versions. 
Current theory is to avoid all +preprocessor games; so we name the function with a prefix -- "qxe" +currently -- and require callers to use the prefixed name. Callers need +to explicitly use the W version of all structures, and convert text +themselves using Qmswindows_tstr. The qxe-encapsulated version will +automatically call the appropriate A or W version depending on whether +we're running on 9x or NT (you can force use of the A calls on NT, +e.g. for testing purposes, using the command-line switch -nuni aka +-no-unicode-lib-calls), and copy data between W and A versions of the +structures as necessary. + +@item +We require the caller to handle the actual translation of text to +avoid possible overflow when dealing with fixed-size Windows +structures. There are no such problems when copying data between +the A and W versions because ANSI text is never larger than its +equivalent Unicode representation. +@end itemize + +NOTE NOTE NOTE: As of August 2001, Microsoft (finally! See my nasty +comment above) released their own Unicode-encapsulation library, called +Microsoft Layer for Unicode on Windows 95/98/Me Systems. It tries to be +more transparent than we are, in that + +@itemize @bullet +@item +its routines do ANSI/Unicode string translation, while we don't, for +efficiency (we already have to do internal/external conversion so it's +no extra burden to do the proper conversion directly rather than always +converting to Unicode and then doing a second conversion to ANSI as +necessary) + +@item +rather than requiring separately-named routines (qxeFooBar), they +physically override the existing routines at the link level. It also +appears that they do this BADLY, in that if you link with the MLU, you +get an application that runs ONLY on Win9x!!! (hint -- use +GetProcAddress()). There's still no way to create a single binary! +fucking losers. 
+ +@item +they assume you compile with UNICODE defined, so there's no need for the +application to explicitly use ...W structures, as we require. + +@item +they also intercept windows procedures to deal with notify messages as +necessary, which we don't do yet. + +@item +they (of course) don't use Extbyte. +@end itemize + +at some point (especially when they fix the single-binary problem!), we +should consider switching. for the meantime, we'll stick with what i've +already written. perhaps we should think about adopting some of the +greater transparency they have; but i opted against transparency on +purpose, to make the code easier to follow for someone who's not familiar +with it. until our library is really complete and bug-free, we should +think twice before doing this. + +According to Microsoft documentation, only the following functions are +provided under Windows 9x to support Unicode (see MSDN page "Windows +95/98/Me General Limitations"): + +EnumResourceLanguages +EnumResourceNames +EnumResourceTypes +ExtTextOut +FindResource +FindResourceEx +GetCharWidth +GetCommandLine +GetTextExtentPoint +GetTextExtentPoint32 +lstrcat +lstrcpy +lstrlen +MessageBox +MessageBoxEx +MultiByteToWideChar +TextOut +WideCharToMultiByte + +also maybe GetTextExtentExPoint? (KB Q125671 "Unicode Functions Supported +by Windows 95") + +However, the C runtime library provides some additional support (according +to the CRT sources, as the docs are not very clear on this): + +@itemize @bullet +@item +wmain() is completely supported, and appropriate Unicode-formatted argv +and envp will always be passed. +@item +Likewise, wWinMain() is completely supported. 
(NOTE: The docs are not at +all clear on how these various entry points interact, and imply that +a windows-subsystem program "must" use WinMain(), while a +console-subsystem program "must" use main(), and a program compiled with UNICODE +(which we don't, see above) "must" use the w*() versions, while a program +not compiled this way "must" use the plain versions. In fact it appears +that the CRT provides four different compiler entry points, namely +w?(main|WinMain)CRTStartup, and we simply choose the one we like using +the appropriate link flag.) +@item +_wenviron, _wputenv +@end itemize + +NOTE: + +@itemize @bullet +@item +wsetargv.obj uses routines that were buggily left out of MSVCRT; anyway, +from looking at the source, it does NOT correctly work under Win 9x as +it blindly calls the Unicode version of Unicode-split API's such as +FindFirstFile. + +@item +the w*() file routines are @strong{NOT} supported -- or at least, they blindly +call the ...W() versions of the Win32 API calls. +@end itemize + +@node The golden rules of writing Unicode-safe code, The format of the locale in setlocale(), Unicode support under Windows, Microsoft Windows-Related Multilingual Issues +@subsection The golden rules of writing Unicode-safe code +@cindex the golden rules of writing unicode-safe code + +@itemize @bullet +@item +There are no preprocessor games going on. + +@item +Do not set the UNICODE constant. + +@item +You need to change your code to call the "qxe"-prefixed versions of the +Windows API functions (when they exist) and use the ...W structs instead of the +generic ones. String arguments in the qxe functions are of type Extbyte +*. + +@item +Your code is responsible for conversion of text arguments. We try to +handle everything else -- the argument differences, the copying back and +forth of structures, etc. Use Qmswindows_tstr and macros such as +C_STRING_TO_TSTR. You are also responsible for interpreting and +specifying string sizes, which have not been changed. 
Usually these are +in characters, meaning you need to divide by XETCHAR_SIZE. (But some +functions want sizes in bytes, even with Unicode strings. Look in the +documentation.) Use XETEXT when specifying string constants, so that +they show up in Unicode as necessary. + +@item +If you need to process external strings (in general you should not do +this; do all your manipulations in internal format and convert at the +point of entry into or exit from the function), use the xet...() +functions. + +@item +If you have to declare a fixed array to hold a string coming from +Windows (and hence either multibyte or Unicode), declare it of type +Extbyte[] and multiply the size by MAX_XETCHAR_SIZE. +@end itemize + +@node The format of the locale in setlocale(), Random other Windows I18N docs, The golden rules of writing Unicode-safe code, Microsoft Windows-Related Multilingual Issues +@subsection The format of the locale in setlocale() +@cindex the format of the locale in setlocale() + +It appears that under Unix the standard format for the string in +setlocale() involves two-letter language and country abbreviations, e.g. +ja or ja_jp or ja_jp.euc for Japanese. Windows (MSDN article "Language +Strings" in the run-time reference appendix, see doc list above) speaks +of "(primary) language" and "sublanguage" (usually a country, but in the +case of Chinese the sublanguage is "simplified" or "traditional"). It +is highly flexible in what it takes, and thankfully it canonicalizes the +result to a unique form "Language_Country.Encoding". It allows (note +that all specifications can be in any case): + +@itemize @bullet +@item +the full "language_country.encoding" specification or just +"language_country", in which case the default encoding will be chosen. + +@item +a three-letter acronym, consisting of the ISO-standard two-letter +language abbreviation followed by a third letter indicating the +sublanguage. + +@item +just a language name, e.g. 
"dutch", standing for the combination of +the language with "default" as sublanguage, referring to the default +(often "prototypical") country for that language (in this case the +Netherlands). You can abbreviate the name by removing any number of +letters from the end. Ambiguity is not a problem: Even specifying +just a single letter is valid providing any language starting with +that letter exists, but the result may not be what you want (e.g. "c" +maps to "catalan", not "chinese", "czech", etc.). The way of +resolving ambiguity appears fairly random -- it's not alphabetical +("a" maps to "arabic" not "albanian"). + +@item +a combination of language and sublanguage separated by a hyphen, +e.g. "dutch-belgian"; note that the sublanguage designator in this +case is NOT necessarily the same as the country, e.g. "belgian" vs. +"belgium". "dutch-belgium" (or even "dutch-belg") does @strong{NOT} get you +the right result, but returns "Dutch_Netherlands.1252" instead! This +is because, although you may not abbreviate the result, Windows +accepts any unknown value in the sublanguage field and treats it as +equivalent to "default". Note also that the if the sublanguage name +has underscores in it, you need to change them to spaces, e.g. +"spanish-dominican republic". + +@item +sometimes, just a sublanguage name, e.g. "belgian", standing for +the combination of one of the languages spoken in that region and +the sublanguage of the region -- in this case Dutch. Note that +there is no guarantee of "protypicality" in this case in choice of +language! You could hardly say that Dutch (aka Flemish) is more +prototypical of Belgium than French. You cannot abbreviate this +form, if it's allowed at all. +@end itemize + +In addition: + +@itemize @bullet +@item +note further that you are not limited to the language/sublanguage +combinations predefined by Windows. You can set weird combinations +like "Chinese_Kenya.1255" (Chinese spoken in Kenya, represented by +Windows-1255, i.e. 
Hebrew!) and Windows doesn't complain, despite the +language-encoding inconsistency. You can also make up a weird +combination and leave out the encoding, e.g. "Chinese_Qatar", which +maps to "Chinese_Qatar.1256", where Windows-1256 is Arabic -- i.e. it +appears to be choosing the encoding based on a default for the +country. + +@item +note also that the names for countries are often not what you expect. +"urdu_pakistan" fails, and just "urdu" shows why, as it maps to +"Urdu_Islamic Republic of Pakistan.1256". That is, some countries +exist in their full name, and the canonicalized form with underscore +is not very forgiving in its handling of country specifications. +Similarly, Uzbekistan is "Republic of Uzbekistan", and "China" is +"People's Republic of China" -- but in this latter case, unlike the +other two, just "China" works as an alias, e.g. "uzbek_china" maps +to "Uzbek_People's Republic of China.936". + +@item +note that just the two-letter ISO language code is NOT allowed. +Sometimes you'll get lucky (e.g. "fr" does map to "france"), but +sometimes you'll get no match (e.g. "pl"), and sometimes you'll get +really unlucky in that the call will succeed but with the wrong +language (e.g. "es" maps to "estonian", not "spanish"). +@end itemize + +As an example, MSDN article "Language Strings" indicates that German +(default) can be specified using "deu" or "german"; German (Austrian) +with "dea" or "german-austrian"; German (Swiss) with "des", +"german-swiss", or "swiss"; French (Swiss) with "french-swiss" or "frs"; +and English (USA) with "american", "american english", +"american-english", "english-american", "english-us", "english-usa", +"enu", "us", or "usa". This is not, of course, an exhaustive list even +for just the given locales -- just "english" works in practice because +English (Default) maps to English (USA). (#### Is this always the case?) 
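The behavior described above can be probed with a few lines of standard C. This is only a sketch: canonical_locale() and show_locale_specs() are hypothetical helper names (not part of XEmacs or the CRT), and the canonical "Language_Country.Encoding" results are expected only when the program is linked against the Microsoft CRT. On other systems most of these specs will simply be rejected.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: ask the C runtime to resolve SPEC and copy the
   name it reports into OUT (at most OUTLEN bytes, always terminated).
   Returns 1 on success, 0 if the spec is rejected or OUT has no room.
   Side effect: on success the process locale is changed to SPEC. */
static int
canonical_locale (const char *spec, char *out, size_t outlen)
{
  const char *result;

  assert (spec != NULL && out != NULL);
  result = setlocale (LC_ALL, spec);
  if (result == NULL || outlen == 0)
    return 0;
  /* The returned string may be overwritten by a later setlocale()
     call, so copy it out immediately. */
  strncpy (out, result, outlen - 1);
  out[outlen - 1] = '\0';
  return 1;
}

/* Hypothetical driver: try some of the specs discussed in the text and
   print whatever the runtime reports for each. */
static void
show_locale_specs (void)
{
  static const char *specs[] = {
    "C", "german-swiss", "dutch-belgian", "dutch-belgium", "es", "pl"
  };
  char name[256];
  size_t i;

  for (i = 0; i < sizeof (specs) / sizeof (specs[0]); i++)
    {
      if (canonical_locale (specs[i], name, sizeof (name)))
        printf ("%-16s -> %s\n", specs[i], name);
      else
        printf ("%-16s -> (rejected)\n", specs[i]);
    }
}
```

Calling show_locale_specs() from main() prints one line per spec. Because each successful call changes the process locale, a real caller would probably want to save the previous setting and restore it afterwards.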
+ +Given the canonicalization, we don't have to worry too much about the +different kinds of inputs to setlocale() -- unlike for Unix, where no +canonicalization is usually performed, the particular locales that +exist vary tremendously from OS to OS, and we need to parse the +uncanonicalized locale spec, directly from the user, to figure out the +encoding to use, making various guesses if not enough information is +present. Yuck! The tricky thing under Windows is figuring how to +deal with the sublang. It appears that the trick of simply passing the +text of the manifest constant itself of the sublang, with appropriate +hacking (e.g. of underscore to space), works most of the time. + +@node Random other Windows I18N docs, , The format of the locale in setlocale(), Microsoft Windows-Related Multilingual Issues +@subsection Random other Windows I18N docs +@cindex random other windows i18n docs + +Introduction to Internationalization Issues in the Win32 API + +Abstract: This page provides an overview of the aspects of the Win32 +internationalization API that are relevant to XEmacs, including the +basic distinction between multibyte and Unicode encodings. Also +included are pointers to how XEmacs should make use of this API. + +The Win32 API is quite well-designed in its handling of strings +encoded for various character sets. The API is geared around the idea +that two different methods of encoding strings should be +supported. These methods are called multibyte and Unicode, +respectively. The multibyte encoding is compatible with ASCII strings +and is a more efficient representation when dealing with strings +containing primarily ASCII characters, but it has a great number of +serious deficiencies and limitations, including that it is very +difficult and error-prone to work with strings in this encoding, and +any particular string in a multibyte encoding can only contain +characters from a very limited number of character sets. 
The Unicode +encoding rectifies all of these deficiencies, but it is not compatible +with ASCII strings (in other words, an existing program will not be +able to handle the encoded strings unless it is explicitly modified to +do so), and it takes up twice as much memory space as multibyte +encodings when encoding a purely ASCII string. + +Multibyte encodings use a variable number of bytes (either one or two) +to represent characters. ASCII characters are also represented by a +single byte with its high bit not set, and non-ASCII characters are +represented by one or two bytes, the first of which always has its +high bit set. (The second byte, when it exists, may or may not have +its high bit set.) There is no single multibyte encoding. Instead, +there is generally one encoding per non-ASCII character set. Such an +encoding is capable of representing (besides ASCII characters, of +course) only characters from one (or possibly two) particular +character sets. + +Multibyte encoding makes processing of strings very difficult. For +example, given a pointer to the beginning of a character within a +string, finding the pointer to the beginning of the previous character +may require backing up all the way to the beginning of the string, and +then moving forward. Also, an operation such as separating out the +components of a path by searching for backslashes will fail if it's +implemented in the simplest (but not multibyte-aware) fashion, because +it may find what appears to be a backslash, but which is actually the +second byte of a two-byte character. Also, the limited number of +character sets that any particular multibyte encoding can represent +means that loss of data is likely if a string is converted from the +XEmacs internal format into a multibyte format. + +For these reasons, the C code in XEmacs should never do any sort of +work with multibyte encoded strings (or with strings in any external +encoding for that matter). 
Strings should always be maintained in the +internal encoding, which is predictable, and converted to an external +encoding only at the point where the string moves from the XEmacs C +code and enters a system library function. Similarly, when a string is +returned from a system library function, it should be immediately +converted into the internal coding before any operations are done on +it. + +Unicode, unlike multibyte encodings, is a fixed-width encoding where +every character is represented using 16 bits. It is also capable of +encoding all the characters from all the character sets in common use +in the world. The predictability and completeness of the Unicode +encoding makes it a very good encoding for strings that may contain +characters from many character sets mixed up with each other. At the +same time, of course, it is incompatible with routines that expect +ASCII characters and also incompatible with general string +manipulation routines, which will encounter a great number of what +would appear to be embedded nulls in the string. It also takes twice +as much room to encode strings containing primarily ASCII +characters. This is why XEmacs does not use Unicode or similar +encoding internally for buffers. + +The Win32 API cleverly deals with the issue of 8 bit vs. 16 bit +characters by declaring a type called TCHAR which specifies a generic +character, either 8 bits or 16 bits. Generally TCHAR is defined to be +the same as the simple C type char, unless the preprocessor constant +UNICODE is defined, in which case TCHAR is defined to be WCHAR, which +is a 16 bit type. Nearly all functions in the Win32 API that take +strings are defined to take strings that are actually arrays of +TCHARs. There is a type LPTSTR which is defined to be a string of +TCHARs and another type LPCTSTR which is a const string of TCHARs. 
The +theory is that any program that uses TCHARs exclusively to represent +characters and does not make assumptions about the size of a TCHAR or +the way that the characters are encoded should work transparently +regardless of whether the UNICODE preprocessor constant is defined, +which is to say, regardless of whether 8 bit multibyte or 16 bit +Unicode characters are being used. The way that this is actually +implemented is that every Win32 API function that takes a string as an +argument actually maps to one of two functions which are suffixed with +an A (which stands for ANSI, and means multibyte strings) or W (which +stands for wide, and means Unicode strings). The mapping is, of +course, controlled by the same UNICODE preprocessor +constant. Generally all structures containing strings in them actually +map to one of two different kinds of structures, with either an A or a +W suffix after the structure name. + +Unfortunately, not all of the implementations of the Win32 API +implement all of the functionality described above. In particular, +Windows 95 does not implement very much Unicode functionality. It does +implement functions to convert multibyte-encoded strings to and from +Unicode strings, and provides Unicode versions of certain low-level +functions like ExtTextOut(). In fact, all of the rest of the Unicode +versions of API functions are just stubs that return an +error. Conversely, all versions of Windows NT completely implement all +the Unicode functionality, but some versions (especially versions +before Windows NT 4.0) don't implement much of the multibyte +functionality. For this reason, as well as for general code +cleanliness, XEmacs needs to be written in such a way that it works +with or without the UNICODE preprocessor constant being defined. + +Getting XEmacs to run when all strings are Unicode primarily involves +removing any assumptions made about the size of characters. 
Remember +what I said earlier about how the point of conversion between +internally and externally encoded strings should occur at the point of +entry or exit into or out of a library function. With this in mind, an +externally encoded string in XEmacs can be treated simply as an +arbitrary sequence of bytes of some length which has no particular +relationship to the length of the string in the internal encoding. + +Use Qnative for Unix conversion, Qmswindows_tstr for Windows ... + +String constants that are to be passed directly to Win32 API functions, +such as the names of window classes, need to be bracketed in their +definition with a call to the macro XETEXT. This appropriately makes a +string of either regular or wide chars, which is to say this string may be +prepended with an L (causing it to be a wide string) depending on +XEUNICODE_P. + +@node Modules for Internationalization, , Microsoft Windows-Related Multilingual Issues, Multilingual Support @section Modules for Internationalization @cindex modules for internationalization @cindex internationalization, modules for @@ -12745,245 +12445,7 @@ Asian-language support, and is not currently used. -@node The Lisp Reader and Compiler, Lstreams, Multilingual Support, Top -@chapter The Lisp Reader and Compiler -@cindex Lisp reader and compiler, the -@cindex reader and compiler, the Lisp -@cindex compiler, the Lisp reader and - -Not yet documented. - -@node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top -@chapter Lstreams -@cindex lstreams - - An @dfn{lstream} is an internal Lisp object that provides a generic -buffering stream implementation. Conceptually, you send data to the -stream or read data from the stream, not caring what's on the other end -of the stream. The other end could be another stream, a file -descriptor, a stdio stream, a fixed block of memory, a reallocating -block of memory, etc. 
The main purpose of the stream is to provide a -standard interface and to do buffering. Macros are defined to read or -write characters, so the calling functions do not have to worry about -blocking data together in order to achieve efficiency. - -@menu -* Creating an Lstream:: Creating an lstream object. -* Lstream Types:: Different sorts of things that are streamed. -* Lstream Functions:: Functions for working with lstreams. -* Lstream Methods:: Creating new lstream types. -@end menu - -@node Creating an Lstream, Lstream Types, Lstreams, Lstreams -@section Creating an Lstream -@cindex lstream, creating an - -Lstreams come in different types, depending on what is being interfaced -to. Although the primitive for creating new lstreams is -@code{Lstream_new()}, generally you do not call this directly. Instead, -you call some type-specific creation function, which creates the lstream -and initializes it as appropriate for the particular type. - -All lstream creation functions take a @var{mode} argument, specifying -what mode the lstream should be opened as. This controls whether the -lstream is for input and output, and optionally whether data should be -blocked up in units of MULE characters. Note that some types of -lstreams can only be opened for input; others only for output; and -others can be opened either way. #### Richard Mlynarik thinks that -there should be a strict separation between input and output streams, -and he's probably right. - - @var{mode} is a string, one of - -@table @code -@item "r" - Open for reading. -@item "w" - Open for writing. -@item "rc" - Open for reading, but ``read'' never returns partial MULE characters. -@item "wc" - Open for writing, but never writes partial MULE characters. 
-@end table - -@node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams -@section Lstream Types -@cindex lstream types -@cindex types, lstream - -@table @asis -@item stdio - -@item filedesc - -@item lisp-string - -@item fixed-buffer - -@item resizing-buffer - -@item dynarr - -@item lisp-buffer - -@item print - -@item decoding - -@item encoding -@end table - -@node Lstream Functions, Lstream Methods, Lstream Types, Lstreams -@section Lstream Functions -@cindex lstream functions - -@deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) -Allocate and return a new Lstream. This function is not really meant to -be called directly; rather, each stream type should provide its own -stream creation function, which creates the stream and does any other -necessary creation stuff (e.g. opening a file). -@end deftypefun - -@deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size}) -Change the buffering of a stream. See @file{lstream.h}. By default the -buffering is @code{STREAM_BLOCK_BUFFERED}. -@end deftypefun - -@deftypefun int Lstream_flush (Lstream *@var{lstr}) -Flush out any pending unwritten data in the stream. Clear any buffered -input data. Returns 0 on success, -1 on error. -@end deftypefun - -@deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c}) -Write out one byte to the stream. This is a macro and so it is very -efficient. The @var{c} argument is only evaluated once but the @var{stream} -argument is evaluated more than once. Returns 0 on success, -1 on -error. -@end deftypefn - -@deftypefn Macro int Lstream_getc (Lstream *@var{stream}) -Read one byte from the stream. This is a macro and so it is very -efficient. The @var{stream} argument is evaluated more than once. Return -value is -1 for EOF or error. 
-@end deftypefn - -@deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c}) -Push one byte back onto the input queue. This will be the next byte -read from the stream. Any number of bytes can be pushed back and will -be read in the reverse order they were pushed back---most recent -first. (This is necessary for consistency---if there are a number of -bytes that have been unread and I read and unread a byte, it needs to be -the first to be read again.) This is a macro and so it is very -efficient. The @var{c} argument is only evaluated once but the @var{stream} -argument is evaluated more than once. -@end deftypefn - -@deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c}) -@deftypefunx int Lstream_fgetc (Lstream *@var{stream}) -@deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c}) -Function equivalents of the above macros. -@end deftypefun - -@deftypefun Bytecount Lstream_read (Lstream *@var{stream}, void *@var{data}, Bytecount @var{size}) -Read @var{size} bytes of @var{data} from the stream. Return the number -of bytes read. 0 means EOF. -1 means an error occurred and no bytes -were read. -@end deftypefun - -@deftypefun Bytecount Lstream_write (Lstream *@var{stream}, void *@var{data}, Bytecount @var{size}) -Write @var{size} bytes of @var{data} to the stream. Return the number -of bytes written. -1 means an error occurred and no bytes were written. -@end deftypefun - -@deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, Bytecount @var{size}) -Push back @var{size} bytes of @var{data} onto the input queue. The next -call to @code{Lstream_read()} with the same size will read the same -bytes back. Note that this will be the case even if there is other -pending unread data. -@end deftypefun - -@deftypefun int Lstream_close (Lstream *@var{stream}) -Close the stream. All data will be flushed out. -@end deftypefun - -@deftypefun void Lstream_reopen (Lstream *@var{stream}) -Reopen a closed stream. 
This enables I/O on it again. This is not -meant to be called except from a wrapper routine that reinitializes -variables and such---the close routine may well have freed some -necessary storage structures, for example. -@end deftypefun - -@deftypefun void Lstream_rewind (Lstream *@var{stream}) -Rewind the stream to the beginning. -@end deftypefun - -@node Lstream Methods, , Lstream Functions, Lstreams -@section Lstream Methods -@cindex lstream methods - -@deftypefn {Lstream Method} Bytecount reader (Lstream *@var{stream}, unsigned char *@var{data}, Bytecount @var{size}) -Read some data from the stream's end and store it into @var{data}, which -can hold @var{size} bytes. Return the number of bytes read. A return -value of 0 means no bytes can be read at this time. This may be because -of an EOF, or because there is a granularity greater than one byte that -the stream imposes on the returned data, and @var{size} is less than -this granularity. (This will happen frequently for streams that need to -return whole characters, because @code{Lstream_read()} calls the reader -function repeatedly until it has the number of bytes it wants or until 0 -is returned.) The lstream functions do not treat a 0 return as EOF or -do anything special; however, the calling function will interpret any 0 -it gets back as EOF. This will normally not happen unless the caller -calls @code{Lstream_read()} with a very small size. - -This function can be @code{NULL} if the stream is output-only. -@end deftypefn - -@deftypefn {Lstream Method} Bytecount writer (Lstream *@var{stream}, const unsigned char *@var{data}, Bytecount @var{size}) -Send some data to the stream's end. Data to be sent is in @var{data} -and is @var{size} bytes. Return the number of bytes sent. This -function can send and return fewer bytes than is passed in; in that -case, the function will just be called again until there is no data left -or 0 is returned. 
A return value of 0 means that no more data can be -currently stored, but there is no error; the data will be squirreled -away until the writer can accept data. (This is useful, e.g., if you're -dealing with a non-blocking file descriptor and are getting -@code{EWOULDBLOCK} errors.) This function can be @code{NULL} if the -stream is input-only. -@end deftypefn - -@deftypefn {Lstream Method} int rewinder (Lstream *@var{stream}) -Rewind the stream. If this is @code{NULL}, the stream is not seekable. -@end deftypefn - -@deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream}) -Indicate whether this stream is seekable---i.e. it can be rewound. -This method is ignored if the stream does not have a rewind method. If -this method is not present, the result is determined by whether a rewind -method is present. -@end deftypefn - -@deftypefn {Lstream Method} int flusher (Lstream *@var{stream}) -Perform any additional operations necessary to flush the data in this -stream. -@end deftypefn - -@deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream}) -@end deftypefn - -@deftypefn {Lstream Method} int closer (Lstream *@var{stream}) -Perform any additional operations necessary to close this stream down. -May be @code{NULL}. This function is called when @code{Lstream_close()} -is called or when the stream is garbage-collected. When this function -is called, all pending data in the stream will already have been written -out. -@end deftypefn - -@deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object)) -Mark this object for garbage collection. Same semantics as a standard -@code{Lisp_Object} marker. This function can be @code{NULL}. 
-@end deftypefn - -@node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top +@node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Multilingual Support, Top @chapter Consoles; Devices; Frames; Windows @cindex consoles; devices; frames; windows @cindex devices; frames; windows, consoles; @@ -14186,7 +13648,7 @@ @xref{Specifiers,,, lispref, XEmacs Lisp Reference Manual}. The code in @file{specifier.c} is pretty straightforward. -@node Menus, Subprocesses, Specifiers, Top +@node Menus, Events and the Event Loop, Specifiers, Top @chapter Menus @cindex menus @@ -14239,7 +13701,1378 @@ its argument, which is the callback function or form given in the menu's description. -@node Subprocesses, Interface to MS Windows, Menus, Top +@node Events and the Event Loop, Asynchronous Events; Quit Checking, Menus, Top +@chapter Events and the Event Loop +@cindex events and the event loop +@cindex event loop, events and the + +@menu +* Introduction to Events:: +* Main Loop:: +* Specifics of the Event Gathering Mechanism:: +* Specifics About the Emacs Event:: +* Event Queues:: +* Event Stream Callback Routines:: +* Other Event Loop Functions:: +* Stream Pairs:: +* Converting Events:: +* Dispatching Events; The Command Builder:: +* Focus Handling:: +* Editor-Level Control Flow Modules:: +@end menu + +@node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop +@section Introduction to Events +@cindex events, introduction to + + An event is an object that encapsulates information about an +interesting occurrence in the operating system. Events are +generated either by user action, direct (e.g. typing on the +keyboard or moving the mouse) or indirect (moving another +window, thereby generating an expose event on an Emacs frame), +or as a result of some other typically asynchronous action happening, +such as output from a subprocess being ready or a timer expiring. 
+Events come into the system in an asynchronous fashion (typically +through a callback being called) and are converted into a +synchronous event queue (first-in, first-out) in a process that +we will call @dfn{collection}. + + Note that each application has its own event queue. (It is +immaterial whether the collection process directly puts the +events in the proper application's queue, or puts them into +a single system queue, which is later split up.) + + The most basic level of event collection is done by the +operating system or window system. Typically, XEmacs does +its own event collection as well. Often there are multiple +layers of collection in XEmacs, with events from various +sources being collected into a queue, which is then combined +with other sources to go into another queue (i.e. a second +level of collection), with perhaps another level on top of +this, etc. + + XEmacs has its own types of events (called @dfn{Emacs events}), +which provides an abstract layer on top of the system-dependent +nature of the most basic events that are received. Part of the +complex nature of the XEmacs event collection process involves +converting from the operating-system events into the proper +Emacs events---there may not be a one-to-one correspondence. + + Emacs events are documented in @file{events.h}; I'll discuss them +later. + +@node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop +@section Main Loop +@cindex main loop +@cindex events, main loop + + The @dfn{command loop} is the top-level loop that the editor is always +running. It loops endlessly, calling @code{next-event} to retrieve an +event and @code{dispatch-event} to execute it. @code{dispatch-event} does +the appropriate thing with non-user events (process, timeout, +magic, eval, mouse motion); this involves calling a Lisp handler +function, redrawing a newly-exposed part of a frame, reading +subprocess output, etc. 
For user events, @code{dispatch-event} +looks up the event in relevant keymaps or menubars; when a +full key sequence or menubar selection is reached, the appropriate +function is executed. @code{dispatch-event} may have to keep state +across calls; this is done in the ``command-builder'' structure +associated with each console (remember, there's usually only +one console), and the engine that looks up keystrokes and +constructs full key sequences is called the @dfn{command builder}. +This is documented elsewhere. + + The guts of the command loop are in @code{command_loop_1()}. This +function doesn't catch errors, though---that's the job of +@code{command_loop_2()}, which is a condition-case (i.e. error-trapping) +wrapper around @code{command_loop_1()}. @code{command_loop_1()} never +returns, but may get thrown out of. + + When an error occurs, @code{cmd_error()} is called, which usually +invokes the Lisp error handler in @code{command-error}; however, a +default error handler is provided if @code{command-error} is @code{nil} +(e.g. during startup). The purpose of the error handler is simply to +display the error message and do associated cleanup; it does not need to +throw anywhere. When the error handler finishes, the condition-case in +@code{command_loop_2()} will finish and @code{command_loop_2()} will +reinvoke @code{command_loop_1()}. + + @code{command_loop_2()} is invoked from three places: from +@code{initial_command_loop()} (called from @code{main()} at the end of +internal initialization), from the Lisp function @code{recursive-edit}, +and from @code{call_command_loop()}. + + @code{call_command_loop()} is called when a macro is started and when +the minibuffer is entered; normal termination of the macro or minibuffer +causes a throw out of the recursive command loop. (To +@code{execute-kbd-macro} for macros and @code{exit} for minibuffers. 
+Note also that the low-level minibuffer-entering function, +@code{read-minibuffer-internal}, provides its own error handling and +does not need @code{command_loop_2()}'s error encapsulation; so it tells +@code{call_command_loop()} to invoke @code{command_loop_1()} directly.) + + Note that both read-minibuffer-internal and recursive-edit set up a +catch for @code{exit}; this is why @code{abort-recursive-edit}, which +throws to this catch, exits out of either one. + + @code{initial_command_loop()}, called from @code{main()}, sets up a +catch for @code{top-level} when invoking @code{command_loop_2()}, +allowing functions to throw all the way to the top level if they really +need to. Before invoking @code{command_loop_2()}, +@code{initial_command_loop()} calls @code{top_level_1()}, which handles +all of the startup stuff (creating the initial frame, handling the +command-line options, loading the user's @file{.emacs} file, etc.). The +function that actually does this is in Lisp and is pointed to by the +variable @code{top-level}; normally this function is +@code{normal-top-level}. @code{top_level_1()} is just an error-handling +wrapper similar to @code{command_loop_2()}. Note also that +@code{initial_command_loop()} sets up a catch for @code{top-level} when +invoking @code{top_level_1()}, just like when it invokes +@code{command_loop_2()}. + +@node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop +@section Specifics of the Event Gathering Mechanism +@cindex event gathering mechanism, specifics of the + + Here is an approximate diagram of the collection processes +at work in XEmacs, under TTY's (TTY's are simpler than X +so we'll look at this first): + +@noindent +@example + asynch. asynch. asynch. asynch. 
[Collectors in +kbd events kbd events process process the OS] + | | output output + | | | | + | | | | SIGINT, [signal handlers + | | | | SIGQUIT, in XEmacs] + V V V V SIGWINCH, + file file file file SIGALRM + desc. desc. desc. desc. | + (TTY) (TTY) (pipe) (pipe) | + | | | | fake timeouts + | | | | file | + | | | | desc. | + | | | | (pipe) | + | | | | | | + | | | | | | + | | | | | | + V V V V V V + ------>-----------<----------------<---------------- + | + | + | [collected using @code{select()} in @code{emacs_tty_next_event()} + | and converted to the appropriate Emacs event] + | + | + V (above this line is TTY-specific) + Emacs ----------------------------------------------- + event (below this line is the generic event mechanism) + | + | +was there if not, call +a SIGINT? @code{emacs_tty_next_event()} + | | + | | + | | + V V + --->------<---- + | + | [collected in @code{event_stream_next_event()}; + | SIGINT is converted using @code{maybe_read_quit_event()}] + V + Emacs + event + | + \---->------>----- maybe_kbd_translate() ---->---\ + | + | + | + command event queue | + if not from command + (contains events that were event queue, call + read earlier but not processed, @code{event_stream_next_event()} + typically when waiting in a | + sit-for, sleep-for, etc. for | + a particular event to be received) | + | | + | | + V V + ---->------------------------------------<---- + | + | [collected in + | @code{next_event_internal()}] + | + unread- unread- event from | + command- command- keyboard else, call + events event macro @code{next_event_internal()} + | | | | + | | | | + | | | | + V V V V + --------->----------------------<------------ + | + | [collected in @code{next-event}, which may loop + | more than once if the event it gets is on + | a dead frame, device, etc.] 
+ | + | + V + feed into top-level event loop, + which repeatedly calls @code{next-event} + and then dispatches the event + using @code{dispatch-event} +@end example + +Notice the separation between TTY-specific and generic event mechanism. +When using the Xt-based event loop, the TTY-specific stuff is replaced +but the rest stays the same. + +It's also important to realize that only one different kind of +system-specific event loop can be operating at a time, and must be able +to receive all kinds of events simultaneously. For the two existing +event loops (implemented in @file{event-tty.c} and @file{event-Xt.c}, +respectively), the TTY event loop @emph{only} handles TTY consoles, +while the Xt event loop handles @emph{both} TTY and X consoles. This +situation is different from all of the output handlers, where you simply +have one per console type. + + Here's the Xt Event Loop Diagram (notice that below a certain point, +it's the same as the above diagram): + +@example +asynch. asynch. asynch. asynch. [Collectors in + kbd kbd process process the OS] +events events output output + | | | | + | | | | asynch. asynch. [Collectors in the + | | | | X X OS and X Window System] + | | | | events events + | | | | | | + | | | | | | + | | | | | | SIGINT, [signal handlers + | | | | | | SIGQUIT, in XEmacs] + | | | | | | SIGWINCH, + | | | | | | SIGALRM + | | | | | | | + | | | | | | | + | | | | | | | timeouts + | | | | | | | | + | | | | | | | | + | | | | | | V | + V V V V V V fake | + file file file file file file file | + desc. desc. desc. desc. desc. desc. desc. 
| + (TTY) (TTY) (pipe) (pipe) (socket) (socket) (pipe) | + | | | | | | | | + | | | | | | | | + | | | | | | | | + V V V V V V V V + --->----------------------------------------<---------<------ + | | | + | | |[collected using @code{select()} in + | | | @code{_XtWaitForSomething()}, called + | | | from @code{XtAppProcessEvent()}, called + | | | in @code{emacs_Xt_next_event()}; + | | | dispatched to various callbacks] + | | | + | | | + emacs_Xt_ p_s_callback(), | [popup_selection_callback] + event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_ + | x_u_h_s_callback(),| callback] + | search_callback() | [x_update_horizontal_scrollbar_ + | | | callback] + | | | + | | | + enqueue_Xt_ signal_special_ | + dispatch_event() Xt_user_event() | + [maybe multiple | | + times, maybe 0 | | + times] | | + | enqueue_Xt_ | + | dispatch_event() | + | | | + | | | + V V | + -->----------<-- | + | | + | | + dispatch @code{Xt_what_callback()} + event sets flags + queue | + | | + | | + | | + | | + ---->-----------<-------- + | + | + | [collected and converted as appropriate in + | @code{emacs_Xt_next_event()}] + | + | + V (above this line is Xt-specific) + Emacs ------------------------------------------------ + event (below this line is the generic event mechanism) + | + | +was there if not, call +a SIGINT? @code{emacs_Xt_next_event()} + | | + | | + | | + V V + --->-------<---- + | + | [collected in @code{event_stream_next_event()}; + | SIGINT is converted using @code{maybe_read_quit_event()}] + V + Emacs + event + | + \---->------>----- maybe_kbd_translate() -->-----\ + | + | + | + command event queue | + if not from command + (contains events that were event queue, call + read earlier but not processed, @code{event_stream_next_event()} + typically when waiting in a | + sit-for, sleep-for, etc. 
for | + a particular event to be received) | + | | + | | + V V + ---->----------------------------------<------ + | + | [collected in + | @code{next_event_internal()}] + | + unread- unread- event from | + command- command- keyboard else, call + events event macro @code{next_event_internal()} + | | | | + | | | | + | | | | + V V V V + --------->----------------------<------------ + | + | [collected in @code{next-event}, which may loop + | more than once if the event it gets is on + | a dead frame, device, etc.] + | + | + V + feed into top-level event loop, + which repeatedly calls @code{next-event} + and then dispatches the event + using @code{dispatch-event} +@end example + +@node Specifics About the Emacs Event, Event Queues, Specifics of the Event Gathering Mechanism, Events and the Event Loop +@section Specifics About the Emacs Event +@cindex event, specifics about the Lisp object + +@node Event Queues, Event Stream Callback Routines, Specifics About the Emacs Event, Events and the Event Loop +@section Event Queues +@cindex event queues +@cindex queues, event + +There are two event queues here -- the command event queue (#### which +should be called "deferred event queue" and is in my glyph ws) and the +dispatch event queue. (MS Windows actually has an extra dispatch queue +for non-user events and uses the generic one only for user events. This +is because user and non-user events in Windows come through the same +place -- the window procedure -- but under X, it's possible to +selectively process events such that we take all the user events before +the non-user ones. #### In fact, given the way we now drain the queue, +we might need two separate queues, like under Windows. Need to think +carefully exactly how this works, and should certainly generalize the +two different queues. 
+ +The dispatch queue (which used to occur duplicated inside of each event +implementation) is used for events that have been read from the +window-system event queue(s) and not yet processed by +@code{next_event_internal()}. It exists for two reasons: (1) because in many +implementations, events often come from the window system by way of +callbacks, which need to push the event to be returned onto a queue; (2) +in order to handle QUIT in a guaranteed correct fashion without +resorting to weird implementation-specific hacks that may or may not +work well, we need to drain the window-system event queues and then look +through to see if there's an event matching quit-char (usually ^G). The +drained events need to go onto a queue. (There are other, similar cases +where we need to drain the pending events so we can look ahead -- for +example, checking for pending expose events under X to avoid excessive +server activity.) + +The command event queue is used @strong{AFTER} an event has been read from +@code{next_event_internal()}, when it needs to be pushed back. This +includes, for example, @code{accept-process-output}, @code{sleep-for} +and @code{wait_delaying_user_input()}. Eval events and the like, +generated by @code{enqueue-eval-event}, +@code{enqueue_magic_eval_event()}, etc. are also pushed onto this queue. +Some events generated by callbacks are also pushed onto this queue, #### +although maybe they shouldn't be. + +The command queue takes precedence over the dispatch queue. + +#### It is worth investigating to see whether both queues are really +needed, and how exactly they should be used. @code{enqueue-eval-event}, +for example, could certainly push onto the dispatch queue, and all +callbacks maybe should. @code{wait_delaying_user_input()} seems to need +both queues, since it can take events from the dispatch queue and push +them onto the command queue; but it perhaps could be rewritten to avoid +this.
#### In general we need to review the handling of these two +queues, figure out exactly what ought to be happening, and document it. + + +@node Event Stream Callback Routines, Other Event Loop Functions, Event Queues, Events and the Event Loop +@section Event Stream Callback Routines +@cindex event stream callback routines +@cindex callback routines, event stream + +There is one object called an event_stream. This object contains +callback functions for doing the window-system-dependent operations +that XEmacs requires. + +If XEmacs is compiled with support for X11 and the X Toolkit, then this +event_stream structure will contain functions that can cope with input +on XEmacs windows on multiple displays, as well as input from dumb tty +frames. + +If it is desired to have XEmacs able to open frames on the displays of +multiple heterogeneous machines, X11 and SunView, or X11 and NeXT, for +example, then it will be necessary to construct an event_stream structure +that can cope with the given types. Currently, the only implemented +event_streams are for dumb-ttys, and for X11 plus dumb-ttys, +and for mswindows. + +To implement this for one window system is relatively simple. +To implement this for multiple window systems is trickier and may +not be possible in all situations, but it's been done for X and TTY. + +Note that these callbacks are @strong{NOT} console methods; that's because +the routines are not specific to a particular console type but must +be able to simultaneously cope with all allowable console types. + +The slots of the event_stream structure: + +@table @code +@item next_event_cb +A function which fills in an XEmacs_event structure with the next event +available. If there is no event available, then this should block. + +IMPORTANT: timer events and especially process events *must not* be +returned if there are events of other types available; otherwise you can +end up with an infinite loop in @code{Fdiscard_input()}. 
+ +@item event_pending_cb +A function which says whether there are events to be read. If called +with an argument of 0, then this should say whether calling the +@code{next_event_cb} will block. If called with a non-zero argument, +then this should say whether there are that many user-generated events +pending (that is, keypresses, mouse-clicks, dialog-box selection events, +etc.). (This is used for redisplay optimization, among other things.) +The difference is that the former includes process events and timer +events, but the latter doesn't. + +If this function is not sure whether there are events to be read, it +@strong{must} return 0. Otherwise various undesirable effects will +occur, such as redisplay not occurring until the next event occurs. + +@item handle_magic_event_cb +XEmacs calls this with an event structure which contains window-system +dependent information that XEmacs doesn't need to know about, but which +must happen in order. If the @code{next_event_cb} never returns an +event of type "magic", this will never be used. + +@item format_magic_event_cb +Called with a magic event; print a representation of the innards of the +event to @var{PSTREAM}. + +@item compare_magic_event_cb +Called with two magic events; return non-zero if the innards of the two +are equal, zero otherwise. + +@item hash_magic_event_cb +Called with a magic event; return a hash of the innards of the event. + +@item add_timeout_cb +Called with an @var{EMACS_TIME}, the absolute time at which a wakeup event +should be generated; and a void *, which is an arbitrary value that will +be returned in the timeout event. The timeouts generated by this +function should be one-shots: they fire once and then disappear. This +callback should return an int id-number which uniquely identifies this +wakeup. If an implementation doesn't have microseconds or millisecond +granularity, it should round up to the closest value it can deal with. 
+ +@item remove_timeout_cb +Called with an int, the id number of a wakeup to discard. This id +number must have been returned by the @code{add_timeout_cb}. If the given +wakeup has already expired, this should do nothing. + +@item select_process_cb +@item unselect_process_cb +These callbacks tell the underlying implementation to add or remove a +file descriptor from the list of fds which are polled for +inferior-process input. When input becomes available on the given +process connection, an event of type "process" should be generated. + +@item select_console_cb +@item unselect_console_cb +These callbacks tell the underlying implementation to add or remove a +console from the list of consoles which are polled for user-input. + +@item select_device_cb +@item unselect_device_cb +These callbacks are used by Unixoid event loops (those that use @code{select()} +and file descriptors and have a separate input fd per device). + +@item create_io_streams_cb +@item delete_io_streams_cb +These callbacks are called by process code to create the input and +output lstreams which are used for subprocess I/O. + +@item quitp_cb +A handler function called from the @code{QUIT} macro which should check +whether the quit character has been typed. On systems with SIGIO, this +will not be called unless the @code{sigio_happened} flag is true (it is set +from the SIGIO handler). +@end table + +XEmacs has its own event structures, which are distinct from the event +structures used by X or any other window system. It is the job of the +event_stream layer to translate to this format. 
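The timeout slots described above can be illustrated with a small sketch. This is not the XEmacs implementation: the structure is a much-reduced, hypothetical stand-in for event_stream, every @code{sketch_} name is invented for illustration, and time is simplified to a millisecond count rather than an @var{EMACS_TIME}. It only shows the contract stated in the table: each wakeup gets a unique int id, fires once and then disappears, and removing a wakeup that has already expired does nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the event_stream structure:
   only the two timeout slots are modelled here. */
struct sketch_event_stream
{
  int  (*add_timeout_cb) (long wakeup_ms, void *user_data);
  void (*remove_timeout_cb) (int id);
};

#define SKETCH_MAX_TIMEOUTS 16

struct sketch_timeout
{
  int id;           /* unique id handed back to the caller */
  long wakeup_ms;   /* absolute wakeup time (simplified to milliseconds) */
  void *user_data;  /* arbitrary value returned in the timeout event */
  int active;       /* nonzero while the wakeup is still pending */
};

static struct sketch_timeout sketch_timeouts[SKETCH_MAX_TIMEOUTS];
static int sketch_next_id = 1;

/* Register a one-shot wakeup; returns its id, or -1 if the toy table is full. */
int sketch_add_timeout (long wakeup_ms, void *user_data)
{
  int i;
  assert (wakeup_ms >= 0);
  for (i = 0; i < SKETCH_MAX_TIMEOUTS; i++)
    if (!sketch_timeouts[i].active)
      {
        sketch_timeouts[i].id = sketch_next_id;
        sketch_timeouts[i].wakeup_ms = wakeup_ms;
        sketch_timeouts[i].user_data = user_data;
        sketch_timeouts[i].active = 1;
        return sketch_next_id++;
      }
  return -1;
}

/* Discard a pending wakeup; an expired or unknown id is silently ignored. */
void sketch_remove_timeout (int id)
{
  int i;
  for (i = 0; i < SKETCH_MAX_TIMEOUTS; i++)
    if (sketch_timeouts[i].active && sketch_timeouts[i].id == id)
      sketch_timeouts[i].active = 0;
}

/* Fire every wakeup due at or before NOW_MS.  Each one fires once and
   then disappears.  Returns the number of wakeups fired. */
int sketch_fire_due (long now_ms)
{
  int fired = 0;
  int i;
  for (i = 0; i < SKETCH_MAX_TIMEOUTS; i++)
    if (sketch_timeouts[i].active && sketch_timeouts[i].wakeup_ms <= now_ms)
      {
        sketch_timeouts[i].active = 0;
        fired++;
      }
  return fired;
}

/* The callbacks are reached through the structure, not called directly. */
struct sketch_event_stream sketch_stream =
  { sketch_add_timeout, sketch_remove_timeout };
```

Generic code would call through @code{sketch_stream.add_timeout_cb} and @code{sketch_stream.remove_timeout_cb} without knowing which window system supplied them, which is the point of keeping these routines in one structure rather than in per-console methods.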
+ +@node Other Event Loop Functions, Stream Pairs, Event Stream Callback Routines, Events and the Event Loop +@section Other Event Loop Functions +@cindex event loop functions, other + + @code{detect_input_pending()} and @code{input-pending-p} look for +input by calling @code{event_stream->event_pending_p} and looking in +@code{[V]unread-command-event} and the @code{command_event_queue} (they +do not check for an executing keyboard macro, though). + + @code{discard-input} cancels any command events pending (and any +keyboard macros currently executing), and puts the others onto the +@code{command_event_queue}. There is a comment in the code about a ``race +condition'', which is not a good sign. + + @code{next-command-event} and @code{read-char} are higher-level +interfaces to @code{next-event}. @code{next-command-event} gets the +next @dfn{command} event (i.e. keypress, mouse event, menu selection, +or scrollbar action), calling @code{dispatch-event} on any others. +@code{read-char} calls @code{next-command-event} and uses +@code{event_to_character()} to return the character equivalent. With +the right kind of input method support, it is possible for @code{(read-char)} +to return a Kanji character. + +@node Stream Pairs, Converting Events, Other Event Loop Functions, Events and the Event Loop +@section Stream Pairs +@cindex stream pairs +@cindex pairs, stream + +Since there are many possible process/event loop combinations, the +event code is responsible for creating an appropriate lstream type. The +process implementation does not care about that implementation. + +The stream-pair creation function is passed two void* values, which +identify process-dependent ``handles''. The process implementation uses +these handles to communicate with child processes. The function must be +prepared to receive handle types of any process implementation.
Since +only one process implementation exists in a particular XEmacs +configuration, preprocessor conditionals are a means of compiling in support for +the code which deals with particular handle types. + +For example, a unixoid type loop, which relies on file descriptors, may be +asked to create a pair of streams by a unix-style process implementation. +In this case, the handles passed are unix file descriptors, and the code +may deal with these directly. However, the same code may also be used on a Win32 +system with X-Windows. In this case, the Win32 process implementation passes +handles of type HANDLE, and the @code{create_io_streams} function must call +the appropriate function to get file descriptors given HANDLEs, so that these +descriptors may be passed to @code{XtAddInput}. + +The handle given may have a special ``denying'' value, in which case the +corresponding lstream should not be created. + +The return value of the function is a unique stream identifier (USID). It is used +by the process implementation, in its platform-independent part. There is +the @code{get_process_from_usid} function, which returns a process object given its +USID. The event stream is responsible for converting its internal handle +type into a USID. + +An example is the TTY event stream. When a file descriptor signals input, the +event loop must determine the process for which the input is destined. Thus, +the implementation uses the file descriptor of the process input stream as the USID, by +simply casting the fd value to the USID type. + +There are two special USID values. One, @code{USID_ERROR}, indicates +that the stream pair cannot be created. The second, +@code{USID_DONTHASH}, indicates that streams are created, but the event +stream does not wish to be able to find the process by its +USID. Specifically, if an event stream implementation never calls +@code{get_process_from_usid}, this value should always be returned, to +prevent accumulating useless information on the USID-to-process +relationship.
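The fd-as-USID idea can be sketched as follows. This is an illustration only, under stated assumptions: the @code{USID} typedef and the numeric values chosen for @code{USID_ERROR} and @code{USID_DONTHASH} are guesses made so the example compiles, the lookup table stands in for whatever the process code really uses, and every @code{sketch_} name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definitions, for illustration only; the real ones are in the
   XEmacs sources and may differ. */
typedef unsigned long USID;
#define USID_ERROR    ((USID) -1)
#define USID_DONTHASH ((USID) 0)

#define SKETCH_MAX_FDS 64                       /* hypothetical table size */

struct sketch_process { const char *name; };    /* stand-in for a process object */

static struct sketch_process *sketch_process_by_fd[SKETCH_MAX_FDS];

/* Record which process owns input descriptor FD and return the USID,
   obtained by simply casting the fd.  fd 0 is rejected here because it
   would collide with the value assumed for USID_DONTHASH. */
USID sketch_register_input_fd (int fd, struct sketch_process *p)
{
  assert (p != NULL);
  if (fd <= 0 || fd >= SKETCH_MAX_FDS)
    return USID_ERROR;
  sketch_process_by_fd[fd] = p;
  return (USID) fd;
}

/* Map a USID back to its process; the special values map to no process. */
struct sketch_process *sketch_process_from_usid (USID usid)
{
  if (usid == USID_ERROR || usid == USID_DONTHASH
      || usid >= (USID) SKETCH_MAX_FDS)
    return NULL;
  return sketch_process_by_fd[usid];
}
```

When input arrives on descriptor @code{fd}, an event loop built this way can find the owning process with @code{sketch_process_from_usid ((USID) fd)}. An event stream that never performs such a lookup would return @code{USID_DONTHASH} instead and keep no table at all.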
+ +@node Converting Events, Dispatching Events; The Command Builder, Stream Pairs, Events and the Event Loop +@section Converting Events +@cindex converting events +@cindex events, converting + + @code{character_to_event()}, @code{event_to_character()}, +@code{event-to-character}, and @code{character-to-event} convert between +characters and keypress events corresponding to the characters. If the +event was not a keypress, @code{event_to_character()} returns -1 and +@code{event-to-character} returns @code{nil}. These functions convert +between character representation and the split-up event representation +(keysym plus mod keys). + +@node Dispatching Events; The Command Builder, Focus Handling, Converting Events, Events and the Event Loop +@section Dispatching Events; The Command Builder +@cindex dispatching events; the command builder +@cindex events; the command builder, dispatching +@cindex command builder, dispatching events; the + +Not yet documented. + +@node Focus Handling, Editor-Level Control Flow Modules, Dispatching Events; The Command Builder, Events and the Event Loop +@section Focus Handling +@cindex focus handling + +Ben's capsule lecture on focus: + +In GNU Emacs @code{select-frame} never changes the window-manager frame +focus. All it does is change the "selected frame". This is similar to +what happens when we call @code{select-device} or @code{select-console}. +Whenever an event comes in (including a keyboard event), its frame is +selected; therefore, evaluating @code{select-frame} in @samp{*scratch*} +won't cause any effects because the next received event (in the same +frame) will cause a switch back to the frame displaying +@samp{*scratch*}. + +Whenever a focus-change event is received from the window manager, it +generates a @code{switch-frame} event, which causes the Lisp function +@code{handle-switch-frame} to get run. This basically just runs +@code{select-frame} (see below, however). 
+ +In GNU Emacs, if you want to have an operation run when a frame is +selected, you supply an event binding for @code{switch-frame} (and then +maybe call @code{handle-switch-frame}, or something ...). + +In XEmacs, we @strong{do} change the window-manager frame focus as a +result of @code{select-frame}, but not until the next time an event is +received, so that a function that momentarily changes the selected frame +won't cause WM focus flashing. (#### There's something not quite right +here; this is causing the wrong-cursor-focus problems that you +occasionally see. But the general idea is correct.) This approach is +winning for people who use the explicit-focus model, but is trickier to +implement. + +We also don't make the @code{switch-frame} event visible but instead have +@code{select-frame-hook}, which is a better approach. + +There is the problem of surrogate minibuffers, where when we enter the +minibuffer, you essentially want to temporarily switch the WM focus to +the frame with the minibuffer, and switch it back when you exit the +minibuffer. + +GNU Emacs solves this with the crockish @code{redirect-frame-focus}, +which says "for keyboard events received from FRAME, act like they're +coming from FOCUS-FRAME". I think what this means is that, when a +keyboard event comes in and the event manager is about to select the +event's frame, if that frame has its focus redirected, the redirected-to +frame is selected instead. That way, if you're in a minibufferless +frame and enter the minibuffer, then all Lisp functions that run see the +selected frame as the minibuffer's frame rather than the minibufferless +frame you came from, so that (e.g.) your typing actually appears in the +minibuffer's frame and things behave sanely. 
+ +There's also some weird logic that switches the redirected frame focus +from one frame to another if Lisp code explicitly calls +@code{select-frame} (but not if @code{handle-switch-frame} is called), +and saves and restores the frame focus in window configurations, +etc. etc. All of this logic is heavily @code{#if 0}'d, with lots of +comments saying "No, this approach doesn't seem to work, so I'm trying +this ... is it reasonable? Well, I'm not sure ..." that are a red flag +indicating crockishness. + +Because of our way of doing things, we can avoid all this crock. +Keyboard events never cause a select-frame (who cares what frame they're +associated with? They come from a console, only). We change the actual +WM focus to a surrogate minibuffer frame, so we don't have to do any +internal redirection. In order to get the focus back, I took the +approach in @file{minibuf.el} of just checking to see if the frame we moved to +is still the selected frame, and move back to the old one if so. +Conceivably we might have to do the weird "tracking" that GNU Emacs does +when @code{select-frame} is called, but I don't think so. If the +selected frame moved from the minibuffer frame, then we just leave it +there, figuring that someone knows what they're doing. Because we don't +have any redirection recorded anywhere, it's safe to do this, and we +don't end up with unwanted redirection. + +@node Editor-Level Control Flow Modules, , Focus Handling, Events and the Event Loop +@section Editor-Level Control Flow Modules +@cindex control flow modules, editor-level +@cindex modules, editor-level control flow + +@example +@file{event-Xt.c} +@file{event-msw.c} +@file{event-stream.c} +@file{event-tty.c} +@file{events-mod.h} +@file{gpmevent.c} +@file{gpmevent.h} +@file{events.c} +@file{events.h} +@end example + +These implement the handling of events (user input and other system +notifications). 
+ +@file{events.c} and @file{events.h} define the @dfn{event} Lisp object +type and primitives for manipulating it. + +@file{event-stream.c} implements the basic functions for working with +event queues, dispatching an event by looking it up in relevant keymaps +and such, and handling timeouts; this includes the primitives +@code{next-event} and @code{dispatch-event}, as well as related +primitives such as @code{sit-for}, @code{sleep-for}, and +@code{accept-process-output}. (@file{event-stream.c} is one of the +hairiest and trickiest modules in XEmacs. Beware! You can easily mess +things up here.) + +@file{event-Xt.c} and @file{event-tty.c} implement the low-level +interfaces onto retrieving events from Xt (the X toolkit) and from TTY's +(using @code{read()} and @code{select()}), respectively. The event +interface enforces a clean separation between the specific code for +interfacing with the operating system and the generic code for working +with events, by defining an API of basic, low-level event methods; +@file{event-Xt.c} and @file{event-tty.c} are two different +implementations of this API. To add support for a new operating system +(e.g. NeXTstep), one merely needs to provide another implementation of +those API functions. + +Note that the choice of whether to use @file{event-Xt.c} or +@file{event-tty.c} is made at compile time! Or at the very latest, it +is made at startup time. @file{event-Xt.c} handles events for +@emph{both} X and TTY frames; @file{event-tty.c} is only used when X +support is not compiled into XEmacs. The reason for this is that there +is only one event loop in XEmacs: thus, it needs to be able to receive +events from all different kinds of frames. + + + +@example +@file{keymap.c} +@file{keymap.h} +@end example + +@file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object +type and associated methods and primitives. 
(Remember that keymaps are +objects that associate event descriptions with functions to be called to +``execute'' those events; @code{dispatch-event} looks up events in the +relevant keymaps.) + + + +@example +@file{cmdloop.c} +@end example + +@file{cmdloop.c} contains functions that implement the actual editor +command loop---i.e. the event loop that cyclically retrieves and +dispatches events. This code is also rather tricky, just like +@file{event-stream.c}. + + + +@example +@file{macros.c} +@file{macros.h} +@end example + +These two modules contain the basic code for defining keyboard macros. +These functions don't actually do much; most of the code that handles keyboard +macros is mixed in with the event-handling code in @file{event-stream.c}. + + + +@example +@file{minibuf.c} +@end example + +This contains some miscellaneous code related to the minibuffer (most of +the minibuffer code was moved into Lisp by Richard Mlynarik). This +includes the primitives for completion (although filename completion is +in @file{dired.c}), the lowest-level interface to the minibuffer (if the +command loop were cleaned up, this too could be in Lisp), and code for +dealing with the echo area (this, too, was mostly moved into Lisp, and +the only code remaining is code to call out to Lisp or provide simple +bootstrapping implementations early in temacs, before the echo-area Lisp +code is loaded). 
+ + +@node Asynchronous Events; Quit Checking, Lstreams, Events and the Event Loop, Top +@chapter Asynchronous Events; Quit Checking +@cindex asynchronous events; quit checking +@cindex asynchronous events + +@menu +* Signal Handling:: +* Control-G (Quit) Checking:: +* Profiling:: +* Asynchronous Timeouts:: +* Exiting:: +@end menu + +@node Signal Handling, Control-G (Quit) Checking, Asynchronous Events; Quit Checking, Asynchronous Events; Quit Checking +@section Signal Handling +@cindex signal handling + +@node Control-G (Quit) Checking, Profiling, Signal Handling, Asynchronous Events; Quit Checking +@section Control-G (Quit) Checking +@cindex Control-g checking +@cindex C-g checking +@cindex quit checking +@cindex QUIT checking +@cindex critical quit + +@emph{Note}: The code to handle QUIT is divided between @file{lisp.h} +and @file{signal.c}. There is also some special-case code in the async +timer code in @file{event-stream.c} to notice when the poll-for-quit +(and poll-for-sigchld) timers have gone off. + +Here's an overview of how this convoluted stuff works: + +@enumerate +@item + +Scattered throughout the XEmacs core code are calls to the macro QUIT. +This macro checks to see whether a @kbd{C-g} has recently been pressed +and not yet handled, and if so, it handles the @kbd{C-g} by calling +@code{signal_quit()}, which invokes the standard @code{Fsignal()} code, +with the error being @code{Qquit}. Lisp code can establish handlers +for this (using @code{condition-case}), but normally there is no +handler, and so execution is thrown back to the innermost enclosing +event loop. (One of the things that happens when entering an event loop +is that a @code{condition-case} is established that catches @strong{all} calls +to @code{signal}, including this one.) + +@item +How does the QUIT macro check to see whether @kbd{C-g} has been pressed? +Obviously this needs to be extremely fast. Now for some history.
+In early Lemacs as inherited from the FSF going back 15 years or +more, there was a great fondness for using SIGIO (which is sent +whenever there is I/O available on a given socket, tty, etc.). +In fact, in GNU Emacs, perhaps even today, all reading of events +from the X server occurs inside the SIGIO handler! This is crazy, +but not completely relevant. What is relevant is that similar +stuff happened inside the SIGIO handler for @kbd{C-g}: it searched +through all the pending (i.e. not yet delivered to XEmacs) +X events for one that matched @kbd{C-g}. When it saw a match, it set +@code{Vquit_flag} to Qt. On TTY's, @kbd{C-g} is actually mapped to be the +interrupt character (i.e. it generates SIGINT), and XEmacs's +handler for this signal sets @code{Vquit_flag} to Qt. Then, sometime +later after the signal handlers finished and a QUIT macro was +called, the macro noticed the setting of @code{Vquit_flag} and used +this as an indication to call @code{signal_quit()}. What @code{signal_quit()} +actually does is set @code{Vquit_flag} to Qnil (so that we won't get +repeated interruptions from a single @kbd{C-g} press) and then call +the equivalent of (signal 'quit nil). + +@item +Another complication is introduced in that @code{Vquit_flag} is actually +exported to Lisp as @code{quit-flag}. This allows users some level of +control over whether and when @kbd{C-g} is processed as quit, esp. in +combination with @code{inhibit-quit}. This is another Lisp variable, +and if set to non-nil, it inhibits @code{signal_quit()} from getting +called, meaning that the @kbd{C-g} gets essentially ignored. But not +completely: Because the resetting of @code{quit-flag} happens only +in @code{signal_quit()}, which isn't getting called, the @kbd{C-g} press is +still noticed, and as soon as @code{inhibit-quit} is set back to nil, +a quit will be signalled at the next QUIT macro. Thus, what +@code{inhibit-quit} really does is defer quits until after the +quit-inhibited period.
+ +@item +Another consideration, introduced by XEmacs, is critical quitting. If +you press @kbd{Control-Shift-G} instead of just @kbd{C-g}, +@code{quit-flag} is set to @code{critical} instead of to t. When QUIT +processes this value, it @strong{ignores} the value of +@code{inhibit-quit}. This allows you to quit even out of a +quit-inhibited section of code! Furthermore, when @code{signal_quit()} +notices that it was invoked as a result of a critical quit, it +automatically invokes the debugger (which otherwise would only happen +when @code{debug-on-quit} is set to t). + +@item +Well, I explained above about how @code{quit-flag} gets set correctly, +but I began with a disclaimer stating that this was the old way +of doing things. What's done now? Well, first of all, the SIGIO +handler (which formerly checked all pending events to see if there's +a @kbd{C-g}) now does nothing but set a flag -- or actually two flags, +@code{something_happened} and @code{quit_check_signal_happened}. There are two +flags because the QUIT macro is now used for more than just handling +QUIT; it's also used for running asynchronous timeout handlers that +have recently expired, and perhaps other things. The idea here is +that the QUIT macros occur extremely often in the code, but only occur +at places that are relatively safe -- in particular, if an error occurs, +nothing will get completely trashed. + +@item +Now, let's look at QUIT again. + +@item + +UNFINISHED. Note, however, that as of the point when this comment got +committed to CVS (mid-2001), the interaction between reading @kbd{C-g} +as an event and processing it as QUIT was overhauled to (for the first +time) be understandable and actually work correctly. Now, the way +things work is that if @kbd{C-g} is pressed while XEmacs is blocking at +the top level, waiting for a user event, it will be read as an event; +otherwise, it will cause QUIT. (This includes times when XEmacs is +blocking, but not waiting for a user event, +e.g.
@code{accept-process-output} and +@code{wait_delaying_user_events()}.) Formerly, this was supposed to +happen, but didn't always due to a bizarre and broken scheme, documented +in @code{next_event_internal} like this: + +@quotation +If we read a @kbd{C-g}, then set @code{quit-flag} but do not discard the +@kbd{C-g}. The callers of @code{next_event_internal()} will do one of +two things: + +@enumerate +@item +set @code{Vquit_flag} to Qnil. (@code{next-event} does this.) This will +cause the ^G to be treated as a normal keystroke. + +@item +not change @code{Vquit_flag} but attempt to enqueue the ^G, at which +point it will be discarded. The next time QUIT is called, it will +notice that @code{Vquit_flag} was set. +@end enumerate +@end quotation + +This required weirdness in @code{enqueue_command_event_1} like this: + +@quotation +put the event on the typeahead queue, unless the event is the quit char, +in which case the @code{QUIT} which will occur on the next trip through this +loop is all the processing we should do - leaving it on the queue would +cause the quit to be processed twice. +@end quotation + +And further weirdness elsewhere, none of which made any sense, and +didn't work, because (e.g.) it required that QUIT never happen anywhere +inside @code{next_event_internal()} or any callers when @kbd{C-g} should +be read as a user event, which was impossible to implement in practice. + +Now what we do is fairly simple. Callers of +@code{next_event_internal()} that want @kbd{C-g} read as a user event +call @code{begin_dont_check_for_quit()}. @code{next_event_internal()}, +when it gets a @kbd{C-g}, simply sets @code{Vquit_flag} (just as when a +@kbd{C-g} is detected during the operation of @code{QUIT} or +@code{QUITP}), and then tries to @code{QUIT}. This will fail if blocked +by the previous call, at which point @code{next_event_internal()} will +return the @kbd{C-g} as an event. 
To unblock things, first set +@code{Vquit_flag} to nil (it was set to t when the @kbd{C-g} was read, +and if we don't reset it, the next call to @code{QUIT} will quit), and +then @code{unbind_to()} the depth returned by +@code{begin_dont_check_for_quit()}. It makes no difference if +@code{QUIT} is called a zillion times in @code{next_event_internal()} or +anywhere else, because it's blocked and will never signal. +@end enumerate + +@node Profiling, Asynchronous Timeouts, Control-G (Quit) Checking, Asynchronous Events; Quit Checking +@section Profiling +@cindex profiling +@cindex SIGPROF + +We implement our own profiling scheme so that we can determine +things like which Lisp functions are occupying the most time. Any +standard OS-provided profiling works on C functions, which is +not always that useful -- and inconvenient, since it requires compiling +with profile info and can't be retrieved dynamically, as XEmacs is +running. + +The basic idea is simple. We set a profiling timer using setitimer +(ITIMER_PROF), which generates a SIGPROF every so often. (This runs not +in real time but rather when the process is executing or the system is +running on behalf of the process -- at least, that is the case under +Unix. Under MS Windows and Cygwin, there is no @code{setitimer()}, so we +simulate it using multimedia timers, which run in real time. To make +the results a bit more realistic, we ignore ticks that go off while +blocking on an event wait. Note that Cygwin does provide a simulation +of @code{setitimer()}, but it's in real time anyway, since Windows doesn't +provide a way to have process-time timers, and furthermore, it's broken, +so we don't use it.) When the signal goes off, we see what function we're in, and +add 1 to the count associated with that function. + +It would be nice to use the Lisp allocation mechanism etc. to keep track +of the profiling information (i.e.
to use Lisp hash tables), but we +can't because that's not safe -- updating the timing information happens +inside of a signal handler, so we can't rely on not being in the middle +of Lisp allocation, garbage collection, @code{malloc()}, etc. Trying to make +it work would be much more work than it's worth. Instead we use a basic +(non-Lisp) hash table, which will not conflict with garbage collection +or anything else as long as it doesn't try to resize itself. Resizing +itself, however (which happens as a result of a @code{puthash()}), could be +deadly. To avoid this, we make sure, at points where it's safe +(e.g. @code{profile_record_about_to_call()} -- recording the entry into a +function call), that the table always has some breathing room in it so +that no resizes will occur until at least that many items are added. +This is safe because any new item to be added in the sigprof would +likely have the @code{profile_record_about_to_call()} called just before it, +and the breathing room is checked. + +In general: any entry that the sigprof handler puts into the table comes +from a backtrace frame (except "Processing Events at Top Level", and +there's only one of those). Either that backtrace frame was added when +profiling was on (in which case @code{profile_record_about_to_call()} was +called and the breathing space updated), or when it was off -- and in +this case, no such frames can have been added since the last time +@code{start-profile} was called, so when @code{start-profile} is called we make +sure there is sufficient breathing room to account for all entries +currently on the stack. + +Jan 1998: In addition to timing info, I have added code to remember call +counts of Lisp funcalls. The @code{profile_increase_call_count()} +function is called from @code{Ffuncall()}, and serves to add data to +Vcall_count_profile_table. This mechanism is much simpler and +independent of the SIGPROF-driven one. 
It uses the Lisp allocation +mechanism normally, since it is not called from a handler. It may +even be useful to provide a way to turn on only one profiling +mechanism, but I haven't done so yet. --hniksic + +Dec 2002: Total overhaul of the interface, making it sane and easier to +use. --ben + +Feb 2003: Lots of rewriting of the internal code. Add GC-consing-usage, +total GC usage, and total timing to the information tracked. Track +profiling overhead and allow the ability to have internal sections +(e.g. internal-external conversion, byte-char conversion) that are +treated like Lisp functions for the purpose of profiling. --ben + +BEWARE: If you are modifying this file, be @strong{very} careful. Correctly +implementing the "total" values is very tricky due to the possibility of +recursion and of functions already on the stack when starting to +profile/still on the stack when stopping. + +@node Asynchronous Timeouts, Exiting, Profiling, Asynchronous Events; Quit Checking +@section Asynchronous Timeouts +@cindex asynchronous timeouts + +@node Exiting, , Asynchronous Timeouts, Asynchronous Events; Quit Checking +@section Exiting +@cindex exiting +@cindex crash +@cindex hang +@cindex core dump +@cindex Armageddon +@cindex exits, expected and unexpected +@cindex unexpected exits +@cindex expected exits + +Ben's capsule summary about expected and unexpected exits from XEmacs. + +Expected exits occur when the user directs XEmacs to exit, for example +by pressing the close button on the only frame in XEmacs, or by typing +@kbd{C-x C-c}. This runs @code{save-buffers-kill-emacs}, which saves +any necessary buffers, and then exits using the primitive +@code{kill-emacs}. + +However, unexpected exits occur in a few different ways: + +@itemize @bullet +@item +A memory access violation or other hardware-generated exception occurs. +This is the worst possible problem to deal with, because the fault can +occur while XEmacs is in any state whatsoever, even quite unstable ones. 
+As a result, we need to be @strong{extremely} careful what we do. + +@item +We are using one X display (or if we've used more, we've closed the +others already), and some hardware or other problem happens and +suddenly we've lost our connection to the display. In this situation, +things are not so dire as in the last one; our code itself isn't +trashed, so we can continue execution as normal, after having set +things up so that we can exit at the appropriate time. Our exit +still needs to be of the emergency nature; we have no displays, so +any attempts to use them will fail. We simply want to auto-save +(the single most important thing to do during shut-down), do minimal +cleanup of stuff that has an independent existence outside of XEmacs, +and exit. +@end itemize + +Currently, both unexpected exit scenarios described above set +@code{preparing_for_armageddon} to indicate that nonessential and possibly +dangerous things should not be done, specifically: + +@itemize @minus +@item +no garbage collection. +@item +no hooks are run. +@item +no messages of any sort from autosaving. +@item +autosaving tries harder, ignoring certain failures. +@item +existing frames are not deleted. +@end itemize + +(Also, all places that set @code{preparing_for_armageddon} also +set @code{dont_check_for_quit}. This happens separately because it's +also necessary to set other variables to make absolutely sure +no quitting happens.) + +In the first scenario above (the access violation), we also set +@code{fatal_error_in_progress}. This causes more things to not happen: + +@itemize @minus +@item +assertion failures do not abort. +@item +printing code does not do code conversion or gettext when +printing to stdout/stderr. +@end itemize + +@node Lstreams, Subprocesses, Asynchronous Events; Quit Checking, Top +@chapter Lstreams +@cindex lstreams + + An @dfn{lstream} is an internal Lisp object that provides a generic +buffering stream implementation. 
Conceptually, you send data to the +stream or read data from the stream, not caring what's on the other end +of the stream. The other end could be another stream, a file +descriptor, a stdio stream, a fixed block of memory, a reallocating +block of memory, etc. The main purpose of the stream is to provide a +standard interface and to do buffering. Macros are defined to read or +write characters, so the calling functions do not have to worry about +blocking data together in order to achieve efficiency. + +@menu +* Creating an Lstream:: Creating an lstream object. +* Lstream Types:: Different sorts of things that are streamed. +* Lstream Functions:: Functions for working with lstreams. +* Lstream Methods:: Creating new lstream types. +@end menu + +@node Creating an Lstream, Lstream Types, Lstreams, Lstreams +@section Creating an Lstream +@cindex lstream, creating an + +Lstreams come in different types, depending on what is being interfaced +to. Although the primitive for creating new lstreams is +@code{Lstream_new()}, generally you do not call this directly. Instead, +you call some type-specific creation function, which creates the lstream +and initializes it as appropriate for the particular type. + +All lstream creation functions take a @var{mode} argument, specifying +what mode the lstream should be opened in. This controls whether the +lstream is for input or output, and optionally whether data should be +blocked up in units of MULE characters. Note that some types of +lstreams can only be opened for input; others only for output; and +others can be opened either way. #### Richard Mlynarik thinks that +there should be a strict separation between input and output streams, +and he's probably right. + + @var{mode} is a string, one of + +@table @code +@item "r" + Open for reading. +@item "w" + Open for writing. +@item "rc" + Open for reading, but ``read'' never returns partial MULE characters. +@item "wc" + Open for writing, but never writes partial MULE characters.
+@end table + +@node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams +@section Lstream Types +@cindex lstream types +@cindex types, lstream + +@table @asis +@item stdio + +@item filedesc + +@item lisp-string + +@item fixed-buffer + +@item resizing-buffer + +@item dynarr + +@item lisp-buffer + +@item print + +@item decoding + +@item encoding +@end table + +@node Lstream Functions, Lstream Methods, Lstream Types, Lstreams +@section Lstream Functions +@cindex lstream functions + +@deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) +Allocate and return a new Lstream. This function is not really meant to +be called directly; rather, each stream type should provide its own +stream creation function, which creates the stream and does any other +necessary creation stuff (e.g. opening a file). +@end deftypefun + +@deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size}) +Change the buffering of a stream. See @file{lstream.h}. By default the +buffering is @code{STREAM_BLOCK_BUFFERED}. +@end deftypefun + +@deftypefun int Lstream_flush (Lstream *@var{lstr}) +Flush out any pending unwritten data in the stream. Clear any buffered +input data. Returns 0 on success, -1 on error. +@end deftypefun + +@deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c}) +Write out one byte to the stream. This is a macro and so it is very +efficient. The @var{c} argument is only evaluated once but the @var{stream} +argument is evaluated more than once. Returns 0 on success, -1 on +error. +@end deftypefn + +@deftypefn Macro int Lstream_getc (Lstream *@var{stream}) +Read one byte from the stream. This is a macro and so it is very +efficient. The @var{stream} argument is evaluated more than once. Return +value is -1 for EOF or error. 
+@end deftypefn + +@deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c}) +Push one byte back onto the input queue. This will be the next byte +read from the stream. Any number of bytes can be pushed back and will +be read in the reverse order they were pushed back---most recent +first. (This is necessary for consistency---if there are a number of +bytes that have been unread and I read and unread a byte, it needs to be +the first to be read again.) This is a macro and so it is very +efficient. The @var{c} argument is only evaluated once but the @var{stream} +argument is evaluated more than once. +@end deftypefn + +@deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c}) +@deftypefunx int Lstream_fgetc (Lstream *@var{stream}) +@deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c}) +Function equivalents of the above macros. +@end deftypefun + +@deftypefun Bytecount Lstream_read (Lstream *@var{stream}, void *@var{data}, Bytecount @var{size}) +Read @var{size} bytes of @var{data} from the stream. Return the number +of bytes read. 0 means EOF. -1 means an error occurred and no bytes +were read. +@end deftypefun + +@deftypefun Bytecount Lstream_write (Lstream *@var{stream}, void *@var{data}, Bytecount @var{size}) +Write @var{size} bytes of @var{data} to the stream. Return the number +of bytes written. -1 means an error occurred and no bytes were written. +@end deftypefun + +@deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, Bytecount @var{size}) +Push back @var{size} bytes of @var{data} onto the input queue. The next +call to @code{Lstream_read()} with the same size will read the same +bytes back. Note that this will be the case even if there is other +pending unread data. +@end deftypefun + +@deftypefun int Lstream_close (Lstream *@var{stream}) +Close the stream. All data will be flushed out. +@end deftypefun + +@deftypefun void Lstream_reopen (Lstream *@var{stream}) +Reopen a closed stream. 
This enables I/O on it again. This is not +meant to be called except from a wrapper routine that reinitializes +variables and such---the close routine may well have freed some +necessary storage structures, for example. +@end deftypefun + +@deftypefun void Lstream_rewind (Lstream *@var{stream}) +Rewind the stream to the beginning. +@end deftypefun + +@node Lstream Methods, , Lstream Functions, Lstreams +@section Lstream Methods +@cindex lstream methods + +@deftypefn {Lstream Method} Bytecount reader (Lstream *@var{stream}, unsigned char *@var{data}, Bytecount @var{size}) +Read some data from the stream's end and store it into @var{data}, which +can hold @var{size} bytes. Return the number of bytes read. A return +value of 0 means no bytes can be read at this time. This may be because +of an EOF, or because there is a granularity greater than one byte that +the stream imposes on the returned data, and @var{size} is less than +this granularity. (This will happen frequently for streams that need to +return whole characters, because @code{Lstream_read()} calls the reader +function repeatedly until it has the number of bytes it wants or until 0 +is returned.) The lstream functions do not treat a 0 return as EOF or +do anything special; however, the calling function will interpret any 0 +it gets back as EOF. This will normally not happen unless the caller +calls @code{Lstream_read()} with a very small size. + +This function can be @code{NULL} if the stream is output-only. +@end deftypefn + +@deftypefn {Lstream Method} Bytecount writer (Lstream *@var{stream}, const unsigned char *@var{data}, Bytecount @var{size}) +Send some data to the stream's end. Data to be sent is in @var{data} +and is @var{size} bytes. Return the number of bytes sent. This +function can send and return fewer bytes than is passed in; in that +case, the function will just be called again until there is no data left +or 0 is returned. 
A return value of 0 means that no more data can be +currently stored, but there is no error; the data will be squirreled +away until the writer can accept data. (This is useful, e.g., if you're +dealing with a non-blocking file descriptor and are getting +@code{EWOULDBLOCK} errors.) This function can be @code{NULL} if the +stream is input-only. +@end deftypefn + +@deftypefn {Lstream Method} int rewinder (Lstream *@var{stream}) +Rewind the stream. If this is @code{NULL}, the stream is not seekable. +@end deftypefn + +@deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream}) +Indicate whether this stream is seekable---i.e. it can be rewound. +This method is ignored if the stream does not have a rewind method. If +this method is not present, the result is determined by whether a rewind +method is present. +@end deftypefn + +@deftypefn {Lstream Method} int flusher (Lstream *@var{stream}) +Perform any additional operations necessary to flush the data in this +stream. +@end deftypefn + +@deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream}) +@end deftypefn + +@deftypefn {Lstream Method} int closer (Lstream *@var{stream}) +Perform any additional operations necessary to close this stream down. +May be @code{NULL}. This function is called when @code{Lstream_close()} +is called or when the stream is garbage-collected. When this function +is called, all pending data in the stream will already have been written +out. +@end deftypefn + +@deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object)) +Mark this object for garbage collection. Same semantics as a standard +@code{Lisp_Object} marker. This function can be @code{NULL}. 
+@end deftypefn + +@node Subprocesses, Interface to MS Windows, Lstreams, Top @chapter Subprocesses @cindex subprocesses @@ -14850,7 +15683,7 @@ Auto-generated Unicode encapsulation headers @end table -@node Interface to the X Window System, Future Work, Interface to MS Windows, Top +@node Interface to the X Window System, Dumping, Interface to MS Windows, Top @chapter Interface to the X Window System @cindex X Window System, interface to the @@ -15146,11 +15979,423 @@ Don't touch this code; something is liable to break if you do. -@node Future Work, Future Work Discussion, Interface to the X Window System, Top +@node Dumping, Future Work, Interface to the X Window System, Top +@chapter Dumping +@cindex dumping + +@menu +* Dumping Justification:: +* Overview:: +* Data descriptions:: +* Dumping phase:: +* Reloading phase:: +* Remaining issues:: +@end menu + +@node Dumping Justification, Overview, Dumping, Dumping +@section Dumping Justification +@cindex dumping, justification + +The C code of XEmacs is just a Lisp engine with a lot of built-in +primitives useful for writing an editor. The editor itself is written +mostly in Lisp, and represents around 100K lines of code. Loading and +executing the initialization of all this code takes a bit of time (five +to ten times the usual startup time of current xemacs) and requires +having all the lisp source files around. Having to reload them each +time the editor is started would not be acceptable. + +The traditional solution to this problem is called dumping: the build +process first creates the lisp engine under the name @file{temacs}, then +runs it until it has finished loading and initializing all the lisp +code, and finally creates a new executable called @file{xemacs} +including both the object code in @file{temacs} and all the contents of +the memory after the initialization.
+ +This solution, while working, has a huge problem: the creation of the +new executable from the actual contents of memory is an extremely +system-specific process, quite error-prone, and which interferes with a +lot of system libraries (like malloc). It is even getting worse +nowadays with libraries using constructors which are automatically +called when the program is started (even before @code{main()}) which tend to +crash when they are called multiple times, once before dumping and once +after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries thru +dependencies which have this problem). Writing the dumper is also one +of the most difficult parts of porting XEmacs to a new operating system. +Basically, `dumping' is an operation that is just not officially +supported on many operating systems. + +The aim of the portable dumper is to solve the same problem as the +system-specific dumper, that is to be able to reload quickly, using only +a small number of files, the fully initialized lisp part of the editor, +without any system-specific hacks. + +@node Overview, Data descriptions, Dumping Justification, Dumping +@section Overview +@cindex dumping overview + +The portable dumping system has to: + +@enumerate +@item +At dump time, write all initialized, non-quickly-rebuildable data to a +file [Note: currently named @file{xemacs.dmp}, but the name will +change], along with all information needed for the reloading. + +@item +When starting xemacs, reload the dump file, relocate it to its new +starting address if needed, and reinitialize all pointers to this +data. Also, rebuild all the quickly rebuildable data. +@end enumerate + +Note: As of 21.5.18, the dump file has been moved inside of the +executable, although there are still problems with this on some systems. 
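As an illustration of the two steps listed in this overview, here is a minimal C sketch that dumps a small linked structure into a flat image and fixes up its pointers after the image has been reloaded at a different address. It is not the XEmacs implementation: @code{struct node}, @code{toy_dump}, @code{toy_reload} and @code{DATA_START} are hypothetical names, and the sketch assumes that pointers are stored in the image as byte offsets from its start, with offset 0 standing for a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: data starts at a nonzero offset so that 0 can mean NULL. */
enum { DATA_START = 256 };

/* Hypothetical stand-in for a dumped object containing one pointer. */
struct node {
  int value;
  struct node *next;
};

/* Dump step: copy the list into a flat image, rewriting each `next'
   pointer as the byte offset of its target inside the image. */
unsigned char *toy_dump(const struct node *head, size_t *size_out)
{
  size_t count = 0, i = 0;
  const struct node *p;
  unsigned char *image;

  for (p = head; p != NULL; p = p->next)
    count++;

  *size_out = (size_t)DATA_START + count * sizeof(struct node);
  image = calloc(1, *size_out);
  if (image == NULL)
    return NULL;

  for (p = head; p != NULL; p = p->next, i++) {
    struct node copy = *p;
    if (p->next != NULL)   /* the target is stored in the next slot */
      copy.next = (struct node *)(uintptr_t)
        ((size_t)DATA_START + (i + 1) * sizeof(struct node));
    memcpy(image + DATA_START + i * sizeof(struct node), &copy, sizeof copy);
  }
  return image;
}

/* Reload step: the image now sits at `image'; turn every stored offset
   back into a real pointer by adding the load address. */
struct node *toy_reload(unsigned char *image, size_t size)
{
  size_t count, i;
  struct node *nodes;

  assert(size >= (size_t)DATA_START);
  count = (size - (size_t)DATA_START) / sizeof(struct node);
  if (count == 0)
    return NULL;

  nodes = (struct node *)(image + DATA_START);
  for (i = 0; i < count; i++)
    if (nodes[i].next != NULL)
      nodes[i].next = (struct node *)(image + (uintptr_t)nodes[i].next);
  return nodes;
}
```

Copying the image into a second buffer before calling @code{toy_reload} simulates loading the dump at an address different from the one it was created at. The sketch knows where its one pointer lives; the real dumper has to find every pointer by means of the data descriptions.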
+ +@node Data descriptions, Dumping phase, Overview, Dumping +@section Data descriptions +@cindex dumping data descriptions + +The most complex task of the dumper is to be able to write memory blocks +on the heap (lisp objects, i.e. lrecords, and C-allocated memory, such +as structs and arrays) to disk and reload them at a different address, +updating all the pointers they include in the process. This is done by +using external data descriptions that give information about the layout +of the blocks in memory. + +The specification of these descriptions is in @file{lrecord.h}. A description +of an lrecord is an array of struct memory_description. Each of these +structs includes a type, an offset in the block and some optional +parameters depending on the type. For instance, here is the string +description: + +@example +static const struct memory_description string_description[] = @{ + @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @}, + @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @}, + @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @}, + @{ XD_END @} +@}; +@end example + +The first line indicates a member of type Bytecount, which is used by +the next, indirect directive. The second means "there is a pointer to +some opaque data in the field @code{data}". The length of said data is +given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value +in the 0th line of the description (welcome to C) plus one". The third +line means "there is a Lisp_Object member @code{plist} in the Lisp_String +structure". @code{XD_END} then ends the description. + +This gives us all the information we need to move around what is pointed +to by a memory block (C or lrecord) and, by transitivity, everything +that it points to. The only missing information for dumping is the size +of the block. For lrecords, this is part of the +lrecord_implementation, so we don't need to duplicate it.
For C blocks +we use a struct sized_memory_description, which includes a size field +and a pointer to an associated array of memory_description. + +@node Dumping phase, Reloading phase, Data descriptions, Dumping +@section Dumping phase +@cindex dumping phase + +Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is +invoked from @code{Fdump_emacs} (in @file{emacs.c}). This function performs a number +of tasks. + +@menu +* Object inventory:: +* Address allocation:: +* The header:: +* Data dumping:: +* Pointers dumping:: +@end menu + +@node Object inventory, Address allocation, Dumping phase, Dumping phase +@subsection Object inventory +@cindex dumping object inventory +@cindex memory blocks + +The first task is to build the list of the objects to dump. This +includes: + +@itemize @bullet +@item lisp objects +@item other memory blocks (C structures, arrays, etc.) +@end itemize + +We end up with one @code{pdump_block_list_elt} per object group (arrays +of C structs are kept together) which includes a pointer to the first +object of the group, the per-object size and the count of objects in the +group, along with some other information which is initialized later. + +These entries are linked together in @code{pdump_block_list} structures +and can be enumerated thru either: + +@enumerate +@item +the @code{pdump_object_table}, an array of @code{pdump_block_list}, one +per lrecord type, indexed by type number. + +@item +the @code{pdump_opaque_data_list}, used for the opaque data which does +not include pointers, and hence does not need descriptions. + +@item +the @code{pdump_desc_table}, which is a vector of +@code{memory_description}/@code{pdump_block_list} pairs, used for +non-opaque C memory blocks. +@end enumerate + +This uses a marking strategy similar to the garbage collector. Some +differences though: + +@enumerate +@item +We do not use the mark bit (which does not exist for generic memory blocks +anyway); we use a big hash table instead.
+ +@item +We do not use the mark function of lrecords but instead rely on the +external descriptions. This happens essentially because we need to +follow pointers to generic memory blocks and opaque data in addition to +Lisp_Object members. +@end enumerate + +This is done by @code{pdump_register_object()}, which handles +Lisp_Object variables, and @code{pdump_register_block()} which handles +generic memory blocks (C structures, arrays, etc.), which both delegate +the description management to @code{pdump_register_sub()}. + +The hash table doubles as a map from object to @code{pdump_block_list_elt} (i.e. +it allows us to look up a @code{pdump_block_list_elt} given the object it points +to). Entries are added with @code{pdump_add_block()} and looked up with +@code{pdump_get_block()}. There is no need for entry removal. The hash +value is computed quite simply from the object pointer by +@code{pdump_make_hash()}. + +The roots for the marking are: + +@enumerate +@item +the @code{staticpro}'ed variables (there is a special +@code{staticpro_nodump()} call for protected variables we do not want to +dump). + +@item +the Lisp_Object variables registered via @code{dump_add_root_lisp_object} +(@code{staticpro()} is equivalent to @code{staticpro_nodump()} + +@code{dump_add_root_lisp_object()}). + +@item +the data-segment memory blocks registered via @code{dump_add_root_block} +(for blocks with relocatable pointers), or @code{dump_add_opaque} (for +"opaque" blocks with no relocatable pointers; this is just a shortcut +for calling @code{dump_add_root_block} with a NULL description). + +@item +the pointer variables registered via @code{dump_add_root_block_ptr}, +each of which points to a block of heap memory (generally a C structure +or array). Note that @code{dump_add_root_block_ptr} is not technically +necessary, as a pointer variable can be seen as a special case of a +data-segment memory block and registered using +@code{dump_add_root_block}.
Doing it this way, however, would require +declaring another level of static structures. Since pointer variables +are quite common, @code{dump_add_root_block_ptr} is provided for +convenience. Note also that internally we have to treat it separately +from @code{dump_add_root_block} rather than writing the former as a call +to the latter, since we don't have support for creating and using memory +descriptions on the fly -- they must all be statically declared in the +data-segment. +@end enumerate + +This does not include the GCPRO'ed variables, the specbinds, the +catchtags, the backlist, the redisplay or the profiling info, since we +do not want to rebuild the actual chain of lisp calls that leads up to +the dump-emacs call, only the global variables. + +Weak lists and weak hash tables are dumped as if they were their +non-weak equivalent (without changing their type, of course). This has +not yet been a problem. + +@node Address allocation, The header, Object inventory, Dumping phase +@subsection Address allocation +@cindex dumping address allocation + + +The next step is to allocate the offsets of each of the objects in the +final dump file. This is done by @code{pdump_allocate_offset()} which +is called indirectly by @code{pdump_scan_by_alignment()}. + +The strategy to deal with alignment problems uses these facts: + +@enumerate +@item +real world alignment requirements are powers of two. + +@item +the C compiler is required to adjust the size of a struct so that you +can have an array of them next to each other. This means you can get an +upper bound on the alignment requirement of a given structure by +finding the largest power of two that divides its size. + +@item +the non-variant part of variable size lrecords has an alignment +requirement of 4. +@end enumerate + +Hence, for each lrecord type, C struct type or opaque data block the +alignment requirement is computed as a power of two, with a minimum of +2^2 for lrecords.
@code{pdump_scan_by_alignment()} then scans all the +@code{pdump_block_list_elt}'s, the ones with the highest requirements +first. This ensures the best packing. + +The maximum alignment requirement we take into account is 2^8. + +@code{pdump_allocate_offset()} only has to do a linear allocation, +starting at offset 256 (this leaves room for the header and keeps the +alignments happy). + +@node The header, Data dumping, Address allocation, Dumping phase +@subsection The header +@cindex dumping, the header + +The next step creates the file and writes a header with a signature and +some miscellaneous information in it. The @code{reloc_address} field, which +indicates at which address the file should be loaded if we want to avoid +post-reload relocation, is set to 0. It then seeks to offset 256 (base +offset for the objects). + +@node Data dumping, Pointers dumping, The header, Dumping phase +@subsection Data dumping +@cindex data dumping +@cindex dumping, data + +The data is dumped by @code{pdump_dump_data()}, called from +@code{pdump_scan_by_alignment()}, in the same order as the addresses were allocated. +This function copies the data to a temporary buffer, relocates all +pointers in the object to the addresses allocated in step Address +Allocation, and writes it to the file. Using the same order means that, +if we are careful with lrecords whose size is not a multiple of 4, we +are assured that the object is always written at the offset in the file +allocated in step Address Allocation. + +@node Pointers dumping, , Data dumping, Dumping phase +@subsection Pointers dumping +@cindex pointers dumping +@cindex dumping, pointers + +A bunch of tables needed to properly reassign the global pointers are +then written.
They are: + +@enumerate +@item +the pdump_root_block_ptrs dynarr +@item +the pdump_opaques dynarr +@item +a vector of all the offsets to the objects in the file that include a +description (for faster relocation at reload time) +@item +the pdump_root_objects and pdump_weak_object_chains dynarrs. +@end enumerate + +For each of the dynarrs we write both the pointer to the variables and +the relocated offset of the object they point to. Since these variables +are global, the pointers are still valid when restarting the program and +are used to regenerate the global pointers. + +The @code{pdump_weak_object_chains} dynarr is a special case. The +variables it points to are the heads of weak linked lists of lisp objects +of the same type. Not all objects of this list are dumped so the +relocated pointer we associate with them points to the first dumped +object of the list, or Qnil if none is available. This is also the +reason why they are not used as roots for the purpose of object +enumeration. + +Some very important information like the @code{staticpros} and +@code{lrecord_implementations_table} is handled indirectly using +@code{dump_add_opaque} or @code{dump_add_root_block_ptr}. + +This is the end of the dumping part. + +@node Reloading phase, Remaining issues, Dumping phase, Dumping +@section Reloading phase +@cindex reloading phase +@cindex dumping, reloading phase + +@subsection File loading +@cindex dumping, file loading + +The file is mmap'ed into memory (which ensures a PAGESIZE alignment, at +least 4096), or if mmap is unavailable or fails, a 256-byte-aligned +malloc is done and the file is loaded. + +Some variables are reinitialized from the values found in the header. + +The difference between the actual loading address and the reloc_address +is computed and will be used for all the relocations. + + +@subsection Putting back the pdump_opaques +@cindex dumping, putting back the pdump_opaques + +The memory contents are restored in the obvious and trivial way.
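A minimal C sketch of what "obvious and trivial" can look like, assuming each opaque block is recorded as the address of a global variable plus a byte count and that the saved bytes are stored back to back. @code{struct opaque_entry}, @code{save_opaques} and @code{restore_opaques} are hypothetical names for illustration, not the ones used in @file{dumper.c}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one opaque block: where it lives in the
   running program and how many bytes it occupies. */
struct opaque_entry {
  void *varaddress;
  size_t size;
};

/* Dump side: append the raw bytes of every registered block to `out'.
   Returns the number of bytes written. */
size_t save_opaques(const struct opaque_entry *entries, size_t n,
                    unsigned char *out)
{
  size_t used = 0, i;

  assert(entries != NULL && out != NULL);
  for (i = 0; i < n; i++) {
    memcpy(out + used, entries[i].varaddress, entries[i].size);
    used += entries[i].size;
  }
  return used;
}

/* Reload side: copy the saved bytes back over the same variables.
   No relocation is needed because opaque blocks contain no pointers. */
size_t restore_opaques(const struct opaque_entry *entries, size_t n,
                       const unsigned char *in)
{
  size_t used = 0, i;

  assert(entries != NULL && in != NULL);
  for (i = 0; i < n; i++) {
    memcpy(entries[i].varaddress, in + used, entries[i].size);
    used += entries[i].size;
  }
  return used;
}
```

The sketch relies on the registered variables having the same addresses at dump time and at reload time, which is why it only makes sense for global variables.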
+ + +@subsection Putting back the pdump_root_block_ptrs +@cindex dumping, putting back the pdump_root_block_ptrs + +The variables pointed to by pdump_root_block_ptrs in the dump phase are +reset to the right relocated object addresses. + + +@subsection Object relocation +@cindex dumping, object relocation + +All the objects are relocated using their description and their offset +by @code{pdump_reloc_one}. This step is unnecessary if the +reloc_address is equal to the file loading address. + + +@subsection Putting back the pdump_root_objects and pdump_weak_object_chains +@cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains + +Same as Putting back the pdump_root_block_ptrs. + + +@subsection Reorganize the hash tables +@cindex dumping, reorganize the hash tables + +Since some of the hash values in the lisp hash tables are +address-dependent, their layout is now wrong. So we go through each of +them and have them resorted by calling @code{pdump_reorganize_hash_table}. + +@node Remaining issues, , Reloading phase, Dumping +@section Remaining issues +@cindex dumping, remaining issues + +The build process will have to start a post-dump xemacs, ask it for the +loading address (which will, hopefully, always be the same between +different xemacs invocations) [[unfortunately, not true on Linux with +the ExecShield feature]] and relocate the file to the new address. +This way the object relocation phase will not have to be done, which +means no writes in the objects and that, because of the use of mmap, the +dumped data will be shared between all the xemacs running on the +computer. + +Some executable signature will be necessary to ensure that a given dump +file is really associated with a given executable, or random crashes +will occur. One option is a random number set at compile or configure time thru +a define. This will also allow for having differently-compiled xemacsen +on the same system (mule and no-mule come to mind).
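One way such a check could look is sketched below in C. Everything in it is an assumption made for illustration: the @code{TOY_BUILD_ID} define stands in for a number generated at configure time, and @code{struct toy_header}, @code{toy_write_header} and @code{toy_header_matches} are hypothetical names, not part of the dumper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: would normally come from configure, e.g. via
   -DTOY_BUILD_ID=...; a fixed fallback is used here. */
#ifndef TOY_BUILD_ID
#define TOY_BUILD_ID 0x5eed1234u
#endif

#define TOY_SIGNATURE "TOYDUMP"   /* 7 characters plus the NUL: 8 bytes */

/* Hypothetical dump file header. */
struct toy_header {
  char signature[8];
  uint32_t build_id;
};

/* Dump side: stamp the header with the id of the dumping executable. */
void toy_write_header(struct toy_header *h, uint32_t build_id)
{
  assert(h != NULL);
  memset(h, 0, sizeof *h);
  memcpy(h->signature, TOY_SIGNATURE, sizeof h->signature);
  h->build_id = build_id;
}

/* Reload side: accept the file only if both the signature and the
   build id match those of the running executable. */
int toy_header_matches(const struct toy_header *h, uint32_t build_id)
{
  assert(h != NULL);
  return memcmp(h->signature, TOY_SIGNATURE, sizeof h->signature) == 0
         && h->build_id == build_id;
}
```

An executable built with a different id would reject the file up front instead of crashing later on stale pointers.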
+ +The DOC file contents should probably end up in the dump file. + + +@node Future Work, Future Work Discussion, Dumping, Top @chapter Future Work @cindex future work @menu +* Future Work -- General Suggestions:: * Future Work -- Elisp Compatibility Package:: * Future Work -- Drag-n-Drop:: * Future Work -- Standard Interface for Enabling Extensions:: @@ -15175,19 +16420,486 @@ * Future Work -- Lisp Engine Replacement:: @end menu -@ignore -Macro to convert a single line containing a heading into the format of -all headings in the Future Work section. - -(setq last-kbd-macro (read-kbd-macro -"<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Future SPC Work SPC - - SPC <end> RET @cindex SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC future SPC work RET")) -@end ignore - -@node Future Work -- Elisp Compatibility Package, Future Work -- Drag-n-Drop, Future Work, Future Work +@node Future Work -- General Suggestions, Future Work -- Elisp Compatibility Package, Future Work, Future Work +@section Future Work -- General Suggestions +@cindex future work, general suggestions +@cindex general suggestions, future work + +@subheading Jamie Zawinski's XEmacs Wishlist + +This document is based on Jamie Zawinski's +@uref{http://www.jwz.org/doc/xemacs-wishlist.html,xemacs wishlist}. + Throughout this page, ``I'' refers to Jamie. + +The list has been substantially reformatted and edited to fit the needs + of this site. If you have any soul at all, you'll go check out the + original. OK? You should also check out some other +@uref{http://www.xemacs.org/Releases/Public-21.2/execution.html#wishlists,wishlists}. + + +@subsubheading About the List + +I've ranked these (roughly) from easiest to hardest; though of all of +them, I think the debugger improvements would be the most useful. 
I think +the combination of emacs+gdb is the best Unix development environment +currently available, but it's still lamentably primitive and extremely +frustrating (much like Unix itself), especially if you know what kinds of +features more modern integrated debuggers have. + +@subsubheading XEmacs Wishlist + +@table @strong +@item Improve the keyboard macro system. + +Keyboard macros are one of the most useful concepts that emacs has to +offer, but there's room for improvement. + +@table @strong +@item Make it possible to embed one macro inside of another. + +Often, I'll define a keyboard macro, and then realize that I've +left something out, or that there's more that I need to do; for +example, I may define a macro that does something to the current line, +and then realize that I want to apply it to a lot of lines. So, I'd +like this to work: + +@example +@kbd{C-x ( } +; start macro #1 +@kbd{... } +; (do stuff) +@kbd{C-x ) } +; done with macro #1 +@kbd{... } +; (do stuff) +@kbd{C-x ( } +; start macro #2 +@kbd{C-x e } +; execute macro #1 (splice it into macro #2) +@kbd{C-s foo } +; move forward to the next spot +@kbd{C-x ) } +; done with macro #2 +@kbd{C-u 1000 C-x e } +; apply the new macro +@end example + +That is, simply, one should be able to wrap new text around an +existing macro. I can't tell you how many times I've defined a complex +macro but left out the ``@kbd{C-n C-a}'' at the end... + +Yes, you can accomplish this with M-x name-last-kbd-macro, but +that's a pain. And it's also more permanent than I'd often like. +@item Make it possible to correct errors when defining a macro. + +Right now, the act of defining a macro stops if you get an error +while defining it, and all of the characters you've already typed into +the macro are gone. It needn't be that way. I think that, when that +first error occurs, the user should be given the option of taking the +last command off of the macro and trying again. 
+ +The macro-reader knows where the bounds of multi-character command +sequences are, and it could even keep track of the corresponding undo +records; rubbing out the previous entry on the macro could also undo +any changes that command had made. (This should also work if the macro +spans multiple buffers, and should restore window configurations as +well.) + +You'd want multi-level undo for this as well, so maybe the way to +go would be to add some new key sequence which was used only as the +back-up-inside-a-keyboard-macro-definition command. + +I'm not totally sure that this would end up being very usable; +maybe it would be too hard to deal with. Which brings us to: +@item Make it possible to edit a keyboard macro after it has been defined. + +I only just discovered @code{edit-kbd-macro} (@kbd{C-x C-k}). +It is very, very cool. + +The trick it does of showing the command which will be executed is +somewhat error-prone, as it can only look up things in the current map +or the global map; if the macro changed buffers, it wouldn't be +displaying the right commands. (One of the things I often use macros +for is operating on many files at once, by bringing up a dired buffer +of those files, editing them, and then moving on to the next.) + +However, if the act of recording a macro also kept track of the +actual commands that had gotten executed, it could make use of that +info as well. + +Another way of editing a macro, other than as text in a buffer, +would be to have a command which single-steps a macro: you would lean +on the space bar to watch the macro execute one character (command?) +at a time, and then when you reached the point you wanted to change, +you could do some gesture to either: insert some keystrokes into the +middle of the macro and then continue; or to replace the rest of the +macro from here to the end; or something. 
+ +Another similar hack might be to convert a macro to the equivalent +lisp code, so that one could tweak it later in ways that would be too +hard to do from the keyboard (wrapping parts of it in @code{while} loops or +something.) (@kbd{M-x insert-kbd-macro} isn't really what I'm +talking about here: I mean insert the list of commands, not the list +of keystrokes.) +@end table + +@item Save my wrists! + +In the spirit of the `@code{teach-extended-commands-p}' variable, +it would be interesting if emacs would keep track of what are the +commands I use most often, perhaps grouped by proximity or mode -- it +would then be more obvious which commands were most likely candidates +for placement on a toolbar, or popup menu, or just a more convenient key +binding. + +Bonus points if it figures out that I type ``@kbd{bt\n}'' and +``@kbd{ret\ny\n}'' into my @samp{*gdb*} buffer about a hundred +thousand times a day. +@item XmCreateFileSelectionBox + +The thing that ``File/Open...'' pops up has excellent @emph{hack} +value, but as a user interface, it's an abomination. Isn't it time +someone added a real file selection dialog already? (For the +Motifly-challenged, the Athena-based file selector that GhostView uses +seems adequate.) +@item Improve the toolbar system. + +It's great that XEmacs has a toolbar, but it's damn near impossible +to customize it. + +@table @strong +@item Make it easy to define new toolbar buttons. + +Currently, to define a toolbar button that has a text equivalent, +one must edit a pixmap, and put the text there! That's prohibitive. +One should be able to add some kind of generic toolbar button, with a +plain icon or none at all, but which has a text label, without having +to use a paint program. +@item Make it easy to have customized, mode-local toolbars. + +In my @code{c-mode-hook}, for example, I can add a couple of new +keybindings, and delete a few others, and to do that, I don't have to +duplicate the entire definition of the @code{c-mode-map}. 
Making +mode-local additions and subtractions to the toolbars should be as +easy. +@item Make it easy to have customized, mode-local popup menus. + +The same situation holds for the right-mouse-button popup menu; one +should be able to add new commands to those menus without difficulty. +One problem is that each mode which does have a popup menu implements +it in a different way... +@end table + +@item Make the External Widget work. + +About half of the work is done to make a replacement for the +@code{XmText} widget which offloads editing responsibility to an +external Emacs process. Someone should finish that. The benefit here +would be that then, any Motif program could be linked such that all +editing happened with a real Emacs behind it. (If you're Athena-minded, +flavor with @code{Text} instead of @code{XmText} -- it's probably +easy to make it work with both.) + +The part of this that is done already is the ability to run an Emacs +screen on a Window object that has been created by another process (this +is what the @file{ExternalClient.c} and @file{ExternalShell.c} stuff +is.) What is left to be done is, adding the text-widget-editor aspects +of this. + +First, the emacs screen being displayed on that window would have to +be one without a modeline, and one which behaved sensibly in the context +of ``I am a small multi-line text area embedded in a dialog box'' as +opposed to ``I am a full-on text editor and lord of all that I survey.'' + +Second, the API that the (non-emacs-aware) user of the +@code{XmText} widget expects would need to be implemented: give the +caller the ability to pull the edited text string back out, and so on. +The idea here being, hooking up emacs as the widget editor should be as +transparent as possible. +@item Bring the debugger interface into the eighties. + +Some of you may have seen my @file{gdb-highlight.el} +package, that I posted to gnu.emacs.sources last month. 
I think +it's really cool, but there should be a lot more work in that direction. +For those of you who haven't seen it, what it does is watch text that +gets inserted into the @samp{*gdb*} buffer and make very nearly +everything be clickable and have a context-sensitive menu. Generally, +the types that are noticed are: + +@itemize +@item function names; +@item variable and parameter names; +@item structure slots; +@item source file names; +@item type names; +@item breakpoint numbers; +@item stack frame numbers. +@end itemize + +Any time one of those objects is presented in the @samp{*gdb*} +buffer, it is mousable. Clicking middle button on it takes some default +action (edits the function, selects the stack frame, disables the +breakpoint, ...) Clicking the right button pops up a menu of commands, +including commands specific to the object under the mouse, and/or other +objects on the same line. + +So that's all well and good, and I get far more joy out of what this +code does for me than I expected, but there are still a bunch of +limitations. The debugger interface needs to do much, much more. + +@table @strong +@item Make gdbsrc-mode not suck. + +The idea behind @code{gdbsrc-mode} is on the side of the angels: +one should be able to focus on the source code and not on the debugger +buffer, absolutely. But the implementation is just awful. + +First and foremost, it should not change ``modes'' (in the more +general sense). Any commands that it defines should be on keys which +are exclusively used for that purpose, not keys which are normally +self-inserting. I can't be the only person who usually has occasion to +actually @emph{edit} the sources which the debugger has chosen to +display! Switching into and out of @code{gdbsrc-mode} is +prohibitive. + +I want to be looking at my sources at all times, yet I don't want +to have to give up my source-editing gestures. 
I think the right way +to accomplish this is to put the gdbsrc commands on the toolbar and on +popup menus; or to let the user define their own keys (I could see +devoting my @key{kp_enter} key to ``step'', or something common +like that.) + +Also it's extremely frustrating that one can't turn off gdbsrc mode +once it has been loaded, without exiting and restarting emacs; that +alone means that I'd probably never take the time to learn how to use +it, without first having taken the time to repair it... +@item Make it easier access to variable values. + +I want to be able to double-click on a variable name to highlight +it, and then drag it to the debugger window to have its value printed. + +I want gestures that let me write as well as read: for example, to +store value A into slot B. +@item Make all breakpoints visible. + +Any time there is a running gdb which has breakpoints, the buffers +holding the lines on which those breakpoints are set should have icons +in them. These icons should be context-sensitive: I should be able to +pop up a menu to enable or disable them, to delete them, to change +their commands or conditions. + +I should also be able to @emph{move} them. It's +annoying when you have a breakpoint with a complex condition or +command on it, and then you realize that you really want it to be at a +different location. I want to be able to drag-and-drop the icon to its +new home. +@item Make a debugger status display window. + +@itemize +@item + +I want a window off to the side that shows persistent information +-- it should have a pane which is a drag-editable, drag-reorderable +representation of the elements on gdb's ``display'' list; they +should be displayed here instead of being just dumped in with the +rest of the output in the @samp{*gdb*} buffer. +@item + +I want a pane that displays the current call-stack and nothing +else. I want a pane that displays the arguments and locals of the +currently-selected frame and nothing else. 
I want these both to +update as I move around on the stack. +@item + +Since the unfortunate reality is that excavating this information +from gdb can be slow, it would be a good idea for these panes to +have a toggle button on them which meant ``stop updating'', so that +when I want to move fast, I can, but I can easily get the display +back when I need it again. +@end itemize + +The reason for all of this is that I spend entirely too much time +scrolling around in the @samp{*gdb*} buffer; with gdb-highlight, I +can just click on a line in the backtrace output to go to that frame, +but I find that I spend a lot of time @emph{looking} for that +backtrace: since it's mixed in with all the other random output, I +waste time looking around for things (and usually just give up and +type ``@kbd{bt}'' again, then thrash around as the buffer scrolls, +and I try to find the lower frames that I'm interested in, as they +have invariably scrolled off the window already... +@item Save and restore breakpoints across emacs/debugger sessions. + +This would be especially handy given that gdb leaks like a sieve, +and with a big program, I only get a few dozen relink-and-rerun +attempts before gdb has blown my swap space. +@item Keep breakpoints in sync with source lines. + +When a program is recompiled and then reloaded into gdb, the +breakpoints often end up in less-than-useful places. For example, when +I edit text which occurs in a file anywhere before a breakpoint, emacs +is aware that the line of the bp hasn't changed, but just that it is +in a different place relative to the top of the file. Gdb doesn't know +this, so your breakpoints end up getting set in the wrong places +(usually the maximally inconvenient places, like @emph{after} a loop +instead of @emph{inside} it). But emacs knows, so emacs should +inform the debugger, and move the breakpoints back to the places they +were intended to be. 
+@end table + +(Possibly the OOBR stuff does some of this, but can't tell, because +I've never been able to get it to do anything but beep at me and mumble +about environments. I find it pretty funny that the manual keeps +explaining to me how intuitive it is, without actually giving me a clue +how to launch it...) +@item Add better dialog box features. + +It'd be nice to be able to create more complex dialog boxes from +emacs-lisp: ones with checkboxes, radio button groups, text fields, and +popup menus. +@item Add embeddable dialog boxes. + +One of the things that the now-defunct Energize code (the C side of +it, that is) could do was embed a dialog box between the toolbar and the +main text area -- buffers could have control panels associated with +them, that had all kinds of complex behavior. +@item Make the mark-stack be visible. + +You know, I've encountered people who have been using emacs for +years, and never use the mark stack for navigation. I can't live without +it; ``@kbd{C-u C-SPC}'' is among my most common gestures. + +@enumerate +@item + +It would be a lot easier to realize what's going to happen if the +marks on the mark stack were visible. They could be displayed as small +``caret'' glyphs, for example; something large enough to be visible, +but not easily mistaken for a character or for the cursor. +@item + +The marks and the selected region should be visible in the +scrollbar as well -- I don't remember where I first saw this idea, but +it's very cool: there's a second, less-strongly-rendered ``thumb'' in +the scrollbar which indicates the position and size of the selection; +and there are tiny tick-marks which indicate the positions of the +saved points. +@item + +Markers which are in registers (@code{point-to-register}, @kbd{C-x +/}) should be displayed differently (more prominent.) +@item + +It'd be cool if you could pick up markers and move them around, to +adjust the points you'll be coming back to later. 
+@end enumerate + +@item Write a new garbage collector. + +The emacs GC is very primitive; it is also, fortunately, a +rather well isolated module, and it would not be a very big task to swap +it with a new one (once that new one was written, that is.) Someone +should go bone up on modern GC techniques, and then just dive right +in... +@item Add support for lexical scope to the emacs-lisp runtime. + +Yadda yadda, this list goes to eleven. +@end table + +@* +Subject: +@strong{Re: XEmacs wishlist} +Date: Wed, 14 May 1997 16:18:23 -0700 +From: Jamie Zawinski <jwz@@netscape.com> +Newsgroups: comp.emacs.xemacs, comp.emacs + +Andreas Schwab wrote: + +@quotation +@emph{Use `C-u C-x (': } + +@emph{start-kbd-macro:@*Non-nil arg (prefix arg) means append to last +macro defined; This begins by re-executing that macro as if you typed it +again. } +@end quotation + +Cool, I didn't know it did that... + +But it only lets you append. I often want to prepend, or embed the +macro multiple times (motion 1, C-x e, motion 2, C-x e, motion 3.) + +@subheading 21.2 Showstoppers + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + +DISTRIBUTION ISSUES + +A. Unified Source Tarball. + +Packages go under root/lib/xemacs/xemacs-packages and no one ever has +to mess with --package-path and the result can be moved from one +directory to another pre- or post-install. + + +Unified Binary Tarballs with Packages. + +Same principles as above. + +If people complain, we can also provide split binary tarballs +(architecture dependent and independent) and place these files in a +subdirectory so as not to confuse the majority just looking for one +tarball. + +Under Windows, we need to provide a WISE-style GUI setup program. It's +already there but needs some work so you can select "all" packages +easily (should be the default). + +Parallel Root and Package Trees. + +If the user downloads separately, the main source and the packages, he +will naturally untar them into the same directory. 
This results in the +parallel root and package structure. We should support this as a "last +resort," i.e., if we find no packages anywhere and are about to resign +ourselves to not having packages, then look for a parallel package +tree. The user who sets things up like this should be able to either +run in place or "make install" and get a proper installed +XEmacs. Never should the user have to touch --package-path. + +II. WINDOWS PRINTING + +Looks like the internals are done but not the GUI. This must be +working in 21.2. + +III. WINDOWS MULE + +Basic support should be there. There's already a patch to get things +started and I'll be doing more work to make this real. + +IV. GUTTER ETC. + +This stuff needs to be "stable" and generally free from bugs. Any +API's we create need to be well-reviewed or marked clearly as +experimental. + +V. PORTABLE DUMPER + +Last bits need to be cleaned up. This should be made the "default" for +a while to flush-out problems. Under Microsoft Windows, Portable +Dumper must be the default in 21.2 because of the problems with the +existing dump process. + +COMMENT: I'd like to feature freeze this pretty soon and create a 21.3 +tree where all of my major overhauls of Mule-related stuff will go +in. At the same time or around, we need to do the move-around in the +repository (or create a new one) and "upgrade" to the latest CVS +server. + +@node Future Work -- Elisp Compatibility Package, Future Work -- Drag-n-Drop, Future Work -- General Suggestions, Future Work @section Future Work -- Elisp Compatibility Package @cindex future work, elisp compatibility package @cindex elisp compatibility package, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + A while ago I created a package called Sysdep, which aimed to be a forward compatibility package for Elisp. 
The idea was that instead of having to write your package using the oldest version of Emacs that you @@ -15322,13 +17034,13 @@ macrolet call, and so there doesn't appear to be any point in extracting them). -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Drag-n-Drop, Future Work -- Standard Interface for Enabling Extensions, Future Work -- Elisp Compatibility Package, Future Work @section Future Work -- Drag-n-Drop @cindex future work, drag-n-drop @cindex drag-n-drop, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} I propose completely redoing the drag-n-drop interface to make it powerful and extensible enough to support such concepts as drag over and drag under visuals and context menus invoked @@ -15439,13 +17151,13 @@ refer to the @code{current-mouse-event} variable, and in fact, this variable should not be changed at all during a drag or a drop. -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Standard Interface for Enabling Extensions, Future Work -- Better Initialization File Scheme, Future Work -- Drag-n-Drop, Future Work @section Future Work -- Standard Interface for Enabling Extensions @cindex future work, standard interface for enabling extensions @cindex standard interface for enabling extensions, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} Apparently, if you know the name of a package (for example, @code{fusion}), you can load it using the @code{require} function, but there's no standard way to turn it on or turn it off. The @@ -15520,13 +17232,13 @@ the package is. Both of these sorts of judgments could be obtained by doing user surveys if need be. 
-@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Better Initialization File Scheme, Future Work -- Keyword Parameters, Future Work -- Standard Interface for Enabling Extensions, Future Work @section Future Work -- Better Initialization File Scheme @cindex future work, better initialization file scheme @cindex better initialization file scheme, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} A proposal is outlined for converting XEmacs to use the @code{.xemacs} subdirectory for its initialization files instead of putting them in the user's home directory. In the process, a general @@ -15627,12 +17339,10 @@ like what would be in a package root), then it becomes the value of the init file directory. Otherwise the user's home directory is used. @item - If the init file directory is the user's home directory, then the init file is called @code{.emacs}. Otherwise, it's called @code{init.el}. @item - If the init file directory is the user's home directory, then the pre-init file is called @code{.xemacs-pre-init.el}. Otherwise it's @@ -15640,7 +17350,6 @@ with the dialog box that might be displayed at startup. This will be described below.) @item - If the init file directory is the user's home directory, then the custom init file is called @code{.xemacs-custom-init.el}. Otherwise, it's @@ -15714,13 +17423,13 @@ always be created and mapped at that time so that the error is displayed and the debugger has a place to be invoked. 
-@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Keyword Parameters, Future Work -- Property Interface Changes, Future Work -- Better Initialization File Scheme, Future Work @section Future Work -- Keyword Parameters @cindex future work, keyword parameters @cindex keyword parameters, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + NOTE: These changes are partly motivated by the various user-interface changes elsewhere in this document, and partly for Mule support. In general the various API's in this document would benefit greatly from @@ -15772,7 +17481,6 @@ The subr object type needs to be modified to contain additional slots for the number and names of any keyword parameters. @item - The implementation of the @code{funcall} function needs to be modified so that it knows how to process keyword parameters. This is the only @@ -15780,7 +17488,6 @@ logic that would need to be added can be lifted directly from the @code{cl} code. @item - A new macro, similar to the @code{DEFUN} macro, and probably called @code{DEFUN_WITH_KEYWORDS}, needs to be defined so that built-in Lisp @@ -15804,25 +17511,21 @@ @code{DEFUN_WITH_KEYWORDS} macro, and probably isn't worth implementing). @item - The byte compiler would have to be modified slightly so that it knows about keyword parameters when it parses the parameter declaration of a function. For example, so that it issues the correct warnings concerning calls to that function with incorrect arguments. @item - The @code{make-docfile} program would have to be modified so that it generates the correct parameter lists for primitives defined using the @code{DEFUN_WITH_KEYWORDS} macro. @item - Possibly other aspects of the help system that deal with function descriptions might have to be modified. 
@item - A helper function might need to be defined to make it easier for primitives that use both the @code{&rest} and @code{&key} @@ -15892,6 +17595,8 @@ @cindex future work, property interface changes @cindex property interface changes, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + In my past work on XEmacs, I already expanded the standard property functions of @code{get}, @code{put}, and @code{remprop} to work on objects other than symbols and defined an additional function @@ -15913,7 +17618,6 @@ @code{remprop}. Note also that for some built-in properties, setting the property to its default value is equivalent to making it unbound. @item - The behavior of the @code{get} function is modified. If the @code{get} function is called on a property that is unbound and the third, optional @@ -15927,26 +17631,22 @@ expects to get @code{nil} returned if the property is unbound, is almost certainly wrong anyway. @item - A new function, @code{get1} is defined. This function does not take a default argument like the @code{get} function. Instead, if the property is unbound, an error is signaled. Note: @code{get} can be implemented in terms of @code{get1}. @item - New functions @code{property-default-value} and @code{property-bound-p} are defined with the obvious semantics. @item - An additional function @code{property-built-in-p} is defined which takes two arguments, the first one being a symbol naming an object type, and the second one specifying a property, and indicates whether the property name has a built-in meaning for objects of that type. @item - It is not necessary, or even desirable, for all object types to allow user-defined properties. It is always possible to simulate user-defined @@ -15957,12 +17657,10 @@ @code{undefined-property}, when given any property other than those that are predefined. @item - A function called @code{user-defined-properties-allowed-p} should be defined with the obvious semantics. (See the previous item.) 
@item - Three more functions should be defined, called @code{built-in-property-name-list}, @code{property-name-list}, and @@ -15988,7 +17686,6 @@ :put #'(lambda (obj key value) (puthash key obj value))) @end example - @node Future Work -- Toolbars, Future Work -- Menu API Changes, Future Work -- Property Interface Changes, Future Work @section Future Work -- Toolbars @cindex future work, toolbars @@ -16004,6 +17701,8 @@ @cindex future work, easier toolbar customization @cindex easier toolbar customization, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} One of XEmacs' greatest strengths is its ability to be customized endlessly. Unfortunately, it is often too difficult to figure out how to do this. There has been some recent work like the @@ -16055,13 +17754,13 @@ (This is incomplete.....) -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Toolbar Interface Changes, , Future Work -- Easier Toolbar Customization, Future Work -- Toolbars @subsection Future Work -- Toolbar Interface Changes @cindex future work, toolbar interface changes @cindex toolbar interface changes, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + I propose changing the way that toolbars are specified to make them more flexible. 
@@ -16207,6 +17906,7 @@ @cindex future work, menu API changes @cindex menu API changes, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} @enumerate @item @@ -16260,13 +17960,13 @@ @end enumerate -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Removal of Misc-User Event Type, Future Work -- Mouse Pointer, Future Work -- Menu API Changes, Future Work @section Future Work -- Removal of Misc-User Event Type @cindex future work, removal of misc-user event type @cindex removal of misc-user event type, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} This page describes why the misc-user event type should be split up into a number of different event types, and how to do this. @@ -16314,6 +18014,8 @@ @cindex future work, abstracted mouse pointer interface @cindex abstracted mouse pointer interface, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} We need to create a new image format that allows standard pointer shapes to be specified in a way that works on all Windows systems. I suggest that this be called @code{pointer}, which @@ -16338,13 +18040,13 @@ as an obsolete alias. The @code{resource} format was added so recently that it's possible that we can just change it. -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Busy Pointer, , Future Work -- Abstracted Mouse Pointer Interface, Future Work -- Mouse Pointer @subsection Future Work -- Busy Pointer @cindex future work, busy pointer @cindex busy pointer, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + Automatically make the mouse pointer switch to a busy shape (watch signal) when XEmacs has been "busy" for more than, e.g. 2 seconds. 
Define the @dfn{busy time} as the time since the last time that XEmacs was @@ -16395,6 +18097,8 @@ @cindex future work, everything should obey duplicable extents @cindex everything should obey duplicable extents, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + A lot of functions don't properly track duplicable extents. For example, the @code{concat} function does, but the @code{format} function does not, and extents in keymap prompts are not displayed either. All @@ -16425,13 +18129,13 @@ exact thing, and it should be easy to create a modified version of this function. -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Version Number and Development Tree Organization, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Extents, Future Work @section Future Work -- Version Number and Development Tree Organization @cindex future work, version number and development tree organization @cindex version number and development tree organization, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} The purpose of this proposal is to present a coherent plan for how development branches in XEmacs are managed. This will cover such issues as stable versus experimental branches, creating new @@ -16726,13 +18430,13 @@ @end enumerate -@uref{../../www.666.com/ben,Ben Wing} - @node Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Keybindings, Future Work -- Version Number and Development Tree Organization, Future Work @section Future Work -- Improvements to the @code{xemacs.org} Website @cindex future work, improvements to the @code{xemacs.org} website @cindex improvements to the @code{xemacs.org} website, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + The @code{xemacs.org} web site is the face that XEmacs presents to the outside world. 
In my opinion, its most important function is to present information about XEmacs in such a way that solicits new XEmacs users @@ -16831,8 +18535,6 @@ @uref{news:comp.os.linux.announce,comp.os.linux.announce}, and the Windows announcement news group) etc. -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Keybindings, Future Work -- Byte Code Snippets, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work @section Future Work -- Keybindings @cindex future work, keybindings @@ -16849,6 +18551,8 @@ @cindex future work, keybinding schemes @cindex keybinding schemes, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} We need a standard mechanism that allows different global key binding schemes to be defined. Ideally, this would be the @uref{keyboard-actions.html,keyboard action interface} that I have @@ -16867,6 +18571,8 @@ @cindex future work, better support for windows style key bindings @cindex better support for windows style key bindings, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract:} This page describes how we could create an XEmacs extension that modifies the global key bindings so that a Windows user would feel at home when using the keyboard in XEmacs. Some of these @@ -16933,13 +18639,13 @@ bindings (which would be Viper mode), Windows-style bindings, Brief, CodeWright, Visual C++, or whatever we manage to implement. -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Misc Key Binding Ideas, , Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Keybindings @subsection Future Work -- Misc Key Binding Ideas @cindex future work, misc key binding ideas @cindex misc key binding ideas, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @itemize @item M-123 ... 
do digit arg @@ -17000,6 +18706,8 @@ @cindex future work, byte code snippets @cindex byte code snippets, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @itemize @item For use in time critical (e.g. redisplay) places such as display @@ -17029,6 +18737,7 @@ @menu * Future Work -- Autodetection:: * Future Work -- Conversion Error Detection:: +* Future Work -- Unicode:: * Future Work -- BIDI Support:: * Future Work -- Localized Text/Messages:: @end menu @@ -17040,7 +18749,9 @@ There are various proposals contained here. -@subsection New Implementation of Autodetection Mechanism +@subheading New Implementation of Autodetection Mechanism + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} The current auto detection mechanism in XEmacs Mule has many problems. For one thing, it is wrong too much of the time. Another @@ -17183,10 +18894,112 @@ user says "no, they're not sure," then the same list of choices as previously mentioned will be presented. -@subheading Implementation of Coding System Priority Lists in Various Locales - -@example -@enumerate +@subheading RFC: Autodetection + +Also appeared under heading "Implementation of Coding System Priority +Lists in Various Locales" ? + +Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull} + +Date: 11/1/1999 2:48 AM + +@example +>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@@srce.hr> writes: + + [Ben sez:] + + >> You are perfectly free to set up your XEmacs like this, but + >> XEmacs/Mule @strong{will} autodetect by default if there is no + >> Content-Type: info and no reason to believe we are dealing with + >> binary files. + + Hrvoje> In that case, it will be a serious mistake to make + Hrvoje> --with-mule the default, ever. I think more care should + Hrvoje> be shown in meeting the need of European users. +@end example + +Hrvoje, I don't understand what you are worrying about. I suspect you +are worrying about Handa's hyperactive and obstinate Mule, not what +Ben has in mind. 
Yes, Ben has said "better guessing," but that's +simply not reasonable without substantial language environment +information. I think trying to detect Latin-1 vs Latin-2 in the POSIX +locale would be a big mistake, I think trying to guess Big 5 v. Shift +JIS in a European locale would be a big mistake. + +If Ben doesn't mean "more appropriate use of language environment +information" when he writes "better guessing," I, as much as you, want +to see how he plans to do that. Ben? ("Yes/no/oops I need to think +about it" is good enough if you have specifics you intend to put in +the RFC you're planning to present.) + +Let me give a formal proposal of what I would like to see in the +autodetection specification. + +@enumerate +@item +Definitions + +@enumerate +@item +@dfn{Autodetection} means detecting and making available to Mule +the external file's encoding. See (5), below. It doesn't +imply any specific actions based on that information. + +@item +The @dfn{default} case is POSIX locale, and no environment +information in ~/.emacs. + +N.B. This @strong{will} cause breakage for all 1-byte users because +the default case can no longer assume Latin-1. You @strong{may} be +able to use the TTY font or the Xt -font option to fake this, +and default to iso8859-1; I would hope that we would not use +such a kludge in the beta versions, although it might be +satisfactory for general use. In particular, encodings like +VISCII (Vietnamese) and I believe KOI-8 (Cyrillic) are not +ISO-2022-clean, but using C1 control characters as a heuristic +for detecting binary files is useful. + +If we do allow it, I think that XEmacs should bitch and warn +that the practices of implicitly specifying language +environment by -font and defaulting on TTYs is deprecated and +likely to be obsoleted. + +@item +The @dfn{European} case is any Latin-* locale, either implied by +setlocale() and friends or set in ~/.emacs. 
Latin-1 is +specifically not given precedence over other Latin-*, or +non-Latin or non-ISO-8859 for that matter. I suspect but am +not sure that this case extends to all ISO-8859 encodings, and +possibly to non-ISO-8859 single-byte encodings like KOI-8r (in +particular when combined in a class with ISO-8859 encodings). + +@item +The @dfn{CJK} case is any CJK locale. Japanese is specifically +not given precedence over other Asian locales. + +@item +For completeness, define the @dfn{Unicode} case (Unicode +unfortunately has lots of junk such as precomposed characters, +language tags, and directionality indicators in it; we +probably don't care yet, but we should also not claim +compliance) and the @dfn{general} case (which has a lot of +features similar to Unicode, but lacks the advantage of a +unified encoding). This proposal has no idea how to handle +the special features of these, or even if that matters. The +general case includes stuff that nobody here really knows how +it works, like Tibetan and Ethiopic. +@end enumerate + +Each of the following cases is given in the order of priority of +detection. I'm not sure I'm serious about the top priority given the +(optional) Unicode detection. This may be appropriate if Ben is +right that ISO-2022 is going to disappear, but possibly not until then +(two two-byte sequences out of 65536 is probably 1.99 too many). It +probably isn't too risky if (6)(c) is taken pretty seriously; a Unicode +file should contain _no_ private use characters unless the encoding is +explicitly specified, and that's a block of 1/10 of the code space, +which should help a lot in detecting binary files. + @item Default locale @@ -17293,7 +19106,6 @@ @item Unicode and general locales; multilingual use -@end enumerate @enumerate @item @@ -17313,7 +19125,210 @@ and betting that a file containing many characters from that set is Shift JIS. 
@end enumerate -@end example + +@item +Relationship to decoding semantics + +@enumerate +@item +Autodetection should be run on every input stream unless the +user explicitly disables it. + +@item +The (conceptual) default procedure is + +@item +Read the file into the buffer + +Announce the result of autodetection to the user. + +User may request decoding, with autodetected encoding(s) +given priority in a list of available encodings. + +Optimizations (see (e) below) should avoid introducing data +corruption that this default procedure would avoid. + +Obviously, it can't be perfect if any autodecoding is done; +users like Hrvoje should have an easily available option to +revert to this default (or an optimized approximation which +doesn't actually read the whole file into a buffer) or simply +display everything as binary (with the "font" for binary files +being a user option). + +@item +This implies that we should be detecting conditions in the +tail of the file which violate the implicit assumptions of the +coding system autodetected (eg, in UTF-8, illegal UTF-8 +sequences, including those corresponding to surrogates). These should +raise a warning; the buffer should probably be made read-only +and the user prompted. + +This could be taken to extremes, like checking by table +whether all characters in a Japanese file are actually +legitimate JIS codes; that's insane (and would cause corporate +encodings to be recognized as binary). But we should think +about the idea that autodetection shouldn't mean XEmacs can't +change its mind. + +@item +A flexible means for the user to delegate the decision +(conditional on the result of autodetection) to decode or not +to XEmacs or a Lisp program should be provided (eg, the +coding priority list and/or a file-coding-alist). + +@item +Optimized operations (eg, the current lstreams) should be +provided, with the recognition that if they depend on sampling +the file they are risky. 
+ +@item +Mule should provide a reasonable set of default delegations +(as in (d) above) for as many locales as possible. +@end enumerate + +@item +Implementation + +@enumerate +@item +I think all the decision logic suggested above can be +accomplished through a coding-priority-list and appropriate +initializations for different language environments, and a +file-coding-alist. + +@item +Many of the tests on the file's tail shouldn't be very +expensive; in particular, all of the ones I've suggested are +O(n) although they might involve moderate-sized auxiliary +tables for efficiency (eg, 64kB for a single Unicode-oriented +test). +@end enumerate +@end enumerate + +Other comments: + +It might be reasonable given Hrvoje's objections to require that any +autodetection that could cause data loss (any coding system that +involves escape sequences, and only those AFAIK: by design translation +to Unicode is invertible) by default prompt the user (presumably with +a novice-like ability to retain the prompt, always default to binary, +or always default to the autodetected encoding) in the future, at +least in locales that don't need it (POSIX, Latin-any). + +Ben thinks that we can remember the input data; I think it's going to +be hard to comprehensively test that a highly optimized version works. +Good design will help, but ISO-2022 is enormously complex, and there +are many encodings that violate even its lax assumptions. On the +other hand, memory is the only way to get non-rewindable streams right. + +Hrvoje himself said he would like to have an XEmacs that distinguishes +between Latin-1 and Latin-2 text. Where it is possible to do that, +this is exactly what autodetection of ISO-2022 and Unicode gives you. +Many people would want that, even at some risk of binary corruption. + + >> Once again I remind you that XEmacs is a @strong{text} editor. There + >> are lots of files that potentially may have Japanese etc. in + >> them without this marked, e.g. 
C or Elisp files in the XEmacs + >> source. Surely you're not arguing that we interpret even these + >> files as binary by default? + + Hrvoje> I am. If I want to see Japanese, I'll setup my + Hrvoje> environment that way. But I don't, and neither do 99% of + Hrvoje> Croatian users. I can't speak for French, Italian, and + Hrvoje> others, but I'd assume similar. + + Hrvoje> If there is Japanese in the source files, I will see it as + Hrvoje> escape sequences, which is perfectly fine, because I don't + Hrvoje> read Japanese. + +And some (European) people will have their terminals scrambled, +because Shift-JIS contains sequences that can change the state of +XTerm (as do fixed-width Unicode and Big5). This may also be a +problem with some Windows-12xx encodings; I'm not sure they all are +ISO-2022-clean. (This isn't a problem for XEmacs native X11 frames or +native MS-Windows frames, and the XEmacs sources themselves are all in +7-bit ISO-2022 now IIRC. But it is a potential source of great +frustration for many users.) + +I think that should be considered too, although it is presumably lower +priority than the data corruption of binary files. + +@subheading Response to RFC: Autodetection + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + +Date: 11/1/1999 7:24 AM + +Stephen, thank you very much for writing this up. I think it is a good start, +and definitely moving in the direction I would like to see things going: more +proposals, less arguing. (aka "more light, less heat") However, I have some +suggestions for cleaning this up: + +You should try to make it more layered. 
For example, you might have one +section devoted to the workings of autodetection, which starts out like this +(the section numbers below are totally arbitrary): + +@subsubheading Section 5 + +@code{Autodetect()} is a function whose arguments are (1) a readable stream, (2) some +hints indicating how the autodetection is to proceed, and (3) a value +indicating the maximum number of characters to examine at the beginning of the +stream. (Possibly, the value in (3) may be some special symbol indicating +that we only go as far as the next line, or a certain number of lines ahead; +this would be used as part of "continuous autodetection", e.g. we are decoding +the results of an interactive terminal session, where the user may +periodically switch encodings, line terminations, etc. as different programs +get run and/or telnet or similar sessions are entered into and exited.) We +assume the stream is rewindable; if not, insert a "rewinding" stream in front +of the non-rewinding stream; this kind of stream automatically buffers the +data as necessary. +[You can use pseudo-code terminology here. No need for straight C or ELisp.] +[Then proceed to describe what the hints look like -- e.g. you could portray +it as a property list or whatever. The idea is that, for each locale, there +is a corresponding hints value that is used at least by default. The hints +structure also has to be set up to allow for two or more competing hints +specifications to be merged together. For example, the extension of a file +might provide an additional hint or hints about how to interpret the data of +that file, and the caller of @code{autodetect()}, when calling @code{autodetect()} on such a +file, would need to have a way of gracefully merging the default hints +corresponding to the locale with the more specific hints provided by the +extension. Furthermore, users like Hrvoje might well want to provide their +own hints to supplement and override parts of the generic hints -- e.g. 
"I +don't ever want to see non-European encodings decoded; treat them as binary +instead".] +[Then describe algorithmically how the autodetection works. First, you could +describe it more generally, i.e. presenting an algorithmic overview, then you +could discuss in detail exactly how autodetection of a particular type of +external encoding works -- e.g. "for iso2022, we first look for an escape +character, followed by a byte in this range [. ... .] etc."] + +@subsubheading Section 6 + +This section describes the concept of a locale in XEmacs, and how it is +derived from the user's environment. A locale in XEmacs is a pair, a country +and a language, together determining the handling of locale-specific areas of +XEmacs. All locale-specific areas in XEmacs make use of this XEmacs locale, +and do not attempt to derive the locale from any other sources. The user is +free to change the current locale at any time; accessor and mutator functions +are provided to do this so that various locale-specific areas can optionally +be changed together with it. + +[Then you describe how the XEmacs locale is extracted from .emacs, from +@code{setlocale()}, from the LANG environment variables, from -font, or wherever +else. All other sections assume this dirty work is done and never even +mention it] + +@subsubheading Section 7 + +[Here you describe the default @code{autodetect()} hints value corresponding to each +possible locale. You should probably use a schematic description here, e.g. +an actual Lisp property list, liberally commented.] + +@subsubheading Section 8 etc. + +[Other sections cover anything I've missed. By being very careful to separate +out the layers, you simultaneously introduce more rigor (easier to catch bugs) +and make it easier for someone else to understand it completely.] 
@subheading Better Algorithm, More Flexibility, Different Levels of Certainty @@ -17323,6 +19338,8 @@ @subheading Another Autodetection Proposal +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + however, in general the detection code has major problems and needs lots of work: @@ -17332,7 +19349,7 @@ more flexible system, with various levels of likelihood. Currently I've created a system with six levels, as follows: -[see file-coding.h] +[see @file{file-coding.h}] Let's consider what this might mean for an ASCII text detector. (In order to have accurate detection, especially given the iteration I @@ -17501,6 +19518,8 @@ ***** +Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull} + While this is clearly something of an improvement over earlier designs, it doesn't deal with the most important issue: to do better than categories (which in the medium term is mostly going to mean "which flavor of Unicode @@ -17527,13 +19546,15 @@ --sjt -@node Future Work -- Conversion Error Detection, Future Work -- BIDI Support, Future Work -- Autodetection, Future Work -- Byte Code Snippets +@node Future Work -- Conversion Error Detection, Future Work -- Unicode, Future Work -- Autodetection, Future Work -- Byte Code Snippets @subsection Future Work -- Conversion Error Detection @cindex future work, conversion error detection @cindex conversion error detection, future work @subheading "No Corruption" Scheme for Preserving External Encoding when Non-Invertible Transformation Applied +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + A preliminary and simple implementation is: @quotation @@ -17602,6 +19623,8 @@ @subheading Another Error-Catching Idea +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + Nov 4, 1999 Finally, I don't think "save the input" is as hard as you make it out to @@ -17621,7 +19644,7 @@ @subheading Strategies for Error Annotation and Coding Orthogonalization -From sjt (?): +Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull} We really want to separate out 
a number of things. Conceptually, there is a nested syntax. @@ -17662,7 +19685,7 @@ @subheading Handling Writing a File Safely, Without Data Loss -From ben: +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} @quotation When writing a file, we need error detection; otherwise somebody @@ -17717,7 +19740,7 @@ @end enumerate @end quotation ---ben +Author: @uref{mailto:stephen@@xemacs.org,Stephen Turnbull} I don't much like Ben's scheme. First, this isn't an issue of I/O, it's a coding issue. It can happen in many places, not just on stream @@ -17749,11 +19772,151 @@ --sjt -@node Future Work -- BIDI Support, Future Work -- Localized Text/Messages, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets +@node Future Work -- Unicode, Future Work -- BIDI Support, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets +@subsection Future Work -- Unicode +@cindex future work, unicode +@cindex unicode, future work + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + +Following is an old proposal. Unicode has been implemented already, in +a different fashion; but there are some ideas here for more general +support, e.g. properties of Unicode characters other than their mappings +to particular charsets. + + +We recognize 128, [256], 128x128, [256x256] for source charsets; + +for Unicode, 256x256 or 16x256x256. + +In all cases, use tables of tables and substitute a default subtable +if entire row is empty. + +If destination is Unicode, either 16 or 32 bits. + +If destination is charset, either 8 or 16 bits. + +For the moment, since we only do 94, 96, 94x94 or 96x96, only do 128 +or 128x128 for source charsets and use the range 33-126 or 32-127. +(Except ASCII - we special case that and have no table because we can +algorithmically translate) + +Also have a 16x256x256 table -> 32 bits of Unicode char properties. + +A particular charset contains two associated mapping tables, for both +directions. 
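The "tables of tables" layout with a substituted default subtable can be pictured roughly as follows. This is a minimal Python sketch, not the XEmacs C implementation: the class name TwoLevelTable, the use of None for an unmapped entry, and the method names are all assumptions made for illustration.

```python
# Sketch of a "table of tables" for a 128x128 source charset.
# Assumption: an unmapped entry is represented by None.
ROW_SIZE = 128
UNMAPPED = None

# One shared, read-only default subtable stands in for every empty row.
DEFAULT_ROW = (UNMAPPED,) * ROW_SIZE


class TwoLevelTable:
    """Maps (row, column) byte pairs to integer code points."""

    def __init__(self):
        # Every row starts out pointing at the shared default subtable.
        self.rows = [DEFAULT_ROW] * ROW_SIZE

    def set(self, row, col, value):
        # Allocate a private subtable only on the first write to a row.
        if self.rows[row] is DEFAULT_ROW:
            self.rows[row] = [UNMAPPED] * ROW_SIZE
        self.rows[row][col] = value

    def get(self, row, col):
        # Lookups never need a special case: empty rows share DEFAULT_ROW.
        return self.rows[row][col]

    def allocated_rows(self):
        return sum(1 for r in self.rows if r is not DEFAULT_ROW)


table = TwoLevelTable()
table.set(0x45, 0x7E, 0x4E00)  # arbitrary example values
```

Only rows that actually receive an entry cost memory, which is the point of substituting a default subtable for rows that are entirely empty.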
+ +API is set-unicode-mapping: + +@example +(set-unicode-mapping + unicode char + unicode charset-code charset-offset + unicode vector of char + unicode list of char + unicode string of char + unicode vector or list of codes charset-offset +@end example + + Establishes a mapping between a unicode codepoint (an integer) and + one or more chars in a charset. The mapping is automatically + established in both directions. Chars in a charset can be specified + either with an actual character or a codepoint (i.e. an integer) + and the charset it's within. If a sequence of chars or charset + points is given, multiple mappings are established for consecutive + unicode codepoints starting with the given one. Charset codepoints + are specified as most-significant x 256 + least significant, with + both bytes in the range 33-126 (for 94 or 94x94) or 32-127 (for 96 + or 96x96), unless an offset is given, which will be subtracted from + each byte. (Most common values are 128, for codepoints given with + the high bit set, or -32, for codepoints given as 1-94 or 0-95.) + +Other APIs: + +@example +(write-unicode-mapping file charset) +@end example + + Write the mapping table for a particular charset to the specified + file. The tables are written in an internal format that allows for + efficient loading, for portability across platforms and XEmacs + invocations, for conserving space, for appending multiple tables one + directly after another with no need for a directory anywhere in the + file, and for recognizing a file as being in this format (with a magic + sequence at the beginning). The data will be appended at the end of + a file, so that multiple tables can be written to a file; remove the + file first to avoid this. + +@example +(write-unicode-properties file unicode-codepoint length) +@end example + + Write the Unicode properties (not including charset mappings) for + the specified range of contiguous Unicode codepoints to the end of + the file (i.e. 
append mode) in a binary format similar to what was + mentioned in the write-unicode-mapping description and with the same + features. + +Extension to set-unicode-mapping: + +@example +(set-unicode-mapping + list-or-vector-of-unicode-codepoints char + "" charset-code charset-offset + "" sequence of char + "" list-or-vector-of-codes +charset-offset +@end example + + The first two forms are conceptually the inverse of the forms above + to specify characters for a contiguous range of Unicode codepoints. + These new forms let you specify the Unicode codepoints for a + contiguous range of chars in a charset. "Contiguous" here means + that if we run off the end of a row, we go to the first entry of the + next row, rather than to an invalid code point. For example, in a + 94x94 charset, valid rows and columns are in the range 0x21-0x7e; + after 0x457c, 0x457d, 0x457e comes 0x4621, not something like 0x457f, + which is invalid. + + The final two forms are the most general, letting you specify an + arbitrary set of both Unicode points and charset chars, and the two + are matched up just like a series of individual calls. However, if + the lists or vectors do not have the same length, an error is + signaled. + +@example +(load-unicode-mapping file &optional charset) +@end example + + If charset is omitted, loads all charset mapping tables found and + returns a list of the charsets found. If charset is specified, + searches through the file for the appropriate mapping tables. (This + is extremely fast because each entry in the file gives an offset to + the next one). Returns t if found. + +@example +(load-unicode-properties file unicode-codepoint) +@end example + +@example +(list-unicode-entries file) +@end example + +@example +(autoload-unicode-mapping charset) +@end example + +... 
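The byte-offset convention and the "contiguous" row wrap-around described for set-unicode-mapping can be checked with a few lines of arithmetic. The following Python sketch is only an illustration; split_code and next_code are hypothetical helper names and are not part of any XEmacs API.

```python
# Valid byte range for a 94 or 94x94 charset (33-126).
LOW, HIGH = 0x21, 0x7E


def split_code(code, offset=0):
    """Split msb*256 + lsb into two bytes, subtracting OFFSET from each."""
    high, low = divmod(code, 256)
    return high - offset, low - offset


def next_code(code):
    """Return the code point following CODE in a 94x94 charset.

    Running off the end of a row moves to the first entry of the next row
    instead of producing an invalid code point.
    """
    high, low = divmod(code, 256)
    if low < HIGH:
        return code + 1
    if high >= HIGH:
        raise ValueError("no code point after the last entry of the charset")
    return (high + 1) * 256 + LOW


after_row_end = next_code(0x457E)              # wraps to the next row
high_bit_form = split_code(0xC5FE, 128)        # bytes given with high bit set
one_based_form = split_code(0x0101, -32)       # bytes given as 1-94
```

Here next_code(0x457E) yields 0x4621 rather than the invalid 0x457F, and both offset forms normalize to bytes in the 0x21-0x7E range.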
+ +(unfinished) + +@node Future Work -- BIDI Support, Future Work -- Localized Text/Messages, Future Work -- Unicode, Future Work -- Byte Code Snippets @subsection Future Work -- BIDI Support @cindex future work, bidi support @cindex bidi support, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @enumerate @item Use text properties to handle nesting levels, overrides @@ -17815,6 +19978,8 @@ @subsection Proposal for How This All Ought to Work +Author: @uref{mailto:jwz@@jwz.org,Jamie Zawinski} + this isn't implemented yet, but this is the plan-in-progress In general, it's accepted that the best way to internationalize is for all @@ -17862,7 +20027,7 @@ So, we should endeavor to minimize the impact on the lisp code. Certain primitive lisp routines (the stuff in lisp/prim/, and especially in -cmdloop.el and minibuf.el) may need to be changed to know about translation, +@file{cmdloop.el} and @file{minibuf.el}) may need to be changed to know about translation, but that's an ideologically clean thing to do because those are considered a part of the emacs substrate. @@ -17882,7 +20047,7 @@ already, and the translation will get done in some more centralized, lower level place. -This program (make-msgfile.c) addresses the first part, extracting the +This program (@file{make-msgfile.c}) addresses the first part, extracting the strings. For the emacs C code, we need to recognize the following patterns: @@ -17931,17 +20096,18 @@ is a commonly used wrapper around an eventual call to @code{message} or @code{read-from-minibuffer} needs to be recognized by this program. - @example (dgettext "domain-name" "string") #### do we still need this? 
things that should probably be restructured: - @code{princ} in cmdloop.el - @code{insert} in debug.el + @code{princ} in @file{cmdloop.el} + @code{insert} in @file{debug.el} face-interactive - help.el, syntax.el all messed up + @file{help.el}, @file{syntax.el} all messed up @end example +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + ben: (format) is a tricky case. If I use format to create a string that I then send to a file, I probably don't want the string translated. On the other hand, If the string gets used as an argument to (y-or-n-p) @@ -18053,14 +20219,16 @@ opposed to having been constructed at run time as it would in the above case.) To solve this: -@example - - @code{Fmessage()} takes a lisp string as its first argument. - If that string is a constant, that is, was read from a source file - as a literal, then it calls @code{message()} with it, which translates. - Otherwise, it calls @code{message_no_translate()}, which does not translate. - - - @code{Ferror()} (actually, @code{Fsignal()} when condition is Qerror) works similarly. -@end example +@itemize @bullet +@item +@code{Fmessage()} takes a lisp string as its first argument. +If that string is a constant, that is, was read from a source file +as a literal, then it calls @code{message()} with it, which translates. +Otherwise, it calls @code{message_no_translate()}, which does not translate. + +@item +@code{Ferror()} (actually, @code{Fsignal()} when condition is Qerror) works similarly. +@end itemize More specifically, we do: @@ -18102,7 +20270,7 @@ @item @code{snarf()} should be modified so that it doesn't output null strings and non-textual strings (see the comment at the top -of make-msgfile.c). +of @file{make-msgfile.c}). @item parsing of (insert) should snarf all of the arguments. @item @@ -18142,6 +20310,8 @@ @cindex future work, Lisp stream API @cindex Lisp stream API, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + Expose XEmacs internal lstreams to Lisp as stream objects. 
(In addition to the functions given below, each stream object has properties that can be associated with it using the standard put, get @@ -18532,6 +20702,8 @@ @cindex future work, multiple values @cindex multiple values, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + On low level, all funs that can return multiple values are defined with DEFUN_MULTIPLE_VALUES and have an extra parameter, a struct mv_context *. @@ -18577,6 +20749,8 @@ @cindex future work, macros @cindex macros, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @enumerate @item Option to control whether beep really kills a macro execution. @@ -18595,6 +20769,8 @@ @cindex future work, specifiers @cindex specifiers, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @subheading Ideas To Work On When Their Time Has Come @itemize @@ -18801,6 +20977,8 @@ @cindex future work, display tables @cindex display tables, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + #### It would also be really nice if you could specify that the characters come out in hex instead of in octal. Mule does that by adding a @code{ctl-hexa} variable similar to @code{ctl-arrow}, but @@ -18843,13 +21021,13 @@ Since more than one display table is possible, you have great flexibility in mapping ranges of characters. -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Making Elisp Function Calls Faster, Future Work -- Lisp Engine Replacement, Future Work -- Display Tables, Future Work @section Future Work -- Making Elisp Function Calls Faster @cindex future work, making Elisp function calls faster @cindex making Elisp function calls faster, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + @strong{Abstract: }This page describes many optimizations that can be made to the existing Elisp function call mechanism without too much effort. 
The most important optimizations can probably be implemented @@ -18951,10 +21129,7 @@ @end enumerate - -@end enumerate - - +@end enumerate The entire series of calls to @code{specbind()} should be inline and merged into the argument processing code as a single tight loop, with no @@ -18998,10 +21173,7 @@ @end enumerate - -@end enumerate - - +@end enumerate Other optimizations that could be done are: @@ -19085,8 +21257,6 @@ @end itemize -@uref{../../www.666.com/ben/default.htm,Ben Wing} - @node Future Work -- Lisp Engine Replacement, , Future Work -- Making Elisp Function Calls Faster, Future Work @section Future Work -- Lisp Engine Replacement @cindex future work, lisp engine replacement @@ -19095,6 +21265,7 @@ @menu * Future Work -- Lisp Engine Discussion:: * Future Work -- Lisp Engine Replacement -- Implementation:: +* Future Work -- Startup File Modification by Packages:: @end menu @node Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Lisp Engine Replacement, Future Work -- Lisp Engine Replacement @@ -19102,6 +21273,7 @@ @cindex future work, lisp engine discussion @cindex lisp engine discussion, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} @strong{Abstract: }Recently there has been a great deal of talk on the XEmacs mailing lists about potential changes to the XEmacs Lisp engine. @@ -19225,7 +21397,6 @@ many of the Lisp engines that are being considered have such a mechanism built into them? - @subsubheading Maintainability. A new Lisp engine might well improve the maintainability of XEmacs by @@ -19284,15 +21455,13 @@ a standard object system for Common Lisp, but it is extremely complex and difficult to understand. 
- -@uref{../../www.666.com/ben/default.htm,Ben Wing} - - -@node Future Work -- Lisp Engine Replacement -- Implementation, , Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement +@node Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Startup File Modification by Packages, Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement @subsection Future Work -- Lisp Engine Replacement -- Implementation @cindex future work, lisp engine replacement, implementation @cindex lisp engine replacement, implementation, future work +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + Let's take a look at the sort of work that would be required if we were to replace the existing Elisp engine in XEmacs with some other engine, for example, the Clisp engine. I'm assuming here, of course, that we @@ -19433,7 +21602,6 @@ define a new macro for use when calling a primitive. @end enumerate - @subsubheading Make the Existing Lisp Engine be Self-contained. The goal of this stage is to gradually build up a self-contained Lisp @@ -19641,8 +21809,64 @@ @end enumerate - -@uref{../../www.666.com/ben/default.htm,Ben Wing} +@node Future Work -- Startup File Modification by Packages, , Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Lisp Engine Replacement +@subsection Future Work -- Startup File Modification by Packages +@cindex future work, startup file modification by packages +@cindex startup file modification by packages, future work + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + +OK, we need to create a design document for all of this, including: + +PRINCIPLE #1: Whenever you have auto-generated stuff, @strong{CLEARLY} +indicate this in comments around the stuff. These comments get +searched for, and used to locate the existing generated stuff to +replace. Custom currently doesn't do this. + +PRINCIPLE #2: Currently, lots of functions want to add code to the +.emacs. (e.g. 
I get prompted for my mail address from +add-change-log-entry, and then prompted if I want to make this +permanent). There needs to be a Lisp API for working with arbitrary +code to be added to a user's startup. This API hides all the details +of which file to put the fragment in, where in it, how to mark it with +magical comments of the right kind so that previous fragments can be +replaced, etc. + +PRINCIPLE #3: @strong{ALL} generated stuff should be loaded before any +user-written init stuff. This way the user can override the generated +settings. Although in the case of customize, it may work when the +custom stuff is at the end of the init file, it surely won't work for +arbitrary code fragments (which typically do @code{setq} or the like). + +PRINCIPLE #4: As much as possible, generated stuff should be placed in +separate files from non-generated stuff. Otherwise it's inevitable +that some corruption is going to result. + +PRINCIPLE #5: Packages are encouraged, as much as possible, to work +within the customize model and store all their customizations there. +However, if they really need to have their own init files, these files +should be placed in .xemacs/, given normal names +(e.g. @file{saved-abbrevs.el} not .abbrevs), and there should be some magic +comment at the top of the file that causes it to get automatically +loaded while loading a user's init file. (Alternatively, the +above-named API could specify a function that lets a package specify +that they want such-and-such file loaded from the init file, and have +the specifics of this get handled correctly.) + +OVERARCHING GOAL: The overarching goal is to provide a unified +mechanism for packages to store state and setting information about +the user and what they were doing when XEmacs exited, so that the same +or a similar environment can be automatically set up the next time. 
+In general, we are working more and more towards being a truly GUI app +where users' settings are easy to change and get remembered correctly +and consistently from one session to the next, rather than requiring +nasty hacking in elisp. + +Hrvoje, do you have any interest in this? How about you, Martin? +This seems like it might be up your alley. This stuff has been +ad-hocked since kingdom come, and it's high time that we make this +work properly so that it could be relied upon, and a lot of things +could "just work". @node Future Work Discussion, Old Future Work, Future Work, Top @chapter Future Work Discussion @@ -19657,6 +21881,11 @@ @menu * Discussion -- garbage collection:: * Discussion -- glyphs:: +* Discussion -- Dialog Boxes:: +* Discussion -- Multilingual Issues:: +* Discussion -- Windows External Widget:: +* Discussion -- Packages:: +* Discussion -- Distribution Layout:: @end menu @node Discussion -- garbage collection, Discussion -- glyphs, Future Work Discussion, Future Work Discussion @@ -19664,16 +21893,11 @@ @cindex discussion, garbage collection @cindex garbage collection, discussion - -@example On Tue, Oct 12, 1999 at 03:36:59AM -0700, Ben Wing wrote: -@end example So what am I missing here? -@example In response, Olivier Galibert wrote: -@end example Two things: @enumerate @@ -19712,7 +21936,6 @@ move the markbit outside of the lrecord. @end itemize - The second solution is more appealing to me for a bunch of reasons: @itemize @bullet @item @@ -19745,12 +21968,9 @@ think, really cool. @end enumerate - -@example On 10/12/1999 5:49 PM Ben Wing wrote: Subject: Re: hashtable-based marking and cleanups -@end example OK, I can see the advantages. 
But: @@ -19795,14 +22015,13 @@ http://www.amazon.com/exec/obidos/ASIN/0471941484/qid=939775572/sr=1-1/002-3092633-2509405 @end example -@node Discussion -- glyphs, , Discussion -- garbage collection, Future Work Discussion +@node Discussion -- glyphs, Discussion -- Dialog Boxes, Discussion -- garbage collection, Future Work Discussion @section Discussion -- glyphs @cindex discussion, glyphs @cindex glyphs, discussion Some comments (not always pretty!) by Ben: -@example March 20, 2000 Andy, I use the tab widgets but I've been having lots of problems. @@ -19817,9 +22036,7 @@ wrong: If you don't reorder the buffer list, everything else gets screwed up. If you want the order of the tabs not to change, you need to decouple this order from the buffer list order. -@end example - -@example + March 23, 2000 I'm very confused. The SIGIO timer is used @strong{only} for C-g. It has @@ -19839,7 +22056,9 @@ like we already have. I think instead, you should introduce the following primitive: +@example (wait-for-event redisplay &rest event-specs) +@end example Waits for one of the event specifications specified to happen. Returns something about what happened. @@ -19847,11 +22066,16 @@ REDISPLAY controls the behavior of redisplay during waiting. Something like -- nil (never redisplay), -- t (redisplay when it seems appropriate), etc. +@itemize @bullet +@item +nil (never redisplay), +@item +t (redisplay when it seems appropriate), etc. +@end itemize EVENT-SPECS could be +@example t -- drain all non-user events, and then return any-process -- wait till input or state change on any process process -- wait till input or state change on process @@ -19861,6 +22085,7 @@ happened 'event -- wait till any event has happened '(event predicate) -- wait till event matching the predicate has happened +@end example The existing functions @code{next-event}, @code{next-command-event}, @code{accept-process-output}, @code{sit-for}, @code{sleep-for}, etc. 
could all be @@ -19870,18 +22095,14 @@ But you said something about need a magic event to invoke redisplay? Why is that? -@end example - -@example + April 2, 2000 the internal distinction between "widget" and "layout" is bogus. there exist widgets that do drawing and do layout of their children, e.g. group-box widgets and proper tab widgets. the only sensible distinction is between widgets with children and those without children. -@end example - -@example + April 5, 2000 andy, i'm not sure i really believe that you need to cycle the event @@ -19900,9 +22121,7 @@ in other words, dispatch-non-command-events must go, and i am proposing a general function (redisplay OBJECT) to replace the existing ad-hoc functions. -@end example - -@example + April 6, 2000 the tab widget code should simply be able to create a whole lot of tabs @@ -19911,9 +22130,7 @@ automatically map and unmap them as necessary, to fill up the available space. perhaps this already works and what you're doing is just for optimization? but i get the feeling this is not the case. -@end example - -@example + April 6, 2000 the function make-gutter-only-dialog-frame is bogus. the use of the @@ -19926,14 +22143,17 @@ also, these dialog boxes, and this function make-dialog-frame, should -a] be in dialog.el, not gutter-items.el. -b] when possible, be placed in the interactive spec of standard lisp -functions rather than accessed directly from menubar-items.el -c] wrapped in calls to should-use-dialog-box-p, so the user has control +@enumerate +@item +be in @file{dialog.el}, not gutter-items.el. +@item +when possible, be placed in the interactive spec of standard lisp +functions rather than accessed directly from @file{menubar-items.el} +@item +wrapped in calls to should-use-dialog-box-p, so the user has control over when dialog boxes appear. -@end example - -@example +@end enumerate + April 7, 2000 hmmm ... 
in that case, the whitespace absolutely needs to be specified @@ -19943,9 +22163,7 @@ translations in a different language. Your modus operandi should be "hardcoded pixel sizes are @strong{always} bad." -@end example - -@example + April 7, 2000 you mean the number of tabs adjusts, or the size of each tab adjusts (by @@ -19966,18 +22184,18 @@ i won't stop complaining until i see nearly every one of those pixel-width and pixel-height parameters gone, and the remaining ones there for a very, very good reason. -@end example - -@example + April 7, 2000 Andy Piper wrote: +@example > At 03:51 PM 4/6/00 -0700, Ben Wing wrote: > >[the function make-gutter-only-dialog-frame is bogus] > > The problem is that some of the callbacks and such need access to the > @strong{created} frame, so you end up in a catch 22 unless you do what I've done. +@end example [Ben proposes other ways to avoid exposing all the guts, as in @code{make-gutter-only-dialog-frame}:] @@ -20000,9 +22218,7 @@ box and its parent, and not have to worry about embedding it in at creation time. @end enumerate -@end example - -@example + April 15, 2000 I don't understand when you say "the various types of callback". Are you using the callback for various different purposes? @@ -20011,9 +22227,7 @@ take two arguments, one indicating the object to which the callback was attached (an image instance, i think), and the event that caused the callback to be invoked. -@end example - -@example + April 17, 2000 I am completely vetoing widget-callback-current-channel. How about you @@ -20028,6 +22242,1857 @@ the problem with this and everything you've proposed is that there's no way, of course, to get at the actual widget that you were invoked from. would you propose adding widget-callback-current-widget? 
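The two-argument callback convention suggested in the April 15 note (the object the callback is attached to, plus the triggering event) can be sketched as below. This is an illustration in Python, not glyph or widget code from XEmacs; Widget, Event, fire and on_activate are hypothetical names.

```python
class Event:
    def __init__(self, kind, channel):
        self.kind = kind          # e.g. "button-press"
        self.channel = channel    # frame or window the event arrived on


class Widget:
    def __init__(self, name, callback):
        self.name = name
        self.callback = callback

    def fire(self, event):
        # The widget passes itself and the event explicitly, so the
        # callback never has to consult global "current" state.
        return self.callback(self, event)


def on_activate(widget, event):
    # Both the originating widget and the channel are reachable
    # directly from the arguments.
    return (widget.name, event.kind, event.channel)


ok_button = Widget("ok", on_activate)
result = ok_button.fire(Event("button-press", "frame-1"))
```

Because the widget and the event are both handed to the callback, no accessor along the lines of widget-callback-current-channel or widget-callback-current-widget is needed in this arrangement.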
+ +@node Discussion -- Dialog Boxes, Discussion -- Multilingual Issues, Discussion -- glyphs, Future Work Discussion +@section Discussion -- Dialog Boxes +@cindex discussion, dialog boxes +@cindex dialog boxes, discussion + +@example +From: + Ben Wing <ben@@666.com> + 10/7/1999 5:57 PM + + Subject: + Re: Animated gif patch (2) + To: + Andy Piper <andy@@xemacs.org> + CC: + xemacs-review@@xemacs.org, xemacs-beta@@xemacs.org + + + + +The distinction between layouts and widgets makes no sense, so you should combine +the different data required. Consider a grouping widget. Is this a layout or a +widget? It draws, like a widget, but has children, like a layout. Same for a tab +widget, properly implemented. It draws, handles input, has children, and makes +choices about how to lay them out. + +ben + +From: + Ben Wing <ben@@666.com> + 9/7/1999 8:50 PM + + Subject: + Re: Layouts done + To: + Andy Piper <andyp@@beasys.com> + + + + +this sounds great! where can i see the code? + +as for user-defined layouts, you must certainly have some sort of abstraction +layer for layouts, with DEFINE_LAYOUT_TYPE or something similar just like device +types and such. If not, you should certainly make one ... it would have methods +such as query-geometry and do-layout. It should be easy to create a user-defined +layout if you have such an abstraction. + +with a user-defined layout, complex built-in layouts such as grid should not be +necessary because it's so easy to write snippets of lisp. + +as for the "redisplay too much" problem, perhaps you could put a dirty flag in +each glyph indicating whether it needs to be redisplayed, recalculated, etc.? + +Andy Piper wrote: + +> You may want to check them out. I haven't done the user-defined layout +> callback - I'm not sure what sort of API this could have. 
Keywords I've done: +> +> :orientation - vertical or horizontal +> :justify - left, center or right +> :border - etch-in, etch-out, bevel-in, bevel -out or text (which gives you +> etch-in with a title) +> +> You can embed any glyph type in a layout. +> +> There is probably room for improvements for justify to do grid-type layouts +> as per java. +> +> The only annoying thing is that I've hacked up font-lock support to do a +> progress gauge in the gutter area. I've used a layout to set things out +> correctly. The problem is if you change one of the sub-widgets, the whole +> layout gets redisplayed because it is treated as a single glyph by redisplay. +> +> Oh, and I've done line based scrolling so that glyphs scroll off the page +> in units of the average display line height rather than the whole line at +> once. This could easily be converted to pixel scrolling but would be very +> slow I fear. +> +> andy +> -------------------------------------------------------------- +> Dr Andy Piper +> Senior Consultant Architect, BEA Systems Ltd + + + + +From: + Ben Wing <ben@@666.com> + 8/10/1999 11:11 PM + + Subject: + Re: Widgets + To: + Andy Piper <andy@@xemacs.org> + + + + +I think you might have misinterpreted what i meant. I meant to say that XEmacs should +implement the @strong{concept} of a hierarchy of nested child "widgets" or "gui items" or +whatever we want to call them -- this includes container "widgets" such as grouping +widgets (which draw a border around the children, like in Windows), tab widgets, simple +layout widgets (invisible, but lay out their children appropriately), etc, plus leaf +"widgets" (buttons, sliders, etc., also standard Emacs windows). The layout calculations +for these widgets would be handled entirely by XEmacs in a window-system-independent way. 
+There is no need to create a corresponding hierarchy of window-system +widgets/controls/whatever if it's not required, and certainly no need to try to use the +window-system-supplied geometry management routines. It's absolutely necessary to support +this nesting concept in XEmacs, however, or it's impossible to have easily-designable +dialog boxes. On the other hand, I think it @strong{is} required to create much of this +hierarchy within the actual window system, at the very least for non-invisible container +widgets (tab, grouping, etc.), otherwise we will have very bogus, non-native-looking +containers like your current tab-widget implementation. It's critical for XEmacs to be +able to create dialog boxes in Windows or Motif that look just like those in any other +standard application. Otherwise people will continue to think that XEmacs is a +backwards-looking, badly implemented piece of software, which in many ways it is, +particularly in regards to its user interface. + +Perhaps we should talk on the phone? This typing is quite hard for me still. What hours +are you at work? My hours are approx. 2pm - 2am Pacific time (GMT - 7 hours currently). + +ben + + +From: + Ben Wing <ben@@666.com> + 7/21/1999 2:44 AM + + Subject: + Re: Tabs 'n widgets screenshot + To: + Andy Piper <andy@@xemacs.org> + CC: + xemacs-beta@@xemacs.org, wmperry@@aventail.com + + + + +This is real cool, but looking at this, it's clear that it doesn't look the +way tab widgets are supposed to work. In particular, of course, they should +have the proper borders around the stuff displayed. I've attached a screen +shot of a typical Windows dialog box with a tab widget in it. The problem +lies with this "expanded gutter" concept. Tabs are @strong{NOT} extra graphical junk +placed in the gutters of a buffer but are GUI objects with @strong{children} inside +of them. This is the right way to do things, and you would need no extra +gutter functionality at all for this. 
You just need to implement the concept +of GUI objects containing other GUI objects within them. One such GUI object +needs to be an "Emacs-text" GUI object, which is an Emacs window and contains a +buffer within it. At this level, you need not be concerned with the +complexities of geometry layout. The only change that needs to be made in the +overall strategy of frames, windows, etc. is that windows need not be exactly +contiguous and tiled, as long as they are contained within a frame. Or more +specifically: Given that you could always split a window contained inside a +GUI object, we just need to expand things so that each frame has @strong{multiple} +hierarchies of windows in it, rather than just one. A hierarchy of windows +can nest inside of another window -- e.g. I put a tab widget or a text widget +inside of a buffer. This should be easy to implement -- just change things so +there are multiple hierarchies of windows where there is now one, each (except +the top-level one) being rooted inside some other window. + +Anyone willing to implement this? Andy? + + +From: + Ben Wing <ben@@666.com> + 6/30/1999 3:30 PM + + Subject: + Re: Focus Help! + To: + Andy Piper <andy@@xemacs.org> + CC: + Ben Wing <ben@@xemacs.org>, martin@@xemacs.org, andyp@@beasys.com + + + + +It sounds like you're doing very good work. It also sounds like the approach +you have followed is the correct one. Now, it seems like there isn't really +that much work left to get dialog boxes working. What you really just need to +do is implement container widgets, that is to say, subwindows that can contain +other subwindows. For example, the tab widget works this way. (It sounds like +you have already implemented tab widgets, so I don't quite see how you've done +this without the concept of container widgets.) So you might just try adding a +framework for container widgets and then implementing very simple container +widgets. The basic container widgets are: + +1. 
A vertical-layout widget, which draws nothing itself and lays out its +children one above the next. +2. A horizontal-layout widget, which draws nothing itself and lays out its +children side-to-side. +3. A box (or "grouping") widget, which draws a rectangle around its single child +and optionally draws some text on the top or bottom line of the rectangle. +4. A tab widget, which displays a series of tabs horizontally at the top of its +area, and then below it places one of its children, +corresponding to the selected tab. +5. A user widget, which draws nothing itself and does no layout at all on its +children, except that it has a "layout callback" +property, a Lisp function, so that the programmer can control the layout. + +The framework is as follows: + +1. Every widget has at least the following properties: + a) a size, whose value can be "unspecified", which might be implemented +using the value -1. The default value should be "unspecified". + b) whether it's mapped, i.e. whether it will be displayed. (Some container +widgets, such as the tab widget, set the mapped +property themselves on their children. Others, such as the vertical and +horizontal layout widgets, don't change this property but pay attention to it, +and ignore completely all children marked as unmapped.) The default value should +be "true". + c) whether its size can be changed by another widget's layout routine. The +default value should be "true". + d) a layout procedure, which (potentially at least) determines the size of +the widget as well as the position, size and mappedness of its child widgets. +The layout procedure is inherent in the widget and is not an external property +of the widget (except in the case of the "user widget"): it is instead more like +the redisplay callback that each widget has. +2. Every container widget contains a property which is a list of child widgets. +3. 
Every child widget contains the following properties: + a) a position indicating where the child is located relative to the top +left corner of its parent. The position's value can be "unspecified", which +might be implemented using the value -1. The default value should be +"unspecified". + b) whether its position can be changed by another widget's layout routine. +The default value should be "true". +4. All of the properties just listed (except possibly the layout procedure) can +be modified directly by the programmer, and there are no proscriptions against +doing so. However, if the programmer wants to resize, reposition, map or unmap +a widget in such a way that the layout of all the other widgets in the tree +changes appropriately, he should use a special function to change the property, +as described below. + +The redisplay mechanism pays attention to the position, size, and mappedness +properties and to the hierarchy of widgets, mapping, resizing and repositioning +the corresponding subwindows (the "real representation" of the widgets) as +necessary. It also pays attention to the hierarchy of the widgets, making sure +that container subwindows get drawn before their child subwindows. When it +encounters widgets with an unspecified size, it should not draw them, and should +issue a warning. When it encounters widgets with an unspecified position, it +should draw them at position (0, 0) and should issue a warning. + +The above framework should be fairly simple to implement and is basically +universal across all high-level windowing system toolkits. The stickiness comes +with what procedures you follow for getting the layout done. + +Andy, I understand that implementing this may seem like a daunting task. +Therefore, I propose that at first you implement the above framework but don't +implement any of the layout procedures, or any of the functions that call them: +Just make them stubs that do nothing.
This way, the Lisp programmer can still +create any dialog boxes he wants, he just has to set the sizes and positions of +all the widgets explicitly, and then recompute them whenever the widget tree is +resized (once you get around to allowing this). I have a lot more to write +about exactly how the layout procedures work, but I'll send that to you later +once you're ready. + +You should also think about making a way to have widget trees as top-level +windows rather than just glyphs in a buffer. There's already the concept of +"popup" frames. You could provide an easy way to create a popup frame with no +menu, toolbars, scrollbars, modeline or minibuffer, and put a single glyph in +the displayed buffer that takes up the whole Emacs window. + +Ben + + + + +March 20, 2000 + +You wrote to me awhile ago about this and asked about documentation, and I +dictated a response but never got it sent, so here it is: + +I don't think there's any more documentation on how things work under Xt but it +should be clear. The EmacsFrame widget is the widget corresponding to the X +window that Emacs draws into and there is a handler for expose events called +from Xt which arranges for the invalidated areas to get redrawn. I think this +used to happen as part of the handler itself but now it is delayed until the +next call to redisplay. + +However, one thing that you absolutely must not do is remove the Xt support. +This would be an incredibly unfriendly thing to do as it would prevent people +from using any widget set other than Qt or GTK. Keep in mind that people run +XEmacs on all sorts of different versions of X in Unix, and Xt is the standard +and the only toolkit that probably exists on all of these systems. + +Pardon me if I've misunderstood your intentions w.r.t. this. + +As for how you would implement GTK support, it will not be very hard to convert +redisplay to draw into a GTK window instead of an Xt window. 
In fact redisplay +basically doesn't know about Xt at all, except in the portion that handles +updating menubars and scrollbars and stuff that's directly related to Xt. + +What you'd probably want to do is create a new set of event routines to replace +the ones in event-Xt.c. On the display side you could conceivably create a new +device type but you probably wouldn't want to do that because it would be an +externally visible change at the Lisp level. You might simply want to put a +flag on each frame indicating what sort of toolkit the frame was created under +and put conditions in the redisplay code and the code to update toolbars and +menubars and so forth to test this flag and do the appropriate thing. + + +April 12, 2000 + +This is way cool, buuuuutttttttt ............. + +what we @strong{really} need is the GUI interface on top of it. I've taken a shot at +it with generic-print-buffer +(print-buffer is taken by lpr, which is such a total mess that it needs to be +trashed; or at least, the generic +stuff in this package needs to be taken out and properly genericized). For +the moment, generic-print-buffer +just does something like what Kirill's been posting if we're running Windows, +and uses lpr otherwise. However, what we absofuckinglutely need is a Lisp +interface onto @code{EnumPrinters()} so that we can get the +list of printers and have a nice menu listing the available printers, and you +can check the one you want. People in the Windows world don't normally even +know the names of their local printers! + +Kirill, given what I've done in @file{simple.el} and @file{menubar-items.el}, do you think +you could add the @code{EnumPrinters()} +support and fix up the GUI? If you don't feel comfortable with the GUI, at +least do the @code{EnumPrinters()}. + +But ... Kirill, I tried your formula for printing and nothing happened. +Perhaps I didn't call redisplay-frame or something? You need to fix this up +and make it work for multi-page documents.
(Again, this is in +generic-print-buffer.) Nothing special, it just needs to fucking work! There +are zillions and zillions of postings every day on xemacs-nt about how to get +printing working, and none seem to refer to the built-in support. + +ben + + +April 19, 2000 + +Kirill 'Big K' Katsnelson wrote: + +> Some time ago, Ben Wing wrote... +> >kirill, the interface i created is more general, like this: +> +> [snip] +> +> >Unfortunately I haven't implemented much of this; just some of the file +> >dialog box. but i think +> >this is better than creating new mswindows-specific primitives. if you +> >are interested in working on +> >this, i'll send you the code i have. +> +> Sure. Can you just commit it for my starting point? +> +> >also, the dialogs shouldn't have anything directly to do with the printer +> >device. all they should +> >do is return a set of values. it's the caller's responsibility to +> >interpret them and set device +> >properties accordingly. this way, there's a complete separation between +> >the underlying +> >functionality and the gui. +> +> Unfortunately. I thought about doing it this way, but we then lose a lot of +> printer-specific setup in this case. The DEVMODE structure contains two +> parts: printer independent, as defined by SDK typedef DEVMODE, and +> some trailing bytes, of unknown structure, used by a driver. The driver +> only returns the extra length it wants. Such options as PCL ReT resolution +> enhancement options or PostScript negative output are not available +> through the standard part of the devmode structure, and stored in the +> driver part (printer dialogs are driver-specific). +> +> So we have total of three options: +> - Not to implement options beyond standard DEVMODE +> - Make DEVMODE a Lisp object. +> - Hide DEVMODE inside the device object. +> +> First case looks cheesy. 
Letting DEVMODE fall off the printer is no good +> either, since one needs both the device and the devmode to edit the +> devmode, and they must match. I am still convinced that the devmode and +> the printer should not be separated. + +hmm, i see ... this completely breaks abstraction though. it fails in various +scenarios, e.g. a program wants to initialize the dialog box with certain +non-driver-specific properties, without caring about the particular printer. + +i think you should create a new print-properties object that encapsulates all +printer properties (which can be changed using get/put), including the printer +name, and contains a DEVMODE in it. if the printer name gets changed, the +DEVMODE might change too, but the print-properties object itself stays the +same. you pass this object as a parameter to the dialog box, and it gets +changed accordingly. you can call something like set-device-print-properties to +stick everything in this structure into the device. (you could imagine a case +where someone wanted to keep multiple print configurations around ...) + +> +> +> Big K + +-- +Ben + +@end example + +@node Discussion -- Multilingual Issues, Discussion -- Windows External Widget, Discussion -- Dialog Boxes, Future Work Discussion +@section Discussion -- Multilingual Issues +@cindex discussion, multilingual issues +@cindex multilingual issues, discussion + +@example + + 4/10/2000 4:13 AM + +BTW I am planning on adding some more powerful font-mapping capabilities to +XEmacs (i.e. how do we map particular characters to the proper fonts that can +display them, and how do we map the character's codes to the indices into the +font). 
These will replace the hackish charset-registry/charset-ccl-program stuff +we currently have, and be [a] much more powerful, [b] designed in a +window-system-independent way, [c] works with specifiers so you can control the +mapping of individual buffers, and [d] works on a character rather than charset +level, to correctly handle Unicode. One possible usage would be to declare that +all latin1 in a particular buffer be displayed with latin2 fonts; I bet +Hrvoje would really appreciate that + +--------------------------------------------------------------------------- + +April 10, 2000 + +[info from "creation of generic macros for accessing internally formatted data"] + +Hmm, so there I just wrote a detailed design for the macros. I would be +@strong{THRILLED} and overjoyed if you went ahead and implemented this mechanism, or +parts of it. + +I've just finished arranging for a new transcriptionist, and soon I should be +able to send off and get back my dictation of my (a) exposing streams to lisp, +and (b) allowing for proper lisp-created coding systems, which define their +reading, writing, and detecting methods in lisp. + + +BTW How's it going wrt your Unicode and decode-priority stuff? + +And ... you sent me mail asking what it was you had promised me, and listed +only one thing, which was +profiling of vm and certain other operations you found showed tremendous +slowdown with Japanese characters. The other main thing I want from you is + +-- Your priorities, as an actual Japanese user and XEmacs developer, +concerning what MULE work should be done, how it should be done, in what +order, etc. + +I'm sure there's something else, but it's been awhile since I took my sleeping +dose and my brain can barely function anymore. Just let me know how you're +going to proceed with the above macro changes. + +BTW there's some nice Perl scripts written by Martin and fixed by me to make +global-search-and-replace +much, much easier. I've attached them.
The first one is a shell script that +works like + +gr foo bar *.[ch] + +and replaces foo with bar in all of the files. For each modified file, a +backup is created in the backup/ directory, which is created as necessary. +This shell script is a fairly trivial front end onto global-replace2, which is +a perl script that takes one argument (a Perl expression such as s/foo/bar/g) +and a list of files obtained by reading the stdin, and does the same global +replacement. This means that the regexp syntax used here has to be perl-style +rather than standard emacs/grep style. + +ben + +--------------------------------------------------------------------- + + +From: + Ben Wing <ben@@666.com> + 12/23/1999 3:34 AM + + Subject: + Re: check process state before accessing coding_stream (fix PR#1061) + To: + "Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp> + CC: + XEmacs Developers <xemacs-beta@@xemacs.org> + + + + +Thankfully, nearly all of this horridity you bring up is irrelevant. In +XEmacs, "gettext" does not refer to any standard API, but is merely a stand-in +for a translation routine (presumably written by us). We may as well call it +something else. We define our own concept of "current language". We also +allow for a function that needs a different version for each language, which +handles all cases where simple translation isn't sufficient, e.g. when you +have to pluralize some noun given to you or insert the correct form of the +definite article. No weird hacks needed. No interaction problems with other +pieces of software. + +What I wrote "awhile ago" is (unfortunately) not anywhere public currently, +but it's on my list to put it on the web site. "There you go again" is +usually not true; most of what I quote was indeed put out publicly at some +point, but I'll try to be more explicit about this in the future. + +ben + +"Stephen J. Turnbull" wrote: + +> >>>>> "Ben" == Ben Wing <ben@@666.com> writes: +> +> Ben> "Stephen J. 
Turnbull" wrote: +> +> >> What I have in mind is not just gettext-izing everything in the +> >> XEmacs core sources. I currently believe that to be +> >> unacceptable +> +> Ben> I don't quite understand. Could you elaborate and give some +> Ben> examples? +> +> Examples? Hmm. +> +> First, there's the surface of Jan's y-or-n-p example. You have to +> coordinate the translation of the message string and the response +> prompt. This is handled by y-or-n-p itself (I see that we already do +> have gettext for Emacs Lisp, that's nice to know). +> +> Except that it's not really handled by y-or-n-p. There's no reason to +> suppose that somebody writing a Lisp package would necessarily use the +> XEmacs domain (in fact, due to the way gettext binds text domains---if +> I understand that correctly---we don't want that to be the case, +> because it means that every time a Lisp package is updated the whole +> XEmacs catalog must also be updated). So which domain gets used for +> the message string? +> +> In the current implementation, it is the domain of y-or-n-p. So +> packages with their own domain won't get y-or-n-p prompts correctly +> translated. But that means that the package should do its own +> translation. But now you're applying gettext to the same string +> twice; you just have to pray that the translator upstream doesn't +> collide with an English string that's in the XEmacs domain. (The +> gettext docs mention the similar problem of English words with +> multiple meanings that must map to different words in the target +> language; this can be disambiguated by various trickeries in forming +> the strings ... but only if you "own" them, which in the multi-domain, +> iterated gettext example you do not.) AFAICT this means that you +> must never pass untranslated strings across public APIs, but this may +> or may not be reasonable, and certainly is inconvenient.
+> +> Next, we have to translate the possible answer strings to match the +> language being passed by the user. This is presumably OK here, +> because it's done by y-or-n-p. But what if y-or-n-p returned a string +> rather than a boolean? Then we would need to coordinate the +> presentation of the prompt (done by y-or-n-p) and the translation of +> the possible answer strings (done by the caller). This can in fact be +> done using dgettext with the XEmacs domain, but you must know that +> y-or-n-p is in the XEmacs domain. This is not necessarily going to be +> obvious, and it might very well be that sets of related packages might +> have the same domain, so you wouldn't necessarily know which domain is +> appropriate by looking at the requires. +> +> And what happens if one domain does supply translations for a language +> and the other does not? AFAIK, gettext has no way to find out if this +> is the case. But you might very well prefer a global fallback to +> English if substantial phrases are drawn from both domains, while you +> might prefer string-by-string fallback if the main text is translated +> and only a few words are left to fall back to English. +> +> Aside from confusing users, this puts a great burden on programmers. +> Programmers need to know about the status of the domains of packages +> they use as well as the XEmacs domain; they need to program +> defensively against the possibility that some package they use will +> become gettext-ized, or the translation projects will be out of synch +> (some teams will do the calling package first, others will do the +> caller package first). +> +> I don't think anybody will use gettext in these circumstances. At +> least not after they get the first bug report that "XEmacs is stuck in +> an infinite y-or-n-p loop and I can't get out." +> +> Ben> I wrote this awhile ago: +> +> "There you go again." Not anywhere I could see it!
(At least, it +> doesn't look familiar and grepping the archives doesn't turn it up.) +> +> OK, you win. Subscribe me to xemacs-review. Or whatever seems +> appropriate. +> +> -- +> University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN +> Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091 +> _________________ _________________ _________________ _________________ +> What are those straight lines for? "XEmacs rules." + +-- +In order to save my hands, I am cutting back on my responses, especially +to XEmacs-related mail. You _will_ get a response, but please be patient. +If you need an immediate response and it is not apparent in your message, +please say so. Thanks for your understanding. + + + +-------------------------------------------------------------------- + + +From: + Ben Wing <ben@@666.com> + 12/21/1999 2:22 AM + + Subject: + Re: check process state before accessing coding_stream (fix PR#1061) + To: + "Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp> + CC: + XEmacs Developers <xemacs-beta@@xemacs.org> + + + + + +"Stephen J. Turnbull" wrote: + +> >>>>> "Ben" == Ben Wing <ben@@666.com> writes: +> +> Ben> Implementing message translation is not that hard. +> +> What I have in mind is not just gettext-izing everything in the XEmacs +> core sources. I currently believe that to be unacceptable (see Jan's +> message for the pitfalls in I18N; it's worse for M17N). I think +> really solving this problem needs a specifier-like fallback mechanism +> (this would solve Jan's example because you could query the +> text-specifier presenting the question for the affirmative and +> negative responses, and the catalog-building mechanism would have +> checks to make sure they were properly set, perhaps a locale +> (language) argument), and gettext is just not sufficient for that. + +I don't quite understand. Could you elaborate and give some examples? + +> +> +> At a minimum, we need to implement gettext for Lisp packages. 
+> (Currently, gettext is only implemented for C AFAIK.) But this could +> potentially cause more trouble than it's worth. +> +> Ben> A lot depends on priority: How important do you think this +> Ben> issue is to your average Japanese/Chinese/etc. user? +> +> Which average Japanese (etc) user? The English-skilled (relatively) +> programmer in the free software movement, or my not-at-all-competent +> undergrad students who I would love to have using an Emacs? This is a +> really important ease-of-use issue. +> +> Realistically, for Japanese, it's low priority. The Japanese team in +> the GNU Translation Project is doing very little AFAIK, so even if the +> capability were there, I doubt the message catalog would soon be done. +> +> But I think that many non-English speakers would find it very +> attractive, and for many languages there are well-organized and +> productive translation teams. I suspect that if the I18N facility +> were well-designed, many Western European languages would have full +> catalogs within a year (granted, they are the ones where it's least +> needed :-( ). +> +> Personally, I think doing it well is hard, and of little benefit to +> _current_ core XEmacs constituency. I think doing a good job, with +> catalogs, would be very attractive to many non-English-speaking +> _potential_ users. +> +> Ben> How does it compare to some of the other important Mule +> Ben> issues that Martin and I are (trying to work) on? +> +> I don't know what you guys are _trying_ to work on. Everything in the +> I18N section of "Architecting XEmacs" is red-flagged. OTOH, it's +> clear from your posts that you are overburdened, so I can't read +> priority into the fact that you've responded to specific issues in the +> past. + +I wrote this awhile ago: + + +> +> Ben> The big question is, would you be willing to help do the +> Ben> actual implementation, to "be my hands"?
+> +> Sure, subject to the usual caveat that I'd need to be convinced it's +> worth doing and a secondary caveat that I am not an experienced coder. + +If you'll implement it, I'll design it. It's more a case of will on your part +than anything else. I can give you instructions sufficient enough to match +your level of expertise. + +ben + +> +> +> -- +> University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN +> Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091 +> _________________ _________________ _________________ _________________ +> What are those straight lines for? "XEmacs rules." + +-- +In order to save my hands, I am cutting back on my responses, especially +to XEmacs-related mail. You _will_ get a response, but please be patient. +If you need an immediate response and it is not apparent in your message, +please say so. Thanks for your understanding. + + + +----------------------------------------------------------------------------- + +Dec 20, 1999 + + +Implementing message translation is not that hard. I've already done a lot of +preliminary work in places such as @file{make-msgfile.lex} in lib-src/. Finishing up +the work is not that big a task; I already know exactly how it should be +done. Perhaps I'll write up detailed design instructions for this, as I'm +doing for other things. A lot depends on priority: How important do you think +this issue is to your average Japanese/Chinese/etc. user? How does it compare +to some of the other important Mule issues that Martin and I are (trying to +work) on? If I did the design document, would you be willing to do the +necessary bit of C hackery to implement the document? If the design document +is not specific enough for you, I can give you an "implementation document" +which will definitely be specific enough: i.e. I'll show you exactly where the +code needs to be modified, and how. The big question is, would you be willing +to help do the actual implementation, to "be my hands"? 
+ +--------------------------------------------------------------------------- + +From: + Ben Wing <ben@@666.com> + 12/14/1999 11:00 PM + + Subject: + Re: Mule UI disaster: displaying character tables + To: + Hrvoje Niksic <hniksic@@iskon.hr> + CC: + XEmacs vs Mule <xemacs-mule@@xemacs.org> + + + + +What I mean is, please put my name in the header, as well as xemacs-mule. +That way I'll see it in my personal box. + +I agree that Mule has problems, but: + +Brokenness can be fixed. +Slowness can be fixed. +Limitations can be fixed. + +The design limitation you mention below, for example, is not really very +hard to change. + +Keep in mind that I pretty much rewrote Mule from scratch, and did it +@strong{all} in 6-7 months. In comparison with that, the changes below are +pretty minor, and each could be done by a good (and able-bodied!) +programmer familiar with the Mule code in less than a week -- to the +XEmacs code, at least. The problem is, everyone who could do this work is +instead spending their time complaining about Mule problems instead of +doing things. + +I'll gladly help out anyone who wants to do Mule coding by explaining all +the details; I'll even write a "Mule internals manual", if that will +help. I can also make international phone calls -- they're cheap here in +the US due to the long distance wars. But so far no one has asked me for +help or shown any willingness to do any work on Mule. + +Perhaps people are daunted by the seeming vastness of the problems. But I +wager that if I had another 6 months to work on nothing but Mule, it would +be nearly perfect. The basic design of the XEmacs C code is good; +incremental changes, without over-much concern for compatibility, could +make huge strides in a short amount of time (as was the case the whole +time I worked on it, esp. towards the end -- it didn't even @strong{compile} for +4 months!). A "total rewrite" would be an incredible waste of time. 
+ +Again, I'm completely willing to provide help, documentation, design +improvement suggestions (ala Architecting XEmacs -- which seems to have +been completely ignored, alas), etc. + +ben + +Hrvoje Niksic wrote: + +> Ben Wing <ben@@666.com> writes: +> +> > I'm the one who did most of the Mule work in XEmacs, so if you have +> > any questions about the core, please address them to me directly. I +> > can probably give you a very clear and detailed answer. +> +> Thanks. I think it still makes sense to ask here, so that other +> developers have a chance to chime in. +> +> > However, I need some explanation. What's misdesigned that you're +> > complaining about? And what's the coding-system disaster? +> +> It's been spoken of a lot. Basically: +> +> * Unlike XEmacs/no-Mule, XEmacs/Mule doesn't preserve binary files in +> Latin 2 locales by default. This is annoying for users who are used +> to XEmacs/no-Mule. +> +> * XEmacs/Mule is much slower than XEmacs, and not only because of +> character/byte conversions. It seems that font lookups etc. are +> slower. +> +> * The "coding-system disaster" refers to inherent limitations of the +> coding-system model. If I understand things correctly, +> coding-systems convert streams of bytes to streams of Emchars. It +> does not appear to be possible to create a "gzip" coding system for +> handling gzipped files. Even EOL conversions look kludgish: +> +> iso-2022-8 +> iso-2022-8-dos +> iso-2022-8-mac +> iso-2022-8-unix +> iso-2022-8bit-ss2 +> iso-2022-8bit-ss2-dos +> iso-2022-8bit-ss2-mac +> iso-2022-8bit-ss2-unix +> iso-2022-int-1 +> iso-2022-int-1-dos +> iso-2022-int-1-mac +> iso-2022-int-1-unix +> +> Ideally, it should be possible to specify a stream of +> coding-systems, where only the last one converts to actual Emchars. +> +> There are more problems I don't remember right now. Many many usage +> problems become apparent when I stand and look over the shoulders of +> an XEmacs user who tries to use Mule.
+ +-- +In order to save my hands, I am cutting back on my responses, especially +to XEmacs-related mail. You _will_ get a response, but please be patient. + +If you need an immediate response and it is not apparent in your message, +please say so. Thanks for your understanding. + + + +----------------------------------------------------------------------- + + + + +From: + Ben Wing <ben@@666.com> + 12/14/1999 12:20 AM + + Subject: + Re: Mule UI disaster: displaying character tables + To: + "Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp> + CC: + XEmacs vs Mule <xemacs-mule@@xemacs.org> + + + + +I think you should go ahead with your proposal, and assume it will get +implemented. I don't think Martin is really suggesting that API changes not +be allowed, but just that they proceed in a somewhat orderly fashion; and in +any case, I imagine I have final say in cases of Mule-related conflicts. + +ben + +"Stephen J. Turnbull" wrote: + +> >>>>> "Hrvoje" == Hrvoje Niksic <hniksic@@iskon.hr> writes: +> +> Hrvoje> So next I tried the "Mule" menu. That's right, boys and +> Hrvoje> girls, I've never looked at it before. +> +> For quite a while, it didn't work at all, led to crashes and other +> warm/fuzzy things. IIRC there used to be a top level menu item +> pointing to information about the current language environment but it +> got removed. +> +> Hrvoje> Wow. Seeing shift_jis, iso-2022 variants and (above all +> Hrvoje> things) big5 makes me really warm and fuzzy. +> +> We've been through this recently---you were there. We know what to do +> about it, basically (Ben liked my proposal, and it would fix this +> silliness as well as the binary file breakage). But given that Ben +> and Martin seem to have different ideas about where to go with Mule +> (Ben seemed to be supporting API and implementation revisions, Martin +> evidently wants to keep the current Mule), working on that proposal is +> possibly a waste of time. 
I've got other stuff on my plate and I'll +> get back to it one of these days (not tomorrow but sooner than Real +> Soon Now). +> +> Hrvoje> The items it presents (leading to further submenus) are: +> +> Hrvoje> 94 character set +> Hrvoje> 94 x 94 character set +> Hrvoje> 96 character set +> +> This _is_ bad UI, now that you point it out. But it is quite natural +> for a coding system lawyer (as all Japanese users have to be), I never +> noticed it before. Easy enough to fix ("raise my karma"). +> +> Hrvoje> But I do bear some Mule scars, so I happily select "96 +> Hrvoje> character sets", then ISO8859-2. And I get this: +> +> [Table omitted] +> +> Hrvoje> So me wonders: what the hell is this? +> +> Huh? That is the standard table that you see over and over again in +> references. I'll believe you if you say you've never seen one before, +> but every Japanese users' manual has dozens of pages of those, using +> exactly that format. +> +> The presentation in the range 00--7F is not unreasonable for Latin 2; +> ISO-8859 is a version of ISO-2022, so the high bit should not be +> interpreted as "+ x80" (technically speaking), it should be +> interpreted as a character set shift. +> +> Of course, this doesn't make sense to anybody but a character set +> lawyer, and so should be changed. Especially since the header refers +> to ISO-8859-2 which everybody these days thinks of as _one, 8-bit_ +> character set, not two 7-bit ones. +> +> As for the "Japanese" in the table, that's just a really stupid +> "optimization": those happen to be line-drawing characters available +> in JIS X 0208, to make pretty borders. Substitute "-", "+", and "|" +> in appropriate places to make ugly but portable borders. +> +> Hrvoje> Mule is just broken. Warn your friends. +> +> Hrvoje is on the rampage again. 
Warn your friends ;-) +> +> -- +> University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN +> Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091 +> _________________ _________________ _________________ _________________ +> What are those straight lines for? "XEmacs rules." + +-- +In order to save my hands, I am cutting back on my responses, especially +to XEmacs-related mail. You _will_ get a response, but please be patient. +If you need an immediate response and it is not apparent in your message, +please say so. Thanks for your understanding. + + + +--------------------------------------------------------------------------- + +From: + Ben Wing <ben@@666.com> + 12/14/1999 10:28 PM + + Subject: + Re: Autodetect proposal; specifer questions/suggestions + To: + "Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp> + + + + +I've always thought the specifier API is too complicated (and too +"write-only"), but I went back at one point well after I designed it and I +couldn't figure out an obvious way to simplify it that still kept reasonable +functionality. Perhaps that's what Custom did, and why it turned out bad. + +Inefficiency is a stupid reason not to use them. They seem efficient enough +for redisplay. Changing them might be inefficient, but Emacs Lisp is in +general, right? + +Can you propose an API or functionality change that will make them more used? + + + +"Stephen J. Turnbull" wrote: + +> >>>>> "Ben" == Ben Wing <ben@@666.com> writes: +> +> Ben> I think you should go ahead with your proposal, and assume it +> Ben> will get implemented. +> +> OK. "yas baas" ;-) +> +> On something totally different. I'm really bothered by the fact that +> specifiers are so little used (eg, Custom reimplements them badly), +> and the fact that every package seems to define its own set of faces +> (or whatever), rather than use the specifier mechanism to inherit from +> existing ones, or add new specifications to existing ones. API problem? 
+> +> Also, faces (maybe specifiers in general?) should have an autoload +> mechanism, and a @file{<package>-faces.el} (or @file{<package>-specifiers.el}) +> convention. There are a number of faces in (eg) Custom that I like to +> use, but I have to load Custom to get them. And Custom should be able +> to somehow see all the faces in various packages available, even when +> they are not loaded. +> +> I've seen claims that specifiers aren't very efficient. +> +> Opinions? +> +> -- +> University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN +> Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091 +> _________________ _________________ _________________ _________________ +> What are those straight lines for? "XEmacs rules." + +-- +In order to save my hands, I am cutting back on my responses, especially +to XEmacs-related mail. You _will_ get a response, but please be patient. +If you need an immediate response and it is not apparent in your message, +please say so. Thanks for your understanding. + + +----------------------------------------------------------------------------- +From: + Ben Wing <ben@@666.com> + 11/18/1999 9:02 PM + + Subject: + Re: Char-related crashes (hopefully) fixed + To: + "Stephen J. Turnbull" <turnbull@@sk.tsukuba.ac.jp> + CC: + XEmacs Beta List <xemacs-beta@@xemacs.org> + + + + +OK, in summation: + +1. C-q is a user-level function and should do whatever makes the most sense. +2. int-char is a low-level primitive and should never depend on high-level +settings like language environment. +3. Everything you can do with int-char can and should be done with make-char +-- representation-independent, much less likelihood of bugs, etc. Therefore +int-char should be removed. +4. Note that CLTL2 also removes int-char. +5. Your statement + +> In one-byte buffers (either Olivier's 1/2/4 extension or `xemacs -font +> *-iso8859-2') it implicitly will have dependence whatever you say. + +is confusing internal and external representations. 
+ +ben + +"Stephen J. Turnbull" wrote: + +> Can somebody give a bunch of examples where using integers as +> characters is useful? For that matter, where they are actually used? +> Ben said "backward compatibility," but I haven't seen this used, and I +> don't really know how to grep for it. I have grepped for int-char, +> int-to-char, char-int, and char-to-int and they're pretty rare in the +> core and package code (2/3 of it) that I have. +> +> The only one that I ever use is the C-q hack for inserting characters +> by code value at the keyboard, and that could arguably (and in +> Japanese invariably is) delegated to an input method which would know +> about language environment (and return a true character). +> +> For iterating over a character set in "natural" order, only ASCII +> satisfies the requirement of having one, and even that's shaky. AFAIK +> the Swedes and the Norwegians, or is it the Danes, disagree on +> ordering the _letters_ in ISO-8859-1 character set. This really +> should be table-driven, and will have to be for everything except +> ASCII and ISO-8859-1 if we go to a Unicode internal representation. +> +> We already have primitives for efficient case conversion and the like. +> +> The only example I can think of offhand where you would really really +> want the facility is to iterate over a code space where you don't know +> which points are legal characters. Eg, to print out tables of fonts. +> Pretty specialized. And this can be done through make-char, anyway. +> +> According to CLtL1, the main portable use for char-int is for hashing. +> But that doesn't square with the kind of usage we've been talking +> about (in loops and the like). +> +> What else am I missing? +> +> Ben's desiderata have some problems. +> +> >>>>> "Ben" == Ben Wing <ben@@666.com> writes: +> +> Ben> Either int-char should be the mirror opposite of char-int +> Ben> (i.e. accept all legal char integers), or it should be +> Ben> removed entirely. +> +> OK. 
I agree with this. +> +> Ben> int-char should @strong{never} have any dependence on the language +> Ben> environment. +> +> In one-byte buffers (either Olivier's 1/2/4 extension or `xemacs -font +> *-iso8859-2') it implicitly will have dependence whatever you say. +> Even without Mule, people can always use external encoders to change +> raw ISO-8859-2 to ISO-2022 (not that anybody sane ever would, OK, +> Hrvoje?). Then the two files will be interpreted differently in a +> Latin-1 locale Mule; the ISO-8859-2 file will be recognized as +> ISO-8859-1, and the ISO-2022 file will be internally interpreted as +> ISO-8859-2. +> +> The point is that people normally assume that int-char should accept +> their "natural" integer to character map. For Americans, that's +> ASCII, for Germans, that's ISO-8859-1, for Croatians, that's +> ISO-8859-2. And it works "correctly" in a no-mule XEmacs with `-font +> *-iso8859-2'! Japanese usually use ku-ten or JIS, and there's a +> "natural" map from byte-sized integer pairs to shorts, but it's full +> of holes. So language environments don't agree on what a legal char +> integer is, and where they do (eg, ISO-8859-1 and ISO-8859-2), they +> don't agree on the map. To satisfy your dictum (with which I agree, +> but I take to mean we should get rid of these functions) we can take +> the intersection where they agree +> +> ==> legal char integers == ASCII +> +> which is what I prefer, or pick something arbitrary and efficient +> +> ==> char-int returns the internal representation +> +> which I really hate, or something else. Suggestions? +> +> Ben> I don't think C-q should either. If Hrvoje wants to insert +> Ben> Latin-2 characters by number, then make C-u C-q work so that +> Ben> it also prompts for a character set, with a default chosen +> Ben> from the language environment. +> +> And restrict this to ASCII? Or assume Latin-1 in GR if there is no +> prefix argument? +> +> This is a useful feature. 
C-q currently inserts Latin-2 characters +> for Hrvoje in no-mule XEmacs (stretching the point only a little); I +> think it should continue to do so in Mule. This really is an input +> method issue, not a keyboard issue. In XEmacs, inserting an integer +> into a buffer has no meaning. Users insert characters. So this is a +> completely different issue from the programming API, and should not be +> considered analogous. +> +> Maybe we could have C-q insert according to the Unicode standard, and +> treat C-u C-q as part of the input method. But I think most users +> would prefer to have C-q insert according to their locale-standard +> tables, and select Unicode explicitly using the C-u C-q idiom. In +> fact (again this points to the input method idea), Japanese users +> would probably like to have the alternatives of using kuten (pairs +> from 1--94 x 1--94) or JIS (pairs from 0x21--0x7E x 0x21--0x7E) as +> options since both indexing systems are common in tables. +> +> -- +> University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN +> Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091 +> __________________________________________________________________________ +> __________________________________________________________________________ +> What are those two straight lines for? "Free software rules." + +-- +ben + +-- +In order to save my hands, I am cutting back on my responses, especially to +XEmacs-related mail. You +_will_ get a response, but please be patient. If you need an immediate +response and it’s not apparent in +your message, please say so. Thanks for your understanding. 
+ + + +----------------------------------------------------------------------------- + +From: + Ben Wing <ben@@666.com> + 11/16/1999 11:03 PM + + Subject: + Re: Char-related crashes (hopefully) fixed + To: + Yoshiki Hayashi <t90553@@m.ecc.u-tokyo.ac.jp> + CC: + Hrvoje Niksic <hniksic@@iskon.hr>, + XEmacs Beta List <xemacs-beta@@xemacs.org> + + + + +Either int-char should be the mirror opposite of char-int (i.e. accept all +legal char integers), or it should be removed entirely. + +int-char should @strong{never} have any dependence on the language environment. + +I don't think C-q should either. If Hrvoje wants to insert Latin-2 +characters by number, then make C-u C-q work so that it also prompts for a +character set, with a default chosen from the language environment. + +ben + +Yoshiki Hayashi wrote: + +> Hrvoje Niksic <hniksic@@iskon.hr> writes: +> +> > As Ben said, now that we've fixed the actual bugs, we can think about +> > changing the behaviour for int-char conversions for 21.2. +> +> Following are proposed which integers should be accepted +> where characters are expected: +> +> 1) Don't allow anything +> 2) Accept 0-127 +> 3) Accept 0-256 +> 4) Accept everything +> +> Other things proposed are: +> +> a) When doing C-q, treat 128-256 as Latin-2 in Latin 2 +> language environment. +> +> So far, most of the proposal is intended to apply to every +> int-char conversions, I'd like to make some functions to +> accept. +> +> My plan is: +> Accept only 0-256 in every place except int-to-char. +> int-to-char accepts every valid integers. +> Make new function which does int-to-char conversion +> correctly according to the language environment. +> +> This way, most of the code which does (insert (1+ ?a)) or +> something continues working. Now internal representation is +> changed a little bit, so disabling > 256 characters will +> warn those who are dealing with internal representation +> directly, which is bad. 
Still, you can do +> (let ((i 1442)) +> (while (< i 2000) +> (insert (int-to-char i)) +> (setq i (1+ i)))) +> to achieve old behaviour. +> +> For C-q, I'm not for changing its original definition, +> since it might confuse people who are expecting Latin-1 in +> other language environment and typing just 1 integer doesn't +> make sense for multibyte world. It's cleaner to make new +> function, which does make-char according to the charset of +> language-info-alist so that people who use that often can +> bind it to C-q or some other keys. +> +> -- +> Yoshiki Hayashi + +-- +ben + +-- +In order to save my hands, I am cutting back on my responses, especially to +XEmacs-related mail. You +_will_ get a response, but please be patient. If you need an immediate +response and it’s not apparent in +your message, please say so. Thanks for your understanding. + + + +@end example + +@node Discussion -- Windows External Widget, Discussion -- Packages, Discussion -- Multilingual Issues, Future Work Discussion +@section Discussion -- Windows External Widget +@cindex discussion, windows external widget +@cindex windows external widget, discussion + +@example + +Subject: + Re: External Widget Support for Xemacs on nt + Date: + Sat, 08 Jul 2000 01:47:14 -0700 + From: + Ben Wing <ben@@666.com> + To: + Timothy.Fowler@@msdw.com + CC: + xemacs-nt@@xemacs.org + References: + 1 + + + + +Nothing is currently done for external widget support under XEmacs but it should +not be too hard to do and would be a great addition to XEmacs. What you would +probably want to do is create an XEmacs control that has an interface something +like the built-in edit control and which communicates to an existing XEmacs +process using DDE.
(Basically you would modify XEmacs so that it registered +itself as a DDE server accepting external widget requests, and then the external +edit control would simply send a DDE request and the result would be a handle of +some sort used for future communication with that particular XEmacs process.) + +There are two basic issues in getting the external widget to work, which are +display and input. Although I am not completely sure, I have a feeling that it +is possible for one process to write into the window of another process, simply +by using that window's HWND handle. If so it should be extremely easy to get the +output working (this is exactly the approach used under Xt). For input, you +would probably again want to do what is done under Xt, which is that the client +widget simply passes all of the appropriate messages to the XEmacs server +process using whatever communication channel was set up, e.g. DDE, and the +XEmacs server processes them normally. Very few modifications would be needed to +the XEmacs source code and all of the necessary modifications could be done +simply by looking for existing external widget code in XEmacs. + +If you are interested in continuing this, I will certainly give you any support +you need along the way. This would be a great project to be added to XEmacs. + + + +Timothy Fowler wrote: + +> I am looking into external widget support for xemacs nt similar to that +> existing in xemacs for X +> Have any development efforts been made in this direction in the past? +> Is there any current effort? +> Any insight into the complexity of achieving this? +> Any comments would be greatly appreciated +> Thanks +> Tim Fowler + +-- +Ben + +In order to save my hands, I am cutting back on my mail. I also write +as succinctly as possible -- please don't be offended. If you send me +mail, you _will_ get a response, but please be patient, especially for +XEmacs-related mail.
If you need an immediate response and it is not +apparent in your message, please say so. Thanks for your understanding. + +See also http://www.666.com/ben/chronic-pain/ + + +Subject: + RE: External Widget Support for Xemacs on nt + Date: + Mon, 10 Jul 2000 12:40:01 +0100 + From: + "Alastair J. Houghton" <ajhoughton@@lineone.net> + To: + "Ben Wing" <ben@@666.com>, <xemacs-nt@@xemacs.org> + CC: + <Timothy.Fowler@@msdw.com> + + + + +> -----Original Message----- +> From: owner-xemacs-nt@@xemacs.org [mailto:owner-xemacs-nt@@xemacs.org]On +> Behalf Of Ben Wing +> Sent: 08 July 2000 09:47 +> To: Timothy.Fowler@@msdw.com +> Cc: xemacs-nt@@xemacs.org +> Subject: Re: External Widget Support for Xemacs on nt +> +> Nothing is currently done for external widget support under +> XEmacs but it should +> not be too hard to do and would be a great addition to XEmacs. +> What you would +> probably want to do is create an XEmacs control that has an +> interface something +> like the built-in edit control and which communicates to an +> existing XEmacs +> process using DDE. + +It would be @strong{much} better to use RPC or COM rather than DDE - and +also it would provide a more useful interface to XEmacs (like the +Microsoft rich text edit control that is used by Wordpad). It +would probably also be easier... + +> If you are interested in continuing this, I will certainly give +> you any support +> you need along the way. This would be a great project to be added +> to XEmacs. + +I agree. This would be a *really useful* thing to do... + +Regards, + +Alastair. + +____________________________________________________________ +Alastair Houghton ajhoughton@@lineone.net + +Subject: + Re: External Widget Support for Xemacs on nt + Date: + Mon, 10 Jul 2000 22:56:06 -0700 + From: + Ben Wing <ben@@666.com> + To: + "Alastair J. Houghton" <ajhoughton@@lineone.net> + CC: + xemacs-nt@@xemacs.org, Timothy.Fowler@@msdw.com + References: + 1 + + + + +sounds good. 
i don't know too much about windows ipc methods, so i suggested +dde just as an example. + +"Alastair J. Houghton" wrote: + +> > -----Original Message----- +> > From: owner-xemacs-nt@@xemacs.org [mailto:owner-xemacs-nt@@xemacs.org]On +> > Behalf Of Ben Wing +> > Sent: 08 July 2000 09:47 +> > To: Timothy.Fowler@@msdw.com +> > Cc: xemacs-nt@@xemacs.org +> > Subject: Re: External Widget Support for Xemacs on nt +> > +> > Nothing is currently done for external widget support under +> > XEmacs but it should +> > not be too hard to do and would be a great addition to XEmacs. +> > What you would +> > probably want to do is create an XEmacs control that has an +> > interface something +> > like the built-in edit control and which communicates to an +> > existing XEmacs +> > process using DDE. +> +> It would be @strong{much} better to use RPC or COM rather than DDE - and +> also it would provide a more useful interface to XEmacs (like the +> Microsoft rich text edit control that is used by Wordpad). It +> would probably also be easier... +> +> > If you are interested in continuing this, I will certainly give +> > you any support +> > you need along the way. This would be a great project to be added +> > to XEmacs. +> +> I agree. This would be a *really useful* thing to do... +> +> Regards, +> +> Alastair. +> +> ____________________________________________________________ +> Alastair Houghton ajhoughton@@lineone.net + +-- +Ben + +In order to save my hands, I am cutting back on my mail. I also write +as succinctly as possible -- please don't be offended. If you send me +mail, you _will_ get a response, but please be patient, especially for +XEmacs-related mail. If you need an immediate response and it is not +apparent in your message, please say so. Thanks for your understanding. 
+ +See also http://www.666.com/ben/chronic-pain/ + +@end example + + +@node Discussion -- Packages, Discussion -- Distribution Layout, Discussion -- Windows External Widget, Future Work Discussion +@section Discussion -- Packages +@cindex discussion, packages +@cindex packages, discussion + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + +@subheading Important package-related changes + +This file details changes that make the package system no longer an +unmitigated disaster. This way, at the very least, people can +essentially ignore the package system and not get bitten horribly the +way they currently do. + +@enumerate +@item +A single tarball containing absolutely everything and named +xemacs-21.2.68.tar.gz. This must contain absolutely everything, +including all of the packages, and in the proper directory +structure, so that the paradigm for + +untar; configure; make; make install + +just works. + +@item +Fixed startup slowdown when all packages are installed so that +there is absolutely no penalty to having them all installed. This +may be hard. + +@item +All files on the ftp site should be accessible through http. + +@item +Put symlinks into the distribution directory to the appropriate +files in the package directory. + +@item +Eliminate the confusing SUMO name, choosing a much more obvious +name such as all-packages. + +@item +There should be no separation of mule and non-mule packages. + +@item +Having 2 packages that conflict with each other should be +completely disallowed. + +@item +Fix vc and ps-print so that there is only ONE version. + +@item +Fix up all of the READMEs on the distribution site to make it +abundantly clear what needs to be obtained, where to get it, and +how to install it, especially with regards to packages. 
+@end enumerate + +@node Discussion -- Distribution Layout, , Discussion -- Packages, Future Work Discussion +@section Discussion -- Distribution Layout +@cindex discussion, distribution layout +@cindex distribution layout, discussion + + +@example +From: + Ben Wing <ben@@666.com> + 10/15/1999 8:50 PM + + Subject: + VOTE: Absolutely necessary changes to file naming in releases + To: + SL Baur <steve@@xemacs.org>, + XEmacs Reviews <xemacs-review@@xemacs.org> + + + + +Everybody except Steve seems to agree that we need to provide a single +tar file containing the entire XEmacs tree whenever we release a new +version of XEmacs (beta or not). Therefore I propose the following +simple changes, and ask for a vote. If it is the general will of the +developers, then Steve @strong{WILL} make these changes. This is the +definition of cooperative development -- no one, not even the +maintainer, can assert absolute power over anything. + +I propose (assuming, for example, release 21.2.20): + +1. xemacs-21.2.20.tar.gz -> xemacs-21.2.20-core.tar.gz + +2. xemacs-sumo.tar.gz -> xemacs-packages.tar.gz + +3. xemacs-mule-sumo.tar.gz -> xemacs-mule-packages.tar.gz + +4. Symlinks to the files mentioned in #2 and #3 get created in the SAME +directory as xemacs-21.2.20-*.tar.gz. + +5. MOST IMPORTANTLY, a new file xemacs-21.2.20.tar.gz gets created, +which is the combination of the 5 files xemacs-21.2.20-core.tar.gz, +xemacs-21.2.20-elc.tar.gz, xemacs-21.2.20-info.tar.gz, +xemacs-packages.tar.gz, and xemacs-mule-packages.tar.gz. + + +The directory structure of the new combined file xemacs-21.2.20.tar.gz +would look like this: + +xemacs-21.2.20/ +xemacs-packages/ +xemacs-mule-packages/ + + +I am sorry to shout, but the current situation is just completely +insane. 
+ +ben + + + + + + +From: + Ben Wing <ben@@666.com> + 10/16/1999 3:12 AM + + Subject: + Re: VOTE: Absolutely necessary changes to file naming in releases + To: + SL Baur <steve@@xemacs.org>, + XEmacs Reviews <xemacs-review@@xemacs.org>, + "Michael Sperber [Mr. Preprocessor]" <sperber@@informatik.uni-tuebingen.de> + + + + +Something went wrong with my mail program while I was responding, so +Michael's response is not quoted here. + +Let me rephrase my proposal, stressing the important points in order of +importance: + +1. MOST IMPORTANT: There MUST be a SINGLE tar file containing the complete +XEmacs sources, packages, etc. The name of this tar file must have a +format like this: + +xemacs-21.2.10.tar.gz + +The directory layout of the packages within it is not important as long as +it works: The user who downloads the tar file MUST be able to apply the +'configure; make; make install' paradigm at the top-level directory and +have it work properly. + +2. All the pieces of XEmacs must be in the @strong{same} subdirectory on the FTP +site. + +3. The names need to be obvious and standard. Naming the core files +"xemacs-21.2.20.tar.gz" is non-standard because those are only the core +files. The standard followed by everybody in the world is that a name like +this refers to the entire product, with all ancillary files. Also, "sumo", +although a nice in-joke, is extremely confusing and needs to go. + +Referring to Michael's point about the layout I proposed, I also think that +the package system needs to be modified to accept a layout produced by the +"obvious" way of obtaining and untarring the parts, which leaves you with a +directory consisting of + +xemacs-21.2.19/ +xemacs-packages/ +mule-packages/ + +All at the same level. However, this is an independent issue from the vote +at hand. + + +Consider the current insanity. 
The new XEmacs user or beta tester goes to +the FTP site, looks around, finds the file xemacs-21.2.19.tar.gz, and +downloads it, because it looks like the obvious one to get. But it doesn't +work. Oops ... He looks some more and finds the other two -elc and -info +parts, grabs them, and then tries again. But it still doesn't work. He +manages to overhear something about packages, so he looks for them, but +doesn't find them immediately (they're not even in the beta tree, though +they obviously contain beta-level code, especially in xemacs-base and +mule-base). Eventually he discovers the package/ subdirectory, but what +the hell does he do there? There's no README at all there giving any +clues, so he downloads everything. Along with this, he gets some files +called "sumo", which he doesn't understand, but he notices that some of +them are extremely large. "sumo" ... "large" ... hehe, I get it. Some +silly developer's joke. But then he tries again to compile things, and +just can't figure things out. He still doesn't know: + +-- "sumo" is not just some large file, but is a tar file of all the +packages. +-- The packages can't be placed in any subdirectory in any obvious relation +to the XEmacs directory ("straight out of the box" if you manage to grok +the significance of the sumo files, you get a layout like + +xemacs-21.2.19/ +xemacs-packages/ +mule-packages/ + +which naturally doesn't work! He needs to put them underneath +xemacs-21.2.19/lib/xemacs/ or something.) + +At this point, he gives up, and (if he was a user of a pre-packagized +XEmacs) wonders in despair how things got so messed up, when all older +XEmacs releases, including all the betas, followed the standard "configure; +make; make install" paradigm. + + + +Soooooo ......... PLEASE vote on issues #1-3 above, and add any comments +you feel like adding.
+ +ben + +Ben Wing wrote: + +> Everybody except Steve seems to agree that we need to provide a single +> tar file containing the entire XEmacs tree whenever we release a new +> version of XEmacs (beta or not). Therefore I propose the following +> simple changes, and ask for a vote. If it is the general will of the +> developers, then Steve @strong{WILL} make these changes. This is the +> definition of cooperative development -- no one, not even the +> maintainer, can assert absolute power over anything. +> +> I propose (assuming, for example, release 21.2.20): +> +> 1. xemacs-21.2.20.tar.gz -> xemacs-21.2.20-core.tar.gz +> +> 2. xemacs-sumo.tar.gz -> xemacs-packages.tar.gz +> +> 3. xemacs-mule-sumo.tar.gz -> xemacs-mule-packages.tar.gz +> +> 4. Symlinks to the files mentioned in #2 and #3 get created in the SAME +> directory as xemacs-21.2.20-*.tar.gz. +> +> 5. MOST IMPORTANTLY, a new file xemacs-21.2.20.tar.gz gets created, +> which is the combination of the 5 files xemacs-21.2.20-core.tar.gz, +> xemacs-21.2.20-elc.tar.gz, xemacs-21.2.20-info.tar.gz, +> xemacs-packages.tar.gz, and xemacs-mule-packages.tar.gz. +> +> The directory structure of the new combined file xemacs-21.2.20.tar.gz +> would look like this: +> +> xemacs-21.2.20/ +> xemacs-packages/ +> xemacs-mule-packages/ +> +> I am sorry to shout, but the current situation is just completely +> insane. +> +> ben + + + +From: + Ben Wing <ben@@666.com> + 12/6/1999 4:19 AM + + Subject: + Re: Please Vote on Proposals + To: + Kyle Jones <kyle_jones@@wonderworks.com> + CC: + XEmacs Review <xemacs-review@@xemacs.org> + + + + +OK Kyle, how about a different proposal: + +1. The distribution consists of the following three parts (let's assume +v21.2.25): + +-- xemacs-21.2.25-core.tar.gz + The same as would currently be in xemacs-21.2.25.tar.gz. You can + run this editor and edit in fundamental mode, but not do anything +else.
+ +-- xemacs-21.2.25-core-packages.tar.gz + A useful and complete subset of all the possible packages. Selection +of + what goes in and what goes out is based partially on consensus, +partially + on vote, and partially on these criteria: + + -- commonly-used packages go in. + -- unmaintained or out-of-date packages go out. + -- buggy, poorly-written packages go out. + -- really obscure packages that hardly anybody could possibly care + about go out. + -- when there are two or three packages implementing basically the + same functionality, pick only one to go in unless there are two +that + both are really commonly-used. + -- if a package can be loaded implicitly as a result of something in +the + core, it needs to go in, regardless of whether it's been +maintained. + This applies, for example, to the mode files -- @strong{all} mode +packages must + go in (or more properly, every mode must have a corresponding +package + that's in, although if there are two or more packages implementing +a + particular mode, e.g. html, we are free to choose just one). + +-- xemacs-21.2.25-aux-packages.tar.gz + All of the packages not in the previous file. Generally +crappy-quality, + poorly-maintained code. + +Note, we do not make distinctions between Mule and non-Mule in our +packaging scheme -- this is a bug and XEmacs and/or the packages should +be fixed up so that this goes away. + +2. The distribution also contains two combination files: + +-- xemacs-21.2.25.tar.gz + This is the "default" file that a naive user ought to retrieve, and + he'll get a running XEmacs, just like he wants, and comfortable, too, + because all of the common packages are there. This file is a +combination + of xemacs-21.2.25-core.tar.gz and xemacs-21.2.25-core-packages.tar.gz. + +-- xemacs-21.2.25-everything.tar.gz + This file contains absolutely everything, like it advertises -- + including the aux packages and all of their associated crappy-quality, + + unmaintained code. 
This file is a combination of +xemacs-21.2.25-core.tar.gz, + xemacs-21.2.25-core-packages.tar.gz, and +xemacs-21.2.25-aux-packages.tar.gz. + + +I like this proposal better than the previous one I advocated, because it +follows your good suggestion of separating the wheat from the chaff in +the packages, so to speak. People will grab xemacs-21.2.25.tar.gz by +default, just like they should, +and they'll get something they're quite happy with, and we're happy +because we can exercise quality control over the packages and exclude the +crappy ones most likely to cause grief later on. + + +What say y'all? + +ben + + + +Kyle Jones wrote: + +> Ben Wing writes: +> > Disagree. Please let's follow everyone else's convention, and not +> > introduce yet another randomness. +> +> It is not randomness! I think this is a semantic issue and an +> important one. The issue is: What do we consider part of XEmacs +> and what is considered external to XEmacs. If you put all the +> packages in xemacs.tar.gz, then users can reasonably and wrongly +> assume that all this random Lisp code is maintained by us. We +> are trying to stay away from that model because in the past it has +> left us with piles and piles of orphaned code. Even if every one +> of us were paid to maintain XEmacs, it is just not practical for +> us to continue to maintain all that code, let alone any new code. +> So I think the naming distinction Jan is making is worth doing. +> +> Also, I don't consider the current situation broken, except +> perhaps the sumo tarball being out of date. I never, ever, +> thought it was a great idea to ship all the stuff that XEmacs +> shipped in the old days. Because this pile of code was always +> around in the distribution, an enormous web of undocumented +> dependencies was constructed. Eventually, you HAD to install +> everything because if you left something out or removed something +> you never knew when XEmacs would throw an error. Thus the Cult +> of the Cargo was born.
+> +> One of the best things that came out of the package system was +> the month or two we spent running XEmacs without all the assorted +> Lisp installed. Dependencies were removed or documented, some +> stuff got retired, and for the first time we actually had a full +> accounting of what we were shipping. I currently run XEmacs with +> 7 packages and I don't miss the other stuff. +> +> Having come this far, I do not think we should go back to +> advocating that everyone just install everything and not +> think about what they are doing. Besides saving space and startup +> time, another reason to not install everything is that you +> won't bloat your XEmacs process nearly as much if you go +> exploring in the Custom menus, because there won't be as much +> Lisp loaded as Custom sets up its groups and whatnot. + +-- +In order to save my hands, I am cutting back on my responses, especially +to XEmacs-related mail. You _will_ get a response, but please be +patient. +If you need an immediate response and it is not apparent in your message, + +please say so. Thanks for your understanding. @end example @node Old Future Work, Index, Future Work Discussion, Top @@ -20041,19 +24106,21 @@ they may discuss relevant design issues, alternative implementations, or work still to be done. 
- @menu -* Future Work -- A Portable Unexec Replacement:: -* Future Work -- Indirect Buffers:: -* Future Work -- Improvements in support for non-ASCII (European) keysyms under X:: -* Future Work -- xemacs.org Mailing Address Changes:: -* Future Work -- Lisp callbacks from critical areas of the C code:: +* Old Future Work -- A Portable Unexec Replacement:: +* Old Future Work -- Indirect Buffers:: +* Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X:: +* Old Future Work -- RTF Clipboard Support:: +* Old Future Work -- xemacs.org Mailing Address Changes:: +* Old Future Work -- Lisp callbacks from critical areas of the C code:: @end menu -@node Future Work -- A Portable Unexec Replacement, Future Work -- Indirect Buffers, Old Future Work, Old Future Work -@section Future Work -- A Portable Unexec Replacement -@cindex future work, a portable unexec replacement -@cindex a portable unexec replacement, future work +@node Old Future Work -- A Portable Unexec Replacement, Old Future Work -- Indirect Buffers, Old Future Work, Old Future Work +@section Old Future Work -- A Portable Unexec Replacement +@cindex old future work, a portable unexec replacement +@cindex a portable unexec replacement, old future work + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} @strong{Abstract:} Currently, during the build stage of XEmacs, a bare version of the program (called @dfn{temacs}) is run, which loads up a @@ -20181,10 +24248,12 @@ @code{free} function when freeing dynamically-allocated data, depending on whether this data was allocated by us or by the -@node Future Work -- Indirect Buffers, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- A Portable Unexec Replacement, Old Future Work -@section Future Work -- Indirect Buffers -@cindex future work, indirect buffers -@cindex indirect buffers, future work +@node Old Future Work -- Indirect Buffers, Old Future Work -- Improvements in support for non-ASCII 
(European) keysyms under X, Old Future Work -- A Portable Unexec Replacement, Old Future Work +@section Old Future Work -- Indirect Buffers +@cindex old future work, indirect buffers +@cindex indirect buffers, old future work + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} An indirect buffer is a buffer that shares its text with some other buffer, but has its own version of all of the buffer properties, @@ -20260,12 +24329,12 @@ iterating over a buffer, and then all of the indirect children of that buffer. -@node Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- xemacs.org Mailing Address Changes, Future Work -- Indirect Buffers, Old Future Work -@section Future Work -- Improvements in support for non-ASCII (European) keysyms under X -@cindex future work, improvements in support for non-ascii (european) keysyms under x -@cindex improvements in support for non-ascii (european) keysyms under x, future work - -From Martin Buchholz. +@node Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work -- RTF Clipboard Support, Old Future Work -- Indirect Buffers, Old Future Work +@section Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X +@cindex old future work, improvements in support for non-ascii (european) keysyms under x +@cindex improvements in support for non-ascii (european) keysyms under x, old future work + +Author: @uref{mailto:martin@@xemacs.org,Martin Buchholz} If a user has a keyboard with known standard non-ASCII character equivalents, typically for European users, then Emacs' default @@ -20284,6 +24353,7 @@ This is implemented by maintaining a table of translations between all the known X keysym names and the corresponding (charset, octet) pairs. 
+@quotation For every key on the keyboard that has a known character correspondence, we define the ascii-character property of the keysym, and make the default binding for the key be self-insert-command. @@ -20295,11 +24365,92 @@ In a non-Mule world, a user can still have a multi-lingual editor, by doing (set-face-font "...-iso8859-2" (current-buffer)) for all their Latin-2 buffers, etc. - -@node Future Work -- xemacs.org Mailing Address Changes, Future Work -- Lisp callbacks from critical areas of the C code, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work -@section Future Work -- xemacs.org Mailing Address Changes -@cindex future work, xemacs.org mailing address changes -@cindex xemacs.org mailing address changes, future work +@end quotation + +@node Old Future Work -- RTF Clipboard Support, Old Future Work -- xemacs.org Mailing Address Changes, Old Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work +@section Old Future Work -- RTF Clipboard Support +@cindex old future work, RTF clipboard support +@cindex RTF clipboard support, old future work + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + +in fact, i merged the windows stuff with the already-existing generic code. 
+ +what i'd like to see is something like this: + +@enumerate +@item +The current function + +@example +(defun own-selection (data &optional type append) +@end example + +should become + +@example +(defun own-selection (data &optional type how-to-add data-type) +@end example + +where data-type is the mswindows format, and how-to-add is + +@example +'replace-all or nil -- remove data for all formats +'replace-existing -- remove data for DATA-TYPE, but leave other formats alone +'append or t -- append data to existing data in DATA-TYPE, and leave other +formats alone +@end example + +@item +the function + +@example +(get-selection &optional TYPE DATA-TYPE) +@end example + +already has a data-type so you don't need to change it. + +@item +the existing function + +@example +(selection-exists-p &optional SELECTION DEVICE) +@end example + +should become + +@example +(selection-exists-p &optional SELECTION DEVICE DATA-TYPE) +@end example + +@item +a new function + +@example +(register-selection-data-type DATA-TYPE) +@end example + +like your mswindows-register-clipboard-format. + +@item +there's already a selection-converter-alist, but that's only for data out. +you should alias it to selection-conversion-out-alist, and create +selection-conversion-in-alist. these alists contain entries for CF_TEXT, which +handles CR/LF conversion, and rtf, which does rtf in/out conversion -- no need +for separate functions to do this. + +this may seem daunting, but it's much less hard to add stuff like this than it +seems, and i and others will certainly give you lots of support if you run into +problems. it would be way cool to have a more powerful clipboard mechanism in +XEmacs. 
+@end enumerate + +@node Old Future Work -- xemacs.org Mailing Address Changes, Old Future Work -- Lisp callbacks from critical areas of the C code, Old Future Work -- RTF Clipboard Support, Old Future Work +@section Old Future Work -- xemacs.org Mailing Address Changes +@cindex old future work, xemacs.org mailing address changes +@cindex xemacs.org mailing address changes, old future work + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} @subheading Personal addresses @@ -20382,12 +24533,13 @@ @uref{../../www.666.com/ben/default.htm,Ben Wing} -@node Future Work -- Lisp callbacks from critical areas of the C code, , Future Work -- xemacs.org Mailing Address Changes, Old Future Work -@section Future Work -- Lisp callbacks from critical areas of the C code -@cindex future work, lisp callbacks from critical areas of the c code -@cindex lisp callbacks from critical areas of the c code, future work - -@example +@node Old Future Work -- Lisp callbacks from critical areas of the C code, , Old Future Work -- xemacs.org Mailing Address Changes, Old Future Work +@section Old Future Work -- Lisp callbacks from critical areas of the C code +@cindex old future work, lisp callbacks from critical areas of the c code +@cindex lisp callbacks from critical areas of the c code, old future work + +Author: @uref{mailto:ben@@xemacs.org,Ben Wing} + There are many places in the XEmacs C code where Lisp functions are called, usually because the Lisp function is acting as a callback, hook, process filter, or the like. The lisp code is often called in @@ -20423,61 +24575,62 @@ The sets of dangerous operations which can be prohibited are: -OPERATION_GC_PROHIBITED -1. garbage collection. When this flag is set, and the garbage - collection threshold is reached, garbage collection simply doesn't - happen. It will happen at the next opportunity that it is allowed. - Similarly, explicitly calling the Lisp function garbage-collect - simply does nothing. - -OPERATION_CATCH_ERRORS -2. 
signalling an error. When @code{enter_sensitive_code_section()} is - called, with the bit flag corresponding to this prohibited - operation. When this bit flag is passed to - @code{enter_sensitive_code_section()}, a catch is set up which catches all - errors, signals a warning with @code{warn_when_safe()}, and then simply - continues. This is exactly the same behavior you now get with the - @code{call_*_trapping_errors()} functions. (there should also be some way - of specifying a warning level and class here, similar to the - @code{call_*_trapping_errors()} functions. This is not completely - important, however, because a standard warning level and class - could simply be chosen.) - -OPERATION_NO_UNSAFE_OBJECT_DELETION -3. This flag prohibits deletion of any permanent object (i.e. any - object that does not automatically disappear when created, such as - buffers, frames, devices, windows, etc...) unless they were created - after this bit flag was set. This would be implemented using a - list which stores all of the permanent objects created after this - bit flag was set. This list is reset to its previous value when - the call to @code{exit_sensitive_code_section()} occurs. The motivation - here is to allow Lisp callbacks to create their own temporary - buffers or frames, and later delete them, but not allow any other - permanent objects to be deleted, because C code might be working - with them, and not expect them to change. - -OPERATION_NO_BUFFER_MODIFICATION -4. This flag disallows modifications to the text, extent or any other - properties of any buffers except those created after this flag was - set, just like in the previous entry. - -OPERATION_NO_REDISPLAY -5. This bit flag inhibits any redisplay-related operations from - happening, more specifically, any entry into the redisplay-related - code. 
This includes, for example, the Lisp functions sit-for, - force-redisplay, force-cursor-redisplay, window-end with certain - arguments to it, and various other functions. When this flag is - set, instead of entering the redisplay code, the calling function - should simply make sure not to enter the redisplay code, (for - example, in the case of window-end), or postpone the redisplay - until such a time when it's safe (for example, with sit-for and - force-redisplay). - -OPERATION_NO_REDISPLAY_SETTINGS_CHANGE -6. This flag prohibits any modifications to faces, glyphs, specifiers, - extents, or any other settings that will affect the way that any - window is displayed. - +@table @code +@item OPERATION_GC_PROHIBITED +garbage collection. When this flag is set, and the garbage +collection threshold is reached, garbage collection simply doesn't +happen. It will happen at the next opportunity that it is allowed. +Similarly, explicitly calling the Lisp function garbage-collect +simply does nothing. + +@item OPERATION_CATCH_ERRORS +signalling an error. When @code{enter_sensitive_code_section()} is +called, with the bit flag corresponding to this prohibited +operation. When this bit flag is passed to +@code{enter_sensitive_code_section()}, a catch is set up which catches all +errors, signals a warning with @code{warn_when_safe()}, and then simply +continues. This is exactly the same behavior you now get with the +@code{call_*_trapping_errors()} functions. (there should also be some way +of specifying a warning level and class here, similar to the +@code{call_*_trapping_errors()} functions. This is not completely +important, however, because a standard warning level and class +could simply be chosen.) + +@item OPERATION_NO_UNSAFE_OBJECT_DELETION +This flag prohibits deletion of any permanent object (i.e. any +object that does not automatically disappear when created, such as +buffers, frames, devices, windows, etc...) unless they were created +after this bit flag was set. 
This would be implemented using a +list which stores all of the permanent objects created after this +bit flag was set. This list is reset to its previous value when +the call to @code{exit_sensitive_code_section()} occurs. The motivation +here is to allow Lisp callbacks to create their own temporary +buffers or frames, and later delete them, but not allow any other +permanent objects to be deleted, because C code might be working +with them, and not expect them to change. + +@item OPERATION_NO_BUFFER_MODIFICATION +This flag disallows modifications to the text, extent or any other +properties of any buffers except those created after this flag was +set, just like in the previous entry. + +@item OPERATION_NO_REDISPLAY +This bit flag inhibits any redisplay-related operations from +happening, more specifically, any entry into the redisplay-related +code. This includes, for example, the Lisp functions sit-for, +force-redisplay, force-cursor-redisplay, window-end with certain +arguments to it, and various other functions. When this flag is +set, instead of entering the redisplay code, the calling function +should simply make sure not to enter the redisplay code, (for +example, in the case of window-end), or postpone the redisplay +until such a time when it's safe (for example, with sit-for and +force-redisplay). + +@item OPERATION_NO_REDISPLAY_SETTINGS_CHANGE +This flag prohibits any modifications to faces, glyphs, specifiers, +extents, or any other settings that will affect the way that any +window is displayed. +@end table The idea here is that it will finally be safe to call Lisp code from nearly any part of the C code, simply by setting any combination of @@ -20486,7 +24639,6 @@ reason that I thought of this is that some coding system translations might cause Lisp code to be invoked and C code often invokes these translations in sensitive places. -@end example @c Indexing guidelines
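The bit-flag scheme proposed in the "Lisp callbacks from critical areas of the C code" section can be illustrated with a small C sketch. The flag names and the function names @code{enter_sensitive_code_section()} and @code{exit_sensitive_code_section()} come from the text; everything else is an assumption made for illustration only: the bit values, the function signatures, the fixed-size save stack, and the helpers @code{operation_prohibited_p()} and @code{maybe_garbage_collect()} are hypothetical and are not taken from the XEmacs sources.

```c
#include <assert.h>

/* Flag names are from the proposal; the bit values are assumed. */
enum sensitive_operation
{
  OPERATION_GC_PROHIBITED                = 1 << 0,
  OPERATION_CATCH_ERRORS                 = 1 << 1,
  OPERATION_NO_UNSAFE_OBJECT_DELETION    = 1 << 2,
  OPERATION_NO_BUFFER_MODIFICATION       = 1 << 3,
  OPERATION_NO_REDISPLAY                 = 1 << 4,
  OPERATION_NO_REDISPLAY_SETTINGS_CHANGE = 1 << 5
};

/* Hypothetical bookkeeping: a small stack of previously active flag
   sets, so that sensitive sections can nest. */
#define MAX_SENSITIVE_DEPTH 32

static int saved_flags[MAX_SENSITIVE_DEPTH];
static int sensitive_depth;
static int prohibited_operations;

/* Assumed signature: add FLAGS to the set of prohibited operations and
   return a nesting depth to hand back to the matching exit call. */
int
enter_sensitive_code_section (int flags)
{
  assert (sensitive_depth < MAX_SENSITIVE_DEPTH);
  saved_flags[sensitive_depth] = prohibited_operations;
  prohibited_operations |= flags;
  return sensitive_depth++;
}

/* Assumed signature: restore the flag set that was active before the
   matching enter call. */
void
exit_sensitive_code_section (int depth)
{
  assert (depth == sensitive_depth - 1);
  sensitive_depth = depth;
  prohibited_operations = saved_flags[depth];
}

/* Hypothetical helper: nonzero if FLAG is currently prohibited. */
int
operation_prohibited_p (int flag)
{
  return (prohibited_operations & flag) != 0;
}

/* Hypothetical stand-in for the collector's entry point.  While
   OPERATION_GC_PROHIBITED is set, collection simply does not happen
   and is left for the next opportunity at which it is allowed. */
static int gc_pending;

int
maybe_garbage_collect (void)
{
  if (operation_prohibited_p (OPERATION_GC_PROHIBITED))
    {
      gc_pending = 1;   /* remember to collect later */
      return 0;         /* nothing collected now */
    }
  gc_pending = 0;
  /* ... a real collector would run here ... */
  return 1;
}
```

Under these assumptions, C code about to invoke a Lisp callback would wrap the call between @code{enter_sensitive_code_section (OPERATION_GC_PROHIBITED | OPERATION_NO_REDISPLAY)} and the matching @code{exit_sensitive_code_section()}. Saving the previous flag set on entry lets an inner section add prohibitions without lifting those of an outer section when it exits.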