comparison man/internals/internals.texi @ 2362:6aa56b089139

[xemacs-hg @ 2004-11-02 09:51:04 by ben] To: xemacs-patches@xemacs.org internals/index.texi: Deleted. Incorporated into internals.texi. Having a separate index file messes up texinfo-master-menu. internals/internals.texi: Add bunches and bunches and bunches and bunches of stuff, taken from documentation floating around in various places -- text.c, file-coding.c, other .c and .h files, stuff that I wrote up for an old XEmacs contract, proposals written up in the process of an e-mail discussion, etc. Fix up some mistakes, esp. in CCL. Extra crap from CCL, duplicated with Lispref, removed. Sections on Old Future Work and Future Work Discussion added. Bunches of other work. Add bunches of documentation taken from the source code. Fixup various places to use @strong{}, @code{}, @file{}. Create new Text chapter, split off from Buffers and Textual Representation. Create new chapter for MS Windows, mostly written from scratch. Consolidate all Mule info under "Multilingual Support". Break up chapter on modules and move some parts to the sections discussing the modules, for consolidation purposes. Add a big cross-reference table for all the modules to where they're discussed (or not). New chapter Asynchronous Events; Quit Checking. (Taken from various parts of the code.) New Introduction. New section on Focus Handling (from the code). NOTE that in the process, I discovered that we essentially have FOUR redundant introductions to Mule issues! Someone really needs to go through and clean them up and integrate them (sjt?).
author ben
date Tue, 02 Nov 2004 09:51:18 +0000
parents e13775448cf0
children ce4aa0ef8af1
2361:5ff532e448b5 2362:6aa56b089139
8 @dircategory XEmacs Editor 8 @dircategory XEmacs Editor
9 @direntry 9 @direntry
10 * Internals: (internals). XEmacs Internals Manual. 10 * Internals: (internals). XEmacs Internals Manual.
11 @end direntry 11 @end direntry
12 12
13 Copyright @copyright{} 1992 - 1996 Ben Wing. 13 Edition History:
14
15 Created November 1995 (?) by Ben Wing.
16 XEmacs Internals Manual Version 1.0, March, 1996.
17 XEmacs Internals Manual Version 1.1, March, 1997.
18 XEmacs Internals Manual Version 1.4, March, 2001.
19 XEmacs Internals Manual Version 21.5, October, 2004.
20 @c Please REMEMBER to update edition number in *four* places in this file,
21 @c including adding a line above.
22
23 Copyright @copyright{} 1992 - 2004 Ben Wing.
14 Copyright @copyright{} 1996, 1997 Sun Microsystems. 24 Copyright @copyright{} 1996, 1997 Sun Microsystems.
15 Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation. 25 Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation.
16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. 26 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
17 27
18 28
61 @setchapternewpage odd 71 @setchapternewpage odd
62 @finalout 72 @finalout
63 73
64 @titlepage 74 @titlepage
65 @title XEmacs Internals Manual 75 @title XEmacs Internals Manual
66 @subtitle Version 1.4, March 2001 76 @subtitle Version 21.5, October 2004
67 77
68 @author Ben Wing 78 @author Ben Wing
79 @sp 1
80
81 Improvements by
82
83 @sp 1
84
85 @author Stephen Turnbull
69 @author Martin Buchholz 86 @author Martin Buchholz
70 @author Hrvoje Niksic 87 @author Hrvoje Niksic
71 @author Matthias Neubauer 88 @author Matthias Neubauer
72 @author Olivier Galibert 89 @author Olivier Galibert
90 @author Andy Piper
91
92
73 @page 93 @page
74 @vskip 0pt plus 1fill 94 @vskip 0pt plus 1fill
75 95
76 @noindent 96 @noindent
77 Copyright @copyright{} 1992 - 1996, 2001 Ben Wing. @* 97 Copyright @copyright{} 1992 - 2004 Ben Wing. @*
78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @* 98 Copyright @copyright{} 1996, 1997 Sun Microsystems. @*
79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @* 99 Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation. @*
80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. 100 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
81 101
82 @sp 2 102 @sp 2
83 Version 1.4 @* 103 Version 21.5 @*
84 March 2001.@* 104 October 2004.@*
85 105
86 Permission is granted to make and distribute verbatim copies of this 106 Permission is granted to make and distribute verbatim copies of this
87 manual provided the copyright notice and this permission notice are 107 manual provided the copyright notice and this permission notice are
88 preserved on all copies. 108 preserved on all copies.
89 109
100 included in a translation approved by the Free Software Foundation 120 included in a translation approved by the Free Software Foundation
101 instead of in the original English. 121 instead of in the original English.
102 @end titlepage 122 @end titlepage
103 @page 123 @page
104 124
105 @node Top, A History of Emacs, (dir), (dir) 125 @node Top, Introduction, (dir), (dir)
106 126
107 @ifinfo 127 @ifinfo
108 This Info file contains v1.4 of the XEmacs Internals Manual, March 2001. 128 This Info file contains v21.5 of the XEmacs Internals Manual, October 2004.
109 @end ifinfo 129 @end ifinfo
110 130
131 @c Don't update this by hand!!!!!!
132 @c Use C-u C-c C-u m (aka C-u M-x texinfo-master-menu).
133 @c NOTE: This command does not include the Index:: menu entry.
134 @c You must add it by hand.
135
136 @c Here are some useful Lisp routines for quickly Texinfo-izing text that
137 @c has been formatted into ASCII lists and tables. The first routine is
138 @c currently more general and well-developed than the second.
139
140 @c (defun list-to-texinfo (b e)
141 @c "Convert the selected region from an ASCII list to a Texinfo list."
142 @c (interactive "r")
143 @c (save-restriction
144 @c (narrow-to-region b e)
145 @c (goto-char (point-min))
146 @c (let ((dash-type "^ *-+ +")
147 @c (num-type "^ *[[(]?\\([0-9]+\\|[a-z]\\)[]).] +")
148 @c dash)
149 @c (save-excursion
150 @c (cond ((re-search-forward num-type nil t))
151 @c ((re-search-forward dash-type nil t) (setq dash t))
152 @c (t (error "No table entries?"))))
153 @c (if dash (insert "@itemize @bullet\n")
154 @c (insert "@enumerate\n"))
155 @c (while (re-search-forward (if dash dash-type num-type) nil t)
156 @c (let ((p (point)))
157 @c (or (re-search-forward (if dash dash-type num-type) nil t)
158 @c (goto-char (point-max)))
159 @c (beginning-of-line)
160 @c (forward-line -1)
161 @c (let ((q (point)))
162 @c (goto-char p)
163 @c (kill-rectangle p q))
164 @c (insert "@item\n")))
165 @c (goto-char (point-max))
166 @c (beginning-of-line)
167 @c (if dash (insert "@end itemize\n")
168 @c (insert "@end enumerate\n")))))
169
170 @c (defun table-to-texinfo (b e)
171 @c "Convert the selected region from an ASCII table to a Texinfo table."
172 @c (interactive "r")
173 @c (save-restriction
174 @c (narrow-to-region b e)
175 @c (goto-char (point-min))
176 @c (insert "@table @code\n")
177 @c (while (not (eobp))
178 @c (insert "@item ")
179 @c (forward-sexp)
180 @c (delete-char)
181 @c (insert "\n")
182 @c (or (search-forward "\n\n" nil t)
183 @c (goto-char (point-max))))
184 @c (beginning-of-line)
185 @c (insert "@end table\n")))
186
187 @c A useful Lisp routine for adding markup based on conventions used in plain
188 @c text files; see doc string below.
189
190 @c (defun convert-text-to-texinfo (&optional no-narrow)
191 @c "Convert text to Texinfo.
192 @c If the region is active, do the region; otherwise, go from point to the end
193 @c of the buffer. This query-replaces for various kinds of conventions used
194 @c in text: @code{} surrounded by ` and ' or followed by a (); @strong{}
195 @c surrounded by *'s; @file{} something that looks like a file name."
196 @c (interactive)
197 @c (if (and (not no-narrow) (region-active-p))
198 @c (save-restriction
199 @c (narrow-to-region (region-beginning) (region-end))
200 @c (goto-char (point-min)) (convert-text-to-texinfo t))
201 @c (let ((p (point))
202 @c (case-replace nil))
203 @c (query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil)
204 @c (goto-char p)
205 @c (query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil)
206 @c (goto-char p)
207 @c (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil)
208 @c (goto-char p)
209 @c (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil)
210 @c )))
211
111 @menu 212 @menu
213 * Introduction:: Overview of this manual.
214 * Authorship of XEmacs::
112 * A History of Emacs:: Times, dates, important events. 215 * A History of Emacs:: Times, dates, important events.
113 * XEmacs From the Outside:: A broad conceptual overview. 216 * XEmacs From the Outside:: A broad conceptual overview.
114 * The Lisp Language:: An overview. 217 * The Lisp Language:: An overview.
115 * XEmacs From the Perspective of Building:: 218 * XEmacs From the Perspective of Building::
116 * Build-Time Dependencies:: 219 * Build-Time Dependencies::
117 * XEmacs From the Inside:: 220 * XEmacs From the Inside::
118 * The XEmacs Object System (Abstractly Speaking):: 221 * The XEmacs Object System (Abstractly Speaking)::
119 * How Lisp Objects Are Represented in C:: 222 * How Lisp Objects Are Represented in C::
120 * Major Textual Changes:: 223 * Major Textual Changes::
121 * Rules When Writing New C Code:: 224 * Rules When Writing New C Code::
122 * Regression Testing XEmacs:: 225 * Regression Testing XEmacs::
123 * CVS Techniques:: 226 * CVS Techniques::
124 * A Summary of the Various XEmacs Modules:: 227 * The Modules of XEmacs::
125 * Allocation of Objects in XEmacs Lisp:: 228 * Allocation of Objects in XEmacs Lisp::
126 * Dumping:: 229 * Dumping::
127 * Events and the Event Loop:: 230 * Events and the Event Loop::
128 * Evaluation; Stack Frames; Bindings:: 231 * Asynchronous Events; Quit Checking::
129 * Symbols and Variables:: 232 * Evaluation; Stack Frames; Bindings::
130 * Buffers and Textual Representation:: 233 * Symbols and Variables::
131 * MULE Character Sets and Encodings:: 234 * Buffers::
132 * The Lisp Reader and Compiler:: 235 * Text::
133 * Lstreams:: 236 * Multilingual Support::
134 * Consoles; Devices; Frames; Windows:: 237 * The Lisp Reader and Compiler::
135 * The Redisplay Mechanism:: 238 * Lstreams::
136 * Extents:: 239 * Consoles; Devices; Frames; Windows::
137 * Faces:: 240 * The Redisplay Mechanism::
138 * Glyphs:: 241 * Extents::
139 * Specifiers:: 242 * Faces::
140 * Menus:: 243 * Glyphs::
141 * Subprocesses:: 244 * Specifiers::
142 * Interface to the X Window System:: 245 * Menus::
143 * Index:: 246 * Subprocesses::
247 * Interface to MS Windows::
248 * Interface to the X Window System::
249 * Future Work::
250 * Future Work Discussion::
251 * Old Future Work::
252 * Index::
144 253
145 @detailmenu 254 @detailmenu
146 255 --- The Detailed Node Listing ---
147 --- The Detailed Node Listing ---
148 256
149 A History of Emacs 257 A History of Emacs
150 258
151 * Through Version 18:: Unification prevails. 259 * Through Version 18:: Unification prevails.
152 * Lucid Emacs:: One version 19 Emacs. 260 * Lucid Emacs:: One version 19 Emacs.
153 * GNU Emacs 19:: The other version 19 Emacs. 261 * GNU Emacs 19:: The other version 19 Emacs.
154 * GNU Emacs 20:: The other version 20 Emacs. 262 * GNU Emacs 20:: The other version 20 Emacs.
155 * XEmacs:: The continuation of Lucid Emacs. 263 * XEmacs:: The continuation of Lucid Emacs.
156 264
265 Major Textual Changes
266
267 * Great Integral Type Renaming::
268 * Text/Char Type Renaming::
269
157 Rules When Writing New C Code 270 Rules When Writing New C Code
158 271
159 * General Coding Rules:: 272 * A Reader's Guide to XEmacs Coding Conventions::
160 * Writing Lisp Primitives:: 273 * General Coding Rules::
161 * Writing Good Comments:: 274 * Object-Oriented Techniques for C::
162 * Adding Global Lisp Variables:: 275 * Writing Lisp Primitives::
163 * Proper Use of Unsigned Types:: 276 * Writing Good Comments::
164 * Coding for Mule:: 277 * Adding Global Lisp Variables::
165 * Techniques for XEmacs Developers:: 278 * Writing Macros::
166 279 * Proper Use of Unsigned Types::
167 Coding for Mule 280 * Techniques for XEmacs Developers::
168 281
169 * Character-Related Data Types:: 282 Regression Testing XEmacs
170 * Working With Character and Byte Positions:: 283
171 * Conversion to and from External Data:: 284 * How to Regression-Test::
172 * General Guidelines for Writing Mule-Aware Code:: 285 * Modules for Regression Testing::
173 * An Example of Mule-Aware Code::
174 286
175 CVS Techniques 287 CVS Techniques
176 288
177 * Merging a Branch into the Trunk:: 289 * Merging a Branch into the Trunk::
178 290
179 Regression Testing XEmacs 291 The Modules of XEmacs
180 292
181 A Summary of the Various XEmacs Modules 293 * A Summary of the Various XEmacs Modules::
182 294 * Low-Level Modules::
183 * Low-Level Modules:: 295 * Basic Lisp Modules::
184 * Basic Lisp Modules:: 296 * Modules for Standard Editing Operations::
185 * Modules for Standard Editing Operations:: 297 * Modules for Interfacing with the File System::
186 * Editor-Level Control Flow Modules:: 298 * Modules for Other Aspects of the Lisp Interpreter and Object System::
187 * Modules for the Basic Displayable Lisp Objects:: 299 * Modules for Interfacing with the Operating System::
188 * Modules for other Display-Related Lisp Objects::
189 * Modules for the Redisplay Mechanism::
190 * Modules for Interfacing with the File System::
191 * Modules for Other Aspects of the Lisp Interpreter and Object System::
192 * Modules for Interfacing with the Operating System::
193 * Modules for Interfacing with X Windows::
194 * Modules for Internationalization::
195 * Modules for Regression Testing::
196 300
197 Allocation of Objects in XEmacs Lisp 301 Allocation of Objects in XEmacs Lisp
198 302
199 * Introduction to Allocation:: 303 * Introduction to Allocation::
200 * Garbage Collection:: 304 * Garbage Collection::
201 * GCPROing:: 305 * GCPROing::
202 * Garbage Collection - Step by Step:: 306 * Garbage Collection - Step by Step::
203 * Integers and Characters:: 307 * Integers and Characters::
204 * Allocation from Frob Blocks:: 308 * Allocation from Frob Blocks::
205 * lrecords:: 309 * lrecords::
206 * Low-level allocation:: 310 * Low-level allocation::
207 * Cons:: 311 * Cons::
208 * Vector:: 312 * Vector::
209 * Bit Vector:: 313 * Bit Vector::
210 * Symbol:: 314 * Symbol::
211 * Marker:: 315 * Marker::
212 * String:: 316 * String::
213 * Compiled Function:: 317 * Compiled Function::
214 318
215 Garbage Collection - Step by Step 319 Garbage Collection - Step by Step
216 320
217 * Invocation:: 321 * Invocation::
218 * garbage_collect_1:: 322 * garbage_collect_1::
219 * mark_object:: 323 * mark_object::
220 * gc_sweep:: 324 * gc_sweep::
221 * sweep_lcrecords_1:: 325 * sweep_lcrecords_1::
222 * compact_string_chars:: 326 * compact_string_chars::
223 * sweep_strings:: 327 * sweep_strings::
224 * sweep_bit_vectors_1:: 328 * sweep_bit_vectors_1::
225 329
226 Dumping 330 Dumping
227 331
228 * Overview:: 332 * Dumping Justification::
229 * Data descriptions:: 333 * Overview::
230 * Dumping phase:: 334 * Data descriptions::
231 * Reloading phase:: 335 * Dumping phase::
336 * Reloading phase::
337 * Remaining issues::
232 338
233 Dumping phase 339 Dumping phase
234 340
235 * Object inventory:: 341 * Object inventory::
236 * Address allocation:: 342 * Address allocation::
237 * The header:: 343 * The header::
238 * Data dumping:: 344 * Data dumping::
239 * Pointers dumping:: 345 * Pointers dumping::
240 346
241 Events and the Event Loop 347 Events and the Event Loop
242 348
243 * Introduction to Events:: 349 * Introduction to Events::
244 * Main Loop:: 350 * Main Loop::
245 * Specifics of the Event Gathering Mechanism:: 351 * Specifics of the Event Gathering Mechanism::
246 * Specifics About the Emacs Event:: 352 * Specifics About the Emacs Event::
247 * The Event Stream Callback Routines:: 353 * Event Queues::
248 * Other Event Loop Functions:: 354 * Event Stream Callback Routines::
249 * Converting Events:: 355 * Other Event Loop Functions::
250 * Dispatching Events; The Command Builder:: 356 * Stream Pairs::
357 * Converting Events::
358 * Dispatching Events; The Command Builder::
359 * Focus Handling::
360 * Editor-Level Control Flow Modules::
361
362 Asynchronous Events; Quit Checking
363
364 * Signal Handling::
365 * Control-G (Quit) Checking::
366 * Profiling::
367 * Asynchronous Timeouts::
368 * Exiting::
251 369
252 Evaluation; Stack Frames; Bindings 370 Evaluation; Stack Frames; Bindings
253 371
254 * Evaluation:: 372 * Evaluation::
255 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: 373 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
256 * Simple Special Forms:: 374 * Simple Special Forms::
257 * Catch and Throw:: 375 * Catch and Throw::
258 376
259 Symbols and Variables 377 Symbols and Variables
260 378
261 * Introduction to Symbols:: 379 * Introduction to Symbols::
262 * Obarrays:: 380 * Obarrays::
263 * Symbol Values:: 381 * Symbol Values::
264 382
265 Buffers and Textual Representation 383 Buffers
266 384
267 * Introduction to Buffers:: A buffer holds a block of text such as a file. 385 * Introduction to Buffers:: A buffer holds a block of text such as a file.
268 * The Text in a Buffer:: Representation of the text in a buffer.
269 * Buffer Lists:: Keeping track of all buffers. 386 * Buffer Lists:: Keeping track of all buffers.
270 * Markers and Extents:: Tagging locations within a buffer. 387 * Markers and Extents:: Tagging locations within a buffer.
388 * The Buffer Object:: The Lisp object corresponding to a buffer.
389
390 Text
391
392 * The Text in a Buffer:: Representation of the text in a buffer.
271 * Ibytes and Ichars:: Representation of individual characters. 393 * Ibytes and Ichars:: Representation of individual characters.
272 * The Buffer Object:: The Lisp object corresponding to a buffer. 394 * Byte-Char Position Conversion::
273 * Searching and Matching:: Higher-level algorithms. 395 * Searching and Matching:: Higher-level algorithms.
274 396
275 MULE Character Sets and Encodings 397 Multilingual Support
276 398
277 * Character Sets:: 399 * Introduction to Multilingual Issues #1::
278 * Encodings:: 400 * Introduction to Multilingual Issues #2::
279 * Internal Mule Encodings:: 401 * Introduction to Multilingual Issues #3::
280 * CCL:: 402 * Introduction to Multilingual Issues #4::
403 * Character Sets::
404 * Encodings::
405 * Internal Mule Encodings::
406 * Byte/Character Types; Buffer Positions; Other Typedefs::
407 * Internal Text API's::
408 * Coding for Mule::
409 * CCL::
410 * Modules for Internationalization::
281 411
282 Encodings 412 Encodings
283 413
284 * Japanese EUC (Extended Unix Code):: 414 * Japanese EUC (Extended Unix Code)::
285 * JIS7:: 415 * JIS7::
286 416
287 Internal Mule Encodings 417 Internal Mule Encodings
288 418
289 * Internal String Encoding:: 419 * Internal String Encoding::
290 * Internal Character Encoding:: 420 * Internal Character Encoding::
421
422 Byte/Character Types; Buffer Positions; Other Typedefs
423
424 * Byte Types::
425 * Different Ways of Seeing Internal Text::
426 * Buffer Positions::
427 * Other Typedefs::
428 * Usage of the Various Representations::
429 * Working With the Various Representations::
430
431 Internal Text API's
432
433 * Basic internal-format API's::
434 * The DFC API::
435 * The Eistring API::
436
437 Coding for Mule
438
439 * Character-Related Data Types::
440 * Working With Character and Byte Positions::
441 * Conversion to and from External Data::
442 * General Guidelines for Writing Mule-Aware Code::
443 * An Example of Mule-Aware Code::
444 * Mule-izing Code::
291 445
292 Lstreams 446 Lstreams
293 447
294 * Creating an Lstream:: Creating an lstream object. 448 * Creating an Lstream:: Creating an lstream object.
295 * Lstream Types:: Different sorts of things that are streamed. 449 * Lstream Types:: Different sorts of things that are streamed.
296 * Lstream Functions:: Functions for working with lstreams. 450 * Lstream Functions:: Functions for working with lstreams.
297 * Lstream Methods:: Creating new lstream types. 451 * Lstream Methods:: Creating new lstream types.
298 452
299 Consoles; Devices; Frames; Windows 453 Consoles; Devices; Frames; Windows
300 454
301 * Introduction to Consoles; Devices; Frames; Windows:: 455 * Introduction to Consoles; Devices; Frames; Windows::
302 * Point:: 456 * Point::
303 * Window Hierarchy:: 457 * Window Hierarchy::
304 * The Window Object:: 458 * The Window Object::
459 * Modules for the Basic Displayable Lisp Objects::
305 460
306 The Redisplay Mechanism 461 The Redisplay Mechanism
307 462
308 * Critical Redisplay Sections:: 463 * Critical Redisplay Sections::
309 * Line Start Cache:: 464 * Line Start Cache::
310 * Redisplay Piece by Piece:: 465 * Redisplay Piece by Piece::
466 * Modules for the Redisplay Mechanism::
467 * Modules for other Display-Related Lisp Objects::
311 468
312 Extents 469 Extents
313 470
314 * Introduction to Extents:: Extents are ranges over text, with properties. 471 * Introduction to Extents:: Extents are ranges over text, with properties.
315 * Extent Ordering:: How extents are ordered internally. 472 * Extent Ordering:: How extents are ordered internally.
316 * Format of the Extent Info:: The extent information in a buffer or string. 473 * Format of the Extent Info:: The extent information in a buffer or string.
317 * Zero-Length Extents:: A weird special case. 474 * Zero-Length Extents:: A weird special case.
318 * Mathematics of Extent Ordering:: A rigorous foundation. 475 * Mathematics of Extent Ordering:: A rigorous foundation.
319 * Extent Fragments:: Cached information useful for redisplay. 476 * Extent Fragments:: Cached information useful for redisplay.
320 477
478 Interface to MS Windows
479
480 * Different kinds of Windows environments::
481 * Windows Build Flags::
482 * Windows I18N Introduction::
483 * Modules for Interfacing with MS Windows::
484
485 Interface to the X Window System
486
487 * Lucid Widget Library:: An interface to various widget sets.
488 * Modules for Interfacing with X Windows::
489
490 Lucid Widget Library
491
492 * Generic Widget Interface:: The lwlib generic widget interface.
493 * Scrollbars::
494 * Menubars::
495 * Checkboxes and Radio Buttons::
496 * Progress Bars::
497 * Tab Controls::
498
499 Future Work
500
501 * Future Work -- Elisp Compatibility Package::
502 * Future Work -- Drag-n-Drop::
503 * Future Work -- Standard Interface for Enabling Extensions::
504 * Future Work -- Better Initialization File Scheme::
505 * Future Work -- Keyword Parameters::
506 * Future Work -- Property Interface Changes::
507 * Future Work -- Toolbars::
508 * Future Work -- Menu API Changes::
509 * Future Work -- Removal of Misc-User Event Type::
510 * Future Work -- Mouse Pointer::
511 * Future Work -- Extents::
512 * Future Work -- Version Number and Development Tree Organization::
513 * Future Work -- Improvements to the @code{xemacs.org} Website::
514 * Future Work -- Keybindings::
515 * Future Work -- Byte Code Snippets::
516 * Future Work -- Lisp Stream API::
517 * Future Work -- Multiple Values::
518 * Future Work -- Macros::
519 * Future Work -- Specifiers::
520 * Future Work -- Display Tables::
521 * Future Work -- Making Elisp Function Calls Faster::
522 * Future Work -- Lisp Engine Replacement::
523
524 Future Work -- Toolbars
525
526 * Future Work -- Easier Toolbar Customization::
527 * Future Work -- Toolbar Interface Changes::
528
529 Future Work -- Mouse Pointer
530
531 * Future Work -- Abstracted Mouse Pointer Interface::
532 * Future Work -- Busy Pointer::
533
534 Future Work -- Extents
535
536 * Future Work -- Everything should obey duplicable extents::
537
538 Future Work -- Keybindings
539
540 * Future Work -- Keybinding Schemes::
541 * Future Work -- Better Support for Windows Style Key Bindings::
542 * Future Work -- Misc Key Binding Ideas::
543
544 Future Work -- Byte Code Snippets
545
546 * Future Work -- Autodetection::
547 * Future Work -- Conversion Error Detection::
548 * Future Work -- BIDI Support::
549 * Future Work -- Localized Text/Messages::
550
551 Future Work -- Lisp Engine Replacement
552
553 * Future Work -- Lisp Engine Discussion::
554 * Future Work -- Lisp Engine Replacement -- Implementation::
555
556 Future Work Discussion
557
558 * Discussion -- garbage collection::
559 * Discussion -- glyphs::
560
561 Old Future Work
562
563 * Future Work -- A Portable Unexec Replacement::
564 * Future Work -- Indirect Buffers::
565 * Future Work -- Improvements in support for non-ASCII (European) keysyms under X::
566 * Future Work -- xemacs.org Mailing Address Changes::
567 * Future Work -- Lisp callbacks from critical areas of the C code::
568
321 @end detailmenu 569 @end detailmenu
322 @end menu 570 @end menu
323 571
324 @node A History of Emacs, XEmacs From the Outside, Top, Top 572 @node Introduction, Authorship of XEmacs, Top, Top
573 @chapter Introduction
574 @cindex introduction
575 @cindex authorship, manual
576
577 This manual documents the internals of XEmacs. It presumes knowledge of
578 how to use XEmacs (@pxref{Top,,, xemacs, XEmacs User's Manual}), and
579 especially, knowledge of XEmacs Lisp (@pxref{Top,,, lispref, XEmacs Lisp
580 Reference Manual}). Information in either of these manuals will not be
581 repeated here, and some information in the Lisp Reference Manual in
582 particular is more relevant to a person working on the internals than
583 the average XEmacs Lisp programmer. (In such cases, a cross-reference is
584 usually made to the Lisp Reference Manual.)
585
586 Ideally, this manual would be complete and up-to-date. Unfortunately,
587 in reality it is neither, due to the limited resources of the
588 maintainers of XEmacs. (That said, it is much better than the internal
589 documentation of most programs.) Also, much information about the
590 internals is documented only in the code itself, in the form of
591 comments. Furthermore, since the maintainers are more likely to be
592 working on the code than on this manual, information contained in
593 comments may be more up-to-date than information in this manual. Do not
594 assume that all information in this manual is necessarily accurate as of
595 the snapshot of the code you are looking at, and in the case of
596 contradictions between the code comments and the manual, @strong{always}
597 assume that the code comments are correct. (Because of the proximity of
598 the comments to the code, comments will rarely be out-of-date.)
599
600 This manual was primarily written by Ben Wing. Certain sections were
601 written by others, including those mentioned on the title page as well
602 as other coders. Some sections were lifted directly from comments in
603 the code, and in those cases we may not completely be aware of the
604 authorship. In addition, due to the collaborative nature of XEmacs,
605 many people have made small changes and emendations as they have
606 discovered problems.
607
608 The following is a (necessarily incomplete) list of the work that was
609 @emph{not} done by Ben Wing (for more complete information, take a look
610 at the ChangeLog for the @file{man} directory and the CVS records of
611 actual changes):
612
613 @table @asis
614 @item Stephen Turnbull
615 Various cleanup work, mostly post-2000. Object-Oriented Techniques in
616 XEmacs. A Reader's Guide to XEmacs Coding Conventions. Searching and
617 Matching. Regression Testing XEmacs. Modules for Regression Testing.
618 Lucid Widget Library.
619 @item Martin Buchholz
620 Various cleanup work, mostly pre-2001. Docs on inline functions. Docs
621 on dfc conversion functions (Conversion to and from External Data).
622 Improvements in support for non-ASCII (European) keysyms under X.
623 @item Hrvoje Niksic
624 Coding for Mule.
625 @item Matthias Neubauer
626 Garbage Collection - Step by Step.
627 @item Olivier Galibert
628 Portable dumper documentation.
629 @item Andy Piper
630 Redisplay Piece by Piece. Glyphs.
631 @item Chuck Thompson
632 Line Start Cache.
633 @item Kenichi Handa
634 CCL.
635 @end table
636
637 @node Authorship of XEmacs, A History of Emacs, Introduction, Top
638 @chapter Authorship of XEmacs
639 @cindex authorship, XEmacs
640
641 General authorship in chronological order:
642
643 @table @asis
644
645 @item Jamie Zawinski, Eric Benson, Matthieu Devin, Harlan Sexton
646 These were the early creators of Lucid Emacs, the predecessor of XEmacs.
647 Jamie Zawinski was the primary maintainer and coder for Lucid Emacs,
648 active between early 1991 and June 1994. He presided over versions 19.0
649 through 19.10, and then abruptly left for Netscape. He wrote the
650 advanced stream code, the Xt interface code, the byte compiler, the
651 original version of the X selection code, the first, second and third
652 versions of the face code (which appeared in 19.0, 19.6 and 19.9
653 respectively), and part of the keymap code. He also separated the Lisp
654 directories into many subdirectories and made many smaller changes. Matthieu Devin wrote
655 the original version of the Extents code. Someone else at Lucid wrote
656 the Lucid widget library (LWLIB), with the exception of the scrollbar
657 code, which was added later.
658
659 @item Richard Mlynarik
660 Active 1991 to 1993, author of much of the current Lisp object scheme,
661 including lrecords and lcrecords (added this support in 1993 to allow
662 for 28-bit pointers, which had previously been restricted to 26 bits).
663 Moved the minibuffer and abbrev code into Lisp, worked on the keymap
664 code and did the initial synching between XEmacs and the first released
665 version of GNU Emacs version 19 in mid-1993.
666
667 @item Martin Buchholz
668 Active 1995 to 2001, maintainer of XEmacs late 1999 to ?, author of the
669 current configure support, mini optimizations to the byte interpreter,
670 many improvements to the case changing code and many bug fixes to the
671 process and system-specific code, also general spell checking and code
672 cleanliness guru.
673
674 @item Steve Baur
675 Maintainer of XEmacs 1996 to 1999, responsible for many improvements to
676 the XEmacs development process, for example, creation of the review
677 board and arranging for XEmacs to be placed under CVS. Author of the
678 package code.
679
680 @item Chuck Thompson
681 Active January 1993 to June of 1996, author of the current and previous
682 versions of the redisplay code and maintainer of XEmacs from mid-1994
683 to mid-1996. Creator of xemacs.org. Also wrote the scrollbar code, the
684 original configure support, and prototype versions of the toolbar and
685 device code.
686
687 @item Ben Wing
688 Active April 1993 to April 1996 and February 2000 to present. Chief
689 coder for XEmacs between 1994 and 1996. Ben Wing was never the
690 maintainer of XEmacs, and as a result, is the author of more of the
691 XEmacs-specific code in XEmacs than anyone else. Author of the Mule
692 support, the extents code, the glyph and specifier code, most of the
693 toolbar and device abstraction code, the error
694 checking code, the Lstream code, the bit vector, char-table, and
695 range-table code, much of the current Xt code, much of the events
696 code (including most of the TTY event code), some of the face code, and
697 numerous other aspects of the code. Also author of most of the XEmacs
698 documentation including the internals manual and the XEmacs additions to
699 the Lisp reference manual, and responsible for much of the synching
700 between XEmacs and GNU Emacs.
701
702 @item Kyle Jones
703 Author of the minimal tagbits support for Lisp objects, which allows for
704 32-bit pointers and 31-bit integers.
705
706 @item Olivier Galibert
707 Author of the portable dumping mechanism.
708
709 @item Andy Piper
710 Author of the widget support, the gutter support and much of the
711 Microsoft Windows support.
712
713 @item Kirill Katsnelson
714 Author of many improvements to Microsoft Windows support, the current
715 sub-process code, and revamping of the display size change mechanism.
716
717 @item Jonathan Harris
718 Author of much of the Microsoft Windows support.
719 @end table
720
721 Authorship of some of the modules:
722
723 @table @file
724 @item alloc.c
725 Inherited 1991 from a prototype of GNU Emacs 19. Around mid-1993
726 Richard Mlynarik redid much of the code, creating the existing system of
727 object abstractions (where each object can define its own marking
728 method, printing method, and so on) and the existing scheme of lrecords
729 and lcrecords. This was done both to increase the number of bits that
730 a pointer can occupy from 26 to 28, and to provide a general framework for
731 creating new object types easily. The garbage collection and
732 frob-block allocation code is left over from the
733 original version, but was cleaned up somewhat by Mlynarik. Later in
734 1993, Jamie Zawinski improved the code that kept track of pure space
735 usage so it would report exactly where you exceeded the pure space and
736 how much pure space you would have to add to get everything to
737 fit. He also added code to issue nice pure space and garbage
738 collection statistics at the end of dumping. Early in 1995, Ben Wing
739 cleaned up the frob-block code to be as compact as possible and added the
740 various bits of error checking, which are controlled using the
741 @code{ERROR_CHECK_*} macros. He also added the ability of strings to be resized, which
742 is necessary under Mule, because you can replace one character in a
743 string with another character of a different size, so that the
744 string must be resized. Ben Wing also added bit vectors for 19.13 around
745 September 1995, and lcrecord lists for 19.14 around December 1995.
746 Steve Baur did some work on the purification and dump-time code, and
747 added Doug Lea malloc support from Emacs 20.2 circa 1998. Kyle Jones
748 continued the work done by Mlynarik, reducing the number of primitive
749 Lisp types so that there are only three: integer, character, and pointer
750 type, which encompasses all other types. This allows for 31-bit
751 integers and 32-bit pointers, although there is potential slowdown from
752 some extra indirections when determining the type of an object, and
753 some memory increase for the objects that previously were considered to
754 be the most primitive types. Martin Buchholz has recently (February
755 2000) done some work to eliminate most of the slowdown.
756
757 Olivier Galibert, mid-1999 to 2000, implemented the portable
758 dumper. This writes out the state of the Lisp object heap to a
759 disk file in a relocatable fashion so that it can later be
760 read in at any memory location. This work entailed a number of
761 changes in @file{alloc.c}. For example, pure space was removed and
762 structures were created to define the types of all the elements
763 contained in the various lisp object structures and associated
764 structures.
765
766 @item alloca.c
767 Inherited a long time ago from a prerelease version of GNU Emacs 19 and
768 kept in sync with more recent versions, with very few changes for XEmacs.
769 Most changes consist of converting the code to ANSI C, and fixing up the
770 includes at the top of the file to follow XEmacs conventions.
771
772 @item alloca.s
773 Inherited almost unchanged from FSF and kept in sync up through 19.30;
774 basically no changes for XEmacs.
775 @end table
776
777 @node A History of Emacs, XEmacs From the Outside, Authorship of XEmacs, Top
325 @chapter A History of Emacs 778 @chapter A History of Emacs
326 @cindex history of Emacs, a 779 @cindex history of Emacs, a
327 @cindex Emacs, a history of 780 @cindex Emacs, a history of
328 @cindex Hackers (Steven Levy) 781 @cindex Hackers (Steven Levy)
329 @cindex Levy, Steven 782 @cindex Levy, Steven
358 * GNU Emacs 19:: The other version 19 Emacs. 811 * GNU Emacs 19:: The other version 19 Emacs.
359 * GNU Emacs 20:: The other version 20 Emacs. 812 * GNU Emacs 20:: The other version 20 Emacs.
360 * XEmacs:: The continuation of Lucid Emacs. 813 * XEmacs:: The continuation of Lucid Emacs.
361 @end menu 814 @end menu
362 815
363 @node Through Version 18 816 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs
364 @section Through Version 18 817 @section Through Version 18
365 @cindex version 18, through 818 @cindex version 18, through
366 @cindex Gosling, James 819 @cindex Gosling, James
367 @cindex Great Usenet Renaming 820 @cindex Great Usenet Renaming
368 821
369 Although the history of the early versions of GNU Emacs is unclear, 822 As described above, Emacs began life in the mid-1970's as a series of
370 the history is well-known from the middle of 1985. A time line is: 823 editor macros for TECO, an early editor on the PDP-10. In the early
824 1980's it was rewritten in C as a collaboration between Richard
825 M. Stallman (RMS) and James Gosling (the creator of Java); its extension
826 language was known as @dfn{Mocklisp}. This version of Emacs-in-C formed
827 the basis for the early versions of GNU Emacs and also for Gosling's
828 Unipress Emacs, a commercial product. Because of bad blood between the
829 two over the issue of commercialism, RMS pretty much disowned this
830 collaboration, referring to it as "Gosling Emacs".
831
832 At this point we pick up with a time line of events. (A broader timeline
833 is available at @uref{http://www.jwz.org/doc/emacs-timeline.html,
834 ``Emacs Timeline''}.)
371 835
372 @itemize @bullet 836 @itemize @bullet
373 @item 837 @item
374 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and 838 Unipress Emacs, a $395 commercial product, was released on May 6, 1983.
375 shared some code with a version of Emacs written by James Gosling (the 839 This was an outgrowth of the Emacs-in-C collaboration written by Gosling
376 same James Gosling who later created the Java language). 840 and RMS.
841
842 @item
843 GNU Emacs version 13.0? was released on March 20, 1985. This may have
844 been the initial public release. This was also based on this same
845 Emacs-in-C collaboration.
846
847 @item
848 GNU Emacs version 15.10 was released on April 11, 1985.
849
850 @item
851 GNU Emacs version 15.34 was released on May 7, 1985. This appears
852 to be the last release of version 15.
853
377 @item 854 @item
378 GNU Emacs version 16 (first released version was 16.56) was released on 855 GNU Emacs version 16 (first released version was 16.56) was released on
379 July 15, 1985. All Gosling code was removed due to potential copyright 856 July 15, 1985. All Gosling code was removed due to potential copyright
380 problems with the code. 857 problems with the code.
381 @item 858 @item
472 version 18.58 released ?????. 949 version 18.58 released ?????.
473 @item 950 @item
474 version 18.59 released October 31, 1992. 951 version 18.59 released October 31, 1992.
475 @end itemize 952 @end itemize
476 953
477 @node Lucid Emacs 954 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs
478 @section Lucid Emacs 955 @section Lucid Emacs
479 @cindex Lucid Emacs 956 @cindex Lucid Emacs
480 @cindex Lucid Inc. 957 @cindex Lucid Inc.
481 @cindex Energize 958 @cindex Energize
482 @cindex Epoch 959 @cindex Epoch
538 version 19.9 released January 12, 1994. 1015 version 19.9 released January 12, 1994.
539 @item 1016 @item
540 version 19.10 released May 27, 1994. 1017 version 19.10 released May 27, 1994.
541 @end itemize 1018 @end itemize
542 1019
543 @node GNU Emacs 19 1020 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs
544 @section GNU Emacs 19 1021 @section GNU Emacs 19
545 @cindex GNU Emacs 19 1022 @cindex GNU Emacs 19
546 @cindex Emacs 19, GNU 1023 @cindex Emacs 19, GNU
547 @cindex version 19, GNU Emacs 1024 @cindex version 19, GNU Emacs
548 @cindex FSF Emacs 1025 @cindex FSF Emacs
617 worse. Lucid soon began incorporating features from GNU Emacs 19 into 1094 worse. Lucid soon began incorporating features from GNU Emacs 19 into
618 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been 1095 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
619 working on and using GNU Emacs for a long time (back as far as version 1096 working on and using GNU Emacs for a long time (back as far as version
620 16 or 17). 1097 16 or 17).
621 1098
622 @node GNU Emacs 20 1099 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs
623 @section GNU Emacs 20 1100 @section GNU Emacs 20
624 @cindex GNU Emacs 20 1101 @cindex GNU Emacs 20
625 @cindex Emacs 20, GNU 1102 @cindex Emacs 20, GNU
626 @cindex version 20, GNU Emacs 1103 @cindex version 20, GNU Emacs
627 @cindex FSF Emacs 1104 @cindex FSF Emacs
638 version 20.2 released September 20, 1997. 1115 version 20.2 released September 20, 1997.
639 @item 1116 @item
640 version 20.3 released August 19, 1998. 1117 version 20.3 released August 19, 1998.
641 @end itemize 1118 @end itemize
642 1119
643 @node XEmacs 1120 @node XEmacs, , GNU Emacs 20, A History of Emacs
644 @section XEmacs 1121 @section XEmacs
645 @cindex XEmacs 1122 @cindex XEmacs
646 1123
647 @cindex Sun Microsystems 1124 @cindex Sun Microsystems
648 @cindex University of Illinois 1125 @cindex University of Illinois
1287 Recompiling anything depends on @file{bytecomp.elc} and 1764 Recompiling anything depends on @file{bytecomp.elc} and
1288 @file{byte-optimize.elc} being up-to-date. 1765 @file{byte-optimize.elc} being up-to-date.
1289 @end enumerate 1766 @end enumerate
1290 1767
1291 Put these together and you'll see it's perfectly acceptable to build 1768 Put these together and you'll see it's perfectly acceptable to build
1292 auto-autoloads *after* dumping if no @file{.elc} files are out-of-date. 1769 auto-autoloads @strong{after} dumping if no @file{.elc} files are out-of-date.
1293 @end quotation 1770 @end quotation
1294 1771
1295 These Lisp driver programs typically run from temacs, not a dumped 1772 These Lisp driver programs typically run from temacs, not a dumped
1296 XEmacs. The simplest (but time-consuming) way to achieve a sane 1773 XEmacs. The simplest (but time-consuming) way to achieve a sane
1297 environment for running Lisp is to load @file{loadup.el} or 1774 environment for running Lisp is to load @file{loadup.el} or
1948 2425
1949 An example of the right way to do this was the so-called "great integral 2426 An example of the right way to do this was the so-called "great integral
1950 type renaming". 2427 type renaming".
1951 2428
1952 @menu 2429 @menu
1953 * Great Integral Type Renaming:: 2430 * Great Integral Type Renaming::
1954 * Text/Char Type Renaming:: 2431 * Text/Char Type Renaming::
1955 @end menu 2432 @end menu
1956 2433
1957 @node Great Integral Type Renaming 2434 @node Great Integral Type Renaming, Text/Char Type Renaming, Major Textual Changes, Major Textual Changes
1958 @section Great Integral Type Renaming 2435 @section Great Integral Type Renaming
1959 @cindex Great Integral Type Renaming 2436 @cindex Great Integral Type Renaming
1960 @cindex integral type renaming, great 2437 @cindex integral type renaming, great
1961 @cindex type renaming, integral 2438 @cindex type renaming, integral
1962 @cindex renaming, integral types 2439 @cindex renaming, integral types
1986 are annoying. More has been written on this elsewhere. 2463 are annoying. More has been written on this elsewhere.
1987 2464
1988 @item 2465 @item
1989 All such quantity types just mentioned boil down to EMACS_INT, which is 2466 All such quantity types just mentioned boil down to EMACS_INT, which is
1990 32 bits on 32-bit machines and 64 bits on 64-bit machines. This is 2467 32 bits on 32-bit machines and 64 bits on 64-bit machines. This is
1991 guaranteed to be the same size as Lisp objects of type `int', and (as 2468 guaranteed to be the same size as Lisp objects of type @code{int}, and (as
1992 far as I can tell) of size_t (unsigned!) and ssize_t. The only type 2469 far as I can tell) of size_t (unsigned!) and ssize_t. The only type
1993 below that is not an EMACS_INT is Hashcode, which is an unsigned value 2470 below that is not an EMACS_INT is Hashcode, which is an unsigned value
1994 of the same size as EMACS_INT. 2471 of the same size as EMACS_INT.
1995 2472
1996 @item 2473 @item
2068 things, particularly relating to the duplicate definitions of 2545 things, particularly relating to the duplicate definitions of
2069 types, now that some types merged with others. Specifically: 2546 types, now that some types merged with others. Specifically:
2070 2547
2071 @enumerate 2548 @enumerate
2072 @item 2549 @item
2073 in lisp.h, removed duplicate declarations of Bytecount. The changed 2550 in @file{lisp.h}, removed duplicate declarations of Bytecount. The changed
2074 code should now look like this: (In each code snippet below, the first 2551 code should now look like this: (In each code snippet below, the first
2075 and last lines are the same as the original, as are all lines outside of 2552 and last lines are the same as the original, as are all lines outside of
2076 those lines. That allows you to locate the section to be replaced, and 2553 those lines. That allows you to locate the section to be replaced, and
2077 replace the stuff in that section, verifying that there isn't anything 2554 replace the stuff in that section, verifying that there isn't anything
2078 new added that would need to be kept.) 2555 new added that would need to be kept.)
2092 /* ------------------------ dynamic arrays ------------------- */ 2569 /* ------------------------ dynamic arrays ------------------- */
2093 --------------------------------- snip ------------------------------------- 2570 --------------------------------- snip -------------------------------------
2094 @end example 2571 @end example
2095 2572
2096 @item 2573 @item
2097 in lstream.h, removed duplicate declaration of Bytecount. Rewrote the 2574 in @file{lstream.h}, removed duplicate declaration of Bytecount. Rewrote the
2098 comment about this type. The changed code should now look like this: 2575 comment about this type. The changed code should now look like this:
2099 2576
2100 @example 2577 @example
2101 --------------------------------- snip ------------------------------------- 2578 --------------------------------- snip -------------------------------------
2102 #endif 2579 #endif
2103 2580
2104 /* There have been some arguments over what the type should be that 2581 /* There have been some arguments over what the type should be that
2105 specifies a count of bytes in a data block to be written out or read in, 2582 specifies a count of bytes in a data block to be written out or read in,
2106 using Lstream_read(), Lstream_write(), and related functions. 2583 using @code{Lstream_read()}, @code{Lstream_write()}, and related functions.
2107 Originally it was long, which worked fine; Martin "corrected" these to 2584 Originally it was long, which worked fine; Martin "corrected" these to
2108 size_t and ssize_t on the grounds that this is theoretically cleaner and 2585 size_t and ssize_t on the grounds that this is theoretically cleaner and
2109 is in keeping with the C standards. Unfortunately, this practice is 2586 is in keeping with the C standards. Unfortunately, this practice is
2110 horribly error-prone due to design flaws in the way that mixed 2587 horribly error-prone due to design flaws in the way that mixed
2111 signed/unsigned arithmetic happens. In fact, by doing this change, 2588 signed/unsigned arithmetic happens. In fact, by doing this change,
2119 Some earlier comments about why the type must be signed: This MUST BE 2596 Some earlier comments about why the type must be signed: This MUST BE
2120 SIGNED, since it also is used in functions that return the number of 2597 SIGNED, since it also is used in functions that return the number of
2121 bytes actually read to or written from in an operation, and these 2598 bytes actually read to or written from in an operation, and these
2122 functions can return -1 to signal error. 2599 functions can return -1 to signal error.
2123 2600
2124 Note that the standard Unix read() and write() functions define the 2601 Note that the standard Unix @code{read()} and @code{write()} functions define the
2125 count going in as a size_t, which is UNSIGNED, and the count going 2602 count going in as a size_t, which is UNSIGNED, and the count going
2126 out as an ssize_t, which is SIGNED. This is a horrible design 2603 out as an ssize_t, which is SIGNED. This is a horrible design
2127 flaw. Not only is it highly likely to lead to logic errors when a 2604 flaw. Not only is it highly likely to lead to logic errors when a
2128 -1 gets interpreted as a large positive number, but operations are 2605 -1 gets interpreted as a large positive number, but operations are
2129 bound to fail in all sorts of horrible ways when a number in the 2606 bound to fail in all sorts of horrible ways when a number in the
2138 typedef enum lstream_buffering 2615 typedef enum lstream_buffering
2139 --------------------------------- snip ------------------------------------- 2616 --------------------------------- snip -------------------------------------
2140 @end example 2617 @end example
2141 2618
2142 @item 2619 @item
2143 in dumper.c, there are four places, all inside of switch() statements, 2620 in @file{dumper.c}, there are four places, all inside of @code{switch()} statements,
2144 where XD_BYTECOUNT appears twice as a case tag. In each case, the two 2621 where XD_BYTECOUNT appears twice as a case tag. In each case, the two
2145 case blocks contain identical code, and you should *REMOVE THE SECOND* 2622 case blocks contain identical code, and you should *REMOVE THE SECOND*
2146 and leave the first. 2623 and leave the first.
2147 @end enumerate 2624 @end enumerate
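The signed/unsigned hazard described in the @file{lstream.h} comment above can be reproduced in a few lines of plain C. This is only a sketch and not XEmacs code: @code{Bytecount_demo}, @code{fake_read} and the two checking functions are hypothetical names invented for the illustration, and @code{long} is assumed as the signed count type.

```c
#include <stddef.h>

/* Hypothetical stand-in for a signed byte-count type. */
typedef long Bytecount_demo;

/* Pretend reader: returns the number of bytes "read", or -1 on error. */
static Bytecount_demo
fake_read (int fail, Bytecount_demo wanted)
{
  return fail ? -1 : wanted;
}

/* Buggy check: RESULT is converted to size_t before the comparison
   (written out explicitly here; mixed signed/unsigned arithmetic does
   the same thing implicitly), so -1 becomes a huge positive value and
   looks like a completely filled buffer. */
static int
read_filled_buffer_unsigned (Bytecount_demo result, size_t bufsize)
{
  return (size_t) result >= bufsize;
}

/* Safer check: keep everything signed and test for the error first. */
static int
read_filled_buffer_signed (Bytecount_demo result, Bytecount_demo bufsize)
{
  return result >= 0 && result >= bufsize;
}
```

With a failed read, the unsigned check reports success while the signed check reports failure, which is the kind of logic error the comment warns about.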
2148 2625
2149 @node Text/Char Type Renaming 2626 @node Text/Char Type Renaming, , Great Integral Type Renaming, Major Textual Changes
2150 @section Text/Char Type Renaming 2627 @section Text/Char Type Renaming
2151 @cindex Text/Char Type Renaming 2628 @cindex Text/Char Type Renaming
2152 @cindex type renaming, text/char 2629 @cindex type renaming, text/char
2153 @cindex renaming, text/char types 2630 @cindex renaming, text/char types
2154 2631
2209 present. You can probably do the same if you don't have a separate 2686 present. You can probably do the same if you don't have a separate
2210 workspace, but do have lots of outstanding changes and you'd rather not 2687 workspace, but do have lots of outstanding changes and you'd rather not
2211 just merge all the textual changes directly. Use something like this: 2688 just merge all the textual changes directly. Use something like this:
2212 2689
2213 (WARNING: I'm not a CVS guru; before trying this, or any large operation 2690 (WARNING: I'm not a CVS guru; before trying this, or any large operation
2214 that might potentially mess things up, *DEFINITELY* make a backup of 2691 that might potentially mess things up, @strong{DEFINITELY} make a backup of
2215 your existing workspace.) 2692 your existing workspace.)
2216 2693
2217 @example 2694 @example
2218 cup -r pre-internal-format-textual-renaming 2695 cup -r pre-internal-format-textual-renaming
2219 <apply script> 2696 <apply script>
2235 @example 2712 @example
2236 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" 2713 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
2237 2714
2238 # Evidently Perl considers _ to be a word char ala \b, even though XEmacs 2715 # Evidently Perl considers _ to be a word char ala \b, even though XEmacs
2239 # doesn't. We need to be careful here with ibyte/ichar because of words 2716 # doesn't. We need to be careful here with ibyte/ichar because of words
2240 # like Richard, eicharlen(), multibyte, HIBYTE, etc. 2717 # like Richard, @code{eicharlen()}, multibyte, HIBYTE, etc.
2241 2718
2242 gr Ibyte Intbyte $files 2719 gr Ibyte Intbyte $files
2243 gr '\bIBYTE' INTBYTE $files 2720 gr '\bIBYTE' INTBYTE $files
2244 gr '\bibyte' intbyte $files 2721 gr '\bibyte' intbyte $files
2245 gr '\bICHAR' EMCHAR $files 2722 gr '\bICHAR' EMCHAR $files
2275 of the utmost importance that you follow them. If you don't, you may 2752 of the utmost importance that you follow them. If you don't, you may
2276 get something that appears to work, but which will crash in odd 2753 get something that appears to work, but which will crash in odd
2277 situations, often in code far away from where the actual breakage is. 2754 situations, often in code far away from where the actual breakage is.
2278 2755
2279 @menu 2756 @menu
2280 * A Reader's Guide to XEmacs Coding Conventions:: 2757 * A Reader's Guide to XEmacs Coding Conventions::
2281 * General Coding Rules:: 2758 * General Coding Rules::
2282 * Object-Oriented Techniques for C:: 2759 * Object-Oriented Techniques for C::
2283 * Writing Lisp Primitives:: 2760 * Writing Lisp Primitives::
2284 * Writing Good Comments:: 2761 * Writing Good Comments::
2285 * Adding Global Lisp Variables:: 2762 * Adding Global Lisp Variables::
2286 * Proper Use of Unsigned Types:: 2763 * Writing Macros::
2287 * Coding for Mule:: 2764 * Proper Use of Unsigned Types::
2288 * Techniques for XEmacs Developers:: 2765 * Techniques for XEmacs Developers::
2289 @end menu 2766 @end menu
2290 2767
2291 @node A Reader's Guide to XEmacs Coding Conventions 2768 See also @ref{Coding for Mule}.
2769
2770 @node A Reader's Guide to XEmacs Coding Conventions, General Coding Rules, Rules When Writing New C Code, Rules When Writing New C Code
2292 @section A Reader's Guide to XEmacs Coding Conventions 2771 @section A Reader's Guide to XEmacs Coding Conventions
2293 @cindex coding conventions 2772 @cindex coding conventions
2294 @cindex reader's guide 2773 @cindex reader's guide
2295 @cindex coding rules, naming 2774 @cindex coding rules, naming
2296 2775
2379 @samp{F} implement Lisp primitives. Of course all their arguments and 2858 @samp{F} implement Lisp primitives. Of course all their arguments and
2380 their return values must be Lisp_Objects. (This is hidden in the 2859 their return values must be Lisp_Objects. (This is hidden in the
2381 @code{DEFUN} macro.) 2860 @code{DEFUN} macro.)
2382 2861
2383 2862
2384 @node General Coding Rules 2863 @node General Coding Rules, Object-Oriented Techniques for C, A Reader's Guide to XEmacs Coding Conventions, Rules When Writing New C Code
2385 @section General Coding Rules 2864 @section General Coding Rules
2386 @cindex coding rules, general 2865 @cindex coding rules, general
2387 2866
2388 The C code is actually written in a dialect of C called @dfn{Clean C}, 2867 The C code is actually written in a dialect of C called @dfn{Clean C},
2389 meaning that it can be compiled, mostly warning-free, with either a C or 2868 meaning that it can be compiled, mostly warning-free, with either a C or
2390 C++ compiler. Coding in Clean C has several advantages over plain C. 2869 C++ compiler. Coding in Clean C has several advantages over plain C.
2391 C++ compilers are more nit-picking, and a number of coding errors have 2870 C++ compilers are more nit-picking, and a number of coding errors have
2392 been found by compiling with C++. The ability to use both C and C++ 2871 been found by compiling with C++. The ability to use both C and C++
2393 tools means that a greater variety of development tools are available to 2872 tools means that a greater variety of development tools are available to
2394 the developer. 2873 the developer. In addition, the ability to overload operators in C++
2874 means it is possible, for error-checking purposes, to redefine certain
2875 simple types (normally defined as aliases for simple built-in types such
2876 as @code{unsigned char} or @code{long}) as classes, strictly limiting the permissible
2877 operations and catching illegal implicit casts and such.
2395 2878
2396 Every module includes @file{<config.h>} (angle brackets so that 2879 Every module includes @file{<config.h>} (angle brackets so that
2397 @samp{--srcdir} works correctly; @file{config.h} may or may not be in 2880 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
2398 the same directory as the C sources) and @file{lisp.h}. @file{config.h} 2881 the same directory as the C sources) and @file{lisp.h}. @file{config.h}
2399 must always be included before any other header files (including 2882 must always be included before any other header files (including
2498 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of 2981 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of
2499 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and 2982 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
2500 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some 2983 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some
2501 predicate. 2984 predicate.
2502 2985
2503 @node Object-Oriented Techniques for C 2986 @node Object-Oriented Techniques for C, Writing Lisp Primitives, General Coding Rules, Rules When Writing New C Code
2504 @section Object-Oriented Techniques for C 2987 @section Object-Oriented Techniques for C
2505 @cindex coding rules, object-oriented 2988 @cindex coding rules, object-oriented
2506 @cindex object-oriented techniques 2989 @cindex object-oriented techniques
2507 2990
2508 At the lowest levels, XEmacs makes heavy use of object-oriented 2991 At the lowest levels, XEmacs makes heavy use of object-oriented
2598 @samp{some_method}, but this will also catch calls and definitions of 3081 @samp{some_method}, but this will also catch calls and definitions of
2599 that method for instances of other subtypes of @samp{<Type>}, and there 3082 that method for instances of other subtypes of @samp{<Type>}, and there
2600 may be a rather large number of them. 3083 may be a rather large number of them.
2601 3084
2602 3085
2603 @node Writing Lisp Primitives 3086 @node Writing Lisp Primitives, Writing Good Comments, Object-Oriented Techniques for C, Rules When Writing New C Code
2604 @section Writing Lisp Primitives 3087 @section Writing Lisp Primitives
2605 @cindex writing Lisp primitives 3088 @cindex writing Lisp primitives
2606 @cindex Lisp primitives, writing 3089 @cindex Lisp primitives, writing
2607 @cindex primitives, writing Lisp 3090 @cindex primitives, writing Lisp
2608 3091
2736 The names of the C arguments will be used as the names of the arguments 3219 The names of the C arguments will be used as the names of the arguments
2737 to the Lisp primitive as displayed in its documentation, modulo the same 3220 to the Lisp primitive as displayed in its documentation, modulo the same
2738 concerns described above for @code{F...} names (in particular, 3221 concerns described above for @code{F...} names (in particular,
2739 underscores in the C arguments become dashes in the Lisp arguments). 3222 underscores in the C arguments become dashes in the Lisp arguments).
2740 3223
2741 There is one additional kludge: A trailing `_' on the C argument is 3224 There is one additional kludge: A trailing @samp{_} on the C argument is
2742 discarded when forming the Lisp argument. This allows C language 3225 discarded when forming the Lisp argument. This allows C language
2743 reserved words (like @code{default}) or global symbols (like 3226 reserved words (like @code{default}) or global symbols (like
2744 @code{dirname}) to be used as argument names without compiler warnings 3227 @code{dirname}) to be used as argument names without compiler warnings
2745 or errors. 3228 or errors.
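The naming rules just described (underscores become dashes, one trailing @samp{_} is dropped) can be sketched as a small string transformation. The function below is a hypothetical helper written only to illustrate the rule; it is not the code XEmacs itself uses to produce the documented argument names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the C argument name SRC into DST (of size
   DSTSIZE) in the form it would take as a Lisp argument name.
   Returns 0 on success, -1 if DST is too small. */
static int
c_arg_to_lisp_arg (const char *src, char *dst, size_t dstsize)
{
  size_t len = strlen (src);
  size_t i;

  /* A single trailing underscore is discarded. */
  if (len > 0 && src[len - 1] == '_')
    len--;

  if (len + 1 > dstsize)
    return -1;

  /* Remaining underscores become dashes. */
  for (i = 0; i < len; i++)
    dst[i] = (src[i] == '_') ? '-' : src[i];
  dst[len] = '\0';
  return 0;
}
```

Under this sketch, a C argument named @code{default_} maps to the Lisp argument @code{default}, and @code{coding_system} maps to @code{coding-system}.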
2746 3229
2845 3328
2846 @file{eval.c} is a very good file to look through for examples; 3329 @file{eval.c} is a very good file to look through for examples;
2847 @file{lisp.h} contains the definitions for important macros and 3330 @file{lisp.h} contains the definitions for important macros and
2848 functions. 3331 functions.
2849 3332
2850 @node Writing Good Comments 3333 @node Writing Good Comments, Adding Global Lisp Variables, Writing Lisp Primitives, Rules When Writing New C Code
2851 @section Writing Good Comments 3334 @section Writing Good Comments
2852 @cindex writing good comments 3335 @cindex writing good comments
2853 @cindex comments, writing good 3336 @cindex comments, writing good
2854 3337
2855 Comments are a lifeline for programmers trying to understand tricky 3338 Comments are a lifeline for programmers trying to understand tricky
2908 them as incorrect. 3391 them as incorrect.
2909 3392
2910 To indicate a "todo" or other problem, use four pound signs -- 3393 To indicate a "todo" or other problem, use four pound signs --
2911 i.e. @samp{####}. 3394 i.e. @samp{####}.
2912 3395
2913 @node Adding Global Lisp Variables 3396 @node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code
2914 @section Adding Global Lisp Variables 3397 @section Adding Global Lisp Variables
2915 @cindex global Lisp variables, adding 3398 @cindex global Lisp variables, adding
2916 @cindex variables, adding global Lisp 3399 @cindex variables, adding global Lisp
2917 3400
2918 Global variables whose names begin with @samp{Q} are constants whose 3401 Global variables whose names begin with @samp{Q} are constants whose
2977 garbage-collection mechanism won't know that the object in this variable 3460 garbage-collection mechanism won't know that the object in this variable
2978 is in use, and will happily collect it and reuse its storage for another 3461 is in use, and will happily collect it and reuse its storage for another
2979 Lisp object, and you will be the one who's unhappy when you can't figure 3462 Lisp object, and you will be the one who's unhappy when you can't figure
2980 out how your variable got overwritten. 3463 out how your variable got overwritten.
2981 3464
2982 @node Proper Use of Unsigned Types 3465 @node Writing Macros, Proper Use of Unsigned Types, Adding Global Lisp Variables, Rules When Writing New C Code
3466 @section Writing Macros
3467 @cindex writing macros
3468 @cindex macros, writing
3469
3470 The three golden rules of macros:
3471
3472 @enumerate
3473 @item
3474 Anything that's an lvalue can be evaluated more than once.
3475 @item
3476 Macros where anything else can be evaluated more than once should
3477 have the word "unsafe" in their name (exceptions may be made for
3478 large sets of macros that evaluate arguments of certain types more
3479 than once, e.g. struct buffer * arguments, when clearly indicated in
3480 the macro documentation). These macros are generally meant to be
3481 called only by other macros that have already stored the calling
3482 values in temporary variables.
3483 @item
3484 Nothing else can be evaluated more than once. Use inline
3485 functions, if necessary, to prevent multiple evaluation.
3486 @end enumerate
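A minimal sketch of rules 2 and 3, assuming a C99 compiler for @code{static inline}. The names @code{MAX_UNSAFE}, @code{max_int} and @code{next_value} are invented for this example and do not come from the XEmacs sources.

```c
/* Unsafe: X and Y may each be evaluated twice, so the name says so. */
#define MAX_UNSAFE(x, y) ((x) > (y) ? (x) : (y))

/* Safe alternative: an inline function evaluates each argument exactly
   once; by the time MAX_UNSAFE runs, the values are already stored in
   the local variables x and y. */
static inline int
max_int (int x, int y)
{
  return MAX_UNSAFE (x, y);
}

/* Hypothetical helper with a visible side effect: increments *COUNTER
   and returns the new value. */
static int
next_value (int *counter)
{
  return ++*counter;
}
```

Calling @code{MAX_UNSAFE (next_value (&c), 0)} runs @code{next_value} twice (once in the comparison, once to produce the result), whereas @code{max_int (next_value (&c), 0)} runs it once.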
3487
3488 NOTE: The functions and macros below are given full prototypes in their
3489 docs, even when the implementation is a macro. In such cases, passing
3490 an argument of a type other than expected will produce undefined
3491 results. Also, given that macros can do things functions can't (in
3492 particular, directly modify arguments as if they were passed by
3493 reference), the declaration syntax has been extended to include the
3494 call-by-reference syntax from C++, where an & after a type indicates
3495 that the argument is an lvalue and is passed by reference, i.e. the
3496 function can modify its value. (This is equivalent in C to passing a
3497 pointer to the argument, but without the need to explicitly worry about
3498 pointers.)
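To make the extended declaration syntax concrete, here is a small sketch. The macro @code{BUMP_COUNT} and the function @code{bump_count} are hypothetical; they only show how a macro documented with an @code{&} argument relates to an ordinary C function taking a pointer.

```c
/* Documented as:  void BUMP_COUNT (int &count);
   COUNT must be an lvalue; the macro assigns to it directly.  Since
   COUNT is an lvalue, evaluating it more than once is permitted. */
#define BUMP_COUNT(count) ((void) ((count) = (count) + 1))

/* The same operation as a real C function needs an explicit pointer. */
static void
bump_count (int *count)
{
  *count = *count + 1;
}
```

The caller writes @code{BUMP_COUNT (n)} in the first case and @code{bump_count (&n)} in the second; both leave @code{n} one larger.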
3499
3500 When to capitalize macros:
3501
3502 @itemize @bullet
3503 @item
3504 Capitalize macros doing stuff obviously impossible with (C)
3505 functions, e.g. directly modifying arguments as if they were passed by
3506 reference.
3507 @item
3508 Capitalize macros that evaluate @strong{any} argument more than once regardless
3509 of whether that's "allowed" (e.g. buffer arguments).
3510 @item
3510 Capitalize macros that directly access a field in a Lisp_Object or
3511 its equivalent underlying structure. In such cases, the macro that takes a
3512 Lisp_Object has an X prefix, and the macro that takes the underlying
3513 structure doesn't.
3515 @item
3516 Capitalize certain other basic macros relating to Lisp_Objects; e.g.
3517 FRAMEP, CHECK_FRAME, etc.
3518 @item
3519 Try to avoid capitalizing any other macros.
3520 @end itemize
3521
3522 @node Proper Use of Unsigned Types, Techniques for XEmacs Developers, Writing Macros, Rules When Writing New C Code
2983 @section Proper Use of Unsigned Types 3523 @section Proper Use of Unsigned Types
2984 @cindex unsigned types, proper use of 3524 @cindex unsigned types, proper use of
2985 @cindex types, proper use of unsigned 3525 @cindex types, proper use of unsigned
2986 3526
2987 Avoid using @code{unsigned int} and @code{unsigned long} whenever 3527 Avoid using @code{unsigned int} and @code{unsigned long} whenever
3008 @end enumerate 3548 @end enumerate
3009 3549
3010 Other reasonable uses of @code{unsigned int} and @code{unsigned long} 3550 Other reasonable uses of @code{unsigned int} and @code{unsigned long}
3011 are representing non-quantities -- e.g. bit-oriented flags and such. 3551 are representing non-quantities -- e.g. bit-oriented flags and such.
3012 3552
3013 @node Coding for Mule 3553 @node Techniques for XEmacs Developers, , Proper Use of Unsigned Types, Rules When Writing New C Code
3014 @section Coding for Mule
3015 @cindex coding for Mule
3016 @cindex Mule, coding for
3017
3018 Although Mule support is not compiled by default in XEmacs, many people
3019 are using it, and we consider it crucial that new code works correctly
3020 with multibyte characters. This is not hard; it is only a matter of
3021 following several simple user-interface guidelines. Even if you never
3022 compile with Mule, with a little practice you will find it quite easy
3023 to code Mule-correctly.
3024
3025 Note that these guidelines are not necessarily tied to the current Mule
3026 implementation; they are also a good idea to follow on the grounds of
3027 code generalization for future I18N work.
3028
3029 @menu
3030 * Character-Related Data Types::
3031 * Working With Character and Byte Positions::
3032 * Conversion to and from External Data::
3033 * General Guidelines for Writing Mule-Aware Code::
3034 * An Example of Mule-Aware Code::
3035 * Mule-izing Code::
3036 @end menu
3037
3038 @node Character-Related Data Types
3039 @subsection Character-Related Data Types
3040 @cindex character-related data types
3041 @cindex data types, character-related
3042
3043 First, let's review the basic character-related datatypes used by
3044 XEmacs. Note that some of the separate @code{typedef}s are not
3045 mandatory, but they improve clarity of code a great deal, because one
3046 glance at the declaration can tell the intended use of the variable.
3047
3048 @table @code
3049 @item Ichar
3050 @cindex Ichar
3051 An @code{Ichar} holds a single Emacs character.
3052
3053 Obviously, the equality between characters and bytes is lost in the Mule
3054 world. Characters can be represented by one or more bytes in the
3055 buffer, and @code{Ichar} is a C type large enough to hold any
3056 character. (This currently isn't quite true for ISO 10646, which
3057 defines a character as a 31-bit non-negative quantity, while XEmacs
3058 characters are only 30 bits. This is irrelevant, unless you are
3059 considering using the ISO 10646 private groups to support really large
3060 private character sets---in particular, the Mule character set!---in
3061 a version of XEmacs using Unicode internally.)
3062
3063 Without Mule support, an @code{Ichar} is equivalent to an
3064 @code{unsigned char}. [[This doesn't seem to be true; @file{lisp.h}
3065 unconditionally @samp{typedef}s @code{Ichar} to @code{int}.]]
3066
3067 @item Ibyte
3068 @cindex Ibyte
3069 The data representing the text in a buffer or string is logically a set
3070 of @code{Ibyte}s.
3071
3072 XEmacs does not work with the same character formats all the time; when
3073 reading characters from the outside, it decodes them to an internal
3074 format, and likewise encodes them when writing. @code{Ibyte} (in fact
3075 @code{unsigned char}) is the basic unit of the XEmacs internal buffer and
3076 string format. An @code{Ibyte *} is the type that points at text
3077 encoded in the variable-width internal encoding.
3078
3079 One character can correspond to one or more @code{Ibyte}s. In the
3080 current Mule implementation, an ASCII character is represented by a single
3081 @code{Ibyte} of the same value, and other characters are represented by a sequence
3082 of two or more @code{Ibyte}s. (This will also be true of an
3083 implementation using UTF-8 as the internal encoding. In fact, only code
3084 that implements character code conversions and a very few macros used to
3085 implement motion by whole characters will notice the difference between
3086 UTF-8 and the Mule encoding.)
3087
3088 Without Mule support, there are exactly 256 characters, implicitly
3089 Latin-1, and each character is represented using one @code{Ibyte}, and
3090 there is a one-to-one correspondence between @code{Ibyte}s and
3091 @code{Ichar}s.
3092
3093 @item Charxpos
3094 @itemx Charbpos
3095 @itemx Charcount
3096 @cindex Charxpos
3097 @cindex Charbpos
3098 @cindex Charcount
3099 A @code{Charbpos} represents a character position in a buffer. A
3100 @code{Charcount} represents a number (count) of characters. Logically,
3101 subtracting two @code{Charbpos} values yields a @code{Charcount} value.
3102 When representing a character position in a string, we just use
3103 @code{Charcount} directly. The reason for having a separate typedef for
3104 buffer positions is that they are 1-based, whereas string positions are
3105 0-based and hence string counts and positions can be freely intermixed (a
3106 string position is equivalent to the count of characters from the
3107 beginning). When representing a character position that could be either
3108 in a buffer or string (for example, in the extent code), @code{Charxpos}
3109 is used. Although all of these are @code{typedef}ed to
3110 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
3111 it clear what sort of position is being used.
3112
3113 @code{Charxpos}, @code{Charbpos} and @code{Charcount} values are the
3114 only ones that are ever visible to Lisp.
3115
3116 @item Bytexpos
3117 @itemx Bytecount
3118 @cindex Bytebpos
3119 @cindex Bytecount
3120 A @code{Bytebpos} represents a byte position in a buffer. A
3121 @code{Bytecount} represents the distance between two positions, in
3122 bytes. Byte positions in strings use @code{Bytecount}, and for byte
3123 positions that can be either in a buffer or string, @code{Bytexpos} is
3124 used. The relationship between @code{Bytexpos}, @code{Bytebpos} and
3125 @code{Bytecount} is the same as the relationship between
3126 @code{Charxpos}, @code{Charbpos} and @code{Charcount}.
3127
3128 @item Extbyte
3129 @cindex Extbyte
3130 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
3131 which are equivalent to @code{char}. The distance between two
3132 @code{Extbyte}s is a @code{Bytecount}, since external text is a
3133 byte-by-byte encoding. Extbytes occur mainly at the transition point
3134 between internal text and external functions. XEmacs code should not,
3135 if it can possibly avoid it, do any actual manipulation using external
3136 text, since its format is completely unpredictable (it might not even be
3137 ASCII-compatible).
3138 @end table
3139
3140 @node Working With Character and Byte Positions
3141 @subsection Working With Character and Byte Positions
3142 @cindex character and byte positions, working with
3143 @cindex byte positions, working with character and
3144 @cindex positions, working with character and byte
3145
3146 Now that we have defined the basic character-related types, we can look
3147 at the macros and functions designed for work with them and for
3148 conversion between them. Most of these macros are defined in
3149 @file{buffer.h}, and we don't discuss all of them here, but only the
3150 most important ones. Examining the existing code is the best way to
3151 learn about them.
3152
3153 @table @code
3154 @item MAX_ICHAR_LEN
3155 @cindex MAX_ICHAR_LEN
3156 This preprocessor constant is the maximum number of bytes needed to
3157 represent an Emacs character in the variable-width internal encoding.
3158 It is useful when allocating temporary strings to hold a known number of
3159 characters. For instance:
3160
3161 @example
3162 @group
3163 @{
3164   Charcount cclen;
3165   ...
3166   @{
3167     /* Allocate place for @var{cclen} characters. */
3168     Ibyte *buf = (Ibyte *) alloca (cclen * MAX_ICHAR_LEN);
3169 ...
3170 @end group
3171 @end example
3172
3173 If you followed the previous section, you can guess that, logically,
3174 multiplying a @code{Charcount} value with @code{MAX_ICHAR_LEN} produces
3175 a @code{Bytecount} value.
3176
3177 In the current Mule implementation, @code{MAX_ICHAR_LEN} equals 4.
3178 Without Mule, it is 1. In a mature Unicode-based XEmacs, it will also
3179 be 4 (since all Unicode characters can be encoded in UTF-8 in 4 bytes or
3180 less), but some versions may use up to 6, in order to use the large
3181 private space provided by ISO 10646 to ``mirror'' the Mule code space.
3182
3183 @item itext_ichar
3184 @itemx set_itext_ichar
3185 @cindex itext_ichar
3186 @cindex set_itext_ichar
3187 The @code{itext_ichar} macro takes an @code{Ibyte} pointer and
3188 returns the @code{Ichar} stored at that position. If it were a
3189 function, its prototype would be:
3190
3191 @example
3192 Ichar itext_ichar (Ibyte *p);
3193 @end example
3194
3195 @code{set_itext_ichar} stores an @code{Ichar} to the specified byte
3196 position. It returns the number of bytes stored:
3197
3198 @example
3199 Bytecount set_itext_ichar (Ibyte *p, Ichar c);
3200 @end example
3201
3202 It is important to note that @code{set_itext_ichar} is safe only for
3203 appending a character at the end of a buffer, not for overwriting a
3204 character in the middle. This is because the width of characters
3205 varies, and @code{set_itext_ichar} cannot resize the string if it
3206 writes, say, a two-byte character where a single-byte character used to
3207 reside.
3208
3209 A typical use of @code{set_itext_ichar} can be demonstrated by this
3210 example, which copies characters from buffer @var{buf} to a temporary
3211 string of Ibytes.
3212
3213 @example
3214 @group
3215 @{
3216   Charbpos pos;
3217   for (pos = beg; pos < end; pos++)
3218     @{
3219       Ichar c = BUF_FETCH_CHAR (buf, pos);
3220       p += set_itext_ichar (p, c);
3221     @}
3222 @}
3223 @end group
3224 @end example
3225
3226 Note how @code{set_itext_ichar} is used to store the @code{Ichar}
3227 and advance the pointer @code{p} at the same time.
3228
3229 @item INC_IBYTEPTR
3230 @itemx DEC_IBYTEPTR
3231 @cindex INC_IBYTEPTR
3232 @cindex DEC_IBYTEPTR
3233 These two macros increment and decrement an @code{Ibyte} pointer,
3234 respectively. They will adjust the pointer by the appropriate number of
3235 bytes according to the byte length of the character stored there. Both
3236 macros assume that the memory address is located at the beginning of a
3237 valid character.
3238
3239 Without Mule support, @code{INC_IBYTEPTR (p)} and @code{DEC_IBYTEPTR (p)}
3240 simply expand to @code{p++} and @code{p--}, respectively.
3241
3242 @item bytecount_to_charcount
3243 @cindex bytecount_to_charcount
3244 Given a pointer to a text string and a length in bytes, return the
3245 equivalent length in characters.
3246
3247 @example
3248 Charcount bytecount_to_charcount (Ibyte *p, Bytecount bc);
3249 @end example
3250
3251 @item charcount_to_bytecount
3252 @cindex charcount_to_bytecount
3253 Given a pointer to a text string and a length in characters, return the
3254 equivalent length in bytes.
3255
3256 @example
3257 Bytecount charcount_to_bytecount (Ibyte *p, Charcount cc);
3258 @end example
3259
3260 @item itext_n_addr
3261 @cindex itext_n_addr
3262 Return a pointer to the beginning of the character offset @var{cc} (in
3263 characters) from @var{p}.
3264
3265 @example
3266 Ibyte *itext_n_addr (Ibyte *p, Charcount cc);
3267 @end example
3268 @end table
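
The following standalone sketch shows how such primitives behave, under the assumption that UTF-8 is the internal encoding (the text above notes that a UTF-8 implementation would look the same to most callers). All @code{demo_} names and the @code{Demo} types are hypothetical stand-ins, not the XEmacs definitions; the code assumes valid text and does no error checking beyond an assertion on the character range.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char DemoIbyte;   /* stand-in for Ibyte */
typedef int DemoIchar;             /* stand-in for Ichar */
#define DEMO_MAX_ICHAR_LEN 4       /* worst case for UTF-8 */

/* Like set_itext_ichar: store C at P and return the number of bytes
   written.  P must have room for DEMO_MAX_ICHAR_LEN bytes. */
static ptrdiff_t
demo_set_ichar (DemoIbyte *p, DemoIchar c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (DemoIbyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (DemoIbyte) (0xC0 | (c >> 6));
      p[1] = (DemoIbyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (DemoIbyte) (0xE0 | (c >> 12));
      p[1] = (DemoIbyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (DemoIbyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (DemoIbyte) (0xF0 | (c >> 18));
  p[1] = (DemoIbyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (DemoIbyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (DemoIbyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Like bytecount_to_charcount: every byte that is not a continuation
   byte starts a new character. */
static ptrdiff_t
demo_bytecount_to_charcount (const DemoIbyte *p, ptrdiff_t bc)
{
  ptrdiff_t cc = 0;
  ptrdiff_t i;
  for (i = 0; i < bc; i++)
    if ((p[i] & 0xC0) != 0x80)
      cc++;
  return cc;
}

/* Like charcount_to_bytecount: step over CC characters, using the
   lead byte to find each character's length. */
static ptrdiff_t
demo_charcount_to_bytecount (const DemoIbyte *p, ptrdiff_t cc)
{
  const DemoIbyte *q = p;
  while (cc-- > 0)
    {
      if (*q < 0x80)
        q += 1;
      else if (*q < 0xE0)
        q += 2;
      else if (*q < 0xF0)
        q += 3;
      else
        q += 4;
    }
  return q - p;
}

/* Append three characters to BUF (room for 3 * DEMO_MAX_ICHAR_LEN
   bytes) and return the number of bytes used. */
static ptrdiff_t
demo_fill (DemoIbyte *buf)
{
  DemoIbyte *p = buf;
  p += demo_set_ichar (p, 'a');      /* 1 byte */
  p += demo_set_ichar (p, 0xE9);     /* 2 bytes */
  p += demo_set_ichar (p, 0x20AC);   /* 3 bytes */
  return p - buf;
}
```

Note that the buffer is sized from the character count times the worst-case length, while the value returned by @code{demo_fill} is the number of bytes actually used.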
3269
3270 @node Conversion to and from External Data
3271 @subsection Conversion to and from External Data
3272 @cindex conversion to and from external data
3273 @cindex external data, conversion to and from
3274
3275 When an external function, such as a C library function, returns a
3276 @code{char} pointer, you should almost never treat it as @code{Ibyte}.
3277 This is because these returned strings may contain 8bit characters which
3278 can be misinterpreted by XEmacs, and cause a crash. Likewise, when
3279 exporting a piece of internal text to the outside world, you should
3280 always convert it to an appropriate external encoding, lest the internal
3281 stuff (such as the infamous \201 characters) leak out.
3282
3283 Conversion between the internal and external
3284 representations of text is done with the numerous conversion macros defined in
3285 @file{buffer.h}. There used to be a fixed set of external formats
3286 supported by these macros, but now any coding system can be used with
3287 them. The coding system alias mechanism is used to create the
3288 following logical coding systems, which replace the fixed external
3289 formats. The (dontusethis-set-symbol-value-handler) mechanism was
3290 enhanced to make this possible (more work on that is needed).
3291
3292 Often useful coding systems:
3293
3294 @table @code
3295 @item Qbinary
3296 This is the simplest format and is what we use in the absence of a more
3297 appropriate format. This converts according to the @code{binary} coding
3298 system:
3299
3300 @enumerate a
3301 @item
3302 On input, bytes 0--255 are converted into (implicitly Latin-1)
3303 characters 0--255. A non-Mule xemacs doesn't really know about
3304 different character sets and the fonts to display them, so the bytes can
3305 be treated as text in different 1-byte encodings by simply setting the
3306 appropriate fonts. So in a sense, non-Mule xemacs is a multi-lingual
3307 editor if, for example, different fonts are used to display text in
3308 different buffers, faces, or windows. The specifier mechanism gives the
3309 user complete control over this kind of behavior.
3310 @item
3311 On output, characters 0--255 are converted into bytes 0--255 and other
3312 characters are converted into `~'.
3313 @end enumerate
3314
3315 @item Qnative
3316 Format used for the external Unix environment---@code{argv[]}, stuff
3317 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
3318 This is encoded according to the encoding specified by the current locale.
3319 [[This is dangerous; current locale is user preference, and the system
3320 is probably going to be something else. Is there anything we can do
3321 about it?]]
3322
3323 @item Qfile_name
3324 Format used for filenames. This is normally the same as @code{Qnative},
3325 but the two should be distinguished for clarity and possible future
3326 separation -- and also because @code{Qfile_name} can be changed using either
3327 the @code{file-name-coding-system} or @code{pathname-coding-system} (now
3328 obsolete) variables.
3329
3330 @item Qctext
3331 Compound-text format. This is the standard X11 format used for data
3332 stored in properties, selections, and the like. This is an 8-bit
3333 no-lock-shift ISO2022 coding system. This is a real coding system,
3334 unlike @code{Qfile_name}, which is user-definable.
3335
3336 @item Qmswindows_tstr
3337 Used for external data in all MS Windows functions that are declared to
3338 accept data of type @code{LPTSTR} or @code{LPCTSTR}. This maps to either
3339 @code{Qmswindows_multibyte} (a locale-specific encoding, same as
3340 @code{Qnative}) or @code{Qmswindows_unicode}, depending on whether
3341 XEmacs is being run under Windows 9X or Windows NT/2000/XP.
3342 @end table
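
The behavior described for @code{Qbinary} can be sketched in standalone C as follows. This is only an illustration of the mapping, not the XEmacs coding-system machinery: internal text is modelled as a plain array of @code{int} character codes, and @code{demo_binary_decode} and @code{demo_binary_encode} are hypothetical names. Both functions return @code{malloc()}ed memory that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>

/* Input direction: each external byte 0..255 becomes the character
   with the same code.  Returns NULL if allocation fails. */
static int *
demo_binary_decode (const unsigned char *ext, size_t len)
{
  int *chars;
  size_t i;

  assert (ext != NULL || len == 0);
  chars = malloc ((len ? len : 1) * sizeof (int));
  if (chars == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    chars[i] = ext[i];
  return chars;   /* caller frees */
}

/* Output direction: characters 0..255 become bytes 0..255, and every
   other character becomes '~'.  Returns NULL if allocation fails. */
static unsigned char *
demo_binary_encode (const int *chars, size_t len)
{
  unsigned char *ext;
  size_t i;

  assert (chars != NULL || len == 0);
  ext = malloc (len ? len : 1);
  if (ext == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    ext[i] = (chars[i] >= 0 && chars[i] <= 255)
      ? (unsigned char) chars[i] : (unsigned char) '~';
  return ext;     /* caller frees */
}
```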
3343
3344 Many other coding systems are provided by default.
3345
3346 There are two fundamental macros to convert between external and
3347 internal format, as well as various convenience macros to simplify the
3348 most common operations.
3349
3350 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
3351 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
3352 each of these receives are a source type, a source, a sink type, a sink,
3353 and a coding system (or a symbol naming a coding system).
3354
3355 A typical call looks like
3356 @example
3357 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
3358 @end example
3359
3360 which means that the contents of the lisp string @code{str} are written
3361 to a malloc'ed memory area which will be pointed to by @code{ptr}, after
3362 the function returns. The conversion will be done using the
3363 @code{file-name} coding system, which will be controlled by the user
3364 indirectly by setting or binding the variable
3365 @code{file-name-coding-system}.
3366
3367 Some sources and sinks require two C variables to specify. We use some
3368 preprocessor magic to allow different source and sink types, and even
3369 different numbers of arguments to specify different types of sources and
3370 sinks.
3371
3372 So we can have a call that looks like
3373 @example
3374 TO_INTERNAL_FORMAT (DATA, (ptr, len),
3375 MALLOC, (ptr, len),
3376 coding_system);
3377 @end example
3378
3379 The parenthesized argument pairs are required to make the preprocessor
3380 magic work.
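
One way such preprocessor dispatch can work is sketched below. This is an assumption about the general technique (token pasting on the type name), not a copy of the actual XEmacs macros; @code{DEMO_SOURCE}, @code{DemoSource} and the helper functions are hypothetical names. The parentheses around a two-variable specification keep its comma from splitting the macro argument, and the inner macro then uses the parenthesized pair directly as a function argument list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
  const char *ptr;
  size_t len;
} DemoSource;

/* Source described by an explicit pointer and length. */
static DemoSource
demo_source_data (const char *ptr, size_t len)
{
  DemoSource s;
  assert (ptr != NULL);
  s.ptr = ptr;
  s.len = len;
  return s;
}

/* Source described by a null-terminated string. */
static DemoSource
demo_source_c_string (const char *ptr)
{
  assert (ptr != NULL);
  return demo_source_data (ptr, strlen (ptr));
}

/* SPEC is a parenthesized pair "(ptr, len)": pasting it after the
   function name turns it into the argument list. */
#define DEMO_SOURCE_DATA(spec) demo_source_data spec

/* SPEC is a single expression. */
#define DEMO_SOURCE_C_STRING(spec) demo_source_c_string (spec)

/* Dispatch on the type token by pasting it onto a common prefix. */
#define DEMO_SOURCE(type, spec) DEMO_SOURCE_##type (spec)
```

With these definitions, @code{DEMO_SOURCE (DATA, (ptr, len))} and @code{DEMO_SOURCE (C_STRING, ptr)} both expand to ordinary function calls, even though they take different numbers of C variables.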
3381
3382 Here are the different source and sink types:
3383
3384 @table @code
3385 @item @code{DATA, (ptr, len),}
3386 input data is a fixed buffer of size @var{len} at address @var{ptr}
3387 @item @code{ALLOCA, (ptr, len),}
3388 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr}
3389 @item @code{MALLOC, (ptr, len),}
3390 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr}
3391 @item @code{C_STRING_ALLOCA, ptr,}
3392 equivalent to @code{ALLOCA (ptr, len_ignored)} on output.
3393 @item @code{C_STRING_MALLOC, ptr,}
3394 equivalent to @code{MALLOC (ptr, len_ignored)} on output
3395 @item @code{C_STRING, ptr,}
3396 equivalent to @code{DATA, (ptr, strlen/wcslen (ptr))} on input
3397 @item @code{LISP_STRING, string,}
3398 input or output is a Lisp_Object of type string
3399 @item @code{LISP_BUFFER, buffer,}
3400 output is written to @code{(point)} in lisp buffer @var{buffer}
3401 @item @code{LISP_LSTREAM, lstream,}
3402 input or output is a Lisp_Object of type lstream
3403 @item @code{LISP_OPAQUE, object,}
3404 input or output is a Lisp_Object of type opaque
3405 @end table
3406
3407 A source type of @code{C_STRING} or a sink type of
3408 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate where
3409 the external API is not '\0'-byte-clean -- i.e. it expects strings to be
3410 terminated with a null byte. For external API's that are in fact
3411 '\0'-byte-clean, we should of course not use these.
3412
3413 The sinks to be specified must be lvalues, unless they are the lisp
3414 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
3415
3416 There is no problem using the same lvalue for source and sink.
3417
3418 Garbage collection is inhibited during these conversion operations, so
3419 it is OK to pass in data from Lisp strings using @code{XSTRING_DATA}.
3420
3421 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
3422 resulting text is stored in a stack-allocated buffer, which is
3423 automatically freed on returning from the function. However, the sink
3424 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
3425 memory. The caller is responsible for freeing this memory using
3426 @code{xfree()}.
3427
3428 Note that it doesn't make sense for @code{LISP_STRING} to be a source
3429 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
3430 You'll get an assertion failure if you try.
3431
3432 99% of conversions involve raw data or Lisp strings as both source and
3433 sink, and the output usually goes into @code{alloca()}ed, or sometimes
3434 @code{xmalloc()}ed, memory. For this reason, convenience macros are defined for
3435 many types of conversions involving raw data and/or Lisp strings,
3436 especially when the output is an @code{alloca()}ed string. (When the
3437 destination is a Lisp string, there are other functions that should be
3438 used instead -- @code{build_ext_string()} and @code{make_ext_string()},
3439 for example.) The convenience macros are of two types -- the older kind
3440 that store the result into a specified variable, and the newer kind that
3441 return the result. The newer kind of macros don't exist when the output
3442 is sized data, because that would have two return values. NOTE: All
3443 convenience macros are ultimately defined in terms of
3444 @code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}. Thus, any
3445 comments above about the workings of these macros also apply to all
3446 convenience macros.
3447
3448 A typical old-style convenience macro is
3449
3450 @example
3451 C_STRING_TO_EXTERNAL (in, out, codesys);
3452 @end example
3453
3454 This is equivalent to
3455
3456 @example
3457 TO_EXTERNAL_FORMAT (C_STRING, in, C_STRING_ALLOCA, out, codesys);
3458 @end example
3459
3460 but is easier to write and somewhat clearer, since it clearly identifies
3461 the arguments without the clutter of having the preprocessor types mixed
3462 in.
3463
3464 The new-style equivalent is @code{NEW_C_STRING_TO_EXTERNAL (src,
3465 codesys)}, which @emph{returns} the converted data (still in
3466 @code{alloca()} space). This is far more convenient for most
3467 operations.
3468
3469 @node General Guidelines for Writing Mule-Aware Code
3470 @subsection General Guidelines for Writing Mule-Aware Code
3471 @cindex writing Mule-aware code, general guidelines for
3472 @cindex Mule-aware code, general guidelines for writing
3473 @cindex code, general guidelines for writing Mule-aware
3474
3475 This section contains some general guidance on how to write Mule-aware
3476 code, as well as some pitfalls you should avoid.
3477
3478 @table @emph
3479 @item Never use @code{char} and @code{char *}.
3480 In XEmacs, the use of @code{char} and @code{char *} is almost always a
3481 mistake. If you want to manipulate an Emacs character from ``C'', use
3482 @code{Ichar}. If you want to examine a specific octet in the internal
3483 format, use @code{Ibyte}. If you want a Lisp-visible character, use a
3484 @code{Lisp_Object} and @code{make_char}. If you want a pointer to move
3485 through the internal text, use @code{Ibyte *}. Also note that you
3486 almost certainly do not need @code{Ichar *}. Other typedefs to clarify
3487 the use of @code{char} are @code{Char_ASCII}, @code{Char_Binary},
3488 @code{UChar_Binary}, and @code{CIbyte}.
3489
3490 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, @code{Charbpos} and @code{Bytebpos}.
3491 The whole point of using different types is to avoid confusion about the
3492 use of certain variables. Lest this effect be nullified, you need to be
3493 careful about using the right types.
3494
3495 @item Always convert external data
3496 It is extremely important to always convert external data, because
3497 XEmacs can crash if unexpected 8-bit sequences are copied to its internal
3498 buffers literally.
3499
3500 This means that when a system function, such as @code{readdir}, returns
3501 a string, you normally need to convert it using one of the conversion macros
3502 described in the previous section, before passing it further to Lisp.
3503
3504 Actually, most of the basic system functions that accept '\0'-terminated
3505 string arguments, like @code{stat()} and @code{open()}, have
3506 @strong{encapsulated} equivalents that do the internal to external
3507 conversion themselves. The encapsulated equivalents have a @code{qxe_}
3508 prefix and have string arguments of type @code{Ibyte *}, and you can
3509 pass internally encoded data to them, often from a Lisp string using
3510 @code{XSTRING_DATA}. (A better design might be to provide versions that
3511 accept Lisp strings directly.) [[Really? Then they'd either take
3512 @code{Lisp_Object}s and need to check type, or they'd take
3513 @code{Lisp_String}s, and violate the rules about passing any of the
3514 specific Lisp types.]]
3515
3516 Also note that many internal functions, such as @code{make_string},
3517 accept Ibytes, which removes the need for them to convert the data they
3518 receive. This increases efficiency because that way external data needs
3519 to be decoded only once, when it is read. After that, it is passed
3520 around in internal format.
3521
3522 @item Do all work in internal format
3523 External-formatted data is completely unpredictable in its format. It
3524 may be fixed-width Unicode (not even ASCII compatible); it may be a
3525 modal encoding, in
3526 which case some occurrences of (e.g.) the slash character may be part of
3527 two-byte Asian-language characters, and a naive attempt to split apart a
3528 pathname by slashes will fail; etc. Internal-format text should be
3529 converted to external format only at the point where an external API is
3530 actually called, and the first thing done after receiving
3531 external-format text from an external API should be to convert it to
3532 internal text.
3533 @end table
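
The pathname pitfall mentioned above can be reproduced with a small standalone sketch. It assumes the external text is in ISO-2022-JP, a modal encoding in which the escape sequence ESC $ B switches to a two-byte character set whose bytes fall in the printable ASCII range; the byte values used here (0x25 0x2F for a single katakana character) are an illustration chosen for this example, and @code{demo_count_slashes_naive} is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>

/* Naive scan: treats every 0x2F byte as a path separator, without
   regard to the encoding's current mode. */
static int
demo_count_slashes_naive (const unsigned char *ext, size_t len)
{
  int n = 0;
  size_t i;

  assert (ext != NULL);
  for (i = 0; i < len; i++)
    if (ext[i] == '/')
      n++;
  return n;
}

/* External bytes for "a/", then one two-byte character, then "b".
   ESC $ B selects the two-byte set; ESC ( B returns to ASCII.  The
   second byte of the two-byte character is 0x2F, the same value as
   an ASCII slash. */
static const unsigned char demo_iso2022jp_path[] = {
  'a', '/',
  0x1B, '$', 'B', 0x25, 0x2F,
  0x1B, '(', 'B',
  'b'
};
```

The naive scan reports two separators in a path that contains only one, which is why the text should first be converted to internal format and split there.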
3534
3535 @node An Example of Mule-Aware Code
3536 @subsection An Example of Mule-Aware Code
3537 @cindex code, an example of Mule-aware
3538 @cindex Mule-aware code, an example of
3539
3540 As an example of Mule-aware code, we will analyze the @code{string}
3541 function, which conses up a Lisp string from the character arguments it
3542 receives. Here is the definition, pasted from @file{alloc.c}:
3543
3544 @example
3545 @group
3546 DEFUN ("string", Fstring, 0, MANY, 0, /*
3547 Concatenate all the argument characters and make the result a string.
3548 */
3549        (int nargs, Lisp_Object *args))
3550 @{
3551   Ibyte *storage = alloca_array (Ibyte, nargs * MAX_ICHAR_LEN);
3552   Ibyte *p = storage;
3553
3554   for (; nargs; nargs--, args++)
3555     @{
3556       Lisp_Object lisp_char = *args;
3557       CHECK_CHAR_COERCE_INT (lisp_char);
3558       p += set_itext_ichar (p, XCHAR (lisp_char));
3559     @}
3560   return make_string (storage, p - storage);
3561 @}
3562 @end group
3563 @end example
3564
3565 Now we can analyze the source line by line.
3566
3567 The resulting string contains exactly one character per argument to the
3568 function. This is why we allocate @code{MAX_ICHAR_LEN} * @var{nargs}
3569 bytes on the stack, i.e. the worst-case number of bytes needed for @var{nargs}
3570 @code{Ichar}s to fit in the string.
3571
3572 Then, the loop checks that each element is a character, converting
3573 integers in the process. Like many other functions in XEmacs, this
3574 function silently accepts integers where characters are expected, for
3575 historical and compatibility reasons. In new code that has no such
3576 constraint, plain @code{CHECK_CHAR} will suffice. @code{XCHAR (lisp_char)}
3577 extracts the @code{Ichar} from the @code{Lisp_Object}, and
3578 @code{set_itext_ichar} stores it at @code{p}, advancing @code{p} by the
3579 number of bytes written.
3580
3581 Other instructive examples of correct coding under Mule can be found all
3582 over the XEmacs code. For starters, I recommend
3583 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
3584 understood this section of the manual and studied the examples, you can
3585 proceed writing new Mule-aware code.
3586
3587 @node Mule-izing Code
3588 @subsection Mule-izing Code
3589
3590 A lot of code is written without Mule in mind, and needs to be made
3591 Mule-correct or "Mule-ized". There is really no substitute for
3592 line-by-line analysis when doing this, but the following checklist can
3593 help:
3594
3595 @itemize @bullet
3596 @item
3597 Check all uses of @code{XSTRING_DATA}.
3598 @item
3599 Check all uses of @code{build_string} and @code{make_string}.
3600 @item
3601 Check all uses of @code{tolower} and @code{toupper}.
3602 @item
3603 Check object print methods.
3604 @item
3605 Check for use of functions such as @code{write_c_string},
3606 @code{write_fmt_string}, @code{stderr_out}, @code{stdout_out}.
3607 @item
3608 Check all occurrences of @code{char} and correct to one of the other
3609 typedefs described above.
3610 @item
3611 Check all existing uses of @code{TO_EXTERNAL_FORMAT},
3612 @code{TO_INTERNAL_FORMAT}, and any convenience macros (grep for
3613 @samp{EXTERNAL_TO}, @samp{TO_EXTERNAL}, and @samp{TO_SIZED_EXTERNAL}).
3614 @item
3615 In Windows code, string literals may need to be encapsulated with @code{XETEXT}.
3616 @end itemize
3617
3618 @node Techniques for XEmacs Developers
3619 @section Techniques for XEmacs Developers 3554 @section Techniques for XEmacs Developers
3620 @cindex techniques for XEmacs developers 3555 @cindex techniques for XEmacs developers
3621 @cindex developers, techniques for XEmacs 3556 @cindex developers, techniques for XEmacs
3622 3557
3623 @cindex Purify 3558 @cindex Purify
3711 if (!marked_p (obj)) mark_object (obj), did_mark = 1 3646 if (!marked_p (obj)) mark_object (obj), did_mark = 1
3712 @end example 3647 @end example
3713 3648
3714 This macro evaluates its argument twice, and also fails if used like this: 3649 This macro evaluates its argument twice, and also fails if used like this:
3715 @example 3650 @example
3716 if (flag) MARK_OBJECT (obj); else do_something(); 3651 if (flag) MARK_OBJECT (obj); else @code{do_something()};
3717 @end example 3652 @end example
3718 3653
3719 A much better definition is 3654 A much better definition is
3720 3655
3721 @example 3656 @example
3863 @end enumerate 3798 @end enumerate
3864 3799
3865 @node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top 3800 @node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top
3866 @chapter Regression Testing XEmacs 3801 @chapter Regression Testing XEmacs
3867 @cindex testing, regression 3802 @cindex testing, regression
3803
3804 @menu
3805 * How to Regression-Test::
3806 * Modules for Regression Testing::
3807 @end menu
3808
3809 @node How to Regression-Test, Modules for Regression Testing, Regression Testing XEmacs, Regression Testing XEmacs
3810 @section How to Regression-Test
3811 @cindex how to regression-test
3812 @cindex regression-test, how to
3813 @cindex testing, regression, how to
3868 3814
3869 The source directory @file{tests/automated} contains XEmacs' automated 3815 The source directory @file{tests/automated} contains XEmacs' automated
3870 test suite. The usual way of running all the tests is running 3816 test suite. The usual way of running all the tests is running
3871 @code{make check} from the top-level build directory. 3817 @code{make check} from the top-level build directory.
3872 3818
4084 reported as an assertion failure (the test failed in a foreseeable way), 4030 reported as an assertion failure (the test failed in a foreseeable way),
4085 rather than something else (we don't know what happened because XEmacs 4031 rather than something else (we don't know what happened because XEmacs
4086 is broken in a way that we weren't trying to test!) 4032 is broken in a way that we weren't trying to test!)
4087 @end enumerate 4033 @end enumerate
4088 4034
4089 4035 @node Modules for Regression Testing, , How to Regression-Test, Regression Testing XEmacs
4090 @node CVS Techniques, A Summary of the Various XEmacs Modules, Regression Testing XEmacs, Top 4036 @section Modules for Regression Testing
4037 @cindex modules for regression testing
4038 @cindex regression testing, modules for
4039
4040 @example
4041 @file{test-harness.el}
4042 @file{base64-tests.el}
4043 @file{byte-compiler-tests.el}
4044 @file{case-tests.el}
4045 @file{ccl-tests.el}
4046 @file{c-tests.el}
4047 @file{database-tests.el}
4048 @file{extent-tests.el}
4049 @file{hash-table-tests.el}
4050 @file{lisp-tests.el}
4051 @file{md5-tests.el}
4052 @file{mule-tests.el}
4053 @file{regexp-tests.el}
4054 @file{symbol-tests.el}
4055 @file{syntax-tests.el}
4056 @file{tag-tests.el}
4057 @file{weak-tests.el}
4058 @end example
4059
4060 @file{test-harness.el} defines the macros @code{Assert},
4061 @code{Check-Error}, @code{Check-Error-Message}, and
4062 @code{Check-Message}. The other files contain tests of various
4063 XEmacs facilities. @xref{Regression Testing XEmacs}.
4064
4065
4066 @node CVS Techniques, The Modules of XEmacs, Regression Testing XEmacs, Top
4091 @chapter CVS Techniques 4067 @chapter CVS Techniques
4092 @cindex CVS techniques 4068 @cindex CVS techniques
4093 4069
4094 @menu 4070 @menu
4095 * Merging a Branch into the Trunk:: 4071 * Merging a Branch into the Trunk::
4096 @end menu 4072 @end menu
4097 4073
4098 @node Merging a Branch into the Trunk 4074 @node Merging a Branch into the Trunk, , CVS Techniques, CVS Techniques
4099 @section Merging a Branch into the Trunk 4075 @section Merging a Branch into the Trunk
4100 @cindex merging a branch into the trunk 4076 @cindex merging a branch into the trunk
4101 4077
4102 @enumerate 4078 @enumerate
4103 @item 4079 @item
4175 crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs 4151 crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs
4176 @end example 4152 @end example
4177 @end enumerate 4153 @end enumerate
4178 4154
4179 4155
4180 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, CVS Techniques, Top 4156 @node The Modules of XEmacs, Allocation of Objects in XEmacs Lisp, CVS Techniques, Top
4181 @chapter A Summary of the Various XEmacs Modules 4157 @chapter The Modules of XEmacs
4182 @cindex modules, a summary of the various XEmacs 4158 @cindex modules of XEmacs
4183
4184 This is accurate as of XEmacs 20.0.
4185 4159
4186 @menu 4160 @menu
4187 * Low-Level Modules:: 4161 * A Summary of the Various XEmacs Modules::
4188 * Basic Lisp Modules:: 4162 * Low-Level Modules::
4189 * Modules for Standard Editing Operations:: 4163 * Basic Lisp Modules::
4190 * Editor-Level Control Flow Modules:: 4164 * Modules for Standard Editing Operations::
4191 * Modules for the Basic Displayable Lisp Objects:: 4165 * Modules for Interfacing with the File System::
4192 * Modules for other Display-Related Lisp Objects:: 4166 * Modules for Other Aspects of the Lisp Interpreter and Object System::
4193 * Modules for the Redisplay Mechanism:: 4167 * Modules for Interfacing with the Operating System::
4194 * Modules for Interfacing with the File System::
4195 * Modules for Other Aspects of the Lisp Interpreter and Object System::
4196 * Modules for Interfacing with the Operating System::
4197 * Modules for Interfacing with X Windows::
4198 * Modules for Internationalization::
4199 * Modules for Regression Testing::
4200 @end menu 4168 @end menu
4201 4169
4202 @node Low-Level Modules 4170 @node A Summary of the Various XEmacs Modules, Low-Level Modules, The Modules of XEmacs, The Modules of XEmacs
4171 @section A Summary of the Various XEmacs Modules
4172 @cindex summary of the various XEmacs modules
4173 @cindex modules, summary of the various XEmacs
4174
4175 The following list gives the sections that describe the various modules
4176 (i.e., files) that implement XEmacs. Some of these sections are in this
4177 chapter; others are attached to the chapters that discuss the modules in
4178 question.
4179
4180 @itemize @bullet
4181 @item
4182 @ref{Low-Level Modules}.
4183 @item
4184 @ref{Basic Lisp Modules}.
4185 @item
4186 @ref{Modules for Standard Editing Operations}.
4187 @item
4188 @ref{Editor-Level Control Flow Modules}.
4189 @item
4190 @ref{Modules for the Basic Displayable Lisp Objects}.
4191 @item
4192 @ref{Modules for other Display-Related Lisp Objects}.
4193 @item
4194 @ref{Modules for the Redisplay Mechanism}.
4195 @item
4196 @ref{Modules for Interfacing with the File System}.
4197 @item
4198 @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4199 @item
4200 @ref{Modules for Interfacing with the Operating System}.
4201 @item
4202 @ref{Modules for Interfacing with MS Windows}.
4203 @item
4204 @ref{Modules for Interfacing with X Windows}.
4205 @item
4206 @ref{Modules for Internationalization}.
4207 @item
4208 @ref{Modules for Regression Testing}.
4209 @end itemize
4210
4211 The following table contains cross-references from each module in XEmacs
4212 21.5 to the section (if any) describing it.
4213
4214 @multitable {@file{intl-auto-encap-win32.c}} {@ref{Modules for Other Aspects of the Lisp Interpreter and Object System}}
4215 @item @file{Emacs.ad.h} @tab @ref{Modules for Interfacing with X Windows}.
4216 @item @file{EmacsFrame.c} @tab @ref{Modules for Interfacing with X Windows}.
4217 @item @file{EmacsFrame.h} @tab @ref{Modules for Interfacing with X Windows}.
4218 @item @file{EmacsFrameP.h} @tab @ref{Modules for Interfacing with X Windows}.
4219 @item @file{EmacsManager.c} @tab @ref{Modules for Interfacing with X Windows}.
4220 @item @file{EmacsManager.h} @tab @ref{Modules for Interfacing with X Windows}.
4221 @item @file{EmacsManagerP.h} @tab @ref{Modules for Interfacing with X Windows}.
4222 @item @file{EmacsShell-sub.c} @tab @ref{Modules for Interfacing with X Windows}.
4223 @item @file{EmacsShell.c} @tab @ref{Modules for Interfacing with X Windows}.
4224 @item @file{EmacsShell.h} @tab @ref{Modules for Interfacing with X Windows}.
4225 @item @file{EmacsShellP.h} @tab @ref{Modules for Interfacing with X Windows}.
4226 @item @file{ExternalClient-Xlib.c} @tab @ref{Modules for Interfacing with X Windows}.
4227 @item @file{ExternalClient.c} @tab @ref{Modules for Interfacing with X Windows}.
4228 @item @file{ExternalClient.h} @tab @ref{Modules for Interfacing with X Windows}.
4229 @item @file{ExternalClientP.h} @tab @ref{Modules for Interfacing with X Windows}.
4230 @item @file{ExternalShell.c} @tab @ref{Modules for Interfacing with X Windows}.
4231 @item @file{ExternalShell.h} @tab @ref{Modules for Interfacing with X Windows}.
4232 @item @file{ExternalShellP.h} @tab @ref{Modules for Interfacing with X Windows}.
4233 @item @file{Makefile.in.in} @tab
4234 @item @file{abbrev.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4235 @item @file{alloc.c} @tab @ref{Basic Lisp Modules}.
4236 @item @file{alloca.c} @tab @ref{Low-Level Modules}.
4237 @item @file{alloca.s} @tab
4238 @item @file{backtrace.h} @tab @ref{Basic Lisp Modules}.
4239 @item @file{balloon-x.c} @tab
4240 @item @file{balloon_help.c} @tab
4241 @item @file{balloon_help.h} @tab
4242 @item @file{base64-tests.el} @tab @ref{Modules for Regression Testing}.
4243 @item @file{bitmaps.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4244 @item @file{blocktype.c} @tab @ref{Low-Level Modules}.
4245 @item @file{blocktype.h} @tab @ref{Low-Level Modules}.
4246 @item @file{broken-sun.h} @tab @ref{Modules for Interfacing with the Operating System}.
4247 @item @file{buffer.c} @tab @ref{Modules for Standard Editing Operations}.
4248 @item @file{buffer.h} @tab @ref{Modules for Standard Editing Operations}.
4249 @item @file{bufslots.h} @tab @ref{Modules for Standard Editing Operations}.
4250 @item @file{byte-compiler-tests.el} @tab @ref{Modules for Regression Testing}.
4251 @item @file{bytecode.c} @tab @ref{Basic Lisp Modules}.
4252 @item @file{bytecode.h} @tab @ref{Basic Lisp Modules}.
4253 @item @file{c-tests.el} @tab @ref{Modules for Regression Testing}.
4254 @item @file{callint.c} @tab @ref{Modules for Standard Editing Operations}.
4255 @item @file{case-tests.el} @tab @ref{Modules for Regression Testing}.
4256 @item @file{casefiddle.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4257 @item @file{casetab.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4258 @item @file{casetab.h} @tab
4259 @item @file{ccl-tests.el} @tab @ref{Modules for Regression Testing}.
4260 @item @file{charset.h} @tab
4261 @item @file{chartab.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4262 @item @file{chartab.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4263 @item @file{cm.c} @tab @ref{Modules for the Redisplay Mechanism}.
4264 @item @file{cm.h} @tab @ref{Modules for the Redisplay Mechanism}.
4265 @item @file{cmdloop.c} @tab @ref{Editor-Level Control Flow Modules}.
4266 @item @file{cmds.c} @tab @ref{Modules for Standard Editing Operations}.
4267 @item @file{coding-system-slots.h} @tab
4268 @item @file{commands.h} @tab @ref{Modules for Standard Editing Operations}.
4269 @item @file{compiler.h} @tab
4270 @item @file{config.h.in} @tab
4271 @item @file{config.h} @tab @ref{Low-Level Modules}.
4272 @item @file{conslots.h} @tab
4273 @item @file{console-gtk-impl.h} @tab
4274 @item @file{console-gtk.c} @tab
4275 @item @file{console-gtk.h} @tab
4276 @item @file{console-impl.h} @tab
4277 @item @file{console-msw-impl.h} @tab
4278 @item @file{console-msw.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4279 @item @file{console-msw.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4280 @item @file{console-stream-impl.h} @tab
4281 @item @file{console-stream.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4282 @item @file{console-stream.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4283 @item @file{console-tty-impl.h} @tab
4284 @item @file{console-tty.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4285 @item @file{console-tty.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4286 @item @file{console-x-impl.h} @tab
4287 @item @file{console-x.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4288 @item @file{console-x.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4289 @item @file{console.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4290 @item @file{console.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4291 @item @file{data.c} @tab @ref{Basic Lisp Modules}.
4292 @item @file{database-tests.el} @tab @ref{Modules for Regression Testing}.
4293 @item @file{database.c} @tab
4294 @item @file{database.h} @tab
4295 @item @file{debug.c} @tab @ref{Low-Level Modules}.
4296 @item @file{debug.h} @tab @ref{Low-Level Modules}.
4297 @item @file{depend} @tab
4298 @item @file{device-gtk.c} @tab
4299 @item @file{device-impl.h} @tab
4300 @item @file{device-msw.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4301 @item @file{device-tty.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4302 @item @file{device-x.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4303 @item @file{device.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4304 @item @file{device.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4305 @item @file{devslots.h} @tab
4306 @item @file{dgif_lib.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4307 @item @file{dialog-gtk.c} @tab
4308 @item @file{dialog-msw.c} @tab
4309 @item @file{dialog-x.c} @tab
4310 @item @file{dialog.c} @tab
4311 @item @file{dired-msw.c} @tab
4312 @item @file{dired.c} @tab @ref{Modules for Interfacing with the File System}.
4313 @item @file{doc.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4314 @item @file{doprnt.c} @tab @ref{Modules for Standard Editing Operations}.
4315 @item @file{dragdrop.c} @tab
4316 @item @file{dragdrop.h} @tab
4317 @item @file{dump-data.c} @tab
4318 @item @file{dump-data.h} @tab
4319 @item @file{dump-id.c} @tab
4320 @item @file{dumper.c} @tab
4321 @item @file{dumper.h} @tab
4322 @item @file{dynarr.c} @tab @ref{Low-Level Modules}.
4323 @item @file{ecrt0.c} @tab @ref{Low-Level Modules}.
4324 @item @file{editfns.c} @tab @ref{Modules for Standard Editing Operations}.
4325 @item @file{elhash.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4326 @item @file{elhash.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4327 @item @file{emacs-marshals.c} @tab
4328 @item @file{emacs-new.c.old} @tab
4329 @item @file{emacs-widget-accessors.c} @tab
4330 @item @file{emacs.c} @tab @ref{Low-Level Modules}.
4331 @item @file{emodules.c} @tab
4332 @item @file{emodules.h} @tab
4333 @item @file{esd.c} @tab
4334 @item @file{eval.c} @tab @ref{Basic Lisp Modules}.
4335 @item @file{event-Xt.c} @tab @ref{Editor-Level Control Flow Modules}.
4336 @item @file{event-gtk.c} @tab
4337 @item @file{event-gtk.h} @tab
4338 @item @file{event-msw.c} @tab @ref{Editor-Level Control Flow Modules}.
4339 @item @file{event-stream.c} @tab @ref{Editor-Level Control Flow Modules}.
4340 @item @file{event-tty.c} @tab @ref{Editor-Level Control Flow Modules}.
4341 @item @file{event-unixoid.c} @tab
4342 @item @file{event-xlike-inc.c} @tab
4343 @item @file{events-mod.h} @tab @ref{Editor-Level Control Flow Modules}.
4344 @item @file{events.c} @tab @ref{Editor-Level Control Flow Modules}.
4345 @item @file{events.h} @tab @ref{Editor-Level Control Flow Modules}.
4346 @item @file{extent-tests.el} @tab @ref{Modules for Regression Testing}.
4347 @item @file{extents-impl.h} @tab
4348 @item @file{extents.c} @tab @ref{Modules for Standard Editing Operations}.
4349 @item @file{extents.h} @tab @ref{Modules for Standard Editing Operations}.
4350 @item @file{extw-Xlib.c} @tab @ref{Modules for Interfacing with X Windows}.
4351 @item @file{extw-Xlib.h} @tab @ref{Modules for Interfacing with X Windows}.
4352 @item @file{extw-Xt.c} @tab @ref{Modules for Interfacing with X Windows}.
4353 @item @file{extw-Xt.h} @tab @ref{Modules for Interfacing with X Windows}.
4354 @item @file{faces.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4355 @item @file{faces.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4356 @item @file{file-coding.c} @tab @ref{Modules for Internationalization}.
4357 @item @file{file-coding.h} @tab @ref{Modules for Internationalization}.
4358 @item @file{fileio.c} @tab @ref{Modules for Interfacing with the File System}.
4359 @item @file{filelock.c} @tab @ref{Modules for Interfacing with the File System}.
4360 @item @file{filemode.c} @tab @ref{Modules for Interfacing with the File System}.
4361 @item @file{floatfns.c} @tab @ref{Basic Lisp Modules}.
4362 @item @file{fns.c} @tab @ref{Basic Lisp Modules}.
4363 @item @file{font-lock.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4364 @item @file{frame-gtk.c} @tab
4365 @item @file{frame-impl.h} @tab
4366 @item @file{frame-msw.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4367 @item @file{frame-tty.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4368 @item @file{frame-x.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4369 @item @file{frame.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4370 @item @file{frame.diff} @tab
4371 @item @file{frame.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4372 @item @file{frameslots.h} @tab
4373 @item @file{free-hook.c} @tab @ref{Low-Level Modules}.
4374 @item @file{gccache-gtk.c} @tab
4375 @item @file{gccache-gtk.h} @tab
4376 @item @file{general-slots.h} @tab
4377 @item @file{general.c} @tab @ref{Basic Lisp Modules}.
4378 @item @file{getloadavg.c} @tab @ref{Modules for Interfacing with the Operating System}.
4379 @item @file{getpagesize.h} @tab @ref{Low-Level Modules}.
4380 @item @file{gif_err.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4381 @item @file{gif_io.c} @tab
4382 @item @file{gif_lib.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4383 @item @file{gifalloc.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4384 @item @file{gifrlib.h} @tab
4385 @item @file{glade.c} @tab
4386 @item @file{glyphs-eimage.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4387 @item @file{glyphs-gtk.c} @tab
4388 @item @file{glyphs-gtk.h} @tab
4389 @item @file{glyphs-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4390 @item @file{glyphs-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4391 @item @file{glyphs-shared.c} @tab
4392 @item @file{glyphs-widget.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4393 @item @file{glyphs-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4394 @item @file{glyphs-x.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4395 @item @file{glyphs.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4396 @item @file{glyphs.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4397 @item @file{gmalloc.c} @tab @ref{Low-Level Modules}.
4398 @item @file{gpmevent.c} @tab @ref{Editor-Level Control Flow Modules}.
4399 @item @file{gpmevent.h} @tab @ref{Editor-Level Control Flow Modules}.
4400 @item @file{gtk-glue.c} @tab
4401 @item @file{gtk-xemacs.c} @tab
4402 @item @file{gtk-xemacs.h} @tab
4403 @item @file{gui-gtk.c} @tab
4404 @item @file{gui-msw.c} @tab
4405 @item @file{gui-x.c} @tab
4406 @item @file{gui.c} @tab
4407 @item @file{gui.h} @tab
4408 @item @file{gutter.c} @tab
4409 @item @file{gutter.h} @tab
4410 @item @file{hash-table-tests.el} @tab @ref{Modules for Regression Testing}.
4411 @item @file{hash.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4412 @item @file{hash.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4413 @item @file{hftctl.c} @tab @ref{Modules for Interfacing with the Operating System}.
4414 @item @file{hpplay.c} @tab @ref{Modules for Interfacing with the Operating System}.
4415 @item @file{imgproc.c} @tab
4416 @item @file{imgproc.h} @tab
4417 @item @file{indent.c} @tab @ref{Modules for the Redisplay Mechanism}.
4418 @item @file{inline.c} @tab @ref{Low-Level Modules}.
4419 @item @file{input-method-motif.c} @tab
4420 @item @file{input-method-xlib.c} @tab
4421 @item @file{insdel.c} @tab @ref{Modules for Standard Editing Operations}.
4422 @item @file{insdel.h} @tab @ref{Modules for Standard Editing Operations}.
4423 @item @file{intl-auto-encap-win32.c} @tab
4424 @item @file{intl-auto-encap-win32.h} @tab
4425 @item @file{intl-encap-win32.c} @tab
4426 @item @file{intl-win32.c} @tab
4427 @item @file{intl-x.c} @tab
4428 @item @file{intl.c} @tab @ref{Modules for Internationalization}.
4429 @item @file{iso-wide.h} @tab @ref{Modules for Internationalization}.
4430 @item @file{keymap.c} @tab @ref{Editor-Level Control Flow Modules}.
4431 @item @file{keymap.h} @tab @ref{Editor-Level Control Flow Modules}.
4432 @item @file{lastfile.c} @tab @ref{Low-Level Modules}.
4433 @item @file{libinterface.c} @tab
4434 @item @file{libinterface.h} @tab
4435 @item @file{libsst.c} @tab @ref{Modules for Interfacing with the Operating System}.
4436 @item @file{libsst.h} @tab @ref{Modules for Interfacing with the Operating System}.
4437 @item @file{libst.h} @tab @ref{Modules for Interfacing with the Operating System}.
4438 @item @file{line-number.c} @tab
4439 @item @file{line-number.h} @tab
4440 @item @file{linuxplay.c} @tab @ref{Modules for Interfacing with the Operating System}.
4441 @item @file{lisp-disunion.h} @tab @ref{Basic Lisp Modules}.
4442 @item @file{lisp-tests.el} @tab @ref{Modules for Regression Testing}.
4443 @item @file{lisp-union.h} @tab @ref{Basic Lisp Modules}.
4444 @item @file{lisp.h} @tab @ref{Basic Lisp Modules}.
4445 @item @file{lread.c} @tab @ref{Basic Lisp Modules}.
4446 @item @file{lrecord.h} @tab @ref{Basic Lisp Modules}.
4447 @item @file{lstream.c} @tab @ref{Modules for Interfacing with the File System}.
4448 @item @file{lstream.h} @tab @ref{Modules for Interfacing with the File System}.
4449 @item @file{macros.c} @tab @ref{Editor-Level Control Flow Modules}.
4450 @item @file{macros.h} @tab @ref{Editor-Level Control Flow Modules}.
4451 @item @file{make-src-depend} @tab
4452 @item @file{malloc.c} @tab @ref{Low-Level Modules}.
4453 @item @file{marker.c} @tab @ref{Modules for Standard Editing Operations}.
4454 @item @file{md5-tests.el} @tab @ref{Modules for Regression Testing}.
4455 @item @file{md5.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4456 @item @file{mem-limits.h} @tab @ref{Low-Level Modules}.
4457 @item @file{menubar-gtk.c} @tab
4458 @item @file{menubar-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4459 @item @file{menubar-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4460 @item @file{menubar-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4461 @item @file{menubar.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4462 @item @file{menubar.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4463 @item @file{minibuf.c} @tab @ref{Editor-Level Control Flow Modules}.
4464 @item @file{miscplay.c} @tab
4465 @item @file{miscplay.h} @tab
4466 @item @file{mule-canna.c} @tab @ref{Modules for Internationalization}.
4467 @item @file{mule-ccl.c} @tab @ref{Modules for Internationalization}.
4468 @item @file{mule-ccl.h} @tab
4469 @item @file{mule-charset.c} @tab @ref{Modules for Internationalization}.
4470 @item @file{mule-charset.h} @tab @ref{Modules for Internationalization}.
4471 @item @file{mule-coding.c} @tab @ref{Modules for Internationalization}.
4472 @item @file{mule-mcpath.c} @tab @ref{Modules for Internationalization}.
4473 @item @file{mule-mcpath.h} @tab @ref{Modules for Internationalization}.
4474 @item @file{mule-tests.el} @tab @ref{Modules for Regression Testing}.
4475 @item @file{mule-wnnfns.c} @tab @ref{Modules for Internationalization}.
4476 @item @file{mule.c} @tab @ref{Modules for Internationalization}.
4477 @item @file{nas.c} @tab @ref{Modules for Interfacing with the Operating System}.
4478 @item @file{native-gtk-toolbar.c} @tab
4479 @item @file{ndir.h} @tab @ref{Modules for Interfacing with the File System}.
4480 @item @file{nsselect.m} @tab
4481 @item @file{nt.c} @tab
4482 @item @file{ntheap.c} @tab
4483 @item @file{ntplay.c} @tab
4484 @item @file{number-gmp.c} @tab
4485 @item @file{number-gmp.h} @tab
4486 @item @file{number-mp.c} @tab
4487 @item @file{number-mp.h} @tab
4488 @item @file{number.c} @tab
4489 @item @file{number.h} @tab
4490 @item @file{objects-gtk-impl.h} @tab
4491 @item @file{objects-gtk.c} @tab
4492 @item @file{objects-gtk.h} @tab
4493 @item @file{objects-impl.h} @tab
4494 @item @file{objects-msw-impl.h} @tab
4495 @item @file{objects-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4496 @item @file{objects-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4497 @item @file{objects-tty-impl.h} @tab
4498 @item @file{objects-tty.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4499 @item @file{objects-tty.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4500 @item @file{objects-x-impl.h} @tab
4501 @item @file{objects-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4502 @item @file{objects-x.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4503 @item @file{objects.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4504 @item @file{objects.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4505 @item @file{offix-cursors.h} @tab
4506 @item @file{offix-types.h} @tab
4507 @item @file{offix.c} @tab
4508 @item @file{offix.h} @tab
4509 @item @file{opaque.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4510 @item @file{opaque.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4511 @item @file{paths.h.in} @tab
4512 @item @file{paths.h} @tab @ref{Low-Level Modules}.
4513 @item @file{ppc.ldscript} @tab
4514 @item @file{pre-crt0.c} @tab @ref{Low-Level Modules}.
4515 @item @file{print.c} @tab @ref{Basic Lisp Modules}.
4516 @item @file{process-nt.c} @tab
4517 @item @file{process-slots.h} @tab
4518 @item @file{process-unix.c} @tab
4519 @item @file{process.c} @tab @ref{Modules for Interfacing with the Operating System}.
4520 @item @file{process.el} @tab @ref{Modules for Interfacing with the Operating System}.
4521 @item @file{process.h} @tab @ref{Modules for Interfacing with the Operating System}.
4522 @item @file{procimpl.h} @tab
4523 @item @file{profile.c.orig} @tab
4524 @item @file{profile.c.rej} @tab
4525 @item @file{profile.c} @tab
4526 @item @file{profile.h} @tab
4527 @item @file{ralloc.c} @tab @ref{Low-Level Modules}.
4528 @item @file{rangetab.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4529 @item @file{rangetab.h} @tab
4530 @item @file{realpath.c} @tab @ref{Modules for Interfacing with the File System}.
4531 @item @file{redisplay-gtk.c} @tab
4532 @item @file{redisplay-msw.c} @tab @ref{Modules for the Redisplay Mechanism}.
4533 @item @file{redisplay-output.c} @tab @ref{Modules for the Redisplay Mechanism}.
4534 @item @file{redisplay-tty.c} @tab @ref{Modules for the Redisplay Mechanism}.
4535 @item @file{redisplay-x.c} @tab @ref{Modules for the Redisplay Mechanism}.
4536 @item @file{redisplay.c} @tab @ref{Modules for the Redisplay Mechanism}.
4537 @item @file{redisplay.h} @tab @ref{Modules for the Redisplay Mechanism}.
4538 @item @file{regex.c} @tab @ref{Modules for Standard Editing Operations}.
4539 @item @file{regex.h} @tab @ref{Modules for Standard Editing Operations}.
4540 @item @file{regexp-tests.el} @tab @ref{Modules for Regression Testing}.
4541 @item @file{scrollbar-gtk.c} @tab
4542 @item @file{scrollbar-gtk.h} @tab
4543 @item @file{scrollbar-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4544 @item @file{scrollbar-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4545 @item @file{scrollbar-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4546 @item @file{scrollbar-x.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4547 @item @file{scrollbar.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4548 @item @file{scrollbar.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4549 @item @file{search.c} @tab @ref{Modules for Standard Editing Operations}.
4550 @item @file{select-common.h} @tab
4551 @item @file{select-gtk.c} @tab
4552 @item @file{select-msw.c} @tab @ref{Modules for Interfacing with X Windows}.
4553 @item @file{select-x.c} @tab @ref{Modules for Interfacing with X Windows}.
4554 @item @file{select.c} @tab @ref{Modules for Interfacing with X Windows}.
4555 @item @file{select.h} @tab @ref{Modules for Interfacing with X Windows}.
4556 @item @file{sgiplay.c} @tab @ref{Modules for Interfacing with the Operating System}.
4557 @item @file{sheap.c} @tab
4558 @item @file{signal.c} @tab @ref{Low-Level Modules}.
4559 @item @file{sound.c} @tab @ref{Modules for Interfacing with the Operating System}.
4560 @item @file{sound.h} @tab
4561 @item @file{specifier.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4562 @item @file{specifier.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4563 @item @file{src-headers} @tab
4564 @item @file{strcat.c} @tab
4565 @item @file{strcmp.c} @tab @ref{Modules for Interfacing with the Operating System}.
4566 @item @file{strcpy.c} @tab @ref{Modules for Interfacing with the Operating System}.
4567 @item @file{strftime.c} @tab
4568 @item @file{sunOS-fix.c} @tab @ref{Modules for Interfacing with the Operating System}.
4569 @item @file{sunplay.c} @tab @ref{Modules for Interfacing with the Operating System}.
4570 @item @file{sunpro.c} @tab @ref{Modules for Interfacing with the Operating System}.
4571 @item @file{symbol-tests.el} @tab @ref{Modules for Regression Testing}.
4572 @item @file{symbols.c} @tab @ref{Basic Lisp Modules}.
4573 @item @file{symeval.h} @tab @ref{Basic Lisp Modules}.
4574 @item @file{symsinit.h} @tab @ref{Basic Lisp Modules}.
4575 @item @file{syntax-tests.el} @tab @ref{Modules for Regression Testing}.
4576 @item @file{syntax.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4577 @item @file{syntax.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}.
4578 @item @file{sysdep.c} @tab @ref{Modules for Interfacing with the Operating System}.
4579 @item @file{sysdep.h} @tab @ref{Modules for Interfacing with the Operating System}.
4580 @item @file{sysdir.h} @tab @ref{Modules for Interfacing with the Operating System}.
4581 @item @file{sysdll.c} @tab
4582 @item @file{sysdll.h} @tab
4583 @item @file{sysfile.h} @tab @ref{Modules for Interfacing with the Operating System}.
4584 @item @file{sysfloat.h} @tab @ref{Modules for Interfacing with the Operating System}.
4585 @item @file{sysproc.h} @tab @ref{Modules for Interfacing with the Operating System}.
4586 @item @file{syspwd.h} @tab @ref{Modules for Interfacing with the Operating System}.
4587 @item @file{syssignal.h} @tab @ref{Modules for Interfacing with the Operating System}.
4588 @item @file{systime.h} @tab @ref{Modules for Interfacing with the Operating System}.
4589 @item @file{systty.h} @tab @ref{Modules for Interfacing with the Operating System}.
4590 @item @file{syswait.h} @tab @ref{Modules for Interfacing with the Operating System}.
4591 @item @file{syswindows.h} @tab
4592 @item @file{tag-tests.el} @tab @ref{Modules for Regression Testing}.
4593 @item @file{termcap.c} @tab @ref{Modules for the Redisplay Mechanism}.
4594 @item @file{terminfo.c} @tab @ref{Modules for the Redisplay Mechanism}.
4595 @item @file{test-harness.el} @tab @ref{Modules for Regression Testing}.
4596 @item @file{tests.c} @tab
4597 @item @file{text.c} @tab
4598 @item @file{text.h} @tab
4599 @item @file{toolbar-common.c} @tab
4600 @item @file{toolbar-common.h} @tab
4601 @item @file{toolbar-gtk.c} @tab
4602 @item @file{toolbar-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4603 @item @file{toolbar-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4604 @item @file{toolbar.c} @tab @ref{Modules for other Display-Related Lisp Objects}.
4605 @item @file{toolbar.h} @tab @ref{Modules for other Display-Related Lisp Objects}.
4606 @item @file{tooltalk.c} @tab @ref{Modules for Interfacing with the Operating System}.
4607 @item @file{tooltalk.h} @tab @ref{Modules for Interfacing with the Operating System}.
4608 @item @file{tparam.c} @tab @ref{Modules for the Redisplay Mechanism}.
4609 @item @file{ui-byhand.c} @tab
4610 @item @file{ui-gtk.c} @tab
4611 @item @file{ui-gtk.h} @tab
4612 @item @file{undo.c} @tab @ref{Modules for Standard Editing Operations}.
4613 @item @file{unexaix.c} @tab @ref{Low-Level Modules}.
4614 @item @file{unexalpha.c} @tab @ref{Low-Level Modules}.
4615 @item @file{unexapollo.c} @tab @ref{Low-Level Modules}.
4616 @item @file{unexconvex.c} @tab @ref{Low-Level Modules}.
4617 @item @file{unexcw.c} @tab
4618 @item @file{unexec.c} @tab @ref{Low-Level Modules}.
4619 @item @file{unexelf.c} @tab @ref{Low-Level Modules}.
4620 @item @file{unexelfsgi.c} @tab @ref{Low-Level Modules}.
4621 @item @file{unexencap.c} @tab @ref{Low-Level Modules}.
4622 @item @file{unexenix.c} @tab @ref{Low-Level Modules}.
4623 @item @file{unexfreebsd.c} @tab @ref{Low-Level Modules}.
4624 @item @file{unexfx2800.c} @tab @ref{Low-Level Modules}.
4625 @item @file{unexhp9k3.c} @tab @ref{Low-Level Modules}.
4626 @item @file{unexhp9k800.c} @tab @ref{Low-Level Modules}.
4627 @item @file{unexmips.c} @tab @ref{Low-Level Modules}.
4628 @item @file{unexnext.c} @tab @ref{Low-Level Modules}.
4629 @item @file{unexnt.c} @tab
4630 @item @file{unexsni.c} @tab
4631 @item @file{unexsol2-6.c} @tab
4632 @item @file{unexsol2.c} @tab @ref{Low-Level Modules}.
4633 @item @file{unexsunos4.c} @tab @ref{Low-Level Modules}.
4634 @item @file{unicode.c} @tab
4635 @item @file{universe.h} @tab @ref{Low-Level Modules}.
4636 @item @file{vm-limit.c} @tab @ref{Low-Level Modules}.
4637 @item @file{weak-tests.el} @tab @ref{Modules for Regression Testing}.
4638 @item @file{widget.c} @tab
4639 @item @file{win32.c} @tab
4640 @item @file{window-impl.h} @tab
4641 @item @file{window.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4642 @item @file{window.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}.
4643 @item @file{winslots.h} @tab
4644 @item @file{xemacs.def.in.in} @tab
4645 @item @file{xgccache.c} @tab @ref{Modules for Interfacing with X Windows}.
4646 @item @file{xgccache.h} @tab @ref{Modules for Interfacing with X Windows}.
4647 @item @file{xintrinsic.h} @tab @ref{Modules for Interfacing with X Windows}.
4648 @item @file{xintrinsicp.h} @tab @ref{Modules for Interfacing with X Windows}.
4649 @item @file{xmmanagerp.h} @tab @ref{Modules for Interfacing with X Windows}.
4650 @item @file{xmotif.h} @tab
4651 @item @file{xmprimitivep.h} @tab @ref{Modules for Interfacing with X Windows}.
4652 @item @file{xmu.c} @tab @ref{Modules for Interfacing with X Windows}.
4653 @item @file{xmu.h} @tab @ref{Modules for Interfacing with X Windows}.
4654 @end multitable
4655
4656
4657
4658 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, The Modules of XEmacs
4203 @section Low-Level Modules 4659 @section Low-Level Modules
4204 @cindex low-level modules 4660 @cindex low-level modules
4205 @cindex modules, low-level 4661 @cindex modules, low-level
4206 4662
4207 @example 4663 @example
4208 config.h 4664 @file{config.h}
4209 @end example 4665 @end example
4210 4666
4211 This is automatically generated from @file{config.h.in} based on the 4667 This is automatically generated from @file{config.h.in} based on the
4212 results of configure tests and user-selected optional features and 4668 results of configure tests and user-selected optional features and
4213 contains preprocessor definitions specifying the nature of the 4669 contains preprocessor definitions specifying the nature of the
4214 environment in which XEmacs is being compiled. 4670 environment in which XEmacs is being compiled.
4215 4671
4216 4672
4217 4673
4218 @example 4674 @example
4219 paths.h 4675 @file{paths.h}
4220 @end example 4676 @end example
4221 4677
4222 This is automatically generated from @file{paths.h.in} based on supplied 4678 This is automatically generated from @file{paths.h.in} based on supplied
4223 configure values, and allows for non-standard installed configurations 4679 configure values, and allows for non-standard installed configurations
4224 of the XEmacs directories. It's currently broken, though. 4680 of the XEmacs directories. It's currently broken, though.
4225 4681
4226 4682
4227 4683
4228 @example 4684 @example
4229 emacs.c 4685 @file{emacs.c}
4230 signal.c 4686 @file{signal.c}
4231 @end example 4687 @end example
4232 4688
4233 @file{emacs.c} contains @code{main()} and other code that performs the most 4689 @file{emacs.c} contains @code{main()} and other code that performs the most
4234 basic environment initializations and handles shutting down the XEmacs 4690 basic environment initializations and handles shutting down the XEmacs
4235 process (this includes @code{kill-emacs}, the normal way that XEmacs is 4691 process (this includes @code{kill-emacs}, the normal way that XEmacs is
4245 @file{syssignal.h} header file, described in section J below. 4701 @file{syssignal.h} header file, described in section J below.
4246 4702
4247 4703
4248 4704
4249 @example 4705 @example
4250 unexaix.c 4706 @file{unexaix.c}
4251 unexalpha.c 4707 @file{unexalpha.c}
4252 unexapollo.c 4708 @file{unexapollo.c}
4253 unexconvex.c 4709 @file{unexconvex.c}
4254 unexec.c 4710 @file{unexec.c}
4255 unexelf.c 4711 @file{unexelf.c}
4256 unexelfsgi.c 4712 @file{unexelfsgi.c}
4257 unexencap.c 4713 @file{unexencap.c}
4258 unexenix.c 4714 @file{unexenix.c}
4259 unexfreebsd.c 4715 @file{unexfreebsd.c}
4260 unexfx2800.c 4716 @file{unexfx2800.c}
4261 unexhp9k3.c 4717 @file{unexhp9k3.c}
4262 unexhp9k800.c 4718 @file{unexhp9k800.c}
4263 unexmips.c 4719 @file{unexmips.c}
4264 unexnext.c 4720 @file{unexnext.c}
4265 unexsol2.c 4721 @file{unexsol2.c}
4266 unexsunos4.c 4722 @file{unexsunos4.c}
4267 @end example 4723 @end example
4268 4724
4269 These modules contain code for dumping out the XEmacs executable on various 4725 These modules contain code for dumping out the XEmacs executable on various
4270 different systems. (This process is highly machine-specific and 4726 different systems. (This process is highly machine-specific and
4271 requires intimate knowledge of the executable format and the memory map 4727 requires intimate knowledge of the executable format and the memory map
4273 chosen by @file{configure}. 4729 chosen by @file{configure}.
4274 4730
4275 4731
4276 4732
4277 @example 4733 @example
4278 ecrt0.c 4734 @file{ecrt0.c}
4279 lastfile.c 4735 @file{lastfile.c}
4280 pre-crt0.c 4736 @file{pre-crt0.c}
4281 @end example 4737 @end example
4282 4738
4283 These modules are used in conjunction with the dump mechanism. On some 4739 These modules are used in conjunction with the dump mechanism. On some
4284 systems, an alternative version of the C startup code (the actual code 4740 systems, an alternative version of the C startup code (the actual code
4285 that receives control from the operating system when the process is 4741 that receives control from the operating system when the process is
4300 data space when dumping. 4756 data space when dumping.
4301 4757
4302 4758
4303 4759
4304 @example 4760 @example
4305 alloca.c 4761 @file{alloca.c}
4306 free-hook.c 4762 @file{free-hook.c}
4307 getpagesize.h 4763 @file{getpagesize.h}
4308 gmalloc.c 4764 @file{gmalloc.c}
4309 malloc.c 4765 @file{malloc.c}
4310 mem-limits.h 4766 @file{mem-limits.h}
4311 ralloc.c 4767 @file{ralloc.c}
4312 vm-limit.c 4768 @file{vm-limit.c}
4313 @end example 4769 @end example
4314 4770
4315 These handle basic C allocation of memory. @file{alloca.c} is an emulation of 4771 These handle basic C allocation of memory. @file{alloca.c} is an emulation of
4316 the stack allocation function @code{alloca()} on machines that lack 4772 the stack allocation function @code{alloca()} on machines that lack
4317 this. (XEmacs makes extensive use of @code{alloca()} in its code.) 4773 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
4363 retrieving the total amount of available virtual memory. Both are 4819 retrieving the total amount of available virtual memory. Both are
4364 similar in spirit to the @file{sys*.h} files described in section J, below. 4820 similar in spirit to the @file{sys*.h} files described in section J, below.
4365 4821
4366 4822
4367 @example 4823 @example
4368 blocktype.c 4824 @file{blocktype.c}
4369 blocktype.h 4825 @file{blocktype.h}
4370 dynarr.c 4826 @file{dynarr.c}
4371 @end example 4827 @end example
4372 4828
4373 These implement a couple of basic C data types to facilitate memory 4829 These implement a couple of basic C data types to facilitate memory
4374 allocation. The @code{Blocktype} type efficiently manages the 4830 allocation. The @code{Blocktype} type efficiently manages the
4375 allocation of fixed-size blocks by minimizing the number of times that 4831 allocation of fixed-size blocks by minimizing the number of times that
4389 mechanism. 4845 mechanism.
4390 4846
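The fixed-size block idea behind @code{Blocktype} can be sketched roughly as follows. This is only an illustration of the general technique (carve many blocks out of one @code{malloc()} call and recycle freed blocks through a free list); the names @code{block_pool}, @code{pool_alloc} and so on are hypothetical and are not the interface declared in @file{blocktype.h}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size block pool; NOT the real Blocktype API. */
typedef struct free_block { struct free_block *next; } free_block;
typedef struct chunk { struct chunk *next; } chunk;

typedef struct {
    size_t block_size;     /* size of each block, rounded up for alignment */
    size_t per_chunk;      /* blocks obtained per malloc() call */
    free_block *free_list; /* blocks ready for reuse */
    chunk *chunks;         /* every chunk obtained from malloc() */
    size_t malloc_calls;   /* bookkeeping, to show how rarely malloc() runs */
} block_pool;

static size_t round_up(size_t n, size_t a) { return (n + a - 1) / a * a; }

void pool_init(block_pool *p, size_t block_size, size_t per_chunk)
{
    assert(p != NULL && per_chunk > 0);
    if (block_size < sizeof(free_block))
        block_size = sizeof(free_block);
    p->block_size = round_up(block_size, _Alignof(max_align_t));
    p->per_chunk = per_chunk;
    p->free_list = NULL;
    p->chunks = NULL;
    p->malloc_calls = 0;
}

void *pool_alloc(block_pool *p)
{
    if (p->free_list == NULL) {
        /* Free list is empty: fetch a whole chunk and thread it onto the list. */
        size_t header = round_up(sizeof(chunk), _Alignof(max_align_t));
        char *raw = malloc(header + p->block_size * p->per_chunk);
        if (raw == NULL)
            return NULL;
        p->malloc_calls++;
        ((chunk *)raw)->next = p->chunks;
        p->chunks = (chunk *)raw;
        for (size_t i = 0; i < p->per_chunk; i++) {
            free_block *b = (free_block *)(raw + header + i * p->block_size);
            b->next = p->free_list;
            p->free_list = b;
        }
    }
    free_block *b = p->free_list;
    p->free_list = b->next;
    return b;
}

/* Returning a block never calls free(); it just goes back on the list. */
void pool_free(block_pool *p, void *ptr)
{
    free_block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}

void pool_destroy(block_pool *p)
{
    while (p->chunks != NULL) {
        chunk *next = p->chunks->next;
        free(p->chunks);
        p->chunks = next;
    }
    p->free_list = NULL;
}
```

In this sketch a freed block is the first one handed out again, so a workload that repeatedly allocates and frees objects of one size settles into a state where @code{malloc()} is not called at all.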
4391 4847
4392 4848
4393 @example 4849 @example
4394 inline.c 4850 @file{inline.c}
4395 @end example 4851 @end example
4396 4852
4397 This module is used in connection with inline functions (available in 4853 This module is used in connection with inline functions (available in
4398 some compilers). Often, inline functions need to have a corresponding 4854 some compilers). Often, inline functions need to have a corresponding
4399 non-inline function that does the same thing. This module is where they 4855 non-inline function that does the same thing. This module is where they
4403 function definitions, so that each one gets a real function equivalent. 4859 function definitions, so that each one gets a real function equivalent.
4404 4860
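As a rough illustration of why a module such as @file{inline.c} is needed, the sketch below uses standard C99 inline semantics in a single file. It is an assumption-laden stand-in: XEmacs uses its own macros for this, and the function names here (@code{clamp_to_byte}, @code{apply}) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Imagine this definition lives in a header included by many modules.
   Under C99 rules it is only an "inline definition": the compiler may
   expand calls in place, but it need not emit a callable function. */
inline int clamp_to_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Imagine this declaration lives in exactly one .c file (the role the
   text assigns to inline.c).  Redeclaring with extern makes this
   translation unit emit the real, out-of-line function. */
extern inline int clamp_to_byte(int v);

/* Calling through a pointer needs the out-of-line copy to exist. */
int apply(int (*fn)(int), int v)
{
    assert(fn != NULL);
    return fn(v);
}
```

Without the @code{extern} line, a call that the compiler chose not to inline, or any use of the function's address, could fail at link time because no module would contain the function body.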
4405 4861
4406 4862
4407 @example 4863 @example
4408 debug.c 4864 @file{debug.c}
4409 debug.h 4865 @file{debug.h}
4410 @end example 4866 @end example
4411 4867
4412 These functions provide a system for doing internal consistency checks 4868 These functions provide a system for doing internal consistency checks
4413 during code development. This system is not currently used; instead the 4869 during code development. This system is not currently used; instead the
4414 simpler @code{assert()} macro is used along with the various checks 4870 simpler @code{assert()} macro is used along with the various checks
4415 provided by the @samp{--error-check-*} configuration options. 4871 provided by the @samp{--error-check-*} configuration options.
4416 4872
4417 4873
4418 4874
4419 @example 4875 @example
4420 universe.h 4876 @file{universe.h}
4421 @end example 4877 @end example
4422 4878
4423 This is not currently used. 4879 This is not currently used.
4424 4880
4425 4881
4426 4882
4427 @node Basic Lisp Modules 4883 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, The Modules of XEmacs
4428 @section Basic Lisp Modules 4884 @section Basic Lisp Modules
4429 @cindex Lisp modules, basic 4885 @cindex Lisp modules, basic
4430 @cindex modules, basic Lisp 4886 @cindex modules, basic Lisp
4431 4887
4432 @example 4888 @example
4433 lisp-disunion.h 4889 @file{lisp-disunion.h}
4434 lisp-union.h 4890 @file{lisp-union.h}
4435 lisp.h 4891 @file{lisp.h}
4436 lrecord.h 4892 @file{lrecord.h}
4437 symsinit.h 4893 @file{symsinit.h}
4438 @end example 4894 @end example
4439 4895
4440 These are the basic header files for all XEmacs modules. Each module 4896 These are the basic header files for all XEmacs modules. Each module
4441 includes @file{lisp.h}, which brings the other header files in. 4897 includes @file{lisp.h}, which brings the other header files in.
4442 @file{lisp.h} contains the definitions of the structures and extractor 4898 @file{lisp.h} contains the definitions of the structures and extractor
4475 @file{symsinit.h}. 4931 @file{symsinit.h}.
4476 4932
4477 4933
4478 4934
4479 @example 4935 @example
4480 alloc.c 4936 @file{alloc.c}
4481 @end example 4937 @end example
4482 4938
4483 The large module @file{alloc.c} implements all of the basic allocation and 4939 The large module @file{alloc.c} implements all of the basic allocation and
4484 garbage collection for Lisp objects. The most commonly used Lisp 4940 garbage collection for Lisp objects. The most commonly used Lisp
4485 objects are allocated in chunks, similar to the Blocktype data type 4941 objects are allocated in chunks, similar to the Blocktype data type
4503 subtypes in the subsystem; this provides a great deal of robustness to 4959 subtypes in the subsystem; this provides a great deal of robustness to
4504 the XEmacs code. 4960 the XEmacs code.
4505 4961
4506 4962
4507 @example 4963 @example
4508 eval.c 4964 @file{eval.c}
4509 backtrace.h 4965 @file{backtrace.h}
4510 @end example 4966 @end example
4511 4967
4512 This module contains all of the functions to handle the flow of control. 4968 This module contains all of the functions to handle the flow of control.
4513 This includes the mechanisms of defining functions, calling functions, 4969 This includes the mechanisms of defining functions, calling functions,
4514 traversing stack frames, and binding variables; the control primitives 4970 traversing stack frames, and binding variables; the control primitives
4523 flow of control. 4979 flow of control.
4524 4980
4525 4981
4526 4982
4527 @example 4983 @example
4528 lread.c 4984 @file{lread.c}
4529 @end example 4985 @end example
4530 4986
4531 This module implements the Lisp reader and the @code{read} function, 4987 This module implements the Lisp reader and the @code{read} function,
4532 which converts text into Lisp objects, according to the read syntax of 4988 which converts text into Lisp objects, according to the read syntax of
4533 the objects, as described above. This is similar to the parser that is 4989 the objects, as described above. This is similar to the parser that is
4534 a part of all compilers. 4990 a part of all compilers.
4535 4991
4536 4992
4537 4993
4538 @example 4994 @example
4539 print.c 4995 @file{print.c}
4540 @end example 4996 @end example
4541 4997
4542 This module implements the Lisp print mechanism and the @code{print} 4998 This module implements the Lisp print mechanism and the @code{print}
4543 function and related functions. This is the inverse of the Lisp reader 4999 function and related functions. This is the inverse of the Lisp reader
4544 -- it converts Lisp objects to a printed, textual representation. 5000 -- it converts Lisp objects to a printed, textual representation.
4546 an equivalent object.) 5002 an equivalent object.)
4547 5003
4548 5004
4549 5005
4550 @example 5006 @example
4551 general.c 5007 @file{general.c}
4552 symbols.c 5008 @file{symbols.c}
4553 symeval.h 5009 @file{symeval.h}
4554 @end example 5010 @end example
4555 5011
4556 @file{symbols.c} implements the handling of symbols, obarrays, and 5012 @file{symbols.c} implements the handling of symbols, obarrays, and
4557 retrieving the values of symbols. Much of the code is devoted to 5013 retrieving the values of symbols. Much of the code is devoted to
4558 handling the special @dfn{symbol-value-magic} objects that define 5014 handling the special @dfn{symbol-value-magic} objects that define
4566 @code{DEFVAR_LISP()} and related macros for declaring variables. 5022 @code{DEFVAR_LISP()} and related macros for declaring variables.
4567 5023
4568 5024
4569 5025
4570 @example 5026 @example
4571 data.c 5027 @file{data.c}
4572 floatfns.c 5028 @file{floatfns.c}
4573 fns.c 5029 @file{fns.c}
4574 @end example 5030 @end example
4575 5031
4576 These modules implement the methods and standard Lisp primitives for all 5032 These modules implement the methods and standard Lisp primitives for all
4577 the basic Lisp object types other than symbols (which are described 5033 the basic Lisp object types other than symbols (which are described
4578 above). @file{data.c} contains all the predicates (primitives that return 5034 above). @file{data.c} contains all the predicates (primitives that return
4587 arithmetic. 5043 arithmetic.
4588 5044
4589 5045
4590 5046
4591 @example 5047 @example
4592 bytecode.c 5048 @file{bytecode.c}
4593 bytecode.h 5049 @file{bytecode.h}
4594 @end example 5050 @end example
4595 5051
4596 @file{bytecode.c} implements the byte-code interpreter and 5052 @file{bytecode.c} implements the byte-code interpreter and
4597 compiled-function objects, and @file{bytecode.h} contains associated 5053 compiled-function objects, and @file{bytecode.h} contains associated
4598 structures. Note that the byte-code @emph{compiler} is written in Lisp. 5054 structures. Note that the byte-code @emph{compiler} is written in Lisp.
4599 5055
4600 5056
4601 5057
4602 5058
4603 @node Modules for Standard Editing Operations 5059 @node Modules for Standard Editing Operations, Modules for Interfacing with the File System, Basic Lisp Modules, The Modules of XEmacs
4604 @section Modules for Standard Editing Operations 5060 @section Modules for Standard Editing Operations
4605 @cindex modules for standard editing operations 5061 @cindex modules for standard editing operations
4606 @cindex editing operations, modules for standard 5062 @cindex editing operations, modules for standard
4607 5063
4608 @example 5064 @example
4609 buffer.c 5065 @file{buffer.c}
4610 buffer.h 5066 @file{buffer.h}
4611 bufslots.h 5067 @file{bufslots.h}
4612 @end example 5068 @end example
4613 5069
4614 @file{buffer.c} implements the @dfn{buffer} Lisp object type. This 5070 @file{buffer.c} implements the @dfn{buffer} Lisp object type. This
4615 includes functions that create and destroy buffers; retrieve buffers by 5071 includes functions that create and destroy buffers; retrieve buffers by
4616 name or by other properties; manipulate lists of buffers (remember that 5072 name or by other properties; manipulate lists of buffers (remember that
4635 the built-in buffer-local variables. 5091 the built-in buffer-local variables.
4636 5092
4637 5093
4638 5094
4639 @example 5095 @example
4640 insdel.c 5096 @file{insdel.c}
4641 insdel.h 5097 @file{insdel.h}
4642 @end example 5098 @end example
4643 5099
4644 @file{insdel.c} contains low-level functions for inserting and deleting text in 5100 @file{insdel.c} contains low-level functions for inserting and deleting text in
4645 a buffer, keeping track of changed regions for use by redisplay, and 5101 a buffer, keeping track of changed regions for use by redisplay, and
4646 calling any before-change and after-change functions that may have been 5102 calling any before-change and after-change functions that may have been
4650 @file{insdel.h} contains associated headers. 5106 @file{insdel.h} contains associated headers.
4651 5107
4652 5108
4653 5109
4654 @example 5110 @example
4655 marker.c 5111 @file{marker.c}
4656 @end example 5112 @end example
4657 5113
4658 This module implements the @dfn{marker} Lisp object type, which 5114 This module implements the @dfn{marker} Lisp object type, which
4659 conceptually is a pointer to a text position in a buffer that moves 5115 conceptually is a pointer to a text position in a buffer that moves
4660 around as text is inserted and deleted, so as to remain in the same 5116 around as text is inserted and deleted, so as to remain in the same
4669 current buffer position of the marker. 5125 current buffer position of the marker.
4670 5126
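The conceptual behavior of a marker, a position that shifts as text is inserted and deleted around it, can be sketched naively as below. This is not how @file{marker.c} stores or updates markers; the @code{marker} structure and both functions are hypothetical, and the sketch walks every marker on every edit purely for clarity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: just a character position in some buffer. */
typedef struct { size_t pos; } marker;

/* Text of length LEN was inserted at position AT.  Markers after the
   insertion point move right; a marker exactly at AT stays put in this
   sketch (an assumption, real markers can be configured either way). */
void markers_after_insert(marker *m, size_t count, size_t at, size_t len)
{
    for (size_t i = 0; i < count; i++)
        if (m[i].pos > at)
            m[i].pos += len;
}

/* The range [FROM, TO) was deleted.  Markers beyond it move left;
   markers inside it collapse to the start of the deleted range. */
void markers_after_delete(marker *m, size_t count, size_t from, size_t to)
{
    assert(from <= to);
    for (size_t i = 0; i < count; i++) {
        if (m[i].pos >= to)
            m[i].pos -= to - from;
        else if (m[i].pos > from)
            m[i].pos = from;
    }
}
```

For example, with markers at 2, 5 and 9, inserting three characters at position 5 leaves them at 2, 5 and 12; deleting the range [4, 8) afterwards leaves them at 2, 4 and 8.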
4671 5127
4672 5128
4673 @example 5129 @example
4674 extents.c 5130 @file{extents.c}
4675 extents.h 5131 @file{extents.h}
4676 @end example 5132 @end example
4677 5133
4678 This module implements the @dfn{extent} Lisp object type, which is like 5134 This module implements the @dfn{extent} Lisp object type, which is like
4679 a marker that works over a range of text rather than a single position. 5135 a marker that works over a range of text rather than a single position.
4680 Extents are also much more complex and powerful than markers and have a 5136 Extents are also much more complex and powerful than markers and have a
4690 cover.) 5146 cover.)
4691 5147
4692 5148
4693 5149
4694 @example 5150 @example
4695 editfns.c 5151 @file{editfns.c}
4696 @end example 5152 @end example
4697 5153
4698 @file{editfns.c} contains the standard Lisp primitives for working with 5154 @file{editfns.c} contains the standard Lisp primitives for working with
4699 a buffer's text, and calls the low-level functions in @file{insdel.c}. 5155 a buffer's text, and calls the low-level functions in @file{insdel.c}.
4700 It also contains primitives for working with @code{point} (the default 5156 It also contains primitives for working with @code{point} (the default
4707 @file{editfns.c}. 5163 @file{editfns.c}.
4708 5164
4709 5165
4710 5166
4711 @example 5167 @example
4712 callint.c 5168 @file{callint.c}
4713 cmds.c 5169 @file{cmds.c}
4714 commands.h 5170 @file{commands.h}
4715 @end example 5171 @end example
4716 5172
4717 @cindex interactive 5173 @cindex interactive
4718 These modules implement the basic @dfn{interactive} commands, 5174 These modules implement the basic @dfn{interactive} commands,
4719 i.e. user-callable functions. Commands, as opposed to other functions, 5175 i.e. user-callable functions. Commands, as opposed to other functions,
4736 @file{commands.h} contains associated structure definitions and prototypes. 5192 @file{commands.h} contains associated structure definitions and prototypes.
4737 5193
4738 5194
4739 5195
4740 @example 5196 @example
4741 regex.c 5197 @file{regex.c}
4742 regex.h 5198 @file{regex.h}
4743 search.c 5199 @file{search.c}
4744 @end example 5200 @end example
4745 5201
4746 @file{search.c} implements the Lisp primitives for searching for text in 5202 @file{search.c} implements the Lisp primitives for searching for text in
4747 a buffer, and some of the low-level algorithms for doing this. In 5203 a buffer, and some of the low-level algorithms for doing this. In
4748 particular, the fast fixed-string Boyer-Moore search algorithm is 5204 particular, the fast fixed-string Boyer-Moore search algorithm is
4753 routines used in @file{grep} and other GNU utilities. 5209 routines used in @file{grep} and other GNU utilities.
4754 5210
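To give a feel for fixed-string searching in the Boyer-Moore family, here is a minimal sketch of the simpler Horspool variant. It is not the code in @file{search.c}: the function name is invented, it works on plain bytes rather than buffer text, and it uses only the bad-character shift table, comparing each window with @code{memcmp()}.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT (length M) in TEXT
   (length N), or -1 if there is none.  Hypothetical helper. */
long horspool_search(const char *text, size_t n, const char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];

    assert(text != NULL && pat != NULL);
    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* By default a mismatch lets us skip a whole pattern length. */
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m;
    /* Bytes that occur in the pattern (except its last position) allow
       only a shorter skip, lining that byte up with the text. */
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos + m <= n) {
        if (memcmp(text + pos, pat, m) == 0)
            return (long)pos;
        /* Skip according to the text byte under the pattern's last byte. */
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return -1;
}
```

The speed comes from the shift table: on most mismatches the search window jumps several bytes at once, so large parts of the text are never examined.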
4755 5211
4756 5212
4757 @example 5213 @example
4758 doprnt.c 5214 @file{doprnt.c}
4759 @end example 5215 @end example
4760 5216
4761 @file{doprnt.c} implements formatted-string processing, similar to 5217 @file{doprnt.c} implements formatted-string processing, similar to
4762 the @code{printf()} function in C. 5218 the @code{printf()} function in C.
4763 5219
4764 5220
4765 5221
4766 @example 5222 @example
4767 undo.c 5223 @file{undo.c}
4768 @end example 5224 @end example
4769 5225
4770 This module implements the undo mechanism for tracking buffer changes. 5226 This module implements the undo mechanism for tracking buffer changes.
4771 Most of this could be implemented in Lisp. 5227 Most of this could be implemented in Lisp.
4772 5228
4773 5229
4774 5230 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Standard Editing Operations, The Modules of XEmacs
4775 @node Editor-Level Control Flow Modules
4776 @section Editor-Level Control Flow Modules
4777 @cindex control flow modules, editor-level
4778 @cindex modules, editor-level control flow
4779
4780 @example
4781 event-Xt.c
4782 event-msw.c
4783 event-stream.c
4784 event-tty.c
4785 events-mod.h
4786 gpmevent.c
4787 gpmevent.h
4788 events.c
4789 events.h
4790 @end example
4791
4792 These implement the handling of events (user input and other system
4793 notifications).
4794
4795 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
4796 type and primitives for manipulating it.
4797
4798 @file{event-stream.c} implements the basic functions for working with
4799 event queues, dispatching an event by looking it up in relevant keymaps
4800 and such, and handling timeouts; this includes the primitives
4801 @code{next-event} and @code{dispatch-event}, as well as related
4802 primitives such as @code{sit-for}, @code{sleep-for}, and
4803 @code{accept-process-output}. (@file{event-stream.c} is one of the
4804 hairiest and trickiest modules in XEmacs. Beware! You can easily mess
4805 things up here.)
4806
4807 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
4808 interfaces for retrieving events from Xt (the X toolkit) and from TTYs
4809 (using @code{read()} and @code{select()}), respectively. The event
4810 interface enforces a clean separation between the specific code for
4811 interfacing with the operating system and the generic code for working
4812 with events, by defining an API of basic, low-level event methods;
4813 @file{event-Xt.c} and @file{event-tty.c} are two different
4814 implementations of this API. To add support for a new operating system
4815 (e.g. NeXTstep), one merely needs to provide another implementation of
4816 those API functions.
4817
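The separation described here, generic event code talking to interchangeable back ends through a small set of methods, can be sketched with a table of function pointers. Everything named below (@code{event_methods}, the ``scripted'' and ``idle'' back ends, @code{sum_pending_events}) is hypothetical and far simpler than the real event-stream interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event and method table; not the XEmacs definitions. */
typedef struct { int code; } event;

typedef struct {
    const char *name;
    int (*event_pending_p)(void *state);         /* nonzero if an event waits */
    int (*next_event)(void *state, event *out);  /* nonzero on success */
} event_methods;

/* Back end 1: replays a fixed array of event codes. */
typedef struct { const int *codes; size_t count; size_t next; } scripted_source;

static int scripted_pending(void *state)
{
    scripted_source *s = state;
    return s->next < s->count;
}

static int scripted_next(void *state, event *out)
{
    scripted_source *s = state;
    if (s->next >= s->count)
        return 0;
    out->code = s->codes[s->next++];
    return 1;
}

const event_methods scripted_methods = { "scripted", scripted_pending, scripted_next };

/* Back end 2: never has anything to deliver. */
static int idle_pending(void *state) { (void)state; return 0; }
static int idle_next(void *state, event *out) { (void)state; (void)out; return 0; }

const event_methods idle_methods = { "idle", idle_pending, idle_next };

/* Generic code: it knows only the method table, never the back end. */
int sum_pending_events(const event_methods *m, void *state)
{
    event ev;
    int total = 0;
    assert(m != NULL && m->event_pending_p != NULL && m->next_event != NULL);
    while (m->event_pending_p(state) && m->next_event(state, &ev))
        total += ev.code;
    return total;
}
```

Supporting another source of events then means writing one more method table; @code{sum_pending_events} and any other generic caller stay unchanged.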
4818 Note that the choice of whether to use @file{event-Xt.c} or
4819 @file{event-tty.c} is made at compile time! Or at the very latest, it
4820 is made at startup time. @file{event-Xt.c} handles events for
4821 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
4822 support is not compiled into XEmacs. The reason for this is that there
4823 is only one event loop in XEmacs: thus, it needs to be able to receive
4824 events from all different kinds of frames.
4825
4826
4827
4828 @example
4829 keymap.c
4830 keymap.h
4831 @end example
4832
4833 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
4834 type and associated methods and primitives. (Remember that keymaps are
4835 objects that associate event descriptions with functions to be called to
4836 ``execute'' those events; @code{dispatch-event} looks up events in the
4837 relevant keymaps.)
4838
4839
4840
4841 @example
4842 cmdloop.c
4843 @end example
4844
4845 @file{cmdloop.c} contains functions that implement the actual editor
4846 command loop---i.e. the event loop that cyclically retrieves and
4847 dispatches events. This code is also rather tricky, just like
4848 @file{event-stream.c}.
4849
4850
4851
4852 @example
4853 macros.c
4854 macros.h
4855 @end example
4856
4857 These two modules contain the basic code for defining keyboard macros.
4858 These functions don't actually do much; most of the code that handles keyboard
4859 macros is mixed in with the event-handling code in @file{event-stream.c}.
4860
4861
4862
4863 @example
4864 minibuf.c
4865 @end example
4866
4867 This contains some miscellaneous code related to the minibuffer (most of
4868 the minibuffer code was moved into Lisp by Richard Mlynarik). This
4869 includes the primitives for completion (although filename completion is
4870 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
4871 command loop were cleaned up, this too could be in Lisp), and code for
4872 dealing with the echo area (this, too, was mostly moved into Lisp, and
4873 the only code remaining is code to call out to Lisp or provide simple
4874 bootstrapping implementations early in temacs, before the echo-area Lisp
4875 code is loaded).
4876
4877
4878
4879 @node Modules for the Basic Displayable Lisp Objects
4880 @section Modules for the Basic Displayable Lisp Objects
4881 @cindex modules for the basic displayable Lisp objects
4882 @cindex displayable Lisp objects, modules for the basic
4883 @cindex Lisp objects, modules for the basic displayable
4884 @cindex objects, modules for the basic displayable Lisp
4885
4886 @example
4887 console-msw.c
4888 console-msw.h
4889 console-stream.c
4890 console-stream.h
4891 console-tty.c
4892 console-tty.h
4893 console-x.c
4894 console-x.h
4895 console.c
4896 console.h
4897 @end example
4898
4899 These modules implement the @dfn{console} Lisp object type. A console
4900 contains multiple display devices, but only one keyboard and mouse.
4901 Most of the time, a console will contain exactly one device.
4902
4903 Consoles are the top of a Lisp object inclusion hierarchy. Consoles
4904 contain devices, which contain frames, which contain windows.
4905
4906
4907
4908 @example
4909 device-msw.c
4910 device-tty.c
4911 device-x.c
4912 device.c
4913 device.h
4914 @end example
4915
4916 These modules implement the @dfn{device} Lisp object type. This
4917 abstracts a particular screen or connection on which frames are
4918 displayed. As with Lisp objects, event interfaces, and other
4919 subsystems, the device code is separated into a generic component that
4920 contains a standardized interface (in the form of a set of methods) onto
4921 particular device types.
4922
4923 The device subsystem defines all the methods and provides method
4924 services for not only device operations but also for the frame, window,
4925 menubar, scrollbar, toolbar, and other displayable-object subsystems.
4926 The reason for this is that all of these subsystems have the same
4927 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
4928
4929
4930
4931 @example
4932 frame-msw.c
4933 frame-tty.c
4934 frame-x.c
4935 frame.c
4936 frame.h
4937 @end example
4938
4939 Each device contains one or more frames in which objects (e.g. text) are
4940 displayed. A frame corresponds to a window in the window system;
4941 usually this is a top-level window but it could potentially be one of a
4942 number of overlapping child windows within a top-level window, using the
4943 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
4944 similar scheme.
4945
4946 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
4947 provide the generic and device-type-specific operations on frames
4948 (e.g. raising, lowering, resizing, moving, etc.).
4949
4950
4951
4952 @example
4953 window.c
4954 window.h
4955 @end example
4956
4957 @cindex window (in Emacs)
4958 @cindex pane
4959 Each frame consists of one or more non-overlapping @dfn{windows} (better
4960 known as @dfn{panes} in standard window-system terminology) in which a
4961 buffer's text can be displayed. Windows can also have scrollbars
4962 displayed around their edges.
4963
4964 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
4965 object type and provide code to manage windows. Since windows have no
4966 associated resources in the window system (the window system knows only
4967 about the frame; no child windows or anything are used for XEmacs
4968 windows), there is no device-type-specific code here; all of that code
4969 is part of the redisplay mechanism or the code for particular object
4970 types such as scrollbars.
4971
4972
4973
4974 @node Modules for other Display-Related Lisp Objects
4975 @section Modules for other Display-Related Lisp Objects
4976 @cindex modules for other display-related Lisp objects
4977 @cindex display-related Lisp objects, modules for other
4978 @cindex Lisp objects, modules for other display-related
4979
4980 @example
4981 faces.c
4982 faces.h
4983 @end example
4984
4985
4986
4987 @example
4988 bitmaps.h
4989 glyphs-eimage.c
4990 glyphs-msw.c
4991 glyphs-msw.h
4992 glyphs-widget.c
4993 glyphs-x.c
4994 glyphs-x.h
4995 glyphs.c
4996 glyphs.h
4997 @end example
4998
4999
5000
5001 @example
5002 objects-msw.c
5003 objects-msw.h
5004 objects-tty.c
5005 objects-tty.h
5006 objects-x.c
5007 objects-x.h
5008 objects.c
5009 objects.h
5010 @end example
5011
5012
5013
5014 @example
5015 menubar-msw.c
5016 menubar-msw.h
5017 menubar-x.c
5018 menubar.c
5019 menubar.h
5020 @end example
5021
5022
5023
5024 @example
5025 scrollbar-msw.c
5026 scrollbar-msw.h
5027 scrollbar-x.c
5028 scrollbar-x.h
5029 scrollbar.c
5030 scrollbar.h
5031 @end example
5032
5033
5034
5035 @example
5036 toolbar-msw.c
5037 toolbar-x.c
5038 toolbar.c
5039 toolbar.h
5040 @end example
5041
5042
5043
5044 @example
5045 font-lock.c
5046 @end example
5047
5048 This file provides C support for syntax highlighting---i.e.
5049 highlighting different syntactic constructs of a source file in
5050 different colors, for easy reading. The C support is provided so that
5051 this is fast.
5052
5053
5054
5055 @example
5056 dgif_lib.c
5057 gif_err.c
5058 gif_lib.h
5059 gifalloc.c
5060 @end example
5061
5062 These modules decode GIF-format image files, for use with glyphs.
5063 These files were removed due to Unisys patent infringement concerns.
5064
5065
5066
5067 @node Modules for the Redisplay Mechanism
5068 @section Modules for the Redisplay Mechanism
5069 @cindex modules for the redisplay mechanism
5070 @cindex redisplay mechanism, modules for the
5071
5072 @example
5073 redisplay-output.c
5074 redisplay-msw.c
5075 redisplay-tty.c
5076 redisplay-x.c
5077 redisplay.c
5078 redisplay.h
5079 @end example
5080
5081 These files provide the redisplay mechanism. As with many other
5082 subsystems in XEmacs, there is a clean separation between the general
5083 and device-specific support.
5084
5085 @file{redisplay.c} contains the bulk of the redisplay engine. These
5086 functions update the redisplay structures (which describe how the screen
5087 is to appear) to reflect any changes made to the state of any
5088 displayable objects (buffer, frame, window, etc.) since the last time
5089 that redisplay was called. These functions are highly optimized to
5090 avoid doing more work than necessary (since redisplay is called
5091 extremely often and is potentially a huge time sink), and depend heavily
5092 on notifications from the objects themselves that changes have occurred,
5093 so that redisplay doesn't explicitly have to check each possible object.
5094 The redisplay mechanism also contains a great deal of caching to further
5095 speed things up; some of this caching is contained within the various
5096 displayable objects.
5097
5098 @file{redisplay-output.c} goes through the redisplay structures and converts
5099 them into calls to device-specific methods to actually output the screen
5100 changes.
5101
5102 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
5103 of these redisplay output methods, for X frames and TTY frames,
5104 respectively.
5105
5106
5107
5108 @example
5109 indent.c
5110 @end example
5111
5112 This module contains various functions and Lisp primitives for
5113 converting between buffer positions and screen positions. These
5114 functions call the redisplay mechanism to do most of the work, and then
5115 examine the redisplay structures to get the necessary information. This
5116 module needs work.
5117
5118
5119
5120 @example
5121 termcap.c
5122 terminfo.c
5123 tparam.c
5124 @end example
5125
5126 These files contain functions for working with the termcap (BSD-style)
5127 and terminfo (System V style) databases of terminal capabilities and
5128 escape sequences, used when XEmacs is displaying in a TTY.
5129
5130
5131
5132 @example
5133 cm.c
5134 cm.h
5135 @end example
5136
5137 These files provide some miscellaneous TTY-output functions and should
5138 probably be merged into @file{redisplay-tty.c}.
5139
5140
5141
5142 @node Modules for Interfacing with the File System
5143 @section Modules for Interfacing with the File System 5231 @section Modules for Interfacing with the File System
5144 @cindex modules for interfacing with the file system 5232 @cindex modules for interfacing with the file system
5145 @cindex interfacing with the file system, modules for 5233 @cindex interfacing with the file system, modules for
5146 @cindex file system, modules for interfacing with the 5234 @cindex file system, modules for interfacing with the
5147 5235
5148 @example 5236 @example
5149 lstream.c 5237 @file{lstream.c}
5150 lstream.h 5238 @file{lstream.h}
5151 @end example 5239 @end example
5152 5240
5153 These modules implement the @dfn{stream} Lisp object type. This is an 5241 These modules implement the @dfn{stream} Lisp object type. This is an
5154 internal-only Lisp object that implements a generic buffering stream. 5242 internal-only Lisp object that implements a generic buffering stream.
5155 The idea is to provide a uniform interface onto all sources and sinks of 5243 The idea is to provide a uniform interface onto all sources and sinks of
5172 types of streams; others are provided, e.g., in @file{file-coding.c}. 5260 types of streams; others are provided, e.g., in @file{file-coding.c}.
5173 5261
5174 5262
5175 5263
5176 @example 5264 @example
5177 fileio.c 5265 @file{fileio.c}
5178 @end example 5266 @end example
5179 5267
5180 This implements the basic primitives for interfacing with the file 5268 This implements the basic primitives for interfacing with the file
5181 system. This includes primitives for reading files into buffers, 5269 system. This includes primitives for reading files into buffers,
5182 writing buffers into files, checking for the presence or accessibility 5270 writing buffers into files, checking for the presence or accessibility
5189 @file{simple.el}. 5277 @file{simple.el}.
5190 5278
5191 5279
5192 5280
5193 @example 5281 @example
5194 filelock.c 5282 @file{filelock.c}
5195 @end example 5283 @end example
5196 5284
5197 This file provides functions for detecting clashes between different 5285 This file provides functions for detecting clashes between different
5198 processes (e.g. XEmacs and some external process, or two different 5286 processes (e.g. XEmacs and some external process, or two different
5199 XEmacs processes) modifying the same file. (XEmacs can optionally use 5287 XEmacs processes) modifying the same file. (XEmacs can optionally use
5204 modified, the user is made aware of this so that the buffer can be 5292 modified, the user is made aware of this so that the buffer can be
5205 synched up with the external changes if necessary. 5293 synched up with the external changes if necessary.
5206 5294
5207 5295
5208 @example 5296 @example
5209 filemode.c 5297 @file{filemode.c}
5210 @end example 5298 @end example
5211 5299
5212 This file provides some miscellaneous functions that construct a 5300 This file provides some miscellaneous functions that construct a
5213 @samp{rwxr-xr-x}-type permissions string (as might appear in an 5301 @samp{rwxr-xr-x}-type permissions string (as might appear in an
5214 @file{ls}-style directory listing) given the information returned by the 5302 @file{ls}-style directory listing) given the information returned by the
5215 @code{stat()} system call. 5303 @code{stat()} system call.
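
As a rough illustration of the kind of conversion described above, the
following Python sketch builds an @samp{rwxr-xr-x}-style string from the
mode bits that @code{stat()} returns.  The name @code{mode_string} is
hypothetical and is not taken from @file{filemode.c}; the sketch also
ignores the setuid, setgid and sticky bits, which a full implementation
would have to handle.

```python
import stat

def mode_string(mode):
    """Return an ls-style string such as '-rwxr-xr-x' for a stat() mode."""
    # First character: a simplified file-type indicator.
    if stat.S_ISDIR(mode):
        kind = "d"
    elif stat.S_ISLNK(mode):
        kind = "l"
    else:
        kind = "-"
    # Nine permission bits, in owner/group/other order.
    bits = [
        (stat.S_IRUSR, "r"), (stat.S_IWUSR, "w"), (stat.S_IXUSR, "x"),
        (stat.S_IRGRP, "r"), (stat.S_IWGRP, "w"), (stat.S_IXGRP, "x"),
        (stat.S_IROTH, "r"), (stat.S_IWOTH, "w"), (stat.S_IXOTH, "x"),
    ]
    return kind + "".join(ch if mode & bit else "-" for bit, ch in bits)

print(mode_string(stat.S_IFREG | 0o755))  # -rwxr-xr-x
```

For a real file, the mode would come from something like
@code{os.stat(path).st_mode}.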
5216 5304
5217 5305
5218 5306
5219 @example 5307 @example
5220 dired.c 5308 @file{dired.c}
5221 ndir.h 5309 @file{ndir.h}
5222 @end example 5310 @end example
5223 5311
5224 These files implement the XEmacs interface to directory searching. This 5312 These files implement the XEmacs interface to directory searching. This
5225 includes a number of primitives for determining the files in a directory 5313 includes a number of primitives for determining the files in a directory
5226 and for doing filename completion. (Remember that generic completion is 5314 and for doing filename completion. (Remember that generic completion is
5232 those systems, directories can be read directly as files, and parsed.) 5320 those systems, directories can be read directly as files, and parsed.)
5233 5321
5234 5322
5235 5323
5236 @example 5324 @example
5237 realpath.c 5325 @file{realpath.c}
5238 @end example 5326 @end example
5239 5327
5240 This file provides an implementation of the @code{realpath()} function 5328 This file provides an implementation of the @code{realpath()} function
5241 for expanding symbolic links, on systems that don't implement it or have 5329 for expanding symbolic links, on systems that don't implement it or have
5242 a broken implementation. 5330 a broken implementation.
5243 5331
5244 5332
5245 5333
5246 @node Modules for Other Aspects of the Lisp Interpreter and Object System 5334 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, The Modules of XEmacs
5247 @section Modules for Other Aspects of the Lisp Interpreter and Object System 5335 @section Modules for Other Aspects of the Lisp Interpreter and Object System
5248 @cindex modules for other aspects of the Lisp interpreter and object system 5336 @cindex modules for other aspects of the Lisp interpreter and object system
5249 @cindex Lisp interpreter and object system, modules for other aspects of the 5337 @cindex Lisp interpreter and object system, modules for other aspects of the
5250 @cindex interpreter and object system, modules for other aspects of the Lisp 5338 @cindex interpreter and object system, modules for other aspects of the Lisp
5251 @cindex object system, modules for other aspects of the Lisp interpreter and 5339 @cindex object system, modules for other aspects of the Lisp interpreter and
5252 5340
5253 @example 5341 @example
5254 elhash.c 5342 @file{elhash.c}
5255 elhash.h 5343 @file{elhash.h}
5256 hash.c 5344 @file{hash.c}
5257 hash.h 5345 @file{hash.h}
5258 @end example 5346 @end example
5259 5347
5260 These files provide two implementations of hash tables. Files 5348 These files provide two implementations of hash tables. Files
5261 @file{hash.c} and @file{hash.h} provide a generic C implementation of 5349 @file{hash.c} and @file{hash.h} provide a generic C implementation of
5262 hash tables which can stand independently of XEmacs. Files 5350 hash tables which can stand independently of XEmacs. Files
5265 things like garbage collection, and implement the @dfn{hash-table} Lisp 5353 things like garbage collection, and implement the @dfn{hash-table} Lisp
5266 object type. 5354 object type.
5267 5355
5268 5356
5269 @example 5357 @example
5270 specifier.c 5358 @file{specifier.c}
5271 specifier.h 5359 @file{specifier.h}
5272 @end example 5360 @end example
5273 5361
5274 This module implements the @dfn{specifier} Lisp object type. This is 5362 This module implements the @dfn{specifier} Lisp object type. This is
5275 primarily used for displayable properties, and allows for values that 5363 primarily used for displayable properties, and allows for values that
5276 are specific to a particular buffer, window, frame, device, or device 5364 are specific to a particular buffer, window, frame, device, or device
5282 looks up a value given a window (from which a buffer, frame, and device 5370 looks up a value given a window (from which a buffer, frame, and device
5283 can be derived). 5371 can be derived).
5284 5372
5285 5373
5286 @example 5374 @example
5287 chartab.c 5375 @file{chartab.c}
5288 chartab.h 5376 @file{chartab.h}
5289 casetab.c 5377 @file{casetab.c}
5290 @end example 5378 @end example
5291 5379
5292 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table} 5380 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
5293 Lisp object type, which maps from characters or certain sorts of 5381 Lisp object type, which maps from characters or certain sorts of
5294 character ranges to Lisp objects. The implementation of this object 5382 character ranges to Lisp objects. The implementation of this object
5304 and to do case-insensitive searching. 5392 and to do case-insensitive searching.
5305 5393
5306 5394
5307 5395
5308 @example 5396 @example
5309 syntax.c 5397 @file{syntax.c}
5310 syntax.h 5398 @file{syntax.h}
5311 @end example 5399 @end example
5312 5400
5313 @cindex scanner 5401 @cindex scanner
5314 This module implements @dfn{syntax tables}, another sort of char table 5402 This module implements @dfn{syntax tables}, another sort of char table
5315 that maps characters into syntax classes that define the syntax of these 5403 that maps characters into syntax classes that define the syntax of these
5374 been removed, readded, and removed again. Currently neither GNU Emacs 5462 been removed, readded, and removed again. Currently neither GNU Emacs
5375 (21.3.99) nor XEmacs (21.5.17) seems to use it. 5463 (21.3.99) nor XEmacs (21.5.17) seems to use it.
5376 5464
5377 5465
5378 @example 5466 @example
5379 casefiddle.c 5467 @file{casefiddle.c}
5380 @end example 5468 @end example
5381 5469
5382 This module implements various Lisp primitives for upcasing, downcasing 5470 This module implements various Lisp primitives for upcasing, downcasing
5383 and capitalizing strings or regions of buffers. 5471 and capitalizing strings or regions of buffers.
5384 5472
5385 5473
5386 5474
5387 @example 5475 @example
5388 rangetab.c 5476 @file{rangetab.c}
5389 @end example 5477 @end example
5390 5478
5391 This module implements the @dfn{range table} Lisp object type, which 5479 This module implements the @dfn{range table} Lisp object type, which
5392 provides for a mapping from ranges of integers to arbitrary Lisp 5480 provides for a mapping from ranges of integers to arbitrary Lisp
5393 objects. 5481 objects.
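
To make the idea of a mapping from integer ranges to objects concrete,
here is a minimal Python sketch.  It is not the data structure used in
@file{rangetab.c}; the class name @code{RangeTable} and its methods are
hypothetical, ranges are taken to be closed, and overlapping ranges are
simply rejected to keep the sketch short.

```python
import bisect

class RangeTable:
    """Map closed integer ranges [start, end] to arbitrary values."""

    def __init__(self):
        self._starts = []    # sorted range starts, used for binary search
        self._entries = []   # (start, end, value), parallel to _starts

    def put(self, start, end, value):
        i = bisect.bisect_left(self._starts, start)
        # Simplification: refuse ranges that overlap a neighbour.
        if i > 0 and self._entries[i - 1][1] >= start:
            raise ValueError("overlapping range")
        if i < len(self._starts) and self._starts[i] <= end:
            raise ValueError("overlapping range")
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, value))

    def get(self, key, default=None):
        # Find the last range whose start is <= key, then check its end.
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0:
            start, end, value = self._entries[i]
            if key <= end:
                return value
        return default

table = RangeTable()
table.put(1, 10, "low")
table.put(20, 30, "high")
print(table.get(5), table.get(15), table.get(30))  # low None high
```

Keeping the range starts sorted means a lookup costs one binary search
rather than a scan over every range.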
5394 5482
5395 5483
5396 5484
5397 @example 5485 @example
5398 opaque.c 5486 @file{opaque.c}
5399 opaque.h 5487 @file{opaque.h}
5400 @end example 5488 @end example
5401 5489
5402 This module implements the @dfn{opaque} Lisp object type, an 5490 This module implements the @dfn{opaque} Lisp object type, an
5403 internal-only Lisp object that encapsulates an arbitrary block of memory 5491 internal-only Lisp object that encapsulates an arbitrary block of memory
5404 so that it can be managed by the Lisp allocation system. To create an 5492 so that it can be managed by the Lisp allocation system. To create an
5416 create a new Lisp object type---it's not hard.) 5504 create a new Lisp object type---it's not hard.)
5417 5505
5418 5506
5419 5507
5420 @example 5508 @example
5421 abbrev.c 5509 @file{abbrev.c}
5422 @end example 5510 @end example
5423 5511
5424 This module provides a few primitives for doing dynamic abbreviation 5512 This module provides a few primitives for doing dynamic abbreviation
5425 expansion. In XEmacs, most of the code for this has been moved into 5513 expansion. In XEmacs, most of the code for this has been moved into
5426 Lisp. Some C code remains for speed and because the primitive 5514 Lisp. Some C code remains for speed and because the primitive
5429 is itself in C only for speed.) 5517 is itself in C only for speed.)
5430 5518
5431 5519
5432 5520
5433 @example 5521 @example
5434 doc.c 5522 @file{doc.c}
5435 @end example 5523 @end example
5436 5524
5437 This module provides primitives for retrieving the documentation 5525 This module provides primitives for retrieving the documentation
5438 strings of functions and variables. These documentation strings contain 5526 strings of functions and variables. These documentation strings contain
5439 certain special markers that get dynamically expanded (e.g. a 5527 certain special markers that get dynamically expanded (e.g. a
5448 the appropriate documentation string.) 5536 the appropriate documentation string.)
5449 5537
5450 5538
5451 5539
5452 @example 5540 @example
5453 md5.c 5541 @file{md5.c}
5454 @end example 5542 @end example
5455 5543
5456 This module provides a Lisp primitive that implements the MD5 secure 5544 This module provides a Lisp primitive that implements the MD5 secure
5457 hashing scheme, used to create a large hash value of a string of data such that 5545 hashing scheme, used to create a large hash value of a string of data such that
5458 the data cannot be derived from the hash value. This is used for 5546 the data cannot be derived from the hash value. This is used for
5459 various security applications on the Internet. 5547 various security applications on the Internet.
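
The behaviour of such a hash can be seen with Python's standard
@code{hashlib} module, used here only as a stand-in for the Lisp
primitive.  The helper name @code{md5_hex} is hypothetical.  The inputs
are two of the test strings from the MD5 specification (RFC 1321).

```python
import hashlib

def md5_hex(data):
    """Return the 128-bit MD5 digest of a byte string as 32 hex digits."""
    return hashlib.md5(data).hexdigest()

print(md5_hex(b""))     # d41d8cd98f00b204e9800998ecf8427e
print(md5_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
```

Whatever the length of the input, the result is always 128 bits long.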
5460 5548
5461 5549
5462 5550
5463 5551
5464 @node Modules for Interfacing with the Operating System 5552 @node Modules for Interfacing with the Operating System, , Modules for Other Aspects of the Lisp Interpreter and Object System, The Modules of XEmacs
5465 @section Modules for Interfacing with the Operating System 5553 @section Modules for Interfacing with the Operating System
5466 @cindex modules for interfacing with the operating system 5554 @cindex modules for interfacing with the operating system
5467 @cindex interfacing with the operating system, modules for 5555 @cindex interfacing with the operating system, modules for
5468 @cindex operating system, modules for interfacing with the 5556 @cindex operating system, modules for interfacing with the
5469 5557
5470 @example 5558 @example
5471 process.el 5559 @file{process.el}
5472 process.c 5560 @file{process.c}
5473 process.h 5561 @file{process.h}
5474 @end example 5562 @end example
5475 5563
5476 These modules allow XEmacs to spawn and communicate with subprocesses 5564 These modules allow XEmacs to spawn and communicate with subprocesses
5477 and network connections. 5565 and network connections.
5478 5566
5513 subprocesses. 5601 subprocesses.
5514 5602
5515 5603
5516 5604
5517 @example 5605 @example
5518 sysdep.c 5606 @file{sysdep.c}
5519 sysdep.h 5607 @file{sysdep.h}
5520 @end example 5608 @end example
5521 5609
5522 These modules implement most of the low-level, messy operating-system 5610 These modules implement most of the low-level, messy operating-system
5523 interface code. This includes various device control (ioctl) operations 5611 interface code. This includes various device control (ioctl) operations
5524 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff 5612 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
5527 provide them or have broken versions. 5615 provide them or have broken versions.
5528 5616
5529 5617
5530 5618
5531 @example 5619 @example
5532 sysdir.h 5620 @file{sysdir.h}
5533 sysfile.h 5621 @file{sysfile.h}
5534 sysfloat.h 5622 @file{sysfloat.h}
5535 sysproc.h 5623 @file{sysproc.h}
5536 syspwd.h 5624 @file{syspwd.h}
5537 syssignal.h 5625 @file{syssignal.h}
5538 systime.h 5626 @file{systime.h}
5539 systty.h 5627 @file{systty.h}
5540 syswait.h 5628 @file{syswait.h}
5541 @end example 5629 @end example
5542 5630
5543 These header files provide consistent interfaces onto system-dependent 5631 These header files provide consistent interfaces onto system-dependent
5544 header files and system calls. The idea is that, instead of including a 5632 header files and system calls. The idea is that, instead of including a
5545 standard header file like @file{<sys/param.h>} (which may or may not 5633 standard header file like @file{<sys/param.h>} (which may or may not
5590 an int). 5678 an int).
5591 5679
5592 5680
5593 5681
5594 @example 5682 @example
5595 hpplay.c 5683 @file{hpplay.c}
5596 libsst.c 5684 @file{libsst.c}
5597 libsst.h 5685 @file{libsst.h}
5598 libst.h 5686 @file{libst.h}
5599 linuxplay.c 5687 @file{linuxplay.c}
5600 nas.c 5688 @file{nas.c}
5601 sgiplay.c 5689 @file{sgiplay.c}
5602 sound.c 5690 @file{sound.c}
5603 sunplay.c 5691 @file{sunplay.c}
5604 @end example 5692 @end example
5605 5693
5606 These files implement the ability to play various sounds on some types 5694 These files implement the ability to play various sounds on some types
5607 of computers. You have to configure your XEmacs with sound support in 5695 of computers. You have to configure your XEmacs with sound support in
5608 order to get this capability. 5696 order to get this capability.
5635 currently in use. 5723 currently in use.
5636 5724
5637 5725
5638 5726
5639 @example 5727 @example
5640 tooltalk.c 5728 @file{tooltalk.c}
5641 tooltalk.h 5729 @file{tooltalk.h}
5642 @end example 5730 @end example
5643 5731
5644 These two modules implement an interface to the ToolTalk protocol, which 5732 These two modules implement an interface to the ToolTalk protocol, which
5645 is an interprocess communication protocol implemented on some versions 5733 is an interprocess communication protocol implemented on some versions
5646 of Unix. ToolTalk is a high-level protocol that allows processes to 5734 of Unix. ToolTalk is a high-level protocol that allows processes to
5652 parts of the SPARCWorks development environment. 5740 parts of the SPARCWorks development environment.
5653 5741
5654 5742
5655 5743
5656 @example 5744 @example
5657 getloadavg.c 5745 @file{getloadavg.c}
5658 @end example 5746 @end example
5659 5747
5660 This module provides the ability to retrieve the system's current load 5748 This module provides the ability to retrieve the system's current load
5661 average. (The way to do this is highly system-specific, unfortunately, 5749 average. (The way to do this is highly system-specific, unfortunately,
5662 and requires a lot of special-case code.) 5750 and requires a lot of special-case code.)
5663 5751
5664 5752
5665 5753
5666 @example 5754 @example
5667 sunpro.c 5755 @file{sunpro.c}
5668 @end example 5756 @end example
5669 5757
5670 This module provides a small amount of code used internally at Sun to 5758 This module provides a small amount of code used internally at Sun to
5671 keep statistics on the usage of XEmacs. 5759 keep statistics on the usage of XEmacs.
5672 5760
5673 5761
5674 5762
5675 @example 5763 @example
5676 broken-sun.h 5764 @file{broken-sun.h}
5677 strcmp.c 5765 @file{strcmp.c}
5678 strcpy.c 5766 @file{strcpy.c}
5679 sunOS-fix.c 5767 @file{sunOS-fix.c}
5680 @end example 5768 @end example
5681 5769
5682 These files provide replacement functions and prototypes to fix numerous 5770 These files provide replacement functions and prototypes to fix numerous
5683 bugs in early releases of SunOS 4.1. 5771 bugs in early releases of SunOS 4.1.
5684 5772
5685 5773
5686 5774
5687 @example 5775 @example
5688 hftctl.c 5776 @file{hftctl.c}
5689 @end example 5777 @end example
5690 5778
5691 This module provides some terminal-control code necessary on versions of 5779 This module provides some terminal-control code necessary on versions of
5692 AIX prior to 4.1. 5780 AIX prior to 4.1.
5693 5781
5694 5782
5695 5783 @node Allocation of Objects in XEmacs Lisp, Dumping, The Modules of XEmacs, Top
5696 @node Modules for Interfacing with X Windows
5697 @section Modules for Interfacing with X Windows
5698 @cindex modules for interfacing with X Windows
5699 @cindex interfacing with X Windows, modules for
5700 @cindex X Windows, modules for interfacing with
5701
5702 @example
5703 Emacs.ad.h
5704 @end example
5705
5706 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
5707 fallback resources (so that XEmacs has pretty defaults).
5708
5709
5710
5711 @example
5712 EmacsFrame.c
5713 EmacsFrame.h
5714 EmacsFrameP.h
5715 @end example
5716
5717 These modules implement an Xt widget class that encapsulates a frame.
5718 This is for ease in integrating with Xt. The EmacsFrame widget covers
5719 the entire X window except for the menubar; the scrollbars are
5720 positioned on top of the EmacsFrame widget.
5721
5722 @strong{Warning:} Abandon hope, all ye who enter here. This code took
5723 an ungodly amount of time to get right, and is likely to fall apart
5724 mercilessly at the slightest change. Such is life under Xt.
5725
5726
5727
5728 @example
5729 EmacsManager.c
5730 EmacsManager.h
5731 EmacsManagerP.h
5732 @end example
5733
5734 These modules implement a simple Xt manager (i.e. composite) widget
5735 class that simply lets its children set whatever geometry they want.
5736 It's amazing that Xt doesn't provide this standardly, but on second
5737 thought, it makes sense, considering how amazingly broken Xt is.
5738
5739
5740 @example
5741 EmacsShell-sub.c
5742 EmacsShell.c
5743 EmacsShell.h
5744 EmacsShellP.h
5745 @end example
5746
5747 These modules implement two Xt widget classes that are subclasses of
5748 the TopLevelShell and TransientShell classes. This is necessary to deal
5749 with more brokenness that Xt has sadistically thrust onto the backs of
5750 developers.
5751
5752
5753
5754 @example
5755 xgccache.c
5756 xgccache.h
5757 @end example
5758
5759 These modules provide functions for maintenance and caching of GC's
5760 (graphics contexts) under the X Window System. This code is junky and
5761 needs to be rewritten.
5762
5763
5764
5765 @example
5766 select-msw.c
5767 select-x.c
5768 select.c
5769 select.h
5770 @end example
5771
5772 @cindex selections
5773 This module provides an interface to the X Window System's concept of
5774 @dfn{selections}, the standard way for X applications to communicate
5775 with each other.
5776
5777
5778
5779 @example
5780 xintrinsic.h
5781 xintrinsicp.h
5782 xmmanagerp.h
5783 xmprimitivep.h
5784 @end example
5785
5786 These header files are similar in spirit to the @file{sys*.h} files and buffer
5787 against different implementations of Xt and Motif.
5788
5789 @itemize @bullet
5790 @item
5791 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
5792 @item
5793 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
5794 @item
5795 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
5796 @item
5797 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
5798 @end itemize
5799
5800
5801
5802 @example
5803 xmu.c
5804 xmu.h
5805 @end example
5806
5807 These files provide an emulation of the Xmu library for those systems
5808 (e.g. HPUX) that don't provide it as a standard part of X.
5809
5810
5811
5812 @example
5813 ExternalClient-Xlib.c
5814 ExternalClient.c
5815 ExternalClient.h
5816 ExternalClientP.h
5817 ExternalShell.c
5818 ExternalShell.h
5819 ExternalShellP.h
5820 extw-Xlib.c
5821 extw-Xlib.h
5822 extw-Xt.c
5823 extw-Xt.h
5824 @end example
5825
5826 @cindex external widget
5827 These files provide the @dfn{external widget} interface, which allows an
5828 XEmacs frame to appear as a widget in another application. To do this,
5829 you have to configure with @samp{--external-widget}.
5830
5831 @file{ExternalShell*} provides the server (XEmacs) side of the
5832 connection.
5833
5834 @file{ExternalClient*} provides the client (other application) side of
5835 the connection. These files are not compiled into XEmacs but are
5836 compiled into libraries that are then linked into your application.
5837
5838 @file{extw-*} is common code that is used for both the client and server.
5839
5840 Don't touch this code; something is liable to break if you do.
5841
5842
5843
5844 @node Modules for Internationalization
5845 @section Modules for Internationalization
5846 @cindex modules for internationalization
5847 @cindex internationalization, modules for
5848
5849 @example
5850 mule-canna.c
5851 mule-ccl.c
5852 mule-charset.c
5853 mule-charset.h
5854 file-coding.c
5855 file-coding.h
5856 mule-coding.c
5857 mule-mcpath.c
5858 mule-mcpath.h
5859 mule-wnnfns.c
5860 mule.c
5861 @end example
5862
5863 These files implement the MULE (Asian-language) support. Note that MULE
5864 actually provides a general interface for all sorts of languages, not
5865 just Asian languages (although they are generally the most complicated
5866 to support). This code is still in beta.
5867
5868 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
5869 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset}
5870 Lisp object type, which encapsulates a character set (an ordered one- or
5871 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
5872 Kanji).
5873
5874 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
5875 type, which encapsulates a method of converting between different
5876 encodings. An encoding is a representation of a stream of characters,
5877 possibly from multiple character sets, using a stream of bytes or words,
5878 and defines (e.g.) which escape sequences are used to specify particular
5879 character sets, how the indices for a character are converted into bytes
5880 (sometimes this involves setting the high bit; sometimes complicated
5881 rearranging of the values takes place, as in the Shift-JIS encoding),
5882 etc. It also contains some generic coding system implementations, such
5883 as the binary (no-conversion) coding system and a sample gzip coding system.
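
The difference between these styles of encoding can be seen by encoding
one JIS X 0208 character in three ways.  The sketch below uses the
codecs that ship with Python, not the XEmacs coding-system machinery,
and the variable names are arbitrary.  The character is HIRAGANA LETTER
A, whose JIS X 0208 index bytes are 0x24 0x22.

```python
text = "\u3042"  # HIRAGANA LETTER A

# ISO-2022-JP: escape sequences select the character set, and the
# index bytes 0x24 0x22 then appear unchanged.
iso2022 = text.encode("iso2022_jp")
# EUC-JP: the same index bytes with the high bit set on each.
euc = text.encode("euc_jp")
# Shift-JIS: the index bytes are rearranged arithmetically.
sjis = text.encode("shift_jis")

print(iso2022)     # b'\x1b$B$"\x1b(B'
print(euc.hex())   # a4a2
print(sjis.hex())  # 82a0
```

All three byte strings decode back to the same character; only the
representation on the wire differs.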
5884
5885 @file{mule-coding.c} contains the implementations of text coding systems.
5886
5887 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
5888 interpreter. CCL is similar in spirit to Lisp byte code and is used to
5889 implement converters for custom encodings.
5890
5891 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
5892 external programs used to implement the Canna and WNN input methods,
5893 respectively. This is currently in beta.
5894
5895 @file{mule-mcpath.c} provides some functions to allow for pathnames
5896 containing extended characters. This code is fragmentary, obsolete, and
5897 completely non-working. Instead, @code{pathname-coding-system} is used
5898 to specify conversions of names of files and directories. The standard
5899 C I/O functions like @samp{open()} are wrapped so that conversion occurs
5900 automatically.
5901
5902 @file{mule.c} contains a few miscellaneous things. It currently seems
5903 to be unused and probably should be removed.
5904
5905
5906
5907 @example
5908 intl.c
5909 @end example
5910
5911 This provides some miscellaneous internationalization code for
5912 implementing message translation and interfacing to the Ximp input
5913 method. None of this code is currently working.
5914
5915
5916
5917 @example
5918 iso-wide.h
5919 @end example
5920
5921 This contains leftover code from an earlier implementation of
5922 Asian-language support, and is not currently used.
5923
5924
5925
5926
5927 @node Modules for Regression Testing
5928 @section Modules for Regression Testing
5929 @cindex modules for regression testing
5930 @cindex regression testing, modules for
5931
5932 @example
5933 test-harness.el
5934 base64-tests.el
5935 byte-compiler-tests.el
5936 case-tests.el
5937 ccl-tests.el
5938 c-tests.el
5939 database-tests.el
5940 extent-tests.el
5941 hash-table-tests.el
5942 lisp-tests.el
5943 md5-tests.el
5944 mule-tests.el
5945 regexp-tests.el
5946 symbol-tests.el
5947 syntax-tests.el
5948 tag-tests.el
5949 weak-tests.el
5950 @end example
5951
5952 @file{test-harness.el} defines the macros @code{Assert},
5953 @code{Check-Error}, @code{Check-Error-Message}, and
5954 @code{Check-Message}. The other files are test files, testing various
5955 XEmacs facilities. @xref{Regression Testing XEmacs}.
5956
5957
5958
5959 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
5960 @chapter Allocation of Objects in XEmacs Lisp 5784 @chapter Allocation of Objects in XEmacs Lisp
5961 @cindex allocation of objects in XEmacs Lisp 5785 @cindex allocation of objects in XEmacs Lisp
5962 @cindex objects in XEmacs Lisp, allocation of 5786 @cindex objects in XEmacs Lisp, allocation of
5963 @cindex Lisp objects, allocation of in XEmacs 5787 @cindex Lisp objects, allocation of in XEmacs
5964 5788
5965 @menu 5789 @menu
5966 * Introduction to Allocation:: 5790 * Introduction to Allocation::
5967 * Garbage Collection:: 5791 * Garbage Collection::
5968 * GCPROing:: 5792 * GCPROing::
5969 * Garbage Collection - Step by Step:: 5793 * Garbage Collection - Step by Step::
5970 * Integers and Characters:: 5794 * Integers and Characters::
5971 * Allocation from Frob Blocks:: 5795 * Allocation from Frob Blocks::
5972 * lrecords:: 5796 * lrecords::
5973 * Low-level allocation:: 5797 * Low-level allocation::
5974 * Cons:: 5798 * Cons::
5975 * Vector:: 5799 * Vector::
5976 * Bit Vector:: 5800 * Bit Vector::
5977 * Symbol:: 5801 * Symbol::
5978 * Marker:: 5802 * Marker::
5979 * String:: 5803 * String::
5980 * Compiled Function:: 5804 * Compiled Function::
5981 @end menu 5805 @end menu
5982 5806
5983 @node Introduction to Allocation 5807 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp
5984 @section Introduction to Allocation 5808 @section Introduction to Allocation
5985 @cindex allocation, introduction to 5809 @cindex allocation, introduction to
5986 5810
5987 Emacs Lisp, like all Lisps, has garbage collection. This means that 5811 Emacs Lisp, like all Lisps, has garbage collection. This means that
5988 the programmer never has to explicitly free (destroy) an object; it 5812 the programmer never has to explicitly free (destroy) an object; it
6050 like vectors. You can basically view them as exactly like vectors 5874 like vectors. You can basically view them as exactly like vectors
6051 except that their type is stored in lrecord fashion rather than 5875 except that their type is stored in lrecord fashion rather than
6052 in directly-tagged fashion. 5876 in directly-tagged fashion.
6053 5877
6054 5878
6055 @node Garbage Collection 5879 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp
6056 @section Garbage Collection 5880 @section Garbage Collection
6057 @cindex garbage collection 5881 @cindex garbage collection
6058 5882
6059 @cindex mark and sweep 5883 @cindex mark and sweep
6060 Garbage collection is simple in theory but tricky to implement. 5884 Garbage collection is simple in theory but tricky to implement.
6061 Emacs Lisp uses the oldest garbage collection method, called 5885 Emacs Lisp uses the oldest garbage collection method, called
6062 @dfn{mark and sweep}. Garbage collection begins by starting with 5886 @dfn{mark and sweep}. Garbage collection begins by starting with
6063 all accessible locations (i.e. all variables and other slots where 5887 all accessible locations (i.e. all variables and other slots where
6064 Lisp objects might occur) and recursively traversing all objects 5888 Lisp objects might occur) and recursively traversing all objects
6065 accessible from those slots, marking each one that is found. 5889 accessible from those slots, marking each one that is found.
6066 We then go through all of memory and free each object that is 5890 We then go through all of memory and free each object that is
6067 not marked, and unmark each object that is marked. Note 5891 not marked, and unmark each object that is marked. Note
6074 @code{garbage-collect} but is also called automatically by @code{eval}, 5898 @code{garbage-collect} but is also called automatically by @code{eval},
6075 once a certain amount of memory has been allocated since the last 5899 once a certain amount of memory has been allocated since the last
6076 garbage collection (according to @code{gc-cons-threshold}). 5900 garbage collection (according to @code{gc-cons-threshold}).
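As a rough illustration of the mark-and-sweep idea described above, here is a minimal toy collector in C. It is only a sketch, not the XEmacs implementation: the @code{toy_obj} type, the fixed-size heap array, and every function name are invented for this example, and real Lisp objects carry far more than two references.

```c
#include <stddef.h>

/* Hypothetical toy object: two outgoing references plus a mark bit. */
typedef struct toy_obj {
  struct toy_obj *car, *cdr;
  int marked;   /* set during the mark phase */
  int in_use;   /* cleared when the sweep frees the slot */
} toy_obj;

#define TOY_HEAP_SIZE 8
static toy_obj toy_heap[TOY_HEAP_SIZE];

/* Take the first unused slot; returns NULL when the toy heap is full. */
toy_obj *toy_alloc(toy_obj *car, toy_obj *cdr)
{
  for (size_t i = 0; i < TOY_HEAP_SIZE; i++) {
    if (!toy_heap[i].in_use) {
      toy_heap[i].car = car;
      toy_heap[i].cdr = cdr;
      toy_heap[i].marked = 0;
      toy_heap[i].in_use = 1;
      return &toy_heap[i];
    }
  }
  return NULL;
}

/* Mark phase: recursively mark everything reachable from OBJ.
   The check on the mark bit also stops us looping on cycles. */
void toy_mark(toy_obj *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  toy_mark(obj->car);
  toy_mark(obj->cdr);
}

/* Sweep phase: free each unmarked object and unmark each marked one.
   Returns the number of objects freed. */
int toy_sweep(void)
{
  int freed = 0;
  for (size_t i = 0; i < TOY_HEAP_SIZE; i++) {
    if (!toy_heap[i].in_use)
      continue;
    if (toy_heap[i].marked)
      toy_heap[i].marked = 0;
    else {
      toy_heap[i].in_use = 0;
      freed++;
    }
  }
  return freed;
}

/* One full collection, starting from an explicit array of roots. */
int toy_collect(toy_obj **roots, size_t nroots)
{
  for (size_t i = 0; i < nroots; i++)
    toy_mark(roots[i]);
  return toy_sweep();
}
```

The explicit root array stands in for "all accessible locations"; in the real collector, finding those locations is the hard part.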
6077 5901
6078 5902
6079 @node GCPROing 5903 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp
6080 @section @code{GCPRO}ing 5904 @section @code{GCPRO}ing
6081 @cindex @code{GCPRO}ing 5905 @cindex @code{GCPRO}ing
6082 @cindex garbage collection protection 5906 @cindex garbage collection protection
6083 @cindex protection, garbage collection 5907 @cindex protection, garbage collection
6084 5908
6249 anything that looks like a reference to an object as a reference. This 6073 anything that looks like a reference to an object as a reference. This
6250 will result in a few objects not getting collected when they should, but 6074 will result in a few objects not getting collected when they should, but
6251 it obviates the need for @code{GCPRO}ing, and allows garbage collection 6075 it obviates the need for @code{GCPRO}ing, and allows garbage collection
6252 to happen at any point at all, such as during object allocation. 6076 to happen at any point at all, such as during object allocation.
6253 6077
6254 @node Garbage Collection - Step by Step 6078 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp
6255 @section Garbage Collection - Step by Step 6079 @section Garbage Collection - Step by Step
6256 @cindex garbage collection - step by step 6080 @cindex garbage collection - step by step
6257 6081
6258 @menu 6082 @menu
6259 * Invocation:: 6083 * Invocation::
6260 * garbage_collect_1:: 6084 * garbage_collect_1::
6261 * mark_object:: 6085 * mark_object::
6262 * gc_sweep:: 6086 * gc_sweep::
6263 * sweep_lcrecords_1:: 6087 * sweep_lcrecords_1::
6264 * compact_string_chars:: 6088 * compact_string_chars::
6265 * sweep_strings:: 6089 * sweep_strings::
6266 * sweep_bit_vectors_1:: 6090 * sweep_bit_vectors_1::
6267 @end menu 6091 @end menu
6268 6092
6269 @node Invocation 6093 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step
6270 @subsection Invocation 6094 @subsection Invocation
6271 @cindex garbage collection, invocation 6095 @cindex garbage collection, invocation
6272 6096
6273 The first thing that anyone should know about garbage collection is: 6097 The first thing that anyone should know about garbage collection is:
6274 when and how the garbage collector is invoked. One might think that this 6098 when and how the garbage collector is invoked. One might think that this
6324 everything related to @code{eval} (@code{Feval_buffer}, @code{call0}, 6148 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
6325 ...) and inside @code{Fsignal}. The latter is used to handle signals, as 6149 ...) and inside @code{Fsignal}. The latter is used to handle signals, as
6326 for example the ones raised by every @code{QUIT}-macro triggered after 6150 for example the ones raised by every @code{QUIT}-macro triggered after
6327 pressing Ctrl-g. 6151 pressing Ctrl-g.
6328 6152
6329 @node garbage_collect_1 6153 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step
6330 @subsection @code{garbage_collect_1} 6154 @subsection @code{garbage_collect_1}
6331 @cindex @code{garbage_collect_1} 6155 @cindex @code{garbage_collect_1}
6332 6156
6333 We can now describe exactly what happens after the invocation takes 6157 We can now describe exactly what happens after the invocation takes
6334 place. 6158 place.
6514 A small memory reserve is always held back that can be reached by 6338 A small memory reserve is always held back that can be reached by
6515 @code{breathing_space}. If nothing more is left, we create a new reserve 6339 @code{breathing_space}. If nothing more is left, we create a new reserve
6516 and exit. 6340 and exit.
6517 @end enumerate 6341 @end enumerate
6518 6342
6519 @node mark_object 6343 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step
6520 @subsection @code{mark_object} 6344 @subsection @code{mark_object}
6521 @cindex @code{mark_object} 6345 @cindex @code{mark_object}
6522 6346
6523 The first thing that is checked while marking an object is whether the 6347 The first thing that is checked while marking an object is whether the
6524 object is a real Lisp object @code{Lisp_Type_Record} or just an integer 6348 object is a real Lisp object @code{Lisp_Type_Record} or just an integer
6548 be performed. 6372 be performed.
6549 6373
6550 In case another object was returned, as mentioned before, we reiterate 6374 In case another object was returned, as mentioned before, we reiterate
6551 the whole @code{mark_object} process beginning with this next object. 6375 the whole @code{mark_object} process beginning with this next object.
6552 6376
6553 @node gc_sweep 6377 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step
6554 @subsection @code{gc_sweep} 6378 @subsection @code{gc_sweep}
6555 @cindex @code{gc_sweep} 6379 @cindex @code{gc_sweep}
6556 6380
6557 The job of this function is to free all unmarked records from memory. As 6381 The job of this function is to free all unmarked records from memory. As
6558 we know, there are different types of objects implemented and managed, and 6382 we know, there are different types of objects implemented and managed, and
6643 (by @code{UNMARK_...}). While going through one block, we note if the 6467 (by @code{UNMARK_...}). While going through one block, we note if the
6644 whole block is empty. If so, the whole block is freed (using 6468 whole block is empty. If so, the whole block is freed (using
6645 @code{xfree}) and the free list state is set to the state it had before 6469 @code{xfree}) and the free list state is set to the state it had before
6646 handling this block. 6470 handling this block.
6647 6471
6648 @node sweep_lcrecords_1 6472 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step
6649 @subsection @code{sweep_lcrecords_1} 6473 @subsection @code{sweep_lcrecords_1}
6650 @cindex @code{sweep_lcrecords_1} 6474 @cindex @code{sweep_lcrecords_1}
6651 6475
6652 After nullifying the complete lcrecord statistics, we go over all 6476 After nullifying the complete lcrecord statistics, we go over all
6653 lcrecords two separate times. They are all chained together in a list with 6477 lcrecords two separate times. They are all chained together in a list with
6664 through the whole list. In case an object is read only or marked, it 6488 through the whole list. In case an object is read only or marked, it
6665 has to persist, otherwise it is manually freed by calling 6489 has to persist, otherwise it is manually freed by calling
6666 @code{xfree}. During this loop, the lcrecord statistics are kept up to 6490 @code{xfree}. During this loop, the lcrecord statistics are kept up to
6667 date by calling @code{tick_lcrecord_stats} with the right arguments. 6491 date by calling @code{tick_lcrecord_stats} with the right arguments.
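The sweep over a chained list of individually allocated records can be sketched as follows. This is an illustrative toy under stated assumptions, not the code in @code{sweep_lcrecords_1}: the @code{toy_lcrecord} type, its fields, and the function names are hypothetical, plain @code{free} stands in for @code{xfree}, and the statistics bookkeeping is reduced to a count of freed records.

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical record header: chain link plus the two flags the sweep tests. */
typedef struct toy_lcrecord {
  struct toy_lcrecord *next;
  int marked;
  int read_only;
} toy_lcrecord;

/* Head of the chain that links every allocated record. */
static toy_lcrecord *toy_all_lcrecords;

/* Allocate one record individually and push it on the chain. */
toy_lcrecord *toy_alloc_lcrecord(int read_only)
{
  toy_lcrecord *r = malloc(sizeof *r);
  assert(r != NULL);
  r->next = toy_all_lcrecords;
  r->marked = 0;
  r->read_only = read_only;
  toy_all_lcrecords = r;
  return r;
}

/* Walk the whole chain once.  Records that are read-only or marked
   persist (and are unmarked); all others are unlinked and freed.
   Returns the number of records freed. */
int toy_sweep_lcrecords(void)
{
  int freed = 0;
  toy_lcrecord **link = &toy_all_lcrecords;
  while (*link != NULL) {
    toy_lcrecord *r = *link;
    if (r->read_only || r->marked) {
      r->marked = 0;
      link = &r->next;
    } else {
      *link = r->next;   /* unlink before freeing */
      free(r);
      freed++;
    }
  }
  return freed;
}
```

Keeping a pointer to the previous link, rather than to the previous record, lets the loop remove the head of the chain without a special case.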
6668 6492
6669 @node compact_string_chars 6493 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step
6670 @subsection @code{compact_string_chars} 6494 @subsection @code{compact_string_chars}
6671 @cindex @code{compact_string_chars} 6495 @cindex @code{compact_string_chars}
6672 6496
6673 The purpose of this function is to compact all the data parts of the 6497 The purpose of this function is to compact all the data parts of the
6674 strings that are held in so-called @code{string_chars_block}, i.e. the 6498 strings that are held in so-called @code{string_chars_block}, i.e. the
6710 @code{string_chars_block}, sitting in @code{current_string_chars_block}, 6534 @code{string_chars_block}, sitting in @code{current_string_chars_block},
6711 is reset on the last block to which we moved a string, 6535 is reset on the last block to which we moved a string,
6712 i.e. @code{to_block}, and all remaining blocks (we know that they just 6536 i.e. @code{to_block}, and all remaining blocks (we know that they just
6713 carry garbage) are explicitly @code{xfree}d. 6537 carry garbage) are explicitly @code{xfree}d.
6714 6538
6715 @node sweep_strings 6539 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step
6716 @subsection @code{sweep_strings} 6540 @subsection @code{sweep_strings}
6717 @cindex @code{sweep_strings} 6541 @cindex @code{sweep_strings}
6718 6542
6719 The sweeping for the fixed sized string objects is essentially exactly 6543 The sweeping for the fixed sized string objects is essentially exactly
6720 the same as it is for all other fixed size types. As before, the freeing 6544 the same as it is for all other fixed size types. As before, the freeing
6731 addition: in case the string was not allocated in a 6555 addition: in case the string was not allocated in a
6732 @code{string_chars_block} because it exceeded the maximal length, and 6556 @code{string_chars_block} because it exceeded the maximal length, and
6733 therefore it was @code{malloc}ed separately, we now also @code{xfree} 6557 therefore it was @code{malloc}ed separately, we now also @code{xfree}
6734 it explicitly. 6558 it explicitly.
6735 6559
6736 @node sweep_bit_vectors_1 6560 @node sweep_bit_vectors_1, , sweep_strings, Garbage Collection - Step by Step
6737 @subsection @code{sweep_bit_vectors_1} 6561 @subsection @code{sweep_bit_vectors_1}
6738 @cindex @code{sweep_bit_vectors_1} 6562 @cindex @code{sweep_bit_vectors_1}
6739 6563
6740 Bit vectors are also one of the rare types that are @code{malloc}ed 6564 Bit vectors are also one of the rare types that are @code{malloc}ed
6741 individually. Consequently, while sweeping, all further needless 6565 individually. Consequently, while sweeping, all further needless
6745 all unmarked bit vectors are unlinked by calling @code{xfree} and all of 6569 all unmarked bit vectors are unlinked by calling @code{xfree} and all of
6746 them become unmarked. 6570 them become unmarked.
6747 In addition, the bookkeeping information used for garbage 6571 In addition, the bookkeeping information used for garbage
6748 collector's output purposes is updated. 6572 collector's output purposes is updated.
6749 6573
6750 @node Integers and Characters 6574 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp
6751 @section Integers and Characters 6575 @section Integers and Characters
6752 @cindex integers and characters 6576 @cindex integers and characters
6753 @cindex characters, integers and 6577 @cindex characters, integers and
6754 6578
6755 Integer and character Lisp objects are created from integers using the 6579 Integer and character Lisp objects are created from integers using the
6761 6585
6762 @code{XSETINT()} and the like will truncate values given to them that 6586 @code{XSETINT()} and the like will truncate values given to them that
6763 are too big; i.e. you won't get the value you expected but the tag bits 6587 are too big; i.e. you won't get the value you expected but the tag bits
6764 will at least be correct. 6588 will at least be correct.
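To see why an oversized value is truncated while the tag bits stay correct, consider the following sketch. The layout is an assumption made purely for illustration (a 32-bit word whose low bit is the integer tag and whose upper 31 bits hold the value); it is not claimed to match the actual XEmacs representation, and the @code{toy_} names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: low bit set = integer tag, upper 31 bits = value. */
typedef uint32_t toy_lisp_word;
#define TOY_INT_TAG 1u

/* Build a tagged integer.  The shift is done in unsigned arithmetic, so
   bits that do not fit in the 31-bit payload are silently dropped, and
   the tag bit is then set unconditionally. */
toy_lisp_word toy_make_int(int64_t value)
{
  return ((uint32_t) value << 1) | TOY_INT_TAG;
}

/* Recover the signed 31-bit payload without relying on
   implementation-defined shifts of negative numbers. */
int32_t toy_int_value(toy_lisp_word w)
{
  uint32_t payload = w >> 1;              /* 31-bit payload */
  if (payload & 0x40000000u)              /* payload sign bit is set */
    return (int32_t) payload - 0x40000000 - 0x40000000;
  return (int32_t) payload;
}
```

With this layout, small values such as 42 or -5 round-trip unchanged, while a value that needs more than 31 bits comes back as something else even though the tag bit is still set.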
6765 6589
6766 @node Allocation from Frob Blocks 6590 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp
6767 @section Allocation from Frob Blocks 6591 @section Allocation from Frob Blocks
6768 @cindex allocation from frob blocks 6592 @cindex allocation from frob blocks
6769 @cindex frob blocks, allocation from 6593 @cindex frob blocks, allocation from
6770 6594
6771 The uninitialized memory required by a @code{Lisp_Object} of a particular type 6595 The uninitialized memory required by a @code{Lisp_Object} of a particular type
6790 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the 6614 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
6791 last frob block for space, and creates a new frob block if there is 6615 last frob block for space, and creates a new frob block if there is
6792 none. (There are actually two versions of these macros, one of which is 6616 none. (There are actually two versions of these macros, one of which is
6793 more defensive but less efficient and is used for error-checking.) 6617 more defensive but less efficient and is used for error-checking.)
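The "look at the end of the last block, else create a new block" step can be sketched as below. This is a simplified illustration, not the real macros: the block size, the @code{toy_cons} type, and all names are invented, and the free list that would be consulted first is left out.

```c
#include <stdlib.h>
#include <assert.h>

#define TOY_OBJS_PER_BLOCK 4

/* Hypothetical fixed-size object type. */
typedef struct toy_cons { void *car, *cdr; } toy_cons;

/* A block holds several objects of one type, chained to the previous block. */
typedef struct toy_block {
  struct toy_block *prev;
  toy_cons objs[TOY_OBJS_PER_BLOCK];
} toy_block;

static toy_block *toy_current_block;                /* most recent block */
static int toy_current_index = TOY_OBJS_PER_BLOCK;  /* "full" forces a first block */
static int toy_block_count;

/* Hand out the next unused slot at the end of the current block,
   creating a fresh block when the current one is exhausted. */
toy_cons *toy_alloc_cons(void)
{
  if (toy_current_index == TOY_OBJS_PER_BLOCK) {
    toy_block *b = malloc(sizeof *b);
    assert(b != NULL);
    b->prev = toy_current_block;
    toy_current_block = b;
    toy_current_index = 0;
    toy_block_count++;
  }
  return &toy_current_block->objs[toy_current_index++];
}
```

Because one @code{malloc} call serves several objects, the per-object allocation overhead is amortized across the whole block.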
6794 6618
6795 @node lrecords 6619 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp
6796 @section lrecords 6620 @section lrecords
6797 @cindex lrecords 6621 @cindex lrecords
6798 6622
6799 [see @file{lrecord.h}] 6623 [see @file{lrecord.h}]
6800 6624
7030 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should 6854 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should
7031 simply return the object's size in bytes, exactly as you might expect. 6855 simply return the object's size in bytes, exactly as you might expect.
7032 For an example, see the methods for window configurations and opaques. 6856 For an example, see the methods for window configurations and opaques.
7033 @end enumerate 6857 @end enumerate
7034 6858
7035 @node Low-level allocation 6859 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp
7036 @section Low-level allocation 6860 @section Low-level allocation
7037 @cindex low-level allocation 6861 @cindex low-level allocation
7038 @cindex allocation, low-level 6862 @cindex allocation, low-level
7039 6863
7040 Memory that you want to allocate directly should be allocated using 6864 Memory that you want to allocate directly should be allocated using
7103 and bit-vector creation routines. These routines also call 6927 and bit-vector creation routines. These routines also call
7104 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps 6928 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
7105 statistics on how much memory is allocated, so that garbage-collection 6929 statistics on how much memory is allocated, so that garbage-collection
7106 can be invoked when the threshold is reached. 6930 can be invoked when the threshold is reached.
7107 6931
7108 @node Cons 6932 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp
7109 @section Cons 6933 @section Cons
7110 @cindex cons 6934 @cindex cons
7111 6935
7112 Conses are allocated in standard frob blocks. The only thing to 6936 Conses are allocated in standard frob blocks. The only thing to
7113 note is that conses can be explicitly freed using @code{free_cons()} 6937 note is that conses can be explicitly freed using @code{free_cons()}
7118 generating extra objects and thereby triggering GC sooner. 6942 generating extra objects and thereby triggering GC sooner.
7119 However, you have to be @emph{extremely} careful when doing this. 6943 However, you have to be @emph{extremely} careful when doing this.
7120 If you mess this up, you will get BADLY BURNED, and it has happened 6944 If you mess this up, you will get BADLY BURNED, and it has happened
7121 before. 6945 before.
7122 6946
7123 @node Vector 6947 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp
7124 @section Vector 6948 @section Vector
7125 @cindex vector 6949 @cindex vector
7126 6950
7127 As mentioned above, each vector is @code{malloc()}ed individually, and 6951 As mentioned above, each vector is @code{malloc()}ed individually, and
7128 all are threaded through the variable @code{all_vectors}. Vectors are 6952 all are threaded through the variable @code{all_vectors}. Vectors are
7130 Note that the @code{struct Lisp_Vector} is declared with its 6954 Note that the @code{struct Lisp_Vector} is declared with its
7131 @code{contents} field being a @emph{stretchy} array of one element. It 6955 @code{contents} field being a @emph{stretchy} array of one element. It
7132 is actually @code{malloc()}ed with the right size, however, and access 6956 is actually @code{malloc()}ed with the right size, however, and access
7133 to any element through the @code{contents} array works fine. 6957 to any element through the @code{contents} array works fine.
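The allocation pattern behind such a stretchy array can be sketched as follows. Note the substitution: this example uses a C99 flexible array member, the standardized form of the same trick, rather than the one-element array described above. The @code{toy_vector} type and @code{toy_make_vector} are hypothetical names, and @code{int} elements stand in for Lisp objects.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical vector: a length followed by SIZE elements in one allocation. */
typedef struct toy_vector {
  size_t size;
  int contents[];   /* flexible array member; storage follows the header */
} toy_vector;

/* Allocate header and elements together, then fill every element with INIT.
   Returns NULL if the allocation fails. */
toy_vector *toy_make_vector(size_t size, int init)
{
  toy_vector *v = malloc(sizeof *v + size * sizeof v->contents[0]);
  if (v == NULL)
    return NULL;
  v->size = size;
  for (size_t i = 0; i < size; i++)
    v->contents[i] = init;
  return v;
}
```

Since header and elements live in a single allocation, one call to @code{free} releases the whole vector.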
7134 6958
7135 @node Bit Vector 6959 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp
7136 @section Bit Vector 6960 @section Bit Vector
7137 @cindex bit vector 6961 @cindex bit vector
7138 @cindex vector, bit 6962 @cindex vector, bit
7139 6963
7140 Bit vectors work exactly like vectors, except for more complicated 6964 Bit vectors work exactly like vectors, except for more complicated
7142 vectors are lrecords while vectors are not. (The only difference here is 6966 vectors are lrecords while vectors are not. (The only difference here is
7143 that there's an lrecord implementation pointer at the beginning and the 6967 that there's an lrecord implementation pointer at the beginning and the
7144 tag field in bit vector Lisp words is ``lrecord'' rather than 6968 tag field in bit vector Lisp words is ``lrecord'' rather than
7145 ``vector''.) 6969 ``vector''.)
7146 6970
7147 @node Symbol 6971 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp
7148 @section Symbol 6972 @section Symbol
7149 @cindex symbol 6973 @cindex symbol
7150 6974
7151 Symbols are also allocated in frob blocks. Symbols in the awful 6975 Symbols are also allocated in frob blocks. Symbols in the awful
7152 horrible obarray structure are chained through their @code{next} field. 6976 horrible obarray structure are chained through their @code{next} field.
7153 6977
7154 Remember that @code{intern} looks up a symbol in an obarray, creating 6978 Remember that @code{intern} looks up a symbol in an obarray, creating
7155 one if necessary. 6979 one if necessary.
7156 6980
7157 @node Marker 6981 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp
7158 @section Marker 6982 @section Marker
7159 @cindex marker 6983 @cindex marker
7160 6984
7161 Markers are allocated in frob blocks, as usual. They are kept 6985 Markers are allocated in frob blocks, as usual. They are kept
7162 in a buffer unordered, but in a doubly-linked list so that they 6986 in a buffer unordered, but in a doubly-linked list so that they
7164 but in some cases garbage collection took an extraordinarily 6988 but in some cases garbage collection took an extraordinarily
7165 long time due to the O(N^2) time required to remove lots of 6989 long time due to the O(N^2) time required to remove lots of
7166 markers from a buffer.) Markers are removed from a buffer in 6990 markers from a buffer.) Markers are removed from a buffer in
7167 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. 6991 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
7168 6992
7169 @node String 6993 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp
7170 @section String 6994 @section String
7171 @cindex string 6995 @cindex string
7172 6996
7173 As mentioned above, strings are a special case. A string is logically 6997 As mentioned above, strings are a special case. A string is logically
7174 two parts, a fixed-size object (containing the length, property list, 6998 two parts, a fixed-size object (containing the length, property list,
7226 string data (which would normally be obtained from the now-non-existent 7050 string data (which would normally be obtained from the now-non-existent
7227 @code{struct Lisp_String}) at the beginning of the dead string data gap. 7051 @code{struct Lisp_String}) at the beginning of the dead string data gap.
7228 The string compactor recognizes this special 0xFFFFFFFF marker and 7052 The string compactor recognizes this special 0xFFFFFFFF marker and
7229 handles it correctly. 7053 handles it correctly.
7230 7054
7231 @node Compiled Function 7055 @node Compiled Function, , String, Allocation of Objects in XEmacs Lisp
7232 @section Compiled Function 7056 @section Compiled Function
7233 @cindex compiled function 7057 @cindex compiled function
7234 @cindex function, compiled 7058 @cindex function, compiled
7235 7059
7236 Not yet documented. 7060 Not yet documented.
7238 7062
7239 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top 7063 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
7240 @chapter Dumping 7064 @chapter Dumping
7241 @cindex dumping 7065 @cindex dumping
7242 7066
7243 @section What is dumping and its justification 7067 @menu
7244 @cindex dumping and its justification, what is 7068 * Dumping Justification::
7069 * Overview::
7070 * Data descriptions::
7071 * Dumping phase::
7072 * Reloading phase::
7073 * Remaining issues::
7074 @end menu
7075
7076 @node Dumping Justification, Overview, Dumping, Dumping
7077 @section Dumping Justification
7078 @cindex dumping, justification
7245 7079
7246 The C code of XEmacs is just a Lisp engine with a lot of built-in 7080 The C code of XEmacs is just a Lisp engine with a lot of built-in
7247 primitives useful for writing an editor. The editor itself is written 7081 primitives useful for writing an editor. The editor itself is written
7248 mostly in Lisp, and represents around 100K lines of code. Loading and 7082 mostly in Lisp, and represents around 100K lines of code. Loading and
7249 executing the initialization of all this code takes a bit of time (five 7083 executing the initialization of all this code takes a bit of time (five
7261 This solution, while working, has a huge problem: the creation of the 7095 This solution, while working, has a huge problem: the creation of the
7262 new executable from the actual contents of memory is an extremely 7096 new executable from the actual contents of memory is an extremely
7263 system-specific process, quite error-prone, and which interferes with a 7097 system-specific process, quite error-prone, and which interferes with a
7264 lot of system libraries (like malloc). It is even getting worse 7098 lot of system libraries (like malloc). It is even getting worse
7265 nowadays with libraries using constructors which are automatically 7099 nowadays with libraries using constructors which are automatically
7266 called when the program is started (even before main()) which tend to 7100 called when the program is started (even before @code{main()}) which tend to
7267 crash when they are called multiple times, once before dumping and once 7101 crash when they are called multiple times, once before dumping and once
7268 after (IRIX 6.x libz.so pulls in some C++ image libraries thru 7102 after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries thru
7269 dependencies which have this problem). Writing the dumper is also one 7103 dependencies which have this problem). Writing the dumper is also one
7270 of the most difficult parts of porting XEmacs to a new operating system. 7104 of the most difficult parts of porting XEmacs to a new operating system.
7271 Basically, `dumping' is an operation that is just not officially 7105 Basically, `dumping' is an operation that is just not officially
7272 supported on many operating systems. 7106 supported on many operating systems.
7273 7107
7274 The aim of the portable dumper is to solve the same problem as the 7108 The aim of the portable dumper is to solve the same problem as the
7275 system-specific dumper, that is to be able to reload quickly, using only 7109 system-specific dumper, that is to be able to reload quickly, using only
7276 a small number of files, the fully initialized lisp part of the editor, 7110 a small number of files, the fully initialized lisp part of the editor,
7277 without any system-specific hacks. 7111 without any system-specific hacks.
7278 7112
7279 @menu 7113 @node Overview, Data descriptions, Dumping Justification, Dumping
7280 * Overview::
7281 * Data descriptions::
7282 * Dumping phase::
7283 * Reloading phase::
7284 * Remaining issues::
7285 @end menu
7286
7287 @node Overview
7288 @section Overview 7114 @section Overview
7289 @cindex dumping overview 7115 @cindex dumping overview
7290 7116
7291 The portable dumping system has to: 7117 The portable dumping system has to:
7292 7118
7293 @enumerate 7119 @enumerate
7294 @item 7120 @item
7295 At dump time, write all initialized, non-quickly-rebuildable data to a 7121 At dump time, write all initialized, non-quickly-rebuildable data to a
7296 file [Note: currently named @file{xemacs.dmp}, but the name will 7122 file [Note: currently named @file{xemacs.dmp}, but the name will
7297 change], along with all informations needed for the reloading. 7123 change], along with all information needed for the reloading.
7298 7124
7299 @item 7125 @item
7300 When starting xemacs, reload the dump file, relocate it to its new 7126 When starting xemacs, reload the dump file, relocate it to its new
7301 starting address if needed, and reinitialize all pointers to this 7127 starting address if needed, and reinitialize all pointers to this
7302 data. Also, rebuild all the quickly rebuildable data. 7128 data. Also, rebuild all the quickly rebuildable data.
7303 @end enumerate 7129 @end enumerate
7304 7130
7305 Note: As of 21.5.18, the dump file has been moved inside of the 7131 Note: As of 21.5.18, the dump file has been moved inside of the
7306 executable, although there are still problems with this on some systems. 7132 executable, although there are still problems with this on some systems.
7307 7133
7308 @node Data descriptions 7134 @node Data descriptions, Dumping phase, Overview, Dumping
7309 @section Data descriptions 7135 @section Data descriptions
7310 @cindex dumping data descriptions 7136 @cindex dumping data descriptions
7311 7137
7312 The more complex task of the dumper is to be able to write lisp objects 7138 The more complex task of the dumper is to be able to write memory blocks
7313 (lrecords) and C structs to disk and reload them at a different address, 7139 on the heap (lisp objects, i.e. lrecords, and C-allocated memory, such
7140 as structs and arrays) to disk and reload them at a different address,
7314 updating all the pointers they include in the process. This is done by 7141 updating all the pointers they include in the process. This is done by
7315 using external data descriptions that give information about the layout 7142 using external data descriptions that give information about the layout
7316 of the structures in memory. 7143 of the blocks in memory.
7317 7144
7318 The specification of these descriptions is in lrecord.h. A description 7145 The specification of these descriptions is in lrecord.h. A description
7319 of an lrecord is an array of struct lrecord_description. Each of these 7146 of an lrecord is an array of struct memory_description. Each of these
7320 structs includes a type, an offset in the structure and some optional 7147 structs includes a type, an offset in the block and some optional
7321 parameters depending on the type. For instance, here is the string 7148 parameters depending on the type. For instance, here is the string
7322 description: 7149 description:
7323 7150
7324 @example 7151 @example
7325 static const struct lrecord_description string_description[] = @{ 7152 static const struct memory_description string_description[] = @{
7326 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @}, 7153 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @},
7327 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @}, 7154 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
7328 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @}, 7155 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @},
7329 @{ XD_END @} 7156 @{ XD_END @}
7330 @}; 7157 @};
7337 in the 0th line of the description (welcome to C) plus one". The third 7164 in the 0th line of the description (welcome to C) plus one". The third
7338 line means "there is a Lisp_Object member @code{plist} in the Lisp_String 7165 line means "there is a Lisp_Object member @code{plist} in the Lisp_String
7339 structure". @code{XD_END} then ends the description. 7166 structure". @code{XD_END} then ends the description.
7340 7167
7341 This gives us all the information we need to move around what is pointed 7168 This gives us all the information we need to move around what is pointed
7342 to by a structure (C or lrecord) and, by transitivity, everything that 7169 to by a memory block (C or lrecord) and, by transitivity, everything
7343 it points to. The only missing information for dumping is the size of 7170 that it points to. The only missing information for dumping is the size
7344 the structure. For lrecords, this is part of the 7171 of the block. For lrecords, this is part of the
7345 lrecord_implementation, so we don't need to duplicate it. For C 7172 lrecord_implementation, so we don't need to duplicate it. For C blocks
7346 structures we use a struct struct_description, which includes a size 7173 we use a struct sized_memory_description, which includes a size field
7347 field and a pointer to an associated array of lrecord_description. 7174 and a pointer to an associated array of memory_description.
7348 7175
7349 @node Dumping phase 7176 @node Dumping phase, Reloading phase, Data descriptions, Dumping
7350 @section Dumping phase 7177 @section Dumping phase
7351 @cindex dumping phase 7178 @cindex dumping phase
7352 7179
7353 Dumping is done by calling the function pdump() (in dumper.c) which is 7180 Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is
7354 invoked from Fdump_emacs (in emacs.c). This function performs a number 7181 invoked from Fdump_emacs (in @file{emacs.c}). This function performs a number
7355 of tasks. 7182 of tasks.
7356 7183
7357 @menu 7184 @menu
7358 * Object inventory:: 7185 * Object inventory::
7359 * Address allocation:: 7186 * Address allocation::
7360 * The header:: 7187 * The header::
7361 * Data dumping:: 7188 * Data dumping::
7362 * Pointers dumping:: 7189 * Pointers dumping::
7363 @end menu 7190 @end menu
7364 7191
7365 @node Object inventory 7192 @node Object inventory, Address allocation, Dumping phase, Dumping phase
7366 @subsection Object inventory 7193 @subsection Object inventory
7367 @cindex dumping object inventory 7194 @cindex dumping object inventory
7195 @cindex memory blocks
7368 7196
7369 The first task is to build the list of the objects to dump. This 7197 The first task is to build the list of the objects to dump. This
7370 includes: 7198 includes:
7371 7199
7372 @itemize @bullet 7200 @itemize @bullet
7373 @item lisp objects 7201 @item lisp objects
7374 @item C structures 7202 @item other memory blocks (C structures, arrays, etc.)
7375 @end itemize 7203 @end itemize
7376 7204
7377 We end up with one @code{pdump_entry_list_elmt} per object group (arrays 7205 We end up with one @code{pdump_block_list_elt} per object group (arrays
7378 of C structs are kept together) which includes a pointer to the first 7206 of C structs are kept together) which includes a pointer to the first
7379 object of the group, the per-object size and the count of objects in the 7207 object of the group, the per-object size and the count of objects in the
7380 group, along with some other information which is initialized later. 7208 group, along with some other information which is initialized later.
7381 7209
7382 These entries are linked together in @code{pdump_entry_list} structures 7210 These entries are linked together in @code{pdump_block_list} structures
7383 and can be enumerated thru either: 7211 and can be enumerated thru either:
7384 7212
7385 @enumerate 7213 @enumerate
7386 @item 7214 @item
7387 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one 7215 the @code{pdump_object_table}, an array of @code{pdump_block_list}, one
7388 per lrecord type, indexed by type number. 7216 per lrecord type, indexed by type number.
7389 7217
7390 @item 7218 @item
7391 the @code{pdump_opaque_data_list}, used for the opaque data which does 7219 the @code{pdump_opaque_data_list}, used for the opaque data which does
7392 not include pointers, and hence does not need descriptions. 7220 not include pointers, and hence does not need descriptions.
7393 7221
7394 @item 7222 @item
7395 the @code{pdump_struct_table}, which is a vector of 7223 the @code{pdump_desc_table}, which is a vector of
7396 @code{struct_description}/@code{pdump_entry_list} pairs, used for 7224 @code{memory_description}/@code{pdump_block_list} pairs, used for
7397 non-opaque C structures. 7225 non-opaque C memory blocks.
7398 @end enumerate 7226 @end enumerate
7399 7227
7400 This uses a marking strategy similar to the garbage collector. Some 7228 This uses a marking strategy similar to the garbage collector. Some
7401 differences though: 7229 differences though:
7402 7230
7403 @enumerate 7231 @enumerate
7404 @item 7232 @item
7405 We do not use the mark bit (which does not exist for C structures 7233 We do not use the mark bit (which does not exist for generic memory blocks
7406 anyway); we use a big hash table instead. 7234 anyway); we use a big hash table instead.
7407 7235
7408 @item 7236 @item
7409 We do not use the mark function of lrecords but instead rely on the 7237 We do not use the mark function of lrecords but instead rely on the
7410 external descriptions. This happens essentially because we need to 7238 external descriptions. This happens essentially because we need to
7411 follow pointers to C structures and opaque data in addition to 7239 follow pointers to generic memory blocks and opaque data in addition to
7412 Lisp_Object members. 7240 Lisp_Object members.
7413 @end enumerate 7241 @end enumerate
7414 7242
7415 This is done by @code{pdump_register_object()}, which handles Lisp_Object 7243 This is done by @code{pdump_register_object()}, which handles
7416 variables, and @code{pdump_register_struct()} which handles C structures, 7244 Lisp_Object variables, and @code{pdump_register_block()} which handles
7417 which both delegate the description management to @code{pdump_register_sub()}. 7245 generic memory blocks (C structures, arrays, etc.), which both delegate
7418 7246 the description management to @code{pdump_register_sub()}.
7419 The hash table doubles as a map object to pdump_entry_list_elmt (i.e. 7247
7420 allows us to look up a pdump_entry_list_elmt with the object it points 7248 The hash table doubles as a map object to pdump_block_list_elmt (i.e.
7421 to). Entries are added with @code{pdump_add_entry()} and looked up with 7249 allows us to look up a pdump_block_list_elmt with the object it points
7422 @code{pdump_get_entry()}. There is no need for entry removal. The hash 7250 to). Entries are added with @code{pdump_add_block()} and looked up with
7251 @code{pdump_get_block()}. There is no need for entry removal. The hash
7423 value is computed quite simply from the object pointer by 7252 value is computed quite simply from the object pointer by
7424 @code{pdump_make_hash()}. 7253 @code{pdump_make_hash()}.
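
The lookup scheme described above can be sketched roughly as follows. This is only an illustration of a pointer-keyed, insert-only hash table, not the code in @file{dumper.c}: the names @code{sketch_entry}, @code{sketch_hash}, @code{sketch_get} and @code{sketch_add}, the table size and the shift amount are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the dumper.c code. */
#define SKETCH_TABLE_SIZE 1024  /* assumed bucket count */

typedef struct sketch_entry
{
  const void *obj;              /* key: address of the first object */
  size_t size;                  /* per-object size */
  int count;                    /* number of objects in the group */
  struct sketch_entry *next;    /* chain of entries sharing a bucket */
} sketch_entry;

static sketch_entry *sketch_table[SKETCH_TABLE_SIZE];

/* Hash computed from the pointer alone.  The low bits are dropped
   because aligned addresses rarely differ there. */
static size_t
sketch_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) % SKETCH_TABLE_SIZE);
}

/* Return the entry registered for OBJ, or NULL if there is none. */
static sketch_entry *
sketch_get (const void *obj)
{
  sketch_entry *e;
  for (e = sketch_table[sketch_hash (obj)]; e; e = e->next)
    if (e->obj == obj)
      return e;
  return NULL;
}

/* Register a caller-allocated ENTRY.  Entries are never removed. */
static void
sketch_add (sketch_entry *entry)
{
  size_t h = sketch_hash (entry->obj);
  assert (sketch_get (entry->obj) == NULL);  /* each object only once */
  entry->next = sketch_table[h];
  sketch_table[h] = entry;
}
```

A table like this serves both purposes mentioned above: a successful @code{sketch_get()} means the object has already been visited (the role of the mark bit), and the entry it returns carries the per-group information.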
7425 7254
7426 The roots for the marking are: 7255 The roots for the marking are:
7427 7256
7428 @enumerate 7257 @enumerate
7429 @item 7258 @item
7430 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()} 7259 the @code{staticpro}'ed variables (there is a special
7431 call for protected variables we do not want to dump). 7260 @code{staticpro_nodump()} call for protected variables we do not want to
7432 7261 dump).
7433 @item 7262
7434 the variables registered via @code{dump_add_root_object} 7263 @item
7264 the Lisp_Object variables registered via @code{dump_add_root_lisp_object}
7435 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} + 7265 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
7436 @code{dump_add_root_object()}). 7266 @code{dump_add_root_lisp_object()}).
7437 7267
7438 @item 7268 @item
7439 the variables registered via @code{dump_add_root_struct_ptr}, each of 7269 the data-segment memory blocks registered via @code{dump_add_root_block}
7440 which points to a C structure. 7270 (for blocks with relocatable pointers), or @code{dump_add_opaque} (for
7271 "opaque" blocks with no relocatable pointers; this is just a shortcut
7272 for calling @code{dump_add_root_block} with a NULL description).
7273
7274 @item
7275 the pointer variables registered via @code{dump_add_root_block_ptr},
7276 each of which points to a block of heap memory (generally a C structure
7277 or array). Note that @code{dump_add_root_block_ptr} is not technically
7278 necessary, as a pointer variable can be seen as a special case of a
7279 data-segment memory block and registered using
7280 @code{dump_add_root_block}. Doing it this way, however, would require
7281 another level of static structures declared. Since pointer variables
7282 are quite common, @code{dump_add_root_block_ptr} is provided for
7283 convenience. Note also that internally we have to treat it separately
7284 from @code{dump_add_root_block} rather than writing the former as a call
7285 to the latter, since we don't have support for creating and using memory
7286 descriptions on the fly -- they must all be statically declared in the
7287 data-segment.
7441 @end enumerate 7288 @end enumerate
7442 7289
7443 This does not include the GCPRO'ed variables, the specbinds, the 7290 This does not include the GCPRO'ed variables, the specbinds, the
7444 catchtags, the backlist, the redisplay or the profiling info, since we 7291 catchtags, the backlist, the redisplay or the profiling info, since we
7445 do not want to rebuild the actual chain of lisp calls which end up to 7292 do not want to rebuild the actual chain of lisp calls which end up to
7447 7294
7448 Weak lists and weak hash tables are dumped as if they were their 7295 Weak lists and weak hash tables are dumped as if they were their
7449 non-weak equivalent (without changing their type, of course). This has 7296 non-weak equivalent (without changing their type, of course). This has
7450 not yet been a problem. 7297 not yet been a problem.
7451 7298
7452 @node Address allocation 7299 @node Address allocation, The header, Object inventory, Dumping phase
7453 @subsection Address allocation 7300 @subsection Address allocation
7454 @cindex dumping address allocation 7301 @cindex dumping address allocation
7455 7302
7456 7303
7457 The next step is to allocate the offsets of each of the objects in the 7304 The next step is to allocate the offsets of each of the objects in the
7476 @end enumerate 7323 @end enumerate
7477 7324
7478 Hence, for each lrecord type, C struct type or opaque data block the 7325 Hence, for each lrecord type, C struct type or opaque data block the
7479 alignment requirement is computed as a power of two, with a minimum of 7326 alignment requirement is computed as a power of two, with a minimum of
7480 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the 7327 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the
7481 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements 7328 @code{pdump_block_list_elmt}'s, the ones with the highest requirements
7482 first. This ensures the best packing. 7329 first. This ensures the best packing.
7483 7330
7484 The maximum alignment requirement we take into account is 2^8. 7331 The maximum alignment requirement we take into account is 2^8.
7485 7332
7486 @code{pdump_allocate_offset()} only has to do a linear allocation, 7333 @code{pdump_allocate_offset()} only has to do a linear allocation,
7487 starting at offset 256 (this leaves room for the header and keeps the 7334 starting at offset 256 (this leaves room for the header and keeps the
7488 alignments happy). 7335 alignments happy).
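
A linear allocation of this kind can be sketched as below. This is a minimal illustration assuming power-of-two alignments and a base offset of 256, as described above; @code{sketch_align_up}, @code{sketch_allocate_offset} and @code{sketch_cur_offset} are hypothetical names, and the real @code{pdump_allocate_offset()} may well differ in its details.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BASE_OFFSET 256  /* first offset after the header */

/* Next free offset in the (imaginary) dump file. */
static size_t sketch_cur_offset = SKETCH_BASE_OFFSET;

/* Round OFFSET up to the next multiple of ALIGN (a power of two). */
static size_t
sketch_align_up (size_t offset, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}

/* Reserve room for COUNT objects of SIZE bytes each, aligned on
   ALIGN, and return the offset of the first one. */
static size_t
sketch_allocate_offset (size_t size, size_t count, size_t align)
{
  size_t start = sketch_align_up (sketch_cur_offset, align);
  sketch_cur_offset = start + size * count;
  return start;
}
```

If the groups are visited from the largest alignment requirement down to the smallest, each call usually finds the current offset already aligned, so little or no padding is inserted between groups.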
7489 7336
7490 @node The header 7337 @node The header, Data dumping, Address allocation, Dumping phase
7491 @subsection The header 7338 @subsection The header
7492 @cindex dumping, the header 7339 @cindex dumping, the header
7493 7340
7494 The next step creates the file and writes a header with a signature and 7341 The next step creates the file and writes a header with a signature and
7495 some random information in it. The @code{reloc_address} field, which 7342 some random information in it. The @code{reloc_address} field, which
7496 indicates at which address the file should be loaded if we want to avoid 7343 indicates at which address the file should be loaded if we want to avoid
7497 post-reload relocation, is set to 0. It then seeks to offset 256 (base 7344 post-reload relocation, is set to 0. It then seeks to offset 256 (base
7498 offset for the objects). 7345 offset for the objects).
7499 7346
7500 @node Data dumping 7347 @node Data dumping, Pointers dumping, The header, Dumping phase
7501 @subsection Data dumping 7348 @subsection Data dumping
7502 @cindex data dumping 7349 @cindex data dumping
7503 @cindex dumping, data 7350 @cindex dumping, data
7504 7351
7505 The data is dumped in the same order as the addresses were allocated by 7352 The data is dumped in the same order as the addresses were allocated by
7509 Allocation, and writes it to the file. Using the same order means that, 7356 Allocation, and writes it to the file. Using the same order means that,
7510 if we are careful with lrecords whose size is not a multiple of 4, we 7357 if we are careful with lrecords whose size is not a multiple of 4, we
7511 are ensured that the object is always written at the offset in the file 7358 are ensured that the object is always written at the offset in the file
7512 allocated in step Address Allocation. 7359 allocated in step Address Allocation.
7513 7360
7514 @node Pointers dumping 7361 @node Pointers dumping, , Data dumping, Dumping phase
7515 @subsection Pointers dumping 7362 @subsection Pointers dumping
7516 @cindex pointers dumping 7363 @cindex pointers dumping
7517 @cindex dumping, pointers 7364 @cindex dumping, pointers
7518 7365
7519 A bunch of tables needed to reassign properly the global pointers are 7366 A bunch of tables needed to reassign properly the global pointers are
7520 then written. They are: 7367 then written. They are:
7521 7368
7522 @enumerate 7369 @enumerate
7523 @item 7370 @item
7524 the pdump_root_struct_ptrs dynarr 7371 the pdump_root_block_ptrs dynarr
7525 @item 7372 @item
7526 the pdump_opaques dynarr 7373 the pdump_opaques dynarr
7527 @item 7374 @item
7528 a vector of all the offsets to the objects in the file that include a 7375 a vector of all the offsets to the objects in the file that include a
7529 description (for faster relocation at reload time) 7376 description (for faster relocation at reload time)
7544 reason why they are not used as roots for the purpose of object 7391 reason why they are not used as roots for the purpose of object
7545 enumeration. 7392 enumeration.
7546 7393
7547 Some very important information like the @code{staticpros} and 7394 Some very important information like the @code{staticpros} and
7548 @code{lrecord_implementations_table} are handled indirectly using 7395 @code{lrecord_implementations_table} are handled indirectly using
7549 @code{dump_add_opaque} or @code{dump_add_root_struct_ptr}. 7396 @code{dump_add_opaque} or @code{dump_add_root_block_ptr}.
7550 7397
7551 This is the end of the dumping part. 7398 This is the end of the dumping part.
7552 7399
7553 @node Reloading phase 7400 @node Reloading phase, Remaining issues, Dumping phase, Dumping
7554 @section Reloading phase 7401 @section Reloading phase
7555 @cindex reloading phase 7402 @cindex reloading phase
7556 @cindex dumping, reloading phase 7403 @cindex dumping, reloading phase
7557 7404
7558 @subsection File loading 7405 @subsection File loading
7572 @cindex dumping, putting back the pdump_opaques 7419 @cindex dumping, putting back the pdump_opaques
7573 7420
7574 The memory contents are restored in the obvious and trivial way. 7421 The memory contents are restored in the obvious and trivial way.
7575 7422
7576 7423
7577 @subsection Putting back the pdump_root_struct_ptrs 7424 @subsection Putting back the pdump_root_block_ptrs
7578 @cindex dumping, putting back the pdump_root_struct_ptrs 7425 @cindex dumping, putting back the pdump_root_block_ptrs
7579 7426
7580 The variables pointed to by pdump_root_struct_ptrs in the dump phase are 7427 The variables pointed to by pdump_root_block_ptrs in the dump phase are
7581 reset to the right relocated object addresses. 7428 reset to the right relocated object addresses.
7582 7429
7583 7430
7584 @subsection Object relocation 7431 @subsection Object relocation
7585 @cindex dumping, object relocation 7432 @cindex dumping, object relocation
7590 7437
7591 7438
7592 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains 7439 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains
7593 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains 7440 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
7594 7441
7595 Same as Putting back the pdump_root_struct_ptrs. 7442 Same as Putting back the pdump_root_block_ptrs.
7596 7443
7597 7444
7598 @subsection Reorganize the hash tables 7445 @subsection Reorganize the hash tables
7599 @cindex dumping, reorganize the hash tables 7446 @cindex dumping, reorganize the hash tables
7600 7447
7601 Since some of the hash values in the lisp hash tables are 7448 Since some of the hash values in the lisp hash tables are
7602 address-dependent, their layout is now wrong. So we go through each of 7449 address-dependent, their layout is now wrong. So we go through each of
7603 them and have them resorted by calling @code{pdump_reorganize_hash_table}. 7450 them and have them resorted by calling @code{pdump_reorganize_hash_table}.
7604 7451
7605 @node Remaining issues 7452 @node Remaining issues, , Reloading phase, Dumping
7606 @section Remaining issues 7453 @section Remaining issues
7607 @cindex dumping, remaining issues 7454 @cindex dumping, remaining issues
7608 7455
7609 The build process will have to start a post-dump xemacs, ask it the 7456 The build process will have to start a post-dump xemacs, ask it the
7610 loading address (which will, hopefully, be always the same between 7457 loading address (which will, hopefully, be always the same between
7622 on the same system (mule and no-mule comes to mind). 7469 on the same system (mule and no-mule comes to mind).
7623 7470
7624 The DOC file contents should probably end up in the dump file. 7471 The DOC file contents should probably end up in the dump file.
7625 7472
7626 7473
7627 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top 7474 @node Events and the Event Loop, Asynchronous Events; Quit Checking, Dumping, Top
7628 @chapter Events and the Event Loop 7475 @chapter Events and the Event Loop
7629 @cindex events and the event loop 7476 @cindex events and the event loop
7630 @cindex event loop, events and the 7477 @cindex event loop, events and the
7631 7478
7632 @menu 7479 @menu
7633 * Introduction to Events:: 7480 * Introduction to Events::
7634 * Main Loop:: 7481 * Main Loop::
7635 * Specifics of the Event Gathering Mechanism:: 7482 * Specifics of the Event Gathering Mechanism::
7636 * Specifics About the Emacs Event:: 7483 * Specifics About the Emacs Event::
7637 * The Event Stream Callback Routines:: 7484 * Event Queues::
7638 * Other Event Loop Functions:: 7485 * Event Stream Callback Routines::
7639 * Converting Events:: 7486 * Other Event Loop Functions::
7640 * Dispatching Events; The Command Builder:: 7487 * Stream Pairs::
7488 * Converting Events::
7489 * Dispatching Events; The Command Builder::
7490 * Focus Handling::
7491 * Editor-Level Control Flow Modules::
7641 @end menu 7492 @end menu
7642 7493
7643 @node Introduction to Events 7494 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop
7644 @section Introduction to Events 7495 @section Introduction to Events
7645 @cindex events, introduction to 7496 @cindex events, introduction to
7646 7497
7647 An event is an object that encapsulates information about an 7498 An event is an object that encapsulates information about an
7648 interesting occurrence in the operating system. Events are 7499 interesting occurrence in the operating system. Events are
7678 Emacs events---there may not be a one-to-one correspondence. 7529 Emacs events---there may not be a one-to-one correspondence.
7679 7530
7680 Emacs events are documented in @file{events.h}; I'll discuss them 7531 Emacs events are documented in @file{events.h}; I'll discuss them
7681 later. 7532 later.
7682 7533
7683 @node Main Loop 7534 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop
7684 @section Main Loop 7535 @section Main Loop
7685 @cindex main loop 7536 @cindex main loop
7686 @cindex events, main loop 7537 @cindex events, main loop
7687 7538
7688 The @dfn{command loop} is the top-level loop that the editor is always 7539 The @dfn{command loop} is the top-level loop that the editor is always
7747 wrapper similar to @code{command_loop_2()}. Note also that 7598 wrapper similar to @code{command_loop_2()}. Note also that
7748 @code{initial_command_loop()} sets up a catch for @code{top-level} when 7599 @code{initial_command_loop()} sets up a catch for @code{top-level} when
7749 invoking @code{top_level_1()}, just like when it invokes 7600 invoking @code{top_level_1()}, just like when it invokes
7750 @code{command_loop_2()}. 7601 @code{command_loop_2()}.
7751 7602
7752 @node Specifics of the Event Gathering Mechanism 7603 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop
7753 @section Specifics of the Event Gathering Mechanism 7604 @section Specifics of the Event Gathering Mechanism
7754 @cindex event gathering mechanism, specifics of the 7605 @cindex event gathering mechanism, specifics of the
7755 7606
7756 Here is an approximate diagram of the collection processes 7607 Here is an approximate diagram of the collection processes
7757 at work in XEmacs, under TTY's (TTY's are simpler than X 7608 at work in XEmacs, under TTY's (TTY's are simpler than X
7778 | | | | | | 7629 | | | | | |
7779 V V V V V V 7630 V V V V V V
7780 ------>-----------<----------------<---------------- 7631 ------>-----------<----------------<----------------
7781 | 7632 |
7782 | 7633 |
7783 | [collected using select() in emacs_tty_next_event() 7634 | [collected using @code{select()} in @code{emacs_tty_next_event()}
7784 | and converted to the appropriate Emacs event] 7635 | and converted to the appropriate Emacs event]
7785 | 7636 |
7786 | 7637 |
7787 V (above this line is TTY-specific) 7638 V (above this line is TTY-specific)
7788 Emacs ----------------------------------------------- 7639 Emacs -----------------------------------------------
7789 event (below this line is the generic event mechanism) 7640 event (below this line is the generic event mechanism)
7790 | 7641 |
7791 | 7642 |
7792 was there if not, call 7643 was there if not, call
7793 a SIGINT? emacs_tty_next_event() 7644 a SIGINT? @code{emacs_tty_next_event()}
7794 | | 7645 | |
7795 | | 7646 | |
7796 | | 7647 | |
7797 V V 7648 V V
7798 --->------<---- 7649 --->------<----
7799 | 7650 |
7800 | [collected in event_stream_next_event(); 7651 | [collected in @code{event_stream_next_event()};
7801 | SIGINT is converted using maybe_read_quit_event()] 7652 | SIGINT is converted using @code{maybe_read_quit_event()}]
7802 V 7653 V
7803 Emacs 7654 Emacs
7804 event 7655 event
7805 | 7656 |
7806 \---->------>----- maybe_kbd_translate() ---->---\ 7657 \---->------>----- maybe_kbd_translate() ---->---\
7808 | 7659 |
7809 | 7660 |
7810 command event queue | 7661 command event queue |
7811 if not from command 7662 if not from command
7812 (contains events that were event queue, call 7663 (contains events that were event queue, call
7813 read earlier but not processed, event_stream_next_event() 7664 read earlier but not processed, @code{event_stream_next_event()}
7814 typically when waiting in a | 7665 typically when waiting in a |
7815 sit-for, sleep-for, etc. for | 7666 sit-for, sleep-for, etc. for |
7816 a particular event to be received) | 7667 a particular event to be received) |
7817 | | 7668 | |
7818 | | 7669 | |
7819 V V 7670 V V
7820 ---->------------------------------------<---- 7671 ---->------------------------------------<----
7821 | 7672 |
7822 | [collected in 7673 | [collected in
7823 | next_event_internal()] 7674 | @code{next_event_internal()}]
7824 | 7675 |
7825 unread- unread- event from | 7676 unread- unread- event from |
7826 command- command- keyboard else, call 7677 command- command- keyboard else, call
7827 events event macro next_event_internal() 7678 events event macro @code{next_event_internal()}
7828 | | | | 7679 | | | |
7829 | | | | 7680 | | | |
7830 | | | | 7681 | | | |
7831 V V V V 7682 V V V V
7832 --------->----------------------<------------ 7683 --------->----------------------<------------
7833 | 7684 |
7834 | [collected in `next-event', which may loop 7685 | [collected in @code{next-event}, which may loop
7835 | more than once if the event it gets is on 7686 | more than once if the event it gets is on
7836 | a dead frame, device, etc.] 7687 | a dead frame, device, etc.]
7837 | 7688 |
7838 | 7689 |
7839 V 7690 V
7840 feed into top-level event loop, 7691 feed into top-level event loop,
7841 which repeatedly calls `next-event' 7692 which repeatedly calls @code{next-event}
7842 and then dispatches the event 7693 and then dispatches the event
7843 using `dispatch-event' 7694 using @code{dispatch-event}
7844 @end example 7695 @end example
7845 7696
7846 Notice the separation between TTY-specific and generic event mechanism. 7697 Notice the separation between TTY-specific and generic event mechanism.
7847 When using the Xt-based event loop, the TTY-specific stuff is replaced 7698 When using the Xt-based event loop, the TTY-specific stuff is replaced
7848 but the rest stays the same. 7699 but the rest stays the same.
7887 | | | | | | | | 7738 | | | | | | | |
7888 | | | | | | | | 7739 | | | | | | | |
7889 V V V V V V V V 7740 V V V V V V V V
7890 --->----------------------------------------<---------<------ 7741 --->----------------------------------------<---------<------
7891 | | | 7742 | | |
7892 | | |[collected using select() in 7743 | | |[collected using @code{select()} in
7893 | | | _XtWaitForSomething(), called 7744 | | | @code{_XtWaitForSomething()}, called
7894 | | | from XtAppProcessEvent(), called 7745 | | | from @code{XtAppProcessEvent()}, called
7895 | | | in emacs_Xt_next_event(); 7746 | | | in @code{emacs_Xt_next_event()};
7896 | | | dispatched to various callbacks] 7747 | | | dispatched to various callbacks]
7897 | | | 7748 | | |
7898 | | | 7749 | | |
7899 emacs_Xt_ p_s_callback(), | [popup_selection_callback] 7750 emacs_Xt_ p_s_callback(), | [popup_selection_callback]
7900 event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_ 7751 event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_
7914 | | | 7765 | | |
7915 V V | 7766 V V |
7916 -->----------<-- | 7767 -->----------<-- |
7917 | | 7768 | |
7918 | | 7769 | |
7919 dispatch Xt_what_callback() 7770 dispatch @code{Xt_what_callback()}
7920 event sets flags 7771 event sets flags
7921 queue | 7772 queue |
7922 | | 7773 | |
7923 | | 7774 | |
7924 | | 7775 | |
7925 | | 7776 | |
7926 ---->-----------<-------- 7777 ---->-----------<--------
7927 | 7778 |
7928 | 7779 |
7929 | [collected and converted as appropriate in 7780 | [collected and converted as appropriate in
7930 | emacs_Xt_next_event()] 7781 | @code{emacs_Xt_next_event()}]
7931 | 7782 |
7932 | 7783 |
7933 V (above this line is Xt-specific) 7784 V (above this line is Xt-specific)
7934 Emacs ------------------------------------------------ 7785 Emacs ------------------------------------------------
7935 event (below this line is the generic event mechanism) 7786 event (below this line is the generic event mechanism)
7936 | 7787 |
7937 | 7788 |
7938 was there if not, call 7789 was there if not, call
7939 a SIGINT? emacs_Xt_next_event() 7790 a SIGINT? @code{emacs_Xt_next_event()}
7940 | | 7791 | |
7941 | | 7792 | |
7942 | | 7793 | |
7943 V V 7794 V V
7944 --->-------<---- 7795 --->-------<----
7945 | 7796 |
7946 | [collected in event_stream_next_event(); 7797 | [collected in @code{event_stream_next_event()};
7947 | SIGINT is converted using maybe_read_quit_event()] 7798 | SIGINT is converted using @code{maybe_read_quit_event()}]
7948 V 7799 V
7949 Emacs 7800 Emacs
7950 event 7801 event
7951 | 7802 |
7952 \---->------>----- maybe_kbd_translate() -->-----\ 7803 \---->------>----- maybe_kbd_translate() -->-----\
7954 | 7805 |
7955 | 7806 |
7956 command event queue | 7807 command event queue |
7957 if not from command 7808 if not from command
7958 (contains events that were event queue, call 7809 (contains events that were event queue, call
7959 read earlier but not processed, event_stream_next_event() 7810 read earlier but not processed, @code{event_stream_next_event()}
7960 typically when waiting in a | 7811 typically when waiting in a |
7961 sit-for, sleep-for, etc. for | 7812 sit-for, sleep-for, etc. for |
7962 a particular event to be received) | 7813 a particular event to be received) |
7963 | | 7814 | |
7964 | | 7815 | |
7965 V V 7816 V V
7966 ---->----------------------------------<------ 7817 ---->----------------------------------<------
7967 | 7818 |
7968 | [collected in 7819 | [collected in
7969 | next_event_internal()] 7820 | @code{next_event_internal()}]
7970 | 7821 |
7971 unread- unread- event from | 7822 unread- unread- event from |
7972 command- command- keyboard else, call 7823 command- command- keyboard else, call
7973 events event macro next_event_internal() 7824 events event macro @code{next_event_internal()}
7974 | | | | 7825 | | | |
7975 | | | | 7826 | | | |
7976 | | | | 7827 | | | |
7977 V V V V 7828 V V V V
7978 --------->----------------------<------------ 7829 --------->----------------------<------------
7979 | 7830 |
7980 | [collected in `next-event', which may loop 7831 | [collected in @code{next-event}, which may loop
7981 | more than once if the event it gets is on 7832 | more than once if the event it gets is on
7982 | a dead frame, device, etc.] 7833 | a dead frame, device, etc.]
7983 | 7834 |
7984 | 7835 |
7985 V 7836 V
7986 feed into top-level event loop, 7837 feed into top-level event loop,
7987 which repeatedly calls `next-event' 7838 which repeatedly calls @code{next-event}
7988 and then dispatches the event 7839 and then dispatches the event
7989 using `dispatch-event' 7840 using @code{dispatch-event}
7990 @end example 7841 @end example
7991 7842
7992 @node Specifics About the Emacs Event 7843 @node Specifics About the Emacs Event, Event Queues, Specifics of the Event Gathering Mechanism, Events and the Event Loop
7993 @section Specifics About the Emacs Event 7844 @section Specifics About the Emacs Event
7994 @cindex event, specifics about the Lisp object 7845 @cindex event, specifics about the Lisp object
7995 7846
7996 @node The Event Stream Callback Routines 7847 @node Event Queues, Event Stream Callback Routines, Specifics About the Emacs Event, Events and the Event Loop
7997 @section The Event Stream Callback Routines 7848 @section Event Queues
7998 @cindex event stream callback routines, the 7849 @cindex event queues
7999 @cindex callback routines, the event stream 7850 @cindex queues, event
8000 7851
8001 @node Other Event Loop Functions 7852 There are two event queues here -- the command event queue (#### which
7853 should be called "deferred event queue" and is in my glyph ws) and the
7854 dispatch event queue. (MS Windows actually has an extra dispatch queue
7855 for non-user events and uses the generic one only for user events. This
7856 is because user and non-user events in Windows come through the same
7857 place -- the window procedure -- but under X, it's possible to
7858 selectively process events such that we take all the user events before
7859 the non-user ones. #### In fact, given the way we now drain the queue,
7860 we might need two separate queues, like under Windows. Need to think
7861 carefully exactly how this works, and should certainly generalize the
7862 two different queues.)
7863
7864 The dispatch queue (which used to occur duplicated inside of each event
7865 implementation) is used for events that have been read from the
7866 window-system event queue(s) and not yet processed by
7867 @code{next_event_internal()}. It exists for two reasons: (1) because in many
7868 implementations, events often come from the window system by way of
7869 callbacks, and need to push the event to be returned onto a queue; (2)
7870 in order to handle QUIT in a guaranteed correct fashion without
7871 resorting to weird implementation-specific hacks that may or may not
7872 work well, we need to drain the window-system event queues and then look
7873 through to see if there's an event matching quit-char (usually ^G). The
7874 drained events need to go onto a queue. (There are other, similar cases
7875 where we need to drain the pending events so we can look ahead -- for
7876 example, checking for pending expose events under X to avoid excessive
7877 server activity.)
7878
7879 The command event queue is used @strong{AFTER} an event has been read from
7880 @code{next_event_internal()}, when it needs to be pushed back. This
7881 includes, for example, @code{accept-process-output}, @code{sleep-for}
7882 and @code{wait_delaying_user_input()}. Eval events and the like,
7883 generated by @code{enqueue-eval-event},
7884 @code{enqueue_magic_eval_event()}, etc. are also pushed onto this queue.
7885 Some events generated by callbacks are also pushed onto this queue, ####
7886 although maybe shouldn't be.
7887
7888 The command queue takes precedence over the dispatch queue.
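
The precedence rule can be sketched with two simple FIFO queues. This is a deliberately simplified model, assuming both queues are singly linked lists of events; @code{sketch_event}, @code{sketch_queue}, @code{sketch_enqueue}, @code{sketch_dequeue} and @code{sketch_next_event} are hypothetical names, and the real event code has to deal with much more (window-system input, timeouts, QUIT).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified event and queue types. */
typedef struct sketch_event
{
  int id;
  struct sketch_event *next;
} sketch_event;

typedef struct sketch_queue
{
  sketch_event *head, *tail;
} sketch_queue;

static sketch_queue sketch_command_queue;   /* pushed-back events */
static sketch_queue sketch_dispatch_queue;  /* drained, unprocessed events */

/* Append EV at the tail of Q. */
static void
sketch_enqueue (sketch_queue *q, sketch_event *ev)
{
  assert (ev != NULL);
  ev->next = NULL;
  if (q->tail)
    q->tail->next = ev;
  else
    q->head = ev;
  q->tail = ev;
}

/* Remove and return the event at the head of Q, or NULL if Q is empty. */
static sketch_event *
sketch_dequeue (sketch_queue *q)
{
  sketch_event *ev = q->head;
  if (ev)
    {
      q->head = ev->next;
      if (!q->head)
        q->tail = NULL;
      ev->next = NULL;
    }
  return ev;
}

/* The command queue is consulted first; the dispatch queue is used
   only when the command queue is empty. */
static sketch_event *
sketch_next_event (void)
{
  sketch_event *ev = sketch_dequeue (&sketch_command_queue);
  return ev ? ev : sketch_dequeue (&sketch_dispatch_queue);
}
```

With this ordering, an event that was read earlier and pushed back is seen again before anything newer that is still sitting in the dispatch queue.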
7889
7890 #### It is worth investigating to see whether both queues are really
7891 needed, and how exactly they should be used. @code{enqueue-eval-event},
7892 for example, could certainly push onto the dispatch queue, and all
7893 callbacks maybe should. @code{wait_delaying_user_input()} seems to need
7894 both queues, since it can take events from the dispatch queue and push
7895 them onto the command queue; but it perhaps could be rewritten to avoid
7896 this. #### In general we need to review the handling of these two
7897 queues, figure out exactly what ought to be happening, and document it.
7898
7899
7900 @node Event Stream Callback Routines, Other Event Loop Functions, Event Queues, Events and the Event Loop
7901 @section Event Stream Callback Routines
7902 @cindex event stream callback routines
7903 @cindex callback routines, event stream
7904
7905 There is one object called an event_stream. This object contains
7906 callback functions for doing the window-system-dependent operations
7907 that XEmacs requires.
7908
7909 If XEmacs is compiled with support for X11 and the X Toolkit, then this
7910 event_stream structure will contain functions that can cope with input
7911 on XEmacs windows on multiple displays, as well as input from dumb tty
7912 frames.
7913
7914 If it is desired to have XEmacs able to open frames on the displays of
7915 multiple heterogeneous machines, X11 and SunView, or X11 and NeXT, for
7916 example, then it will be necessary to construct an event_stream structure
7917 that can cope with the given types. Currently, the only implemented
7918 event_streams are for dumb-ttys, and for X11 plus dumb-ttys,
7919 and for mswindows.
7920
7921 To implement this for one window system is relatively simple.
7922 To implement this for multiple window systems is trickier and may
7923 not be possible in all situations, but it's been done for X and TTY.
7924
7925 Note that these callbacks are @strong{NOT} console methods; that's because
7926 the routines are not specific to a particular console type but must
7927 be able to simultaneously cope with all allowable console types.
7928
7929 The slots of the event_stream structure:
7930
7931 @table @code
7932 @item next_event_cb
7933 A function which fills in an XEmacs_event structure with the next event
7934 available. If there is no event available, then this should block.
7935
7936 IMPORTANT: timer events and especially process events @strong{must not} be
7937 returned if there are events of other types available; otherwise you can
7938 end up with an infinite loop in @code{Fdiscard_input()}.
7939
7940 @item event_pending_cb
7941 A function which says whether there are events to be read. If called
7942 with an argument of 0, then this should say whether calling the
7943 @code{next_event_cb} will block. If called with a non-zero argument,
7944 then this should say whether there are that many user-generated events
7945 pending (that is, keypresses, mouse-clicks, dialog-box selection events,
7946 etc.). (This is used for redisplay optimization, among other things.)
7947 The difference is that the former includes process events and timer
7948 events, but the latter doesn't.
7949
7950 If this function is not sure whether there are events to be read, it
7951 @strong{must} return 0. Otherwise various undesirable effects will
7952 occur, such as redisplay not occurring until the next event occurs.
7953
7954 @item handle_magic_event_cb
7955 XEmacs calls this with an event structure which contains window-system
7956 dependent information that XEmacs doesn't need to know about, but which
7957 must happen in order. If the @code{next_event_cb} never returns an
7958 event of type "magic", this will never be used.
7959
7960 @item format_magic_event_cb
7961 Called with a magic event; print a representation of the innards of the
7962 event to @var{PSTREAM}.
7963
7964 @item compare_magic_event_cb
7965 Called with two magic events; return non-zero if the innards of the two
7966 are equal, zero otherwise.
7967
7968 @item hash_magic_event_cb
7969 Called with a magic event; return a hash of the innards of the event.
7970
7971 @item add_timeout_cb
7972 Called with an @var{EMACS_TIME}, the absolute time at which a wakeup event
7973 should be generated; and a void *, which is an arbitrary value that will
7974 be returned in the timeout event. The timeouts generated by this
7975 function should be one-shots: they fire once and then disappear. This
7976 callback should return an int id-number which uniquely identifies this
7977 wakeup. If an implementation doesn't have microsecond or millisecond
7978 granularity, it should round up to the closest value it can deal with.
7979
7980 @item remove_timeout_cb
7981 Called with an int, the id number of a wakeup to discard. This id
7982 number must have been returned by the @code{add_timeout_cb}. If the given
7983 wakeup has already expired, this should do nothing.
7984
7985 @item select_process_cb
7986 @item unselect_process_cb
7987 These callbacks tell the underlying implementation to add or remove a
7988 file descriptor from the list of fds which are polled for
7989 inferior-process input. When input becomes available on the given
7990 process connection, an event of type "process" should be generated.
7991
7992 @item select_console_cb
7993 @item unselect_console_cb
7994 These callbacks tell the underlying implementation to add or remove a
7995 console from the list of consoles which are polled for user-input.
7996
7997 @item select_device_cb
7998 @item unselect_device_cb
7999 These callbacks are used by Unixoid event loops (those that use @code{select()}
8000 and file descriptors and have a separate input fd per device).
8001
8002 @item create_io_streams_cb
8003 @item delete_io_streams_cb
8004 These callbacks are called by process code to create the input and
8005 output lstreams which are used for subprocess I/O.
8006
8007 @item quitp_cb
8008 A handler function called from the @code{QUIT} macro which should check
8009 whether the quit character has been typed. On systems with SIGIO, this
8010 will not be called unless the @code{sigio_happened} flag is true (it is set
8011 from the SIGIO handler).
8012 @end table
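
The one-shot behavior described for @code{add_timeout_cb} and @code{remove_timeout_cb} can be sketched roughly as follows. This is a minimal illustration, not the XEmacs implementation: every @code{sketch_} name, the fixed-size table, and the use of a plain @code{long} in place of @code{EMACS_TIME} are assumptions made for the example.

```c
#include <stddef.h>

#define SKETCH_MAX_TIMEOUTS 16   /* hypothetical fixed capacity */

struct sketch_timeout {
  int id;          /* 0 marks a free slot */
  long when;       /* absolute wakeup time (stand-in for EMACS_TIME) */
  void *closure;   /* arbitrary value handed back when the timeout fires */
};

static struct sketch_timeout sketch_timeouts[SKETCH_MAX_TIMEOUTS];
static int sketch_next_id = 1;

/* Models add_timeout_cb: returns a unique nonzero id, or 0 if full. */
static int sketch_add_timeout(long when, void *closure)
{
  for (size_t i = 0; i < SKETCH_MAX_TIMEOUTS; i++) {
    if (sketch_timeouts[i].id == 0) {
      sketch_timeouts[i].id = sketch_next_id++;
      sketch_timeouts[i].when = when;
      sketch_timeouts[i].closure = closure;
      return sketch_timeouts[i].id;
    }
  }
  return 0;
}

/* Models remove_timeout_cb: an id that already expired is simply ignored. */
static void sketch_remove_timeout(int id)
{
  if (id == 0)
    return;
  for (size_t i = 0; i < SKETCH_MAX_TIMEOUTS; i++)
    if (sketch_timeouts[i].id == id)
      sketch_timeouts[i].id = 0;
}

/* Fires at most one timeout that is due at NOW.  Returns its id and stores
   its closure in *closure_out, or returns 0 if nothing is due.  The slot is
   freed on firing, which is what makes the timeout a one-shot. */
static int sketch_fire_one(long now, void **closure_out)
{
  for (size_t i = 0; i < SKETCH_MAX_TIMEOUTS; i++) {
    if (sketch_timeouts[i].id != 0 && sketch_timeouts[i].when <= now) {
      int id = sketch_timeouts[i].id;
      *closure_out = sketch_timeouts[i].closure;
      sketch_timeouts[i].id = 0;
      return id;
    }
  }
  return 0;
}
```

Because ids come from a counter and are never reused in this sketch, removing an id whose timeout has already fired cannot discard a newer timeout by accident.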
8013
8014 XEmacs has its own event structures, which are distinct from the event
8015 structures used by X or any other window system. It is the job of the
8016 event_stream layer to translate to this format.
8017
8018 @node Other Event Loop Functions, Stream Pairs, Event Stream Callback Routines, Events and the Event Loop
8002 @section Other Event Loop Functions 8019 @section Other Event Loop Functions
8003 @cindex event loop functions, other 8020 @cindex event loop functions, other
8004 8021
8005 @code{detect_input_pending()} and @code{input-pending-p} look for 8022 @code{detect_input_pending()} and @code{input-pending-p} look for
8006 input by calling @code{event_stream->event_pending_p} and looking in 8023 input by calling @code{event_stream->event_pending_p} and looking in
8019 @code{read-char} calls @code{next-command-event} and uses 8036 @code{read-char} calls @code{next-command-event} and uses
8020 @code{event_to_character()} to return the character equivalent. With 8037 @code{event_to_character()} to return the character equivalent. With
8021 the right kind of input method support, it is possible for (read-char) 8038 the right kind of input method support, it is possible for (read-char)
8022 to return a Kanji character. 8039 to return a Kanji character.
8023 8040
8024 @node Converting Events 8041 @node Stream Pairs, Converting Events, Other Event Loop Functions, Events and the Event Loop
8042 @section Stream Pairs
8043 @cindex stream pairs
8044 @cindex pairs, stream
8045
8046 Since there are many possible processes/event loop combinations, the
8047 event code is responsible for creating an appropriate lstream type. The
8048 process implementation does not care about that implementation.
8049
8050 The create-stream-pair function (@code{create_io_streams_cb}) is passed two @code{void *} values, which
8051 identify process-dependent 'handles'. The process implementation uses
8052 these handles to communicate with child processes. The function must be
8053 prepared to receive handle types of any process implementation. Since
8054 only one process implementation exists in a particular XEmacs
8055 configuration, preprocessing is a means of compiling in the support for
8056 the code which deals with particular handle types.
8057
8058 For example, a unixoid type loop, which relies on file descriptors, may be
8059 asked to create a pair of streams by a unix-style process implementation.
8060 In this case, the handles passed are unix file descriptors, and the code
8061 may deal with these directly. However, the same code may be used on a Win32
8062 system with X Windows. In this case, the Win32 process implementation passes
8063 handles of type @code{HANDLE}, and the @code{create_io_streams} function must call
8064 the appropriate function to get file descriptors from the HANDLEs, so that these
8065 descriptors may be passed to @code{XtAddInput}.
8066
8067 The handle given may have a special denying value, in which case the
8068 corresponding lstream should not be created.
8069
8070 The return value of the function is a unique stream identifier (USID). It is used
8071 by the platform-independent part of the process implementation. The
8072 @code{get_process_from_usid} function returns the process object for a given
8073 USID. The event stream is responsible for converting its internal handle
8074 type into a USID.
8075
8076 An example is the TTY event stream. When a file descriptor signals input, the
8077 event loop must determine the process for which the input is destined. Thus,
8078 the implementation uses the file descriptor of the process input stream as the USID, by
8079 simply casting the fd value to the USID type.
8080
8081 There are two special USID values. One, @code{USID_ERROR}, indicates
8082 that the stream pair cannot be created. The second,
8083 @code{USID_DONTHASH}, indicates that streams are created, but the event
8084 stream does not wish to be able to find the process by its
8085 USID. Specifically, if an event stream implementation never calls
8086 @code{get_process_from_usid}, this value should always be returned, to
8087 prevent accumulating useless information on USID to process
8088 relationship.
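
The USID scheme described above can be sketched like this. It is an illustration under stated assumptions, not the XEmacs code: the @code{sketch_} names, the numeric values chosen for the two special USIDs, and the small lookup table standing in for the USID-to-process mapping are all hypothetical.

```c
#include <stddef.h>

typedef unsigned long sketch_usid;

/* Hypothetical special values, picked so that no valid fd collides. */
#define SKETCH_USID_ERROR    ((sketch_usid) -1)  /* pair could not be created */
#define SKETCH_USID_DONTHASH ((sketch_usid) -2)  /* created, never looked up */

#define SKETCH_MAX_PROCESSES 8

struct sketch_binding {
  int in_use;
  sketch_usid usid;
  const char *process_name;   /* stand-in for a process object */
};

static struct sketch_binding sketch_bindings[SKETCH_MAX_PROCESSES];

/* A TTY-style event stream uses the input fd itself as the USID. */
static sketch_usid sketch_fd_to_usid(int fd)
{
  return fd < 0 ? SKETCH_USID_ERROR : (sketch_usid) fd;
}

/* Records the USID-to-process relationship.  Nothing is stored for the
   special values, so DONTHASH streams accumulate no bookkeeping.
   Returns 1 if a binding was recorded, 0 otherwise. */
static int sketch_register(sketch_usid usid, const char *process_name)
{
  if (usid == SKETCH_USID_ERROR || usid == SKETCH_USID_DONTHASH)
    return 0;
  for (size_t i = 0; i < SKETCH_MAX_PROCESSES; i++) {
    if (!sketch_bindings[i].in_use) {
      sketch_bindings[i].in_use = 1;
      sketch_bindings[i].usid = usid;
      sketch_bindings[i].process_name = process_name;
      return 1;
    }
  }
  return 0;
}

/* Plays the role of get_process_from_usid: NULL if the USID is unknown. */
static const char *sketch_process_from_usid(sketch_usid usid)
{
  for (size_t i = 0; i < SKETCH_MAX_PROCESSES; i++)
    if (sketch_bindings[i].in_use && sketch_bindings[i].usid == usid)
      return sketch_bindings[i].process_name;
  return NULL;
}
```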
8089
8090 @node Converting Events, Dispatching Events; The Command Builder, Stream Pairs, Events and the Event Loop
8025 @section Converting Events 8091 @section Converting Events
8026 @cindex converting events 8092 @cindex converting events
8027 @cindex events, converting 8093 @cindex events, converting
8028 8094
8029 @code{character_to_event()}, @code{event_to_character()}, 8095 @code{character_to_event()}, @code{event_to_character()},
8032 event was not a keypress, @code{event_to_character()} returns -1 and 8098 event was not a keypress, @code{event_to_character()} returns -1 and
8033 @code{event-to-character} returns @code{nil}. These functions convert 8099 @code{event-to-character} returns @code{nil}. These functions convert
8034 between character representation and the split-up event representation 8100 between character representation and the split-up event representation
8035 (keysym plus mod keys). 8101 (keysym plus mod keys).
8036 8102
8037 @node Dispatching Events; The Command Builder 8103 @node Dispatching Events; The Command Builder, Focus Handling, Converting Events, Events and the Event Loop
8038 @section Dispatching Events; The Command Builder 8104 @section Dispatching Events; The Command Builder
8039 @cindex dispatching events; the command builder 8105 @cindex dispatching events; the command builder
8040 @cindex events; the command builder, dispatching 8106 @cindex events; the command builder, dispatching
8041 @cindex command builder, dispatching events; the 8107 @cindex command builder, dispatching events; the
8042 8108
8043 Not yet documented. 8109 Not yet documented.
8044 8110
8045 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top 8111 @node Focus Handling, Editor-Level Control Flow Modules, Dispatching Events; The Command Builder, Events and the Event Loop
8112 @section Focus Handling
8113 @cindex focus handling
8114
8115 Ben's capsule lecture on focus:
8116
8117 In GNU Emacs @code{select-frame} never changes the window-manager frame
8118 focus. All it does is change the "selected frame". This is similar to
8119 what happens when we call @code{select-device} or @code{select-console}.
8120 Whenever an event comes in (including a keyboard event), its frame is
8121 selected; therefore, evaluating @code{select-frame} in @samp{*scratch*}
8122 won't cause any effects because the next received event (in the same
8123 frame) will cause a switch back to the frame displaying
8124 @samp{*scratch*}.
8125
8126 Whenever a focus-change event is received from the window manager, it
8127 generates a @code{switch-frame} event, which causes the Lisp function
8128 @code{handle-switch-frame} to get run. This basically just runs
8129 @code{select-frame} (see below, however).
8130
8131 In GNU Emacs, if you want to have an operation run when a frame is
8132 selected, you supply an event binding for @code{switch-frame} (and then
8133 maybe call @code{handle-switch-frame}, or something ...).
8134
8135 In XEmacs, we @strong{do} change the window-manager frame focus as a
8136 result of @code{select-frame}, but not until the next time an event is
8137 received, so that a function that momentarily changes the selected frame
8138 won't cause WM focus flashing. (#### There's something not quite right
8139 here; this is causing the wrong-cursor-focus problems that you
8140 occasionally see. But the general idea is correct.) This approach is
8141 winning for people who use the explicit-focus model, but is trickier to
8142 implement.
8143
8144 We also don't make the @code{switch-frame} event visible but instead have
8145 @code{select-frame-hook}, which is a better approach.
8146
8147 There is the problem of surrogate minibuffers, where when we enter the
8148 minibuffer, you essentially want to temporarily switch the WM focus to
8149 the frame with the minibuffer, and switch it back when you exit the
8150 minibuffer.
8151
8152 GNU Emacs solves this with the crockish @code{redirect-frame-focus},
8153 which says "for keyboard events received from FRAME, act like they're
8154 coming from FOCUS-FRAME". I think what this means is that, when a
8155 keyboard event comes in and the event manager is about to select the
8156 event's frame, if that frame has its focus redirected, the redirected-to
8157 frame is selected instead. That way, if you're in a minibufferless
8158 frame and enter the minibuffer, then all Lisp functions that run see the
8159 selected frame as the minibuffer's frame rather than the minibufferless
8160 frame you came from, so that (e.g.) your typing actually appears in the
8161 minibuffer's frame and things behave sanely.
8162
8163 There's also some weird logic that switches the redirected frame focus
8164 from one frame to another if Lisp code explicitly calls
8165 @code{select-frame} (but not if @code{handle-switch-frame} is called),
8166 and saves and restores the frame focus in window configurations,
8167 etc. etc. All of this logic is heavily @code{#if 0}'d, with lots of
8168 comments saying "No, this approach doesn't seem to work, so I'm trying
8169 this ... is it reasonable? Well, I'm not sure ..." that are a red flag
8170 indicating crockishness.
8171
8172 Because of our way of doing things, we can avoid all this crock.
8173 Keyboard events never cause a select-frame (who cares what frame they're
8174 associated with? They come from a console, only). We change the actual
8175 WM focus to a surrogate minibuffer frame, so we don't have to do any
8176 internal redirection. In order to get the focus back, I took the
8177 approach in @file{minibuf.el} of just checking to see if the frame we moved to
8178 is still the selected frame, and moving back to the old one if so.
8179 Conceivably we might have to do the weird "tracking" that GNU Emacs does
8180 when @code{select-frame} is called, but I don't think so. If the
8181 selected frame moved from the minibuffer frame, then we just leave it
8182 there, figuring that someone knows what they're doing. Because we don't
8183 have any redirection recorded anywhere, it's safe to do this, and we
8184 don't end up with unwanted redirection.
8185
8186 @node Editor-Level Control Flow Modules, , Focus Handling, Events and the Event Loop
8187 @section Editor-Level Control Flow Modules
8188 @cindex control flow modules, editor-level
8189 @cindex modules, editor-level control flow
8190
8191 @example
8192 @file{event-Xt.c}
8193 @file{event-msw.c}
8194 @file{event-stream.c}
8195 @file{event-tty.c}
8196 @file{events-mod.h}
8197 @file{gpmevent.c}
8198 @file{gpmevent.h}
8199 @file{events.c}
8200 @file{events.h}
8201 @end example
8202
8203 These implement the handling of events (user input and other system
8204 notifications).
8205
8206 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
8207 type and primitives for manipulating it.
8208
8209 @file{event-stream.c} implements the basic functions for working with
8210 event queues, dispatching an event by looking it up in relevant keymaps
8211 and such, and handling timeouts; this includes the primitives
8212 @code{next-event} and @code{dispatch-event}, as well as related
8213 primitives such as @code{sit-for}, @code{sleep-for}, and
8214 @code{accept-process-output}. (@file{event-stream.c} is one of the
8215 hairiest and trickiest modules in XEmacs. Beware! You can easily mess
8216 things up here.)
8217
8218 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
8219 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's
8220 (using @code{read()} and @code{select()}), respectively. The event
8221 interface enforces a clean separation between the specific code for
8222 interfacing with the operating system and the generic code for working
8223 with events, by defining an API of basic, low-level event methods;
8224 @file{event-Xt.c} and @file{event-tty.c} are two different
8225 implementations of this API. To add support for a new operating system
8226 (e.g. NeXTstep), one merely needs to provide another implementation of
8227 those API functions.
8228
8229 Note that the choice of whether to use @file{event-Xt.c} or
8230 @file{event-tty.c} is made at compile time! Or at the very latest, it
8231 is made at startup time. @file{event-Xt.c} handles events for
8232 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
8233 support is not compiled into XEmacs. The reason for this is that there
8234 is only one event loop in XEmacs: thus, it needs to be able to receive
8235 events from all different kinds of frames.
8236
8237
8238
8239 @example
8240 @file{keymap.c}
8241 @file{keymap.h}
8242 @end example
8243
8244 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
8245 type and associated methods and primitives. (Remember that keymaps are
8246 objects that associate event descriptions with functions to be called to
8247 ``execute'' those events; @code{dispatch-event} looks up events in the
8248 relevant keymaps.)
8249
8250
8251
8252 @example
8253 @file{cmdloop.c}
8254 @end example
8255
8256 @file{cmdloop.c} contains functions that implement the actual editor
8257 command loop---i.e. the event loop that cyclically retrieves and
8258 dispatches events. This code is also rather tricky, just like
8259 @file{event-stream.c}.
8260
8261
8262
8263 @example
8264 @file{macros.c}
8265 @file{macros.h}
8266 @end example
8267
8268 These two modules contain the basic code for defining keyboard macros.
8269 These functions don't actually do much; most of the code that handles keyboard
8270 macros is mixed in with the event-handling code in @file{event-stream.c}.
8271
8272
8273
8274 @example
8275 @file{minibuf.c}
8276 @end example
8277
8278 This contains some miscellaneous code related to the minibuffer (most of
8279 the minibuffer code was moved into Lisp by Richard Mlynarik). This
8280 includes the primitives for completion (although filename completion is
8281 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
8282 command loop were cleaned up, this too could be in Lisp), and code for
8283 dealing with the echo area (this, too, was mostly moved into Lisp, and
8284 the only code remaining is code to call out to Lisp or provide simple
8285 bootstrapping implementations early in temacs, before the echo-area Lisp
8286 code is loaded).
8287
8288
8289 @node Asynchronous Events; Quit Checking, Evaluation; Stack Frames; Bindings, Events and the Event Loop, Top
8290 @chapter Asynchronous Events; Quit Checking
8291 @cindex asynchronous events; quit checking
8292 @cindex asynchronous events
8293
8294 @menu
8295 * Signal Handling::
8296 * Control-G (Quit) Checking::
8297 * Profiling::
8298 * Asynchronous Timeouts::
8299 * Exiting::
8300 @end menu
8301
8302 @node Signal Handling, Control-G (Quit) Checking, Asynchronous Events; Quit Checking, Asynchronous Events; Quit Checking
8303 @section Signal Handling
8304 @cindex signal handling
8305
8306 @node Control-G (Quit) Checking, Profiling, Signal Handling, Asynchronous Events; Quit Checking
8307 @section Control-G (Quit) Checking
8308 @cindex Control-g checking
8309 @cindex C-g checking
8310 @cindex quit checking
8311 @cindex QUIT checking
8312 @cindex critical quit
8313
8314 @emph{Note}: The code to handle QUIT is divided between @file{lisp.h}
8315 and @file{signal.c}. There is also some special-case code in the async
8316 timer code in @file{event-stream.c} to notice when the poll-for-quit
8317 (and poll-for-sigchld) timers have gone off.
8318
8319 Here's an overview of how this convoluted stuff works:
8320
8321 @enumerate
8322 @item
8323
8324 Scattered throughout the XEmacs core code are calls to the macro QUIT.
8325 This macro checks to see whether a @kbd{C-g} has recently been pressed
8326 and not yet handled, and if so, it handles the @kbd{C-g} by calling
8327 @code{signal_quit()}, which invokes the standard @code{Fsignal()} code,
8328 with the error being @code{Qquit}. Lisp code can establish handlers
8329 for this (using @code{condition-case}), but normally there is no
8330 handler, and so execution is thrown back to the innermost enclosing
8331 event loop. (One of the things that happens when entering an event loop
8332 is that a @code{condition-case} is established that catches @strong{all} calls
8333 to @code{signal}, including this one.)
8334
8335 @item
8336 How does the QUIT macro check to see whether @kbd{C-g} has been pressed?
8337 Obviously this needs to be extremely fast. Now for some history.
8338 In early Lemacs as inherited from the FSF going back 15 years or
8339 more, there was a great fondness for using SIGIO (which is sent
8340 whenever there is I/O available on a given socket, tty, etc.).
8341 In fact, in GNU Emacs, perhaps even today, all reading of events
8342 from the X server occurs inside the SIGIO handler! This is crazy,
8343 but not completely relevant. What is relevant is that similar
8344 stuff happened inside the SIGIO handler for @kbd{C-g}: it searched
8345 through all the pending (i.e. not yet delivered to XEmacs)
8346 X events for one that matched @kbd{C-g}. When it saw a match, it set
8347 @code{Vquit_flag} to @code{Qt}. On TTY's, @kbd{C-g} is actually mapped to be the
8348 interrupt character (i.e. it generates SIGINT), and XEmacs's
8349 handler for this signal sets @code{Vquit_flag} to @code{Qt}. Then, sometime
8350 later after the signal handlers finished and a QUIT macro was
8351 called, the macro noticed the setting of @code{Vquit_flag} and used
8352 this as an indication to call @code{signal_quit()}. What @code{signal_quit()}
8353 actually does is set @code{Vquit_flag} to Qnil (so that we won't get
8354 repeated interruptions from a single @kbd{C-g} press) and then calls
8355 the equivalent of (signal 'quit nil).
8356
8357 @item
8358 Another complication is introduced in that Vquit_flag is actually
8359 exported to Lisp as @code{quit-flag}. This allows users some level of
8360 control over whether and when @kbd{C-g} is processed as quit, esp. in
8361 combination with @code{inhibit-quit}. This is another Lisp variable,
8362 and if set to non-nil, it inhibits @code{signal_quit()} from getting
8363 called, meaning that the @kbd{C-g} gets essentially ignored. But not
8364 completely: Because the resetting of @code{quit-flag} happens only
8365 in @code{signal_quit()}, which isn't getting called, the @kbd{C-g} press is
8366 still noticed, and as soon as @code{inhibit-quit} is set back to nil,
8367 a quit will be signalled at the next QUIT macro. Thus, what
8368 @code{inhibit-quit} really does is defer quits until after the
8369 quit-inhibited period.
8370
8371 @item
8372 Another consideration, introduced by XEmacs, is critical quitting. If
8373 you press @kbd{Control-Shift-G} instead of just @kbd{C-g},
8374 @code{quit-flag} is set to @code{critical} instead of to t. When QUIT
8375 processes this value, it @strong{ignores} the value of
8376 @code{inhibit-quit}. This allows you to quit even out of a
8377 quit-inhibited section of code! Furthermore, when @code{signal_quit()}
8378 notices that it was invoked as a result of a critical quit, it
8379 automatically invokes the debugger (which otherwise would only happen
8380 when @code{debug-on-quit} is set to t).
8381
8382 @item
8383 Well, I explained above about how @code{quit-flag} gets set correctly,
8384 but I began with a disclaimer stating that this was the old way
8385 of doing things. What's done now? Well, first of all, the SIGIO
8386 handler (which formerly checked all pending events to see if there's
8387 a @kbd{C-g}) now does nothing but set a flag -- or actually two flags,
8388 something_happened and quit_check_signal_happened. There are two
8389 flags because the QUIT macro is now used for more than just handling
8390 QUIT; it's also used for running asynchronous timeout handlers that
8391 have recently expired, and perhaps other things. The idea here is
8392 that the QUIT macros occur extremely often in the code, but only occur
8393 at places that are relatively safe -- in particular, if an error occurs,
8394 nothing will get completely trashed.
8395
8396 @item
8397 Now, let's look at QUIT again.
8398
8399 @item
8400
8401 UNFINISHED. Note, however, that as of the point when this comment got
8402 committed to CVS (mid-2001), the interaction between reading @kbd{C-g}
8403 as an event and processing it as QUIT was overhauled to (for the first
8404 time) be understandable and actually work correctly. Now, the way
8405 things work is that if @kbd{C-g} is pressed while XEmacs is blocking at
8406 the top level, waiting for a user event, it will be read as an event;
8407 otherwise, it will cause QUIT. (This includes times when XEmacs is
8408 blocking, but not waiting for a user event,
8409 e.g. @code{accept-process-output} and
8410 @code{wait_delaying_user_input()}.) Formerly, this was supposed to
8411 happen, but didn't always due to a bizarre and broken scheme, documented
8412 in @code{next_event_internal} like this:
8413
8414 @quotation
8415 If we read a @kbd{C-g}, then set @code{quit-flag} but do not discard the
8416 @kbd{C-g}. The callers of @code{next_event_internal()} will do one of
8417 two things:
8418
8419 @enumerate
8420 @item
8421 set @code{Vquit_flag} to Qnil. (@code{next-event} does this.) This will
8422 cause the ^G to be treated as a normal keystroke.
8423
8424 @item
8425 not change @code{Vquit_flag} but attempt to enqueue the ^G, at which
8426 point it will be discarded. The next time QUIT is called, it will
8427 notice that @code{Vquit_flag} was set.
8428 @end enumerate
8429 @end quotation
8430
8431 This required weirdness in @code{enqueue_command_event_1} like this:
8432
8433 @quotation
8434 put the event on the typeahead queue, unless the event is the quit char,
8435 in which case the @code{QUIT} which will occur on the next trip through this
8436 loop is all the processing we should do - leaving it on the queue would
8437 cause the quit to be processed twice.
8438 @end quotation
8439
8440 And further weirdness elsewhere, none of which made any sense, and
8441 didn't work, because (e.g.) it required that QUIT never happen anywhere
8442 inside @code{next_event_internal()} or any callers when @kbd{C-g} should
8443 be read as a user event, which was impossible to implement in practice.
8444
8445 Now what we do is fairly simple. Callers of
8446 @code{next_event_internal()} that want @kbd{C-g} read as a user event
8447 call @code{begin_dont_check_for_quit()}. @code{next_event_internal()},
8448 when it gets a @kbd{C-g}, simply sets @code{Vquit_flag} (just as when a
8449 @kbd{C-g} is detected during the operation of @code{QUIT} or
8450 @code{QUITP}), and then tries to @code{QUIT}. This will fail if blocked
8451 by the previous call, at which point @code{next_event_internal()} will
8452 return the @kbd{C-g} as an event. To unblock things, first set
8453 @code{Vquit_flag} to nil (it was set to t when the @kbd{C-g} was read,
8454 and if we don't reset it, the next call to @code{QUIT} will quit), and
8455 then @code{unbind_to()} the depth returned by
8456 @code{begin_dont_check_for_quit()}. It makes no difference if
8457 @code{QUIT} is called a zillion times in @code{next_event_internal()} or
8458 anywhere else, because it's blocked and will never signal.
8459 @end enumerate
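
The flag logic walked through in the list above can be modelled in a few lines. This is only a sketch: the @code{sketch_} names are hypothetical, the real QUIT signals a Lisp error rather than returning a value, and treating the don't-check-for-quit state as blocking critical quits too is an assumption of the example, not something stated above.

```c
/* Three states of the quit flag: nil, t, and `critical'. */
enum sketch_quit {
  SKETCH_QUIT_NONE,
  SKETCH_QUIT_NORMAL,
  SKETCH_QUIT_CRITICAL
};

static enum sketch_quit sketch_quit_flag;   /* models Vquit_flag */
static int sketch_inhibit_quit;             /* models inhibit-quit */
static int sketch_dont_check_for_quit;      /* models begin_dont_check_for_quit() */

/* Models what QUIT decides.  Returns 1 if a quit would be signalled now.
   The flag is reset only when the quit is actually taken, so a quit that
   arrives while inhibited is deferred rather than lost. */
static int sketch_quit_check(void)
{
  if (sketch_dont_check_for_quit)
    return 0;                               /* blocked: C-g is read as an event */
  if (sketch_quit_flag == SKETCH_QUIT_NONE)
    return 0;
  if (sketch_inhibit_quit && sketch_quit_flag != SKETCH_QUIT_CRITICAL)
    return 0;                               /* deferred until inhibit ends */
  sketch_quit_flag = SKETCH_QUIT_NONE;      /* avoid repeated interruptions */
  return 1;
}
```

The single early return on an unset flag is what keeps the common case cheap, which matters because the check runs extremely often.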
8460
8461 @node Profiling, Asynchronous Timeouts, Control-G (Quit) Checking, Asynchronous Events; Quit Checking
8462 @section Profiling
8463 @cindex profiling
8464 @cindex SIGPROF
8465
8466 We implement our own profiling scheme so that we can determine
8467 things like which Lisp functions are occupying the most time. Any
8468 standard OS-provided profiling works on C functions, which is
8469 not always that useful -- and inconvenient, since it requires compiling
8470 with profile info and can't be retrieved dynamically, as XEmacs is
8471 running.
8472
8473 The basic idea is simple. We set a profiling timer using setitimer
8474 (ITIMER_PROF), which generates a SIGPROF every so often. (This runs not
8475 in real time but rather when the process is executing or the system is
8476 running on behalf of the process -- at least, that is the case under
8477 Unix. Under MS Windows and Cygwin, there is no @code{setitimer()}, so we
8478 simulate it using multimedia timers, which run in real time. To make
8479 the results a bit more realistic, we ignore ticks that go off while
8480 blocking on an event wait. Note that Cygwin does provide a simulation
8481 of @code{setitimer()}, but it's in real time anyway, since Windows doesn't
8482 provide a way to have process-time timers, and furthermore, it's broken,
8483 so we don't use it.) When the signal goes off, we see what function we're in, and
8484 add 1 to the count associated with that function.
8485
8486 It would be nice to use the Lisp allocation mechanism etc. to keep track
8487 of the profiling information (i.e. to use Lisp hash tables), but we
8488 can't because that's not safe -- updating the timing information happens
8489 inside of a signal handler, so we can't rely on not being in the middle
8490 of Lisp allocation, garbage collection, @code{malloc()}, etc. Trying to make
8491 it work would be much more work than it's worth. Instead we use a basic
8492 (non-Lisp) hash table, which will not conflict with garbage collection
8493 or anything else as long as it doesn't try to resize itself. Resizing
8494 itself, however (which happens as a result of a @code{puthash()}), could be
8495 deadly. To avoid this, we make sure, at points where it's safe
8496 (e.g. @code{profile_record_about_to_call()} -- recording the entry into a
8497 function call), that the table always has some breathing room in it so
8498 that no resizes will occur until at least that many items are added.
8499 This is safe because any new item to be added in the sigprof would
8500 likely have the @code{profile_record_about_to_call()} called just before it,
8501 and the breathing room is checked.
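
The "breathing room" idea can be sketched as a table with two entry points: one that may allocate and is called only at safe points, and one that never allocates and is the only thing the signal handler calls. The @code{sketch_} names and the open-addressing layout are assumptions for illustration and do not reflect the actual hash table code; the sketch also ignores the question of keeping the handler from running while the table is being swapped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_entry {
  const void *key;        /* NULL marks an empty slot; keys must be non-NULL */
  unsigned long ticks;
};

struct sketch_table {
  struct sketch_entry *slots;
  size_t capacity;
  size_t used;
};

/* Linear probing.  Terminates because one slot is always kept empty. */
static size_t sketch_slot(const struct sketch_table *t, const void *key)
{
  size_t i = ((size_t) (uintptr_t) key >> 3) % t->capacity;
  while (t->slots[i].key != NULL && t->slots[i].key != key)
    i = (i + 1) % t->capacity;
  return i;
}

/* Safe-point side: make sure ROOM more keys fit without resizing.
   May allocate, so it must never be called from the handler. */
static int sketch_reserve(struct sketch_table *t, size_t room)
{
  size_t needed = t->used + room + 1;   /* +1 keeps one slot empty */
  if (needed <= t->capacity)
    return 1;
  size_t new_capacity = t->capacity ? t->capacity : 8;
  while (new_capacity < needed)
    new_capacity *= 2;
  struct sketch_entry *fresh = calloc(new_capacity, sizeof *fresh);
  if (fresh == NULL)
    return 0;
  struct sketch_entry *old = t->slots;
  size_t old_capacity = t->capacity;
  t->slots = fresh;
  t->capacity = new_capacity;
  for (size_t i = 0; i < old_capacity; i++)
    if (old[i].key != NULL)
      t->slots[sketch_slot(t, old[i].key)] = old[i];
  free(old);
  return 1;
}

/* Handler side: count one tick for KEY.  Never allocates; returns 0 and
   drops the tick if no breathing room was reserved. */
static int sketch_tick(struct sketch_table *t, const void *key)
{
  if (t->capacity == 0)
    return 0;
  size_t i = sketch_slot(t, key);
  if (t->slots[i].key == NULL) {
    if (t->used + 1 >= t->capacity)
      return 0;
    t->slots[i].key = key;
    t->used++;
  }
  t->slots[i].ticks++;
  return 1;
}

/* Reads back the tick count for KEY (0 if it was never recorded). */
static unsigned long sketch_ticks(const struct sketch_table *t, const void *key)
{
  if (t->capacity == 0)
    return 0;
  return t->slots[sketch_slot(t, key)].ticks;
}
```

Dropping a tick when no room is left trades a little accuracy for the guarantee that the handler never resizes the table.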
8502
8503 In general: any entry that the sigprof handler puts into the table comes
8504 from a backtrace frame (except "Processing Events at Top Level", and
8505 there's only one of those). Either that backtrace frame was added when
8506 profiling was on (in which case @code{profile_record_about_to_call()} was
8507 called and the breathing space updated), or when it was off -- and in
8508 this case, no such frames can have been added since the last time
8509 @code{start-profile} was called, so when @code{start-profile} is called we make
8510 sure there is sufficient breathing room to account for all entries
8511 currently on the stack.
8512
8513 Jan 1998: In addition to timing info, I have added code to remember call
8514 counts of Lisp funcalls. The @code{profile_increase_call_count()}
8515 function is called from @code{Ffuncall()}, and serves to add data to
8516 Vcall_count_profile_table. This mechanism is much simpler and
8517 independent of the SIGPROF-driven one. It uses the Lisp allocation
8518 mechanism normally, since it is not called from a handler. It may
8519 even be useful to provide a way to turn on only one profiling
8520 mechanism, but I haven't done so yet. --hniksic
8521
8522 Dec 2002: Total overhaul of the interface, making it sane and easier to
8523 use. --ben
8524
8525 Feb 2003: Lots of rewriting of the internal code. Add GC-consing-usage,
8526 total GC usage, and total timing to the information tracked. Track
8527 profiling overhead and allow the ability to have internal sections
8528 (e.g. internal-external conversion, byte-char conversion) that are
8529 treated like Lisp functions for the purpose of profiling. --ben
8530
8531 BEWARE: If you are modifying this file, be @strong{very} careful. Correctly
8532 implementing the "total" values is very tricky due to the possibility of
8533 recursion and of functions already on the stack when starting to
8534 profile/still on the stack when stopping.
8535
8536 @node Asynchronous Timeouts, Exiting, Profiling, Asynchronous Events; Quit Checking
8537 @section Asynchronous Timeouts
8538 @cindex asynchronous timeouts
8539
8540 @node Exiting, , Asynchronous Timeouts, Asynchronous Events; Quit Checking
8541 @section Exiting
8542 @cindex exiting
8543 @cindex crash
8544 @cindex hang
8545 @cindex core dump
8546 @cindex Armageddon
8547 @cindex exits, expected and unexpected
8548 @cindex unexpected exits
8549 @cindex expected exits
8550
8551 Ben's capsule summary about expected and unexpected exits from XEmacs.
8552
8553 Expected exits occur when the user directs XEmacs to exit, for example
8554 by pressing the close button on the only frame in XEmacs, or by typing
8555 @kbd{C-x C-c}. This runs @code{save-buffers-kill-emacs}, which saves
8556 any necessary buffers, and then exits using the primitive
8557 @code{kill-emacs}.
8558
8559 However, unexpected exits occur in a few different ways:
8560
8561 @itemize @bullet
8562 @item
8563 A memory access violation or other hardware-generated exception occurs.
8564 This is the worst possible problem to deal with, because the fault can
8565 occur while XEmacs is in any state whatsoever, even quite unstable ones.
8566 As a result, we need to be @strong{extremely} careful what we do.
8567
8568 @item
8569 We are using one X display (or if we've used more, we've closed the
8570 others already), and some hardware or other problem happens and
8571 suddenly we've lost our connection to the display. In this situation,
8572 things are not so dire as in the last one; our code itself isn't
8573 trashed, so we can continue execution as normal, after having set
8574 things up so that we can exit at the appropriate time. Our exit
8575 still needs to be of the emergency nature; we have no displays, so
8576 any attempts to use them will fail. We simply want to auto-save
8577 (the single most important thing to do during shut-down), do minimal
8578 cleanup of stuff that has an independent existence outside of XEmacs,
8579 and exit.
8580 @end itemize
8581
8582 Currently, both unexpected exit scenarios described above set
8583 @code{preparing_for_armageddon} to indicate that nonessential and possibly
8584 dangerous things should not be done, specifically:
8585
8586 @itemize @minus
8587 @item
8588 no garbage collection.
8589 @item
8590 no hooks are run.
8591 @item
8592 no messages of any sort from autosaving.
8593 @item
8594 autosaving tries harder, ignoring certain failures.
8595 @item
8596 existing frames are not deleted.
8597 @end itemize
8598
8599 (All places that set @code{preparing_for_armageddon} also
8600 set @code{dont_check_for_quit}. This happens separately because it's
8601 also necessary to set other variables to make absolutely sure
8602 no quitting happens.)
8603
8604 In the first scenario above (the access violation), we also set
8605 @code{fatal_error_in_progress}. This causes more things to not happen:
8606
8607 @itemize @minus
8608 @item
8609 assertion failures do not abort.
8610 @item
8611 printing code does not do code conversion or gettext when
8612 printing to stdout/stderr.
8613 @end itemize
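
The relationship between the three flags and the actions they suppress can be summarized in a short C sketch. The flag names are the ones used in the text; the @code{bool} type, the @code{exit_cause} enumeration, and every function below are hypothetical and exist only to make the rules explicit.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names follow the text; their types here are an assumption. */
static bool preparing_for_armageddon;
static bool dont_check_for_quit;
static bool fatal_error_in_progress;

/* Hypothetical: the two unexpected-exit scenarios described above. */
enum exit_cause { DISPLAY_LOST, ACCESS_VIOLATION };

/* Every place that sets preparing_for_armageddon also disables quit
   checking; this states that rule as an invariant. */
static void check_flag_invariant(void)
{
  assert(!preparing_for_armageddon || dont_check_for_quit);
}

void begin_emergency_exit(enum exit_cause cause)
{
  preparing_for_armageddon = true;
  dont_check_for_quit = true;
  if (cause == ACCESS_VIOLATION)
    fatal_error_in_progress = true;   /* first scenario only */
  check_flag_invariant();
}

/* Suppressed in both unexpected-exit scenarios. */
bool may_garbage_collect(void)    { return !preparing_for_armageddon; }
bool may_run_hooks(void)          { return !preparing_for_armageddon; }
bool may_delete_frames(void)      { return !preparing_for_armageddon; }
bool autosave_is_silent(void)     { return preparing_for_armageddon; }

/* Suppressed only after a hardware-generated exception. */
bool assertion_failure_aborts(void) { return !fatal_error_in_progress; }
bool stdio_printing_converts(void)  { return !fatal_error_in_progress; }
```

Losing the display leaves @code{assertion_failure_aborts()} true, since the code itself is still trustworthy in that case; only the access-violation path turns it off.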
8614
8615 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Asynchronous Events; Quit Checking, Top
8046 @chapter Evaluation; Stack Frames; Bindings 8616 @chapter Evaluation; Stack Frames; Bindings
8047 @cindex evaluation; stack frames; bindings 8617 @cindex evaluation; stack frames; bindings
8048 @cindex stack frames; bindings, evaluation; 8618 @cindex stack frames; bindings, evaluation;
8049 @cindex bindings, evaluation; stack frames; 8619 @cindex bindings, evaluation; stack frames;
8050 8620
8051 @menu 8621 @menu
8052 * Evaluation:: 8622 * Evaluation::
8053 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: 8623 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
8054 * Simple Special Forms:: 8624 * Simple Special Forms::
8055 * Catch and Throw:: 8625 * Catch and Throw::
8056 @end menu 8626 @end menu
8057 8627
8058 @node Evaluation 8628 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings
8059 @section Evaluation 8629 @section Evaluation
8060 @cindex evaluation 8630 @cindex evaluation
8061 8631
8062 @code{Feval()} evaluates the form (a Lisp object) that is passed to 8632 @code{Feval()} evaluates the form (a Lisp object) that is passed to
8063 it. Note that evaluation is only non-trivial for two types of objects: 8633 it. Note that evaluation is only non-trivial for two types of objects:
8184 @code{call3()} call a function, passing it the argument(s) given (the 8754 @code{call3()} call a function, passing it the argument(s) given (the
8185 arguments are given as separate C arguments rather than being passed as 8755 arguments are given as separate C arguments rather than being passed as
8186 an array). @code{apply1()} uses @code{Fapply()} while the others use 8756 an array). @code{apply1()} uses @code{Fapply()} while the others use
8187 @code{Ffuncall()} to do the real work. 8757 @code{Ffuncall()} to do the real work.
8188 8758
8189 @node Dynamic Binding; The specbinding Stack; Unwind-Protects 8759 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings
8190 @section Dynamic Binding; The specbinding Stack; Unwind-Protects 8760 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
8191 @cindex dynamic binding; the specbinding stack; unwind-protects 8761 @cindex dynamic binding; the specbinding stack; unwind-protects
8192 @cindex binding; the specbinding stack; unwind-protects, dynamic 8762 @cindex binding; the specbinding stack; unwind-protects, dynamic
8193 @cindex specbinding stack; unwind-protects, dynamic binding; the 8763 @cindex specbinding stack; unwind-protects, dynamic binding; the
8194 @cindex unwind-protects, dynamic binding; the specbinding stack; 8764 @cindex unwind-protects, dynamic binding; the specbinding stack;
8242 a local-variable binding (@code{func} is 0, @code{symbol} is not 8812 a local-variable binding (@code{func} is 0, @code{symbol} is not
8243 @code{nil}, and @code{old_value} holds the old value, which is stored as 8813 @code{nil}, and @code{old_value} holds the old value, which is stored as
8244 the symbol's value). 8814 the symbol's value).
8245 @end enumerate 8815 @end enumerate
8246 8816
8247 @node Simple Special Forms 8817 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings
8248 @section Simple Special Forms 8818 @section Simple Special Forms
8249 @cindex special forms, simple 8819 @cindex special forms, simple
8250 8820
8251 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, 8821 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
8252 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, 8822 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
8260 Note that, with the exception of @code{Fprogn}, these functions are 8830 Note that, with the exception of @code{Fprogn}, these functions are
8261 typically called in real life only in interpreted code, since the byte 8831 typically called in real life only in interpreted code, since the byte
8262 compiler knows how to convert calls to these functions directly into 8832 compiler knows how to convert calls to these functions directly into
8263 byte code. 8833 byte code.
8264 8834
8265 @node Catch and Throw 8835 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings
8266 @section Catch and Throw 8836 @section Catch and Throw
8267 @cindex catch and throw 8837 @cindex catch and throw
8268 @cindex throw, catch and 8838 @cindex throw, catch and
8269 8839
8270 @example 8840 @example
8321 the values of @code{gcprolist}, @code{backtrace_list}, and 8891 the values of @code{gcprolist}, @code{backtrace_list}, and
8322 @code{lisp_eval}, and calls @code{unbind_to()} to undo any specbindings 8892 @code{lisp_eval}, and calls @code{unbind_to()} to undo any specbindings
8323 created since the catch. 8893 created since the catch.
8324 8894
8325 8895
8326 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top 8896 @node Symbols and Variables, Buffers, Evaluation; Stack Frames; Bindings, Top
8327 @chapter Symbols and Variables 8897 @chapter Symbols and Variables
8328 @cindex symbols and variables 8898 @cindex symbols and variables
8329 @cindex variables, symbols and 8899 @cindex variables, symbols and
8330 8900
8331 @menu 8901 @menu
8332 * Introduction to Symbols:: 8902 * Introduction to Symbols::
8333 * Obarrays:: 8903 * Obarrays::
8334 * Symbol Values:: 8904 * Symbol Values::
8335 @end menu 8905 @end menu
8336 8906
8337 @node Introduction to Symbols 8907 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables
8338 @section Introduction to Symbols 8908 @section Introduction to Symbols
8339 @cindex symbols, introduction to 8909 @cindex symbols, introduction to
8340 8910
8341 A symbol is basically just an object with four fields: a name (a 8911 A symbol is basically just an object with four fields: a name (a
8342 string), a value (some Lisp object), a function (some Lisp object), and 8912 string), a value (some Lisp object), a function (some Lisp object), and
8350 there can be a distinct function and variable with the same name. The 8920 there can be a distinct function and variable with the same name. The
8351 property list is used as a more general mechanism of associating 8921 property list is used as a more general mechanism of associating
8352 additional values with particular names, and once again the namespace is 8922 additional values with particular names, and once again the namespace is
8353 independent of the function and variable namespaces. 8923 independent of the function and variable namespaces.
8354 8924
8355 @node Obarrays 8925 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables
8356 @section Obarrays 8926 @section Obarrays
8357 @cindex obarrays 8927 @cindex obarrays
8358 8928
8359 The identity of symbols with their names is accomplished through a 8929 The identity of symbols with their names is accomplished through a
8360 structure called an obarray, which is just a poorly-implemented hash 8930 structure called an obarray, which is just a poorly-implemented hash
8418 a new one, and @code{unintern} to remove a symbol from an obarray. This 8988 a new one, and @code{unintern} to remove a symbol from an obarray. This
8419 returns the removed symbol. (Remember: You can't put the symbol back 8989 returns the removed symbol. (Remember: You can't put the symbol back
8420 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols 8990 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
8421 in an obarray. 8991 in an obarray.
8422 8992
8423 @node Symbol Values 8993 @node Symbol Values, , Obarrays, Symbols and Variables
8424 @section Symbol Values 8994 @section Symbol Values
8425 @cindex symbol values 8995 @cindex symbol values
8426 @cindex values, symbol 8996 @cindex values, symbol
8427 8997
8428 The value field of a symbol normally contains a Lisp object. However, 8998 The value field of a symbol normally contains a Lisp object. However,
8463 9033
8464 The exact workings of this are rather complex and involved and are 9034 The exact workings of this are rather complex and involved and are
8465 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and 9035 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
8466 @file{lisp.h}. 9036 @file{lisp.h}.
8467 9037
8468 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top 9038 @node Buffers, Text, Symbols and Variables, Top
8469 @chapter Buffers and Textual Representation 9039 @chapter Buffers
8470 @cindex buffers and textual representation 9040 @cindex buffers
8471 @cindex textual representation, buffers and
8472 9041
8473 @menu 9042 @menu
8474 * Introduction to Buffers:: A buffer holds a block of text such as a file. 9043 * Introduction to Buffers:: A buffer holds a block of text such as a file.
8475 * The Text in a Buffer:: Representation of the text in a buffer.
8476 * Buffer Lists:: Keeping track of all buffers. 9044 * Buffer Lists:: Keeping track of all buffers.
8477 * Markers and Extents:: Tagging locations within a buffer. 9045 * Markers and Extents:: Tagging locations within a buffer.
8478 * Ibytes and Ichars:: Representation of individual characters.
8479 * The Buffer Object:: The Lisp object corresponding to a buffer. 9046 * The Buffer Object:: The Lisp object corresponding to a buffer.
8480 * Searching and Matching:: Higher-level algorithms.
8481 @end menu 9047 @end menu
8482 9048
8483 @node Introduction to Buffers 9049 @node Introduction to Buffers, Buffer Lists, Buffers, Buffers
8484 @section Introduction to Buffers 9050 @section Introduction to Buffers
8485 @cindex buffers, introduction to 9051 @cindex buffers, introduction to
8486 9052
8487 A buffer is logically just a Lisp object that holds some text. 9053 A buffer is logically just a Lisp object that holds some text.
8488 In this, it is like a string, but a buffer is optimized for 9054 In this, it is like a string, but a buffer is optimized for
8532 and @dfn{buffer of the selected window}, and the distinction between 9098 and @dfn{buffer of the selected window}, and the distinction between
8533 @dfn{point} of the current buffer and @dfn{window-point} of the selected 9099 @dfn{point} of the current buffer and @dfn{window-point} of the selected
8534 window. (This latter distinction is explained in detail in the section 9100 window. (This latter distinction is explained in detail in the section
8535 on windows.) 9101 on windows.)
8536 9102
8537 @node The Text in a Buffer 9103 @node Buffer Lists, Markers and Extents, Introduction to Buffers, Buffers
9104 @section Buffer Lists
9105 @cindex buffer lists
9106
9107 Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
9108 they remain around until explicitly deleted. This entails that there is
9109 a list of all the buffers in existence. This list is actually an
9110 assoc-list (mapping from the buffer's name to the buffer) and is stored
9111 in the global variable @code{Vbuffer_alist}.
9112
9113 The order of the buffers in the list is important: the buffers are
9114 ordered approximately from most-recently-used to least-recently-used.
9115 Switching to a buffer using @code{switch-to-buffer},
9116 @code{pop-to-buffer}, etc. and switching windows using
9117 @code{other-window}, etc. usually brings the new current buffer to the
9118 front of the list. @code{switch-to-buffer}, @code{other-buffer},
9119 etc. look at the beginning of the list to find an alternative buffer to
9120 suggest. You can also explicitly move a buffer to the end of the list
9121 using @code{bury-buffer}.
9122
9123 In addition to the global ordering in @code{Vbuffer_alist}, each frame
9124 has its own ordering of the list. These lists always contain the same
9125 elements as in @code{Vbuffer_alist} although possibly in a different
9126 order. @code{buffer-list} normally returns the list for the selected
9127 frame. This allows you to work in separate frames without things
9128 interfering with each other.
9129
9130 The standard way to look up a buffer given a name is
9131 @code{get-buffer}, and the standard way to create a new buffer is
9132 @code{get-buffer-create}, which looks up a buffer with a given name,
9133 creating a new one if necessary. These operations correspond exactly
9134 to the symbol operations @code{intern-soft} and @code{intern},
9135 respectively. You can also force a new buffer to be created using
9136 @code{generate-new-buffer}, which takes a name and (if necessary) makes
9137 a unique name from this by appending a number, and then creates the
9138 buffer. This is basically like the symbol operation @code{gensym}.
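
The three lookup and creation operations can be sketched in C over a simple linked list. This is an illustration only: @code{struct toy_buffer} and the @code{toy_} functions are hypothetical, the real @code{Vbuffer_alist} is a Lisp alist rather than a C list, and both the @code{<N>} suffix and the placement of new buffers at the front are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one (name . buffer) entry. */
struct toy_buffer {
  char name[64];
  struct toy_buffer *next;
};

static struct toy_buffer *buffer_alist;

/* Like get-buffer: look up by name, never create. */
struct toy_buffer *toy_get_buffer(const char *name)
{
  assert(name != NULL);
  for (struct toy_buffer *b = buffer_alist; b != NULL; b = b->next)
    if (strcmp(b->name, name) == 0)
      return b;
  return NULL;
}

/* Like get-buffer-create: look up by name, creating if necessary. */
struct toy_buffer *toy_get_buffer_create(const char *name)
{
  struct toy_buffer *b = toy_get_buffer(name);
  if (b != NULL)
    return b;
  if (strlen(name) >= sizeof b->name)
    return NULL;
  b = malloc(sizeof *b);
  if (b == NULL)
    return NULL;
  strcpy(b->name, name);
  b->next = buffer_alist;       /* placement is a simplification */
  buffer_alist = b;
  return b;
}

/* Like generate-new-buffer: always create, making the name unique by
   appending a number when NAME is already taken. */
struct toy_buffer *toy_generate_new_buffer(const char *name)
{
  char candidate[64];

  if (toy_get_buffer(name) == NULL)
    return toy_get_buffer_create(name);
  for (unsigned n = 2; n < 10000; n++) {
    int len = snprintf(candidate, sizeof candidate, "%s<%u>", name, n);
    if (len < 0 || (size_t) len >= sizeof candidate)
      return NULL;
    if (toy_get_buffer(candidate) == NULL)
      return toy_get_buffer_create(candidate);
  }
  return NULL;
}
```

The parallel with symbols shows in the shape of the code: the first function only searches, the second searches and then adds, and the third never returns an existing entry.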
9139
9140 @node Markers and Extents, The Buffer Object, Buffer Lists, Buffers
9141 @section Markers and Extents
9142 @cindex markers and extents
9143 @cindex extents, markers and
9144
9145 Among the things associated with a buffer are things that are
9146 logically attached to certain buffer positions. This can be used to
9147 keep track of a buffer position when text is inserted and deleted, so
9148 that it remains at the same spot relative to the text around it; to
9149 assign properties to particular sections of text; etc. There are two
9150 such objects that are useful in this regard: they are @dfn{markers} and
9151 @dfn{extents}.
9152
9153 A @dfn{marker} is simply a flag placed at a particular buffer
9154 position, which is moved around as text is inserted and deleted.
9155 Markers are used for all sorts of purposes, such as the @code{mark} that
9156 is the other end of textual regions to be cut, copied, etc.
9157
9158 An @dfn{extent} is similar to two markers plus some associated
9159 properties, and is used to keep track of regions in a buffer as text is
9160 inserted and deleted, and to add properties (e.g. fonts) to particular
9161 regions of text. The external interface of extents is explained
9162 elsewhere.
9163
9164 The important thing here is that markers and extents simply contain
9165 buffer positions in them as integers, and every time text is inserted or
9166 deleted, these positions must be updated. In order to minimize the
9167 amount of shuffling that needs to be done, the positions in markers and
9168 extents (there's one per marker, two per extent) are stored in Membpos's.
9169 This means that they only need to be moved when the text is physically
9170 moved in memory; since the gap structure tries to minimize this, it also
9171 minimizes the number of marker and extent indices that need to be
9172 adjusted. Look in @file{insdel.c} for the details of how this works.
9173
9174 One other important distinction is that markers are @dfn{temporary}
9175 while extents are @dfn{permanent}. This means that markers disappear as
9176 soon as there are no more pointers to them, and correspondingly, there
9177 is no way to determine what markers are in a buffer if you are just
9178 given the buffer. Extents remain in a buffer until they are detached
9179 (which could happen as a result of text being deleted) or the buffer is
9180 deleted, and primitives do exist to enumerate the extents in a buffer.
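
To see why storing memory positions saves work, consider this toy gap buffer in C. It is a sketch under stated assumptions, not the code in @file{insdel.c}: @code{struct toy_buf} and all @code{toy_} functions are hypothetical, positions are 0-based, the capacity is fixed, and a marker sitting exactly at the gap is treated as lying before it. Markers hold indices into the underlying array (gap included), so inserting at the gap touches no marker at all; only moving the gap does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAP 32
#define MAX_MARKERS 8

struct toy_buf {
  char mem[CAP];
  size_t gap_start;                /* memory index of first gap byte */
  size_t gap_end;                  /* memory index just past the gap */
  size_t marker_mem[MAX_MARKERS];  /* memory indices, never inside the gap */
  size_t n_markers;
};

static size_t gap_size(const struct toy_buf *b)
{
  return b->gap_end - b->gap_start;
}

void toy_init(struct toy_buf *b)
{
  b->gap_start = 0;
  b->gap_end = CAP;
  b->n_markers = 0;
}

/* Move the gap to buffer position POS.  Returns how many markers had to
   be adjusted because the text under them physically moved. */
static int move_gap(struct toy_buf *b, size_t pos)
{
  size_t gs = gap_size(b);
  int adjusted = 0;

  assert(b->gap_start <= b->gap_end);
  if (pos < b->gap_start) {
    size_t n = b->gap_start - pos;
    memmove(b->mem + b->gap_end - n, b->mem + pos, n);
    for (size_t i = 0; i < b->n_markers; i++)
      if (b->marker_mem[i] > pos && b->marker_mem[i] <= b->gap_start) {
        b->marker_mem[i] += gs;
        adjusted++;
      }
    b->gap_start = pos;
    b->gap_end -= n;
  } else if (pos > b->gap_start) {
    size_t n = pos - b->gap_start;
    memmove(b->mem + b->gap_start, b->mem + b->gap_end, n);
    for (size_t i = 0; i < b->n_markers; i++)
      if (b->marker_mem[i] > b->gap_end && b->marker_mem[i] <= b->gap_end + n) {
        b->marker_mem[i] -= gs;
        adjusted++;
      }
    b->gap_start += n;
    b->gap_end += n;
  }
  return adjusted;
}

/* Insert C at POS.  Returns the number of markers adjusted, or -1 if the
   buffer is full or POS is out of range. */
int toy_insert(struct toy_buf *b, size_t pos, char c)
{
  int adjusted;
  if (gap_size(b) == 0 || pos > CAP - gap_size(b))
    return -1;
  adjusted = move_gap(b, pos);
  b->mem[b->gap_start++] = c;   /* no marker is touched here */
  return adjusted;
}

/* Insert the string S at POS; returns total markers adjusted, or -1. */
int toy_insert_str(struct toy_buf *b, size_t pos, const char *s)
{
  int total = 0;
  for (size_t i = 0; s[i] != '\0'; i++) {
    int r = toy_insert(b, pos + i, s[i]);
    if (r < 0)
      return -1;
    total += r;
  }
  return total;
}

/* Place a marker at buffer position POS; returns its index. */
size_t toy_add_marker(struct toy_buf *b, size_t pos)
{
  assert(b->n_markers < MAX_MARKERS);
  b->marker_mem[b->n_markers] = pos <= b->gap_start ? pos : pos + gap_size(b);
  return b->n_markers++;
}

/* Convert a marker's memory index back to a buffer position. */
size_t toy_marker_pos(const struct toy_buf *b, size_t m)
{
  size_t mem = b->marker_mem[m];
  return mem <= b->gap_start ? mem : mem - gap_size(b);
}
```

With a marker in the middle of some text, the first insertion at the start of the buffer moves the gap and adjusts that marker once; further insertions at the same spot adjust nothing, yet the marker's buffer position keeps advancing because the gap in front of it is shrinking.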
9181
9182 @node The Buffer Object, , Markers and Extents, Buffers
9183 @section The Buffer Object
9184 @cindex buffer object, the
9185 @cindex object, the buffer
9186
9187 Buffers contain fields not directly accessible by the Lisp programmer.
9188 We describe them here, naming them by the names used in the C code.
9189 Many are accessible indirectly in Lisp programs via Lisp primitives.
9190
9191 @table @code
9192 @item name
9193 The buffer name is a string that names the buffer. It is guaranteed to
9194 be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
9195 Manual}.
9196
9197 @item save_modified
9198 This field contains the time when the buffer was last saved, as an
9199 integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
9200 Manual}.
9201
9202 @item modtime
9203 This field contains the modification time of the visited file. It is
9204 set when the file is written or read. Every time the buffer is written
9205 to the file, this field is compared to the modification time of the
9206 file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
9207 Manual}.
9208
9209 @item auto_save_modified
9210 This field contains the time when the buffer was last auto-saved.
9211
9212 @item last_window_start
9213 This field contains the @code{window-start} position in the buffer as of
9214 the last time the buffer was displayed in a window.
9215
9216 @item undo_list
9217 This field points to the buffer's undo list. @xref{Undo,,, lispref,
9218 XEmacs Lisp Reference Manual}.
9219
9220 @item syntax_table_v
9221 This field contains the syntax table for the buffer. @xref{Syntax
9222 Tables,,, lispref, XEmacs Lisp Reference Manual}.
9223
9224 @item downcase_table
9225 This field contains the conversion table for converting text to lower
9226 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
9227
9228 @item upcase_table
9229 This field contains the conversion table for converting text to upper
9230 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
9231
9232 @item case_canon_table
9233 This field contains the conversion table for canonicalizing text for
9234 case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp
9235 Reference Manual}.
9236
9237 @item case_eqv_table
9238 This field contains the equivalence table for case-folding search.
9239 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
9240
9241 @item display_table
9242 This field contains the buffer's display table, or @code{nil} if it
9243 doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp
9244 Reference Manual}.
9245
9246 @item markers
9247 This field contains the chain of all markers that currently point into
9248 the buffer. Deletion of text in the buffer, and motion of the buffer's
9249 gap, must check each of these markers and perhaps update it.
9250 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
9251
9252 @item backed_up
9253 This field is a flag that tells whether a backup file has been made for
9254 the visited file of this buffer.
9255
9256 @item mark
9257 This field contains the mark for the buffer. The mark is a marker,
9258 hence it is also included on the list @code{markers}. @xref{The Mark,,,
9259 lispref, XEmacs Lisp Reference Manual}.
9260
9261 @item mark_active
9262 This field is non-@code{nil} if the buffer's mark is active.
9263
9264 @item local_var_alist
9265 This field contains the association list describing the variables local
9266 in this buffer, and their values, with the exception of local variables
9267 that have special slots in the buffer object. (Those slots are omitted
9268 from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
9269 Reference Manual}.
9270
9271 @item modeline_format
9272 This field contains a Lisp object which controls how to display the mode
9273 line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp
9274 Reference Manual}.
9275
9276 @item base_buffer
9277 This field holds the buffer's base buffer (if it is an indirect buffer),
9278 or @code{nil}.
9279 @end table
9280
9281 @node Text, Multilingual Support, Buffers, Top
9282 @chapter Text
9283 @cindex text
9284
9285 @menu
9286 * The Text in a Buffer:: Representation of the text in a buffer.
9287 * Ibytes and Ichars:: Representation of individual characters.
9288 * Byte-Char Position Conversion::
9289 * Searching and Matching:: Higher-level algorithms.
9290 @end menu
9291
9292 @node The Text in a Buffer, Ibytes and Ichars, Text, Text
8538 @section The Text in a Buffer 9293 @section The Text in a Buffer
8539 @cindex text in a buffer, the 9294 @cindex text in a buffer, the
8540 @cindex buffer, the text in a 9295 @cindex buffer, the text in a
8541 9296
8542 The text in a buffer consists of a sequence of zero or more 9297 The text in a buffer consists of a sequence of zero or more
8674 Ibytes underscores the fact that we are working with a string of bytes 9429 Ibytes underscores the fact that we are working with a string of bytes
8675 in the internal Emacs buffer representation rather than in one of a 9430 in the internal Emacs buffer representation rather than in one of a
8676 number of possible alternative representations (e.g. EUC-encoded text, 9431 number of possible alternative representations (e.g. EUC-encoded text,
8677 etc.). 9432 etc.).
8678 9433
8679 @node Buffer Lists 9434 @node Ibytes and Ichars, Byte-Char Position Conversion, The Text in a Buffer, Text
8680 @section Buffer Lists
8681 @cindex buffer lists
8682
8683 Recall earlier that buffers are @dfn{permanent} objects, i.e. that
8684 they remain around until explicitly deleted. This entails that there is
8685 a list of all the buffers in existence. This list is actually an
8686 assoc-list (mapping from the buffer's name to the buffer) and is stored
8687 in the global variable @code{Vbuffer_alist}.
8688
8689 The order of the buffers in the list is important: the buffers are
8690 ordered approximately from most-recently-used to least-recently-used.
8691 Switching to a buffer using @code{switch-to-buffer},
8692 @code{pop-to-buffer}, etc. and switching windows using
8693 @code{other-window}, etc. usually brings the new current buffer to the
8694 front of the list. @code{switch-to-buffer}, @code{other-buffer},
8695 etc. look at the beginning of the list to find an alternative buffer to
8696 suggest. You can also explicitly move a buffer to the end of the list
8697 using @code{bury-buffer}.
8698
8699 In addition to the global ordering in @code{Vbuffer_alist}, each frame
8700 has its own ordering of the list. These lists always contain the same
8701 elements as in @code{Vbuffer_alist} although possibly in a different
8702 order. @code{buffer-list} normally returns the list for the selected
8703 frame. This allows you to work in separate frames without things
8704 interfering with each other.
8705
8706 The standard way to look up a buffer given a name is
8707 @code{get-buffer}, and the standard way to create a new buffer is
8708 @code{get-buffer-create}, which looks up a buffer with a given name,
8709 creating a new one if necessary. These operations correspond exactly
8710 with the symbol operations @code{intern-soft} and @code{intern},
8711 respectively. You can also force a new buffer to be created using
8712 @code{generate-new-buffer}, which takes a name and (if necessary) makes
8713 a unique name from this by appending a number, and then creates the
8714 buffer. This is basically like the symbol operation @code{gensym}.
8715
8716 @node Markers and Extents
8717 @section Markers and Extents
8718 @cindex markers and extents
8719 @cindex extents, markers and
8720
8721 Among the things associated with a buffer are things that are
8722 logically attached to certain buffer positions. This can be used to
8723 keep track of a buffer position when text is inserted and deleted, so
8724 that it remains at the same spot relative to the text around it; to
8725 assign properties to particular sections of text; etc. There are two
8726 such objects that are useful in this regard: they are @dfn{markers} and
8727 @dfn{extents}.
8728
8729 A @dfn{marker} is simply a flag placed at a particular buffer
8730 position, which is moved around as text is inserted and deleted.
8731 Markers are used for all sorts of purposes, such as the @code{mark} that
8732 is the other end of textual regions to be cut, copied, etc.
8733
8734 An @dfn{extent} is similar to two markers plus some associated
8735 properties, and is used to keep track of regions in a buffer as text is
8736 inserted and deleted, and to add properties (e.g. fonts) to particular
8737 regions of text. The external interface of extents is explained
8738 elsewhere.
8739
8740 The important thing here is that markers and extents simply contain
8741 buffer positions in them as integers, and every time text is inserted or
8742 deleted, these positions must be updated. In order to minimize the
8743 amount of shuffling that needs to be done, the positions in markers and
8744 extents (there's one per marker, two per extent) are stored in Membpos's.
8745 This means that they only need to be moved when the text is physically
8746 moved in memory; since the gap structure tries to minimize this, it also
8747 minimizes the number of marker and extent indices that need to be
8748 adjusted. Look in @file{insdel.c} for the details of how this works.
8749
8750 One other important distinction is that markers are @dfn{temporary}
8751 while extents are @dfn{permanent}. This means that markers disappear as
8752 soon as there are no more pointers to them, and correspondingly, there
8753 is no way to determine what markers are in a buffer if you are just
8754 given the buffer. Extents remain in a buffer until they are detached
8755 (which could happen as a result of text being deleted) or the buffer is
8756 deleted, and primitives do exist to enumerate the extents in a buffer.
8757
8758 @node Ibytes and Ichars
8759 @section Ibytes and Ichars 9435 @section Ibytes and Ichars
8760 @cindex Ibytes and Ichars 9436 @cindex Ibytes and Ichars
8761 @cindex Ichars, Ibytes and 9437 @cindex Ichars, Ibytes and
8762 9438
8763 Not yet documented. 9439 Not yet documented.
8764 9440
8765 @node The Buffer Object 9441 @node Byte-Char Position Conversion, Searching and Matching, Ibytes and Ichars, Text
8766 @section The Buffer Object 9442 @section Byte-Char Position Conversion
8767 @cindex buffer object, the 9443 @cindex byte-char position conversion
8768 @cindex object, the buffer 9444 @cindex position conversion, byte-char
8769 9445 @cindex conversion, byte-char position
8770 Buffers contain fields not directly accessible by the Lisp programmer. 9446
8771 We describe them here, naming them by the names used in the C code. 9447 Oct 2004:
8772 Many are accessible indirectly in Lisp programs via Lisp primitives. 9448
8773 9449 This is what I wrote when describing the previous algorithm:
8774 @table @code 9450
8775 @item name 9451 @quotation
8776 The buffer name is a string that names the buffer. It is guaranteed to 9452 The basic algorithm we use is to keep track of a known region of
8777 be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Reference 9453 characters in each buffer, all of which are of the same width. We keep
8778 Manual}. 9454 track of the boundaries of the region in both Charbpos and Bytebpos
8779 9455 coordinates and also keep track of the char width, which is 1 - 4 bytes.
8780 @item save_modified 9456 If the position we're translating is not in the known region, then we
8781 This field contains the time when the buffer was last saved, as an 9457 invoke a function to update the known region to surround the position in
8782 integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference 9458 question. This assumes locality of reference, which is usually the
8783 Manual}. 9459 case.
8784 9460
8785 @item modtime 9461 Note that the function to update the known region can be simple or
8786 This field contains the modification time of the visited file. It is 9462 complicated depending on how much information we cache. In addition to
8787 set when the file is written or read. Every time the buffer is written 9463 the known region, we always cache the correct conversions for point,
8788 to the file, this field is compared to the modification time of the 9464 BEGV, and ZV, and in addition to this we cache 16 positions where the
8789 file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference 9465 conversion is known. We only look in the cache or update it when we
8790 Manual}. 9466 need to move the known region more than a certain amount (currently 50
8791 9467 chars), and then we throw away a "random" value and replace it with the
8792 @item auto_save_modified 9468 newly calculated value.
8793 This field contains the time when the buffer was last auto-saved. 9469
8794 9470 Finally, we maintain an extra flag that tracks whether the buffer is
8795 @item last_window_start 9471 entirely ASCII, to speed up the conversions even more. This flag is
8796 This field contains the @code{window-start} position in the buffer as of 9472 actually of dubious value because in an entirely-ASCII buffer the known
8797 the last time the buffer was displayed in a window. 9473 region will always span the entire buffer (in fact, we update the flag
8798 9474 based on this fact), and so all we're saving is a few machine cycles.
8799 @item undo_list 9475
8800 This field points to the buffer's undo list. @xref{Undo,,, lispref, 9476 A potentially smarter method than what we do with known regions and
8801 XEmacs Lisp Reference Manual}. 9477 cached positions would be to keep some sort of pseudo-extent layer over
8802 9478 the buffer; maybe keep track of the charbpos/bytebpos correspondence at
8803 @item syntax_table_v 9479 the beginning of each line, which would allow us to do a binary search
8804 This field contains the syntax table for the buffer. @xref{Syntax 9480 over the pseudo-extents to narrow things down to the correct line, at
8805 Tables,,, lispref, XEmacs Lisp Reference Manual}. 9481 which point you could use a linear movement method. This would also
8806 9482 mesh well with efficiently implementing a line-numbering scheme.
8807 @item downcase_table 9483 However, you have to weigh the amount of time spent updating the cache
8808 This field contains the conversion table for converting text to lower 9484 vs. the savings that result from it. In reality, we modify the buffer
8809 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. 9485 far less often than we access it, so a cache of this sort that provides
8810 9486 guaranteed LOG (N) performance (or perhaps N * LOG (N), if we set a
8811 @item upcase_table 9487 maximum on the cache size) would indeed be a win, particularly in very
8812 This field contains the conversion table for converting text to upper 9488 large buffers. If we ever implement this, we should probably set a
8813 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. 9489 reasonably high minimum below which we use the old method, because the
8814 9490 time spent updating the fancy cache would likely become dominant when
8815 @item case_canon_table 9491 making buffer modifications in smaller buffers.
8816 This field contains the conversion table for canonicalizing text for 9492
8817 case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp 9493 Note also that we have to multiply or divide by the char width in order
8818 Reference Manual}. 9494 to convert the positions. We do some tricks to avoid ever actually
8819 9495 having to do a multiply or divide, because that is typically an
8820 @item case_eqv_table 9496 expensive operation (esp. divide). Multiplying or dividing by 1, 2, or
8821 This field contains the equivalence table for case-folding search. 9497 4 can be implemented simply as a shift left or shift right, and we keep
8822 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. 9498 track of a shifter value (0, 1, or 2) indicating how much to shift.
8823 9499 Multiplying by 3 can be implemented by doubling and then adding the
8824 @item display_table 9500 original value. Dividing by 3, alas, cannot be implemented in any
8825 This field contains the buffer's display table, or @code{nil} if it 9501 simple shift/subtract method, as far as I know; so we just do a table
8826 doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp 9502 lookup. For simplicity, we use a table of size 128K, which indexes the
8827 Reference Manual}. 9503 "divide-by-3" values for the first 64K non-negative numbers. (Note that
8828 9504 we can increase the size up to 384K, i.e. indexing the first 192K
8829 @item markers 9505 non-negative numbers, while still using shorts in the array.) This also
8830 This field contains the chain of all markers that currently point into 9506 means that the size of the known region can be at most 64K for
8831 the buffer. Deletion of text in the buffer, and motion of the buffer's 9507 width-three characters.
8832 gap, must check each of these markers and perhaps update it. 9508 @end quotation
8833 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}. 9509
8834 9510 Unfortunately, it turned out that the implementation had serious problems
8835 @item backed_up 9511 which had never been corrected. In particular, the known region had a
8836 This field is a flag that tells whether a backup file has been made for 9512 large tendency to become zero-length and stay that way.
8837 the visited file of this buffer. 9513
8838 9514 So I decided to port the algorithm from FSF 21.3, in markers.c.
8839 @item mark 9515
8840 This field contains the mark for the buffer. The mark is a marker, 9516 This algorithm is fairly simple. Instead of using markers I kept the cache
8841 hence it is also included on the list @code{markers}. @xref{The Mark,,, 9517 array of known positions from the previous implementation.
8842 lispref, XEmacs Lisp Reference Manual}. 9518
8843 9519 Basically, we keep a number of positions cached:
8844 @item mark_active 9520
8845 This field is non-@code{nil} if the buffer's mark is active. 9521 @itemize @bullet
8846 9522 @item
8847 @item local_var_alist 9523 the actual end of the buffer
8848 This field contains the association list describing the variables local 9524 @item
8849 in this buffer, and their values, with the exception of local variables 9525 the beginning and end of the accessible region
8850 that have special slots in the buffer object. (Those slots are omitted 9526 @item
8851 from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp 9527 the value of point
8852 Reference Manual}. 9528 @item
8853 9529 the position of the gap
8854 @item modeline_format 9530 @item
8855 This field contains a Lisp object which controls how to display the mode 9531 the last value we computed
8856 line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp 9532 @item
8857 Reference Manual}. 9533 a set of positions that are "far away" from previously computed positions
8858 9534 (5000 chars currently; #### perhaps should be smaller)
8859 @item base_buffer 9535 @end itemize
8860 This field holds the buffer's base buffer (if it is an indirect buffer), 9536
8861 or @code{nil}. 9537 For each position, we @code{CONSIDER()} it. This means:
8862 @end table 9538
8863 9539 @itemize @bullet
8864 @node Searching and Matching 9540 @item
9541 If the position is what we're looking for, return it directly.
9542 @item
9543 Starting with the beginning and end of the buffer, we successively
9544 compute the smallest enclosing range of known positions. If at any
9545 point we discover that this range has the same byte and char length
9546 (i.e. is entirely single-byte), then our computation is trivial.
9547 @item
9548 If at any point we get a small enough range (50 chars currently),
9549 stop considering further positions.
9550 @end itemize
9551
9552 Otherwise, once we have an enclosing range, see which side is closer, and
9553 iterate until we find the desired value. As an optimization, I replaced
9554 the simple loop in FSF with the use of @code{bytecount_to_charcount()},
9555 @code{charcount_to_bytecount()}, @code{bytecount_to_charcount_down()}, or
9556 @code{charcount_to_bytecount_down()}. (The latter two I added for this purpose.)
9557 These scan 4 or 8 bytes at a time through purely single-byte characters.
9558
9559 If the amount we had to scan was more than our "far away" distance (5000
9560 characters, see above), then cache the new position.
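
The lookup described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the XEmacs code: the type @code{toy_buffer}, the function @code{char_to_byte()}, the round-robin cache replacement, and the rule that any byte below 0xA0 starts a character are all hypothetical choices made for the example. The 50-character and 5000-character thresholds are the ones quoted in the text. For brevity the sketch scans one character at a time rather than 4 or 8 bytes at a time.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_RANGE 50   /* stop considering once the range is this small */
#define FAR_AWAY 5000    /* cache a result only after scanning this far */
#define CACHE_SIZE 16    /* hypothetical cache capacity */

typedef struct { size_t charpos, bytepos; } known_pos;

typedef struct {
  const unsigned char *text;
  size_t nchars, nbytes;
  known_pos cache[CACHE_SIZE];
  int ncached;    /* number of valid cache entries */
  int next_slot;  /* assumed round-robin replacement */
} toy_buffer;

/* Assumption for this sketch: trailing bytes are 0xA0..0xFF. */
static int starts_char(unsigned char b) { return b < 0xA0; }

static size_t char_to_byte(toy_buffer *buf, size_t charpos)
{
  known_pos lo = { 0, 0 };
  known_pos hi;
  size_t bytepos, c, scanned;
  int i;

  assert(charpos <= buf->nchars);
  hi.charpos = buf->nchars;
  hi.bytepos = buf->nbytes;

  /* i == -1 tests the initial whole-buffer range before any cache entry. */
  for (i = -1; i < buf->ncached; i++) {
    if (i >= 0) {
      known_pos k = buf->cache[i];           /* "consider" this position */
      if (k.charpos == charpos) return k.bytepos;
      if (k.charpos < charpos && k.charpos > lo.charpos) lo = k;
      if (k.charpos > charpos && k.charpos < hi.charpos) hi = k;
    }
    /* Same byte and char length: the range is entirely single-byte. */
    if (hi.bytepos - lo.bytepos == hi.charpos - lo.charpos)
      return lo.bytepos + (charpos - lo.charpos);
    if (hi.charpos - lo.charpos <= SMALL_RANGE) break;
  }

  if (charpos - lo.charpos <= hi.charpos - charpos) {
    /* The lower bound is closer: scan forward. */
    c = lo.charpos; bytepos = lo.bytepos;
    while (c < charpos) {
      bytepos++;
      while (bytepos < buf->nbytes && !starts_char(buf->text[bytepos]))
        bytepos++;
      c++;
    }
    scanned = charpos - lo.charpos;
  } else {
    /* The upper bound is closer: scan backward. */
    c = hi.charpos; bytepos = hi.bytepos;
    while (c > charpos) {
      bytepos--;
      while (!starts_char(buf->text[bytepos]))
        bytepos--;
      c--;
    }
    scanned = hi.charpos - charpos;
  }

  if (scanned > FAR_AWAY) {                  /* remember "far away" results */
    buf->cache[buf->next_slot].charpos = charpos;
    buf->cache[buf->next_slot].bytepos = bytepos;
    buf->next_slot = (buf->next_slot + 1) % CACHE_SIZE;
    if (buf->ncached < CACHE_SIZE) buf->ncached++;
  }
  return bytepos;
}
```

Note how the single-byte test makes an all-ASCII buffer a constant-time case without any separate flag: the initial range from the start to the end of the buffer already has equal byte and character lengths.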
9561
9562 #### Things to do:
9563
9564 @itemize @bullet
9565 @item
9566 Look at the most recent GNU Emacs to see whether anything has changed.
9567 @item
9568 Think about whether it makes sense to try to implement some sort of
9569 known region or list of "known regions", like we had before. This would
9570 be a region of entirely single-byte characters that we can check very
9571 quickly. (Previously I used a range of same-width characters of any
9572 size; but this adds extra complexity and slows down the scanning, and is
9573 probably not worth it.) As part of the scanning process in
9574 @code{bytecount_to_charcount()} et al, we skip over chunks of entirely
9575 single-byte chars, so it should be easy to remember the last one.
9576 Presumably what we should do is keep track of the largest known surrounding
9577 entirely-single-byte region for each of the cache positions as well as
9578 perhaps the last-cached position. We want to be careful not to get bitten
9579 by the previous problem of having the known region getting reset too
9580 often. If we implement this, we might well want to continue scanning
9581 some distance past the desired position (maybe 300-1000 bytes) if we are
9582 in a single-byte range so that we won't end up expanding the known range
9583 one position at a time and entering the function each time.
9584 @item
9585 Think about whether it makes sense to keep the position cache sorted.
9586 This would allow it to be larger and finer-grained in its positions.
9587 Note that with FSF's use of markers, they were sorted, but this
9588 was not really made good use of. With an array, we can do binary searching
9589 to quickly find the smallest range. We would probably want to make use of
9590 the gap-array code in extents.c.
9591 @end itemize
9592
9593 Note that FSF's algorithm checked @strong{ALL} markers, not just the ones cached
9594 by this algorithm. This includes markers created by the user as well as
9595 both ends of any overlays. We could do similarly, and our extents could
9596 keep both byte and character positions rather than just the former. (But
9597 this would probably be overkill. We should just use our cache instead.
9598 Any place an extent was set was surely already visited by the char<-->byte
9599 conversion routines.)
9600
9601 @node Searching and Matching, , Byte-Char Position Conversion, Text
8865 @section Searching and Matching 9602 @section Searching and Matching
8866 @cindex searching 9603 @cindex searching
8867 @cindex matching 9604 @cindex matching
8868 9605
8869 Very incomplete, limited to a brief introduction. 9606 Very incomplete, limited to a brief introduction.
9080 @end enumerate 9817 @end enumerate
9081 9818
9082 But if you keep your eye on the "switch in a loop" structure, you 9819 But if you keep your eye on the "switch in a loop" structure, you
9083 should be able to understand the parts you need. 9820 should be able to understand the parts you need.
9084 9821
9085 9822 @node Multilingual Support, The Lisp Reader and Compiler, Text, Top
9086 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top 9823 @chapter Multilingual Support
9087 @chapter MULE Character Sets and Encodings
9088 @cindex Mule character sets and encodings 9824 @cindex Mule character sets and encodings
9089 @cindex character sets and encodings, Mule 9825 @cindex character sets and encodings, Mule
9090 @cindex encodings, Mule character sets and 9826 @cindex encodings, Mule character sets and
9827
9828 @emph{NOTE}: There is a great deal of overlapping and redundant
9829 information in this chapter. Ben wrote introductions to Mule issues a
9830 number of times, each time not realizing that he had already written
9831 another introduction previously. Hopefully, in time these will all be
9832 integrated.
9833
9834 @emph{NOTE}: The information at the top of the source file
9835 @file{text.c} is more complete than the following, and there is also a
9836 list of all other places to look for text/I18N-related info. Also look in
9837 @file{text.h} for info about the DFC and Eistring API's.
9091 9838
9092 Recall that there are two primary ways that text is represented in 9839 Recall that there are two primary ways that text is represented in
9093 XEmacs. The @dfn{buffer} representation sees the text as a series of 9840 XEmacs. The @dfn{buffer} representation sees the text as a series of
9094 bytes (Ibytes), with a variable number of bytes used per character. 9841 bytes (Ibytes), with a variable number of bytes used per character.
9095 The @dfn{character} representation sees the text as a series of integers 9842 The @dfn{character} representation sees the text as a series of integers
9100 Lisp strings and buffers, and because of this, it is the ``default'' 9847 Lisp strings and buffers, and because of this, it is the ``default''
9101 representation that text comes in. The reason for using this 9848 representation that text comes in. The reason for using this
9102 representation is that it's compact and is compatible with ASCII. 9849 representation is that it's compact and is compatible with ASCII.
9103 9850
9104 @menu 9851 @menu
9105 * Character Sets:: 9852 * Introduction to Multilingual Issues #1::
9106 * Encodings:: 9853 * Introduction to Multilingual Issues #2::
9107 * Internal Mule Encodings:: 9854 * Introduction to Multilingual Issues #3::
9108 * CCL:: 9855 * Introduction to Multilingual Issues #4::
9856 * Character Sets::
9857 * Encodings::
9858 * Internal Mule Encodings::
9859 * Byte/Character Types; Buffer Positions; Other Typedefs::
9860 * Internal Text API's::
9861 * Coding for Mule::
9862 * CCL::
9863 * Modules for Internationalization::
9109 @end menu 9864 @end menu
9110 9865
9111 @node Character Sets 9866 @node Introduction to Multilingual Issues #1, Introduction to Multilingual Issues #2, Multilingual Support, Multilingual Support
9867 @section Introduction to Multilingual Issues #1
9868 @cindex introduction to multilingual issues #1
9869
9870 There is an introduction to these issues in the Lisp Reference manual.
9871 @xref{Internationalization Terminology,,, lispref, XEmacs Lisp Reference
9872 Manual}. Among other documentation that may be of interest to internals
9873 programmers is ISO-2022 (@pxref{ISO 2022,,, lispref, XEmacs Lisp
9874 Reference Manual}) and CCL (@pxref{CCL,,, lispref, XEmacs Lisp Reference
9875 Manual}).
9876
9877 @node Introduction to Multilingual Issues #2, Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #1, Multilingual Support
9878 @section Introduction to Multilingual Issues #2
9879 @cindex introduction to multilingual issues #2
9880
9881 @subheading Introduction
9882
9883 This document covers a number of design issues, problems and proposals
9884 with regard to XEmacs MULE. First we present some definitions and
9885 some aspects of the design that have been agreed upon. Then we present
9886 some issues and problems that need to be addressed, and then I include a
9887 proposal of mine to address some of these issues. When there are other
9888 proposals, for example from Olivier, these will be appended to the end
9889 of this document.
9890
9891 @subheading Definitions and Design Basics
9892
9893 First, @dfn{text} is defined to be a series of characters which together
9894 defines an utterance or partial utterance in some language.
9895 Generally, this language is a human language, but it may also be a
9896 computer language if the computer language uses a representation close
9897 enough to that of human languages for it to also make sense to call its
9898 representation text. Text is opposed to @dfn{binary}, which is a sequence
9899 of bytes, representing machine-readable but not human-readable data.
9900 A @dfn{byte} is merely a number within a predefined range, which nowadays is
9901 nearly always zero to 255. A @dfn{character} is a unit of text. What makes
9902 one character different from another is not always clear-cut. It is
9903 generally related to the appearance of the character, although perhaps
9904 not any possible appearance of that character, but some sort of ideal
9905 appearance that is assigned to a character. Whether two characters
9906 that look very similar are actually the same depends on various
9907 factors, including political ones and whether the characters are
9908 used to mean similar sorts of things, or behave similarly in similar
9909 contexts. In any case, it is not always clearly defined whether two
9910 characters are actually the same or not. In practice, however, this
9911 is more or less agreed upon.
9912
9913 A @dfn{character set} is just that, a set of one or more characters.
9914 The set is unique in that there will not be more than one instance of
9915 the same character in a character set, and logically is unordered,
9916 although an order is often imposed or suggested for the characters in
9917 the character set. We can also define an @dfn{order} on a character
9918 set, which is a way of assigning a unique number, or possibly a pair of
9919 numbers, or a triplet of numbers, or even a set of four or more numbers
9920 to each character in the character set. The combination of an order and
9921 a character set results in an @dfn{ordered character set}. In an
9922 ordered character set, there is an upper limit and a lower limit on the
9923 possible values that a character, or that any number within the set of
9924 numbers assigned to a character, can take. However, the lower limit
9925 does not have to start at zero or one, or anywhere else in particular,
9926 nor does the upper limit have to end anywhere particular, and there may
9927 be gaps within these ranges such that particular numbers or sets of
9928 numbers do not have a corresponding character, even though they are
9929 within the upper and lower limits. For example, @dfn{ASCII} defines a
9930 very standard ordered character set. It is normally defined to be 94
9931 characters in the range 33 through 126 inclusive on both ends, with
9932 every possible character within this range being actually present in the
9933 character set.
9934
9935 Sometimes the ASCII character set is extended to include what are called
9936 @dfn{non-printing characters}. Non-printing characters are characters
9937 which instead of really being displayed in a more or less rectangular
9938 block, like all other characters, instead indicate certain functions
9939 typically related to either control of the display upon which the
9940 characters are being displayed, or have some effect on a communications
9941 channel that may be currently open and transmitting characters, or may
9942 change the meaning of future characters as they are being decoded, or
9943 some other similar function. You might say that non-printing characters
9944 are somewhat of a hack because they are a special exception to the
9945 standard concept of a character as being a printed glyph that has some
9946 direct correspondence in the non-computer world.
9947
9948 With non-printing characters in mind, the 94-character ordered character
9949 set called ASCII is often extended into a 96-character ordered character
9950 set, also often called ASCII, which includes in addition to the 94
9951 characters already mentioned, two non-printing characters, one called
9952 space and assigned the number 32, just below the bottom of the previous
9953 range, and another called @dfn{delete} or @dfn{rubout}, which is given
9954 number 127 just above the end of the previous range. Thus to reiterate,
9955 the result is a 96-character ordered character set, whose characters
9956 take the values from 32 to 127 inclusive. Sometimes ASCII is further
9957 extended to contain 32 more non-printing characters, which are given the
9958 numbers zero through 31 so that the result is a 128-character ordered
9959 character set with characters numbered zero through 127, and with many
9960 non-printing characters. Another way to look at this, and the way that
9961 is normally taken by XEmacs MULE, is that the characters that would be
9962 in the range zero through 31 in the most extended definition of ASCII,
9963 instead form their own ordered character set, which is called
9964 @dfn{control zero}, and consists of 32 characters in the range zero
9965 through 31. A similar ordered character set called @dfn{control one} is
9966 also created, and it contains 32 more non-printing characters in the
9967 range 128 through 159. Note that none of these three ordered character
9968 sets overlaps in any of the numbers assigned to their
9969 characters, so they can all be used at once. Note further that the same
9970 character can occur in more than one character set. This was shown
9971 above, for example, in two different ordered character sets we defined,
9972 one of which we could have called @dfn{ASCII}, and the other
9973 @dfn{ASCII-extended}, to show that it had been extended by two non-printing
9974 characters. Most of the characters in these two character sets are
9975 shared and present in both of them.
9976
9977 Note that there is no restriction on the size of the character set, or
9978 on the numbers that are assigned to characters in an ordered character
9979 set. It is often extremely useful to represent a sequence of characters
9980 as a sequence of bytes, where a byte as defined above is a number in the
9981 range zero to 255. An @dfn{encoding} does precisely this. It is simply
9982 a mapping from a sequence of characters, possibly augmented with
9983 information indicating the character set that each of these characters
9984 belongs to, to a sequence of bytes which represents that sequence of
9985 characters and no other -- which is to say the mapping is reversible.
9986
9987 A @dfn{coding system} is a set of rules for encoding a sequence of
9988 characters augmented with character set information into a sequence of
9989 bytes, and later performing the reverse operation. It is frequently
9990 possible to group coding systems into classes or types based on common
9991 features. Typically, for example, a particular coding system class
9992 may contain a base coding system which specifies some of the rules,
9993 but leaves the rest unspecified. Individual members of the coding
9994 system class are formed by starting with the base coding system, and
9995 augmenting it with additional rules to produce a particular coding
9996 system, what you might think of as a sort of variation within a
9997 theme.
9998
9999 @subheading XEmacs Specific Definitions
10000
10001 First of all, in XEmacs, the concept of character is a little different
10002 from the general definition given above. For one thing, the character
10003 set that a character belongs to may or may not be an inherent part of
10004 the character itself. In other words, the same character occurring in
10005 two different character sets may appear in XEmacs as two different
10006 characters. This is generally the case now, but we are attempting to
10007 move in the other direction. Different proposals may have different
10008 ideas about exactly the extent to which this change will be carried out.
10009 The general trend, though, is to represent all information about a
10010 character other than the character itself, using text properties
10011 attached to the character. That way two instances of the same character
10012 will look the same to lisp code that merely retrieves the character, and
10013 does not also look at the text properties of that character. Everyone
10014 involved is in agreement in doing it this way with all Latin characters,
10015 and in fact for all characters other than Chinese, Japanese, and Korean
10016 ideographs. For those, there may be a difference of opinion.
10017
10018 A second difference between the general definition of character and the
10019 XEmacs usage of character is that each character is assigned a unique
10020 number that distinguishes it from all other characters in the world, or
10021 at the very least, from all other characters currently existing anywhere
10022 inside the current XEmacs invocation. (If there is a case where the
10023 weaker statement applies, but not the stronger statement, it would
10024 possibly be with composite characters and any other such characters that
10025 are created on the sly.)
10026
10027 This unique number is called the @dfn{character representation} of the
10028 character, and its particular details are a matter of debate. There is
10029 a current standard in use, but it is undoubtedly going to change.
10030 What has definitely been agreed upon is that it will be an integer, more
10031 specifically a positive integer, represented with less than or equal to
10032 31 bits on a 32-bit architecture, and possibly up to 63 bits on a 64-bit
10033 architecture, with the proviso that any characters whose
10034 representation would fit in a 64-bit architecture, but not on a 32-bit
10035 architecture, would be used only for composite characters, and others
10036 that would satisfy the weak uniqueness property mentioned above, but not
10037 with the strong uniqueness property.
10038
10039 At this point, it is useful to talk about the different representations
10040 that a sequence of characters can take. The simplest representation is
10041 simply as a sequence of characters, and this is called the @dfn{Lisp
10042 representation} of text, because it is the representation that Lisp
10043 programs see. Other representations include the external
10044 representation, which refers to any encoding of the sequence of
10045 characters, using the definition of encoding mentioned above.
10046 Typically, text in the external representation is used outside of
10047 XEmacs, for example in files, e-mail messages, web sites, and the like.
10048 Another representation for a sequence of characters is what I will call
10049 the @dfn{byte representation}, and it represents the way that XEmacs
10050 internally represents text in a buffer, or in a string. Potentially,
10051 the representation could be different between a buffer and a string, and
10052 then the terms @dfn{buffer byte representation} and @dfn{string byte
10053 representation} would be used, but in practice I don't think this will
10054 occur. It will be possible, of course, for buffers and strings, or
10055 particular buffers and particular strings, to contain different
10056 sub-representations of a single representation. For example, Olivier's
10057 1-2-4 proposal allows for three sub-representations of his internal byte
10058 representation, allowing for 1-byte, 2-byte, and 4-byte width
10059 characters respectively. A particular string may be in one
10060 sub-representation, and a particular buffer in another
10061 sub-representation, but overall both are following the same byte
10062 representation. I do not use the term @dfn{internal representation}
10063 here, as many people have, because it is potentially ambiguous.
10064
10065 Another representation is called the @dfn{array of characters
10066 representation}. This is a representation on the C-level in which the
10067 sequence of text is represented, not using the byte representation, but
10068 by using an array of characters, each represented using the character
10069 representation. This sort of representation is often used by redisplay
10070 because it is more convenient to work with than any of the other
10071 internal representations.
10072
10073 The term @dfn{binary representation} may also be heard. Binary
10074 representation is used to represent binary data. When binary data is
10075 represented in the lisp representation, an equivalence is simply set up
10076 between bytes zero through 255, and characters zero through 255. These
10077 characters come from four character sets, which are from bottom to top,
10078 control zero, ASCII, control 1, and Latin 1. Together, they comprise
10079 256 characters, and are a good mapping for the 256 possible bytes in a
10080 binary representation. Binary representation could also be used to
10081 refer to an external representation of the binary data, which is a
10082 simple direct byte-to-byte representation. No internal representation
10083 should ever be referred to as a binary representation because of
10084 ambiguity. The terms character set and coding system were defined
10085 generally, above. In XEmacs, the equivalent concepts exist, although
10086 character set has been shortened to charset, and in fact represents
10087 specifically an ordered character set. For each possible charset, and
10088 for each possible coding system, there is an associated object in
10089 XEmacs. These objects will be of type charset and coding system,
10090 respectively. Charsets and coding systems are divided into classes, or
10091 @dfn{types}, the normal term under XEmacs, and all possible charsets and
10092 coding systems that may be defined must be in one of these types. If
10093 you need to create a charset or coding system that is not one of these
10094 types, you will have to modify the C code to support this new type.
10095 Some of the existing or soon-to-be-created types are, or will be,
10096 generic enough so that this shouldn't be an issue. Note also that the
10097 byte encoding for text and the character coding of a character are
10098 closely related. You might say that ideally each is the simplest
10099 equivalent of the other given the general constraints on each
10100 representation.
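
The equivalence between bytes and characters zero through 255 described above can be sketched in C. The ranges follow the text (control zero is 0 through 31, the 96-character form of ASCII is 32 through 127, control one is 128 through 159, and Latin 1 takes the remaining 160 through 255); the enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical names for the four charsets, from bottom to top. */
typedef enum {
  CS_CONTROL_0,  /* 0..31 */
  CS_ASCII,      /* 32..127, the 96-character form of ASCII */
  CS_CONTROL_1,  /* 128..159 */
  CS_LATIN_1     /* 160..255 */
} toy_charset;

/* Which charset does the character standing for this byte come from? */
static toy_charset charset_of_binary_byte(unsigned char b)
{
  if (b < 32)  return CS_CONTROL_0;
  if (b < 128) return CS_ASCII;
  if (b < 160) return CS_CONTROL_1;
  return CS_LATIN_1;
}

/* Byte N corresponds to character number N, so mapping back is an
   identity that is only defined for characters 0..255. */
static unsigned char binary_byte_of_char(int ch)
{
  assert(ch >= 0 && ch <= 255);
  return (unsigned char) ch;
}
```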
10101
10102 To be specific, in the current MULE representation,
10103
10104 @enumerate
10105 @item
10106 Characters encode both the character itself and the character set
10107 that it comes from. These character sets are always assumed to be
10108 representable as an ordered character set of size 96 or of size 96
10109 by 96, or the trivially-related sizes 94 and 94 by 94. The only
10110 allowable exceptions are the control zero and control one character
10111 sets, which are of size 32. Character sets which do not naturally
10112 have a compatible ordering such as this are shoehorned into an
10113 ordered character set, or possibly two ordered character sets of a
10114 compatible size.
10115 @item
10116 The variable width byte representation was deliberately chosen to
10117 allow scanning text forwards and backwards efficiently. This
10118 necessitated defining the possible bytes into three ranges which
10119 we shall call A, B, and C. Range A is used exclusively for
10120 single-byte characters, which is to say characters that are
10121 represented using only one contiguous byte. Multi-byte
10122 characters are always represented by using one byte from Range B,
10123 followed by one or more bytes from Range C. What this means is
10124 that bytes that begin a character are unequivocally distinguished
10125 from bytes that do not begin a character, and therefore there is
10126 never a problem scanning backwards and finding the beginning of a
10127 character. Note that UTF-8 adopts a proposal that is very similar
10128 in spirit in that it uses separate ranges for the first byte of a
10129 multi-byte sequence, and the following bytes in a multi-byte
10130 sequence.
10131 @item
10132 Given the fact that all ordered character sets allowed were
10133 essentially 96 characters per dimension, it made perfect sense to
10134 make Range C comprise 96 bytes. With a little more tweaking, the
10135 currently-standard MULE byte representation was created, and was
10136 drafted from this.
10137 @item
10138 The MULE byte representation defined four basic representations for
10139 characters, which would take up from one to four bytes,
10140 respectively. The MULE character representation thus had the
10141 following constraints:
10142 @enumerate
10143 @item
10144 Character numbers zero through 255 should represent the
10145 characters that binary values zero through 255 would be
10146 mapped onto. (Note: this was not the case in Kenichi Handa's
10147 version of this representation, but I changed it.)
10148 @item
10149 The four sub-classes of representation in the MULE byte
10150 representation should correspond to four contiguous
10151 non-overlapping ranges of characters.
10152 @item
10153 The algorithmic conversion between the single character
10154 represented in the byte representation and in the character
10155 representation should be as easy as possible.
10156 @item
10157 Given the previous constraints, the character representation
10158 should be as compact as possible, which is to say it should
10159 use the fewest bits possible.
10160 @end enumerate
10161 @end enumerate
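
The backwards-scanning property described above can be sketched as follows. This is a minimal illustration and not the XEmacs implementation: the numeric boundaries of Ranges A, B, and C (0x00-0x7F, 0x80-0x9F, and 0xA0-0xFF) are assumptions chosen so that Range C has 96 values, and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumed ranges (illustrative only):
   Range A: 0x00-0x7F  single-byte characters
   Range B: 0x80-0x9F  first byte of a multi-byte character
   Range C: 0xA0-0xFF  following bytes (96 values) */
static int starts_character(unsigned char b)
{
    return b < 0xA0;   /* Range A or Range B */
}

/* Return the index of the first byte of the character that
   contains text[pos], by scanning backwards over Range C bytes. */
static size_t char_start(const unsigned char *text, size_t pos)
{
    while (pos > 0 && !starts_character(text[pos]))
        pos--;
    return pos;
}
```

Because no Range C byte can ever begin a character, the loop needs no knowledge of the text before the position it starts from.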
10162
10163 So you see that the entire structure of the byte and character
10164 representations stemmed from a very small number of basic choices,
10165 which were
10166
10167 @enumerate
10168 @item
10169 the choice to encode character set information in a character
10170 @item
10171 the choice to assume that all character sets would have an order
10172 imposed upon them with 96 characters per one or two
10173 dimensions. (This is less arbitrary than it seems--it follows
10174 ISO-2022)
10175 @item
10176 the choice to use a variable width byte representation.
10177 @end enumerate
10178
10179 What this means is that you cannot really separate the byte
10180 representation, the character representation, and the assumptions made
10181 about characters and whether they represent character sets from each
10182 other. All of these are closely intertwined, and for purposes of
10183 simplicity, they should be designed together. If you change one
10184 representation without changing another, you are in essence creating a
10185 completely new design with its own attendant problems--since your new
10186 design is likely to be quite complex and not very coherent with
10187 regard to the translation between the character and byte
10188 representations, you are likely to run into problems.
10189
10190 @node Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #4, Introduction to Multilingual Issues #2, Multilingual Support
10191 @section Introduction to Multilingual Issues #3
10192 @cindex introduction to multilingual issues #3
10193
10194 In XEmacs, Mule is a code word for the support for input handling and
10195 display of multi-lingual text. This section provides an overview of how
10196 this support impacts the C and Lisp code in XEmacs. It is important for
10197 anyone who works on the C or the Lisp code, especially on the C code, to
10198 be aware of these issues, even if they don't work directly on code that
10199 implements multi-lingual features, because there are various general
10200 procedures that need to be followed in order to write Mule-compliant
10201 code. (The specifics of these procedures are documented elsewhere in
10202 this manual.)
10203
10204 There are four primary aspects of Mule support:
10205
10206 @enumerate
10207 @item
10208 Internal handling and representation of multi-lingual text.
10209 @item
10210 Conversion between the internal representation of text and the various
10211 external representations in which multi-lingual text is encoded, such as
10212 Unicode representations (including mostly fixed width encodings such as
10213 UCS-2/UTF-16 and UCS-4 and variable width ASCII conformant encodings,
10214 such as UTF-7 and UTF-8); the various ISO2022 representations, which
10215 typically use escape sequences to switch between different character
10216 sets (such as Compound Text, used under X Windows; JIS, used
10217 specifically for encoding Japanese; and EUC, a non-modal encoding used
10218 for Japanese, Korean, and certain other languages); Microsoft's
10219 multi-byte encodings (such as Shift-JIS); various simple encodings for
10220 particular 8-bit character sets (such as Latin-1 and Latin-2, and
10221 encodings (such as koi8 and Alternativny) for Cyrillic); and others.
10222 This conversion needs to happen both for text in files and text sent to
10223 or retrieved from system API calls. It even needs to happen for
10224 external binary data because the internal representation does not
10225 represent binary data simply as a sequence of bytes as it is represented
10226 externally.
10227 @item
10228 Proper display of multi-lingual characters.
10229 @item
10230 Input of multi-lingual text using the keyboard.
10231 @end enumerate
10232
10233 These four aspects are for the most part independent of each other.
10234
10235 @subheading Characters, Character Sets, and Encodings
10236
10237 A @dfn{character} (which is, BTW, a surprisingly complex concept) is, in
10238 a written representation of text, the most basic written unit that has a
10239 meaning of its own. It's comparable to a phoneme when analyzing words
10240 in spoken speech (for example, the sound of @samp{t} in English, which
10241 in fact has different pronunciations in different words -- aspirated in
10242 @samp{time}, unaspirated in @samp{stop}, unreleased or even pronounced
10243 as a glottal stop in @samp{button}, etc. -- but logically is a single
10244 concept). Like a phoneme, a character is an abstract concept defined by
10245 its @emph{meaning}. The character @samp{lowercase f}, for example, can
10246 always be used to represent the first letter in the word @samp{fill},
10247 regardless of whether it's drawn upright or italic, whether the
10248 @samp{fi} combination is drawn as a single ligature, whether there are
10249 serifs on the bottom of the vertical stroke, etc. (These different
10250 appearances of a single character are often called @dfn{graphs} or
10251 @dfn{glyphs}.) Our concern when representing text is on representing the
10252 abstract characters, and not on their exact appearance.
10253
10254 A @dfn{character set} (or @dfn{charset}), as we define it, is a set of
10255 characters, each with an associated number (or set of numbers -- see
10256 below), called a @dfn{code point}. It's important to understand that a
10257 character is not defined by any number attached to it, but by its
10258 meaning. For example, ASCII and EBCDIC are two charsets containing
10259 exactly the same characters (lowercase and uppercase letters, numbers 0
10260 through 9, particular punctuation marks) but with different
10261 numberings. The `comma' character in ASCII and EBCDIC, for instance, is
10262 the same character despite having a different numbering. Conversely,
10263 when comparing ASCII and JIS-Roman, which look the same except that the
10264 latter has a yen sign substituted for the backslash, we would say that
10265 the backslash and yen sign are @strong{not} the same characters, despite having
10266 the same number (92) and despite the fact that all other characters are
10267 present in both charsets, with the same numbering. ASCII and JIS-Roman,
10268 then, do @emph{not} have exactly the same characters in them (ASCII has
10269 a backslash character but no yen-sign character, and vice-versa for
10270 JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII
10271 and JIS-Roman are closer.
10272
10273 It's also important to distinguish between charsets and encodings. For
10274 a simple charset like ASCII, there is only one encoding normally used --
10275 each character is represented by a single byte, with the same value as
10276 its code point. For more complicated charsets, however, things are not
10277 so obvious. Unicode version 2, for example, is a large charset with
10278 thousands of characters, each indexed by a 16-bit number, often
10279 represented in hex, e.g. 0x05D0 for the Hebrew letter "aleph". One
10280 obvious encoding uses two bytes per character (actually two encodings,
10281 depending on which of the two possible byte orderings is chosen). This
10282 encoding is convenient for internal processing of Unicode text; however,
10283 it's incompatible with ASCII, so a different encoding, e.g. UTF-8, is
10284 usually used for external text, for example files or e-mail. UTF-8
10285 represents Unicode characters with one to three bytes (often extended to
10286 six bytes to handle characters with up to 31-bit indices). Unicode
10287 characters 00 to 7F (identical with ASCII) are directly represented with
10288 one byte, and other characters with two or more bytes, each in the range
10289 80 to FF.
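
As a worked illustration of the UTF-8 scheme just described, the following sketch encodes a 16-bit code point in one to three bytes. The function name is hypothetical, and the sketch deliberately ignores surrogate code points and anything above 0xFFFF.

```c
#include <stddef.h>

/* Encode a code point in the range 0x0000-0xFFFF as UTF-8.
   Writes 1 to 3 bytes to out and returns the number written.
   The caller must ensure cp <= 0xFFFF. */
static size_t utf8_encode_bmp(unsigned int cp, unsigned char *out)
{
    if (cp < 0x80) {                      /* 00-7F: identical to ASCII */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}
```

For the Hebrew letter aleph (0x05D0) this yields the two bytes 0xD7 0x90, both in the range 80 to FF, so they cannot be confused with ASCII.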
10290
10291 In general, a single encoding may be able to represent more than one
10292 charset.
10293
10294 @subheading Internal Representation of Text
10295
10296 In an ASCII or single-European-character-set world, life is very simple.
10297 There are 256 characters, and each character is represented using the
10298 numbers 0 through 255, which fit into a single byte. With a few
10299 exceptions (such as case-changing operations or syntax classes like
10300 'whitespace'), "text" is simply an array of indices into a font. You
10301 can get different languages simply by choosing fonts with different
10302 8-bit character sets (ISO-8859-1, -2, special-symbol fonts, etc.), and
10303 everything will "just work" as long as anyone else receiving your text
10304 uses a compatible font.
10305
10306 In the multi-lingual world, however, it is much more complicated. There
10307 are a great number of different characters which are organized in a
10308 complex fashion into various character sets. The representation to use
10309 is not obvious because there are issues of size versus speed to
10310 consider. In fact, there are in general two kinds of representations to
10311 work with: one that represents a single character using an integer
10312 (possibly a byte), and the other representing a single character as a
10313 sequence of bytes. The former representation is normally called fixed
10314 width, and the other variable width. Both representations represent
10315 exactly the same characters, and the conversion from one representation
10316 to the other is governed by a specific formula (rather than by table
10317 lookup) but it may not be simple. Most C code need not, and in fact
10318 should not, know the specifics of exactly how the representations work.
10319 In fact, the code must not make assumptions about the representations.
10320 This means in particular that it must use the proper macros for
10321 retrieving the character at a particular memory location, determining
10322 how many characters are present in a particular stretch of text, and
10323 incrementing a pointer to a particular character to point to the
10324 following character, and so on. It must not assume that one character
10325 is stored using one byte, or even using any particular number of bytes.
10326 It must not assume that the number of characters in a stretch of text
10327 bears any particular relation to a number of bytes in that stretch. It
10328 must not assume that the character at a particular memory location can
10329 be retrieved simply by dereferencing the memory location, even if a
10330 character is known to be ASCII or is being compared with an ASCII
10331 character, etc. Careful coding is required to be Mule clean. The
10332 biggest work of adding Mule support, in fact, is converting all of the
10333 existing code to be Mule clean.
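
To make the byte-versus-character distinction concrete, here is a small sketch of the kind of work the access macros have to do. It is not the XEmacs macro set: the helper names are hypothetical, and it assumes a variable-width representation in which bytes 0xA0-0xFF only ever appear as non-initial bytes of a character.

```c
#include <stddef.h>

/* Assumption for this sketch: bytes 0xA0-0xFF never begin a character. */
static int is_continuation(unsigned char b)
{
    return b >= 0xA0;
}

/* Count the characters in a stretch of text of nbytes bytes.
   The result is generally smaller than nbytes. */
static size_t count_chars(const unsigned char *text, size_t nbytes)
{
    size_t n = 0;
    for (size_t i = 0; i < nbytes; i++)
        if (!is_continuation(text[i]))
            n++;
    return n;
}

/* Advance p, which must point at the start of a character,
   to the start of the following character (or to end). */
static const unsigned char *next_char(const unsigned char *p,
                                      const unsigned char *end)
{
    p++;
    while (p < end && is_continuation(*p))
        p++;
    return p;
}
```

Code that writes `p++` or uses a byte count as a character count silently depends on every character occupying one byte, which is exactly the assumption Mule-clean code must avoid.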
10334
10335 Lisp code is mostly unaffected by these concerns. Text in strings and
10336 buffers appears simply as a sequence of characters regardless of
10337 whether Mule support is present. The biggest difference with older
10338 versions of Emacs, as well as current versions of GNU Emacs, is that
10339 integers and characters are no longer equivalent, but are separate
10340 Lisp Object types.
10341
10342 @subheading Conversion Between Internal and External Representations
10343
10344 All text needs to be converted to an external representation before being
10345 sent to a function or file, and all text retrieved from a function or
10346 file needs to be converted to the internal representation. This
10347 conversion needs to happen as close to the source or destination of the
10348 text as possible. No operations should ever be performed on text encoded
10349 in an external representation other than simple copying, because no
10350 assumptions can reliably be made about the format of this text. You
10351 cannot assume, for example, that the end of text is terminated by a null
10352 byte. (For example, if the text is Unicode, it will have many null bytes
10353 in it.) You cannot find the next "slash" character by searching through
10354 the bytes until you find a byte that looks like a "slash" character,
10355 because it might actually be the second byte of a Kanji character.
10356 Furthermore, all text in the internal representation must be converted,
10357 even if it is known to be completely ASCII, because the external
10358 representation may not be ASCII compatible (for example, if it is
10359 Unicode).
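
Both pitfalls can be demonstrated with a few bytes of externally encoded text. The sketch below is illustrative only. It uses little-endian UTF-16 for the null-byte case and Shift-JIS for the searching case, and it searches for the backslash (0x5C) rather than the forward slash, since 0x5C is a byte value that can occur as the second byte of a Shift-JIS character. The helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* "A/" in little-endian UTF-16, followed by a 16-bit terminator. */
static const unsigned char utf16le_text[] =
    { 0x41, 0x00, 0x2F, 0x00, 0x00, 0x00 };

/* A two-byte Shift-JIS character whose second byte is 0x5C,
   followed by a real backslash and the letter 'a'. */
static const unsigned char sjis_text[] = { 0x95, 0x5C, 0x5C, 0x61 };

/* First byte of a two-byte Shift-JIS character. */
static int sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Offset of the first single-byte character equal to target,
   skipping over two-byte characters; -1 if there is none. */
static long sjis_find_byte(const unsigned char *s, size_t n,
                           unsigned char target)
{
    size_t i = 0;
    while (i < n) {
        if (sjis_lead_byte(s[i])) {
            i += 2;            /* skip both bytes of the character */
            continue;
        }
        if (s[i] == target)
            return (long) i;
        i++;
    }
    return -1;
}
```

Here `strlen()` on the UTF-16 data stops after one byte, and a plain `memchr()` for 0x5C on the Shift-JIS data reports offset 1, the middle of a character, while the encoding-aware scan reports offset 2. Converting to the internal representation first avoids having to write such per-encoding logic at every call site.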
10360
10361 The place where C code needs to be the most careful is when calling
10362 external API functions. It is easy to forget that all text passed to or
10363 retrieved from these functions needs to be converted. This includes text
10364 in structures passed to or retrieved from these functions and all text
10365 that is passed to a callback function that is called by the system.
10366
10367 Macros are provided to perform conversions to or from external text.
10368 These macros are called @code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}
10369 respectively. These macros accept input in various forms (for example,
10370 Lisp strings, buffers, lstreams, and raw data) and can return data in
10371 multiple formats, including both @code{malloc()}ed and @code{alloca()}ed data. The use
10372 of @code{alloca()}ed data here is particularly important because, in general,
10373 the returned data will not be used after making the API call, and as a
10374 result, using @code{alloca()}ed data provides a very cheap and easy to use
10375 method of allocation.
10376
10377 These macros take a coding system argument which indicates the nature of
10378 the external encoding. A coding system is an object that encapsulates
10379 the structures of a particular external encoding and the methods required
10380 to convert to and from this encoding. A facility exists to create coding
10381 system aliases, which in essence gives a single coding system two
10382 different names. XEmacs uses this facility to provide a layer of
10383 abstraction on top of the actual coding systems. For example, the coding
10384 system alias "file-name" points to whichever coding system is currently
10385 used for encoding and decoding file names as passed to or retrieved from
10386 system calls. In general, the actual encoding will differ from system to
10387 system, and will also depend on the particular locale that the user is in.
10388 The file-name alias hides that implementation detail behind an abstract
10389 interface layer, which provides a unified set of coding systems that are
10390 consistent across all operating environments.
10391
10392 The choice of which coding system to use in a particular conversion macro
10393 requires some thought. In general, you should choose a lower-level
10394 actual coding system when the very design of the APIs you are working
10395 with calls for that particular coding system. In all other cases, you
10396 should find the least general abstract coding system (i.e. coding system
10397 alias) that applies to your specific situation. Only use the most
10398 general coding systems, such as native, when there is simply nothing else
10399 that is more appropriate. By doing things this way, you allow the user
10400 more control over how the encoding actually works, because the user is
10401 free to map the abstracted coding system names onto different actual
10402 coding systems.
10403
10404 Some common coding systems are:
10405
10406 @table @code
10407 @item ctext
10408 Compound Text, which is the standard encoding under X Windows, which is
10409 used for clipboard data and possibly other data. (ctext is a coding
10410 system of type ISO2022.)
10411
10412 @item mswindows-unicode
10413 this is used for representing text passed to MS Windows API calls with
10414 arguments that need to be in Unicode format. (mswindows-unicode is a
10415 coding system of type UTF-16)
10416
10417 @item mswindows-multi-byte
10418 this is used for representing text passed to MS Windows API calls with
10419 arguments that need to be in multi-byte format. Note that there are
10420 very few if any examples of such calls.
10421
10422 @item mswindows-tstr
10423 this is used for representing text passed to any MS Windows API calls
10424 that declare their argument as LPTSTR, or LPCTSTR. This is the vast
10425 majority of system calls and automatically translates either to
10426 mswindows-unicode or mswindows-multi-byte, depending on the presence or
10427 absence of the UNICODE preprocessor constant. (If we compile XEmacs
10428 with this preprocessor constant, then all API calls use Unicode for all
10429 text passed to or received from these API calls.)
10430
10431 @item terminal
10432 used for text sent to or read from a text terminal in the absence of a
10433 more specific coding system (calls to window-system specific APIs should
10434 use the appropriate window-specific coding system if it makes sense to
10435 do so.)
10436
10437 @item file-name
10438 used when specifying the names of files in the absence of a more
10439 specific encoding, such as mswindows-tstr.
10440
10441 @item native
10442 the most general coding system for specifying text passed to system
10443 calls. This generally translates to whatever coding system is specified
10444 by the current locale. This should only be used when none of the coding
10445 systems mentioned above are appropriate.
10446 @end table
10447
10448 @subheading Proper Display of Multilingual Text
10449
10450 There are two things required to get this working correctly. One is
10451 selecting the correct font, and the other is encoding the text according
10452 to the encoding used for that specific font, or the window-system
10453 specific text display API. Generally each separate character set has a
10454 different font associated with it, which is specified by name, and each
10455 font has an associated encoding into which the characters must be
10456 translated. (This is the case on X Windows, at least; on Windows there
10457 is a more general mechanism.) Both the specific font for a charset and
10458 the encoding of that font are system dependent. Currently there is a
10459 way of specifying these two properties under X Windows (using the
10460 registry and ccl properties of a character set) but not for other window
10461 systems. A more general system needs to be implemented to allow these
10462 characteristics to be specified for all window systems.
10463
10464 Another issue is making sure that the necessary fonts for displaying
10465 various character sets are installed on the system. Currently, XEmacs
10466 provides, on its web site, X Windows fonts for a number of different
10467 character sets that can be installed by users. This isn't done yet for
10468 Windows, but it should be.
10469
10470 @subheading Inputting of Multilingual Text
10471
10472 This is a rather complicated issue because there are many paradigms
10473 defined for inputting multi-lingual text, some of which are specific to
10474 particular languages, and any particular language may have many
10475 different paradigms defined for inputting its text. These paradigms are
10476 encoded in input methods and there is a standard API for defining an
10477 input method in XEmacs called LEIM, or Library of Emacs Input Methods.
10478 Some of these input methods are written entirely in Elisp, and thus are
10479 system-independent, while others require the aid either of an external
10480 process, or of C level support that ties into a particular
10481 system-specific input method API, for example, XIM under X Windows, or
10482 the active keyboard layout and IME support under Windows. Currently,
10483 there is no support for any system-specific input methods under
10484 Microsoft Windows, although this will change.
10485
10486 @node Introduction to Multilingual Issues #4, Character Sets, Introduction to Multilingual Issues #3, Multilingual Support
10487 @section Introduction to Multilingual Issues #4
10488 @cindex introduction to multilingual issues #4
10489
10490 The rest of the sections in this chapter consist of yet another
10491 introduction to multilingual issues, duplicating the information in the
10492 previous sections.
10493
10494 @node Character Sets, Encodings, Introduction to Multilingual Issues #4, Multilingual Support
9112 @section Character Sets 10495 @section Character Sets
9113 @cindex character sets 10496 @cindex character sets
9114 10497
9115 A character set (or @dfn{charset}) is an ordered set of characters. A 10498 A @dfn{character set} (or @dfn{charset}) is an ordered set of
9116 particular character in a charset is indexed using one or more 10499 characters. A particular character in a charset is indexed using one or
9117 @dfn{position codes}, which are non-negative integers. The number of 10500 more @dfn{position codes}, which are non-negative integers. The number
9118 position codes needed to identify a particular character in a charset is 10501 of position codes needed to identify a particular character in a charset
9119 called the @dfn{dimension} of the charset. In XEmacs/Mule, all charsets 10502 is called the @dfn{dimension} of the charset. In XEmacs/Mule, all
9120 have dimension 1 or 2, and the size of all charsets (except for a few 10503 charsets have dimension 1 or 2, and the size of all charsets (except for
9121 special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of 10504 a few special cases) is either 94, 96, 94 by 94, or 96 by 96. The range
9122 position codes used to index characters from any of these types of 10505 of position codes used to index characters from any of these types of
9123 character sets is as follows: 10506 character sets is as follows:
9124 10507
9125 @example 10508 @example
9126 Charset type Position code 1 Position code 2 10509 Charset type Position code 1 Position code 2
9127 ------------------------------------------------------------ 10510 ------------------------------------------------------------
9188 160 - 255 Latin-1 32 - 127 10571 160 - 255 Latin-1 32 - 127
9189 @end example 10572 @end example
9190 10573
9191 This is a bit ad-hoc but gets the job done. 10574 This is a bit ad-hoc but gets the job done.
9192 10575
9193 @node Encodings 10576 @node Encodings, Internal Mule Encodings, Character Sets, Multilingual Support
9194 @section Encodings 10577 @section Encodings
9195 @cindex encodings, Mule 10578 @cindex encodings, Mule
9196 @cindex Mule encodings 10579 @cindex Mule encodings
9197 10580
9198 An @dfn{encoding} is a way of numerically representing characters from 10581 An @dfn{encoding} is a way of numerically representing characters from
9213 10596
9214 Here are descriptions of a couple of common 10597 Here are descriptions of a couple of common
9215 encodings: 10598 encodings:
9216 10599
9217 @menu 10600 @menu
9218 * Japanese EUC (Extended Unix Code):: 10601 * Japanese EUC (Extended Unix Code)::
9219 * JIS7:: 10602 * JIS7::
9220 @end menu 10603 @end menu
9221 10604
9222 @node Japanese EUC (Extended Unix Code) 10605 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings
9223 @subsection Japanese EUC (Extended Unix Code) 10606 @subsection Japanese EUC (Extended Unix Code)
9224 @cindex Japanese EUC (Extended Unix Code) 10607 @cindex Japanese EUC (Extended Unix Code)
9225 @cindex EUC (Extended Unix Code), Japanese 10608 @cindex EUC (Extended Unix Code), Japanese
9226 @cindex Extended Unix Code, Japanese EUC 10609 @cindex Extended Unix Code, Japanese EUC
9227 10610
9228 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, 10611 This encompasses the character sets Printing-ASCII, Katakana-JISX0201
9229 and Japanese-JISX0208-Kana (half-width katakana, the right half of 10612 (half-width katakana, the right half of JISX0201), Japanese-JISX0208,
9230 JISX0201). It uses 8-bit bytes. 10613 and Japanese-JISX0212.
9231 10614
9232 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character 10615 Note that Printing-ASCII and Katakana-JISX0201 are 94-character
9233 charsets, while Japanese-JISX0208 is a 94x94-character charset. 10616 charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
10617 94x94-character charsets.
9234 10618
9235 The encoding is as follows: 10619 The encoding is as follows:
9236 10620
9237 @example 10621 @example
9238 Character set Representation (PC=position-code) 10622 Character set Representation (PC=position-code)
9239 ------------- -------------- 10623 ------------- --------------
9240 Printing-ASCII PC1 10624 Printing-ASCII PC1
9241 Japanese-JISX0201-Kana 0x8E | PC1 + 0x80 10625 Katakana-JISX0201 0x8E | PC1 + 0x80
9242 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 10626 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80
9243 Japanese-JISX0212 0x8F | PC1 + 0x80 | PC2 + 0x80 10627 Japanese-JISX0212 0x8F | PC1 + 0x80 | PC2 + 0x80
9244 @end example 10628 @end example
9245 10629
9246 10630 Note that there are other versions of EUC for other Asian languages.
9247 @node JIS7 10631 EUC in general is characterized by
10632
10633 @enumerate
10634 @item
10635 row-column encoding,
10636 @item
10637 big-endian (row-first) ordering, and
10638 @item
10639 ASCII compatibility in variable width forms.
10640 @end enumerate
10641
10642 @node JIS7, , Japanese EUC (Extended Unix Code), Encodings
9248 @subsection JIS7 10643 @subsection JIS7
9249 @cindex JIS7 10644 @cindex JIS7
9250 10645
9251 This encompasses the character sets Printing-ASCII, 10646 This encompasses the character sets Printing-ASCII,
9252 Japanese-JISX0201-Roman (the left half of JISX0201; this character set 10647 Latin-JISX0201 (the left half of JISX0201; this character set
9253 is very similar to Printing-ASCII and is a 94-character charset), 10648 is very similar to Printing-ASCII and is a 94-character charset),
9254 Japanese-JISX0208, and Japanese-JISX0201-Kana. It uses 7-bit bytes. 10649 Japanese-JISX0208, and Katakana-JISX0201. It uses 7-bit bytes.
9255 10650
9256 Unlike Japanese EUC, this is a @dfn{modal} encoding, which 10651 Unlike EUC, this is a @dfn{modal} encoding, which means that there are
9257 means that there are multiple states that the encoding can 10652 multiple states that the encoding can be in, which affect how the bytes
9258 be in, which affect how the bytes are to be interpreted. 10653 are to be interpreted. Special sequences of bytes (called @dfn{escape
9259 Special sequences of bytes (called @dfn{escape sequences}) 10654 sequences}) are used to change states.
9260 are used to change states.
9261 10655
9262 The encoding is as follows: 10656 The encoding is as follows:
9263 10657
9264 @example 10658 @example
9265 Character set Representation (PC=position-code) 10659 Character set Representation (PC=position-code)
9266 ------------- -------------- 10660 ------------- --------------
9267 Printing-ASCII PC1 10661 Printing-ASCII PC1
9268 Japanese-JISX0201-Roman PC1 10662 Latin-JISX0201 PC1
9269 Japanese-JISX0201-Kana PC1 10663 Katakana-JISX0201 PC1
9270 Japanese-JISX0208 PC1 PC2 10664 Japanese-JISX0208 PC1 | PC2
9271 10665
9272 10666
9273 Escape sequence ASCII equivalent Meaning 10667 Escape sequence ASCII equivalent Meaning
9274 --------------- ---------------- ------- 10668 --------------- ---------------- -------
9275 0x1B 0x28 0x4A ESC ( J invoke Japanese-JISX0201-Roman 10669 0x1B 0x28 0x4A ESC ( J invoke Latin-JISX0201
9276 0x1B 0x28 0x49 ESC ( I invoke Japanese-JISX0201-Kana 10670 0x1B 0x28 0x49 ESC ( I invoke Katakana-JISX0201
9277 0x1B 0x24 0x42 ESC $ B invoke Japanese-JISX0208 10671 0x1B 0x24 0x42 ESC $ B invoke Japanese-JISX0208
9278 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII 10672 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII
9279 @end example 10673 @end example
9280 10674
9281 Initially, Printing-ASCII is invoked. 10675 Initially, Printing-ASCII is invoked.
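
The modal behaviour of JIS7 can be sketched as a small state machine driven by the escape sequences in the table above. This is only an illustration under simplifying assumptions: it recognizes exactly the four escape sequences listed, treats every other byte as character data, and merely counts characters per charset instead of decoding them. All names are hypothetical.

```c
#include <stddef.h>

enum jis7_charset {
    CS_ASCII,               /* Printing-ASCII */
    CS_LATIN_JISX0201,
    CS_KATAKANA_JISX0201,
    CS_JISX0208,
    CS_COUNT
};

/* Walk n bytes of JIS7 text and count the characters seen in each
   charset.  Returns 0 on success, or -1 on an unknown or truncated
   escape sequence or a truncated two-byte character. */
static int jis7_count(const unsigned char *s, size_t n,
                      size_t counts[CS_COUNT])
{
    enum jis7_charset cur = CS_ASCII;   /* initial state */
    size_t i = 0;

    for (int k = 0; k < CS_COUNT; k++)
        counts[k] = 0;

    while (i < n) {
        if (s[i] == 0x1B) {             /* ESC: change state */
            if (i + 2 >= n)
                return -1;
            if (s[i + 1] == 0x28 && s[i + 2] == 0x4A)
                cur = CS_LATIN_JISX0201;
            else if (s[i + 1] == 0x28 && s[i + 2] == 0x49)
                cur = CS_KATAKANA_JISX0201;
            else if (s[i + 1] == 0x24 && s[i + 2] == 0x42)
                cur = CS_JISX0208;
            else if (s[i + 1] == 0x28 && s[i + 2] == 0x42)
                cur = CS_ASCII;
            else
                return -1;
            i += 3;
        } else if (cur == CS_JISX0208) { /* PC1 | PC2 */
            if (i + 1 >= n)
                return -1;
            counts[cur]++;
            i += 2;
        } else {                         /* PC1 */
            counts[cur]++;
            i++;
        }
    }
    return 0;
}
```

The same byte value means different things depending on the state in `cur`, which is why a modal encoding cannot be interpreted starting from an arbitrary position in the text.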
9282 10676
9283 @node Internal Mule Encodings 10677 @node Internal Mule Encodings, Byte/Character Types; Buffer Positions; Other Typedefs, Encodings, Multilingual Support
9284 @section Internal Mule Encodings 10678 @section Internal Mule Encodings
9285 @cindex internal Mule encodings 10679 @cindex internal Mule encodings
9286 @cindex Mule encodings, internal 10680 @cindex Mule encodings, internal
9287 @cindex encodings, internal Mule 10681 @cindex encodings, internal Mule
9288 10682
9297 these are user-defined charsets. 10691 these are user-defined charsets.
9298 10692
9299 More specifically: 10693 More specifically:
9300 10694
9301 @example 10695 @example
9302 Character set Leading byte 10696 Character set Leading byte
9303 ------------- ------------ 10697 ------------- ------------
9304 ASCII 0 10698 ASCII 0 (0x7F in arrays indexed by leading byte)
9305 Composite 0x80 10699 Composite 0x8D
9306 Dimension-1 Official 0x81 - 0x8D 10700 Dimension-1 Official 0x80 - 0x8C/0x8D
9307 (0x8E is free) 10701 (0x8E is free)
9308 Control-1 0x8F 10702 Control 0x8F
9309 Dimension-2 Official 0x90 - 0x99 10703 Dimension-2 Official 0x90 - 0x99
9310 (0x9A - 0x9D are free; 10704 (0x9A - 0x9D are free)
9311 0x9E and 0x9F are reserved) 10705 Dimension-1 Private Marker 0x9E
9312 Dimension-1 Private 0xA0 - 0xEF 10706 Dimension-2 Private Marker 0x9F
9313 Dimension-2 Private 0xF0 - 0xFF 10707 Dimension-1 Private 0xA0 - 0xEF
10708 Dimension-2 Private 0xF0 - 0xFF
9314 @end example 10709 @end example
9315 10710
9316 There are two internal encodings for characters in XEmacs/Mule. One is 10711 There are two internal encodings for characters in XEmacs/Mule. One is
9317 called @dfn{string encoding} and is an 8-bit encoding that is used for 10712 called @dfn{string encoding} and is an 8-bit encoding that is used for
9318 representing characters in a buffer or string. It uses 1 to 4 bytes per 10713 representing characters in a buffer or string. It uses 1 to 4 bytes per
9323 (In the following descriptions, we'll ignore composite characters for 10718 (In the following descriptions, we'll ignore composite characters for
9324 the moment. We also give a general (structural) overview first, 10719 the moment. We also give a general (structural) overview first,
9325 followed later by the exact details.) 10720 followed later by the exact details.)
9326 10721
9327 @menu 10722 @menu
9328 * Internal String Encoding:: 10723 * Internal String Encoding::
9329 * Internal Character Encoding:: 10724 * Internal Character Encoding::
9330 @end menu 10725 @end menu
9331 10726
9332 @node Internal String Encoding 10727 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings
9333 @subsection Internal String Encoding 10728 @subsection Internal String Encoding
9334 @cindex internal string encoding 10729 @cindex internal string encoding
9335 @cindex string encoding, internal 10730 @cindex string encoding, internal
9336 @cindex encoding, internal string 10731 @cindex encoding, internal string
9337 10732
9380 None of the standard non-modal encodings meet all of these 10775 None of the standard non-modal encodings meet all of these
9381 conditions. For example, EUC satisfies only (2) and (3), while 10776 conditions. For example, EUC satisfies only (2) and (3), while
9382 Shift-JIS and Big5 (not yet described) satisfy only (2). (All 10777 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
9383 non-modal encodings must satisfy (2), in order to be unambiguous.) 10778 non-modal encodings must satisfy (2), in order to be unambiguous.)
9384 10779
9385 @node Internal Character Encoding 10780 @node Internal Character Encoding, , Internal String Encoding, Internal Mule Encodings
9386 @subsection Internal Character Encoding 10781 @subsection Internal Character Encoding
9387 @cindex internal character encoding 10782 @cindex internal character encoding
9388 @cindex character encoding, internal 10783 @cindex character encoding, internal
9389 @cindex encoding, internal character 10784 @cindex encoding, internal character
9390 10785
9404 ------------- ------- ------- ------- 10799 ------------- ------- ------- -------
9405 ASCII 0 0 PC1 10800 ASCII 0 0 PC1
9406 range: (00 - 7F) 10801 range: (00 - 7F)
9407 Control-1 0 1 PC1 10802 Control-1 0 1 PC1
9408 range: (00 - 1F) 10803 range: (00 - 1F)
9409 Dimension-1 official 0 LB - 0x80 PC1 10804 Dimension-1 official 0 LB - 0x7F PC1
9410 range: (01 - 0D) (20 - 7F) 10805 range: (01 - 0D) (20 - 7F)
9411 Dimension-1 private 0 LB - 0x80 PC1 10806 Dimension-1 private 0 LB - 0x80 PC1
9412 range: (20 - 6F) (20 - 7F) 10807 range: (20 - 6F) (20 - 7F)
9413 Dimension-2 official LB - 0x8F PC1 PC2 10808 Dimension-2 official LB - 0x8F PC1 PC2
9414 range: (01 - 0A) (20 - 7F) (20 - 7F) 10809 range: (01 - 0A) (20 - 7F) (20 - 7F)
9415 Dimension-2 private LB - 0xE1 PC1 PC2 10810 Dimension-2 private LB - 0xE1 PC1 PC2
9416 range: (0F - 1E) (20 - 7F) (20 - 7F) 10811 range: (0F - 1E) (20 - 7F) (20 - 7F)
9417 Composite 0x1F ? ? 10812 Composite 0x1F ? ?
9418 @end example 10813 @end example
9419 10814
9420 Note that character codes 0 - 255 are the same as the ``binary encoding'' 10815 Note that character codes 0 - 255 are the same as the ``binary
9421 described above. 10816 encoding'' described above.
9422 10817
9423 @node CCL 10818 Most of the code in XEmacs knows nothing of the representation of a
10819 character other than that values 0 - 255 represent ASCII, Control 1,
10820 and Latin 1.
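
The three-field view in the table above can be sketched in C as follows.
This is only an illustration: the type and function names are hypothetical,
and the 5/7/7 bit split (high to low) is an assumption read off the ranges
in the table, so check @file{text.h} before relying on it.

```c
#include <assert.h>

/* Assumed layout of a 19-bit character code, high bits first:
   field 1 = 5 bits, field 2 = 7 bits, field 3 = 7 bits. */
struct char_fields
{
  unsigned f1, f2, f3;
};

/* Split a character code into its three fields. */
static struct char_fields
split_char (unsigned long ch)
{
  struct char_fields f;
  assert (ch < (1UL << 19));    /* must fit in 19 bits */
  f.f3 = (unsigned) (ch & 0x7F);
  f.f2 = (unsigned) ((ch >> 7) & 0x7F);
  f.f1 = (unsigned) ((ch >> 14) & 0x1F);
  return f;
}

/* Reassemble a character code from its fields. */
static unsigned long
join_char (struct char_fields f)
{
  return ((unsigned long) f.f1 << 14) | ((unsigned long) f.f2 << 7) | f.f3;
}
```

Under this assumption, codes 0x00 - 0x7F have fields 1 and 2 both zero, and
codes 0x80 - 0xFF have field 2 equal to 1, which matches the ASCII,
Control-1 and first Dimension-1 rows of the table.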
10821
10822 @strong{WARNING WARNING WARNING}: The Boyer-Moore code in
10823 @file{search.c}, and the code in @code{search_buffer()} that determines
10824 whether that code can be used, knows that ``field 3'' in a character
10825 always corresponds to the last byte in the textual representation of the
10826 character. (This is important because the Boyer-Moore algorithm works by
10827 looking at the last byte of the search string and &&#### finish this.
10828
10829 @node Byte/Character Types; Buffer Positions; Other Typedefs, Internal Text API's, Internal Mule Encodings, Multilingual Support
10830 @section Byte/Character Types; Buffer Positions; Other Typedefs
10831 @cindex byte/character types; buffer positions; other typedefs
10832 @cindex byte/character types
10833 @cindex character types
10834 @cindex buffer positions
10835 @cindex typedefs, other
10836
10837 @menu
10838 * Byte Types::
10839 * Different Ways of Seeing Internal Text::
10840 * Buffer Positions::
10841 * Other Typedefs::
10842 * Usage of the Various Representations::
10843 * Working With the Various Representations::
10844 @end menu
10845
10846 @node Byte Types, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs
10847 @subsection Byte Types
10848 @cindex byte types
10849
10850 Stuff pointed to by a char * or unsigned char * will nearly always be
10851 one of the following types:
10852
10853 @itemize @minus
10854 @item
10855 a) [Ibyte] pointer to internally-formatted text
10856 @item
10857 b) [Extbyte] pointer to text in some external format, which can be
10858 defined as all formats other than the internal one
10859 @item
10860 c) [Ascbyte] pure ASCII text
10861 @item
10862 d) [Binbyte] binary data that is not meant to be interpreted as text
10863 @item
10864 e) [Rawbyte] general data in memory, where we don't care about whether
10865 it's text or binary
10866 @item
10867 f) [Boolbyte] a zero or a one
10868 @item
10869 g) [Bitbyte] a byte used for bit fields
10870 @item
10871 h) [Chbyte] null-semantics @code{char *}; used when casting an argument to
10872 an external API where the other types may not be
10873 appropriate
10874 @end itemize
10875
10876 Types (b), (c), (f) and (h) are defined as @code{char}, while the others are
10877 @code{unsigned char}. This is for maximum safety (signed characters are
10878 dangerous to work with) while maintaining as much compatibility with
10879 external API's and string constants as possible.
10880
10881 We also provide versions of the above types defined with different
10882 underlying C types, for API compatibility. These use the following
10883 prefixes:
10884
10885 @example
10886 C = plain char, when the base type is unsigned
10887 U = unsigned
10888 S = signed
10889 @end example
10890
10891 (Formerly I had a comment saying that type (e) "should be replaced with
10892 void *". However, there are in fact many places where an unsigned char
10893 * might be used -- e.g. for ease in pointer computation, since void *
10894 doesn't allow this, and for compatibility with external API's.)
10895
10896 Note that these typedefs are purely for documentation purposes; from
10897 the C code's perspective, they are exactly equivalent to @code{char *},
10898 @code{unsigned char *}, etc., so you can freely use them with library
10899 functions declared as such.
10900
10901 Using these more specific types rather than the general ones helps avoid
10902 the confusions that occur when the semantics of a char * or unsigned
10903 char * argument being studied are unclear. Furthermore, by requiring
10904 that ALL uses of @code{char} be replaced with some other type as part of the
10905 Mule-ization process, we can use a search for @code{char} as a way of finding
10906 code that has not been properly Mule-ized yet.
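
As a rough sketch, the base types listed above could be declared as
follows. The signedness follows the text; the helper function and its name
are hypothetical and only show why an unsigned type is the safer choice
for internal text.

```c
#include <stddef.h>

typedef unsigned char Ibyte;    /* (a) internally-formatted text */
typedef char Extbyte;           /* (b) externally-formatted text */
typedef char Ascbyte;           /* (c) pure ASCII text */
typedef unsigned char Binbyte;  /* (d) binary data, not text */
typedef unsigned char Rawbyte;  /* (e) general data in memory */
typedef char Boolbyte;          /* (f) a zero or a one */
typedef unsigned char Bitbyte;  /* (g) a byte used for bit fields */
typedef char Chbyte;            /* (h) for casts to external APIs */

/* Hypothetical helper: return 1 if LEN bytes of internal text are all
   ASCII.  The comparison is only reliable because Ibyte is unsigned;
   with a signed char, bytes above 0x7F would compare as negative. */
static int
all_ascii_p (const Ibyte *text, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    if (text[i] > 0x7F)
      return 0;
  return 1;
}
```

Because the typedefs are plain aliases, passing a string constant still
requires a cast to @code{const Ibyte *}, which is itself a useful marker
when searching for code that mixes representations.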
10907
10908 @node Different Ways of Seeing Internal Text, Buffer Positions, Byte Types, Byte/Character Types; Buffer Positions; Other Typedefs
10909 @subsection Different Ways of Seeing Internal Text
10910 @cindex different ways of seeing internal text
10911
10912 There are various ways of representing internal text. The two primary
10913 ways are as an "array" of individual characters and as a
10914 "stream" of bytes. In the ASCII/ISO-8859-1 world, where there are only 256
10915 characters at most, things are easy because each character fits into a
10916 byte. In general, however, this is not true -- see the above discussion
10917 of characters vs. encodings.
10918
10919 In some cases, it's also important to distinguish between a stream
10920 representation as a series of bytes and as a series of textual units.
10921 This is particularly important wrt Unicode. The UTF-16 representation
10922 (sometimes referred to, rather sloppily, as simply the "Unicode" format)
10923 represents text as a series of 16-bit units. Mostly, each unit
10924 corresponds to a single character, but not necessarily, as characters
10925 outside of the range 0-65535 (the BMP or "Basic Multilingual Plane" of
10926 Unicode) require two 16-bit units, through the mechanism of
10927 "surrogates". When a series of 16-bit units is serialized into a byte
10928 stream, there are at least two possible representations, little-endian
10929 and big-endian, and which one is used may depend on the native format of
10930 16-bit integers in the CPU of the machine that XEmacs is running
10931 on. (Similarly, UTF-32 is logically a representation with 32-bit textual
10932 units.)
10933
10934 Specifically:
10935
10936 @itemize @minus
10937 @item
10938 UTF-8 has 1-byte (8-bit) units.
10939 @item
10940 UTF-16 has 2-byte (16-bit) units.
10941 @item
10942 UTF-32 has 4-byte (32-bit) units.
10943 @item
10944 XEmacs-internal encoding (the old "Mule" encoding) has 1-byte (8-bit)
10945 units.
10946 @item
10947 UTF-7 technically has 7-bit units that are within the "mail-safe" range
10948 (ASCII 32 - 126 plus a few control characters), but normally is encoded
10949 in an 8-bit stream. (UTF-7 is also a modal encoding, since it has a
10950 normal mode where printable ASCII characters represent themselves and a
10951 shifted mode, introduced with a plus sign, where a base-64 encoding is
10952 used.)
10953 @item
10954 UTF-5 technically has 7-bit units (normally encoded in an 8-bit stream,
10955 like UTF-7), but only uses uppercase A-V and 0-9, and only encodes 4
10956 bits worth of data per character. UTF-5 is meant for encoding Unicode
10957 inside of DNS names.
10958 @end itemize
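
To make the difference between textual units and bytes concrete, here is
a small standalone sketch of UTF-16. It is not XEmacs code and the
function names are made up for illustration: one function turns a code
point into one or two 16-bit units, the other serializes units into bytes
in either byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 units.  Returns the number of units
   written (1 or 2), or 0 if CP is not a valid character. */
static size_t
utf16_encode (uint32_t cp, uint16_t out[2])
{
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;   /* inside the BMP: a single unit */
      return 1;
    }
  cp -= 0x10000;                /* outside the BMP: a surrogate pair */
  out[0] = (uint16_t) (0xD800 | (cp >> 10));
  out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));
  return 2;
}

/* Serialize N units into 2 * N bytes, big- or little-endian. */
static void
utf16_serialize (const uint16_t *units, size_t n, int big_endian,
                 unsigned char *bytes)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      unsigned char hi = (unsigned char) (units[i] >> 8);
      unsigned char lo = (unsigned char) (units[i] & 0xFF);
      bytes[2 * i] = big_endian ? hi : lo;
      bytes[2 * i + 1] = big_endian ? lo : hi;
    }
}
```

The same two units produce two different byte streams depending on the
byte order, which is why a count of characters, a count of units and a
count of bytes are three different numbers.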
10959
10960 Thus, we can imagine three levels in the representation of textual data:
10961
10962 @example
10963 series of characters -> series of textual units -> series of bytes
10964       [Ichar]                   [Itext]                 [Ibyte]
10965 @end example
10966
10967 XEmacs has three corresponding typedefs:
10968
10969 @itemize @minus
10970 @item
10971 An Ichar is an integer (at least 32-bit), representing a 31-bit
10972 character.
10973 @item
10974 An Itext is an unsigned value, either 8, 16 or 32 bits, depending
10975 on the nature of the internal representation, and corresponding to
10976 a single textual unit.
10977 @item
10978 An Ibyte is an @code{unsigned char}, representing a single byte in a
10979 textual byte stream.
10980 @end itemize
10981
10982 Internal text in stream format can be simultaneously viewed as either
10983 @code{Itext *} or @code{Ibyte *}. The @code{Ibyte *} representation is convenient for
10984 copying data from one place to another, because such routines usually
10985 expect byte counts. However, @code{Itext *} is much better for actually
10986 working with the data.
10987
10988 From a text-unit perspective, units 0 through 127 will always be ASCII
10989 compatible, and data in Lisp strings (and other textual data generated
10990 as a whole, e.g. from external conversion) will be followed by a
10991 null-unit terminator. From an @code{Ibyte *} perspective, however, the
10992 encoding is only ASCII-compatible if it uses 1-byte units.
10993
10994 Similarly to the different text representations, three integral count
10995 types exist -- Charcount, Textcount and Bytecount.
10996
10997 NOTE: Despite the presence of the terminator, internal text itself can
10998 have nulls in it! (Null text units, not just the null bytes present in
10999 any UTF-16 encoding.) The terminator is present because in many cases
11000 internal text is passed to routines that will ultimately pass the text
11001 to library functions that cannot handle embedded nulls, e.g. functions
11002 manipulating filenames, and it is a real hassle to have to pass the
11003 length around constantly. But this can lead to sloppy coding! We need
11004 to be careful about watching for nulls in places that are important,
11005 e.g. manipulating string objects or passing data to/from the clipboard.
11006
11007 @table @code
11008 @item Ibyte
11009 The data in a buffer or string is logically made up of Ibyte objects,
11010 where an Ibyte takes up the same amount of space as a char. (It is
11011 declared differently, though, to catch invalid usages.) Strings stored
11012 using Ibytes are said to be in "internal format". The important
11013 characteristics of internal format are
11014
11015 @itemize @minus
11016 @item
11017 ASCII characters are represented as a single Ibyte, in the range 0 -
11018 0x7f.
11019 @item
11020 All other characters are represented as an Ibyte in the range 0x80 - 0x9f
11021 followed by one or more Ibytes in the range 0xa0 to 0xff.
11022 @end itemize
11023
11024 This leads to a number of desirable properties:
11025
11026 @itemize @minus
11027 @item
11028 Given the position of the beginning of a character, you can find the
11029 beginning of the next or previous character in constant time.
11030 @item
11031 When searching for a substring or an ASCII character within the string,
11032 you need merely use standard searching routines.
11033 @end itemize
11034
11035 @item Itext
11036
11037 #### Document me.
11038
11039 @item Ichar
11040 This typedef represents a single Emacs character, which can be ASCII,
11041 ISO-8859, or some extended character, as would typically be used for
11042 Kanji. Note that the representation of a character as an Ichar is @strong{not}
11043 the same as the representation of that same character in a string; thus,
11044 you cannot do the standard C trick of passing a pointer to a character
11045 to a function that expects a string.
11046
11047 An Ichar takes up 19 bits of representation and (for code compatibility
11048 and such) is compatible with an int. This representation is visible on
11049 the Lisp level. The important characteristics of the Ichar
11050 representation are
11051
11052 @itemize @minus
11053 @item
11054 values 0x00 - 0x7f represent ASCII.
11055 @item
11056 values 0x80 - 0xff represent the right half of ISO-8859-1.
11057 @item
11058 values 0x100 and up represent all other characters.
11059 @end itemize
11060
11061 This means that Ichar values are upwardly compatible with the standard
11062 8-bit representation of ASCII/ISO-8859-1.
11063
11064 @item Extbyte
11065 Strings that go in or out of Emacs are in "external format", typedef'ed
11066 as an array of char or a char *. There is more than one external format
11067 (JIS, EUC, etc.) but they have broadly similar properties. Many, such as JIS, are modal
11068 encodings, which is to say that the meaning of particular bytes is not
11069 fixed but depends on what "mode" the string is currently in (e.g. bytes
11070 in the range 0 - 0x7f might be interpreted as ASCII, or as Hiragana, or
11071 as 2-byte Kanji, depending on the current mode). The mode starts out in
11072 ASCII/ISO-8859-1 and is switched using escape sequences -- for example,
11073 in the JIS encoding, 'ESC $ B' switches to a mode where pairs of bytes
11074 in the range 0 - 0x7f are interpreted as Kanji characters.
11075
11076 External-formatted data is generally desirable for passing data between
11077 programs because it is upwardly compatible with standard
11078 ASCII/ISO-8859-1 strings and may require less space than internal
11079 encodings such as the one described above. In addition, some encodings
11080 (e.g. JIS) keep all characters (except the ESC used to switch modes) in
11081 the printing ASCII range 0x20 - 0x7e, which results in a much higher
11082 probability that the data will avoid being garbled in transmission.
11083 Externally-formatted data is generally not very convenient to work with,
11084 however, and for this reason is usually converted to internal format
11085 before any work is done on the string.
11086
11087 NOTE: filenames need to be in external format so that ISO-8859-1
11088 characters come out correctly.
11089 @end table
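
The byte ranges given for internal format under @code{Ibyte} are what make
stepping between characters cheap. The following is a minimal sketch
based only on those ranges; the function names are hypothetical and the
real macros live in @file{text.h}.

```c
#include <stddef.h>

typedef unsigned char Ibyte;

/* Bytes in 0xa0 - 0xff continue a character; anything below 0xa0
   (ASCII, or a leading byte in 0x80 - 0x9f) starts a new one. */
static int
continuation_byte_p (Ibyte b)
{
  return b >= 0xA0;
}

/* P must point at the start of a character in null-terminated text. */
static const Ibyte *
next_char (const Ibyte *p)
{
  p++;
  while (continuation_byte_p (*p))
    p++;
  return p;
}

/* P must point at the start of a character (or at the terminator) and
   must not be the beginning of the text. */
static const Ibyte *
prev_char (const Ibyte *p)
{
  p--;
  while (continuation_byte_p (*p))
    p--;
  return p;
}

/* Count the characters in the byte range [START, END). */
static size_t
count_chars (const Ibyte *start, const Ibyte *end)
{
  size_t n = 0;
  while (start < end)
    {
      start = next_char (start);
      n++;
    }
  return n;
}
```

Each step inspects at most as many bytes as the longest character, so the
cost per character is bounded. The counting loop also shows why a
character count and a byte count of the same text generally differ.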
11090
11091 @node Buffer Positions, Other Typedefs, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs
11092 @subsection Buffer Positions
11093 @cindex buffer positions
11094
11095 There are three possible ways to specify positions in a buffer. All
11096 of these are one-based: the beginning of the buffer is position or
11097 index 1, and 0 is not a valid position.
11098
11099 As a "buffer position" (typedef Charbpos):
11100
11101 This is an index specifying an offset in characters from the
11102 beginning of the buffer. Note that buffer positions are
11103 logically @strong{between} characters, not on a character. The
11104 difference between two buffer positions specifies the number of
11105 characters between those positions. Buffer positions are the
11106 only kind of position externally visible to the user.
11107
11108 As a "byte index" (typedef Bytebpos):
11109
11110 This is an index over the bytes used to represent the characters
11111 in the buffer. If there is no Mule support, this is identical
11112 to a buffer position, because each character is represented
11113 using one byte. However, with Mule support, many characters
11114 require two or more bytes for their representation, and so a
11115 byte index may be greater than the corresponding buffer
11116 position.
11117
11118 As a "memory index" (typedef Membpos):
11119
11120 This is the byte index adjusted for the gap. For positions
11121 before the gap, this is identical to the byte index. For
11122 positions after the gap, this is the byte index plus the gap
11123 size. There are two possible memory indices for the gap
11124 position; the memory index at the beginning of the gap should
11125 always be used, except in code that deals with manipulating the
11126 gap, where both indices may be seen. The address of the
11127 character "at" (i.e. following) a particular position can be
11128 obtained from the formula
11129 @example
11130 buffer_start_address + memory_index(position) - 1
11131 @end example
11132 except in the case of characters at the gap position.
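
The relationship between byte indices and memory indices can be sketched
with a toy gap buffer. The structure and function names here are
hypothetical; only the arithmetic follows the description above.

```c
#include <stddef.h>

struct gap_buffer
{
  unsigned char *start;  /* buffer_start_address */
  size_t gap_pos;        /* one-based byte index of the gap position */
  size_t gap_size;       /* size of the gap in bytes */
};

/* Convert a byte index to a memory index.  The gap position itself
   maps to the memory index at the beginning of the gap. */
static size_t
byte_to_mem (const struct gap_buffer *b, size_t byte_index)
{
  return byte_index > b->gap_pos ? byte_index + b->gap_size : byte_index;
}

/* Fetch the byte "at" (i.e. following) a byte index.  At the gap
   position the plain formula would point into the gap, so the gap has
   to be skipped explicitly. */
static unsigned char
byte_at (const struct gap_buffer *b, size_t byte_index)
{
  size_t mem = byte_to_mem (b, byte_index);
  if (byte_index == b->gap_pos)
    mem += b->gap_size;
  return b->start[mem - 1];
}
```

For example, with the text @code{abcd} stored as @code{ab} followed by a
four-byte gap and then @code{cd}, the gap position is 3; byte indices 1
to 3 map to memory indices 1 to 3, while byte index 4 maps to memory
index 8.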
11133
11134 @node Other Typedefs, Usage of the Various Representations, Buffer Positions, Byte/Character Types; Buffer Positions; Other Typedefs
11135 @subsection Other Typedefs
11136 @cindex other typedefs
11137
11138 @table @code
11139 @item Charcount
11140 This typedef represents a count of characters, such as
11141 a character offset into a string or the number of
11142 characters between two positions in a buffer. The
11143 difference between two Charbpos's is a Charcount, and
11144 character positions in a string are represented using
11145 a Charcount.
11146
11147 @item Textcount
11148
11149 #### Document me.
11150
11151 @item Bytecount
11152
11153 Similar to a Charcount but represents a count of bytes.
11154 The difference between two Bytebpos's is a Bytecount.
11155 @end table
11156
11157 @node Usage of the Various Representations, Working With the Various Representations, Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs
11158 @subsection Usage of the Various Representations
11159 @cindex usage of the various representations
11160
11161 Memory indices are used in low-level functions in insdel.c and for
11162 extent endpoints and marker positions. The reason for this is that
11163 this way, the extents and markers don't need to be updated for most
11164 insertions, which merely shrink the gap and don't move any
11165 characters around in memory.
11166
11167 (The beginning-of-gap memory index simplifies insertions w.r.t.
11168 markers, because text usually gets inserted after markers. For
11169 extents, it is merely for consistency, because text can get
11170 inserted either before or after an extent's endpoint depending on
11171 the open/closedness of the endpoint.)
11172
11173 Byte indices are used in other code that needs to be fast,
11174 such as the searching, redisplay, and extent-manipulation code.
11175
11176 Buffer positions are used in all other code. This is because this
11177 representation is easiest to work with (especially since Lisp
11178 code always uses buffer positions), necessitates the fewest
11179 changes to existing code, and is the safest (e.g. if the text gets
11180 shifted underneath a buffer position, it will still point to a
11181 character; if text is shifted under a byte index, it might point
11182 to the middle of a character, which would be bad).
11183
11184 Similarly, Charcounts are used in all code that deals with strings
11185 except for code that needs to be fast, which uses Bytecounts.
11186
11187 Strings are always passed around internally using internal format.
11188 Conversions to and from external format are performed at the time
11189 that the data goes in or out of Emacs.
11190
11191 @node Working With the Various Representations, , Usage of the Various Representations, Byte/Character Types; Buffer Positions; Other Typedefs
11192 @subsection Working With the Various Representations
11193 @cindex working with the various representations
11194
11195 We write things this way because it's very important that
11196 MAX_BYTEBPOS_GAP_SIZE_3 is a multiple of 3. (As it happens,
11197 65535 is a multiple of 3, but this may not always be the
11198 case.) #### unfinished
11199
11200 @node Internal Text API's, Coding for Mule, Byte/Character Types; Buffer Positions; Other Typedefs, Multilingual Support
11201 @section Internal Text API's
11202 @cindex internal text API's
11203 @cindex text API's, internal
11204 @cindex API's, text, internal
11205
11206 @strong{NOTE}: The most current documentation for these API's is in
11207 @file{text.h}. In case of error, assume that file is correct and this
11208 one wrong.
11209
11210 @menu
11211 * Basic internal-format API's::
11212 * The DFC API::
11213 * The Eistring API::
11214 @end menu
11215
11216 @node Basic internal-format API's, The DFC API, Internal Text API's, Internal Text API's
11217 @subsection Basic internal-format API's
11218 @cindex basic internal-format API's
11219 @cindex internal-format API's, basic
11220 @cindex API's, basic internal-format
11221
11222 These are simple functions and macros to convert between text
11223 representation and characters, move forward and back in text, etc.
11224
11225 #### Finish the rest of this.
11226
11227 Use the following functions/macros on contiguous text in any of the
11228 internal formats. Those that take a format arg work on all internal
11229 formats; the others work only on the default (variable-width under Mule)
11230 format. If the text you're operating on is known to come from a buffer,
11231 use the buffer-level functions in buffer.h, which automatically know the
11232 correct format and handle the gap.
11233
11234 Some terminology:
11235
11236 "itext" appearing in the macros means "internal-format text" -- type
11237 @code{Ibyte *}. Operations on such pointers themselves, rather than on the
11238 text being pointed to, have "itext" instead of "itext" in the macro
11239 name. "ichar" in the macro names means an Ichar -- the representation
11240 of a character as a single integer rather than a series of bytes, as part
11241 of "itext". Many of the macros below are for converting between the
11242 two representations of characters.
11243
11244 Note also that we try to consistently distinguish between an "Ichar" and
11245 a Lisp character. Stuff working with Lisp characters often just says
11246 "char", so we consistently use "Ichar" when that's what we're working
11247 with.
11248
11249 @node The DFC API, The Eistring API, Basic internal-format API's, Internal Text API's
11250 @subsection The DFC API
11251 @cindex DFC API
11252 @cindex API, DFC
11253
11254 This is for conversion between internal and external text. Note that
11255 there is also the "new DFC" API, which @strong{returns} a pointer to the
11256 converted text (in alloca space), rather than storing it into a
11257 variable.
11258
11259 The macros below are used for converting data between different formats.
11260 Generally, the data is textual, and the formats are related to
11261 internationalization (e.g. converting between internal-format text and
11262 UTF-8) -- but the mechanism is general, and could be used for anything,
11263 e.g. decoding gzipped data.
11264
11265 In general, conversion involves a source of data, a sink, the existing
11266 format of the source data, and the desired format of the sink. The
11267 macros below, however, always require that either the source or sink is
11268 internal-format text. Therefore, in practice the conversions below
11269 involve source, sink, an external format (specified by a coding system),
11270 and the direction of conversion (internal->external or vice-versa).
11271
11272 Sources and sinks can be raw data (sized or unsized -- when unsized,
11273 input data is assumed to be null-terminated [double null-terminated for
11274 Unicode-format data], and on output the length is not stored anywhere),
11275 Lisp strings, Lisp buffers, lstreams, and opaque data objects. When the
11276 output is raw data, the result can be allocated either with @code{alloca()} or
11277 @code{malloc()}. (There is currently no provision for writing into a fixed
11278 buffer. If you want this, use @code{alloca()} output and then copy the data --
11279 but be careful with the size! Unless you are very sure of the encoding
11280 being used, upper bounds for the size are not in general computable.)
11281 The obvious restrictions on source and sink types apply (e.g. Lisp
11282 strings are a source and sink only for internal data).
11283
11284 All raw data outputted will contain an extra null byte (two bytes for
11285 Unicode -- currently, in fact, all output data, whether internal or
11286 external, is double-null-terminated, but you can't count on this; see
11287 below). This means that enough space is allocated to contain the extra
11288 nulls; however, these nulls are not reflected in the returned output
11289 size.
11290
11291 The most basic macros are TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT.
11292 These can be used to convert between any kinds of sources or sinks.
11293 However, 99% of conversions involve raw data or Lisp strings as both
11294 source and sink, and usually data is output as @code{alloca()} rather than
11295 @code{malloc()}. For this reason, convenience macros are defined for many types
11296 of conversions involving raw data and/or Lisp strings, especially when
11297 the output is an @code{alloca()}ed string. (When the destination is a
11298 Lisp_String, there are other functions that should be used instead --
11299 @code{build_ext_string()} and @code{make_ext_string()}, for example.) The convenience
11300 macros are of two types -- the older kind that store the result into a
11301 specified variable, and the newer kind that return the result. The newer
11302 kind of macros don't exist when the output is sized data, because that
11303 would have two return values. NOTE: All convenience macros are
11304 ultimately defined in terms of TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT.
11305 Thus, any comments below about the workings of these macros also apply to
11306 all convenience macros.
11307
11308 @example
11309 TO_EXTERNAL_FORMAT (source_type, source, sink_type, sink, codesys)
11310 TO_INTERNAL_FORMAT (source_type, source, sink_type, sink, codesys)
11311 @end example
11312
11313 Typical use is
11314
11315 @example
11316 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
11317 @end example
11318
11319 which means that the contents of the lisp string @var{str} are written
11320 to a malloc'ed memory area which will be pointed to by @var{ptr}, after the
11321 function returns. The conversion will be done using the @code{file-name}
11322 coding system (which will be controlled by the user indirectly by
11323 setting or binding the variable @code{file-name-coding-system}).
11324
11325 Some sources and sinks require two C variables to specify. We use
11326 some preprocessor magic to allow different source and sink types, and
11327 even different numbers of arguments to specify different types of
11328 sources and sinks.
11329
11330 So we can have a call that looks like
11331
11332 @example
11333 TO_INTERNAL_FORMAT (DATA, (ptr, len),
11334 MALLOC, (ptr, len),
11335 coding_system);
11336 @end example
11337
11338 The parenthesized argument pairs are required to make the
11339 preprocessor magic work.
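
The kind of preprocessor technique involved can be sketched in
isolation. Every name below is hypothetical and much simpler than the
real @code{dfc_} machinery: the type keyword is pasted onto a prefix to
select a per-type macro, and a parenthesized pair is taken apart by
writing a two-argument macro name directly in front of it.

```c
#include <stddef.h>
#include <string.h>

struct span
{
  const char *ptr;
  size_t len;
};

static struct span
make_span (const char *ptr, size_t len)
{
  struct span s;
  s.ptr = ptr;
  s.len = len;
  return s;
}

/* "PAIR_FIRST (a, b)" is an ordinary macro call, so writing
   "PAIR_FIRST pair" where pair expands to "(a, b)" picks out a. */
#define PAIR_FIRST(a, b) (a)
#define PAIR_SECOND(a, b) (b)

/* One unpacking macro per source type. */
#define SOURCE_DATA(pair) make_span (PAIR_FIRST pair, PAIR_SECOND pair)
#define SOURCE_C_STRING(p) make_span ((p), strlen (p))

/* Token pasting turns the type keyword into the unpacking macro. */
#define MAKE_SOURCE(type, src) SOURCE_##type (src)
```

With these definitions, @code{MAKE_SOURCE (DATA, (buf, 3))} and
@code{MAKE_SOURCE (C_STRING, str)} both expand to a call to
@code{make_span()}, even though one takes a pair and the other a single
argument. Without the parentheses around the pair, the preprocessor would
see three arguments to a two-argument macro and reject the call.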
11340
11341 NOTE: GC is inhibited during the entire operation of these macros. This
11342 is because frequently the data to be converted comes from strings but
11343 gets passed in as just DATA, and GC may move around the string data. If
11344 we didn't inhibit GC, there'd have to be a lot of messy recoding,
11345 alloca-copying of strings and other annoying stuff.
11346
11347 The source or sink can be specified in one of these ways:
11348
11349 @example
11350 DATA, (ptr, len), // input data is a fixed buffer of size len
11351 ALLOCA, (ptr, len), // output data is in a @code{ALLOCA()}ed buffer of size len
11352 MALLOC, (ptr, len), // output data is in a @code{malloc()}ed buffer of size len
11353 C_STRING_ALLOCA, ptr, // equivalent to ALLOCA (ptr, len_ignored) on output
11354 C_STRING_MALLOC, ptr, // equivalent to MALLOC (ptr, len_ignored) on output
11355 C_STRING, ptr, // equivalent to DATA, (ptr, strlen/wcslen (ptr))
11356 // on input (the Unicode version is used when correct)
11357 LISP_STRING, string, // input or output is a Lisp_Object of type string
11358 LISP_BUFFER, buffer, // output is written to (point) in lisp buffer
11359 LISP_LSTREAM, lstream, // input or output is a Lisp_Object of type lstream
11360 LISP_OPAQUE, object, // input or output is a Lisp_Object of type opaque
11361 @end example
11362
11363 When specifying the sink, use lvalues, since the macro will assign to them,
11364 except when the sink is an lstream or a lisp buffer.
11365
11366 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the resulting text is
11367 stored in a stack-allocated buffer, which is automatically freed on
11368 returning from the function. However, the sink types @code{MALLOC} and
11369 @code{C_STRING_MALLOC} return @code{xmalloc()}ed memory. The caller is responsible
11370 for freeing this memory using @code{xfree()}.
11371
11372 The macros accept the kinds of sources and sinks appropriate for
11373 internal and external data representation. See the type_checking_assert
11374 macros below for the actual allowed types.
11375
11376 Since some sources and sinks use one argument (a Lisp_Object) to
11377 specify them, while others take a (pointer, length) pair, we use
11378 some C preprocessor trickery to allow pair arguments to be specified
11379 by parenthesizing them, as in the examples above.
11380
11381 Anything prefixed by dfc_ (`data format conversion') is private.
11382 They are only used to implement these macros.
11383
11384 [[Using C_STRING* is appropriate for using with external APIs that
11385 take null-terminated strings. For internal data, we should try to
11386 be '\0'-clean - i.e. allow arbitrary data to contain embedded '\0'.
11387
11388 Sometime in the future we might allow output to C_STRING_ALLOCA or
11389 C_STRING_MALLOC _only_ with @code{TO_EXTERNAL_FORMAT()}, not
11390 @code{TO_INTERNAL_FORMAT()}.]]
11391
11392 The above comments are not true. Frequently (most of the time, in
11393 fact), external strings come as zero-terminated entities, where the
11394 zero-termination is the only way to find out the length. Even in
11395 cases where you can get the length, most of the time the system will
11396 still use the null to signal the end of the string, and there will
11397 still be no way to either send in or receive a string with embedded
11398 nulls. In such situations, it's pointless to track the length
11399 because null bytes can never be in the string. We have a lot of
11400 operations that make it easy to operate on zero-terminated strings,
11401 and forcing the user to deal with the length everywhere would only
11402 make the code uglier and more complicated, for no gain. --ben
11403
11404 There is no problem using the same lvalue for source and sink.
11405
11406 Also, when pointers are required, the code (currently at least) is
11407 lax and allows any pointer types, either in the source or the sink.
11408 This makes it possible, e.g., to deal with internal format data held
11409 in char *'s or external format data held in WCHAR * (i.e. Unicode).
11410
11411 Finally, whenever storage allocation is called for, extra space is
11412 allocated for a terminating zero, and such a zero is stored in the
11413 appropriate place, regardless of whether the source data was
11414 specified using a length or was specified as zero-terminated. This
11415 allows you to freely pass the resulting data, no matter how
11416 obtained, to a routine that expects zero termination (modulo, of
11417 course, that any embedded zeros in the resulting text will cause
11418 truncation). In fact, currently two terminating zeros are allocated
11419 and stored after the data result. This is to allow for the
11420 possibility of storing a Unicode value on output, which needs the
11421 two zeros. Currently, however, the two zeros are stored regardless
11422 of whether the conversion is internal or external and regardless of
11423 whether the external coding system is in fact Unicode. This
11424 behavior may change in the future, and you cannot rely on this --
11425 the most you can rely on is that sink data in Unicode format will
11426 have two terminating nulls, which combine to form one Unicode null
11427 character.
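
The termination rule can be illustrated with a small standalone
function. This is a sketch of the idea only, using plain @code{malloc()}
and a hypothetical function name rather than the actual conversion code:
two zero bytes are stored after the data, but the reported length does
not include them.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy LEN bytes of DATA into fresh storage and append two zero bytes.
   *OUT_LEN receives LEN, not counting the terminators.  Returns NULL
   on failure; the caller frees the result. */
static unsigned char *
store_with_terminator (const void *data, size_t len, size_t *out_len)
{
  unsigned char *buf;

  if (len > SIZE_MAX - 2)
    return NULL;                /* len + 2 would overflow */
  buf = malloc (len + 2);
  if (buf == NULL)
    return NULL;
  if (len > 0)
    memcpy (buf, data, len);
  buf[len] = 0;                 /* terminator for byte-oriented callers */
  buf[len + 1] = 0;             /* second zero for 16-bit-unit callers */
  *out_len = len;
  return buf;
}
```

A caller that passes the result to @code{strlen()} sees only the text up
to the first embedded zero, which is the truncation mentioned above; a
caller that uses the returned length sees all of the data.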
11428
11429 NOTE: You might ask, why are these not written as functions that
11430 @strong{RETURN} the converted string, since that would allow them to be used
11431 much more conveniently, without having to constantly declare temporary
11432 variables? The answer is that in fact I originally did write the
11433 routines that way, but that required either
11434
11435 @itemize @bullet
11436 @item
11437 (a) calling @code{alloca()} inside of a function call, or
11438 @item
11439 (b) using expressions separated by commas and a global temporary variable, or
11440 @item
11441 (c) using the GCC extension (@{ ... @}).
11442 @end itemize
11443
11444 Turned out that all of the above had bugs, all caused by GCC (hence the
11445 comments about "those GCC wankers" and "ream gcc up the ass"). As for
11446 (a), some versions of GCC (especially on Intel platforms) had
11447 buggy implementations of @code{alloca()} that couldn't handle being called
11448 inside of a function call -- they just decremented the stack right in the
11449 middle of pushing args. Oops, crash with stack trashing, very bad. (b)
11450 was an attempt to fix (a), and that led to further GCC crashes, esp. when
11451 you had two such calls in a single subexpression, because GCC couldn't be
11452 counted upon to follow even a minimally reasonable order of execution.
11453 True, you can't count on one argument being evaluated before another, but
11454 GCC would actually interleave them so that the temp var got stomped on by
11455 one while the other was accessing it. So I tried (c), which was
11456 problematic because that GCC extension has more bugs in it than a
11457 termite's nest.
11458
11459 So reluctantly I converted to the current way. Now, that was a while ago
11460 (c. 1994), and it appears that the bug involving alloca in function calls
11461 has long since been fixed. More recently, I defined the new-dfc routines
11462 down below, which DO allow exactly such convenience of returning your
11463 args rather than store them in temp variables, and I also wrote a
11464 configure check to see whether @code{alloca()} causes crashes inside of function
11465 calls, and if so use the portable @code{alloca()} implementation in alloca.c.
11466 If you define TEST_NEW_DFC, the old routines get written in terms of the
11467 new ones, and I've had a beta put out with this on and it appears to
11468 cause no problems -- so we should consider
11469 switching, and feel no compunctions about writing further such
11470 function-like @code{alloca()} routines in lieu of statement-like ones. --ben
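
For readers unfamiliar with the statement-like style, here is a minimal
sketch of it. The macro @code{STACK_COPY_Z} and the function
@code{length_of_prefix_copy} are hypothetical names invented for this
illustration, and the sketch assumes a platform that declares
@code{alloca()} in @file{alloca.h}. It is not the code of the DFC macros.

```c
#include <alloca.h>   /* assumption: alloca() is declared here */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statement-like macro.  alloca() is called in a
   statement of its own, never inside the argument list of a function
   call, and the result is stored in the caller-supplied lvalue SINK.
   The storage belongs to the calling function's stack frame, so this
   has to be a macro rather than a function. */
#define STACK_COPY_Z(src, len, sink)                            \
  do                                                            \
    {                                                           \
      size_t stack_copy_len_ = (len);                           \
      char *stack_copy_buf_ = alloca (stack_copy_len_ + 1);     \
      memcpy (stack_copy_buf_, (src), stack_copy_len_);         \
      stack_copy_buf_[stack_copy_len_] = '\0';                  \
      (sink) = stack_copy_buf_;                                 \
    }                                                           \
  while (0)

/* Example caller.  It has to declare the temporary TMP to receive the
   result, which is the inconvenience discussed in the note above. */
size_t
length_of_prefix_copy (const char *text, size_t n)
{
  char *tmp;
  assert (n <= strlen (text));
  STACK_COPY_Z (text, n, tmp);
  return strlen (tmp);
}
```

A function-like variant would instead be used as an expression, with the
@code{alloca()} call appearing inside the argument list of the enclosing
call, which is exactly the case the old compilers mishandled.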
11471
11472 @node The Eistring API, , The DFC API, Internal Text API's
11473 @subsection The Eistring API
11474 @cindex Eistring API
11475 @cindex API, Eistring
11476
11477 (This API is currently under-used.) When doing simple things with
11478 internal text, the basic internal-format API's are enough. But
11479 things like deleting or replacing a substring, concatenating various strings,
11480 etc. are difficult to do cleanly because of the allocation issues.
11481 The Eistring API is designed to deal with this, and provides a clean
11482 way of modifying and building up internal text. (Note that the former
11483 lack of this API has meant that some code uses Lisp strings to do
11484 similar manipulations, resulting in excess garbage and increased
11485 garbage collection.)
11486
11487 NOTE: The Eistring API is (or should be) Mule-correct even without
11488 an ASCII-compatible internal representation.
11489
11490 @example
11491 #### NOTE: This is a work in progress. Neither the API nor especially
11492 the implementation is finished.
11493
11494 NOTE: An Eistring is a structure that makes it easy to work with
11495 internally-formatted strings of data. It provides operations similar
11496 in feel to the standard @code{strcpy()}, @code{strcat()}, @code{strlen()}, etc., but
11497
11498 (a) it is Mule-correct
11499 (b) it does dynamic allocation so you never have to worry about size
11500 restrictions
11501 (c) it comes in an @code{ALLOCA()} variety (all allocation is stack-local,
11502 so there is no need to explicitly clean up) as well as a @code{malloc()}
11503 variety
11504 (d) it knows its own length, so it does not suffer from standard null
11505 byte brain-damage -- but it null-terminates the data anyway, so
11506 it can be passed to standard routines
11507 (e) it provides a much more powerful set of operations and knows about
11508 all the standard places where string data might reside: Lisp_Objects,
11509 other Eistrings, Ibyte * data with or without an explicit length,
11510 ASCII strings, Ichars, etc.
11511 (f) it provides easy operations to convert to/from externally-formatted
11512 data, and is easier to use than the standard TO_INTERNAL_FORMAT
11513 and TO_EXTERNAL_FORMAT macros. (An Eistring can store both the internal
11514 and external version of its data, but the external version is only
11515 initialized or changed when you call @code{eito_external()}.)
11516
11517 The idea is to make it as easy to write Mule-correct string manipulation
11518 code as it is to write normal string manipulation code. We also make
11519 the API sufficiently general that it can handle multiple internal data
11520 formats (e.g. some fixed-width optimizing formats and a default variable
11521 width format) and allows for @strong{ANY} data format we might choose in the
11522 future for the default format, including UCS2. (In other words, we can't
11523 assume that the internal format is ASCII-compatible and we can't assume
11524 it doesn't have embedded null bytes. We do assume, however, that any
11525 chosen format will have the concept of null-termination.) All of this is
11526 hidden from the user.
11527
11528 #### It is really too bad that we don't have a real object-oriented
11529 language, or at least a language with polymorphism!
11530
11531
11532 **********************************************
11533 * Declaration *
11534 **********************************************
11535
11536 To declare an Eistring, either put one of the following in the local
11537 variable section:
11538
11539 DECLARE_EISTRING (name);
11540 Declare a new Eistring and initialize it to the empty string. This
11541 is a standard local variable declaration and can go anywhere in the
11542 variable declaration section. NAME itself is declared as an
11543 Eistring *, and its storage declared on the stack.
11544
11545 DECLARE_EISTRING_MALLOC (name);
11546 Declare and initialize a new Eistring, which uses @code{malloc()}ed
11547 instead of @code{ALLOCA()}ed data. This is a standard local variable
11548 declaration and can go anywhere in the variable declaration
11549 section. Once you initialize the Eistring, you will have to free
11550 it using @code{eifree()} to avoid memory leaks. You will need to use this
11551 form if you are passing an Eistring to any function that modifies
11552 it (otherwise, the modified data may be in stack space and get
11553 overwritten when the function returns).
11554
11555 or use
11556
11557 Eistring ei;
11558 void eiinit (Eistring *ei);
11559 void eiinit_malloc (Eistring *einame);
11560 If you need to put an Eistring elsewhere than in a local variable
11561 declaration (e.g. in a structure), declare it as shown and then
11562 call one of the init macros.
11563
11564 Also note:
11565
11566 void eifree (Eistring *ei);
11567 If you declared an Eistring to use @code{malloc()} to hold its data,
11568 or converted it to the heap using @code{eito_malloc()}, then this
11569 releases any data in it and afterwards resets the Eistring
11570 using @code{eiinit_malloc()}. Otherwise, it just resets the Eistring
11571 using @code{eiinit()}.
11572
11573
11574 **********************************************
11575 * Conventions *
11576 **********************************************
11577
11578 - The names of the functions have been chosen, where possible, to
11579 match the names of @code{str*()} functions in the standard C API.
11580 -
11581
11582
11583 **********************************************
11584 * Initialization *
11585 **********************************************
11586
11587 void eireset (Eistring *eistr);
11588 Initialize the Eistring to the empty string.
11589
11590 void eicpy_* (Eistring *eistr, ...);
11591 Initialize the Eistring from somewhere:
11592
11593 void eicpy_ei (Eistring *eistr, Eistring *eistr2);
11594 ... from another Eistring.
11595 void eicpy_lstr (Eistring *eistr, Lisp_Object lisp_string);
11596 ... from a Lisp_Object string.
11597 void eicpy_ch (Eistring *eistr, Ichar ch);
11598 ... from an Ichar (this can be a conventional C character).
11599
11600 void eicpy_lstr_off (Eistring *eistr, Lisp_Object lisp_string,
11601 Bytecount off, Charcount charoff,
11602 Bytecount len, Charcount charlen);
11603 ... from a section of a Lisp_Object string.
11604 void eicpy_lbuf (Eistring *eistr, Lisp_Object lisp_buf,
11605 Bytecount off, Charcount charoff,
11606 Bytecount len, Charcount charlen);
11607 ... from a section of a Lisp_Object buffer.
11608 void eicpy_raw (Eistring *eistr, const Ibyte *data, Bytecount len);
11609 ... from raw internal-format data in the default internal format.
11610 void eicpy_rawz (Eistring *eistr, const Ibyte *data);
11611 ... from raw internal-format data in the default internal format
11612 that is "null-terminated" (the meaning of this depends on the nature
11613 of the default internal format).
11614 void eicpy_raw_fmt (Eistring *eistr, const Ibyte *data, Bytecount len,
11615 Internal_Format intfmt, Lisp_Object object);
11616 ... from raw internal-format data in the specified format.
11617 void eicpy_rawz_fmt (Eistring *eistr, const Ibyte *data,
11618 Internal_Format intfmt, Lisp_Object object);
11619 ... from raw internal-format data in the specified format that is
11620 "null-terminated" (the meaning of this depends on the nature of
11621 the specific format).
11622 void eicpy_c (Eistring *eistr, const Ascbyte *c_string);
11623 ... from an ASCII null-terminated string. Non-ASCII characters in
11624 the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
11625 void eicpy_c_len (Eistring *eistr, const Ascbyte *c_string, len);
11626 ... from an ASCII string, with length specified. Non-ASCII characters
11627 in the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
11628 void eicpy_ext (Eistring *eistr, const Extbyte *extdata,
11629 Lisp_Object codesys);
11630 ... from external null-terminated data, with coding system specified.
11631 void eicpy_ext_len (Eistring *eistr, const Extbyte *extdata,
11632 Bytecount extlen, Lisp_Object codesys);
11633 ... from external data, with length and coding system specified.
11634 void eicpy_lstream (Eistring *eistr, Lisp_Object lstream);
11635 ... from an lstream; reads data till eof. Data must be in default
11636 internal format; otherwise, interpose a decoding lstream.
11637
11638
11639 **********************************************
11640 * Getting the data out of the Eistring *
11641 **********************************************
11642
11643 Ibyte *eidata (Eistring *eistr);
11644 Return a pointer to the raw data in an Eistring. This is NOT
11645 a copy.
11646
11647 Lisp_Object eimake_string (Eistring *eistr);
11648 Make a Lisp string out of the Eistring.
11649
11650 Lisp_Object eimake_string_off (Eistring *eistr,
11651 Bytecount off, Charcount charoff,
11652 Bytecount len, Charcount charlen);
11653 Make a Lisp string out of a section of the Eistring.
11654
11655 void eicpyout_alloca (Eistring *eistr, LVALUE: Ibyte *ptr_out,
11656 LVALUE: Bytecount len_out);
11657 Make an @code{ALLOCA()} copy of the data in the Eistring, using the
11658 default internal format. Due to the nature of @code{ALLOCA()}, this
11659 must be a macro, with all lvalues passed in as parameters.
11660 (More specifically, not all compilers correctly handle using
11661 @code{ALLOCA()} as the argument to a function call -- GCC on x86
11662 didn't use to, for example.) A pointer to the @code{ALLOCA()}ed data
11663 is stored in PTR_OUT, and the length of the data (not including
11664 the terminating zero) is stored in LEN_OUT.
11665
11666 void eicpyout_alloca_fmt (Eistring *eistr, LVALUE: Ibyte *ptr_out,
11667 LVALUE: Bytecount len_out,
11668 Internal_Format intfmt, Lisp_Object object);
11669 Like @code{eicpyout_alloca()}, but converts to the specified internal
11670 format. (No formats other than FORMAT_DEFAULT are currently
11671 implemented, and you get an assertion failure if you try.)
11672
11673 Ibyte *eicpyout_malloc (Eistring *eistr, Bytecount *intlen_out);
11674 Make a @code{malloc()} copy of the data in the Eistring, using the
11675 default internal format. This is a real function. No lvalues
11676 passed in. Returns the new data, and stores the length (not
11677 including the terminating zero) using INTLEN_OUT, unless it's
11678 a NULL pointer.
11679
11680 Ibyte *eicpyout_malloc_fmt (Eistring *eistr, Internal_Format intfmt,
11681 Bytecount *intlen_out, Lisp_Object object);
11682 Like @code{eicpyout_malloc()}, but converts to the specified internal
11683 format. (No formats other than FORMAT_DEFAULT are currently
11684 implemented, and you get an assertion failure if you try.)
11685
11686
11687 **********************************************
11688 * Moving to the heap *
11689 **********************************************
11690
11691 void eito_malloc (Eistring *eistr);
11692 Move this Eistring to the heap. Its data will be stored in a
11693 @code{malloc()}ed block rather than the stack. Subsequent changes to
11694 this Eistring will @code{realloc()} the block as necessary. Use this
11695 when you want the Eistring to remain in scope past the end of
11696 this function call. You will have to manually free the data
11697 in the Eistring using @code{eifree()}.
11698
11699 void eito_alloca (Eistring *eistr);
11700 Move this Eistring back to the stack, if it was moved to the
11701 heap with @code{eito_malloc()}. This will automatically free any
11702 heap-allocated data.
11703
11704
11705
11706 **********************************************
11707 * Retrieving the length *
11708 **********************************************
11709
11710 Bytecount eilen (Eistring *eistr);
11711 Return the length of the internal data, in bytes. See also
11712 @code{eiextlen()}, below.
11713 Charcount eicharlen (Eistring *eistr);
11714 Return the length of the internal data, in characters.
11715
11716
11717 **********************************************
11718 * Working with positions *
11719 **********************************************
11720
11721 Bytecount eicharpos_to_bytepos (Eistring *eistr, Charcount charpos);
11722 Convert a char offset to a byte offset.
11723 Charcount eibytepos_to_charpos (Eistring *eistr, Bytecount bytepos);
11724 Convert a byte offset to a char offset.
11725 Bytecount eiincpos (Eistring *eistr, Bytecount bytepos);
11726 Increment the given position by one character.
11727 Bytecount eiincpos_n (Eistring *eistr, Bytecount bytepos, Charcount n);
11728 Increment the given position by N characters.
11729 Bytecount eidecpos (Eistring *eistr, Bytecount bytepos);
11730 Decrement the given position by one character.
11731 Bytecount eidecpos_n (Eistring *eistr, Bytecount bytepos, Charcount n);
11732 Decrement the given position by N characters.
11733
11734
11735 **********************************************
11736 * Getting the character at a position *
11737 **********************************************
11738
11739 Ichar eigetch (Eistring *eistr, Bytecount bytepos);
11740 Return the character at a particular byte offset.
11741 Ichar eigetch_char (Eistring *eistr, Charcount charpos);
11742 Return the character at a particular character offset.
11743
11744
11745 **********************************************
11746 * Setting the character at a position *
11747 **********************************************
11748
11749 Ichar eisetch (Eistring *eistr, Bytecount bytepos, Ichar chr);
11750 Set the character at a particular byte offset.
11751 Ichar eisetch_char (Eistring *eistr, Charcount charpos, Ichar chr);
11752 Set the character at a particular character offset.
11753
11754
11755 **********************************************
11756 * Concatenation *
11757 **********************************************
11758
11759 void eicat_* (Eistring *eistr, ...);
11760 Concatenate onto the end of the Eistring, with data coming from the
11761 same places as above:
11762
11763 void eicat_ei (Eistring *eistr, Eistring *eistr2);
11764 ... from another Eistring.
11765 void eicat_c (Eistring *eistr, Ascbyte *c_string);
11766 ... from an ASCII null-terminated string. Non-ASCII characters in
11767 the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
11768 void eicat_raw (Eistring *eistr, const Ibyte *data, Bytecount len);
11769 ... from raw internal-format data in the default internal format.
11770 void eicat_rawz (Eistring *eistr, const Ibyte *data);
11771 ... from raw internal-format data in the default internal format
11772 that is "null-terminated" (the meaning of this depends on the nature
11773 of the default internal format).
11774 void eicat_lstr (Eistring *eistr, Lisp_Object lisp_string);
11775 ... from a Lisp_Object string.
11776 void eicat_ch (Eistring *eistr, Ichar ch);
11777 ... from an Ichar.
11778
11779 (All except the first variety are convenience functions.
11780 In the general case, create another Eistring from the source.)
11781
11782
11783 **********************************************
11784 * Replacement *
11785 **********************************************
11786
11787 void eisub_* (Eistring *eistr, Bytecount off, Charcount charoff,
11788 Bytecount len, Charcount charlen, ...);
11789 Replace a section of the Eistring, specifically:
11790
11791 void eisub_ei (Eistring *eistr, Bytecount off, Charcount charoff,
11792 Bytecount len, Charcount charlen, Eistring *eistr2);
11793 ... with another Eistring.
11794 void eisub_c (Eistring *eistr, Bytecount off, Charcount charoff,
11795 Bytecount len, Charcount charlen, Ascbyte *c_string);
11796 ... with an ASCII null-terminated string. Non-ASCII characters in
11797 the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined).
11798 void eisub_ch (Eistring *eistr, Bytecount off, Charcount charoff,
11799 Bytecount len, Charcount charlen, Ichar ch);
11800 ... with an Ichar.
11801
11802 void eidel (Eistring *eistr, Bytecount off, Charcount charoff,
11803 Bytecount len, Charcount charlen);
11804 Delete a section of the Eistring.
11805
11806
11807 **********************************************
11808 * Converting to an external format *
11809 **********************************************
11810
11811 void eito_external (Eistring *eistr, Lisp_Object codesys);
11812 Convert the Eistring to an external format and store the result
11813 in the string. NOTE: Further changes to the Eistring will @strong{NOT}
11814 change the external data stored in the string. You will have to
11815 call @code{eito_external()} again in such a case if you want the external
11816 data.
11817
11818 Extbyte *eiextdata (Eistring *eistr);
11819 Return a pointer to the external data stored in the Eistring as
11820 a result of a prior call to @code{eito_external()}.
11821
11822 Bytecount eiextlen (Eistring *eistr);
11823 Return the length in bytes of the external data stored in the
11824 Eistring as a result of a prior call to @code{eito_external()}.
11825
11826
11827 **********************************************
11828 * Searching in the Eistring for a character *
11829 **********************************************
11830
11831 Bytecount eichr (Eistring *eistr, Ichar chr);
11832 Charcount eichr_char (Eistring *eistr, Ichar chr);
11833 Bytecount eichr_off (Eistring *eistr, Ichar chr, Bytecount off,
11834 Charcount charoff);
11835 Charcount eichr_off_char (Eistring *eistr, Ichar chr, Bytecount off,
11836 Charcount charoff);
11837 Bytecount eirchr (Eistring *eistr, Ichar chr);
11838 Charcount eirchr_char (Eistring *eistr, Ichar chr);
11839 Bytecount eirchr_off (Eistring *eistr, Ichar chr, Bytecount off,
11840 Charcount charoff);
11841 Charcount eirchr_off_char (Eistring *eistr, Ichar chr, Bytecount off,
11842 Charcount charoff);
11843
11844
11845 **********************************************
11846 * Searching in the Eistring for a string *
11847 **********************************************
11848
11849 Bytecount eistr_ei (Eistring *eistr, Eistring *eistr2);
11850 Charcount eistr_ei_char (Eistring *eistr, Eistring *eistr2);
11851 Bytecount eistr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off,
11852 Charcount charoff);
11853 Charcount eistr_ei_off_char (Eistring *eistr, Eistring *eistr2,
11854 Bytecount off, Charcount charoff);
11855 Bytecount eirstr_ei (Eistring *eistr, Eistring *eistr2);
11856 Charcount eirstr_ei_char (Eistring *eistr, Eistring *eistr2);
11857 Bytecount eirstr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off,
11858 Charcount charoff);
11859 Charcount eirstr_ei_off_char (Eistring *eistr, Eistring *eistr2,
11860 Bytecount off, Charcount charoff);
11861
11862 Bytecount eistr_c (Eistring *eistr, Ascbyte *c_string);
11863 Charcount eistr_c_char (Eistring *eistr, Ascbyte *c_string);
11864 Bytecount eistr_c_off (Eistring *eistr, Ascbyte *c_string, Bytecount off,
11865 Charcount charoff);
11866 Charcount eistr_c_off_char (Eistring *eistr, Ascbyte *c_string,
11867 Bytecount off, Charcount charoff);
11868 Bytecount eirstr_c (Eistring *eistr, Ascbyte *c_string);
11869 Charcount eirstr_c_char (Eistring *eistr, Ascbyte *c_string);
11870 Bytecount eirstr_c_off (Eistring *eistr, Ascbyte *c_string,
11871 Bytecount off, Charcount charoff);
11872 Charcount eirstr_c_off_char (Eistring *eistr, Ascbyte *c_string,
11873 Bytecount off, Charcount charoff);
11874
11875
11876 **********************************************
11877 * Comparison *
11878 **********************************************
11879
11880 int eicmp_* (Eistring *eistr, ...);
11881 int eicmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
11882 Bytecount len, Charcount charlen, ...);
11883 int eicasecmp_* (Eistring *eistr, ...);
11884 int eicasecmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
11885 Bytecount len, Charcount charlen, ...);
11886 int eicasecmp_i18n_* (Eistring *eistr, ...);
11887 int eicasecmp_i18n_off_* (Eistring *eistr, Bytecount off, Charcount charoff,
11888 Bytecount len, Charcount charlen, ...);
11889
11890 Compare the Eistring with the other data. Return value same as
11891 from strcmp. The `*' is either `ei' for another Eistring (in
11892 which case `...' is an Eistring), or `c' for a pure-ASCII string
11893 (in which case `...' is a pointer to that string). For anything
11894 more complex, first create an Eistring out of the source.
11895 Comparison is either simple (`eicmp_...'), ASCII case-folding
11896 (`eicasecmp_...'), or multilingual case-folding
11897 (`eicasecmp_i18n_...').
11898
11899
11900 More specifically, the prototypes are:
11901
11902 int eicmp_ei (Eistring *eistr, Eistring *eistr2);
11903 int eicmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff,
11904 Bytecount len, Charcount charlen, Eistring *eistr2);
11905 int eicasecmp_ei (Eistring *eistr, Eistring *eistr2);
11906 int eicasecmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff,
11907 Bytecount len, Charcount charlen, Eistring *eistr2);
11908 int eicasecmp_i18n_ei (Eistring *eistr, Eistring *eistr2);
11909 int eicasecmp_i18n_off_ei (Eistring *eistr, Bytecount off,
11910 Charcount charoff, Bytecount len,
11911 Charcount charlen, Eistring *eistr2);
11912
11913 int eicmp_c (Eistring *eistr, Ascbyte *c_string);
11914 int eicmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
11915 Bytecount len, Charcount charlen, Ascbyte *c_string);
11916 int eicasecmp_c (Eistring *eistr, Ascbyte *c_string);
11917 int eicasecmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
11918 Bytecount len, Charcount charlen,
11919 Ascbyte *c_string);
11920 int eicasecmp_i18n_c (Eistring *eistr, Ascbyte *c_string);
11921 int eicasecmp_i18n_off_c (Eistring *eistr, Bytecount off, Charcount charoff,
11922 Bytecount len, Charcount charlen,
11923 Ascbyte *c_string);
11924
11925
11926 **********************************************
11927 * Case-changing the Eistring *
11928 **********************************************
11929
11930 void eilwr (Eistring *eistr);
11931 Convert all characters in the Eistring to lowercase.
11932 void eiupr (Eistring *eistr);
11933 Convert all characters in the Eistring to uppercase.
11934 @end example
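
To make properties (b) and (d) from the list above concrete, here is a
toy, heap-only sketch of a string that grows on demand, knows its own
length, and still keeps a terminating zero after its data. Everything in
it (@code{Toystr}, @code{toystr_init}, @code{toystr_cat_raw},
@code{toystr_free}) is hypothetical and written for this illustration.
It does not show the real Eistring structure and is not Mule-aware.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the malloc() variety of an Eistring.  All names are
   hypothetical; the real structure and its fields are not shown here. */
typedef struct
{
  unsigned char *data;  /* once allocated, always zero-terminated */
  size_t len;           /* length in bytes, not counting the zero */
  size_t cap;           /* bytes allocated for DATA */
} Toystr;

void
toystr_init (Toystr *s)
{
  s->data = NULL;
  s->len = 0;
  s->cap = 0;
}

/* Append LEN bytes of raw data, growing the block as needed.
   Returns 0 on success, -1 if realloc() fails (S is left unchanged). */
int
toystr_cat_raw (Toystr *s, const unsigned char *raw, size_t len)
{
  size_t needed = s->len + len + 1;
  if (needed > s->cap)
    {
      size_t newcap = s->cap ? s->cap : 16;
      unsigned char *newdata;
      while (newcap < needed)
        newcap *= 2;
      newdata = realloc (s->data, newcap);
      if (newdata == NULL)
        return -1;
      s->data = newdata;
      s->cap = newcap;
    }
  if (len > 0)
    memcpy (s->data + s->len, raw, len);
  s->len += len;
  s->data[s->len] = 0;
  return 0;
}

/* Release the storage and reset to the empty state, in the spirit of
   what the text above says eifree() does for heap-based Eistrings. */
void
toystr_free (Toystr *s)
{
  free (s->data);
  toystr_init (s);
}
```

Because the length is stored explicitly, appending data that contains a
zero byte loses nothing, while a routine such as @code{strlen()} that
relies on the terminator would stop at the first embedded zero.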
11935
11936 @node Coding for Mule, CCL, Internal Text API's, Multilingual Support
11937 @section Coding for Mule
11938 @cindex coding for Mule
11939 @cindex Mule, coding for
11940
11941 Although Mule support is not compiled by default in XEmacs, many people
11942 are using it, and we consider it crucial that new code works correctly
11943 with multibyte characters. This is not hard; it is only a matter of
11944 following several simple user-interface guidelines. Even if you never
11945 compile with Mule, with a little practice you will find it quite easy
11946 to code Mule-correctly.
11947
11948 Note that these guidelines are not necessarily tied to the current Mule
11949 implementation; they are also a good idea to follow on the grounds of
11950 code generalization for future I18N work.
11951
11952 @menu
11953 * Character-Related Data Types::
11954 * Working With Character and Byte Positions::
11955 * Conversion to and from External Data::
11956 * General Guidelines for Writing Mule-Aware Code::
11957 * An Example of Mule-Aware Code::
11958 * Mule-izing Code::
11959 @end menu
11960
11961 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule
11962 @subsection Character-Related Data Types
11963 @cindex character-related data types
11964 @cindex data types, character-related
11965
11966 First, let's review the basic character-related datatypes used by
11967 XEmacs. Note that some of the separate @code{typedef}s are not
11968 mandatory, but they improve clarity of code a great deal, because one
11969 glance at the declaration can tell the intended use of the variable.
11970
11971 @table @code
11972 @item Ichar
11973 @cindex Ichar
11974 An @code{Ichar} holds a single Emacs character.
11975
11976 Obviously, the equality between characters and bytes is lost in the Mule
11977 world. Characters can be represented by one or more bytes in the
11978 buffer, and @code{Ichar} is a C type large enough to hold any
11979 character. (This currently isn't quite true for ISO 10646, which
11980 defines a character as a 31-bit non-negative quantity, while XEmacs
11981 characters are only 30 bits. This is irrelevant, unless you are
11982 considering using the ISO 10646 private groups to support really large
11983 private character sets---in particular, the Mule character set!---in
11984 a version of XEmacs using Unicode internally.)
11985
11986 Without Mule support, an @code{Ichar} is equivalent to an
11987 @code{unsigned char}. [[This doesn't seem to be true; @file{lisp.h}
11988 unconditionally @samp{typedef}s @code{Ichar} to @code{int}.]]
11989
11990 @item Ibyte
11991 @cindex Ibyte
11992 The data representing the text in a buffer or string is logically a set
11993 of @code{Ibyte}s.
11994
11995 XEmacs does not work with the same character formats all the time; when
11996 reading characters from the outside, it decodes them to an internal
11997 format, and likewise encodes them when writing. @code{Ibyte} (in fact
11998 @code{unsigned char}) is the basic unit of the XEmacs internal format
11999 for buffers and strings. An @code{Ibyte *} is the type that points at text
12000 encoded in the variable-width internal encoding.
12001
12002 One character can correspond to one or more @code{Ibyte}s. In the
12003 current Mule implementation, an ASCII character is represented by the
12004 same @code{Ibyte}, and other characters are represented by a sequence
12005 of two or more @code{Ibyte}s. (This will also be true of an
12006 implementation using UTF-8 as the internal encoding. In fact, only code
12007 that implements character code conversions and a very few macros used to
12008 implement motion by whole characters will notice the difference between
12009 UTF-8 and the Mule encoding.)
12010
12011 Without Mule support, there are exactly 256 characters, implicitly
12012 Latin-1, and each character is represented using one @code{Ibyte}, and
12013 there is a one-to-one correspondence between @code{Ibyte}s and
12014 @code{Ichar}s.
12015
12016 @item Charxpos
12017 @itemx Charbpos
12018 @itemx Charcount
12019 @cindex Charxpos
12020 @cindex Charbpos
12021 @cindex Charcount
12022 A @code{Charbpos} represents a character position in a buffer. A
12023 @code{Charcount} represents a number (count) of characters. Logically,
12024 subtracting two @code{Charbpos} values yields a @code{Charcount} value.
12025 When representing a character position in a string, we just use
12026 @code{Charcount} directly. The reason for having a separate typedef for
12027 buffer positions is that they are 1-based, whereas string positions are
12028 0-based and hence string counts and positions can be freely intermixed (a
12029 string position is equivalent to the count of characters from the
12030 beginning). When representing a character position that could be either
12031 in a buffer or string (for example, in the extent code), @code{Charxpos}
12032 is used. Although all of these are @code{typedef}ed to
12033 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
12034 it clear what sort of position is being used.
12035
12036 @code{Charxpos}, @code{Charbpos} and @code{Charcount} values are the
12037 only ones that are ever visible to Lisp.
12038
12039 @item Bytexpos
12040 @itemx Bytecount
12041 @cindex Bytebpos
12042 @cindex Bytecount
12043 A @code{Bytebpos} represents a byte position in a buffer. A
12044 @code{Bytecount} represents the distance between two positions, in
12045 bytes. Byte positions in strings use @code{Bytecount}, and for byte
12046 positions that can be either in a buffer or string, @code{Bytexpos} is
12047 used. The relationship between @code{Bytexpos}, @code{Bytebpos} and
12048 @code{Bytecount} is the same as the relationship between
12049 @code{Charxpos}, @code{Charbpos} and @code{Charcount}.
12050
12051 @item Extbyte
12052 @cindex Extbyte
12053 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
12054 which are equivalent to @code{char}. The distance between two
12055 @code{Extbyte}s is a @code{Bytecount}, since external text is a
12056 byte-by-byte encoding. Extbytes occur mainly at the transition point
12057 between internal text and external functions. XEmacs code should not,
12058 if it can possibly avoid it, do any actual manipulation using external
12059 text, since its format is completely unpredictable (it might not even be
12060 ASCII-compatible).
12061 @end table
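
The difference between a @code{Bytecount} and a @code{Charcount} can be
seen in a small sketch. It assumes UTF-8 as the internal encoding, which
the text above mentions only as a possible implementation; the Mule
encoding differs in detail. The typedefs are stand-ins declared so that
the sketch compiles on its own, and the function name
@code{utf8_bytecount_to_charcount} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the XEmacs typedefs, declared here only so that the
   sketch compiles on its own. */
typedef unsigned char Ibyte;
typedef long Bytecount;
typedef long Charcount;

/* Count the characters in LEN bytes of well-formed UTF-8.  Every byte
   except a continuation byte (bit pattern 10xxxxxx) starts a new
   character, so counting the non-continuation bytes gives the number
   of characters. */
Charcount
utf8_bytecount_to_charcount (const Ibyte *text, Bytecount len)
{
  Charcount chars = 0;
  Bytecount i;
  assert (len >= 0);
  for (i = 0; i < len; i++)
    if ((text[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}
```

For pure ASCII text the two counts coincide, which is why code that
confuses them often appears to work until it meets multibyte text.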
12062
12063 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule
12064 @subsection Working With Character and Byte Positions
12065 @cindex character and byte positions, working with
12066 @cindex byte positions, working with character and
12067 @cindex positions, working with character and byte
12068
12069 Now that we have defined the basic character-related types, we can look
12070 at the macros and functions designed for work with them and for
12071 conversion between them. Most of these macros are defined in
12072 @file{buffer.h}, and we don't discuss all of them here, but only the
12073 most important ones. Examining the existing code is the best way to
12074 learn about them.
12075
12076 @table @code
12077 @item MAX_ICHAR_LEN
12078 @cindex MAX_ICHAR_LEN
12079 This preprocessor constant is the maximum number of buffer bytes needed to
12080 represent an Emacs character in the variable-width internal encoding.
12081 It is useful when allocating temporary strings to hold a known number of
12082 characters. For instance:
12083
12084 @example
12085 @group
12086 @{
12087 Charcount cclen;
12088 ...
12089 @{
12090 /* Allocate place for @var{cclen} characters. */
12091 Ibyte *buf = (Ibyte *) alloca (cclen * MAX_ICHAR_LEN);
12092 ...
12093 @end group
12094 @end example
12095
12096 If you followed the previous section, you can guess that, logically,
12097 multiplying a @code{Charcount} value by @code{MAX_ICHAR_LEN} produces
12098 a @code{Bytecount} value.
12099
12100 In the current Mule implementation, @code{MAX_ICHAR_LEN} equals 4.
12101 Without Mule, it is 1. In a mature Unicode-based XEmacs, it will also
12102 be 4 (since all Unicode characters can be encoded in UTF-8 in 4 bytes or
12103 less), but some versions may use up to 6, in order to use the large
12104 private space provided by ISO 10646 to ``mirror'' the Mule code space.
12105
12106 @item itext_ichar
12107 @itemx set_itext_ichar
12108 @cindex itext_ichar
12109 @cindex set_itext_ichar
12110 The @code{itext_ichar} macro takes an @code{Ibyte} pointer and
12111 returns the @code{Ichar} stored at that position. If it were a
12112 function, its prototype would be:
12113
12114 @example
12115 Ichar itext_ichar (Ibyte *p);
12116 @end example
12117
12118 @code{set_itext_ichar} stores an @code{Ichar} to the specified byte
12119 position. It returns the number of bytes stored:
12120
12121 @example
12122 Bytecount set_itext_ichar (Ibyte *p, Ichar c);
12123 @end example
12124
12125 It is important to note that @code{set_itext_ichar} is safe only for
12126 appending a character at the end of a buffer, not for overwriting a
12127 character in the middle. This is because the width of characters
12128 varies, and @code{set_itext_ichar} cannot resize the string if it
12129 writes, say, a two-byte character where a single-byte character used to
12130 reside.
12131
12132 A typical use of @code{set_itext_ichar} can be demonstrated by this
12133 example, which copies characters from buffer @var{buf} to a temporary
12134 string of @code{Ibyte}s addressed by the pointer @var{p}:
12135
12136 @example
12137 @group
12138 @{
12139 Charbpos pos;
12140 for (pos = beg; pos < end; pos++)
12141 @{
12142 Ichar c = BUF_FETCH_CHAR (buf, pos);
12143 p += set_itext_ichar (p, c);
12144 @}
12145 @}
12146 @end group
12147 @end example
12148
12149 Note how @code{set_itext_ichar} is used to store the @code{Ichar}
12150 and advance the destination pointer at the same time.
12151
12152 @item INC_IBYTEPTR
12153 @itemx DEC_IBYTEPTR
12154 @cindex INC_IBYTEPTR
12155 @cindex DEC_IBYTEPTR
12156 These two macros increment and decrement an @code{Ibyte} pointer,
12157 respectively. They will adjust the pointer by the appropriate number of
12158 bytes according to the byte length of the character stored there. Both
12159 macros assume that the memory address is located at the beginning of a
12160 valid character.
12161
12162 Without Mule support, @code{INC_IBYTEPTR (p)} and @code{DEC_IBYTEPTR (p)}
12163 simply expand to @code{p++} and @code{p--}, respectively.
12164
12165 @item bytecount_to_charcount
12166 @cindex bytecount_to_charcount
12167 Given a pointer to a text string and a length in bytes, return the
12168 equivalent length in characters.
12169
12170 @example
12171 Charcount bytecount_to_charcount (Ibyte *p, Bytecount bc);
12172 @end example
12173
12174 @item charcount_to_bytecount
12175 @cindex charcount_to_bytecount
12176 Given a pointer to a text string and a length in characters, return the
12177 equivalent length in bytes.
12178
12179 @example
12180 Bytecount charcount_to_bytecount (Ibyte *p, Charcount cc);
12181 @end example
12182
12183 @item itext_n_addr
12184 @cindex itext_n_addr
12185 Return a pointer to the beginning of the character offset @var{cc} (in
12186 characters) from @var{p}.
12187
12188 @example
12189 Ibyte *itext_n_addr (Ibyte *p, Charcount cc);
12190 @end example
12191 @end table
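
The stepping and counting helpers described above can be sketched in portable C. This is a minimal illustration, not the XEmacs implementation: it assumes UTF-8 as a stand-in for the internal encoding (the real Mule encoding differs in detail), it assumes the input is valid, and every @code{sketch_} name and typedef below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the XEmacs typedefs; assumptions for this sketch only. */
typedef unsigned char Ibyte;
typedef ptrdiff_t Bytecount;
typedef ptrdiff_t Charcount;

/* Byte length of the character starting at P.  P must point at the
   first byte of a valid UTF-8 sequence (the stand-in encoding). */
Bytecount
sketch_char_len (const Ibyte *p)
{
  if (*p < 0x80) return 1;
  if (*p < 0xE0) return 2;
  if (*p < 0xF0) return 3;
  return 4;
}

/* Role of DEC_IBYTEPTR: back up over continuation bytes (10xxxxxx)
   until the first byte of the previous character is reached. */
const Ibyte *
sketch_dec_ptr (const Ibyte *p)
{
  do
    p--;
  while ((*p & 0xC0) == 0x80);
  return p;
}

/* Role of itext_n_addr: address of the character CC characters after P.
   Each loop step is what INC_IBYTEPTR does. */
const Ibyte *
sketch_n_addr (const Ibyte *p, Charcount cc)
{
  while (cc-- > 0)
    p += sketch_char_len (p);
  return p;
}

/* Role of charcount_to_bytecount. */
Bytecount
sketch_chars_to_bytes (const Ibyte *p, Charcount cc)
{
  return sketch_n_addr (p, cc) - p;
}

/* Role of bytecount_to_charcount. */
Charcount
sketch_bytes_to_chars (const Ibyte *p, Bytecount bc)
{
  const Ibyte *end = p + bc;
  Charcount cc = 0;
  while (p < end)
    {
      p += sketch_char_len (p);
      cc++;
    }
  assert (p == end);   /* BC must end on a character boundary. */
  return cc;
}
```

Note that both conversions walk the text character by character, so their cost grows with the length of the text. This is one reason to keep a count in the unit you will need rather than converting repeatedly.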
12192
12193 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule
12194 @subsection Conversion to and from External Data
12195 @cindex conversion to and from external data
12196 @cindex external data, conversion to and from
12197
12198 When an external function, such as a C library function, returns a
12199 @code{char} pointer, you should almost never treat it as @code{Ibyte *}.
12200 This is because these returned strings may contain 8-bit characters which
12201 can be misinterpreted by XEmacs, and cause a crash. Likewise, when
12202 exporting a piece of internal text to the outside world, you should
12203 always convert it to an appropriate external encoding, lest the internal
12204 stuff (such as the infamous \201 characters) leak out.
12205
12206 The interface for converting between the internal and external
12207 representations of text is the set of conversion macros defined in
12208 @file{buffer.h}. There used to be a fixed set of external formats
12209 supported by these macros, but now any coding system can be used with
12210 them. The coding system alias mechanism is used to create the
12211 following logical coding systems, which replace the fixed external
12212 formats. The @code{dontusethis-set-symbol-value-handler} mechanism was
12213 enhanced to make this possible (more work on that is needed).
12214
12215 Often useful coding systems:
12216
12217 @table @code
12218 @item Qbinary
12219 This is the simplest format and is what we use in the absence of a more
12220 appropriate format. This converts according to the @code{binary} coding
12221 system:
12222
12223 @enumerate a
12224 @item
12225 On input, bytes 0--255 are converted into (implicitly Latin-1)
12226 characters 0--255. A non-Mule XEmacs doesn't really know about
12227 different character sets and the fonts to display them, so the bytes can
12228 be treated as text in different 1-byte encodings by simply setting the
12229 appropriate fonts. So in a sense, non-Mule XEmacs is a multi-lingual
12230 editor if, for example, different fonts are used to display text in
12231 different buffers, faces, or windows. The specifier mechanism gives the
12232 user complete control over this kind of behavior.
12233 @item
12234 On output, characters 0--255 are converted into bytes 0--255 and other
12235 characters are converted into @samp{~}.
12236 @end enumerate
12237
12238 @item Qnative
12239 Format used for the external Unix environment---@code{argv[]}, stuff
12240 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
12241 This is encoded according to the encoding specified by the current locale.
12242 [[This is dangerous; current locale is user preference, and the system
12243 is probably going to be something else. Is there anything we can do
12244 about it?]]
12245
12246 @item Qfile_name
12247 Format used for filenames. This is normally the same as @code{Qnative},
12248 but the two should be distinguished for clarity and possible future
12249 separation -- and also because @code{Qfile_name} can be changed using either
12250 the @code{file-name-coding-system} or @code{pathname-coding-system} (now
12251 obsolete) variables.
12252
12253 @item Qctext
12254 Compound-text format. This is the standard X11 format used for data
12255 stored in properties, selections, and the like. This is an 8-bit
12256 no-lock-shift ISO2022 coding system. Unlike @code{Qfile_name}, which is
12257 a user-definable alias, this is a real coding system.
12258
12259 @item Qmswindows_tstr
12260 Used for external data in all MS Windows functions that are declared to
12261 accept data of type @code{LPTSTR} or @code{LPCTSTR}. This maps to either
12262 @code{Qmswindows_multibyte} (a locale-specific encoding, same as
12263 @code{Qnative}) or @code{Qmswindows_unicode}, depending on whether
12264 XEmacs is being run under Windows 9X or Windows NT/2000/XP.
12265 @end table
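
The behavior described for @code{Qbinary} above can be sketched as a pair of small C functions. This is a simplified illustration under stated assumptions, not the code in @file{file-coding.c}: characters are modelled as plain integers, and the @code{Ichar} typedef and both function names are stand-ins invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef int Ichar;   /* Stand-in for the XEmacs character type. */

/* Input direction: each external byte 0-255 becomes character 0-255. */
void
sketch_binary_decode (const unsigned char *src, size_t len, Ichar *dst)
{
  size_t i;
  assert (src != NULL && dst != NULL);
  for (i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Output direction: characters 0-255 become the same byte value;
   every other character is replaced by '~'. */
void
sketch_binary_encode (const Ichar *src, size_t len, unsigned char *dst)
{
  size_t i;
  assert (src != NULL && dst != NULL);
  for (i = 0; i < len; i++)
    dst[i] = (src[i] >= 0 && src[i] <= 255) ? (unsigned char) src[i] : '~';
}
```

The output direction is lossy: once a character outside the range 0--255 has been written as @samp{~}, the original cannot be recovered. This is why @code{Qbinary} is only a fallback for cases where no more appropriate coding system is known.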
12266
12267 Many other coding systems are provided by default.
12268
12269 There are two fundamental macros to convert between external and
12270 internal format, as well as various convenience macros to simplify the
12271 most common operations.
12272
12273 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
12274 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
12275 each of these receives are a source type, a source, a sink type, a sink,
12276 and a coding system (or a symbol naming a coding system).
12277
12278 A typical call looks like
12279 @example
12280 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
12281 @end example
12282
12283 which means that the contents of the lisp string @code{str} are written
12284 to a malloc'ed memory area which will be pointed to by @code{ptr}, after
12285 the function returns. The conversion will be done using the
12286 @code{file-name} coding system, which will be controlled by the user
12287 indirectly by setting or binding the variable
12288 @code{file-name-coding-system}.
12289
12290 Some sources and sinks require two C variables to specify. We use some
12291 preprocessor magic to allow different source and sink types, and even
12292 different numbers of arguments to specify different types of sources and
12293 sinks.
12294
12295 So we can have a call that looks like
12296 @example
12297 TO_INTERNAL_FORMAT (DATA, (ptr, len),
12298 MALLOC, (ptr, len),
12299 coding_system);
12300 @end example
12301
12302 The parenthesized argument pairs are required to make the preprocessor
12303 magic work.
12304
12305 Here are the different source and sink types:
12306
12307 @table @code
12308 @item @code{DATA, (ptr, len),}
12309 input data is a fixed buffer of size @var{len} at address @var{ptr}
12310 @item @code{ALLOCA, (ptr, len),}
12311 output data is placed in an @code{alloca()}ed buffer of size @var{len} pointed to by @var{ptr}
12312 @item @code{MALLOC, (ptr, len),}
12313 output data is in a @code{malloc()}ed buffer of size @var{len} pointed to by @var{ptr}
12314 @item @code{C_STRING_ALLOCA, ptr,}
12315 equivalent to @code{ALLOCA (ptr, len_ignored)} on output.
12316 @item @code{C_STRING_MALLOC, ptr,}
12317 equivalent to @code{MALLOC (ptr, len_ignored)} on output
12318 @item @code{C_STRING, ptr,}
12319 equivalent to @code{DATA, (ptr, strlen/wcslen (ptr))} on input
12320 @item @code{LISP_STRING, string,}
12321 input or output is a Lisp_Object of type string
12322 @item @code{LISP_BUFFER, buffer,}
12323 output is written to @code{(point)} in lisp buffer @var{buffer}
12324 @item @code{LISP_LSTREAM, lstream,}
12325 input or output is a Lisp_Object of type lstream
12326 @item @code{LISP_OPAQUE, object,}
12327 input or output is a Lisp_Object of type opaque
12328 @end table
12329
12330 A source type of @code{C_STRING} or a sink type of
12331 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate where
12332 the external API is not '\0'-byte-clean -- i.e. it expects strings to be
12333 terminated with a null byte. For external APIs that are in fact
12334 '\0'-byte-clean, we should of course not use these.
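
The reason for this distinction can be shown in plain C: a null-terminated view of a buffer stops at the first null byte, so data containing embedded nulls must be passed as sized data instead. The two helper names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes a C_STRING-style source would see: everything up
   to, but not including, the first null byte. */
size_t
sketch_c_string_extent (const char *ptr)
{
  assert (ptr != NULL);
  return strlen (ptr);
}

/* Number of bytes of the sized buffer (PTR, LEN) that a
   null-terminated view would silently drop, counting the first
   null byte itself.  Zero means the buffer has no embedded null. */
size_t
sketch_bytes_lost (const char *ptr, size_t len)
{
  const char *nul;
  assert (ptr != NULL);
  nul = memchr (ptr, '\0', len);
  return nul ? len - (size_t) (nul - ptr) : 0;
}
```

For a five-byte buffer holding @samp{ab}, a null byte, and @samp{cd}, the null-terminated view sees only two bytes, while the sized view keeps all five.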
12335
12336 The sinks to be specified must be lvalues, unless they are the lisp
12337 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
12338
12339 There is no problem using the same lvalue for source and sink.
12340
12341 Garbage collection is inhibited during these conversion operations, so
12342 it is OK to pass in data from Lisp strings using @code{XSTRING_DATA}.
12343
12344 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
12345 resulting text is stored in a stack-allocated buffer, which is
12346 automatically freed on returning from the function. However, the sink
12347 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
12348 memory. The caller is responsible for freeing this memory using
12349 @code{xfree()}.
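
The ownership rule for the @code{MALLOC} sinks can be illustrated without the real macros. In this sketch, @code{sketch_convert_malloc} is a hypothetical stand-in that merely copies its input where a real conversion would take place, and the standard @code{malloc()} and @code{free()} play the roles of @code{xmalloc()} and @code{xfree()}.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a conversion with a C_STRING_MALLOC sink: the result
   lives on the heap and is owned by the caller.  The conversion
   itself is replaced by a plain copy. */
char *
sketch_convert_malloc (const char *src)
{
  size_t len;
  char *out;

  assert (src != NULL);
  len = strlen (src);
  out = malloc (len + 1);
  if (out == NULL)
    abort ();                  /* Sketch only: no recovery attempted. */
  memcpy (out, src, len + 1);  /* Copies the terminating null too. */
  return out;
}

/* Caller pattern: obtain the result, use it, free it exactly once. */
size_t
sketch_use_and_free (const char *src)
{
  char *ext = sketch_convert_malloc (src);
  size_t n = strlen (ext);     /* ... hand EXT to the external API ... */
  free (ext);
  return n;
}
```

By contrast, a result placed in @code{alloca()}ed storage must not be returned to the caller or stored anywhere that outlives the current function, since the storage disappears when the function returns.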
12350
12351 Note that it doesn't make sense for @code{LISP_STRING} to be a source
12352 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
12353 You'll get an assertion failure if you try.
12354
12355 99% of conversions involve raw data or Lisp strings as both source and
12356 sink, and usually the output goes to @code{alloca()}ed memory, or sometimes
12357 to @code{xmalloc()}ed memory. For this reason, convenience macros are defined for
12358 many types of conversions involving raw data and/or Lisp strings,
12359 especially when the output is an @code{alloca()}ed string. (When the
12360 destination is a Lisp string, there are other functions that should be
12361 used instead -- @code{build_ext_string()} and @code{make_ext_string()},
12362 for example.) The convenience macros are of two types -- the older kind
12363 that store the result into a specified variable, and the newer kind that
12364 return the result. The newer kind of macros don't exist when the output
12365 is sized data, because that would have two return values. NOTE: All
12366 convenience macros are ultimately defined in terms of
12367 @code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}. Thus, any
12368 comments above about the workings of these macros also apply to all
12369 convenience macros.
12370
12371 A typical old-style convenience macro is
12372
12373 @example
12374 C_STRING_TO_EXTERNAL (in, out, codesys);
12375 @end example
12376
12377 This is equivalent to
12378
12379 @example
12380 TO_EXTERNAL_FORMAT (C_STRING, in, C_STRING_ALLOCA, out, codesys);
12381 @end example
12382
12383 but is easier to write and somewhat clearer, since it clearly identifies
12384 the arguments without the clutter of having the preprocessor types mixed
12385 in.
12386
12387 The new-style equivalent is @code{NEW_C_STRING_TO_EXTERNAL (src,
12388 codesys)}, which @emph{returns} the converted data (still in
12389 @code{alloca()} space). This is far more convenient for most
12390 operations.
12391
12392 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
12393 @subsection General Guidelines for Writing Mule-Aware Code
12394 @cindex writing Mule-aware code, general guidelines for
12395 @cindex Mule-aware code, general guidelines for writing
12396 @cindex code, general guidelines for writing Mule-aware
12397
12398 This section contains some general guidance on how to write Mule-aware
12399 code, as well as some pitfalls you should avoid.
12400
12401 @table @emph
12402 @item Never use @code{char} and @code{char *}.
12403 In XEmacs, the use of @code{char} and @code{char *} is almost always a
12404 mistake. If you want to manipulate an Emacs character from ``C'', use
12405 @code{Ichar}. If you want to examine a specific octet in the internal
12406 format, use @code{Ibyte}. If you want a Lisp-visible character, use a
12407 @code{Lisp_Object} and @code{make_char}. If you want a pointer to move
12408 through the internal text, use @code{Ibyte *}. Also note that you
12409 almost certainly do not need @code{Ichar *}. Other typedefs to clarify
12410 the use of @code{char} are @code{Char_ASCII}, @code{Char_Binary},
12411 @code{UChar_Binary}, and @code{CIbyte}.
12412
12413 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, @code{Charbpos} and @code{Bytebpos}.
12414 The whole point of using different types is to avoid confusion about the
12415 use of certain variables. Lest this effect be nullified, you need to be
12416 careful about using the right types.
12417
12418 @item Always convert external data
12419 It is extremely important to always convert external data, because
12420 XEmacs can crash if unexpected 8-bit sequences are copied to its internal
12421 buffers literally.
12422
12423 This means that when a system function, such as @code{readdir}, returns
12424 a string, you normally need to convert it using one of the conversion macros
12425 described in the previous section, before passing it further to Lisp.
12426
12427 Actually, most of the basic system functions that accept '\0'-terminated
12428 string arguments, like @code{stat()} and @code{open()}, have
12429 @strong{encapsulated} equivalents that do the internal to external
12430 conversion themselves. The encapsulated equivalents have a @code{qxe_}
12431 prefix and have string arguments of type @code{Ibyte *}, and you can
12432 pass internally encoded data to them, often from a Lisp string using
12433 @code{XSTRING_DATA}. (A better design might be to provide versions that
12434 accept Lisp strings directly.) [[Really? Then they'd either take
12435 @code{Lisp_Object}s and need to check type, or they'd take
12436 @code{Lisp_String}s, and violate the rules about passing any of the
12437 specific Lisp types.]]
12438
12439 Also note that many internal functions, such as @code{make_string},
12440 accept Ibytes, which removes the need for them to convert the data they
12441 receive. This increases efficiency because that way external data needs
12442 to be decoded only once, when it is read. After that, it is passed
12443 around in internal format.
12444
12445 @item Do all work in internal format
12446 External-formatted data is completely unpredictable in its format. It
12447 may be fixed-width Unicode (not even ASCII compatible); it may be a
12448 modal encoding, in
12449 which case some occurrences of (e.g.) the slash character may be part of
12450 two-byte Asian-language characters, and a naive attempt to split apart a
12451 pathname by slashes will fail; etc. Internal-format text should be
12452 converted to external format only at the point where an external API is
12453 actually called, and the first thing done after receiving
12454 external-format text from an external API should be to convert it to
12455 internal text.
12456 @end table
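
The pathname pitfall mentioned above can be made concrete with ISO-2022-JP, a modal encoding in which @samp{ESC $ B} switches to two-byte JIS X 0208 characters and @samp{ESC ( B} switches back to ASCII. The sketch below is illustrative only: the function names are hypothetical, and the mode-aware version recognizes just these two escape sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Naive approach: every 0x2F byte is taken to be a slash. */
size_t
sketch_count_slashes_naive (const unsigned char *s, size_t len)
{
  size_t i, n = 0;
  assert (s != NULL);
  for (i = 0; i < len; i++)
    if (s[i] == '/')
      n++;
  return n;
}

/* Mode-aware approach: track the current mode and count only the
   slashes that occur while in ASCII mode. */
size_t
sketch_count_slashes_iso2022jp (const unsigned char *s, size_t len)
{
  size_t i = 0, n = 0;
  int two_byte = 0;

  assert (s != NULL);
  while (i < len)
    {
      if (s[i] == 0x1B && i + 2 < len)
        {
          if (s[i + 1] == '$' && s[i + 2] == 'B')
            { two_byte = 1; i += 3; continue; }
          if (s[i + 1] == '(' && s[i + 2] == 'B')
            { two_byte = 0; i += 3; continue; }
        }
      if (two_byte)
        i += 2;                /* Skip one whole two-byte character. */
      else
        {
          if (s[i] == '/')
            n++;
          i++;
        }
    }
  return n;
}
```

Given the ASCII text @samp{a/}, then one two-byte character whose second byte happens to be 0x2F, then the ASCII text @samp{/b}, the naive count is one too high. Working on internal-format text avoids the problem entirely, because no byte of a multi-byte character can be mistaken for an ASCII character there.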
12457
12458 @node An Example of Mule-Aware Code, Mule-izing Code, General Guidelines for Writing Mule-Aware Code, Coding for Mule
12459 @subsection An Example of Mule-Aware Code
12460 @cindex code, an example of Mule-aware
12461 @cindex Mule-aware code, an example of
12462
12463 As an example of Mule-aware code, we will analyze the @code{string}
12464 function, which conses up a Lisp string from the character arguments it
12465 receives. Here is the definition, pasted from @file{alloc.c}:
12466
12467 @example
12468 @group
12469 DEFUN ("string", Fstring, 0, MANY, 0, /*
12470 Concatenate all the argument characters and make the result a string.
12471 */
12472 (int nargs, Lisp_Object *args))
12473 @{
12474 Ibyte *storage = alloca_array (Ibyte, nargs * MAX_ICHAR_LEN);
12475 Ibyte *p = storage;
12476
12477 for (; nargs; nargs--, args++)
12478 @{
12479 Lisp_Object lisp_char = *args;
12480 CHECK_CHAR_COERCE_INT (lisp_char);
12481 p += set_itext_ichar (p, XCHAR (lisp_char));
12482 @}
12483 return make_string (storage, p - storage);
12484 @}
12485 @end group
12486 @end example
12487
12488 Now we can analyze the source line by line.
12489
12490 Obviously, the string will contain as many characters as there are
12491 arguments to the function. This is why we allocate @code{MAX_ICHAR_LEN}
12492 * @var{nargs} bytes on the stack, i.e. the worst-case number of bytes for
12493 @var{nargs} @code{Ichar}s to fit in the string.
12494
12495 Then, the loop checks that each element is a character, converting
12496 integers in the process. Like many other functions in XEmacs, this
12497 function silently accepts integers where characters are expected, for
12498 historical and compatibility reasons. In new code that does not need
12499 this leniency, @code{CHECK_CHAR} will also suffice. @code{XCHAR (lisp_char)}
12500 extracts the @code{Ichar} from the @code{Lisp_Object}, and
12501 @code{set_itext_ichar} stores it into @code{storage}, advancing @code{p} in
12502 the process.
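
The same allocate-for-the-worst-case, append, and measure pattern can be tried outside XEmacs. The sketch below is not the XEmacs code: it assumes UTF-8 as a stand-in for the internal encoding, @code{sketch_set_char} plays the role of @code{set_itext_ichar}, @code{SKETCH_MAX_CHAR_LEN} plays the role of @code{MAX_ICHAR_LEN}, and all of these names and typedefs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the XEmacs typedefs; assumptions for this sketch only. */
typedef unsigned char Ibyte;
typedef int Ichar;
typedef ptrdiff_t Bytecount;

#define SKETCH_MAX_CHAR_LEN 4   /* Worst-case bytes per character. */

/* Store C at P in the stand-in encoding (UTF-8) and return the number
   of bytes written, never more than SKETCH_MAX_CHAR_LEN. */
Bytecount
sketch_set_char (Ibyte *p, Ichar c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (Ibyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Ibyte) (0xC0 | (c >> 6));
      p[1] = (Ibyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Ibyte) (0xE0 | (c >> 12));
      p[1] = (Ibyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Ibyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Ibyte) (0xF0 | (c >> 18));
  p[1] = (Ibyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Ibyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Ibyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Same shape as the loop in Fstring.  STORAGE must have room for
   NCHARS * SKETCH_MAX_CHAR_LEN bytes.  Returns the bytes actually
   used, computed as a pointer difference. */
Bytecount
sketch_build_text (const Ichar *chars, int nchars, Ibyte *storage)
{
  Ibyte *p = storage;
  for (; nchars; nchars--, chars++)
    p += sketch_set_char (p, *chars);
  return p - storage;
}
```

The buffer is sized in characters times the worst case, but the length finally reported is the pointer difference, which is a byte count. Keeping these two quantities apart is exactly what the @code{Charcount} and @code{Bytecount} types are for.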
12503
12504 Other instructive examples of correct coding under Mule can be found all
12505 over the XEmacs code. For starters, I recommend
12506 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
12507 understood this section of the manual and studied the examples, you can
12508 proceed to write new Mule-aware code.
12509
12510 @node Mule-izing Code, , An Example of Mule-Aware Code, Coding for Mule
12511 @subsection Mule-izing Code
12512
12513 A lot of code is written without Mule in mind, and needs to be made
12514 Mule-correct or ``Mule-ized''. There is really no substitute for
12515 line-by-line analysis when doing this, but the following checklist can
12516 help:
12517
12518 @itemize @bullet
12519 @item
12520 Check all uses of @code{XSTRING_DATA}.
12521 @item
12522 Check all uses of @code{build_string} and @code{make_string}.
12523 @item
12524 Check all uses of @code{tolower} and @code{toupper}.
12525 @item
12526 Check object print methods.
12527 @item
12528 Check for use of functions such as @code{write_c_string},
12529 @code{write_fmt_string}, @code{stderr_out}, @code{stdout_out}.
12530 @item
12531 Check all occurrences of @code{char} and correct to one of the other
12532 typedefs described above.
12533 @item
12534 Check all existing uses of @code{TO_EXTERNAL_FORMAT},
12535 @code{TO_INTERNAL_FORMAT}, and any convenience macros (grep for
12536 @samp{EXTERNAL_TO}, @samp{TO_EXTERNAL}, and @samp{TO_SIZED_EXTERNAL}).
12537 @item
12538 In Windows code, string literals may need to be encapsulated with @code{XETEXT}.
12539 @end itemize
12540
12541 @node CCL, Modules for Internationalization, Coding for Mule, Multilingual Support
9424 @section CCL 12542 @section CCL
9425 @cindex CCL 12543 @cindex CCL
9426 12544
9427 @example 12545 @example
9428 CCL PROGRAM SYNTAX:
9429 CCL_PROGRAM := (CCL_MAIN_BLOCK
9430 [ CCL_EOF_BLOCK ])
9431
9432 CCL_MAIN_BLOCK := CCL_BLOCK
9433 CCL_EOF_BLOCK := CCL_BLOCK
9434
9435 CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
9436 STATEMENT :=
9437 SET | IF | BRANCH | LOOP | REPEAT | BREAK
9438 | READ | WRITE
9439
9440 SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
9441 | INT-OR-CHAR
9442
9443 EXPRESSION := ARG | (EXPRESSION OP ARG)
9444
9445 IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
9446 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
9447 LOOP := (loop STATEMENT [STATEMENT ...])
9448 BREAK := (break)
9449 REPEAT := (repeat)
9450 | (write-repeat [REG | INT-OR-CHAR | string])
9451 | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
9452 READ := (read REG) | (read REG REG)
9453 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
9454 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
9455 WRITE := (write REG) | (write REG REG)
9456 | (write INT-OR-CHAR) | (write STRING) | STRING
9457 | (write REG ARRAY)
9458 END := (end)
9459
9460 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
9461 ARG := REG | INT-OR-CHAR
9462 OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
9463 | < | > | == | <= | >= | !=
9464 SELF_OP :=
9465 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
9466 ARRAY := '[' INT-OR-CHAR ... ']'
9467 INT-OR-CHAR := INT | CHAR
9468
9469 MACHINE CODE: 12546 MACHINE CODE:
9470 12547
9471 The machine code consists of a vector of 32-bit words. 12548 The machine code consists of a vector of 32-bit words.
9472 The first such word specifies the start of the EOF section of the code; 12549 The first such word specifies the start of the EOF section of the code;
9473 this is the code executed to handle any stuff that needs to be done 12550 this is the code executed to handle any stuff that needs to be done
9585 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR 12662 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR
9586 ............rrr 12663 ............rrr
9587 ..........AAAAA 12664 ..........AAAAA
9588 @end example 12665 @end example
9589 12666
9590 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top 12667 @node Modules for Internationalization, , CCL, Multilingual Support
12668 @section Modules for Internationalization
12669 @cindex modules for internationalization
12670 @cindex internationalization, modules for
12671
12672 @example
12673 @file{mule-canna.c}
12674 @file{mule-ccl.c}
12675 @file{mule-charset.c}
12676 @file{mule-charset.h}
12677 @file{file-coding.c}
12678 @file{file-coding.h}
12679 @file{mule-coding.c}
12680 @file{mule-mcpath.c}
12681 @file{mule-mcpath.h}
12682 @file{mule-wnnfns.c}
12683 @file{mule.c}
12684 @end example
12685
12686 These files implement the MULE (Asian-language) support. Note that MULE
12687 actually provides a general interface for all sorts of languages, not
12688 just Asian languages (although they are generally the most complicated
12689 to support). This code is still in beta.
12690
12691 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
12692 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset}
12693 Lisp object type, which encapsulates a character set (an ordered one- or
12694 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
12695 Kanji).
12696
12697 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
12698 type, which encapsulates a method of converting between different
12699 encodings. An encoding is a representation of a stream of characters,
12700 possibly from multiple character sets, using a stream of bytes or words,
12701 and defines (e.g.) which escape sequences are used to specify particular
12702 character sets, how the indices for a character are converted into bytes
12703 (sometimes this involves setting the high bit; sometimes complicated
12704 rearranging of the values takes place, as in the Shift-JIS encoding),
12705 etc. It also contains some generic coding system implementations, such
12706 as the binary (no-conversion) coding system and a sample gzip coding system.
12707
12708 @file{mule-coding.c} contains the implementations of text coding systems.
12709
12710 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
12711 interpreter. CCL is similar in spirit to Lisp byte code and is used to
12712 implement converters for custom encodings.
12713
12714 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
12715 external programs used to implement the Canna and WNN input methods,
12716 respectively. This is currently in beta.
12717
12718 @file{mule-mcpath.c} provides some functions to allow for pathnames
12719 containing extended characters. This code is fragmentary, obsolete, and
12720 completely non-working. Instead, @code{file-name-coding-system} (formerly
12721 @code{pathname-coding-system}) is used to specify conversions of names of
12722 files and directories. The standard C I/O functions like @samp{open()}
12723 are wrapped so that conversion occurs automatically.
12724
12725 @file{mule.c} contains a few miscellaneous things. It currently seems
12726 to be unused and probably should be removed.
12727
12728
12729
12730 @example
12731 @file{intl.c}
12732 @end example
12733
12734 This provides some miscellaneous internationalization code for
12735 implementing message translation and interfacing to the Ximp input
12736 method. None of this code is currently working.
12737
12738
12739
12740 @example
12741 @file{iso-wide.h}
12742 @end example
12743
12744 This contains leftover code from an earlier implementation of
12745 Asian-language support, and is not currently used.
12746
12747
12748 @node The Lisp Reader and Compiler, Lstreams, Multilingual Support, Top
9591 @chapter The Lisp Reader and Compiler 12749 @chapter The Lisp Reader and Compiler
9592 @cindex Lisp reader and compiler, the 12750 @cindex Lisp reader and compiler, the
9593 @cindex reader and compiler, the Lisp 12751 @cindex reader and compiler, the Lisp
9594 @cindex compiler, the Lisp reader and 12752 @cindex compiler, the Lisp reader and
9595 12753
9614 * Lstream Types:: Different sorts of things that are streamed. 12772 * Lstream Types:: Different sorts of things that are streamed.
9615 * Lstream Functions:: Functions for working with lstreams. 12773 * Lstream Functions:: Functions for working with lstreams.
9616 * Lstream Methods:: Creating new lstream types. 12774 * Lstream Methods:: Creating new lstream types.
9617 @end menu 12775 @end menu
9618 12776
9619 @node Creating an Lstream 12777 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams
9620 @section Creating an Lstream 12778 @section Creating an Lstream
9621 @cindex lstream, creating an 12779 @cindex lstream, creating an
9622 12780
9623 Lstreams come in different types, depending on what is being interfaced 12781 Lstreams come in different types, depending on what is being interfaced
9624 to. Although the primitive for creating new lstreams is 12782 to. Although the primitive for creating new lstreams is
9646 Open for reading, but ``read'' never returns partial MULE characters. 12804 Open for reading, but ``read'' never returns partial MULE characters.
9647 @item "wc" 12805 @item "wc"
9648 Open for writing, but never writes partial MULE characters. 12806 Open for writing, but never writes partial MULE characters.
9649 @end table 12807 @end table
9650 12808
9651 @node Lstream Types 12809 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams
9652 @section Lstream Types 12810 @section Lstream Types
9653 @cindex lstream types 12811 @cindex lstream types
9654 @cindex types, lstream 12812 @cindex types, lstream
9655 12813
9656 @table @asis 12814 @table @asis
9673 @item decoding 12831 @item decoding
9674 12832
9675 @item encoding 12833 @item encoding
9676 @end table 12834 @end table
9677 12835
9678 @node Lstream Functions 12836 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams
9679 @section Lstream Functions 12837 @section Lstream Functions
9680 @cindex lstream functions 12838 @cindex lstream functions
9681 12839
9682 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) 12840 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
9683 Allocate and return a new Lstream. This function is not really meant to 12841 Allocate and return a new Lstream. This function is not really meant to
9757 12915
9758 @deftypefun void Lstream_rewind (Lstream *@var{stream}) 12916 @deftypefun void Lstream_rewind (Lstream *@var{stream})
9759 Rewind the stream to the beginning. 12917 Rewind the stream to the beginning.
9760 @end deftypefun 12918 @end deftypefun
9761 12919
9762 @node Lstream Methods 12920 @node Lstream Methods, , Lstream Functions, Lstreams
9763 @section Lstream Methods 12921 @section Lstream Methods
9764 @cindex lstream methods 12922 @cindex lstream methods
9765 12923
9766 @deftypefn {Lstream Method} Bytecount reader (Lstream *@var{stream}, unsigned char *@var{data}, Bytecount @var{size}) 12924 @deftypefn {Lstream Method} Bytecount reader (Lstream *@var{stream}, unsigned char *@var{data}, Bytecount @var{size})
9767 Read some data from the stream's end and store it into @var{data}, which 12925 Read some data from the stream's end and store it into @var{data}, which
9831 @cindex devices; frames; windows, consoles; 12989 @cindex devices; frames; windows, consoles;
9832 @cindex frames; windows, consoles; devices; 12990 @cindex frames; windows, consoles; devices;
9833 @cindex windows, consoles; devices; frames; 12991 @cindex windows, consoles; devices; frames;
9834 12992
9835 @menu 12993 @menu
9836 * Introduction to Consoles; Devices; Frames; Windows:: 12994 * Introduction to Consoles; Devices; Frames; Windows::
9837 * Point:: 12995 * Point::
9838 * Window Hierarchy:: 12996 * Window Hierarchy::
9839 * The Window Object:: 12997 * The Window Object::
12998 * Modules for the Basic Displayable Lisp Objects::
9840 @end menu 12999 @end menu
9841 13000
9842 @node Introduction to Consoles; Devices; Frames; Windows 13001 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
9843 @section Introduction to Consoles; Devices; Frames; Windows 13002 @section Introduction to Consoles; Devices; Frames; Windows
9844 @cindex consoles; devices; frames; windows, introduction to 13003 @cindex consoles; devices; frames; windows, introduction to
9845 @cindex devices; frames; windows, introduction to consoles; 13004 @cindex devices; frames; windows, introduction to consoles;
9846 @cindex frames; windows, introduction to consoles; devices; 13005 @cindex frames; windows, introduction to consoles; devices;
9847 @cindex windows, introduction to consoles; devices; frames; 13006 @cindex windows, introduction to consoles; devices; frames;
9883 window, but every frame remembers the last window in it that was 13042 window, but every frame remembers the last window in it that was
9884 selected, and changing the selected frame causes the remembered window 13043 selected, and changing the selected frame causes the remembered window
9885 within it to become the selected window. Similar relationships apply 13044 within it to become the selected window. Similar relationships apply
9886 for consoles to devices and devices to frames. 13045 for consoles to devices and devices to frames.
9887 13046
9888 @node Point 13047 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
9889 @section Point 13048 @section Point
9890 @cindex point 13049 @cindex point
9891 13050
9892 Recall that every buffer has a current insertion position, called 13051 Recall that every buffer has a current insertion position, called
9893 @dfn{point}. Now, two or more windows may be displaying the same buffer, 13052 @dfn{point}. Now, two or more windows may be displaying the same buffer,
9905 want to retrieve the correct value of @code{point} for a window, 13064 want to retrieve the correct value of @code{point} for a window,
9906 you must special-case on the selected window and retrieve the 13065 you must special-case on the selected window and retrieve the
9907 buffer's point instead. This is related to why @code{save-window-excursion} 13066 buffer's point instead. This is related to why @code{save-window-excursion}
9908 does not save the selected window's value of @code{point}. 13067 does not save the selected window's value of @code{point}.
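
The special case described above can be sketched in C as follows. This is only an illustration under simplifying assumptions: @code{toy_buffer}, @code{toy_window} and @code{toy_window_point} are hypothetical names invented here, not the actual XEmacs structures or functions.

```c
/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct toy_buffer
{
  long point;                  /* the buffer's own insertion position */
};

struct toy_window
{
  struct toy_buffer *buffer;   /* buffer displayed in this window */
  long saved_point;            /* window's private point; may be stale
                                  while this window is selected */
};

/* Return the correct point for W.  SELECTED is the currently selected
   window.  For the selected window the buffer's point is authoritative;
   for every other window the window's own saved value is used. */
long
toy_window_point (const struct toy_window *w,
                  const struct toy_window *selected)
{
  if (w == selected)
    return w->buffer->point;
  return w->saved_point;
}
```

With two windows on one buffer, the selected window reports the buffer's point and the other window reports its own saved value.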
9909 13068
9910 @node Window Hierarchy 13069 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows
9911 @section Window Hierarchy 13070 @section Window Hierarchy
9912 @cindex window hierarchy 13071 @cindex window hierarchy
9913 @cindex hierarchy of windows 13072 @cindex hierarchy of windows
9914 13073
9915 If a frame contains multiple windows (panes), they are always created 13074 If a frame contains multiple windows (panes), they are always created
10003 frames have no root window, and the @code{next} of the minibuffer window 13162 frames have no root window, and the @code{next} of the minibuffer window
10004 is @code{nil} but the @code{prev} points to itself. (#### This is an 13163 is @code{nil} but the @code{prev} points to itself. (#### This is an
10005 artifact that should be fixed.) 13164 artifact that should be fixed.)
10006 @end enumerate 13165 @end enumerate
10007 13166
10008 @node The Window Object 13167 @node The Window Object, Modules for the Basic Displayable Lisp Objects, Window Hierarchy, Consoles; Devices; Frames; Windows
10009 @section The Window Object 13168 @section The Window Object
10010 @cindex window object, the 13169 @cindex window object, the
10011 @cindex object, the window 13170 @cindex object, the window
10012 13171
10013 Windows have the following accessible fields: 13172 Windows have the following accessible fields:
10110 If the region (or part of it) is highlighted in this window, this field 13269 If the region (or part of it) is highlighted in this window, this field
10111 holds the mark position that made one end of that region. Otherwise, 13270 holds the mark position that made one end of that region. Otherwise,
10112 this field is @code{nil}. 13271 this field is @code{nil}.
10113 @end table 13272 @end table
10114 13273
13274 @node Modules for the Basic Displayable Lisp Objects, , The Window Object, Consoles; Devices; Frames; Windows
13275 @section Modules for the Basic Displayable Lisp Objects
13276 @cindex modules for the basic displayable Lisp objects
13277 @cindex displayable Lisp objects, modules for the basic
13278 @cindex Lisp objects, modules for the basic displayable
13279 @cindex objects, modules for the basic displayable Lisp
13280
13281 @example
13282 @file{console-msw.c}
13283 @file{console-msw.h}
13284 @file{console-stream.c}
13285 @file{console-stream.h}
13286 @file{console-tty.c}
13287 @file{console-tty.h}
13288 @file{console-x.c}
13289 @file{console-x.h}
13290 @file{console.c}
13291 @file{console.h}
13292 @end example
13293
13294 These modules implement the @dfn{console} Lisp object type. A console
13295 contains multiple display devices, but only one keyboard and mouse.
13296 Most of the time, a console will contain exactly one device.
13297
13298 Consoles are the top of a Lisp object inclusion hierarchy. Consoles
13299 contain devices, which contain frames, which contain windows.
13300
13301
13302
13303 @example
13304 @file{device-msw.c}
13305 @file{device-tty.c}
13306 @file{device-x.c}
13307 @file{device.c}
13308 @file{device.h}
13309 @end example
13310
13311 These modules implement the @dfn{device} Lisp object type. This
13312 abstracts a particular screen or connection on which frames are
13313 displayed. As with Lisp objects, event interfaces, and other
13314 subsystems, the device code is separated into a generic component, which
13315 contains a standardized interface (in the form of a set of methods) onto
13316 particular device types, and a set of device-type-specific components.
13317
13318 The device subsystem defines all the methods and provides method
13319 services for not only device operations but also for the frame, window,
13320 menubar, scrollbar, toolbar, and other displayable-object subsystems.
13321 The reason for this is that all of these subsystems have the same
13322 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
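
The idea of a generic component calling through a set of per-type methods can be illustrated with a small C sketch. All names here (@code{toy_device_methods}, @code{toy_device_text_width}, and so on) are hypothetical and chosen for illustration; the real method tables in the device code are larger and are defined differently.

```c
#include <string.h>

/* Hypothetical method table; each device type fills in one of these. */
struct toy_device_methods
{
  const char *name;
  /* Width of STR in device units (columns on a TTY, pixels under X). */
  int (*text_width) (const char *str);
};

struct toy_device
{
  const struct toy_device_methods *methods;
};

/* Device-type-specific implementations. */
static int
toy_tty_text_width (const char *str)
{
  return (int) strlen (str);          /* one column per character */
}

static int
toy_x_text_width (const char *str)
{
  return (int) strlen (str) * 8;      /* pretend fixed 8-pixel-wide font */
}

const struct toy_device_methods toy_tty_methods =
  { "tty", toy_tty_text_width };
const struct toy_device_methods toy_x_methods =
  { "x", toy_x_text_width };

/* Generic code calls through the table and never names a device type. */
int
toy_device_text_width (const struct toy_device *d, const char *str)
{
  return d->methods->text_width (str);
}
```

Because frames, windows, scrollbars and the like share the same set of subtypes, one such table per device type can serve all of them, which is presumably why the method services live in the device subsystem.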
13323
13324
13325
13326 @example
13327 @file{frame-msw.c}
13328 @file{frame-tty.c}
13329 @file{frame-x.c}
13330 @file{frame.c}
13331 @file{frame.h}
13332 @end example
13333
13334 Each device contains one or more frames in which objects (e.g. text) are
13335 displayed. A frame corresponds to a window in the window system;
13336 usually this is a top-level window but it could potentially be one of a
13337 number of overlapping child windows within a top-level window, using the
13338 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
13339 similar scheme.
13340
13341 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
13342 provide the generic and device-type-specific operations on frames
13343 (e.g. raising, lowering, resizing, moving, etc.).
13344
13345
13346
13347 @example
13348 @file{window.c}
13349 @file{window.h}
13350 @end example
13351
13352 @cindex window (in Emacs)
13353 @cindex pane
13354 Each frame consists of one or more non-overlapping @dfn{windows} (better
13355 known as @dfn{panes} in standard window-system terminology) in which a
13356 buffer's text can be displayed. Windows can also have scrollbars
13357 displayed around their edges.
13358
13359 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
13360 object type and provide code to manage windows. Since windows have no
13361 associated resources in the window system (the window system knows only
13362 about the frame; no child windows or anything are used for XEmacs
13363 windows), there is no device-type-specific code here; all of that code
13364 is part of the redisplay mechanism or the code for particular object
13365 types such as scrollbars.
13366
10115 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top 13367 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
10116 @chapter The Redisplay Mechanism 13368 @chapter The Redisplay Mechanism
10117 @cindex redisplay mechanism, the 13369 @cindex redisplay mechanism, the
10118 13370
10119 The redisplay mechanism is one of the most complicated sections of 13371 The redisplay mechanism is one of the most complicated sections of
10133 @item 13385 @item
10134 It Is Better To Be Fast Than Not To Be. 13386 It Is Better To Be Fast Than Not To Be.
10135 @end enumerate 13387 @end enumerate
10136 13388
10137 @menu 13389 @menu
10138 * Critical Redisplay Sections:: 13390 * Critical Redisplay Sections::
10139 * Line Start Cache:: 13391 * Line Start Cache::
10140 * Redisplay Piece by Piece:: 13392 * Redisplay Piece by Piece::
13393 * Modules for the Redisplay Mechanism::
13394 * Modules for other Display-Related Lisp Objects::
10141 @end menu 13395 @end menu
10142 13396
10143 @node Critical Redisplay Sections 13397 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism
10144 @section Critical Redisplay Sections 13398 @section Critical Redisplay Sections
10145 @cindex redisplay sections, critical 13399 @cindex redisplay sections, critical
10146 @cindex critical redisplay sections 13400 @cindex critical redisplay sections
10147 13401
10148 Within this section, we are defenseless and assume that the 13402 Within this section, we are defenseless and assume that the
10171 we simply return. #### We should abort instead. 13425 we simply return. #### We should abort instead.
10172 13426
10173 #### If a frame-size change does occur we should probably 13427 #### If a frame-size change does occur we should probably
10174 actually be preempting redisplay. 13428 actually be preempting redisplay.
10175 13429
10176 @node Line Start Cache 13430 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism
10177 @section Line Start Cache 13431 @section Line Start Cache
10178 @cindex line start cache 13432 @cindex line start cache
10179 13433
10180 The traditional scrolling code in Emacs breaks in a variable height 13434 The traditional scrolling code in Emacs breaks in a variable height
10181 world. It depends on the key assumption that the number of lines that 13435 world. It depends on the key assumption that the number of lines that
10232 @end itemize 13486 @end itemize
10233 13487
10234 In case you're wondering, the Second Golden Rule of Redisplay is not 13488 In case you're wondering, the Second Golden Rule of Redisplay is not
10235 applicable. 13489 applicable.
10236 13490
10237 @node Redisplay Piece by Piece 13491 @node Redisplay Piece by Piece, Modules for the Redisplay Mechanism, Line Start Cache, The Redisplay Mechanism
10238 @section Redisplay Piece by Piece 13492 @section Redisplay Piece by Piece
10239 @cindex redisplay piece by piece 13493 @cindex redisplay piece by piece
10240 13494
10241 As you can begin to see redisplay is complex and also not well 13495 As you can begin to see redisplay is complex and also not well
10242 documented. Chuck no longer works on XEmacs so this section is my take 13496 documented. Chuck no longer works on XEmacs so this section is my take
10282 a string we cannot use @code{create_text_block}. Instead we use 13536 a string we cannot use @code{create_text_block}. Instead we use
10283 @code{create_text_string_block} which performs the same function as 13537 @code{create_text_string_block} which performs the same function as
10284 @code{create_text_block} but for strings. Many of the complexities of 13538 @code{create_text_block} but for strings. Many of the complexities of
10285 @code{create_text_block} to do with cursor handling and selective 13539 @code{create_text_block} to do with cursor handling and selective
10286 display have been removed. 13540 display have been removed.
13541
13542 @node Modules for the Redisplay Mechanism, Modules for other Display-Related Lisp Objects, Redisplay Piece by Piece, The Redisplay Mechanism
13543 @section Modules for the Redisplay Mechanism
13544 @cindex modules for the redisplay mechanism
13545 @cindex redisplay mechanism, modules for the
13546
13547 @example
13548 @file{redisplay-output.c}
13549 @file{redisplay-msw.c}
13550 @file{redisplay-tty.c}
13551 @file{redisplay-x.c}
13552 @file{redisplay.c}
13553 @file{redisplay.h}
13554 @end example
13555
13556 These files provide the redisplay mechanism. As with many other
13557 subsystems in XEmacs, there is a clean separation between the general
13558 and device-specific support.
13559
13560 @file{redisplay.c} contains the bulk of the redisplay engine. These
13561 functions update the redisplay structures (which describe how the screen
13562 is to appear) to reflect any changes made to the state of any
13563 displayable objects (buffer, frame, window, etc.) since the last time
13564 that redisplay was called. These functions are highly optimized to
13565 avoid doing more work than necessary (since redisplay is called
13566 extremely often and is potentially a huge time sink), and depend heavily
13567 on notifications from the objects themselves that changes have occurred,
13568 so that redisplay doesn't explicitly have to check each possible object.
13569 The redisplay mechanism also contains a great deal of caching to further
13570 speed things up; some of this caching is contained within the various
13571 displayable objects.
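
The reliance on change notifications can be pictured with the following C sketch. It is a simplified model, not the real redisplay code: the structures and the functions @code{toy_note_window_changed} and @code{toy_redisplay} are hypothetical, and the actual engine tracks many more kinds of change.

```c
#define TOY_MAX_WINDOWS 8

struct toy_window_state
{
  int changed;        /* set by the object when something visible changed */
  int redraw_count;   /* how many times redisplay regenerated this window */
};

struct toy_display
{
  int any_changed;    /* summary flag: lets redisplay return immediately */
  int nwindows;
  struct toy_window_state windows[TOY_MAX_WINDOWS];
};

/* Called by code that modifies window I; this is the "notification". */
void
toy_note_window_changed (struct toy_display *d, int i)
{
  d->windows[i].changed = 1;
  d->any_changed = 1;
}

/* Regenerate only what was flagged.  Returns the number of windows
   that were regenerated. */
int
toy_redisplay (struct toy_display *d)
{
  int i, updated = 0;

  if (!d->any_changed)
    return 0;                 /* common case: nothing to do */

  for (i = 0; i < d->nwindows; i++)
    if (d->windows[i].changed)
      {
        d->windows[i].redraw_count++;
        d->windows[i].changed = 0;
        updated++;
      }
  d->any_changed = 0;
  return updated;
}
```

A call to @code{toy_redisplay} with no pending notifications does no per-window work at all, which matters for a routine that is called extremely often.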
13572
13573 @file{redisplay-output.c} goes through the redisplay structures and converts
13574 them into calls to device-specific methods to actually output the screen
13575 changes.
13576
13577 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
13578 of these redisplay output methods, for X frames and TTY frames,
13579 respectively.
13580
13581
13582
13583 @example
13584 @file{indent.c}
13585 @end example
13586
13587 This module contains various functions and Lisp primitives for
13588 converting between buffer positions and screen positions. These
13589 functions call the redisplay mechanism to do most of the work, and then
13590 examine the redisplay structures to get the necessary information. This
13591 module needs work.
13592
13593
13594
13595 @example
13596 @file{termcap.c}
13597 @file{terminfo.c}
13598 @file{tparam.c}
13599 @end example
13600
13601 These files contain functions for working with the termcap (BSD-style)
13602 and terminfo (System V style) databases of terminal capabilities and
13603 escape sequences, used when XEmacs is displaying in a TTY.
13604
13605
13606
13607 @example
13608 @file{cm.c}
13609 @file{cm.h}
13610 @end example
13611
13612 These files provide some miscellaneous TTY-output functions and should
13613 probably be merged into @file{redisplay-tty.c}.
13614
13615
13616
13617 @node Modules for other Display-Related Lisp Objects, , Modules for the Redisplay Mechanism, The Redisplay Mechanism
13618 @section Modules for other Display-Related Lisp Objects
13619 @cindex modules for other display-related Lisp objects
13620 @cindex display-related Lisp objects, modules for other
13621 @cindex Lisp objects, modules for other display-related
13622
13623 @example
13624 @file{faces.c}
13625 @file{faces.h}
13626 @end example
13627
13628
13629
13630 @example
13631 @file{bitmaps.h}
13632 @file{glyphs-eimage.c}
13633 @file{glyphs-msw.c}
13634 @file{glyphs-msw.h}
13635 @file{glyphs-widget.c}
13636 @file{glyphs-x.c}
13637 @file{glyphs-x.h}
13638 @file{glyphs.c}
13639 @file{glyphs.h}
13640 @end example
13641
13642
13643
13644 @example
13645 @file{objects-msw.c}
13646 @file{objects-msw.h}
13647 @file{objects-tty.c}
13648 @file{objects-tty.h}
13649 @file{objects-x.c}
13650 @file{objects-x.h}
13651 @file{objects.c}
13652 @file{objects.h}
13653 @end example
13654
13655
13656
13657 @example
13658 @file{menubar-msw.c}
13659 @file{menubar-msw.h}
13660 @file{menubar-x.c}
13661 @file{menubar.c}
13662 @file{menubar.h}
13663 @end example
13664
13665
13666
13667 @example
13668 @file{scrollbar-msw.c}
13669 @file{scrollbar-msw.h}
13670 @file{scrollbar-x.c}
13671 @file{scrollbar-x.h}
13672 @file{scrollbar.c}
13673 @file{scrollbar.h}
13674 @end example
13675
13676
13677
13678 @example
13679 @file{toolbar-msw.c}
13680 @file{toolbar-x.c}
13681 @file{toolbar.c}
13682 @file{toolbar.h}
13683 @end example
13684
13685
13686
13687 @example
13688 @file{font-lock.c}
13689 @end example
13690
13691 This file provides C support for syntax highlighting---i.e.
13692 highlighting different syntactic constructs of a source file in
13693 different colors, for easy reading. The C support is provided so that
13694 this is fast.
13695
13696
13697
13698 @example
13699 @file{dgif_lib.c}
13700 @file{gif_err.c}
13701 @file{gif_lib.h}
13702 @file{gifalloc.c}
13703 @end example
13704
13705 These modules decode GIF-format image files, for use with glyphs.
13706 These files were removed due to Unisys patent infringement concerns.
13707
10287 13708
10288 @node Extents, Faces, The Redisplay Mechanism, Top 13709 @node Extents, Faces, The Redisplay Mechanism, Top
10289 @chapter Extents 13710 @chapter Extents
10290 @cindex extents 13711 @cindex extents
10291 13712
10296 * Zero-Length Extents:: A weird special case. 13717 * Zero-Length Extents:: A weird special case.
10297 * Mathematics of Extent Ordering:: A rigorous foundation. 13718 * Mathematics of Extent Ordering:: A rigorous foundation.
10298 * Extent Fragments:: Cached information useful for redisplay. 13719 * Extent Fragments:: Cached information useful for redisplay.
10299 @end menu 13720 @end menu
10300 13721
10301 @node Introduction to Extents 13722 @node Introduction to Extents, Extent Ordering, Extents, Extents
10302 @section Introduction to Extents 13723 @section Introduction to Extents
10303 @cindex extents, introduction to 13724 @cindex extents, introduction to
10304 13725
10305 Extents are regions over a buffer, with a start and an end position 13726 Extents are regions over a buffer, with a start and an end position
10306 denoting the region of the buffer included in the extent. In 13727 denoting the region of the buffer included in the extent. In
10319 automatically go inside or out of extents as necessary with no 13740 automatically go inside or out of extents as necessary with no
10320 further work needing to be done. It didn't work out that way, 13741 further work needing to be done. It didn't work out that way,
10321 however, and just ended up complexifying and buggifying all the 13742 however, and just ended up complexifying and buggifying all the
10322 rest of the code.) 13743 rest of the code.)
10323 13744
10324 @node Extent Ordering 13745 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents
10325 @section Extent Ordering 13746 @section Extent Ordering
10326 @cindex extent ordering 13747 @cindex extent ordering
10327 13748
10328 Extents are compared using memory indices. There are two orderings 13749 Extents are compared using memory indices. There are two orderings
10329 for extents and both orders are kept current at all times. The normal 13750 for extents and both orders are kept current at all times. The normal
10354 The display order and the e-order are complementary orders: any 13775 The display order and the e-order are complementary orders: any
10355 theorem about the display order also applies to the e-order if you swap 13776 theorem about the display order also applies to the e-order if you swap
10356 all occurrences of ``display order'' and ``e-order'', ``less than'' and 13777 all occurrences of ``display order'' and ``e-order'', ``less than'' and
10357 ``greater than'', and ``extent start'' and ``extent end''. 13778 ``greater than'', and ``extent start'' and ``extent end''.
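
The symmetry between the two orders is easy to see when both are written as comparison functions. The sketch below assumes the usual definitions (display order sorts by start ascending, then end descending; e-order sorts by end ascending, then start descending). The @code{toy_extent} structure and function names are hypothetical and ignore everything about a real extent except its endpoints.

```c
/* Hypothetical minimal extent: just two positions. */
struct toy_extent
{
  long start;
  long end;
};

/* Display order: smaller start first; on equal starts, larger end first.
   Returns <0, 0 or >0 in the manner of strcmp(). */
int
toy_display_order_cmp (const struct toy_extent *a,
                       const struct toy_extent *b)
{
  if (a->start != b->start)
    return a->start < b->start ? -1 : 1;
  if (a->end != b->end)
    return a->end > b->end ? -1 : 1;
  return 0;
}

/* E-order: smaller end first; on equal ends, larger start first.
   This is the display-order function with start and end swapped. */
int
toy_e_order_cmp (const struct toy_extent *a,
                 const struct toy_extent *b)
{
  if (a->end != b->end)
    return a->end < b->end ? -1 : 1;
  if (a->start != b->start)
    return a->start > b->start ? -1 : 1;
  return 0;
}
```

For example, the extent [1, 10] precedes [1, 5] in the display order but follows it in the e-order.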
10358 13779
10359 @node Format of the Extent Info 13780 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents
10360 @section Format of the Extent Info 13781 @section Format of the Extent Info
10361 @cindex extent info, format of the 13782 @cindex extent info, format of the
10362 13783
10363 An extent-info structure consists of a list of the buffer or string's 13784 An extent-info structure consists of a list of the buffer or string's
10364 extents and a @dfn{stack of extents} that lists all of the extents over 13785 extents and a @dfn{stack of extents} that lists all of the extents over
10417 An alternative would be balanced binary trees, which have guaranteed 13838 An alternative would be balanced binary trees, which have guaranteed
10418 @math{O(log N)} time for all operations (although the constant factors 13839 @math{O(log N)} time for all operations (although the constant factors
10419 are not as good, and repeated localized operations will be slower than 13840 are not as good, and repeated localized operations will be slower than
10420 for a gap array). Such code is quite tricky to write, however. 13841 for a gap array). Such code is quite tricky to write, however.
10421 13842
10422 @node Zero-Length Extents 13843 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents
10423 @section Zero-Length Extents 13844 @section Zero-Length Extents
10424 @cindex zero-length extents 13845 @cindex zero-length extents
10425 @cindex extents, zero-length 13846 @cindex extents, zero-length
10426 13847
10427 Extents can be zero-length, and will end up that way if their endpoints 13848 Extents can be zero-length, and will end up that way if their endpoints
10448 13869
10449 Note that closed-open, non-detachable zero-length extents behave 13870 Note that closed-open, non-detachable zero-length extents behave
10450 exactly like markers and that open-closed, non-detachable zero-length 13871 exactly like markers and that open-closed, non-detachable zero-length
10451 extents behave like the ``point-type'' marker in Mule. 13872 extents behave like the ``point-type'' marker in Mule.
10452 13873
10453 @node Mathematics of Extent Ordering 13874 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents
10454 @section Mathematics of Extent Ordering 13875 @section Mathematics of Extent Ordering
10455 @cindex mathematics of extent ordering 13876 @cindex mathematics of extent ordering
10456 @cindex extent mathematics 13877 @cindex extent mathematics
10457 @cindex extent ordering 13878 @cindex extent ordering
10458 13879
10576 Proof: If @math{F2} does not include @math{I} then its start index is 13997 Proof: If @math{F2} does not include @math{I} then its start index is
10577 greater than @math{I} and thus it is greater than any extent in 13998 greater than @math{I} and thus it is greater than any extent in
10578 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} 13999 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I}
10579 and thus is in @math{S}, and thus @math{F2 >= F}. 14000 and thus is in @math{S}, and thus @math{F2 >= F}.
10580 14001
10581 @node Extent Fragments 14002 @node Extent Fragments, , Mathematics of Extent Ordering, Extents
10582 @section Extent Fragments 14003 @section Extent Fragments
10583 @cindex extent fragments 14004 @cindex extent fragments
10584 @cindex fragments, extent 14005 @cindex fragments, extent
10585 14006
10586 Imagine that the buffer is divided up into contiguous, non-overlapping 14007 Imagine that the buffer is divided up into contiguous, non-overlapping
10759 @chapter Specifiers 14180 @chapter Specifiers
10760 @cindex specifiers 14181 @cindex specifiers
10761 14182
10762 Not yet documented. 14183 Not yet documented.
10763 14184
14185 Specifiers are documented in depth in the Lisp Reference manual.
14186 @xref{Specifiers,,, lispref, XEmacs Lisp Reference Manual}. The code in
14187 @file{specifier.c} is pretty straightforward.
14188
10764 @node Menus, Subprocesses, Specifiers, Top 14189 @node Menus, Subprocesses, Specifiers, Top
10765 @chapter Menus 14190 @chapter Menus
10766 @cindex menus 14191 @cindex menus
10767 14192
10768 A menu is set by setting the value of the variable 14193 A menu is set by setting the value of the variable
10812 @code{menubar_selection_callback()} enqueues a menu event, putting in it 14237 @code{menubar_selection_callback()} enqueues a menu event, putting in it
10813 a function to call (either @code{eval} or @code{call-interactively}) and 14238 a function to call (either @code{eval} or @code{call-interactively}) and
10814 its argument, which is the callback function or form given in the menu's 14239 its argument, which is the callback function or form given in the menu's
10815 description. 14240 description.
10816 14241
10817 @node Subprocesses, Interface to the X Window System, Menus, Top 14242 @node Subprocesses, Interface to MS Windows, Menus, Top
10818 @chapter Subprocesses 14243 @chapter Subprocesses
10819 @cindex subprocesses 14244 @cindex subprocesses
10820 14245
10821 The fields of a process are: 14246 The fields of a process are:
10822 14247
10886 @item tty_name 14311 @item tty_name
10887 The name of the terminal that the subprocess is using, 14312 The name of the terminal that the subprocess is using,
10888 or @code{nil} if it is using pipes. 14313 or @code{nil} if it is using pipes.
10889 @end table 14314 @end table
10890 14315
10891 @node Interface to the X Window System, Index, Subprocesses, Top 14316 @node Interface to MS Windows, Interface to the X Window System, Subprocesses, Top
14317 @chapter Interface to MS Windows
14318 @cindex MS Windows, interface to
14319 @cindex Windows, interface to
14320
14321 @menu
14322 * Different kinds of Windows environments::
14323 * Windows Build Flags::
14324 * Windows I18N Introduction::
14325 * Modules for Interfacing with MS Windows::
14326 @end menu
14327
14328 @node Different kinds of Windows environments, Windows Build Flags, Interface to MS Windows, Interface to MS Windows
14329 @section Different kinds of Windows environments
14330 @cindex different kinds of Windows environments
14331 @cindex Windows environments, different kinds of
14332 @cindex MS Windows environments, different kinds of
14333
14334 @subsubheading (a) operating system (OS) vs. window system vs. Win32 API vs. C runtime library (CRT) vs. compiler
14335
14336 There are various Windows operating systems (Windows NT, 2000, XP, 95,
14337 98, ME, etc.), which come in two basic classes: Windows NT (NT, 2000,
14338 XP, and all future versions) and 9x (95, 98, ME). 9x-class operating
14339 systems are a kind of hodgepodge of a 32-bit upper layer on top of a
14340 16-bit MS-DOS-compatible lower layer. NT-class operating systems are
14341 written from the ground up as 32-bit (there are also 64-bit versions
14342 available now), and provide many more features and much greater
14343 stability, since there is full memory protection between all processes
14344 and between processes and the system. NT-class operating systems
14345 also provide emulation for DOS programs inside of a "sandbox" (i.e. a
14346 walled-off environment in which one DOS program can screw up another
14347 one, but there is theoretically no way for a DOS program to screw up the
14348 OS itself). From the perspective of XEmacs, the difference between NT
14349 and 9x is very important in Unicode support (not really provided under
14350 9x -- see @file{intl-win32.c}) and subprocess creation, among other things.
14351
14352 The operating system provides the framework for accessing files and
14353 devices and running programs. From the perspective of a program, the
14354 operating system provides a set of services. At the lowest level, the
14355 way to call these services is dependent on the processor the OS is
14356 running on, but a portable interface is provided to C programs through
14357 functions called "system calls". Under Windows, this interface is called
14358 the Win32 API, and includes file-manipulation calls such as @code{CreateFile()}
14359 and @code{ReadFile()}, process-creation calls such as @code{CreateProcess()}, etc.
14360
14361 This concept of system calls goes back to Unix, where similar services
14362 are available but through routines with different, simpler names, such
14363 as @code{open()}, @code{read()}, @code{fork()}, @code{execve()}, etc. In addition, Unix provides
14364 a higher layer of routines, called the C Runtime Library (CRT), which
14365 provide higher-level, more convenient versions of the same services (e.g.
14366 "stream-oriented" file routines such as @code{fopen()} and @code{fread()}) as well
14367 as various other utility functions, such as string-manipulation routines
14368 (e.g. @code{strcpy()} and @code{strcmp()}).
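
To make the two layers concrete, here is a small C sketch that counts the bytes in a file twice, once through the Unix system-call interface (@code{open()}, @code{read()}, @code{close()}) and once through the stream-oriented CRT routines (@code{fopen()}, @code{fread()}, @code{fclose()}). It assumes a Unix-like system; the function names are made up for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Count the bytes in PATH using the system-call layer.
   Returns -1 on error. */
long
toy_count_bytes_syscalls (const char *path)
{
  char buf[4096];
  long total = 0;
  ssize_t n;
  int fd = open (path, O_RDONLY);

  if (fd < 0)
    return -1;
  while ((n = read (fd, buf, sizeof buf)) > 0)
    total += (long) n;
  close (fd);
  return n < 0 ? -1 : total;
}

/* The same job using the higher-level stream routines of the CRT.
   Returns -1 on error. */
long
toy_count_bytes_stdio (const char *path)
{
  char buf[4096];
  long total = 0;
  size_t n;
  FILE *fp = fopen (path, "rb");

  if (fp == NULL)
    return -1;
  while ((n = fread (buf, 1, sizeof buf, fp)) > 0)
    total += (long) n;
  if (ferror (fp))
    total = -1;
  fclose (fp);
  return total;
}
```

Both functions give the same answer; the stream version simply adds buffering and a more convenient interface on top of the same underlying services.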
14369
14370 For compatibility, a C Runtime Library (CRT) is also provided under
14371 Windows, which provides a partial implementation of both the Unix CRT
14372 and the Unix system-call API, implemented using the Win32 API. The CRT
14373 sources come with Visual C++ (VC++). For example, under VC++ 6, look in
14374 the CRT/SRC directory, e.g. for me (ben): /Program Files/Microsoft
14375 Visual Studio/VC98/CRT/SRC. The CRT is provided using either MSVCRT
14376 (dynamically linked) or @file{LIBC.LIB} (statically linked).
14377
14378 The window system provides the framework for creating overlapped windows
14379 and unifying signals provided by various devices (input devices such as
14380 the keyboard and mouse, timers, etc.) into a single event queue (or
14381 "message queue", under Windows). Like the operating system, the window
14382 system can be viewed from the perspective of a program as a set of
14383 services provided by an API of function calls. Under Windows,
14384 window-system services are also available through the Win32 API, while
14385 under UNIX the window system is typically a separate component (e.g. the
14386 X Window System, aka X Windows or X11). The term "GUI" ("graphical
14387 user interface") is often used to refer to the services provided by the
14388 window system, or to a windowing interface provided by a program.
14389
14390 The Win32 API is implemented by various dynamic libraries, or DLLs.
14391 The most important are KERNEL32, USER32, and GDI32. KERNEL32 implements
14392 the basic file-system and process services. USER32 implements the
14393 fundamental window-system services such as creating windows and handling
14394 messages. GDI32 implements higher-level drawing capabilities -- fonts,
14395 colors, lines, etc.
14396
14397 C programs are compiled into executables using a compiler. Under Unix,
14398 a compiler usually comes as part of the operating system, but not under
14399 Windows, where the compiler is a separate product. Even under Unix,
14400 people often install their own compilers, such as gcc. Under Windows,
14401 the Microsoft-standard compiler is Visual C++ (VC++).
14402
14403 It is possible to provide an emulation of any API using any other, as
14404 long as the underlying API provides the suitable functionality. This is
14405 what Cygwin (www.cygwin.com) does. It provides a fairly complete POSIX
14406 emulation layer (POSIX is an IEEE standard for Unix behavior) on
14407 top of MS Windows -- in particular, providing the file-system, process,
14408 tty, and signal semantics that are part of a modern, standard Unix
14409 operating system. Cygwin does this using its own DLL, @file{cygwin1.dll},
14410 which makes calls to the Win32 API services in @file{kernel32.dll}. Cygwin
14411 also provides its own implementation of the C runtime library, called
14412 @code{newlib} (@file{libcygwin.a}; @file{libc.a} and @file{libm.a} are symlinked to it), which is
14413 implemented on top of the Unix system calls provided in @file{cygwin1.dll}. In
14414 addition, Cygwin provides static import libraries that give you direct
14415 access to the Win32 API -- XEmacs uses this to provide GUI support under
14416 Cygwin. Cygwin provides a version of GCC (the GNU Project C compiler)
14417 that is set up to automatically link with the appropriate Cygwin
14418 libraries. Cygwin also provides, as optional components, pre-compiled
14419 binaries for a great number of open-source programs compiled under the
14420 Cygwin environment. This includes all of the standard Unix file-system,
14421 text-manipulation, development, networking, database, etc. utilities, a
14422 version of X Windows built on top of the Win32 API (see below),
14423 and compilations of nearly all other common open-source packages
14424 (Apache, TeX, [X]Emacs, Ghostscript, GTK, ImageMagick, etc.).
14425
14426 Similarly, you can emulate the functionality of X Windows using the
14427 window-system component of the Win32 API. Cygwin provides a package to do this,
14428 from the XFree86 project. Other versions of X under Windows also exist,
14429 such as the MicroImages MI/X server. Each version can potentially come
14430 with its own header and library files, allowing you to compile
14431 X-Windows programs.
14432
14433 All of these different operating system and emulation layers can make
14434 for a fair amount of confusion, so:
14435
14436 @subsubheading (b) CRT is not the same as VC++
14437
14438 Note that the CRT is @strong{NOT} (completely) part of VC++. True, if you link
14439 statically, the CRT (in the form of @file{LIBC.LIB}, which comes with VC++)
14440 will be inserted into the executable (.EXE), but otherwise the CRT will
14441 be separate. The dynamic version of the CRT is provided by @file{MSVCRT.DLL}
14442 (or @file{MSVCRTD.DLL}, for debugging), which comes with Windows. Hence, it's
14443 possible to use a different compiler and still link with MSVCRT -- which
14444 is exactly what MinGW does.
14445
14446 @subsubheading (c) CRT is not the same as the Win32 API
14447
14448 Note also that the CRT is totally separate from the Win32 API. They
14449 provide different functions and are implemented in different DLL's.
14450 They are also different levels -- the CRT is implemented on top of
14451 Win32. Sometimes the CRT and Win32 both have their own versions of
14452 similar concepts, such as locales. These are typically maintained
14453 separately, and can get out of sync. Do not assume that changing a
14454 setting in the CRT will have any effect on Win32 API routines using a
14455 similar concept unless the CRT docs specifically say so. Do not assume
14456 that behavior described for CRT functions applies to Win32 API or
14457 vice-versa. Note also that the CRT knows about and is implemented on
14458 top of the Win32 API, while the Win32 API knows nothing about the CRT.
14459
14460 @subsubheading (d) MinGW is not the same as Cygwin
14461
14462 As described in (b), Microsoft's version of the CRT (@file{MSVCRT.DLL}) is
14463 provided as part of Windows, separate from VC++, which must be
14464 purchased. Hence, it is possible to use MSVCRT to provide CRT
14465 services without using VC++. This is what MinGW (www.mingw.org) does --
14466 it is a port of GCC that will use MSVCRT. The reason one might want to
14467 do this is (a) it is free, and (b) it does not require a separately
14468 installed DLL, as Cygwin does. (#### Maybe MinGW targets CRTDLL, not
14469 MSVCRT? If so, what is CRTDLL, and how does it differ from MSVCRT and
14470 @file{LIBC.LIB}?) Primarily, what MinGW provides is patches to GCC (now
14471 integrated into the standard distribution) and its own header files and
14472 import libraries that are compatible with MSVCRT. The best way to think
14473 of MinGW is as simply another Windows compiler, like how there used to
14474 be Microsoft and Borland compilers. Because MinGW programs use all the
14475 same libraries as VC++ programs, and hence the same services are
14476 available, programs that compile under VC++ should compile under MinGW
14477 with very little change, whereas programs that compile under Cygwin will
14478 look quite different.
14479
14480 The confusion between MinGW and Cygwin is the confusion between the
14481 environment that a compiler runs under and the target environment of a
14482 program, i.e. the environment that a program is compiled to run under.
14483 It's theoretically possible, for example, to compile a program under
14484 Windows and generate a binary that can only be run under Linux, or
14485 vice-versa -- or, for that matter, to use Windows, running on an Intel
14486 machine, to write and compile a program that will run on the Mac OS,
14487 running on a PowerPC machine. This is called cross-compiling, and while
14488 it may seem rather esoteric, it is quite normal when you want to
14489 generate a program for a machine that you cannot develop on -- for
14490 example, a program that will run on a Palm Pilot. Originally, this is
14491 how MinGW worked -- you needed to run GCC under a Cygwin environment and
14492 give it appropriate flags, telling it to use the MinGW headers and
14493 target @file{MSVCRT.DLL} rather than @file{CYGWIN1.DLL}. (In fact,
14494 Cygwin standardly comes with MinGW's header files.) This was because GCC
14495 was written with Unix in mind and relied on a large amount of
14496 Unix-specific functionality. To port GCC to Windows without using a
14497 POSIX emulation layer would mean a lot of rewriting of GCC. Eventually,
14498 however, this was done, and GCC was itself compiled using MinGW. The
14499 result is that currently you can develop MinGW applications either under
14500 Cygwin or under native Windows.
14501
14502 @subsubheading (e) Operating system is not the same as window system
14503
14504 As per the above discussion, we can use either Native Windows (the OS
14505 part of Win32 provided by @file{KERNEL32.DLL} and the Windows CRT as
14506 provided by MSVCRT or CLL) or Cygwin to provide operating-system
14507 functionality, and we can use either Native Windows (the windowing part
14508 of Win32 as provided by @file{USER32.DLL} and @file{GDI32.DLL}) or X11
14509 to provide window-system functionality. This gives us four possible
14510 build environments. It's currently possible to build XEmacs with at
14511 least three of these combinations -- as far as I know native + X11 is no
14512 longer supported, although it used to be (support used to exist in
14513 @file{xemacs.mak} for linking with some X11 libraries available from
14514 somewhere, but it was bit-rotting and you could always use Cygwin; ####
14515 what happens if we try to compile with MinGW, native OS + X11?). This
14516 may still seem confusing, so:
14517
14518 @table @asis
14519 @item Native OS + native windowing
14520 We call @code{CreateProcess()} to run subprocesses
14521 (@file{process-nt.c}), and @code{CreateWindowEx()} to create a top-level
14522 window (@file{frame-msw.c}). We use @file{nt/xemacs.mak} to compile
14523 with VC++, linking with the Windows CRT (@file{MSVCRT.DLL} or
14524 @file{LIBC.LIB}) and with the various Win32 DLL's (@file{KERNEL32.DLL},
14525 @file{USER32.DLL}, @file{GDI32.DLL}); or we use
14526 @file{src/Makefile[.in.in]} to compile with GCC, telling it
14527 (e.g. -mno-cygwin, see @file{s/mingw32.h}) to use MinGW (which will end
14528 up linking with @file{MSVCRT.DLL}), and linking GCC with -lshell32
14529 -lgdi32 -luser32 etc. (see @file{configure.in}).
14530
14531 @item Cygwin + native windowing
14532 We call @code{fork()}/@code{execve()} to run subprocesses
14533 (@file{process-unix.c}), and @code{CreateWindowEx()} to create a
14534 top-level window (@file{frame-msw.c}). We use
14535 @file{src/Makefile[.in.in]} to compile with GCC (it will end up linking
14536 with @file{CYGWIN1.DLL}) and link GCC with -lshell32 -lgdi32 -luser32
14537 etc. (see @file{configure.in}).
14538
14539 @item Cygwin + X11
14540 We call @code{fork()}/@code{execve()} to run subprocesses
14541 (@file{process-unix.c}), and @code{XtCreatePopupShell()} to create a
14542 top-level window (@file{frame-x.c}). We use @file{src/Makefile[.in.in]}
14543 to compile with GCC (it will end up linking with @file{CYGWIN1.DLL}) and
14544 link GCC with -lXt, -lX11, etc. (see @file{configure.in}).
14545
14546 Finally, if native OS + X11 were possible, it might look something like
14547
14548 @item [Native OS + X11]
14549 We call @code{CreateProcess()} to run subprocesses
14550 (@file{process-nt.c}), and @code{XtCreatePopupShell()} to create a
14551 top-level window (@file{frame-x.c}). We use @file{nt/xemacs.mak} to
14552 compile with VC++, linking with the Windows CRT (@file{MSVCRT.DLL} or
14553 @file{LIBC.LIB}) and with the various X11 DLL's (@file{XT.DLL},
14554 @file{XLIB.DLL}, etc.); or we use @file{src/Makefile[.in.in]} to compile with
14555 GCC, telling it (e.g. -mno-cygwin, see @file{s/mingw32.h}) to use MinGW
14556 (which will end up linking with @file{MSVCRT.DLL}), and linking GCC with
14557 -lXt, -lX11, etc. (see @file{configure.in}).
14558 @end table
14559
14560 One of the reasons that we maintain the ability to build under Cygwin
14561 and X11 on Windows, when we have native support, is that it allows
14562 Windows compilers to test under a Unix-like environment.
14563
14564 @node Windows Build Flags, Windows I18N Introduction, Different kinds of Windows environments, Interface to MS Windows
14565 @section Windows Build Flags
14566 @cindex Windows build flags
14567 @cindex MS Windows build flags
14568 @cindex build flags, Windows
14569
14570 @table @code
14571 @item CYGWIN
14572 for Cygwin-only stuff.
14573 @item WIN32_NATIVE
14574 Win32 native OS-level stuff (files, process, etc.). Applies whenever
14575 linking against the native C libraries -- i.e. all compilations with
14576 VC++ and with MINGW, but never Cygwin.
14577 @item HAVE_X_WINDOWS
14578 for X Windows (regardless of whether under MS Win)
14579 @item HAVE_MS_WINDOWS
14580 MS Windows native windowing system (anything related to the appearance
14581 of the graphical screen). May or may not apply to any of VC++, MINGW,
14582 Cygwin.
14583 @end table
14584
14585 Finally, there's also the MINGW build environment, which uses GCC
14586 (similar to Cygwin), but native MS Windows libraries rather than a
14587 POSIX emulation layer (the Cygwin approach). This environment defines
14588 WIN32_NATIVE, but also defines MINGW, which is used mostly because MinGW
14589 uses its own include files (related to Cygwin), which have a few
14590 things messed up.
14591
14592 Formerly, we had a whole host of flags. Here's the conversion, for porting
14593 code from GNU Emacs and such:
14594
14595 @c @multitable {Old Constant} {determine whether this code is really specific to MS-DOS (and not Windows -- e.g. DJGPP code}
14596 @multitable @columnfractions .25 .75
14597 @item Old Constant @tab New Constant
14598 @item ----------------------------------------------------------------
14599 @item @code{WINDOWSNT}
14600 @tab @code{WIN32_NATIVE}
14601 @item @code{WIN32}
14602 @tab @code{WIN32_NATIVE}
14603 @item @code{_WIN32}
14604 @tab @code{WIN32_NATIVE}
14605 @item @code{HAVE_WIN32}
14606 @tab @code{WIN32_NATIVE}
14607 @item @code{DOS_NT}
14608 @tab @code{WIN32_NATIVE}
14609 @item @code{HAVE_NTGUI}
14610 @tab @code{WIN32_NATIVE}, unless it ends up already bracketed by this
14611 @item @code{HAVE_FACES}
14612 @tab always true
14613 @item @code{MSDOS}
14614 @tab determine whether this code is really specific to MS-DOS (and not
14615 Windows -- e.g. DJGPP code); if so, delete the code; otherwise,
14616 convert to @code{WIN32_NATIVE} (we do not support MS-DOS w/DOS Extender
14617 under XEmacs)
14618 @item @code{__CYGWIN__}
14619 @tab @code{CYGWIN}
14620 @item @code{__CYGWIN32__}
14621 @tab @code{CYGWIN}
14622 @item @code{__MINGW32__}
14623 @tab @code{MINGW}
14624 @end multitable
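
As a rough illustration of the conversion table above, the following C sketch derives flags of this kind from macros that the compilers predefine. The @code{DEMO_}-prefixed names and @code{demo_os_layer()} are hypothetical, invented for this example; the real XEmacs flags are defined elsewhere, and this is only a sketch of the bracketing pattern, not the actual build logic.

```c
/* Hypothetical flags, named DEMO_* so they cannot be mistaken for the
   real XEmacs ones.  Cygwin is tested first so that a Cygwin build never
   picks up the native flag. */
#if defined (__CYGWIN__) || defined (__CYGWIN32__)
# define DEMO_CYGWIN 1
#elif defined (_WIN32) || defined (__MINGW32__)
# define DEMO_WIN32_NATIVE 1
# if defined (__MINGW32__)
#  define DEMO_MINGW 1   /* MinGW is native, plus its own quirks */
# endif
#endif

/* Report which OS layer this translation unit was compiled for. */
const char *
demo_os_layer (void)
{
#if defined (DEMO_CYGWIN)
  return "cygwin";
#elif defined (DEMO_MINGW)
  return "native (MinGW)";
#elif defined (DEMO_WIN32_NATIVE)
  return "native";
#else
  return "other";
#endif
}
```

Note that in this sketch the windowing flags are deliberately absent: as described above, the choice of window system is independent of the choice of OS layer.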
14625
14626 @node Windows I18N Introduction, Modules for Interfacing with MS Windows, Windows Build Flags, Interface to MS Windows
14627 @section Windows I18N Introduction
14628 @cindex Windows I18N
14629 @cindex I18N, Windows
14630 @cindex MS Windows I18N
14631
14632 @strong{Abstract:} This page provides an overview of the aspects of the
14633 Win32 internationalization API that are relevant to XEmacs, including
14634 the basic distinction between multibyte and Unicode encodings. Also
14635 included are pointers to how XEmacs should make use of this API.
14636
14637 The Win32 API is quite well-designed in its handling of strings encoded
14638 for various character sets. The API is geared around the idea that two
14639 different methods of encoding strings should be supported. These
14640 methods are called multibyte and Unicode, respectively. The multibyte
14641 encoding is compatible with ASCII strings and is a more efficient
14642 representation when dealing with strings containing primarily ASCII
14643 characters, but it has a great number of serious deficiencies and
14644 limitations, including that it is very difficult and error-prone to work
14645 with strings in this encoding, and any particular string in a multibyte
14646 encoding can only contain characters from a very limited number of
14647 character sets. The Unicode encoding rectifies all of these
14648 deficiencies, but it is not compatible with ASCII strings (in other
14649 words, an existing program will not be able to handle the encoded
14650 strings unless it is explicitly modified to do so), and it takes up
14651 twice as much memory space as multibyte encodings when encoding a purely
14652 ASCII string.
14653
14654 Multibyte encodings use a variable number of bytes (either one or two)
14655 to represent characters. ASCII characters are also represented by a
14656 single byte with its high bit not set, and non-ASCII characters are
14657 represented by one or two bytes, the first of which always has its high
14658 bit set. (The second byte, when it exists, may or may not have its high
14659 bit set.) There is no single multibyte encoding. Instead, there is
14660 generally one encoding per non-ASCII character set. Such an encoding is
14661 capable of representing (besides ASCII characters, of course) only
14662 characters from one (or possibly two) particular character sets.
14663
14664 Multibyte encoding makes processing of strings very difficult. For
14665 example, given a pointer to the beginning of a character within a
14666 string, finding the pointer to the beginning of the previous character
14667 may require backing up all the way to the beginning of the string, and
14668 then moving forward. Also, an operation such as separating out the
14669 components of a path by searching for backslashes will fail if it's
14670 implemented in the simplest (but not multibyte-aware) fashion, because
14671 it may find what appears to be a backslash, but which is actually the
14672 second byte of a two-byte character. Also, the limited number of
14673 character sets that any particular multibyte encoding can represent
14674 means that loss of data is likely if a string is converted from the
14675 XEmacs internal format into a multibyte format.
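
The backslash problem described above can be sketched in C. This example assumes a Shift-JIS-style encoding, in which lead bytes fall in the ranges 0x81-0x9F and 0xE0-0xFC and some two-byte characters have 0x5C (the ASCII backslash) as their second byte. All @code{demo_} names are hypothetical; real Windows code would ask the system which bytes are lead bytes (for example with @code{IsDBCSLeadByte()}) rather than hard-coding the ranges.

```c
#include <stddef.h>

/* Assumption for this sketch: Shift-JIS lead-byte ranges, hard-coded. */
static int
demo_is_lead_byte (unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Naive scan: counts every 0x5C byte, including trail bytes. */
size_t
demo_count_separators_naive (const char *s)
{
  size_t n = 0;
  for (; *s; s++)
    if (*s == '\\')
      n++;
  return n;
}

/* Multibyte-aware scan: steps over both bytes of a two-byte character,
   so a 0x5C trail byte is never mistaken for a path separator. */
size_t
demo_count_separators_mb (const char *s)
{
  size_t n = 0;
  while (*s)
    {
      if (demo_is_lead_byte ((unsigned char) *s) && s[1] != '\0')
        {
          s += 2;
          continue;
        }
      if (*s == '\\')
        n++;
      s++;
    }
  return n;
}
```

For a path whose directory name is the two-byte sequence 0x95 0x5C, the naive scan reports one separator too many, while the multibyte-aware scan reports the correct count. Even the aware version only works when scanning forward from a known character boundary, which is the difficulty the paragraph above describes.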
14676
14677 For these reasons, the C code in XEmacs should never do any sort of work
14678 with multibyte encoded strings (or with strings in any external encoding
14679 for that matter). Strings should always be maintained in the internal
14680 encoding, which is predictable, and converted to an external encoding
14681 only at the point where the string moves from the XEmacs C code and
14682 enters a system library function. Similarly, when a string is returned
14683 from a system library function, it should be immediately converted into
14684 the internal coding before any operations are done on it.
14685
14686 Unicode, unlike multibyte encodings, is a fixed-width encoding where
14687 every character is represented using 16 bits. It is also capable of
14688 encoding all the characters from all the character sets in common use in
14689 the world. The predictability and completeness of the Unicode encoding
14690 makes it a very good encoding for strings that may contain characters
14691 from many character sets mixed up with each other. At the same time, of
14692 course, it is incompatible with routines that expect ASCII characters
14693 and also incompatible with general string manipulation routines, which
14694 will encounter a great number of what would appear to be embedded nulls
14695 in the string. It also takes twice as much room to encode strings
14696 containing primarily ASCII characters. This is why XEmacs does not use
14697 Unicode or similar encoding internally for buffers.
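
The embedded-null problem is easy to see in a small C sketch. The function names below are hypothetical, and the encoder handles only ASCII input, writing each character as a little-endian 16-bit unit; it is meant to show why byte-oriented string routines fail on such data, not to be a general converter.

```c
#include <stddef.h>
#include <string.h>

/* Encode an ASCII string as 16-bit little-endian units.  OUT must have
   room for 2 * strlen (ascii) + 2 bytes.  Returns the number of bytes
   written, not counting the two-byte terminator. */
size_t
demo_ascii_to_utf16le (const char *ascii, unsigned char *out)
{
  size_t n = 0;
  for (; *ascii; ascii++)
    {
      out[n++] = (unsigned char) *ascii; /* low byte */
      out[n++] = 0;                      /* high byte: zero for ASCII */
    }
  out[n] = 0;                            /* 16-bit terminator */
  out[n + 1] = 0;
  return n;
}

/* What a byte-oriented routine sees: it stops at the first zero byte,
   which is the high byte of the very first character. */
size_t
demo_naive_length (const unsigned char *utf16)
{
  return strlen ((const char *) utf16);
}
```

Encoding a three-character ASCII string this way takes six bytes, twice the ASCII size, and @code{strlen()} on the result stops after the first byte.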
14698
14699 The Win32 API cleverly deals with the issue of 8 bit vs. 16 bit
14700 characters by declaring a type called @code{@dfn{TCHAR}} which specifies
14701 a generic character, either 8 bits or 16 bits. Generally @code{TCHAR}
14702 is defined to be the same as the simple C type @code{char}, unless the
14703 preprocessor constant @code{UNICODE} is defined, in which case
14704 @code{TCHAR} is defined to be @code{WCHAR}, which is a 16 bit type.
14705 Nearly all functions in the Win32 API that take strings are defined to
14706 take strings that are actually arrays of @code{TCHAR}s. There is a type
14707 @code{LPTSTR} which is defined to be a string of @code{TCHAR}s and
14708 another type @code{LPCTSTR} which is a const string of @code{TCHAR}s.
14709 The theory is that any program that uses @code{TCHAR}s exclusively to
14710 represent characters and does not make assumptions about the size of a
14711 @code{TCHAR} or the way that the characters are encoded should work
14712 transparently regardless of whether the @code{UNICODE} preprocessor
14713 constant is defined, which is to say, regardless of whether 8 bit
14714 multibyte or 16 bit Unicode characters are being used. The way that
14715 this is actually implemented is that every Win32 API function that takes
14716 a string as an argument actually maps to one of two functions which are
14717 suffixed with an @code{A} (which stands for ANSI, and means multibyte
14718 strings) or @code{W} (which stands for wide, and means Unicode strings).
14719 The mapping is, of course, controlled by the same @code{UNICODE}
14720 preprocessor constant. Generally all structures containing strings in
14721 them actually map to one of two different kinds of structures, with
14722 either an @code{A} or a @code{W} suffix after the structure name.
14723
14724 Unfortunately, not all of the implementations of the Win32 API
14725 implement all of the functionality described above. In particular,
14726 Windows 95 does not implement very much Unicode functionality. It
14727 does implement functions to convert multibyte-encoded strings to and
14728 from Unicode strings, and provides Unicode versions of certain
14729 low-level functions like @code{ExtTextOut()}. In fact, all of
14730 the rest of the Unicode versions of API functions are just stubs that
14731 return an error. Conversely, all versions of Windows NT completely
14732 implement all the Unicode functionality, but some versions (especially
14733 versions before Windows NT 4.0) don't implement much of the multibyte
14734 functionality. For this reason, as well as for general code
14735 cleanliness, XEmacs needs to be written in such a way that it works
14736 with or without the @code{UNICODE} preprocessor constant being
14737 defined.
14738
14739 Getting XEmacs to run when all strings are Unicode primarily
14740 involves removing any assumptions made about the size of characters.
14741 Remember what I said earlier about how the point of conversion between
14742 internally and externally encoded strings should occur at the point of
14743 entry or exit into or out of a library function. With this in mind,
14744 an externally encoded string in XEmacs can be treated simply as an
14745 arbitrary sequence of bytes of some length which has no particular
14746 relationship to the length of the string in the internal encoding.
14747
14748 #### The rest of this is @strong{out-of-date} and needs to be written
14749 to reference the actual coding systems or aliases that we currently use.
14750
14751 [[ To facilitate this, the enum @code{external_data_format}, which is
14752 declared in @file{lisp.h}, is expanded to contain three new formats,
14753 which are @code{FORMAT_LOCALE}, @code{FORMAT_UNICODE} and
14754 @code{FORMAT_TSTR}. @code{FORMAT_LOCALE} always causes encoding into a
14755 multibyte string consistent with the encoding of the current locale.
14756 The functions to handle locales are different under Unix and Windows and
14757 locales are a process property under Unix and a thread property under
14758 Windows, but the concepts are basically the same. @code{FORMAT_UNICODE}
14759 of course causes encoding into Unicode and @code{FORMAT_TSTR} logically
14760 maps to either @code{FORMAT_LOCALE} or @code{FORMAT_UNICODE} depending
14761 on the @code{UNICODE} preprocessor constant.
14762
14763 Under Unix the behavior of @code{FORMAT_TSTR} is undefined and this
14764 particular format should not be used. Under Windows however
14765 @code{FORMAT_TSTR} should be used for pretty much all of the Win32 API
14766 calls. The other two formats should only be used in particular APIs
14767 that specifically call for a multibyte or Unicode encoded string
14768 regardless of the @code{UNICODE} preprocessor constant. String
14769 constants that are to be passed directly to Win32 API functions, such as
14770 the names of window classes, need to be bracketed in their definition
14771 with a call to the macro @code{TEXT}. This awfully named macro, which
14772 comes out of the Win32 API, appropriately makes a string of either
14773 regular or wide chars, which is to say this string may be prepended with
14774 an @code{L} (causing it to be a wide string) depending on the
14775 @code{UNICODE} preprocessor constant.
14776
14777 By the way, if you're wondering what happened to @code{FORMAT_OS}, I
14778 think that this format should go away entirely because it is too vague
14779 and should be replaced by more specific formats as they are defined.
14780 ]]
14781
14782 Use Qnative for Unix conversion, Qmswindows_tstr for Windows ...
14783
14784 String constants that are to be passed directly to Win32 API functions,
14785 such as the names of window classes, need to be bracketed in their
14786 definition with a call to the macro XETEXT. This appropriately makes a
14787 string of either regular or wide chars, which is to say this string may be
14788 prepended with an L (causing it to be a wide string) depending on
14789 XEUNICODE_P.
14790
14791 @node Modules for Interfacing with MS Windows, , Windows I18N Introduction, Interface to MS Windows
14792 @section Modules for Interfacing with MS Windows
14793 @cindex modules for interfacing with MS Windows
14794 @cindex interfacing with MS Windows, modules for
14795 @cindex MS Windows, modules for interfacing with
14796 @cindex Windows, modules for interfacing with
14797
14798 There are two different general Windows-related include files in src.
14799
14800 Uses are approximately:
14801
14802 @table @file
14803 @item syswindows.h
14804 Wrapper around @file{<windows.h>}, including missing defines as
14805 necessary. Includes stuff needed on both Cygwin and native Windows,
14806 regardless of window system chosen. Includes definitions needed for
14807 Unicode conversion/encapsulation, and other Mule-related stuff, plus
14808 various other prototypes and Windows-specific, but not GUI-specific,
14809 stuff.
14810
14811 @item console-msw.h
14812 Used on both Cygwin and native Windows, but only when native window
14813 system (as opposed to X) chosen. Includes @file{syswindows.h}.
14814 @end table
14815
14816 Summary of files:
14817
14818 @table @file
14819 @item console-msw.h
14820 include file for native windowing (otherwise, @file{console-x.h}, etc.)
14821 @item console-msw.c, frame-msw.c, etc.
14822 native windowing, as above
14823 @item process-nt.c
14824 subprocess support for native OS (otherwise, @file{process-unix.c})
14825 @item nt.c
14826 support routines used under native OS
14827 @item win32.c
14828 support routines used under both OS environments
14829 @item syswindows.h
14830 support header for both environments
14831 @item nt/xemacs.mak
14832 Makefile for VC++ (otherwise, @file{src/Makefile.in.in})
14833 @item s/windowsnt.h
14834 s header for basic native-OS defines, VC++ compiler
14835 @item s/mingw32.h
14836 s header for basic native-OS defines, GCC/MinGW compiler
14837 @item s/cygwin.h
14838 s header for basic Cygwin defines
14839 @item s/win32-native.h
14840 s header for basic native-OS defines, all compilers
14841 @item s/win32-common.h
14842 s header for defines for both OS environments
14843 @item intl-win32.c
14844 internationalization functions for both OS environments
14845 @item intl-encap-win32.c
14846 Unicode encapsulation functions for both OS environments
14847 @item intl-auto-encap-win32.c
14848 Auto-generated Unicode encapsulation functions
14849 @item intl-auto-encap-win32.h
14850 Auto-generated Unicode encapsulation headers
14851 @end table
14852
14853 @node Interface to the X Window System, Future Work, Interface to MS Windows, Top
10892 @chapter Interface to the X Window System 14854 @chapter Interface to the X Window System
10893 @cindex X Window System, interface to the 14855 @cindex X Window System, interface to the
10894 14856
10895 Mostly undocumented. 14857 Mostly undocumented.
10896 14858
10897 @menu 14859 @menu
10898 * Lucid Widget Library:: An interface to various widget sets. 14860 * Lucid Widget Library:: An interface to various widget sets.
14861 * Modules for Interfacing with X Windows::
10899 @end menu 14862 @end menu
10900 14863
10901 @node Lucid Widget Library 14864 @node Lucid Widget Library, Modules for Interfacing with X Windows, Interface to the X Window System, Interface to the X Window System
10902 @section Lucid Widget Library 14865 @section Lucid Widget Library
10903 @cindex Lucid Widget Library 14866 @cindex Lucid Widget Library
10904 @cindex widget library, Lucid 14867 @cindex widget library, Lucid
10905 @cindex library, Lucid Widget 14868 @cindex library, Lucid Widget
10906 14869
10922 not know which widget set has been used to build the graphical user 14885 not know which widget set has been used to build the graphical user
10923 interface. 14886 interface.
10924 14887
10925 @menu 14888 @menu
10926 * Generic Widget Interface:: The lwlib generic widget interface. 14889 * Generic Widget Interface:: The lwlib generic widget interface.
10927 * Scrollbars:: 14890 * Scrollbars::
10928 * Menubars:: 14891 * Menubars::
10929 * Checkboxes and Radio Buttons:: 14892 * Checkboxes and Radio Buttons::
10930 * Progress Bars:: 14893 * Progress Bars::
10931 * Tab Controls:: 14894 * Tab Controls::
10932 @end menu 14895 @end menu
10933 14896
10934 @node Generic Widget Interface 14897 @node Generic Widget Interface, Scrollbars, Lucid Widget Library, Lucid Widget Library
10935 @subsection Generic Widget Interface 14898 @subsection Generic Widget Interface
10936 @cindex widget interface, generic 14899 @cindex widget interface, generic
10937 14900
10938 In general in any toolkit a widget may be a composite object. In Xt, 14901 In general in any toolkit a widget may be a composite object. In Xt,
10939 all widgets have an X window that they manage, but typically a complex 14902 all widgets have an X window that they manage, but typically a complex
11010 14973
11011 The @code{widget_instance} structure also contains a pointer to the root 14974 The @code{widget_instance} structure also contains a pointer to the root
11012 of its tree. Widget instances are further confi 14975 of its tree. Widget instances are further confi
11013 14976
11014 14977
11015 @node Scrollbars 14978 @node Scrollbars, Menubars, Generic Widget Interface, Lucid Widget Library
11016 @subsection Scrollbars 14979 @subsection Scrollbars
11017 @cindex scrollbars 14980 @cindex scrollbars
11018 14981
11019 @node Menubars 14982 @node Menubars, Checkboxes and Radio Buttons, Scrollbars, Lucid Widget Library
11020 @subsection Menubars 14983 @subsection Menubars
11021 @cindex menubars 14984 @cindex menubars
11022 14985
11023 @node Checkboxes and Radio Buttons 14986 @node Checkboxes and Radio Buttons, Progress Bars, Menubars, Lucid Widget Library
11024 @subsection Checkboxes and Radio Buttons 14987 @subsection Checkboxes and Radio Buttons
11025 @cindex checkboxes and radio buttons 14988 @cindex checkboxes and radio buttons
11026 @cindex radio buttons, checkboxes and 14989 @cindex radio buttons, checkboxes and
11027 @cindex buttons, checkboxes and radio 14990 @cindex buttons, checkboxes and radio
11028 14991
11029 @node Progress Bars 14992 @node Progress Bars, Tab Controls, Checkboxes and Radio Buttons, Lucid Widget Library
11030 @subsection Progress Bars 14993 @subsection Progress Bars
11031 @cindex progress bars 14994 @cindex progress bars
11032 @cindex bars, progress 14995 @cindex bars, progress
11033 14996
11034 @node Tab Controls 14997 @node Tab Controls, , Progress Bars, Lucid Widget Library
11035 @subsection Tab Controls 14998 @subsection Tab Controls
11036 @cindex tab controls 14999 @cindex tab controls
11037 15000
11038 @include index.texi 15001
15002 @node Modules for Interfacing with X Windows, , Lucid Widget Library, Interface to the X Window System
15003 @section Modules for Interfacing with X Windows
15004 @cindex modules for interfacing with X Windows
15005 @cindex interfacing with X Windows, modules for
15006 @cindex X Windows, modules for interfacing with
15007
15008 @example
15009 Emacs.ad.h
15010 @end example
15011
15012 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
15013 fallback resources (so that XEmacs has pretty defaults).
15014
15015
15016
15017 @example
15018 EmacsFrame.c
15019 EmacsFrame.h
15020 EmacsFrameP.h
15021 @end example
15022
15023 These modules implement an Xt widget class that encapsulates a frame.
15024 This is for ease in integrating with Xt. The EmacsFrame widget covers
15025 the entire X window except for the menubar; the scrollbars are
15026 positioned on top of the EmacsFrame widget.
15027
15028 @strong{Warning:} Abandon hope, all ye who enter here. This code took
15029 an ungodly amount of time to get right, and is likely to fall apart
15030 mercilessly at the slightest change. Such is life under Xt.
15031
15032
15033
15034 @example
15035 EmacsManager.c
15036 EmacsManager.h
15037 EmacsManagerP.h
15038 @end example
15039
15040 These modules implement a simple Xt manager (i.e. composite) widget
15041 class that simply lets its children set whatever geometry they want.
15042 It's amazing that Xt doesn't provide this standardly, but on second
15043 thought, it makes sense, considering how amazingly broken Xt is.
15044
15045
15046 @example
15047 EmacsShell-sub.c
15048 EmacsShell.c
15049 EmacsShell.h
15050 EmacsShellP.h
15051 @end example
15052
15053 These modules implement two Xt widget classes that are subclasses of
15054 the TopLevelShell and TransientShell classes. This is necessary to deal
15055 with more brokenness that Xt has sadistically thrust onto the backs of
15056 developers.
15057
15058
15059
15060 @example
15061 xgccache.c
15062 xgccache.h
15063 @end example
15064
15065 These modules provide functions for maintenance and caching of GC's
15066 (graphics contexts) under the X Window System. This code is junky and
15067 needs to be rewritten.
15068
15069
15070
15071 @example
15072 select-msw.c
15073 select-x.c
15074 select.c
15075 select.h
15076 @end example
15077
15078 @cindex selections
15079 These modules provide an interface to the X Window System's concept of
15080 @dfn{selections}, the standard way for X applications to communicate
15081 with each other.
15082
15083
15084
15085 @example
15086 xintrinsic.h
15087 xintrinsicp.h
15088 xmmanagerp.h
15089 xmprimitivep.h
15090 @end example
15091
15092 These header files are similar in spirit to the @file{sys*.h} files and buffer
15093 against different implementations of Xt and Motif.
15094
15095 @itemize @bullet
15096 @item
15097 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
15098 @item
15099 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
15100 @item
15101 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
15102 @item
15103 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
15104 @end itemize
15105
15106
15107
15108 @example
15109 xmu.c
15110 xmu.h
15111 @end example
15112
15113 These files provide an emulation of the Xmu library for those systems
15114 (e.g. HPUX) that don't provide it as a standard part of X.
15115
15116
15117
15118 @example
15119 ExternalClient-Xlib.c
15120 ExternalClient.c
15121 ExternalClient.h
15122 ExternalClientP.h
15123 ExternalShell.c
15124 ExternalShell.h
15125 ExternalShellP.h
15126 extw-Xlib.c
15127 extw-Xlib.h
15128 extw-Xt.c
15129 extw-Xt.h
15130 @end example
15131
15132 @cindex external widget
15133 These files provide the @dfn{external widget} interface, which allows an
15134 XEmacs frame to appear as a widget in another application. To do this,
15135 you have to configure with @samp{--external-widget}.
15136
15137 @file{ExternalShell*} provides the server (XEmacs) side of the
15138 connection.
15139
15140 @file{ExternalClient*} provides the client (other application) side of
15141 the connection. These files are not compiled into XEmacs but are
15142 compiled into libraries that are then linked into your application.
15143
15144 @file{extw-*} is common code that is used for both the client and server.
15145
15146 Don't touch this code; something is liable to break if you do.
15147
15148
15149 @node Future Work, Future Work Discussion, Interface to the X Window System, Top
15150 @chapter Future Work
15151 @cindex future work
15152
15153 @menu
15154 * Future Work -- Elisp Compatibility Package::
15155 * Future Work -- Drag-n-Drop::
15156 * Future Work -- Standard Interface for Enabling Extensions::
15157 * Future Work -- Better Initialization File Scheme::
15158 * Future Work -- Keyword Parameters::
15159 * Future Work -- Property Interface Changes::
15160 * Future Work -- Toolbars::
15161 * Future Work -- Menu API Changes::
15162 * Future Work -- Removal of Misc-User Event Type::
15163 * Future Work -- Mouse Pointer::
15164 * Future Work -- Extents::
15165 * Future Work -- Version Number and Development Tree Organization::
15166 * Future Work -- Improvements to the @code{xemacs.org} Website::
15167 * Future Work -- Keybindings::
15168 * Future Work -- Byte Code Snippets::
15169 * Future Work -- Lisp Stream API::
15170 * Future Work -- Multiple Values::
15171 * Future Work -- Macros::
15172 * Future Work -- Specifiers::
15173 * Future Work -- Display Tables::
15174 * Future Work -- Making Elisp Function Calls Faster::
15175 * Future Work -- Lisp Engine Replacement::
15176 @end menu
15177
15178 @ignore
15179 Macro to convert a single line containing a heading into the format of
15180 all headings in the Future Work section.
15181
15182 (setq last-kbd-macro (read-kbd-macro
15183 "<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Future SPC Work SPC - - SPC <end> RET @cindex SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC future SPC work RET"))
15184 @end ignore
15185
15186 @node Future Work -- Elisp Compatibility Package, Future Work -- Drag-n-Drop, Future Work, Future Work
15187 @section Future Work -- Elisp Compatibility Package
15188 @cindex future work, elisp compatibility package
15189 @cindex elisp compatibility package, future work
15190
15191 A while ago I created a package called Sysdep, which aimed to be a
15192 forward compatibility package for Elisp. The idea was that instead of
15193 having to write your package using the oldest version of Emacs that you
15194 wanted to support, you could use the newest XEmacs API, and then simply
15195 load the Sysdep package, which would automatically define the new API in
15196 terms of older APIs as necessary. The idea of this package was good,
15197 but its design wasn't perfect, and it wasn't widely adopted. I propose
15198 a new package called Compat that corrects the design flaws in Sysdep,
15199 and hopefully will be adopted by most of the major packages.
15200
15201 In addition, this package will provide macros that can be used to
15202 bracket code as necessary to disable byte compiler warnings generated as
15203 a result of supporting the APIs of different versions of Emacs; or
15204 rather the Compat package strives to provide useful constructs to make
15205 doing this support easier, and these constructs have the side effect of
15206 not causing spurious byte compiler warnings. The idea here is that it
15207 should be possible to create well-written, clean, and understandable
15208 Elisp that supports both older and newer APIs, and has no byte compiler
15209 warnings. Currently many warnings are unavoidable, and as a result,
15210 they are simply ignored, which also causes a lot of legitimate warnings
15211 to be ignored.
15212
15213 The approach taken by the Sysdep package to make sure that the newest
15214 API was always supported was fairly simple: when the Sysdep package was
15215 loaded, it checked for the existence of new API functions, and if they
15216 weren't defined, it defined them in terms of older API functions that
15217 were defined. This had the advantage that the checks for which API
15218 functions were defined were done only once at load time rather than each
15219 time the function was called. However, the fact that the new APIs were
15220 globally defined caused a lot of problems with unwanted interactions,
15221 both with other versions of the Sysdep package provided as part of other
15222 packages, and simply with compatibility code of other sorts in packages
15223 that would determine whether an API existed by checking for the
15224 existence of certain functions within that API. In addition, the Sysdep
15225 package did not scale well because it defined all of the functions that
15226 it supported, regardless of whether or not they were used.
15227
15228 The Compat package remedies the first problem by ensuring that the new
15229 APIs are defined only within the lexical scope of the packages that
15230 actually make use of the Compat package. It remedies the second problem
15231 by ensuring that only definitions of functions that are actually used
15232 are loaded. This all works roughly according to the following scheme:
15233
15234 @enumerate
15235 @item
15236
15237 Part of the Compat package is a module called the Compat generator.
15238 This module is actually run as an additional step during byte
15239 compilation of a package that uses Compat. This can happen either
15240 through the makefile or through the use of an @code{eval-when-compile}
15241 call within the package code itself. What the generator does is scan
15242 all of the Lisp code in the package, determine which function calls are
15243 made that the Compat package knows about, and generates custom
15244 @code{compat} code that conditionally defines just these functions when
15245 the package is loaded. The custom @code{compat} code can either be
15246 written to a separate Lisp file (for use with multi-file packages), or
15247 inserted into the beginning of the Lisp file of a single file package.
15248 (In the latter case, the package indicates where this generated code
15249 should go through the use of magic comments that mark the beginning and
15250 end of the section. Some will say that doing this trick is bad juju,
15251 but I have done this sort of thing before, and it works very well in
15252 practice).
15253 @item
15254
15255 The functions in the custom @code{compat} code have their names prefixed
15256 with both the name of the package and the word @code{compat}, ensuring
15257 that there will be no name space conflicts with other functions in the
15258 same package, or with other packages that make use of the Compat
15259 package.
15260 @item
15261
15262 The actual definitions of the functions in the custom @code{compat} code
15263 are determined at run time. When the equivalent API already exists, the
15264 wrapper functions are simply defined directly in terms of the actual
15265 functions, so that the only run time overhead from using the Compat
15266 package is one additional function call. (Alternatively, even this
15267 small overhead could be avoided by retrieving the definitions of the
15268 actual functions and supplying them as the definitions of the wrapper
15269 functions. However, this appears to me to not be completely safe. For
15270 example, it might have bad interactions with the advice package).
15271 @item
15272
15273 The code that wants to make use of the custom @code{compat} code is
15274 bracketed by a call to the construct @code{compat-execute}. What this
15275 actually does is lexically bind all of the function names that are being
15276 redefined with macro functions by using the Common Lisp macro @code{macrolet}.
15277 (The definition of this macro is in the CL package, but in order for
15278 things to work on all platforms, the definition of this macro will
15279 presumably have to be copied and inserted into the custom @code{compat}
15280 code).
15281
15282 @end enumerate
15283
15284 In addition, the Compat package should define the macro
15285 @code{compat-if-fboundp}. (Similar macros such as
15286 @code{compile-when-fboundp} and @code{compile-case-fboundp} could be
15287 defined using similar principles). The @code{compat-if-fboundp} macro
15288 behaves just like an @code{(if (fboundp ...) ...)} clause when executed,
15289 but in addition, when it's compiled, it ensures that the code inside the
15290 @code{if-true} sub-block will not cause any byte compiler warnings about
15291 the function in question being unbound. I think that the way to
15292 implement this would be to make @code{compat-if-fboundp} be a macro that
15293 does what it's supposed to do, but which defines its own byte code
15294 handler, which ensures that the particular warning in question will be
15295 suppressed. (Actually ensuring that just the warning in question is
15296 suppressed, and not any others, might be rather tricky. It certainly
15297 requires further thought).
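
As a rough illustration of the run-time half of this idea, a minimal
sketch of @code{compat-if-fboundp} might look like the following. It
only reproduces the @code{(if (fboundp ...) ...)} behavior; the byte
code handler that would suppress the warning is not shown, and the
argument names are assumptions.

@example
(defmacro compat-if-fboundp (function then &rest else)
  "Evaluate THEN if FUNCTION is fboundp, otherwise the ELSE forms.
Sketch only: no byte compiler warnings are suppressed here."
  ;; Expands to (if (fboundp FUNCTION) THEN ELSE...).
  (append (list 'if (list 'fboundp function) then) else))

;; Use the newer function when present, falling back otherwise.
(compat-if-fboundp 'string-to-number
    (string-to-number "42")
  (string-to-int "42"))
@end example
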
15298
15299 Note: An alternative way of avoiding both warnings about unbound
15300 functions and warnings about obsolete functions is to just call the
15301 function in question by using @code{funcall}, instead of calling the
15302 function directly. This seems rather inelegant to me, though, and
15303 doesn't make it obvious why the function is being called in such a
15304 roundabout manner. Perhaps the Compat package should also provide a
15305 macro @code{compat-funcall}, which works exactly like @code{funcall},
15306 but which indicates to anyone reading the code why the code is expressed
15307 in such a fashion.
15308
15309 If you're wondering how to implement the part of the Compat generator
15310 where it scans Lisp code to find function calls for functions that it
15311 wants to do something about, I think the best way is to simply process
15312 the code using the Lisp function @code{read} and recursively descend any
15313 lists looking for function names as the first element of any list
15314 encountered. This might extract out a few more functions than are
15315 actually called, but it is almost certainly safer than doing anything
15316 trickier like byte compiling the code, and attempting to look for
15317 function calls in the result. (It could also be argued that the names
15318 of the functions should be extracted not only from the first element of
15319 lists, but from anywhere a symbol occurs, for example to catch places
15320 where a function is called using @code{funcall} or @code{apply}.
15321 However, such uses of functions would not be affected by the surrounding
15322 @code{macrolet} call, and so there doesn't appear to be any point in
15323 extracting them).
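
The scanning approach described above could be sketched roughly as
follows. The function names @code{compat-scan-form} and
@code{compat-scan-buffer} are hypothetical, and the sketch assumes that
@code{dolist} and @code{push} are available (in older Emacsen they come
from the CL package).

@example
(defun compat-scan-form (form known)
  "Return symbols from KNOWN that head some list within FORM."
  (let ((found nil))
    (when (consp form)
      ;; The head of this list is a candidate function name.
      (when (and (symbolp (car form)) (memq (car form) known))
        (setq found (list (car form))))
      ;; Descend into every element; stop at a dotted tail.
      (while (consp form)
        (dolist (sym (compat-scan-form (car form) known))
          (unless (memq sym found) (push sym found)))
        (setq form (cdr form))))
    found))

(defun compat-scan-buffer (known)
  "Scan all forms in the current buffer for symbols in KNOWN."
  (let ((found nil) form)
    (save-excursion
      (goto-char (point-min))
      ;; `read' signals end-of-file when no forms remain.
      (condition-case nil
          (while t
            (setq form (read (current-buffer)))
            (dolist (sym (compat-scan-form form known))
              (unless (memq sym found) (push sym found))))
        (end-of-file nil)))
    found))
@end example

As noted, this may report a few symbols that are not really called (a
variable list in a @code{let} form, say), which is harmless for the
generator's purposes.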
15324
15325 @uref{../../www.666.com/ben/default.htm,Ben Wing}
15326
15327 @node Future Work -- Drag-n-Drop, Future Work -- Standard Interface for Enabling Extensions, Future Work -- Elisp Compatibility Package, Future Work
15328 @section Future Work -- Drag-n-Drop
15329 @cindex future work, drag-n-drop
15330 @cindex drag-n-drop, future work
15331
15332 @strong{Abstract:} I propose completely redoing the drag-n-drop
15333 interface to make it powerful and extensible enough to support such
15334 concepts as drag over and drag under visuals and context menus invoked
15335 when a drag is done with the right mouse button, to allow drop handlers
15336 to be defined for all sorts of graphical elements including buffers,
15337 extents, mode lines, toolbar items, menubar items, glyphs, etc., and to
15338 allow different packages to add and remove drop handlers for the same
15339 drop sites without interfering with each other. The changes are
15340 extensive enough that I think they can only be implemented in version
15341 22, and the drag-n-drop interface should remain experimental until then.
15342
15343 The new drag-n-drop interface centers around the twin concepts of
15344 @dfn{drop site} and @dfn{drop handler}. A @dfn{drop site} specifies a
15345 particular graphical element onto which an object can be dropped, and a
15346 @dfn{drop handler} encapsulates all of the behavior that happens when
15347 such an object is dragged over and dropped onto a drop site.
15348
15349 Each drop site has an object associated with it which is passed to
15350 functions that are part of the drop handlers associated with that site.
15351 The type of this object depends on the graphical element that comprises
15352 the drop site. The drop site object can be a buffer, an extent, a
15353 glyph, a menu path, a toolbar item path, etc. (These last two object
15354 types are defined in @uref{lisp-interface.html,Lisp Interface Changes}
15355 in the sections on menu and toolbar API changes. If we wanted to allow
15356 drops onto other kinds of drop sites, for example mode lines, we would
15357 have to create corresponding path objects). Each such object type
15358 should be able to be accessed using the generalized property interface
15359 defined above, and should have a property called @code{drop-handlers}
15360 associated with it that specifies all of the drop handlers associated
15361 with the drop site. Normally, this property is not accessed directly,
15362 but instead by using the drop handler API defined below, and Lisp
15363 packages should not make any assumptions about the format of the data
15364 contained in the @code{drop-handlers} property.
15365
15366 Each drop handler has an object of type @code{drop-handler} associated
15367 with it, whose primary purpose is to be a container for the various
15368 properties associated with a particular drop handler. These could
15369 include, for example, a function invoked when the drop occurs, a context
15370 menu invoked when a drop occurs as a result of a drag with the right
15371 mouse button, functions invoked when a dragged object enters, leaves, or
15372 moves within a drop site, the shape that the mouse pointer changes to
15373 when an object is dragged over a drop site that allows this particular
15374 object to be dropped onto it, the MIME types (actually a regular
15375 expression matching the MIME types) of the allowable objects that can be
15376 dropped onto the drop site, a @dfn{package tag} (a symbol specifying the
15377 package that created the drop handler, used for identification
15378 purposes), etc. The drop handler object is passed to the functions that
15379 are invoked as a result of a drag or a drop, most likely indirectly as
15380 one of the properties of the drag or drop event passed to the function.
15381 Properties of a drop handler object are accessed and modified in the
15382 standard fashion using the generalized property interface.
15383
15384 A drop handler is added to a drop site using the @code{add-drop-handler}
15385 function. The drop handler itself can either be created separately
15386 using the @code{make-drop-handler} function and then passed in as one of
15387 the parameters to @code{add-drop-handler}, or it will be created
15388 automatically by the @code{add-drop-handler} function, if the drop
15389 handler argument is omitted, but keyword arguments corresponding to the
15390 valid keyword properties for a drop handler are specified in the
15391 @code{add-drop-handler} call. Other functions, such as
15392 @code{find-drop-handler}, @code{add-drop-handler} (when specifying a
15393 drop handler before which the drop handler in question is to be added),
15394 @code{remove-drop-handler} etc. should be defined with obvious
15395 semantics. All of these functions take or return a drop site object
15396 which, as mentioned above, can be one of several object types
15397 corresponding to graphical elements. Defined drop handler functions
15398 locate a particular drop handler using either the @code{MIME-type} or
15399 @code{package-tag} property of the drop handler, as defined above.
15400
15401 Logically, the drop handlers associated with a particular drop site are
15402 an ordered list. The first drop handler whose specified MIME type
15403 matches the MIME type of the object being dragged or dropped controls
15404 what happens to this object. This is important particularly because the
15405 specified MIME type of the drop handler can be a regular expression
15406 that, for example, matches all audio objects with any sub-type.
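
To make the ordering rule concrete, here is a minimal sketch of the
lookup. It represents each drop handler as a plain property list, which
is purely an assumption for illustration (the proposal calls for an
opaque @code{drop-handler} object), and the name
@code{find-matching-drop-handler} is hypothetical.

@example
(defun find-matching-drop-handler (handlers mime-type)
  "Return the first element of HANDLERS whose regexp matches MIME-TYPE.
Each handler is assumed to be a plist with a `mime-type' regexp."
  (let ((result nil))
    (while (and handlers (not result))
      (when (string-match (plist-get (car handlers) 'mime-type)
                          mime-type)
        (setq result (car handlers)))
      (setq handlers (cdr handlers)))
    result))

;; The audio handler comes first, so it controls the drop.
(find-matching-drop-handler
 '((mime-type "\\`audio/" package-tag sound-pkg)
   (mime-type "\\`.*/" package-tag catch-all-pkg))
 "audio/x-wav")
@end example
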
15407
15408 In the current drag-n-drop API, there is a distinction made between
15409 objects with an associated MIME type and objects with an associated URL.
15410 I think that this distinction is arbitrary, and should not exist. All
15411 objects should have a MIME type associated with them, and a new
15412 XEmacs-specific MIME type should be defined for URLs, file names,
15413 etc. as necessary. I am not even sure that this is necessary, however,
15414 as the MIME specification may specify a general concept of a pointer or
15415 link to an object, which is exactly what we want. Also in some cases
15416 (for example, the name of a file that is locally available), the pointer
15417 or link will have another MIME type associated with it, which is the
15418 type of the object that is being pointed to. I am not quite sure how we
15419 should handle URL and file name objects being dragged, but I am positive
15420 that it needs to be integrated with the mechanism used when an object
15421 itself is being dragged or dropped.
15422
15423 As is described in @uref{misc-user-event.html,a separate page}, the
15424 @code{misc-user-event} event type should be removed and split up into a
15425 number of separate event types. Two such event types would be
15426 @code{drag-event} and @code{drop-event}. A drop event is used when an
15427 object is actually dropped, and a drag event is used if a function is
15428 invoked as part of the dragging process. (Such a function would
15429 typically be used to control what are called @dfn{drag under visuals},
15430 which are changes to the appearance of the drop site reflecting the fact
15431 that a compatible object is being dragged over it). The drag events and
15432 drop events encapsulate all of the information that is pertinent to the
15433 drag or drop action occurring, including such information as the actual
15434 MIME type of the object in question, the drop handler that caused a
15435 function to be invoked, the mouse event (or possibly even a keyboard
15436 event) corresponding to the user's action that is causing the drag or
15437 drop, etc. This event is always passed to any function that is invoked
15438 as a result of the drag or drop. There should never be any need to
15439 refer to the @code{current-mouse-event} variable, and in fact, this
15440 variable should not be changed at all during a drag or a drop.
15441
15442 @uref{../../www.666.com/ben/default.htm,Ben Wing}
15443
15444 @node Future Work -- Standard Interface for Enabling Extensions, Future Work -- Better Initialization File Scheme, Future Work -- Drag-n-Drop, Future Work
15445 @section Future Work -- Standard Interface for Enabling Extensions
15446 @cindex future work, standard interface for enabling extensions
15447 @cindex standard interface for enabling extensions, future work
15448
15449 @strong{Abstract:} Apparently, if you know the name of a package (for
15450 example, @code{fusion}), you can load it using the @code{require}
15451 function, but there's no standard way to turn it on or turn it off. The
15452 only way to figure out how to do that is to go read the source file,
15453 where hopefully the comments at the start tell you the appropriate magic
15454 incantations that you need to run in order to turn the extension on or
15455 off. There really need to be standard functions, such as
15456 @code{enable-extension} and @code{disable-extension}, to do this sort of
15457 thing. It seems like a glaring omission that this isn't currently
15458 present, and it's really surprising to me that nobody has remarked on
15459 this.
15460
15461 The easy part of this is defining the interface, and I think it should
15462 be done as soon as possible. When the package is loaded, it simply
15463 calls some standard function in the package system, and passes it the
15464 names of enable and disable functions, or perhaps just one function that
15465 takes an argument specifying whether to enable or disable. In any case,
15466 this data is kept in a table which is used by the
15467 @code{enable-extension} and @code{disable-extension} functions. There
15468 should also be functions such as @code{extension-enabled-p} and
15469 @code{enabled-extension-list}, and so on with obvious semantics. The
15470 hard part is actually getting packages to obey this standard interface,
15471 but this is mitigated by the fact that the changes needed to support
15472 this interface are so simple.
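
A minimal sketch of such a table might look like this. The registration
function @code{register-extension} and the two variables are
hypothetical names; only @code{enable-extension},
@code{disable-extension}, @code{extension-enabled-p} and
@code{enabled-extension-list} come from the proposal itself.

@example
(defvar extension-alist nil
  "Alist of (NAME ENABLE-FUNCTION DISABLE-FUNCTION) entries.")

(defvar enabled-extensions nil
  "List of names of the currently enabled extensions.")

(defun register-extension (name enable-fn disable-fn)
  "Record how to turn extension NAME on and off."
  (setq extension-alist
        (cons (list name enable-fn disable-fn)
              (delq (assq name extension-alist) extension-alist))))

(defun enable-extension (name)
  "Enable extension NAME globally."
  (let ((entry (assq name extension-alist)))
    (or entry (error "Unknown extension: %s" name))
    (funcall (nth 1 entry))
    (or (memq name enabled-extensions)
        (setq enabled-extensions (cons name enabled-extensions)))))

(defun disable-extension (name)
  "Disable extension NAME globally."
  (let ((entry (assq name extension-alist)))
    (or entry (error "Unknown extension: %s" name))
    (funcall (nth 2 entry))
    (setq enabled-extensions (delq name enabled-extensions))))

(defun extension-enabled-p (name)
  "Return t if extension NAME is enabled."
  (and (memq name enabled-extensions) t))

(defun enabled-extension-list ()
  "Return a fresh list of the enabled extensions."
  (copy-sequence enabled-extensions))
@end example

A package would then call @code{register-extension} once at load time
and do nothing else until the user asks for the extension to be enabled.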
15473
15474 I have been conceiving of these enabling and disabling functions as
15475 turning the feature on or off globally. It's probably also useful to
15476 have a standard interface for turning an extension on or off in just a
15477 particular buffer. Perhaps then the appropriate interface would involve
15478 registering a single function that takes an argument that specifies
15479 various things, such as turn off globally, turn on globally, turn on or
15480 off in the current buffer, etc.
15481
15482 Part of this interface should specify the correct way to define global
15483 key bindings. The correct rule for this, of course, is that the key
15484 bindings should not happen when the package is loaded, which is often
15485 how things are currently done, but only when the extension is actually
15486 enabled. The key bindings should go away when the extension is
15487 disabled. I think that in order to support this properly, we should
15488 expand the keymap interface slightly, so that among the other
15489 properties associated with each key binding there is a list of shadow
15490 bindings. Then there should be a function called
15491 @code{define-key-shadowing}, which is just like @code{define-key} but
15492 which also remembers the previous key binding in a shadow list. Then
15493 there can be another function, something like @code{undefine-key}, which
15494 restores the binding to the most recently added item on the shadow list.
15495 There are already hash tables associated with each key binding, and it
15496 should be easy to stuff additional values, such as a shadow list, into
15497 the hash table. Probably there should also be functions called
15498 @code{global-set-key-shadowing} and @code{global-unset-key-shadowing}
15499 with obvious semantics.
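
The shadowing behavior can be approximated in plain Lisp, as in the
following sketch. It keeps the shadow list in a separate variable
(@code{key-shadow-list}, a hypothetical name) rather than in the
keymap's internal hash table, so it illustrates the semantics only, not
the proposed implementation.

@example
(defvar key-shadow-list nil
  "List of (KEYMAP KEY OLD-BINDING) entries, most recent first.")

(defun define-key-shadowing (keymap key def)
  "Like `define-key', but remember the binding that DEF replaces."
  (let ((old (lookup-key keymap key)))
    ;; `lookup-key' returns a number when KEY is too long; treat
    ;; that case as having no previous binding.
    (if (numberp old) (setq old nil))
    (setq key-shadow-list
          (cons (list keymap key old) key-shadow-list)))
  (define-key keymap key def))

(defun undefine-key (keymap key)
  "Restore the most recently shadowed binding of KEY in KEYMAP."
  (let ((tail key-shadow-list) (entry nil))
    (while (and tail (not entry))
      (if (and (eq (nth 0 (car tail)) keymap)
               (equal (nth 1 (car tail)) key))
          (setq entry (car tail)))
      (setq tail (cdr tail)))
    (when entry
      (define-key keymap key (nth 2 entry))
      (setq key-shadow-list (delq entry key-shadow-list)))))
@end example
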
15500
15501 Once this interface is defined, it should be easy to expand the custom
15502 package so it knows about this interface. Then it will be possible to
15503 put all sorts of extensions on the options menu so that they could be
15504 turned off and turned on very easily, and then when you save the options
15505 out to a file, the desired settings for whether these extensions are
15506 enabled or not are saved out with it. A whole lot of custom junk that's
15507 been added to a lot of different packages could be removed. After doing
15508 this, we might want to think of a way to classify extensions according
15509 to how likely we think the user will want to use them. This way we can
15510 avoid the problem of having a list of 100 extensions and the user not
15511 being able to figure out which ones might be useful. Perhaps the most
15512 useful extensions would appear immediately on the extensions menu, and
15513 the less useful ones would appear in a submenu of that, and another
15514 submenu might contain even less useful extensions. Of course the
15515 package authors might not be too happy with this, but the users probably
15516 will be. I think this at least deserves a thought, although it's
15517 possible you might simply want to maintain a list of extensions on the
15518 web site, with a judgment on, first, how commonly a user might want
15519 each extension and, second, how well written and bug-free the
15520 package is. Both of these sorts of judgments could be obtained by
15521 doing user surveys if need be.
15522
15523 @uref{../../www.666.com/ben/default.htm,Ben Wing}
15524
15525 @node Future Work -- Better Initialization File Scheme, Future Work -- Keyword Parameters, Future Work -- Standard Interface for Enabling Extensions, Future Work
15526 @section Future Work -- Better Initialization File Scheme
15527 @cindex future work, better initialization file scheme
15528 @cindex better initialization file scheme, future work
15529
15530 @strong{Abstract:} A proposal is outlined for converting XEmacs to use
15531 the @code{.xemacs} subdirectory for its initialization files instead of
15532 putting them in the user's home directory. In the process, a general
15533 pre-initialization scheme is created whereby all of the initialization
15534 parameters, such as the location of the initialization files, whether
15535 these files are loaded or not, where the initial frame is created,
15536 etc. that are currently specified by command line arguments, by
15537 environment variables, and other means, can be specified in a uniform
15538 way using Lisp code. Reasonable default behavior for everything will
15539 still be provided, and the older, simpler means can be used if desired.
15540 Compatibility with the current location and name of the initialization
15541 file, and the current ill-chosen use for the @code{.xemacs} directory is
15542 maintained, and the problem of how to gracefully migrate a user from the
15543 old scheme into the new scheme while still allowing the user to use GNU
15544 Emacs or older versions of XEmacs is solved. A proposal for changing
15545 the way that the initial frame is mapped is also outlined; this would
15546 allow the user's initialization file to control the way that the initial
15547 frame appears without resorting to hacks, while still making echo area
15548 messages visible as they appear, and allowing the user to debug errors
15549 in the initialization file.
15550
15551 @subheading Principles in the new scheme
15552
15553 @enumerate
15554 @item
15555
15556 XEmacs has a defined @dfn{pre-initialization process}. This process,
15557 whose purpose is to compute the values of the parameters that control
15558 how the initialization process proceeds, occurs as early as possible
15559 after the Lisp engine has been initialized, and in particular, it occurs
15560 before any devices have been opened, or before any initialization
15561 parameters are set that could reasonably be expected to be changed. In
15562 fact, the pre-initialization process should take care of setting these
15563 parameters. The code that implements the pre-initialization process
15564 should be written in Lisp and should be called from the Lisp function
15565 @code{normal-top-level}, and the general way that the user customizes
15566 this process should also be done using Lisp code.
15567
15568 @item
15569
15570 The pre-initialization process involves a number of properties, for
15571 example the directory containing the user initialization files (normally
15572 the @code{.xemacs} subdirectory), the name of the user init file, the
15573 name of the custom init file, where and what type the initial device is,
15574 whether and when the initial frame is mapped, etc. A standard interface
15575 is provided for getting and setting the values of these properties using
15576 functions such as @code{set-pre-init-property},
15577 @code{pre-init-property}, etc. At various points during the
15578 pre-initialization process, the value of many of these properties can be
15579 undecided, which means that at the end of the process, the value of
15580 these properties will be derived from other properties in some fashion
15581 that is specific to each property.
15582
15583 @item
15584
15585 The default values of these properties are set first from the registry
15586 under Windows, then from environment variables, then from command line
15587 switches, such as @code{-q} and @code{-nw}.
15588
15589 @item
15590
15591 One of the command line switches is @code{-pre-init}, whose value is a
15592 Lisp expression to be evaluated at pre-initialization time, similar to
15593 the @code{-eval} command line switch. This allows any
15594 pre-initialization property to be set from the command line.
15595
15596 @item
15597
15598 Let's define the term @dfn{to determine a pre-initialization property} to
15599 mean that, if the value of a property is undetermined, it is computed and set
15600 according to a rule that is specific to the property. Then after the
15601 pre-init properties are initialized from the registry, from the
15602 environment variables, and from command line arguments, two of the pre-init
15603 properties (specifically the init file directory and the location of the
15604 @dfn{pre-init file}) are determined. The purpose of the pre-init file is
15605 to contain Lisp code that is run at pre-initialization time, and to
15606 control how the initialization proceeds. It is a bit similar to the
15607 standard init file, but the code in the pre-init file shouldn't do
15608 anything other than set pre-init properties. Executing any code that
15609 does I/O might not produce expected results because the only device that
15610 will exist at the time is probably a stream device connected to the
15611 standard I/O of the XEmacs process.
15612
15613 @item
15614
15615 After the pre-init file has been run, all of the rest of the pre-init
15616 properties are determined, and these values are then used to control the
15617 initialization process. Some of the rules used in determining specific
15618 properties are:
15619
15620 @enumerate
15621 @item
15622
15623 If the @code{.xemacs} sub-directory exists, and it's not obviously a
15624 package root (which probably means that it contains a file like
15625 @code{init.el} or @code{pre-init.el}, or if neither of those files is
15626 present, then it doesn't contain any sub-directories or files that look
15627 like what would be in a package root), then it becomes the value of the
15628 init file directory. Otherwise the user's home directory is used.
15629 @item
15630
15631
15632 If the init file directory is the user's home directory, then the init
15633 file is called @code{.emacs}. Otherwise, it's called @code{init.el}.
15634 @item
15635
15636
15637 If the init file directory is the user's home directory, then the
15638 pre-init file is called @code{.xemacs-pre-init.el}. Otherwise it's
15639 called @code{pre-init.el}. (One of the reasons for this rule has to do
15640 with the dialog box that might be displayed at startup. This will be
15641 described below.)
15642 @item
15643
15644
15645 If the init file directory is the user's home directory, then the custom
15646 init file is called @code{.xemacs-custom-init.el}. Otherwise, it's
15647 called @code{custom-init.el}.
15648
15649 @end enumerate
15650
15651 @item
15652
15653 After the first normal device is created, but before any frames are
15654 created on it, the XEmacs initialization code checks to see if the old
15655 init file scheme is being used, which is to say that the init file
15656 directory is the same as the user's home directory. If that's the case,
15657 then normally a dialog box comes up (or a question is asked on the
15658 terminal if XEmacs is being run in a non-windowing mode) which asks if
15659 the user wants to migrate his initialization files to the new scheme.
15660 The possible responses are @strong{Yes}, @strong{No}, and @strong{No,
15661 and don't ask this again}. If this last response is chosen, then the
15662 file @code{.xemacs-pre-init.el} in the user's home directory is created
15663 or appended to with a line of Lisp code that sets up a pre-init property
15664 indicating that this dialog box shouldn't come up again. If the
15665 @strong{Yes} option is chosen, then any package root files in
15666 @code{.xemacs} are moved into @code{.xemacs/packages}, the file
15667 @code{.emacs} is moved into @code{.xemacs/init.el} and @code{.emacs} in
15668 the home directory becomes a symlink to this file. This way some
15669 compatibility is still maintained with GNU Emacs and older versions of
15670 XEmacs. The code that implements this has to be written very carefully
15671 to make sure that it doesn't accidentally delete or mess up any of the
15672 files that get moved around.
15673
15674 @end enumerate
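
The file-naming rules above can be sketched in Lisp roughly as follows.
This is only an illustration, not existing XEmacs code: the function
name @code{sketch-init-file-names} is hypothetical, and the sketch
assumes that the init file directory has already been determined.

@example
(defun sketch-init-file-names (init-directory home-directory)
  "Return a list (INIT PRE-INIT CUSTOM-INIT) of file names.
Hypothetical helper illustrating the proposed naming rules."
  (let ((old-scheme-p
         ;; The old scheme is in use when the init file directory
         ;; is the same as the user's home directory.
         (string= (file-name-as-directory init-directory)
                  (file-name-as-directory home-directory))))
    (list (expand-file-name
           (if old-scheme-p ".emacs" "init.el")
           init-directory)
          (expand-file-name
           (if old-scheme-p ".xemacs-pre-init.el" "pre-init.el")
           init-directory)
          (expand-file-name
           (if old-scheme-p ".xemacs-custom-init.el" "custom-init.el")
           init-directory))))
@end example

The same predicate, whether the init file directory equals the home
directory, is what would decide if the migration dialog box is shown.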
15675
15676 @subheading The custom init file
15677
15678 The @dfn{custom init file} is where the custom package writes its
15679 options. This obviously needs to be a separate file from the standard
15680 init file. It should also be loaded before the init file rather than
15681 after, as is usually done currently, so that the init file can override
15682 these options if it wants to.
15683
15684 @subheading Frame mapping
15685
15686 In addition to the above scheme, the way that XEmacs handles mapping the
15687 initial frame should be changed. However, this change perhaps should be
15688 delayed to a later version of XEmacs because of the user visible changes
15689 that it entails and the possible breakage in people's init files that
15690 might occur. (For example, if the rest of the scheme is implemented in
15691 21.2, then this part of the scheme might want to be delayed until
15692 version 22.) The basic idea is that the initial frame is not created
15693 before the initialization file is run, but instead a banner frame is
15694 created containing the XEmacs logo, a button that allows the user to
15695 cancel the execution of the init file and an area where messages that
15696 are output in the process of running this file are displayed. This area
15697 should contain a number of lines, which makes it better than the current
15698 scheme where only the last message is visible. After the init file is
15699 done, the initial frame is mapped. This way the init file can make face
15700 changes and other such modifications that affect the initial frame, and
15701 the initial frame then comes up correctly with these changes, without
15702 the frame dancing and other problems that exist currently.
15703
15704 There should be a function that allows the initialization file to
15705 explicitly create and map the first frame if it wants to. There should
15706 also be a pre-init property that controls whether the banner frame
15707 appears (defaulting to true, of course), a property controlling when the
15708 initial frame is created (before or after the init file, defaulting to
15709 after), and a property controlling whether the initial frame is mapped
15710 (normally true, but will be false if the @code{-unmapped} command line
15711 argument is given).
15712
15713 If an error occurs in the init file, then the initial frame should
15714 always be created and mapped at that time so that the error is displayed
15715 and the debugger has a place to be invoked.
15716
15717 @uref{../../www.666.com/ben/default.htm,Ben Wing}
15718
15719 @node Future Work -- Keyword Parameters, Future Work -- Property Interface Changes, Future Work -- Better Initialization File Scheme, Future Work
15720 @section Future Work -- Keyword Parameters
15721 @cindex future work, keyword parameters
15722 @cindex keyword parameters, future work
15723
15724 NOTE: These changes are partly motivated by the various user-interface
15725 changes elsewhere in this document, and partly for Mule support. In
15726 general the various APIs in this document would benefit greatly from
15727 built-in keywords.
15728
15729 I would like to make keyword parameters an integral part of Elisp. The
15730 idea here is that you use the @code{&key} identifier in the
15731 parameter list of a function and all of the following parameters
15732 specified are keyword parameters. This means that when these arguments
15733 are specified in a function call, they are immediately preceded in the
15734 argument list by a @dfn{keyword}, which is a symbol beginning with the
15735 `:' character. This allows any argument to be specified independently
15736 of any other argument with no need to place the arguments in any
15737 particular order. This is particularly useful for functions that take
15738 many optional parameters; using keyword parameters makes the code much
15739 cleaner and easier to understand.
15740
15741 The @code{cl} package already provides keyword parameters of a sort, but
15742 I would like to make this more integrated and useable in a standard
15743 fashion. The interface that I am proposing is essentially compatible
15744 with the keyword interface in Common Lisp, but it may be a subset of the
15745 Common Lisp functionality, especially in the first implementation.
15746 There is one departure from the Common Lisp specification that I would
15747 like to make in order to make it much easier to add keyword parameters
15748 to existing functions with optional parameters, and in general, to make
15749 optional and keyword parameters coexist more easily. The Common Lisp
15750 specification indicates that if a function has both optional and keyword
15751 parameters, the optional parameters are always processed before the
15752 keyword parameters. This means, for example, that if a function has
15753 three required parameters, two optional parameters, and some number of
15754 keyword parameters following, and the program attempts to call this
15755 function by passing in the three required arguments, and then some
15756 keyword arguments, the first keyword specified and the argument
15757 following it get assigned to the first and second optional parameters as
15758 specified in the function definition. This is certainly not what is
15759 intended, and means that if a function defines both optional and keyword
15760 parameters, any calls of this function must specify @code{nil} for all
15761 of the optional arguments before using any keywords. If the function
15762 definition is later changed to add more optional parameters, all
15763 existing calls to this function that use any keyword arguments will
15764 break. This problem goes away if we simply process keyword parameters
15765 before the optional parameters.
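
The problem can be demonstrated with the keyword support that the
@code{cl} package already provides.  The following is a minimal sketch,
assuming that the @code{defun*} macro from @code{cl} is available (GNU
Emacs calls the equivalent macro @code{cl-defun}); the function
@code{sketch-frob} and its parameters are made up for illustration.

@example
(require 'cl)

;; One required, two optional and one keyword parameter.
(defun* sketch-frob (a &optional b c &key size)
  (list a b c size))

;; Common Lisp order: the optional parameters swallow the keyword
;; and its value, so SIZE is never set.
(sketch-frob 1 :size 10)
     @result{} (1 :size 10 nil)

;; The caller has to pad the optional arguments with nil.
(sketch-frob 1 nil nil :size 10)
     @result{} (1 nil nil 10)
@end example

Under the ordering proposed here, the first call would instead bind
@code{size} to 10 and leave both optional parameters @code{nil}.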
15766
15767 The primary changes needed to support the keyword syntax are:
15768
15769 @enumerate
15770 @item
15771
15772 The subr object type needs to be modified to contain additional slots
15773 for the number and names of any keyword parameters.
15774 @item
15775
15776
15777 The implementation of the @code{funcall} function needs to be modified
15778 so that it knows how to process keyword parameters. This is the only
15779 place that will require very much intricate coding, and much of the
15780 logic that would need to be added can be lifted directly from the
15781 @code{cl} code.
15782 @item
15783
15784
15785 A new macro, similar to the @code{DEFUN} macro, and probably called
15786 @code{DEFUN_WITH_KEYWORDS}, needs to be defined so that built-in Lisp
15787 primitives containing keywords can be created. Now, the
15788 @code{DEFUN_WITH_KEYWORDS} macro should take an additional parameter
15789 which is a string, which consists of the part of the lambda list
15790 declaration for this primitive that begins with the @code{&key}
15791 specifier. This string is parsed in the @code{DEFSUBR} macro during
15792 XEmacs initialization, and is converted into the appropriate structure
15793 that needs to be stored into the subr object. In addition, the
15794 @var{max_args} parameter of the @code{DEFUN} macro needs to be
15795 incremented by the number of keyword parameters and these parameters are
15796 passed to the C function simply as extra parameters at the end. The
15797 @code{DEFSUBR} macro can sort out the actual number of required,
15798 optional and keyword parameters that the function takes, once it has
15799 parsed the keyword parameter string. (An alternative that might make
15800 the declaration of a primitive a little bit easier to understand would
15801 involve adding another parameter to the @code{DEFUN_WITH_KEYWORDS} macro
15802 that specifies the number of keyword parameters. However, this would
15803 require some additional complexity in the preprocessor definition of the
15804 @code{DEFUN_WITH_KEYWORDS} macro, and probably isn't worth
15805 implementing).
15806 @item
15807
15808
15809 The byte compiler would have to be modified slightly so that it knows
15810 about keyword parameters when it parses the parameter declaration of a
15811 function, so that, for example, it issues the correct warnings
15812 concerning calls to that function with incorrect arguments.
15813 @item
15814
15815
15816 The @code{make-docfile} program would have to be modified so that it
15817 generates the correct parameter lists for primitives defined using the
15818 @code{DEFUN_WITH_KEYWORDS} macro.
15819 @item
15820
15821
15822 Possibly other aspects of the help system that deal with function
15823 descriptions might have to be modified.
15824 @item
15825
15826
15827 A helper function might need to be defined to make it easier for
15828 primitives that use both the @code{&rest} and @code{&key}
15829 specifiers to parse their argument lists.
15830
15831 @end enumerate
15832
15833 @subheading Internal API for C primitives with keywords - necessary for many of the new Mule APIs being defined.
15834
15835 @example
15836 DEFUN_WITH_KEYWORDS (Ffoo, "foo", 2, 5, 6, ALLOW_OTHER_KEYWORDS,
15837 (ichi, ARG_NIL), (ni, ARG_NIL), (san, ARG_UNBOUND), 0,
15838 (arg1, arg2, arg3, arg4, arg5)
15839 )
15840 @{
15841 ...
15842 @}
15843
15844 -> C fun of 12 args:
15845
15846 (arg1, ... arg5, ichi, ..., roku, other keywords)
15847
15848 Circled in blue is actual example declaration
15849
15850 DEFUN_WITH_KEYWORDS (Ffoo, "foo", 1,2,0 (bar, baz) <- arg list
15851 [ MIN ARGS, MAX ARGS, something that could be REST, SPECIFY_DEFAULT or
15852 REST_SPEC]
15853
15854 [#KEYWORDS [ ALLOW_OTHER, SPECIFY_DEFAULT, ALLOW_OTHER_SPECIFY_DEFAULT
15855 6, ALLOW_OTHER_SPECIFY_DEFAULT,
15856
15857 (ichi, 0) (ni, 0), (san, DEFAULT_UNBOUND), (shi, "t"), (go, "5"),
15858 (roku, "(current-buffer)")
15859 <- specifies arguments, default values (string to be read into Lisp
15860 data during init; then forms evalled at fn ref time.
15861
15862 ,0 <- [INTERACTIVE SPEC] )
15863
15864 LO = Lisp_Object
15865
15866 -> LO Ffoo (LO bar, LO baz, LO ichi, LO ni, LO san, LO shi, LO go,
15867 LO roku, int numkeywords, LO *other_keywords)
15868
15869 #define DEFUN_WITH_KEYWORDS (fun, funstr, minargs, maxargs, argspec, \
15870 #args, num_keywords, keywordspec, keywords, intspec) \
15871 LO fun (DWK_ARGS (maxargs, args) \
15872 DWK_KEYWORDS (num_keywords, keywordspec, keywords))
15873
15874 #define DWK_KEYWORDS (num_keywords, keywordspec, keywords) \
15875 DWK_KEYWORDS ## keywordspec (keywords)
15876 DWK_OTHER_KEYWORDS ## keywordspec)
15877
15878 #define DWK_KEYWORDS_ALLOW_OTHER (x,y)
15879 DWK_KEYWORDS (x,y)
15880
15881 #define DWK_KEYWORDS_ALLOW_OTHER_SPECIFICATIONS (x,y)
15882 DWK_KEYWORDS_SPECIFY_DEFAULT (x,y)
15883
15884 #define DWK_KEYWORDS_SPECIFY_DEFAULT (numkey, key)
15885 ARGLIST_CAR ## numkey key
15886
15887 #define ARGLT_GRZ (x,y) LO CAR x, LO CAR y
15888 @end example
15889
15890 @node Future Work -- Property Interface Changes, Future Work -- Toolbars, Future Work -- Keyword Parameters, Future Work
15891 @section Future Work -- Property Interface Changes
15892 @cindex future work, property interface changes
15893 @cindex property interface changes, future work
15894
15895 In my past work on XEmacs, I already expanded the standard property
15896 functions of @code{get}, @code{put}, and @code{remprop} to work on
15897 objects other than symbols and defined an additional function
15898 @code{object-plist} for this interface. I'd like to expand this
15899 interface further and advertise it as the standard way to make property
15900 changes in objects, especially the new objects that are going to be
15901 defined in order to support the added user interface features of version
15902 22. My proposed changes are as follows:
15903
15904 @enumerate
15905 @item
15906
15907 A new concept associated with each property called a @dfn{default value}
15908 is introduced. (This concept already exists, but not in a well-defined
15909 way.) The default value is the value that the property assumes for
15910 certain value retrieval functions such as @code{get} when it is
15911 @dfn{unbound}, which is to say that its value has not been explicitly
15912 specified. Note: the way to make a property unbound is to call
15913 @code{remprop}. Note also that for some built-in properties, setting
15914 the property to its default value is equivalent to making it unbound.
15915 @item
15916
15917
15918 The behavior of the @code{get} function is modified. If the @code{get}
15919 function is called on a property that is unbound and the third, optional
15920 @var{default} argument is @code{nil}, then the default value of the
15921 property is returned. If the @var{default} argument is not @code{nil},
15922 then whatever was specified as the value of this argument is returned.
15923 For the most part, this is upwardly compatible with the existing
15924 definition of @code{get} because all user-defined properties have an
15925 initial default value of @code{nil}. Code that calls the @code{get}
15926 function and specifies @code{nil} for the @var{default} argument, and
15927 expects to get @code{nil} returned if the property is unbound, is almost
15928 certainly wrong anyway.
15929 @item
15930
15931
15932 A new function, @code{get1} is defined. This function does not take a
15933 default argument like the @code{get} function. Instead, if the property
15934 is unbound, an error is signaled. Note: @code{get} can be implemented
15935 in terms of @code{get1}.
15936 @item
15937
15938
15939 New functions @code{property-default-value} and @code{property-bound-p}
15940 are defined with the obvious semantics.
15941 @item
15942
15943
15944 An additional function @code{property-built-in-p} is defined which takes
15945 two arguments, the first one being a symbol naming an object type, and
15946 the second one specifying a property, and indicates whether the property
15947 name has a built-in meaning for objects of that type.
15948 @item
15949
15950
15951 It is not necessary, or even desirable, for all object types to allow
15952 user-defined properties. It is always possible to simulate user-defined
15953 properties for an object by using a weak hash table. Therefore, whether
15954 an object allows a user to define properties or not should depend on the
15955 meaning of the object. If an object does not allow user-defined
15956 properties, the @code{put} function should signal an error, such as
15957 @code{undefined-property}, when given any property other than those that
15958 are predefined.
15959 @item
15960
15961
15962 A function called @code{user-defined-properties-allowed-p} should be
15963 defined with the obvious semantics. (See the previous item.)
15964 @item
15965
15966
15967 Three more functions should be defined, called
15968 @code{built-in-property-name-list}, @code{property-name-list}, and
15969 @code{user-defined-property-name-list}.
15970
15971 @end enumerate
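
The intended lookup semantics can be sketched as follows.  This is a
model only, under loudly stated assumptions: it operates on plain
property lists rather than on real objects, the default values live in
a hypothetical table @code{sketch-property-defaults}, and all of the
@code{sketch-} names are invented for illustration.

@example
(defvar sketch-property-defaults '((width . 80))
  "Hypothetical table mapping property names to default values.")

(defun sketch-property-bound-p (plist property)
  "Return t if PROPERTY has an explicitly specified value in PLIST."
  (let ((found nil))
    (while (and plist (not found))
      (if (eq (car plist) property)
          (setq found t)
        (setq plist (cdr (cdr plist)))))
    found))

(defun sketch-property-default-value (property)
  "Return the default value of PROPERTY, or nil if it has none."
  (cdr (assq property sketch-property-defaults)))

(defun sketch-get1 (plist property)
  "Return the value of PROPERTY in PLIST; signal an error if unbound."
  (if (sketch-property-bound-p plist property)
      (plist-get plist property)
    (error "Property %s is unbound" property)))

(defun sketch-get (plist property &optional default)
  "Like `sketch-get1', but fall back to DEFAULT or the default value."
  (cond ((sketch-property-bound-p plist property)
         (sketch-get1 plist property))
        (default default)
        (t (sketch-property-default-value property))))
@end example

Note that a property explicitly set to @code{nil} counts as bound in
this sketch, so @code{sketch-get} returns @code{nil} for it rather than
the default value.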
15972
15973 Another idea:
15974
15975 @example
15976 (define-property-method
15977 predicate object-type
15978 predicate cons :(KEYWORD) (all lists beginning with KEYWORD)
15979
15980 :put putfun
15981 :get
15982 :remprop
15983 :object-props
15984 :clear-properties
15985 :map-properties
15986
15987 e.g. (define-property-method 'hash-table
15988 :put #'(lambda (obj key value) (puthash key value obj)))
15989 @end example
15990
15991
15992 @node Future Work -- Toolbars, Future Work -- Menu API Changes, Future Work -- Property Interface Changes, Future Work
15993 @section Future Work -- Toolbars
15994 @cindex future work, toolbars
15995 @cindex toolbars
15996
15997 @menu
15998 * Future Work -- Easier Toolbar Customization::
15999 * Future Work -- Toolbar Interface Changes::
16000 @end menu
16001
16002 @node Future Work -- Easier Toolbar Customization, Future Work -- Toolbar Interface Changes, Future Work -- Toolbars, Future Work -- Toolbars
16003 @subsection Future Work -- Easier Toolbar Customization
16004 @cindex future work, easier toolbar customization
16005 @cindex easier toolbar customization, future work
16006
16007 @strong{Abstract:} One of XEmacs' greatest strengths is its ability to
16008 be customized endlessly. Unfortunately, it is often too difficult to
16009 figure out how to do this. There has been some recent work like the
16010 Custom package, which helps in this regard, but I think there's a lot
16011 more work that needs to be done. Here are some ideas (which certainly
16012 could use some more thought).
16013
16014 Although there is currently an @code{edit-toolbar} package, it is not
16015 well integrated with XEmacs, and in general it is much too hard to
16016 customize the way toolbars look. I would like to see an interface that
16017 works a bit like the way things work under Windows, where you can
16018 right-click on a toolbar to get a menu of options that allows you to
16019 change aspects of the toolbar. The general idea is that if you
16020 right-click on an item itself, you can do things to that item, whereas
16021 if you right-click on a blank part of a toolbar, you can change the
16022 properties of the toolbar. Some of the items on the right-click menu
16023 for a particular toolbar button should be specified by the button
16024 itself. Others should be standard. For example, there should be an
16025 @strong{Execute} item which simply does what would happen if you
16026 left-click on a toolbar button. There should probably be a
16027 @strong{Delete} item to get rid of the toolbar button and a
16028 @strong{Properties} item, which brings up a property sheet that allows
16029 you to do things like change the icon and the command string that's
16030 associated with the toolbar button.
16031
16032 The options to change the appearance of the toolbar itself should
16033 probably appear both on the context menu for specific buttons, and on
16034 the menu that appears when you click on a blank part of the toolbar.
16035 That way, if there isn't a blank part of the toolbar, you can still
16036 change the toolbar appearance. As for what appears in these items, in
16037 Outlook Express, for example, there are three different menu items. The
16038 first is called @strong{Buttons} and pops up a window that allows you
16039 to edit the toolbar; for us, this could pop up a new frame running
16040 @code{edit-toolbar.el}. The second item is called @strong{Align} and
16041 contains a submenu with the entries @strong{Top}, @strong{Bottom},
16042 @strong{Left}, and @strong{Right}; choosing one would be just like
16043 setting the default toolbar position. The third one is called
16044 @strong{Text Labels}, which would just let you select whether there are
16045 captions or not. I think all three of these are useful and are easy to
16046 implement in XEmacs. These things also need to be integrated with
16047 custom so that a user can control whether these options apply to all
16048 sessions, and in such a case can save the settings out to an options
16049 file. @code{edit-toolbar.el} in particular needs to integrate with
16050 custom. Currently it has some sort of hokey stuff of its own, which it
16051 saves out to a @code{.toolbar} file. Another useful option to have,
16052 once we draw the captions dynamically rather than using pre-generated
16053 ones, would be the ability to change the font size of the captions. I'm
16054 sure that Kyle, for one, would appreciate this.
16055
16056 (This is incomplete.....)
16057
16058 @uref{../../www.666.com/ben/default.htm,Ben Wing}
16059
16060 @node Future Work -- Toolbar Interface Changes, , Future Work -- Easier Toolbar Customization, Future Work -- Toolbars
16061 @subsection Future Work -- Toolbar Interface Changes
16062 @cindex future work, toolbar interface changes
16063 @cindex toolbar interface changes, future work
16064
16065 I propose changing the way that toolbars are specified to make them more
16066 flexible.
16067
16068 @enumerate
16069 @item
16070
16071 A new format for the vector that specifies a toolbar item is allowed.
16072 In this format, the first three items of the vector are required and
16073 are, respectively, a caption, a glyph list, and a callback. The glyph
16074 list and callback arguments are the same as in the current toolbar item
16075 specification, and the caption is a string specifying the caption text
16076 placed below the toolbar glyph. The caption text is required so that
16077 toolbar items can be identified for the purpose of retrieving and
16078 changing their property values. Putting the caption first also makes it
16079 easy to distinguish between the new and the old toolbar item vector
16080 formats. In the old format, the first item, the glyph list, is either a
16081 list or a symbol. In the new format, the first item is a string. In
16082 the new format, following the three required items, are optional keyword
16083 items specified using keywords in the same format as the menu item
16084 vector format. The keywords that should be predefined are:
16085 @code{:help-echo}, @code{:context-menu}, @code{:drop-handlers}, and
16086 @code{:enabled-p}. The @code{:enabled-p} and @code{:help-echo} keyword
16087 arguments are the same as the third and fourth items in the old toolbar
16088 item vector format. The @code{:context-menu} keyword is a list in
16089 standard menu format that specifies additional items that will appear
16090 when the context menu for the toolbar item is popped up. (Typically,
16091 this happens when the right mouse button is clicked on the toolbar
16092 item). The @code{:drop-handlers} keyword is for use by the new
16093 drag-n-drop interface (see @uref{drag-n-drop.html,Drag-n-Drop Interface
16094 Changes}), and is not normally specified or modified directly.
16095 @item
16096
16097
16098 Conceivably, there could also be keywords that are associated with a
16099 toolbar itself, rather than with a particular toolbar item. These
16100 keyword properties would be specified using keywords and arguments that
16101 occur before any toolbar item vectors, similarly to how things are done
16102 in menu specifications. Possible properties could include
16103 @code{:captioned-p} (whether the captions are visible under the
16104 toolbar), @code{:glyphs-visible-p} (whether the toolbar glyphs are
16105 visible), and @code{:context-menu} (additional items that will appear on
16106 the context menus for all toolbar items and additionally will appear on
16107 the context menu that is popped up when the right mouse button is
16108 clicked over a portion of the toolbar that does not have any toolbar
16109 buttons in it). The current standard practice with regard to such
16110 properties seems to be to have separate specifiers, such as
16111 @code{left-toolbar-width}, @code{right-toolbar-width},
16112 @code{left-toolbar-visible-p}, @code{right-toolbar-visible-p}, etc. It
16113 could easily be argued that there should be no such toolbar specifiers
16114 and that all such properties should be part of the toolbar instantiator
16115 itself. In this scheme, the only separate specifiers that would exist
16116 for individual properties would be default values. There are a lot of
16117 reasons why an interface change like this makes sense. For example,
16118 currently when VM sets its toolbar, it also sets the toolbar width and
16119 similar properties. If you change which edge of the frame the VM
16120 toolbar occurs in, VM will also have to go and modify all of the
16121 position-specific toolbar specifiers for all of the other properties
16122 associated with a toolbar. It doesn't really seem to make sense to me
16123 for the user to be specifying the width and visibility and such of
16124 specific toolbars that are attached to specific edges because the user
16125 should be free to move the toolbars around and expect that all of the
16126 toolbar properties automatically move with the toolbar. (It is also easy
16127 to imagine, for example, that a toolbar might not be attached to the
16128 edge of the frame at all, but might be floating somewhere on the user's
16129 screen). With an interface where these properties are separate
16130 specifiers, this has to be done manually. Currently, having the various
16131 toolbar properties be inside of toolbar instantiators makes them
16132 difficult to modify, but this will be different with the API that I
16133 propose below.
16134 @item
16135
16136
16137 I propose an API for modifying toolbar and toolbar item properties, as
16138 well as making other changes to toolbar instantiators, such as inserting
16139 or deleting toolbar items. This API is based around the concept of a
16140 path. There are two kinds of paths here -- @dfn{toolbar paths} and
16141 @dfn{toolbar item paths}. Each kind of path is an object (of type
16142 @code{toolbar-path} and @code{toolbar-item-path}, respectively) whose
16143 properties specify the location in a toolbar instantiator where changes
16144 to the instantiator can be made. A toolbar path, for example, would be
16145 created using the @code{make-toolbar-path} function, which takes a
16146 toolbar specifier (or optionally, a symbol, such as @code{left},
16147 @code{right}, @code{default}, or @code{nil}, which refers to a
16148 particular toolbar), and optionally, parameters such as the locale and
16149 the tag set, which specify which actual instantiator inside of the
16150 toolbar specifier is to be modified. A toolbar item path is created
16151 similarly using a function called @code{make-toolbar-item-path}, which
16152 takes a toolbar specifier and a string naming the caption of the toolbar
16153 item to be modified, as well as, of course, optionally the locale and
16154 tag set parameters and such.
16155
16156 The usefulness of these path objects is as arguments to functions that
16157 will use them as pointers to the place in a toolbar instantiator where
16158 the modification should be made. Recall, for example, the generalized
16159 property interface described above. If a function such as @code{get} or
16160 @code{put} is called on a toolbar path or toolbar item path, it will use
16161 the information contained in the path object to retrieve or modify a
16162 property located at the end of the path. The toolbar path objects can
16163 also be passed to new functions that I propose defining, such as
16164 @code{add-toolbar-item}, @code{delete-toolbar-item}, and
16165 @code{find-toolbar-item}. These functions should be parallel to the
16166 functions for inserting, deleting, finding, etc. items in a menu. The
16167 toolbar item path objects can also be passed to the drop-handler
16168 functions defined in @uref{drag-n-drop.html,Drag-n-Drop Interface
16169 Changes} to retrieve or modify the drop handlers that are associated
16170 with a toolbar item. (The idea here is that you can drag an object and
16171 drop it onto a toolbar item, just as you could onto a buffer, an extent,
16172 a menu item, or any other graphical element).
16173 @item
16174
16175
16176 We should at least think about allowing for separate default and
16177 buffer-local toolbars. The user should either be able to position these
16178 toolbars one above the other, or side by side, occupying a single
16179 toolbar line. In the latter case, the boundary between the toolbars
16180 should be draggable, and if a toolbar takes up more room than is
16181 allocated for it, there should be arrows that appear on one or both
16182 sides of the toolbar so that the items in the toolbar can be scrolled
16183 left or right. (For that matter, this sort of interface should exist
16184 even when there is only one toolbar that is on a particular toolbar
16185 line, because the toolbar may very well have more items than can be
16186 displayed at once, and it's silly in such a case if it's impossible to
16187 access the items that are not currently visible).
16188 @item
16189
16190
16191 The default context menu for toolbars (which should be specified using a
16192 specifier called @code{default-toolbar-context-menu} according to the
16193 rules defined above) should contain entries allowing the user to modify
16194 the appearance of a toolbar. Entries would include, for example,
16195 whether the toolbar is captioned, whether the glyphs for the toolbar are
16196 visible (if the toolbar is captioned but its glyphs are not visible, the
16197 toolbar appears as nothing but text; you can set things up this way, for
16198 example, in Netscape), an option that brings up a package for editing
16199 the contents of a toolbar, an option to allow the caption face to be
16200 changed (perhaps through an @code{edit-faces} or @code{custom}
16201 interface), etc.
16202
16203 @end enumerate
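
The distinction between the old and the proposed toolbar item formats,
and the lookup of a keyword item, might be sketched like this.  Both
functions are hypothetical helpers written for this illustration; only
the vector layouts come from the description above.

@example
(defun sketch-new-format-toolbar-item-p (item)
  "Return t if ITEM is a toolbar item vector in the proposed format."
  (and (vectorp item)
       (> (length item) 0)
       (stringp (aref item 0))))

(defun sketch-toolbar-item-keyword (item keyword)
  "Return the value following KEYWORD in new-format ITEM, or nil."
  (let ((i 3)      ; skip caption, glyph list and callback
        (value nil))
    (while (< (1+ i) (length item))
      (if (eq (aref item i) keyword)
          (setq value (aref item (1+ i))
                i (length item))
        (setq i (+ i 2))))
    value))

;; Old format: glyph list, callback, enabled-p, help-echo.
(sketch-new-format-toolbar-item-p
 [toolbar-file-icon toolbar-open t "Open a file"])
     @result{} nil

;; Proposed format: caption, glyph list, callback, then keywords.
(sketch-toolbar-item-keyword
 ["Open" toolbar-file-icon toolbar-open :help-echo "Open a file"]
 :help-echo)
     @result{} "Open a file"
@end example

Because the first element alone identifies the format, code that walks
a toolbar instantiator could accept both kinds of item vectors during a
transition period.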
16204
16205 @node Future Work -- Menu API Changes, Future Work -- Removal of Misc-User Event Type, Future Work -- Toolbars, Future Work
16206 @section Future Work -- Menu API Changes
16207 @cindex future work, menu API changes
16208 @cindex menu API changes, future work
16209
16210
16211 @enumerate
16212 @item
16213
16214 I propose making a specifier for the menubar associated with the frame.
16215 The specifier should be called @code{default-menubar} and should replace
16216 the existing @code{current-menubar} variable. This would increase the
16217 power of the menubar interface and bring it in line with the toolbar
16218 interface. (In order to provide proper backward compatibility, we might
16219 have to @uref{symbol-value-handlers.html,complete the symbol value
16220 handler mechanism}.)
16221 @item
16222
16223
16224 I propose an API for modifying menu instantiators similar to the API
16225 proposed above for toolbar instantiators. A new object called a
16226 @dfn{menu path} (of type @code{menu-path}) can be created using the
16227 @code{make-menu-path} function, and specifies a location in a particular
16228 menu instantiator where changes can be made. The first argument to
16229 @code{make-menu-path} specifies which menu to modify and can be a
16230 specifier, a value such as @code{nil} (which means to modify the default
16231 menubar associated with the selected frame), or perhaps some other kind
16232 of specification referring to some other menu, such as the context menus
16233 invoked by the right mouse button. The second argument to
16234 @code{make-menu-path}, also required, is a list of zero or more strings
16235 that specifies the particular menu or menu item in the instantiator that
16236 is being referred to. The remaining arguments are optional and would be
16237 a locale, a tag set, etc. The menu path object can be passed to
16238 @code{get}, @code{put} or other standard property functions to access or
16239 modify particular properties of a menu or a menu item. It can also be
16240 passed to expanded versions of the existing functions such as
16241 @code{find-menu-item}, @code{delete-menu-item}, @code{add-menu-button},
16242 etc. (It is really a shame that @code{add-menu-item} is an obsolete
16243 function because it is a much better name than @code{add-menu-button}).
16244 Finally, the menu path object can be passed to the drop-handler
16245 functions described in @uref{drag-n-drop.html,Drag-n-Drop Interface
16246 Changes} to access or modify the drop handlers that are associated with
16247 a particular menu item.
16248 @item
16249
16250
16251 New keyword properties should be added to the menu item vector. These
16252 include @code{:help-echo}, @code{:context-menu} and
16253 @code{:drop-handlers}, with similar semantics to the corresponding
16254 keywords for toolbar items. (It may seem a bit strange at first to have
16255 a context menu associated with a particular menu item, but it is a user
16256 interface concept that exists both in Open Look and in Windows, and
16257 really makes a lot of sense if you give it a bit of thought). These
16258 properties may not actually be implemented at first, but at least the
16259 keywords for them should be defined.
16260
16261 @end enumerate
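
As a rough illustration of what resolving a menu path involves, here is a minimal C sketch. It assumes a simplified tree (@code{struct menu_node}) in place of a real menu instantiator. The structure and the function @code{menu_path_resolve} are hypothetical names invented for this example and are not part of XEmacs. An empty list of strings refers to the menu itself, matching the "zero or more strings" wording above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a menu instantiator: a named node with an
   array of child menus or menu items.  Not an XEmacs type. */
struct menu_node
{
  const char *name;
  struct menu_node *children;
  size_t n_children;
};

/* Resolve a menu path (a list of zero or more strings) against ROOT.
   An empty path refers to ROOT itself; a component that matches no
   child yields NULL. */
struct menu_node *
menu_path_resolve (struct menu_node *root, const char *const *path,
                   size_t path_len)
{
  struct menu_node *node = root;
  size_t i, j;

  assert (root != NULL);
  for (i = 0; i < path_len; i++)
    {
      struct menu_node *next = NULL;

      for (j = 0; j < node->n_children; j++)
        if (strcmp (node->children[j].name, path[i]) == 0)
          {
            next = &node->children[j];
            break;
          }
      if (next == NULL)
        return NULL;            /* no such menu or menu item */
      node = next;
    }
  return node;
}
```

A real implementation would presumably also carry the locale and tag set mentioned above, and would operate on Lisp objects rather than C structures.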
16262
16263 @uref{../../www.666.com/ben/default.htm,Ben Wing}
16264
16265 @node Future Work -- Removal of Misc-User Event Type, Future Work -- Mouse Pointer, Future Work -- Menu API Changes, Future Work
16266 @section Future Work -- Removal of Misc-User Event Type
16267 @cindex future work, removal of misc-user event type
16268 @cindex removal of misc-user event type, future work
16269
16270 @strong{Abstract:} This page describes why the misc-user event type
16271 should be split up into a number of different event types, and how to do
16272 this.
16273
16274 The misc-user event should not exist as a single event type. It should
16275 be split up into a number of different event types: one for scrollbar
16276 events, one for menu events, and one or two for drag-n-drop events.
16277 Possibly there will be other event types created in the future. The
16278 reason for this is that the misc-user event was a bad design choice when
16279 I made it, and it has only gotten worse with Oliver's attempts to add
16280 features to it to make it be used for drag-n-drop. I know that there
16281 was originally a separate drag-n-drop event type, and it was folded into
16282 the misc-user event type on my recommendation, but I have now realized
16283 the error of my ways. I had originally created a single event type in
16284 an attempt to prevent some Lisp programs from breaking because they
16285 might have a case statement over various event types, and would not be
16286 able to handle new event types appearing. I think now that these
16287 programs simply need to be written in a way to handle new event types
16288 appearing. It's not very hard to do this. You just use predicates
16289 instead of doing a case statement over the event type. If we preserve
16290 the existing predicate called @code{misc-user-event-p}, and just make
16291 sure that it evaluates to true when given any user event type other than
16292 the standard simple ones, then most existing code will not break either
16293 when we split the event types up like this, or if we add any new event
16294 types in the future.
16295
16296 More specifically, the only clean way to design the misc-user event type
16297 would be to add a sub-type field to it, and then have the nature of all
16298 the other fields in the event type be dependent on this sub-type. But
16299 then in essence, we'd just be reimplementing the whole event-type scheme
16300 inside of misc-user events, which would be rather pointless.
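
The predicate-based approach can be sketched in C as follows. This is only an illustration under assumed names: the enumeration values and @code{describe_event} are hypothetical and do not match the real XEmacs event types, and the "standard simple" set is reduced to a few input events. The point is that the compatibility predicate is defined by exclusion, so it keeps returning true for event types that are added later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event types; the names are illustrative only. */
enum event_type
{
  /* The "standard simple" user event types. */
  EVENT_KEY_PRESS,
  EVENT_BUTTON_PRESS,
  EVENT_BUTTON_RELEASE,
  EVENT_MOTION,
  EVENT_LAST_SIMPLE = EVENT_MOTION,
  /* Types that would replace the single misc-user type. */
  EVENT_SCROLLBAR,
  EVENT_MENU,
  EVENT_DRAG,
  EVENT_DROP,
  EVENT_TYPE_COUNT
};

/* Compatibility predicate: true for every user event type other than
   the standard simple ones.  Because it is defined by exclusion, any
   type added after EVENT_LAST_SIMPLE is covered automatically. */
bool
misc_user_event_p (enum event_type type)
{
  assert ((int) type >= 0 && (int) type < EVENT_TYPE_COUNT);
  return type > EVENT_LAST_SIMPLE;
}

/* Predicate-style dispatch: an unfamiliar type falls into the generic
   branch instead of hitting a missing case label. */
const char *
describe_event (enum event_type type)
{
  if (type == EVENT_KEY_PRESS)
    return "key";
  if (misc_user_event_p (type))
    return "misc-user";
  return "mouse";
}
```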
16301
16302 @node Future Work -- Mouse Pointer, Future Work -- Extents, Future Work -- Removal of Misc-User Event Type, Future Work
16303 @section Future Work -- Mouse Pointer
16304 @cindex future work, mouse pointer
16305 @cindex mouse pointer, future work
16306
16307 @menu
16308 * Future Work -- Abstracted Mouse Pointer Interface::
16309 * Future Work -- Busy Pointer::
16310 @end menu
16311
16312 @node Future Work -- Abstracted Mouse Pointer Interface, Future Work -- Busy Pointer, Future Work -- Mouse Pointer, Future Work -- Mouse Pointer
16313 @subsection Future Work -- Abstracted Mouse Pointer Interface
16314 @cindex future work, abstracted mouse pointer interface
16315 @cindex abstracted mouse pointer interface, future work
16316
16317 @strong{Abstract:} We need to create a new image format that allows
16318 standard pointer shapes to be specified in a way that works on all
16319 window systems. I suggest that this be called @code{pointer}, which
16320 has one tag associated with it, named @code{:data}, and whose value is a
16321 string. The possible strings that can be specified here are predefined
16322 by XEmacs, and are guaranteed to work across all window systems. This
16323 means that we may need to provide our own definition for pointer shapes
16324 that are not standard on some systems. In particular, there are a lot
16325 more standard pointer shapes under X than under Windows, and most of
16326 these pointer shapes are fairly useful. There are also a few pointer
16327 shapes (I think the hand, for example) on Windows, but not on X.
16328 Converting the X pointer shapes to Windows should be easy because the
16329 definitions of the pointer shapes are simply XBM files, which we can
16330 read under Windows. Going the other way might be a little bit more
16331 difficult, but it should still not be that hard.
16332
16333 While we're at it, we should change the image format currently called
16334 @code{cursor-font} to @code{x-cursor-font}, because it only works under
16335 X Windows. We also need to change the format called @code{resource} to
16336 be @code{mswindows-resource}. At least in the case of
16337 @code{cursor-font}, the old value should be maintained for compatibility
16338 as an obsolete alias. The @code{resource} format was added so recently
16339 that it's possible that we can just change it.
16340
16341 @uref{../../www.666.com/ben/default.htm,Ben Wing}
16342
16343 @node Future Work -- Busy Pointer, , Future Work -- Abstracted Mouse Pointer Interface, Future Work -- Mouse Pointer
16344 @subsection Future Work -- Busy Pointer
16345 @cindex future work, busy pointer
16346 @cindex busy pointer, future work
16347
16348 Automatically make the mouse pointer switch to a busy shape (watch
16349 signal) when XEmacs has been "busy" for more than, e.g. 2 seconds.
16350 Define the @dfn{busy time} as the time since the last time that XEmacs was
16351 ready to receive input from the user. An implementation might be:
16352
16353 @enumerate
16354 @item
16355 Set up an asynchronous timeout, to signal after the busy time; these
16356 are triggered through a call to QUIT so they will be triggered even
16357 when the code is busy doing something.
16358 @item
16359 We already have an "emacs_is_blocking" flag when we are waiting for
16360 input. In the same place, when we are about to block and wait for
16361 input (regardless of whether input is already present), maybe call a
16362 hook, which in this case would remove the timer and put back the
16363 normal mouse shape. Then when we exit the blocking stage (we got
16364 some input), call another hook, which in this case will start the
16365 timer. Note that we don't want these "blocking" hooks to be triggered
16366 just because of an accept-process-output or some similar thing that
16367 retrieves events, only to put them back onto a queue for later
16368 processing. Maybe we want some sort of flag that's bound by those
16369 routines saying that we aren't really waiting for input. Making
16370 that flag Lisp-accessible allows it to be set by similar sorts of
16371 Lisp routines (if there are any?) that loop retrieving events but
16372 defer them, or only drain the queue, or whatnot. #### Think about
16373 whether it would make some sense to try and be more clever in our
16374 determinations of what counts as "real waiting for user input", e.g.
16375 whether the event gets dispatched (unfortunately this occurs way too
16376 late; we want to know to remove the busy cursor @strong{before} getting an
16377 event), maybe whether there are any events waiting to be processed or
16378 we'll truly block, etc. (One possibility: if there is input on the
16379 queue already when we "block" for input, don't remove the
16380 busy-wait pointer, but trigger its removal when we dispatch a user
16381 event.)
16382 @end enumerate
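
The two steps above can be modelled as a small state machine. The following C sketch is only a simulation under stated assumptions: time is passed in explicitly as milliseconds rather than coming from a real asynchronous timeout, and @code{struct busy_state}, @code{about_to_block}, @code{done_blocking} and @code{check_busy_timeout} are hypothetical names, not existing XEmacs functions. The @code{not_really_waiting} field plays the role of the proposed flag bound by routines that only retrieve events to queue them again.

```c
#include <assert.h>
#include <stdbool.h>

#define BUSY_THRESHOLD_MS 2000  /* the "e.g. 2 seconds" from the text */

/* Hypothetical state for the busy-pointer logic. */
struct busy_state
{
  bool timer_armed;          /* is the asynchronous timeout pending? */
  long armed_at_ms;          /* when we last stopped waiting for input */
  bool busy_pointer_shown;
  bool not_really_waiting;   /* set by routines that merely drain events */
};

/* Hook run when we are about to block waiting for user input: cancel
   the timer and put back the normal pointer. */
void
about_to_block (struct busy_state *s)
{
  if (s->not_really_waiting)
    return;
  s->timer_armed = false;
  s->busy_pointer_shown = false;
}

/* Hook run when we leave the blocking stage (we got some input):
   start measuring busy time. */
void
done_blocking (struct busy_state *s, long now_ms)
{
  if (s->not_really_waiting)
    return;
  s->timer_armed = true;
  s->armed_at_ms = now_ms;
}

/* Stand-in for the timeout firing from a QUIT check: switch to the
   busy pointer once the threshold has elapsed. */
void
check_busy_timeout (struct busy_state *s, long now_ms)
{
  assert (!s->timer_armed || now_ms >= s->armed_at_ms);
  if (s->timer_armed && now_ms - s->armed_at_ms >= BUSY_THRESHOLD_MS)
    {
      s->busy_pointer_shown = true;
      s->timer_armed = false;
    }
}
```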
16383
16384 @node Future Work -- Extents, Future Work -- Version Number and Development Tree Organization, Future Work -- Mouse Pointer, Future Work
16385 @section Future Work -- Extents
16386 @cindex future work, extents
16387 @cindex extents, future work
16388
16389 @menu
16390 * Future Work -- Everything should obey duplicable extents::
16391 @end menu
16392
16393 @node Future Work -- Everything should obey duplicable extents, , Future Work -- Extents, Future Work -- Extents
16394 @subsection Future Work -- Everything should obey duplicable extents
16395 @cindex future work, everything should obey duplicable extents
16396 @cindex everything should obey duplicable extents, future work
16397
16398 A lot of functions don't properly track duplicable extents. For
16399 example, the @code{concat} function does, but the @code{format} function
16400 does not, and extents in keymap prompts are not displayed either. All
16401 of the functions that generate strings or string-like entities should
16402 track the extents that are associated with the strings. Currently this
16403 is difficult because there is no general mechanism implemented for doing
16404 this. I propose such a general mechanism, which would not be hard to
16405 implement, and would be easy to use in other functions that build up
16406 strings.
16407
16408 The basic idea is that we create a C structure that is analogous to a
16409 Lisp string in that it contains string data and lists of extents for
16410 that data. Unlike standard Lisp strings, however, this structure (let's
16411 call it @code{lisp_string_struct}) can be incrementally updated and its
16412 allocation is handled explicitly so that no garbage is generated. (This
16413 is important for example, in the event-handling code which would want to
16414 use this structure, but needs to not generate any garbage for efficiency
16415 reasons). Both the string data and the list of extents in this string
16416 are handled using dynarrs so that it is easy to incrementally update
16417 this structure. Functions should exist to create and destroy instances
16418 of @code{lisp_string_struct}, to generate a Lisp string from a
16419 @code{lisp_string_struct} and vice-versa, to append a sub-string of a
16420 Lisp string to a @code{lisp_string_struct}, to just append characters to
16421 a @code{lisp_string_struct}, etc. The only thing possibly tricky about
16422 implementing these functions is implementing the copying of extents from
16423 a Lisp string into a @code{lisp_string_struct}. However, there is
16424 already a function @code{copy_string_extents()} that does basically this
16425 exact thing, and it should be easy to create a modified version of this
16426 function.
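
A minimal sketch of such a structure in C might look like the following. It makes several assumptions: plain @code{realloc}-grown arrays stand in for dynarrs, an extent is reduced to a start offset, an end offset and an integer id (@code{struct extent_rec}), and the @code{lss_} function names are invented for this example. The interesting part is @code{lss_append_substring}, which clips each source extent to the copied range and shifts it by the current length of the destination.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified extent: a half-open range [start, end) plus an id. */
struct extent_rec
{
  size_t start, end;
  int id;
};

struct lisp_string_struct
{
  char *data;                   /* NUL-terminated string data */
  size_t len, cap;
  struct extent_rec *extents;   /* extents over DATA */
  size_t n_extents, extents_cap;
};

/* Grow BUF so it can hold at least NEED elements of size ELT. */
static void *
grow_buffer (void *buf, size_t *cap, size_t need, size_t elt)
{
  size_t new_cap = *cap ? *cap : 16;

  if (need <= *cap)
    return buf;
  while (new_cap < need)
    new_cap *= 2;
  buf = realloc (buf, new_cap * elt);
  assert (buf != NULL);   /* sketch only; real code would report this */
  *cap = new_cap;
  return buf;
}

void
lss_init (struct lisp_string_struct *s)
{
  memset (s, 0, sizeof *s);
}

void
lss_free (struct lisp_string_struct *s)
{
  free (s->data);
  free (s->extents);
  lss_init (s);
}

/* Append N characters from SRC, with no extents. */
void
lss_append_chars (struct lisp_string_struct *s, const char *src, size_t n)
{
  s->data = grow_buffer (s->data, &s->cap, s->len + n + 1, 1);
  memcpy (s->data + s->len, src, n);
  s->len += n;
  s->data[s->len] = '\0';
}

/* Append SRC[from, to) together with the overlapping parts of its
   extents, clipped to the range and shifted to the new position. */
void
lss_append_substring (struct lisp_string_struct *dst, const char *src,
                      const struct extent_rec *src_extents,
                      size_t n_src_extents, size_t from, size_t to)
{
  size_t base = dst->len;
  size_t i;

  assert (from <= to);
  for (i = 0; i < n_src_extents; i++)
    {
      size_t start = src_extents[i].start > from ? src_extents[i].start : from;
      size_t end = src_extents[i].end < to ? src_extents[i].end : to;

      if (start >= end)
        continue;               /* extent lies outside the copied range */
      dst->extents = grow_buffer (dst->extents, &dst->extents_cap,
                                  dst->n_extents + 1, sizeof *dst->extents);
      dst->extents[dst->n_extents++] = (struct extent_rec)
        { base + (start - from), base + (end - from), src_extents[i].id };
    }
  lss_append_chars (dst, src + from, to - from);
}
```

Since all storage is managed with explicit allocation and freeing, building up a string this way generates no Lisp garbage; only the final conversion to a Lisp string would allocate on the Lisp heap.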
16427
16428 @uref{../../www.666.com/ben/default.htm,Ben Wing}
16429
16430 @node Future Work -- Version Number and Development Tree Organization, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Extents, Future Work
16431 @section Future Work -- Version Number and Development Tree Organization
16432 @cindex future work, version number and development tree organization
16433 @cindex version number and development tree organization, future work
16434
16435 @strong{Abstract:} The purpose of this proposal is to present a coherent
16436 plan for how development branches in XEmacs are managed. This will
16437 cover such issues as stable versus experimental branches, creating new
16438 branches, synchronizing patches between branches, and how version
16439 numbers are assigned to branches.
16440
16441 A development branch is defined to be a linear series of releases of the
16442 XEmacs code base, each of which is derived from the previous one. When
16443 the XEmacs development tree is forked and two branches are created where
16444 there used to be one, the branch that is intended to be more stable and
16445 have fewer changes made to it is considered the one that inherits the
16446 parent branch, and the other branch is considered to have begun at the
16447 branching point. The less stable of the two branches will eventually be
16448 forked again, while this will not happen usually to the more stable of
16449 the two branches, and its development will eventually come to an end.
16450 This means that every branch has a definite ending point. For example,
16451 the 20.x branch began at the point when the released
16452 19.13 code tree was split into a 19.x and a 20.x branch, and the 20.x
16453 branch will end when the last 20.x release (probably numbered 20.5 or
16454 20.6) is released.
16455
16456 I think that there should always be three active development branches at
16457 any time. These branches can be designated the stable, the semi-stable,
16458 and the experimental branches. This situation has existed in the
16459 current code tree ever since the 21.0 development branch was split. In
16460 this situation, the stable branch is the 20.x series. The semi-stable
16461 branch is the 21.0 release and the stability releases that follow. The
16462 experimental branch is the branch that was created as the result of the
16463 21.0 development branch split. Typically, the stable branch has been
16464 released for a long period of time. The semi-stable branch has been
16465 released for a short period of time, or is about to be released, and the
16466 experimental branch has not yet been released, and will probably not be
16467 released for a while. The conditions that should hold in all
16468 circumstances are:
16469
16470 @enumerate
16471 @item
16472
16473 There should be three active branches.
16474 @item
16475
16476 The experimental branch should never be in feature freeze.
16477
16478 @end enumerate
16479
16480 The reason for the second condition is to ensure that active development
16481 can always proceed and is never throttled, as is happening currently at
16482 the end of the 21.0 release cycle. What this means is that as soon as
16483 the experimental branch is deemed to be stable enough to go into feature
16484 freeze:
16485
16486 @enumerate
16487 @item
16488
16489 The current stable branch is made inactive and all further development
16490 on it ceases.
16491 @item
16492
16493 The semi-stable branch, which by now should have been released for a
16494 fair amount of time, and should be fairly stable, gets renamed to the
16495 stable branch.
16496 @item
16497
16498 The experimental branch is forked into two branches, one of which
16499 becomes the semi-stable branch, and the other, the experimental branch.
16500
16501 @end enumerate
16502
16503 The stable branch is always in high resistance, which is to say that the
16504 only changes that can be made to the code are important bug fixes
16505 involving a small amount of code where it should be clear just by
16506 reading the code that no destabilizing code has been introduced. The
16507 semi-stable branch is in low resistance, which means that no major
16508 features can be added, but, except right before a release, fairly major
16509 code changes are allowed. Features can be added if they are
16510 sufficiently small, if they are deemed sufficiently critical due to
16511 severe problems that would exist if the features were not added (for
16512 example, replacement of the unexec mechanism with a portable solution
16513 would be a feature that could be added to the semi-stable branch
16514 provided that it did not involve an overly radical code re-architecture,
16515 because otherwise it might be impossible to build XEmacs on some
16516 architectures or with some compilers), or if the primary purpose of the
16517 new feature is to remedy an incompleteness in a recent architectural
16518 change that was not finished in a prior release due to lack of time (for
16519 example, abstracting the mouse pointer and list-of-colors interfaces,
16520 which were left out of 21.0). There is no feature resistance in place
16521 in the experimental branch, which allows full development to proceed at
16522 all times.
16523
16524 In general, both the stable and semi-stable branches will contain
16525 previous net releases. In addition, there will be beta releases in all
16526 three branches, and possibly development snapshots between the beta
16527 releases. It's obviously necessary to have a good version numbering
16528 scheme in order to keep everything straight.
16529
16530 First of all, it needs to be immediately clear from the version number
16531 whether the release is a beta release or a net release. Steve has
16532 proposed getting rid of the beta version numbering system, which I think
16533 would be a big mistake. Furthermore, the net release version number and
16534 beta release version number should be kept separate, just as they are
16535 now, to make it completely clear where any particular release stands.
16536 There may be alternate ways of phrasing a beta release other than
16537 something like 21.0 beta 34, but in all such systems, the beta number
16538 needs to be zero for any release version. Three possible alternative
16539 systems, none of which I like very much, are:
16540
16541 @enumerate
16542 @item
16543
16544 The beta number is simply an extra number in the regular version number.
16545 Then, for example, 21.0 beta 34 becomes 21.0.34. The problem is that
16546 the release version, which would simply be called 21.0, appears to be
16547 earlier than 21.0 beta 34.
16548 @item
16549
16550 The beta releases appear as later revisions of earlier releases. Then,
16551 for example, 21.1 beta 34 becomes 21.0.34, and 21.0 beta 34 would have
16552 to become 21.-1.34. This has both the obvious ugliness of negative
16553 version numbers and the problem that it makes beta releases appear to be
16554 associated with their previous releases, when in fact they are more
16555 closely associated with the following release.
16556 @item
16557
16558 Simply make the beta version number be negative. In this scheme, you'd
16559 start with something like -1000 as the first beta, and then 21.0 beta 34
16560 would get renumbered to 21.0.-967. Obviously, this is a crazy and
16561 convoluted scheme as well, and we would be best to avoid it.
16562
16563 @end enumerate
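
The ordering problem with the first alternative, and the rule that a beta number of zero marks the net release, can be made concrete with a short C sketch. The names @code{struct version}, @code{version_compare} and @code{dotted_compare} are hypothetical and exist only for this illustration. The first function keeps the beta number separate and sorts the net release after all of its betas; the second compares plain dotted components, treating missing ones as zero, which is how the first alternative would usually be read.

```c
#include <assert.h>
#include <stddef.h>

/* Separate beta number, as in the current scheme: beta == 0 means a
   net release, beta > 0 means "beta N" leading up to that release. */
struct version
{
  int major, minor, beta;
};

/* Return <0, 0 or >0 as A is older than, equal to or newer than B.
   A net release is newer than every beta of the same major.minor. */
int
version_compare (const struct version *a, const struct version *b)
{
  assert (a->beta >= 0 && b->beta >= 0);
  if (a->major != b->major)
    return a->major < b->major ? -1 : 1;
  if (a->minor != b->minor)
    return a->minor < b->minor ? -1 : 1;
  if (a->beta == b->beta)
    return 0;
  if (a->beta == 0)
    return 1;
  if (b->beta == 0)
    return -1;
  return a->beta < b->beta ? -1 : 1;
}

/* Plain component-wise comparison of dotted version numbers, with
   missing trailing components treated as zero. */
int
dotted_compare (const int *a, size_t na, const int *b, size_t nb)
{
  size_t n = na > nb ? na : nb;
  size_t i;

  for (i = 0; i < n; i++)
    {
      int x = i < na ? a[i] : 0;
      int y = i < nb ? b[i] : 0;

      if (x != y)
        return x < y ? -1 : 1;
    }
  return 0;
}
```

With the separate beta field, 21.0 beta 34 compares as older than 21.0. With dotted components, 21.0 compares as older than 21.0.34, which is the misleading ordering described in the first alternative.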
16564
16565 Currently, the between-beta snapshots are not numbered, but I think that
16566 they probably should be. If appropriate scripts are set up to automate
16567 beta releases, it should be very easy to have a version number
16568 automatically updated whenever a snapshot is made. The number could
16569 either be added as a separate snapshot number, giving 21.0 beta 34
16570 pre 1, which comes before 21.0 beta 34; or we could make the beta
16571 number be floating point, and then the same snapshot would have to be
16572 called 21.0 beta 33.1. The latter solution seems quite kludgey to me.
16573
16574 There also needs to be a clear way to distinguish, when a net release is
16575 made, which branch the release is a part of. This time, four solutions
16576 come to mind:
16577
16578 @enumerate
16579 @item
16580
16581 The major version number reflects which development branch the release
16582 is in and the minor version number indicates how many releases have been
16583 made along this branch. In this scheme, 21.0 is always the first
16584 release of the 21 series development branch, and when this branch is
16585 split, the child branch that becomes the experimental branch gets
16586 version numbers starting with 22. This scheme is the simplest, and it's
16587 the one I like best.
16588 @item
16589
16590 We move to a three-part version number. In this scheme, the first two
16591 numbers indicate the branch, and the third number indicates the release
16592 along the branch. In this scheme, we have numbers like 21.0.1, which
16593 would be the second release in the 21.0 series branch, and 21.1.2, which
16594 would be the third release in the
16595 21.1 series branch. The major version number then gets increased
16596 only very occasionally, and only when a sufficiently major architectural
16597 change has been made, particularly one that causes compatibility
16598 problems with code written for previous branches. I think schemes like
16599 this are unnecessary in most circumstances, because usually either the
16600 major version number ends up changing so often that the second number is
16601 always either zero or one, or the major version number never changes,
16602 and as such becomes useless. By the time the major version number would
16603 change, the product itself has changed so much that it often gets
16604 renamed. Furthermore, it is clear that the two version number scheme
16605 has been used throughout most of the history of Emacs, and recently we
16606 have been following the two number scheme also. If we introduced a
16607 third revision number, at this point it would both confuse existing code
16608 that assumed there were two numbers, and would look rather silly given
16609 that the major version number is so high and would probably remain at
16610 the same place for quite a long time.
16611 @item
16612
16613 A third scheme that would attempt to cross the two schemes would keep
16614 the same concept of major version number as for the three number scheme,
16615 and would compress the second and third numbers of the three number
16616 scheme into one number by using increments of ten. For example, the
16617 current 21.x branch would have releases No. 21.0, 21.1, etc. The next
16618 branch would be No. 21.10, 21.11, etc. I don't like this scheme very
16619 much because it seems rather kludgey, and also because it is not used in
16620 any other product as far as I know.
16621 @item
16622
16623 Another scheme that would combine the second and third numbers in the
16624 three number scheme would be to have the releases in the current 21.x
16625 series be numbered 21.0, then 21.01, then 21.02, etc. The next series
16626 is 21.1, then 21.11, then 21.12, etc. This is similar to the way that
16627 version numbers are done for DOS and Windows. I also think that this
16628 scheme is fairly silly because, like the previous scheme, its only
16629 purpose is to avoid increasing the major version number very much. But
16630 given that we already have a fairly large major version number,
16631 there doesn't seem to be any particular problem with increasing this
16632 number by one every year or two. Some people will object that by doing
16633 this, it becomes impossible to tell when a change is so major that it
16634 causes a lot of code breakage, but past releases have not been accurate
16635 indicators of this. For example,
16636 19.12 caused a lot of code breakage, but 20.0 caused less, and 21.0
16637 caused less still. In the GNU Emacs world, there were byte code changes
16638 made between 19.28 and 19.29, but as far as I know, not between 19.29
16639 and 20.0.
16640
16641 @end enumerate
16642
16643 With three active development branches, synchronizing code changes
16644 between the branches is obviously somewhat of a problem. To make things
16645 easier, I propose a few general guidelines:
16646
16647 @enumerate
16648 @item
16649
16650 Merging between different branches need not happen that often. It
16651 should not happen more often than necessary to avoid undue burden on the
16652 maintainer, but needs to be done at all defined checkpoints. These
16653 checkpoints need to be noted in all of the places that track changes
16654 along the branch, for example, in all of the change logs and in all of
16655 the CVS tags.
16656 @item
16657
16658 Every code change that can be considered a self-contained unit, no
16659 matter how large or small, needs to have a change log entry, preferably
16660 a single change log entry associated with it. This is an absolute
16661 requirement. There should be no code changes without an associated
16662 change log entry. Otherwise, it is highly likely that patches will not
16663 be correctly synchronized across all versions, and will get lost. There
16664 is no need for change log entries to contain unnecessary detail though,
16665 and it is important that there be no more change log entries than
16666 necessary, which means that two or more change log entries associated
16667 with a single patch need to be grouped together if possible. This might
16668 imply that there should be one global change log instead of change logs
16669 in each directory, or at the very least, the number of separate change
16670 logs should be kept to a minimum.
16671 @item
16672
16673 The patch that is associated with each change log entry needs to be kept
16674 around somewhere. The reason for this is that when synchronizing code
16675 from some branch to some earlier branch, it is necessary to go through
16676 each change log entry and decide whether a change is worthy to make it
16677 into a more stable branch. If so, the patch associated with this change
16678 needs to be individually applied to the earlier branch.
16679 @item
16680
16681 All changes made in more stable branches get merged into less stable
16682 branches unless the change really is completely unnecessary in the less
16683 stable branch because it is superseded by some other change. This will
16684 probably mean more developers making changes to the semi-stable branch
16685 than to the experimental branch. This means that developers should
16686 strive to do their development in the most stable branch that they
16687 expect their code to go into. An alternative to this which is perhaps
16688 more workable is simply to insist that all developers make all patches
16689 based off of the experimental branch, and then later merge these patches
16690 down to the more stable branches as necessary. This means, however,
16691 that submitted patches should never be combinations of two or more
16692 unrelated changes. Whenever such patches are submitted, they should
16693 either be rejected (which should apply to anybody who should know
16694 better, which probably means everybody on the beta list and anybody else
16695 who is a regular contributor), or the maintainer or some other
16696 designated party needs to filter the combined patch into separate
16697 patches, one per logical change.
16698 @item
16699
16700 The maintainer should keep all the patches around in some data base, and
16701 the patches should be given an identifier consisting of the author of
16702 the patch, the date the patch was submitted, and some other identifying
16703 characteristic, such as a number, in case there is more than one patch
16704 on the same date by the same author. The database should hopefully be
16705 correctly marked at all times with something indicating which branches
16706 the patch has been applied to, and this database should hopefully be
16707 publicly visible so that patch authors can determine whether their
16708 patches have been applied, and whether their patches have been received,
16709 so that patches do not get needlessly resubmitted.
16710 @item
16711
16712 Global automatable changes such as textual renaming, reordering, and
16713 additions or deletions of parameters in function calls should still be
16714 allowed, even with multiple development branches. (Sometimes these are
16715 necessary for code cleanliness, and in the long run, they save a lot of
16716 time, even though they may cause some headaches in the short-term.) In
16717 general, when such changes are made, they should occur in a separate
16718 beta version that contains only such changes and no other patches, and
16719 the changes should be made in both the semi-stable and experimental
16720 branches at the same time. The description of the beta version should
16721 make it very clear that the beta is comprised of such changes. The
16722 reason for doing these things is to make it easier for people to diff
16723 between beta versions in order to figure out the changes that were made
16724 without the diff getting cluttered up by these code cleanliness changes
16725 that don't change any actual behavior.
16726
16727 @end enumerate
16728
16729 @uref{../../www.666.com/ben,Ben Wing}
16730
16731 @node Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Keybindings, Future Work -- Version Number and Development Tree Organization, Future Work
16732 @section Future Work -- Improvements to the @code{xemacs.org} Website
16733 @cindex future work, improvements to the @code{xemacs.org} website
16734 @cindex improvements to the @code{xemacs.org} website, future work
16735
16736 The @code{xemacs.org} web site is the face that XEmacs presents to the
16737 outside world. In my opinion, its most important function is to present
16738 information about XEmacs in such a way that solicits new XEmacs users
16739 and co-contributors. Existing members of the XEmacs community can
16740 probably find out most of the information they want to know about XEmacs
16741 regardless of what shape the web site is in, or for that matter, perhaps
16742 even if the web site doesn't exist at all. However, potential new users
16743 and co-contributors who go to the XEmacs web site and find it out of
16744 date and/or lacking the information that they need are likely to be
16745 turned away and may never return. For this reason, I think it's
16746 extremely important that the web site be up-to-date, well-organized, and
16747 full of information that an inquisitive visitor is likely to want to
16748 know.
16749
16750 The current XEmacs web site needs a lot of work if it is to meet these
16751 standards. I don't think it's reasonable to expect one person to do all
16752 of this work and make continual updates as needed, especially given the
16753 dismal record that the XEmacs web site has had. The proper thing to do
16754 is to place the web site itself under CVS and allow many of the core
16755 members to remotely check files in and out. This way, for example,
16756 Steve could update the part of the site that contains the current
16757 release status of XEmacs. (Much of this could be done by a script that
16758 Steve executes when he sends out a beta release announcement which
16759 automatically HTML-izes the mail message and puts it in the appropriate
16760 place on the web site. There are programs that are specifically
16761 designed to convert email messages into HTML, for example
16762 @code{mhonarc}.) Meanwhile, the @code{xemacs.org} mailing list
16763 administrator (currently Jason Mastaler, I think) could maintain the
16764 part of the site that describes the various mailing lists and other
16765 addresses at @code{xemacs.org}. Someone like me (perhaps through a
16766 proxy typist) could maintain the part of the site that specifies the
16767 future directions that XEmacs is going in, etc., etc.
16768
16769 Here are some things that I think it's very important to add to the web
16770 site.
16771
16772 @enumerate
16773 @item
16774
16775 A page describing in detail how to get involved in the XEmacs
16776 development process, how to submit and where to submit various patches
16777 to the XEmacs core or associated packages, how to contact the
16778 maintainers and core developers of XEmacs and the maintainers of various
16779 packages, etc.
16780 @item
16781
16782 A page describing exactly how to download, compile, and install XEmacs,
16783 and how to download and install the various binary distributions. This
16784 page should particularly cover in detail how exactly the package system
16785 works from an installation standpoint and how to correctly compile and
16786 install under Microsoft Windows and Cygwin. This latter section should
16787 cover what compilers are needed under Microsoft Windows and Cygwin, and
16788 how to get and install the Cygwin components that are needed.
16789 @item
16790
16791 A page describing where to get the various ancillary libraries that can
16792 be linked with XEmacs, such as the JPEG, TIFF, PNG, X-Face, DBM, and
16793 other libraries. This page should also cover how to correctly compile
16794 and install these libraries, including under Microsoft Windows (or at
16795 least it should contain pointers to where this information can be
16796 found). Also, it should describe anything that needs to be specified as
16797 an option to @code{configure} in order for XEmacs to link with and make
16798 use of these libraries or of Motif or CDE. Finally, this page should
16799 list which versions of the various libraries are required for use with
16800 the various different beta versions of XEmacs. (Remember, this can
16801 change from beta to beta, and someone needs to keep a watchful eye on
16802 this).
16803 @item
16804
16805 Pointers to any other sites containing information on XEmacs. This
16806 would include, for example, Hrvoje's XEmacs on Windows FAQ and my
16807 Architecting XEmacs web site. (Presumably, most of the information in
16808 this section will be temporary. Eventually, these pages should be
16809 integrated into the main XEmacs web site).
16810 @item
16811
16812 A page listing the various sub-projects in the XEmacs development
16813 process and who is responsible for each of these sub-projects, for
16814 example development of the package system, administration of the mailing
16815 lists, maintenance of stable XEmacs versions, maintenance of the CVS web
16816 interface, etc. This page should also list all of the packages that are
16817 archived at @code{xemacs.org} and who is the maintainer or maintainers
16818 for each of these packages.
16819
16820 @end enumerate
16821
16822 @subheading Other Places with an XEmacs Presence
16823
16824 We should try to keep an XEmacs presence in all of the major places on
16825 the web that are devoted to free software or to the "open source"
16826 community. This includes, for example, the open source web site at
16827 @uref{../../opensource.oreilly.com/default.htm,http://opensource.oreilly.com}
16828 (I'm already in the process of contacting this site), the Freshmeat site
16829 at @uref{../../www.freshmeat.net/default.htm,http://www.freshmeat.net},
16830 the various announcement news groups (for example,
16831 @uref{news:comp.os.linux.announce,comp.os.linux.announce}, and the
16832 Windows announcement news group) etc.
16833
16834 @uref{../../www.666.com/ben/default.htm,Ben Wing}
16835
16836 @node Future Work -- Keybindings, Future Work -- Byte Code Snippets, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work
16837 @section Future Work -- Keybindings
16838 @cindex future work, keybindings
16839 @cindex keybindings, future work
16840
16841 @menu
16842 * Future Work -- Keybinding Schemes::
16843 * Future Work -- Better Support for Windows Style Key Bindings::
16844 * Future Work -- Misc Key Binding Ideas::
16845 @end menu
16846
16847 @node Future Work -- Keybinding Schemes, Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Keybindings, Future Work -- Keybindings
16848 @subsection Future Work -- Keybinding Schemes
16849 @cindex future work, keybinding schemes
16850 @cindex keybinding schemes, future work
16851
16852 @strong{Abstract:} We need a standard mechanism that allows different
16853 global key binding schemes to be defined. Ideally, this would be the
16854 @uref{keyboard-actions.html,keyboard action interface} that I have
16855 proposed; however, this would require a lot of work on the part of mode
16856 maintainers and other external Elisp packages and will not be ready in
16857 the short term. So I propose a very kludgy interface, along the lines
16858 of what is done in Viper currently. Perhaps we can rip that key munging
16859 code out of Viper and make a separate extension that implements a global
16860 key binding scheme munging feature. This way a key binding scheme could
16861 rearrange all the default keys and have all sorts of other code, which
16862 depends on the standard keys being in their default location, still
16863 work.
16864
16865 @node Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Misc Key Binding Ideas, Future Work -- Keybinding Schemes, Future Work -- Keybindings
16866 @subsection Future Work -- Better Support for Windows Style Key Bindings
16867 @cindex future work, better support for windows style key bindings
16868 @cindex better support for windows style key bindings, future work
16869
16870 @strong{Abstract:} This page describes how we could create an XEmacs
16871 extension that modifies the global key bindings so that a Windows user
16872 would feel at home when using the keyboard in XEmacs. Some of these
16873 bindings don't conflict with standard XEmacs keybindings and should be
16874 added by default, or at the very least under Windows, and probably under
16875 X Windows as well. Other key bindings would need to be implemented in a
16876 Windows compatibility extension which can be enabled and disabled on the
16877 fly, following the conventions outlined in
16878 @uref{enabling-extensions.html,Standard interface for enabling
16879 extensions}. Ideally, this should be implemented using the
16880 @uref{keyboard-actions.html,keyboard action interface}, but this will not
16881 be available in the short term, so we will have to resort to some awful
16882 kludges, following the model of Michael Kifer's Viper mode.
16883
16884 We really need to make XEmacs provide standard Windows key bindings as
16885 much as possible. Currently, for example, there are at least two
16886 packages that allow the user to make a selection using the shifted arrow
16887 keys, and neither package works all that well, or is maintained. There
16888 should be one well-written piece of code that does this, and it should
16889 be a standard part of XEmacs. In fact, it should be turned on by
16890 default under Windows, and probably under X as well. (As an aside here,
16891 one point of contention in how to implement this involves what happens
16892 if you select a region using the shifted arrow keys and then hit the
16893 regular arrow keys. Does the region remain selected or not? I think
16894 there should be a variable that controls which of these two behaviors
16895 you want. We can argue over what the default value of this variable
16896 should be. The standard Windows behavior here is to keep the region
16897 selected, but move the insertion point elsewhere, which is unfortunately
16898 impossible to implement in XEmacs.)
16899
16900 Some thought should be given to what to do about the standard Windows
16901 control and alt key bindings. Under NTEmacs, there is a variable that
16902 controls whether the alt key behaves like the Emacs meta key, or whether
16903 it is passed on to the menu as in standard Windows programs. We should
16904 surely implement this and put this option on the @strong{Options} menu.
16905 Making @kbd{Alt-f} for example, invoke the @strong{File} menu, is not
16906 all that disruptive in XEmacs, because the user can always type @kbd{ESC
16907 f} to get the meta key functionality. Making @kbd{Control-x}, for
16908 example, do @strong{Cut}, is much, much more problematic, of course, but
16909 we should consider how to implement this anyway. One possibility would
16910 be to move all of the current Emacs control key bindings onto
16911 control-shift plus a key, and to make the simple control keys follow the
16912 Windows standard as much as possible. This would mean, for example,
16913 that we would have the following keybindings:@* @kbd{Control-x} ==>
16914 @strong{Cut} @* @kbd{Control-c} ==> @strong{Copy} @* @kbd{Control-v} ==>
16915 @strong{Paste} @* @kbd{Control-z} ==> @strong{Undo}@* @kbd{Control-f}
16916 ==> @strong{Find} @* @kbd{Control-a} ==> @strong{Select All}@*
16917 @kbd{Control-s} ==> @strong{Save}@* @kbd{Control-p} ==> @strong{Print}@*
16918 @kbd{Control-y} ==> @strong{Redo}@* (this functionality @emph{is}
16919 available in XEmacs with Kyle Jones' @code{redo.el} package, but it
16920 should be better integrated)@* @kbd{Control-n} ==> @strong{New} @*
16921 @kbd{Control-o} ==> @strong{Open}@* @kbd{Control-w} ==> @strong{Close
16922 Window}@*
16923
16924 The changes described in the previous paragraph should be put into an
16925 extension named @code{windows-keys.el} (see
16926 @uref{enabling-extensions.html,Standard interface for enabling
16927 extensions}) so that it can be enabled and disabled on the fly using a
16928 menu item and can be selected as the default for a particular user in
16929 their custom options file. Once this is implemented, the Windows
16930 installer should also be modified so that it brings up a dialog box that
16931 allows the user to make a selection of which key binding scheme they
16932 would prefer as the default, either the XEmacs standard bindings, Vi
16933 bindings (which would be Viper mode), Windows-style bindings, Brief,
16934 CodeWright, Visual C++, or whatever we manage to implement.
16935
16936 @uref{../../www.666.com/ben/default.htm,Ben Wing}
16937
16938 @node Future Work -- Misc Key Binding Ideas, , Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Keybindings
16939 @subsection Future Work -- Misc Key Binding Ideas
16940 @cindex future work, misc key binding ideas
16941 @cindex misc key binding ideas, future work
16942
16943 @itemize
16944 @item
16945 M-123 ... do digit arg
16946
16947 @item
16948 However, M-( group commands together until M-)
16949
16950 @item
16951 Nested M-() are allowed.
16952
16953 @item
16954 Number repeating plus () repeats N times each group of commands as a
16955 unit.
16956
16957 @item
16958 M-() by itself forms an anonymous macro, and there should be a
16959 command to repeat, like VI (execute macro), but when no () before,
16960 it repeats the last command of same amount of complication - or more
16961 like, somewhere there is a repeats all command back to make to act
16962 that stopping like VI's dot command.
16963
16964 @item
16965 C-numbers switches to a particular window. maybe 1-3 or 1-4 does
16966 this.
16967
16968 @item
16969 C-4 or 5 to 9 (or ()? maybe reserved) switches to a particular frame.
16970
16971 @item
16972 Possibly C-Sh-numbers select more windows or frames.
16973
16974 @item
16975 M-C-1
16976 M-C-2
16977 M-C-3
16978 M-C-4
16979 M-C-5
16980 M-C-6
16981 M-C-7
16982 M-C-8
16983 M-C-9
16984 M-C-0
16985
16986 maybe should be execute anonymous macros (other possibility is insert
16987 register but you can easily simulate with a keyboard macro)
16988
16989 @item
16990 What about C-S M-C-S M-S??
16991
16992 @item
16993 I think there should be default fun key binding for @strong{ILLEGIBLE}
16994 similar to what I have - load, save, cut, copy, paste, kill line,
16995 start/end macro, do macro
16996 @end itemize
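
One possible reading of the digit-argument and grouping ideas above is
sketched below in Python. It is only an illustration of how a repeat
count could apply to a whole (possibly nested) group; the function name
@code{expand}, the token format, and the command names are invented
for this example and nothing here exists in XEmacs.

@example
```python
def expand(tokens):
    """Expand repeat counts and nested groups into a flat command list.

    TOKENS is a list of integers (repeat counts), "(" and ")" group
    markers, and command names.  A count applies to the single command
    or group that follows it.
    """
    def parse(pos, closing):
        out = []
        count = 1
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if isinstance(tok, int):
                # Remember the count for the next command or group.
                count = tok
                continue
            if tok == ")":
                if not closing:
                    raise ValueError("unbalanced )")
                return out, pos
            if tok == "(":
                # A nested group is parsed recursively and then
                # repeated as a unit.
                unit, pos = parse(pos, True)
            else:
                unit = [tok]
            out.extend(unit * count)
            count = 1
        if closing:
            raise ValueError("unbalanced (")
        return out, pos

    return parse(0, False)[0]
```
@end example

Under this reading, the token list @code{[2, "(", "forward-char", 3,
"(", "insert-x", ")", ")"]} runs @code{forward-char} followed by three
@code{insert-x} commands, and then repeats that whole unit once more.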
16997
16998 @node Future Work -- Byte Code Snippets, Future Work -- Lisp Stream API, Future Work -- Keybindings, Future Work
16999 @section Future Work -- Byte Code Snippets
17000 @cindex future work, byte code snippets
17001 @cindex byte code snippets, future work
17002
17003 @itemize
17004 @item
17005 For use in time critical (e.g. redisplay) places such as display
17006 tables - a simple piece of code is evalled, e.g.
17007 @example
17008 (int-to-char (1+ c))
17009 @end example
17010 where c is the arg, specbound.
17011
17012 @item
17013 can be compiled like
17014 @example
17015 (byte-compile-snippet (int-to-char (1+ c)) (c))
17016                                            ^^^
17017                                            environment of local vars
17018 @end example
17019
17020 @item
17021 need eval with bindings (not hard to implement)
17022 (extendable when lexical scoping present)
17023
17024 @item
17025 What's the return value of byte-compile-snippet?
17026 (Look to see how this might be implemented)
17027 @end itemize
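
The general idea of compiling a small expression once and then
evaluating it many times with an explicit set of bindings can be shown
outside Lisp. The Python sketch below is only an analogy built on
Python's own @code{compile} and @code{eval}; @code{byte-compile-snippet}
remains a proposal, and the helper name @code{run_snippet} is invented
for this example.

@example
```python
# Compile once; the result is a reusable code object.
snippet = compile("chr(c + 1)", "<snippet>", "eval")


def run_snippet(code, bindings):
    """Evaluate CODE with BINDINGS as its local variables."""
    return eval(code, dict(), dict(bindings))


# Evaluate the same compiled snippet with a different binding of c
# each time, as a display table lookup might.
shifted = [run_snippet(snippet, [("c", ord(ch))]) for ch in "HAL"]
```
@end example

In this analogy the return value of the compile step is an opaque
compiled object, and the environment is passed separately at each
evaluation.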
17028
17029 @menu
17030 * Future Work -- Autodetection::
17031 * Future Work -- Conversion Error Detection::
17032 * Future Work -- BIDI Support::
17033 * Future Work -- Localized Text/Messages::
17034 @end menu
17035
17036 @node Future Work -- Autodetection, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets, Future Work -- Byte Code Snippets
17037 @subsection Future Work -- Autodetection
17038 @cindex future work, autodetection
17039 @cindex autodetection, future work
17040
17041 There are various proposals contained here.
17042
17043 @subheading New Implementation of Autodetection Mechanism
17044
17045 The current auto detection mechanism in XEmacs Mule has many
17046 problems. For one thing, it is wrong too much of the time. Another
17047 problem, although easily fixed, is that priority lists are fixed rather
17048 than varying, depending on the particular locale; and finally, it
17049 doesn't warn the user when it's not sure of the encoding or when there's
17050 a mistake made during decoding. In both of these situations the user
17051 should be presented with a list of likely encodings and given the
17052 choice, rather than simply proceeding anyway and giving a result that is
17053 likely to be wrong and may result in data corruption when the file is
17054 saved out again.
17055
17056 All coding systems are categorized according to their type. Currently
17057 this includes ISO2022, Big 5, Shift-JIS, UTF8 and a few others. In
17058 the future there will be many more types defined and this mechanism
17059 will be generalized so that it is easily extendable by the Lisp
17060 programmer.
17061
17062 In general, each coding system type defines a series of subtypes which
17063 are handled differently for the purpose of detection. For example, ISO
17064 2022 defines many different subtypes such as 7 bit, 8 bit, locking
17065 shift, designating and so on. UCS2 may define subtypes such as normal
17066 and byte reversed.
17067
17068 The detection engine works conceptually by calling the detection
17069 methods of all of the defined coding system types in parallel on
17070 successive chunks of data (which may, for example, be 4K in size, but
17071 where the size makes no difference except for optimization purposes)
17072 and watching the results until either a definite answer is determined
17073 or the end of data is reached. The way the definite answer is
17074 determined will be defined below. The detection method of the coding
17075 system type is passed some data and a chunk of memory, which the
17076 method uses to store its current state (and which is maintained
17077 separately for each coding system type by the detection engine between
17078 successive calls to the coding system type's detection method). Its
17079 return value should be an alist consisting of a list of all of the
17080 defined subtypes for that coding system type along with a level of
17081 likelihood and a list of additional properties indicating certain
17082 features detected in the data. The extra properties returned are
17083 defined entirely by the particular coding system type and are used
17084 only in the algorithm described below under "user control." However,
17085 the levels of likelihood have a standard meaning as follows:
17086
17087 Level 4 means "near certainty" and typically indicates that a
17088 signature has been detected, usually at the beginning of the data,
17089 indicating that the data is encoded in this particular coding system
17090 type. An example of this would be the byte order mark at the beginning
17091 of UCS2 encoded data or the GZIP mark at the beginning of GZIP data.
17092
17093 Level 3 means "highly likely" and indicates that tell-tale signs have
17094 been discovered in the data that are characteristic of this particular
17095 coding system type. Examples of this might be ISO 2022 escape
17096 sequences or the current Unicode end of line markers at regular
17097 intervals.
17098
17099 Level 2 means "strongly statistically likely" indicating that
17100 statistical analysis concludes that there's a high chance that this
17101 data is encoded according to this particular type. For example, this
17102 might mean that for UCS2 data, there is a high proportion of null bytes
17103 or other repeated bytes in the odd-numbered bytes of the data and a
17104 high variance in the even-numbered bytes of the data. For Shift-JIS,
17105 this might indicate that there were no illegal Shift-JIS sequences
17106 and a fairly high occurrence of common Shift-JIS characters.
17107
17108 Level 1 means "weak statistical likelihood" meaning that there is some
17109 indication that the data is encoded in this coding system type. In
17110 fact, there is a reasonable chance that it may be some other type as
17111 well. This means, for example, that no illegal sequences were
17112 encountered and at least some data was encountered that is purposely
17113 not in other coding system types. For Shift-JIS data, this might mean
17114 that some bytes in the range 128 to 159 were encountered in the data.
17115
17116 Level 0 means "neutral" which is to say that there's either not enough
17117 data to make any decision or that the data could well be interpreted
17118 as this type (meaning no illegal sequences), but there is little or no
17119 indication of anything particular to this particular type.
17120
17121 Level -1 means "weakly unlikely" meaning that some data was
17122 encountered that could conceivably be part of the coding system type
17123 but is probably not. For example, excessively long line lengths or
17124 very rarely-encountered sequences.
17125
17126 Level -2 means "strongly unlikely" meaning that typically a number
17127 of illegal sequences were encountered.
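
As a rough illustration of these levels and of the detection-method
interface, the following Python sketch models one hypothetical detector
for a UCS-2 coding system type. The function name @code{ucs2_detect},
the subtype labels, and the layout of the state dictionary are
assumptions made for this example, not actual XEmacs code, and the
additional properties are left out for brevity. Only the
byte-order-mark check and the null-byte statistic come from the text.

@example
```python
# Likelihood levels, as defined in the text above.
NEAR_CERTAINTY = 4
HIGHLY_LIKELY = 3
STRONGLY_LIKELY = 2
WEAKLY_LIKELY = 1
NEUTRAL = 0
WEAKLY_UNLIKELY = -1
STRONGLY_UNLIKELY = -2


def ucs2_detect(chunk, state):
    """Hypothetical detection method for a UCS-2 coding system type.

    CHUNK is a bytes object; STATE is a dictionary that the engine
    keeps between calls.  Returns a list of (subtype, level) pairs.
    Assumes the first chunk holds at least two bytes.
    """
    if "seen" not in state:
        # First chunk: remember the first two bytes for the BOM check.
        state["bom"] = bytes(chunk[:2])
        state["seen"] = 0
        state["nul_even"] = 0
        state["nul_odd"] = 0
    # Count null bytes at even and odd offsets of the whole stream.
    for i, byte in enumerate(chunk):
        if byte == 0:
            if (state["seen"] + i) % 2 == 0:
                state["nul_even"] += 1
            else:
                state["nul_odd"] += 1
    state["seen"] += len(chunk)

    # A signature at the start of the data gives near certainty.
    if state["bom"] == b"\xfe\xff":
        return [("normal", NEAR_CERTAINTY),
                ("byte-reversed", STRONGLY_UNLIKELY)]
    if state["bom"] == b"\xff\xfe":
        return [("normal", STRONGLY_UNLIKELY),
                ("byte-reversed", NEAR_CERTAINTY)]

    # No signature: fall back on a crude statistic.  Mostly-ASCII text
    # in big-endian UCS-2 has a null byte at every even offset.
    pairs = max(state["seen"] // 2, 1)
    normal = STRONGLY_LIKELY if state["nul_even"] > pairs * 0.5 else NEUTRAL
    reverse = STRONGLY_LIKELY if state["nul_odd"] > pairs * 0.5 else NEUTRAL
    return [("normal", normal), ("byte-reversed", reverse)]
```
@end example

A real detector would also look at the variance of the other byte of
each pair and would report the lower levels for illegal sequences; the
threshold of one half is arbitrary.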
17128
17129 The algorithm to determine when to stop and indicate that the data has
17130 been detected as a particular coding system uses a priority list,
17131 which is typically specified as part of the language environment
17132 determined from the current locale or the user's choice. This priority
17133 list consists of a list of coding system subtypes, along with a
17134 minimum level required for positive detection and optionally
17135 additional properties that need to be present. Using the return values
17136 from all of the detection methods called, the detection engine looks
17137 through this priority list until it finds a positive match. In this
17138 priority list, along with each subtype is a particular coding system
17139 to return when the subtype is encountered. (For example, in a
17140 Japanese-language environment particular subtypes of ISO 2022 will be
17141 associated with the Japanese coding system version of those
17142 subtypes). It is perfectly legal and quite common in fact, to list the
17143 same subtype more than once in the priority list with successively
17144 lower requirements. Other facts that can be listed in the priority
17145 list for a subtype are "reject", meaning that the data should never be
17146 detected as this subtype, or "ask", meaning that if the data is
17147 detected to be this subtype, the user will be asked whether they
17148 actually mean this. This latter property could be used, for example,
17149 towards the bottom of the priority list.
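
The priority-list lookup just described might be sketched as follows
in Python. The tuple layout of the entries, the function name
@code{match_priority_list}, and the sample list are all assumptions
made for illustration; in particular, treating "reject" as applying to
a subtype wherever it appears in the list is one possible
interpretation, not something the proposal fixes.

@example
```python
def match_priority_list(results, priority_list):
    """Return (coding_system, action) for the first matching entry.

    RESULTS maps a subtype name to a (level, properties) pair collected
    from the detection methods.  Each PRIORITY_LIST entry is a tuple
    (subtype, min_level, required_properties, coding_system, action),
    where ACTION is None, "ask" or "reject".
    """
    # A "reject" entry means the subtype must never be chosen.
    rejected = set(e[0] for e in priority_list if e[4] == "reject")
    for subtype, min_level, required, coding_system, action in priority_list:
        if subtype in rejected or subtype not in results:
            continue
        level, properties = results[subtype]
        if level >= min_level and set(required) <= set(properties):
            return coding_system, action
    # No positive match: the caller should prompt the user.
    return None, "ask"


# Hypothetical list for a Japanese language environment.  The same
# subtype appears twice, the second time with a lower requirement.
JAPANESE_PRIORITY = [
    ("iso2022-7bit", 3, ("designation",), "iso-2022-jp", None),
    ("shift-jis", 2, (), "shift_jis", None),
    ("shift-jis", 1, (), "shift_jis", "ask"),
    ("big5", 0, (), None, "reject"),
]
```
@end example

With these entries, a Shift-JIS result at level 1 falls through the
first Shift-JIS entry and matches the second, so the lookup returns the
@code{shift_jis} coding system together with the "ask" action.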
17150
17151 In addition there is a global variable which specifies the minimum
17152 number of characters required before any positive match is
17153 reported. There may actually be more than one such variable for
17154 different sources of data, for example, detection of files versus
17155 detection of subprocess data.
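
The overall control flow of the engine, including the minimum-size
threshold, could look roughly like the Python sketch below. All names
(@code{run_detection}, @code{decide}, @code{min_chars}) are
hypothetical, the threshold is counted in bytes for simplicity, and the
detectors are run one after another rather than truly in parallel.

@example
```python
def run_detection(chunks, detectors, decide, min_chars=0):
    """Feed CHUNKS to every detector until DECIDE returns an answer.

    DETECTORS maps a coding system type name to a function taking
    (chunk, state) and returning a list of (subtype, level) pairs.
    DECIDE takes the combined results and returns a coding system,
    or None if there is no definite answer yet.
    """
    # One private state dictionary per coding system type.
    states = dict((name, dict()) for name in detectors)
    results = dict()
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        for name, detect in detectors.items():
            for subtype, level in detect(chunk, states[name]):
                results[(name, subtype)] = level
        # Do not report a positive match on too little data.
        if seen >= min_chars:
            answer = decide(results)
            if answer is not None:
                return answer, results
    # End of data without a definite answer.
    return None, results
```
@end example

The chunk size affects only how often @code{decide} is consulted, which
matches the remark above that it matters for optimization purposes
only.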
17156
17157 Whenever a file is opened and detected to be a particular coding
17158 system, the subtype, the coding system and the associated level of
17159 likelihood will be prominently displayed either in the echo area or in
17160 a status box somewhere.
17161
17162 If no positive match is found according to the priority list, or if
17163 the matches that are found have the "ask" property on them, then the
17164 user will be presented with a list of choices of possible encodings
17165 and asked to choose one. This list is typically sorted first by level
17166 of likelihood, and then within this, by the order in which the
17167 subtypes appear in the priority list. This list is displayed in a
17168 special kind of dialog box or other buffer allowing the user, in
17169 addition to just choosing a particular encoding, to view what the
17170 file would look like if it were decoded according to the type.
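
The ordering of that list of choices can be expressed as a two-part
sort key. The following Python sketch assumes, purely for illustration,
that results are stored as a mapping from subtype to a (level,
properties) pair and that each priority-list entry starts with its
subtype name; @code{candidate_list} is an invented name.

@example
```python
def candidate_list(results, priority_list):
    """Order subtypes for the prompt: best level first, then priority."""
    # Position of the first appearance of each subtype in the list.
    order = []
    for entry in priority_list:
        if entry[0] not in order:
            order.append(entry[0])

    def sort_key(subtype):
        level = results[subtype][0]
        # Subtypes missing from the priority list sort last.
        rank = order.index(subtype) if subtype in order else len(order)
        return (-level, rank)

    return sorted(results, key=sort_key)
```
@end example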
17171
17172 Furthermore, whenever a file is decoded according to a particular
17173 type, the decoding engine keeps track of status values that are output
17174 by the coding system type's decoding method. Generally, this status
17175 will be in the form of errors or warnings of various levels, some of
17176 which may be severe enough to stop the decoding entirely, and some of
17177 which may either indicate definitely malformed data but from which
17178 it's possible to recover, or simply data that appears rather
17179 questionable. If any of these status values are reported during
17180 decoding, the user will be informed of this and asked "are you sure?"
17181 As part of the "are you sure" dialog box or question, the user can
17182 display the results of the decoding to make sure it's correct. If the
17183 user says "no, they're not sure," then the same list of choices as
17184 previously mentioned will be presented.
17185
17186 @subheading Implementation of Coding System Priority Lists in Various Locales
17187
17188 @example
17189 @enumerate
17190 @item
17191 Default locale
17192
17193 @enumerate
17194 @item
17195 Some Unicode (fixed width; maybe UTF-8, too?) may optionally
17196 be detected by the byte-order-mark magic (if the first two
17197 bytes are 0xFE 0xFF, the file is Unicode text, if 0xFF 0xFE,
17198 it is wrong-endian Unicode; if legal in UTF-8, it would be
17199 0xEF 0xBB 0xBF, either-endian). This is probably an
17200 optimization that should not be on by default yet.
17201
17202 @item
17203 ISO-2022 encodings will be detected as long as they use
17204 explicit designation of all non-ASCII character sets. This
17205 means that many 7-bit ISO-2022 encodings would be detected
17206 (eg, ISO-2022-JP), but EUC-JP and X Compound Text would not,
17207 because they implicitly designate character sets.
17208
17209 N.B. Latin-1 will be detected as binary, as for any Latin-*.
17210
17211 N.B. An explicit ISO-2022 designation is semantically
17212 equivalent to a Content-Type: header. It is more dangerous
17213 because shorter, but I think we should recognize them by
17214 default despite the slight risk; XEmacs is a text editor.
17215
17216 N.B. This is unlikely to be as dangerous as it looks at first
17217 glance. Any file that includes an 8-bit-set byte before the
17218 first valid designation should be detected as binary.
17219
17220 @item
17221 Binary files will be detected (eg, presence of NULs, other
17222 non-whitespace control characters, absurdly long lines, and
17223 presence of bytes >127).
17224
17225 @item
17226 Everything else is ASCII.
17227
17228 @item
17229 Newlines will be detected in text files.
17230 @end enumerate
17231
17232 @item
17233 European locales
17234
17235 @enumerate
17236 @item
17237 Unicode may optionally be detected by the byte-order-mark
17238 magic.
17239
17240 @item
17241 ISO-2022 encodings will be detected as long as they use
17242 explicit designation of all non-ASCII character sets.
17243
17244 @item
17245 A locale-specific class of 1-byte character sets (eg,
17246 '(Latin-1)) will be detected.
17247
17248 N.B. The reason for permitting a class is for cases like
17249 Cyrillic where there are both ISO-8859 encodings and
17250 incompatible encodings (KOI-8r) in common use. If you want to
17251 write a Latin-1 v. Latin-2 detector, be my guest, but I don't
17252 think it would be easy or accurate.
17253
17254 @item
17255 Binary files will be detected per (2)(c), except that only
17256 8-bit bytes out of the encoding's range imply binary.
17257
17258 @item
17259 Everything else is ASCII.
17260
17261 @item
17262 Newlines will be detected in text files.
17263 @end enumerate
17264
17265 @item
17266 CJK locales
17267
17268 @enumerate
17269 @item
17270 Unicode may optionally be detected by the byte-order-mark
17271 magic.
17272
17273 @item
17274 ISO-2022 encodings will be detected as long as they use
17275 explicit designation of all non-ASCII character sets.
17276
17277 @item
17278 A locale-specific class of multi-byte and wide-character
17279 encodings will be detected.
17280 N.B. No 1-byte character sets (eg, Latin-1) will be detected.
17281 The reason for a class is to allow the Japanese to let Mule do
17282 the work of choosing EUC v. SJIS.
17283
17284 @item
17285 Binary files will be detected per (3)(d).
17286
17287 @item
17288 Everything else is ASCII.
17289
17290 @item
17291 Newlines will be detected in text files.
17292 @end enumerate
17293
17294 @item
17295 Unicode and general locales; multilingual use
17296 @end enumerate
17297
17298 @enumerate
17299 @item
17300 Hopefully a system general enough to handle (2)--(4) will
17301 handle these, too, but we should watch out for gotchas like
17302 Unicode "plane 14" tags which (I think _both_ Ben and Olivier
17303 will agree) have no place in the internal representation, and
17304 thus must be treated as out-of-band control sequences. I
17305 don't know if all such gotchas will be as easy to dispose of.
17306
17307 @item
17308 An explicit coding system priority list will be provided to
17309 allow multilingual users to autodetect both Shift JIS and Big
17310 5, say, but this ability is not promised by Mule, since it
17311 would involve (eg) heuristics like picking a set of code
17312 points that are frequent in Shift JIS and uncommon in Big 5
17313 and betting that a file containing many characters from that
17314 set is Shift JIS.
17315 @end enumerate
17316 @end example
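
Two of the checks in the default-locale list are simple enough to
sketch directly. The Python below is a minimal illustration under
stated assumptions: the function names, the returned labels, and the
1000-byte line limit are invented here, and the UTF-8 signature used is
the standard three-byte sequence 0xEF 0xBB 0xBF.

@example
```python
def sniff_bom(data):
    """Classify the leading bytes of DATA by byte-order mark."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    return None


def looks_binary(data, max_line=1000):
    """Default-locale heuristic for binary data.

    Flags NULs, other non-whitespace control characters, bytes above
    127, and absurdly long lines.  ESC counts as a control character
    here, so ISO-2022 detection has to run before this check.
    """
    allowed_controls = b"\t\n\r\f"
    for byte in data:
        if byte > 127:
            return True
        if byte < 32 and byte not in allowed_controls:
            return True
    return any(len(line) > max_line for line in data.split(b"\n"))
```
@end example

In a European or CJK locale the test for bytes above 127 would be
replaced by a test against the range of the locale's encodings, as the
lists above describe.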
17317
17318 @subheading Better Algorithm, More Flexibility, Different Levels of Certainty
17319
17320 @subheading Much More Flexible Coding System Priority List, per-Language Environment
17321
17322 @subheading User Ability to Select Encoding when System Unsure or Encounters Errors
17323
17324 @subheading Another Autodetection Proposal
17325
17326 However, in general, the detection code has major problems and needs
17327 lots of work:
17328
17329 @itemize @bullet
17330 @item
17331 instead of merely "yes" or "no" for particular categories, we need a
17332 more flexible system, with various levels of likelihood. Currently
17333 I've created a system with six levels, as follows:
17334
17335 [see file-coding.h]
17336
17337 Let's consider what this might mean for an ASCII text detector. (In
17338 order to have accurate detection, especially given the iteration I
17339 proposed below, we need active detectors for @strong{all} types of data we
17340 might reasonably encounter, such as ASCII text files, binary files,
17341 and possibly other sorts of ASCII files, and not assume that simply
17342 "falling back to no detection" will work at all well.)
17343
17344 An ASCII text detector DOES NOT report ASCII text as level 0, since
17345 that's what the detector is looking for. Such a detector ideally
17346 wants all bytes in the range 0x20 - 0x7E (no high bytes!), except for
17347 whitespace control chars and perhaps a few others; LF, CR, or CRLF
17348 sequences at regular intervals (where "regular" might mean an average
17349 < 100 chars and 99% < 300 for code and other stuff of the "text file
17350 w/line breaks" variety, but for the "text file w/o line breaks"
17351 variety, excluding blank lines, averages could easily be 600 or more
17352 with 2000-3000 char "lines" not so uncommon); similar statistical
17353 variance between odds and evens (not Unicode); frequent occurrences of
17354 the space character; letters more common than non-letters; etc. Also
17355 checking for too little variability between frequencies of characters
17356 and for exclusion of particular characters based on character ranges
17357 can catch ASCII encodings like base-64, UUEncode, UTF-7, etc.
17358 Granted, this doesn't even apply to everything called "ASCII", and we
17359 could potentially distinguish off ASCII for code, ASCII for text,
17360 etc. as separate categories. However, it does give us a lot to work
17361 off of, in deciding what likelihood to choose -- and it shows there's
17362 in fact a lot of detectable patterns to look for even in something
17363 seemingly so generic as ASCII. The detector would report most text
17364 files in level 1 or level 2. EUC encodings, Shift-JIS, etc. probably
17365 go to level -1 because they also pass the EOL test and all other tests
17366 for the ASCII part of the text, but have lots of high bytes, which in
17367 essence turn them into binary. Aberrant text files like something in
17368 BASE64 encoding might get placed in level 0, because they pass most
17369 tests but fail dramatically the frequency test; but they should not be
17370 reported as any lower, because that would cause explicit prompting,
17371 and the user should be able to open any valid text file without prompting.
17372 The escape sequences and the base-64-type checks might send 7-bit
17373 iso2022 to 0, but probably not -1, for similar reasons.
17374
17375 @item
17376 The assumed algorithm for the above detection levels is to in essence
17377 sort categories first by detection level and then by priority.
17378 Perhaps, however, we would want smarter algorithms, or at least
17379 something user-controllable -- in particular, when (other than no
17380 category at level 0 or greater) do we prompt the user to pick a
17381 category?
17382
17383 @item
17384 Improvements in how the detection algorithm works: we want to handle
17385 lots of different ways something could be encoded, including multiple
17386 stacked encodings. Trying to specify a series of detection levels
17387 (check for base64 first, then check for gzip, then check for an i18n
17388 decoding, then for crlf) won't generally work. For example, what
17389 about the same encoding appearing more than once? Consider
17390 euc-jp, base64'd, then gzip'd, then base64'd again: this could well
17391 happen, and you could specify the encodings specifically as
17392 base64|gzip|base64|euc-jp, but we'd like to autodetect it without
17393 worrying about exactly what order these things appear in. We should
17394 allow for iterating over detection/decoding cycles until we reach
17395 some maximum (we got stuck in a loop, due to incorrect category
17396 tables or detection algorithms), have no reported detection levels
17397 over -1, or we end up with no change after a decoding pass (i.e. the
17398 coding system associated with a chosen category was @code{no-conversion}
17399 or something equivalent). It might make sense to divide things into
17400 two phases (internal and external), where the internal phase has a
17401 separate category list and would probably mostly end up handling EOL
17402 detection; but the more I think about it, the more I disagree. With
17403 properly written detectors, and properly organized tables (in
17404 general, those decodings that are more "distinctive" and thus
17405 detectable with greater certainty go lower on the list), we shouldn't
17406 need two phases. For example, let's say the example above was also
17407 in CRLF format. The EOL detector (which really detects *plain text*
17408 with a particular EOL type) would return at most level 0 for all
17409 results until the text file is reached, whereas the base64, gzip or
17410 euc-jp decoders will return higher. Once the text file is reached,
17411 the EOL detector will return 0 or higher for the CRLF encoding, and
17412 all other detectors will return 0 or lower; thus, we will successfully
17413 proceed through CRLF decoding, or at worst prompt the user. (The only
17414 external-vs-internal distinction that might make sense here is to
17415 favor coding systems of the correct source type over those that
17416 require conversion between external and internal; if done right, this
17417 could allow the CRLF detector to return level 1 for all CRLF-encoded
17418 text files, even those that look like Base-64 or similar encoding, so
17419 that CRLF encoding will always get decoded without prompting, but not
17420 interfere with other decoders. On the other hand, this
17421 external-vs-internal distinction may not matter at all -- with
17422 automatic internal-external conversion, CRLF decoding can occur
17423 before or after decoding of euc-jp, base64, iso2022, or similar,
17424 without any difference in the final results.)
17425
17426 #### What are we trying to say? In base64, CRLF decoding before
17427 base64 decoding is irrelevant: the line breaks will be thrown out, as
17428 whitespace is not significant in base64.
17429
17430 [sjt considers all of this to be rather bogus. Ideas like "greater
17431 certainty" and "distinctive" can and should be quantified. The issue
17432 of proper table organization should be a question of optimization.]
17433
17434 [sjt wonders if it might not be a good idea to use Unicode's newline
17435 character as the internal representation so that (for non-Unicode
17436 coding systems) we can catch EOL bugs on Unix too.]
17437
17438 @item
17439 There need to be two priority lists and two
17440 category->coding-system lists. One is general, the other
17441 langenv-specific. The user sets the former, the langenv
17442 the latter. The langenv-specific entries take precedence
17443 over the others. This works similarly to the
17444 Unicode charset priority list.
17445
17446 @item
17447 The simple list of coding categories per detector is not enough.
17448 Instead of coding categories, we need parameters. For example,
17449 Unicode might have separate detectors for UTF-8, UTF-7, UTF-16,
17450 and perhaps UCS-4; or UTF-16/UCS-4 would be one detection type.
17451 UTF-16 would have parameters such as "little-endian" and "needs BOM",
17452 and possibly another one like "collapse/expand/leave alone composite
17453 sequences" once we add this support. Usually these parameters
17454 correspond directly to a coding system parameter. Different
17455 likelihood values can be specified for each parameter as well as for
17456 the detection type as a whole. The user can specify particular
17457 coding systems for a particular combination of detection type and
17458 parameters, or can give "default parameters" associated with a
17459 detection type. In the latter case, we create a new coding system as
17460 necessary that corresponds to the detected type and parameters.
17461
17462 @item
17463 a better means of presentation. rather than just coming up
17464 with the new file decoded according to the detected coding
17465 system, allow the user to browse through the file and
17466 conveniently reject it if it looks wrong; then detection
17467 starts again, but with that possibility removed. in cases where
17468 certainty is low and thus more than one possibility is presented,
17469 the user can browse each one and select one or reject them all.
17470
17471 @item
17472 fail-safe: even after the user has made a choice, if they
17473 later on realize they have the wrong coding system, they can
17474 go back, and we've squirreled away the original data so they
17475 can start the process over. this may be tricky.
17476
17477 @item
17478 using a larger buffer for detection. we use just a small
17479 piece, which can give quite random results. we may need to
17480 buffer up all the data we look through because we can't
17481 necessarily rewind. the idea is we proceed until we get a
17482 result that's at least at a certain level of certainty
17483 (e.g. "probable") or we reached a maximum limit of how much
17484 we want to buffer.
17485
17486 @item
17487 dealing with interactive systems. we might need to go ahead
17488 and present the data before we've finished detection, and
17489 then re-decode it, perhaps multiple times, as we get better
17490 detection results.
17491
17492 @item
17493 Clearly some of these are more important than others. at the
17494 very least, the "better means of presentation" should be
17495 implemented as soon as possible, along with a very simple means
17496 of fail-safe whenever the data is readily available, e.g. it's
17497 coming from a file, which is the most common scenario.
17498 @end itemize
17499
17500 ben [at least that's what sjt thinks]
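
A minimal sketch in C of the "larger buffer" item above: keep handing the detector a longer prefix of the data until it reports the wanted level of certainty, or until a buffering limit is reached. The level names, the toy CRLF detector, and both function names are assumptions made up for illustration; they are not the existing detection code.

```c
#include <stddef.h>

/* Hypothetical certainty levels; the real detector levels may differ. */
enum det_level { DET_UNLIKELY = -1, DET_UNKNOWN = 0, DET_PROBABLE = 1 };

/* Toy detector (illustrative only): plain text with CRLF line endings.
   Any bare LF counts against; two or more CR LF pairs count for. */
static enum det_level detect_crlf(const unsigned char *buf, size_t len)
{
    size_t crlf = 0, bare_lf = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        if (i > 0 && buf[i - 1] == '\r')
            crlf++;
        else
            bare_lf++;
    }
    if (bare_lf > 0)
        return DET_UNLIKELY;
    return crlf >= 2 ? DET_PROBABLE : DET_UNKNOWN;
}

/* Examine ever larger prefixes of DATA, growing by CHUNK octets, until
   the detector reports at least WANTED or LIMIT octets have been
   examined.  *USED receives the number of octets actually examined. */
static enum det_level detect_incrementally(const unsigned char *data,
                                           size_t len, size_t chunk,
                                           size_t limit,
                                           enum det_level wanted,
                                           size_t *used)
{
    enum det_level level = DET_UNKNOWN;
    size_t seen = 0;

    if (chunk == 0)
        chunk = 1;              /* avoid looping forever */
    if (limit > len)
        limit = len;
    while (seen < limit) {
        seen += chunk;
        if (seen > limit)
            seen = limit;
        level = detect_crlf(data, seen);
        if (level >= wanted)
            break;
    }
    *used = seen;
    return level;
}
```

The value left in USED is the amount of data that would have to be kept around when the source cannot be rewound.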
17501
17502 *****
17503
17504 While this is clearly something of an improvement over earlier designs,
17505 it doesn't deal with the most important issue: to do better than categories
17506 (which in the medium term is mostly going to mean "which flavor of Unicode
17507 is this?"), we need to look at statistical behavior rather than ruling out
17508 categories via presence of specific sequences. This means the stream
17509 processor should
17510
17511 @enumerate
17512 @item
17513 keep octet distributions (octet, 2-, 3-, 4- octet sequences)
17514 @item
17515 in some kind of compressed form
17516 @item
17517 look for "skip features" (eg, characteristic behavior of leading
17518 bytes for UTF-7, UTF-8, UTF-16, Mule code)
17519 @item
17520 pick up certain "simple" regexps
17521 @item
17522 provide "triggers" to determine when statistical detectors should be
17523 invoked, such as octet count
17524 @item
17525 and "magic" like Unicode signatures or file(1) magic.
17526 @end enumerate
17527
17528 --sjt
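
To make the list above a little more concrete, here is a hedged C sketch of a stream processor that keeps a single-octet distribution, tracks one "skip feature" (whether candidate UTF-8 lead bytes are followed by the right number of continuation octets), and offers an octet-count trigger. All names are hypothetical, and the UTF-8 check is deliberately loose: it does not reject overlong forms or surrogates.

```c
#include <stddef.h>

/* Hypothetical per-stream statistics. */
struct octet_stats {
    size_t total;        /* octets seen so far */
    size_t hist[256];    /* single-octet distribution */
    size_t utf8_leads;   /* octets 0xC2..0xF4, i.e. possible lead bytes */
    size_t utf8_good;    /* lead bytes that got all their continuations */
    int pending;         /* continuation octets still expected */
};

static void stats_init(struct octet_stats *s)
{
    *s = (struct octet_stats){0};
}

/* Update the statistics with LEN more octets.  State is carried across
   calls, so a sequence may straddle two chunks. */
static void stats_feed(struct octet_stats *s, const unsigned char *buf,
                       size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        s->total++;
        s->hist[c]++;
        if (s->pending > 0) {
            if ((c & 0xC0) == 0x80) {
                if (--s->pending == 0)
                    s->utf8_good++;
                continue;
            }
            s->pending = 0;     /* sequence broken; re-examine C below */
        }
        if (c >= 0xC2 && c <= 0xF4) {
            s->utf8_leads++;
            s->pending = c >= 0xF0 ? 3 : c >= 0xE0 ? 2 : 1;
        }
    }
}

/* "Trigger": consult the statistics only after enough octets. */
static int stats_ready(const struct octet_stats *s, size_t min_octets)
{
    return s->total >= min_octets;
}

/* Percentage of candidate lead bytes that began a complete sequence,
   or -1 if no candidate lead byte has been seen. */
static int utf8_skip_score(const struct octet_stats *s)
{
    if (s->utf8_leads == 0)
        return -1;
    return (int)(100 * s->utf8_good / s->utf8_leads);
}
```

A statistical detector would compare such scores across candidate encodings rather than rule a category out on the first odd sequence.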
17529
17530 @node Future Work -- Conversion Error Detection, Future Work -- BIDI Support, Future Work -- Autodetection, Future Work -- Byte Code Snippets
17531 @subsection Future Work -- Conversion Error Detection
17532 @cindex future work, conversion error detection
17533 @cindex conversion error detection, future work
17534
17535 @subheading "No Corruption" Scheme for Preserving External Encoding when Non-Invertible Transformation Applied
17536
17537 A preliminary and simple implementation is:
17538
17539 @quotation
17540 But you could implement it much more simply and usefully by just
17541 determining, for any text being decoded into mule-internal, can we go
17542 back and read the source again? If not, remember the entire file
17543 (GNUS message, etc) in text properties. Then, implement the UI
17544 interface (like Netscape's) on top of that. This way, you have
17545 something that at least works, but it might be inefficient. All we
17546 would need to do is work on making the underlying implementation more
17547 efficient.
17548 @end quotation
17549
17550 A more detailed proposal for avoiding binary file corruption is
17551
17552 @quotation
17553 Basic idea: A coding system is a filter converting an entire input
17554 stream into an output stream. The resulting stream can be said to be
17555 "correspondent to" the input stream. Similarly, smaller units can
17556 correspond. These could potentially include zero width intervals on
17557 either side, but we avoid this. Specifically, the coding system works
17558 like:
17559
17560 @example
17561 loop (input) @{
17562
17563 Read bytes till we have enough to generate a translated character or characters.
17564
17565 This establishes a "correspondence" between the whole input and
17566 output more or less in minimal chunks.
17567
17568 @}
17569 @end example
17570
17571 We then do the following processing:
17572
17573 @enumerate
17574 @item
17575 Eliminate correspondences where one or the other of the I/O streams
17576 has a zero interval by combining with an adjacent interval;
17577
17578 @item
17579 Group together all adjacent "identity" correspondences into as
17580 large groups as possible;
17581
17582 @item
17583 Use text properties to store the non-identity correspondences on
17584 the characters. For identity correspondences, use a simple text
17585 property on all that contains no data but just indicates that the
17586 whole string of text is identity corresponded. (How do we define
17587 "identity"? Latin 1 or could it be something else? For example,
17588 Latin 2)?
17589
17590 @item
17591 Figure out the procedures when text is inserted/deleted and copied
17592 or pasted.
17593
17594 @item
17595 Figure out how to save the file out making use of the
17596 correspondences. Allow ways of saving without correspondences, and
17597 doing a "save to buffer with and without correspondences." Need to
17598 be clever when dealing with modal coding systems to parse the
17599 correspondences to get the internal state right.
17600 @end enumerate
17601 @end quotation
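
The first two processing steps of the proposal can be sketched in C as follows. This is only an illustration under assumptions: the record layout and the function name are invented here, and the sketch chooses to fold a zero-width correspondence into the entry that follows it (falling back to the preceding one at the end of the stream), which is one possible reading of "an adjacent interval".

```c
#include <stddef.h>

/* One correspondence: IN_LEN external octets produced OUT_LEN internal
   characters.  IDENTITY marks chunks whose output equals their input. */
struct corr {
    size_t in_len;
    size_t out_len;
    int identity;
};

/* Normalize the N correspondences in C in place and return the new
   count.  Zero-width entries are folded into the next entry (which
   then can no longer be an identity); adjacent identity entries are
   merged into one group.  If every entry is zero-width there is no
   host to fold into and 0 is returned. */
static size_t corr_normalize(struct corr *c, size_t n)
{
    size_t w = 0;                       /* entries kept so far */
    size_t pend_in = 0, pend_out = 0;   /* zero-width material waiting */

    for (size_t r = 0; r < n; r++) {
        struct corr cur = c[r];
        if (cur.in_len == 0 || cur.out_len == 0) {
            pend_in += cur.in_len;
            pend_out += cur.out_len;
            continue;
        }
        if (pend_in || pend_out) {
            cur.in_len += pend_in;
            cur.out_len += pend_out;
            cur.identity = 0;
            pend_in = pend_out = 0;
        }
        if (w > 0 && cur.identity && c[w - 1].identity) {
            c[w - 1].in_len += cur.in_len;
            c[w - 1].out_len += cur.out_len;
        } else {
            c[w++] = cur;
        }
    }
    if ((pend_in || pend_out) && w > 0) {
        c[w - 1].in_len += pend_in;
        c[w - 1].out_len += pend_out;
        c[w - 1].identity = 0;
    }
    return w;
}
```

Only the non-identity entries that survive would need to carry the original octets in a text property; each merged identity group needs just a marker.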
17602
17603 @subheading Another Error-Catching Idea
17604
17605 Nov 4, 1999
17606
17607 Finally, I don't think "save the input" is as hard as you make it out to
17608 be. Conceptually, in fact, it's simple: for each minimal group of bytes
17609 where you cannot absolutely guarantee that an external->internal
17610 transformation is reversible, you put a text property on the
17611 corresponding internal character indicating the bytes that generated
17612 this character. We also put a text property on every character,
17613 indicating the coding system that caused the transformation. This
17614 latter text property is extremely efficient (e.g. in a buffer with no
17615 data pasted from elsewhere, it will map to a single extent over all the
17616 buffer), and the former cases should not be prevalent enough to cause a
17617 lot of inefficiency, esp. if we define what "reversible" means for each
17618 coding system in such a way that it correctly handles the most common
17619 cases. The hardest part, in fact, is making all the string/text
17620 handling in XEmacs be robust w.r.t. text properties.
17621
17622 @subheading Strategies for Error Annotation and Coding Orthogonalization
17623
17624 From sjt (?):
17625
17626 We really want to separate out a number of things. Conceptually,
17627 there is a nested syntax.
17628
17629 At the top level is the ISO 2022 extension syntax, including charset
17630 designation and invocation, and certain auxiliary controls such as the
17631 ISO 6429 direction specification. These are octet-oriented, with the
17632 single exception (AFAIK) of the "exit Unicode" sequence which uses the
17633 UTF's natural width (1 byte for UTF-7 and UTF-8, 2 bytes for UCS-2 and
17634 UTF-16, and 4 bytes for UCS-4 and UTF-32). This will be treated as a
17635 (deprecated) special case in Unicode processing.
17636
17637 The middle layer is ISO 2022 character interpretation. This will depend
17638 on the current state of the ISO 2022 registers, and assembles octets
17639 into the character's internal representation.
17640
17641 The lowest level is translating system control conventions. At present
17642 this is restricted to newline translation, but one could imagine doing
17643 tab conversion or line wrapping here. "Escape from Unicode" processing
17644 would be done at this level.
17645
17646 At each level the parser will verify the syntax. In the case of a
17647 syntax error or warning (such as a redundant escape sequence that affects
17648 no characters), the parser will take some action, typically inserting the
17649 erroneous octets directly into the output and creating an annotation
17650 which can be used by higher level I/O to mark the affected region.
17651
17652 This should make it possible to do something sensible about separating
17653 newline convention processing from character construction, and about
17654 preventing ISO 2022 escape sequences from being recognized
17655 inappropriately.
17656
17657 The basic strategy will be to have octet classification tables, and
17658 switch processing according to the table entry.
17659
17660 It's possible that, by doing the processing with tables of functions or
17661 the like, the parser can be used for both detection and translation.
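
A small C sketch of the table-driven strategy, under assumptions: the class names and the division of the code space follow the usual ISO 2022 layout (C0, GL, C1, GR), with ESC and the newline octets split out because the upper and lower layers care about them. The scanner below only counts; a real parser would dispatch to a handler in each case.

```c
#include <stddef.h>

/* Hypothetical octet classes. */
enum octet_class { OC_C0, OC_ESC, OC_EOL, OC_GL, OC_C1, OC_GR };

static enum octet_class octet_table[256];

/* Fill the classification table once, before any scanning. */
static void octet_table_init(void)
{
    for (int i = 0; i < 256; i++) {
        if (i < 0x20)
            octet_table[i] = OC_C0;
        else if (i < 0x80)
            octet_table[i] = OC_GL;
        else if (i < 0xA0)
            octet_table[i] = OC_C1;
        else
            octet_table[i] = OC_GR;
    }
    octet_table[0x1B] = OC_ESC;     /* starts an escape sequence */
    octet_table['\r'] = OC_EOL;     /* newline convention layer */
    octet_table['\n'] = OC_EOL;
}

struct scan_result {
    size_t escapes, eols, graphic, high, controls;
};

/* Switch processing according to the table entry of each octet. */
static struct scan_result scan_octets(const unsigned char *buf, size_t len)
{
    struct scan_result r = { 0, 0, 0, 0, 0 };
    for (size_t i = 0; i < len; i++) {
        switch (octet_table[buf[i]]) {
        case OC_ESC: r.escapes++;  break;
        case OC_EOL: r.eols++;     break;
        case OC_GL:  r.graphic++;  break;
        case OC_GR:  r.high++;     break;
        case OC_C0:
        case OC_C1:  r.controls++; break;
        }
    }
    return r;
}
```

Because the dispatch is driven by data, the same loop could serve detection (count and score) and translation (emit characters) by swapping what each case does.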
17662
17663 @subheading Handling Writing a File Safely, Without Data Loss
17664
17665 From ben:
17666
17667 @quotation
17668 When writing a file, we need error detection; otherwise somebody
17669 will create a Unicode file without realizing the coding system
17670 of the buffer is Raw, and then lose all the non-ASCII/Latin-1
17671 text when it's written out. We need two levels:
17672
17673 @enumerate
17674 @item
17675 first, a "safe-charset" level that checks before any actual
17676 encoding to see if all characters in the document can safely
17677 be represented using the given coding system. FSF has a
17678 "safe-charset" property of coding systems, but it's stupid
17679 because this information can be automatically derived from
17680 the coding system, at least the vast majority of the time.
17681 What we need is some sort of
17682 alternative-coding-system-precedence-list, langenv-specific,
17683 where everything on it can be checked for safe charsets and
17684 then the user given a list of possibilities. When the user
17685 does "save with specified encoding", they should see the same
17686 precedence list. Again like with other precedence lists,
17687 there's also a global one, and presumably all coding systems
17688 not on other list get appended to the end (and perhaps not
17689 checked at all when doing safe-checking?). safe-checking
17690 should work something like this: compile a list of all
17691 charsets used in the buffer, along with a count of chars
17692 used. that way, "slightly unsafe" coding systems can perhaps
17693 be presented at the end, which will lose only a few characters
17694 and are perhaps what the users were looking for.
17695
17696 [sjt sez this whole step is a crock. If a universal coding system
17697 is unacceptable, the user had better know what he/she is doing,
17698 and explicitly specify a lossy encoding.
17699 In principle, we can simply check for characters being writable as
17700 we go along. Eg, via an "unrepresentable character handler." We
17701 still have the buffer contents. If we can't successfully save,
17702 then ask the user what to do. (Do we ever simply destroy previous
17703 file version before completing a write?)]
17704
17705 @item
17706 when actually writing out, we need error checking in case an
17707 individual char in a charset can't be written even though the
17708 charsets are safe. again, the user gets the choice of other
17709 reasonable coding systems.
17710
17711 [sjt -- something is very confused, here; safe charsets should be
17712 defined as those charsets all of whose characters can be encoded.]
17713
17714 @item
17715 same thing (error checking, list of alternatives, etc.) needs
17716 to happen when reading! all of this will be a lot of work!
17717 @end enumerate
17718 @end quotation
17719
17720 --ben
17721
17722 I don't much like Ben's scheme. First, this isn't an issue of I/O,
17723 it's a coding issue. It can happen in many places, not just on stream
17724 I/O. Error checking should take place on all translations. Second,
17725 the two-pass algorithm should be avoided if possible. In some cases
17726 (eg, output to a tty) we won't be able to go back and change the
17727 previously output data. Third, the whole idea of having a buffer full
17728 of arbitrary characters which we're going to somehow shoehorn into a
17729 file based on some twit user's less than informed idea of a coding system
17730 is kind of laughable from the start. If we're going to say that a buffer
17731 has a coding system, shouldn't we enforce restrictions on what you can
17732 put into it? Fourth, what's the point of having safe charsets if some
17733 of the characters in them are unsafe? Fifth, what makes you think we're
17734 going to have a list of charsets? It seems to me that there might be
17735 reasons to have user-defined charsets (eg, "German" vs "French" subsets
17736 of ISO 8859/15). Sixth, the idea of having language environment determine
17737 precedence doesn't seem very useful to me. Users who are working with a
17738 language that corresponds to the language environment are not going to
17739 run into safe charsets problems. It's users who are outside of their
17740 usual language environment who run into trouble. Also, the reason for
17741 specifying anything other than a universal coding system is normally
17742 restrictions imposed by other users or applications. Seventh, the
17743 statistical feedback isn't terribly useful. Users rarely "want" a
17744 coding system, they want their file saved in a useful way. We could
17745 add a FORCE argument to conversions for those who really want a specific
17746 coding system. But mostly, a user might want to edit out a few unsafe
17747 characters. So (up to some maximum) we should keep a list of unsafe
17748 text positions, and provide a convenient function for traversing them.
17749
17750 --sjt
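
The last suggestion (keep a bounded list of unsafe text positions, found in a single pass) might look like the following C sketch. Everything here is an assumption for illustration: characters are modelled as plain code points, the limit is tiny, and the ISO 8859-1 predicate merely stands in for whatever "can this coding system represent the character" test is available.

```c
#include <stddef.h>

#define MAX_UNSAFE 4    /* deliberately small for the example */

struct unsafe_list {
    size_t pos[MAX_UNSAFE]; /* first few unsafe positions */
    size_t count;           /* positions recorded, at most MAX_UNSAFE */
    size_t total;           /* all unsafe characters seen */
};

/* Hypothetical predicate: can CH be written in ISO 8859-1? */
static int latin1_encodable(unsigned long ch)
{
    return ch <= 0xFF;
}

/* One pass over TEXT; no second pass over the data is needed. */
static struct unsafe_list find_unsafe(const unsigned long *text, size_t len,
                                      int (*encodable)(unsigned long))
{
    struct unsafe_list u = { {0}, 0, 0 };
    for (size_t i = 0; i < len; i++) {
        if (encodable(text[i]))
            continue;
        if (u.count < MAX_UNSAFE)
            u.pos[u.count++] = i;
        u.total++;
    }
    return u;
}
```

A command for traversing the unsafe characters would then step through the recorded positions, and the total tells the user whether the list was truncated.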
17751
17752 @node Future Work -- BIDI Support, Future Work -- Localized Text/Messages, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets
17753 @subsection Future Work -- BIDI Support
17754 @cindex future work, bidi support
17755 @cindex bidi support, future work
17756
17757 @enumerate
17758 @item
17759 Use text properties to handle nesting levels and overrides;
17760 BIDI-specific text properties (as per the Unicode BIDI algorithm)
17761 are computed at text insertion time.
17762
17763 @item
17764 Lisp API for reordering a display line at redisplay time,
17765 possibly substitution of different glyphs (esp. mirroring of
17766 glyphs).
17767
17768 @item
17769 Lisp API called after a display line is laid out, but only when
17770 reordering may be necessary (display engine checks for
17771 non-uniform BIDI text properties; can handle internally a line
17772 that's completely in one direction)
17773
17774 @item
17775 Default direction is a buffer-local variable
17776
17777 @item
17778 We concentrate on implementing Unicode BIDI algorithm.
17779
17780 @item
17781 Display support for mirroring of entire window
17782
17783 @item
17784 Display code keeps track of mirroring junctures so it can
17785 display double cursor.
17786
17787 @item
17788 Entire layout of screen (on a per window basis) is exported as a
17789 Lisp API, for visual editing (also very useful for other
17790 purposes e.g. proper handling of word wrapping with proportional
17791 fonts, complex Lisp layout engines e.g. W3)
17792
17793 @item
17794 Logical, visual, etc. cursor movement handled entirely in Lisp,
17795 using aforementioned API, plus a specifier for controlling how
17796 cursor is shown (e.g. split or not).
17797 @end enumerate
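
As a hedged illustration of what a line-reordering hook would have to compute, the C sketch below applies rule L2 of the Unicode BIDI algorithm to one display line. It assumes the embedding levels have already been resolved (for instance from the BIDI text properties computed at insertion time) and ignores mirroring; the function names are hypothetical.

```c
#include <stddef.h>

/* Reverse ORDER[lo..hi] in place. */
static void reverse_range(size_t *order, size_t lo, size_t hi)
{
    while (lo < hi) {
        size_t t = order[lo];
        order[lo++] = order[hi];
        order[hi--] = t;
    }
}

/* LEVELS holds the resolved embedding level of each of the N characters
   of a line, in logical order.  On return, ORDER[v] is the logical
   index of the character shown at visual position v.
   Rule L2: from the highest level down to the lowest odd level,
   reverse every maximal run of characters at that level or higher. */
static void bidi_reorder_line(const unsigned char *levels, size_t n,
                              size_t *order)
{
    unsigned char max = 0, min_odd = 255;   /* 255: no odd level seen */

    for (size_t i = 0; i < n; i++) {
        order[i] = i;
        if (levels[i] > max)
            max = levels[i];
        if ((levels[i] & 1) && levels[i] < min_odd)
            min_odd = levels[i];
    }
    if (min_odd == 255)
        return;     /* purely left-to-right line, nothing to do */

    for (int lvl = max; lvl >= min_odd; lvl--) {
        size_t i = 0;
        while (i < n) {
            if (levels[i] < lvl) {
                i++;
                continue;
            }
            size_t start = i;
            while (i < n && levels[i] >= lvl)
                i++;
            reverse_range(order, start, i - 1);
        }
    }
}
```

The early return corresponds to the case where the display engine can handle a line that is completely in one direction without calling out to Lisp.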
17798
17799 @node Future Work -- Localized Text/Messages, , Future Work -- BIDI Support, Future Work -- Byte Code Snippets
17800 @subsection Future Work -- Localized Text/Messages
17801 @cindex future work, localized text/messages
17802 @cindex localized text/messages, future work
17803
17804 NOTE: There is existing message translation in X Windows of menu names.
17805 This is handled through X resources. The files are in
17806 @file{PACKAGES/mule-packages/locale/app-defaults/LOCALE/Emacs}, where
17807 @var{locale} is @samp{ja}, @samp{fr}, etc.
17808
17809 See lib-src/make-msgfile.lex.
17810
17811 Long comment from jwz, some additions from ben marked "ben":
17812
17813 (much of this comment is outdated, and a lot of it is actually
17814 implemented)
17815
17816 @subsection Proposal for How This All Ought to Work
17817
17818 this isn't implemented yet, but this is the plan-in-progress
17819
17820 In general, it's accepted that the best way to internationalize is for all
17821 messages to be referred to by a symbolic name (or number) and come out of a
17822 table or tables, which are easy to change.
17823
17824 However, with Emacs, we've got the task of internationalizing a huge body
17825 of existing code, which already contains messages internally.
17826
17827 For the C code we've got two options:
17828
17829 @itemize @bullet
17830 @item
17831 Use a Sun-like @code{gettext()} form, which takes an "english" string which
17832 appears literally in the source, and uses that as a hash key to find
17833 a translated string;
17834 @item
17835 Rip all of the strings out and put them in a table.
17836 @end itemize
17837
17838 In this case, it's desirable to make as few changes as possible to the C
17839 code, to make it easier to merge the code with the FSF version of emacs
17840 which won't ever have these changes made to it. So we should go with the
17841 former option.
17842
17843 The way it has been done (between 19.8 and 19.9) was to use @code{gettext()}, but
17844 @strong{also} to make massive changes to the source code. The goal now is to use
17845 @code{gettext()} at run-time and yet not require a textual change to every line
17846 in the C code which contains a string constant. A possible way to do this
17847 is described below.
17848
17849 (@code{gettext()} can be implemented in terms of @code{catgets()} for non-Sun systems, so
17850 that in itself isn't a problem.)
17851
17852 For the Lisp code, we've got basically the same options: put everything in
17853 a table, or translate things implicitly.
17854
17855 Another kink that lisp code introduces is that there are thousands of third-
17856 party packages, so changing the source for all of those is simply not an
17857 option.
17858
17859 Is it a goal that if some third party package displays a message which is
17860 one we know how to translate, then we translate it? I think this is a
17861 worthy goal. It remains to be seen how well it will work in practice.
17862
17863 So, we should endeavor to minimize the impact on the lisp code. Certain
17864 primitive lisp routines (the stuff in lisp/prim/, and especially in
17865 cmdloop.el and minibuf.el) may need to be changed to know about translation,
17866 but that's an ideologically clean thing to do because those are considered
17867 a part of the emacs substrate.
17868
17869 However, if we find ourselves wanting to make changes to, say, RMAIL, then
17870 something has gone wrong. (Except to do things like remove assumptions
17871 about the order of words within a sentence, or how pluralization works.)
17872
17873 There are two parts to the task of displaying translated strings to the
17874 user: the first is to extract the strings which need to be translated from
17875 the sources; and the second is to make some call which will translate those
17876 strings before they are presented to the user.
17877
17878 The old way was to use the same form to do both, that is, @code{GETTEXT()} was both
17879 the tag that we searched for to build a catalog, and was the form which did
17880 the translation. The new plan is to separate these two things more: the
17881 tags that we search for to build the catalog will be stuff that was in there
17882 already, and the translation will get done in some more centralized, lower
17883 level place.
17884
17885 This program (make-msgfile.c) addresses the first part, extracting the
17886 strings.
17887
17888 For the emacs C code, we need to recognize the following patterns:
17889
17890 @example
17891 message ("string" ... )
17892 error ("string")
17893 report_file_error ("string" ... )
17894 signal_simple_error ("string" ... )
17895 signal_simple_error_2 ("string" ... )
17896
17897 build_translated_string ("string")
17898 #### add this and use it instead of @code{build_string()} in some places.
17899
17900 yes_or_no_p ("string" ... )
17901 #### add this instead of funcalling Qyes_or_no_p directly.
17902
17903 barf_or_query_if_file_exists #### restructure this
17904 check all callers of Fsignal #### restructure these
17905 signal_error (Qerror ... ) #### change all of these to @code{error()}
17906
17907 And we also parse out the @code{interactive} prompts from @code{DEFUN()} forms.
17908
17909 #### When we've got a string which is a candidate for translation, we
17910 should ignore it if it contains only format directives, that is, if
17911 there are no alphabetic characters in it that are not a part of a `%'
17912 directive. (Careful not to translate either "%s%s" or "%s: ".)
17913 @end example
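
The filter described in the last note of the example above could be written roughly as follows. This is a sketch, not the code in make-msgfile: the function name is made up, and it assumes printf-style directives (optional flags, width, precision and length modifiers, then one conversion character).

```c
#include <ctype.h>
#include <string.h>

/* Return nonzero if S contains an alphabetic character that is not
   part of a `%' directive, i.e. if S is worth translating. */
static int has_translatable_text(const char *s)
{
    while (*s) {
        if (*s == '%') {
            s++;
            if (*s == '%') {        /* literal percent sign */
                s++;
                continue;
            }
            /* flags, field width, precision */
            while (*s && strchr("-+ #0123456789.*", *s))
                s++;
            /* length modifiers */
            while (*s && strchr("hlLqjzt", *s))
                s++;
            if (*s)
                s++;                /* the conversion character itself */
            continue;
        }
        if (isalpha((unsigned char)*s))
            return 1;
        s++;
    }
    return 0;
}
```

With this test, neither "%s%s" nor "%s: " would be put in the catalog, while "Delete %s? " would.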
17914
17915 For the emacs Lisp code, we need to recognize the following patterns:
17916
17917 @example
17918 (message "string" ... )
17919 (error "string" ... )
17920 (format "string" ... )
17921 (read-from-minibuffer "string" ... )
17922 (read-shell-command "string" ... )
17923 (y-or-n-p "string" ... )
17924 (yes-or-no-p "string" ... )
17925 (read-file-name "string" ... )
17926 (temp-minibuffer-message "string")
17927 (query-replace-read-args "string" ... )
17928 @end example
17929
17930 I expect there will be a lot like the above; basically, any function which
17931 is a commonly used wrapper around an eventual call to @code{message} or
17932 @code{read-from-minibuffer} needs to be recognized by this program.
17933
17934
17935 @example
17936 (dgettext "domain-name" "string") #### do we still need this?
17937
17938 things that should probably be restructured:
17939 @code{princ} in cmdloop.el
17940 @code{insert} in debug.el
17941 face-interactive
17942 help.el, syntax.el all messed up
17943 @end example
17944
17945 ben: (format) is a tricky case. If I use format to create a string
17946 that I then send to a file, I probably don't want the string translated.
17947 On the other hand, if the string gets used as an argument to (y-or-n-p)
17948 or some such function, I do want it translated, and it needs to be
17949 translated before the %s and such are replaced. The proper solution
17950 here is for (format) and other functions that call gettext but don't
17951 immediately output the string to the user to add the translated (and
17952 formatted) string as a string property of the object, and have
17953 functions that output potentially translated strings look for a
17954 "translated string" property. Of course, this will fail if someone
17955 does something like
17956
17957 @example
17958 (y-or-n-p (concat (if you-p "Do you " "Does he ")
17959 (format "want to delete %s? " filename)))
17960 @end example
17961
17962 But you shouldn't be doing things like this anyway.
17963
17964 ben: Also, to avoid excessive translating, strings should be marked
17965 as translated once they get translated, and further calls to gettext
17966 don't do any more translating. Otherwise, a call like
17967
17968 @example
17969 (y-or-n-p (format "Delete %s? " filename))
17970 @end example
17971
17972 would cause translation on both the pre-formatted and post-formatted
17973 strings, which could lead to weird results in some cases (y-or-n-p
17974 has to translate its argument because someone could pass a string to
17975 it directly). Note that the "translating too much" solution outlined
17976 below could be implemented by just marking all strings that don't
17977 come from a .el or .elc file as already translated.
17978
17979 Menu descriptors: one way to extract the strings in menu labels would be
17980 to teach this program about "^(defvar .*menu\n" forms; that's probably
17981 kind of hard, though, so perhaps a better approach would be to make this
17982 program recognize lines of the form
17983
17984 @example
17985 "string" ... ;###translate
17986 @end example
17987
17988 where the magic token ";###translate" on a line means that the string
17989 constant on this line should go into the message catalog. This is analogous
17990 to the magic ";###autoload" comments, and to the magic comments used in the
17991 EPSF structuring conventions.
17992
17993 -----
17994 So this program manages to build up a catalog of strings to be translated.
17995 To address the second part of the problem, of actually looking up the
17996 translations, there are hooks in a small number of low level places in
17997 emacs.
17998
17999 Assume the existence of a C function gettext(str) which returns the
18000 translation of @var{str} if there is one, otherwise returns @var{str}.
18001
18002 @itemize @bullet
18003 @item
18004 @code{message()} takes a char* as its argument, and always filters it through
18005 @code{gettext()} before displaying it.
18006
18007 @item
18008 errors are printed by running the lisp function @code{display-error} which
18009 doesn't call @code{message} directly (it princ's to streams), so it must be
18010 carefully coded to translate its arguments. This is only a few lines
18011 of code.
18012
18013 @item
18014 @code{Fread_minibuffer_internal()} is the lowest level interface to all minibuf
18015 interactions, so it is responsible for translating the value that will go
18016 into Vminibuf_prompt.
18017
18018 @item
18019 Fpopup_menu filters the menu titles through @code{gettext()}.
18020
18021 The above take care of 99% of all messages the user ever sees.
18022
18023 @item
18024 The lisp function temp-minibuffer-message translates its arg.
18025
18026 @item
18027 query-replace-read-args is funny; it does
18028 (setq from (read-from-minibuffer (format "%s: " string) ... ))
18029 (setq to (read-from-minibuffer (format "%s %s with: " string from) ... ))
18030 @end itemize
18031
18032 What should we do about this? We could hack query-replace-read-args to
18033 translate its args, but might this be a more general problem? I don't
18034 think we ought to translate all calls to format. We could just change
18035 the calling sequence, since this is odd in that the first %s wants to be
18036 translated but the second doesn't.
18037
18038 Solving the "translating too much" problem:
18039
18040 The concern has been raised that in this situation:
18041
18042 @itemize @bullet
18043 @item
18044 "Help" is a string for which we know a translation;
18045 @item
18046 someone visits a file called Help, and someone does something
18047 contrived like (error buffer-file-name)
18048 @end itemize
18049
18050 then we would display the translation of Help, which would not be correct.
18051 We can solve this by adding a bit to Lisp_String objects which identifies
18052 them as having been read as literal constants from a .el or .elc file (as
18053 opposed to having been constructed at run time as it would in the above
18054 case.) To solve this:
18055
18056 @example
18057 - @code{Fmessage()} takes a lisp string as its first argument.
18058 If that string is a constant, that is, was read from a source file
18059 as a literal, then it calls @code{message()} with it, which translates.
18060 Otherwise, it calls @code{message_no_translate()}, which does not translate.
18061
18062 - @code{Ferror()} (actually, @code{Fsignal()} when condition is Qerror) works similarly.
18063 @end example
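
A toy model of this rule in C, with loud caveats: the structure below is only a stand-in for a Lisp_String carrying the proposed bit, the catalog entries are invented, and toy_gettext merely plays the role of the assumed gettext(str) function.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a Lisp string; LITERAL models the proposed bit saying
   the string was read as a constant from a .el or .elc file. */
struct lstring {
    const char *data;
    int literal;
};

/* Toy catalog lookup: returns the translation of MSG if there is one,
   otherwise MSG itself. */
static const char *toy_gettext(const char *msg)
{
    static const char *const catalog[][2] = {
        { "Help", "Hilfe" },
        { "Quit", "Beenden" },
    };
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
        if (strcmp(catalog[i][0], msg) == 0)
            return catalog[i][1];
    return msg;
}

/* The text a message function would display: translate only strings
   that came from source code as literals. */
static const char *message_text(struct lstring s)
{
    return s.literal ? toy_gettext(s.data) : s.data;
}
```

In this model a buffer file name that happens to be "Help" is built at run time, has the bit clear, and is displayed untranslated.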
18064
18065 More specifically, we do:
18066
18067 @quotation
18068 Scan specified C and Lisp files, extracting the following messages:
18069
18070 @example
18071 C files:
18072 GETTEXT (...)
18073 DEFER_GETTEXT (...)
18074 DEFUN interactive prompts
18075 Lisp files:
18076 (gettext ...)
18077 (dgettext "domain-name" ...)
18078 (defer-gettext ...)
18079 (interactive ...)
18080 @end example
18081
18082 The arguments given to this program are all the C and Lisp source files
18083 of GNU Emacs. .el and .c files are allowed. There is no support for .elc
18084 files at this time, but they may be specified; the corresponding .el file
18085 will be used. Similarly, .o files can also be specified, and the corresponding
18086 .c file will be used. This helps the makefile pass the correct list of files.
18087
18088 The results, which go to standard output or to a file specified with -a or -o
18089 (-a to append, -o to start from nothing), are quoted strings wrapped in
18090 gettext(...). The results can be passed to xgettext to produce a .po message
18091 file.
18092
18093 However, we also need to do the following:
18094
18095 @enumerate
18096 @item
18097 Definition of Arg below won't handle a generalized argument
18098 as might appear in a function call. This is fine for DEFUN
18099 and friends, because only simple arguments appear there; but
18100 it might run into problems if Arg is used for other sorts
18101 of functions.
18102 @item
18103 @code{snarf()} should be modified so that it doesn't output null
18104 strings and non-textual strings (see the comment at the top
18105 of make-msgfile.c).
18106 @item
18107 parsing of (insert) should snarf all of the arguments.
18108 @item
18109 need to add set-keymap-prompt and deal with gettext of that.
18110 @item
18111 parsing of arguments should snarf all strings anywhere within
18112 the arguments, rather than just looking for a string as the
18113 argument. This allows if statements as arguments to get parsed.
18114 @item
18115 @code{begin_paren_counting()} et al. should handle recursive entry.
18116 @item
18117 handle set-window-buffer and other such functions that take
18118 a buffer as the other-than-first argument.
18119 @item
18120 there is a fair amount of work to be done on the C code.
18121 Look through the code for #### comments associated with
18122 '#ifdef I18N3' or with an I18N3 nearby.
18123 @item
18124 Deal with @code{get-buffer-process} et al.
18125 @item
18126 Many of the changes in the Lisp code marked
18127 'rewritten for I18N3 snarfing' should be undone once (5) is
18128 implemented.
18129 @item
18130 Go through the Lisp code in prim and make sure that all
18131 strings are gettexted as necessary. This may reveal more
18132 things to implement.
18133 @item
18134 Do the equivalent of (8) for the Lisp code.
18135 @item
18136 Deal with parsing of menu specifications.
18137 @end enumerate
18138 @end quotation
18139
18140 @node Future Work -- Lisp Stream API, Future Work -- Multiple Values, Future Work -- Byte Code Snippets, Future Work
18141 @section Future Work -- Lisp Stream API
18142 @cindex future work, Lisp stream API
18143 @cindex Lisp stream API, future work
18144
Expose XEmacs internal lstreams to Lisp as stream objects. (In
addition to the functions given below, each stream object has
properties that can be associated with it using the standard @code{put},
@code{get}, etc. API. For GNU Emacs, where @code{put} and @code{get} have
not been extended to be general property functions but work only on
symbols, we would have to create functions @code{set-stream-property},
@code{stream-property}, @code{remove-stream-property}, and
@code{stream-properties}. These provide the same functionality as the
generic @code{get}, @code{put}, @code{remprop}, and @code{object-plist}
functions under XEmacs.)
18154
18155 (Implement properties using a hash table, and @strong{generalize} this so
18156 that it is extremely easy to add a property interface onto any kind
18157 of object)
18158
18159 @example
18160 (write-stream STREAM STRING)
18161 @end example
18162
18163 Write the STRING to the STREAM. This will signal an error if all the
18164 bytes cannot be written.
18165
18166 @example
18167 (read-stream STREAM &optional N SEQUENCE)
18168 @end example
18169
Reads data from STREAM. N specifies the number of bytes or
characters to read, depending on the stream. SEQUENCE specifies where
to store the data. If N is not specified, data is read until end of
file. If SEQUENCE is not specified, the data is returned as a string.
If SEQUENCE is specified, it must be large enough to hold the data.
18176
18177 @example
18178 (push-stream-marker STREAM)
18179 @end example
18180
Returns an ID, probably a stream marker object.
18182
18183 @example
18184 (pop-stream-marker STREAM)
18185 @end example
18186
Backs up the stream to the last marker.
18188
18189 @example
18190 (unread-stream STREAM STRING)
18191 @end example
18192
STREAM must be an input stream. The data in STRING is pushed back
and will be read ahead of all other data. In general, there is no
limit to the amount of data that can be unread or the number of times
that unread-stream can be called before another read.
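
One way to get the pushback behavior just described is a small buffer
that is consulted before the underlying source. The following C sketch
is only an illustration: @code{struct pushback}, @code{pb_unread} and
@code{pb_getc} are made-up names, not existing lstream functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pushback buffer.  Bytes are stored in reverse order, so the
   next byte to be read is always data[len - 1]. */
struct pushback {
  unsigned char *data;
  size_t len;
  size_t cap;
};

/* Push LEN bytes of S back so that they are read ahead of all other data.
   Returns 0 on success, -1 if memory cannot be allocated. */
static int pb_unread(struct pushback *pb, const char *s, size_t len)
{
  assert(pb != NULL && s != NULL);
  if (pb->len + len > pb->cap) {
    size_t cap = pb->cap ? pb->cap : 16;
    unsigned char *p;
    while (cap < pb->len + len)
      cap *= 2;
    p = realloc(pb->data, cap);
    if (p == NULL)
      return -1;
    pb->data = p;
    pb->cap = cap;
  }
  /* Store the string backwards so that s[0] ends up on top. */
  for (size_t i = len; i > 0; i--)
    pb->data[pb->len++] = (unsigned char) s[i - 1];
  return 0;
}

/* Return the next pushed-back byte, or -1 when nothing is pending and the
   caller should fall through to the underlying stream. */
static int pb_getc(struct pushback *pb)
{
  if (pb->len == 0)
    return -1;
  return pb->data[--pb->len];
}
```

Because the bytes are stored backwards, a later unread lands ahead of an
earlier one, and the buffer grows on demand, so repeated calls before the
next read are fine.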
18198
18199 @example
18200 (stream-available-chars STREAM)
18201 @end example
18202
This returns the number of characters (or bytes) that can definitely
be read from the stream without an error. This can be useful, for
example, when dealing with non-blocking streams, where an attempt to
read too much data will result in a blocking error.
18207
18208 @example
18209 (stream-seekable-p STREAM)
18210 @end example
18211
18212 Returns true if the stream is seekable. If false, operations such as
18213 seek-stream and stream-position will signal an error. However, the
18214 functions set-stream-marker and seek-stream-marker will still succeed
18215 for an input stream.
18216
18217 @example
18218 (stream-position STREAM)
18219 @end example
18220
18221 If STREAM is a seekable stream, returns a position which can be passed
18222 to seek-stream.
18223
18224 @example
18225 (seek-stream STREAM N)
18226 @end example
18227
18228 If STREAM is a seekable stream, move to the position indicated by N,
18229 otherwise signal an error.
18230
18231 @example
18232 (set-stream-marker STREAM)
18233 @end example
18234
18235 If STREAM is an input stream, create a marker at the current position,
18236 which can later be moved back to. The stream does not need to be a
18237 seekable stream. In this case, all successive data will be buffered
18238 to simulate the effect of a seekable stream. Therefore use this
18239 function with care.
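
To make the cost of a marker on a non-seekable stream concrete, here is
a rough C sketch of the buffering involved. Everything in it is assumed
for illustration (the @code{marked_input} structure, the @code{mi_}
functions, the fixed 256-byte capacity, and a C string standing in for
the non-seekable source); a real implementation would grow the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REPLAY_MAX 256   /* fixed capacity keeps the sketch short */

struct marked_input {
  const char *src;          /* stand-in for a non-seekable source */
  size_t src_pos;           /* only ever moves forward */
  int recording;            /* nonzero once a marker has been set */
  char replay[REPLAY_MAX];  /* data read since the marker */
  size_t replay_len;
  size_t replay_pos;        /* next byte to replay; == replay_len when caught up */
};

/* Create a marker at the current position. */
static void mi_set_marker(struct marked_input *in)
{
  /* Keep only the bytes that were buffered but not yet re-read. */
  size_t pending = in->replay_len - in->replay_pos;
  memmove(in->replay, in->replay + in->replay_pos, pending);
  in->replay_len = pending;
  in->replay_pos = 0;
  in->recording = 1;
}

/* Read one byte, or return -1 at end of input. */
static int mi_getc(struct marked_input *in)
{
  char c;
  if (in->replay_pos < in->replay_len)
    return (unsigned char) in->replay[in->replay_pos++];
  c = in->src[in->src_pos];
  if (c == '\0')
    return -1;
  in->src_pos++;
  if (in->recording) {
    /* Every byte read after the marker has to be remembered. */
    assert(in->replay_len < REPLAY_MAX);
    in->replay[in->replay_len++] = c;
    in->replay_pos = in->replay_len;
  }
  return (unsigned char) c;
}

/* Move back to the marker; later reads replay the buffered bytes first. */
static void mi_seek_marker(struct marked_input *in)
{
  assert(in->recording);
  in->replay_pos = 0;
}
```

The replay buffer keeps growing for as long as the marker exists, which
is why the function should be used with care.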
18240
18241 @example
(seek-stream-marker STREAM MARKER)
18243 @end example
18244
Move the stream back to the position that was stored in the marker
object. (This is generally an opaque object of type stream-marker.)
18247
18248 @example
18249 (delete-stream-marker MARKER)
18250 @end example
18251
18252 Destroy the stream marker and if the stream is a non-seekable stream
18253 and there are no other stream markers pointing to an earlier position,
18254 frees up some buffering information.
18255
18256 @example
18257 (delete-stream STREAM N)
18258 @end example
18259
18260 @example
18261 (delete-stream-marker STREAM ID)
18262 @end example
18263
18264 @example
(close-stream STREAM)
18266 @end example
18267
18268 Writes any remaining data to the stream and closes it and the object
18269 to which it's attached. This also happens automatically when the
18270 stream is garbage collected.
18271
18272 @example
18273 (getchar-stream STREAM)
18274 @end example
18275
Return a single character from the stream. (This may be a single byte,
depending on the nature of the stream.) This is actually a macro with
an extremely efficient implementation (as efficient as you can get in
Emacs Lisp), so that it can be used without fear in a loop. The
implementation works by reading a large amount of data into a vector
and then simply using @code{aref} to read characters one by one
from the vector. Because @code{aref} is one of the primitives handled
specially by the byte interpreter, this will be very efficient. The
actual implementation may in fact use
@code{call-with-condition-handler} to avoid the necessity of checking
for overflow. A typical implementation fetches the vector containing
the characters, as well as the index into that vector, as stream
properties. It then retrieves the character, increments the index,
and stores the index back in the stream. As a first implementation, we
check when reading a character whether the index would be out of
range. If so, we read another 4096 characters, storing them into the
same vector, set the index back to the beginning, and then proceed
with the rest of the getchar algorithm.
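
The proposal is for an Emacs Lisp macro, but the refill logic of the
"first implementation" is easy to show in C. This is a sketch under
assumptions: @code{struct char_reader}, @code{cr_refill} and
@code{cr_getchar} are invented names, the array plays the role of the
vector, and an in-memory block stands in for the underlying stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4096   /* refill size mentioned in the text */

struct char_reader {
  const char *src;   /* stand-in for the underlying stream */
  size_t src_len;
  size_t src_pos;
  char buf[CHUNK];   /* plays the role of the vector */
  size_t fill;       /* number of valid entries in buf */
  size_t index;      /* next entry to hand out */
};

/* Stand-in for read-stream: copy up to CHUNK bytes into buf and reset
   the index.  Returns the number of bytes now available. */
static size_t cr_refill(struct char_reader *r)
{
  size_t n;
  assert(r->src_pos <= r->src_len);
  n = r->src_len - r->src_pos;
  if (n > CHUNK)
    n = CHUNK;
  memcpy(r->buf, r->src + r->src_pos, n);
  r->src_pos += n;
  r->fill = n;
  r->index = 0;
  return n;
}

/* Return the next character, or -1 at end of stream.  The common case is
   one comparison and one array reference. */
static int cr_getchar(struct char_reader *r)
{
  if (r->index >= r->fill && cr_refill(r) == 0)
    return -1;
  return (unsigned char) r->buf[r->index++];
}
```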
18294
18295 @example
18296 (putchar-stream STREAM CHAR)
18297 @end example
18298
18299 This is similar to getchar-stream but it writes data instead of
18300 reading data.
18301
18302 @example
18303 Function make-stream
18304 @end example
18305
18306 There are actually two stream-creation functions, which are:
18307
18308 @example
18309 (make-input-stream TYPE PROPERTIES)
18310 (make-output-stream TYPE PROPERTIES)
18311 @end example
18312
18313 These can be used to create a stream that reads data, or writes data,
18314 respectively. PROPERTIES is a property list and the allowable
18315 properties in it are defined by the type. Possible types are:
18316
18317 @enumerate
18318 @item
18319 @code{file} (this reads data from a file or writes to a file)
18320
18321 Allowable properties are:
18322
18323 @table @code
18324 @item :file-name
18325 (the name of the file)
18326
18327 @item :create
18328 (for output streams only, creates the file if it doesn't
18329 already exist)
18330
18331 @item :exclusive
18332 (for output streams only, fails if the file already
18333 exists)
18334
18335 @item :append
18336 (for output streams only; starts appending to the end
18337 of the file rather than overwriting the file)
18338
18339 @item :offset
(position in bytes in the file where reading or writing
should begin. If unspecified, defaults to the beginning of the
file, or to the end of the file when @code{:append} is specified)
18343
18344 @item :count
18345 (for input streams only, the number of bytes to read from
18346 the file before signaling "end of file". If nil or omitted, the
18347 number of bytes is unlimited)
18348
18349 @item :non-blocking
18350 (if true, reads or writes will fail if the operation
18351 would block. This only makes sense for non-regular files).
18352 @end table
18353
18354 @item
18355 @code{process} (For output streams only, send data to a process.)
18356
18357 Allowable properties are:
18358
18359 @table @code
18360 @item :process
18361 (the process object)
18362 @end table
18363
18364 @item
18365 @code{buffer} (Read from or write to a buffer.)
18366
18367 Allowable properties are:
18368
18369 @table @code
18370 @item :buffer
18371 (the name of the buffer or the buffer object.)
18372
18373 @item :start
18374 (the position to start reading from or writing to. If nil,
18375 use the buffer point. If true, use the buffer's point and move
18376 point beyond the end of the data read or written.)
18377
18378 @item :end
18379 (only for input streams, the position to stop reading at. If
18380 nil, continue to the end of the buffer.)
18381
18382 @item :ignore-accessible
(if true, the defaults for :start and :end
ignore any narrowing of the buffer.)
18385 @end table
18386
18387 @item
18388 @code{stream} (read from or write to a lisp stream)
18389
18390 Allowable properties are:
18391
18392 @table @code
18393 @item :stream
18394 (the stream object)
18395
18396 @item :offset
(the position at which to begin reading or writing)
18398
18399 @item :length
(For input streams only, the amount of data to read,
defaulting to the rest of the data in the string. Revise string
for output streams only: if true, the stream is resized as
necessary to accommodate data written off the end; otherwise the
writes will fail.)
18405 @end table
18406
18407 @item
18408 @code{memory} (For output only, writes data to an internal memory
18409 buffer. This is more lightweight than using a Lisp buffer. The
18410 function memory-stream-string can be used to convert the memory
18411 into a string.)
18412
18413 @item
18414 @code{debugging} (For output streams only, write data to the debugging
18415 output.)
18416
18417 @item
@code{stream-device} (During non-interactive invocations only, read
from or write to the initial stream terminal device.)
18420
18421 @item
18422 @code{function} (For output streams only, send data by calling a
18423 function, exactly as with the STREAM argument to the print
18424 primitive.)
18425
Allowable properties are:
18427
18428 @table @code
18429 @item :function
18430 (the function to call. The function is called with one
18431 argument, the stream.)
18432 @end table
18433
18434 @item
18435 @code{marker} (Write data to the location pointed to by a marker and
18436 move the marker past the data.)
18437
18438 Allowable properties are:
18439
18440 @table @code
18441 @item :marker
18442 (the marker object.)
18443 @end table
18444
18445 @item
18446 @code{decoding} (As an input stream, reads data from another stream and
18447 decodes it according to a coding system. As an output stream
18448 decodes the data written to it according to a coding system and
18449 then writes results in another stream.)
18450
18451 Properties are:
18452
18453 @table @code
18454 @item :coding-system
(the symbol or coding system object that defines the
decoding.)
18457
18458 @item :stream
18459 (the stream on the other end.)
18460 @end table
18461
18462 @item
18463 @code{encoding} (As an input stream, reads data from another stream and
18464 encodes it according to a coding system. As an output stream
18465 encodes the data written to it according to a coding system and
18466 then writes results in another stream.)
18467
18468 Properties are:
18469
18470 @table @code
18471 @item :coding-system
(the symbol or coding system object that defines the
encoding.)
18474
18475 @item :stream
18476 (the stream on the other end.)
18477 @end table
18478 @end enumerate
18479
18480 Consider
18481
18482 @example
18483 (define-stream-type 'type
18484 :read-function
18485 :write-function
18486 :rewind-
18487 :seek-
18488 :tell-
18489 (?:buffer)
18490 @end example
18491
18492 Old Notes:
18493
18494 Expose lstreams as hash (put get etc. properties) table.
18495
18496 @example
18497 (write-stream stream string)
18498 (read-stream stream &optional n sequence)
18499 (make-stream ...)
18500 (push-stream-marker stream)
18501 returns ID prob a stream marker object
18502 (pop-stream-marker stream)
18503 backs up stream to last marker
18504 (unread-stream stream string)
18505 (stream-available-chars stream)
18506 (seek-stream stream n)
18507 (delete-stream stream n)
18508 (delete-stream-marker stream ic) can always be poe only nested if you
18509 have set stream marker
18510
18511 (get-char-stream @strong{generalizes} stream)
18512
18513 a macro that tries to be efficient perhaps by reading the next
18514 e.g. 512 characters into a vector and arefing them. Might check aref
18515 optimization for vectors in the byte interpreter.
18516
18517 (make-stream 'process :process ... :type write)
18518
18519 Consider
18520
18521 (define-stream-type 'type
18522 :read-function
18523 :write-function
18524 :rewind-
18525 :seek-
18526 :tell-
18527 (?:buffer)
18528 @end example
18529
18530 @node Future Work -- Multiple Values, Future Work -- Macros, Future Work -- Lisp Stream API, Future Work
18531 @section Future Work -- Multiple Values
18532 @cindex future work, multiple values
18533 @cindex multiple values, future work
18534
At the low level, all funs that can return multiple values are defined
with @code{DEFUN_MULTIPLE_VALUES} and have an extra parameter, a
@code{struct mv_context *}.

It has to be this way to ensure that only the fun itself, and none of
the funs it calls, thinks it is being called in an mv context.
18541
18542 apply, funcall, eval might propagate their mv context to their
18543 children?
18544
18545 Might need eval-mv to implement calling a fun in an mv context. Maybe
18546 also funcall_mv? apply_mv?
18547
Generally, just set up the context appropriately. Call the fun (noticing
whether it's an mv-aware fun), binding values on the way back or
passing them out (e.g. to multiple-value-bind).
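
A rough C sketch of the extra-parameter idea follows. The text names
only the type @code{struct mv_context}; its fields here, and the
functions @code{divide_mv} and @code{half}, are invented for
illustration, and plain @code{long} stands in for a Lisp object.

```c
#include <assert.h>
#include <stddef.h>

#define MV_MAX 8

/* Hypothetical layout; only the type name comes from the notes. */
struct mv_context {
  int expected;        /* how many values the caller wants */
  int count;           /* how many values the callee supplied */
  long extra[MV_MAX];  /* values other than the first */
};

/* An mv-aware function: the first value is returned normally, the others
   go through the context, and only if the caller asked for them. */
static long divide_mv(long a, long b, struct mv_context *mv)
{
  assert(b != 0);
  if (mv != NULL) {
    mv->count = 1;
    if (mv->expected > 1) {
      mv->extra[0] = a % b;
      mv->count = 2;
    }
  }
  return a / b;
}

/* A function that is not mv-aware passes no context down, so the callee
   can never mistake its caller's context for its own. */
static long half(long a)
{
  return divide_mv(a, 2, NULL);
}
```

Passing the context explicitly, rather than keeping it in a global, is
what guarantees that only the function itself sees it.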
18551
18552 @subheading Common Lisp multiple values, required for specifier improvements.
18553
18554 The multiple return values from get-specifier should allow the
18555 specifier value to be modified in the correct fashion (i.e. should
18556 interact correctly with all manner of changes from other callers)
18557 using set-specifier. We should check this and see if we need other
18558 return values. (how-to-add? inst-list?)
18559
In C, call multiple-values-context to get the number of expected values,
and multiple-value-set (#, value) to set values other than the first.

(Returns Qno_value, or something, if there are no values.)
18564
18565 #### Or should throw? Probably not.
18566 #### What happens if a fn returns no values but the caller expects a
18567 #### value?
18568
18569 Something like @code{funcall_with_multiple_values()} for setting up the
18570 context.
18571
18572 For efficiency, byte code could notice Ffuncall to m.v. functions and
18573 sub in special opcodes during load in processing, if it mattered.
18574
18575 @node Future Work -- Macros, Future Work -- Specifiers, Future Work -- Multiple Values, Future Work
18576 @section Future Work -- Macros
18577 @cindex future work, macros
18578 @cindex macros, future work
18579
18580 @enumerate
18581 @item
18582 Option to control whether beep really kills a macro execution.
18583 @item
Recently defined macros are remembered on a stack, so accidentally
defining another one doesn't wipe out the one you wanted. You can "rotate"
18586 anonymous macros or just pick one (numbered) to put on tags, so it
18587 works with execute macro - menu shows the anonymous macro, and
18588 lists some keystrokes. Normally numbered but you can easily assign
18589 to named fun or to keyboard sequence or give it a number (or give
18590 it a letter accelerator?)
18591 @end enumerate
18592
18593 @node Future Work -- Specifiers, Future Work -- Display Tables, Future Work -- Macros, Future Work
18594 @section Future Work -- Specifiers
18595 @cindex future work, specifiers
18596 @cindex specifiers, future work
18597
18598 @subheading Ideas To Work On When Their Time Has Come
18599
18600 @itemize
18601 @item
18602 specifier-instance returns additional params (multiple-value) - the instantiator
18603 used, the associated tag set, the locale found in, a code that can
18604 be passed in as an additional param RESTART to restart an
18605 instantiation process, e.g. to allow an instantiator to "inherit"
18606 from another one higher up. Also, domain can be 'global (look only
18607 in global specs) or "complex" - a list of the actual locales to look
18608 in (e.g. a buffer - frame - a device - 'global)
18609
18610 @item
18611 pragmatic-specifier-domain (locale)
18612 Converts a locale into a domain in a way that's "pragmatic" - does
18613 what most users expect will happen, but is not clean. In
18614 particular, handling of "buffer" requires trickiness, as mentioned
18615 before.
18616
18617 @item
18618 ensure-instantiator-exists (specifier locale)
18619 Ensures an actual instantiator exists in a locale, so that it can
18620 later be futzed with. If none exists, one is constructed by first
calling pragmatic-specifier-domain and then specifier-instance and
18622 fetching out the instantiator for this call.
18623
18624 @item
18625 map-modifying-instantiators (specifier fun &optional locale tag-set)
18626 Same args as map-specifier, but use the return value from the fun to
18627 replace the instantiator. Called with three args (instantiator
18628 locale tag-set)
18629
18630 @item
18631 map-modifying-instantiators-force (specifier fun &optional locale tag-set)
18632 Same as previous, but calls ensure-instantiator-exists on each
18633 locale before processing.
18634 @end itemize
18635
18636 NOTE: Can do preliminary implementation without Multiple Values -
18637 instead create fun specifier-instance - that returns a list (and will
18638 be deleted at some point)
18639
18640 @subheading specifier &c changes for glyphs
18641
18642 @enumerate
18643 @item
18644 @itemize @bullet
18645 @item
18646 resizable vectors with funs to insert, delete elements (elements
18647 shift accordingly)
18648 @item
18649 gap array vectors as an implementation of resizing vectors.
18650 @end itemize
18651
18652 @item
18653 You can @code{put} @code{get}, etc. on vectors to modify properties within
18654 them.
18655
18656 @item
18657 copy-over routines
18658 routines that carefully copy one complex item OVER another one,
18659 destroying the second in the process. I wrote one for lists. Need
18660 a general copy-over-tree.
18661
18662 @item
18663 improvement to specifier mapping routines e.g.
18664
18665 map-modifying-instantiator and its force versions below, so that we
18666 could implement in turns.
18667
18668 @item
put-specifier-property (specifier key value &opt locale tag-set)
which finds the instantiator in the locale, possibly creating one
if necessary, and goes into the vector, changes it, and
puts it back into the specifier.
18673
18674 @item
18675 Smarter add-spec-to-specifier
18676
18677 If it notices that it's just replacing one instantiator with
18678 another, instead of just copy-tree the first one and throw away the
18679 other, use copy-over-tree to save lots of garbage when repeatedly
18680 called.
18681
18684 @item
18685 When at image instantiate:
18686 @itemize @bullet
18687 @item
18688 Some properties in the instantiators could be implemented through
18689 dynamically modifying an existing image instance (e.g. when the
18690 value of a slider or progress bar or text in a text field
18691 changes). So when we hash, we only hash the part of the
18692 instantiator that cannot be dynamically modified (We might need
18693 to do something tricky here - allowing a :key property in hash
18694 tables or @strong{ILLEGIBLE}). Anyway, so we need to generate an image
18695 instance, and we mask off the dynamic properties and look up in
18696 our hash table, and we get something back! But is it ours to
18697 modify? (We already checked to see it wasn't exactly the same
18698 dynamic properties that it had) Thus ---
18699 @end itemize
18700
18701 @item
18702 Reference counting. Somehow or other, each image instance in the
18703 cache needs to keep track of the instantiators that generated it.
18704 @end enumerate
18705
18706 It might do this through some sort of special instantiator-reference
18707 object. This points to the instantiator, where in the hierarchy the
18708 instantiator is etc. When an instantiator gets removed, this
18709 gu*ILLEGIBLE* values report not attached. Somehow that gets
18710 communicated back to the image instance in the cache. So somehow or
18711 other, the image instance in the cache knows who's using them and so
18712 when you go and keep updating the slider value, by simply modifying an
18713 instantiator, which efficiently changes the internal structure of this
18714 specifier - eventually image instantiate notices that the image
instance it points to has no other user and just modifies it, but in
18716 complex situations, some optimizations get lost, but everything is
18717 still correct.
18718
18719 vs.
18720
18721 Andy's set-image-instance-property, which achieves the same
18722 optimizations much more easily, but
18723
18724 @enumerate
18725 @item
18726 falls apart in any more complicated system
18727
18728 @item
18729 only works because of the way the caching system in XEmacs works.
18730 Any change (e.g. @strong{ILLEGIBLE} more of making the caches GQ instead
18731 of GQ) is likely to make things stop working right in all but the
18732 simplest situation.
18733 @end enumerate
18734
18735 @subheading Specifier improvements for support of specifier inheritance (necessary for the new font mapping API)
18736
18737 'Fallback should be a locale/domain.
18738
18739 @example
18740 (get-specifier specifier &optional locale)
18741
18742 #### If locale is omitted, should it be (current-buffer) or 'global?
18743 #### Should argument not be optional?
18744 @end example
18745
18746 If a buffer is specified: find a window showing buffer by looking
18747
18748 @itemize @bullet
18749 @item
18750 at selected window
18751 @item
18752 at other windows on selected frame
18753 @item
18754 at selected windows on other frames in selected device
18755 @item
18756 at other windows on ""
18757 @item
18758 at selected windows on selected frames on other devices in selected
18759 console.
18760 @item
other windows sel frame other devices sel con
@item
"" oth "" sel
@item
sel win sel frame sel dev oth con
@item
oth win sel frame sel dev oth con
@item
sel win oth frame sel dev oth con
@item
oth win oth frame sel dev oth con
@item
sel win sel frame oth dev oth con
@item
oth win sel frame oth dev oth con
@item
oth win oth frame oth dev oth con
@end itemize

If none, use buffer -> sel frame -> etc.
18781
18782 @example
18783 Returns multiple values
18784 second is instantiator
18785 third is locale containing inst.
18786 fourth is tag set
18787
18788 (restart-specifier-instance ...)
18789 @end example
18790
Like specifier-instance, but allows restarting the lookup, for
implementing inheritance, etc. Obsoletes
specifier-matching-find-charset, or whatever it is. The restart
argument is opaque, and is returned as one of the multiple values of
restart-specifier-instance. (It's actually an integer with the low
bits holding the locale and the other bits counting into the list
attached to the locale.)
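
The packing of the restart value can be sketched in C as below. The
field width, the set of locale kinds, and all of the names are
assumptions made for the example; the notes say only that the low bits
hold the locale and the remaining bits index the list attached to it.

```c
#include <assert.h>

/* Hypothetical locale kinds; the real set and order may differ. */
enum locale_kind {
  LOCALE_WINDOW, LOCALE_BUFFER, LOCALE_FRAME,
  LOCALE_DEVICE, LOCALE_CONSOLE, LOCALE_GLOBAL
};

#define LOCALE_BITS 3u                          /* assumed field width */
#define LOCALE_MASK ((1u << LOCALE_BITS) - 1u)

/* Pack a locale kind and a position in that locale's list into one
   opaque integer. */
static unsigned make_restart(enum locale_kind kind, unsigned index)
{
  assert((unsigned) kind <= LOCALE_MASK);
  assert(index <= (~0u >> LOCALE_BITS));
  return (index << LOCALE_BITS) | (unsigned) kind;
}

/* Low bits: which locale the previous lookup stopped in. */
static enum locale_kind restart_locale(unsigned restart)
{
  return (enum locale_kind) (restart & LOCALE_MASK);
}

/* Remaining bits: how far into that locale's list it got. */
static unsigned restart_index(unsigned restart)
{
  return restart >> LOCALE_BITS;
}
```

A restarted lookup would resume in @code{restart_locale} at the entry
after @code{restart_index}, then continue outward through the remaining
locales as usual.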
18798
18799 @node Future Work -- Display Tables, Future Work -- Making Elisp Function Calls Faster, Future Work -- Specifiers, Future Work
18800 @section Future Work -- Display Tables
18801 @cindex future work, display tables
18802 @cindex display tables, future work
18803
18804 #### It would also be really nice if you could specify that the
18805 characters come out in hex instead of in octal. Mule does that by
18806 adding a @code{ctl-hexa} variable similar to @code{ctl-arrow}, but
18807 that's bogus -- we need a more general solution. I think you need to
18808 extend the concept of display tables into a more general conversion
18809 mechanism. Ideally you could specify a Lisp function that converts
18810 characters, but this violates the Second Golden Rule and besides would
18811 make things way way way way slow.
18812
18813 So instead, we extend the display-table concept, which was historically
18814 limited to 256-byte vectors, to one of the following:
18815
18816 @enumerate
18817 @item
18818 A 256-entry vector, for backward compatibility;
18819 @item
18820 char-table, mapping characters to values;
18821 @item
18822 range-table, mapping ranges of characters to values;
18823 @item
18824 a list of the above.
18825 @end enumerate
18826
18827 The fourth option allows you to specify multiple display tables instead
18828 of just one. Each display table can specify conversions for some
18829 characters and leave others unchanged. The way the character gets
18830 displayed is determined by the first display table with a binding for
18831 that character. This way, you could call a function
18832 @code{enable-hex-display} that adds a hex display-table to the list of
18833 display tables for the current buffer.
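
The "first table with a binding wins" rule is simple enough to show in a
few lines of C. This is a sketch only: @code{struct display_table} here
is a made-up stand-in (a short array of bindings), not the vector,
char-table or range-table the real code would use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one display table. */
struct binding {
  int ch;
  const char *display;
};

struct display_table {
  const struct binding *bindings;
  size_t n;
};

/* Look CH up in a single table; NULL means "no binding here". */
static const char *table_lookup(const struct display_table *t, int ch)
{
  for (size_t i = 0; i < t->n; i++)
    if (t->bindings[i].ch == ch)
      return t->bindings[i].display;
  return NULL;
}

/* Walk the list in order and stop at the first table that binds CH.
   NULL means no table had a binding, so CH is displayed unchanged. */
static const char *display_lookup(const struct display_table *tables,
                                  size_t ntables, int ch)
{
  assert(tables != NULL || ntables == 0);
  for (size_t i = 0; i < ntables; i++) {
    const char *d = table_lookup(&tables[i], ch);
    if (d != NULL)
      return d;
  }
  return NULL;
}
```

Adding a table to the front of the list overrides later tables only for
the characters it actually binds.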
18834
18835 #### ...not yet implemented... Also, we extend the concept of "mapping"
18836 to include a printf-like spec. Thus you can make all extended
18837 characters show up as hex with a display table like this:
18838
18839 @example
18840 #s(range-table data ((256 524288) (format "%x")))
18841 @end example
18842
18843 Since more than one display table is possible, you have
18844 great flexibility in mapping ranges of characters.
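
As a rough idea of what a printf-like spec would mean at redisplay time,
here is a C sketch for a single range entry. The structure and function
names are hypothetical, and treating the range as closed at both ends is
an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical range entry carrying a printf-like spec. */
struct range_format {
  int lo, hi;          /* bounds, assumed inclusive */
  const char *format;  /* spec taking one unsigned argument, e.g. "%x" */
};

/* Write the display form of CH into OUT.  Returns the number of
   characters the full result needs, or -1 if CH is outside the range
   and should be displayed unchanged. */
static int render_char(const struct range_format *rf, int ch,
                       char *out, size_t size)
{
  assert(rf != NULL && out != NULL && size > 0);
  if (ch < rf->lo || ch > rf->hi)
    return -1;
  return snprintf(out, size, rf->format, (unsigned) ch);
}
```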
18845
18846 @uref{../../www.666.com/ben/default.htm,Ben Wing}
18847
18848 @node Future Work -- Making Elisp Function Calls Faster, Future Work -- Lisp Engine Replacement, Future Work -- Display Tables, Future Work
18849 @section Future Work -- Making Elisp Function Calls Faster
18850 @cindex future work, making Elisp function calls faster
18851 @cindex making Elisp function calls faster, future work
18852
18853 @strong{Abstract: }This page describes many optimizations that can be
18854 made to the existing Elisp function call mechanism without too much
18855 effort. The most important optimizations can probably be implemented
18856 with only a day or two of work. I think it's important to do this work
18857 regardless of whether we eventually decide to replace the Lisp engine.
18858
18859 Many complaints have been made about the speed of Elisp, and in
18860 particular about the slowness in executing function calls, and rightly
18861 so. If you look at the implementation of the @code{funcall} function,
18862 you'll notice that it does an incredible amount of work. Now logically,
18863 it doesn't need to be so. Let's look first from the theoretical
18864 standpoint at what absolutely needs to be done to call a Lisp function.
18865
18866 First, let's look at the situation that would exist if we were smart
18867 enough to have made lexical scoping be the default language policy. We
18868 know at compile time exactly which code can reference the variables that
18869 are the formal parameters for the function being called (specifically,
18870 only the code that is part of that function's definition) and where
18871 these references are. As a result, we can simply push all the values of
18872 the variables onto a stack, and convert all the variable references in
18873 the function definition into stack references. Therefore, binding
18874 lexically-scoped parameters in preparation for a function call involves
18875 nothing more than pushing the values of the parameters onto a stack and
18876 then setting a new value for the frame pointer, at the same time
18877 remembering the old one. Because the byte-code interpreter has a
18878 stack-based architecture, however, the parameter values have already
18879 been pushed onto the stack at the time of the function call invocation.
18880 Therefore, binding the variables involves doing nothing at all, other
18881 than dealing with the frame pointer.
18882
18883 With dynamic scoping, the situation is somewhat more complicated.
18884 Because the parameters can be referenced anywhere, and these references
18885 cannot be located at compile time, their values have to be stored into a
18886 global table that maps the name of the parameter to its current value.
18887 In Elisp, this table is called the @dfn{obarray}. Variable binding in
18888 Elisp is done using the C function @code{specbind()}. (This stands for
18889 "special variable binding" where @dfn{special} is the standard Lisp
18890 terminology for a dynamically-scoped variable.) What @code{specbind()}
18891 does, essentially, is retrieve the old value of the variable out of the
18892 obarray, remember the value by pushing it, along with the name of the
18893 variable, onto what's called the @dfn{specpdl} stack, and then store the
18894 new value into the obarray. The term "specpdl" means @dfn{Special
18895 Variable Pushdown List}, where @dfn{Pushdown List} is an archaic computer
18896 science term for a stack that used to be popular at MIT. These binding
18897 operations, however, should still not take very much time because of the
18898 use of symbols, i.e. because the location in the obarray where the
18899 variable's value is stored has already been determined (specifically, it
18900 was determined at the time that the byte code was loaded and the symbol
18901 created), so no expensive hash table lookups need to be performed.
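
Stripped of everything else, the bind and unbind operations described
above amount to the following C sketch. It is deliberately a toy: the
@code{toy_} names, the fixed-size stack and the use of @code{long} for
values are assumptions, and buffer-local variables and the other cases
the real @code{specbind()} handles are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* The value cell lives in the symbol itself, so no lookup is needed. */
struct symbol {
  const char *name;
  long value;
};

struct specpdl_entry {
  struct symbol *sym;
  long old_value;
};

#define SPECPDL_SIZE 64
static struct specpdl_entry specpdl[SPECPDL_SIZE];
static size_t specpdl_depth;

/* Remember the old value on the specpdl stack, then store the new one. */
static void toy_specbind(struct symbol *sym, long new_value)
{
  assert(specpdl_depth < SPECPDL_SIZE);
  specpdl[specpdl_depth].sym = sym;
  specpdl[specpdl_depth].old_value = sym->value;
  specpdl_depth++;
  sym->value = new_value;
}

/* Undo bindings, most recent first, until the stack is back at DEPTH. */
static void toy_unbind_to(size_t depth)
{
  while (specpdl_depth > depth) {
    specpdl_depth--;
    specpdl[specpdl_depth].sym->value = specpdl[specpdl_depth].old_value;
  }
}
```

A caller records @code{specpdl_depth} before binding its parameters and
passes that value to @code{toy_unbind_to} on the way out.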
18902
18903 An actual function invocation in Elisp does a great deal more work,
18904 however, than was just outlined above. Let's just take a look at what
18905 happens when one byte-compiled function invokes another byte-compiled
18906 function, checking for places where unnecessary work is being done and
18907 determining how to optimize these places.
18908
18909 @enumerate
18910 @item
18911
18912 The byte-compiled function's parameter list is stored in exactly the
18913 format that the programmer entered it in, which is to say as a Lisp
list, complete with @code{&optional} and @code{&rest} keywords.
This list has to be parsed for @emph{every} function invocation, which
means that for every element in a list, the element is checked to see
whether it's the @code{&optional} or @code{&rest} keyword, its
18918 surrounding cons cell is checked to make sure that it is indeed a cons
18919 cell, the @code{QUIT} macro is called, etc. What should be happening
18920 here is that the argument list is parsed exactly once, at the time that
18921 the byte code is loaded, and converted into a C array. The C array
18922 should be stored as part of the byte-code object. The C array should
18923 also contain, in addition to the symbols themselves, the number of
18924 required and optional arguments. At function call time, the C array can
18925 be very quickly retrieved and processed.
18926 @item
18927
18928 For every variable that is to be bound, the @code{specbind()} function
18929 is called. This actually does quite a lot of things, including:
18930
18931 @enumerate
18932 @item
18933
18934 Checking the symbol argument to the function to make sure it's actually
18935 a symbol.
18936 @item
18937
18938 Checking for specpdl stack overflow, and increasing its size as
18939 necessary.
18940 @item
18941
18942 Calling @code{symbol_value_buffer_local_info()} to retrieve buffer local
18943 information for the symbol, and then processing the return value from
18944 this function in a series of if statements.
18945 @item
18946
18947 Actually storing the old value onto the specpdl stack.
18948 @item
18949
18950 Calling @code{Fset()} to change the variable's value.
18951
18952 @end enumerate
18953
18954
18955 @end enumerate
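The load-time parsing proposed in the first item above can be sketched as follows. This is only an illustration under simplifying assumptions: parameter names are plain C strings rather than Lisp symbols, and the names @code{parsed_arglist}, @code{parse_arglist} and @code{arg_count_ok} are hypothetical, not taken from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 16

/* Hypothetical pre-parsed form of a parameter list (not an XEmacs type). */
struct parsed_arglist {
  const char *names[MAX_PARAMS]; /* parameter names, keywords removed */
  int n_names;                   /* number of entries in names[] */
  int n_required;                /* parameters before &optional */
  int n_optional;                /* parameters after &optional, before &rest */
  int has_rest;                  /* nonzero if a &rest parameter exists */
};

/* Parse the N tokens in RAW exactly once, e.g. when byte code is loaded.
   Returns 0 on success and -1 if the list is malformed or too long. */
int parse_arglist(const char *const *raw, int n, struct parsed_arglist *out)
{
  enum { REQUIRED, OPTIONAL, REST } state = REQUIRED;
  int i;

  memset(out, 0, sizeof *out);
  for (i = 0; i < n; i++) {
    if (strcmp(raw[i], "&optional") == 0) {
      if (state != REQUIRED) return -1;
      state = OPTIONAL;
    } else if (strcmp(raw[i], "&rest") == 0) {
      if (state == REST) return -1;
      state = REST;
    } else {
      if (out->n_names == MAX_PARAMS) return -1;
      if (state == REST && out->has_rest) return -1; /* one &rest name only */
      out->names[out->n_names++] = raw[i];
      if (state == REQUIRED) out->n_required++;
      else if (state == OPTIONAL) out->n_optional++;
      else out->has_rest = 1;
    }
  }
  if (state == REST && !out->has_rest) return -1; /* &rest without a name */
  return 0;
}

/* At call time, checking the argument count is a pair of comparisons;
   no keyword matching or list walking is needed any more. */
int arg_count_ok(const struct parsed_arglist *p, int nargs)
{
  if (nargs < p->n_required) return 0;
  if (!p->has_rest && nargs > p->n_required + p->n_optional) return 0;
  return 1;
}
```

For a parameter list such as @code{(a b &optional c &rest d)}, the parse yields two required names, one optional name and a rest parameter, and each subsequent call only consults those counts.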
18956
18957
18958
18959 The entire series of calls to @code{specbind()} should be inline and
18960 merged into the argument processing code as a single tight loop, with no
18961 function calls in the vast majority of cases. The @code{specbind()}
18962 logic should be streamlined as follows:
18963
18964 @enumerate
18965 @item
18966
18967 The symbol argument type checking is unnecessary.
18968 @item
18969
18970 The check for the specpdl stack overflow needs to be done only once, not
18971 once per argument.
18972 @item
18973
18974 All of the remaining logic should be boiled down as follows:
18975
18976 @enumerate
18977 @item
18978
18979 Retrieve the old value from the symbol's value cell.
18980 @item
18981
18982 If this value is a symbol-value-magic object, then call the real
18983 @code{specbind()} to do the work.
18984 @item
18985
18986 Otherwise, we know that nothing complicated needs to be done, so we
18987 simply push the symbol and its value onto the specpdl stack, and then
18988 replace the value in the symbol's value cell.
18989 @item
18990
18991 The only logic that we are omitting is the code in @code{Fset()} that
18992 checks to make sure a constant isn't being set. These checks should be
18993 made at the time that the byte code for the function is loaded and the C
18994 array of parameters to the function is created. (Whether a symbol is
18995 constant or not is generally known at XEmacs compile time. The only
18996 issue here is with symbols whose names begin with a colon. These
18997 symbols should simply be disallowed completely as parameter names.)
18998
18999 @end enumerate
19000
19001
19002 @end enumerate
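The streamlined binding loop described above might look roughly like the following. This is a toy model, not the real implementation: Lisp values are plain @code{long}s, a flag on the symbol stands in for "the value cell holds a symbol-value-magic object", the stack is a fixed array that reports overflow instead of growing, and every name (@code{toy_symbol}, @code{toy_specpdl}, @code{bind_parameters}, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the real data structures; all names are hypothetical. */
struct toy_symbol {
  long value;    /* models the symbol's value cell */
  int is_magic;  /* models "value cell holds a symbol-value-magic object" */
};

struct toy_binding {
  struct toy_symbol *sym;
  long old_value;
};

#define TOY_SPECPDL_MAX 64
struct toy_binding toy_specpdl[TOY_SPECPDL_MAX];
int toy_specpdl_depth = 0;
int slow_path_calls = 0;

/* Models the general-purpose specbind(), used only for the rare cases. */
void slow_specbind(struct toy_symbol *sym, long new_value)
{
  slow_path_calls++;
  toy_specpdl[toy_specpdl_depth].sym = sym;
  toy_specpdl[toy_specpdl_depth].old_value = sym->value;
  toy_specpdl_depth++;
  sym->value = new_value;
}

/* Bind N parameters in one tight loop.  Returns 0 on success, or -1 if
   the toy stack would overflow (the real stack would be enlarged). */
int bind_parameters(struct toy_symbol **syms, const long *vals, int n)
{
  int i;

  /* The overflow check is done once, not once per argument. */
  if (toy_specpdl_depth + n > TOY_SPECPDL_MAX) return -1;

  for (i = 0; i < n; i++) {
    struct toy_symbol *s = syms[i];
    if (s->is_magic) {
      slow_specbind(s, vals[i]);  /* something complicated: full logic */
      continue;
    }
    /* Common case: save the old value, then overwrite the value cell. */
    toy_specpdl[toy_specpdl_depth].sym = s;
    toy_specpdl[toy_specpdl_depth].old_value = s->value;
    toy_specpdl_depth++;
    s->value = vals[i];
  }
  return 0;
}

/* Undo bindings until the stack is back at DEPTH. */
void unbind_to_depth(int depth)
{
  while (toy_specpdl_depth > depth) {
    toy_specpdl_depth--;
    toy_specpdl[toy_specpdl_depth].sym->value =
      toy_specpdl[toy_specpdl_depth].old_value;
  }
}
```

No symbol type check and no constant check appear in the loop, on the assumption that both were already done when the parameter array was built at load time.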
19003
19004
19005
19006 Other optimizations that could be done are:
19007
19008 @itemize
19009 @item
19010
19011 At the beginning of the function that implements the byte-code
19012 interpreter (this is the Lisp primitive @code{byte-code}), the string
19013 containing the actual byte code is converted into an array of integers.
19014 I added this code specifically for MULE so that the byte-code engine
19015 didn't have to deal with the complexities of the internal string format
19016 for text. This conversion, however, is generally useful because on
19017 modern processors accessing 32-bit values out of an array is
19018 significantly faster than accessing unaligned 8-bit values. This
19019 conversion takes time, though, and should be done once at load time
19020 rather than each time the byte code is executed. This array should be
19021 stored in the byte-code object. Currently, this is a bit tricky to do,
19022 because @code{byte-code} is not actually passed the byte-code object,
19023 but rather three of its elements. We can't just change @code{byte-code}
19024 so that it is directly passed the byte-code object because this
19025 function, with its existing argument calling pattern, is called directly
19026 from compiled Elisp files. What we can and should do, however, is
19027 create a subfunction that does take a byte-code object and actually
19028 implements the byte-code interpreter engine. Whenever the C code wants
19029 to execute byte code, it calls this subfunction. @code{byte-code}
19030 itself also calls this subfunction after conjuring up an appropriate
19031 byte-code object and storing its arguments into this object. With a
19032 small amount of work, it's possible to do this conjuring in such a way
19033 that it doesn't generate any garbage.
19034 @item
19035
19036 At the end of a function call, the parameter bindings that have been
19037 done need to be undone. This is standardly done by calling
19038 @code{unbind_to()}. Just as with @code{specbind()}, this function does
19039 a lot of work that is unnecessary in the vast majority of cases, and it
19040 could also be inlined and streamlined.
19041 @item
19042
19043 As part of each Elisp function call, a whole bunch of checks are done
19044 for a series of unlikely but possible conditions that may occur. These
19045 include, for example,
19046
19047 @itemize
19048 @item
19049
19050 Calling the @code{QUIT} macro, which essentially involves
19051 checking a global volatile variable to see whether additional processing
19052 needs to be done.
19053 @item
19054
19055 Checking whether a garbage collection needs to be done.
19056 @item
19057
19058 Checking the variable @code{debug_on_next_call}.
19059 @item
19060
19061 Checking whether Elisp profiling is active. (Returning to the array of
19062 integers discussed earlier: an additional optimization that's perhaps not
19063 worth the effort is to do some post-processing on it after conversion.
19064 For example, whenever a 16-bit value occurs in the byte code, it has
19065 to be encoded as two separate 8-bit values. These values could be
19066 combined. The tricky part here is that all of the places where a goto
19067 occurs across the place where this modification is made would have to
19068 have their offsets changed. Other such optimizations can easily be
19069 imagined as well.)
19070
19071 @end itemize
19072
19073 @item
19074
19075 With a little bit smarter code, it should be possible to make a
19076 single trip variable that indicates whether any of these conditions is
19077 true. This variable would be updated by any code that changes the
19078 actual variables whose values are checked in the various checks just
19079 mentioned. (By the way, all of this is occurring in the C function
19080 @code{funcall_recording_as()}.) There is a little bit of code
19081 between each of the checks. This code would simply have to be
19082 duplicated between the two cases where this general trip variable is
19083 true and is false. (Note: the optimization detailed in this item is
19084 probably not worth doing on the first pass.)
19085
19086 @end itemize
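The single trip variable suggested in the last item can be illustrated with a small sketch. It assumes each rare condition is represented by one bit that is set by whatever code changes the underlying state; the flag names and functions below are hypothetical, and real code would also have to consider signal safety when a flag is set asynchronously.

```c
#include <assert.h>

/* Hypothetical condition bits.  The existing code tests a separate
   variable for each of these on every function call. */
enum {
  TRIP_QUIT    = 1 << 0,  /* a quit or other pending processing */
  TRIP_GC      = 1 << 1,  /* a garbage collection is due */
  TRIP_DEBUG   = 1 << 2,  /* models debug_on_next_call */
  TRIP_PROFILE = 1 << 3   /* Elisp profiling is active */
};

volatile unsigned trip_flags = 0;
int slow_checks_run = 0;

/* Called by any code that changes one of the underlying conditions. */
void set_trip(unsigned bit)   { trip_flags |= bit; }
void clear_trip(unsigned bit) { trip_flags &= ~bit; }

/* Slow path: look at the individual conditions one by one. */
void handle_rare_conditions(void)
{
  slow_checks_run++;
  if (trip_flags & TRIP_GC)
    clear_trip(TRIP_GC);  /* model: the collection has now been done */
  /* ...the other conditions would be handled here... */
}

/* Fast path: a single test guards all of the rare conditions. */
int call_function(int (*fn)(int), int arg)
{
  if (trip_flags != 0)
    handle_rare_conditions();
  return fn(arg);
}

int add_one(int x) { return x + 1; }
```

In the common case where nothing is pending, a call such as @code{call_function(add_one, 1)} costs one comparison before the function body runs.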
19087
19088 @uref{../../www.666.com/ben/default.htm,Ben Wing}
19089
19090 @node Future Work -- Lisp Engine Replacement, , Future Work -- Making Elisp Function Calls Faster, Future Work
19091 @section Future Work -- Lisp Engine Replacement
19092 @cindex future work, lisp engine replacement
19093 @cindex lisp engine replacement, future work
19094
19095 @menu
19096 * Future Work -- Lisp Engine Discussion::
19097 * Future Work -- Lisp Engine Replacement -- Implementation::
19098 @end menu
19099
19100 @node Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Lisp Engine Replacement, Future Work -- Lisp Engine Replacement
19101 @subsection Future Work -- Lisp Engine Discussion
19102 @cindex future work, lisp engine discussion
19103 @cindex lisp engine discussion, future work
19104
19105
19106 @strong{Abstract: }Recently there has been a great deal of talk on the
19107 XEmacs mailing lists about potential changes to the XEmacs Lisp engine.
19108 Usually the discussion has centered around the question which is better,
19109 Common Lisp or Scheme? This is certainly an interesting debate topic,
19110 but it didn't seem to have much practical relevance to me, so I vowed to
19111 stay out of the discussion. Recently, however, it seems that people are
19112 losing sight of the broader picture. For example, nobody seems to be
19113 asking the question, ``Would an extension language other than Lisp or
19114 Scheme (perhaps not a Lisp variant at all) be more appropriate?'' Nor
19115 does anybody seem to be addressing what I consider to be the most
19116 fundamental question, is changing the extension language a good thing to
19117 do?
19118
19119 I think it would be a mistake at this point in XEmacs development to
19120 begin any project involving fundamental changes to the Lisp engine or to
19121 the XEmacs Lisp language itself. It would take a huge amount of effort
19122 to complete even part of this project, and would be a major drain on the
19123 already-insufficient resources of the XEmacs development community.
19124 Most of the gains that are purported to stem from a project such as this
19125 could be obtained with far less effort by making more incremental
19126 changes to the XEmacs core. I think it would be an even bigger mistake
19127 to change the actual XEmacs extension language (as opposed to just
19128 changing the Lisp engine, making few, if any, externally visible
19129 changes). The only language change that I could possibly imagine
19130 justifying would involve switching to some ubiquitous web language, such
19131 as Java and JavaScript, or Perl. (Even among those, I think Java would
19132 be the only possibility that really makes sense).
19133
19134 In the rest of this document I'll present the broader issues that would
19135 be involved in changing the Lisp engine or extension language. This
19136 should make clear why I've come to believe as I do.
19137
19138 @subheading Is everyone clear on the difference between interface and implementation?
19139
19140 There seems to be a great deal of confusion concerning the difference
19141 between interface and implementation. In the context of XEmacs,
19142 changing the interface means switching to a different extension language
19143 such as Common Lisp, Scheme, Java, etc. Changing the implementation
19144 means using a different Lisp engine. There is obviously some relation
19145 between these two issues, but there is no particular requirement that
19146 one be changed if the other is changed. It is quite possible, for
19147 example, to imagine taking the underlying engine for any of the various
19148 Lisp dialects in existence, and adapting it so that it implements the
19149 same Elisp extension language that currently exists. The vast majority
19150 of the purported benefits that we would get from changing the extension
19151 language could just as easily be obtained while making minimal changes
19152 to the external Elisp interface. This way nearly all existing Elisp
19153 programs would continue to work, there would be no need to translate
19154 Elisp programs into some other language or to simultaneously support two
19155 incompatible Lisp variants, and there would be no need for users or
19156 package authors to learn a new extension language that would be just as
19157 unfamiliar to the vast majority of them as Elisp is.
19158
19159 @subheading Why should we change the Lisp engine?
19160
19161 Let's go over the possible reasons for changing the Lisp engine.
19162
19163 @subsubheading Speed.
19164
19165 Changing the Lisp engine might make XEmacs faster. However,
19166 consider the following.
19167
19168 @enumerate
19169 @item
19170
19171 XEmacs will get faster over time without any development effort at all
19172 because computers will get faster.
19173 @item
19174
19175 Perhaps the biggest causes of the slowness of XEmacs are not related to
19176 the Lisp engine at all. It has been asserted, for example, that the
19177 slowness of XEmacs is primarily due to the redisplay mechanism, to the
19178 handling of insertion and deletion of text in a buffer, to the event
19179 loop, etc. Nobody has done any real studies to determine what the
19180 actual cause of slowness is.
19181 @item
19182
19183 Emacs 18 seems plenty fast enough to most people. However, Emacs 18
19184 also had a worse Lisp engine and a worse byte compiler than XEmacs.
19185 @item
19186
19187 Significant speed increases in the execution of Lisp code could be
19188 achieved without too much effort by working on the existing byte code
19189 interpreter and function call mechanism a bit.
19190
19191 @end enumerate
19192
19193 @subsubheading Memory usage.
19194
19195 A new Lisp engine with a better garbage collection mechanism might make
19196 more efficient use of memory; for example, through the use of a
19197 relocating garbage collector. However, consider this:
19198
19199 @enumerate
19200 @item
19201
19202 A new Lisp engine would probably have a larger memory footprint, perhaps
19203 a significantly larger one.
19204 @item
19205
19206 The worst memory problems might not be due to Lisp object inefficiency
19207 at all. The problems could simply be due mainly to the inefficient
19208 buffer representation. Nobody has come up with any concrete numbers on
19209 where the real problem lies.
19210
19211 @end enumerate
19212
19213 @subsubheading Robustness.
19214
19215 A new Lisp engine might well be more robust. (On the other hand, it
19216 might not be. It is not always easy to tell). However, I think that
19217 the biggest problems with robustness are in the part of the C code that
19218 is not concerned with implementing the Lisp engine. The redisplay
19219 mechanism and the unexec mechanism are probably the biggest sources of
19220 robustness problems. I think the biggest robustness problems that are
19221 related to the Lisp engine concern the use of GCPRO declarations. The
19222 entire GCPRO mechanism is ill-conceived and unsafe. The only real way
19223 to make this safe would be to do conservative garbage collection over
19224 the C stack and to eliminate the GCPRO declarations entirely. But how
19225 many of the Lisp engines that are being considered have such a mechanism
19226 built into them?
19227
19228
19229 @subsubheading Maintainability.
19230
19231 A new Lisp engine might well improve the maintainability of XEmacs by
19232 offloading the maintenance of the Lisp engine. However, we need to make
19233 very sure that this is, in fact, the case before embarking on a project
19234 like this. We would almost certainly have to make significant
19235 modifications to any Lisp engine that we choose to integrate, and
19236 without the active and committed support and cooperation of the
19237 developers of that Lisp engine, the maintainability problem would
19238 actually get worse.
19239
19240 @subsubheading Features.
19241
19242 A new Lisp engine might have built in support for various features that
19243 we would like to add to the XEmacs extension language, such as lexical
19244 scoping and an object system.
19245
19246 @subheading Why would we want to change the extension language?
19247
19248 Possible reasons for changing the extension language include:
19249
19250 @subsubheading More standard.
19251
19252 Switching to a language that is more standard and more commonly in use
19253 would be beneficial for various reasons. First of all, the language
19254 that is more commonly used and more familiar would make it easier for
19255 users to write their own extensions and in general, increase the
19256 acceptance of XEmacs. Also, an accepted standard probably has had a lot
19257 more thought put into it than any language interface created by the
19258 XEmacs developers themselves. Furthermore, if our extension language is
19259 being actively developed and supported, much of the work that we would
19260 otherwise have to do ourselves is transferred elsewhere.
19261
19262 However, both Scheme and Common Lisp flunk the familiarity test.
19263 Neither language is being actively used for program development outside
19264 of small research communities, and few prospective authors of XEmacs
19265 extensions will be familiar with any Lisp variant for real world uses.
19266 (I consider the argument that Scheme is often used in introductory
19267 programming courses to be irrelevant. Many existing programmers were
19268 taught Pascal in their introductory programming courses. How many of
19269 them would actually be comfortable writing a program in Pascal?)
19270 Furthermore, someone who wants to learn Lisp can't exactly go to their
19271 neighborhood bookstore and pick up a book on this topic.
19272
19273 @subsubheading Ease of use.
19274
19275 There are endless arguments about which language is easiest to use. In
19276 practice, this largely boils down to which languages are most familiar.
19277
19278 @subsubheading Object oriented.
19279
19280 The object-oriented paradigm is the dominant one in use today for new
19281 languages. User interface concepts in particular are expressed very
19282 naturally in an object-oriented system. However, neither Scheme nor
19283 Common Lisp has been designed with object orientation in mind. There is
19284 a standard object system for Common Lisp, but it is extremely complex
19285 and difficult to understand.
19286
19287
19288 @uref{../../www.666.com/ben/default.htm,Ben Wing}
19289
19290
19291 @node Future Work -- Lisp Engine Replacement -- Implementation, , Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement
19292 @subsection Future Work -- Lisp Engine Replacement -- Implementation
19293 @cindex future work, lisp engine replacement, implementation
19294 @cindex lisp engine replacement, implementation, future work
19295
19296 Let's take a look at the sort of work that would be required if we were
19297 to replace the existing Elisp engine in XEmacs with some other engine,
19298 for example, the Clisp engine. I'm assuming here, of course, that we
19299 are not going to be changing the interface here at the same time, which
19300 is to say that we will be keeping the same Elisp language that we
19301 currently have as the extension language for XEmacs, except perhaps for
19302 incremental changes that we will make, such as lexical scoping and
19303 proper structure support in an attempt to gradually move the language
19304 towards an upwardly-compatible goal, such as Common Lisp. I am writing
19305 this page primarily as food for thought. I feel fairly strongly that
19306 actually doing this work would be a big waste of effort that would
19307 inevitably become a huge time sink on the part of nearly everyone
19308 involved in XEmacs development, and not only for the ones who were
19309 supposed to be actually doing the engine change. I feel that most of
19310 the desired changes that we want for the language and/or the engine can
19311 be achieved with much less effort and time through incremental changes
19312 to the existing code base.
19313
19314 First of all, in order to make a successful Lisp engine change in
19315 XEmacs, it is vitally important that the work be done through a series
19316 of incremental stages where at the end of each stage XEmacs can be
19317 compiled and run, and it works. It is tempting to try to make the
19318 change all at once, but this would be disastrous. If the resulting
19319 product worked at all, it would inevitably contain a huge number of
19320 subtle and extremely difficult to track down bugs, and it would be next
19321 to impossible to determine which of the myriad changes made introduced
19322 the bug.
19323
19324 Now let's look at what the possible stages of implementation could be.
19325
19326 @subsubheading An Extra C Preprocessing Stage
19327
19328 The first step would be to introduce another preprocessing stage for the
19329 XEmacs C code, which is done before the C compiler itself is invoked on
19330 the code, and before the standard C preprocessor runs. The C
19331 preprocessor is simply not powerful enough to do many of the things we
19332 would like to do in the C code. The results of this so far have been
19333 a combination of a lot of hacked up and tricky-to-maintain stuff (such
19334 as the @code{DEFUN} macro, and the associated @code{DEFSUBR}), code
19335 constructs that are difficult to write (consider, for example,
19336 attempting to do structured exception handling, such as catch/throw and
19337 unwind-protect constructs), and code that is potentially or
19338 actually unsafe (such as the uses of @code{alloca}, which could easily
19339 cause stack overflow with large amounts of memory allocated in this
19340 fashion). The problem is that the C preprocessor does not allow macros
19341 to have the power of an actual language, such as C or Lisp. What our
19342 own preprocessor should do is allow us to define macros, whose
19343 definitions are simply functions written in some language which are
19344 executed at compile time, and whose arguments are the actual arguments
19345 for the macro call, as well as an environment which should have a data
19346 structure representation of the C code in the file and allow this
19347 environment to be queried and modified. It can be debated what the
19348 language should be that these extensions are written in. Whatever the
19349 language chosen, it needs to be a very standard language and a language
19350 whose compiler or interpreter is available on all of the platforms that
19351 we could ever possibly consider porting XEmacs to, which is basically to
19352 say all the platforms in existence. One obvious choice is C, because
19353 there will obviously be a C compiler available, because it is needed to
19354 compile XEmacs itself. Another possibility is Perl, which is already
19355 installed on most systems, and is universally available on all others.
19356 This language has powerful text processing facilities which would
19357 probably make it possible to implement the macro definitions more
19358 quickly and easily; however, this might also encourage bad coding
19359 practices in the macros (often simple text processing is not
19360 appropriate, and more sophisticated parsing or recursive data structure
19361 processing needs to be done instead), and we'd have to make sure that
19362 the nested data structure that comprises the environment could be
19363 represented well in Perl. Elisp would not be a good choice because it
19364 would create a bootstrapping problem. Other possible languages, such as
19365 Python, are not appropriate, because most programmers are unfamiliar
19366 with this language (creating a maintainability problem) and the Python
19367 interpreter would have to be included and compiled as part of the XEmacs
19368 compilation process (another maintainability problem). Java is still
19369 too much in flux to be considered at this point.
19370
19371 The macro facility that we will provide needs to add two features to the
19372 language: the ability to define a macro, and the ability to call a
19373 macro. One good way of doing this would be to make use of special
19374 characters that have no meaning in the C language (or in C++ for that
19375 matter), and thus can never appear in a C file outside of comments and
19376 strings. Two obvious characters are the @@ sign and the $ sign. We
19377 could, for example, use @code{@@define} to define new macros, and the
19378 @code{$} sign followed by the macro name to call a macro. (Proponents
19379 of Perl will note that both of these characters have a meaning in Perl.
19380 This should not be a problem, however, because the way that macros are
19381 defined and called inside of another macro should not be through the use
19382 of any special characters which would in effect be extending the macro
19383 language, but through function calls made in the normal way for the
19384 language.)
19385
19386 The program that actually implements this extra preprocessing stage
19387 needs to know a certain amount about how to parse C code. In
19388 particular, it needs to know how to recognize comments, strings,
19389 character constants, and perhaps certain other kinds of C tokens, and
19390 needs to be able to parse C code down to the statement level. (This is
19391 to say it needs to be able to parse function definitions and to separate
19392 out the statements, @code{if} blocks, @code{while} blocks, etc. within
19393 these definitions. It probably doesn't, however need to parse the
19394 contents of a C expression.) The preprocessing program should work
19395 first by parsing the entire file into a data structure (which may just
19396 contain expressions in the form of literal strings rather than a data
19397 structure representing the parsed expression). This data structure
19398 should become the environment parameter that is passed as an argument to
19399 macros as mentioned above. The implementation of the parsing could and
19400 probably should be done using @code{lex} and @code{yacc}. One good idea
19401 is simply to steal some of the @code{lex} and @code{yacc} code that is
19402 part of GCC.
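As a rough illustration of the lowest layer of such a preprocessor, the sketch below scans C text and finds @code{$name} macro calls only where they occur outside comments, string literals and character constants. It is a hand-written state machine rather than the @code{lex}/@code{yacc} grammar suggested above, it handles only @code{/* ... */} comments, and the function name @code{count_macro_calls} is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count `$name' macro calls in SRC that lie outside block comments,
   string literals and character constants. */
int count_macro_calls(const char *src)
{
  enum { CODE, BLOCK_COMMENT, STRING, CHARLIT } state = CODE;
  int count = 0;
  size_t i = 0;

  while (src[i] != '\0') {
    char c = src[i];
    switch (state) {
    case CODE:
      if (c == '/' && src[i + 1] == '*') {
        state = BLOCK_COMMENT;
        i += 2;
        continue;
      }
      if (c == '"')
        state = STRING;
      else if (c == '\'')
        state = CHARLIT;
      else if (c == '$' &&
               (isalpha((unsigned char) src[i + 1]) || src[i + 1] == '_'))
        count++;  /* `$' followed by an identifier start: a macro call */
      break;
    case BLOCK_COMMENT:
      if (c == '*' && src[i + 1] == '/') {
        state = CODE;
        i += 2;
        continue;
      }
      break;
    case STRING:
    case CHARLIT:
      if (c == '\\' && src[i + 1] != '\0') {
        i += 2;   /* skip the escaped character, e.g. \" or \' */
        continue;
      }
      if ((state == STRING && c == '"') || (state == CHARLIT && c == '\''))
        state = CODE;
      break;
    }
    i++;
  }
  return count;
}
```

A real implementation would go on to build the statement-level data structure that is passed to macros as their environment; this sketch stops at token recognition.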
19403
19404 Here are some possibilities that could be implemented as part of the
19405 preprocessing:
19406
19407 @enumerate
19408 @item
19409
19410 A proper way of doing the @code{DEFUN} macros. These could, for
19411 example, take an argument list in the form of a Lisp argument list
19412 (complete with keyword parameters and other complex features) and
19413 automatically generate the appropriate @code{subr} structure, the
19414 appropriate C function definition header, and the appropriate call to
19415 the @code{DEFSUBR} initialization function.
19416 @item
19417
19418 A truly safe and easy to use implementation of the @code{alloca}
19419 function. This could allocate the memory in any fashion it chooses
19420 (calling @code{malloc}, using a large global array or a series of such
19421 arrays, etc.) and insert code in the appropriate places to
19422 automatically free up this memory. (Appropriate places here would be at
19423 the end of the function and before any return statements. Non-local
19424 exits can be handled in the function that actually implements the
19425 non-local exit.)
19426 @item
19427
19428 If we allow for the possibility of having an arbitrary Lisp engine, we
19429 can't necessarily assume that we can call Lisp primitives implemented in
19430 C from other C functions by simply making a function call. Perhaps
19431 something special needs to happen when this is done. This could be
19432 handled fairly easily by having our new and improved @code{DEFUN} macro
19433 define a new macro for use when calling a primitive.
19434 @end enumerate
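The safe @code{alloca} replacement in the second item could, for instance, be built on a global array with mark and release operations. The sketch below shows the kind of code a preprocessor might emit for one function; the arena size, the names @code{arena_mark}, @code{arena_alloc} and @code{arena_release}, and the example function are all assumptions made for illustration, and non-local exits are not handled.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096

/* One large global array; the union member forces suitable alignment. */
static union {
  unsigned char bytes[ARENA_SIZE];
  long double align;
} arena;
size_t arena_top = 0;

/* Remember the current allocation level. */
size_t arena_mark(void) { return arena_top; }

/* Allocate N bytes from the arena, or return NULL if it is exhausted. */
void *arena_alloc(size_t n)
{
  const size_t a = sizeof(long double);
  size_t rounded = (n + a - 1) / a * a;  /* keep later blocks aligned */
  void *p;

  if (rounded < n || rounded > ARENA_SIZE - arena_top)
    return NULL;
  p = arena.bytes + arena_top;
  arena_top += rounded;
  return p;
}

/* Free everything allocated since MARK was taken. */
void arena_release(size_t mark)
{
  assert(mark <= arena_top);
  arena_top = mark;
}

/* What the preprocessor might generate for a function that uses the
   replacement: a mark at entry and a release before every return. */
int sum_squares(int n)
{
  size_t mark = arena_mark();                /* inserted at entry */
  int *buf = arena_alloc((size_t) n * sizeof *buf);
  int i, total = 0;

  if (buf == NULL) {
    arena_release(mark);                     /* inserted before return */
    return -1;
  }
  for (i = 0; i < n; i++) buf[i] = i * i;
  for (i = 0; i < n; i++) total += buf[i];
  arena_release(mark);                       /* inserted before return */
  return total;
}
```

Unlike a stack overflow caused by a large @code{alloca}, exhausting the arena is reported as an ordinary failure that the caller can test for.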
19435
19436
19437 @subsubheading Make the Existing Lisp Engine be Self-contained.
19438
19439 The goal of this stage is to gradually build up a self-contained Lisp
19440 engine out of the existing XEmacs core, which has no dependencies on any
19441 of the code elsewhere in the XEmacs core, and has a well-defined and
19442 black box-style interface. (This is to say that the rest of the C code
19443 should not be able to access the implementation of the Lisp engine, and
19444 should make as few assumptions as possible about how this implementation
19445 works). The Lisp engine could, and probably should, be built up as a
19446 separate library which can be compiled on its own without any of the
19447 rest of the XEmacs C code, and can be tested in this configuration as
19448 well.
19449
19450 The creation of this engine library should be done as a series of
19451 subsets, each of which moves more code out of the XEmacs core and into
19452 the engine library, and XEmacs should be compilable and runnable between
19453 each sub-step. One possible series of sub-steps would be to first
19454 create an engine that does only object allocation and garbage
19455 collection, then as a second sub-step, move in the code that handles
19456 symbols, symbol values, and simple binding, and then finally move in the
19457 code that handles control structures, function calling, @code{byte-code}
19458 execution, exception handling, etc. (It might well be possible to
19459 further separate this last sub-step).
19460
19461 @subsubheading Removal of Assumptions About the Lisp Engine Implementation
19462
19463 Currently, the XEmacs C code makes all sorts of assumptions about the
19464 implementation of the Lisp engine, particularly in the areas of object
19465 allocation, object representation, and garbage collection. A different
19466 Lisp engine may well have different ways of doing these implementations,
19467 and thus the XEmacs C code must be rid of any assumptions about these
19468 implementations. This is a tough and tedious job, but it needs to be
19469 done. Here are some examples:
19470
19471 @enumerate
19472 @item
19473
19474 @code{GCPRO} must go. The @code{GCPRO} mechanism is tedious,
19475 error-prone, unmaintainable, and fundamentally unsafe. As anyone who
19476 has worked on the C Core of XEmacs knows, figuring out where to insert
19477 the @code{GCPRO} calls is an exercise in black magic, and debugging
19478 crashes as a result of incorrect @code{GCPROing} is an absolute
19479 nightmare. Furthermore, the entire mechanism is fundamentally unsafe.
19480 Even if we were to use the extra preprocessing stage detailed above to
19481 automatically generate @code{GCPRO} and @code{UNGCPRO} calls for all
19482 Lisp object variables occurring anywhere in the C code, there are still
19483 places where we could be bitten. Consider, for example, code which
19484 calls @code{cons} and where the two arguments to this function are both
19485 calls to the @code{append} function. Now the @code{append} function
19486 generates new Lisp objects, and it also calls @code{QUIT}, which could
19487 potentially execute arbitrary Lisp code and cause a garbage collection
19488 before returning control to the @code{append} function. Now in order to
19489 generate the arguments to the @code{cons} function, the @code{append}
19490 function is called twice in a row. When the first @code{append} call
19491 returns, new Lisp data has been created, but has no @code{GCPRO}
19492 pointers to it. If the second @code{append} call causes a garbage
19493 collection, the Lisp data from the first @code{append} call will be
19494 collected and recycled, which is likely to lead to obscure and
19495 impossible-to-debug crashes. The only way around this would be to
19496 rewrite all function calls whose parameters are Lisp objects in terms of
19497 temporary variables, so that no such function calls ever contain other
19498 function calls as arguments. This would not only be annoying to
19499 implement, even in a smart preprocessor, but would make the C code
19500 become incredibly slow because of all the constant updating of the
19501 @code{GCPRO} lists.
19502 @item
19503
19504 The only proper solution here is to completely do away with the
19505 @code{GCPRO} mechanism and simply do conservative garbage collection
19506 over the C stack. There are already portable implementations of
19507 conservative pointer marking over the C stack, and these could easily be
19508 adapted for use in the Elisp garbage collector. If, as outlined above,
19509 we use an extra preprocessing stage to create a new version of
19510 @code{alloca} that allocates its memory elsewhere than actually on the C
19511 stack, and we ensure that we don't declare any large arrays as local
19512 variables, but instead use @code{alloca}, then we can be guaranteed that
19513 the C stack is small and thus that the conservative pointer marking
19514 stage will be fast and not very likely to find false matches.
19515 @item
19516
19517 Removing the @code{GCPRO} declarations as just outlined would also
19518 remove the assumption currently made that garbage collection can occur
19519 only in certain places in the C code, rather than in any arbitrary spot.
19520 (For example, any time an allocation of Lisp data happens). In order to
19521 make things really safe, however, we also have to remove another
19522 assumption as detailed in the following item.
19523 @item
19524
19525 Lisp objects might be relocatable. Currently, the C code assumes that
19526 Lisp objects other than string data are not relocatable and therefore
19527 it's safe to pass around and hold onto the actual pointers for the C
19528 structures that implement the Lisp objects. Current code, for example,
19529 assumes that a @code{Lisp_Object} of type buffer and a C pointer to a
19530 @code{struct buffer} mean basically the same thing, and indiscriminately
19531 passes the two kinds of buffer pointers around. With relocatable Lisp
19532 objects, the pointers to the C structures might change at any time.
19533 (Remember, we are now assuming that a garbage collection can happen at
19534 basically any point). All of the C code needs to be changed so that
19535 Lisp objects are always passed around using a Lisp object type, and the
19536 underlying pointers are only retrieved at the time when a particular
19537 data element out of the structure is needed. (As an aside, here's
19538 another reason why Lisp objects, instead of pointers, should always be
19539 passed around. If pointers are passed around, it's conceivable that at
19540 the time a garbage collection occurs, the only reference to a Lisp
19541 object (for example, a deleted buffer) would be in the form of a C
19542 pointer rather than a Lisp object. In such a case, the conservative
19543 pointer marking mechanism might not notice the reference, especially if,
19544 in an attempt to eliminate false matches and make the code generally
19545 more efficient, it will be written so that it will look for actual Lisp
19546 object references.)
19547 @item
19548
19549 I would go a step farther and completely eliminate the macros that
19550 convert a Lisp object reference into a C pointer. This way the only way
19551 to access an element out of a Lisp object would be to use the macro for
19552 that element, which in one atomic operation de-references the Lisp
19553 object reference and retrieves the value contained in the element. We
19554 probably do need the ability to retrieve actual C pointers, though, for
19555 example in the case where an array is stored in a Lisp object, or
19556 simply for efficiency purposes where we might want some code to retrieve
19557 the C pointer for a Lisp object and work on that directly to avoid a
19558 whole bunch of extra indirections. I think the way to do this would be
19559 through the use of a special locking construct implemented as part of
19560 the extra preprocessor stage mentioned above. This would essentially be
19561 what you might call a @dfn{lock block}, just like a @code{while} block.
19562 You'd write the word @code{lock} followed by a parenthesized expression
19563 that retrieves the C pointer and stores it into a variable that is
19564 scoped only within the lock block and followed in turn by some code in
19565 braces, which is the actual code associated with the lock block, and
19566 which can make use of this pointer. While the code inside the lock
19567 block is executing, the object that this pointer refers to is
19568 guaranteed not to be relocated.
19569 @item
19570
19571 If all the XEmacs C code were converted according to these rules, there
19572 would be no restrictions on the sorts of implementations that can be
19573 used for the garbage collector. It would be possible, for example, to
19574 have an incremental asynchronous relocating garbage collector that
19575 operated continuously in another thread while XEmacs was running.
19576 @item
19577
19578 The C implementation of Lisp objects might not, and probably should not,
19579 be visible to the rest of the XEmacs C code. It should theoretically be
19580 possible, for example, to implement Lisp objects entirely in terms of
19581 association lists, rather than using C structures in the standard way.
19582 (This may be an extreme example, but it's good to keep in mind an
19583 example such as this when cleaning up the XEmacs C code). The changes
19584 mentioned in the previous item would go a long way towards removing this
19585 assumption. The only places where this assumption might still be made
19586 would be inside of the lock blocks where an actual pointer is retrieved.
19587 (Also, of course, we'd have to change the way that Lisp objects are
19588 defined in C so that this is done with some function calls and new and
19589 improved macros rather than by having the XEmacs C code actually define
19590 the structures. This sort of thing would probably have to be done in
19591 any case once the allocation mechanism is moved into a separate
19592 library.) With some thought it should be possible to define the lock
19593 block interface in such a way as to remove any assumptions about the
19594 implementation of Lisp objects.
19595 @item
19596
19597 C code may not be able to call Lisp primitives that are defined in C
19598 simply by making standard C function calls. There might need to be some
19599 wrapper around all such calls. This could be achieved cleanly through
19600 the extra preprocessing step mentioned above, in line with the example
19601 described there.
19602
19603 @end enumerate
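
As a rough illustration of the lock block idea from the list above, here
is a minimal sketch in plain C. It assumes a hypothetical handle table
(@code{make_object}, @code{lock_object}, @code{unlock_object},
@code{relocate_object}) and emulates the proposed @code{lock} keyword
with a @code{LOCK_BLOCK} macro built on a @code{for} loop. None of these
names exist in XEmacs, and the proposal above would implement the
construct in the extra preprocessor stage rather than as a macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle-based object table; not the XEmacs implementation. */
typedef size_t handle_t;

struct slot {
    void  *ptr;         /* current address of the object's storage */
    size_t size;        /* size of that storage in bytes */
    int    lock_count;  /* > 0 while some lock block holds the pointer */
};

#define MAX_OBJECTS 16
static struct slot table[MAX_OBJECTS];
static size_t n_objects;

/* Allocate a zero-filled object and return its handle. */
handle_t make_object(size_t size)
{
    assert(size > 0 && n_objects < MAX_OBJECTS);
    void *p = calloc(1, size);
    assert(p != NULL);
    table[n_objects].ptr = p;
    table[n_objects].size = size;
    table[n_objects].lock_count = 0;
    return n_objects++;
}

/* Pin the object and hand out its current address. */
void *lock_object(handle_t h)
{
    table[h].lock_count++;
    return table[h].ptr;
}

void unlock_object(handle_t h)
{
    assert(table[h].lock_count > 0);
    table[h].lock_count--;
}

/* Stand-in for a relocating collector: copy the object to fresh storage
   unless it is locked.  Returns 1 if the object moved, 0 if it was pinned. */
int relocate_object(handle_t h)
{
    struct slot *s = &table[h];
    if (s->lock_count > 0)
        return 0;
    void *fresh = malloc(s->size);
    assert(fresh != NULL);
    memcpy(fresh, s->ptr, s->size);
    free(s->ptr);
    s->ptr = fresh;
    return 1;
}

/* Emulated lock block: VAR is scoped to the braces that follow, and the
   object is unlocked when the body finishes normally.  Leaving the body
   with break, return or goto would skip the unlock. */
#define LOCK_BLOCK(type, var, h)                                   \
    for (type *var = (type *) lock_object(h), *var##_once = var;   \
         var##_once != NULL;                                       \
         unlock_object(h), var##_once = NULL)

/* Store a value through a locked pointer, then read it back via the handle. */
int demo_lock_block(void)
{
    handle_t h = make_object(sizeof(int));
    LOCK_BLOCK(int, p, h) {
        *p = 42;
        (void) relocate_object(h);   /* refused: the object is pinned */
    }
    (void) relocate_object(h);       /* allowed: nothing holds a pointer */
    int result = 0;
    LOCK_BLOCK(int, q, h) {
        result = *q;
    }
    return result;
}
```

Code outside a lock block only ever holds the handle, so the collector
is free to move the storage between any two lock blocks. A macro cannot
guarantee the unlock on early exit, which is one reason a preprocessor
construct may be the better fit.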
19604
19605 @subsubheading Actually Replacing the Engine.
19606
19607 Once we've done all of the work mentioned in the previous steps (and
19608 admittedly, this is quite a lot of work), we should have an XEmacs that
19609 still uses what is essentially the old and previously existing Lisp
19610 engine, but which is ready to have its Lisp engine replaced. The
19611 replacement might proceed as follows:
19612
19613 @enumerate
19614 @item
19615
19616 Identify any further changes that need to be made to the engine
19617 interface that we have defined as a result of the previous steps so that
19618 features and idiosyncrasies of various Lisp engines that we examine
19619 could be properly supported.
19620 @item
19621
19622 Pick a Lisp engine and write an interface layer that sits on top of this
19623 Lisp engine and makes it adhere to what I'll now call the XEmacs Lisp
19624 engine interface.
19625 @item
19626
19627 Strongly consider creating, if we haven't already done so, a test suite
19628 that can test the XEmacs Lisp engine interface when used with a
19629 stand-alone Lisp engine.
19630 @item
19631
19632 Test the hell out of the Lisp engine that we've chosen when combined
19633 with its XEmacs Lisp engine interface layer as a stand-alone program.
19634 @item
19635
19636 Now finally attach this stand-alone program to XEmacs itself. Debug and
19637 fix any further problems that ensue (and there inevitably will be such
19638 problems), updating the test suite as we go along so that if it were run
19639 again on the old and buggy interfaced Lisp engine, it would note the
19640 bug.
19641
19642 @end enumerate
19643
19644
19645 @uref{../../www.666.com/ben/default.htm,Ben Wing}
19646
19647 @node Future Work Discussion, Old Future Work, Future Work, Top
19648 @chapter Future Work Discussion
19649 @cindex future work, discussion
19650 @cindex discussion, future work
19651
19652 This chapter includes (mostly) email discussions about particular design
19653 issues, edited to include only relevant and useful stuff. Ideally over
19654 time these could be condensed down to a single design document to go
19655 into the normal Future Work section.
19656
19657 @menu
19658 * Discussion -- garbage collection::
19659 * Discussion -- glyphs::
19660 @end menu
19661
19662 @node Discussion -- garbage collection, Discussion -- glyphs, Future Work Discussion, Future Work Discussion
19663 @section Discussion -- garbage collection
19664 @cindex discussion, garbage collection
19665 @cindex garbage collection, discussion
19666
19667
19668 @example
19669 On Tue, Oct 12, 1999 at 03:36:59AM -0700, Ben Wing wrote:
19670 @end example
19671
19672 So what am I missing here?
19673
19674 @example
19675 In response, Olivier Galibert wrote:
19676 @end example
19677
19678 Two things:
19679 @enumerate
19680 @item
19681 The purespace is gone
19682
19683 I mean absolutely, completely and utterly removed. Fpurecopy is a
19684 no-op now (and has been for some time). Readonly objects are gone
19685 too. Having fewer checks to do in Fsetcar, Fsetcdr, Faset and some
19686 others is probably a good thing, speedwise. I removed it some
19687 time ago because it does not make sense when using a portable dumper
19688 to copy data in a special area of the memory at dump time and I wanted
19689 to be sure that suppressing the copying from Fpurecopy wouldn't break
19690 things.
19691
19692 Now, we want to get the post-dumping data sharing back, of course. On
19693 today's systems, it is quite easy: you just have to map the file
19694 MAP_PRIVATE and avoid writing to the subset of pages you want to keep
19695 shared. Copy-on-write does the job for you. It has the nice side
19696 effect of completely avoiding bus errors due to trying to write to
19697 readonly memory zones.
19698
19699 Avoiding writing to the "pure" objects themselves is already done, of
19700 course. Had Lisp code written to the purecopied parts of the
19701 dumped data, it would have exploded long ago. So there is nothing
19702 to do in this area. So the only remaining thing is the markbit. Two
19703 possible strategies:
19704
19705 @itemize @bullet
19706 @item
19707 have Fpurecopy mark somehow the lrecords it would have copied in the
19708 good old times. Post-dump, use this mark as an "always marked, don't
19709 touch, don't look into, don't free" flag, the same way CHECK_PURE
19710 was used.
19711 @item
19712 move the markbit outside of the lrecord.
19713 @end itemize
19714
19715
19716 The second solution is more appealing to me for a bunch of reasons:
19717 @itemize @bullet
19718 @item
19719 more things are shared than only what is purecopied (not yet used
19720 functions come to mind)
19721 @item
19722 no more "the only references to this non-purecopied object are from
19723 purecopied objects, XEmacs will self-destruct in ten seconds" kind
19724 of bugs.
19725 @item
19726 removing flags goes the right way towards implementing Jan's
19727 allocator ideas.
19728 @item
19729 it probably becomes easier to experiment with the GC code
19730 @end itemize
19731
19732 @item
19733 Finding all the dumped objects in order to unmark them sucks
19734
19735 Not having to rebuild a list of all the dumped objects in order to
19736 find them all and ensure that all are unmarked simplifies things for
19737 me. Errr, ok, now that I really think of it, I can rebuild this list
19738 easily, in fact. And I'm probably going to have to manage it, since I
19739 feel like the lack of calls to the finalizers for the dumped objects
19740 is going to someday turn over and bite me in the face. But anyways,
19741 it makes my life easier for now.
19742
19743 So no, it's not a _necessity_. But it helps. And the automatic
19744 sharing of all objects until you write to them explicitly is, I
19745 think, really cool.
19746 @end enumerate
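
The copy-on-write mapping described in the first item above can be
sketched with the POSIX @code{mmap} call. This is only an illustration:
the helper name @code{map_dump_private} and the idea of mapping the
whole file in one piece are assumptions, not the portable dumper's
actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at PATH copy-on-write (hypothetical helper).
   Pages stay shared with other processes until this process writes to
   them; writes go to private copies and never reach the file.
   Returns MAP_FAILED on error; on success stores the length in *LEN_OUT. */
void *map_dump_private(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p != MAP_FAILED)
        *len_out = (size_t) st.st_size;
    return p;
}
```

With such a mapping, every page that the process never writes to remains
shared, which is why keeping the mark bit outside the dumped objects
matters: marking would otherwise dirty the pages during each collection.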
19747
19748
19749 @example
19750 On 10/12/1999 5:49 PM Ben Wing wrote:
19751
19752 Subject: Re: hashtable-based marking and cleanups
19753 @end example
19754
19755 OK, I can see the advantages. But:
19756
19757 @enumerate
19758 @item
19759 There will be an inevitable loss of speed using a large hashtable. If
19760 it's large, I say that it's just not worth it. There are things that are
19761 so much more important than futzing around with the garbage collector
19762 (e.g. fixing the god damn user interface), things which if not fixed will
19763 sooner or later cause XEmacs to die entirely. If we are causing a major
19764 slowdown in the name of some not-so-important work that may or may not get
19765 done, we shouldn't do it. (On the other hand, if the slowdown is
19766 negligible, I have no problems with this.)
19767
19768 @item
19769 I think you should @strong{expand} the concept of read-only objects so
19770 that @strong{any} object (especially strings and cons cells) can get
19771 marked read-only by the C code if it wants. (Perhaps you could use the
19772 now-unused mark bit to hold a read-only flag.) This is important because
19773 it allows C code to directly return internal lists (e.g. from the
19774 specifiers and various object property lists) without having to do a
19775 copy, like is now done (and similarly, potentially to directly accept
19776 lists from a Lisp call without copying them for internal use, if the
19777 Lisp caller is made aware that the list might become read-only) -- if
19778 the copy weren't done and some piece of Lisp code went and modified the
19779 list, XEmacs might very well crash. Thus, this read-only flag would be
19780 a huge efficiency gain in terms of the garbage collection overhead saved
19781 as well as the speed of copying a large list. The extra checks in
19782 @code{Fsetcar()}, etc. for this that you mention are in fact negligible
19783 in their speed overhead -- one or two instructions -- and these
19784 functions are not used all that commonly, either. With the changes I
19785 have proposed in Architecting XEmacs, the case of returning an internal
19786 list will become more and more common, since the power of the user
19787 interface would be greatly increased, and along with it would come lots
19788 and lots of lists of info that need to be retrievable from Lisp.
19789 @end enumerate
19790
19791 BTW there is a wonderful book all about garbage collection by Jones and
19792 Lins. Ever seen it?
19793
19794 @example
19795 http://www.amazon.com/exec/obidos/ASIN/0471941484/qid=939775572/sr=1-1/002-3092633-2509405
19796 @end example
19797
19798 @node Discussion -- glyphs, , Discussion -- garbage collection, Future Work Discussion
19799 @section Discussion -- glyphs
19800 @cindex discussion, glyphs
19801 @cindex glyphs, discussion
19802
19803 Some comments (not always pretty!) by Ben:
19804
19805 @example
19806 March 20, 2000
19807
19808 Andy, I use the tab widgets but I've been having lots of problems.
19809
19810 1] Sometimes clicking on them does nothing.
19811
19812 2] There's a design flaw: I frequently use M-C-l to switch to the
19813 previous buffer. If I use this in conjunction with the tabs, things get
19814 all screwed up because selecting a buffer with the tab does not bring it
19815 to the front of the buffer list, like it should. It looks like you're
19816 doing this to avoid having the order of the tabs change, but this is
19817 wrong: If you don't reorder the buffer list, everything else gets
19818 screwed up. If you want the order of the tabs not to change, you need
19819 to decouple this order from the buffer list order.
19820 @end example
19821
19822 @example
19823 March 23, 2000
19824
19825 I'm very confused. The SIGIO timer is used @strong{only} for C-g. It has
19826 nothing to do with any other events. (sit-for 0) ought to
19827
19828 (1) cause all pending non-command events to get executed, and
19829 (2) do redisplay
19830
19831 However, sit-for gets preempted by input coming in.
19832
19833 What about (sit-for 0.1)?
19834
19835 I suppose a solution along the lines of dispatch-non-command-events
19836 might be OK if you've tried everything else and it doesn't work, but i'm
19837 leery of introducing new Lisp functions to deal with specific problems.
19838 Pretty soon we end up with a whole bevy of such ill-defined functions,
19839 like we already have. I think instead, you should introduce the
19840 following primitive:
19841
19842 (wait-for-event redisplay &rest event-specs)
19843
19844 Waits for one of the event specifications specified to happen. Returns
19845 something about what happened.
19846
19847 REDISPLAY controls the behavior of redisplay during waiting. Something
19848 like
19849
19850 - nil (never redisplay),
19851 - t (redisplay when it seems appropriate), etc.
19852
19853 EVENT-SPECS could be
19854
19855 t -- drain all non-user events, and then return
19856 any-process -- wait till input or state change on any process
19857 process -- wait till input or state change on process
19858 time -- wait till such-and-such time has elapsed
19859 'user -- wait till user event has happened
19860 '(user predicate) -- wait till user event matching the predicate has
19861 happened
19862 'event -- wait till any event has happened
19863 '(event predicate) -- wait till event matching the predicate has happened
19864
19865 The existing functions @code{next-event}, @code{next-command-event},
19866 @code{accept-process-output}, @code{sit-for}, @code{sleep-for}, etc. could all be
19867 written in terms of this new command. You could use this command inside
19868 of your glyph code to ensure that the events that must be processed for
19869 widget updates to happen actually get processed.
19870
19871 But you said something about needing a magic event to invoke redisplay?
19872 Why is that?
19873 @end example
19874
19875 @example
19876 April 2, 2000
19877
19878 the internal distinction between "widget" and "layout" is bogus. there
19879 exist widgets that do drawing and do layout of their children,
19880 e.g. group-box widgets and proper tab widgets. the only sensible
19881 distinction is between widgets with children and those without children.
19882 @end example
19883
19884 @example
19885 April 5, 2000
19886
19887 andy, i'm not sure i really believe that you need to cycle the event
19888 code to get widgets to redisplay, but in any case you should
19889
19890 @enumerate
19891 @item
19892 hide the logic to do this in the c code; the lisp code should do
19893 nothing other than call (redisplay widget)
19894
19895 @item
19896 make sure your event-cycling code processes @strong{NO} events at all. this
19897 includes non-user events. queue the events instead.
19898 @end enumerate
19899
19900 in other words, dispatch-non-command-events must go, and i am proposing
19901 a general function (redisplay OBJECT) to replace the existing ad-hoc
19902 functions.
19903 @end example
19904
19905 @example
19906 April 6, 2000
19907
19908 the tab widget code should simply be able to create a whole lot of tabs
19909 without regard to the size of the gutter, and the surrounding layout
19910 widget (please please make layouts be proper widgets!) should
19911 automatically map and unmap them as necessary, to fill up the available
19912 space. perhaps this already works and what you're doing is just for
19913 optimization? but i get the feeling this is not the case.
19914 @end example
19915
19916 @example
19917 April 6, 2000
19918
19919 the function make-gutter-only-dialog-frame is bogus. the use of the
19920 gutter here to hold widgets is an implementation detail and should not
19921 be exposed in the interface. similarly, make-search-dialog should not
19922 have to do all the futzing that it does. creating the frame unmapped,
19923 creating an extent and messing with the gutter: all this stuff should be
19924 hidden. you should have a simple function make-dialog-frame that takes
19925 a dialog specification, and that's all you need to do.
19926
19927 also, these dialog boxes, and this function make-dialog-frame, should
19928
19929 a] be in dialog.el, not gutter-items.el.
19930 b] when possible, be placed in the interactive spec of standard lisp
19931 functions rather than accessed directly from menubar-items.el
19932 c] wrapped in calls to should-use-dialog-box-p, so the user has control
19933 over when dialog boxes appear.
19934 @end example
19935
19936 @example
19937 April 7, 2000
19938
19939 hmmm ... in that case, the whitespace absolutely needs to be specified
19940 as properties of the layout widget (e.g. :border-width and
19941 :border-height), rather than setting an overall size. you have no idea
19942 what the correct size should be if the user changes font size or uses
19943 translations in a different language.
19944
19945 Your modus operandi should be "hardcoded pixel sizes are @strong{always} bad."
19946 @end example
19947
19948 @example
19949 April 7, 2000
19950
19951 you mean the number of tabs adjusts, or the size of each tab adjusts (by
19952 making the font smaller or something)? if the size of a single tab is
19953 not related to the total space the tabs can fit into, then it should be
19954 possible to simply specify as many tabs as exist for buffers, and have
19955 the layout manager decide how many can fit into the available space.
19956 this does @strong{not} mean the layout manager will resize the tabs, because
19957 query-geometry on the tabs should find out that the tabs don't want to
19958 be any size other than they are.
19959
19960 the point here is that you should not @strong{have} to worry about pixel
19961 heights and widths @strong{anywhere} in Lisp-level code. The layout managers
19962 should take care of everything for you. The only exceptions may be in
19963 some text fields, which will be blank by default and you want to specify
19964 a maximum width (which should be done in 'n' sizes, not in pixels!).
19965
19966 i won't stop complaining until i see nearly every one of those
19967 pixel-width and pixel-height parameters gone, and the remaining ones
19968 there for a very, very good reason.
19969 @end example
19970
19971 @example
19972 April 7, 2000
19973
19974 Andy Piper wrote:
19975
19976 > At 03:51 PM 4/6/00 -0700, Ben Wing wrote:
19977 > >[the function make-gutter-only-dialog-frame is bogus]
19978 >
19979 > The problem is that some of the callbacks and such need access to the
19980 > @strong{created} frame, so you end up in a catch 22 unless you do what I've done.
19981
19982 [Ben proposes other ways to avoid exposing all the guts, as in
19983 @code{make-gutter-only-dialog-frame}:]
19984
19985 @enumerate
19986 @item
19987 Instead of passing in the actual glyph spec or glyph, pass in a
19988 function of two args (the dialog frame and its parents), which when
19989 called, creates and returns the appropriate glyph.
19990
19991 @item
19992 [Better] Provide a way for callbacks to determine where they were
19993 invoked at. This is much more general and is what you should really
19994 do. For example, have the code that calls the callbacks bind some
19995 global variables such as widget-callback-current-glyph and
19996 widget-callback-current-channel, which contain the glyph whose
19997 callback is being invoked, and the window or frame of the glyph
19998 (depending on where the glyph is) where the invocation actually
19999 happened. That way, the callbacks can easily figure out the dialog
20000 box and its parent, and not have to worry about embedding it in at
20001 creation time.
20002 @end enumerate
20003 @end example
20004
20005 @example
20006 April 15, 2000
20007 I don't understand when you say "the various types of callback". Are
20008 you using the callback for various different purposes?
20009
20010 Your widget callbacks should work just like any other callback: they
20011 take two arguments, one indicating the object to which the callback was
20012 attached (an image instance, i think), and the event that caused the
20013 callback to be invoked.
20014 @end example
20015
20016 @example
20017 April 17, 2000
20018
20019 I am completely vetoing widget-callback-current-channel. How about you
20020 create a new keyword, :new-callback, that is a function of two args,
20021 like i specified before.
20022
20023 btw if you really are calling your callback using call-interactively,
20024 why don't you declare a function (interactive "e") and then call
20025 event-channel on the resulting event? that should get you the same
20026 result as widget-callback-current-channel.
20027
20028 the problem with this and everything you've proposed is that there's no
20029 way, of course, to get at the actual widget that you were invoked from.
20030 would you propose adding widget-callback-current-widget?
20031 @end example
20032
20033 @node Old Future Work, Index, Future Work Discussion, Top
20034 @chapter Old Future Work
20035 @cindex old future work
20036 @cindex future work, old
20037
20038 This chapter includes proposals for future work that were later
20039 implemented. These proposals are included because they may describe to
20040 some extent the actual workings of the implemented code, and because
20041 they may discuss relevant design issues, alternative implementations, or
20042 work still to be done.
20043
20044
20045 @menu
20046 * Future Work -- A Portable Unexec Replacement::
20047 * Future Work -- Indirect Buffers::
20048 * Future Work -- Improvements in support for non-ASCII (European) keysyms under X::
20049 * Future Work -- xemacs.org Mailing Address Changes::
20050 * Future Work -- Lisp callbacks from critical areas of the C code::
20051 @end menu
20052
20053 @node Future Work -- A Portable Unexec Replacement, Future Work -- Indirect Buffers, Old Future Work, Old Future Work
20054 @section Future Work -- A Portable Unexec Replacement
20055 @cindex future work, a portable unexec replacement
20056 @cindex a portable unexec replacement, future work
20057
20058 @strong{Abstract:} Currently, during the build stage of XEmacs, a bare
20059 version of the program (called @dfn{temacs}) is run, which loads up a
20060 bunch of Lisp data and then writes out a modified executable file. This
20061 process is very tricky to implement and highly system-dependent. It can
20062 be replaced by a simple, mostly portable, and easy to implement scheme
20063 where the Lisp data is written out to a separate data file.
20064
20065 The scheme makes only three assumptions about the memory layout of a
20066 running XEmacs process, which, as far as I know, are met by all current
20067 implementations of XEmacs (and they're also requirements of the existing
20068 unexec scheme):
20069
20070 @enumerate
20071 @item
20072
20073 The initialized data segments of the various XEmacs modules are all laid
20074 out contiguously in memory and are separated from the initialized data
20075 segments of libraries that are linked with XEmacs; likewise for
20076 uninitialized data segments.
20077 @item
20078
20079 The beginning and end of the XEmacs portion of the combined initialized
20080 data segment can be programmatically determined; likewise for the
20081 uninitialized data segment.
20082 @item
20083
20084 The XEmacs portion of the initialized and uninitialized data segments
20085 are always loaded at the same place in memory.
20086
20087 @end enumerate
20088
20089 Assumption number three means that this scheme is non-relocatable, which
20090 is a disadvantage as compared to other, relocatable schemes that have
20091 been proposed. However, the advantage of this scheme over them is that
20092 it is much easier to implement and requires minimal changes to the
20093 XEmacs code base.
20094
20095 First, let's go over the theory behind the dumping mechanism. The
20096 principles that we would like to follow are:
20097
20098 @enumerate
20099 @item
20100
20101 We write out to disk all of the data structures and all of their
20102 sub-structures that we have created ourselves, except for data that is
20103 expected to change from invocation to invocation (in particular, data
20104 that is extracted from the external environment at run time).
20105 @item
20106
20107 We don't write out to disk any data structures created or initialized by
20108 system libraries, by the kernel or by any other code that we didn't
20109 create ourselves, because we can't count on that code working in the way
20110 that we want it to.
20111 @item
20112
20113 At the beginning of the next invocation of our program, we read in all
20114 those data structures that we have written out to disk, and then
20115 continue as if we had just created and initialized all of that data
20116 ourselves.
20117 @item
20118
20119 We make sure that our own data structures don't have any pointers to
20120 system data, or if they do, that we note all of these pointers so that
20121 we can re-create the system data and set up pointers to the data again
20122 in the next invocation.
20123 @item
20124
20125 During the next invocation of our program, we re-create all of our own
20126 data structures that are derived from the external environment.
20127
20128 @end enumerate
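
The write-out and read-back principles above can be illustrated with a
small sketch. It assumes that "our own data" is a single statically
allocated region, and the names @code{dump_region}, @code{load_region}
and @code{demo_dump_roundtrip} are hypothetical. The demo runs within
one process, so the region trivially sits at the same address when it is
reloaded; across invocations this relies on the third assumption listed
earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct node {
    int          value;
    struct node *next;   /* points into the same region */
};

/* Stand-in for "our own data": a statically allocated region. */
static struct node region[2];

/* Write LEN bytes starting at START to PATH.  Returns 0 on success. */
int dump_region(const char *path, const void *start, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(start, 1, len, f);
    int close_failed = (fclose(f) != 0);
    return (written == len && !close_failed) ? 0 : -1;
}

/* Read LEN bytes from PATH back to the same address START. */
int load_region(const char *path, void *start, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(start, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}

/* Build some linked data, dump it, wipe it, reload it, and check that
   the internal pointer is still valid.  Returns 1 on success. */
int demo_dump_roundtrip(const char *path)
{
    region[0].value = 1;
    region[0].next = &region[1];
    region[1].value = 2;
    region[1].next = NULL;

    if (dump_region(path, region, sizeof region) != 0)
        return 0;
    memset(region, 0, sizeof region);   /* simulate a fresh process */
    if (load_region(path, region, sizeof region) != 0)
        return 0;
    remove(path);
    return region[0].next == &region[1] && region[0].next->value == 2;
}
```

Because the bytes are written and read verbatim, pointers inside the
region need no fix-up, but any pointer to data outside the region would
be stale after a reload. That is the reason for the fourth principle.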
20129
20130 XEmacs, of course, is already set up to adhere to most of these
20131 principles.
20132
20133 In fact, the current dumping process that we are replacing handles a few
20134 of these principles slightly differently and adds a few of its own:
20135
20136 @enumerate
20137 @item
20138
20139 All data structures of all sorts, including system data, are written
20140 out. This is the cause of no end of problems, and it is avoidable,
20141 because we can ensure that our own data and the system data are
20142 physically separated in memory.
20143 @item
20144
20145 Our own data structures that we derive from the external environment are
20146 in fact written out and read in, but then are simply overwritten during
20147 the next invocation with new data. Before dumping, we make sure to free
20148 any such data structure that would cause memory leaks.
20149 @item
20150
20151 XEmacs carefully arranges things so that all static variables in the
20152 initialized data are never written to after the dumping stage has
20153 completed. This allows for an additional optimization in which we can
20154 make static initialized data segments in pre-dumped invocations of
20155 XEmacs be read-only and shared among all XEmacs processes on a single
20156 machine.
20157
20158 @end enumerate
20159
20160 The difficult part in this process is figuring out where our data
20161 structures lie in memory so that we can correctly write them out and
20162 read them back in. The trick that we use to make this problem solvable
20163 is to ensure that the heap that is used for all dynamically allocated
20164 data structures that are created during the dumping process is located
20165 inside the memory of a large, statically declared array. This ensures
20166 that all of our own data structures are contained (at least at the time
20167 that we dump out our data) inside the static initialized and
20168 uninitialized data segments, which are physically separated in memory
20169 from any data created by system libraries and whose starting and ending
20170 points are known and unchanging (we know that all of these things are
20171 true because we require them to be so, as preconditions of being able to
20172 make use of this method of dumping).
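
A heap that lives inside a statically declared array can be sketched as
a simple bump allocator. The names @code{dump_heap}, @code{dump_malloc}
and @code{in_dump_heap}, the 64KB size and the 16-byte alignment are
all assumptions made for illustration; a real allocator would also need
to support freeing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_HEAP_SIZE (64 * 1024)   /* arbitrary size for the sketch */
#define DUMP_ALIGN     16            /* must be a power of two */

/* The heap is an ordinary static array, so it lies inside the program's
   own data segment and its bounds are known. */
static _Alignas(DUMP_ALIGN) unsigned char dump_heap[DUMP_HEAP_SIZE];
static size_t dump_heap_used;

/* Carve SIZE bytes out of the static array.  Returns NULL when full. */
void *dump_malloc(size_t size)
{
    size_t rounded = (size + DUMP_ALIGN - 1) & ~(size_t) (DUMP_ALIGN - 1);
    if (rounded < size || rounded > DUMP_HEAP_SIZE - dump_heap_used)
        return NULL;                 /* overflow or out of space */
    void *p = dump_heap + dump_heap_used;
    dump_heap_used += rounded;
    return p;
}

/* Nonzero if P points into the static heap rather than system memory. */
int in_dump_heap(const void *p)
{
    uintptr_t a  = (uintptr_t) p;
    uintptr_t lo = (uintptr_t) dump_heap;
    return a >= lo && a < lo + DUMP_HEAP_SIZE;
}
```

An address-range test like @code{in_dump_heap} is one way to tell our
own allocations apart from memory obtained from the system allocator.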
20173
20174 In order to implement this method of heap allocation, we change the
20175 memory allocation function that we use for our own data. (It's
20176 extremely important that this function not be used to allocate system
20177 data. This means that we must not redefine the @code{malloc} function
20178 using the linker, but instead we need to achieve this using the C
20179 preprocessor, or by simply using a different name, such as
20180 @code{xmalloc}. It's also very important that we use the correct
20181 @code{free} function when freeing dynamically-allocated data, depending
20182 on whether this data was allocated by us or by the
20183
20184 @node Future Work -- Indirect Buffers, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- A Portable Unexec Replacement, Old Future Work
20185 @section Future Work -- Indirect Buffers
20186 @cindex future work, indirect buffers
20187 @cindex indirect buffers, future work
20188
20189 An indirect buffer is a buffer that shares its text with some other
20190 buffer, but has its own version of all of the buffer properties,
20191 including markers, extents, buffer local variables, etc. Indirect
20192 buffers are not currently implemented in XEmacs, but they are in GNU
20193 Emacs, and some people have asked for this feature. I consider this
20194 feature somewhat extent-related because much of the work required to
20195 implement this feature involves tracking extents properly.
20196
20197 In a world with indirect buffers, some buffers are direct, and some
20198 buffers are indirect. This only matters when there is more than one
20199 buffer sharing the same text. In such a case, one of the buffers can be
20200 considered the canonical buffer for the text in question. This buffer
20201 is a direct buffer, and all other buffers sharing the text are indirect
20202 buffers. These two kinds of buffers are created differently. One of
20203 them is created simply using the @code{make_buffer()} function (or
20204 perhaps the @code{Fget_buffer_create()} function), and the other kind is
20205 created using the @code{make_indirect_buffer()} function, which takes
20206 another buffer as an argument which specifies the text of the indirect
20207 buffer being created. Every indirect buffer keeps track of the direct
20208 buffer that is its parent, and every direct buffer keeps a list of all
20209 of its indirect buffer children. This list is modified as buffers are
20210 created and deleted. Because buffers are permanent objects, there is no
20211 special garbage collection-related trickery involved in these parent and
20212 children pointers. There should never be an indirect buffer whose
20213 parent is also an indirect buffer. If the user attempts to set up such
20214 a situation using @code{make_indirect_buffer()}, either an error should
20215 be signaled or the parent of the indirect buffer should automatically
20216 become the direct buffer that actually is responsible for the text.
20217 Deleting a direct buffer should perhaps cause all of the indirect buffer
20218 children to be deleted automatically. There should be Lisp functions
20219 for determining whether a buffer is direct or indirect, and other
20220 functions for retrieving the parents, or the children of the buffer,
20221 depending on which is appropriate. (The scheme being described here is
20222 similar to symbolic links. Another possible scheme would be analogous
20223 to hard links, and would make no distinction between direct and indirect
20224 buffers. In that case, the text of the buffer logically exists as an
20225 object separate from the buffer itself and only goes away when the last
20226 buffer pointing to this text is deleted.)
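
The parent and child bookkeeping described above can be sketched in C as follows. This is only an illustration under stated assumptions: the @code{struct buffer} layout, the field names, the argument lists, and the helper @code{buffer_indirect_p()} are all hypothetical, and only the names @code{make_buffer()} and @code{make_indirect_buffer()} come from the text. The sketch takes the second of the two options mentioned for an indirect parent, silently redirecting to the direct buffer that owns the text rather than signaling an error.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for the real buffer object. */
struct buffer
{
  struct buffer *parent;       /* NULL for a direct buffer */
  struct buffer *first_child;  /* direct buffers only: head of child list */
  struct buffer *next_sibling; /* link within the parent's child list */
  char *text;                  /* the shared text, owned by the direct buffer */
};

/* Create a direct buffer for TEXT (assumed signature). */
static struct buffer *
make_buffer (char *text)
{
  struct buffer *b = calloc (1, sizeof *b);
  if (b != NULL)
    b->text = text;
  return b;
}

/* Create an indirect buffer sharing the text of BASE (assumed signature).
   If BASE is itself indirect, use its parent instead, so that no indirect
   buffer ever has an indirect parent. */
static struct buffer *
make_indirect_buffer (struct buffer *base)
{
  struct buffer *b = calloc (1, sizeof *b);
  if (b == NULL)
    return NULL;
  if (base->parent != NULL)
    base = base->parent;
  b->parent = base;
  b->text = base->text;
  /* Push the new buffer onto the front of the parent's child list. */
  b->next_sibling = base->first_child;
  base->first_child = b;
  return b;
}

/* Nonzero if B is an indirect buffer (hypothetical helper). */
static int
buffer_indirect_p (const struct buffer *b)
{
  return b->parent != NULL;
}
```

With this layout, creating an indirect buffer from another indirect buffer yields a new child of the original direct buffer, and the Lisp-level predicates and accessors mentioned above reduce to reading @code{parent} and walking the child list.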
20227
20228 Other than keeping track of parent and child pointers, the only remaining
20229 thing required to implement indirect buffers is to ensure that changes
20230 to the text of the buffer trigger the same sorts of effect in all the
20231 buffers that share that text. Luckily there are only three functions in
20232 XEmacs that actually make changes to the text of the buffer, and they
20233 are all located in the file @file{insdel.c}.
20234
20235 These three functions are called @code{buffer_insert_string_1()},
20236 @code{buffer_delete_range()}, and @code{buffer_replace_char()}. All of
20237 the subfunctions called by these functions are also in @file{insdel.c}.
20238
20239 The first thing that each of these three functions needs to do is check
20240 to see if its buffer argument is an indirect buffer, and if so, convert
20241 it to the indirect buffer's parent. Once that is done, the functions
20242 need to be modified so that everything they do other than actually
20243 changing the buffer's text (such as calling
20244 @code{before-change-functions} and @code{after-change-functions}, and
20245 updating extents and markers) is done for the buffer being modified
20246 and for all of its indirect children. Each step in the
20247 process needs to be iterated for all of
20248 the buffers in question before proceeding to the next step. For
20249 example, in @code{buffer_insert_string_1()},
20250 @code{prepare_to_modify_buffer()} needs to be called in turn, for all of
20251 the buffers sharing the text being modified. Then the text itself is
20252 modified, then @code{insert_invalidate_line_number_cache()} is called
20253 for all of the buffers, then @code{record_insert()} is called for all of
20254 the buffers, etc. Essentially, the operation is being done on all of
20255 the buffers in parallel, rather than each buffer being processed in
20256 series. This is necessary because many of the steps can quit or call
20257 Lisp code and each step depends on the previous step, and some steps are
20258 done only once, rather than on each buffer. I imagine it would be
20259 significantly easier to implement this if a macro were created for
20260 iterating over a buffer and then all of the indirect children of that
20261 buffer.
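
A minimal sketch of such an iteration macro, and of the step-by-step ordering described above, might look like this. Everything here is an assumption made for illustration: the macro name @code{MAP_OVER_SHARING_BUFFERS}, the simplified @code{struct buffer}, and the stand-in step functions, which only count calls instead of doing the real work of @code{prepare_to_modify_buffer()} or @code{record_insert()}.

```c
#include <stddef.h>

/* Hypothetical, simplified buffer; only what the sketch needs. */
struct buffer
{
  struct buffer *parent;       /* NULL for a direct buffer */
  struct buffer *first_child;  /* head of the indirect-children list */
  struct buffer *next_sibling; /* next indirect child of the same parent */
  int text_length;             /* meaningful in the direct buffer only */
  int prepare_calls;           /* stand-in state for the "prepare" step */
  int recorded_chars;          /* stand-in state for the "record" step */
};

/* Visit DIRECT first, then each of its indirect children in turn.
   DIRECT is evaluated more than once, so it must have no side effects. */
#define MAP_OVER_SHARING_BUFFERS(var, direct)                   \
  for ((var) = (direct); (var) != NULL;                         \
       (var) = ((var) == (direct)                               \
                ? (direct)->first_child                         \
                : (var)->next_sibling))

/* Stand-ins for the per-buffer steps of an insertion. */
static void
prepare_one (struct buffer *b)
{
  b->prepare_calls++;
}

static void
record_one (struct buffer *b, int length)
{
  b->recorded_chars += length;
}

/* Insert LENGTH characters into the text shared by BUF.  Each step is
   completed for every sharing buffer before the next step begins. */
static void
insert_in_sharing_buffers (struct buffer *buf, int length)
{
  struct buffer *b;

  /* Convert an indirect buffer to its direct parent. */
  if (buf->parent != NULL)
    buf = buf->parent;

  MAP_OVER_SHARING_BUFFERS (b, buf)
    prepare_one (b);

  /* The text itself is modified exactly once. */
  buf->text_length += length;

  MAP_OVER_SHARING_BUFFERS (b, buf)
    record_one (b, length);
}
```

The point of the macro is that each existing step becomes a one-line loop, while steps that must happen only once (here, the change to the text) stay outside any loop.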
20262
20263 @node Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- xemacs.org Mailing Address Changes, Future Work -- Indirect Buffers, Old Future Work
20264 @section Future Work -- Improvements in support for non-ASCII (European) keysyms under X
20265 @cindex future work, improvements in support for non-ascii (european) keysyms under x
20266 @cindex improvements in support for non-ascii (european) keysyms under x, future work
20267
20268 From Martin Buchholz.
20269
20270 If a user has a keyboard with known standard non-ASCII character
20271 equivalents, typically for European users, then Emacs' default
20272 binding should be self-insert-command, with the obvious character
20273 inserted. For example, if a user has a keyboard with
20274
@example
20275 xmodmap -e "keycode 54 = scaron"
@end example
20276
20277 then pressing that key on the keyboard will insert the (Latin-2)
20278 character corresponding to "scaron" into the buffer.
20279
20280 Note: Emacs 20.6 does NOTHING when pressing such a key (not even an
20281 error), i.e. even (read-event) ignores this key, which means it can't
20282 even be bound to anything by a user trying to customize it.
20283
20284 This is implemented by maintaining a table of translations between all
20285 the known X keysym names and the corresponding (charset, octet) pairs.
20286
20287 For every key on the keyboard that has a known character correspondence,
20288 we define the ascii-character property of the keysym, and make the
20289 default binding for the key be self-insert-command.
20290
20291 The following magic is basically intimate knowledge of X11/keysymdef.h.
20292 The keysym mappings defined by X11 are based on the iso8859 standards,
20293 except for Cyrillic and Greek.
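
The translation table described above could be sketched in C roughly as follows. The type names, the @code{charset_id} enumeration, and the function @code{lookup_keysym()} are hypothetical and are not taken from the XEmacs sources. The keysym names are standard X keysym names, and the octets are their code points in ISO 8859-1 and ISO 8859-2 respectively.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical charset identifiers for this sketch only. */
enum charset_id
{
  CHARSET_LATIN_ISO8859_1,
  CHARSET_LATIN_ISO8859_2
};

/* One entry: an X keysym name and the (charset, octet) pair it maps to. */
struct keysym_translation
{
  const char *name;
  enum charset_id charset;
  unsigned char octet;
};

/* A tiny excerpt; a real table would cover every known keysym. */
static const struct keysym_translation keysym_table[] = {
  { "eacute",     CHARSET_LATIN_ISO8859_1, 0xE9 },
  { "adiaeresis", CHARSET_LATIN_ISO8859_1, 0xE4 },
  { "scaron",     CHARSET_LATIN_ISO8859_2, 0xB9 },
  { "zcaron",     CHARSET_LATIN_ISO8859_2, 0xBE },
};

/* Return the entry for keysym NAME, or NULL if NAME has no known
   character equivalent (for example, a function key). */
static const struct keysym_translation *
lookup_keysym (const char *name)
{
  size_t i;
  for (i = 0; i < sizeof keysym_table / sizeof keysym_table[0]; i++)
    if (strcmp (keysym_table[i].name, name) == 0)
      return &keysym_table[i];
  return NULL;
}
```

A key whose keysym is found in the table would get the character property and a default binding of @code{self-insert-command}; a key that is not found keeps its ordinary handling.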
20294
20295 In a non-Mule world, a user can still have a multi-lingual editor, by doing
@example
20296 (set-face-font "...-iso8859-2" (current-buffer))
@end example
@noindent
20297 for all their Latin-2 buffers, etc.
20298
20299 @node Future Work -- xemacs.org Mailing Address Changes, Future Work -- Lisp callbacks from critical areas of the C code, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work
20300 @section Future Work -- xemacs.org Mailing Address Changes
20301 @cindex future work, xemacs.org mailing address changes
20302 @cindex xemacs.org mailing address changes, future work
20303
20304 @subheading Personal addresses
20305
20306 @enumerate
20307 @item
20308
20309 Everyone who is contributing or has ever contributed code to the XEmacs
20310 core, or to any of the packages archived at xemacs.org, should have a
20311 mailing address at xemacs.org, even if they don't actually have an
20312 account on any machine there. In fact, all of these people should have
20313 two mailing addresses at xemacs.org: one that is their actual login name
20314 (or potential login name if they were ever to have an account), and one
20315 in first name/last name form, as things are done at Sun. For example,
20316 Martin would have two addresses at xemacs.org, @code{martin@@xemacs.org},
20317 and @code{martin.buchholz@@xemacs.org}, with the latter one simply being
20318 an alias for the former. The idea is that in all cases, if you simply
20319 know the name of any past or present contributor to XEmacs, and you want
20320 to mail them, you will know immediately how to do this without having to
20321 do any complicated searching on the Web or in XEmacs documentation.
20322 @item
20323
20324 Furthermore, I think that all of the email addresses mentioned anywhere
20325 in the XEmacs source code or documentation should be changed to be the
20326 corresponding ones at xemacs.org, instead of any other email addresses
20327 that any contributors might have.
20328 @item
20329
20330 All the places in the source code where a contributor's name is
20331 mentioned, but no email address is attached, should be found, and the
20332 correct xemacs.org address should be attached.
20333 @item
20334
20335 The alias file mapping people's addresses at xemacs.org to their actual
20336 addresses elsewhere (in the case, as will be true for the majority of
20337 addresses, where the contributor does not actually have an account at
20338 xemacs.org, but simply a forwarding pointer), should be viewable on the
20339 xemacs.org web site through a CGI script that reads the alias file and
20340 turns it into an HTML table.
20341
20342 @end enumerate
20343
20344 @subheading Package addresses
20345
20346 I also think that for every package archived at xemacs.org, there should
20347 be three corresponding email addresses at xemacs.org. For example,
20348 consider a package such as @code{lazy-shot}. The addresses associated
20349 with this package would be:
20350
20351 @table @code
20352 @item lazy-shot@@xemacs.org
20353 This is a discussion mailing list about the @code{lazy-shot} package,
20354 and it should be controlled by Majordomo in the standard fashion.
20355 @item lazy-shot-patches@@xemacs.org
20356 This is where patches to the @code{lazy-shot} package are sent. This
20357 should go to various people who are interested in such patches: for
20358 example, the maintainer of @code{lazy-shot}, perhaps the maintainer of
20359 XEmacs itself, and probably to other people who have volunteered to do
20360 code review for this package, or for a larger group of packages that
20361 this package is in. Perhaps this list should also be maintained by
20362 Majordomo.
20363 @item lazy-shot-maintainer@@xemacs.org
20364 This address is for mailing the maintainer directly. It is possible
20365 that this will go to more than one person. This would particularly be
20366 the case, for example, if the maintainer is dormant or does not appear
20367 very responsive to patches. In this case, the address would also point
20368 to someone like Steve, who is acting in the maintainer's stead, and who
20369 will himself apply patches or make other changes to the package as
20370 maintained in the CVS archive on xemacs.org.
20371 @end table
20372
20373 It may take a bit of work to track down the current addresses for the
20374 various package maintainers, and may in general seem like a lot of work
20375 to set up all of these mail addresses, but I think it's very important
20376 to make it as easy as possible for random XEmacs users to be able to
20377 submit patches and report bugs in an orderly fashion. The general idea
20378 that I'm striving for is to create as much momentum as possible in the
20379 XEmacs development community, and I think having the system of mail
20380 addresses set up will make it much easier for this momentum to be built
20381 up and to remain.
20382
20383 @uref{../../www.666.com/ben/default.htm,Ben Wing}
20384
20385 @node Future Work -- Lisp callbacks from critical areas of the C code, , Future Work -- xemacs.org Mailing Address Changes, Old Future Work
20386 @section Future Work -- Lisp callbacks from critical areas of the C code
20387 @cindex future work, lisp callbacks from critical areas of the c code
20388 @cindex lisp callbacks from critical areas of the c code, future work
20389
20390 @example
20391 There are many places in the XEmacs C code where Lisp functions are
20392 called, usually because the Lisp function is acting as a callback,
20393 hook, process filter, or the like. The Lisp code is often called in
20394 places where some Lisp operations are dangerous. Currently there are
20395 a lot of ad-hoc schemes implemented to try to prevent these dangerous
20396 operations from causing problems. I've added a lot of them myself,
20397 for example, the @code{call*_trapping_errors()} functions. Other places,
20398 such as the pre-gc- and post-gc-hooks, do their own ad hoc processing.
20399 I'm proposing a scheme that would generalize all of this ad hoc code
20400 and allow Lisp code to be called in all sorts of sensitive areas of
20401 the C code, including even within redisplay.
20402
20403 Basically, we define a set of operations that are disallowable because
20404 they are dangerous. We essentially assign a bit flag to all of these
20405 operations. Whenever any sensitive C code wants to call Lisp code,
20406 instead of using the standard call* functions, it uses a new set of
20407 functions, call*_critical, which takes an extra parameter, which is a
20408 bit mask specifying the set of operations which are disallowed. The
20409 basic operation of these functions is simply to set a global variable
20410 corresponding to the bit mask (more specifically, the functions store
20411 the previous value of this global variable in an unwind_protect, and
20412 use bitwise-or to combine the previous value with the new bit mask
20413 that was passed in). (Actually, we should first implement a slightly
20414 lower level function which is called @code{enter_sensitive_code_section()},
20415 which simply sets up the global variable and the @code{unwind_protect()}, and
20416 returns a @code{specbind()} value, but doesn't actually call any Lisp code.
20417 There is a corresponding function @code{exit_sensitive_code_section()}, which
20418 takes the specbind value as an argument, and unwinds the
20419 unwind_protect. The call*_critical functions are trivially
20420 implemented in terms of these lower level functions.)
20421
20422 The sets of dangerous operations which can be prohibited are listed
20423 below. Each entry is preceded by the C name of its bit flag.
20425
20426 OPERATION_GC_PROHIBITED
20427 1. garbage collection. When this flag is set, and the garbage
20428 collection threshold is reached, garbage collection simply doesn't
20429 happen. It will happen at the next opportunity that it is allowed.
20430 Similarly, explicitly calling the Lisp function garbage-collect
20431 simply does nothing.
20432
20433 OPERATION_CATCH_ERRORS
20434 2. signalling an error. When the bit flag corresponding to this
20435 prohibited operation is passed to
20436 @code{enter_sensitive_code_section()}, a catch is set up which catches
20437 all errors, signals a warning with @code{warn_when_safe()}, and then
20438 simply continues. This is exactly the same behavior you now get with
20439 the @code{call*_trapping_errors()} functions. (There should also be
20440 some way of specifying a warning level and class here, similar to the
20441 @code{call*_trapping_errors()} functions. This is not completely
20443 important, however, because a standard warning level and class
20444 could simply be chosen.)
20445
20446 OPERATION_NO_UNSAFE_OBJECT_DELETION
20447 3. This flag prohibits deletion of any permanent object (i.e. any
20448 object that does not automatically disappear when created, such as
20449 buffers, frames, devices, windows, etc...) unless they were created
20450 after this bit flag was set. This would be implemented using a
20451 list which stores all of the permanent objects created after this
20452 bit flag was set. This list is reset to its previous value when
20453 the call to @code{exit_sensitive_code_section()} occurs. The motivation
20454 here is to allow Lisp callbacks to create their own temporary
20455 buffers or frames, and later delete them, but not allow any other
20456 permanent objects to be deleted, because C code might be working
20457 with them, and not expect them to change.
20458
20459 OPERATION_NO_BUFFER_MODIFICATION
20460 4. This flag disallows modifications to the text, extent or any other
20461 properties of any buffers except those created after this flag was
20462 set, just like in the previous entry.
20463
20464 OPERATION_NO_REDISPLAY
20465 5. This bit flag inhibits any redisplay-related operations from
20466 happening, more specifically, any entry into the redisplay-related
20467 code. This includes, for example, the Lisp functions sit-for,
20468 force-redisplay, force-cursor-redisplay, window-end with certain
20469 arguments to it, and various other functions. When this flag is
20470 set, instead of entering the redisplay code, the calling function
20471 should simply make sure not to enter the redisplay code, (for
20472 example, in the case of window-end), or postpone the redisplay
20473 until such a time when it's safe (for example, with sit-for and
20474 force-redisplay).
20475
20476 OPERATION_NO_REDISPLAY_SETTINGS_CHANGE
20477 6. This flag prohibits any modifications to faces, glyphs, specifiers,
20478 extents, or any other settings that will affect the way that any
20479 window is displayed.
20480
20481
20482 The idea here is that it will finally be safe to call Lisp code from
20483 nearly any part of the C code, simply by setting any combination of
20484 restricted operation bit flags. This even includes from within
20485 redisplay. (In such a case, all of the bit flags need to be set.) The
20486 reason that I thought of this is that some coding system translations
20487 might cause Lisp code to be invoked and C code often invokes these
20488 translations in sensitive places.
20489 @end example
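
The core of the proposal can be sketched in C as follows. This is a rough illustration under stated assumptions, not the actual implementation: the flag names come from the list above, but their numeric values, the global variable @code{prohibited_operations}, and the function @code{maybe_garbage_collect()} are hypothetical. The real functions would save the previous mask through @code{unwind_protect} so that it is also restored on a non-local exit; here the caller simply passes the saved value back.

```c
/* Bit flags for the prohibited operations; the values are assumed. */
enum
{
  OPERATION_GC_PROHIBITED                = 1 << 0,
  OPERATION_CATCH_ERRORS                 = 1 << 1,
  OPERATION_NO_UNSAFE_OBJECT_DELETION    = 1 << 2,
  OPERATION_NO_BUFFER_MODIFICATION       = 1 << 3,
  OPERATION_NO_REDISPLAY                 = 1 << 4,
  OPERATION_NO_REDISPLAY_SETTINGS_CHANGE = 1 << 5
};

/* Hypothetical global: the set of operations currently disallowed. */
static unsigned int prohibited_operations;

/* Add MASK to the prohibited set and return the previous set.  Nested
   sections therefore only ever add restrictions. */
static unsigned int
enter_sensitive_code_section (unsigned int mask)
{
  unsigned int previous = prohibited_operations;
  prohibited_operations |= mask;
  return previous;
}

/* Restore the set that was in effect before the matching enter call. */
static void
exit_sensitive_code_section (unsigned int previous)
{
  prohibited_operations = previous;
}

/* Stand-in for a garbage collection entry point: when collection is
   prohibited, remember that one is pending and do nothing. */
static int gc_pending;

static int
maybe_garbage_collect (void)
{
  if (prohibited_operations & OPERATION_GC_PROHIBITED)
    {
      gc_pending = 1;
      return 0;               /* postponed until it is allowed again */
    }
  gc_pending = 0;
  return 1;                   /* a real collection would run here */
}
```

Each dangerous primitive would test its own flag in the same way that @code{maybe_garbage_collect()} does here, and a call*_critical function would be a call to the ordinary call* function bracketed by the enter and exit functions.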
20490
20491 @c Indexing guidelines
20492
20493 @c I assume that all indexes will be combined.
20494 @c Therefore, if a generated findex and permutations
20495 @c cover the ways an index user would look up the entry,
20496 @c then no cindex is added.
20497 @c Concept index (cindex) entries will also be permuted. Therefore, they
20498 @c have no commas and few irrelevant connectives in them.
20499
20500 @c I tried to include words in a cindex that give the context of the entry,
20501 @c particularly if there is more than one entry for the same concept.
20502 @c For example, "nil in keymap"
20503 @c Similarly for explicit findex and vindex entries, e.g. "print example".
20504
20505 @c Error codes are given cindex entries, e.g. "end-of-file error".
20506
20507 @c pindex is used for .el files and Unix programs
20508
20509 @node Index, , Old Future Work, Top
20510 @unnumbered Index
20511
20512 @ignore
20513 All variables, functions, keys, programs, files, and concepts are
20514 in this one index.
20515
20516 All names and concepts are permuted, so they appear several times, one
20517 for each permutation of the parts of the name. For example,
20518 @code{function-name} would appear as @b{function-name} and @b{name,
20519 function-}. Key entries are not permuted, however.
20520 @end ignore
20521
20522 @c Print the indices
20523
20524 @printindex fn
20525
20526 @c Print the tables of contents
20527 @summarycontents
20528 @contents
20529 @c That's all