comparison man/internals/internals.texi @ 2362:6aa56b089139
[xemacs-hg @ 2004-11-02 09:51:04 by ben]
To: xemacs-patches@xemacs.org
internals/index.texi: Deleted.
Incorporated into internals.texi. Having a separate
index file messes up texinfo-master-menu.
internals/internals.texi:
Add bunches and bunches and bunches and bunches of stuff, taken
from documentation floating around in various places -- text.c,
file-coding.c, other .c and .h files, stuff that I wrote up for an
old XEmacs contract, proposals written up in the process of an
e-mail discussion, etc. Fix up some mistakes, esp. in CCL. Extra
crap from CCL, duplicated with Lispref, removed. Sections on Old
Future Work and Future Work Discussion added.
Bunches of other work. Add bunches of documentation taken from the
source code. Fixup various places to use @strong{}, @code{},
@file{}. Create new Text chapter, split off from Buffers and
Textual Representation. Create new chapter for MS Windows, mostly
written from scratch. Consolidate all Mule info under
"Multilingual Support". Break up chapter on modules and move some
parts to the sections discussing the modules, for consolidation
purposes. Add a big cross-reference table for all the modules to
where they're discussed (or not). New chapter Asynchronous
Events; Quit Checking. (Taken from various parts of the code.) New
Introduction. New section on Focus Handling (from the code).
NOTE that in the process, I discovered that we essentially have
FOUR redundant introductions to Mule issues! Someone really needs
to go through and clean them up and integrate them (sjt?).
author | ben |
---|---|
date | Tue, 02 Nov 2004 09:51:18 +0000 |
parents | e13775448cf0 |
children | ce4aa0ef8af1 |
comparison
2361:5ff532e448b5 | 2362:6aa56b089139 |
---|---|
8 @dircategory XEmacs Editor | 8 @dircategory XEmacs Editor |
9 @direntry | 9 @direntry |
10 * Internals: (internals). XEmacs Internals Manual. | 10 * Internals: (internals). XEmacs Internals Manual. |
11 @end direntry | 11 @end direntry |
12 | 12 |
13 Copyright @copyright{} 1992 - 1996 Ben Wing. | 13 Edition History: |
14 | |
15 Created November 1995 (?) by Ben Wing. | |
16 XEmacs Internals Manual Version 1.0, March, 1996. | |
17 XEmacs Internals Manual Version 1.1, March, 1997. | |
18 XEmacs Internals Manual Version 1.4, March, 2001. | |
19 XEmacs Internals Manual Version 21.5, October, 2004. | |
20 @c Please REMEMBER to update edition number in *four* places in this file, | |
21 @c including adding a line above. | |
22 | |
23 Copyright @copyright{} 1992 - 2004 Ben Wing. | |
14 Copyright @copyright{} 1996, 1997 Sun Microsystems. | 24 Copyright @copyright{} 1996, 1997 Sun Microsystems. |
15 Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation. | 25 Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation. |
16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. | 26 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. |
17 | 27 |
18 | 28 |
61 @setchapternewpage odd | 71 @setchapternewpage odd |
62 @finalout | 72 @finalout |
63 | 73 |
64 @titlepage | 74 @titlepage |
65 @title XEmacs Internals Manual | 75 @title XEmacs Internals Manual |
66 @subtitle Version 1.4, March 2001 | 76 @subtitle Version 21.5, October 2004 |
67 | 77 |
68 @author Ben Wing | 78 @author Ben Wing |
79 @sp 1 | |
80 | |
81 Improvements by | |
82 | |
83 @sp 1 | |
84 | |
85 @author Stephen Turnbull | |
69 @author Martin Buchholz | 86 @author Martin Buchholz |
70 @author Hrvoje Niksic | 87 @author Hrvoje Niksic |
71 @author Matthias Neubauer | 88 @author Matthias Neubauer |
72 @author Olivier Galibert | 89 @author Olivier Galibert |
90 @author Andy Piper | |
91 | |
92 | |
73 @page | 93 @page |
74 @vskip 0pt plus 1fill | 94 @vskip 0pt plus 1fill |
75 | 95 |
76 @noindent | 96 @noindent |
77 Copyright @copyright{} 1992 - 1996, 2001 Ben Wing. @* | 97 Copyright @copyright{} 1992 - 2004 Ben Wing. @* |
78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @* | 98 Copyright @copyright{} 1996, 1997 Sun Microsystems. @* |
79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @* | 99 Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation. @* |
80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. | 100 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. |
81 | 101 |
82 @sp 2 | 102 @sp 2 |
83 Version 1.4 @* | 103 Version 21.5 @* |
84 March 2001.@* | 104 October 2004.@* |
85 | 105 |
86 Permission is granted to make and distribute verbatim copies of this | 106 Permission is granted to make and distribute verbatim copies of this |
87 manual provided the copyright notice and this permission notice are | 107 manual provided the copyright notice and this permission notice are |
88 preserved on all copies. | 108 preserved on all copies. |
89 | 109 |
100 included in a translation approved by the Free Software Foundation | 120 included in a translation approved by the Free Software Foundation |
101 instead of in the original English. | 121 instead of in the original English. |
102 @end titlepage | 122 @end titlepage |
103 @page | 123 @page |
104 | 124 |
105 @node Top, A History of Emacs, (dir), (dir) | 125 @node Top, Introduction, (dir), (dir) |
106 | 126 |
107 @ifinfo | 127 @ifinfo |
108 This Info file contains v1.4 of the XEmacs Internals Manual, March 2001. | 128 This Info file contains v21.5 of the XEmacs Internals Manual, October 2004. |
109 @end ifinfo | 129 @end ifinfo |
110 | 130 |
131 @c Don't update this by hand!!!!!! | |
132 @c Use C-u C-c C-u m (aka C-u M-x texinfo-master-menu). | |
133 @c NOTE: This command does not include the Index:: menu entry. | |
134 @c You must add it by hand. | |
135 | |
136 @c Here are some useful Lisp routines for quickly Texinfo-izing text that | |
137 @c has been formatted into ASCII lists and tables. The first routine is | |
138 @c currently more general and well-developed than the second. | |
139 | |
140 @c (defun list-to-texinfo (b e) | |
141 @c "Convert the selected region from an ASCII list to a Texinfo list." | |
142 @c (interactive "r") | |
143 @c (save-restriction | |
144 @c (narrow-to-region b e) | |
145 @c (goto-char (point-min)) | |
146 @c (let ((dash-type "^ *-+ +") | |
147 @c (num-type "^ *[[(]?\\([0-9]+\\|[a-z]\\)[]).] +") | |
148 @c dash) | |
149 @c (save-excursion | |
150 @c (cond ((re-search-forward num-type nil t)) | |
151 @c ((re-search-forward dash-type nil t) (setq dash t)) | |
152 @c (t (error "No table entries?")))) | |
153 @c (if dash (insert "@itemize @bullet\n") | |
154 @c (insert "@enumerate\n")) | |
155 @c (while (re-search-forward (if dash dash-type num-type) nil t) | |
156 @c (let ((p (point))) | |
157 @c (or (re-search-forward (if dash dash-type num-type) nil t) | |
158 @c (goto-char (point-max))) | |
159 @c (beginning-of-line) | |
160 @c (forward-line -1) | |
161 @c (let ((q (point))) | |
162 @c (goto-char p) | |
163 @c (kill-rectangle p q)) | |
164 @c (insert "@item\n"))) | |
165 @c (goto-char (point-max)) | |
166 @c (beginning-of-line) | |
167 @c (if dash (insert "@end itemize\n") | |
168 @c (insert "@end enumerate\n"))))) | |
169 | |
170 @c (defun table-to-texinfo (b e) | |
171 @c "Convert the selected region from an ASCII table to a Texinfo table." | |
172 @c (interactive "r") | |
173 @c (save-restriction | |
174 @c (narrow-to-region b e) | |
175 @c (goto-char (point-min)) | |
176 @c (insert "@table @code\n") | |
177 @c (while (not (eobp)) | |
178 @c (insert "@item ") | |
179 @c (forward-sexp) | |
180 @c (delete-char) | |
181 @c (insert "\n") | |
182 @c (or (search-forward "\n\n" nil t) | |
183 @c (goto-char (point-max)))) | |
184 @c (beginning-of-line) | |
185 @c (insert "@end table\n"))) | |
186 | |
187 @c A useful Lisp routine for adding markup based on conventions used in plain | |
188 @c text files; see doc string below. | |
189 | |
190 @c (defun convert-text-to-texinfo (&optional no-narrow) | |
191 @c "Convert text to Texinfo. | |
192 @c If the region is active, do the region; otherwise, go from point to the end | |
193 @c of the buffer. This query-replaces for various kinds of conventions used | |
194 @c in text: @code{} surrounded by ` and ' or followed by a (); @strong{} | |
195 @c surrounded by *'s; @file{} something that looks like a file name." | |
196 @c (interactive) | |
197 @c (if (and (not no-narrow) (region-active-p)) | |
198 @c (save-restriction | |
199 @c (narrow-to-region (region-beginning) (region-end)) | |
200 @c (goto-char (point-min)) (convert-text-to-texinfo t)) | |
201 @c (let ((p (point)) | |
202 @c (case-replace nil)) | |
203 @c (query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil) | |
204 @c (goto-char p) | |
205 @c (query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil) | |
206 @c (goto-char p) | |
207 @c (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil) | |
208 @c (goto-char p) | |
209 @c (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil) | |
210 @c ))) | |
211 | |
111 @menu | 212 @menu |
213 * Introduction:: Overview of this manual. | |
214 * Authorship of XEmacs:: | |
112 * A History of Emacs:: Times, dates, important events. | 215 * A History of Emacs:: Times, dates, important events. |
113 * XEmacs From the Outside:: A broad conceptual overview. | 216 * XEmacs From the Outside:: A broad conceptual overview. |
114 * The Lisp Language:: An overview. | 217 * The Lisp Language:: An overview. |
115 * XEmacs From the Perspective of Building:: | 218 * XEmacs From the Perspective of Building:: |
116 * Build-Time Dependencies:: | 219 * Build-Time Dependencies:: |
117 * XEmacs From the Inside:: | 220 * XEmacs From the Inside:: |
118 * The XEmacs Object System (Abstractly Speaking):: | 221 * The XEmacs Object System (Abstractly Speaking):: |
119 * How Lisp Objects Are Represented in C:: | 222 * How Lisp Objects Are Represented in C:: |
120 * Major Textual Changes:: | 223 * Major Textual Changes:: |
121 * Rules When Writing New C Code:: | 224 * Rules When Writing New C Code:: |
122 * Regression Testing XEmacs:: | 225 * Regression Testing XEmacs:: |
123 * CVS Techniques:: | 226 * CVS Techniques:: |
124 * A Summary of the Various XEmacs Modules:: | 227 * The Modules of XEmacs:: |
125 * Allocation of Objects in XEmacs Lisp:: | 228 * Allocation of Objects in XEmacs Lisp:: |
126 * Dumping:: | 229 * Dumping:: |
127 * Events and the Event Loop:: | 230 * Events and the Event Loop:: |
128 * Evaluation; Stack Frames; Bindings:: | 231 * Asynchronous Events; Quit Checking:: |
129 * Symbols and Variables:: | 232 * Evaluation; Stack Frames; Bindings:: |
130 * Buffers and Textual Representation:: | 233 * Symbols and Variables:: |
131 * MULE Character Sets and Encodings:: | 234 * Buffers:: |
132 * The Lisp Reader and Compiler:: | 235 * Text:: |
133 * Lstreams:: | 236 * Multilingual Support:: |
134 * Consoles; Devices; Frames; Windows:: | 237 * The Lisp Reader and Compiler:: |
135 * The Redisplay Mechanism:: | 238 * Lstreams:: |
136 * Extents:: | 239 * Consoles; Devices; Frames; Windows:: |
137 * Faces:: | 240 * The Redisplay Mechanism:: |
138 * Glyphs:: | 241 * Extents:: |
139 * Specifiers:: | 242 * Faces:: |
140 * Menus:: | 243 * Glyphs:: |
141 * Subprocesses:: | 244 * Specifiers:: |
142 * Interface to the X Window System:: | 245 * Menus:: |
143 * Index:: | 246 * Subprocesses:: |
247 * Interface to MS Windows:: | |
248 * Interface to the X Window System:: | |
249 * Future Work:: | |
250 * Future Work Discussion:: | |
251 * Old Future Work:: | |
252 * Index:: | |
144 | 253 |
145 @detailmenu | 254 @detailmenu |
146 | 255 --- The Detailed Node Listing --- |
147 --- The Detailed Node Listing --- | |
148 | 256 |
149 A History of Emacs | 257 A History of Emacs |
150 | 258 |
151 * Through Version 18:: Unification prevails. | 259 * Through Version 18:: Unification prevails. |
152 * Lucid Emacs:: One version 19 Emacs. | 260 * Lucid Emacs:: One version 19 Emacs. |
153 * GNU Emacs 19:: The other version 19 Emacs. | 261 * GNU Emacs 19:: The other version 19 Emacs. |
154 * GNU Emacs 20:: The other version 20 Emacs. | 262 * GNU Emacs 20:: The other version 20 Emacs. |
155 * XEmacs:: The continuation of Lucid Emacs. | 263 * XEmacs:: The continuation of Lucid Emacs. |
156 | 264 |
265 Major Textual Changes | |
266 | |
267 * Great Integral Type Renaming:: | |
268 * Text/Char Type Renaming:: | |
269 | |
157 Rules When Writing New C Code | 270 Rules When Writing New C Code |
158 | 271 |
159 * General Coding Rules:: | 272 * A Reader's Guide to XEmacs Coding Conventions:: |
160 * Writing Lisp Primitives:: | 273 * General Coding Rules:: |
161 * Writing Good Comments:: | 274 * Object-Oriented Techniques for C:: |
162 * Adding Global Lisp Variables:: | 275 * Writing Lisp Primitives:: |
163 * Proper Use of Unsigned Types:: | 276 * Writing Good Comments:: |
164 * Coding for Mule:: | 277 * Adding Global Lisp Variables:: |
165 * Techniques for XEmacs Developers:: | 278 * Writing Macros:: |
166 | 279 * Proper Use of Unsigned Types:: |
167 Coding for Mule | 280 * Techniques for XEmacs Developers:: |
168 | 281 |
169 * Character-Related Data Types:: | 282 Regression Testing XEmacs |
170 * Working With Character and Byte Positions:: | 283 |
171 * Conversion to and from External Data:: | 284 * How to Regression-Test:: |
172 * General Guidelines for Writing Mule-Aware Code:: | 285 * Modules for Regression Testing:: |
173 * An Example of Mule-Aware Code:: | |
174 | 286 |
175 CVS Techniques | 287 CVS Techniques |
176 | 288 |
177 * Merging a Branch into the Trunk:: | 289 * Merging a Branch into the Trunk:: |
178 | 290 |
179 Regression Testing XEmacs | 291 The Modules of XEmacs |
180 | 292 |
181 A Summary of the Various XEmacs Modules | 293 * A Summary of the Various XEmacs Modules:: |
182 | 294 * Low-Level Modules:: |
183 * Low-Level Modules:: | 295 * Basic Lisp Modules:: |
184 * Basic Lisp Modules:: | 296 * Modules for Standard Editing Operations:: |
185 * Modules for Standard Editing Operations:: | 297 * Modules for Interfacing with the File System:: |
186 * Editor-Level Control Flow Modules:: | 298 * Modules for Other Aspects of the Lisp Interpreter and Object System:: |
187 * Modules for the Basic Displayable Lisp Objects:: | 299 * Modules for Interfacing with the Operating System:: |
188 * Modules for other Display-Related Lisp Objects:: | |
189 * Modules for the Redisplay Mechanism:: | |
190 * Modules for Interfacing with the File System:: | |
191 * Modules for Other Aspects of the Lisp Interpreter and Object System:: | |
192 * Modules for Interfacing with the Operating System:: | |
193 * Modules for Interfacing with X Windows:: | |
194 * Modules for Internationalization:: | |
195 * Modules for Regression Testing:: | |
196 | 300 |
197 Allocation of Objects in XEmacs Lisp | 301 Allocation of Objects in XEmacs Lisp |
198 | 302 |
199 * Introduction to Allocation:: | 303 * Introduction to Allocation:: |
200 * Garbage Collection:: | 304 * Garbage Collection:: |
201 * GCPROing:: | 305 * GCPROing:: |
202 * Garbage Collection - Step by Step:: | 306 * Garbage Collection - Step by Step:: |
203 * Integers and Characters:: | 307 * Integers and Characters:: |
204 * Allocation from Frob Blocks:: | 308 * Allocation from Frob Blocks:: |
205 * lrecords:: | 309 * lrecords:: |
206 * Low-level allocation:: | 310 * Low-level allocation:: |
207 * Cons:: | 311 * Cons:: |
208 * Vector:: | 312 * Vector:: |
209 * Bit Vector:: | 313 * Bit Vector:: |
210 * Symbol:: | 314 * Symbol:: |
211 * Marker:: | 315 * Marker:: |
212 * String:: | 316 * String:: |
213 * Compiled Function:: | 317 * Compiled Function:: |
214 | 318 |
215 Garbage Collection - Step by Step | 319 Garbage Collection - Step by Step |
216 | 320 |
217 * Invocation:: | 321 * Invocation:: |
218 * garbage_collect_1:: | 322 * garbage_collect_1:: |
219 * mark_object:: | 323 * mark_object:: |
220 * gc_sweep:: | 324 * gc_sweep:: |
221 * sweep_lcrecords_1:: | 325 * sweep_lcrecords_1:: |
222 * compact_string_chars:: | 326 * compact_string_chars:: |
223 * sweep_strings:: | 327 * sweep_strings:: |
224 * sweep_bit_vectors_1:: | 328 * sweep_bit_vectors_1:: |
225 | 329 |
226 Dumping | 330 Dumping |
227 | 331 |
228 * Overview:: | 332 * Dumping Justification:: |
229 * Data descriptions:: | 333 * Overview:: |
230 * Dumping phase:: | 334 * Data descriptions:: |
231 * Reloading phase:: | 335 * Dumping phase:: |
336 * Reloading phase:: | |
337 * Remaining issues:: | |
232 | 338 |
233 Dumping phase | 339 Dumping phase |
234 | 340 |
235 * Object inventory:: | 341 * Object inventory:: |
236 * Address allocation:: | 342 * Address allocation:: |
237 * The header:: | 343 * The header:: |
238 * Data dumping:: | 344 * Data dumping:: |
239 * Pointers dumping:: | 345 * Pointers dumping:: |
240 | 346 |
241 Events and the Event Loop | 347 Events and the Event Loop |
242 | 348 |
243 * Introduction to Events:: | 349 * Introduction to Events:: |
244 * Main Loop:: | 350 * Main Loop:: |
245 * Specifics of the Event Gathering Mechanism:: | 351 * Specifics of the Event Gathering Mechanism:: |
246 * Specifics About the Emacs Event:: | 352 * Specifics About the Emacs Event:: |
247 * The Event Stream Callback Routines:: | 353 * Event Queues:: |
248 * Other Event Loop Functions:: | 354 * Event Stream Callback Routines:: |
249 * Converting Events:: | 355 * Other Event Loop Functions:: |
250 * Dispatching Events; The Command Builder:: | 356 * Stream Pairs:: |
357 * Converting Events:: | |
358 * Dispatching Events; The Command Builder:: | |
359 * Focus Handling:: | |
360 * Editor-Level Control Flow Modules:: | |
361 | |
362 Asynchronous Events; Quit Checking | |
363 | |
364 * Signal Handling:: | |
365 * Control-G (Quit) Checking:: | |
366 * Profiling:: | |
367 * Asynchronous Timeouts:: | |
368 * Exiting:: | |
251 | 369 |
252 Evaluation; Stack Frames; Bindings | 370 Evaluation; Stack Frames; Bindings |
253 | 371 |
254 * Evaluation:: | 372 * Evaluation:: |
255 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: | 373 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: |
256 * Simple Special Forms:: | 374 * Simple Special Forms:: |
257 * Catch and Throw:: | 375 * Catch and Throw:: |
258 | 376 |
259 Symbols and Variables | 377 Symbols and Variables |
260 | 378 |
261 * Introduction to Symbols:: | 379 * Introduction to Symbols:: |
262 * Obarrays:: | 380 * Obarrays:: |
263 * Symbol Values:: | 381 * Symbol Values:: |
264 | 382 |
265 Buffers and Textual Representation | 383 Buffers |
266 | 384 |
267 * Introduction to Buffers:: A buffer holds a block of text such as a file. | 385 * Introduction to Buffers:: A buffer holds a block of text such as a file. |
268 * The Text in a Buffer:: Representation of the text in a buffer. | |
269 * Buffer Lists:: Keeping track of all buffers. | 386 * Buffer Lists:: Keeping track of all buffers. |
270 * Markers and Extents:: Tagging locations within a buffer. | 387 * Markers and Extents:: Tagging locations within a buffer. |
388 * The Buffer Object:: The Lisp object corresponding to a buffer. | |
389 | |
390 Text | |
391 | |
392 * The Text in a Buffer:: Representation of the text in a buffer. | |
271 * Ibytes and Ichars:: Representation of individual characters. | 393 * Ibytes and Ichars:: Representation of individual characters. |
272 * The Buffer Object:: The Lisp object corresponding to a buffer. | 394 * Byte-Char Position Conversion:: |
273 * Searching and Matching:: Higher-level algorithms. | 395 * Searching and Matching:: Higher-level algorithms. |
274 | 396 |
275 MULE Character Sets and Encodings | 397 Multilingual Support |
276 | 398 |
277 * Character Sets:: | 399 * Introduction to Multilingual Issues #1:: |
278 * Encodings:: | 400 * Introduction to Multilingual Issues #2:: |
279 * Internal Mule Encodings:: | 401 * Introduction to Multilingual Issues #3:: |
280 * CCL:: | 402 * Introduction to Multilingual Issues #4:: |
403 * Character Sets:: | |
404 * Encodings:: | |
405 * Internal Mule Encodings:: | |
406 * Byte/Character Types; Buffer Positions; Other Typedefs:: | |
407 * Internal Text API's:: | |
408 * Coding for Mule:: | |
409 * CCL:: | |
410 * Modules for Internationalization:: | |
281 | 411 |
282 Encodings | 412 Encodings |
283 | 413 |
284 * Japanese EUC (Extended Unix Code):: | 414 * Japanese EUC (Extended Unix Code):: |
285 * JIS7:: | 415 * JIS7:: |
286 | 416 |
287 Internal Mule Encodings | 417 Internal Mule Encodings |
288 | 418 |
289 * Internal String Encoding:: | 419 * Internal String Encoding:: |
290 * Internal Character Encoding:: | 420 * Internal Character Encoding:: |
421 | |
422 Byte/Character Types; Buffer Positions; Other Typedefs | |
423 | |
424 * Byte Types:: | |
425 * Different Ways of Seeing Internal Text:: | |
426 * Buffer Positions:: | |
427 * Other Typedefs:: | |
428 * Usage of the Various Representations:: | |
429 * Working With the Various Representations:: | |
430 | |
431 Internal Text API's | |
432 | |
433 * Basic internal-format API's:: | |
434 * The DFC API:: | |
435 * The Eistring API:: | |
436 | |
437 Coding for Mule | |
438 | |
439 * Character-Related Data Types:: | |
440 * Working With Character and Byte Positions:: | |
441 * Conversion to and from External Data:: | |
442 * General Guidelines for Writing Mule-Aware Code:: | |
443 * An Example of Mule-Aware Code:: | |
444 * Mule-izing Code:: | |
291 | 445 |
292 Lstreams | 446 Lstreams |
293 | 447 |
294 * Creating an Lstream:: Creating an lstream object. | 448 * Creating an Lstream:: Creating an lstream object. |
295 * Lstream Types:: Different sorts of things that are streamed. | 449 * Lstream Types:: Different sorts of things that are streamed. |
296 * Lstream Functions:: Functions for working with lstreams. | 450 * Lstream Functions:: Functions for working with lstreams. |
297 * Lstream Methods:: Creating new lstream types. | 451 * Lstream Methods:: Creating new lstream types. |
298 | 452 |
299 Consoles; Devices; Frames; Windows | 453 Consoles; Devices; Frames; Windows |
300 | 454 |
301 * Introduction to Consoles; Devices; Frames; Windows:: | 455 * Introduction to Consoles; Devices; Frames; Windows:: |
302 * Point:: | 456 * Point:: |
303 * Window Hierarchy:: | 457 * Window Hierarchy:: |
304 * The Window Object:: | 458 * The Window Object:: |
459 * Modules for the Basic Displayable Lisp Objects:: | |
305 | 460 |
306 The Redisplay Mechanism | 461 The Redisplay Mechanism |
307 | 462 |
308 * Critical Redisplay Sections:: | 463 * Critical Redisplay Sections:: |
309 * Line Start Cache:: | 464 * Line Start Cache:: |
310 * Redisplay Piece by Piece:: | 465 * Redisplay Piece by Piece:: |
466 * Modules for the Redisplay Mechanism:: | |
467 * Modules for other Display-Related Lisp Objects:: | |
311 | 468 |
312 Extents | 469 Extents |
313 | 470 |
314 * Introduction to Extents:: Extents are ranges over text, with properties. | 471 * Introduction to Extents:: Extents are ranges over text, with properties. |
315 * Extent Ordering:: How extents are ordered internally. | 472 * Extent Ordering:: How extents are ordered internally. |
316 * Format of the Extent Info:: The extent information in a buffer or string. | 473 * Format of the Extent Info:: The extent information in a buffer or string. |
317 * Zero-Length Extents:: A weird special case. | 474 * Zero-Length Extents:: A weird special case. |
318 * Mathematics of Extent Ordering:: A rigorous foundation. | 475 * Mathematics of Extent Ordering:: A rigorous foundation. |
319 * Extent Fragments:: Cached information useful for redisplay. | 476 * Extent Fragments:: Cached information useful for redisplay. |
320 | 477 |
478 Interface to MS Windows | |
479 | |
480 * Different kinds of Windows environments:: | |
481 * Windows Build Flags:: | |
482 * Windows I18N Introduction:: | |
483 * Modules for Interfacing with MS Windows:: | |
484 | |
485 Interface to the X Window System | |
486 | |
487 * Lucid Widget Library:: An interface to various widget sets. | |
488 * Modules for Interfacing with X Windows:: | |
489 | |
490 Lucid Widget Library | |
491 | |
492 * Generic Widget Interface:: The lwlib generic widget interface. | |
493 * Scrollbars:: | |
494 * Menubars:: | |
495 * Checkboxes and Radio Buttons:: | |
496 * Progress Bars:: | |
497 * Tab Controls:: | |
498 | |
499 Future Work | |
500 | |
501 * Future Work -- Elisp Compatibility Package:: | |
502 * Future Work -- Drag-n-Drop:: | |
503 * Future Work -- Standard Interface for Enabling Extensions:: | |
504 * Future Work -- Better Initialization File Scheme:: | |
505 * Future Work -- Keyword Parameters:: | |
506 * Future Work -- Property Interface Changes:: | |
507 * Future Work -- Toolbars:: | |
508 * Future Work -- Menu API Changes:: | |
509 * Future Work -- Removal of Misc-User Event Type:: | |
510 * Future Work -- Mouse Pointer:: | |
511 * Future Work -- Extents:: | |
512 * Future Work -- Version Number and Development Tree Organization:: | |
513 * Future Work -- Improvements to the @code{xemacs.org} Website:: | |
514 * Future Work -- Keybindings:: | |
515 * Future Work -- Byte Code Snippets:: | |
516 * Future Work -- Lisp Stream API:: | |
517 * Future Work -- Multiple Values:: | |
518 * Future Work -- Macros:: | |
519 * Future Work -- Specifiers:: | |
520 * Future Work -- Display Tables:: | |
521 * Future Work -- Making Elisp Function Calls Faster:: | |
522 * Future Work -- Lisp Engine Replacement:: | |
523 | |
524 Future Work -- Toolbars | |
525 | |
526 * Future Work -- Easier Toolbar Customization:: | |
527 * Future Work -- Toolbar Interface Changes:: | |
528 | |
529 Future Work -- Mouse Pointer | |
530 | |
531 * Future Work -- Abstracted Mouse Pointer Interface:: | |
532 * Future Work -- Busy Pointer:: | |
533 | |
534 Future Work -- Extents | |
535 | |
536 * Future Work -- Everything should obey duplicable extents:: | |
537 | |
538 Future Work -- Keybindings | |
539 | |
540 * Future Work -- Keybinding Schemes:: | |
541 * Future Work -- Better Support for Windows Style Key Bindings:: | |
542 * Future Work -- Misc Key Binding Ideas:: | |
543 | |
544 Future Work -- Byte Code Snippets | |
545 | |
546 * Future Work -- Autodetection:: | |
547 * Future Work -- Conversion Error Detection:: | |
548 * Future Work -- BIDI Support:: | |
549 * Future Work -- Localized Text/Messages:: | |
550 | |
551 Future Work -- Lisp Engine Replacement | |
552 | |
553 * Future Work -- Lisp Engine Discussion:: | |
554 * Future Work -- Lisp Engine Replacement -- Implementation:: | |
555 | |
556 Future Work Discussion | |
557 | |
558 * Discussion -- garbage collection:: | |
559 * Discussion -- glyphs:: | |
560 | |
561 Old Future Work | |
562 | |
563 * Future Work -- A Portable Unexec Replacement:: | |
564 * Future Work -- Indirect Buffers:: | |
565 * Future Work -- Improvements in support for non-ASCII (European) keysyms under X:: | |
566 * Future Work -- xemacs.org Mailing Address Changes:: | |
567 * Future Work -- Lisp callbacks from critical areas of the C code:: | |
568 | |
321 @end detailmenu | 569 @end detailmenu |
322 @end menu | 570 @end menu |
323 | 571 |
324 @node A History of Emacs, XEmacs From the Outside, Top, Top | 572 @node Introduction, Authorship of XEmacs, Top, Top |
573 @chapter Introduction | |
574 @cindex introduction | |
575 @cindex authorship, manual | |
576 | |
577 This manual documents the internals of XEmacs. It presumes knowledge of | |
578 how to use XEmacs (@pxref{Top,,, xemacs, XEmacs User's Manual}), and | |
579 especially, knowledge of XEmacs Lisp (@pxref{Top,,, lispref, XEmacs Lisp | |
580 Reference Manual}). Information in either of these manuals will not be | |
581 repeated here, and some information in the Lisp Reference Manual in | |
582 particular is more relevant to a person working on the internals than | |
583 the average XEmacs Lisp programmer. (In such cases, a cross-reference is | |
584 usually made to the Lisp Reference Manual.) | |
585 | |
586 Ideally, this manual would be complete and up-to-date. Unfortunately, | |
587 in reality it is neither, due to the limited resources of the | |
588 maintainers of XEmacs. (That said, it is much better than the internal | |
589 documentation of most programs.) Also, much information about the | |
590 internals is documented only in the code itself, in the form of | |
591 comments. Furthermore, since the maintainers are more likely to be | |
592 working on the code than on this manual, information contained in | |
593 comments may be more up-to-date than information in this manual. Do not | |
594 assume that all information in this manual is necessarily accurate as of | |
595 the snapshot of the code you are looking at, and in the case of | |
596 contradictions between the code comments and the manual, @strong{always} | |
597 assume that the code comments are correct. (Because of the proximity of | |
598 the comments to the code, comments will rarely be out-of-date.) | |
599 | |
600 This manual was primarily written by Ben Wing. Certain sections were | |
601 written by others, including those mentioned on the title page as well | |
602 as other coders. Some sections were lifted directly from comments in | |
603 the code, and in those cases we may not completely be aware of the | |
604 authorship. In addition, due to the collaborative nature of XEmacs, | |
605 many people have made small changes and emendations as they have | |
606 discovered problems. | |
607 | |
608 The following is a (necessarily incomplete) list of the work that was | |
609 @emph{not} done by Ben Wing (for more complete information, take a look | |
610 at the ChangeLog for the @file{man} directory and the CVS records of | |
611 actual changes): | |
612 | |
613 @table @asis | |
614 @item Stephen Turnbull | |
615 Various cleanup work, mostly post-2000. Object-Oriented Techniques in | |
616 XEmacs. A Reader's Guide to XEmacs Coding Conventions. Searching and | |
617 Matching. Regression Testing XEmacs. Modules for Regression Testing. | |
618 Lucid Widget Library. | |
619 @item Martin Buchholz | |
620 Various cleanup work, mostly pre-2001. Docs on inline functions. Docs | |
621 on dfc conversion functions (Conversion to and from External Data). | |
622 Improvements in support for non-ASCII (European) keysyms under X. | |
623 @item Hrvoje Niksic | |
624 Coding for Mule. | |
625 @item Matthias Neubauer | |
626 Garbage Collection - Step by Step. | |
627 @item Olivier Galibert | |
628 Portable dumper documentation. | |
629 @item Andy Piper | |
630 Redisplay Piece by Piece. Glyphs. | |
631 @item Chuck Thompson | |
632 Line Start Cache. | |
633 @item Kenichi Handa | |
634 CCL. | |
635 @end table | |
636 | |
637 @node Authorship of XEmacs, A History of Emacs, Introduction, Top | |
638 @chapter Authorship of XEmacs | |
639 @cindex authorship, XEmacs | |
640 | |
641 General authorship in chronological order: | |
642 | |
643 @table @asis | |
644 | |
645 @item Jamie Zawinski, Eric Benson, Matthieu Devin, Harlan Sexton | |
646 These were the early creators of Lucid Emacs, the predecessor of XEmacs. | |
647 Jamie Zawinski was the primary maintainer and coder for Lucid Emacs, | |
648 active between early 1991 and June 1994. He presided over versions 19.0 | |
649 through 19.10, and then abruptly left for Netscape. He wrote the | |
650 advanced stream code, the Xt interface code, the byte compiler, the | |
651 original version of the X selection code, the first, second and third | |
652 versions of the face code (which appeared in 19.0, 19.6 and 19.9 | |
653 respectively), and part of the keymap code; he also separated the Lisp directories | |
654 into many subdirectories and made many smaller changes. Matthieu Devin wrote | |
655 the original version of the Extents code. Someone else at Lucid wrote | |
656 the Lucid widget library (LWLIB), with the exception of the scrollbar | |
657 code, which was added later. | |
658 | |
659 @item Richard Mlynarik | |
660 Active 1991 to 1993, author of much of the current Lisp object scheme, | |
661 including lrecords and lcrecords (added this support in 1993 to allow | |
662 for 28-bit pointers, which had previously been restricted to 26 bits). | |
663 Moved the minibuffer and abbrev code into Lisp, worked on the keymap | |
664 code and did the initial synching between XEmacs and the first released | |
665 version of GNU Emacs version 19 in mid-1993. | |
666 | |
667 @item Martin Buchholz | |
668 Active 1995 to 2001, maintainer of XEmacs late 1999 to ?, author of the | |
669 current configure support, many optimizations to the byte interpreter, | |
670 many improvements to the case changing code and many bug fixes to the | |
671 process and system-specific code, also general spell checking and code | |
672 cleanliness guru. | |
673 | |
674 @item Steve Baur | |
675 Maintainer of XEmacs 1996 to 1999, responsible for many improvements to | |
676 the XEmacs development process, for example, creation of the review | |
677 board and arranging for XEmacs to be placed under CVS. Author of the | |
678 package code. | |
679 | |
680 @item Chuck Thompson | |
681 Active January 1993 to June of 1996, author of the current and previous | |
682 versions of the redisplay code and maintainer of XEmacs from mid-1994 | |
683 to mid-1996. Creator of XEmacs.org. Also wrote the scrollbar code, the | |
684 original configure support, and prototype versions of the toolbar and | |
685 device code. | |
686 | |
687 @item Ben Wing | |
688 Active April 1993 to April 1996 and February 2000 to present. Chief | |
689 coder for XEmacs between 1994 and 1996. Ben Wing was never the | |
690 maintainer of XEmacs, and as a result, is the author of more of the | |
691 XEmacs-specific code in XEmacs than anyone else. Author of the Mule | |
692 support, the extents code, the glyphs and specifiers code, most of | |
693 the toolbar and device abstraction code, the error | |
694 checking code, the Lstream code, the bit vector, char-table, and | |
695 range-table code, much of the current Xt code, much of the events | |
696 code (including most of the TTY event code), some of the face code, and | |
697 numerous other aspects of the code. Also author of most of the XEmacs | |
698 documentation including the internals manual and the XEmacs additions to | |
699 the Lisp reference manual, and responsible for much of the synching | |
700 between XEmacs and GNU Emacs. | |
701 | |
702 @item Kyle Jones | |
703 Author of the minimal tagbits support for Lisp | |
704 objects, which allows for 32-bit pointers and 31-bit integers. | |
705 | |
706 @item Olivier Galibert | |
707 Author of the portable dumping mechanism. | |
708 | |
709 @item Andy Piper | |
710 Author of the widget support, the gutter support and much of the | |
711 Microsoft Windows support. | |
712 | |
713 @item Kirill Katsnelson | |
714 Author of many improvements to Microsoft Windows support, the current | |
715 sub-process code, and revamping of the display size change mechanism. | |
716 | |
717 @item Jonathan Harris | |
718 Author of much of the Microsoft Windows support. | |
719 @end table | |
720 | |
721 Authorship of some of the modules: | |
722 | |
723 @table @file | |
724 @item alloc.c | |
725 Inherited 1991 from a prototype of GNU Emacs 19. Around mid-1993 | |
726 Richard Mlynarik redid much of the code, creating the existing system of | |
727 object abstractions (where each object can define its own marking | |
728 method, printing method, and so on) and the existing scheme of lrecords | |
729 and lcrecords. This was done both to increase the number of bits that | |
730 a pointer can occupy from 26 to 28, and provide a general framework for | |
731 creating new object types easily. The garbage collection and | |
732 frob-block allocation code is left over from the | |
733 original version, but was cleaned up somewhat by Mlynarik. Later in | |
734 1993, Jamie Zawinski improved the code that kept track of pure space | |
735 usage so it would report exactly where you exceeded the pure space and | |
736 how much pure space you are going to have to add to get everything to | |
737 fit. He also added code to issue nice pure space and garbage | |
738 collection statistics at the end of dumping. Early in 1995, Ben Wing | |
739 cleaned up the frob-block code to be as compact as possible and added the | |
740 various bits of error checking, which are controlled using the | |
741 @code{ERROR_CHECK_*} macros. He also added the ability of strings to be resized, which | |
742 is necessary under MULE, because you can replace one character in a | |
743 string with another character of a different size. As a result, the | |
744 string resizes. Ben Wing also added bit vectors for 19.13 around | |
745 September 1995, and lcrecord lists for 19.14 around December 1995. | |
746 Steve Baur did some work on the purification and dump time code, and | |
747 added Doug Lea Malloc support from Emacs 20.2 circa 1998. Kyle Jones | |
748 continued the work done by Mlynarik, reducing the number of primitive | |
749 Lisp types so that there are only three: integer, character, and pointer | |
750 type, which encompasses all other types. This allows for 31-bit | |
751 integers and 32-bit pointers, although there is potential slowdown from | |
752 some extra indirections when determining the type of an object, and | |
753 some memory increase for the objects that previously were considered to | |
754 be the most primitive types. Martin Buchholz has recently (February | |
755 2000) done some work to eliminate most of the slowdown. | |
756 | |
757 Olivier Galibert, mid-1999 to 2000, implemented the portable | |
758 dumper. This writes out the state of the Lisp object heap to a | |
759 disk file in a relocatable fashion so that it can later be | |
760 read in at any memory location. This work entails a number of | |
761 changes in @file{alloc.c}. For example, pure space was removed and | |
762 structures were created to define the types of all the elements | |
763 contained in the various lisp object structures and associated | |
764 structures. | |
765 | |
766 @item alloca.c | |
767 Inherited a long time ago from a prerelease version of GNU Emacs 19, | |
768 kept in sync with more recent versions, with very few changes for XEmacs. | |
769 Most changes consist of converting the code to ANSI C, and fixing up the | |
770 includes at the top of the file to follow XEmacs conventions. | |
771 | |
772 @item alloca.s | |
773 Inherited almost unchanged from FSF, kept in sync up through 19.30; | |
774 basically no changes for XEmacs. | |
775 @end table | |
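
The minimal-tagbits scheme mentioned under @file{alloc.c} above (31-bit integers and 32-bit pointers, with integer, character and pointer as the only primitive types) can be sketched roughly as follows. This is only an illustration: the type names, function names and the exact bit assignments are assumptions made for this example, not the definitions used in the XEmacs sources.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical.  Assumed layout of a 32-bit word:
     ...xxx1  integer    (31-bit payload; the low bit is the tag)
     ...xx10  character  (30-bit payload)
     ...xx00  pointer    (4-byte aligned, so no bits are lost)  */
typedef uint32_t lisp_word;

enum lisp_kind { KIND_POINTER, KIND_INT, KIND_CHAR };

/* Classify a word by looking only at its two low bits. */
static enum lisp_kind word_kind(lisp_word w)
{
    if (w & 1u)
        return KIND_INT;
    return (w & 2u) ? KIND_CHAR : KIND_POINTER;
}

/* Box a signed integer; only 31 bits of payload fit. */
static lisp_word make_int(int32_t n)
{
    assert(n >= -(1 << 30) && n <= (1 << 30) - 1);
    return ((uint32_t) n << 1) | 1u;
}

/* Unbox an integer, sign-extending from bit 30 of the payload. */
static int32_t int_value(lisp_word w)
{
    uint32_t payload;
    int64_t value;

    assert(word_kind(w) == KIND_INT);
    payload = w >> 1;
    value = (int64_t) payload;
    if (payload & (1u << 30))
        value -= (int64_t) 1 << 31;
    return (int32_t) value;
}

/* Box a character code; only 30 bits of payload fit. */
static lisp_word make_char(uint32_t c)
{
    assert(c < (1u << 30));
    return (c << 2) | 2u;
}

/* Unbox a character code. */
static uint32_t char_value(lisp_word w)
{
    assert(word_kind(w) == KIND_CHAR);
    return w >> 2;
}
```

Under these assumptions an aligned address is already a valid pointer word, which is why every other object type can be reached through the pointer case at the cost of an extra indirection to find its real type.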
776 | |
777 @node A History of Emacs, XEmacs From the Outside, Authorship of XEmacs, Top | |
325 @chapter A History of Emacs | 778 @chapter A History of Emacs |
326 @cindex history of Emacs, a | 779 @cindex history of Emacs, a |
327 @cindex Emacs, a history of | 780 @cindex Emacs, a history of |
328 @cindex Hackers (Steven Levy) | 781 @cindex Hackers (Steven Levy) |
329 @cindex Levy, Steven | 782 @cindex Levy, Steven |
358 * GNU Emacs 19:: The other version 19 Emacs. | 811 * GNU Emacs 19:: The other version 19 Emacs. |
359 * GNU Emacs 20:: The other version 20 Emacs. | 812 * GNU Emacs 20:: The other version 20 Emacs. |
360 * XEmacs:: The continuation of Lucid Emacs. | 813 * XEmacs:: The continuation of Lucid Emacs. |
361 @end menu | 814 @end menu |
362 | 815 |
363 @node Through Version 18 | 816 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs |
364 @section Through Version 18 | 817 @section Through Version 18 |
365 @cindex version 18, through | 818 @cindex version 18, through |
366 @cindex Gosling, James | 819 @cindex Gosling, James |
367 @cindex Great Usenet Renaming | 820 @cindex Great Usenet Renaming |
368 | 821 |
369 Although the history of the early versions of GNU Emacs is unclear, | 822 As described above, Emacs began life in the mid-1970's as a series of |
370 the history is well-known from the middle of 1985. A time line is: | 823 editor macros for TECO, an early editor on the PDP-10. In the early |
824 1980's it was rewritten in C as a collaboration between Richard | |
825 M. Stallman (RMS) and James Gosling (the creator of Java); its extension | |
826 language was known as @dfn{Mocklisp}. This version of Emacs-in-C formed | |
827 the basis for the early versions of GNU Emacs and also for Gosling's | |
828 Unipress Emacs, a commercial product. Because of bad blood between the | |
829 two over the issue of commercialism, RMS pretty much disowned this | |
830 collaboration, referring to it as "Gosling Emacs". | |
831 | |
832 At this point we pick up with a time line of events. (A broader timeline | |
833 is available at @uref{http://www.jwz.org/doc/emacs-timeline.html, | |
834 ``Emacs Timeline''}.) | |
371 | 835 |
372 @itemize @bullet | 836 @itemize @bullet |
373 @item | 837 @item |
374 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and | 838 Unipress Emacs, a $395 commercial product, was released on May 6, 1983. |
375 shared some code with a version of Emacs written by James Gosling (the | 839 This was an outgrowth of the Emacs-in-C collaboration written by Gosling |
376 same James Gosling who later created the Java language). | 840 and RMS. |
841 | |
842 @item | |
843 GNU Emacs version 13.0? was released on March 20, 1985. This may have | |
844 been the initial public release. This was also based on this same | |
845 Emacs-in-C collaboration. | |
846 | |
847 @item | |
848 GNU Emacs version 15.10 was released on April 11, 1985. | |
849 | |
850 @item | |
851 GNU Emacs version 15.34 was released on May 7, 1985. This appears | |
852 to be the last release of version 15. | |
853 | |
377 @item | 854 @item |
378 GNU Emacs version 16 (first released version was 16.56) was released on | 855 GNU Emacs version 16 (first released version was 16.56) was released on |
379 July 15, 1985. All Gosling code was removed due to potential copyright | 856 July 15, 1985. All Gosling code was removed due to potential copyright |
380 problems with the code. | 857 problems with the code. |
381 @item | 858 @item |
472 version 18.58 released ?????. | 949 version 18.58 released ?????. |
473 @item | 950 @item |
474 version 18.59 released October 31, 1992. | 951 version 18.59 released October 31, 1992. |
475 @end itemize | 952 @end itemize |
476 | 953 |
477 @node Lucid Emacs | 954 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs |
478 @section Lucid Emacs | 955 @section Lucid Emacs |
479 @cindex Lucid Emacs | 956 @cindex Lucid Emacs |
480 @cindex Lucid Inc. | 957 @cindex Lucid Inc. |
481 @cindex Energize | 958 @cindex Energize |
482 @cindex Epoch | 959 @cindex Epoch |
538 version 19.9 released January 12, 1994. | 1015 version 19.9 released January 12, 1994. |
539 @item | 1016 @item |
540 version 19.10 released May 27, 1994. | 1017 version 19.10 released May 27, 1994. |
541 @end itemize | 1018 @end itemize |
542 | 1019 |
543 @node GNU Emacs 19 | 1020 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs |
544 @section GNU Emacs 19 | 1021 @section GNU Emacs 19 |
545 @cindex GNU Emacs 19 | 1022 @cindex GNU Emacs 19 |
546 @cindex Emacs 19, GNU | 1023 @cindex Emacs 19, GNU |
547 @cindex version 19, GNU Emacs | 1024 @cindex version 19, GNU Emacs |
548 @cindex FSF Emacs | 1025 @cindex FSF Emacs |
617 worse. Lucid soon began incorporating features from GNU Emacs 19 into | 1094 worse. Lucid soon began incorporating features from GNU Emacs 19 into |
618 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been | 1095 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been |
619 working on and using GNU Emacs for a long time (back as far as version | 1096 working on and using GNU Emacs for a long time (back as far as version |
620 16 or 17). | 1097 16 or 17). |
621 | 1098 |
622 @node GNU Emacs 20 | 1099 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs |
623 @section GNU Emacs 20 | 1100 @section GNU Emacs 20 |
624 @cindex GNU Emacs 20 | 1101 @cindex GNU Emacs 20 |
625 @cindex Emacs 20, GNU | 1102 @cindex Emacs 20, GNU |
626 @cindex version 20, GNU Emacs | 1103 @cindex version 20, GNU Emacs |
627 @cindex FSF Emacs | 1104 @cindex FSF Emacs |
638 version 20.2 released September 20, 1997. | 1115 version 20.2 released September 20, 1997. |
639 @item | 1116 @item |
640 version 20.3 released August 19, 1998. | 1117 version 20.3 released August 19, 1998. |
641 @end itemize | 1118 @end itemize |
642 | 1119 |
643 @node XEmacs | 1120 @node XEmacs, , GNU Emacs 20, A History of Emacs |
644 @section XEmacs | 1121 @section XEmacs |
645 @cindex XEmacs | 1122 @cindex XEmacs |
646 | 1123 |
647 @cindex Sun Microsystems | 1124 @cindex Sun Microsystems |
648 @cindex University of Illinois | 1125 @cindex University of Illinois |
1287 Recompiling anything depends on @file{bytecomp.elc} and | 1764 Recompiling anything depends on @file{bytecomp.elc} and |
1288 @file{byte-optimize.elc} being up-to-date. | 1765 @file{byte-optimize.elc} being up-to-date. |
1289 @end enumerate | 1766 @end enumerate |
1290 | 1767 |
1291 Put these together and you'll see it's perfectly acceptable to build | 1768 Put these together and you'll see it's perfectly acceptable to build |
1292 auto-autoloads *after* dumping if no @file{.elc} files are out-of-date. | 1769 auto-autoloads @strong{after} dumping if no @file{.elc} files are out-of-date. |
1293 @end quotation | 1770 @end quotation |
1294 | 1771 |
1295 These Lisp driver programs typically run from temacs, not a dumped | 1772 These Lisp driver programs typically run from temacs, not a dumped |
1296 XEmacs. The simplest (but time-consuming) way to achieve a sane | 1773 XEmacs. The simplest (but time-consuming) way to achieve a sane |
1297 environment for running Lisp is to load @file{loadup.el} or | 1774 environment for running Lisp is to load @file{loadup.el} or |
1948 | 2425 |
1949 An example of the right way to do this was the so-called "great integral | 2426 An example of the right way to do this was the so-called "great integral |
1950 type renaming". | 2427 type renaming". |
1951 | 2428 |
1952 @menu | 2429 @menu |
1953 * Great Integral Type Renaming:: | 2430 * Great Integral Type Renaming:: |
1954 * Text/Char Type Renaming:: | 2431 * Text/Char Type Renaming:: |
1955 @end menu | 2432 @end menu |
1956 | 2433 |
1957 @node Great Integral Type Renaming | 2434 @node Great Integral Type Renaming, Text/Char Type Renaming, Major Textual Changes, Major Textual Changes |
1958 @section Great Integral Type Renaming | 2435 @section Great Integral Type Renaming |
1959 @cindex Great Integral Type Renaming | 2436 @cindex Great Integral Type Renaming |
1960 @cindex integral type renaming, great | 2437 @cindex integral type renaming, great |
1961 @cindex type renaming, integral | 2438 @cindex type renaming, integral |
1962 @cindex renaming, integral types | 2439 @cindex renaming, integral types |
1986 are annoying. More has been written on this elsewhere. | 2463 are annoying. More has been written on this elsewhere. |
1987 | 2464 |
1988 @item | 2465 @item |
1989 All such quantity types just mentioned boil down to EMACS_INT, which is | 2466 All such quantity types just mentioned boil down to EMACS_INT, which is |
1990 32 bits on 32-bit machines and 64 bits on 64-bit machines. This is | 2467 32 bits on 32-bit machines and 64 bits on 64-bit machines. This is |
1991 guaranteed to be the same size as Lisp objects of type `int', and (as | 2468 guaranteed to be the same size as Lisp objects of type @code{int}, and (as |
1992 far as I can tell) of size_t (unsigned!) and ssize_t. The only type | 2469 far as I can tell) of size_t (unsigned!) and ssize_t. The only type |
1993 below that is not an EMACS_INT is Hashcode, which is an unsigned value | 2470 below that is not an EMACS_INT is Hashcode, which is an unsigned value |
1994 of the same size as EMACS_INT. | 2471 of the same size as EMACS_INT. |
1995 | 2472 |
1996 @item | 2473 @item |
2068 things, particularly relating to the duplicate definitions of | 2545 things, particularly relating to the duplicate definitions of |
2069 types, now that some types merged with others. Specifically: | 2546 types, now that some types merged with others. Specifically: |
2070 | 2547 |
2071 @enumerate | 2548 @enumerate |
2072 @item | 2549 @item |
2073 in lisp.h, removed duplicate declarations of Bytecount. The changed | 2550 in @file{lisp.h}, removed duplicate declarations of Bytecount. The changed |
2074 code should now look like this: (In each code snippet below, the first | 2551 code should now look like this: (In each code snippet below, the first |
2075 and last lines are the same as the original, as are all lines outside of | 2552 and last lines are the same as the original, as are all lines outside of |
2076 those lines. That allows you to locate the section to be replaced, and | 2553 those lines. That allows you to locate the section to be replaced, and |
2077 replace the stuff in that section, verifying that there isn't anything | 2554 replace the stuff in that section, verifying that there isn't anything |
2078 new added that would need to be kept.) | 2555 new added that would need to be kept.) |
2092 /* ------------------------ dynamic arrays ------------------- */ | 2569 /* ------------------------ dynamic arrays ------------------- */ |
2093 --------------------------------- snip ------------------------------------- | 2570 --------------------------------- snip ------------------------------------- |
2094 @end example | 2571 @end example |
2095 | 2572 |
2096 @item | 2573 @item |
2097 in lstream.h, removed duplicate declaration of Bytecount. Rewrote the | 2574 in @file{lstream.h}, removed duplicate declaration of Bytecount. Rewrote the |
2098 comment about this type. The changed code should now look like this: | 2575 comment about this type. The changed code should now look like this: |
2099 | 2576 |
2100 @example | 2577 @example |
2101 --------------------------------- snip ------------------------------------- | 2578 --------------------------------- snip ------------------------------------- |
2102 #endif | 2579 #endif |
2103 | 2580 |
2104 /* The have been some arguments over the what the type should be that | 2581 /* The have been some arguments over the what the type should be that |
2105 specifies a count of bytes in a data block to be written out or read in, | 2582 specifies a count of bytes in a data block to be written out or read in, |
2106 using Lstream_read(), Lstream_write(), and related functions. | 2583 using @code{Lstream_read()}, @code{Lstream_write()}, and related functions. |
2107 Originally it was long, which worked fine; Martin "corrected" these to | 2584 Originally it was long, which worked fine; Martin "corrected" these to |
2108 size_t and ssize_t on the grounds that this is theoretically cleaner and | 2585 size_t and ssize_t on the grounds that this is theoretically cleaner and |
2109 is in keeping with the C standards. Unfortunately, this practice is | 2586 is in keeping with the C standards. Unfortunately, this practice is |
2110 horribly error-prone due to design flaws in the way that mixed | 2587 horribly error-prone due to design flaws in the way that mixed |
2111 signed/unsigned arithmetic happens. In fact, by doing this change, | 2588 signed/unsigned arithmetic happens. In fact, by doing this change, |
2119 Some earlier comments about why the type must be signed: This MUST BE | 2596 Some earlier comments about why the type must be signed: This MUST BE |
2120 SIGNED, since it also is used in functions that return the number of | 2597 SIGNED, since it also is used in functions that return the number of |
2121 bytes actually read to or written from in an operation, and these | 2598 bytes actually read to or written from in an operation, and these |
2122 functions can return -1 to signal error. | 2599 functions can return -1 to signal error. |
2123 | 2600 |
2124 Note that the standard Unix read() and write() functions define the | 2601 Note that the standard Unix @code{read()} and @code{write()} functions define the |
2125 count going in as a size_t, which is UNSIGNED, and the count going | 2602 count going in as a size_t, which is UNSIGNED, and the count going |
2126 out as an ssize_t, which is SIGNED. This is a horrible design | 2603 out as an ssize_t, which is SIGNED. This is a horrible design |
2127 flaw. Not only is it highly likely to lead to logic errors when a | 2604 flaw. Not only is it highly likely to lead to logic errors when a |
2128 -1 gets interpreted as a large positive number, but operations are | 2605 -1 gets interpreted as a large positive number, but operations are |
2129 bound to fail in all sorts of horrible ways when a number in the | 2606 bound to fail in all sorts of horrible ways when a number in the |
2138 typedef enum lstream_buffering | 2615 typedef enum lstream_buffering |
2139 --------------------------------- snip ------------------------------------- | 2616 --------------------------------- snip ------------------------------------- |
2140 @end example | 2617 @end example |
2141 | 2618 |
2142 @item | 2619 @item |
2143 in dumper.c, there are four places, all inside of switch() statements, | 2620 in @file{dumper.c}, there are four places, all inside of @code{switch()} statements, |
2144 where XD_BYTECOUNT appears twice as a case tag. In each case, the two | 2621 where XD_BYTECOUNT appears twice as a case tag. In each case, the two |
2145 case blocks contain identical code, and you should *REMOVE THE SECOND* | 2622 case blocks contain identical code, and you should *REMOVE THE SECOND* |
2146 and leave the first. | 2623 and leave the first. |
2147 @end enumerate | 2624 @end enumerate |
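
The hazard described in the @file{lstream.h} comment above, where a -1 error return gets interpreted as a large positive number, can be reproduced in a few lines. This is a minimal sketch; the function names are made up for illustration and do not appear in XEmacs, and the explicit cast stands in for the implicit conversion that happens on common platforms when a signed count meets a @code{size_t}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Mixed arithmetic: COUNT is converted to size_t before the comparison.
   The cast is written out here; on common platforms the compiler
   performs the same conversion implicitly for `count <= limit'. */
static int within_limit_mixed(long count, size_t limit)
{
    return (size_t) count <= limit;
}

/* All-signed arithmetic: a negative count compares as expected. */
static int within_limit_signed(long count, long limit)
{
    assert(limit >= 0);
    return count <= limit;
}
```

With a count of -1 and a limit of 100, the mixed version reports that the count exceeds the limit, while the signed version does not; keeping the count type signed throughout avoids the surprise.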
2148 | 2625 |
2149 @node Text/Char Type Renaming | 2626 @node Text/Char Type Renaming, , Great Integral Type Renaming, Major Textual Changes |
2150 @section Text/Char Type Renaming | 2627 @section Text/Char Type Renaming |
2151 @cindex Text/Char Type Renaming | 2628 @cindex Text/Char Type Renaming |
2152 @cindex type renaming, text/char | 2629 @cindex type renaming, text/char |
2153 @cindex renaming, text/char types | 2630 @cindex renaming, text/char types |
2154 | 2631 |
2209 present. You can probably do the same if you don't have a separate | 2686 present. You can probably do the same if you don't have a separate |
2210 workspace, but do have lots of outstanding changes and you'd rather not | 2687 workspace, but do have lots of outstanding changes and you'd rather not |
2211 just merge all the textual changes directly. Use something like this: | 2688 just merge all the textual changes directly. Use something like this: |
2212 | 2689 |
2213 (WARNING: I'm not a CVS guru; before trying this, or any large operation | 2690 (WARNING: I'm not a CVS guru; before trying this, or any large operation |
2214 that might potentially mess things up, *DEFINITELY* make a backup of | 2691 that might potentially mess things up, @strong{DEFINITELY} make a backup of |
2215 your existing workspace.) | 2692 your existing workspace.) |
2216 | 2693 |
2217 @example | 2694 @example |
2218 cup -r pre-internal-format-textual-renaming | 2695 cup -r pre-internal-format-textual-renaming |
2219 <apply script> | 2696 <apply script> |
2235 @example | 2712 @example |
2236 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" | 2713 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" |
2237 | 2714 |
2238 # Evidently Perl considers _ to be a word char ala \b, even though XEmacs | 2715 # Evidently Perl considers _ to be a word char ala \b, even though XEmacs |
2239 # doesn't. We need to be careful here with ibyte/ichar because of words | 2716 # doesn't. We need to be careful here with ibyte/ichar because of words |
2240 # like Richard, eicharlen(), multibyte, HIBYTE, etc. | 2717 # like Richard, @code{eicharlen()}, multibyte, HIBYTE, etc. |
2241 | 2718 |
2242 gr Ibyte Intbyte $files | 2719 gr Ibyte Intbyte $files |
2243 gr '\bIBYTE' INTBYTE $files | 2720 gr '\bIBYTE' INTBYTE $files |
2244 gr '\bibyte' intbyte $files | 2721 gr '\bibyte' intbyte $files |
2245 gr '\bICHAR' EMCHAR $files | 2722 gr '\bICHAR' EMCHAR $files |
2275 of the utmost importance that you follow them. If you don't, you may | 2752 of the utmost importance that you follow them. If you don't, you may |
2276 get something that appears to work, but which will crash in odd | 2753 get something that appears to work, but which will crash in odd |
2277 situations, often in code far away from where the actual breakage is. | 2754 situations, often in code far away from where the actual breakage is. |
2278 | 2755 |
2279 @menu | 2756 @menu |
2280 * A Reader's Guide to XEmacs Coding Conventions:: | 2757 * A Reader's Guide to XEmacs Coding Conventions:: |
2281 * General Coding Rules:: | 2758 * General Coding Rules:: |
2282 * Object-Oriented Techniques for C:: | 2759 * Object-Oriented Techniques for C:: |
2283 * Writing Lisp Primitives:: | 2760 * Writing Lisp Primitives:: |
2284 * Writing Good Comments:: | 2761 * Writing Good Comments:: |
2285 * Adding Global Lisp Variables:: | 2762 * Adding Global Lisp Variables:: |
2286 * Proper Use of Unsigned Types:: | 2763 * Writing Macros:: |
2287 * Coding for Mule:: | 2764 * Proper Use of Unsigned Types:: |
2288 * Techniques for XEmacs Developers:: | 2765 * Techniques for XEmacs Developers:: |
2289 @end menu | 2766 @end menu |
2290 | 2767 |
2291 @node A Reader's Guide to XEmacs Coding Conventions | 2768 See also @ref{Coding for Mule}. |
2769 | |
2770 @node A Reader's Guide to XEmacs Coding Conventions, General Coding Rules, Rules When Writing New C Code, Rules When Writing New C Code | |
2292 @section A Reader's Guide to XEmacs Coding Conventions | 2771 @section A Reader's Guide to XEmacs Coding Conventions |
2293 @cindex coding conventions | 2772 @cindex coding conventions |
2294 @cindex reader's guide | 2773 @cindex reader's guide |
2295 @cindex coding rules, naming | 2774 @cindex coding rules, naming |
2296 | 2775 |
2379 @samp{F} implement Lisp primitives. Of course all their arguments and | 2858 @samp{F} implement Lisp primitives. Of course all their arguments and |
2380 their return values must be Lisp_Objects. (This is hidden in the | 2859 their return values must be Lisp_Objects. (This is hidden in the |
2381 @code{DEFUN} macro.) | 2860 @code{DEFUN} macro.) |
2382 | 2861 |
2383 | 2862 |
2384 @node General Coding Rules | 2863 @node General Coding Rules, Object-Oriented Techniques for C, A Reader's Guide to XEmacs Coding Conventions, Rules When Writing New C Code |
2385 @section General Coding Rules | 2864 @section General Coding Rules |
2386 @cindex coding rules, general | 2865 @cindex coding rules, general |
2387 | 2866 |
2388 The C code is actually written in a dialect of C called @dfn{Clean C}, | 2867 The C code is actually written in a dialect of C called @dfn{Clean C}, |
2389 meaning that it can be compiled, mostly warning-free, with either a C or | 2868 meaning that it can be compiled, mostly warning-free, with either a C or |
2390 C++ compiler. Coding in Clean C has several advantages over plain C. | 2869 C++ compiler. Coding in Clean C has several advantages over plain C. |
2391 C++ compilers are more nit-picking, and a number of coding errors have | 2870 C++ compilers are more nit-picking, and a number of coding errors have |
2392 been found by compiling with C++. The ability to use both C and C++ | 2871 been found by compiling with C++. The ability to use both C and C++ |
2393 tools means that a greater variety of development tools are available to | 2872 tools means that a greater variety of development tools are available to |
2394 the developer. | 2873 the developer. In addition, the ability to overload operators in C++ |
2874 means it is possible, for error-checking purposes, to redefine certain | |
2875 simple types (normally defined as aliases for simple built-in types such | |
2876 as @code{unsigned char} or @code{long}) as classes, strictly limiting the permissible | |
2877 operations and catching illegal implicit casts and such. | |
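
As a small illustration of what compiling under both languages demands, the sketch below is valid C and valid C++. The helper name is hypothetical and not taken from the XEmacs sources; the point is only that the result of @code{malloc()} is cast explicitly (C++ does not convert @code{void *} implicitly) and that no C++ keyword is used as an identifier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated, NUL-terminated copy
   of the first LEN bytes of SRC, or NULL if allocation fails. */
static char *copy_bytes(const char *src, size_t len)
{
    char *dst;

    assert(src != NULL);
    /* The explicit cast is required by C++ and harmless in C. */
    dst = (char *) malloc(len + 1);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

The caller owns the returned buffer and releases it with @code{free()}.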
2395 | 2878 |
2396 Every module includes @file{<config.h>} (angle brackets so that | 2879 Every module includes @file{<config.h>} (angle brackets so that |
2397 @samp{--srcdir} works correctly; @file{config.h} may or may not be in | 2880 @samp{--srcdir} works correctly; @file{config.h} may or may not be in |
2398 the same directory as the C sources) and @file{lisp.h}. @file{config.h} | 2881 the same directory as the C sources) and @file{lisp.h}. @file{config.h} |
2399 must always be included before any other header files (including | 2882 must always be included before any other header files (including |
2498 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of | 2981 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of |
2499 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and | 2982 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and |
2500 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some | 2983 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some |
2501 predicate. | 2984 predicate. |
2502 | 2985 |
2503 @node Object-Oriented Techniques for C | 2986 @node Object-Oriented Techniques for C, Writing Lisp Primitives, General Coding Rules, Rules When Writing New C Code |
2504 @section Object-Oriented Techniques for C | 2987 @section Object-Oriented Techniques for C |
2505 @cindex coding rules, object-oriented | 2988 @cindex coding rules, object-oriented |
2506 @cindex object-oriented techniques | 2989 @cindex object-oriented techniques |
2507 | 2990 |
2508 At the lowest levels, XEmacs makes heavy use of object-oriented | 2991 At the lowest levels, XEmacs makes heavy use of object-oriented |
2598 @samp{some_method}, but this will also catch calls and definitions of | 3081 @samp{some_method}, but this will also catch calls and definitions of |
2599 that method for instances of other subtypes of @samp{<Type>}, and there | 3082 that method for instances of other subtypes of @samp{<Type>}, and there |
2600 may be a rather large number of them. | 3083 may be a rather large number of them. |
2601 | 3084 |
2602 | 3085 |
2603 @node Writing Lisp Primitives | 3086 @node Writing Lisp Primitives, Writing Good Comments, Object-Oriented Techniques for C, Rules When Writing New C Code |
2604 @section Writing Lisp Primitives | 3087 @section Writing Lisp Primitives |
2605 @cindex writing Lisp primitives | 3088 @cindex writing Lisp primitives |
2606 @cindex Lisp primitives, writing | 3089 @cindex Lisp primitives, writing |
2607 @cindex primitives, writing Lisp | 3090 @cindex primitives, writing Lisp |
2608 | 3091 |
2736 The names of the C arguments will be used as the names of the arguments | 3219 The names of the C arguments will be used as the names of the arguments |
2737 to the Lisp primitive as displayed in its documentation, modulo the same | 3220 to the Lisp primitive as displayed in its documentation, modulo the same |
2738 concerns described above for @code{F...} names (in particular, | 3221 concerns described above for @code{F...} names (in particular, |
2739 underscores in the C arguments become dashes in the Lisp arguments). | 3222 underscores in the C arguments become dashes in the Lisp arguments). |
2740 | 3223 |
2741 There is one additional kludge: A trailing `_' on the C argument is | 3224 There is one additional kludge: A trailing @samp{_} on the C argument is |
2742 discarded when forming the Lisp argument. This allows C language | 3225 discarded when forming the Lisp argument. This allows C language |
2743 reserved words (like @code{default}) or global symbols (like | 3226 reserved words (like @code{default}) or global symbols (like |
2744 @code{dirname}) to be used as argument names without compiler warnings | 3227 @code{dirname}) to be used as argument names without compiler warnings |
2745 or errors. | 3228 or errors. |
2746 | 3229 |
2845 | 3328 |
2846 @file{eval.c} is a very good file to look through for examples; | 3329 @file{eval.c} is a very good file to look through for examples; |
2847 @file{lisp.h} contains the definitions for important macros and | 3330 @file{lisp.h} contains the definitions for important macros and |
2848 functions. | 3331 functions. |
2849 | 3332 |
2850 @node Writing Good Comments | 3333 @node Writing Good Comments, Adding Global Lisp Variables, Writing Lisp Primitives, Rules When Writing New C Code |
2851 @section Writing Good Comments | 3334 @section Writing Good Comments |
2852 @cindex writing good comments | 3335 @cindex writing good comments |
2853 @cindex comments, writing good | 3336 @cindex comments, writing good |
2854 | 3337 |
2855 Comments are a lifeline for programmers trying to understand tricky | 3338 Comments are a lifeline for programmers trying to understand tricky |
2908 them as incorrect. | 3391 them as incorrect. |
2909 | 3392 |
2910 To indicate a "todo" or other problem, use four pound signs -- | 3393 To indicate a "todo" or other problem, use four pound signs -- |
2911 i.e. @samp{####}. | 3394 i.e. @samp{####}. |
2912 | 3395 |
2913 @node Adding Global Lisp Variables | 3396 @node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code |
2914 @section Adding Global Lisp Variables | 3397 @section Adding Global Lisp Variables |
2915 @cindex global Lisp variables, adding | 3398 @cindex global Lisp variables, adding |
2916 @cindex variables, adding global Lisp | 3399 @cindex variables, adding global Lisp |
2917 | 3400 |
2918 Global variables whose names begin with @samp{Q} are constants whose | 3401 Global variables whose names begin with @samp{Q} are constants whose |
2977 garbage-collection mechanism won't know that the object in this variable | 3460 garbage-collection mechanism won't know that the object in this variable |
2978 is in use, and will happily collect it and reuse its storage for another | 3461 is in use, and will happily collect it and reuse its storage for another |
2979 Lisp object, and you will be the one who's unhappy when you can't figure | 3462 Lisp object, and you will be the one who's unhappy when you can't figure |
2980 out how your variable got overwritten. | 3463 out how your variable got overwritten. |
2981 | 3464 |
2982 @node Proper Use of Unsigned Types | 3465 @node Writing Macros, Proper Use of Unsigned Types, Adding Global Lisp Variables, Rules When Writing New C Code |
3466 @section Writing Macros | |
3467 @cindex writing macros | |
3468 @cindex macros, writing | |
3469 | |
3470 The three golden rules of macros: | |
3471 | |
3472 @enumerate | |
3473 @item | |
3474 Anything that's an lvalue can be evaluated more than once. | |
3475 @item | |
3476 Macros where anything else can be evaluated more than once should | |
3477 have the word "unsafe" in their name (exceptions may be made for | |
3478 large sets of macros that evaluate arguments of certain types more | |
3479 than once, e.g. struct buffer * arguments, when clearly indicated in | |
3480 the macro documentation). These macros are generally meant to be | |
3481 called only by other macros that have already stored the calling | |
3482 values in temporary variables. | |
3483 @item | |
3484 Nothing else can be evaluated more than once. Use inline | |
3485 functions, if necessary, to prevent multiple evaluation. | |
3486 @end enumerate | |
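
The difference between rules 2 and 3 can be seen in a small, self-contained sketch. The names @code{unsafe_max_int}, @code{max_int} and @code{next_value} are hypothetical and do not come from the XEmacs sources; they only illustrate why a multiply-evaluating macro deserves an "unsafe" name and how an inline function avoids the problem.

```c
/* Hypothetical names, for illustration only; not taken from XEmacs.  */

/* Evaluates one of its arguments twice, so by rule 2 it carries
   "unsafe" in its name.  */
#define unsafe_max_int(a, b) ((a) > (b) ? (a) : (b))

/* Rule 3: an inline function evaluates each argument exactly once.  */
static inline int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* An argument expression with a visible side effect.  */
static int call_count;

static int
next_value (void)
{
  call_count++;
  return call_count * 10;
}
```

Calling @code{max_int (next_value (), 5)} runs @code{next_value} once, whereas @code{unsafe_max_int (next_value (), 5)} runs it twice and returns the value of the second call.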
3487 | |
3488 NOTE: The functions and macros below are given full prototypes in their | |
3489 docs, even when the implementation is a macro. In such cases, passing | |
3490 an argument of a type other than expected will produce undefined | |
3491 results. Also, given that macros can do things functions can't (in | |
3492 particular, directly modify arguments as if they were passed by | |
3493 reference), the declaration syntax has been extended to include the | |
3494 call-by-reference syntax from C++, where an & after a type indicates | |
3495 that the argument is an lvalue and is passed by reference, i.e. the | |
3496 function can modify its value. (This is equivalent in C to passing a | |
3497 pointer to the argument, but without the need to explicitly worry about | |
3498 pointers.) | |
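
As a rough illustration of the by-reference notation, here is a hypothetical macro (not an XEmacs one) that would be documented with the prototype @code{void BUMP_BY (int &counter, int amount);}, next to the plain C function that needs an explicit pointer to achieve the same effect.

```c
/* Hypothetical macro, documented as:
     void BUMP_BY (int &counter, int amount);
   It modifies its first argument in place, which a C function
   cannot do without a pointer.  */
#define BUMP_BY(counter, amount) ((void) ((counter) += (amount)))

/* The function equivalent: the caller must pass an address.  */
static void
bump_by (int *counter, int amount)
{
  *counter += amount;
}
```

The macro is called as @code{BUMP_BY (n, 2)}, the function as @code{bump_by (&n, 2)}; both leave @code{n} increased by 2.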
3499 | |
3500 When to capitalize macros: | |
3501 | |
3502 @itemize @bullet | |
3503 @item | |
3504 Capitalize macros doing stuff obviously impossible with (C) | |
3505 functions, e.g. directly modifying arguments as if they were passed by | |
3506 reference. | |
3507 @item | |
3508 Capitalize macros that evaluate @strong{any} argument more than once regardless | |
3509 of whether that's "allowed" (e.g. buffer arguments). | |
3510 @item | |
3511 Capitalize macros that directly access a field in a Lisp_Object or | |
3512 its equivalent underlying structure. In such cases, the macro that accesses | |
3513 the field through the Lisp_Object has an X prefixed to its name, and the | |
3514 macro that accesses it through the underlying structure doesn't. | |
3515 @item | |
3516 Capitalize certain other basic macros relating to Lisp_Objects; e.g. | |
3517 FRAMEP, CHECK_FRAME, etc. | |
3518 @item | |
3519 Try to avoid capitalizing any other macros. | |
3520 @end itemize | |
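
The X-prefix convention for field-access macros might look like the following sketch. Everything here is a stand-in: @code{Sketch_Lisp_Object} is a plain pointer (the real @code{Lisp_Object} is not), and the @code{SKETCH_} macros and @code{struct sketch_frame} are invented for the example.

```c
/* Hypothetical stand-ins; the real Lisp_Object and frame structure
   are considerably more involved.  */
struct sketch_frame
{
  int width;
};

typedef void *Sketch_Lisp_Object;

/* Convert the object to its underlying structure.  */
#define SKETCH_XFRAME(obj) ((struct sketch_frame *) (obj))

/* Access through the underlying structure: no X prefix.  */
#define SKETCH_FRAME_WIDTH(f) ((f)->width)

/* Access through the object: same name with an X prefix.  */
#define SKETCH_XFRAME_WIDTH(obj) SKETCH_FRAME_WIDTH (SKETCH_XFRAME (obj))
```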
3521 | |
3522 @node Proper Use of Unsigned Types, Techniques for XEmacs Developers, Writing Macros, Rules When Writing New C Code | |
2983 @section Proper Use of Unsigned Types | 3523 @section Proper Use of Unsigned Types |
2984 @cindex unsigned types, proper use of | 3524 @cindex unsigned types, proper use of |
2985 @cindex types, proper use of unsigned | 3525 @cindex types, proper use of unsigned |
2986 | 3526 |
2987 Avoid using @code{unsigned int} and @code{unsigned long} whenever | 3527 Avoid using @code{unsigned int} and @code{unsigned long} whenever |
3008 @end enumerate | 3548 @end enumerate |
3009 | 3549 |
3010 Other reasonable uses of @code{unsigned int} and @code{unsigned long} | 3550 Other reasonable uses of @code{unsigned int} and @code{unsigned long} |
3011 are representing non-quantities -- e.g. bit-oriented flags and such. | 3551 are representing non-quantities -- e.g. bit-oriented flags and such. |
3012 | 3552 |
3013 @node Coding for Mule | 3553 @node Techniques for XEmacs Developers, , Proper Use of Unsigned Types, Rules When Writing New C Code |
3014 @section Coding for Mule | |
3015 @cindex coding for Mule | |
3016 @cindex Mule, coding for | |
3017 | |
3018 Although Mule support is not compiled by default in XEmacs, many people | |
3019 are using it, and we consider it crucial that new code works correctly | |
3020 with multibyte characters. This is not hard; it is only a matter of | |
3021 following several simple user-interface guidelines. Even if you never | |
3022 compile with Mule, with a little practice you will find it quite easy | |
3023 to code Mule-correctly. | |
3024 | |
3025 Note that these guidelines are not necessarily tied to the current Mule | |
3026 implementation; they are also a good idea to follow on the grounds of | |
3027 code generalization for future I18N work. | |
3028 | |
3029 @menu | |
3030 * Character-Related Data Types:: | |
3031 * Working With Character and Byte Positions:: | |
3032 * Conversion to and from External Data:: | |
3033 * General Guidelines for Writing Mule-Aware Code:: | |
3034 * An Example of Mule-Aware Code:: | |
3035 * Mule-izing Code:: | |
3036 @end menu | |
3037 | |
3038 @node Character-Related Data Types | |
3039 @subsection Character-Related Data Types | |
3040 @cindex character-related data types | |
3041 @cindex data types, character-related | |
3042 | |
3043 First, let's review the basic character-related datatypes used by | |
3044 XEmacs. Note that some of the separate @code{typedef}s are not | |
3045 mandatory, but they improve clarity of code a great deal, because one | |
3046 glance at the declaration can tell the intended use of the variable. | |
3047 | |
3048 @table @code | |
3049 @item Ichar | |
3050 @cindex Ichar | |
3051 An @code{Ichar} holds a single Emacs character. | |
3052 | |
3053 Obviously, the equality between characters and bytes is lost in the Mule | |
3054 world. Characters can be represented by one or more bytes in the | |
3055 buffer, and @code{Ichar} is a C type large enough to hold any | |
3056 character. (This currently isn't quite true for ISO 10646, which | |
3057 defines a character as a 31-bit non-negative quantity, while XEmacs | |
3058 characters are only 30 bits. This is irrelevant unless you are | |
3059 considering using the ISO 10646 private groups to support really large | |
3060 private character sets---in particular, the Mule character set!---in | |
3061 a version of XEmacs using Unicode internally.) | |
3062 | |
3063 Without Mule support, an @code{Ichar} is equivalent to an | |
3064 @code{unsigned char}. [[This doesn't seem to be true; @file{lisp.h} | |
3065 unconditionally @samp{typedef}s @code{Ichar} to @code{int}.]] | |
3066 | |
3067 @item Ibyte | |
3068 @cindex Ibyte | |
3069 The data representing the text in a buffer or string is logically a set | |
3070 of @code{Ibyte}s. | |
3071 | |
3072 XEmacs does not work with the same character formats all the time; when | |
3073 reading characters from the outside, it decodes them to an internal | |
3074 format, and likewise encodes them when writing. @code{Ibyte} (in fact | |
3075 @code{unsigned char}) is the basic unit of the internal format used for | |
3076 XEmacs buffers and strings. An @code{Ibyte *} is the type that points at | |
3077 text encoded in the variable-width internal encoding. | |
3078 | |
3079 One character can correspond to one or more @code{Ibyte}s. In the | |
3080 current Mule implementation, an ASCII character is represented by the | |
3081 same @code{Ibyte}, and other characters are represented by a sequence | |
3082 of two or more @code{Ibyte}s. (This will also be true of an | |
3083 implementation using UTF-8 as the internal encoding. In fact, only code | |
3084 that implements character code conversions and a very few macros used to | |
3085 implement motion by whole characters will notice the difference between | |
3086 UTF-8 and the Mule encoding.) | |
3087 | |
3088 Without Mule support, there are exactly 256 characters, implicitly | |
3089 Latin-1, and each character is represented using one @code{Ibyte}, and | |
3090 there is a one-to-one correspondence between @code{Ibyte}s and | |
3091 @code{Ichar}s. | |
3092 | |
3093 @item Charxpos | |
3094 @itemx Charbpos | |
3095 @itemx Charcount | |
3096 @cindex Charxpos | |
3097 @cindex Charbpos | |
3098 @cindex Charcount | |
3099 A @code{Charbpos} represents a character position in a buffer. A | |
3100 @code{Charcount} represents a number (count) of characters. Logically, | |
3101 subtracting two @code{Charbpos} values yields a @code{Charcount} value. | |
3102 When representing a character position in a string, we just use | |
3103 @code{Charcount} directly. The reason for having a separate typedef for | |
3104 buffer positions is that they are 1-based, whereas string positions are | |
3105 0-based and hence string counts and positions can be freely intermixed (a | |
3106 string position is equivalent to the count of characters from the | |
3107 beginning). When representing a character position that could be either | |
3108 in a buffer or string (for example, in the extent code), @code{Charxpos} | |
3109 is used. Although all of these are @code{typedef}ed to | |
3110 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make | |
3111 it clear what sort of position is being used. | |
3112 | |
3113 @code{Charxpos}, @code{Charbpos} and @code{Charcount} values are the | |
3114 only ones that are ever visible to Lisp. | |
3115 | |
3116 @item Bytexpos | |
3117 @itemx Bytecount | |
3118 @cindex Bytebpos | |
3119 @cindex Bytecount | |
3120 A @code{Bytebpos} represents a byte position in a buffer. A | |
3121 @code{Bytecount} represents the distance between two positions, in | |
3122 bytes. Byte positions in strings use @code{Bytecount}, and for byte | |
3123 positions that can be either in a buffer or string, @code{Bytexpos} is | |
3124 used. The relationship between @code{Bytexpos}, @code{Bytebpos} and | |
3125 @code{Bytecount} is the same as the relationship between | |
3126 @code{Charxpos}, @code{Charbpos} and @code{Charcount}. | |
3127 | |
3128 @item Extbyte | |
3129 @cindex Extbyte | |
3130 When dealing with the outside world, XEmacs works with @code{Extbyte}s, | |
3131 which are equivalent to @code{char}. The distance between two | |
3132 @code{Extbyte}s is a @code{Bytecount}, since external text is a | |
3133 byte-by-byte encoding. Extbytes occur mainly at the transition point | |
3134 between internal text and external functions. XEmacs code should not, | |
3135 if it can possibly avoid it, do any actual manipulation using external | |
3136 text, since its format is completely unpredictable (it might not even be | |
3137 ASCII-compatible). | |
3138 @end table | |
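
The relationship between positions and counts described in the table can be sketched as follows. The typedefs are stand-ins (@code{long} in place of @code{EMACS_INT}), the helper names are hypothetical, and the sketch assumes the first buffer position is 1, as stated above.

```c
/* Stand-ins: in XEmacs these are typedef'ed to EMACS_INT.  */
typedef long Charbpos;   /* 1-based character position in a buffer */
typedef long Charcount;  /* a number of characters */

/* Subtracting two buffer positions logically yields a count.  */
static Charcount
sketch_chars_between (Charbpos from, Charbpos to)
{
  return to - from;
}

/* A 0-based offset (as used for string positions) corresponding to
   a 1-based buffer position.  */
static Charcount
sketch_charbpos_to_offset (Charbpos pos)
{
  return pos - 1;
}
```

Keeping the two typedefs separate makes the @code{- 1} adjustment visible at the point where a buffer position turns into a count.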
3139 | |
3140 @node Working With Character and Byte Positions | |
3141 @subsection Working With Character and Byte Positions | |
3142 @cindex character and byte positions, working with | |
3143 @cindex byte positions, working with character and | |
3144 @cindex positions, working with character and byte | |
3145 | |
3146 Now that we have defined the basic character-related types, we can look | |
3147 at the macros and functions designed for work with them and for | |
3148 conversion between them. Most of these macros are defined in | |
3149 @file{buffer.h}, and we don't discuss all of them here, but only the | |
3150 most important ones. Examining the existing code is the best way to | |
3151 learn about them. | |
3152 | |
3153 @table @code | |
3154 @item MAX_ICHAR_LEN | |
3155 @cindex MAX_ICHAR_LEN | |
3156 This preprocessor constant is the maximum number of buffer bytes to | |
3157 represent an Emacs character in the variable width internal encoding. | |
3158 It is useful when allocating temporary strings to keep a known number of | |
3159 characters. For instance: | |
3160 | |
3161 @example | |
3162 @group | |
3163 @{ | |
3164 Charcount cclen; | |
3165 ... | |
3166 @{ | |
3167 /* Allocate place for @var{cclen} characters. */ | |
3168 Ibyte *buf = (Ibyte *) alloca (cclen * MAX_ICHAR_LEN); | |
3169 ... | |
3170 @end group | |
3171 @end example | |
3172 | |
3173 If you followed the previous section, you can guess that, logically, | |
3174 multiplying a @code{Charcount} value with @code{MAX_ICHAR_LEN} produces | |
3175 a @code{Bytecount} value. | |
3176 | |
3177 In the current Mule implementation, @code{MAX_ICHAR_LEN} equals 4. | |
3178 Without Mule, it is 1. In a mature Unicode-based XEmacs, it will also | |
3179 be 4 (since all Unicode characters can be encoded in UTF-8 in 4 bytes or | |
3180 less), but some versions may use up to 6, in order to use the large | |
3181 private space provided by ISO 10646 to ``mirror'' the Mule code space. | |
3182 | |
3183 @item itext_ichar | |
3184 @itemx set_itext_ichar | |
3185 @cindex itext_ichar | |
3186 @cindex set_itext_ichar | |
3187 The @code{itext_ichar} macro takes an @code{Ibyte} pointer and | |
3188 returns the @code{Ichar} stored at that position. If it were a | |
3189 function, its prototype would be: | |
3190 | |
3191 @example | |
3192 Ichar itext_ichar (Ibyte *p); | |
3193 @end example | |
3194 | |
3195 @code{set_itext_ichar} stores an @code{Ichar} to the specified byte | |
3196 position. It returns the number of bytes stored: | |
3197 | |
3198 @example | |
3199 Bytecount set_itext_ichar (Ibyte *p, Ichar c); | |
3200 @end example | |
3201 | |
3202 It is important to note that @code{set_itext_ichar} is safe only for | |
3203 appending a character at the end of a buffer, not for overwriting a | |
3204 character in the middle. This is because the width of characters | |
3205 varies, and @code{set_itext_ichar} cannot resize the string if it | |
3206 writes, say, a two-byte character where a single-byte character used to | |
3207 reside. | |
3208 | |
3209 A typical use of @code{set_itext_ichar} can be demonstrated by this | |
3210 example, which copies characters from buffer @var{buf} to a temporary | |
3211 string of Ibytes. | |
3212 | |
3213 @example | |
3214 @group | |
3215 @{ | |
3216 Charbpos pos; | |
3217 for (pos = beg; pos < end; pos++) | |
3218 @{ | |
3219 Ichar c = BUF_FETCH_CHAR (buf, pos); | |
3220 p += set_itext_ichar (p, c); | |
3221 @} | |
3222 @} | |
3223 @end group | |
3224 @end example | |
3225 | |
3226 Note how @code{set_itext_ichar} is used to store the @code{Ichar} | |
3227 and advance the pointer @code{p} at the same time. | |
3228 | |
3229 @item INC_IBYTEPTR | |
3230 @itemx DEC_IBYTEPTR | |
3231 @cindex INC_IBYTEPTR | |
3232 @cindex DEC_IBYTEPTR | |
3233 These two macros increment and decrement an @code{Ibyte} pointer, | |
3234 respectively. They will adjust the pointer by the appropriate number of | |
3235 bytes according to the byte length of the character stored there. Both | |
3236 macros assume that the memory address is located at the beginning of a | |
3237 valid character. | |
3238 | |
3239 Without Mule support, @code{INC_IBYTEPTR (p)} and @code{DEC_IBYTEPTR (p)} | |
3240 simply expand to @code{p++} and @code{p--}, respectively. | |
3241 | |
3242 @item bytecount_to_charcount | |
3243 @cindex bytecount_to_charcount | |
3244 Given a pointer to a text string and a length in bytes, return the | |
3245 equivalent length in characters. | |
3246 | |
3247 @example | |
3248 Charcount bytecount_to_charcount (Ibyte *p, Bytecount bc); | |
3249 @end example | |
3250 | |
3251 @item charcount_to_bytecount | |
3252 @cindex charcount_to_bytecount | |
3253 Given a pointer to a text string and a length in characters, return the | |
3254 equivalent length in bytes. | |
3255 | |
3256 @example | |
3257 Bytecount charcount_to_bytecount (Ibyte *p, Charcount cc); | |
3258 @end example | |
3259 | |
3260 @item itext_n_addr | |
3261 @cindex itext_n_addr | |
3262 Return a pointer to the beginning of the character offset @var{cc} (in | |
3263 characters) from @var{p}. | |
3264 | |
3265 @example | |
3266 Ibyte *itext_n_addr (Ibyte *p, Charcount cc); | |
3267 @end example | |
3268 @end table | |
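
To make the behaviour of these macros concrete without the XEmacs headers, the following sketch uses UTF-8 as a stand-in for the internal encoding (the text above notes that a UTF-8 implementation would behave the same way from the caller's point of view). All @code{sketch_} names and @code{SKETCH_MAX_ICHAR_LEN} are hypothetical, the typedefs are simplified stand-ins, and the code assumes valid input in the range 0 to 0x10FFFF.

```c
#include <stddef.h>

/* Simplified stand-ins for the XEmacs typedefs.  */
typedef int Ichar;
typedef unsigned char Ibyte;
typedef ptrdiff_t Bytecount;
typedef ptrdiff_t Charcount;

/* Worst-case bytes per character in this stand-in encoding.  */
#define SKETCH_MAX_ICHAR_LEN 4

/* Analogue of set_itext_ichar: store C at P, return bytes written.
   Assumes 0 <= c <= 0x10FFFF.  */
static Bytecount
sketch_set_ichar (Ibyte *p, Ichar c)
{
  if (c < 0x80)
    {
      p[0] = (Ibyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Ibyte) (0xC0 | (c >> 6));
      p[1] = (Ibyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Ibyte) (0xE0 | (c >> 12));
      p[1] = (Ibyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Ibyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Ibyte) (0xF0 | (c >> 18));
  p[1] = (Ibyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Ibyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Ibyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Byte length of the character starting at P.  Like INC_IBYTEPTR,
   this assumes P is at the beginning of a valid character.  */
static Bytecount
sketch_ichar_len (const Ibyte *p)
{
  if (*p < 0x80)
    return 1;
  if (*p < 0xE0)
    return 2;
  if (*p < 0xF0)
    return 3;
  return 4;
}

/* Analogue of bytecount_to_charcount: step character by character.  */
static Charcount
sketch_bytecount_to_charcount (const Ibyte *p, Bytecount bc)
{
  const Ibyte *end = p + bc;
  Charcount cc = 0;

  while (p < end)
    {
      p += sketch_ichar_len (p);
      cc++;
    }
  return cc;
}
```

A caller would allocate @code{cclen * SKETCH_MAX_ICHAR_LEN} bytes, append with @code{p += sketch_set_ichar (p, c)}, and recover the character count from the resulting byte length with @code{sketch_bytecount_to_charcount}.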
3269 | |
3270 @node Conversion to and from External Data | |
3271 @subsection Conversion to and from External Data | |
3272 @cindex conversion to and from external data | |
3273 @cindex external data, conversion to and from | |
3274 | |
3275 When an external function, such as a C library function, returns a | |
3276 @code{char} pointer, you should almost never treat it as @code{Ibyte}. | |
3277 This is because these returned strings may contain 8bit characters which | |
3278 can be misinterpreted by XEmacs, and cause a crash. Likewise, when | |
3279 exporting a piece of internal text to the outside world, you should | |
3280 always convert it to an appropriate external encoding, lest the internal | |
3281 stuff (such as the infamous \201 characters) leak out. | |
3282 | |
3283 The interface to conversion between the internal and external | |
3284 representations of text is the set of conversion macros defined in | |
3285 @file{buffer.h}. There used to be a fixed set of external formats | |
3286 supported by these macros, but now any coding system can be used with | |
3287 them. The coding system alias mechanism is used to create the | |
3288 following logical coding systems, which replace the fixed external | |
3289 formats. The (dontusethis-set-symbol-value-handler) mechanism was | |
3290 enhanced to make this possible (more work on that is needed). | |
3291 | |
3292 Often useful coding systems: | |
3293 | |
3294 @table @code | |
3295 @item Qbinary | |
3296 This is the simplest format and is what we use in the absence of a more | |
3297 appropriate format. This converts according to the @code{binary} coding | |
3298 system: | |
3299 | |
3300 @enumerate a | |
3301 @item | |
3302 On input, bytes 0--255 are converted into (implicitly Latin-1) | |
3303 characters 0--255. A non-Mule XEmacs doesn't really know about | |
3304 different character sets and the fonts to display them, so the bytes can | |
3305 be treated as text in different 1-byte encodings by simply setting the | |
3306 appropriate fonts. So in a sense, non-Mule XEmacs is a multi-lingual | |
3307 editor if, for example, different fonts are used to display text in | |
3308 different buffers, faces, or windows. The specifier mechanism gives the | |
3309 user complete control over this kind of behavior. | |
3310 @item | |
3311 On output, characters 0--255 are converted into bytes 0--255 and other | |
3312 characters are converted into `~'. | |
3313 @end enumerate | |
3314 | |
3315 @item Qnative | |
3316 Format used for the external Unix environment---@code{argv[]}, stuff | |
3317 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. | |
3318 This is encoded according to the encoding specified by the current locale. | |
3319 [[This is dangerous; current locale is user preference, and the system | |
3320 is probably going to be something else. Is there anything we can do | |
3321 about it?]] | |
3322 | |
3323 @item Qfile_name | |
3324 Format used for filenames. This is normally the same as @code{Qnative}, | |
3325 but the two should be distinguished for clarity and possible future | |
3326 separation -- and also because @code{Qfile_name} can be changed using either | |
3327 the @code{file-name-coding-system} or @code{pathname-coding-system} (now | |
3328 obsolete) variables. | |
3329 | |
3330 @item Qctext | |
3331 Compound-text format. This is the standard X11 format used for data | |
3332 stored in properties, selections, and the like. This is an 8-bit | |
3333 no-lock-shift ISO2022 coding system. This is a real coding system, | |
3334 unlike @code{Qfile_name}, which is user-definable. | |
3335 | |
3336 @item Qmswindows_tstr | |
3337 Used for external data in all MS Windows functions that are declared to | |
3338 accept data of type @code{LPTSTR} or @code{LPCTSTR}. This maps to either | |
3339 @code{Qmswindows_multibyte} (a locale-specific encoding, same as | |
3340 @code{Qnative}) or @code{Qmswindows_unicode}, depending on whether | |
3341 XEmacs is being run under Windows 9X or Windows NT/2000/XP. | |
3342 @end table | |
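
The output rule given for @code{Qbinary} (characters 0--255 map to the same byte, everything else becomes @samp{~}) amounts to the following. The function name is hypothetical and @code{Ichar} is a stand-in typedef; this is only a sketch of the rule as stated, not the actual coding-system implementation.

```c
typedef int Ichar;  /* stand-in for the XEmacs typedef */

/* Encode one character the way the `binary' output rule is
   described above: 0..255 pass through, anything else is '~'.  */
static unsigned char
sketch_binary_encode_char (Ichar c)
{
  return (c >= 0 && c <= 255) ? (unsigned char) c : (unsigned char) '~';
}
```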
3343 | |
3344 Many other coding systems are provided by default. | |
3345 | |
3346 There are two fundamental macros to convert between external and | |
3347 internal format, as well as various convenience macros to simplify the | |
3348 most common operations. | |
3349 | |
3350 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and | |
3351 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments | |
3352 each of these receives are a source type, a source, a sink type, a sink, | |
3353 and a coding system (or a symbol naming a coding system). | |
3354 | |
3355 A typical call looks like | |
3356 @example | |
3357 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name); | |
3358 @end example | |
3359 | |
3360 which means that the contents of the lisp string @code{str} are written | |
3361 to a malloc'ed memory area which will be pointed to by @code{ptr}, after | |
3362 the function returns. The conversion will be done using the | |
3363 @code{file-name} coding system, which will be controlled by the user | |
3364 indirectly by setting or binding the variable | |
3365 @code{file-name-coding-system}. | |
3366 | |
3367 Some sources and sinks require two C variables to specify. We use some | |
3368 preprocessor magic to allow different source and sink types, and even | |
3369 different numbers of arguments to specify different types of sources and | |
3370 sinks. | |
3371 | |
3372 So we can have a call that looks like | |
3373 @example | |
3374 TO_INTERNAL_FORMAT (DATA, (ptr, len), | |
3375 MALLOC, (ptr, len), | |
3376 coding_system); | |
3377 @end example | |
3378 | |
3379 The parenthesized argument pairs are required to make the preprocessor | |
3380 magic work. | |
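
One way such preprocessor magic can be arranged is sketched below. This is an assumption about the general technique, not a copy of the real macros: all @code{SKETCH_} names and @code{struct sketch_source} are hypothetical. The type keyword is token-pasted onto a macro name, and a parenthesized pair is taken apart by placing a two-argument macro name directly in front of it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout.  */

/* Placed in front of a parenthesized pair, these pick out one half:
   SKETCH_CAR (a, b) -> (a), SKETCH_CDR (a, b) -> (b).  */
#define SKETCH_CAR(x, y) (x)
#define SKETCH_CDR(x, y) (y)

struct sketch_source
{
  const char *ptr;
  size_t len;
};

/* One unpacking macro per source type.  VAL is "(ptr, len)" for DATA
   and a bare pointer for C_STRING.  */
#define SKETCH_SOURCE_DATA(val, src) \
  ((src).ptr = SKETCH_CAR val, (src).len = SKETCH_CDR val)
#define SKETCH_SOURCE_C_STRING(val, src) \
  ((src).ptr = (val), (src).len = strlen (val))

/* Dispatch on the type keyword by token pasting.  */
#define SKETCH_SET_SOURCE(type, val, src) SKETCH_SOURCE_##type (val, src)
```

With these definitions, @code{SKETCH_SET_SOURCE (DATA, (text, 3), s)} and @code{SKETCH_SET_SOURCE (C_STRING, text, s)} both compile; without the parentheses around the pair, the comma would be taken as separating macro arguments.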
3381 | |
3382 Here are the different source and sink types: | |
3383 | |
3384 @table @code | |
3385 @item @code{DATA, (ptr, len),} | |
3386 input data is a fixed buffer of size @var{len} at address @var{ptr} | |
3387 @item @code{ALLOCA, (ptr, len),} | |
3388 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr} | |
3389 @item @code{MALLOC, (ptr, len),} | |
3390 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr} | |
3391 @item @code{C_STRING_ALLOCA, ptr,} | |
3392 equivalent to @code{ALLOCA (ptr, len_ignored)} on output. | |
3393 @item @code{C_STRING_MALLOC, ptr,} | |
3394 equivalent to @code{MALLOC (ptr, len_ignored)} on output | |
3395 @item @code{C_STRING, ptr,} | |
3396 equivalent to @code{DATA, (ptr, strlen/wcslen (ptr))} on input | |
3397 @item @code{LISP_STRING, string,} | |
3398 input or output is a Lisp_Object of type string | |
3399 @item @code{LISP_BUFFER, buffer,} | |
3400 output is written to @code{(point)} in lisp buffer @var{buffer} | |
3401 @item @code{LISP_LSTREAM, lstream,} | |
3402 input or output is a Lisp_Object of type lstream | |
3403 @item @code{LISP_OPAQUE, object,} | |
3404 input or output is a Lisp_Object of type opaque | |
3405 @end table | |
3406 | |
3407 A source type of @code{C_STRING} or a sink type of | |
3408 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate where | |
3409 the external API is not '\0'-byte-clean -- i.e. it expects strings to be | |
3410 terminated with a null byte. For external APIs that are in fact | |
3411 '\0'-byte-clean, we should of course not use these. | |
3412 | |
3413 The sinks to be specified must be lvalues, unless they are the lisp | |
3414 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}. | |
3415 | |
3416 There is no problem using the same lvalue for source and sink. | |
3417 | |
3418 Garbage collection is inhibited during these conversion operations, so | |
3419 it is OK to pass in data from Lisp strings using @code{XSTRING_DATA}. | |
3420 | |
3421 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the | |
3422 resulting text is stored in a stack-allocated buffer, which is | |
3423 automatically freed on returning from the function. However, the sink | |
3424 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed | |
3425 memory. The caller is responsible for freeing this memory using | |
3426 @code{xfree()}. | |
3427 | |
3428 Note that it doesn't make sense for @code{LISP_STRING} to be a source | |
3429 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}. | |
3430 You'll get an assertion failure if you try. | |
3431 | |
3432 99% of conversions involve raw data or Lisp strings as both source and | |
3433 sink, and usually the output goes into @code{alloca()}ed, or sometimes | |
3434 @code{xmalloc()}ed, memory. For this reason, convenience macros are defined for | |
3435 many types of conversions involving raw data and/or Lisp strings, | |
3436 especially when the output is an @code{alloca()}ed string. (When the | |
3437 destination is a Lisp string, there are other functions that should be | |
3438 used instead -- @code{build_ext_string()} and @code{make_ext_string()}, | |
3439 for example.) The convenience macros are of two types -- the older kind | |
3440 that store the result into a specified variable, and the newer kind that | |
3441 return the result. The newer kind of macros don't exist when the output | |
3442 is sized data, because that would have two return values. NOTE: All | |
3443 convenience macros are ultimately defined in terms of | |
3444 @code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}. Thus, any | |
3445 comments above about the workings of these macros also apply to all | |
3446 convenience macros. | |
3447 | |
3448 A typical old-style convenience macro is | |
3449 | |
3450 @example | |
3451 C_STRING_TO_EXTERNAL (in, out, codesys); | |
3452 @end example | |
3453 | |
3454 This is equivalent to | |
3455 | |
3456 @example | |
3457 TO_EXTERNAL_FORMAT (C_STRING, in, C_STRING_ALLOCA, out, codesys); | |
3458 @end example | |
3459 | |
3460 but is easier to write and somewhat clearer, since it clearly identifies | |
3461 the arguments without the clutter of having the preprocessor types mixed | |
3462 in. | |
3463 | |
3464 The new-style equivalent is @code{NEW_C_STRING_TO_EXTERNAL (src, | |
3465 codesys)}, which @emph{returns} the converted data (still in | |
3466 @code{alloca()} space). This is far more convenient for most | |
3467 operations. | |
3468 | |
3469 @node General Guidelines for Writing Mule-Aware Code | |
3470 @subsection General Guidelines for Writing Mule-Aware Code | |
3471 @cindex writing Mule-aware code, general guidelines for | |
3472 @cindex Mule-aware code, general guidelines for writing | |
3473 @cindex code, general guidelines for writing Mule-aware | |
3474 | |
3475 This section contains some general guidance on how to write Mule-aware | |
3476 code, as well as some pitfalls you should avoid. | |
3477 | |
3478 @table @emph | |
3479 @item Never use @code{char} and @code{char *}. | |
3480 In XEmacs, the use of @code{char} and @code{char *} is almost always a | |
3481 mistake. If you want to manipulate an Emacs character from ``C'', use | |
3482 @code{Ichar}. If you want to examine a specific octet in the internal | |
3483 format, use @code{Ibyte}. If you want a Lisp-visible character, use a | |
3484 @code{Lisp_Object} and @code{make_char}. If you want a pointer to move | |
3485 through the internal text, use @code{Ibyte *}. Also note that you | |
3486 almost certainly do not need @code{Ichar *}. Other typedefs to clarify | |
3487 the use of @code{char} are @code{Char_ASCII}, @code{Char_Binary}, | |
3488 @code{UChar_Binary}, and @code{CIbyte}. | |
3489 | |
3490 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, @code{Charbpos} and @code{Bytebpos}. | |
3491 The whole point of using different types is to avoid confusion about the | |
3492 use of certain variables. Lest this effect be nullified, you need to be | |
3493 careful about using the right types. | |
3494 | |
3495 @item Always convert external data | |
3496 It is extremely important to always convert external data, because | |
3497 XEmacs can crash if unexpected 8-bit sequences are copied to its internal | |
3498 buffers literally. | |
3499 | |
3500 This means that when a system function, such as @code{readdir}, returns | |
3501 a string, you normally need to convert it using one of the conversion macros | |
3502 described in the previous section, before passing it further to Lisp. | |
3503 | |
3504 Actually, most of the basic system functions that accept '\0'-terminated | |
3505 string arguments, like @code{stat()} and @code{open()}, have | |
3506 @strong{encapsulated} equivalents that do the internal to external | |
3507 conversion themselves. The encapsulated equivalents have a @code{qxe_} | |
3508 prefix and have string arguments of type @code{Ibyte *}, and you can | |
3509 pass internally encoded data to them, often from a Lisp string using | |
3510 @code{XSTRING_DATA}. (A better design might be to provide versions that | |
3511 accept Lisp strings directly.) [[Really? Then they'd either take | |
3512 @code{Lisp_Object}s and need to check type, or they'd take | |
3513 @code{Lisp_String}s, and violate the rules about passing any of the | |
3514 specific Lisp types.]] | |
3515 | |
3516 Also note that many internal functions, such as @code{make_string}, | |
3517 accept Ibytes, which removes the need for them to convert the data they | |
3518 receive. This increases efficiency because that way external data needs | |
3519 to be decoded only once, when it is read. After that, it is passed | |
3520 around in internal format. | |
3521 | |
3522 @item Do all work in internal format | |
3523 External-formatted data is completely unpredictable in its format. It | |
3524 may be fixed-width Unicode (not even ASCII compatible); it may be a | |
3525 modal encoding, in | |
3526 which case some occurrences of (e.g.) the slash character may be part of | |
3527 two-byte Asian-language characters, and a naive attempt to split apart a | |
3528 pathname by slashes will fail; etc. Internal-format text should be | |
3529 converted to external format only at the point where an external API is | |
3530 actually called, and the first thing done after receiving | |
3531 external-format text from an external API should be to convert it to | |
3532 internal text. | |
3533 @end table | |
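The pathname hazard described under "Do all work in internal format" can be made concrete with a small C sketch. This is not XEmacs code: the function names and the sample bytes are hypothetical, and the mode-aware scanner understands only the two ISO-2022-JP escape sequences that occur in the sample (ESC $ B to enter two-byte mode, ESC ( B to return to ASCII). It only illustrates why a bytewise search for @samp{/} in external-format text can report separators that are not there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sample: "a/" + ESC $ B + one two-byte character whose
   second byte is 0x2F + ESC ( B + "/b".  Only two real slashes. */
const unsigned char sample_path[] = {
    'a', '/',
    0x1B, '$', 'B',      /* switch to two-byte mode */
    0x25, 0x2F,          /* one two-byte character; 0x2F is NOT a slash */
    0x1B, '(', 'B',      /* switch back to ASCII */
    '/', 'b'
};

/* Naive scan: treats every 0x2F byte as a path separator. */
size_t count_slashes_naive(const unsigned char *s, size_t len)
{
    size_t n = 0;
    assert(s != NULL);
    for (size_t i = 0; i < len; i++)
        if (s[i] == '/')
            n++;
    return n;
}

/* Mode-aware scan: tracks the current shift state and skips the
   bytes of two-byte characters as a unit. */
size_t count_slashes_mode_aware(const unsigned char *s, size_t len)
{
    size_t n = 0;
    size_t i = 0;
    int two_byte = 0;    /* nonzero while in two-byte mode */

    assert(s != NULL);
    while (i < len) {
        if (s[i] == 0x1B && i + 2 < len) {
            if (s[i + 1] == '$' && s[i + 2] == 'B') {
                two_byte = 1;
                i += 3;
                continue;
            }
            if (s[i + 1] == '(' && s[i + 2] == 'B') {
                two_byte = 0;
                i += 3;
                continue;
            }
        }
        if (two_byte) {
            i += 2;      /* consume one whole two-byte character */
        } else {
            if (s[i] == '/')
                n++;
            i++;
        }
    }
    return n;
}
```

On the sample, the naive scan counts three separators and the mode-aware scan counts two. In real code the remedy is not a hand-written scanner per encoding but the rule stated above: convert to internal format first, and split pathnames there.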
3534 | |
3535 @node An Example of Mule-Aware Code | |
3536 @subsection An Example of Mule-Aware Code | |
3537 @cindex code, an example of Mule-aware | |
3538 @cindex Mule-aware code, an example of | |
3539 | |
3540 As an example of Mule-aware code, we will analyze the @code{string} | |
3541 function, which conses up a Lisp string from the character arguments it | |
3542 receives. Here is the definition, pasted from @file{alloc.c}: | |
3543 | |
3544 @example | |
3545 @group | |
3546 DEFUN ("string", Fstring, 0, MANY, 0, /* | |
3547 Concatenate all the argument characters and make the result a string. | |
3548 */ | |
3549 (int nargs, Lisp_Object *args)) | |
3550 @{ | |
3551 Ibyte *storage = alloca_array (Ibyte, nargs * MAX_ICHAR_LEN); | |
3552 Ibyte *p = storage; | |
3553 | |
3554 for (; nargs; nargs--, args++) | |
3555 @{ | |
3556 Lisp_Object lisp_char = *args; | |
3557 CHECK_CHAR_COERCE_INT (lisp_char); | |
3558 p += set_itext_ichar (p, XCHAR (lisp_char)); | |
3559 @} | |
3560 return make_string (storage, p - storage); | |
3561 @} | |
3562 @end group | |
3563 @end example | |
3564 | |
3565 Now we can analyze the source line by line. | |
3566 | |
3567 The resulting string contains exactly one character per argument to the | |
3568 function. This is why we allocate @code{MAX_ICHAR_LEN} * @var{nargs} | |
3569 bytes on the stack, i.e. the worst-case number of bytes needed for | |
3570 @var{nargs} @code{Ichar}s to fit in the string. | |
3571 | |
3572 Then, the loop checks that each element is a character, converting | |
3573 integers in the process. Like many other functions in XEmacs, this | |
3574 function silently accepts integers where characters are expected, for | |
3575 historical and compatibility reasons. Unless you specifically need that | |
3576 behavior, @code{CHECK_CHAR} will suffice in new code. @code{XCHAR (lisp_char)} | |
3577 extracts the @code{Ichar} from the @code{Lisp_Object}, and | |
3578 @code{set_itext_ichar} stores it at @code{p} and returns the number of | |
3579 bytes written, which is then added to @code{p}. | |
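The allocation pattern used by @code{Fstring} (reserve the worst case, write each character at a moving pointer, then take the pointer difference as the real length) can be tried outside XEmacs with the following sketch. Everything in it is an assumption made for illustration: @code{DEMO_MAX_CHAR_LEN}, @code{demo_put_char} and @code{demo_encode} are hypothetical names, and UTF-8 is used as a stand-in variable-width encoding. It is not the XEmacs internal format and not the real @code{set_itext_ichar}.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case bytes per character in the stand-in encoding (UTF-8). */
#define DEMO_MAX_CHAR_LEN 4

/* Stand-in for set_itext_ichar: store one character at p and
   return the number of bytes written. */
size_t demo_put_char(unsigned char *p, unsigned long c)
{
    assert(c <= 0x10FFFFUL);
    if (c < 0x80) {
        p[0] = (unsigned char) c;
        return 1;
    }
    if (c < 0x800) {
        p[0] = (unsigned char) (0xC0 | (c >> 6));
        p[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000UL) {
        p[0] = (unsigned char) (0xE0 | (c >> 12));
        p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        p[2] = (unsigned char) (0x80 | (c & 0x3F));
        return 3;
    }
    p[0] = (unsigned char) (0xF0 | (c >> 18));
    p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
    p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
    p[3] = (unsigned char) (0x80 | (c & 0x3F));
    return 4;
}

/* Encode n characters into storage, which the caller must size as
   n * DEMO_MAX_CHAR_LEN bytes.  Returns the number of bytes used. */
size_t demo_encode(const unsigned long *chars, size_t n,
                   unsigned char *storage)
{
    unsigned char *p = storage;

    for (size_t i = 0; i < n; i++)
        p += demo_put_char(p, chars[i]);
    return (size_t) (p - storage);
}
```

Because the buffer is sized for the worst case, the loop needs no bounds checks; the price is that most of the reserved space usually goes unused, which is acceptable for a short-lived stack or scratch buffer.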
3580 | |
3581 Other instructive examples of correct coding under Mule can be found all | |
3582 over the XEmacs code. For starters, I recommend | |
3583 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have | |
3584 understood this section of the manual and studied the examples, you can | |
3585 proceed to write new Mule-aware code. | |
3586 | |
3587 @node Mule-izing Code | |
3588 @subsection Mule-izing Code | |
3589 | |
3590 A lot of code is written without Mule in mind, and needs to be made | |
3591 Mule-correct or "Mule-ized". There is really no substitute for | |
3592 line-by-line analysis when doing this, but the following checklist can | |
3593 help: | |
3594 | |
3595 @itemize @bullet | |
3596 @item | |
3597 Check all uses of @code{XSTRING_DATA}. | |
3598 @item | |
3599 Check all uses of @code{build_string} and @code{make_string}. | |
3600 @item | |
3601 Check all uses of @code{tolower} and @code{toupper}. | |
3602 @item | |
3603 Check object print methods. | |
3604 @item | |
3605 Check for use of functions such as @code{write_c_string}, | |
3606 @code{write_fmt_string}, @code{stderr_out}, @code{stdout_out}. | |
3607 @item | |
3608 Check all occurrences of @code{char} and correct to one of the other | |
3609 typedefs described above. | |
3610 @item | |
3611 Check all existing uses of @code{TO_EXTERNAL_FORMAT}, | |
3612 @code{TO_INTERNAL_FORMAT}, and any convenience macros (grep for | |
3613 @samp{EXTERNAL_TO}, @samp{TO_EXTERNAL}, and @samp{TO_SIZED_EXTERNAL}). | |
3614 @item | |
3615 In Windows code, string literals may need to be encapsulated with @code{XETEXT}. | |
3616 @end itemize | |
3617 | |
3618 @node Techniques for XEmacs Developers | |
3619 @section Techniques for XEmacs Developers | 3554 @section Techniques for XEmacs Developers |
3620 @cindex techniques for XEmacs developers | 3555 @cindex techniques for XEmacs developers |
3621 @cindex developers, techniques for XEmacs | 3556 @cindex developers, techniques for XEmacs |
3622 | 3557 |
3623 @cindex Purify | 3558 @cindex Purify |
3711 if (!marked_p (obj)) mark_object (obj), did_mark = 1 | 3646 if (!marked_p (obj)) mark_object (obj), did_mark = 1 |
3712 @end example | 3647 @end example |
3713 | 3648 |
3714 This macro evaluates its argument twice, and also fails if used like this: | 3649 This macro evaluates its argument twice, and also fails if used like this: |
3715 @example | 3650 @example |
3716 if (flag) MARK_OBJECT (obj); else do_something(); | 3651 if (flag) MARK_OBJECT (obj); else @code{do_something()}; |
3717 @end example | 3652 @end example |
3718 | 3653 |
3719 A much better definition is | 3654 A much better definition is |
3720 | 3655 |
3721 @example | 3656 @example |
3863 @end enumerate | 3798 @end enumerate |
3864 | 3799 |
3865 @node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top | 3800 @node Regression Testing XEmacs, CVS Techniques, Rules When Writing New C Code, Top |
3866 @chapter Regression Testing XEmacs | 3801 @chapter Regression Testing XEmacs |
3867 @cindex testing, regression | 3802 @cindex testing, regression |
3803 | |
3804 @menu | |
3805 * How to Regression-Test:: | |
3806 * Modules for Regression Testing:: | |
3807 @end menu | |
3808 | |
3809 @node How to Regression-Test, Modules for Regression Testing, Regression Testing XEmacs, Regression Testing XEmacs | |
3810 @section How to Regression-Test | |
3811 @cindex how to regression-test | |
3812 @cindex regression-test, how to | |
3813 @cindex testing, regression, how to | |
3868 | 3814 |
3869 The source directory @file{tests/automated} contains XEmacs' automated | 3815 The source directory @file{tests/automated} contains XEmacs' automated |
3870 test suite. The usual way of running all the tests is running | 3816 test suite. The usual way of running all the tests is running |
3871 @code{make check} from the top-level build directory. | 3817 @code{make check} from the top-level build directory. |
3872 | 3818 |
4084 reported as an assertion failure (the test failed in a foreseeable way), | 4030 reported as an assertion failure (the test failed in a foreseeable way), |
4085 rather than something else (we don't know what happened because XEmacs | 4031 rather than something else (we don't know what happened because XEmacs |
4086 is broken in a way that we weren't trying to test!) | 4032 is broken in a way that we weren't trying to test!) |
4087 @end enumerate | 4033 @end enumerate |
4088 | 4034 |
4089 | 4035 @node Modules for Regression Testing, , How to Regression-Test, Regression Testing XEmacs |
4090 @node CVS Techniques, A Summary of the Various XEmacs Modules, Regression Testing XEmacs, Top | 4036 @section Modules for Regression Testing |
4037 @cindex modules for regression testing | |
4038 @cindex regression testing, modules for | |
4039 | |
4040 @example | |
4041 @file{test-harness.el} | |
4042 @file{base64-tests.el} | |
4043 @file{byte-compiler-tests.el} | |
4044 @file{case-tests.el} | |
4045 @file{ccl-tests.el} | |
4046 @file{c-tests.el} | |
4047 @file{database-tests.el} | |
4048 @file{extent-tests.el} | |
4049 @file{hash-table-tests.el} | |
4050 @file{lisp-tests.el} | |
4051 @file{md5-tests.el} | |
4052 @file{mule-tests.el} | |
4053 @file{regexp-tests.el} | |
4054 @file{symbol-tests.el} | |
4055 @file{syntax-tests.el} | |
4056 @file{tag-tests.el} | |
4057 @file{weak-tests.el} | |
4058 @end example | |
4059 | |
4060 @file{test-harness.el} defines the macros @code{Assert}, | |
4061 @code{Check-Error}, @code{Check-Error-Message}, and | |
4062 @code{Check-Message}. The other files are test files, testing various | |
4063 XEmacs facilities. @xref{Regression Testing XEmacs}. | |
4064 | |
4065 | |
4066 @node CVS Techniques, The Modules of XEmacs, Regression Testing XEmacs, Top | |
4091 @chapter CVS Techniques | 4067 @chapter CVS Techniques |
4092 @cindex CVS techniques | 4068 @cindex CVS techniques |
4093 | 4069 |
4094 @menu | 4070 @menu |
4095 * Merging a Branch into the Trunk:: | 4071 * Merging a Branch into the Trunk:: |
4096 @end menu | 4072 @end menu |
4097 | 4073 |
4098 @node Merging a Branch into the Trunk | 4074 @node Merging a Branch into the Trunk, , CVS Techniques, CVS Techniques |
4099 @section Merging a Branch into the Trunk | 4075 @section Merging a Branch into the Trunk |
4100 @cindex merging a branch into the trunk | 4076 @cindex merging a branch into the trunk |
4101 | 4077 |
4102 @enumerate | 4078 @enumerate |
4103 @item | 4079 @item |
4175 crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs | 4151 crw rtag -F -r next-sync-ben-mule-21-5 last-sync-ben-mule-21-5 xemacs |
4176 @end example | 4152 @end example |
4177 @end enumerate | 4153 @end enumerate |
4178 | 4154 |
4179 | 4155 |
4180 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, CVS Techniques, Top | 4156 @node The Modules of XEmacs, Allocation of Objects in XEmacs Lisp, CVS Techniques, Top |
4181 @chapter A Summary of the Various XEmacs Modules | 4157 @chapter The Modules of XEmacs |
4182 @cindex modules, a summary of the various XEmacs | 4158 @cindex modules of XEmacs |
4183 | |
4184 This is accurate as of XEmacs 20.0. | |
4185 | 4159 |
4186 @menu | 4160 @menu |
4187 * Low-Level Modules:: | 4161 * A Summary of the Various XEmacs Modules:: |
4188 * Basic Lisp Modules:: | 4162 * Low-Level Modules:: |
4189 * Modules for Standard Editing Operations:: | 4163 * Basic Lisp Modules:: |
4190 * Editor-Level Control Flow Modules:: | 4164 * Modules for Standard Editing Operations:: |
4191 * Modules for the Basic Displayable Lisp Objects:: | 4165 * Modules for Interfacing with the File System:: |
4192 * Modules for other Display-Related Lisp Objects:: | 4166 * Modules for Other Aspects of the Lisp Interpreter and Object System:: |
4193 * Modules for the Redisplay Mechanism:: | 4167 * Modules for Interfacing with the Operating System:: |
4194 * Modules for Interfacing with the File System:: | |
4195 * Modules for Other Aspects of the Lisp Interpreter and Object System:: | |
4196 * Modules for Interfacing with the Operating System:: | |
4197 * Modules for Interfacing with X Windows:: | |
4198 * Modules for Internationalization:: | |
4199 * Modules for Regression Testing:: | |
4200 @end menu | 4168 @end menu |
4201 | 4169 |
4202 @node Low-Level Modules | 4170 @node A Summary of the Various XEmacs Modules, Low-Level Modules, The Modules of XEmacs, The Modules of XEmacs |
4171 @section A Summary of the Various XEmacs Modules | |
4172 @cindex summary of the various XEmacs modules | |
4173 @cindex modules, summary of the various XEmacs | |
4174 | |
4175 The following is a list of the sections describing the various modules | |
4176 (i.e. files) that implement XEmacs. Some of them are in this chapter; | |
4177 some of them are attached to the chapters describing the modules in | |
4178 question. | |
4179 | |
4180 @itemize @bullet | |
4181 @item | |
4182 @ref{Low-Level Modules}. | |
4183 @item | |
4184 @ref{Basic Lisp Modules}. | |
4185 @item | |
4186 @ref{Modules for Standard Editing Operations}. | |
4187 @item | |
4188 @ref{Editor-Level Control Flow Modules}. | |
4189 @item | |
4190 @ref{Modules for the Basic Displayable Lisp Objects}. | |
4191 @item | |
4192 @ref{Modules for other Display-Related Lisp Objects}. | |
4193 @item | |
4194 @ref{Modules for the Redisplay Mechanism}. | |
4195 @item | |
4196 @ref{Modules for Interfacing with the File System}. | |
4197 @item | |
4198 @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4199 @item | |
4200 @ref{Modules for Interfacing with the Operating System}. | |
4201 @item | |
4202 @ref{Modules for Interfacing with MS Windows}. | |
4203 @item | |
4204 @ref{Modules for Interfacing with X Windows}. | |
4205 @item | |
4206 @ref{Modules for Internationalization}. | |
4207 @item | |
4208 @ref{Modules for Regression Testing}. | |
4209 @end itemize | |
4210 | |
4211 The following table contains cross-references from each module in XEmacs | |
4212 21.5 to the section (if any) describing it. | |
4213 | |
4214 @multitable {@file{intl-auto-encap-win32.c}} {@ref{Modules for Other Aspects of the Lisp Interpreter and Object System}} | |
4215 @item @file{Emacs.ad.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4216 @item @file{EmacsFrame.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4217 @item @file{EmacsFrame.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4218 @item @file{EmacsFrameP.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4219 @item @file{EmacsManager.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4220 @item @file{EmacsManager.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4221 @item @file{EmacsManagerP.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4222 @item @file{EmacsShell-sub.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4223 @item @file{EmacsShell.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4224 @item @file{EmacsShell.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4225 @item @file{EmacsShellP.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4226 @item @file{ExternalClient-Xlib.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4227 @item @file{ExternalClient.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4228 @item @file{ExternalClient.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4229 @item @file{ExternalClientP.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4230 @item @file{ExternalShell.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4231 @item @file{ExternalShell.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4232 @item @file{ExternalShellP.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4233 @item @file{Makefile.in.in} @tab | |
4234 @item @file{abbrev.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4235 @item @file{alloc.c} @tab @ref{Basic Lisp Modules}. | |
4236 @item @file{alloca.c} @tab @ref{Low-Level Modules}. | |
4237 @item @file{alloca.s} @tab | |
4238 @item @file{backtrace.h} @tab @ref{Basic Lisp Modules}. | |
4239 @item @file{balloon-x.c} @tab | |
4240 @item @file{balloon_help.c} @tab | |
4241 @item @file{balloon_help.h} @tab | |
4242 @item @file{base64-tests.el} @tab @ref{Modules for Regression Testing}. | |
4243 @item @file{bitmaps.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4244 @item @file{blocktype.c} @tab @ref{Low-Level Modules}. | |
4245 @item @file{blocktype.h} @tab @ref{Low-Level Modules}. | |
4246 @item @file{broken-sun.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4247 @item @file{buffer.c} @tab @ref{Modules for Standard Editing Operations}. | |
4248 @item @file{buffer.h} @tab @ref{Modules for Standard Editing Operations}. | |
4249 @item @file{bufslots.h} @tab @ref{Modules for Standard Editing Operations}. | |
4250 @item @file{byte-compiler-tests.el} @tab @ref{Modules for Regression Testing}. | |
4251 @item @file{bytecode.c} @tab @ref{Basic Lisp Modules}. | |
4252 @item @file{bytecode.h} @tab @ref{Basic Lisp Modules}. | |
4253 @item @file{c-tests.el} @tab @ref{Modules for Regression Testing}. | |
4254 @item @file{callint.c} @tab @ref{Modules for Standard Editing Operations}. | |
4255 @item @file{case-tests.el} @tab @ref{Modules for Regression Testing}. | |
4256 @item @file{casefiddle.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4257 @item @file{casetab.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4258 @item @file{casetab.h} @tab | |
4259 @item @file{ccl-tests.el} @tab @ref{Modules for Regression Testing}. | |
4260 @item @file{charset.h} @tab | |
4261 @item @file{chartab.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4262 @item @file{chartab.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4263 @item @file{cm.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4264 @item @file{cm.h} @tab @ref{Modules for the Redisplay Mechanism}. | |
4265 @item @file{cmdloop.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4266 @item @file{cmds.c} @tab @ref{Modules for Standard Editing Operations}. | |
4267 @item @file{coding-system-slots.h} @tab | |
4268 @item @file{commands.h} @tab @ref{Modules for Standard Editing Operations}. | |
4269 @item @file{compiler.h} @tab | |
4270 @item @file{config.h.in} @tab | |
4271 @item @file{config.h} @tab @ref{Low-Level Modules}. | |
4272 @item @file{conslots.h} @tab | |
4273 @item @file{console-gtk-impl.h} @tab | |
4274 @item @file{console-gtk.c} @tab | |
4275 @item @file{console-gtk.h} @tab | |
4276 @item @file{console-impl.h} @tab | |
4277 @item @file{console-msw-impl.h} @tab | |
4278 @item @file{console-msw.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4279 @item @file{console-msw.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4280 @item @file{console-stream-impl.h} @tab | |
4281 @item @file{console-stream.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4282 @item @file{console-stream.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4283 @item @file{console-tty-impl.h} @tab | |
4284 @item @file{console-tty.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4285 @item @file{console-tty.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4286 @item @file{console-x-impl.h} @tab | |
4287 @item @file{console-x.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4288 @item @file{console-x.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4289 @item @file{console.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4290 @item @file{console.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4291 @item @file{data.c} @tab @ref{Basic Lisp Modules}. | |
4292 @item @file{database-tests.el} @tab @ref{Modules for Regression Testing}. | |
4293 @item @file{database.c} @tab | |
4294 @item @file{database.h} @tab | |
4295 @item @file{debug.c} @tab @ref{Low-Level Modules}. | |
4296 @item @file{debug.h} @tab @ref{Low-Level Modules}. | |
4297 @item @file{depend} @tab | |
4298 @item @file{device-gtk.c} @tab | |
4299 @item @file{device-impl.h} @tab | |
4300 @item @file{device-msw.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4301 @item @file{device-tty.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4302 @item @file{device-x.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4303 @item @file{device.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4304 @item @file{device.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4305 @item @file{devslots.h} @tab | |
4306 @item @file{dgif_lib.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4307 @item @file{dialog-gtk.c} @tab | |
4308 @item @file{dialog-msw.c} @tab | |
4309 @item @file{dialog-x.c} @tab | |
4310 @item @file{dialog.c} @tab | |
4311 @item @file{dired-msw.c} @tab | |
4312 @item @file{dired.c} @tab @ref{Modules for Interfacing with the File System}. | |
4313 @item @file{doc.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4314 @item @file{doprnt.c} @tab @ref{Modules for Standard Editing Operations}. | |
4315 @item @file{dragdrop.c} @tab | |
4316 @item @file{dragdrop.h} @tab | |
4317 @item @file{dump-data.c} @tab | |
4318 @item @file{dump-data.h} @tab | |
4319 @item @file{dump-id.c} @tab | |
4320 @item @file{dumper.c} @tab | |
4321 @item @file{dumper.h} @tab | |
4322 @item @file{dynarr.c} @tab @ref{Low-Level Modules}. | |
4323 @item @file{ecrt0.c} @tab @ref{Low-Level Modules}. | |
4324 @item @file{editfns.c} @tab @ref{Modules for Standard Editing Operations}. | |
4325 @item @file{elhash.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4326 @item @file{elhash.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4327 @item @file{emacs-marshals.c} @tab | |
4328 @item @file{emacs-new.c.old} @tab | |
4329 @item @file{emacs-widget-accessors.c} @tab | |
4330 @item @file{emacs.c} @tab @ref{Low-Level Modules}. | |
4331 @item @file{emodules.c} @tab | |
4332 @item @file{emodules.h} @tab | |
4333 @item @file{esd.c} @tab | |
4334 @item @file{eval.c} @tab @ref{Basic Lisp Modules}. | |
4335 @item @file{event-Xt.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4336 @item @file{event-gtk.c} @tab | |
4337 @item @file{event-gtk.h} @tab | |
4338 @item @file{event-msw.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4339 @item @file{event-stream.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4340 @item @file{event-tty.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4341 @item @file{event-unixoid.c} @tab | |
4342 @item @file{event-xlike-inc.c} @tab | |
4343 @item @file{events-mod.h} @tab @ref{Editor-Level Control Flow Modules}. | |
4344 @item @file{events.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4345 @item @file{events.h} @tab @ref{Editor-Level Control Flow Modules}. | |
4346 @item @file{extent-tests.el} @tab @ref{Modules for Regression Testing}. | |
4347 @item @file{extents-impl.h} @tab | |
4348 @item @file{extents.c} @tab @ref{Modules for Standard Editing Operations}. | |
4349 @item @file{extents.h} @tab @ref{Modules for Standard Editing Operations}. | |
4350 @item @file{extw-Xlib.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4351 @item @file{extw-Xlib.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4352 @item @file{extw-Xt.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4353 @item @file{extw-Xt.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4354 @item @file{faces.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4355 @item @file{faces.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4356 @item @file{file-coding.c} @tab @ref{Modules for Internationalization}. | |
4357 @item @file{file-coding.h} @tab @ref{Modules for Internationalization}. | |
4358 @item @file{fileio.c} @tab @ref{Modules for Interfacing with the File System}. | |
4359 @item @file{filelock.c} @tab @ref{Modules for Interfacing with the File System}. | |
4360 @item @file{filemode.c} @tab @ref{Modules for Interfacing with the File System}. | |
4361 @item @file{floatfns.c} @tab @ref{Basic Lisp Modules}. | |
4362 @item @file{fns.c} @tab @ref{Basic Lisp Modules}. | |
4363 @item @file{font-lock.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4364 @item @file{frame-gtk.c} @tab | |
4365 @item @file{frame-impl.h} @tab | |
4366 @item @file{frame-msw.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4367 @item @file{frame-tty.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4368 @item @file{frame-x.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4369 @item @file{frame.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4370 @item @file{frame.diff} @tab | |
4371 @item @file{frame.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4372 @item @file{frameslots.h} @tab | |
4373 @item @file{free-hook.c} @tab @ref{Low-Level Modules}. | |
4374 @item @file{gccache-gtk.c} @tab | |
4375 @item @file{gccache-gtk.h} @tab | |
4376 @item @file{general-slots.h} @tab | |
4377 @item @file{general.c} @tab @ref{Basic Lisp Modules}. | |
4378 @item @file{getloadavg.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4379 @item @file{getpagesize.h} @tab @ref{Low-Level Modules}. | |
4380 @item @file{gif_err.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4381 @item @file{gif_io.c} @tab | |
4382 @item @file{gif_lib.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4383 @item @file{gifalloc.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4384 @item @file{gifrlib.h} @tab | |
4385 @item @file{glade.c} @tab | |
4386 @item @file{glyphs-eimage.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4387 @item @file{glyphs-gtk.c} @tab | |
4388 @item @file{glyphs-gtk.h} @tab | |
4389 @item @file{glyphs-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4390 @item @file{glyphs-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4391 @item @file{glyphs-shared.c} @tab | |
4392 @item @file{glyphs-widget.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4393 @item @file{glyphs-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4394 @item @file{glyphs-x.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4395 @item @file{glyphs.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4396 @item @file{glyphs.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4397 @item @file{gmalloc.c} @tab @ref{Low-Level Modules}. | |
4398 @item @file{gpmevent.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4399 @item @file{gpmevent.h} @tab @ref{Editor-Level Control Flow Modules}. | |
4400 @item @file{gtk-glue.c} @tab | |
4401 @item @file{gtk-xemacs.c} @tab | |
4402 @item @file{gtk-xemacs.h} @tab | |
4403 @item @file{gui-gtk.c} @tab | |
4404 @item @file{gui-msw.c} @tab | |
4405 @item @file{gui-x.c} @tab | |
4406 @item @file{gui.c} @tab | |
4407 @item @file{gui.h} @tab | |
4408 @item @file{gutter.c} @tab | |
4409 @item @file{gutter.h} @tab | |
4410 @item @file{hash-table-tests.el} @tab @ref{Modules for Regression Testing}. | |
4411 @item @file{hash.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4412 @item @file{hash.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4413 @item @file{hftctl.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4414 @item @file{hpplay.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4415 @item @file{imgproc.c} @tab | |
4416 @item @file{imgproc.h} @tab | |
4417 @item @file{indent.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4418 @item @file{inline.c} @tab @ref{Low-Level Modules}. | |
4419 @item @file{input-method-motif.c} @tab | |
4420 @item @file{input-method-xlib.c} @tab | |
4421 @item @file{insdel.c} @tab @ref{Modules for Standard Editing Operations}. | |
4422 @item @file{insdel.h} @tab @ref{Modules for Standard Editing Operations}. | |
4423 @item @file{intl-auto-encap-win32.c} @tab | |
4424 @item @file{intl-auto-encap-win32.h} @tab | |
4425 @item @file{intl-encap-win32.c} @tab | |
4426 @item @file{intl-win32.c} @tab | |
4427 @item @file{intl-x.c} @tab | |
4428 @item @file{intl.c} @tab @ref{Modules for Internationalization}. | |
4429 @item @file{iso-wide.h} @tab @ref{Modules for Internationalization}. | |
4430 @item @file{keymap.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4431 @item @file{keymap.h} @tab @ref{Editor-Level Control Flow Modules}. | |
4432 @item @file{lastfile.c} @tab @ref{Low-Level Modules}. | |
4433 @item @file{libinterface.c} @tab | |
4434 @item @file{libinterface.h} @tab | |
4435 @item @file{libsst.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4436 @item @file{libsst.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4437 @item @file{libst.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4438 @item @file{line-number.c} @tab | |
4439 @item @file{line-number.h} @tab | |
4440 @item @file{linuxplay.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4441 @item @file{lisp-disunion.h} @tab @ref{Basic Lisp Modules}. | |
4442 @item @file{lisp-tests.el} @tab @ref{Modules for Regression Testing}. | |
4443 @item @file{lisp-union.h} @tab @ref{Basic Lisp Modules}. | |
4444 @item @file{lisp.h} @tab @ref{Basic Lisp Modules}. | |
4445 @item @file{lread.c} @tab @ref{Basic Lisp Modules}. | |
4446 @item @file{lrecord.h} @tab @ref{Basic Lisp Modules}. | |
4447 @item @file{lstream.c} @tab @ref{Modules for Interfacing with the File System}. | |
4448 @item @file{lstream.h} @tab @ref{Modules for Interfacing with the File System}. | |
4449 @item @file{macros.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4450 @item @file{macros.h} @tab @ref{Editor-Level Control Flow Modules}. | |
4451 @item @file{make-src-depend} @tab | |
4452 @item @file{malloc.c} @tab @ref{Low-Level Modules}. | |
4453 @item @file{marker.c} @tab @ref{Modules for Standard Editing Operations}. | |
4454 @item @file{md5-tests.el} @tab @ref{Modules for Regression Testing}. | |
4455 @item @file{md5.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4456 @item @file{mem-limits.h} @tab @ref{Low-Level Modules}. | |
4457 @item @file{menubar-gtk.c} @tab | |
4458 @item @file{menubar-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4459 @item @file{menubar-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4460 @item @file{menubar-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4461 @item @file{menubar.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4462 @item @file{menubar.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4463 @item @file{minibuf.c} @tab @ref{Editor-Level Control Flow Modules}. | |
4464 @item @file{miscplay.c} @tab | |
4465 @item @file{miscplay.h} @tab | |
4466 @item @file{mule-canna.c} @tab @ref{Modules for Internationalization}. | |
4467 @item @file{mule-ccl.c} @tab @ref{Modules for Internationalization}. | |
4468 @item @file{mule-ccl.h} @tab | |
4469 @item @file{mule-charset.c} @tab @ref{Modules for Internationalization}. | |
4470 @item @file{mule-charset.h} @tab @ref{Modules for Internationalization}. | |
4471 @item @file{mule-coding.c} @tab @ref{Modules for Internationalization}. | |
4472 @item @file{mule-mcpath.c} @tab @ref{Modules for Internationalization}. | |
4473 @item @file{mule-mcpath.h} @tab @ref{Modules for Internationalization}. | |
4474 @item @file{mule-tests.el} @tab @ref{Modules for Regression Testing}. | |
4475 @item @file{mule-wnnfns.c} @tab @ref{Modules for Internationalization}. | |
4476 @item @file{mule.c} @tab @ref{Modules for Internationalization}. | |
4477 @item @file{nas.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4478 @item @file{native-gtk-toolbar.c} @tab | |
4479 @item @file{ndir.h} @tab @ref{Modules for Interfacing with the File System}. | |
4480 @item @file{nsselect.m} @tab | |
4481 @item @file{nt.c} @tab | |
4482 @item @file{ntheap.c} @tab | |
4483 @item @file{ntplay.c} @tab | |
4484 @item @file{number-gmp.c} @tab | |
4485 @item @file{number-gmp.h} @tab | |
4486 @item @file{number-mp.c} @tab | |
4487 @item @file{number-mp.h} @tab | |
4488 @item @file{number.c} @tab | |
4489 @item @file{number.h} @tab | |
4490 @item @file{objects-gtk-impl.h} @tab | |
4491 @item @file{objects-gtk.c} @tab | |
4492 @item @file{objects-gtk.h} @tab | |
4493 @item @file{objects-impl.h} @tab | |
4494 @item @file{objects-msw-impl.h} @tab | |
4495 @item @file{objects-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4496 @item @file{objects-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4497 @item @file{objects-tty-impl.h} @tab | |
4498 @item @file{objects-tty.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4499 @item @file{objects-tty.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4500 @item @file{objects-x-impl.h} @tab | |
4501 @item @file{objects-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4502 @item @file{objects-x.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4503 @item @file{objects.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4504 @item @file{objects.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4505 @item @file{offix-cursors.h} @tab | |
4506 @item @file{offix-types.h} @tab | |
4507 @item @file{offix.c} @tab | |
4508 @item @file{offix.h} @tab | |
4509 @item @file{opaque.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4510 @item @file{opaque.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4511 @item @file{paths.h.in} @tab | |
4512 @item @file{paths.h} @tab @ref{Low-Level Modules}. | |
4513 @item @file{ppc.ldscript} @tab | |
4514 @item @file{pre-crt0.c} @tab @ref{Low-Level Modules}. | |
4515 @item @file{print.c} @tab @ref{Basic Lisp Modules}. | |
4516 @item @file{process-nt.c} @tab | |
4517 @item @file{process-slots.h} @tab | |
4518 @item @file{process-unix.c} @tab | |
4519 @item @file{process.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4520 @item @file{process.el} @tab @ref{Modules for Interfacing with the Operating System}. | |
4521 @item @file{process.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4522 @item @file{procimpl.h} @tab | |
4523 @item @file{profile.c.orig} @tab | |
4524 @item @file{profile.c.rej} @tab | |
4525 @item @file{profile.c} @tab | |
4526 @item @file{profile.h} @tab | |
4527 @item @file{ralloc.c} @tab @ref{Low-Level Modules}. | |
4528 @item @file{rangetab.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4529 @item @file{rangetab.h} @tab | |
4530 @item @file{realpath.c} @tab @ref{Modules for Interfacing with the File System}. | |
4531 @item @file{redisplay-gtk.c} @tab | |
4532 @item @file{redisplay-msw.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4533 @item @file{redisplay-output.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4534 @item @file{redisplay-tty.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4535 @item @file{redisplay-x.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4536 @item @file{redisplay.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4537 @item @file{redisplay.h} @tab @ref{Modules for the Redisplay Mechanism}. | |
4538 @item @file{regex.c} @tab @ref{Modules for Standard Editing Operations}. | |
4539 @item @file{regex.h} @tab @ref{Modules for Standard Editing Operations}. | |
4540 @item @file{regexp-tests.el} @tab @ref{Modules for Regression Testing}. | |
4541 @item @file{scrollbar-gtk.c} @tab | |
4542 @item @file{scrollbar-gtk.h} @tab | |
4543 @item @file{scrollbar-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4544 @item @file{scrollbar-msw.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4545 @item @file{scrollbar-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4546 @item @file{scrollbar-x.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4547 @item @file{scrollbar.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4548 @item @file{scrollbar.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4549 @item @file{search.c} @tab @ref{Modules for Standard Editing Operations}. | |
4550 @item @file{select-common.h} @tab | |
4551 @item @file{select-gtk.c} @tab | |
4552 @item @file{select-msw.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4553 @item @file{select-x.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4554 @item @file{select.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4555 @item @file{select.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4556 @item @file{sgiplay.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4557 @item @file{sheap.c} @tab | |
4558 @item @file{signal.c} @tab @ref{Low-Level Modules}. | |
4559 @item @file{sound.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4560 @item @file{sound.h} @tab | |
4561 @item @file{specifier.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4562 @item @file{specifier.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4563 @item @file{src-headers} @tab | |
4564 @item @file{strcat.c} @tab | |
4565 @item @file{strcmp.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4566 @item @file{strcpy.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4567 @item @file{strftime.c} @tab | |
4568 @item @file{sunOS-fix.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4569 @item @file{sunplay.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4570 @item @file{sunpro.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4571 @item @file{symbol-tests.el} @tab @ref{Modules for Regression Testing}. | |
4572 @item @file{symbols.c} @tab @ref{Basic Lisp Modules}. | |
4573 @item @file{symeval.h} @tab @ref{Basic Lisp Modules}. | |
4574 @item @file{symsinit.h} @tab @ref{Basic Lisp Modules}. | |
4575 @item @file{syntax-tests.el} @tab @ref{Modules for Regression Testing}. | |
4576 @item @file{syntax.c} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4577 @item @file{syntax.h} @tab @ref{Modules for Other Aspects of the Lisp Interpreter and Object System}. | |
4578 @item @file{sysdep.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4579 @item @file{sysdep.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4580 @item @file{sysdir.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4581 @item @file{sysdll.c} @tab | |
4582 @item @file{sysdll.h} @tab | |
4583 @item @file{sysfile.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4584 @item @file{sysfloat.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4585 @item @file{sysproc.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4586 @item @file{syspwd.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4587 @item @file{syssignal.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4588 @item @file{systime.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4589 @item @file{systty.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4590 @item @file{syswait.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4591 @item @file{syswindows.h} @tab | |
4592 @item @file{tag-tests.el} @tab @ref{Modules for Regression Testing}. | |
4593 @item @file{termcap.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4594 @item @file{terminfo.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4595 @item @file{test-harness.el} @tab @ref{Modules for Regression Testing}. | |
4596 @item @file{tests.c} @tab | |
4597 @item @file{text.c} @tab | |
4598 @item @file{text.h} @tab | |
4599 @item @file{toolbar-common.c} @tab | |
4600 @item @file{toolbar-common.h} @tab | |
4601 @item @file{toolbar-gtk.c} @tab | |
4602 @item @file{toolbar-msw.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4603 @item @file{toolbar-x.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4604 @item @file{toolbar.c} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4605 @item @file{toolbar.h} @tab @ref{Modules for other Display-Related Lisp Objects}. | |
4606 @item @file{tooltalk.c} @tab @ref{Modules for Interfacing with the Operating System}. | |
4607 @item @file{tooltalk.h} @tab @ref{Modules for Interfacing with the Operating System}. | |
4608 @item @file{tparam.c} @tab @ref{Modules for the Redisplay Mechanism}. | |
4609 @item @file{ui-byhand.c} @tab | |
4610 @item @file{ui-gtk.c} @tab | |
4611 @item @file{ui-gtk.h} @tab | |
4612 @item @file{undo.c} @tab @ref{Modules for Standard Editing Operations}. | |
4613 @item @file{unexaix.c} @tab @ref{Low-Level Modules}. | |
4614 @item @file{unexalpha.c} @tab @ref{Low-Level Modules}. | |
4615 @item @file{unexapollo.c} @tab @ref{Low-Level Modules}. | |
4616 @item @file{unexconvex.c} @tab @ref{Low-Level Modules}. | |
4617 @item @file{unexcw.c} @tab | |
4618 @item @file{unexec.c} @tab @ref{Low-Level Modules}. | |
4619 @item @file{unexelf.c} @tab @ref{Low-Level Modules}. | |
4620 @item @file{unexelfsgi.c} @tab @ref{Low-Level Modules}. | |
4621 @item @file{unexencap.c} @tab @ref{Low-Level Modules}. | |
4622 @item @file{unexenix.c} @tab @ref{Low-Level Modules}. | |
4623 @item @file{unexfreebsd.c} @tab @ref{Low-Level Modules}. | |
4624 @item @file{unexfx2800.c} @tab @ref{Low-Level Modules}. | |
4625 @item @file{unexhp9k3.c} @tab @ref{Low-Level Modules}. | |
4626 @item @file{unexhp9k800.c} @tab @ref{Low-Level Modules}. | |
4627 @item @file{unexmips.c} @tab @ref{Low-Level Modules}. | |
4628 @item @file{unexnext.c} @tab @ref{Low-Level Modules}. | |
4629 @item @file{unexnt.c} @tab | |
4630 @item @file{unexsni.c} @tab | |
4631 @item @file{unexsol2-6.c} @tab | |
4632 @item @file{unexsol2.c} @tab @ref{Low-Level Modules}. | |
4633 @item @file{unexsunos4.c} @tab @ref{Low-Level Modules}. | |
4634 @item @file{unicode.c} @tab | |
4635 @item @file{universe.h} @tab @ref{Low-Level Modules}. | |
4636 @item @file{vm-limit.c} @tab @ref{Low-Level Modules}. | |
4637 @item @file{weak-tests.el} @tab @ref{Modules for Regression Testing}. | |
4638 @item @file{widget.c} @tab | |
4639 @item @file{win32.c} @tab | |
4640 @item @file{window-impl.h} @tab | |
4641 @item @file{window.c} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4642 @item @file{window.h} @tab @ref{Modules for the Basic Displayable Lisp Objects}. | |
4643 @item @file{winslots.h} @tab | |
4644 @item @file{xemacs.def.in.in} @tab | |
4645 @item @file{xgccache.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4646 @item @file{xgccache.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4647 @item @file{xintrinsic.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4648 @item @file{xintrinsicp.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4649 @item @file{xmmanagerp.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4650 @item @file{xmotif.h} @tab | |
4651 @item @file{xmprimitivep.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4652 @item @file{xmu.c} @tab @ref{Modules for Interfacing with X Windows}. | |
4653 @item @file{xmu.h} @tab @ref{Modules for Interfacing with X Windows}. | |
4654 @end multitable | |
4655 | |
4656 | |
4657 | |
4658 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, The Modules of XEmacs | |
4203 @section Low-Level Modules | 4659 @section Low-Level Modules |
4204 @cindex low-level modules | 4660 @cindex low-level modules |
4205 @cindex modules, low-level | 4661 @cindex modules, low-level |
4206 | 4662 |
4207 @example | 4663 @example |
4208 config.h | 4664 @file{config.h} |
4209 @end example | 4665 @end example |
4210 | 4666 |
4211 This is automatically generated from @file{config.h.in} based on the | 4667 This is automatically generated from @file{config.h.in} based on the |
4212 results of configure tests and user-selected optional features and | 4668 results of configure tests and user-selected optional features and |
4213 contains preprocessor definitions specifying the nature of the | 4669 contains preprocessor definitions specifying the nature of the |
4214 environment in which XEmacs is being compiled. | 4670 environment in which XEmacs is being compiled. |
4215 | 4671 |
4216 | 4672 |
4217 | 4673 |
4218 @example | 4674 @example |
4219 paths.h | 4675 @file{paths.h} |
4220 @end example | 4676 @end example |
4221 | 4677 |
4222 This is automatically generated from @file{paths.h.in} based on supplied | 4678 This is automatically generated from @file{paths.h.in} based on supplied |
4223 configure values, and allows for non-standard installed configurations | 4679 configure values, and allows for non-standard installed configurations |
4224 of the XEmacs directories. It's currently broken, though. | 4680 of the XEmacs directories. It's currently broken, though. |
4225 | 4681 |
4226 | 4682 |
4227 | 4683 |
4228 @example | 4684 @example |
4229 emacs.c | 4685 @file{emacs.c} |
4230 signal.c | 4686 @file{signal.c} |
4231 @end example | 4687 @end example |
4232 | 4688 |
4233 @file{emacs.c} contains @code{main()} and other code that performs the most | 4689 @file{emacs.c} contains @code{main()} and other code that performs the most |
4234 basic environment initializations and handles shutting down the XEmacs | 4690 basic environment initializations and handles shutting down the XEmacs |
4235 process (this includes @code{kill-emacs}, the normal way that XEmacs is | 4691 process (this includes @code{kill-emacs}, the normal way that XEmacs is |
4245 @file{syssignal.h} header file, described in section J below. | 4701 @file{syssignal.h} header file, described in section J below. |
4246 | 4702 |
4247 | 4703 |
4248 | 4704 |
4249 @example | 4705 @example |
4250 unexaix.c | 4706 @file{unexaix.c} |
4251 unexalpha.c | 4707 @file{unexalpha.c} |
4252 unexapollo.c | 4708 @file{unexapollo.c} |
4253 unexconvex.c | 4709 @file{unexconvex.c} |
4254 unexec.c | 4710 @file{unexec.c} |
4255 unexelf.c | 4711 @file{unexelf.c} |
4256 unexelfsgi.c | 4712 @file{unexelfsgi.c} |
4257 unexencap.c | 4713 @file{unexencap.c} |
4258 unexenix.c | 4714 @file{unexenix.c} |
4259 unexfreebsd.c | 4715 @file{unexfreebsd.c} |
4260 unexfx2800.c | 4716 @file{unexfx2800.c} |
4261 unexhp9k3.c | 4717 @file{unexhp9k3.c} |
4262 unexhp9k800.c | 4718 @file{unexhp9k800.c} |
4263 unexmips.c | 4719 @file{unexmips.c} |
4264 unexnext.c | 4720 @file{unexnext.c} |
4265 unexsol2.c | 4721 @file{unexsol2.c} |
4266 unexsunos4.c | 4722 @file{unexsunos4.c} |
4267 @end example | 4723 @end example |
4268 | 4724 |
4269 These modules contain code dumping out the XEmacs executable on various | 4725 These modules contain code dumping out the XEmacs executable on various |
4270 different systems. (This process is highly machine-specific and | 4726 different systems. (This process is highly machine-specific and |
4271 requires intimate knowledge of the executable format and the memory map | 4727 requires intimate knowledge of the executable format and the memory map |
4273 chosen by @file{configure}. | 4729 chosen by @file{configure}. |
4274 | 4730 |
4275 | 4731 |
4276 | 4732 |
4277 @example | 4733 @example |
4278 ecrt0.c | 4734 @file{ecrt0.c} |
4279 lastfile.c | 4735 @file{lastfile.c} |
4280 pre-crt0.c | 4736 @file{pre-crt0.c} |
4281 @end example | 4737 @end example |
4282 | 4738 |
4283 These modules are used in conjunction with the dump mechanism. On some | 4739 These modules are used in conjunction with the dump mechanism. On some |
4284 systems, an alternative version of the C startup code (the actual code | 4740 systems, an alternative version of the C startup code (the actual code |
4285 that receives control from the operating system when the process is | 4741 that receives control from the operating system when the process is |
4300 data space when dumping. | 4756 data space when dumping. |
4301 | 4757 |
4302 | 4758 |
4303 | 4759 |
4304 @example | 4760 @example |
4305 alloca.c | 4761 @file{alloca.c} |
4306 free-hook.c | 4762 @file{free-hook.c} |
4307 getpagesize.h | 4763 @file{getpagesize.h} |
4308 gmalloc.c | 4764 @file{gmalloc.c} |
4309 malloc.c | 4765 @file{malloc.c} |
4310 mem-limits.h | 4766 @file{mem-limits.h} |
4311 ralloc.c | 4767 @file{ralloc.c} |
4312 vm-limit.c | 4768 @file{vm-limit.c} |
4313 @end example | 4769 @end example |
4314 | 4770 |
4315 These handle basic C allocation of memory. @file{alloca.c} is an emulation of | 4771 These handle basic C allocation of memory. @file{alloca.c} is an emulation of |
4316 the stack allocation function @code{alloca()} on machines that lack | 4772 the stack allocation function @code{alloca()} on machines that lack |
4317 this. (XEmacs makes extensive use of @code{alloca()} in its code.) | 4773 this. (XEmacs makes extensive use of @code{alloca()} in its code.) |
4363 retrieving the total amount of available virtual memory. Both are | 4819 retrieving the total amount of available virtual memory. Both are |
4364 similar in spirit to the @file{sys*.h} files described in section J, below. | 4820 similar in spirit to the @file{sys*.h} files described in section J, below. |
4365 | 4821 |
4366 | 4822 |
4367 @example | 4823 @example |
4368 blocktype.c | 4824 @file{blocktype.c} |
4369 blocktype.h | 4825 @file{blocktype.h} |
4370 dynarr.c | 4826 @file{dynarr.c} |
4371 @end example | 4827 @end example |
4372 | 4828 |
4373 These implement a couple of basic C data types to facilitate memory | 4829 These implement a couple of basic C data types to facilitate memory |
4374 allocation. The @code{Blocktype} type efficiently manages the | 4830 allocation. The @code{Blocktype} type efficiently manages the |
4375 allocation of fixed-size blocks by minimizing the number of times that | 4831 allocation of fixed-size blocks by minimizing the number of times that |
4389 mechanism. | 4845 mechanism. |
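As a rough illustration of the fixed-size block idea described for @code{Blocktype}, the following C sketch hands out blocks from large chunks and recycles freed blocks through a free list, so the system allocator is called only once per chunk. All names here (@code{block_pool}, @code{pool_alloc}, and so on) are hypothetical and are not the actual @file{blocktype.c} interface; the sketch also assumes that pointer alignment is sufficient for the stored objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of a fixed-size block pool; NOT the real Blocktype API. */

typedef union free_node {
    union free_node *next;      /* link used only while the block is free */
} free_node;

typedef struct pool_chunk {
    struct pool_chunk *next;    /* chunks are chained so they can be released */
} pool_chunk;

typedef struct {
    size_t block_size;          /* rounded-up size of each block */
    size_t blocks_per_chunk;    /* blocks carved out of one malloc() call */
    free_node *free_list;       /* blocks ready for reuse */
    pool_chunk *chunks;         /* every chunk obtained from malloc() */
    size_t chunk_count;         /* number of malloc() calls so far */
} block_pool;

static void pool_init(block_pool *p, size_t block_size, size_t blocks_per_chunk)
{
    size_t align = sizeof(free_node);

    assert(blocks_per_chunk > 0);
    if (block_size < align)
        block_size = align;     /* a free block must hold a list link */
    /* Round up so that every block stays pointer-aligned. */
    p->block_size = (block_size + align - 1) / align * align;
    p->blocks_per_chunk = blocks_per_chunk;
    p->free_list = NULL;
    p->chunks = NULL;
    p->chunk_count = 0;
}

static void *pool_alloc(block_pool *p)
{
    free_node *n;

    if (p->free_list == NULL) {
        /* Free list is empty: get one big chunk and slice it into blocks. */
        pool_chunk *c = malloc(sizeof(pool_chunk)
                               + p->block_size * p->blocks_per_chunk);
        char *base;
        size_t i;

        if (c == NULL)
            return NULL;
        c->next = p->chunks;
        p->chunks = c;
        p->chunk_count++;
        base = (char *)(c + 1);
        for (i = 0; i < p->blocks_per_chunk; i++) {
            n = (free_node *)(base + i * p->block_size);
            n->next = p->free_list;
            p->free_list = n;
        }
    }
    n = p->free_list;
    p->free_list = n->next;
    return n;
}

static void pool_free(block_pool *p, void *block)
{
    /* Returning a block is just a push onto the free list. */
    free_node *n = block;
    n->next = p->free_list;
    p->free_list = n;
}

static void pool_destroy(block_pool *p)
{
    while (p->chunks != NULL) {
        pool_chunk *next = p->chunks->next;
        free(p->chunks);
        p->chunks = next;
    }
    p->free_list = NULL;
}
```

In this sketch a freed block is the first one handed out again, and no further @code{malloc()} call happens until the current chunk is exhausted.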
4390 | 4846 |
4391 | 4847 |
4392 | 4848 |
4393 @example | 4849 @example |
4394 inline.c | 4850 @file{inline.c} |
4395 @end example | 4851 @end example |
4396 | 4852 |
4397 This module is used in connection with inline functions (available in | 4853 This module is used in connection with inline functions (available in |
4398 some compilers). Often, inline functions need to have a corresponding | 4854 some compilers). Often, inline functions need to have a corresponding |
4399 non-inline function that does the same thing. This module is where they | 4855 non-inline function that does the same thing. This module is where they |
4403 function definitions, so that each one gets a real function equivalent. | 4859 function definitions, so that each one gets a real function equivalent. |
4404 | 4860 |
4405 | 4861 |
4406 | 4862 |
4407 @example | 4863 @example |
4408 debug.c | 4864 @file{debug.c} |
4409 debug.h | 4865 @file{debug.h} |
4410 @end example | 4866 @end example |
4411 | 4867 |
4412 These functions provide a system for doing internal consistency checks | 4868 These functions provide a system for doing internal consistency checks |
4413 during code development. This system is not currently used; instead the | 4869 during code development. This system is not currently used; instead the |
4414 simpler @code{assert()} macro is used along with the various checks | 4870 simpler @code{assert()} macro is used along with the various checks |
4415 provided by the @samp{--error-check-*} configuration options. | 4871 provided by the @samp{--error-check-*} configuration options. |
4416 | 4872 |
4417 | 4873 |
4418 | 4874 |
4419 @example | 4875 @example |
4420 universe.h | 4876 @file{universe.h} |
4421 @end example | 4877 @end example |
4422 | 4878 |
4423 This is not currently used. | 4879 This is not currently used. |
4424 | 4880 |
4425 | 4881 |
4426 | 4882 |
4427 @node Basic Lisp Modules | 4883 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, The Modules of XEmacs |
4428 @section Basic Lisp Modules | 4884 @section Basic Lisp Modules |
4429 @cindex Lisp modules, basic | 4885 @cindex Lisp modules, basic |
4430 @cindex modules, basic Lisp | 4886 @cindex modules, basic Lisp |
4431 | 4887 |
4432 @example | 4888 @example |
4433 lisp-disunion.h | 4889 @file{lisp-disunion.h} |
4434 lisp-union.h | 4890 @file{lisp-union.h} |
4435 lisp.h | 4891 @file{lisp.h} |
4436 lrecord.h | 4892 @file{lrecord.h} |
4437 symsinit.h | 4893 @file{symsinit.h} |
4438 @end example | 4894 @end example |
4439 | 4895 |
4440 These are the basic header files for all XEmacs modules. Each module | 4896 These are the basic header files for all XEmacs modules. Each module |
4441 includes @file{lisp.h}, which brings the other header files in. | 4897 includes @file{lisp.h}, which brings the other header files in. |
4442 @file{lisp.h} contains the definitions of the structures and extractor | 4898 @file{lisp.h} contains the definitions of the structures and extractor |
4475 @file{symsinit.h}. | 4931 @file{symsinit.h}. |
4476 | 4932 |
4477 | 4933 |
4478 | 4934 |
4479 @example | 4935 @example |
4480 alloc.c | 4936 @file{alloc.c} |
4481 @end example | 4937 @end example |
4482 | 4938 |
4483 The large module @file{alloc.c} implements all of the basic allocation and | 4939 The large module @file{alloc.c} implements all of the basic allocation and |
4484 garbage collection for Lisp objects. The most commonly used Lisp | 4940 garbage collection for Lisp objects. The most commonly used Lisp |
4485 objects are allocated in chunks, similar to the Blocktype data type | 4941 objects are allocated in chunks, similar to the Blocktype data type |
4503 subtypes in the subsystem; this provides a great deal of robustness to | 4959 subtypes in the subsystem; this provides a great deal of robustness to |
4504 the XEmacs code. | 4960 the XEmacs code. |
4505 | 4961 |
4506 | 4962 |
4507 @example | 4963 @example |
4508 eval.c | 4964 @file{eval.c} |
4509 backtrace.h | 4965 @file{backtrace.h} |
4510 @end example | 4966 @end example |
4511 | 4967 |
4512 This module contains all of the functions to handle the flow of control. | 4968 This module contains all of the functions to handle the flow of control. |
4513 This includes the mechanisms of defining functions, calling functions, | 4969 This includes the mechanisms of defining functions, calling functions, |
4514 traversing stack frames, and binding variables; the control primitives | 4970 traversing stack frames, and binding variables; the control primitives |
4523 flow of control. | 4979 flow of control. |
4524 | 4980 |
4525 | 4981 |
4526 | 4982 |
4527 @example | 4983 @example |
4528 lread.c | 4984 @file{lread.c} |
4529 @end example | 4985 @end example |
4530 | 4986 |
4531 This module implements the Lisp reader and the @code{read} function, | 4987 This module implements the Lisp reader and the @code{read} function, |
4532 which converts text into Lisp objects, according to the read syntax of | 4988 which converts text into Lisp objects, according to the read syntax of |
4533 the objects, as described above. This is similar to the parser that is | 4989 the objects, as described above. This is similar to the parser that is |
4534 a part of all compilers. | 4990 a part of all compilers. |
4535 | 4991 |
4536 | 4992 |
4537 | 4993 |
4538 @example | 4994 @example |
4539 print.c | 4995 @file{print.c} |
4540 @end example | 4996 @end example |
4541 | 4997 |
4542 This module implements the Lisp print mechanism and the @code{print} | 4998 This module implements the Lisp print mechanism and the @code{print} |
4543 function and related functions. This is the inverse of the Lisp reader | 4999 function and related functions. This is the inverse of the Lisp reader |
4544 -- it converts Lisp objects to a printed, textual representation. | 5000 -- it converts Lisp objects to a printed, textual representation. |
4546 an equivalent object.) | 5002 an equivalent object.) |
4547 | 5003 |
4548 | 5004 |
4549 | 5005 |
4550 @example | 5006 @example |
4551 general.c | 5007 @file{general.c} |
4552 symbols.c | 5008 @file{symbols.c} |
4553 symeval.h | 5009 @file{symeval.h} |
4554 @end example | 5010 @end example |
4555 | 5011 |
4556 @file{symbols.c} implements the handling of symbols, obarrays, and | 5012 @file{symbols.c} implements the handling of symbols, obarrays, and |
4557 retrieving the values of symbols. Much of the code is devoted to | 5013 retrieving the values of symbols. Much of the code is devoted to |
4558 handling the special @dfn{symbol-value-magic} objects that define | 5014 handling the special @dfn{symbol-value-magic} objects that define |
4566 @code{DEFVAR_LISP()} and related macros for declaring variables. | 5022 @code{DEFVAR_LISP()} and related macros for declaring variables. |
4567 | 5023 |
4568 | 5024 |
4569 | 5025 |
4570 @example | 5026 @example |
4571 data.c | 5027 @file{data.c} |
4572 floatfns.c | 5028 @file{floatfns.c} |
4573 fns.c | 5029 @file{fns.c} |
4574 @end example | 5030 @end example |
4575 | 5031 |
4576 These modules implement the methods and standard Lisp primitives for all | 5032 These modules implement the methods and standard Lisp primitives for all |
4577 the basic Lisp object types other than symbols (which are described | 5033 the basic Lisp object types other than symbols (which are described |
4578 above). @file{data.c} contains all the predicates (primitives that return | 5034 above). @file{data.c} contains all the predicates (primitives that return |
4587 arithmetic. | 5043 arithmetic. |
4588 | 5044 |
4589 | 5045 |
4590 | 5046 |
4591 @example | 5047 @example |
4592 bytecode.c | 5048 @file{bytecode.c} |
4593 bytecode.h | 5049 @file{bytecode.h} |
4594 @end example | 5050 @end example |
4595 | 5051 |
4596 @file{bytecode.c} implements the byte-code interpreter and | 5052 @file{bytecode.c} implements the byte-code interpreter and |
4597 compiled-function objects, and @file{bytecode.h} contains associated | 5053 compiled-function objects, and @file{bytecode.h} contains associated |
4598 structures. Note that the byte-code @emph{compiler} is written in Lisp. | 5054 structures. Note that the byte-code @emph{compiler} is written in Lisp. |
4599 | 5055 |
4600 | 5056 |
4601 | 5057 |
4602 | 5058 |
4603 @node Modules for Standard Editing Operations | 5059 @node Modules for Standard Editing Operations, Modules for Interfacing with the File System, Basic Lisp Modules, The Modules of XEmacs |
4604 @section Modules for Standard Editing Operations | 5060 @section Modules for Standard Editing Operations |
4605 @cindex modules for standard editing operations | 5061 @cindex modules for standard editing operations |
4606 @cindex editing operations, modules for standard | 5062 @cindex editing operations, modules for standard |
4607 | 5063 |
4608 @example | 5064 @example |
4609 buffer.c | 5065 @file{buffer.c} |
4610 buffer.h | 5066 @file{buffer.h} |
4611 bufslots.h | 5067 @file{bufslots.h} |
4612 @end example | 5068 @end example |
4613 | 5069 |
4614 @file{buffer.c} implements the @dfn{buffer} Lisp object type. This | 5070 @file{buffer.c} implements the @dfn{buffer} Lisp object type. This |
4615 includes functions that create and destroy buffers; retrieve buffers by | 5071 includes functions that create and destroy buffers; retrieve buffers by |
4616 name or by other properties; manipulate lists of buffers (remember that | 5072 name or by other properties; manipulate lists of buffers (remember that |
4635 the built-in buffer-local variables. | 5091 the built-in buffer-local variables. |
4636 | 5092 |
4637 | 5093 |
4638 | 5094 |
4639 @example | 5095 @example |
4640 insdel.c | 5096 @file{insdel.c} |
4641 insdel.h | 5097 @file{insdel.h} |
4642 @end example | 5098 @end example |
4643 | 5099 |
4644 @file{insdel.c} contains low-level functions for inserting and deleting text in | 5100 @file{insdel.c} contains low-level functions for inserting and deleting text in |
4645 a buffer, keeping track of changed regions for use by redisplay, and | 5101 a buffer, keeping track of changed regions for use by redisplay, and |
4646 calling any before-change and after-change functions that may have been | 5102 calling any before-change and after-change functions that may have been |
4650 @file{insdel.h} contains associated headers. | 5106 @file{insdel.h} contains associated headers. |
4651 | 5107 |
4652 | 5108 |
4653 | 5109 |
4654 @example | 5110 @example |
4655 marker.c | 5111 @file{marker.c} |
4656 @end example | 5112 @end example |
4657 | 5113 |
4658 This module implements the @dfn{marker} Lisp object type, which | 5114 This module implements the @dfn{marker} Lisp object type, which |
4659 conceptually is a pointer to a text position in a buffer that moves | 5115 conceptually is a pointer to a text position in a buffer that moves |
4660 around as text is inserted and deleted, so as to remain in the same | 5116 around as text is inserted and deleted, so as to remain in the same |
4669 current buffer position of the marker. | 5125 current buffer position of the marker. |
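The way a marker "moves around" with edits can be sketched in a few lines of C. This is a simplified model under stated assumptions, not the code in @file{marker.c}: the type @code{sketch_marker} and both functions are hypothetical, positions are plain character offsets, and a marker sitting exactly at the insertion point is assumed to stay put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker model: just a character offset into some buffer. */
typedef struct {
    size_t pos;
} sketch_marker;

/* Adjust a marker after LEN characters were inserted at offset AT.
   Assumption: a marker exactly at AT does not advance. */
static void marker_note_insert(sketch_marker *m, size_t at, size_t len)
{
    if (m->pos > at)
        m->pos += len;
}

/* Adjust a marker after the range [FROM, TO) was deleted. */
static void marker_note_delete(sketch_marker *m, size_t from, size_t to)
{
    assert(from <= to);
    if (m->pos >= to)
        m->pos -= to - from;    /* text before the marker shrank */
    else if (m->pos > from)
        m->pos = from;          /* marker was inside the deleted range */
}
```

A marker at offset 10 ends up at 13 after three characters are inserted at offset 5, and collapses to the start of a deleted range that contained it. Edits entirely after the marker leave it untouched.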
4670 | 5126 |
4671 | 5127 |
4672 | 5128 |
4673 @example | 5129 @example |
4674 extents.c | 5130 @file{extents.c} |
4675 extents.h | 5131 @file{extents.h} |
4676 @end example | 5132 @end example |
4677 | 5133 |
4678 This module implements the @dfn{extent} Lisp object type, which is like | 5134 This module implements the @dfn{extent} Lisp object type, which is like |
4679 a marker that works over a range of text rather than a single position. | 5135 a marker that works over a range of text rather than a single position. |
4680 Extents are also much more complex and powerful than markers and have a | 5136 Extents are also much more complex and powerful than markers and have a |
4690 cover.) | 5146 cover.) |
4691 | 5147 |
4692 | 5148 |
4693 | 5149 |
4694 @example | 5150 @example |
4695 editfns.c | 5151 @file{editfns.c} |
4696 @end example | 5152 @end example |
4697 | 5153 |
4698 @file{editfns.c} contains the standard Lisp primitives for working with | 5154 @file{editfns.c} contains the standard Lisp primitives for working with |
4699 a buffer's text, and calls the low-level functions in @file{insdel.c}. | 5155 a buffer's text, and calls the low-level functions in @file{insdel.c}. |
4700 It also contains primitives for working with @code{point} (the default | 5156 It also contains primitives for working with @code{point} (the default |
4707 @file{editfns.c}. | 5163 @file{editfns.c}. |
4708 | 5164 |
4709 | 5165 |
4710 | 5166 |
4711 @example | 5167 @example |
4712 callint.c | 5168 @file{callint.c} |
4713 cmds.c | 5169 @file{cmds.c} |
4714 commands.h | 5170 @file{commands.h} |
4715 @end example | 5171 @end example |
4716 | 5172 |
4717 @cindex interactive | 5173 @cindex interactive |
4718 These modules implement the basic @dfn{interactive} commands, | 5174 These modules implement the basic @dfn{interactive} commands, |
4719 i.e. user-callable functions. Commands, as opposed to other functions, | 5175 i.e. user-callable functions. Commands, as opposed to other functions, |
4736 @file{commands.h} contains associated structure definitions and prototypes. | 5192 @file{commands.h} contains associated structure definitions and prototypes. |
4737 | 5193 |
4738 | 5194 |
4739 | 5195 |
4740 @example | 5196 @example |
4741 regex.c | 5197 @file{regex.c} |
4742 regex.h | 5198 @file{regex.h} |
4743 search.c | 5199 @file{search.c} |
4744 @end example | 5200 @end example |
4745 | 5201 |
4746 @file{search.c} implements the Lisp primitives for searching for text in | 5202 @file{search.c} implements the Lisp primitives for searching for text in |
4747 a buffer, and some of the low-level algorithms for doing this. In | 5203 a buffer, and some of the low-level algorithms for doing this. In |
4748 particular, the fast fixed-string Boyer-Moore search algorithm is | 5204 particular, the fast fixed-string Boyer-Moore search algorithm is |
4753 routines used in @file{grep} and other GNU utilities. | 5209 routines used in @file{grep} and other GNU utilities. |
4754 | 5210 |
4755 | 5211 |
4756 | 5212 |
4757 @example | 5213 @example |
4758 doprnt.c | 5214 @file{doprnt.c} |
4759 @end example | 5215 @end example |
4760 | 5216 |
4761 @file{doprnt.c} implements formatted-string processing, similar to | 5217 @file{doprnt.c} implements formatted-string processing, similar to |
4762 the @code{printf()} function in C. | 5218 the @code{printf()} function in C. |
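To make "formatted-string processing" concrete, here is a minimal C sketch of a @code{printf()}-style formatter. It is an illustration only and does not reflect the actual @file{doprnt.c} code or the set of directives that module supports: the function name @code{sketch_format} and the helper @code{put} are hypothetical, and only @samp{%d}, @samp{%s}, and @samp{%%} are handled.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append N bytes of S to OUT (capacity CAP), storing
   only what fits but always counting the full length in *LEN. */
static void put(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (*len + 1 < cap)
            out[*len] = s[i];
        (*len)++;
    }
}

/* Hypothetical formatter handling %d, %s and %% only.  Returns the length
   the full result would have, like snprintf(); OUT is NUL-terminated
   whenever CAP is nonzero. */
static size_t sketch_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;
    const char *p;

    va_start(ap, fmt);
    for (p = fmt; *p != '\0'; p++) {
        if (*p != '%') {                 /* ordinary character: copy it */
            put(out, cap, &len, p, 1);
            continue;
        }
        p++;                             /* look at the directive letter */
        if (*p == 'd') {
            char num[32];
            int n = snprintf(num, sizeof num, "%d", va_arg(ap, int));
            put(out, cap, &len, num, (size_t)n);
        } else if (*p == 's') {
            const char *s = va_arg(ap, const char *);
            put(out, cap, &len, s, strlen(s));
        } else if (*p == '%') {
            put(out, cap, &len, "%", 1);
        } else if (*p == '\0') {
            break;                       /* lone '%' at end of format */
        } else {                         /* unknown directive: copy as-is */
            put(out, cap, &len, "%", 1);
            put(out, cap, &len, p, 1);
        }
    }
    va_end(ap);

    if (cap > 0)
        out[len < cap ? len : cap - 1] = '\0';
    return len;
}
```

The return value follows the @code{snprintf()} convention, so a caller can detect truncation by comparing it with the buffer size.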
4763 | 5219 |
4764 | 5220 |
4765 | 5221 |
4766 @example | 5222 @example |
4767 undo.c | 5223 @file{undo.c} |
4768 @end example | 5224 @end example |
4769 | 5225 |
4770 This module implements the undo mechanism for tracking buffer changes. | 5226 This module implements the undo mechanism for tracking buffer changes. |
4771 Most of this could be implemented in Lisp. | 5227 Most of this could be implemented in Lisp. |
4772 | 5228 |
4773 | 5229 |
4774 | 5230 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Standard Editing Operations, The Modules of XEmacs |
4775 @node Editor-Level Control Flow Modules | |
4776 @section Editor-Level Control Flow Modules | |
4777 @cindex control flow modules, editor-level | |
4778 @cindex modules, editor-level control flow | |
4779 | |
4780 @example | |
4781 event-Xt.c | |
4782 event-msw.c | |
4783 event-stream.c | |
4784 event-tty.c | |
4785 events-mod.h | |
4786 gpmevent.c | |
4787 gpmevent.h | |
4788 events.c | |
4789 events.h | |
4790 @end example | |
4791 | |
4792 These implement the handling of events (user input and other system | |
4793 notifications). | |
4794 | |
4795 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object | |
4796 type and primitives for manipulating it. | |
4797 | |
4798 @file{event-stream.c} implements the basic functions for working with | |
4799 event queues, dispatching an event by looking it up in relevant keymaps | |
4800 and such, and handling timeouts; this includes the primitives | |
4801 @code{next-event} and @code{dispatch-event}, as well as related | |
4802 primitives such as @code{sit-for}, @code{sleep-for}, and | |
4803 @code{accept-process-output}. (@file{event-stream.c} is one of the | |
4804 hairiest and trickiest modules in XEmacs. Beware! You can easily mess | |
4805 things up here.) | |
4806 | |
4807 @file{event-Xt.c} and @file{event-tty.c} implement the low-level | |
4808 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's | |
4809 (using @code{read()} and @code{select()}), respectively. The event | |
4810 interface enforces a clean separation between the specific code for | |
4811 interfacing with the operating system and the generic code for working | |
4812 with events, by defining an API of basic, low-level event methods; | |
4813 @file{event-Xt.c} and @file{event-tty.c} are two different | |
4814 implementations of this API. To add support for a new operating system | |
4815 (e.g. NeXTstep), one merely needs to provide another implementation of | |
4816 those API functions. | |
4817 | |
4818 Note that the choice of whether to use @file{event-Xt.c} or | |
4819 @file{event-tty.c} is made at compile time! Or at the very latest, it | |
4820 is made at startup time. @file{event-Xt.c} handles events for | |
4821 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X | |
4822 support is not compiled into XEmacs. The reason for this is that there | |
4823 is only one event loop in XEmacs: thus, it needs to be able to receive | |
4824 events from all different kinds of frames. | |
4825 | |
4826 | |
4827 | |
4828 @example | |
4829 keymap.c | |
4830 keymap.h | |
4831 @end example | |
4832 | |
4833 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object | |
4834 type and associated methods and primitives. (Remember that keymaps are | |
4835 objects that associate event descriptions with functions to be called to | |
4836 ``execute'' those events; @code{dispatch-event} looks up events in the | |
4837 relevant keymaps.) | |
4838 | |
4839 | |
4840 | |
4841 @example | |
4842 cmdloop.c | |
4843 @end example | |
4844 | |
4845 @file{cmdloop.c} contains functions that implement the actual editor | |
4846 command loop---i.e. the event loop that cyclically retrieves and | |
4847 dispatches events. This code is also rather tricky, just like | |
4848 @file{event-stream.c}. | |
4849 | |
4850 | |
4851 | |
4852 @example | |
4853 macros.c | |
4854 macros.h | |
4855 @end example | |
4856 | |
4857 These two modules contain the basic code for defining keyboard macros. | |
4858 These functions don't actually do much; most of the code that handles keyboard | |
4859 macros is mixed in with the event-handling code in @file{event-stream.c}. | |
4860 | |
4861 | |
4862 | |
4863 @example | |
4864 minibuf.c | |
4865 @end example | |
4866 | |
4867 This contains some miscellaneous code related to the minibuffer (most of | |
4868 the minibuffer code was moved into Lisp by Richard Mlynarik). This | |
4869 includes the primitives for completion (although filename completion is | |
4870 in @file{dired.c}), the lowest-level interface to the minibuffer (if the | |
4871 command loop were cleaned up, this too could be in Lisp), and code for | |
4872 dealing with the echo area (this, too, was mostly moved into Lisp, and | |
4873 the only code remaining is code to call out to Lisp or provide simple | |
4874 bootstrapping implementations early in temacs, before the echo-area Lisp | |
4875 code is loaded). | |
4876 | |
4877 | |
4878 | |
4879 @node Modules for the Basic Displayable Lisp Objects | |
4880 @section Modules for the Basic Displayable Lisp Objects | |
4881 @cindex modules for the basic displayable Lisp objects | |
4882 @cindex displayable Lisp objects, modules for the basic | |
4883 @cindex Lisp objects, modules for the basic displayable | |
4884 @cindex objects, modules for the basic displayable Lisp | |
4885 | |
4886 @example | |
4887 console-msw.c | |
4888 console-msw.h | |
4889 console-stream.c | |
4890 console-stream.h | |
4891 console-tty.c | |
4892 console-tty.h | |
4893 console-x.c | |
4894 console-x.h | |
4895 console.c | |
4896 console.h | |
4897 @end example | |
4898 | |
4899 These modules implement the @dfn{console} Lisp object type. A console | |
4900 contains multiple display devices, but only one keyboard and mouse. | |
4901 Most of the time, a console will contain exactly one device. | |
4902 | |
4903 Consoles are the top of a Lisp object inclusion hierarchy. Consoles | |
4904 contain devices, which contain frames, which contain windows. | |
4905 | |
4906 | |
4907 | |
4908 @example | |
4909 device-msw.c | |
4910 device-tty.c | |
4911 device-x.c | |
4912 device.c | |
4913 device.h | |
4914 @end example | |
4915 | |
4916 These modules implement the @dfn{device} Lisp object type. This | |
4917 abstracts a particular screen or connection on which frames are | |
4918 displayed. As with Lisp objects, event interfaces, and other | |
4919 subsystems, the device code is separated into a generic component, which | |
4920 contains a standardized interface (in the form of a set of methods), and | |
4921 components implementing that interface for particular device types. | |
4922 | |
4923 The device subsystem defines all the methods and provides method | |
4924 services not only for device operations but also for the frame, window, | |
4925 menubar, scrollbar, toolbar, and other displayable-object subsystems. | |
4926 The reason for this is that all of these subsystems have the same | |
4927 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do. | |
4928 | |
4929 | |
4930 | |
4931 @example | |
4932 frame-msw.c | |
4933 frame-tty.c | |
4934 frame-x.c | |
4935 frame.c | |
4936 frame.h | |
4937 @end example | |
4938 | |
4939 Each device contains one or more frames in which objects (e.g. text) are | |
4940 displayed. A frame corresponds to a window in the window system; | |
4941 usually this is a top-level window but it could potentially be one of a | |
4942 number of overlapping child windows within a top-level window, using the | |
4943 MDI (Multiple Document Interface) protocol in Microsoft Windows or a | |
4944 similar scheme. | |
4945 | |
4946 The @file{frame-*} files implement the @dfn{frame} Lisp object type and | |
4947 provide the generic and device-type-specific operations on frames | |
4948 (e.g. raising, lowering, resizing, moving, etc.). | |
4949 | |
4950 | |
4951 | |
4952 @example | |
4953 window.c | |
4954 window.h | |
4955 @end example | |
4956 | |
4957 @cindex window (in Emacs) | |
4958 @cindex pane | |
4959 Each frame consists of one or more non-overlapping @dfn{windows} (better | |
4960 known as @dfn{panes} in standard window-system terminology) in which a | |
4961 buffer's text can be displayed. Windows can also have scrollbars | |
4962 displayed around their edges. | |
4963 | |
4964 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp | |
4965 object type and provide code to manage windows. Since windows have no | |
4966 associated resources in the window system (the window system knows only | |
4967 about the frame; no child windows or anything are used for XEmacs | |
4968 windows), there is no device-type-specific code here; all of that code | |
4969 is part of the redisplay mechanism or the code for particular object | |
4970 types such as scrollbars. | |
4971 | |
4972 | |
4973 | |
4974 @node Modules for other Display-Related Lisp Objects | |
4975 @section Modules for other Display-Related Lisp Objects | |
4976 @cindex modules for other display-related Lisp objects | |
4977 @cindex display-related Lisp objects, modules for other | |
4978 @cindex Lisp objects, modules for other display-related | |
4979 | |
4980 @example | |
4981 faces.c | |
4982 faces.h | |
4983 @end example | |
4984 | |
4985 | |
4986 | |
4987 @example | |
4988 bitmaps.h | |
4989 glyphs-eimage.c | |
4990 glyphs-msw.c | |
4991 glyphs-msw.h | |
4992 glyphs-widget.c | |
4993 glyphs-x.c | |
4994 glyphs-x.h | |
4995 glyphs.c | |
4996 glyphs.h | |
4997 @end example | |
4998 | |
4999 | |
5000 | |
5001 @example | |
5002 objects-msw.c | |
5003 objects-msw.h | |
5004 objects-tty.c | |
5005 objects-tty.h | |
5006 objects-x.c | |
5007 objects-x.h | |
5008 objects.c | |
5009 objects.h | |
5010 @end example | |
5011 | |
5012 | |
5013 | |
5014 @example | |
5015 menubar-msw.c | |
5016 menubar-msw.h | |
5017 menubar-x.c | |
5018 menubar.c | |
5019 menubar.h | |
5020 @end example | |
5021 | |
5022 | |
5023 | |
5024 @example | |
5025 scrollbar-msw.c | |
5026 scrollbar-msw.h | |
5027 scrollbar-x.c | |
5028 scrollbar-x.h | |
5029 scrollbar.c | |
5030 scrollbar.h | |
5031 @end example | |
5032 | |
5033 | |
5034 | |
5035 @example | |
5036 toolbar-msw.c | |
5037 toolbar-x.c | |
5038 toolbar.c | |
5039 toolbar.h | |
5040 @end example | |
5041 | |
5042 | |
5043 | |
5044 @example | |
5045 font-lock.c | |
5046 @end example | |
5047 | |
5048 This file provides C support for syntax highlighting---i.e. | |
5049 highlighting different syntactic constructs of a source file in | |
5050 different colors, for easy reading. The C support is provided so that | |
5051 this is fast. | |
5052 | |
5053 | |
5054 | |
5055 @example | |
5056 dgif_lib.c | |
5057 gif_err.c | |
5058 gif_lib.h | |
5059 gifalloc.c | |
5060 @end example | |
5061 | |
5062 These modules decode GIF-format image files, for use with glyphs. | |
5063 These files were removed due to Unisys patent infringement concerns. | |
5064 | |
5065 | |
5066 | |
5067 @node Modules for the Redisplay Mechanism | |
5068 @section Modules for the Redisplay Mechanism | |
5069 @cindex modules for the redisplay mechanism | |
5070 @cindex redisplay mechanism, modules for the | |
5071 | |
5072 @example | |
5073 redisplay-output.c | |
5074 redisplay-msw.c | |
5075 redisplay-tty.c | |
5076 redisplay-x.c | |
5077 redisplay.c | |
5078 redisplay.h | |
5079 @end example | |
5080 | |
5081 These files provide the redisplay mechanism. As with many other | |
5082 subsystems in XEmacs, there is a clean separation between the general | |
5083 and device-specific support. | |
5084 | |
5085 @file{redisplay.c} contains the bulk of the redisplay engine. These | |
5086 functions update the redisplay structures (which describe how the screen | |
5087 is to appear) to reflect any changes made to the state of any | |
5088 displayable objects (buffer, frame, window, etc.) since the last time | |
5089 that redisplay was called. These functions are highly optimized to | |
5090 avoid doing more work than necessary (since redisplay is called | |
5091 extremely often and is potentially a huge time sink), and depend heavily | |
5092 on notifications from the objects themselves that changes have occurred, | |
5093 so that redisplay doesn't explicitly have to check each possible object. | |
5094 The redisplay mechanism also contains a great deal of caching to further | |
5095 speed things up; some of this caching is contained within the various | |
5096 displayable objects. | |
5097 | |
5098 @file{redisplay-output.c} goes through the redisplay structures and converts | |
5099 them into calls to device-specific methods to actually output the screen | |
5100 changes. | |
5101 | |
5102 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations | |
5103 of these redisplay output methods, for X frames and TTY frames, | |
5104 respectively. | |
5105 | |
5106 | |
5107 | |
5108 @example | |
5109 indent.c | |
5110 @end example | |
5111 | |
5112 This module contains various functions and Lisp primitives for | |
5113 converting between buffer positions and screen positions. These | |
5114 functions call the redisplay mechanism to do most of the work, and then | |
5115 examine the redisplay structures to get the necessary information. This | |
5116 module needs work. | |
5117 | |
5118 | |
5119 | |
5120 @example | |
5121 termcap.c | |
5122 terminfo.c | |
5123 tparam.c | |
5124 @end example | |
5125 | |
5126 These files contain functions for working with the termcap (BSD-style) | |
5127 and terminfo (System V style) databases of terminal capabilities and | |
5128 escape sequences, used when XEmacs is displaying in a TTY. | |
5129 | |
5130 | |
5131 | |
5132 @example | |
5133 cm.c | |
5134 cm.h | |
5135 @end example | |
5136 | |
5137 These files provide some miscellaneous TTY-output functions and should | |
5138 probably be merged into @file{redisplay-tty.c}. | |
5139 | |
5140 | |
5141 | |
5142 @node Modules for Interfacing with the File System | |
5143 @section Modules for Interfacing with the File System | 5231 @section Modules for Interfacing with the File System |
5144 @cindex modules for interfacing with the file system | 5232 @cindex modules for interfacing with the file system |
5145 @cindex interfacing with the file system, modules for | 5233 @cindex interfacing with the file system, modules for |
5146 @cindex file system, modules for interfacing with the | 5234 @cindex file system, modules for interfacing with the |
5147 | 5235 |
5148 @example | 5236 @example |
5149 lstream.c | 5237 @file{lstream.c} |
5150 lstream.h | 5238 @file{lstream.h} |
5151 @end example | 5239 @end example |
5152 | 5240 |
5153 These modules implement the @dfn{stream} Lisp object type. This is an | 5241 These modules implement the @dfn{stream} Lisp object type. This is an |
5154 internal-only Lisp object that implements a generic buffering stream. | 5242 internal-only Lisp object that implements a generic buffering stream. |
5155 The idea is to provide a uniform interface onto all sources and sinks of | 5243 The idea is to provide a uniform interface onto all sources and sinks of |
5172 types of streams; others are provided, e.g., in @file{file-coding.c}. | 5260 types of streams; others are provided, e.g., in @file{file-coding.c}. |
5173 | 5261 |
5174 | 5262 |
5175 | 5263 |
5176 @example | 5264 @example |
5177 fileio.c | 5265 @file{fileio.c} |
5178 @end example | 5266 @end example |
5179 | 5267 |
5180 This implements the basic primitives for interfacing with the file | 5268 This implements the basic primitives for interfacing with the file |
5181 system. This includes primitives for reading files into buffers, | 5269 system. This includes primitives for reading files into buffers, |
5182 writing buffers into files, checking for the presence or accessibility | 5270 writing buffers into files, checking for the presence or accessibility |
5189 @file{simple.el}. | 5277 @file{simple.el}. |
5190 | 5278 |
5191 | 5279 |
5192 | 5280 |
5193 @example | 5281 @example |
5194 filelock.c | 5282 @file{filelock.c} |
5195 @end example | 5283 @end example |
5196 | 5284 |
5197 This file provides functions for detecting clashes between different | 5285 This file provides functions for detecting clashes between different |
5198 processes (e.g. XEmacs and some external process, or two different | 5286 processes (e.g. XEmacs and some external process, or two different |
5199 XEmacs processes) modifying the same file. (XEmacs can optionally use | 5287 XEmacs processes) modifying the same file. (XEmacs can optionally use |
5204 modified, the user is made aware of this so that the buffer can be | 5292 modified, the user is made aware of this so that the buffer can be |
5205 synched up with the external changes if necessary. | 5293 synched up with the external changes if necessary. |
5206 | 5294 |
5207 | 5295 |
5208 @example | 5296 @example |
5209 filemode.c | 5297 @file{filemode.c} |
5210 @end example | 5298 @end example |
5211 | 5299 |
5212 This file provides some miscellaneous functions that construct a | 5300 This file provides some miscellaneous functions that construct a |
5213 @samp{rwxr-xr-x}-type permissions string (as might appear in an | 5301 @samp{rwxr-xr-x}-type permissions string (as might appear in an |
5214 @file{ls}-style directory listing) given the information returned by the | 5302 @file{ls}-style directory listing) given the information returned by the |
5215 @code{stat()} system call. | 5303 @code{stat()} system call. |
5216 | 5304 |
5217 | 5305 |
5218 | 5306 |
5219 @example | 5307 @example |
5220 dired.c | 5308 @file{dired.c} |
5221 ndir.h | 5309 @file{ndir.h} |
5222 @end example | 5310 @end example |
5223 | 5311 |
5224 These files implement the XEmacs interface to directory searching. This | 5312 These files implement the XEmacs interface to directory searching. This |
5225 includes a number of primitives for determining the files in a directory | 5313 includes a number of primitives for determining the files in a directory |
5226 and for doing filename completion. (Remember that generic completion is | 5314 and for doing filename completion. (Remember that generic completion is |
5232 those systems, directories can be read directly as files, and parsed.) | 5320 those systems, directories can be read directly as files, and parsed.) |
5233 | 5321 |
5234 | 5322 |
5235 | 5323 |
5236 @example | 5324 @example |
5237 realpath.c | 5325 @file{realpath.c} |
5238 @end example | 5326 @end example |
5239 | 5327 |
5240 This file provides an implementation of the @code{realpath()} function | 5328 This file provides an implementation of the @code{realpath()} function |
5241 for expanding symbolic links, on systems that don't implement it or have | 5329 for expanding symbolic links, on systems that don't implement it or have |
5242 a broken implementation. | 5330 a broken implementation. |
5243 | 5331 |
5244 | 5332 |
5245 | 5333 |
5246 @node Modules for Other Aspects of the Lisp Interpreter and Object System | 5334 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, The Modules of XEmacs |
5247 @section Modules for Other Aspects of the Lisp Interpreter and Object System | 5335 @section Modules for Other Aspects of the Lisp Interpreter and Object System |
5248 @cindex modules for other aspects of the Lisp interpreter and object system | 5336 @cindex modules for other aspects of the Lisp interpreter and object system |
5249 @cindex Lisp interpreter and object system, modules for other aspects of the | 5337 @cindex Lisp interpreter and object system, modules for other aspects of the |
5250 @cindex interpreter and object system, modules for other aspects of the Lisp | 5338 @cindex interpreter and object system, modules for other aspects of the Lisp |
5251 @cindex object system, modules for other aspects of the Lisp interpreter and | 5339 @cindex object system, modules for other aspects of the Lisp interpreter and |
5252 | 5340 |
5253 @example | 5341 @example |
5254 elhash.c | 5342 @file{elhash.c} |
5255 elhash.h | 5343 @file{elhash.h} |
5256 hash.c | 5344 @file{hash.c} |
5257 hash.h | 5345 @file{hash.h} |
5258 @end example | 5346 @end example |
5259 | 5347 |
5260 These files provide two implementations of hash tables. Files | 5348 These files provide two implementations of hash tables. Files |
5261 @file{hash.c} and @file{hash.h} provide a generic C implementation of | 5349 @file{hash.c} and @file{hash.h} provide a generic C implementation of |
5262 hash tables which can stand independently of XEmacs. Files | 5350 hash tables which can stand independently of XEmacs. Files |
5265 things like garbage collection, and implement the @dfn{hash-table} Lisp | 5353 things like garbage collection, and implement the @dfn{hash-table} Lisp |
5266 object type. | 5354 object type. |
5267 | 5355 |
5268 | 5356 |
5269 @example | 5357 @example |
5270 specifier.c | 5358 @file{specifier.c} |
5271 specifier.h | 5359 @file{specifier.h} |
5272 @end example | 5360 @end example |
5273 | 5361 |
5274 This module implements the @dfn{specifier} Lisp object type. This is | 5362 This module implements the @dfn{specifier} Lisp object type. This is |
5275 primarily used for displayable properties, and allows for values that | 5363 primarily used for displayable properties, and allows for values that |
5276 are specific to a particular buffer, window, frame, device, or device | 5364 are specific to a particular buffer, window, frame, device, or device |
5282 looks up a value given a window (from which a buffer, frame, and device | 5370 looks up a value given a window (from which a buffer, frame, and device |
5283 can be derived). | 5371 can be derived). |
5284 | 5372 |
5285 | 5373 |
5286 @example | 5374 @example |
5287 chartab.c | 5375 @file{chartab.c} |
5288 chartab.h | 5376 @file{chartab.h} |
5289 casetab.c | 5377 @file{casetab.c} |
5290 @end example | 5378 @end example |
5291 | 5379 |
5292 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table} | 5380 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table} |
5293 Lisp object type, which maps from characters or certain sorts of | 5381 Lisp object type, which maps from characters or certain sorts of |
5294 character ranges to Lisp objects. The implementation of this object | 5382 character ranges to Lisp objects. The implementation of this object |
5304 and to do case-insensitive searching. | 5392 and to do case-insensitive searching. |
5305 | 5393 |
5306 | 5394 |
5307 | 5395 |
5308 @example | 5396 @example |
5309 syntax.c | 5397 @file{syntax.c} |
5310 syntax.h | 5398 @file{syntax.h} |
5311 @end example | 5399 @end example |
5312 | 5400 |
5313 @cindex scanner | 5401 @cindex scanner |
5314 This module implements @dfn{syntax tables}, another sort of char table | 5402 This module implements @dfn{syntax tables}, another sort of char table |
5315 that maps characters into syntax classes that define the syntax of these | 5403 that maps characters into syntax classes that define the syntax of these |
5374 been removed, readded, and removed again. Currently neither GNU Emacs | 5462 been removed, readded, and removed again. Currently neither GNU Emacs |
5375 (21.3.99) nor XEmacs (21.5.17) seems to use it. | 5463 (21.3.99) nor XEmacs (21.5.17) seems to use it. |
5376 | 5464 |
5377 | 5465 |
5378 @example | 5466 @example |
5379 casefiddle.c | 5467 @file{casefiddle.c} |
5380 @end example | 5468 @end example |
5381 | 5469 |
5382 This module implements various Lisp primitives for upcasing, downcasing | 5470 This module implements various Lisp primitives for upcasing, downcasing |
5383 and capitalizing strings or regions of buffers. | 5471 and capitalizing strings or regions of buffers. |
5384 | 5472 |
5385 | 5473 |
5386 | 5474 |
5387 @example | 5475 @example |
5388 rangetab.c | 5476 @file{rangetab.c} |
5389 @end example | 5477 @end example |
5390 | 5478 |
5391 This module implements the @dfn{range table} Lisp object type, which | 5479 This module implements the @dfn{range table} Lisp object type, which |
5392 provides for a mapping from ranges of integers to arbitrary Lisp | 5480 provides for a mapping from ranges of integers to arbitrary Lisp |
5393 objects. | 5481 objects. |
5394 | 5482 |
5395 | 5483 |
5396 | 5484 |
5397 @example | 5485 @example |
5398 opaque.c | 5486 @file{opaque.c} |
5399 opaque.h | 5487 @file{opaque.h} |
5400 @end example | 5488 @end example |
5401 | 5489 |
5402 This module implements the @dfn{opaque} Lisp object type, an | 5490 This module implements the @dfn{opaque} Lisp object type, an |
5403 internal-only Lisp object that encapsulates an arbitrary block of memory | 5491 internal-only Lisp object that encapsulates an arbitrary block of memory |
5404 so that it can be managed by the Lisp allocation system. To create an | 5492 so that it can be managed by the Lisp allocation system. To create an |
5416 create a new Lisp object type---it's not hard.) | 5504 create a new Lisp object type---it's not hard.) |
5417 | 5505 |
5418 | 5506 |
5419 | 5507 |
5420 @example | 5508 @example |
5421 abbrev.c | 5509 @file{abbrev.c} |
5422 @end example | 5510 @end example |
5423 | 5511 |
5424 This module provides a few primitives for doing dynamic abbreviation | 5512 This module provides a few primitives for doing dynamic abbreviation |
5425 expansion. In XEmacs, most of the code for this has been moved into | 5513 expansion. In XEmacs, most of the code for this has been moved into |
5426 Lisp. Some C code remains for speed and because the primitive | 5514 Lisp. Some C code remains for speed and because the primitive |
5429 is itself in C only for speed.) | 5517 is itself in C only for speed.) |
5430 | 5518 |
5431 | 5519 |
5432 | 5520 |
5433 @example | 5521 @example |
5434 doc.c | 5522 @file{doc.c} |
5435 @end example | 5523 @end example |
5436 | 5524 |
5437 This module provides primitives for retrieving the documentation | 5525 This module provides primitives for retrieving the documentation |
5438 strings of functions and variables. These documentation strings contain | 5526 strings of functions and variables. These documentation strings contain |
5439 certain special markers that get dynamically expanded (e.g. a | 5527 certain special markers that get dynamically expanded (e.g. a |
5448 the appropriate documentation string.) | 5536 the appropriate documentation string.) |
5449 | 5537 |
5450 | 5538 |
5451 | 5539 |
5452 @example | 5540 @example |
5453 md5.c | 5541 @file{md5.c} |
5454 @end example | 5542 @end example |
5455 | 5543 |
5456 This module provides a Lisp primitive that implements the MD5 secure | 5544 This module provides a Lisp primitive that implements the MD5 secure |
5457 hashing scheme, used to create a large hash value of a string of data such that | 5545 hashing scheme, used to create a large hash value of a string of data such that |
5458 the data cannot be derived from the hash value. This is used for | 5546 the data cannot be derived from the hash value. This is used for |
5459 various security applications on the Internet. | 5547 various security applications on the Internet. |
5460 | 5548 |
5461 | 5549 |
5462 | 5550 |
5463 | 5551 |
5464 @node Modules for Interfacing with the Operating System | 5552 @node Modules for Interfacing with the Operating System, , Modules for Other Aspects of the Lisp Interpreter and Object System, The Modules of XEmacs |
5465 @section Modules for Interfacing with the Operating System | 5553 @section Modules for Interfacing with the Operating System |
5466 @cindex modules for interfacing with the operating system | 5554 @cindex modules for interfacing with the operating system |
5467 @cindex interfacing with the operating system, modules for | 5555 @cindex interfacing with the operating system, modules for |
5468 @cindex operating system, modules for interfacing with the | 5556 @cindex operating system, modules for interfacing with the |
5469 | 5557 |
5470 @example | 5558 @example |
5471 process.el | 5559 @file{process.el} |
5472 process.c | 5560 @file{process.c} |
5473 process.h | 5561 @file{process.h} |
5474 @end example | 5562 @end example |
5475 | 5563 |
5476 These modules allow XEmacs to spawn and communicate with subprocesses | 5564 These modules allow XEmacs to spawn and communicate with subprocesses |
5477 and network connections. | 5565 and network connections. |
5478 | 5566 |
5513 subprocesses. | 5601 subprocesses. |
5514 | 5602 |
5515 | 5603 |
5516 | 5604 |
5517 @example | 5605 @example |
5518 sysdep.c | 5606 @file{sysdep.c} |
5519 sysdep.h | 5607 @file{sysdep.h} |
5520 @end example | 5608 @end example |
5521 | 5609 |
5522 These modules implement most of the low-level, messy operating-system | 5610 These modules implement most of the low-level, messy operating-system |
5523 interface code. This includes various device control (ioctl) operations | 5611 interface code. This includes various device control (ioctl) operations |
5524 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff | 5612 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff |
5527 provide them or have broken versions. | 5615 provide them or have broken versions. |
5528 | 5616 |
5529 | 5617 |
5530 | 5618 |
5531 @example | 5619 @example |
5532 sysdir.h | 5620 @file{sysdir.h} |
5533 sysfile.h | 5621 @file{sysfile.h} |
5534 sysfloat.h | 5622 @file{sysfloat.h} |
5535 sysproc.h | 5623 @file{sysproc.h} |
5536 syspwd.h | 5624 @file{syspwd.h} |
5537 syssignal.h | 5625 @file{syssignal.h} |
5538 systime.h | 5626 @file{systime.h} |
5539 systty.h | 5627 @file{systty.h} |
5540 syswait.h | 5628 @file{syswait.h} |
5541 @end example | 5629 @end example |
5542 | 5630 |
5543 These header files provide consistent interfaces onto system-dependent | 5631 These header files provide consistent interfaces onto system-dependent |
5544 header files and system calls. The idea is that, instead of including a | 5632 header files and system calls. The idea is that, instead of including a |
5545 standard header file like @file{<sys/param.h>} (which may or may not | 5633 standard header file like @file{<sys/param.h>} (which may or may not |
5590 an int). | 5678 an int). |
5591 | 5679 |
5592 | 5680 |
5593 | 5681 |
5594 @example | 5682 @example |
5595 hpplay.c | 5683 @file{hpplay.c} |
5596 libsst.c | 5684 @file{libsst.c} |
5597 libsst.h | 5685 @file{libsst.h} |
5598 libst.h | 5686 @file{libst.h} |
5599 linuxplay.c | 5687 @file{linuxplay.c} |
5600 nas.c | 5688 @file{nas.c} |
5601 sgiplay.c | 5689 @file{sgiplay.c} |
5602 sound.c | 5690 @file{sound.c} |
5603 sunplay.c | 5691 @file{sunplay.c} |
5604 @end example | 5692 @end example |
5605 | 5693 |
5606 These files implement the ability to play various sounds on some types | 5694 These files implement the ability to play various sounds on some types |
5607 of computers. You have to configure your XEmacs with sound support in | 5695 of computers. You have to configure your XEmacs with sound support in |
5608 order to get this capability. | 5696 order to get this capability. |
5635 currently in use. | 5723 currently in use. |
5636 | 5724 |
5637 | 5725 |
5638 | 5726 |
5639 @example | 5727 @example |
5640 tooltalk.c | 5728 @file{tooltalk.c} |
5641 tooltalk.h | 5729 @file{tooltalk.h} |
5642 @end example | 5730 @end example |
5643 | 5731 |
5644 These two modules implement an interface to the ToolTalk protocol, which | 5732 These two modules implement an interface to the ToolTalk protocol, which |
5645 is an interprocess communication protocol implemented on some versions | 5733 is an interprocess communication protocol implemented on some versions |
5646 of Unix. ToolTalk is a high-level protocol that allows processes to | 5734 of Unix. ToolTalk is a high-level protocol that allows processes to |
5652 parts of the SPARCWorks development environment. | 5740 parts of the SPARCWorks development environment. |
5653 | 5741 |
5654 | 5742 |
5655 | 5743 |
5656 @example | 5744 @example |
5657 getloadavg.c | 5745 @file{getloadavg.c} |
5658 @end example | 5746 @end example |
5659 | 5747 |
5660 This module provides the ability to retrieve the system's current load | 5748 This module provides the ability to retrieve the system's current load |
5661 average. (The way to do this is highly system-specific, unfortunately, | 5749 average. (The way to do this is highly system-specific, unfortunately, |
5662 and requires a lot of special-case code.) | 5750 and requires a lot of special-case code.) |
5663 | 5751 |
5664 | 5752 |
5665 | 5753 |
5666 @example | 5754 @example |
5667 sunpro.c | 5755 @file{sunpro.c} |
5668 @end example | 5756 @end example |
5669 | 5757 |
5670 This module provides a small amount of code used internally at Sun to | 5758 This module provides a small amount of code used internally at Sun to |
5671 keep statistics on the usage of XEmacs. | 5759 keep statistics on the usage of XEmacs. |
5672 | 5760 |
5673 | 5761 |
5674 | 5762 |
5675 @example | 5763 @example |
5676 broken-sun.h | 5764 @file{broken-sun.h} |
5677 strcmp.c | 5765 @file{strcmp.c} |
5678 strcpy.c | 5766 @file{strcpy.c} |
5679 sunOS-fix.c | 5767 @file{sunOS-fix.c} |
5680 @end example | 5768 @end example |
5681 | 5769 |
5682 These files provide replacement functions and prototypes to fix numerous | 5770 These files provide replacement functions and prototypes to fix numerous |
5683 bugs in early releases of SunOS 4.1. | 5771 bugs in early releases of SunOS 4.1. |
5684 | 5772 |
5685 | 5773 |
5686 | 5774 |
5687 @example | 5775 @example |
5688 hftctl.c | 5776 @file{hftctl.c} |
5689 @end example | 5777 @end example |
5690 | 5778 |
5691 This module provides some terminal-control code necessary on versions of | 5779 This module provides some terminal-control code necessary on versions of |
5692 AIX prior to 4.1. | 5780 AIX prior to 4.1. |
5693 | 5781 |
5694 | 5782 |
5695 | 5783 @node Allocation of Objects in XEmacs Lisp, Dumping, The Modules of XEmacs, Top |
5696 @node Modules for Interfacing with X Windows | |
5697 @section Modules for Interfacing with X Windows | |
5698 @cindex modules for interfacing with X Windows | |
5699 @cindex interfacing with X Windows, modules for | |
5700 @cindex X Windows, modules for interfacing with | |
5701 | |
5702 @example | |
5703 Emacs.ad.h | |
5704 @end example | |
5705 | |
5706 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied | |
5707 fallback resources (so that XEmacs has pretty defaults). | |
5708 | |
5709 | |
5710 | |
5711 @example | |
5712 EmacsFrame.c | |
5713 EmacsFrame.h | |
5714 EmacsFrameP.h | |
5715 @end example | |
5716 | |
5717 These modules implement an Xt widget class that encapsulates a frame. | |
5718 This is for ease in integrating with Xt. The EmacsFrame widget covers | |
5719 the entire X window except for the menubar; the scrollbars are | |
5720 positioned on top of the EmacsFrame widget. | |
5721 | |
5722 @strong{Warning:} Abandon hope, all ye who enter here. This code took | |
5723 an ungodly amount of time to get right, and is likely to fall apart | |
5724 mercilessly at the slightest change. Such is life under Xt. | |
5725 | |
5726 | |
5727 | |
5728 @example | |
5729 EmacsManager.c | |
5730 EmacsManager.h | |
5731 EmacsManagerP.h | |
5732 @end example | |
5733 | |
5734 These modules implement a simple Xt manager (i.e. composite) widget | |
5735 class that simply lets its children set whatever geometry they want. | |
5736 It's amazing that Xt doesn't provide this standardly, but on second | |
5737 thought, it makes sense, considering how amazingly broken Xt is. | |
5738 | |
5739 | |
5740 @example | |
5741 EmacsShell-sub.c | |
5742 EmacsShell.c | |
5743 EmacsShell.h | |
5744 EmacsShellP.h | |
5745 @end example | |
5746 | |
5747 These modules implement two Xt widget classes that are subclasses of | |
5748 the TopLevelShell and TransientShell classes. This is necessary to deal | |
5749 with more brokenness that Xt has sadistically thrust onto the backs of | |
5750 developers. | |
5751 | |
5752 | |
5753 | |
5754 @example | |
5755 xgccache.c | |
5756 xgccache.h | |
5757 @end example | |
5758 | |
5759 These modules provide functions for maintenance and caching of GC's | |
5760 (graphics contexts) under the X Window System. This code is junky and | |
5761 needs to be rewritten. | |
5762 | |
5763 | |
5764 | |
5765 @example | |
5766 select-msw.c | |
5767 select-x.c | |
5768 select.c | |
5769 select.h | |
5770 @end example | |
5771 | |
5772 @cindex selections | |
5773 This module provides an interface to the X Window System's concept of | |
5774 @dfn{selections}, the standard way for X applications to communicate | |
5775 with each other. | |
5776 | |
5777 | |
5778 | |
5779 @example | |
5780 xintrinsic.h | |
5781 xintrinsicp.h | |
5782 xmmanagerp.h | |
5783 xmprimitivep.h | |
5784 @end example | |
5785 | |
5786 These header files are similar in spirit to the @file{sys*.h} files and buffer | |
5787 against different implementations of Xt and Motif. | |
5788 | |
5789 @itemize @bullet | |
5790 @item | |
5791 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}. | |
5792 @item | |
5793 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}. | |
5794 @item | |
5795 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}. | |
5796 @item | |
5797 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}. | |
5798 @end itemize | |
5799 | |
5800 | |
5801 | |
5802 @example | |
5803 xmu.c | |
5804 xmu.h | |
5805 @end example | |
5806 | |
5807 These files provide an emulation of the Xmu library for those systems | |
5808 (i.e. HPUX) that don't provide it as a standard part of X. | |
5809 | |
5810 | |
5811 | |
5812 @example | |
5813 ExternalClient-Xlib.c | |
5814 ExternalClient.c | |
5815 ExternalClient.h | |
5816 ExternalClientP.h | |
5817 ExternalShell.c | |
5818 ExternalShell.h | |
5819 ExternalShellP.h | |
5820 extw-Xlib.c | |
5821 extw-Xlib.h | |
5822 extw-Xt.c | |
5823 extw-Xt.h | |
5824 @end example | |
5825 | |
5826 @cindex external widget | |
5827 These files provide the @dfn{external widget} interface, which allows an | |
5828 XEmacs frame to appear as a widget in another application. To do this, | |
5829 you have to configure with @samp{--external-widget}. | |
5830 | |
5831 @file{ExternalShell*} provides the server (XEmacs) side of the | |
5832 connection. | |
5833 | |
5834 @file{ExternalClient*} provides the client (other application) side of | |
5835 the connection. These files are not compiled into XEmacs but are | |
5836 compiled into libraries that are then linked into your application. | |
5837 | |
5838 @file{extw-*} is common code that is used for both the client and server. | |
5839 | |
5840 Don't touch this code; something is liable to break if you do. | |
5841 | |
5842 | |
5843 | |
5844 @node Modules for Internationalization | |
5845 @section Modules for Internationalization | |
5846 @cindex modules for internationalization | |
5847 @cindex internationalization, modules for | |
5848 | |
5849 @example | |
5850 mule-canna.c | |
5851 mule-ccl.c | |
5852 mule-charset.c | |
5853 mule-charset.h | |
5854 file-coding.c | |
5855 file-coding.h | |
5856 mule-coding.c | |
5857 mule-mcpath.c | |
5858 mule-mcpath.h | |
5859 mule-wnnfns.c | |
5860 mule.c | |
5861 @end example | |
5862 | |
5863 These files implement the MULE (Asian-language) support. Note that MULE | |
5864 actually provides a general interface for all sorts of languages, not | |
5865 just Asian languages (although they are generally the most complicated | |
5866 to support). This code is still in beta. | |
5867 | |
5868 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the | |
5869 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset} | |
5870 Lisp object type, which encapsulates a character set (an ordered one- or | |
5871 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese | |
5872 Kanji). | |
5873 | |
5874 @file{file-coding.*} implements the @dfn{coding-system} Lisp object | |
5875 type, which encapsulates a method of converting between different | |
5876 encodings. An encoding is a representation of a stream of characters, | |
5877 possibly from multiple character sets, using a stream of bytes or words, | |
5878 and defines (e.g.) which escape sequences are used to specify particular | |
5879 character sets, how the indices for a character are converted into bytes | |
5880 (sometimes this involves setting the high bit; sometimes complicated | |
5881 rearranging of the values takes place, as in the Shift-JIS encoding), | |
5882 etc. It also contains some generic coding system implementations, such | |
5883 as the binary (no-conversion) coding system and a sample gzip coding system. | |
5884 | |
5885 @file{mule-coding.c} contains the implementations of text coding systems. | |
5886 | |
5887 @file{mule-ccl.c} provides the CCL (Code Conversion Language) | |
5888 interpreter. CCL is similar in spirit to Lisp byte code and is used to | |
5889 implement converters for custom encodings. | |
5890 | |
5891 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to | |
5892 external programs used to implement the Canna and WNN input methods, | |
5893 respectively. This is currently in beta. | |
5894 | |
5895 @file{mule-mcpath.c} provides some functions to allow for pathnames | |
5896 containing extended characters. This code is fragmentary, obsolete, and | |
5897 completely non-working. Instead, @code{pathname-coding-system} is used | |
5898 to specify conversions of names of files and directories. The standard | |
5899 C I/O functions like @samp{open()} are wrapped so that conversion occurs | |
5900 automatically. | |
5901 | |
5902 @file{mule.c} contains a few miscellaneous things. It currently seems | |
5903 to be unused and probably should be removed. | |
5904 | |
5905 | |
5906 | |
5907 @example | |
5908 intl.c | |
5909 @end example | |
5910 | |
5911 This provides some miscellaneous internationalization code for | |
5912 implementing message translation and interfacing to the Ximp input | |
5913 method. None of this code is currently working. | |
5914 | |
5915 | |
5916 | |
5917 @example | |
5918 iso-wide.h | |
5919 @end example | |
5920 | |
5921 This contains leftover code from an earlier implementation of | |
5922 Asian-language support, and is not currently used. | |
5923 | |
5924 | |
5925 | |
5926 | |
5927 @node Modules for Regression Testing | |
5928 @section Modules for Regression Testing | |
5929 @cindex modules for regression testing | |
5930 @cindex regression testing, modules for | |
5931 | |
5932 @example | |
5933 test-harness.el | |
5934 base64-tests.el | |
5935 byte-compiler-tests.el | |
5936 case-tests.el | |
5937 ccl-tests.el | |
5938 c-tests.el | |
5939 database-tests.el | |
5940 extent-tests.el | |
5941 hash-table-tests.el | |
5942 lisp-tests.el | |
5943 md5-tests.el | |
5944 mule-tests.el | |
5945 regexp-tests.el | |
5946 symbol-tests.el | |
5947 syntax-tests.el | |
5948 tag-tests.el | |
5949 weak-tests.el | |
5950 @end example | |
5951 | |
5952 @file{test-harness.el} defines the macros @code{Assert}, | |
5953 @code{Check-Error}, @code{Check-Error-Message}, and | |
5954 @code{Check-Message}. The other files are test files, testing various | |
5955 XEmacs facilities. @xref{Regression Testing XEmacs}. | |
5956 | |
5957 | |
5958 | |
5959 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top | |
5960 @chapter Allocation of Objects in XEmacs Lisp | 5784 @chapter Allocation of Objects in XEmacs Lisp |
5961 @cindex allocation of objects in XEmacs Lisp | 5785 @cindex allocation of objects in XEmacs Lisp |
5962 @cindex objects in XEmacs Lisp, allocation of | 5786 @cindex objects in XEmacs Lisp, allocation of |
5963 @cindex Lisp objects, allocation of in XEmacs | 5787 @cindex Lisp objects, allocation of in XEmacs |
5964 | 5788 |
5965 @menu | 5789 @menu |
5966 * Introduction to Allocation:: | 5790 * Introduction to Allocation:: |
5967 * Garbage Collection:: | 5791 * Garbage Collection:: |
5968 * GCPROing:: | 5792 * GCPROing:: |
5969 * Garbage Collection - Step by Step:: | 5793 * Garbage Collection - Step by Step:: |
5970 * Integers and Characters:: | 5794 * Integers and Characters:: |
5971 * Allocation from Frob Blocks:: | 5795 * Allocation from Frob Blocks:: |
5972 * lrecords:: | 5796 * lrecords:: |
5973 * Low-level allocation:: | 5797 * Low-level allocation:: |
5974 * Cons:: | 5798 * Cons:: |
5975 * Vector:: | 5799 * Vector:: |
5976 * Bit Vector:: | 5800 * Bit Vector:: |
5977 * Symbol:: | 5801 * Symbol:: |
5978 * Marker:: | 5802 * Marker:: |
5979 * String:: | 5803 * String:: |
5980 * Compiled Function:: | 5804 * Compiled Function:: |
5981 @end menu | 5805 @end menu |
5982 | 5806 |
5983 @node Introduction to Allocation | 5807 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp |
5984 @section Introduction to Allocation | 5808 @section Introduction to Allocation |
5985 @cindex allocation, introduction to | 5809 @cindex allocation, introduction to |
5986 | 5810 |
5987 Emacs Lisp, like all Lisps, has garbage collection. This means that | 5811 Emacs Lisp, like all Lisps, has garbage collection. This means that |
5988 the programmer never has to explicitly free (destroy) an object; it | 5812 the programmer never has to explicitly free (destroy) an object; it |
6050 like vectors. You can basically view them as exactly like vectors | 5874 like vectors. You can basically view them as exactly like vectors |
6051 except that their type is stored in lrecord fashion rather than | 5875 except that their type is stored in lrecord fashion rather than |
6052 in directly-tagged fashion. | 5876 in directly-tagged fashion. |
6053 | 5877 |
6054 | 5878 |
6055 @node Garbage Collection | 5879 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp |
6056 @section Garbage Collection | 5880 @section Garbage Collection |
6057 @cindex garbage collection | 5881 @cindex garbage collection |
6058 | 5882 |
6059 @cindex mark and sweep | 5883 @cindex mark and sweep |
6060 Garbage collection is simple in theory but tricky to implement. | 5884 Garbage collection is simple in theory but tricky to implement. |
6061 Emacs Lisp uses the oldest garbage collection method, called | 5885 Emacs Lisp uses the oldest garbage collection method, called |
6062 @dfn{mark and sweep}. Garbage collection begins by starting with | 5886 @dfn{mark and sweep}. Garbage collection begins by starting with |
6063 all accessible locations (i.e. all variables and other slots where | 5887 all accessible locations (i.e. all variables and other slots where |
6064 Lisp objects might occur) and recursively traversing all objects | 5888 Lisp objects might occur) and recursively traversing all objects |
6065 accessible from those slots, marking each one that is found. | 5889 accessible from those slots, marking each one that is found. |
6066 We then go through all of memory and free each object that is | 5890 We then go through all of memory and free each object that is |
6067 not marked, and unmark each object that is marked. Note | 5891 not marked, and unmark each object that is marked. Note |
6074 @code{garbage-collect} but is also called automatically by @code{eval}, | 5898 @code{garbage-collect} but is also called automatically by @code{eval}, |
6075 once a certain amount of memory has been allocated since the last | 5899 once a certain amount of memory has been allocated since the last |
6076 garbage collection (according to @code{gc-cons-threshold}). | 5900 garbage collection (according to @code{gc-cons-threshold}). |
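
The mark-and-sweep cycle described in this section can be sketched in a few lines of C. This is only a toy model under stated assumptions: @code{struct toy_obj}, @code{toy_alloc}, @code{toy_gc} and the explicit root array are hypothetical names invented for illustration and do not correspond to the real XEmacs data structures or to @code{garbage_collect_1}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy object; NOT the XEmacs Lisp_Object layout. */
struct toy_obj {
    int marked;                  /* set during the mark phase */
    struct toy_obj *car, *cdr;   /* outgoing references */
    struct toy_obj *next_alloc;  /* chain of every allocated object */
};

static struct toy_obj *all_objs;  /* everything the sweep must visit */
static size_t live_count;

struct toy_obj *toy_alloc(struct toy_obj *car, struct toy_obj *cdr)
{
    struct toy_obj *o = malloc(sizeof *o);
    if (o == NULL)
        abort();
    o->marked = 0;
    o->car = car;
    o->cdr = cdr;
    o->next_alloc = all_objs;
    all_objs = o;
    live_count++;
    return o;
}

/* Mark phase: recurse on car, iterate on cdr to limit stack depth. */
static void toy_mark(struct toy_obj *o)
{
    while (o != NULL && !o->marked) {
        o->marked = 1;
        toy_mark(o->car);
        o = o->cdr;
    }
}

/* One full collection; returns the number of objects freed. */
size_t toy_gc(struct toy_obj **roots, size_t nroots)
{
    size_t freed = 0;
    struct toy_obj **link = &all_objs;

    assert(roots != NULL || nroots == 0);
    for (size_t i = 0; i < nroots; i++)
        toy_mark(roots[i]);

    /* Sweep phase: free unmarked objects, unmark the survivors. */
    while (*link != NULL) {
        struct toy_obj *o = *link;
        if (o->marked) {
            o->marked = 0;
            link = &o->next_alloc;
        } else {
            *link = o->next_alloc;
            free(o);
            freed++;
            live_count--;
        }
    }
    return freed;
}

size_t toy_live_count(void)
{
    return live_count;
}
```

In the sketch the roots are passed in explicitly; the real collector has to discover them itself, which is one reason the actual implementation is considerably trickier than the theory.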
6077 | 5901 |
6078 | 5902 |
6079 @node GCPROing | 5903 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp |
6080 @section @code{GCPRO}ing | 5904 @section @code{GCPRO}ing |
6081 @cindex @code{GCPRO}ing | 5905 @cindex @code{GCPRO}ing |
6082 @cindex garbage collection protection | 5906 @cindex garbage collection protection |
6083 @cindex protection, garbage collection | 5907 @cindex protection, garbage collection |
6084 | 5908 |
6249 anything that looks like a reference to an object as a reference. This | 6073 anything that looks like a reference to an object as a reference. This |
6250 will result in a few objects not getting collected when they should, but | 6074 will result in a few objects not getting collected when they should, but |
6251 it obviates the need for @code{GCPRO}ing, and allows garbage collection | 6075 it obviates the need for @code{GCPRO}ing, and allows garbage collection |
6252 to happen at any point at all, such as during object allocation. | 6076 to happen at any point at all, such as during object allocation. |
6253 | 6077 |
6254 @node Garbage Collection - Step by Step | 6078 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp |
6255 @section Garbage Collection - Step by Step | 6079 @section Garbage Collection - Step by Step |
6256 @cindex garbage collection - step by step | 6080 @cindex garbage collection - step by step |
6257 | 6081 |
6258 @menu | 6082 @menu |
6259 * Invocation:: | 6083 * Invocation:: |
6260 * garbage_collect_1:: | 6084 * garbage_collect_1:: |
6261 * mark_object:: | 6085 * mark_object:: |
6262 * gc_sweep:: | 6086 * gc_sweep:: |
6263 * sweep_lcrecords_1:: | 6087 * sweep_lcrecords_1:: |
6264 * compact_string_chars:: | 6088 * compact_string_chars:: |
6265 * sweep_strings:: | 6089 * sweep_strings:: |
6266 * sweep_bit_vectors_1:: | 6090 * sweep_bit_vectors_1:: |
6267 @end menu | 6091 @end menu |
6268 | 6092 |
6269 @node Invocation | 6093 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step |
6270 @subsection Invocation | 6094 @subsection Invocation |
6271 @cindex garbage collection, invocation | 6095 @cindex garbage collection, invocation |
6272 | 6096 |
6273 The first thing that anyone should know about garbage collection is: | 6097 The first thing that anyone should know about garbage collection is: |
6274 when and how the garbage collector is invoked. One might think that this | 6098 when and how the garbage collector is invoked. One might think that this |
6324 everything related to @code{eval} (@code{Feval_buffer}, @code{call0}, | 6148 everything related to @code{eval} (@code{Feval_buffer}, @code{call0}, |
6325 ...) and inside @code{Fsignal}. The latter is used to handle signals, as | 6149 ...) and inside @code{Fsignal}. The latter is used to handle signals, as |
6326 for example the ones raised by every @code{QUIT}-macro triggered after | 6150 for example the ones raised by every @code{QUIT}-macro triggered after |
6327 pressing Ctrl-g. | 6151 pressing Ctrl-g. |
6328 | 6152 |
6329 @node garbage_collect_1 | 6153 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step |
6330 @subsection @code{garbage_collect_1} | 6154 @subsection @code{garbage_collect_1} |
6331 @cindex @code{garbage_collect_1} | 6155 @cindex @code{garbage_collect_1} |
6332 | 6156 |
6333 We can now describe exactly what happens after the invocation takes | 6157 We can now describe exactly what happens after the invocation takes |
6334 place. | 6158 place. |
6514 A small memory reserve is always held back that can be reached by | 6338 A small memory reserve is always held back that can be reached by |
6515 @code{breathing_space}. If nothing more is left, we create a new reserve | 6339 @code{breathing_space}. If nothing more is left, we create a new reserve |
6516 and exit. | 6340 and exit. |
6517 @end enumerate | 6341 @end enumerate |
6518 | 6342 |
6519 @node mark_object | 6343 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step |
6520 @subsection @code{mark_object} | 6344 @subsection @code{mark_object} |
6521 @cindex @code{mark_object} | 6345 @cindex @code{mark_object} |
6522 | 6346 |
6523 The first thing that is checked while marking an object is whether the | 6347 The first thing that is checked while marking an object is whether the |
6524 object is a real Lisp object @code{Lisp_Type_Record} or just an integer | 6348 object is a real Lisp object @code{Lisp_Type_Record} or just an integer |
6548 be performed. | 6372 be performed. |
6549 | 6373 |
6550 In case another object was returned, as mentioned before, we reiterate | 6374 In case another object was returned, as mentioned before, we reiterate |
6551 the whole @code{mark_object} process beginning with this next object. | 6375 the whole @code{mark_object} process beginning with this next object. |
6552 | 6376 |
6553 @node gc_sweep | 6377 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step |
6554 @subsection @code{gc_sweep} | 6378 @subsection @code{gc_sweep} |
6555 @cindex @code{gc_sweep} | 6379 @cindex @code{gc_sweep} |
6556 | 6380 |
6557 The job of this function is to free all unmarked records from memory. As | 6381 The job of this function is to free all unmarked records from memory. As |
6558 we know, there are different types of objects implemented and managed, and | 6382 we know, there are different types of objects implemented and managed, and |
6643 (by @code{UNMARK_...}). While going through one block, we note if the | 6467 (by @code{UNMARK_...}). While going through one block, we note if the |
6644 whole block is empty. If so, the whole block is freed (using | 6468 whole block is empty. If so, the whole block is freed (using |
6645 @code{xfree}) and the free list state is set to the state it had before | 6469 @code{xfree}) and the free list state is set to the state it had before |
6646 handling this block. | 6470 handling this block. |
6647 | 6471 |
6648 @node sweep_lcrecords_1 | 6472 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step |
6649 @subsection @code{sweep_lcrecords_1} | 6473 @subsection @code{sweep_lcrecords_1} |
6650 @cindex @code{sweep_lcrecords_1} | 6474 @cindex @code{sweep_lcrecords_1} |
6651 | 6475 |
6652 After nullifying the complete lcrecord statistics, we go over all | 6476 After nullifying the complete lcrecord statistics, we go over all |
6653 lcrecords two separate times. They are all chained together in a list with | 6477 lcrecords two separate times. They are all chained together in a list with |
6664 through the whole list. In case an object is read only or marked, it | 6488 through the whole list. In case an object is read only or marked, it |
6665 has to persist, otherwise it is manually freed by calling | 6489 has to persist, otherwise it is manually freed by calling |
6666 @code{xfree}. During this loop, the lcrecord statistics are kept up to | 6490 @code{xfree}. During this loop, the lcrecord statistics are kept up to |
6667 date by calling @code{tick_lcrecord_stats} with the right arguments. | 6491 date by calling @code{tick_lcrecord_stats} with the right arguments. |
6668 | 6492 |
6669 @node compact_string_chars | 6493 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step |
6670 @subsection @code{compact_string_chars} | 6494 @subsection @code{compact_string_chars} |
6671 @cindex @code{compact_string_chars} | 6495 @cindex @code{compact_string_chars} |
6672 | 6496 |
6673 The purpose of this function is to compact all the data parts of the | 6497 The purpose of this function is to compact all the data parts of the |
6674 strings that are held in so-called @code{string_chars_block}, i.e. the | 6498 strings that are held in so-called @code{string_chars_block}, i.e. the |
6710 @code{string_chars_block}, sitting in @code{current_string_chars_block}, | 6534 @code{string_chars_block}, sitting in @code{current_string_chars_block}, |
6711 is reset on the last block to which we moved a string, | 6535 is reset on the last block to which we moved a string, |
6712 i.e. @code{to_block}, and all remaining blocks (we know that they just | 6536 i.e. @code{to_block}, and all remaining blocks (we know that they just |
6713 carry garbage) are explicitly @code{xfree}d. | 6537 carry garbage) are explicitly @code{xfree}d. |
6714 | 6538 |
6715 @node sweep_strings | 6539 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step |
6716 @subsection @code{sweep_strings} | 6540 @subsection @code{sweep_strings} |
6717 @cindex @code{sweep_strings} | 6541 @cindex @code{sweep_strings} |
6718 | 6542 |
6719 The sweeping for the fixed-size string objects is essentially exactly | 6543 The sweeping for the fixed-size string objects is essentially exactly |
6720 the same as it is for all other fixed-size types. As before, the freeing | 6544 the same as it is for all other fixed-size types. As before, the freeing |
6731 addition: if the string was not allocated in a | 6555 addition: if the string was not allocated in a |
6732 @code{string_chars_block} because it exceeded the maximal length, and | 6556 @code{string_chars_block} because it exceeded the maximal length, and |
6733 was therefore @code{malloc}ed separately, we now also @code{xfree} | 6557 was therefore @code{malloc}ed separately, we now also @code{xfree} |
6734 it explicitly. | 6558 it explicitly. |
6735 | 6559 |
6736 @node sweep_bit_vectors_1 | 6560 @node sweep_bit_vectors_1, , sweep_strings, Garbage Collection - Step by Step |
6737 @subsection @code{sweep_bit_vectors_1} | 6561 @subsection @code{sweep_bit_vectors_1} |
6738 @cindex @code{sweep_bit_vectors_1} | 6562 @cindex @code{sweep_bit_vectors_1} |
6739 | 6563 |
6740 Bit vectors are also one of the rare types that are @code{malloc}ed | 6564 Bit vectors are also one of the rare types that are @code{malloc}ed |
6741 individually. Consequently, while sweeping, all further needless | 6565 individually. Consequently, while sweeping, all further needless |
6745 all unmarked bit vectors are unlinked by calling @code{xfree} and all of | 6569 all unmarked bit vectors are unlinked by calling @code{xfree} and all of |
6746 them become unmarked. | 6570 them become unmarked. |
6747 In addition, the bookkeeping information used for garbage | 6571 In addition, the bookkeeping information used for garbage |
6748 collector's output purposes is updated. | 6572 collector's output purposes is updated. |
6749 | 6573 |
6750 @node Integers and Characters | 6574 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp |
6751 @section Integers and Characters | 6575 @section Integers and Characters |
6752 @cindex integers and characters | 6576 @cindex integers and characters |
6753 @cindex characters, integers and | 6577 @cindex characters, integers and |
6754 | 6578 |
6755 Integer and character Lisp objects are created from integers using the | 6579 Integer and character Lisp objects are created from integers using the |
6761 | 6585 |
6762 @code{XSETINT()} and the like will truncate values given to them that | 6586 @code{XSETINT()} and the like will truncate values given to them that |
6763 are too big; i.e. you won't get the value you expected but the tag bits | 6587 are too big; i.e. you won't get the value you expected but the tag bits |
6764 will at least be correct. | 6588 will at least be correct. |
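
The truncation behavior can be illustrated with a toy tagging scheme. The sketch below assumes a 32-bit word with a 2-bit tag in the low bits; the names @code{toy_make_int}, @code{toy_is_int} and @code{toy_int_value} are hypothetical, and the real tag layout used by @code{XSETINT()} may differ. The point is only that an out-of-range value loses its high bits while the tag stays valid.

```c
#include <stdint.h>

/* Hypothetical layout: 30 payload bits above a 2-bit tag. */
#define TOY_TAG_BITS 2
#define TOY_TAG_INT  1u
#define TOY_TAG_MASK ((1u << TOY_TAG_BITS) - 1u)
#define TOY_PAYLOAD_BITS (32 - TOY_TAG_BITS)

typedef uint32_t toy_word;

/* Pack an integer; high bits that do not fit are silently dropped. */
toy_word toy_make_int(int32_t v)
{
    return ((uint32_t)v << TOY_TAG_BITS) | TOY_TAG_INT;
}

int toy_is_int(toy_word w)
{
    return (w & TOY_TAG_MASK) == TOY_TAG_INT;
}

/* Unpack, sign-extending the 30-bit payload without relying on
   implementation-defined shifts of negative values. */
int32_t toy_int_value(toy_word w)
{
    uint32_t payload = w >> TOY_TAG_BITS;
    int64_t v = payload;
    if (payload & (1u << (TOY_PAYLOAD_BITS - 1)))
        v -= (int64_t)1 << TOY_PAYLOAD_BITS;
    return (int32_t)v;
}
```

With this layout, a value such as 2^29 does not round-trip: it comes back as a negative number, yet @code{toy_is_int} still reports a well-formed integer word.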
6765 | 6589 |
6766 @node Allocation from Frob Blocks | 6590 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp |
6767 @section Allocation from Frob Blocks | 6591 @section Allocation from Frob Blocks |
6768 @cindex allocation from frob blocks | 6592 @cindex allocation from frob blocks |
6769 @cindex frob blocks, allocation from | 6593 @cindex frob blocks, allocation from |
6770 | 6594 |
6771 The uninitialized memory required by a @code{Lisp_Object} of a particular type | 6595 The uninitialized memory required by a @code{Lisp_Object} of a particular type |
6790 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the | 6614 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the |
6791 last frob block for space, and creates a new frob block if there is | 6615 last frob block for space, and creates a new frob block if there is |
6792 none. (There are actually two versions of these macros, one of which is | 6616 none. (There are actually two versions of these macros, one of which is |
6793 more defensive but less efficient and is used for error-checking.) | 6617 more defensive but less efficient and is used for error-checking.) |
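
A rough sketch of this allocation strategy follows: reuse a freed cell if one is available, otherwise take the next slot at the end of the newest block, creating a new block when that one is full. All names here (@code{frob_allocate}, @code{frob_release}, @code{FROB_CELLS_PER_BLOCK}) are hypothetical, the block size is deliberately tiny, and the real @code{ALLOCATE_FIXED_TYPE()} macros are per-type and more elaborate.

```c
#include <stdlib.h>

#define FROB_CELLS_PER_BLOCK 4  /* hypothetical; real blocks are larger */

/* A cell is either live payload or a link in the free list. */
union frob_cell {
    union frob_cell *next_free;
    long payload[2];
};

struct frob_block {
    struct frob_block *prev;
    union frob_cell cells[FROB_CELLS_PER_BLOCK];
};

static struct frob_block *current_block;
/* Starting "full" forces a block to be created on first use. */
static int current_index = FROB_CELLS_PER_BLOCK;
static union frob_cell *free_list;
static int blocks_created;

void *frob_allocate(void)
{
    /* 1. Prefer a previously freed cell. */
    if (free_list != NULL) {
        union frob_cell *c = free_list;
        free_list = c->next_free;
        return c;
    }
    /* 2. Otherwise use the end of the last block, adding one if full. */
    if (current_index == FROB_CELLS_PER_BLOCK) {
        struct frob_block *b = malloc(sizeof *b);
        if (b == NULL)
            abort();
        b->prev = current_block;
        current_block = b;
        current_index = 0;
        blocks_created++;
    }
    return &current_block->cells[current_index++];
}

/* Return a cell to the free list; the block itself stays allocated. */
void frob_release(void *p)
{
    union frob_cell *c = p;
    c->next_free = free_list;
    free_list = c;
}

int frob_blocks_created(void)
{
    return blocks_created;
}
```

Allocating many small objects this way costs one @code{malloc()} call per block rather than one per object.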
6794 | 6618 |
6795 @node lrecords | 6619 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp |
6796 @section lrecords | 6620 @section lrecords |
6797 @cindex lrecords | 6621 @cindex lrecords |
6798 | 6622 |
6799 [see @file{lrecord.h}] | 6623 [see @file{lrecord.h}] |
6800 | 6624 |
7030 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should | 6854 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should |
7031 simply return the object's size in bytes, exactly as you might expect. | 6855 simply return the object's size in bytes, exactly as you might expect. |
7032 For an example, see the methods for window configurations and opaques. | 6856 For an example, see the methods for window configurations and opaques. |
7033 @end enumerate | 6857 @end enumerate |
7034 | 6858 |
7035 @node Low-level allocation | 6859 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp |
7036 @section Low-level allocation | 6860 @section Low-level allocation |
7037 @cindex low-level allocation | 6861 @cindex low-level allocation |
7038 @cindex allocation, low-level | 6862 @cindex allocation, low-level |
7039 | 6863 |
7040 Memory that you want to allocate directly should be allocated using | 6864 Memory that you want to allocate directly should be allocated using |
7103 and bit-vector creation routines. These routines also call | 6927 and bit-vector creation routines. These routines also call |
7104 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps | 6928 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps |
7105 statistics on how much memory is allocated, so that garbage-collection | 6929 statistics on how much memory is allocated, so that garbage-collection |
7106 can be invoked when the threshold is reached. | 6930 can be invoked when the threshold is reached. |
7107 | 6931 |
7108 @node Cons | 6932 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp |
7109 @section Cons | 6933 @section Cons |
7110 @cindex cons | 6934 @cindex cons |
7111 | 6935 |
7112 Conses are allocated in standard frob blocks. The only thing to | 6936 Conses are allocated in standard frob blocks. The only thing to |
7113 note is that conses can be explicitly freed using @code{free_cons()} | 6937 note is that conses can be explicitly freed using @code{free_cons()} |
7118 generating extra objects and thereby triggering GC sooner. | 6942 generating extra objects and thereby triggering GC sooner. |
7119 However, you have to be @emph{extremely} careful when doing this. | 6943 However, you have to be @emph{extremely} careful when doing this. |
7120 If you mess this up, you will get BADLY BURNED, and it has happened | 6944 If you mess this up, you will get BADLY BURNED, and it has happened |
7121 before. | 6945 before. |
7122 | 6946 |
7123 @node Vector | 6947 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp |
7124 @section Vector | 6948 @section Vector |
7125 @cindex vector | 6949 @cindex vector |
7126 | 6950 |
7127 As mentioned above, each vector is @code{malloc()}ed individually, and | 6951 As mentioned above, each vector is @code{malloc()}ed individually, and |
7128 all are threaded through the variable @code{all_vectors}. Vectors are | 6952 all are threaded through the variable @code{all_vectors}. Vectors are |
7130 Note that the @code{struct Lisp_Vector} is declared with its | 6954 Note that the @code{struct Lisp_Vector} is declared with its |
7131 @code{contents} field being a @emph{stretchy} array of one element. It | 6955 @code{contents} field being a @emph{stretchy} array of one element. It |
7132 is actually @code{malloc()}ed with the right size, however, and access | 6956 is actually @code{malloc()}ed with the right size, however, and access |
7133 to any element through the @code{contents} array works fine. | 6957 to any element through the @code{contents} array works fine. |
7134 | 6958 |
7135 @node Bit Vector | 6959 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp |
7136 @section Bit Vector | 6960 @section Bit Vector |
7137 @cindex bit vector | 6961 @cindex bit vector |
7138 @cindex vector, bit | 6962 @cindex vector, bit |
7139 | 6963 |
7140 Bit vectors work exactly like vectors, except for more complicated | 6964 Bit vectors work exactly like vectors, except for more complicated |
7142 vectors are lrecords while vectors are not. (The only difference here is | 6966 vectors are lrecords while vectors are not. (The only difference here is |
7143 that there's an lrecord implementation pointer at the beginning and the | 6967 that there's an lrecord implementation pointer at the beginning and the |
7144 tag field in bit vector Lisp words is ``lrecord'' rather than | 6968 tag field in bit vector Lisp words is ``lrecord'' rather than |
7145 ``vector''.) | 6969 ``vector''.) |
7146 | 6970 |
7147 @node Symbol | 6971 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp |
7148 @section Symbol | 6972 @section Symbol |
7149 @cindex symbol | 6973 @cindex symbol |
7150 | 6974 |
7151 Symbols are also allocated in frob blocks. Symbols in the awful | 6975 Symbols are also allocated in frob blocks. Symbols in the awful |
7152 horrible obarray structure are chained through their @code{next} field. | 6976 horrible obarray structure are chained through their @code{next} field. |
7153 | 6977 |
7154 Remember that @code{intern} looks up a symbol in an obarray, creating | 6978 Remember that @code{intern} looks up a symbol in an obarray, creating |
7155 one if necessary. | 6979 one if necessary. |
7156 | 6980 |
7157 @node Marker | 6981 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp |
7158 @section Marker | 6982 @section Marker |
7159 @cindex marker | 6983 @cindex marker |
7160 | 6984 |
7161 Markers are allocated in frob blocks, as usual. They are kept | 6985 Markers are allocated in frob blocks, as usual. They are kept |
7162 in a buffer unordered, but in a doubly-linked list so that they | 6986 in a buffer unordered, but in a doubly-linked list so that they |
7164 but in some cases garbage collection took an extraordinarily | 6988 but in some cases garbage collection took an extraordinarily |
7165 long time due to the O(N^2) time required to remove lots of | 6989 long time due to the O(N^2) time required to remove lots of |
7166 markers from a buffer.) Markers are removed from a buffer in | 6990 markers from a buffer.) Markers are removed from a buffer in |
7167 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. | 6991 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. |
7168 | 6992 |
7169 @node String | 6993 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp |
7170 @section String | 6994 @section String |
7171 @cindex string | 6995 @cindex string |
7172 | 6996 |
7173 As mentioned above, strings are a special case. A string is logically | 6997 As mentioned above, strings are a special case. A string is logically |
7174 two parts, a fixed-size object (containing the length, property list, | 6998 two parts, a fixed-size object (containing the length, property list, |
7226 string data (which would normally be obtained from the now-non-existent | 7050 string data (which would normally be obtained from the now-non-existent |
7227 @code{struct Lisp_String}) at the beginning of the dead string data gap. | 7051 @code{struct Lisp_String}) at the beginning of the dead string data gap. |
7228 The string compactor recognizes this special 0xFFFFFFFF marker and | 7052 The string compactor recognizes this special 0xFFFFFFFF marker and |
7229 handles it correctly. | 7053 handles it correctly. |
7230 | 7054 |
7231 @node Compiled Function | 7055 @node Compiled Function, , String, Allocation of Objects in XEmacs Lisp |
7232 @section Compiled Function | 7056 @section Compiled Function |
7233 @cindex compiled function | 7057 @cindex compiled function |
7234 @cindex function, compiled | 7058 @cindex function, compiled |
7235 | 7059 |
7236 Not yet documented. | 7060 Not yet documented. |
7238 | 7062 |
7239 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top | 7063 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top |
7240 @chapter Dumping | 7064 @chapter Dumping |
7241 @cindex dumping | 7065 @cindex dumping |
7242 | 7066 |
7243 @section What is dumping and its justification | 7067 @menu |
7244 @cindex dumping and its justification, what is | 7068 * Dumping Justification:: |
7069 * Overview:: | |
7070 * Data descriptions:: | |
7071 * Dumping phase:: | |
7072 * Reloading phase:: | |
7073 * Remaining issues:: | |
7074 @end menu | |
7075 | |
7076 @node Dumping Justification, Overview, Dumping, Dumping | |
7077 @section Dumping Justification | |
7078 @cindex dumping, justification | |
7245 | 7079 |
7246 The C code of XEmacs is just a Lisp engine with a lot of built-in | 7080 The C code of XEmacs is just a Lisp engine with a lot of built-in |
7247 primitives useful for writing an editor. The editor itself is written | 7081 primitives useful for writing an editor. The editor itself is written |
7248 mostly in Lisp, and represents around 100K lines of code. Loading and | 7082 mostly in Lisp, and represents around 100K lines of code. Loading and |
7249 executing the initialization of all this code takes a bit of time (five | 7083 executing the initialization of all this code takes a bit of time (five |
7261 This solution, while working, has a huge problem: the creation of the | 7095 This solution, while working, has a huge problem: the creation of the |
7262 new executable from the actual contents of memory is an extremely | 7096 new executable from the actual contents of memory is an extremely |
7263 system-specific process, quite error-prone, and which interferes with a | 7097 system-specific process, quite error-prone, and which interferes with a |
7264 lot of system libraries (like malloc). It is even getting worse | 7098 lot of system libraries (like malloc). It is even getting worse |
7265 nowadays with libraries using constructors which are automatically | 7099 nowadays with libraries using constructors which are automatically |
7266 called when the program is started (even before main()) which tend to | 7100 called when the program is started (even before @code{main()}) which tend to |
7267 crash when they are called multiple times, once before dumping and once | 7101 crash when they are called multiple times, once before dumping and once |
7268 after (IRIX 6.x libz.so pulls in some C++ image libraries thru | 7102 after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries thru |
7269 dependencies which have this problem). Writing the dumper is also one | 7103 dependencies which have this problem). Writing the dumper is also one |
7270 of the most difficult parts of porting XEmacs to a new operating system. | 7104 of the most difficult parts of porting XEmacs to a new operating system. |
7271 Basically, `dumping' is an operation that is just not officially | 7105 Basically, `dumping' is an operation that is just not officially |
7272 supported on many operating systems. | 7106 supported on many operating systems. |
7273 | 7107 |
7274 The aim of the portable dumper is to solve the same problem as the | 7108 The aim of the portable dumper is to solve the same problem as the |
7275 system-specific dumper, that is to be able to reload quickly, using only | 7109 system-specific dumper, that is to be able to reload quickly, using only |
7276 a small number of files, the fully initialized lisp part of the editor, | 7110 a small number of files, the fully initialized lisp part of the editor, |
7277 without any system-specific hacks. | 7111 without any system-specific hacks. |
7278 | 7112 |
7279 @menu | 7113 @node Overview, Data descriptions, Dumping Justification, Dumping |
7280 * Overview:: | |
7281 * Data descriptions:: | |
7282 * Dumping phase:: | |
7283 * Reloading phase:: | |
7284 * Remaining issues:: | |
7285 @end menu | |
7286 | |
7287 @node Overview | |
7288 @section Overview | 7114 @section Overview |
7289 @cindex dumping overview | 7115 @cindex dumping overview |
7290 | 7116 |
7291 The portable dumping system has to: | 7117 The portable dumping system has to: |
7292 | 7118 |
7293 @enumerate | 7119 @enumerate |
7294 @item | 7120 @item |
7295 At dump time, write all initialized, non-quickly-rebuildable data to a | 7121 At dump time, write all initialized, non-quickly-rebuildable data to a |
7296 file [Note: currently named @file{xemacs.dmp}, but the name will | 7122 file [Note: currently named @file{xemacs.dmp}, but the name will |
7297 change], along with all informations needed for the reloading. | 7123 change], along with all information needed for the reloading. |
7298 | 7124 |
7299 @item | 7125 @item |
7300 When starting xemacs, reload the dump file, relocate it to its new | 7126 When starting xemacs, reload the dump file, relocate it to its new |
7301 starting address if needed, and reinitialize all pointers to this | 7127 starting address if needed, and reinitialize all pointers to this |
7302 data. Also, rebuild all the quickly rebuildable data. | 7128 data. Also, rebuild all the quickly rebuildable data. |
7303 @end enumerate | 7129 @end enumerate |
7304 | 7130 |
7305 Note: As of 21.5.18, the dump file has been moved inside of the | 7131 Note: As of 21.5.18, the dump file has been moved inside of the |
7306 executable, although there are still problems with this on some systems. | 7132 executable, although there are still problems with this on some systems. |
7307 | 7133 |
7308 @node Data descriptions | 7134 @node Data descriptions, Dumping phase, Overview, Dumping |
7309 @section Data descriptions | 7135 @section Data descriptions |
7310 @cindex dumping data descriptions | 7136 @cindex dumping data descriptions |
7311 | 7137 |
7312 The more complex task of the dumper is to be able to write lisp objects | 7138 The more complex task of the dumper is to be able to write memory blocks |
7313 (lrecords) and C structs to disk and reload them at a different address, | 7139 on the heap (lisp objects, i.e. lrecords, and C-allocated memory, such |
7140 as structs and arrays) to disk and reload them at a different address, | |
7314 updating all the pointers they include in the process. This is done by | 7141 updating all the pointers they include in the process. This is done by |
7315 using external data descriptions that give information about the layout | 7142 using external data descriptions that give information about the layout |
7316 of the structures in memory. | 7143 of the blocks in memory. |
7317 | 7144 |
7318 The specification of these descriptions is in lrecord.h. A description | 7145 The specification of these descriptions is in lrecord.h. A description |
7319 of an lrecord is an array of struct lrecord_description. Each of these | 7146 of an lrecord is an array of struct memory_description. Each of these |
7320 structs includes a type, an offset in the structure and some optional | 7147 structs includes a type, an offset in the block and some optional |
7321 parameters depending on the type. For instance, here is the string | 7148 parameters depending on the type. For instance, here is the string |
7322 description: | 7149 description: |
7323 | 7150 |
7324 @example | 7151 @example |
7325 static const struct lrecord_description string_description[] = @{ | 7152 static const struct memory_description string_description[] = @{ |
7326 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @}, | 7153 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @}, |
7327 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @}, | 7154 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @}, |
7328 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @}, | 7155 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @}, |
7329 @{ XD_END @} | 7156 @{ XD_END @} |
7330 @}; | 7157 @}; |
7337 in the 0th line of the description (welcome to C) plus one". The third | 7164 in the 0th line of the description (welcome to C) plus one". The third |
7338 line means "there is a Lisp_Object member @code{plist} in the Lisp_String | 7165 line means "there is a Lisp_Object member @code{plist} in the Lisp_String |
7339 structure". @code{XD_END} then ends the description. | 7166 structure". @code{XD_END} then ends the description. |
7340 | 7167 |
7341 This gives us all the information we need to move around what is pointed | 7168 This gives us all the information we need to move around what is pointed |
7342 to by a structure (C or lrecord) and, by transitivity, everything that | 7169 to by a memory block (C or lrecord) and, by transitivity, everything |
7343 it points to. The only missing information for dumping is the size of | 7170 that it points to. The only missing information for dumping is the size |
7344 the structure. For lrecords, this is part of the | 7171 of the block. For lrecords, this is part of the |
7345 lrecord_implementation, so we don't need to duplicate it. For C | 7172 lrecord_implementation, so we don't need to duplicate it. For C blocks |
7346 structures we use a struct struct_description, which includes a size | 7173 we use a struct sized_memory_description, which includes a size field |
7347 field and a pointer to an associated array of lrecord_description. | 7174 and a pointer to an associated array of memory_description. |
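To make the idea concrete, here is a minimal sketch of how a description table can drive a generic walk over a block, adjusting every pointer it contains. Everything in it is hypothetical: the @code{toy_} names, the reduced set of entry types and the struct layout are stand-ins for illustration only, and the real definitions in lrecord.h are considerably richer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced set of entry types; not the real XD_ codes. */
enum toy_type { TOY_END, TOY_COUNT, TOY_OBJECT_PTR, TOY_OPAQUE_PTR };

/* One line of a description: what kind of field sits at which offset. */
struct toy_description
{
  enum toy_type type;
  size_t offset;
};

/* A description plus the size of the block it describes, in the spirit
   of the sized description used for C blocks. */
struct toy_sized_description
{
  size_t size;
  const struct toy_description *description;
};

/* Hypothetical block layout, loosely modelled on a string object. */
struct toy_string
{
  size_t size;
  char *data;
  void *plist;
};

static const struct toy_description toy_string_description[] = {
  { TOY_COUNT,      offsetof (struct toy_string, size) },
  { TOY_OPAQUE_PTR, offsetof (struct toy_string, data) },
  { TOY_OBJECT_PTR, offsetof (struct toy_string, plist) },
  { TOY_END, 0 }
};

static const struct toy_sized_description toy_string_sized = {
  sizeof (struct toy_string), toy_string_description
};

/* Walk BLOCK according to DESC and add DELTA (in bytes) to every
   non-null pointer field.  Returns the number of pointers adjusted.
   memcpy is used to read and write the fields so that the walker does
   not need to know their declared pointer types. */
static int
toy_relocate_block (void *block, const struct toy_description *desc,
                    ptrdiff_t delta)
{
  int adjusted = 0;
  assert (block != NULL && desc != NULL);
  for (; desc->type != TOY_END; desc++)
    {
      if (desc->type == TOY_OBJECT_PTR || desc->type == TOY_OPAQUE_PTR)
        {
          char *field = (char *) block + desc->offset;
          void *value;
          memcpy (&value, field, sizeof value);
          if (value != NULL)
            {
              value = (char *) value + delta;
              memcpy (field, &value, sizeof value);
              adjusted++;
            }
        }
    }
  return adjusted;
}
```

The walker never looks at the struct declaration itself; the table is the only source of layout information, which is the property the dumper relies on.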
7348 | 7175 |
7349 @node Dumping phase | 7176 @node Dumping phase, Reloading phase, Data descriptions, Dumping |
7350 @section Dumping phase | 7177 @section Dumping phase |
7351 @cindex dumping phase | 7178 @cindex dumping phase |
7352 | 7179 |
7353 Dumping is done by calling the function pdump() (in dumper.c) which is | 7180 Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}) which is |
7354 invoked from Fdump_emacs (in emacs.c). This function performs a number | 7181 invoked from Fdump_emacs (in @file{emacs.c}). This function performs a number |
7355 of tasks. | 7182 of tasks. |
7356 | 7183 |
7357 @menu | 7184 @menu |
7358 * Object inventory:: | 7185 * Object inventory:: |
7359 * Address allocation:: | 7186 * Address allocation:: |
7360 * The header:: | 7187 * The header:: |
7361 * Data dumping:: | 7188 * Data dumping:: |
7362 * Pointers dumping:: | 7189 * Pointers dumping:: |
7363 @end menu | 7190 @end menu |
7364 | 7191 |
7365 @node Object inventory | 7192 @node Object inventory, Address allocation, Dumping phase, Dumping phase |
7366 @subsection Object inventory | 7193 @subsection Object inventory |
7367 @cindex dumping object inventory | 7194 @cindex dumping object inventory |
7195 @cindex memory blocks | |
7368 | 7196 |
7369 The first task is to build the list of the objects to dump. This | 7197 The first task is to build the list of the objects to dump. This |
7370 includes: | 7198 includes: |
7371 | 7199 |
7372 @itemize @bullet | 7200 @itemize @bullet |
7373 @item lisp objects | 7201 @item lisp objects |
7374 @item C structures | 7202 @item other memory blocks (C structures, arrays, etc.) |
7375 @end itemize | 7203 @end itemize |
7376 | 7204 |
7377 We end up with one @code{pdump_entry_list_elmt} per object group (arrays | 7205 We end up with one @code{pdump_block_list_elt} per object group (arrays |
7378 of C structs are kept together) which includes a pointer to the first | 7206 of C structs are kept together) which includes a pointer to the first |
7379 object of the group, the per-object size and the count of objects in the | 7207 object of the group, the per-object size and the count of objects in the |
7380 group, along with some other information which is initialized later. | 7208 group, along with some other information which is initialized later. |
7381 | 7209 |
7382 These entries are linked together in @code{pdump_entry_list} structures | 7210 These entries are linked together in @code{pdump_block_list} structures |
7383 and can be enumerated thru either: | 7211 and can be enumerated thru either: |
7384 | 7212 |
7385 @enumerate | 7213 @enumerate |
7386 @item | 7214 @item |
7387 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one | 7215 the @code{pdump_object_table}, an array of @code{pdump_block_list}, one |
7388 per lrecord type, indexed by type number. | 7216 per lrecord type, indexed by type number. |
7389 | 7217 |
7390 @item | 7218 @item |
7391 the @code{pdump_opaque_data_list}, used for the opaque data which does | 7219 the @code{pdump_opaque_data_list}, used for the opaque data which does |
7392 not include pointers, and hence does not need descriptions. | 7220 not include pointers, and hence does not need descriptions. |
7393 | 7221 |
7394 @item | 7222 @item |
7395 the @code{pdump_struct_table}, which is a vector of | 7223 the @code{pdump_desc_table}, which is a vector of |
7396 @code{struct_description}/@code{pdump_entry_list} pairs, used for | 7224 @code{memory_description}/@code{pdump_block_list} pairs, used for |
7397 non-opaque C structures. | 7225 non-opaque C memory blocks. |
7398 @end enumerate | 7226 @end enumerate |
7399 | 7227 |
7400 This uses a marking strategy similar to the garbage collector. Some | 7228 This uses a marking strategy similar to the garbage collector. Some |
7401 differences though: | 7229 differences though: |
7402 | 7230 |
7403 @enumerate | 7231 @enumerate |
7404 @item | 7232 @item |
7405 We do not use the mark bit (which does not exist for C structures | 7233 We do not use the mark bit (which does not exist for generic memory blocks |
7406 anyway); we use a big hash table instead. | 7234 anyway); we use a big hash table instead. |
7407 | 7235 |
7408 @item | 7236 @item |
7409 We do not use the mark function of lrecords but instead rely on the | 7237 We do not use the mark function of lrecords but instead rely on the |
7410 external descriptions. This happens essentially because we need to | 7238 external descriptions. This happens essentially because we need to |
7411 follow pointers to C structures and opaque data in addition to | 7239 follow pointers to generic memory blocks and opaque data in addition to |
7412 Lisp_Object members. | 7240 Lisp_Object members. |
7413 @end enumerate | 7241 @end enumerate |
7414 | 7242 |
7415 This is done by @code{pdump_register_object()}, which handles Lisp_Object | 7243 This is done by @code{pdump_register_object()}, which handles |
7416 variables, and @code{pdump_register_struct()} which handles C structures, | 7244 Lisp_Object variables, and @code{pdump_register_block()} which handles |
7417 which both delegate the description management to @code{pdump_register_sub()}. | 7245 generic memory blocks (C structures, arrays, etc.), which both delegate |
7418 | 7246 the description management to @code{pdump_register_sub()}. |
7419 The hash table doubles as a map object to pdump_entry_list_elmt (i.e. | 7247 |
7420 allows us to look up a pdump_entry_list_elmt with the object it points | 7248 The hash table doubles as a map object to pdump_block_list_elmt (i.e. |
7421 to). Entries are added with @code{pdump_add_entry()} and looked up with | 7249 allows us to look up a pdump_block_list_elmt with the object it points |
7422 @code{pdump_get_entry()}. There is no need for entry removal. The hash | 7250 to). Entries are added with @code{pdump_add_block()} and looked up with |
7251 @code{pdump_get_block()}. There is no need for entry removal. The hash | |
7423 value is computed quite simply from the object pointer by | 7252 value is computed quite simply from the object pointer by |
7424 @code{pdump_make_hash()}. | 7253 @code{pdump_make_hash()}. |
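A pointer-keyed table of this kind can be sketched as follows. This is only an illustration under stated assumptions: the @code{toy_} names, the table size, the hash function and the use of linear probing are all hypothetical and are not taken from dumper.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 1024   /* hypothetical; must be a power of two */

/* Hypothetical stand-in for a block list element. */
struct toy_block_elt
{
  const void *obj;   /* first object of the group */
  size_t size;       /* per-object size */
  int count;         /* number of objects in the group */
};

static struct toy_block_elt *toy_hash[TOY_HASH_SIZE];
static size_t toy_hash_used;

/* Hash an address: drop the low bits, which carry little information
   because objects are aligned, and keep the table-index bits. */
static size_t
toy_make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) & (TOY_HASH_SIZE - 1));
}

/* Return the entry registered for OBJ, or NULL if there is none. */
static struct toy_block_elt *
toy_get_block (const void *obj)
{
  size_t pos = toy_make_hash (obj);
  assert (obj != NULL);
  while (toy_hash[pos] != NULL)
    {
      if (toy_hash[pos]->obj == obj)
        return toy_hash[pos];
      pos = (pos + 1) & (TOY_HASH_SIZE - 1);
    }
  return NULL;
}

/* Register ELT under the address of its object.  Entries are never
   removed, so plain linear probing needs no deletion markers.  One
   slot is always kept free so that lookups terminate. */
static void
toy_add_block (struct toy_block_elt *elt)
{
  size_t pos = toy_make_hash (elt->obj);
  assert (toy_hash_used < TOY_HASH_SIZE - 1);
  while (toy_hash[pos] != NULL)
    pos = (pos + 1) & (TOY_HASH_SIZE - 1);
  toy_hash[pos] = elt;
  toy_hash_used++;
}
```

Because the same table answers both "has this object been seen?" and "where is its entry?", the marking pass needs no separate mark bit.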
7425 | 7254 |
7426 The roots for the marking are: | 7255 The roots for the marking are: |
7427 | 7256 |
7428 @enumerate | 7257 @enumerate |
7429 @item | 7258 @item |
7430 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()} | 7259 the @code{staticpro}'ed variables (there is a special |
7431 call for protected variables we do not want to dump). | 7260 @code{staticpro_nodump()} call for protected variables we do not want to |
7432 | 7261 dump). |
7433 @item | 7262 |
7434 the variables registered via @code{dump_add_root_object} | 7263 @item |
7264 the Lisp_Object variables registered via @code{dump_add_root_lisp_object} | |
7435 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} + | 7265 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} + |
7436 @code{dump_add_root_object()}). | 7266 @code{dump_add_root_lisp_object()}). |
7437 | 7267 |
7438 @item | 7268 @item |
7439 the variables registered via @code{dump_add_root_struct_ptr}, each of | 7269 the data-segment memory blocks registered via @code{dump_add_root_block} |
7440 which points to a C structure. | 7270 (for blocks with relocatable pointers), or @code{dump_add_opaque} (for |
7271 "opaque" blocks with no relocatable pointers; this is just a shortcut | |
7272 for calling @code{dump_add_root_block} with a NULL description). | |
7273 | |
7274 @item | |
7275 the pointer variables registered via @code{dump_add_root_block_ptr}, | |
7276 each of which points to a block of heap memory (generally a C structure | |
7277 or array). Note that @code{dump_add_root_block_ptr} is not technically | |
7278 necessary, as a pointer variable can be seen as a special case of a | |
7279 data-segment memory block and registered using | |
7280 @code{dump_add_root_block}. Doing it this way, however, would require | |
7281 another level of static structures declared. Since pointer variables | |
7282 are quite common, @code{dump_add_root_block_ptr} is provided for | |
7283 convenience. Note also that internally we have to treat it separately | |
7284 from @code{dump_add_root_block} rather than writing the former as a call | |
7285 to the latter, since we don't have support for creating and using memory | |
7286 descriptions on the fly -- they must all be statically declared in the | |
7287 data-segment. | |
7441 @end enumerate | 7288 @end enumerate |
7442 | 7289 |
7443 This does not include the GCPRO'ed variables, the specbinds, the | 7290 This does not include the GCPRO'ed variables, the specbinds, the |
7444 catchtags, the backlist, the redisplay or the profiling info, since we | 7291 catchtags, the backlist, the redisplay or the profiling info, since we |
7445 do not want to rebuild the actual chain of lisp calls which end up to | 7292 do not want to rebuild the actual chain of lisp calls which end up to |
7447 | 7294 |
7448 Weak lists and weak hash tables are dumped as if they were their | 7295 Weak lists and weak hash tables are dumped as if they were their |
7449 non-weak equivalent (without changing their type, of course). This has | 7296 non-weak equivalent (without changing their type, of course). This has |
7450 not yet been a problem. | 7297 not yet been a problem. |
7451 | 7298 |
7452 @node Address allocation | 7299 @node Address allocation, The header, Object inventory, Dumping phase |
7453 @subsection Address allocation | 7300 @subsection Address allocation |
7454 @cindex dumping address allocation | 7301 @cindex dumping address allocation |
7455 | 7302 |
7456 | 7303 |
7457 The next step is to allocate the offsets of each of the objects in the | 7304 The next step is to allocate the offsets of each of the objects in the |
7476 @end enumerate | 7323 @end enumerate |
7477 | 7324 |
7478 Hence, for each lrecord type, C struct type or opaque data block the | 7325 Hence, for each lrecord type, C struct type or opaque data block the |
7479 alignment requirement is computed as a power of two, with a minimum of | 7326 alignment requirement is computed as a power of two, with a minimum of |
7480 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the | 7327 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the |
7481 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements | 7328 @code{pdump_block_list_elmt}'s, the ones with the highest requirements |
7482 first. This ensures the best packing. | 7329 first. This ensures the best packing. |
7483 | 7330 |
7484 The maximum alignment requirement we take into account is 2^8. | 7331 The maximum alignment requirement we take into account is 2^8. |
7485 | 7332 |
7486 @code{pdump_allocate_offset()} only has to do a linear allocation, | 7333 @code{pdump_allocate_offset()} only has to do a linear allocation, |
7487 starting at offset 256 (this leaves room for the header and keeps the | 7334 starting at offset 256 (this leaves room for the header and keeps the |
7488 alignments happy). | 7335 alignments happy). |
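The allocation strategy can be sketched in a few lines. The names and the group structure below are hypothetical, and the sketch sorts the groups with @code{qsort()} instead of scanning them alignment class by alignment class; only the base offset of 256 and the 2^8 cap come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BASE_OFFSET 256   /* room for the header, as described above */
#define TOY_MAX_ALIGN_LOG2 8  /* largest alignment taken into account */

/* Hypothetical stand-in for one group of objects to dump. */
struct toy_group
{
  size_t size;      /* total size of the group in bytes */
  int align_log2;   /* alignment requirement, as a power of two */
  size_t offset;    /* filled in by toy_allocate_offsets() */
};

/* qsort() comparator: highest alignment requirement first. */
static int
toy_by_alignment_desc (const void *a, const void *b)
{
  const struct toy_group *ga = (const struct toy_group *) a;
  const struct toy_group *gb = (const struct toy_group *) b;
  return gb->align_log2 - ga->align_log2;
}

/* Round OFFSET up to the next multiple of 2^ALIGN_LOG2. */
static size_t
toy_align_up (size_t offset, int align_log2)
{
  size_t mask = ((size_t) 1 << align_log2) - 1;
  return (offset + mask) & ~mask;
}

/* Assign a file offset to each of the N groups, most strictly aligned
   first, allocating linearly from TOY_BASE_OFFSET.  Reorders GROUPS
   and returns the offset just past the last group. */
static size_t
toy_allocate_offsets (struct toy_group *groups, size_t n)
{
  size_t cur = TOY_BASE_OFFSET;
  size_t i;
  if (n == 0)
    return cur;
  assert (groups != NULL);
  qsort (groups, n, sizeof groups[0], toy_by_alignment_desc);
  for (i = 0; i < n; i++)
    {
      assert (groups[i].align_log2 >= 0
              && groups[i].align_log2 <= TOY_MAX_ALIGN_LOG2);
      cur = toy_align_up (cur, groups[i].align_log2);
      groups[i].offset = cur;
      cur += groups[i].size;
    }
  return cur;
}
```

Starting at 256, itself a multiple of the largest alignment considered, means the first and most demanding group never needs padding.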
7489 | 7336 |
7490 @node The header | 7337 @node The header, Data dumping, Address allocation, Dumping phase |
7491 @subsection The header | 7338 @subsection The header |
7492 @cindex dumping, the header | 7339 @cindex dumping, the header |
7493 | 7340 |
7494 The next step creates the file and writes a header with a signature and | 7341 The next step creates the file and writes a header with a signature and |
7495 some random information in it. The @code{reloc_address} field, which | 7342 some random information in it. The @code{reloc_address} field, which |
7496 indicates at which address the file should be loaded if we want to avoid | 7343 indicates at which address the file should be loaded if we want to avoid |
7497 post-reload relocation, is set to 0. It then seeks to offset 256 (base | 7344 post-reload relocation, is set to 0. It then seeks to offset 256 (base |
7498 offset for the objects). | 7345 offset for the objects). |
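As a rough illustration of this step, the sketch below writes a small header and leaves the file position at offset 256. The header layout, the signature string and the function name are all hypothetical; only the zeroed @code{reloc_address} field and the seek to offset 256 are taken from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_BASE_OFFSET 256          /* base offset for the objects */
#define TOY_SIGNATURE   "ToyDump1"   /* hypothetical 8-byte signature */

/* Hypothetical header layout; the real header carries more fields. */
struct toy_header
{
  char signature[8];
  unsigned long reloc_address;   /* 0: preferred load address unknown */
  unsigned long n_blocks;        /* stands in for the other information */
};

/* Write the header at the start of F and leave the file position at
   TOY_BASE_OFFSET, ready for the object data.  Returns 0 on success
   and -1 on any I/O error. */
static int
toy_write_header (FILE *f, unsigned long n_blocks)
{
  struct toy_header h;
  assert (f != NULL);
  assert (sizeof h <= TOY_BASE_OFFSET);
  memset (&h, 0, sizeof h);
  memcpy (h.signature, TOY_SIGNATURE, sizeof h.signature);
  h.reloc_address = 0;
  h.n_blocks = n_blocks;
  if (fseek (f, 0L, SEEK_SET) != 0)
    return -1;
  if (fwrite (&h, sizeof h, 1, f) != 1)
    return -1;
  if (fseek (f, (long) TOY_BASE_OFFSET, SEEK_SET) != 0)
    return -1;
  return 0;
}
```

Seeking past the header rather than padding it explicitly keeps the writer simple; the gap is filled in once data is written beyond it.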
7499 | 7346 |
7500 @node Data dumping | 7347 @node Data dumping, Pointers dumping, The header, Dumping phase |
7501 @subsection Data dumping | 7348 @subsection Data dumping |
7502 @cindex data dumping | 7349 @cindex data dumping |
7503 @cindex dumping, data | 7350 @cindex dumping, data |
7504 | 7351 |
7505 The data is dumped in the same order as the addresses were allocated by | 7352 The data is dumped in the same order as the addresses were allocated by |
7509 Allocation, and writes it to the file. Using the same order means that, | 7356 Allocation, and writes it to the file. Using the same order means that, |
7510 if we are careful with lrecords whose size is not a multiple of 4, we | 7357 if we are careful with lrecords whose size is not a multiple of 4, we |
7511 are ensured that the object is always written at the offset in the file | 7358 are ensured that the object is always written at the offset in the file |
7512 allocated in step Address Allocation. | 7359 allocated in step Address Allocation. |
7513 | 7360 |
7514 @node Pointers dumping | 7361 @node Pointers dumping, , Data dumping, Dumping phase |
7515 @subsection Pointers dumping | 7362 @subsection Pointers dumping |
7516 @cindex pointers dumping | 7363 @cindex pointers dumping |
7517 @cindex dumping, pointers | 7364 @cindex dumping, pointers |
7518 | 7365 |
7519 A bunch of tables needed to properly reassign the global pointers are | 7366 A bunch of tables needed to properly reassign the global pointers are |
7520 then written. They are: | 7367 then written. They are: |
7521 | 7368 |
7522 @enumerate | 7369 @enumerate |
7523 @item | 7370 @item |
7524 the pdump_root_struct_ptrs dynarr | 7371 the pdump_root_block_ptrs dynarr |
7525 @item | 7372 @item |
7526 the pdump_opaques dynarr | 7373 the pdump_opaques dynarr |
7527 @item | 7374 @item |
7528 a vector of all the offsets to the objects in the file that include a | 7375 a vector of all the offsets to the objects in the file that include a |
7529 description (for faster relocation at reload time) | 7376 description (for faster relocation at reload time) |
7544 reason why they are not used as roots for the purpose of object | 7391 reason why they are not used as roots for the purpose of object |
7545 enumeration. | 7392 enumeration. |
7546 | 7393 |
7547 Some very important information like the @code{staticpros} and | 7394 Some very important information like the @code{staticpros} and |
7548 @code{lrecord_implementations_table} are handled indirectly using | 7395 @code{lrecord_implementations_table} are handled indirectly using |
7549 @code{dump_add_opaque} or @code{dump_add_root_struct_ptr}. | 7396 @code{dump_add_opaque} or @code{dump_add_root_block_ptr}. |
7550 | 7397 |
7551 This is the end of the dumping part. | 7398 This is the end of the dumping part. |
7552 | 7399 |
7553 @node Reloading phase | 7400 @node Reloading phase, Remaining issues, Dumping phase, Dumping |
7554 @section Reloading phase | 7401 @section Reloading phase |
7555 @cindex reloading phase | 7402 @cindex reloading phase |
7556 @cindex dumping, reloading phase | 7403 @cindex dumping, reloading phase |
7557 | 7404 |
7558 @subsection File loading | 7405 @subsection File loading |
7572 @cindex dumping, putting back the pdump_opaques | 7419 @cindex dumping, putting back the pdump_opaques |
7573 | 7420 |
7574 The memory contents are restored in the obvious and trivial way. | 7421 The memory contents are restored in the obvious and trivial way. |
7575 | 7422 |
7576 | 7423 |
7577 @subsection Putting back the pdump_root_struct_ptrs | 7424 @subsection Putting back the pdump_root_block_ptrs |
7578 @cindex dumping, putting back the pdump_root_struct_ptrs | 7425 @cindex dumping, putting back the pdump_root_block_ptrs |
7579 | 7426 |
7580 The variables pointed to by pdump_root_struct_ptrs in the dump phase are | 7427 The variables pointed to by pdump_root_block_ptrs in the dump phase are |
7581 reset to the right relocated object addresses. | 7428 reset to the right relocated object addresses. |
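In outline, this amounts to one assignment per registered variable. The sketch below assumes, purely for illustration, that each saved record holds the address of the pointer variable and the file offset of the block it pointed to; the record layout and the @code{toy_} names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical saved record for one registered pointer variable. */
struct toy_root_block_ptr
{
  void **ptraddress;    /* address of the global pointer variable */
  size_t file_offset;   /* where the pointed-to block sits in the dump */
};

/* Point each of the N registered variables back at its block, which
   now lives at LOADED_BASE plus its offset in the dump file. */
static void
toy_put_back_root_block_ptrs (const struct toy_root_block_ptr *roots,
                              size_t n, char *loaded_base)
{
  size_t i;
  assert (loaded_base != NULL);
  for (i = 0; i < n; i++)
    {
      assert (roots[i].ptraddress != NULL);
      *roots[i].ptraddress = loaded_base + roots[i].file_offset;
    }
}
```

The variables themselves live in the data segment of the executable, so their addresses are the same at dump time and at reload time; only the values stored in them need updating.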
7582 | 7429 |
7583 | 7430 |
7584 @subsection Object relocation | 7431 @subsection Object relocation |
7585 @cindex dumping, object relocation | 7432 @cindex dumping, object relocation |
7590 | 7437 |
7591 | 7438 |
7592 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains | 7439 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains |
7593 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains | 7440 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains |
7594 | 7441 |
7595 Same as Putting back the pdump_root_struct_ptrs. | 7442 Same as Putting back the pdump_root_block_ptrs. |
7596 | 7443 |
7597 | 7444 |
7598 @subsection Reorganize the hash tables | 7445 @subsection Reorganize the hash tables |
7599 @cindex dumping, reorganize the hash tables | 7446 @cindex dumping, reorganize the hash tables |
7600 | 7447 |
7601 Since some of the hash values in the lisp hash tables are | 7448 Since some of the hash values in the lisp hash tables are |
7602 address-dependent, their layout is now wrong. So we go through each of | 7449 address-dependent, their layout is now wrong. So we go through each of |
7603 them and have them resorted by calling @code{pdump_reorganize_hash_table}. | 7450 them and have them resorted by calling @code{pdump_reorganize_hash_table}. |
7604 | 7451 |
7605 @node Remaining issues | 7452 @node Remaining issues, , Reloading phase, Dumping |
7606 @section Remaining issues | 7453 @section Remaining issues |
7607 @cindex dumping, remaining issues | 7454 @cindex dumping, remaining issues |
7608 | 7455 |
7609 The build process will have to start a post-dump xemacs, ask it the | 7456 The build process will have to start a post-dump xemacs, ask it the |
7610 loading address (which will, hopefully, be always the same between | 7457 loading address (which will, hopefully, be always the same between |
7622 on the same system (mule and no-mule come to mind). | 7469 on the same system (mule and no-mule come to mind). |
7623 | 7470 |
7624 The DOC file contents should probably end up in the dump file. | 7471 The DOC file contents should probably end up in the dump file. |
7625 | 7472 |
7626 | 7473 |
7627 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top | 7474 @node Events and the Event Loop, Asynchronous Events; Quit Checking, Dumping, Top |
7628 @chapter Events and the Event Loop | 7475 @chapter Events and the Event Loop |
7629 @cindex events and the event loop | 7476 @cindex events and the event loop |
7630 @cindex event loop, events and the | 7477 @cindex event loop, events and the |
7631 | 7478 |
7632 @menu | 7479 @menu |
7633 * Introduction to Events:: | 7480 * Introduction to Events:: |
7634 * Main Loop:: | 7481 * Main Loop:: |
7635 * Specifics of the Event Gathering Mechanism:: | 7482 * Specifics of the Event Gathering Mechanism:: |
7636 * Specifics About the Emacs Event:: | 7483 * Specifics About the Emacs Event:: |
7637 * The Event Stream Callback Routines:: | 7484 * Event Queues:: |
7638 * Other Event Loop Functions:: | 7485 * Event Stream Callback Routines:: |
7639 * Converting Events:: | 7486 * Other Event Loop Functions:: |
7640 * Dispatching Events; The Command Builder:: | 7487 * Stream Pairs:: |
7488 * Converting Events:: | |
7489 * Dispatching Events; The Command Builder:: | |
7490 * Focus Handling:: | |
7491 * Editor-Level Control Flow Modules:: | |
7641 @end menu | 7492 @end menu |
7642 | 7493 |
7643 @node Introduction to Events | 7494 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop |
7644 @section Introduction to Events | 7495 @section Introduction to Events |
7645 @cindex events, introduction to | 7496 @cindex events, introduction to |
7646 | 7497 |
7647 An event is an object that encapsulates information about an | 7498 An event is an object that encapsulates information about an |
7648 interesting occurrence in the operating system. Events are | 7499 interesting occurrence in the operating system. Events are |
7678 Emacs events---there may not be a one-to-one correspondence. | 7529 Emacs events---there may not be a one-to-one correspondence. |
7679 | 7530 |
7680 Emacs events are documented in @file{events.h}; I'll discuss them | 7531 Emacs events are documented in @file{events.h}; I'll discuss them |
7681 later. | 7532 later. |
7682 | 7533 |
7683 @node Main Loop | 7534 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop |
7684 @section Main Loop | 7535 @section Main Loop |
7685 @cindex main loop | 7536 @cindex main loop |
7686 @cindex events, main loop | 7537 @cindex events, main loop |
7687 | 7538 |
7688 The @dfn{command loop} is the top-level loop that the editor is always | 7539 The @dfn{command loop} is the top-level loop that the editor is always |
7747 wrapper similar to @code{command_loop_2()}. Note also that | 7598 wrapper similar to @code{command_loop_2()}. Note also that |
7748 @code{initial_command_loop()} sets up a catch for @code{top-level} when | 7599 @code{initial_command_loop()} sets up a catch for @code{top-level} when |
7749 invoking @code{top_level_1()}, just like when it invokes | 7600 invoking @code{top_level_1()}, just like when it invokes |
7750 @code{command_loop_2()}. | 7601 @code{command_loop_2()}. |
7751 | 7602 |
7752 @node Specifics of the Event Gathering Mechanism | 7603 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop |
7753 @section Specifics of the Event Gathering Mechanism | 7604 @section Specifics of the Event Gathering Mechanism |
7754 @cindex event gathering mechanism, specifics of the | 7605 @cindex event gathering mechanism, specifics of the |
7755 | 7606 |
7756 Here is an approximate diagram of the collection processes | 7607 Here is an approximate diagram of the collection processes |
7757 at work in XEmacs, under TTY's (TTY's are simpler than X | 7608 at work in XEmacs, under TTY's (TTY's are simpler than X |
7778 | | | | | | | 7629 | | | | | | |
7779 V V V V V V | 7630 V V V V V V |
7780 ------>-----------<----------------<---------------- | 7631 ------>-----------<----------------<---------------- |
7781 | | 7632 | |
7782 | | 7633 | |
7783 | [collected using select() in emacs_tty_next_event() | 7634 | [collected using @code{select()} in @code{emacs_tty_next_event()} |
7784 | and converted to the appropriate Emacs event] | 7635 | and converted to the appropriate Emacs event] |
7785 | | 7636 | |
7786 | | 7637 | |
7787 V (above this line is TTY-specific) | 7638 V (above this line is TTY-specific) |
7788 Emacs ----------------------------------------------- | 7639 Emacs ----------------------------------------------- |
7789 event (below this line is the generic event mechanism) | 7640 event (below this line is the generic event mechanism) |
7790 | | 7641 | |
7791 | | 7642 | |
7792 was there if not, call | 7643 was there if not, call |
7793 a SIGINT? emacs_tty_next_event() | 7644 a SIGINT? @code{emacs_tty_next_event()} |
7794 | | | 7645 | | |
7795 | | | 7646 | | |
7796 | | | 7647 | | |
7797 V V | 7648 V V |
7798 --->------<---- | 7649 --->------<---- |
7799 | | 7650 | |
7800 | [collected in event_stream_next_event(); | 7651 | [collected in @code{event_stream_next_event()}; |
7801 | SIGINT is converted using maybe_read_quit_event()] | 7652 | SIGINT is converted using @code{maybe_read_quit_event()}] |
7802 V | 7653 V |
7803 Emacs | 7654 Emacs |
7804 event | 7655 event |
7805 | | 7656 | |
7806 \---->------>----- maybe_kbd_translate() ---->---\ | 7657 \---->------>----- maybe_kbd_translate() ---->---\ |
7808 | | 7659 | |
7809 | | 7660 | |
7810 command event queue | | 7661 command event queue | |
7811 if not from command | 7662 if not from command |
7812 (contains events that were event queue, call | 7663 (contains events that were event queue, call |
7813 read earlier but not processed, event_stream_next_event() | 7664 read earlier but not processed, @code{event_stream_next_event()} |
7814 typically when waiting in a | | 7665 typically when waiting in a | |
7815 sit-for, sleep-for, etc. for | | 7666 sit-for, sleep-for, etc. for | |
7816 a particular event to be received) | | 7667 a particular event to be received) | |
7817 | | | 7668 | | |
7818 | | | 7669 | | |
7819 V V | 7670 V V |
7820 ---->------------------------------------<---- | 7671 ---->------------------------------------<---- |
7821 | | 7672 | |
7822 | [collected in | 7673 | [collected in |
7823 | next_event_internal()] | 7674 | @code{next_event_internal()}] |
7824 | | 7675 | |
7825 unread- unread- event from | | 7676 unread- unread- event from | |
7826 command- command- keyboard else, call | 7677 command- command- keyboard else, call |
7827 events event macro next_event_internal() | 7678 events event macro @code{next_event_internal()} |
7828 | | | | | 7679 | | | | |
7829 | | | | | 7680 | | | | |
7830 | | | | | 7681 | | | | |
7831 V V V V | 7682 V V V V |
7832 --------->----------------------<------------ | 7683 --------->----------------------<------------ |
7833 | | 7684 | |
7834 | [collected in `next-event', which may loop | 7685 | [collected in @code{next-event}, which may loop |
7835 | more than once if the event it gets is on | 7686 | more than once if the event it gets is on |
7836 | a dead frame, device, etc.] | 7687 | a dead frame, device, etc.] |
7837 | | 7688 | |
7838 | | 7689 | |
7839 V | 7690 V |
7840 feed into top-level event loop, | 7691 feed into top-level event loop, |
7841 which repeatedly calls `next-event' | 7692 which repeatedly calls @code{next-event} |
7842 and then dispatches the event | 7693 and then dispatches the event |
7843 using `dispatch-event' | 7694 using @code{dispatch-event} |
7844 @end example | 7695 @end example |
7845 | 7696 |
7846 Notice the separation between TTY-specific and generic event mechanism. | 7697 Notice the separation between TTY-specific and generic event mechanism. |
7847 When using the Xt-based event loop, the TTY-specific stuff is replaced | 7698 When using the Xt-based event loop, the TTY-specific stuff is replaced |
7848 but the rest stays the same. | 7699 but the rest stays the same. |
7887 | | | | | | | | | 7738 | | | | | | | | |
7888 | | | | | | | | | 7739 | | | | | | | | |
7889 V V V V V V V V | 7740 V V V V V V V V |
7890 --->----------------------------------------<---------<------ | 7741 --->----------------------------------------<---------<------ |
7891 | | | | 7742 | | | |
7892 | | |[collected using select() in | 7743 | | |[collected using @code{select()} in |
7893 | | | _XtWaitForSomething(), called | 7744 | | | @code{_XtWaitForSomething()}, called |
7894 | | | from XtAppProcessEvent(), called | 7745 | | | from @code{XtAppProcessEvent()}, called |
7895 | | | in emacs_Xt_next_event(); | 7746 | | | in @code{emacs_Xt_next_event()}; |
7896 | | | dispatched to various callbacks] | 7747 | | | dispatched to various callbacks] |
7897 | | | | 7748 | | | |
7898 | | | | 7749 | | | |
7899 emacs_Xt_ p_s_callback(), | [popup_selection_callback] | 7750 emacs_Xt_ p_s_callback(), | [popup_selection_callback] |
7900 event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_ | 7751 event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_ |
7914 | | | | 7765 | | | |
7915 V V | | 7766 V V | |
7916 -->----------<-- | | 7767 -->----------<-- | |
7917 | | | 7768 | | |
7918 | | | 7769 | | |
7919 dispatch Xt_what_callback() | 7770 dispatch @code{Xt_what_callback()} |
7920 event sets flags | 7771 event sets flags |
7921 queue | | 7772 queue | |
7922 | | | 7773 | | |
7923 | | | 7774 | | |
7924 | | | 7775 | | |
7925 | | | 7776 | | |
7926 ---->-----------<-------- | 7777 ---->-----------<-------- |
7927 | | 7778 | |
7928 | | 7779 | |
7929 | [collected and converted as appropriate in | 7780 | [collected and converted as appropriate in |
7930 | emacs_Xt_next_event()] | 7781 | @code{emacs_Xt_next_event()}] |
7931 | | 7782 | |
7932 | | 7783 | |
7933 V (above this line is Xt-specific) | 7784 V (above this line is Xt-specific) |
7934 Emacs ------------------------------------------------ | 7785 Emacs ------------------------------------------------ |
7935 event (below this line is the generic event mechanism) | 7786 event (below this line is the generic event mechanism) |
7936 | | 7787 | |
7937 | | 7788 | |
7938 was there if not, call | 7789 was there if not, call |
7939 a SIGINT? emacs_Xt_next_event() | 7790 a SIGINT? @code{emacs_Xt_next_event()} |
7940 | | | 7791 | | |
7941 | | | 7792 | | |
7942 | | | 7793 | | |
7943 V V | 7794 V V |
7944 --->-------<---- | 7795 --->-------<---- |
7945 | | 7796 | |
7946 | [collected in event_stream_next_event(); | 7797 | [collected in @code{event_stream_next_event()}; |
7947 | SIGINT is converted using maybe_read_quit_event()] | 7798 | SIGINT is converted using @code{maybe_read_quit_event()}] |
7948 V | 7799 V |
7949 Emacs | 7800 Emacs |
7950 event | 7801 event |
7951 | | 7802 | |
7952 \---->------>----- maybe_kbd_translate() -->-----\ | 7803 \---->------>----- maybe_kbd_translate() -->-----\ |
7954 | | 7805 | |
7955 | | 7806 | |
7956 command event queue | | 7807 command event queue | |
7957 if not from command | 7808 if not from command |
7958 (contains events that were event queue, call | 7809 (contains events that were event queue, call |
7959 read earlier but not processed, event_stream_next_event() | 7810 read earlier but not processed, @code{event_stream_next_event()} |
7960 typically when waiting in a | | 7811 typically when waiting in a | |
7961 sit-for, sleep-for, etc. for | | 7812 sit-for, sleep-for, etc. for | |
7962 a particular event to be received) | | 7813 a particular event to be received) | |
7963 | | | 7814 | | |
7964 | | | 7815 | | |
7965 V V | 7816 V V |
7966 ---->----------------------------------<------ | 7817 ---->----------------------------------<------ |
7967 | | 7818 | |
7968 | [collected in | 7819 | [collected in |
7969 | next_event_internal()] | 7820 | @code{next_event_internal()}] |
7970 | | 7821 | |
7971 unread- unread- event from | | 7822 unread- unread- event from | |
7972 command- command- keyboard else, call | 7823 command- command- keyboard else, call |
7973 events event macro next_event_internal() | 7824 events event macro @code{next_event_internal()} |
7974 | | | | | 7825 | | | | |
7975 | | | | | 7826 | | | | |
7976 | | | | | 7827 | | | | |
7977 V V V V | 7828 V V V V |
7978 --------->----------------------<------------ | 7829 --------->----------------------<------------ |
7979 | | 7830 | |
7980 | [collected in `next-event', which may loop | 7831 | [collected in @code{next-event}, which may loop |
7981 | more than once if the event it gets is on | 7832 | more than once if the event it gets is on |
7982 | a dead frame, device, etc.] | 7833 | a dead frame, device, etc.] |
7983 | | 7834 | |
7984 | | 7835 | |
7985 V | 7836 V |
7986 feed into top-level event loop, | 7837 feed into top-level event loop, |
7987 which repeatedly calls `next-event' | 7838 which repeatedly calls @code{next-event} |
7988 and then dispatches the event | 7839 and then dispatches the event |
7989 using `dispatch-event' | 7840 using @code{dispatch-event} |
7990 @end example | 7841 @end example |
7991 | 7842 |
7992 @node Specifics About the Emacs Event | 7843 @node Specifics About the Emacs Event, Event Queues, Specifics of the Event Gathering Mechanism, Events and the Event Loop |
7993 @section Specifics About the Emacs Event | 7844 @section Specifics About the Emacs Event |
7994 @cindex event, specifics about the Lisp object | 7845 @cindex event, specifics about the Lisp object |
7995 | 7846 |
7996 @node The Event Stream Callback Routines | 7847 @node Event Queues, Event Stream Callback Routines, Specifics About the Emacs Event, Events and the Event Loop |
7997 @section The Event Stream Callback Routines | 7848 @section Event Queues |
7998 @cindex event stream callback routines, the | 7849 @cindex event queues |
7999 @cindex callback routines, the event stream | 7850 @cindex queues, event |
8000 | 7851 |
8001 @node Other Event Loop Functions | 7852 There are two event queues here -- the command event queue (#### which |
7853 should be called "deferred event queue" and is in my glyph ws) and the | |
7854 dispatch event queue. (MS Windows actually has an extra dispatch queue | |
7855 for non-user events and uses the generic one only for user events. This | |
7856 is because user and non-user events in Windows come through the same | |
7857 place -- the window procedure -- but under X, it's possible to | |
7858 selectively process events such that we take all the user events before | |
7859 the non-user ones. #### In fact, given the way we now drain the queue, | |
7860 we might need two separate queues, like under Windows. Need to think | |
7861 carefully exactly how this works, and should certainly generalize the | |
7862 two different queues.) | |
7863 | |
7864 The dispatch queue (which used to occur duplicated inside of each event | |
7865 implementation) is used for events that have been read from the | |
7866 window-system event queue(s) and not yet processed by | |
7867 @code{next_event_internal()}. It exists for two reasons: (1) because in many | |
7868 implementations, events often come from the window system by way of | |
7869 callbacks, and need to push the event to be returned onto a queue; (2) | |
7870 in order to handle QUIT in a guaranteed correct fashion without | |
7871 resorting to weird implementation-specific hacks that may or may not | |
7872 work well, we need to drain the window-system event queues and then look | |
7873 through to see if there's an event matching quit-char (usually ^G). The | |
7874 drained events need to go onto a queue. (There are other, similar cases | |
7875 where we need to drain the pending events so we can look ahead -- for | |
7876 example, checking for pending expose events under X to avoid excessive | |
7877 server activity.) | |
7878 | |
7879 The command event queue is used @strong{AFTER} an event has been read from | |
7880 @code{next_event_internal()}, when it needs to be pushed back. This | |
7881 includes, for example, @code{accept-process-output}, @code{sleep-for} | |
7882 and @code{wait_delaying_user_input()}. Eval events and the like, | |
7883 generated by @code{enqueue-eval-event}, | |
7884 @code{enqueue_magic_eval_event()}, etc. are also pushed onto this queue. | |
7885 Some events generated by callbacks are also pushed onto this queue, #### | |
7886 although maybe they shouldn't be. | |
7887 | |
7888 The command queue takes precedence over the dispatch queue. | |
7889 | |
7890 #### It is worth investigating to see whether both queues are really | |
7891 needed, and how exactly they should be used. @code{enqueue-eval-event}, | |
7892 for example, could certainly push onto the dispatch queue, and all | |
7893 callbacks maybe should. @code{wait_delaying_user_input()} seems to need | |
7894 both queues, since it can take events from the dispatch queue and push | |
7895 them onto the command queue; but it perhaps could be rewritten to avoid | |
7896 this. #### In general we need to review the handling of these two | |
7897 queues, figure out exactly what ought to be happening, and document it. | |
7898 | |
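The precedence rule described above (command queue first, then dispatch queue) can be sketched in a few lines of C. This is only an illustration under stated assumptions: the @code{sketch_} types and functions are hypothetical stand-ins and are not the actual XEmacs event or queue code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Emacs event; not the real structure. */
struct sketch_event {
  int id;
  struct sketch_event *next;
};

/* A singly linked FIFO queue with head and tail pointers. */
struct sketch_queue {
  struct sketch_event *head;
  struct sketch_event *tail;
};

/* Append EV to the end of Q. */
static void
sketch_enqueue (struct sketch_queue *q, struct sketch_event *ev)
{
  ev->next = NULL;
  if (q->tail)
    q->tail->next = ev;
  else
    q->head = ev;
  q->tail = ev;
}

/* Remove and return the first event of Q, or NULL if Q is empty. */
static struct sketch_event *
sketch_dequeue (struct sketch_queue *q)
{
  struct sketch_event *ev = q->head;
  if (ev)
    {
      q->head = ev->next;
      if (!q->head)
        q->tail = NULL;
      ev->next = NULL;
    }
  return ev;
}

/* Take the next event: the command queue is consulted first, and the
   dispatch queue only when the command queue is empty. */
static struct sketch_event *
sketch_next_event (struct sketch_queue *command_q,
                   struct sketch_queue *dispatch_q)
{
  struct sketch_event *ev = sketch_dequeue (command_q);
  if (!ev)
    ev = sketch_dequeue (dispatch_q);
  return ev;
}
```

An event pushed back onto the command queue is therefore seen before anything still waiting in the dispatch queue, regardless of arrival order.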
7899 | |
7900 @node Event Stream Callback Routines, Other Event Loop Functions, Event Queues, Events and the Event Loop | |
7901 @section Event Stream Callback Routines | |
7902 @cindex event stream callback routines | |
7903 @cindex callback routines, event stream | |
7904 | |
7905 There is one object called an event_stream. This object contains | |
7906 callback functions for doing the window-system-dependent operations | |
7907 that XEmacs requires. | |
7908 | |
7909 If XEmacs is compiled with support for X11 and the X Toolkit, then this | |
7910 event_stream structure will contain functions that can cope with input | |
7911 on XEmacs windows on multiple displays, as well as input from dumb tty | |
7912 frames. | |
7913 | |
7914 If it is desired to have XEmacs able to open frames on the displays of | |
7915 multiple heterogeneous machines, X11 and SunView, or X11 and NeXT, for | |
7916 example, then it will be necessary to construct an event_stream structure | |
7917 that can cope with the given types. Currently, the only implemented | |
7918 event_streams are for dumb-ttys, and for X11 plus dumb-ttys, | |
7919 and for mswindows. | |
7920 | |
7921 To implement this for one window system is relatively simple. | |
7922 To implement this for multiple window systems is trickier and may | |
7923 not be possible in all situations, but it's been done for X and TTY. | |
7924 | |
7925 Note that these callbacks are @strong{NOT} console methods; that's because | |
7926 the routines are not specific to a particular console type but must | |
7927 be able to simultaneously cope with all allowable console types. | |
7928 | |
7929 The slots of the event_stream structure: | |
7930 | |
7931 @table @code | |
7932 @item next_event_cb | |
7933 A function which fills in an XEmacs_event structure with the next event | |
7934 available. If there is no event available, then this should block. | |
7935 | |
7936 IMPORTANT: timer events and especially process events *must not* be | |
7937 returned if there are events of other types available; otherwise you can | |
7938 end up with an infinite loop in @code{Fdiscard_input()}. | |
7939 | |
7940 @item event_pending_cb | |
7941 A function which says whether there are events to be read. If called | |
7942 with an argument of 0, then this should say whether calling the | |
7943 @code{next_event_cb} will block. If called with a non-zero argument, | |
7944 then this should say whether there are that many user-generated events | |
7945 pending (that is, keypresses, mouse-clicks, dialog-box selection events, | |
7946 etc.). (This is used for redisplay optimization, among other things.) | |
7947 The difference is that the former includes process events and timer | |
7948 events, but the latter doesn't. | |
7949 | |
7950 If this function is not sure whether there are events to be read, it | |
7951 @strong{must} return 0. Otherwise various undesirable effects will | |
7952 occur, such as redisplay not occurring until the next event occurs. | |
7953 | |
7954 @item handle_magic_event_cb | |
7955 XEmacs calls this with an event structure which contains window-system | |
7956 dependent information that XEmacs doesn't need to know about, but which | |
7957 must happen in order. If the @code{next_event_cb} never returns an | |
7958 event of type "magic", this will never be used. | |
7959 | |
7960 @item format_magic_event_cb | |
7961 Called with a magic event; print a representation of the innards of the | |
7962 event to @var{PSTREAM}. | |
7963 | |
7964 @item compare_magic_event_cb | |
7965 Called with two magic events; return non-zero if the innards of the two | |
7966 are equal, zero otherwise. | |
7967 | |
7968 @item hash_magic_event_cb | |
7969 Called with a magic event; return a hash of the innards of the event. | |
7970 | |
7971 @item add_timeout_cb | |
7972 Called with an @var{EMACS_TIME}, the absolute time at which a wakeup event | |
7973 should be generated; and a void *, which is an arbitrary value that will | |
7974 be returned in the timeout event. The timeouts generated by this | |
7975 function should be one-shots: they fire once and then disappear. This | |
7976 callback should return an int id-number which uniquely identifies this | |
7977 wakeup. If an implementation doesn't have microsecond or millisecond | |
7978 granularity, it should round up to the closest value it can deal with. | |
7979 | |
7980 @item remove_timeout_cb | |
7981 Called with an int, the id number of a wakeup to discard. This id | |
7982 number must have been returned by the @code{add_timeout_cb}. If the given | |
7983 wakeup has already expired, this should do nothing. | |
7984 | |
7985 @item select_process_cb | |
7986 @item unselect_process_cb | |
7987 These callbacks tell the underlying implementation to add or remove a | |
7988 file descriptor from the list of fds which are polled for | |
7989 inferior-process input. When input becomes available on the given | |
7990 process connection, an event of type "process" should be generated. | |
7991 | |
7992 @item select_console_cb | |
7993 @item unselect_console_cb | |
7994 These callbacks tell the underlying implementation to add or remove a | |
7995 console from the list of consoles which are polled for user-input. | |
7996 | |
7997 @item select_device_cb | |
7998 @item unselect_device_cb | |
7999 These callbacks are used by Unixoid event loops (those that use @code{select()} | |
8000 and file descriptors and have a separate input fd per device). | |
8001 | |
8002 @item create_io_streams_cb | |
8003 @item delete_io_streams_cb | |
8004 These callbacks are called by process code to create the input and | |
8005 output lstreams which are used for subprocess I/O. | |
8006 | |
8007 @item quitp_cb | |
8008 A handler function called from the @code{QUIT} macro which should check | |
8009 whether the quit character has been typed. On systems with SIGIO, this | |
8010 will not be called unless the @code{sigio_happened} flag is true (it is set | |
8011 from the SIGIO handler). | |
8012 @end table | |
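As a rough illustration of the @code{event_pending_cb} contract in the table above, the following sketch models a backend that tracks how many user and non-user events are waiting. Everything here is an assumption made for illustration: the @code{sketch_backend} structure and the extra backend argument are hypothetical, and the sketch reads a nonzero return as "events are available".

```c
#include <assert.h>

/* Hypothetical backend state: counts of queued events by kind. */
struct sketch_backend {
  int user_events;   /* keypresses, mouse clicks, dialog selections */
  int other_events;  /* process and timer events */
  int state_known;   /* 0 if the backend cannot tell what is pending */
};

/* With HOW_MANY == 0, report whether any event at all is available
   (so that fetching the next event would not block).  With a nonzero
   HOW_MANY, report whether at least that many user events are pending;
   process and timer events are not counted.  When unsure, return 0. */
static int
sketch_event_pending (const struct sketch_backend *b, int how_many)
{
  if (!b->state_known)
    return 0;
  if (how_many == 0)
    return (b->user_events + b->other_events) > 0;
  return b->user_events >= how_many;
}
```

Returning 0 in the uncertain case follows the rule stated for this slot: a false positive can delay redisplay until the next event arrives.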
8013 | |
8014 XEmacs has its own event structures, which are distinct from the event | |
8015 structures used by X or any other window system. It is the job of the | |
8016 event_stream layer to translate to this format. | |
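The one-shot behaviour required of @code{add_timeout_cb} and @code{remove_timeout_cb} in the table above can be sketched with a small fixed-size table. This is a minimal sketch, not the XEmacs implementation: the @code{sketch_} names, the millisecond clock and the table size are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TIMEOUTS 8

struct sketch_timeout {
  int id;           /* 0 marks an unused slot */
  long fire_at_ms;  /* absolute time, in milliseconds (assumed unit) */
  void *closure;    /* arbitrary value handed back when the timeout fires */
};

static struct sketch_timeout sketch_timeouts[SKETCH_MAX_TIMEOUTS];
static int sketch_next_id = 1;

/* Register a one-shot timeout; return its id, or -1 if the table is full. */
static int
sketch_add_timeout (long fire_at_ms, void *closure)
{
  int i;
  for (i = 0; i < SKETCH_MAX_TIMEOUTS; i++)
    if (sketch_timeouts[i].id == 0)
      {
        sketch_timeouts[i].id = sketch_next_id++;
        sketch_timeouts[i].fire_at_ms = fire_at_ms;
        sketch_timeouts[i].closure = closure;
        return sketch_timeouts[i].id;
      }
  return -1;
}

/* Discard a timeout; does nothing if it has already fired or was removed. */
static void
sketch_remove_timeout (int id)
{
  int i;
  if (id <= 0)
    return;
  for (i = 0; i < SKETCH_MAX_TIMEOUTS; i++)
    if (sketch_timeouts[i].id == id)
      sketch_timeouts[i].id = 0;
}

/* Fire one timeout that is due at NOW_MS: free its slot (one-shot), store
   its closure in *CLOSURE_OUT and return its id.  Return 0 if none is due. */
static int
sketch_fire_due_timeout (long now_ms, void **closure_out)
{
  int i;
  for (i = 0; i < SKETCH_MAX_TIMEOUTS; i++)
    if (sketch_timeouts[i].id != 0 && sketch_timeouts[i].fire_at_ms <= now_ms)
      {
        int id = sketch_timeouts[i].id;
        *closure_out = sketch_timeouts[i].closure;
        sketch_timeouts[i].id = 0;
        return id;
      }
  return 0;
}
```

Because firing clears the slot, a later removal of the same id finds nothing to do, which matches the requirement that removing an expired wakeup is harmless.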
8017 | |
8018 @node Other Event Loop Functions, Stream Pairs, Event Stream Callback Routines, Events and the Event Loop | |
8002 @section Other Event Loop Functions | 8019 @section Other Event Loop Functions |
8003 @cindex event loop functions, other | 8020 @cindex event loop functions, other |
8004 | 8021 |
8005 @code{detect_input_pending()} and @code{input-pending-p} look for | 8022 @code{detect_input_pending()} and @code{input-pending-p} look for |
8006 input by calling @code{event_stream->event_pending_p} and looking in | 8023 input by calling @code{event_stream->event_pending_p} and looking in |
8019 @code{read-char} calls @code{next-command-event} and uses | 8036 @code{read-char} calls @code{next-command-event} and uses |
8020 @code{event_to_character()} to return the character equivalent. With | 8037 @code{event_to_character()} to return the character equivalent. With |
8021 the right kind of input method support, it is possible for (read-char) | 8038 the right kind of input method support, it is possible for (read-char) |
8022 to return a Kanji character. | 8039 to return a Kanji character. |
8023 | 8040 |
8024 @node Converting Events | 8041 @node Stream Pairs, Converting Events, Other Event Loop Functions, Events and the Event Loop |
8042 @section Stream Pairs | |
8043 @cindex stream pairs | |
8044 @cindex pairs, stream | |
8045 | |
8046 Since there are many possible processes/event loop combinations, the | |
8047 event code is responsible for creating an appropriate lstream type. The | |
8048 process implementation does not care which lstream type is chosen. | |
8049 | |
8050 The Create stream pair function is passed two void* values, which | |
8051 identify process-dependent 'handles'. The process implementation uses | |
8052 these handles to communicate with child processes. The function must be | |
8053 prepared to receive handle types of any process implementation. Since | |
8054 only one process implementation exists in a particular XEmacs | |
8055 configuration, preprocessing is a means of compiling in the support for | |
8056 the code which deals with particular handle types. | |
8057 | |
8058 For example, a unixoid type loop, which relies on file descriptors, may be | |
8059 asked to create a pair of streams by a unix-style process implementation. | |
8060 In this case, the handles passed are unix file descriptors, and the code | |
8061 may deal with these directly. However, the same code may be used on a Win32 | |
8062 system with X Windows. In this case, the Win32 process implementation passes | |
8063 handles of type HANDLE, and the @code{create_io_streams} function must call | |
8064 the appropriate function to get file descriptors from the HANDLEs, so that these | |
8065 descriptors may be passed to @code{XtAddInput}. | |
8066 | |
8067 The handle given may have a special denying value, in which case the | |
8068 corresponding lstream should not be created. | |
8069 | |
8070 The return value of the function is a unique stream identifier (USID). It is | |
8071 used by the process implementation, in its platform-independent part. The | |
8072 @code{get_process_from_usid} function returns the process object given its | |
8073 USID. The event stream is responsible for converting its internal handle | |
8074 type into a USID. | |
8075 | |
8076 An example is the TTY event stream. When a file descriptor signals input, the | |
8077 event loop must determine the process for which the input is destined. Thus, | |
8078 the implementation uses the process input stream's file descriptor as the USID, by | |
8079 simply casting the fd value to the USID type. | |
8080 | |
8081 There are two special USID values. One, @code{USID_ERROR}, indicates | |
8082 that the stream pair cannot be created. The second, | |
8083 @code{USID_DONTHASH}, indicates that streams are created, but the event | |
8084 stream does not wish to be able to find the process by its | |
8085 USID. Specifically, if an event stream implementation never calls | |
8086 @code{get_process_from_usid}, this value should always be returned, to | |
8087 prevent accumulating useless information about the USID-to-process | |
8088 relationship. | |
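The USID scheme above can be sketched as follows, using the TTY approach of casting a file descriptor to the identifier type. This is a sketch under stated assumptions: the @code{sketch_usid} type, the two sentinel values, and the lookup table are hypothetical and chosen only so that they cannot collide with a valid descriptor; they are not the definitions used by XEmacs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifier type and sentinels (values are assumptions). */
typedef unsigned long sketch_usid;
#define SKETCH_USID_ERROR    ((sketch_usid) -1)  /* stream pair not created */
#define SKETCH_USID_DONTHASH ((sketch_usid) -2)  /* created, but never looked up */

#define SKETCH_MAX_PROCS 16

struct sketch_process { const char *name; };

/* Tiny USID-to-process table standing in for the real hash table. */
static struct {
  sketch_usid usid;
  struct sketch_process *proc;
} sketch_table[SKETCH_MAX_PROCS];
static int sketch_table_len;

/* TTY-style conversion: the input fd itself serves as the USID. */
static sketch_usid
sketch_fd_to_usid (int fd)
{
  if (fd < 0)
    return SKETCH_USID_ERROR;
  return (sketch_usid) fd;
}

/* Record the mapping unless USID is a sentinel or the table is full.
   Return 1 if the mapping was recorded, 0 otherwise. */
static int
sketch_remember_process (sketch_usid usid, struct sketch_process *proc)
{
  if (usid == SKETCH_USID_ERROR || usid == SKETCH_USID_DONTHASH)
    return 0;
  if (sketch_table_len >= SKETCH_MAX_PROCS)
    return 0;
  sketch_table[sketch_table_len].usid = usid;
  sketch_table[sketch_table_len].proc = proc;
  sketch_table_len++;
  return 1;
}

/* Find the process registered under USID, or NULL if there is none. */
static struct sketch_process *
sketch_process_from_usid (sketch_usid usid)
{
  int i;
  for (i = 0; i < sketch_table_len; i++)
    if (sketch_table[i].usid == usid)
      return sketch_table[i].proc;
  return NULL;
}
```

An event stream that never performs the lookup would return the "don't hash" sentinel, so nothing is stored for it and the table does not grow with entries nobody reads.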
8089 | |
8090 @node Converting Events, Dispatching Events; The Command Builder, Stream Pairs, Events and the Event Loop | |
8025 @section Converting Events | 8091 @section Converting Events |
8026 @cindex converting events | 8092 @cindex converting events |
8027 @cindex events, converting | 8093 @cindex events, converting |
8028 | 8094 |
8029 @code{character_to_event()}, @code{event_to_character()}, | 8095 @code{character_to_event()}, @code{event_to_character()}, |
8032 event was not a keypress, @code{event_to_character()} returns -1 and | 8098 event was not a keypress, @code{event_to_character()} returns -1 and |
8033 @code{event-to-character} returns @code{nil}. These functions convert | 8099 @code{event-to-character} returns @code{nil}. These functions convert |
8034 between character representation and the split-up event representation | 8100 between character representation and the split-up event representation |
8035 (keysym plus mod keys). | 8101 (keysym plus mod keys). |
8036 | 8102 |
8037 @node Dispatching Events; The Command Builder | 8103 @node Dispatching Events; The Command Builder, Focus Handling, Converting Events, Events and the Event Loop |
8038 @section Dispatching Events; The Command Builder | 8104 @section Dispatching Events; The Command Builder |
8039 @cindex dispatching events; the command builder | 8105 @cindex dispatching events; the command builder |
8040 @cindex events; the command builder, dispatching | 8106 @cindex events; the command builder, dispatching |
8041 @cindex command builder, dispatching events; the | 8107 @cindex command builder, dispatching events; the |
8042 | 8108 |
8043 Not yet documented. | 8109 Not yet documented. |
8044 | 8110 |
8045 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top | 8111 @node Focus Handling, Editor-Level Control Flow Modules, Dispatching Events; The Command Builder, Events and the Event Loop |
8112 @section Focus Handling | |
8113 @cindex focus handling | |
8114 | |
8115 Ben's capsule lecture on focus: | |
8116 | |
8117 In GNU Emacs @code{select-frame} never changes the window-manager frame | |
8118 focus. All it does is change the "selected frame". This is similar to | |
8119 what happens when we call @code{select-device} or @code{select-console}. | |
8120 Whenever an event comes in (including a keyboard event), its frame is | |
8121 selected; therefore, evaluating @code{select-frame} in @samp{*scratch*} | |
8122 won't cause any effects because the next received event (in the same | |
8123 frame) will cause a switch back to the frame displaying | |
8124 @samp{*scratch*}. | |
8125 | |
8126 Whenever a focus-change event is received from the window manager, it | |
8127 generates a @code{switch-frame} event, which causes the Lisp function | |
8128 @code{handle-switch-frame} to get run. This basically just runs | |
8129 @code{select-frame} (see below, however). | |
8130 | |
8131 In GNU Emacs, if you want to have an operation run when a frame is | |
8132 selected, you supply an event binding for @code{switch-frame} (and then | |
8133 maybe call @code{handle-switch-frame}, or something ...). | |
8134 | |
8135 In XEmacs, we @strong{do} change the window-manager frame focus as a | |
8136 result of @code{select-frame}, but not until the next time an event is | |
8137 received, so that a function that momentarily changes the selected frame | |
8138 won't cause WM focus flashing. (#### There's something not quite right | |
8139 here; this is causing the wrong-cursor-focus problems that you | |
8140 occasionally see. But the general idea is correct.) This approach is | |
8141 winning for people who use the explicit-focus model, but is trickier to | |
8142 implement. | |
8143 | |
8144 We also don't make the @code{switch-frame} event visible but instead have | |
8145 @code{select-frame-hook}, which is a better approach. | |
8146 | |
8147 There is the problem of surrogate minibuffers, where when we enter the | |
8148 minibuffer, you essentially want to temporarily switch the WM focus to | |
8149 the frame with the minibuffer, and switch it back when you exit the | |
8150 minibuffer. | |
8151 | |
8152 GNU Emacs solves this with the crockish @code{redirect-frame-focus}, | |
8153 which says "for keyboard events received from FRAME, act like they're | |
8154 coming from FOCUS-FRAME". I think what this means is that, when a | |
8155 keyboard event comes in and the event manager is about to select the | |
8156 event's frame, if that frame has its focus redirected, the redirected-to | |
8157 frame is selected instead. That way, if you're in a minibufferless | |
8158 frame and enter the minibuffer, then all Lisp functions that run see the | |
8159 selected frame as the minibuffer's frame rather than the minibufferless | |
8160 frame you came from, so that (e.g.) your typing actually appears in the | |
8161 minibuffer's frame and things behave sanely. | |
8162 | |
8163 There's also some weird logic that switches the redirected frame focus | |
8164 from one frame to another if Lisp code explicitly calls | |
8165 @code{select-frame} (but not if @code{handle-switch-frame} is called), | |
8166 and saves and restores the frame focus in window configurations, | |
8167 etc. etc. All of this logic is heavily @code{#if 0}'d, with lots of | |
8168 comments saying "No, this approach doesn't seem to work, so I'm trying | |
8169 this ... is it reasonable? Well, I'm not sure ..." that are a red flag | |
8170 indicating crockishness. | |
8171 | |
8172 Because of our way of doing things, we can avoid all this crock. | |
8173 Keyboard events never cause a select-frame (who cares what frame they're | |
8174 associated with? They come from a console, only). We change the actual | |
8175 WM focus to a surrogate minibuffer frame, so we don't have to do any | |
8176 internal redirection. In order to get the focus back, I took the | |
8177 approach in @file{minibuf.el} of just checking to see if the frame we moved to | |
8178 is still the selected frame, and moving back to the old one if so. | |
8179 Conceivably we might have to do the weird "tracking" that GNU Emacs does | |
8180 when @code{select-frame} is called, but I don't think so. If the | |
8181 selected frame moved from the minibuffer frame, then we just leave it | |
8182 there, figuring that someone knows what they're doing. Because we don't | |
8183 have any redirection recorded anywhere, it's safe to do this, and we | |
8184 don't end up with unwanted redirection. | |
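
The "move back only if nothing else moved us" rule can be sketched as below. The real logic lives in Lisp in @file{minibuf.el}; this C version is only an illustration, and @code{struct mb_frame}, @code{enter_minibuffer_sketch()} and @code{exit_minibuffer_sketch()} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct mb_frame { int id; };          /* hypothetical frame record */
static struct mb_frame *mb_selected;  /* currently selected frame */

/* Enter the minibuffer on a surrogate frame; return the frame we left. */
struct mb_frame *enter_minibuffer_sketch(struct mb_frame *minibuf_frame)
{
  struct mb_frame *old = mb_selected;
  assert(minibuf_frame != NULL);
  mb_selected = minibuf_frame;
  return old;
}

/* Leave the minibuffer.  Move back to OLD only if the minibuffer frame
   is still the selected one; if the selection moved elsewhere in the
   meantime, leave it where it is. */
void exit_minibuffer_sketch(struct mb_frame *minibuf_frame,
                            struct mb_frame *old)
{
  if (mb_selected == minibuf_frame)
    mb_selected = old;
}
```

Because no redirection is stored anywhere, there is nothing to clean up in the case where the selection has moved on.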
8185 | |
8186 @node Editor-Level Control Flow Modules, , Focus Handling, Events and the Event Loop | |
8187 @section Editor-Level Control Flow Modules | |
8188 @cindex control flow modules, editor-level | |
8189 @cindex modules, editor-level control flow | |
8190 | |
8191 @example | |
8192 @file{event-Xt.c} | |
8193 @file{event-msw.c} | |
8194 @file{event-stream.c} | |
8195 @file{event-tty.c} | |
8196 @file{events-mod.h} | |
8197 @file{gpmevent.c} | |
8198 @file{gpmevent.h} | |
8199 @file{events.c} | |
8200 @file{events.h} | |
8201 @end example | |
8202 | |
8203 These implement the handling of events (user input and other system | |
8204 notifications). | |
8205 | |
8206 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object | |
8207 type and primitives for manipulating it. | |
8208 | |
8209 @file{event-stream.c} implements the basic functions for working with | |
8210 event queues, dispatching an event by looking it up in relevant keymaps | |
8211 and such, and handling timeouts; this includes the primitives | |
8212 @code{next-event} and @code{dispatch-event}, as well as related | |
8213 primitives such as @code{sit-for}, @code{sleep-for}, and | |
8214 @code{accept-process-output}. (@file{event-stream.c} is one of the | |
8215 hairiest and trickiest modules in XEmacs. Beware! You can easily mess | |
8216 things up here.) | |
8217 | |
8218 @file{event-Xt.c} and @file{event-tty.c} implement the low-level | |
8219 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's | |
8220 (using @code{read()} and @code{select()}), respectively. The event | |
8221 interface enforces a clean separation between the specific code for | |
8222 interfacing with the operating system and the generic code for working | |
8223 with events, by defining an API of basic, low-level event methods; | |
8224 @file{event-Xt.c} and @file{event-tty.c} are two different | |
8225 implementations of this API. To add support for a new operating system | |
8226 (e.g. NeXTstep), one merely needs to provide another implementation of | |
8227 those API functions. | |
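
One common way to express such an API in C is a table of function pointers that each backend fills in. The sketch below assumes that shape purely for illustration; the structure and member names (@code{struct event_methods_sketch}, @code{event_pending_p}, @code{next_event}) and the toy backend are hypothetical and are not taken from the XEmacs headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record and method table; names are illustrative. */
struct ev_sketch { int kind; int data; };

struct event_methods_sketch
{
  int  (*event_pending_p)(void);              /* nonzero if an event waits */
  void (*next_event)(struct ev_sketch *out);  /* fetch the next event */
};

/* A toy backend holding exactly one queued event. */
static int toy_pending = 1;

static int toy_pending_p(void) { return toy_pending; }

static void toy_next_event(struct ev_sketch *out)
{
  out->kind = 1;
  out->data = 'a';
  toy_pending = 0;
}

const struct event_methods_sketch toy_backend =
  { toy_pending_p, toy_next_event };

/* Generic code sees only the table, never the backend behind it.
   Returns 1 if an event was fetched into OUT, 0 if none was pending. */
int poll_one_event(const struct event_methods_sketch *m,
                   struct ev_sketch *out)
{
  assert(m != NULL && out != NULL);
  if (!m->event_pending_p())
    return 0;
  m->next_event(out);
  return 1;
}
```

Supporting another window system then amounts to supplying another filled-in table, while @code{poll_one_event()} and its callers stay unchanged.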
8228 | |
8229 Note that the choice of whether to use @file{event-Xt.c} or | |
8230 @file{event-tty.c} is made at compile time! Or at the very latest, it | |
8231 is made at startup time. @file{event-Xt.c} handles events for | |
8232 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X | |
8233 support is not compiled into XEmacs. The reason for this is that there | |
8234 is only one event loop in XEmacs: thus, it needs to be able to receive | |
8235 events from all different kinds of frames. | |
8236 | |
8237 | |
8238 | |
8239 @example | |
8240 @file{keymap.c} | |
8241 @file{keymap.h} | |
8242 @end example | |
8243 | |
8244 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object | |
8245 type and associated methods and primitives. (Remember that keymaps are | |
8246 objects that associate event descriptions with functions to be called to | |
8247 ``execute'' those events; @code{dispatch-event} looks up events in the | |
8248 relevant keymaps.) | |
8249 | |
8250 | |
8251 | |
8252 @example | |
8253 @file{cmdloop.c} | |
8254 @end example | |
8255 | |
8256 @file{cmdloop.c} contains functions that implement the actual editor | |
8257 command loop---i.e. the event loop that cyclically retrieves and | |
8258 dispatches events. This code is also rather tricky, just like | |
8259 @file{event-stream.c}. | |
8260 | |
8261 | |
8262 | |
8263 @example | |
8264 @file{macros.c} | |
8265 @file{macros.h} | |
8266 @end example | |
8267 | |
8268 These two modules contain the basic code for defining keyboard macros. | |
8269 These functions don't actually do much; most of the code that handles keyboard | |
8270 macros is mixed in with the event-handling code in @file{event-stream.c}. | |
8271 | |
8272 | |
8273 | |
8274 @example | |
8275 @file{minibuf.c} | |
8276 @end example | |
8277 | |
8278 This contains some miscellaneous code related to the minibuffer (most of | |
8279 the minibuffer code was moved into Lisp by Richard Mlynarik). This | |
8280 includes the primitives for completion (although filename completion is | |
8281 in @file{dired.c}), the lowest-level interface to the minibuffer (if the | |
8282 command loop were cleaned up, this too could be in Lisp), and code for | |
8283 dealing with the echo area (this, too, was mostly moved into Lisp, and | |
8284 the only code remaining is code to call out to Lisp or provide simple | |
8285 bootstrapping implementations early in temacs, before the echo-area Lisp | |
8286 code is loaded). | |
8287 | |
8288 | |
8289 @node Asynchronous Events; Quit Checking, Evaluation; Stack Frames; Bindings, Events and the Event Loop, Top | |
8290 @chapter Asynchronous Events; Quit Checking | |
8291 @cindex asynchronous events; quit checking | |
8292 @cindex asynchronous events | |
8293 | |
8294 @menu | |
8295 * Signal Handling:: | |
8296 * Control-G (Quit) Checking:: | |
8297 * Profiling:: | |
8298 * Asynchronous Timeouts:: | |
8299 * Exiting:: | |
8300 @end menu | |
8301 | |
8302 @node Signal Handling, Control-G (Quit) Checking, Asynchronous Events; Quit Checking, Asynchronous Events; Quit Checking | |
8303 @section Signal Handling | |
8304 @cindex signal handling | |
8305 | |
8306 @node Control-G (Quit) Checking, Profiling, Signal Handling, Asynchronous Events; Quit Checking | |
8307 @section Control-G (Quit) Checking | |
8308 @cindex Control-g checking | |
8309 @cindex C-g checking | |
8310 @cindex quit checking | |
8311 @cindex QUIT checking | |
8312 @cindex critical quit | |
8313 | |
8314 @emph{Note}: The code to handle QUIT is divided between @file{lisp.h} | |
8315 and @file{signal.c}. There is also some special-case code in the async | |
8316 timer code in @file{event-stream.c} to notice when the poll-for-quit | |
8317 (and poll-for-sigchld) timers have gone off. | |
8318 | |
8319 Here's an overview of how this convoluted stuff works: | |
8320 | |
8321 @enumerate | |
8322 @item | |
8323 | |
8324 Scattered throughout the XEmacs core code are calls to the macro QUIT. | |
8325 This macro checks to see whether a @kbd{C-g} has recently been pressed | |
8326 and not yet handled, and if so, it handles the @kbd{C-g} by calling | |
8327 @code{signal_quit()}, which invokes the standard @code{Fsignal()} code, | |
8328 with the error being @code{Qquit}. Lisp code can establish handlers | |
8329 for this (using @code{condition-case}), but normally there is no | |
8330 handler, and so execution is thrown back to the innermost enclosing | |
8331 event loop. (One of the things that happens when entering an event loop | |
8332 is that a @code{condition-case} is established that catches @strong{all} calls | |
8333 to @code{signal}, including this one.) | |
8334 | |
8335 @item | |
8336 How does the QUIT macro check to see whether @kbd{C-g} has been pressed? | |
8337 Obviously this needs to be extremely fast. Now for some history. | |
8338 In early Lemacs as inherited from the FSF going back 15 years or | |
8339 more, there was a great fondness for using SIGIO (which is sent | |
8340 whenever there is I/O available on a given socket, tty, etc.). | |
8341 In fact, in GNU Emacs, perhaps even today, all reading of events | |
8342 from the X server occurs inside the SIGIO handler! This is crazy, | |
8343 but not completely relevant. What is relevant is that similar | |
8344 stuff happened inside the SIGIO handler for @kbd{C-g}: it searched | |
8345 through all the pending (i.e. not yet delivered to XEmacs) | |
8346 X events for one that matched @kbd{C-g}. When it saw a match, it set | |
8347 @code{Vquit_flag} to @code{Qt}. On TTY's, @kbd{C-g} is actually mapped to be the | |
8348 interrupt character (i.e. it generates SIGINT), and XEmacs's | |
8349 handler for this signal sets @code{Vquit_flag} to @code{Qt}. Then, sometime | |
8350 later after the signal handlers finished and a QUIT macro was | |
8351 called, the macro noticed the setting of @code{Vquit_flag} and used | |
8352 this as an indication to call @code{signal_quit()}. What @code{signal_quit()} | |
8353 actually does is set @code{Vquit_flag} to @code{Qnil} (so that we won't get | |
8354 repeated interruptions from a single @kbd{C-g} press) and then call | |
8355 the equivalent of @code{(signal 'quit nil)}. | |
8356 | |
8357 @item | |
8358 Another complication is introduced in that @code{Vquit_flag} is actually | |
8359 exported to Lisp as @code{quit-flag}. This allows users some level of | |
8360 control over whether and when @kbd{C-g} is processed as quit, esp. in | |
8361 combination with @code{inhibit-quit}. This is another Lisp variable, | |
8362 and if set to non-nil, it inhibits @code{signal_quit()} from getting | |
8363 called, meaning that the @kbd{C-g} gets essentially ignored. But not | |
8364 completely: Because the resetting of @code{quit-flag} happens only | |
8365 in @code{signal_quit()}, which isn't getting called, the @kbd{C-g} press is | |
8366 still noticed, and as soon as @code{inhibit-quit} is set back to nil, | |
8367 a quit will be signalled at the next QUIT macro. Thus, what | |
8368 @code{inhibit-quit} really does is defer quits until after the | |
8369 quit-inhibited period. | |
8370 | |
8371 @item | |
8372 Another consideration, introduced by XEmacs, is critical quitting. If | |
8373 you press @kbd{Control-Shift-G} instead of just @kbd{C-g}, | |
8374 @code{quit-flag} is set to @code{critical} instead of to t. When QUIT | |
8375 processes this value, it @strong{ignores} the value of | |
8376 @code{inhibit-quit}. This allows you to quit even out of a | |
8377 quit-inhibited section of code! Furthermore, when @code{signal_quit()} | |
8378 notices that it was invoked as a result of a critical quit, it | |
8379 automatically invokes the debugger (which otherwise would only happen | |
8380 when @code{debug-on-quit} is set to t). | |
8381 | |
8382 @item | |
8383 Well, I explained above about how @code{quit-flag} gets set correctly, | |
8384 but I began with a disclaimer stating that this was the old way | |
8385 of doing things. What's done now? Well, first of all, the SIGIO | |
8386 handler (which formerly checked all pending events to see if there's | |
8387 a @kbd{C-g}) now does nothing but set a flag -- or actually two flags, | |
8388 @code{something_happened} and @code{quit_check_signal_happened}. There are two | |
8389 flags because the QUIT macro is now used for more than just handling | |
8390 QUIT; it's also used for running asynchronous timeout handlers that | |
8391 have recently expired, and perhaps other things. The idea here is | |
8392 that the QUIT macros occur extremely often in the code, but only occur | |
8393 at places that are relatively safe -- in particular, if an error occurs, | |
8394 nothing will get completely trashed. | |
8395 | |
8396 @item | |
8397 Now, let's look at QUIT again. | |
8398 | |
8399 @item | |
8400 | |
8401 UNFINISHED. Note, however, that as of the point when this comment got | |
8402 committed to CVS (mid-2001), the interaction between reading @kbd{C-g} | |
8403 as an event and processing it as QUIT was overhauled to (for the first | |
8404 time) be understandable and actually work correctly. Now, the way | |
8405 things work is that if @kbd{C-g} is pressed while XEmacs is blocking at | |
8406 the top level, waiting for a user event, it will be read as an event; | |
8407 otherwise, it will cause QUIT. (This includes times when XEmacs is | |
8408 blocking, but not waiting for a user event, | |
8409 e.g. @code{accept-process-output} and | |
8410 @code{wait_delaying_user_events()}.) Formerly, this was supposed to | |
8411 happen, but didn't always due to a bizarre and broken scheme, documented | |
8412 in @code{next_event_internal} like this: | |
8413 | |
8414 @quotation | |
8415 If we read a @kbd{C-g}, then set @code{quit-flag} but do not discard the | |
8416 @kbd{C-g}. The callers of @code{next_event_internal()} will do one of | |
8417 two things: | |
8418 | |
8419 @enumerate | |
8420 @item | |
8421 set @code{Vquit_flag} to Qnil. (@code{next-event} does this.) This will | |
8422 cause the ^G to be treated as a normal keystroke. | |
8423 | |
8424 @item | |
8425 not change @code{Vquit_flag} but attempt to enqueue the ^G, at which | |
8426 point it will be discarded. The next time QUIT is called, it will | |
8427 notice that @code{Vquit_flag} was set. | |
8428 @end enumerate | |
8429 @end quotation | |
8430 | |
8431 This required weirdness in @code{enqueue_command_event_1} like this: | |
8432 | |
8433 @quotation | |
8434 put the event on the typeahead queue, unless the event is the quit char, | |
8435 in which case the @code{QUIT} which will occur on the next trip through this | |
8436 loop is all the processing we should do - leaving it on the queue would | |
8437 cause the quit to be processed twice. | |
8438 @end quotation | |
8439 | |
8440 And further weirdness elsewhere, none of which made any sense, and | |
8441 didn't work, because (e.g.) it required that QUIT never happen anywhere | |
8442 inside @code{next_event_internal()} or any callers when @kbd{C-g} should | |
8443 be read as a user event, which was impossible to implement in practice. | |
8444 | |
8445 Now what we do is fairly simple. Callers of | |
8446 @code{next_event_internal()} that want @kbd{C-g} read as a user event | |
8447 call @code{begin_dont_check_for_quit()}. @code{next_event_internal()}, | |
8448 when it gets a @kbd{C-g}, simply sets @code{Vquit_flag} (just as when a | |
8449 @kbd{C-g} is detected during the operation of @code{QUIT} or | |
8450 @code{QUITP}), and then tries to @code{QUIT}. This will fail if blocked | |
8451 by the previous call, at which point @code{next_event_internal()} will | |
8452 return the @kbd{C-g} as an event. To unblock things, first set | |
8453 @code{Vquit_flag} to nil (it was set to t when the @kbd{C-g} was read, | |
8454 and if we don't reset it, the next call to @code{QUIT} will quit), and | |
8455 then @code{unbind_to()} the depth returned by | |
8456 @code{begin_dont_check_for_quit()}. It makes no difference if | |
8457 @code{QUIT} is called a zillion times in @code{next_event_internal()} or | |
8458 anywhere else, because it's blocked and will never signal. | |
8459 @end enumerate | |
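
The flag semantics described in this section can be condensed into a small model. This is a sketch only: the names ending in @code{_sketch} are hypothetical, plain integers stand in for the Lisp values, and it assumes that the block installed by @code{begin_dont_check_for_quit()} suppresses critical quits as well, which the text above does not state explicitly.

```c
#include <assert.h>

enum quit_state { QUIT_NIL, QUIT_T, QUIT_CRITICAL };

static volatile int quit_flag_sketch = QUIT_NIL; /* models quit-flag */
static int inhibit_quit_sketch;                  /* models inhibit-quit */
static int dont_check_sketch;  /* models the begin_dont_check_for_quit()
                                  block (assumed to stop critical quits) */

/* Models the decision the QUIT macro makes.  Returns 1 when a quit
   would be signalled (after resetting the flag, as signal_quit() does),
   0 when nothing happens. */
int quit_check_sketch(void)
{
  assert(quit_flag_sketch >= QUIT_NIL && quit_flag_sketch <= QUIT_CRITICAL);
  if (dont_check_sketch || quit_flag_sketch == QUIT_NIL)
    return 0;
  if (inhibit_quit_sketch && quit_flag_sketch != QUIT_CRITICAL)
    return 0;                   /* deferred: the flag stays set */
  quit_flag_sketch = QUIT_NIL;  /* one C-g yields exactly one quit */
  return 1;
}
```

The model shows the three behaviors in one place: an inhibited quit is deferred rather than lost, a critical quit ignores the inhibition, and a blocked check never signals no matter how often it runs.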
8460 | |
8461 @node Profiling, Asynchronous Timeouts, Control-G (Quit) Checking, Asynchronous Events; Quit Checking | |
8462 @section Profiling | |
8463 @cindex profiling | |
8464 @cindex SIGPROF | |
8465 | |
8466 We implement our own profiling scheme so that we can determine | |
8467 things like which Lisp functions are occupying the most time. Any | |
8468 standard OS-provided profiling works on C functions, which is | |
8469 not always that useful -- and inconvenient, since it requires compiling | |
8470 with profile info and can't be retrieved dynamically, as XEmacs is | |
8471 running. | |
8472 | |
8473 The basic idea is simple. We set a profiling timer using | |
8474 @code{setitimer(ITIMER_PROF)}, which generates a SIGPROF every so often. (This runs not | |
8475 in real time but rather when the process is executing or the system is | |
8476 running on behalf of the process -- at least, that is the case under | |
8477 Unix. Under MS Windows and Cygwin, there is no @code{setitimer()}, so we | |
8478 simulate it using multimedia timers, which run in real time. To make | |
8479 the results a bit more realistic, we ignore ticks that go off while | |
8480 blocking on an event wait. Note that Cygwin does provide a simulation | |
8481 of @code{setitimer()}, but it's in real time anyway, since Windows doesn't | |
8482 provide a way to have process-time timers, and furthermore, it's broken, | |
8483 so we don't use it.) When the signal goes off, we see what function we're in, and | |
8484 add 1 to the count associated with that function. | |
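
On a POSIX system the timer-plus-handler arrangement looks roughly like the sketch below. @code{sigaction()} and @code{setitimer()} are the standard calls; everything else (the @code{_sketch} names, and the use of a small integer index in place of inspecting the top backtrace frame) is a simplifying assumption rather than the real profiling code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

#define MAX_FUNCS_SKETCH 16

/* Index of the "function" currently running; a stand-in for looking at
   the top backtrace frame. */
static volatile sig_atomic_t current_func_sketch;
static volatile sig_atomic_t tick_counts_sketch[MAX_FUNCS_SKETCH];

/* SIGPROF handler: add 1 to the count of whatever we are in. */
static void sigprof_handler_sketch(int signo)
{
  (void) signo;
  tick_counts_sketch[current_func_sketch]++;
}

/* Arm a process-time timer that fires every INTERVAL_USEC microseconds.
   Returns 0 on success, -1 on failure. */
int start_profile_sketch(long interval_usec)
{
  struct sigaction sa;
  struct itimerval tv;

  assert(interval_usec > 0);
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = sigprof_handler_sketch;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGPROF, &sa, NULL) < 0)
    return -1;

  tv.it_interval.tv_sec = interval_usec / 1000000;
  tv.it_interval.tv_usec = interval_usec % 1000000;
  tv.it_value = tv.it_interval;
  return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer; an all-zero value stops it. */
int stop_profile_sketch(void)
{
  struct itimerval tv;
  memset(&tv, 0, sizeof tv);
  return setitimer(ITIMER_PROF, &tv, NULL);
}
```

The handler does nothing but an increment, which keeps it safe to run at any point in the interrupted code.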
8485 | |
8486 It would be nice to use the Lisp allocation mechanism etc. to keep track | |
8487 of the profiling information (i.e. to use Lisp hash tables), but we | |
8488 can't because that's not safe -- updating the timing information happens | |
8489 inside of a signal handler, so we can't rely on not being in the middle | |
8490 of Lisp allocation, garbage collection, @code{malloc()}, etc. Trying to make | |
8491 it work would be much more work than it's worth. Instead we use a basic | |
8492 (non-Lisp) hash table, which will not conflict with garbage collection | |
8493 or anything else as long as it doesn't try to resize itself. Resizing | |
8494 itself, however (which happens as a result of a @code{puthash()}), could be | |
8495 deadly. To avoid this, we make sure, at points where it's safe | |
8496 (e.g. @code{profile_record_about_to_call()} -- recording the entry into a | |
8497 function call), that the table always has some breathing room in it so | |
8498 that no resizes will occur until at least that many items are added. | |
8499 This is safe because any new item to be added in the sigprof would | |
8500 likely have the @code{profile_record_about_to_call()} called just before it, | |
8501 and the breathing room is checked. | |
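
The breathing-room idea can be sketched with a small open-addressing table: all allocation happens in a function called only at safe points, and the handler-side update never resizes. The names (@code{struct ptab_sketch}, @code{ptab_ensure_room()}, @code{ptab_bump()}) and the margin of 8 slots are hypothetical choices for this illustration, not values from the XEmacs source.

```c
#include <assert.h>
#include <stdlib.h>

#define BREATHING_ROOM_SKETCH 8  /* free slots guaranteed at safe points */

struct ptab_sketch
{
  const void **keys;  /* NULL marks an empty slot */
  long *counts;
  size_t size;        /* number of slots */
  size_t used;        /* number of occupied slots */
};

/* Find the slot for KEY by linear probing.  Never allocates. */
static size_t ptab_slot(const struct ptab_sketch *t, const void *key)
{
  size_t i = ((size_t) key / sizeof (void *)) % t->size;
  while (t->keys[i] != NULL && t->keys[i] != key)
    i = (i + 1) % t->size;
  return i;
}

/* Safe-point operation: grow until at least BREATHING_ROOM_SKETCH slots
   (plus one spare so probing terminates) are free.  May allocate, so it
   must never run in a signal handler.  Returns 0, or -1 on failure. */
int ptab_ensure_room(struct ptab_sketch *t)
{
  while (t->size < t->used + BREATHING_ROOM_SKETCH + 1)
    {
      struct ptab_sketch n;
      size_t i;
      n.size = t->size ? t->size * 2 : 16;
      n.used = t->used;
      n.keys = calloc(n.size, sizeof *n.keys);
      n.counts = calloc(n.size, sizeof *n.counts);
      if (!n.keys || !n.counts)
        {
          free(n.keys);
          free(n.counts);
          return -1;
        }
      for (i = 0; i < t->size; i++)
        if (t->keys[i] != NULL)
          {
            size_t j = ptab_slot(&n, t->keys[i]);
            n.keys[j] = t->keys[i];
            n.counts[j] = t->counts[i];
          }
      free(t->keys);
      free(t->counts);
      *t = n;
    }
  return 0;
}

/* Handler-side operation: bump the count for KEY without ever resizing.
   The assert documents the invariant for this sketch only. */
void ptab_bump(struct ptab_sketch *t, const void *key)
{
  size_t i;
  assert(t->used < t->size);  /* guaranteed by ptab_ensure_room() */
  i = ptab_slot(t, key);
  if (t->keys[i] == NULL)
    {
      t->keys[i] = key;
      t->used++;
    }
  t->counts[i]++;
}
```

As long as @code{ptab_ensure_room()} runs at least once for every new key that could later be bumped, the handler-side path never needs to allocate.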
8502 | |
8503 In general: any entry that the sigprof handler puts into the table comes | |
8504 from a backtrace frame (except "Processing Events at Top Level", and | |
8505 there's only one of those). Either that backtrace frame was added when | |
8506 profiling was on (in which case @code{profile_record_about_to_call()} was | |
8507 called and the breathing space updated), or when it was off -- and in | |
8508 this case, no such frames can have been added since the last time | |
8509 @code{start-profile} was called, so when @code{start-profile} is called we make | |
8510 sure there is sufficient breathing room to account for all entries | |
8511 currently on the stack. | |
8512 | |
8513 Jan 1998: In addition to timing info, I have added code to remember call | |
8514 counts of Lisp funcalls. The @code{profile_increase_call_count()} | |
8515 function is called from @code{Ffuncall()}, and serves to add data to | |
8516 @code{Vcall_count_profile_table}. This mechanism is much simpler and | |
8517 independent of the SIGPROF-driven one. It uses the Lisp allocation | |
8518 mechanism normally, since it is not called from a handler. It may | |
8519 even be useful to provide a way to turn on only one profiling | |
8520 mechanism, but I haven't done so yet. --hniksic | |
8521 | |
8522 Dec 2002: Total overhaul of the interface, making it sane and easier to | |
8523 use. --ben | |
8524 | |
8525 Feb 2003: Lots of rewriting of the internal code. Add GC-consing-usage, | |
8526 total GC usage, and total timing to the information tracked. Track | |
8527 profiling overhead and allow the ability to have internal sections | |
8528 (e.g. internal-external conversion, byte-char conversion) that are | |
8529 treated like Lisp functions for the purpose of profiling. --ben | |
8530 | |
8531 BEWARE: If you are modifying the profiling code, be @strong{very} careful. Correctly | |
8532 implementing the "total" values is very tricky due to the possibility of | |
8533 recursion and of functions already on the stack when starting to | |
8534 profile/still on the stack when stopping. | |
8535 | |
8536 @node Asynchronous Timeouts, Exiting, Profiling, Asynchronous Events; Quit Checking | |
8537 @section Asynchronous Timeouts | |
8538 @cindex asynchronous timeouts | |
8539 | |
8540 @node Exiting, , Asynchronous Timeouts, Asynchronous Events; Quit Checking | |
8541 @section Exiting | |
8542 @cindex exiting | |
8543 @cindex crash | |
8544 @cindex hang | |
8545 @cindex core dump | |
8546 @cindex Armageddon | |
8547 @cindex exits, expected and unexpected | |
8548 @cindex unexpected exits | |
8549 @cindex expected exits | |
8550 | |
8551 Ben's capsule summary about expected and unexpected exits from XEmacs. | |
8552 | |
8553 Expected exits occur when the user directs XEmacs to exit, for example | |
8554 by pressing the close button on the only frame in XEmacs, or by typing | |
8555 @kbd{C-x C-c}. This runs @code{save-buffers-kill-emacs}, which saves | |
8556 any necessary buffers, and then exits using the primitive | |
8557 @code{kill-emacs}. | |
8558 | |
8559 However, unexpected exits occur in a few different ways: | |
8560 | |
8561 @itemize @bullet | |
8562 @item | |
8563 A memory access violation or other hardware-generated exception occurs. | |
8564 This is the worst possible problem to deal with, because the fault can | |
8565 occur while XEmacs is in any state whatsoever, even quite unstable ones. | |
8566 As a result, we need to be @strong{extremely} careful what we do. | |
8567 | |
8568 @item | |
8569 We are using one X display (or if we've used more, we've closed the | |
8570 others already), and some hardware or other problem happens and | |
8571 suddenly we've lost our connection to the display. In this situation, | |
8572 things are not so dire as in the last one; our code itself isn't | |
8573 trashed, so we can continue execution as normal, after having set | |
8574 things up so that we can exit at the appropriate time. Our exit | |
8575 still needs to be of the emergency nature; we have no displays, so | |
8576 any attempts to use them will fail. We simply want to auto-save | |
8577 (the single most important thing to do during shut-down), do minimal | |
8578 cleanup of stuff that has an independent existence outside of XEmacs, | |
8579 and exit. | |
8580 @end itemize | |
8581 | |
8582 Currently, both unexpected exit scenarios described above set | |
8583 @code{preparing_for_armageddon} to indicate that nonessential and possibly | |
8584 dangerous things should not be done, specifically: | |
8585 | |
8586 @itemize @minus | |
8587 @item | |
8588 no garbage collection. | |
8589 @item | |
8590 no hooks are run. | |
8591 @item | |
8592 no messages of any sort from autosaving. | |
8593 @item | |
8594 autosaving tries harder, ignoring certain failures. | |
8595 @item | |
8596 existing frames are not deleted. | |
8597 @end itemize | |
8598 | |
8599 (Also, all places that set @code{preparing_for_armageddon} also | |
8600 set @code{dont_check_for_quit}. This happens separately because it's | |
8601 also necessary to set other variables to make absolutely sure | |
8602 no quitting happens.) | |
8603 | |
8604 In the first scenario above (the access violation), we also set | |
8605 @code{fatal_error_in_progress}. This causes more things to not happen: | |
8606 | |
8607 @itemize @minus | |
8608 @item | |
8609 assertion failures do not abort. | |
8610 @item | |
8611 printing code does not do code conversion or gettext when | |
8612 printing to stdout/stderr. | |
8613 @end itemize | |
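
The relationship between the two flags can be summarized in a short model. Only the flag names come from the text above; the @code{_sketch} variables, the enumeration and the predicate functions are hypothetical, and the model covers just three of the listed effects.

```c
#include <assert.h>

/* Models of the flags named in the text; the real ones are C globals
   in the XEmacs core. */
static int preparing_for_armageddon_sketch;  /* both unexpected exits */
static int fatal_error_in_progress_sketch;   /* access violation only */
static int dont_check_for_quit_sketch;       /* set alongside the first */

enum exit_cause_sketch { LOST_DISPLAY_SKETCH, ACCESS_VIOLATION_SKETCH };

/* Enter emergency-exit mode for the given cause. */
void begin_emergency_exit_sketch(enum exit_cause_sketch cause)
{
  preparing_for_armageddon_sketch = 1;
  dont_check_for_quit_sketch = 1;
  if (cause == ACCESS_VIOLATION_SKETCH)
    fatal_error_in_progress_sketch = 1;
  /* A fatal error always implies armageddon preparations. */
  assert(!fatal_error_in_progress_sketch || preparing_for_armageddon_sketch);
}

/* Predicates that code doing optional work would consult. */
int may_garbage_collect_sketch(void)
{
  return !preparing_for_armageddon_sketch;
}

int may_run_hooks_sketch(void)
{
  return !preparing_for_armageddon_sketch;
}

int assertions_abort_sketch(void)
{
  return !fatal_error_in_progress_sketch;
}
```

The point of the split is that a lost display only disables nonessential work, while an access violation additionally disables anything that could fail recursively, such as aborting on a failed assertion.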
8614 | |
8615 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Asynchronous Events; Quit Checking, Top | |
8046 @chapter Evaluation; Stack Frames; Bindings | 8616 @chapter Evaluation; Stack Frames; Bindings |
8047 @cindex evaluation; stack frames; bindings | 8617 @cindex evaluation; stack frames; bindings |
8048 @cindex stack frames; bindings, evaluation; | 8618 @cindex stack frames; bindings, evaluation; |
8049 @cindex bindings, evaluation; stack frames; | 8619 @cindex bindings, evaluation; stack frames; |
8050 | 8620 |
8051 @menu | 8621 @menu |
8052 * Evaluation:: | 8622 * Evaluation:: |
8053 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: | 8623 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: |
8054 * Simple Special Forms:: | 8624 * Simple Special Forms:: |
8055 * Catch and Throw:: | 8625 * Catch and Throw:: |
8056 @end menu | 8626 @end menu |
8057 | 8627 |
8058 @node Evaluation | 8628 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings |
8059 @section Evaluation | 8629 @section Evaluation |
8060 @cindex evaluation | 8630 @cindex evaluation |
8061 | 8631 |
8062 @code{Feval()} evaluates the form (a Lisp object) that is passed to | 8632 @code{Feval()} evaluates the form (a Lisp object) that is passed to |
8063 it. Note that evaluation is only non-trivial for two types of objects: | 8633 it. Note that evaluation is only non-trivial for two types of objects: |
8184 @code{call3()} call a function, passing it the argument(s) given (the | 8754 @code{call3()} call a function, passing it the argument(s) given (the |
8185 arguments are given as separate C arguments rather than being passed as | 8755 arguments are given as separate C arguments rather than being passed as |
8186 an array). @code{apply1()} uses @code{Fapply()} while the others use | 8756 an array). @code{apply1()} uses @code{Fapply()} while the others use |
8187 @code{Ffuncall()} to do the real work. | 8757 @code{Ffuncall()} to do the real work. |
8188 | 8758 |
8189 @node Dynamic Binding; The specbinding Stack; Unwind-Protects | 8759 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings |
8190 @section Dynamic Binding; The specbinding Stack; Unwind-Protects | 8760 @section Dynamic Binding; The specbinding Stack; Unwind-Protects |
8191 @cindex dynamic binding; the specbinding stack; unwind-protects | 8761 @cindex dynamic binding; the specbinding stack; unwind-protects |
8192 @cindex binding; the specbinding stack; unwind-protects, dynamic | 8762 @cindex binding; the specbinding stack; unwind-protects, dynamic |
8193 @cindex specbinding stack; unwind-protects, dynamic binding; the | 8763 @cindex specbinding stack; unwind-protects, dynamic binding; the |
8194 @cindex unwind-protects, dynamic binding; the specbinding stack; | 8764 @cindex unwind-protects, dynamic binding; the specbinding stack; |
8242 a local-variable binding (@code{func} is 0, @code{symbol} is not | 8812 a local-variable binding (@code{func} is 0, @code{symbol} is not |
8243 @code{nil}, and @code{old_value} holds the old value, which is stored as | 8813 @code{nil}, and @code{old_value} holds the old value, which is stored as |
8244 the symbol's value). | 8814 the symbol's value). |
8245 @end enumerate | 8815 @end enumerate |
8246 | 8816 |
8247 @node Simple Special Forms | 8817 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings |
8248 @section Simple Special Forms | 8818 @section Simple Special Forms |
8249 @cindex special forms, simple | 8819 @cindex special forms, simple |
8250 | 8820 |
8251 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, | 8821 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, |
8252 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, | 8822 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, |
8260 Note that, with the exception of @code{Fprogn}, these functions are | 8830 Note that, with the exception of @code{Fprogn}, these functions are |
8261 typically called in real life only in interpreted code, since the byte | 8831 typically called in real life only in interpreted code, since the byte |
8262 compiler knows how to convert calls to these functions directly into | 8832 compiler knows how to convert calls to these functions directly into |
8263 byte code. | 8833 byte code. |
8264 | 8834 |
8265 @node Catch and Throw | 8835 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings |
8266 @section Catch and Throw | 8836 @section Catch and Throw |
8267 @cindex catch and throw | 8837 @cindex catch and throw |
8268 @cindex throw, catch and | 8838 @cindex throw, catch and |
8269 | 8839 |
8270 @example | 8840 @example |
8321 the values of @code{gcprolist}, @code{backtrace_list}, and | 8891 the values of @code{gcprolist}, @code{backtrace_list}, and |
8322 @code{lisp_eval}, and calls @code{unbind_to()} to undo any specbindings | 8892 @code{lisp_eval}, and calls @code{unbind_to()} to undo any specbindings |
8323 created since the catch. | 8893 created since the catch. |
8324 | 8894 |
8325 | 8895 |
8326 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top | 8896 @node Symbols and Variables, Buffers, Evaluation; Stack Frames; Bindings, Top |
8327 @chapter Symbols and Variables | 8897 @chapter Symbols and Variables |
8328 @cindex symbols and variables | 8898 @cindex symbols and variables |
8329 @cindex variables, symbols and | 8899 @cindex variables, symbols and |
8330 | 8900 |
8331 @menu | 8901 @menu |
8332 * Introduction to Symbols:: | 8902 * Introduction to Symbols:: |
8333 * Obarrays:: | 8903 * Obarrays:: |
8334 * Symbol Values:: | 8904 * Symbol Values:: |
8335 @end menu | 8905 @end menu |
8336 | 8906 |
8337 @node Introduction to Symbols | 8907 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables |
8338 @section Introduction to Symbols | 8908 @section Introduction to Symbols |
8339 @cindex symbols, introduction to | 8909 @cindex symbols, introduction to |
8340 | 8910 |
8341 A symbol is basically just an object with four fields: a name (a | 8911 A symbol is basically just an object with four fields: a name (a |
8342 string), a value (some Lisp object), a function (some Lisp object), and | 8912 string), a value (some Lisp object), a function (some Lisp object), and |
8350 there can be a distinct function and variable with the same name. The | 8920 there can be a distinct function and variable with the same name. The |
8351 property list is used as a more general mechanism of associating | 8921 property list is used as a more general mechanism of associating |
8352 additional values with particular names, and once again the namespace is | 8922 additional values with particular names, and once again the namespace is |
8353 independent of the function and variable namespaces. | 8923 independent of the function and variable namespaces. |
8354 | 8924 |
8355 @node Obarrays | 8925 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables |
8356 @section Obarrays | 8926 @section Obarrays |
8357 @cindex obarrays | 8927 @cindex obarrays |
8358 | 8928 |
8359 The identity of symbols with their names is accomplished through a | 8929 The identity of symbols with their names is accomplished through a |
8360 structure called an obarray, which is just a poorly-implemented hash | 8930 structure called an obarray, which is just a poorly-implemented hash |
8418 a new one, and @code{unintern} to remove a symbol from an obarray. This | 8988 a new one, and @code{unintern} to remove a symbol from an obarray. This |
8419 returns the removed symbol. (Remember: You can't put the symbol back | 8989 returns the removed symbol. (Remember: You can't put the symbol back |
8420 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols | 8990 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols |
8421 in an obarray. | 8991 in an obarray. |
8422 | 8992 |
8423 @node Symbol Values | 8993 @node Symbol Values, , Obarrays, Symbols and Variables |
8424 @section Symbol Values | 8994 @section Symbol Values |
8425 @cindex symbol values | 8995 @cindex symbol values |
8426 @cindex values, symbol | 8996 @cindex values, symbol |
8427 | 8997 |
8428 The value field of a symbol normally contains a Lisp object. However, | 8998 The value field of a symbol normally contains a Lisp object. However, |
8463 | 9033 |
8464 The exact workings of this are rather complex and involved and are | 9034 The exact workings of this are rather complex and involved and are |
8465 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and | 9035 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and |
8466 @file{lisp.h}. | 9036 @file{lisp.h}. |
8467 | 9037 |
8468 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top | 9038 @node Buffers, Text, Symbols and Variables, Top |
8469 @chapter Buffers and Textual Representation | 9039 @chapter Buffers |
8470 @cindex buffers and textual representation | 9040 @cindex buffers |
8471 @cindex textual representation, buffers and | |
8472 | 9041 |
8473 @menu | 9042 @menu |
8474 * Introduction to Buffers:: A buffer holds a block of text such as a file. | 9043 * Introduction to Buffers:: A buffer holds a block of text such as a file. |
8475 * The Text in a Buffer:: Representation of the text in a buffer. | |
8476 * Buffer Lists:: Keeping track of all buffers. | 9044 * Buffer Lists:: Keeping track of all buffers. |
8477 * Markers and Extents:: Tagging locations within a buffer. | 9045 * Markers and Extents:: Tagging locations within a buffer. |
8478 * Ibytes and Ichars:: Representation of individual characters. | |
8479 * The Buffer Object:: The Lisp object corresponding to a buffer. | 9046 * The Buffer Object:: The Lisp object corresponding to a buffer. |
8480 * Searching and Matching:: Higher-level algorithms. | |
8481 @end menu | 9047 @end menu |
8482 | 9048 |
8483 @node Introduction to Buffers | 9049 @node Introduction to Buffers, Buffer Lists, Buffers, Buffers |
8484 @section Introduction to Buffers | 9050 @section Introduction to Buffers |
8485 @cindex buffers, introduction to | 9051 @cindex buffers, introduction to |
8486 | 9052 |
8487 A buffer is logically just a Lisp object that holds some text. | 9053 A buffer is logically just a Lisp object that holds some text. |
8488 In this, it is like a string, but a buffer is optimized for | 9054 In this, it is like a string, but a buffer is optimized for |
8532 and @dfn{buffer of the selected window}, and the distinction between | 9098 and @dfn{buffer of the selected window}, and the distinction between |
8533 @dfn{point} of the current buffer and @dfn{window-point} of the selected | 9099 @dfn{point} of the current buffer and @dfn{window-point} of the selected |
8534 window. (This latter distinction is explained in detail in the section | 9100 window. (This latter distinction is explained in detail in the section |
8535 on windows.) | 9101 on windows.) |
8536 | 9102 |
8537 @node The Text in a Buffer | 9103 @node Buffer Lists, Markers and Extents, Introduction to Buffers, Buffers |
9104 @section Buffer Lists | |
9105 @cindex buffer lists | |
9106 | |
9107 Recall from earlier that buffers are @dfn{permanent} objects, i.e. that | |
9108 they remain around until explicitly deleted. This entails that there is | |
9109 a list of all the buffers in existence. This list is actually an | |
9110 assoc-list (mapping from the buffer's name to the buffer) and is stored | |
9111 in the global variable @code{Vbuffer_alist}. | |
9112 | |
9113 The order of the buffers in the list is important: the buffers are | |
9114 ordered approximately from most-recently-used to least-recently-used. | |
9115 Switching to a buffer using @code{switch-to-buffer}, | |
9116 @code{pop-to-buffer}, etc. and switching windows using | |
9117 @code{other-window}, etc. usually brings the new current buffer to the | |
9118 front of the list. @code{switch-to-buffer}, @code{other-buffer}, | |
9119 etc. look at the beginning of the list to find an alternative buffer to | |
9120 suggest. You can also explicitly move a buffer to the end of the list | |
9121 using @code{bury-buffer}. | |
9122 | |
9123 In addition to the global ordering in @code{Vbuffer_alist}, each frame | |
9124 has its own ordering of the list. These lists always contain the same | |
9125 elements as in @code{Vbuffer_alist} although possibly in a different | |
9126 order. @code{buffer-list} normally returns the list for the selected | |
9127 frame. This allows you to work in separate frames without things | |
9128 interfering with each other. | |
9129 | |
9130 The standard way to look up a buffer given a name is | |
9131 @code{get-buffer}, and the standard way to create a new buffer is | |
9132 @code{get-buffer-create}, which looks up a buffer with a given name, | |
9133 creating a new one if necessary. These operations correspond exactly | |
9134 with the symbol operations @code{intern-soft} and @code{intern}, | |
9135 respectively. You can also force a new buffer to be created using | |
9136 @code{generate-new-buffer}, which takes a name and (if necessary) makes | |
9137 a unique name from this by appending a number, and then creates the | |
9138 buffer. This is basically like the symbol operation @code{gensym}. | |
9139 | |
9140 @node Markers and Extents, The Buffer Object, Buffer Lists, Buffers | |
9141 @section Markers and Extents | |
9142 @cindex markers and extents | |
9143 @cindex extents, markers and | |
9144 | |
9145 Among the things associated with a buffer are things that are | |
9146 logically attached to certain buffer positions. This can be used to | |
9147 keep track of a buffer position when text is inserted and deleted, so | |
9148 that it remains at the same spot relative to the text around it; to | |
9149 assign properties to particular sections of text; etc. There are two | |
9150 such objects that are useful in this regard: they are @dfn{markers} and | |
9151 @dfn{extents}. | |
9152 | |
9153 A @dfn{marker} is simply a flag placed at a particular buffer | |
9154 position, which is moved around as text is inserted and deleted. | |
9155 Markers are used for all sorts of purposes, such as the @code{mark} that | |
9156 is the other end of textual regions to be cut, copied, etc. | |
9157 | |
9158 An @dfn{extent} is similar to two markers plus some associated | |
9159 properties, and is used to keep track of regions in a buffer as text is | |
9160 inserted and deleted, and to add properties (e.g. fonts) to particular | |
9161 regions of text. The external interface of extents is explained | |
9162 elsewhere. | |
9163 | |
9164 The important thing here is that markers and extents simply contain | |
9165 buffer positions in them as integers, and every time text is inserted or | |
9166 deleted, these positions must be updated. In order to minimize the | |
9167 amount of shuffling that needs to be done, the positions in markers and | |
9168 extents (there's one per marker, two per extent) are stored in Membpos's. | |
9169 This means that they only need to be moved when the text is physically | |
9170 moved in memory; since the gap structure tries to minimize this, it also | |
9171 minimizes the number of marker and extent indices that need to be | |
9172 adjusted. Look in @file{insdel.c} for the details of how this works. | |
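The effect of storing memory positions can be illustrated with a minimal sketch. It is not the code in @file{insdel.c}: the struct and function names are hypothetical, positions are plain zero-based indices, and the convention that a position exactly at the gap maps past the gap is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified gap description: the gap occupies memory
   indices [gap_start, gap_start + gap_size). */
struct sketch_gap {
    size_t gap_start;
    size_t gap_size;
};

/* Buffer position (gap ignored) -> memory index (gap included).
   Assumption of this sketch: a position at the gap maps past it. */
size_t sketch_bufpos_to_mempos(const struct sketch_gap *g, size_t bufpos)
{
    return bufpos < g->gap_start ? bufpos : bufpos + g->gap_size;
}

/* Memory index -> buffer position.  The index must not lie inside the gap. */
size_t sketch_mempos_to_bufpos(const struct sketch_gap *g, size_t mempos)
{
    assert(mempos < g->gap_start || mempos >= g->gap_start + g->gap_size);
    return mempos < g->gap_start ? mempos : mempos - g->gap_size;
}

/* Inserting n units of text at the gap only shrinks the gap.  No stored
   memory index changes, yet every position after the gap now reads back
   as n larger. */
void sketch_insert_at_gap(struct sketch_gap *g, size_t n)
{
    assert(n <= g->gap_size);
    g->gap_start += n;
    g->gap_size -= n;
}
```

In this sketch a marker is just a stored memory index. Stored indices would need rewriting only when the gap itself is moved or resized, which is the shuffling the gap structure tries to keep rare.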
9173 | |
9174 One other important distinction is that markers are @dfn{temporary} | |
9175 while extents are @dfn{permanent}. This means that markers disappear as | |
9176 soon as there are no more pointers to them, and correspondingly, there | |
9177 is no way to determine what markers are in a buffer if you are just | |
9178 given the buffer. Extents remain in a buffer until they are detached | |
9179 (which could happen as a result of text being deleted) or the buffer is | |
9180 deleted, and primitives do exist to enumerate the extents in a buffer. | |
9181 | |
9182 @node The Buffer Object, , Markers and Extents, Buffers | |
9183 @section The Buffer Object | |
9184 @cindex buffer object, the | |
9185 @cindex object, the buffer | |
9186 | |
9187 Buffers contain fields not directly accessible by the Lisp programmer. | |
9188 We describe them here, naming them by the names used in the C code. | |
9189 Many are accessible indirectly in Lisp programs via Lisp primitives. | |
9190 | |
9191 @table @code | |
9192 @item name | |
9193 The buffer name is a string that names the buffer. It is guaranteed to | |
9194 be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Reference | |
9195 Manual}. | |
9196 | |
9197 @item save_modified | |
9198 This field contains the time when the buffer was last saved, as an | |
9199 integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference | |
9200 Manual}. | |
9201 | |
9202 @item modtime | |
9203 This field contains the modification time of the visited file. It is | |
9204 set when the file is written or read. Every time the buffer is written | |
9205 to the file, this field is compared to the modification time of the | |
9206 file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference | |
9207 Manual}. | |
9208 | |
9209 @item auto_save_modified | |
9210 This field contains the time when the buffer was last auto-saved. | |
9211 | |
9212 @item last_window_start | |
9213 This field contains the @code{window-start} position in the buffer as of | |
9214 the last time the buffer was displayed in a window. | |
9215 | |
9216 @item undo_list | |
9217 This field points to the buffer's undo list. @xref{Undo,,, lispref, | |
9218 XEmacs Lisp Reference Manual}. | |
9219 | |
9220 @item syntax_table_v | |
9221 This field contains the syntax table for the buffer. @xref{Syntax | |
9222 Tables,,, lispref, XEmacs Lisp Reference Manual}. | |
9223 | |
9224 @item downcase_table | |
9225 This field contains the conversion table for converting text to lower | |
9226 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. | |
9227 | |
9228 @item upcase_table | |
9229 This field contains the conversion table for converting text to upper | |
9230 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. | |
9231 | |
9232 @item case_canon_table | |
9233 This field contains the conversion table for canonicalizing text for | |
9234 case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp | |
9235 Reference Manual}. | |
9236 | |
9237 @item case_eqv_table | |
9238 This field contains the equivalence table for case-folding search. | |
9239 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. | |
9240 | |
9241 @item display_table | |
9242 This field contains the buffer's display table, or @code{nil} if it | |
9243 doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp | |
9244 Reference Manual}. | |
9245 | |
9246 @item markers | |
9247 This field contains the chain of all markers that currently point into | |
9248 the buffer. Deletion of text in the buffer, and motion of the buffer's | |
9249 gap, must check each of these markers and perhaps update it. | |
9250 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}. | |
9251 | |
9252 @item backed_up | |
9253 This field is a flag that tells whether a backup file has been made for | |
9254 the visited file of this buffer. | |
9255 | |
9256 @item mark | |
9257 This field contains the mark for the buffer. The mark is a marker, | |
9258 hence it is also included on the list @code{markers}. @xref{The Mark,,, | |
9259 lispref, XEmacs Lisp Reference Manual}. | |
9260 | |
9261 @item mark_active | |
9262 This field is non-@code{nil} if the buffer's mark is active. | |
9263 | |
9264 @item local_var_alist | |
9265 This field contains the association list describing the variables local | |
9266 in this buffer, and their values, with the exception of local variables | |
9267 that have special slots in the buffer object. (Those slots are omitted | |
9268 from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp | |
9269 Reference Manual}. | |
9270 | |
9271 @item modeline_format | |
9272 This field contains a Lisp object which controls how to display the mode | |
9273 line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp | |
9274 Reference Manual}. | |
9275 | |
9276 @item base_buffer | |
9277 This field holds the buffer's base buffer (if it is an indirect buffer), | |
9278 or @code{nil}. | |
9279 @end table | |
9280 | |
9281 @node Text, Multilingual Support, Buffers, Top | |
9282 @chapter Text | |
9283 @cindex text | |
9284 | |
9285 @menu | |
9286 * The Text in a Buffer:: Representation of the text in a buffer. | |
9287 * Ibytes and Ichars:: Representation of individual characters. | |
9288 * Byte-Char Position Conversion:: | |
9289 * Searching and Matching:: Higher-level algorithms. | |
9290 @end menu | |
9291 | |
9292 @node The Text in a Buffer, Ibytes and Ichars, Text, Text | |
8538 @section The Text in a Buffer | 9293 @section The Text in a Buffer |
8539 @cindex text in a buffer, the | 9294 @cindex text in a buffer, the |
8540 @cindex buffer, the text in a | 9295 @cindex buffer, the text in a |
8541 | 9296 |
8542 The text in a buffer consists of a sequence of zero or more | 9297 The text in a buffer consists of a sequence of zero or more |
8674 Ibytes underscores the fact that we are working with a string of bytes | 9429 Ibytes underscores the fact that we are working with a string of bytes |
8675 in the internal Emacs buffer representation rather than in one of a | 9430 in the internal Emacs buffer representation rather than in one of a |
8676 number of possible alternative representations (e.g. EUC-encoded text, | 9431 number of possible alternative representations (e.g. EUC-encoded text, |
8677 etc.). | 9432 etc.). |
8678 | 9433 |
8679 @node Buffer Lists | 9434 @node Ibytes and Ichars, Byte-Char Position Conversion, The Text in a Buffer, Text |
8680 @section Buffer Lists | |
8681 @cindex buffer lists | |
8682 | |
8683 Recall from earlier that buffers are @dfn{permanent} objects, i.e. that | |
8684 they remain around until explicitly deleted. This entails that there is | |
8685 a list of all the buffers in existence. This list is actually an | |
8686 assoc-list (mapping from the buffer's name to the buffer) and is stored | |
8687 in the global variable @code{Vbuffer_alist}. | |
8688 | |
8689 The order of the buffers in the list is important: the buffers are | |
8690 ordered approximately from most-recently-used to least-recently-used. | |
8691 Switching to a buffer using @code{switch-to-buffer}, | |
8692 @code{pop-to-buffer}, etc. and switching windows using | |
8693 @code{other-window}, etc. usually brings the new current buffer to the | |
8694 front of the list. @code{switch-to-buffer}, @code{other-buffer}, | |
8695 etc. look at the beginning of the list to find an alternative buffer to | |
8696 suggest. You can also explicitly move a buffer to the end of the list | |
8697 using @code{bury-buffer}. | |
8698 | |
8699 In addition to the global ordering in @code{Vbuffer_alist}, each frame | |
8700 has its own ordering of the list. These lists always contain the same | |
8701 elements as in @code{Vbuffer_alist} although possibly in a different | |
8702 order. @code{buffer-list} normally returns the list for the selected | |
8703 frame. This allows you to work in separate frames without things | |
8704 interfering with each other. | |
8705 | |
8706 The standard way to look up a buffer given a name is | |
8707 @code{get-buffer}, and the standard way to create a new buffer is | |
8708 @code{get-buffer-create}, which looks up a buffer with a given name, | |
8709 creating a new one if necessary. These operations correspond exactly | |
8710 with the symbol operations @code{intern-soft} and @code{intern}, | |
8711 respectively. You can also force a new buffer to be created using | |
8712 @code{generate-new-buffer}, which takes a name and (if necessary) makes | |
8713 a unique name from this by appending a number, and then creates the | |
8714 buffer. This is basically like the symbol operation @code{gensym}. | |
8715 | |
8716 @node Markers and Extents | |
8717 @section Markers and Extents | |
8718 @cindex markers and extents | |
8719 @cindex extents, markers and | |
8720 | |
8721 Among the things associated with a buffer are things that are | |
8722 logically attached to certain buffer positions. This can be used to | |
8723 keep track of a buffer position when text is inserted and deleted, so | |
8724 that it remains at the same spot relative to the text around it; to | |
8725 assign properties to particular sections of text; etc. There are two | |
8726 such objects that are useful in this regard: they are @dfn{markers} and | |
8727 @dfn{extents}. | |
8728 | |
8729 A @dfn{marker} is simply a flag placed at a particular buffer | |
8730 position, which is moved around as text is inserted and deleted. | |
8731 Markers are used for all sorts of purposes, such as the @code{mark} that | |
8732 is the other end of textual regions to be cut, copied, etc. | |
8733 | |
8734 An @dfn{extent} is similar to two markers plus some associated | |
8735 properties, and is used to keep track of regions in a buffer as text is | |
8736 inserted and deleted, and to add properties (e.g. fonts) to particular | |
8737 regions of text. The external interface of extents is explained | |
8738 elsewhere. | |
8739 | |
8740 The important thing here is that markers and extents simply contain | |
8741 buffer positions in them as integers, and every time text is inserted or | |
8742 deleted, these positions must be updated. In order to minimize the | |
8743 amount of shuffling that needs to be done, the positions in markers and | |
8744 extents (there's one per marker, two per extent) are stored in Membpos's. | |
8745 This means that they only need to be moved when the text is physically | |
8746 moved in memory; since the gap structure tries to minimize this, it also | |
8747 minimizes the number of marker and extent indices that need to be | |
8748 adjusted. Look in @file{insdel.c} for the details of how this works. | |
8749 | |
8750 One other important distinction is that markers are @dfn{temporary} | |
8751 while extents are @dfn{permanent}. This means that markers disappear as | |
8752 soon as there are no more pointers to them, and correspondingly, there | |
8753 is no way to determine what markers are in a buffer if you are just | |
8754 given the buffer. Extents remain in a buffer until they are detached | |
8755 (which could happen as a result of text being deleted) or the buffer is | |
8756 deleted, and primitives do exist to enumerate the extents in a buffer. | |
8757 | |
8758 @node Ibytes and Ichars | |
8759 @section Ibytes and Ichars | 9435 @section Ibytes and Ichars |
8760 @cindex Ibytes and Ichars | 9436 @cindex Ibytes and Ichars |
8761 @cindex Ichars, Ibytes and | 9437 @cindex Ichars, Ibytes and |
8762 | 9438 |
8763 Not yet documented. | 9439 Not yet documented. |
8764 | 9440 |
8765 @node The Buffer Object | 9441 @node Byte-Char Position Conversion, Searching and Matching, Ibytes and Ichars, Text |
8766 @section The Buffer Object | 9442 @section Byte-Char Position Conversion |
8767 @cindex buffer object, the | 9443 @cindex byte-char position conversion |
8768 @cindex object, the buffer | 9444 @cindex position conversion, byte-char |
8769 | 9445 @cindex conversion, byte-char position |
8770 Buffers contain fields not directly accessible by the Lisp programmer. | 9446 |
8771 We describe them here, naming them by the names used in the C code. | 9447 Oct 2004: |
8772 Many are accessible indirectly in Lisp programs via Lisp primitives. | 9448 |
8773 | 9449 This is what I wrote when describing the previous algorithm: |
8774 @table @code | 9450 |
8775 @item name | 9451 @quotation |
8776 The buffer name is a string that names the buffer. It is guaranteed to | 9452 The basic algorithm we use is to keep track of a known region of |
8777 be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Reference | 9453 characters in each buffer, all of which are of the same width. We keep |
8778 Manual}. | 9454 track of the boundaries of the region in both Charbpos and Bytebpos |
8779 | 9455 coordinates and also keep track of the char width, which is 1 - 4 bytes. |
8780 @item save_modified | 9456 If the position we're translating is not in the known region, then we |
8781 This field contains the time when the buffer was last saved, as an | 9457 invoke a function to update the known region to surround the position in |
8782 integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference | 9458 question. This assumes locality of reference, which is usually the |
8783 Manual}. | 9459 case. |
8784 | 9460 |
8785 @item modtime | 9461 Note that the function to update the known region can be simple or |
8786 This field contains the modification time of the visited file. It is | 9462 complicated depending on how much information we cache. In addition to |
8787 set when the file is written or read. Every time the buffer is written | 9463 the known region, we always cache the correct conversions for point, |
8788 to the file, this field is compared to the modification time of the | 9464 BEGV, and ZV, and in addition to this we cache 16 positions where the |
8789 file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference | 9465 conversion is known. We only look in the cache or update it when we |
8790 Manual}. | 9466 need to move the known region more than a certain amount (currently 50 |
8791 | 9467 chars), and then we throw away a "random" value and replace it with the |
8792 @item auto_save_modified | 9468 newly calculated value. |
8793 This field contains the time when the buffer was last auto-saved. | 9469 |
8794 | 9470 Finally, we maintain an extra flag that tracks whether the buffer is |
8795 @item last_window_start | 9471 entirely ASCII, to speed up the conversions even more. This flag is |
8796 This field contains the @code{window-start} position in the buffer as of | 9472 actually of dubious value because in an entirely-ASCII buffer the known |
8797 the last time the buffer was displayed in a window. | 9473 region will always span the entire buffer (in fact, we update the flag |
8798 | 9474 based on this fact), and so all we're saving is a few machine cycles. |
8799 @item undo_list | 9475 |
8800 This field points to the buffer's undo list. @xref{Undo,,, lispref, | 9476 A potentially smarter method than what we do with known regions and |
8801 XEmacs Lisp Reference Manual}. | 9477 cached positions would be to keep some sort of pseudo-extent layer over |
8802 | 9478 the buffer; maybe keep track of the charbpos/bytebpos correspondence at |
8803 @item syntax_table_v | 9479 the beginning of each line, which would allow us to do a binary search |
8804 This field contains the syntax table for the buffer. @xref{Syntax | 9480 over the pseudo-extents to narrow things down to the correct line, at |
8805 Tables,,, lispref, XEmacs Lisp Reference Manual}. | 9481 which point you could use a linear movement method. This would also |
8806 | 9482 mesh well with efficiently implementing a line-numbering scheme. |
8807 @item downcase_table | 9483 However, you have to weigh the amount of time spent updating the cache |
8808 This field contains the conversion table for converting text to lower | 9484 vs. the savings that result from it. In reality, we modify the buffer |
8809 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. | 9485 far less often than we access it, so a cache of this sort that provides |
8810 | 9486 guaranteed LOG (N) performance (or perhaps N * LOG (N), if we set a |
8811 @item upcase_table | 9487 maximum on the cache size) would indeed be a win, particularly in very |
8812 This field contains the conversion table for converting text to upper | 9488 large buffers. If we ever implement this, we should probably set a |
8813 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. | 9489 reasonably high minimum below which we use the old method, because the |
8814 | 9490 time spent updating the fancy cache would likely become dominant when |
8815 @item case_canon_table | 9491 making buffer modifications in smaller buffers. |
8816 This field contains the conversion table for canonicalizing text for | 9492 |
8817 case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp | 9493 Note also that we have to multiply or divide by the char width in order |
8818 Reference Manual}. | 9494 to convert the positions. We do some tricks to avoid ever actually |
8819 | 9495 having to do a multiply or divide, because that is typically an |
8820 @item case_eqv_table | 9496 expensive operation (esp. divide). Multiplying or dividing by 1, 2, or |
8821 This field contains the equivalence table for case-folding search. | 9497 4 can be implemented simply as a shift left or shift right, and we keep |
8822 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}. | 9498 track of a shifter value (0, 1, or 2) indicating how much to shift. |
8823 | 9499 Multiplying by 3 can be implemented by doubling and then adding the |
8824 @item display_table | 9500 original value. Dividing by 3, alas, cannot be implemented in any |
8825 This field contains the buffer's display table, or @code{nil} if it | 9501 simple shift/subtract method, as far as I know; so we just do a table |
8826 doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp | 9502 lookup. For simplicity, we use a table of size 128K, which indexes the |
8827 Reference Manual}. | 9503 "divide-by-3" values for the first 64K non-negative numbers. (Note that |
8828 | 9504 we can increase the size up to 384K, i.e. indexing the first 192K |
8829 @item markers | 9505 non-negative numbers, while still using shorts in the array.) This also |
8830 This field contains the chain of all markers that currently point into | 9506 means that the size of the known region can be at most 64K for |
8831 the buffer. Deletion of text in the buffer, and motion of the buffer's | 9507 width-three characters. |
8832 gap, must check each of these markers and perhaps update it. | 9508 @end quotation |
8833 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}. | 9509 |
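The multiply and divide tricks described in the quotation can be sketched roughly as follows. This is an illustration only, not the removed implementation: the function names, the lazy table setup, and the way the shift amount is derived from the width are all assumptions. The table holds 64K unsigned shorts (128K bytes), matching the size mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: quotients n/3 for the first 64K non-negative n.
   65536 entries of 2 bytes each is the 128K mentioned in the text. */
#define SKETCH_DIV3_ENTRIES 65536

static unsigned short sketch_div3_table[SKETCH_DIV3_ENTRIES];
static int sketch_div3_ready = 0;

static void sketch_init_div3(void)
{
    /* Fill the table without dividing: the quotient goes up by one
       every third entry. */
    unsigned short q = 0;
    int r = 0;
    for (size_t i = 0; i < SKETCH_DIV3_ENTRIES; i++) {
        sketch_div3_table[i] = q;
        if (++r == 3) {
            r = 0;
            q++;
        }
    }
    sketch_div3_ready = 1;
}

/* Character count -> byte count for a run of same-width characters. */
size_t sketch_chars_to_bytes(size_t nchars, int width)
{
    assert(width >= 1 && width <= 4);
    if (width == 3)
        return (nchars << 1) + nchars;   /* double, then add the original */
    return nchars << (width >> 1);       /* widths 1, 2, 4 -> shift 0, 1, 2 */
}

/* Byte count -> character count for a run of same-width characters. */
size_t sketch_bytes_to_chars(size_t nbytes, int width)
{
    assert(width >= 1 && width <= 4);
    if (width == 3) {
        /* The table bounds the usable region size for width 3. */
        assert(nbytes < SKETCH_DIV3_ENTRIES);
        if (!sketch_div3_ready)
            sketch_init_div3();
        return sketch_div3_table[nbytes];
    }
    return nbytes >> (width >> 1);
}
```

The assertion in the width-3 branch corresponds to the limit noted in the quotation: the table size caps how large a width-three known region can be.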
8834 | 9510 Unfortunately, it turned out that the implementation had serious problems |
8835 @item backed_up | 9511 which had never been corrected. In particular, the known region had a |
8836 This field is a flag that tells whether a backup file has been made for | 9512 strong tendency to become zero-length and stay that way. |
8837 the visited file of this buffer. | 9513 |
8838 | 9514 So I decided to port the algorithm from FSF 21.3, in @file{marker.c}. |
8839 @item mark | 9515 |
8840 This field contains the mark for the buffer. The mark is a marker, | 9516 This algorithm is fairly simple. Instead of using markers I kept the cache |
8841 hence it is also included on the list @code{markers}. @xref{The Mark,,, | 9517 array of known positions from the previous implementation. |
8842 lispref, XEmacs Lisp Reference Manual}. | 9518 |
8843 | 9519 Basically, we keep a number of positions cached: |
8844 @item mark_active | 9520 |
8845 This field is non-@code{nil} if the buffer's mark is active. | 9521 @itemize @bullet |
8846 | 9522 @item |
8847 @item local_var_alist | 9523 the actual end of the buffer |
8848 This field contains the association list describing the variables local | 9524 @item |
8849 in this buffer, and their values, with the exception of local variables | 9525 the beginning and end of the accessible region |
8850 that have special slots in the buffer object. (Those slots are omitted | 9526 @item |
8851 from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp | 9527 the value of point |
8852 Reference Manual}. | 9528 @item |
8853 | 9529 the position of the gap |
8854 @item modeline_format | 9530 @item |
8855 This field contains a Lisp object which controls how to display the mode | 9531 the last value we computed |
8856 line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp | 9532 @item |
8857 Reference Manual}. | 9533 a set of positions that are "far away" from previously computed positions |
8858 | 9534 (5000 chars currently; #### perhaps should be smaller) |
8859 @item base_buffer | 9535 @end itemize |
8860 This field holds the buffer's base buffer (if it is an indirect buffer), | 9536 |
8861 or @code{nil}. | 9537 For each position, we @code{CONSIDER()} it. This means: |
8862 @end table | 9538 |
8863 | 9539 @itemize @bullet |
8864 @node Searching and Matching | 9540 @item |
9541 If the position is what we're looking for, return it directly. | |
9542 @item | |
9543 Starting with the beginning and end of the buffer, we successively | |
9544 compute the smallest enclosing range of known positions. If at any | |
9545 point we discover that this range has the same byte and char length | |
9546 (i.e. is entirely single-byte), then our computation is trivial. | |
9547 @item | |
9548 If at any point we get a small enough range (50 chars currently), | |
9549 stop considering further positions. | |
9550 @end itemize | |
9551 | |
9552 Otherwise, once we have an enclosing range, see which side is closer, and | |
9553 iterate until we find the desired value. As an optimization, I replaced | |
9554 the simple loop in FSF with the use of @code{bytecount_to_charcount()}, | |
9555 @code{charcount_to_bytecount()}, @code{bytecount_to_charcount_down()}, or | |
9556 @code{charcount_to_bytecount_down()}. (The latter two I added for this purpose.) | |
9557 These scan 4 or 8 bytes at a time through purely single-byte characters. | |
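The idea of scanning several bytes at a time can be sketched as below. This is not the actual @code{bytecount_to_charcount()}: the function name is hypothetical, and the test for the start of a character assumes a UTF-8-style encoding (continuation bytes have the bit pattern @code{10xxxxxx}) purely so that the example is self-contained. The real internal encoding uses its own leading-byte ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the characters in NBYTES bytes starting at P.
   Assumption: UTF-8-style text, so a byte starts a character unless it
   looks like 10xxxxxx. */
size_t sketch_bytecount_to_charcount(const unsigned char *p, size_t nbytes)
{
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t nchars = 0;
    size_t i = 0;

    while (i < nbytes) {
        /* Fast path: if no byte in an 8-byte chunk has its high bit set,
           the chunk is 8 single-byte characters. */
        while (i + 8 <= nbytes) {
            uint64_t word;
            memcpy(&word, p + i, sizeof word);  /* safe unaligned load */
            if (word & high_bits)
                break;
            i += 8;
            nchars += 8;
        }
        if (i >= nbytes)
            break;
        /* Slow path: examine one byte, then try the fast path again. */
        if ((p[i] & 0xC0) != 0x80)
            nchars++;
        i++;
    }
    return nchars;
}
```

The high-bit test does not depend on byte order, so the same loop works on little-endian and big-endian machines.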
9558 | |
9559 If the amount we had to scan was more than our "far away" distance (5000 | |
9560 characters, see above), then cache the new position. | |
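Putting the steps together, a character-to-byte lookup along these lines might look like the following sketch. Everything here is an assumption for illustration: the buffer is a flat array with no gap, the only always-known positions are the two ends of the text, characters are delimited UTF-8-style, the scan is a plain loop rather than the chunked routines, and cache slots are reused round-robin. None of the names come from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SIZE   16    /* hypothetical number of cached positions */
#define SKETCH_FAR_AWAY     5000  /* cache a result after scanning this far */
#define SKETCH_CLOSE_ENOUGH 50    /* stop narrowing once the range is this small */

struct sketch_pos { size_t charpos, bytepos; };

struct sketch_buffer {
    const unsigned char *text;    /* flat text, no gap (assumption) */
    size_t nbytes, nchars;
    struct sketch_pos cache[SKETCH_CACHE_SIZE];
    int ncached;
    int next_slot;
};

/* Assumption: UTF-8-style continuation bytes look like 10xxxxxx. */
static int sketch_is_char_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

size_t sketch_char_to_byte(struct sketch_buffer *buf, size_t charpos)
{
    struct sketch_pos below = { 0, 0 };
    struct sketch_pos above = { buf->nchars, buf->nbytes };
    size_t c, b, scanned;

    assert(charpos <= buf->nchars);
    if (charpos == below.charpos) return below.bytepos;
    if (charpos == above.charpos) return above.bytepos;
    /* Equal byte and char lengths mean the range is all single-byte. */
    if (above.charpos - below.charpos == above.bytepos - below.bytepos)
        return below.bytepos + (charpos - below.charpos);

    /* Consider each cached position, narrowing the enclosing range. */
    for (int i = 0; i < buf->ncached; i++) {
        struct sketch_pos p = buf->cache[i];
        if (p.charpos == charpos)
            return p.bytepos;
        if (p.charpos < charpos) {
            if (p.charpos > below.charpos) below = p;
        } else if (p.charpos < above.charpos) {
            above = p;
        }
        if (above.charpos - below.charpos == above.bytepos - below.bytepos)
            return below.bytepos + (charpos - below.charpos);
        if (above.charpos - below.charpos <= SKETCH_CLOSE_ENOUGH)
            break;
    }

    /* Scan from whichever end of the range is closer. */
    if (charpos - below.charpos <= above.charpos - charpos) {
        c = below.charpos;
        b = below.bytepos;
        while (c < charpos) {
            b++;
            while (b < buf->nbytes && !sketch_is_char_start(buf->text[b]))
                b++;
            c++;
        }
        scanned = charpos - below.charpos;
    } else {
        c = above.charpos;
        b = above.bytepos;
        while (c > charpos) {
            b--;
            while (!sketch_is_char_start(buf->text[b]))
                b--;
            c--;
        }
        scanned = above.charpos - charpos;
    }

    /* Remember results that were expensive to compute. */
    if (scanned > SKETCH_FAR_AWAY) {
        buf->cache[buf->next_slot].charpos = charpos;
        buf->cache[buf->next_slot].bytepos = b;
        buf->next_slot = (buf->next_slot + 1) % SKETCH_CACHE_SIZE;
        if (buf->ncached < SKETCH_CACHE_SIZE)
            buf->ncached++;
    }
    return b;
}
```

A fuller version would also consider point, the accessible-region boundaries, the gap position and the last computed value before looking at the cache array, as listed above.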
9561 | |
9562 #### Things to do: | |
9563 | |
9564 @itemize @bullet | |
9565 @item | |
9566 Look at the most recent GNU Emacs to see whether anything has changed. | |
9567 @item | |
9568 Think about whether it makes sense to try to implement some sort of | |
9569 known region or list of "known regions", like we had before. This would | |
9570 be a region of entirely single-byte characters that we can check very | |
9571 quickly. (Previously I used a range of same-width characters of any | |
9572 size; but this adds extra complexity and slows down the scanning, and is | |
9573 probably not worth it.) As part of the scanning process in | |
9574 @code{bytecount_to_charcount()} et al, we skip over chunks of entirely | |
9575 single-byte chars, so it should be easy to remember the last one. | |
9576 Presumably what we should do is keep track of the largest known surrounding | |
9577 entirely-single-byte region for each of the cache positions as well as | |
9578 perhaps the last-cached position. We want to be careful not to get bitten | |
9579 by the previous problem of the known region getting reset too | |
9580 often. If we implement this, we might well want to continue scanning | |
9581 some distance past the desired position (maybe 300-1000 bytes) if we are | |
9582 in a single-byte range so that we won't end up expanding the known range | |
9583 one position at a time and entering the function each time. | |
9584 @item | |
9585 Think about whether it makes sense to keep the position cache sorted. | |
9586 This would allow it to be larger and finer-grained in its positions. | |
9587 Note that with FSF's use of markers, they were sorted, but this | |
9588 was not really made good use of. With an array, we can do binary searching | |
9589 to quickly find the smallest range. We would probably want to make use of | |
9590 the gap-array code in extents.c. | |
9591 @end itemize | |
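
To illustrate the last item, a sorted position cache could be searched along these lines. This is only a sketch of the idea: @code{struct pos_cache_entry} and @code{sketch_cache_floor} are hypothetical names, and a plain sorted array stands in for whatever gap-array structure would really be used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: a character position and the byte
   position known to correspond to it. */
struct pos_cache_entry
{
  long charpos;
  long bytepos;
};

/* Return the index of the last entry whose charpos is <= TARGET, or -1
   if every entry lies beyond TARGET.  CACHE must be sorted by charpos
   in ascending order.  The returned entry and its successor (if any)
   form the smallest cached range enclosing TARGET. */
static long
sketch_cache_floor (const struct pos_cache_entry *cache, size_t n,
                    long target)
{
  size_t lo = 0, hi = n;        /* search the half-open range [lo, hi) */

  assert (cache != NULL || n == 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (cache[mid].charpos <= target)
        lo = mid + 1;
      else
        hi = mid;
    }
  return (long) lo - 1;
}
```

With the enclosing pair in hand, the caller would scan from whichever of the two entries is closer, exactly as in the unsorted case.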
9592 | |
9593 Note that FSF's algorithm checked @strong{ALL} markers, not just the ones cached | |
9594 by this algorithm. This includes markers created by the user as well as | |
9595 both ends of any overlays. We could do similarly, and our extents could | |
9596 keep both byte and character positions rather than just the former. (But | |
9597 this would probably be overkill. We should just use our cache instead. | |
9598 Any place an extent was set was surely already visited by the char<-->byte | |
9599 conversion routines.) | |
9600 | |
9601 @node Searching and Matching, , Byte-Char Position Conversion, Text | |
8865 @section Searching and Matching | 9602 @section Searching and Matching |
8866 @cindex searching | 9603 @cindex searching |
8867 @cindex matching | 9604 @cindex matching |
8868 | 9605 |
8869 Very incomplete, limited to a brief introduction. | 9606 Very incomplete, limited to a brief introduction. |
9080 @end enumerate | 9817 @end enumerate |
9081 | 9818 |
9082 But if you keep your eye on the "switch in a loop" structure, you | 9819 But if you keep your eye on the "switch in a loop" structure, you |
9083 should be able to understand the parts you need. | 9820 should be able to understand the parts you need. |
9084 | 9821 |
9085 | 9822 @node Multilingual Support, The Lisp Reader and Compiler, Text, Top |
9086 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top | 9823 @chapter Multilingual Support |
9087 @chapter MULE Character Sets and Encodings | |
9088 @cindex Mule character sets and encodings | 9824 @cindex Mule character sets and encodings |
9089 @cindex character sets and encodings, Mule | 9825 @cindex character sets and encodings, Mule |
9090 @cindex encodings, Mule character sets and | 9826 @cindex encodings, Mule character sets and |
9827 | |
9828 @emph{NOTE}: There is a great deal of overlapping and redundant | |
9829 information in this chapter. Ben wrote introductions to Mule issues a | |
9830 number of times, each time not realizing that he had already written | |
9831 another introduction previously. Hopefully, in time these will all be | |
9832 integrated. | |
9833 | |
9834 @emph{NOTE}: The information at the top of the source file | |
9835 @file{text.c} is more complete than the following, and there is also a | |
9836 list of all other places to look for text/I18N-related info. Also look in | |
9837 @file{text.h} for info about the DFC and Eistring API's. | |
9091 | 9838 |
9092 Recall that there are two primary ways that text is represented in | 9839 Recall that there are two primary ways that text is represented in |
9093 XEmacs. The @dfn{buffer} representation sees the text as a series of | 9840 XEmacs. The @dfn{buffer} representation sees the text as a series of |
9094 bytes (Ibytes), with a variable number of bytes used per character. | 9841 bytes (Ibytes), with a variable number of bytes used per character. |
9095 The @dfn{character} representation sees the text as a series of integers | 9842 The @dfn{character} representation sees the text as a series of integers |
9100 Lisp strings and buffers, and because of this, it is the ``default'' | 9847 Lisp strings and buffers, and because of this, it is the ``default'' |
9101 representation that text comes in. The reason for using this | 9848 representation that text comes in. The reason for using this |
9102 representation is that it's compact and is compatible with ASCII. | 9849 representation is that it's compact and is compatible with ASCII. |
9103 | 9850 |
9104 @menu | 9851 @menu |
9105 * Character Sets:: | 9852 * Introduction to Multilingual Issues #1:: |
9106 * Encodings:: | 9853 * Introduction to Multilingual Issues #2:: |
9107 * Internal Mule Encodings:: | 9854 * Introduction to Multilingual Issues #3:: |
9108 * CCL:: | 9855 * Introduction to Multilingual Issues #4:: |
9856 * Character Sets:: | |
9857 * Encodings:: | |
9858 * Internal Mule Encodings:: | |
9859 * Byte/Character Types; Buffer Positions; Other Typedefs:: | |
9860 * Internal Text API's:: | |
9861 * Coding for Mule:: | |
9862 * CCL:: | |
9863 * Modules for Internationalization:: | |
9109 @end menu | 9864 @end menu |
9110 | 9865 |
9111 @node Character Sets | 9866 @node Introduction to Multilingual Issues #1, Introduction to Multilingual Issues #2, Multilingual Support, Multilingual Support |
9867 @section Introduction to Multilingual Issues #1 | |
9868 @cindex introduction to multilingual issues #1 | |
9869 | |
9870 There is an introduction to these issues in the Lisp Reference manual. | |
9871 @xref{Internationalization Terminology,,, lispref, XEmacs Lisp Reference | |
9872 Manual}. Among other documentation that may be of interest to internals | |
9873 programmers is ISO-2022 (@pxref{ISO 2022,,, lispref, XEmacs Lisp | |
9874 Reference Manual}) and CCL (@pxref{CCL,,, lispref, XEmacs Lisp Reference | |
9875 Manual}). | |
9876 | |
9877 @node Introduction to Multilingual Issues #2, Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #1, Multilingual Support | |
9878 @section Introduction to Multilingual Issues #2 | |
9879 @cindex introduction to multilingual issues #2 | |
9880 | |
9881 @subheading Introduction | |
9882 | |
9883 This document covers a number of design issues, problems and proposals | |
9884 with regard to XEmacs MULE. First we present some definitions and | |
9885 some aspects of the design that have been agreed upon. Then we present | |
9886 some issues and problems that need to be addressed, and then I include a | |
9887 proposal of mine to address some of these issues. When there are other | |
9888 proposals, for example from Olivier, these will be appended to the end | |
9889 of this document. | |
9890 | |
9891 @subheading Definitions and Design Basics | |
9892 | |
9893 First, @dfn{text} is defined to be a series of characters which together | |
9894 defines an utterance or partial utterance in some language. | |
9895 Generally, this language is a human language, but it may also be a | |
9896 computer language if the computer language uses a representation close | |
9897 enough to that of human languages for it to also make sense to call its | |
9898 representation text. Text is opposed to @dfn{binary}, which is a sequence | |
9899 of bytes, representing machine-readable but not human-readable data. | |
9900 A @dfn{byte} is merely a number within a predefined range, which nowadays is | |
9901 nearly always zero to 255. A @dfn{character} is a unit of text. What makes | |
9902 one character different from another is not always clear-cut. It is | |
9903 generally related to the appearance of the character, although perhaps | |
9904 not any possible appearance of that character, but some sort of ideal | |
9905 appearance that is assigned to a character. Whether two characters | |
9906 that look very similar are actually the same depends on various | |
9907 factors, some of them political, and on whether the characters are | |
9908 used to mean similar sorts of things, or behave similarly in similar | |
9909 contexts. In any case, it is not always clearly defined whether two | |
9910 characters are actually the same or not. In practice, however, this | |
9911 is more or less agreed upon. | |
9912 | |
9913 A @dfn{character set} is just that, a set of one or more characters. | |
9914 The set is unique in that there will not be more than one instance of | |
9915 the same character in a character set, and logically is unordered, | |
9916 although an order is often imposed or suggested for the characters in | |
9917 the character set. We can also define an @dfn{order} on a character | |
9918 set, which is a way of assigning a unique number, or possibly a pair of | |
9919 numbers, or a triplet of numbers, or even a set of four or more numbers | |
9920 to each character in the character set. The combination of a character | |
9921 set and an order on it is called an @dfn{ordered character set}. In an | |
9922 ordered character set, there is an upper limit and a lower limit on the | |
9923 possible values that a character, or that any number within the set of | |
9924 numbers assigned to a character, can take. However, the lower limit | |
9925 does not have to start at zero or one, or anywhere else in particular, | |
9926 nor does the upper limit have to end anywhere particular, and there may | |
9927 be gaps within these ranges such that particular numbers or sets of | |
9928 numbers do not have a corresponding character, even though they are | |
9929 within the upper and lower limits. For example, @dfn{ASCII} defines a | |
9930 very standard ordered character set. It is normally defined to be 94 | |
9931 characters in the range 33 through 126 inclusive on both ends, with | |
9932 every possible character within this range being actually present in the | |
9933 character set. | |
9934 | |
9935 Sometimes the ASCII character set is extended to include what are called | |
9936 @dfn{non-printing characters}. Non-printing characters are characters | |
9937 which instead of really being displayed in a more or less rectangular | |
9938 block, like all other characters, instead indicate certain functions | |
9939 typically related to either control of the display upon which the | |
9940 characters are being displayed, or have some effect on a communications | |
9941 channel that may be currently open and transmitting characters, or may | |
9942 change the meaning of future characters as they are being decoded, or | |
9943 some other similar function. You might say that non-printing characters | |
9944 are somewhat of a hack because they are a special exception to the | |
9945 standard concept of a character as being a printed glyph that has some | |
9946 direct correspondence in the non-computer world. | |
9947 | |
9948 With non-printing characters in mind, the 94-character ordered character | |
9949 set called ASCII is often extended into a 96-character ordered character | |
9950 set, also often called ASCII, which includes in addition to the 94 | |
9951 characters already mentioned, two non-printing characters, one called | |
9952 space and assigned the number 32, just below the bottom of the previous | |
9953 range, and another called @dfn{delete} or @dfn{rubout}, which is given | |
9954 number 127 just above the end of the previous range. Thus to reiterate, | |
9955 the result is a 96-character ordered character set, whose characters | |
9956 take the values from 32 to 127 inclusive. Sometimes ASCII is further | |
9957 extended to contain 32 more non-printing characters, which are given the | |
9958 numbers zero through 31 so that the result is a 128-character ordered | |
9959 character set with characters numbered zero through 127, and with many | |
9960 non-printing characters. Another way to look at this, and the way that | |
9961 is normally taken by XEmacs MULE, is that the characters that would be | |
9962 in the range zero through 31 in the most extended definition of ASCII, | |
9963 instead form their own ordered character set, which is called | |
9964 @dfn{control zero}, and consists of 32 characters in the range zero | |
9965 through 31. A similar ordered character set called @dfn{control one} is | |
9966 also created, and it contains 32 more non-printing characters in the | |
9967 range 128 through 159. Note that none of these three ordered character | |
9968 sets overlaps in any of the numbers that are assigned to their | |
9969 characters, so they can all be used at once. Note further that the same | |
9970 character can occur in more than one character set. This was shown | |
9971 above, for example, in two different ordered character sets we defined, | |
9972 one of which we could have called @dfn{ASCII}, and the other | |
9973 @dfn{ASCII-extended}, to show that it had been extended by two non-printable | |
9974 characters. Most of the characters in these two character sets are | |
9975 shared and present in both of them. | |
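
The three non-overlapping ranges just described can be summarized in a few lines of C. This is purely illustrative: the enum and function names are hypothetical and do not correspond to any XEmacs API.

```c
#include <assert.h>

/* Hypothetical labels for the ordered character sets described above. */
enum sketch_charset
{
  SKETCH_CONTROL_0,   /* 32 characters, numbered 0 through 31 */
  SKETCH_ASCII,       /* 96-character ASCII, numbered 32 through 127 */
  SKETCH_CONTROL_1,   /* 32 characters, numbered 128 through 159 */
  SKETCH_OTHER        /* anything above 159; not covered by the text here */
};

/* Classify a number by the ordered character set that owns it. */
static enum sketch_charset
sketch_classify (int code)
{
  assert (code >= 0);
  if (code <= 31)
    return SKETCH_CONTROL_0;
  if (code <= 127)
    return SKETCH_ASCII;
  if (code <= 159)
    return SKETCH_CONTROL_1;
  return SKETCH_OTHER;
}
```

Because the ranges do not overlap, a single number is enough to identify both the character set and the character within it.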
9976 | |
9977 Note that there is no restriction on the size of the character set, or | |
9978 on the numbers that are assigned to characters in an ordered character | |
9979 set. It is often extremely useful to represent a sequence of characters | |
9980 as a sequence of bytes, where a byte as defined above is a number in the | |
9981 range zero to 255. An @dfn{encoding} does precisely this. It is simply | |
9982 a mapping from a sequence of characters, possibly augmented with | |
9983 information indicating the character set that each of these characters | |
9984 belongs to, to a sequence of bytes which represents that sequence of | |
9985 characters and no other -- which is to say the mapping is reversible. | |
9986 | |
9987 A @dfn{coding system} is a set of rules for encoding a sequence of | |
9988 characters augmented with character set information into a sequence of | |
9989 bytes, and later performing the reverse operation. It is frequently | |
9990 possible to group coding systems into classes or types based on common | |
9991 features. Typically, for example, a particular coding system class | |
9992 may contain a base coding system which specifies some of the rules, | |
9993 but leaves the rest unspecified. Individual members of the coding | |
9994 system class are formed by starting with the base coding system, and | |
9995 augmenting it with additional rules to produce a particular coding | |
9996 system, what you might think of as a sort of variation within a | |
9997 theme. | |
9998 | |
9999 @subheading XEmacs Specific Definitions | |
10000 | |
10001 First of all, in XEmacs, the concept of character is a little different | |
10002 from the general definition given above. For one thing, the character | |
10003 set that a character belongs to may or may not be an inherent part of | |
10004 the character itself. In other words, the same character occurring in | |
10005 two different character sets may appear in XEmacs as two different | |
10006 characters. This is generally the case now, but we are attempting to | |
10007 move in the other direction. Different proposals may have different | |
10008 ideas about exactly the extent to which this change will be carried out. | |
10009 The general trend, though, is to represent all information about a | |
10010 character other than the character itself, using text properties | |
10011 attached to the character. That way two instances of the same character | |
10012 will look the same to lisp code that merely retrieves the character, and | |
10013 does not also look at the text properties of that character. Everyone | |
10014 involved is in agreement in doing it this way with all Latin characters, | |
10015 and in fact for all characters other than Chinese, Japanese, and Korean | |
10016 ideographs. For those, there may be a difference of opinion. | |
10017 | |
10018 A second difference between the general definition of character and the | |
10019 XEmacs usage of character is that each character is assigned a unique | |
10020 number that distinguishes it from all other characters in the world, or | |
10021 at the very least, from all other characters currently existing anywhere | |
10022 inside the current XEmacs invocation. (If there is a case where the | |
10023 weaker statement applies, but not the stronger statement, it would | |
10024 possibly be with composite characters and any other such characters that | |
10025 are created on the sly.) | |
10026 | |
10027 This unique number is called the @dfn{character representation} of the | |
10028 character, and its particular details are a matter of debate. There is | |
10029 a current standard in use, but it is undoubtedly going to change. | |
10030 What has definitely been agreed upon is that it will be an integer, more | |
10031 specifically a positive integer, represented with less than or equal to | |
10032 31 bits on a 32-bit architecture, and possibly up to 63 bits on a 64-bit | |
10033 architecture, with the proviso that any characters whose | |
10034 representation would fit in a 64-bit architecture, but not on a 32-bit | |
10035 architecture, would be used only for composite characters, and others | |
10036 that would satisfy the weak uniqueness property mentioned above, but not | |
10037 with the strong uniqueness property. | |
10038 | |
10039 At this point, it is useful to talk about the different representations | |
10040 that a sequence of characters can take. The simplest representation is | |
10041 simply as a sequence of characters, and this is called the @dfn{Lisp | |
10042 representation} of text, because it is the representation that Lisp | |
10043 programs see. Other representations include the external | |
10044 representation, which refers to any encoding of the sequence of | |
10045 characters, using the definition of encoding mentioned above. | |
10046 Typically, text in the external representation is used outside of | |
10047 XEmacs, for example in files, e-mail messages, web sites, and the like. | |
10048 Another representation for a sequence of characters is what I will call | |
10049 the @dfn{byte representation}, and it represents the way that XEmacs | |
10050 internally represents text in a buffer, or in a string. Potentially, | |
10051 the representation could be different between a buffer and a string, and | |
10052 then the terms @dfn{buffer byte representation} and @dfn{string byte | |
10053 representation} would be used, but in practice I don't think this will | |
10054 occur. It will be possible, of course, for buffers and strings, or | |
10055 particular buffers and particular strings, to contain different | |
10056 sub-representations of a single representation. For example, Olivier's | |
10057 1-2-4 proposal allows for three sub-representations of his internal byte | |
10058 representation, allowing for 1 byte, 2 bytes, and 4 byte width | |
10059 characters respectively. A particular string may be in one | |
10060 sub-representation, and a particular buffer in another | |
10061 sub-representation, but overall both are following the same byte | |
10062 representation. I do not use the term @dfn{internal representation} | |
10063 here, as many people have, because it is potentially ambiguous. | |
10064 | |
10065 Another representation is called the @dfn{array of characters | |
10066 representation}. This is a representation on the C-level in which the | |
10067 sequence of text is represented, not using the byte representation, but | |
10068 by using an array of characters, each represented using the character | |
10069 representation. This sort of representation is often used by redisplay | |
10070 because it is more convenient to work with than any of the other | |
10071 internal representations. | |
10072 | |
10073 The term @dfn{binary representation} may also be heard. Binary | |
10074 representation is used to represent binary data. When binary data is | |
10075 represented in the lisp representation, an equivalence is simply set up | |
10076 between bytes zero through 255, and characters zero through 255. These | |
10077 characters come from four character sets, which are from bottom to top, | |
10078 control zero, ASCII, control 1, and Latin 1. Together, they comprise | |
10079 256 characters, and are a good mapping for the 256 possible bytes in a | |
10080 binary representation. Binary representation could also be used to | |
10081 refer to an external representation of the binary data, which is a | |
10082 simple direct byte-to-byte representation. No internal representation | |
10083 should ever be referred to as a binary representation because of | |
10084 ambiguity. The terms character set/encoding system were defined | |
10085 generally, above. In XEmacs, the equivalent concepts exist, although | |
10086 character set has been shortened to charset, and in fact represents | |
10087 specifically an ordered character set. For each possible charset, and | |
10088 for each possible coding system, there is an associated object in | |
10089 XEmacs. These objects will be of type charset and coding system, | |
10090 respectively. Charsets and coding systems are divided into classes, or | |
10091 @dfn{types}, the normal term under XEmacs, and all possible charsets | |
10092 and coding systems that may be defined must be in one of these types. If | |
10093 you need to create a charset or coding system that is not one of these | |
10094 types, you will have to modify the C code to support this new type. | |
10095 Some of the existing or soon-to-be-created types are, or will be, | |
10096 generic enough so that this shouldn't be an issue. Note also that the | |
10097 byte encoding for text and the character coding of a character are | |
10098 closely related. You might say that ideally each is the simplest | |
10099 equivalent of the other given the general constraints on each | |
10100 representation. | |
10101 | |
10102 To be specific, in the current MULE representation, | |
10103 | |
10104 @enumerate | |
10105 @item | |
10106 Characters encode both the character itself and the character set | |
10107 that it comes from. These character sets are always assumed to be | |
10108 representable as an ordered character set of size 96 or of size 96 | |
10109 by 96, or the trivially-related sizes 94 and 94 by 94. The only | |
10110 allowable exceptions are the control zero and control one character | |
10111 sets, which are of size 32. Character sets which do not naturally | |
10112 have a compatible ordering such as this are shoehorned into an | |
10113 ordered character set, or possibly two ordered character sets of a | |
10114 compatible size. | |
10115 @item | |
10116 The variable width byte representation was deliberately chosen to | |
10117 allow scanning text forwards and backwards efficiently. This | |
10118 necessitated defining the possible bytes into three ranges which | |
10119 we shall call A, B, and C. Range A is used exclusively for | |
10120 single-byte characters, which is to say characters that are | |
10121 represented using only one byte. Multi-byte | |
10122 characters are always represented by using one byte from Range B, | |
10123 followed by one or more bytes from Range C. What this means is | |
10124 that bytes that begin a character are unequivocally distinguished | |
10125 from bytes that do not begin a character, and therefore there is | |
10126 never a problem scanning backwards and finding the beginning of a | |
10127 character. Note that UTF-8 adopts an approach that is very similar | |
10128 in spirit in that it uses separate ranges for the first byte of a | |
10129 multi-byte sequence, and the following bytes in a multi-byte | |
10130 sequence. | |
10131 @item | |
10132 Given the fact that all ordered character sets allowed were | |
10133 essentially 96 characters per dimension, it made perfect sense to | |
10134 make Range C comprise 96 bytes. With a little more tweaking, the | |
10135 currently-standard MULE byte representation was derived from | |
10136 this. | |
10137 @item | |
10138 The MULE byte representation defined four basic representations for | |
10139 characters, which would take up from one to four bytes, | |
10140 respectively. The MULE character representation thus had the | |
10141 following constraints: | |
10142 @enumerate | |
10143 @item | |
10144 Character numbers zero through 255 should represent the | |
10145 characters that binary values zero through 255 would be | |
10146 mapped onto. (Note: this was not the case in Kenichi Handa's | |
10147 version of this representation, but I changed it.) | |
10148 @item | |
10149 The four sub-classes of representation in the MULE byte | |
10150 representation should correspond to four contiguous | |
10151 non-overlapping ranges of characters. | |
10152 @item | |
10153 The algorithmic conversion between the single character | |
10154 represented in the byte representation and in the character | |
10155 representation should be as easy as possible. | |
10156 @item | |
10157 Given the previous constraints, the character representation | |
10158 should be as compact as possible, which is to say it should | |
10159 use the least number of bits possible. | |
10160 @end enumerate | |
10161 @end enumerate | |
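
The backward-scanning property from the second item can be shown with a short sketch. The concrete byte ranges here are an assumption made for illustration (A = 0x00 to 0x7F, B = 0x80 to 0x9F, C = 0xA0 to 0xFF, chosen so that Range C has 96 bytes as stated above), and @code{sketch_char_start} is a hypothetical name, not an XEmacs function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed range for illustration only: Range C, the bytes that may
   appear only after the first byte of a multi-byte character. */
#define SKETCH_IN_RANGE_C(b) ((b) >= 0xA0)

/* Return the index of the first byte of the character that contains
   byte POS of BUF.  Any byte outside Range C begins a character, so
   we simply step back over Range C bytes. */
static size_t
sketch_char_start (const unsigned char *buf, size_t len, size_t pos)
{
  assert (pos < len);
  while (pos > 0 && SKETCH_IN_RANGE_C (buf[pos]))
    pos--;
  return pos;
}
```

No state from earlier in the buffer is needed: the value of each byte alone says whether it begins a character.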
10162 | |
10163 So you see that the entire structure of the byte and character | |
10164 representations stemmed from a very small number of basic choices, | |
10165 which were | |
10166 | |
10167 @enumerate | |
10168 @item | |
10169 the choice to encode character set information in a character | |
10170 @item | |
10171 the choice to assume that all character sets would have an order | |
10172 imposed upon them with 96 characters per one or two | |
10173 dimensions. (This is less arbitrary than it seems--it follows | |
10174 ISO-2022) | |
10175 @item | |
10176 the choice to use a variable width byte representation. | |
10177 @end enumerate | |
10178 | |
10179 What this means is that you cannot really separate the byte | |
10180 representation, the character representation, and the assumptions made | |
10181 about characters and whether they represent character sets from each | |
10182 other. All of these are closely intertwined, and for purposes of | |
10183 simplicity, they should be designed together. If you change one | |
10184 representation without changing another, you are in essence creating a | |
10185 completely new design with its own attendant problems--since your new | |
10186 design is likely to be quite complex and not very coherent with | |
10187 regard to the translation between the character and byte | |
10188 representations, you are likely to run into problems. | |
10189 | |
10190 @node Introduction to Multilingual Issues #3, Introduction to Multilingual Issues #4, Introduction to Multilingual Issues #2, Multilingual Support | |
10191 @section Introduction to Multilingual Issues #3 | |
10192 @cindex introduction to multilingual issues #3 | |
10193 | |
10194 In XEmacs, Mule is a code word for the support for input handling and | |
10195 display of multi-lingual text. This section provides an overview of how | |
10196 this support impacts the C and Lisp code in XEmacs. It is important for | |
10197 anyone who works on the C or the Lisp code, especially on the C code, to | |
10198 be aware of these issues, even if they don't work directly on code that | |
10199 implements multi-lingual features, because there are various general | |
10200 procedures that need to be followed in order to write Mule-compliant | |
10201 code. (The specifics of these procedures are documented elsewhere in | |
10202 this manual.) | |
10203 | |
10204 There are four primary aspects of Mule support: | |
10205 | |
10206 @enumerate | |
10207 @item | |
10208 internal handling and representation of multi-lingual text. | |
10209 @item | |
10210 conversion between the internal representation of text and the various | |
10211 external representations in which multi-lingual text is encoded, such as | |
10212 Unicode representations (including mostly fixed width encodings such as | |
10213 UCS-2/UTF-16 and UCS-4 and variable width ASCII conformant encodings, | |
10214 such as UTF-7 and UTF-8); the various ISO2022 representations, which | |
10215 typically use escape sequences to switch between different character | |
10216 sets (such as Compound Text, used under X Windows; JIS, used | |
10217 specifically for encoding Japanese; and EUC, a non-modal encoding used | |
10218 for Japanese, Korean, and certain other languages); Microsoft's | |
10219 multi-byte encodings (such as Shift-JIS); various simple encodings for | |
10220 particular 8-bit character sets (such as Latin-1 and Latin-2, and | |
10221 encodings (such as koi8 and Alternativny) for Cyrillic); and others. | |
10222 This conversion needs to happen both for text in files and text sent to | |
10223 or retrieved from system API calls. It even needs to happen for | |
10224 external binary data because the internal representation does not | |
10225 represent binary data simply as a sequence of bytes as it is represented | |
10226 externally. | |
10227 @item | |
10228 Proper display of multi-lingual characters. | |
10229 @item | |
10230 Input of multi-lingual text using the keyboard. | |
10231 @end enumerate | |
10232 | |
10233 These four aspects are for the most part independent of each other. | |
10234 | |
10235 @subheading Characters, Character Sets, and Encodings | |
10236 | |
10237 A @dfn{character} (which is, BTW, a surprisingly complex concept) is, in | |
10238 a written representation of text, the most basic written unit that has a | |
10239 meaning of its own. It's comparable to a phoneme when analyzing words | |
10240 in spoken speech (for example, the sound of @samp{t} in English, which | |
10241 in fact has different pronunciations in different words -- aspirated in | |
10242 @samp{time}, unaspirated in @samp{stop}, unreleased or even pronounced | |
10243 as a glottal stop in @samp{button}, etc. -- but logically is a single | |
10244 concept). Like a phoneme, a character is an abstract concept defined by | |
10245 its @emph{meaning}. The character @samp{lowercase f}, for example, can | |
10246 always be used to represent the first letter in the word @samp{fill}, | |
10247 regardless of whether it's drawn upright or italic, whether the | |
10248 @samp{fi} combination is drawn as a single ligature, whether there are | |
10249 serifs on the bottom of the vertical stroke, etc. (These different | |
10250 appearances of a single character are often called @dfn{graphs} or | |
10251 @dfn{glyphs}.) Our concern when representing text is on representing the | |
10252 abstract characters, and not on their exact appearance. | |
10253 | |
10254 A @dfn{character set} (or @dfn{charset}), as we define it, is a set of | |
10255 characters, each with an associated number (or set of numbers -- see | |
10256 below), called a @dfn{code point}. It's important to understand that a | |
10257 character is not defined by any number attached to it, but by its | |
10258 meaning. For example, ASCII and EBCDIC are two charsets containing | |
10259 exactly the same characters (lowercase and uppercase letters, numbers 0 | |
10260 through 9, particular punctuation marks) but with different | |
10261 numberings. The `comma' character in ASCII and EBCDIC, for instance, is | |
10262 the same character despite having a different numbering. Conversely, | |
10263 when comparing ASCII and JIS-Roman, which look the same except that the | |
10264 latter has a yen sign substituted for the backslash, we would say that | |
10265 the backslash and yen sign are @strong{not} the same characters, despite having | |
10266 the same number (92) and despite the fact that all other characters are | |
10267 present in both charsets, with the same numbering. ASCII and JIS-Roman, | |
10268 then, do @emph{not} have exactly the same characters in them (ASCII has | |
10269 a backslash character but no yen-sign character, and vice-versa for | |
10270 JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII | |
10271 and JIS-Roman are closer. | |
10272 | |
10273 It's also important to distinguish between charsets and encodings. For | |
10274 a simple charset like ASCII, there is only one encoding normally used -- | |
10275 each character is represented by a single byte, with the same value as | |
10276 its code point. For more complicated charsets, however, things are not | |
10277 so obvious. Unicode version 2, for example, is a large charset with | |
10278 thousands of characters, each indexed by a 16-bit number, often | |
10279 represented in hex, e.g. 0x05D0 for the Hebrew letter "aleph". One | |
10280 obvious encoding uses two bytes per character (actually two encodings, | |
10281 depending on which of the two possible byte orderings is chosen). This | |
10282 encoding is convenient for internal processing of Unicode text; however, | |
10283 it's incompatible with ASCII, so a different encoding, e.g. UTF-8, is | |
10284 usually used for external text, for example files or e-mail. UTF-8 | |
10285 represents Unicode characters with one to three bytes (often extended to | |
10286 six bytes to handle characters with up to 31-bit indices). Unicode | |
10287 characters 00 to 7F (identical with ASCII) are directly represented with | |
10288 one byte, and other characters with two or more bytes, each in the range | |
10289 80 to FF. | |
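The UTF-8 rules just described can be sketched in C as follows. This is a minimal illustration for 16-bit code points only, not code taken from XEmacs; the function name @code{utf8_encode} is hypothetical, and the longer forms (four to six bytes) are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the 16-bit code point CP as UTF-8 into OUT, which must have
   room for 3 bytes.  Returns the number of bytes written, or 0 if CP
   is outside the 16-bit range handled by this sketch.  No check is
   made for surrogate values. */
size_t
utf8_encode (uint32_t cp, unsigned char *out)
{
  if (cp < 0x80)
    {
      /* 00 - 7F: identical with ASCII, one byte. */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* Two bytes, both in the range 80 - FF. */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      /* Three bytes, all in the range 80 - FF. */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  return 0;
}
```

For the Hebrew letter aleph (0x05D0) this produces the two bytes 0xD7 0x90, neither of which can be mistaken for an ASCII byte.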
10290 | |
10291 In general, a single encoding may be able to represent more than one | |
10292 charset. | |
10293 | |
10294 @subheading Internal Representation of Text | |
10295 | |
10296 In an ASCII or single-European-character-set world, life is very simple. | |
10297 There are 256 characters, and each character is represented using the | |
10298 numbers 0 through 255, which fit into a single byte. With a few | |
10299 exceptions (such as case-changing operations or syntax classes like | |
10300 'whitespace'), "text" is simply an array of indices into a font. You | |
10301 can get different languages simply by choosing fonts with different | |
10302 8-bit character sets (ISO-8859-1, -2, special-symbol fonts, etc.), and | |
10303 everything will "just work" as long as anyone else receiving your text | |
10304 uses a compatible font. | |
10305 | |
10306 In the multi-lingual world, however, it is much more complicated. There | |
10307 are a great number of different characters which are organized in a | |
10308 complex fashion into various character sets. The representation to use | |
10309 is not obvious because there are issues of size versus speed to | |
10310 consider. In fact, there are in general two kinds of representations to | |
10311 work with: one that represents a single character using an integer | |
10312 (possibly a byte), and the other representing a single character as a | |
10313 sequence of bytes. The former representation is normally called fixed | |
10314 width, and the other variable width. Both representations represent | |
10315 exactly the same characters, and the conversion from one representation | |
10316 to the other is governed by a specific formula (rather than by table | |
10317 lookup) but it may not be simple. Most C code need not, and in fact | |
10318 should not, know the specifics of exactly how the representations work. | |
10319 In fact, the code must not make assumptions about the representations. | |
10320 This means in particular that it must use the proper macros for | |
10321 retrieving the character at a particular memory location, determining | |
10322 how many characters are present in a particular stretch of text, and | |
10323 incrementing a pointer to a particular character to point to the | |
10324 following character, and so on. It must not assume that one character | |
10325 is stored using one byte, or even using any particular number of bytes. | |
10326 It must not assume that the number of characters in a stretch of text | |
10327 bears any particular relation to a number of bytes in that stretch. It | |
10328 must not assume that the character at a particular memory location can | |
10329 be retrieved simply by dereferencing the memory location, even if a | |
10330 character is known to be ASCII or is being compared with an ASCII | |
10331 character, etc. Careful coding is required to be Mule clean. The | |
10332 biggest work of adding Mule support, in fact, is converting all of the | |
10333 existing code to be Mule clean. | |
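As a rough illustration of what "Mule clean" means in practice, the following sketch counts the characters in a stretch of variable-width text without assuming any fixed relation between bytes and characters. It uses UTF-8 as a stand-in for the internal representation, which is an assumption made purely so the example is self-contained; the helper names @code{char_byte_length} and @code{count_chars} are hypothetical and are not the macros that XEmacs provides. Input is assumed to be well formed.

```c
#include <stddef.h>

/* Number of bytes occupied by the character starting at P, judged
   from its first byte.  (UTF-8 rules, used here only as a stand-in
   for a variable-width internal representation.) */
static size_t
char_byte_length (const unsigned char *p)
{
  if (*p < 0x80)
    return 1;
  if ((*p & 0xE0) == 0xC0)
    return 2;
  if ((*p & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Count the characters in the NBYTES bytes starting at TEXT.  The
   pointer is always advanced by whole characters, never by single
   bytes. */
size_t
count_chars (const unsigned char *text, size_t nbytes)
{
  const unsigned char *end = text + nbytes;
  size_t count = 0;

  while (text < end)
    {
      text += char_byte_length (text);
      count++;
    }
  return count;
}
```

The point of routing every step through a helper is that the callers never learn how wide a character is; only the helper would change if the representation did.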
10334 | |
10335 Lisp code is mostly unaffected by these concerns. Text in strings and | |
10336 buffers appears simply as a sequence of characters regardless of | |
10337 whether Mule support is present. The biggest difference with older | |
10338 versions of Emacs, as well as current versions of GNU Emacs, is that | |
10339 integers and characters are no longer equivalent, but are separate | |
10340 Lisp Object types. | |
10341 | |
10342 @subheading Conversion Between Internal and External Representations | |
10343 | |
10344 All text needs to be converted to an external representation before being | |
10345 sent to a function or file, and all text retrieved from a function or | |
10346 file needs to be converted to the internal representation. This | |
10347 conversion needs to happen as close to the source or destination of the | |
10348 text as possible. No operations should ever be performed on text encoded | |
10349 in an external representation other than simple copying, because no | |
10350 assumptions can reliably be made about the format of this text. You | |
10351 cannot assume, for example, that the end of text is terminated by a null | |
10352 byte. (For example, if the text is Unicode, it will have many null bytes | |
10353 in it.) You cannot find the next "slash" character by searching through | |
10354 the bytes until you find a byte that looks like a "slash" character, | |
10355 because it might actually be the second byte of a Kanji character. | |
10356 Furthermore, all text in the internal representation must be converted, | |
10357 even if it is known to be completely ASCII, because the external | |
10358 representation may not be ASCII compatible (for example, if it is | |
10359 Unicode). | |
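Both pitfalls can be demonstrated with a few bytes. The sketch below is illustrative only and uses hypothetical function names. The first array holds "hi" in little-endian UTF-16. The second holds a single Kanji (U+8868) in Shift-JIS, whose second byte is 0x5C, the ASCII backslash; backslash is used here instead of slash because it is the classic Shift-JIS case of the same problem.

```c
#include <stddef.h>
#include <string.h>

/* "hi" as UTF-16, little-endian: every other byte is a null. */
static const unsigned char utf16le_hi[] = { 'h', 0x00, 'i', 0x00 };

/* One Kanji (U+8868) in Shift-JIS: the second byte equals ASCII '\'. */
static const unsigned char sjis_kanji[] = { 0x95, 0x5C };

/* Wrong: treats external text as null-terminated.  Stops after 'h'. */
size_t
naive_utf16_length (void)
{
  return strlen ((const char *) utf16le_hi);
}

/* Wrong: byte search for a backslash.  Reports a match even though
   the text contains no backslash character at all. */
int
naive_backslash_found (void)
{
  return memchr (sjis_kanji, 0x5C, sizeof sjis_kanji) != NULL;
}
```

Here @code{naive_utf16_length} returns 1 for four bytes of text, and @code{naive_backslash_found} returns 1 for text with no backslash in it, which is why such operations belong on the internal representation only.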
10360 | |
10361 The place where C code needs to be the most careful is when calling | |
10362 external API functions. It is easy to forget that all text passed to or | |
10363 retrieved from these functions needs to be converted. This includes text | |
10364 in structures passed to or retrieved from these functions and all text | |
10365 that is passed to a callback function that is called by the system. | |
10366 | |
10367 Macros are provided to perform conversions to or from external text. | |
10368 These macros are called TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT | |
10369 respectively. These macros accept input in various forms, for example, | |
10370 Lisp strings, buffers, lstreams, raw data, and can return data in | |
10371 multiple formats, including both @code{malloc()}ed and @code{alloca()}ed data. The use | |
10372 of @code{alloca()}ed data here is particularly important because, in general, | |
10373 the returned data will not be used after making the API call, and as a | |
10374 result, using @code{alloca()}ed data provides a very cheap and easy to use | |
10375 method of allocation. | |
10376 | |
10377 These macros take a coding system argument which indicates the nature of | |
10378 the external encoding. A coding system is an object that encapsulates | |
10379 the structures of a particular external encoding and the methods required | |
10380 to convert to and from this encoding. A facility exists to create coding | |
10381 system aliases, which in essence gives a single coding system two | |
10382 different names. It is effectively used in XEmacs to provide a layer of | |
10383 abstraction on top of the actual coding systems. For example, the coding | |
10384 system alias "file-name" points to whichever coding system is currently | |
10385 used for encoding and decoding file names as passed to or retrieved from | |
10386 system calls. In general, the actual encoding will differ from system to | |
10387 system, and will also depend on the user's locale. The use of the | |
10388 file-name alias hides that implementation detail behind an abstract | |
10389 interface layer, which provides a unified set of coding systems that are | |
10390 consistent across all operating environments. | |
10391 | |
10392 The choice of which coding system to use in a particular conversion macro | |
10393 requires some thought. In general, you should choose a lower-level | |
10394 actual coding system when the very design of the APIs you are working | |
10395 with calls for that particular coding system. In all other cases, you | |
10396 should find the least general abstract coding system (i.e. coding system | |
10397 alias) that applies to your specific situation. Only use the most | |
10398 general coding systems, such as native, when there is simply nothing else | |
10399 that is more appropriate. By doing things this way, you allow the user | |
10400 more control over how the encoding actually works, because the user is | |
10401 free to map the abstracted coding system names onto different actual | |
10402 coding systems. | |
10403 | |
10404 Some common coding systems are: | |
10405 | |
10406 @table @code | |
10407 @item ctext | |
10408 Compound Text, which is the standard encoding under X Windows, which is | |
10409 used for clipboard data and possibly other data. (ctext is a coding | |
10410 system of type ISO2022.) | |
10411 | |
10412 @item mswindows-unicode | |
10413 this is used for representing text passed to MS Windows API calls with | |
10414 arguments that need to be in Unicode format. (mswindows-unicode is a | |
10415 coding system of type UTF-16.) | |
10416 | |
10417 @item mswindows-multi-byte | |
10418 this is used for representing text passed to MS Windows API calls with | |
10419 arguments that need to be in multi-byte format. Note that there are | |
10420 very few if any examples of such calls. | |
10421 | |
10422 @item mswindows-tstr | |
10423 this is used for representing text passed to any MS Windows API calls | |
10424 that declare their argument as LPTSTR, or LPCTSTR. This is the vast | |
10425 majority of system calls and automatically translates either to | |
10426 mswindows-unicode or mswindows-multi-byte, depending on the presence or | |
10427 absence of the UNICODE preprocessor constant. (If we compile XEmacs | |
10428 with this preprocessor constant, then all API calls use Unicode for all | |
10429 text passed to or received from these API calls.) | |
10430 | |
10431 @item terminal | |
10432 used for text sent to or read from a text terminal in the absence of a | |
10433 more specific coding system (calls to window-system specific APIs should | |
10434 use the appropriate window-specific coding system if it makes sense to | |
10435 do so.) | |
10436 | |
10437 @item file-name | |
10438 used when specifying the names of files in the absence of a more | |
10439 specific encoding, such as mswindows-tstr. | |
10440 | |
10441 @item native | |
10442 the most general coding system for specifying text passed to system | |
10443 calls. This generally translates to whatever coding system is specified | |
10444 by the current locale. This should only be used when none of the coding | |
10445 systems mentioned above are appropriate. | |
10446 @end table | |
10447 | |
10448 @subheading Proper Display of Multilingual Text | |
10449 | |
10450 There are two things required to get this working correctly. One is | |
10451 selecting the correct font, and the other is encoding the text according | |
10452 to the encoding used for that specific font, or the window-system | |
10453 specific text display API. Generally each separate character set has a | |
10454 different font associated with it, which is specified by name and each | |
10455 font has an associated encoding into which the characters must be | |
10456 translated. (this is the case on X Windows, at least; on Windows there | |
10457 is a more general mechanism). Both the specific font for a charset and | |
10458 the encoding of that font are system dependent. Currently there is a | |
10459 way of specifying these two properties under X Windows (using the | |
10460 registry and ccl properties of a character set) but not for other window | |
10461 systems. A more general system needs to be implemented to allow these | |
10462 characteristics to be specified for all window systems. | |
10463 | |
10464 Another issue is making sure that the necessary fonts for displaying | |
10465 various character sets are installed on the system. Currently, XEmacs | |
10466 provides, on its web site, X Windows fonts for a number of different | |
10467 character sets that can be installed by users. This isn't done yet for | |
10468 Windows, but it should be. | |
10469 | |
10470 @subheading Inputting of Multilingual Text | |
10471 | |
10472 This is a rather complicated issue because there are many paradigms | |
10473 defined for inputting multi-lingual text, some of which are specific to | |
10474 particular languages, and any particular language may have many | |
10475 different paradigms defined for inputting its text. These paradigms are | |
10476 encoded in input methods and there is a standard API for defining an | |
10477 input method in XEmacs called LEIM, or Library of Emacs Input Methods. | |
10478 Some of these input methods are written entirely in Elisp, and thus are | |
10479 system-independent, while others require the aid either of an external | |
10480 process, or of C level support that ties into a particular | |
10481 system-specific input method API, for example, XIM under X Windows, or | |
10482 the active keyboard layout and IME support under Windows. Currently, | |
10483 there is no support for any system-specific input methods under | |
10484 Microsoft Windows, although this will change. | |
10485 | |
10486 @node Introduction to Multilingual Issues #4, Character Sets, Introduction to Multilingual Issues #3, Multilingual Support | |
10487 @section Introduction to Multilingual Issues #4 | |
10488 @cindex introduction to multilingual issues #4 | |
10489 | |
10490 The rest of the sections in this chapter consist of yet another | |
10491 introduction to multilingual issues, duplicating the information in the | |
10492 previous sections. | |
10493 | |
10494 @node Character Sets, Encodings, Introduction to Multilingual Issues #4, Multilingual Support | |
9112 @section Character Sets | 10495 @section Character Sets |
9113 @cindex character sets | 10496 @cindex character sets |
9114 | 10497 |
9115 A character set (or @dfn{charset}) is an ordered set of characters. A | 10498 A @dfn{character set} (or @dfn{charset}) is an ordered set of |
9116 particular character in a charset is indexed using one or more | 10499 characters. A particular character in a charset is indexed using one or |
9117 @dfn{position codes}, which are non-negative integers. The number of | 10500 more @dfn{position codes}, which are non-negative integers. The number |
9118 position codes needed to identify a particular character in a charset is | 10501 of position codes needed to identify a particular character in a charset |
9119 called the @dfn{dimension} of the charset. In XEmacs/Mule, all charsets | 10502 is called the @dfn{dimension} of the charset. In XEmacs/Mule, all |
9120 have dimension 1 or 2, and the size of all charsets (except for a few | 10503 charsets have dimension 1 or 2, and the size of all charsets (except for |
9121 special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of | 10504 a few special cases) is either 94, 96, 94 by 94, or 96 by 96. The range |
9122 position codes used to index characters from any of these types of | 10505 of position codes used to index characters from any of these types of |
9123 character sets is as follows: | 10506 character sets is as follows: |
9124 | 10507 |
9125 @example | 10508 @example |
9126 Charset type Position code 1 Position code 2 | 10509 Charset type Position code 1 Position code 2 |
9127 ------------------------------------------------------------ | 10510 ------------------------------------------------------------ |
9188 160 - 255 Latin-1 32 - 127 | 10571 160 - 255 Latin-1 32 - 127 |
9189 @end example | 10572 @end example |
9190 | 10573 |
9191 This is a bit ad-hoc but gets the job done. | 10574 This is a bit ad-hoc but gets the job done. |
9192 | 10575 |
9193 @node Encodings | 10576 @node Encodings, Internal Mule Encodings, Character Sets, Multilingual Support |
9194 @section Encodings | 10577 @section Encodings |
9195 @cindex encodings, Mule | 10578 @cindex encodings, Mule |
9196 @cindex Mule encodings | 10579 @cindex Mule encodings |
9197 | 10580 |
9198 An @dfn{encoding} is a way of numerically representing characters from | 10581 An @dfn{encoding} is a way of numerically representing characters from |
9213 | 10596 |
9214 Here are descriptions of a couple of common | 10597 Here are descriptions of a couple of common |
9215 encodings: | 10598 encodings: |
9216 | 10599 |
9217 @menu | 10600 @menu |
9218 * Japanese EUC (Extended Unix Code):: | 10601 * Japanese EUC (Extended Unix Code):: |
9219 * JIS7:: | 10602 * JIS7:: |
9220 @end menu | 10603 @end menu |
9221 | 10604 |
9222 @node Japanese EUC (Extended Unix Code) | 10605 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings |
9223 @subsection Japanese EUC (Extended Unix Code) | 10606 @subsection Japanese EUC (Extended Unix Code) |
9224 @cindex Japanese EUC (Extended Unix Code) | 10607 @cindex Japanese EUC (Extended Unix Code) |
9225 @cindex EUC (Extended Unix Code), Japanese | 10608 @cindex EUC (Extended Unix Code), Japanese |
9226 @cindex Extended Unix Code, Japanese EUC | 10609 @cindex Extended Unix Code, Japanese EUC |
9227 | 10610 |
9228 This encompasses the character sets Printing-ASCII, Japanese-JISX0201, | 10611 This encompasses the character sets Printing-ASCII, Katakana-JISX0201 |
9229 and Japanese-JISX0208-Kana (half-width katakana, the right half of | 10612 (half-width katakana, the right half of JISX0201), Japanese-JISX0208, |
9230 JISX0201). It uses 8-bit bytes. | 10613 and Japanese-JISX0212. |
9231 | 10614 |
9232 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character | 10615 Note that Printing-ASCII and Katakana-JISX0201 are 94-character |
9233 charsets, while Japanese-JISX0208 is a 94x94-character charset. | 10616 charsets, while Japanese-JISX0208 and Japanese-JISX0212 are |
10617 94x94-character charsets. | |
9234 | 10618 |
9235 The encoding is as follows: | 10619 The encoding is as follows: |
9236 | 10620 |
9237 @example | 10621 @example |
9238 Character set Representation (PC=position-code) | 10622 Character set Representation (PC=position-code) |
9239 ------------- -------------- | 10623 ------------- -------------- |
9240 Printing-ASCII PC1 | 10624 Printing-ASCII PC1 |
9241 Japanese-JISX0201-Kana 0x8E | PC1 + 0x80 | 10625 Katakana-JISX0201 0x8E | PC1 + 0x80 |
9242 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 | 10626 Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 |
9243 Japanese-JISX0212 0x8F | PC1 + 0x80 | PC2 + 0x80 | 10627 Japanese-JISX0212 0x8F | PC1 + 0x80 | PC2 + 0x80 |
9244 @end example | 10628 @end example |
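The table translates fairly directly into code. The following sketch covers the first three rows (Printing-ASCII, Katakana-JISX0201 and Japanese-JISX0208); the enum and the function name @code{euc_encode} are hypothetical, and position codes are assumed to be already in their valid range.

```c
#include <stddef.h>

/* Hypothetical tags for the charsets handled by this sketch. */
enum euc_charset
{
  EUC_PRINTING_ASCII,
  EUC_KATAKANA_JISX0201,
  EUC_JAPANESE_JISX0208
};

/* Write the Japanese EUC representation of a character to OUT (room
   for 2 bytes) and return the number of bytes written.  PC2 is
   ignored for the dimension-1 charsets. */
size_t
euc_encode (enum euc_charset cs, unsigned char pc1, unsigned char pc2,
            unsigned char *out)
{
  switch (cs)
    {
    case EUC_PRINTING_ASCII:
      out[0] = pc1;                           /* PC1 */
      return 1;
    case EUC_KATAKANA_JISX0201:
      out[0] = 0x8E;                          /* 0x8E | PC1 + 0x80 */
      out[1] = (unsigned char) (pc1 + 0x80);
      return 2;
    case EUC_JAPANESE_JISX0208:
      out[0] = (unsigned char) (pc1 + 0x80);  /* PC1 + 0x80 | PC2 + 0x80 */
      out[1] = (unsigned char) (pc2 + 0x80);
      return 2;
    }
  return 0;
}
```

For example, the Japanese-JISX0208 character with position codes 0x30 and 0x21 comes out as the two bytes 0xB0 0xA1.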
9245 | 10629 |
9246 | 10630 Note that there are other versions of EUC for other Asian languages. |
9247 @node JIS7 | 10631 EUC in general is characterized by |
10632 | |
10633 @enumerate | |
10634 @item | |
10635 row-column encoding, | |
10636 @item | |
10637 big-endian (row-first) ordering, and | |
10638 @item | |
10639 ASCII compatibility in variable width forms. | |
10640 @end enumerate | |
10641 | |
10642 @node JIS7, , Japanese EUC (Extended Unix Code), Encodings | |
9248 @subsection JIS7 | 10643 @subsection JIS7 |
9249 @cindex JIS7 | 10644 @cindex JIS7 |
9250 | 10645 |
9251 This encompasses the character sets Printing-ASCII, | 10646 This encompasses the character sets Printing-ASCII, |
9252 Japanese-JISX0201-Roman (the left half of JISX0201; this character set | 10647 Latin-JISX0201 (the left half of JISX0201; this character set |
9253 is very similar to Printing-ASCII and is a 94-character charset), | 10648 is very similar to Printing-ASCII and is a 94-character charset), |
9254 Japanese-JISX0208, and Japanese-JISX0201-Kana. It uses 7-bit bytes. | 10649 Japanese-JISX0208, and Katakana-JISX0201. It uses 7-bit bytes. |
9255 | 10650 |
9256 Unlike Japanese EUC, this is a @dfn{modal} encoding, which | 10651 Unlike EUC, this is a @dfn{modal} encoding, which means that there are |
9257 means that there are multiple states that the encoding can | 10652 multiple states that the encoding can be in, which affect how the bytes |
9258 be in, which affect how the bytes are to be interpreted. | 10653 are to be interpreted. Special sequences of bytes (called @dfn{escape |
9259 Special sequences of bytes (called @dfn{escape sequences}) | 10654 sequences}) are used to change states. |
9260 are used to change states. | |
9261 | 10655 |
9262 The encoding is as follows: | 10656 The encoding is as follows: |
9263 | 10657 |
9264 @example | 10658 @example |
9265 Character set Representation (PC=position-code) | 10659 Character set Representation (PC=position-code) |
9266 ------------- -------------- | 10660 ------------- -------------- |
9267 Printing-ASCII PC1 | 10661 Printing-ASCII PC1 |
9268 Japanese-JISX0201-Roman PC1 | 10662 Latin-JISX0201 PC1 |
9269 Japanese-JISX0201-Kana PC1 | 10663 Katakana-JISX0201 PC1 |
9270 Japanese-JISX0208 PC1 PC2 | 10664 Japanese-JISX0208 PC1 | PC2 |
9271 | 10665 |
9272 | 10666 |
9273 Escape sequence ASCII equivalent Meaning | 10667 Escape sequence ASCII equivalent Meaning |
9274 --------------- ---------------- ------- | 10668 --------------- ---------------- ------- |
9275 0x1B 0x28 0x4A ESC ( J invoke Japanese-JISX0201-Roman | 10669 0x1B 0x28 0x4A ESC ( J invoke Latin-JISX0201 |
9276 0x1B 0x28 0x49 ESC ( I invoke Japanese-JISX0201-Kana | 10670 0x1B 0x28 0x49 ESC ( I invoke Katakana-JISX0201 |
9277 0x1B 0x24 0x42 ESC $ B invoke Japanese-JISX0208 | 10671 0x1B 0x24 0x42 ESC $ B invoke Japanese-JISX0208 |
9278 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII | 10672 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII |
9279 @end example | 10673 @end example |
9280 | 10674 |
9281 Initially, Printing-ASCII is invoked. | 10675 Initially, Printing-ASCII is invoked. |
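Because the encoding is modal, a reader has to track the current state while scanning. The sketch below counts the characters in a JIS7 byte sequence using only the four escape sequences listed above. The names are hypothetical, and anything else following an ESC byte is simply reported as an error.

```c
#include <stddef.h>

enum jis7_state
{
  JIS7_PRINTING_ASCII,
  JIS7_LATIN_JISX0201,
  JIS7_KATAKANA_JISX0201,
  JIS7_JAPANESE_JISX0208
};

/* Count the characters in the LEN bytes at BUF.  Returns (size_t) -1
   if an unknown or truncated sequence is found. */
size_t
jis7_count_chars (const unsigned char *buf, size_t len)
{
  enum jis7_state state = JIS7_PRINTING_ASCII;  /* initial state */
  size_t i = 0, count = 0;

  while (i < len)
    {
      if (buf[i] == 0x1B)
        {
          /* An escape sequence changes state but is not a character. */
          if (len - i < 3)
            return (size_t) -1;
          if (buf[i + 1] == 0x28 && buf[i + 2] == 0x4A)
            state = JIS7_LATIN_JISX0201;        /* ESC ( J */
          else if (buf[i + 1] == 0x28 && buf[i + 2] == 0x49)
            state = JIS7_KATAKANA_JISX0201;     /* ESC ( I */
          else if (buf[i + 1] == 0x24 && buf[i + 2] == 0x42)
            state = JIS7_JAPANESE_JISX0208;     /* ESC $ B */
          else if (buf[i + 1] == 0x28 && buf[i + 2] == 0x42)
            state = JIS7_PRINTING_ASCII;        /* ESC ( B */
          else
            return (size_t) -1;
          i += 3;
        }
      else if (state == JIS7_JAPANESE_JISX0208)
        {
          /* Dimension-2 charset: PC1 and PC2 form one character. */
          if (len - i < 2)
            return (size_t) -1;
          i += 2;
          count++;
        }
      else
        {
          /* Dimension-1 charsets: one byte per character. */
          i += 1;
          count++;
        }
    }
  return count;
}
```

Note that the same byte value (for instance 0x30) means different things depending on the state, which is exactly what distinguishes this encoding from EUC.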
9282 | 10676 |
9283 @node Internal Mule Encodings | 10677 @node Internal Mule Encodings, Byte/Character Types; Buffer Positions; Other Typedefs, Encodings, Multilingual Support |
9284 @section Internal Mule Encodings | 10678 @section Internal Mule Encodings |
9285 @cindex internal Mule encodings | 10679 @cindex internal Mule encodings |
9286 @cindex Mule encodings, internal | 10680 @cindex Mule encodings, internal |
9287 @cindex encodings, internal Mule | 10681 @cindex encodings, internal Mule |
9288 | 10682 |
9297 these are user-defined charsets. | 10691 these are user-defined charsets. |
9298 | 10692 |
9299 More specifically: | 10693 More specifically: |
9300 | 10694 |
9301 @example | 10695 @example |
9302 Character set Leading byte | 10696 Character set Leading byte |
9303 ------------- ------------ | 10697 ------------- ------------ |
9304 ASCII 0 | 10698 ASCII 0 (0x7F in arrays indexed by leading byte) |
9305 Composite 0x80 | 10699 Composite 0x8D |
9306 Dimension-1 Official 0x81 - 0x8D | 10700 Dimension-1 Official 0x80 - 0x8C/0x8D |
9307 (0x8E is free) | 10701 (0x8E is free) |
9308 Control-1 0x8F | 10702 Control 0x8F |
9309 Dimension-2 Official 0x90 - 0x99 | 10703 Dimension-2 Official 0x90 - 0x99 |
9310 (0x9A - 0x9D are free; | 10704 (0x9A - 0x9D are free) |
9311 0x9E and 0x9F are reserved) | 10705 Dimension-1 Private Marker 0x9E |
9312 Dimension-1 Private 0xA0 - 0xEF | 10706 Dimension-2 Private Marker 0x9F |
9313 Dimension-2 Private 0xF0 - 0xFF | 10707 Dimension-1 Private 0xA0 - 0xEF |
10708 Dimension-2 Private 0xF0 - 0xFF | |
9314 @end example | 10709 @end example |
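The new (right-hand) version of this table can be read as a simple range classification. The following is a sketch only: the enum and function names are hypothetical, and where the table is ambiguous about 0x8D ("0x80 - 0x8C/0x8D") the sketch assumes that 0x8D means Composite. Values between 0x01 and 0x7F do not appear in the table and are reported as invalid.

```c
enum lb_class
{
  LB_ASCII,
  LB_DIM1_OFFICIAL,
  LB_COMPOSITE,
  LB_FREE,
  LB_CONTROL,
  LB_DIM2_OFFICIAL,
  LB_DIM1_PRIVATE_MARKER,
  LB_DIM2_PRIVATE_MARKER,
  LB_DIM1_PRIVATE,
  LB_DIM2_PRIVATE,
  LB_INVALID
};

/* Classify the leading byte LB according to the table above. */
enum lb_class
classify_leading_byte (unsigned char lb)
{
  if (lb == 0x00) return LB_ASCII;
  if (lb <  0x80) return LB_INVALID;              /* not in the table */
  if (lb <= 0x8C) return LB_DIM1_OFFICIAL;
  if (lb == 0x8D) return LB_COMPOSITE;            /* assumption, see text */
  if (lb == 0x8E) return LB_FREE;
  if (lb == 0x8F) return LB_CONTROL;
  if (lb <= 0x99) return LB_DIM2_OFFICIAL;
  if (lb <= 0x9D) return LB_FREE;
  if (lb == 0x9E) return LB_DIM1_PRIVATE_MARKER;
  if (lb == 0x9F) return LB_DIM2_PRIVATE_MARKER;
  if (lb <= 0xEF) return LB_DIM1_PRIVATE;
  return LB_DIM2_PRIVATE;                         /* 0xF0 - 0xFF */
}
```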
9315 | 10710 |
9316 There are two internal encodings for characters in XEmacs/Mule. One is | 10711 There are two internal encodings for characters in XEmacs/Mule. One is |
9317 called @dfn{string encoding} and is an 8-bit encoding that is used for | 10712 called @dfn{string encoding} and is an 8-bit encoding that is used for |
9318 representing characters in a buffer or string. It uses 1 to 4 bytes per | 10713 representing characters in a buffer or string. It uses 1 to 4 bytes per |
9323 (In the following descriptions, we'll ignore composite characters for | 10718 (In the following descriptions, we'll ignore composite characters for |
9324 the moment. We also give a general (structural) overview first, | 10719 the moment. We also give a general (structural) overview first, |
9325 followed later by the exact details.) | 10720 followed later by the exact details.) |
9326 | 10721 |
9327 @menu | 10722 @menu |
9328 * Internal String Encoding:: | 10723 * Internal String Encoding:: |
9329 * Internal Character Encoding:: | 10724 * Internal Character Encoding:: |
9330 @end menu | 10725 @end menu |
9331 | 10726 |
9332 @node Internal String Encoding | 10727 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings |
9333 @subsection Internal String Encoding | 10728 @subsection Internal String Encoding |
9334 @cindex internal string encoding | 10729 @cindex internal string encoding |
9335 @cindex string encoding, internal | 10730 @cindex string encoding, internal |
9336 @cindex encoding, internal string | 10731 @cindex encoding, internal string |
9337 | 10732 |
9380 None of the standard non-modal encodings meet all of these | 10775 None of the standard non-modal encodings meet all of these |
9381 conditions. For example, EUC satisfies only (2) and (3), while | 10776 conditions. For example, EUC satisfies only (2) and (3), while |
9382 Shift-JIS and Big5 (not yet described) satisfy only (2). (All | 10777 Shift-JIS and Big5 (not yet described) satisfy only (2). (All |
9383 non-modal encodings must satisfy (2), in order to be unambiguous.) | 10778 non-modal encodings must satisfy (2), in order to be unambiguous.) |
9384 | 10779 |
9385 @node Internal Character Encoding | 10780 @node Internal Character Encoding, , Internal String Encoding, Internal Mule Encodings |
9386 @subsection Internal Character Encoding | 10781 @subsection Internal Character Encoding |
9387 @cindex internal character encoding | 10782 @cindex internal character encoding |
9388 @cindex character encoding, internal | 10783 @cindex character encoding, internal |
9389 @cindex encoding, internal character | 10784 @cindex encoding, internal character |
9390 | 10785 |
9404 ------------- ------- ------- ------- | 10799 ------------- ------- ------- ------- |
9405 ASCII 0 0 PC1 | 10800 ASCII 0 0 PC1 |
9406 range: (00 - 7F) | 10801 range: (00 - 7F) |
9407 Control-1 0 1 PC1 | 10802 Control-1 0 1 PC1 |
9408 range: (00 - 1F) | 10803 range: (00 - 1F) |
9409 Dimension-1 official 0 LB - 0x80 PC1 | 10804 Dimension-1 official 0 LB - 0x7F PC1 |
9410 range: (01 - 0D) (20 - 7F) | 10805 range: (01 - 0D) (20 - 7F) |
9411 Dimension-1 private 0 LB - 0x80 PC1 | 10806 Dimension-1 private 0 LB - 0x80 PC1 |
9412 range: (20 - 6F) (20 - 7F) | 10807 range: (20 - 6F) (20 - 7F) |
9413 Dimension-2 official LB - 0x8F PC1 PC2 | 10808 Dimension-2 official LB - 0x8F PC1 PC2 |
9414 range: (01 - 0A) (20 - 7F) (20 - 7F) | 10809 range: (01 - 0A) (20 - 7F) (20 - 7F) |
9415 Dimension-2 private LB - 0xE1 PC1 PC2 | 10810 Dimension-2 private LB - 0xE1 PC1 PC2 |
9416 range: (0F - 1E) (20 - 7F) (20 - 7F) | 10811 range: (0F - 1E) (20 - 7F) (20 - 7F) |
9417 Composite 0x1F ? ? | 10812 Composite 0x1F ? ? |
9418 @end example | 10813 @end example |
9419 | 10814 |
9420 Note that character codes 0 - 255 are the same as the ``binary encoding'' | 10815 Note that character codes 0 - 255 are the same as the ``binary |
9421 described above. | 10816 encoding'' described above. |
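To see how the three fields combine into one integer, consider the sketch below. It assumes that fields 2 and 3 are each 7 bits wide, which is inferred from the ranges in the table (none exceeds 0x7F) rather than stated here, and it follows the new (right-hand) column for dimension-1 official charsets. All names are hypothetical.

```c
#include <stdint.h>

/* Assumed field widths, inferred from the ranges in the table. */
#define FIELD2_BITS 7
#define FIELD3_BITS 7

/* Pack the three fields into a single character code. */
uint32_t
make_char (uint32_t field1, uint32_t field2, uint32_t field3)
{
  return (field1 << (FIELD2_BITS + FIELD3_BITS))
         | (field2 << FIELD3_BITS)
         | field3;
}

/* Dimension-1 official: fields are 0, LB - 0x7F, PC1. */
uint32_t
dim1_official_char (uint32_t lb, uint32_t pc1)
{
  return make_char (0, lb - 0x7F, pc1);
}

/* Dimension-2 official: fields are LB - 0x8F, PC1, PC2. */
uint32_t
dim2_official_char (uint32_t lb, uint32_t pc1, uint32_t pc2)
{
  return make_char (lb - 0x8F, pc1, pc2);
}
```

Under these assumptions, ASCII characters come out as 0 - 127, Control-1 as 128 - 159, and the dimension-1 official charset with leading byte 0x80 as 160 - 255, which agrees with the remark about character codes 0 - 255.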
9422 | 10817 |
9423 @node CCL | 10818 Most of the code in XEmacs knows nothing of the representation of a |
10819 character other than that values 0 - 255 represent ASCII, Control 1, | |
10820 and Latin 1. | |
10821 | |
10822 @strong{WARNING WARNING WARNING}: The Boyer-Moore code in | |
10823 @file{search.c}, and the code in @code{search_buffer()} that determines | |
10824 whether that code can be used, knows that ``field 3'' in a character | |
10825 always corresponds to the last byte in the textual representation of the | |
10826 character. (This is important because the Boyer-Moore algorithm works by | |
10827 looking at the last byte of the search string and &&#### finish this. | |
10828 | |
10829 @node Byte/Character Types; Buffer Positions; Other Typedefs, Internal Text API's, Internal Mule Encodings, Multilingual Support | |
10830 @section Byte/Character Types; Buffer Positions; Other Typedefs | |
10831 @cindex byte/character types; buffer positions; other typedefs | |
10832 @cindex byte/character types | |
10833 @cindex character types | |
10834 @cindex buffer positions | |
10835 @cindex typedefs, other | |
10836 | |
10837 @menu | |
10838 * Byte Types:: | |
10839 * Different Ways of Seeing Internal Text:: | |
10840 * Buffer Positions:: | |
10841 * Other Typedefs:: | |
10842 * Usage of the Various Representations:: | |
10843 * Working With the Various Representations:: | |
10844 @end menu | |
10845 | |
10846 @node Byte Types, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs | |
10847 @subsection Byte Types | |
10848 @cindex byte types | |
10849 | |
10850 Stuff pointed to by a char * or unsigned char * will nearly always be | |
10851 one of the following types: | |
10852 | |
10853 @itemize @minus | |
10854 @item | |
10855 a) [Ibyte] pointer to internally-formatted text | |
10856 @item | |
10857 b) [Extbyte] pointer to text in some external format, which can be | |
10858 defined as all formats other than the internal one | |
10859 @item | |
10860 c) [Ascbyte] pure ASCII text | |
10861 @item | |
10862 d) [Binbyte] binary data that is not meant to be interpreted as text | |
10863 @item | |
10864 e) [Rawbyte] general data in memory, where we don't care about whether | |
10865 it's text or binary | |
10866 @item | |
10867 f) [Boolbyte] a zero or a one | |
10868 @item | |
10869 g) [Bitbyte] a byte used for bit fields | |
10870 @item | |
10871 h) [Chbyte] null-semantics @code{char *}; used when casting an argument to | |
10872 an external API where the other types may not be | |
10873 appropriate | |
10874 @end itemize | |
10875 | |
10876 Types (b), (c), (f) and (h) are defined as @code{char}, while the others are | |
10877 @code{unsigned char}. This is for maximum safety (signed characters are | |
10878 dangerous to work with) while maintaining as much compatibility with | |
10879 external API's and string constants as possible. | |
10880 | |
10881 We also provide versions of the above types defined with different | |
10882 underlying C types, for API compatibility. These use the following | |
10883 prefixes: | |
10884 | |
10885 @example | |
10886 C = plain char, when the base type is unsigned | |
10887 U = unsigned | |
10888 S = signed | |
10889 @end example | |
10890 | |
10891 (Formerly I had a comment saying that type (e) "should be replaced with | |
10892 void *". However, there are in fact many places where an unsigned char | |
10893 * might be used -- e.g. for ease in pointer computation, since void * | |
10894 doesn't allow this, and for compatibility with external API's.) | |
10895 | |
10896 Note that these typedefs are purely for documentation purposes; from | |
10897 the C code's perspective, they are exactly equivalent to @code{char *}, | |
10898 @code{unsigned char *}, etc., so you can freely use them with library | |
10899 functions declared as such. | |
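As a sketch of what this looks like in C, the typedefs below follow the signedness rules stated above. The three prefixed names at the end are formed from the prefixes listed and are shown only as examples of the pattern; the helper @code{ascii_length} is hypothetical.

```c
#include <string.h>

typedef unsigned char Ibyte;    /* (a) internally-formatted text */
typedef char          Extbyte;  /* (b) text in some external format */
typedef char          Ascbyte;  /* (c) pure ASCII text */
typedef unsigned char Binbyte;  /* (d) binary data, not text */
typedef unsigned char Rawbyte;  /* (e) general data in memory */
typedef char          Boolbyte; /* (f) a zero or a one */
typedef unsigned char Bitbyte;  /* (g) a byte used for bit fields */
typedef char          Chbyte;   /* (h) null-semantics char */

/* Examples of the prefixed variants: same meaning, different C type. */
typedef char          CIbyte;   /* C: plain char version of Ibyte */
typedef unsigned char UExtbyte; /* U: unsigned version of Extbyte */
typedef signed char   SExtbyte; /* S: signed version of Extbyte */

/* Because Ascbyte is exactly char, an Ascbyte * can be handed to a
   library function declared to take char * without any cast. */
size_t
ascii_length (const Ascbyte *s)
{
  return strlen (s);
}
```

The declaration of @code{ascii_length} tells the reader what kind of text is expected, even though the compiler sees nothing but @code{const char *}.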
10900 | |
10901 Using these more specific types rather than the general ones helps avoid | |
10902 the confusions that occur when the semantics of a char * or unsigned | |
10903 char * argument being studied are unclear. Furthermore, by requiring | |
10904 that ALL uses of @code{char} be replaced with some other type as part of the | |
10905 Mule-ization process, we can use a search for @code{char} as a way of finding | |
10906 code that has not been properly Mule-ized yet. | |
10907 | |
10908 @node Different Ways of Seeing Internal Text, Buffer Positions, Byte Types, Byte/Character Types; Buffer Positions; Other Typedefs | |
10909 @subsection Different Ways of Seeing Internal Text | |
10910 @cindex different ways of seeing internal text | |
10911 | |
10912 There are various ways of representing internal text. The two primary | |
10913 ways are as an "array" of individual characters and as a | |
10914 "stream" of bytes. In the ASCII world, where there are only 256 | |
10915 characters at most, things are easy because each character fits into a | |
10916 byte. In general, however, this is not true -- see the above discussion | |
10917 of characters vs. encodings. | |
10918 | |
10919 In some cases, it's also important to distinguish between a stream | |
10920 representation as a series of bytes and as a series of textual units. | |
10921 This is particularly important with respect to Unicode. The UTF-16 representation | |
10922 (sometimes referred to, rather sloppily, as simply the "Unicode" format) | |
10923 represents text as a series of 16-bit units. Mostly, each unit | |
10924 corresponds to a single character, but not necessarily, as characters | |
10925 outside of the range 0-65535 (the BMP or "Basic Multilingual Plane" of | |
10926 Unicode) require two 16-bit units, through the mechanism of | |
10927 "surrogates". When a series of 16-bit units is serialized into a byte | |
10928 stream, there are at least two possible representations, little-endian | |
10929 and big-endian, and which one is used may depend on the native format of | |
10930 16-bit integers in the CPU of the machine that XEmacs is running | |
10931 on. (Similarly, UTF-32 is logically a representation with 32-bit textual | |
10932 units.) | |
10933 | |
10934 Specifically: | |
10935 | |
10936 @itemize @minus | |
10937 @item | |
10938 UTF-8 has 1-byte (8-bit) units. | |
10939 @item | |
10940 UTF-16 has 2-byte (16-bit) units. | |
10941 @item | |
10942 UTF-32 has 4-byte (32-bit) units. | |
10943 @item | |
10944 XEmacs-internal encoding (the old "Mule" encoding) has 1-byte (8-bit) | |
10945 units. | |
10946 @item | |
10947 UTF-7 technically has 7-bit units that are within the "mail-safe" range | |
10948 (ASCII 32 - 126 plus a few control characters), but normally is encoded | |
10949 in an 8-bit stream. (UTF-7 is also a modal encoding, since it has a | |
10950 normal mode where printable ASCII characters represent themselves and a | |
10951 shifted mode, introduced with a plus sign, where a base-64 encoding is | |
10952 used.) | |
10953 @item | |
10954 UTF-5 technically has 7-bit units (normally encoded in an 8-bit stream, | |
10955 like UTF-7), but only uses uppercase A-V and 0-9, and only encodes 4 | |
10956 bits worth of data per character. UTF-5 is meant for encoding Unicode | |
10957 inside of DNS names. | |
10958 @end itemize | |
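
To make the difference between textual units and bytes concrete for UTF-16, here is a minimal standalone C sketch. It is not XEmacs code, and the names sketch_utf16_encode and sketch_utf16_serialize are hypothetical. The first function turns one Unicode code point into one or two 16-bit units (using surrogates outside the BMP); the second serializes those units into a byte stream in either byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 textual units.
   Returns the number of units written (1 or 2).
   Hypothetical helper, for illustration only. */
size_t
sketch_utf16_encode (uint32_t cp, uint16_t units[2])
{
  /* Surrogate values and values above U+10FFFF are not characters. */
  assert (cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  if (cp < 0x10000)
    {
      units[0] = (uint16_t) cp;          /* BMP: one unit per character */
      return 1;
    }
  cp -= 0x10000;                         /* 20 bits remain */
  units[0] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
  units[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
  return 2;
}

/* Serialize N 16-bit units into bytes, big- or little-endian.
   OUT must have room for 2 * N bytes.  Returns the byte count. */
size_t
sketch_utf16_serialize (const uint16_t *units, size_t n, int big_endian,
                        unsigned char *out)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      unsigned char hi = (unsigned char) (units[i] >> 8);
      unsigned char lo = (unsigned char) (units[i] & 0xFF);
      out[2 * i] = big_endian ? hi : lo;
      out[2 * i + 1] = big_endian ? lo : hi;
    }
  return 2 * n;
}
```

For U+1D11E, which lies outside the BMP, the first function yields the two units 0xD834 0xDD1E. Serialized big-endian these become the bytes D8 34 DD 1E, and little-endian 34 D8 1E DD: one character, two textual units, four bytes.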
10959 | |
10960 Thus, we can imagine three levels in the representation of textual data: | |
10961 | |
10962 @example | |
10963 series of characters -> series of textual units -> series of bytes | |
10964 [Ichar] [Itext] [Ibyte] | |
10965 @end example | |
10966 | |
10967 XEmacs has three corresponding typedefs: | |
10968 | |
10969 @itemize @minus | |
10970 @item | |
10971 An Ichar is an integer (at least 32-bit), representing a 31-bit | |
10972 character. | |
10973 @item | |
10974 An Itext is an unsigned value, either 8, 16 or 32 bits, depending | |
10975 on the nature of the internal representation, and corresponding to | |
10976 a single textual unit. | |
10977 @item | |
10978 An Ibyte is an @code{unsigned char}, representing a single byte in a | |
10979 textual byte stream. | |
10980 @end itemize | |
10981 | |
10982 Internal text in stream format can be simultaneously viewed as either | |
10983 @code{Itext *} or @code{Ibyte *}. The @code{Ibyte *} representation is convenient for | |
10984 copying data from one place to another, because such routines usually | |
10985 expect byte counts. However, @code{Itext *} is much better for actually | |
10986 working with the data. | |
10987 | |
10988 From a text-unit perspective, units 0 through 127 will always be ASCII | |
10989 compatible, and data in Lisp strings (and other textual data generated | |
10990 as a whole, e.g. from external conversion) will be followed by a | |
10991 null-unit terminator. From an @code{Ibyte *} perspective, however, the | |
10992 encoding is only ASCII-compatible if it uses 1-byte units. | |
10993 | |
10994 Similarly to the different text representations, three integral count | |
10995 types exist -- Charcount, Textcount and Bytecount. | |
10996 | |
10997 NOTE: Despite the presence of the terminator, internal text itself can | |
10998 have nulls in it! (Null text units, not just the null bytes present in | |
10999 any UTF-16 encoding.) The terminator is present because in many cases | |
11000 internal text is passed to routines that will ultimately pass the text | |
11001 to library functions that cannot handle embedded nulls, e.g. functions | |
11002 manipulating filenames, and it is a real hassle to have to pass the | |
11003 length around constantly. But this can lead to sloppy coding! We need | |
11004 to be careful about watching for nulls in places that are important, | |
11005 e.g. manipulating string objects or passing data to/from the clipboard. | |
11006 | |
11007 @table @code | |
11008 @item Ibyte | |
11009 The data in a buffer or string is logically made up of Ibyte objects, | |
11010 where an Ibyte takes up the same amount of space as a char. (It is | |
11011 declared differently, though, to catch invalid usages.) Strings stored | |
11012 using Ibytes are said to be in "internal format". The important | |
11013 characteristics of internal format are | |
11014 | |
11015 @itemize @minus | |
11016 @item | |
11017 ASCII characters are represented as a single Ibyte, in the range 0 - | |
11018 0x7f. | |
11019 @item | |
11020 All other characters are represented as an Ibyte in the range 0x80 - 0x9f | |
11021 followed by one or more Ibytes in the range 0xa0 to 0xff. | |
11022 @end itemize | |
11023 | |
11024 This leads to a number of desirable properties: | |
11025 | |
11026 @itemize @minus | |
11027 @item | |
11028 Given the position of the beginning of a character, you can find the | |
11029 beginning of the next or previous character in constant time. | |
11030 @item | |
11031 When searching for a substring or an ASCII character within the string, | |
11032 you need merely use standard searching routines. | |
11033 @end itemize | |
11034 | |
11035 @item Itext | |
11036 | |
11037 #### Document me. | |
11038 | |
11039 @item Ichar | |
11040 This typedef represents a single Emacs character, which can be ASCII, | |
11041 ISO-8859, or some extended character, as would typically be used for | |
11042 Kanji. Note that the representation of a character as an Ichar is @strong{not} | |
11043 the same as the representation of that same character in a string; thus, | |
11044 you cannot do the standard C trick of passing a pointer to a character | |
11045 to a function that expects a string. | |
11046 | |
11047 An Ichar takes up 19 bits of representation and (for code compatibility | |
11048 and such) is compatible with an int. This representation is visible on | |
11049 the Lisp level. The important characteristics of the Ichar | |
11050 representation are | |
11051 | |
11052 @itemize @minus | |
11053 @item | |
11054 values 0x00 - 0x7f represent ASCII. | |
11055 @item | |
11056 values 0x80 - 0xff represent the right half of ISO-8859-1. | |
11057 @item | |
11058 values 0x100 and up represent all other characters. | |
11059 @end itemize | |
11060 | |
11061 This means that Ichar values are upwardly compatible with the standard | |
11062 8-bit representation of ASCII/ISO-8859-1. | |
11063 | |
11064 @item Extbyte | |
11065 Strings that go in or out of Emacs are in "external format", typedef'ed | |
11066 as an array of char or a char *. There is more than one external format | |
11067 (JIS, EUC, etc.) but they all have similar properties. They are modal | |
11068 encodings, which is to say that the meaning of particular bytes is not | |
11069 fixed but depends on what "mode" the string is currently in (e.g. bytes | |
11070 in the range 0 - 0x7f might be interpreted as ASCII, or as Hiragana, or | |
11071 as 2-byte Kanji, depending on the current mode). The mode starts out in | |
11072 ASCII/ISO-8859-1 and is switched using escape sequences -- for example, | |
11073 in the JIS encoding, 'ESC $ B' switches to a mode where pairs of bytes | |
11074 in the range 0 - 0x7f are interpreted as Kanji characters. | |
11075 | |
11076 External-formatted data is generally desirable for passing data between | |
11077 programs because it is upwardly compatible with standard | |
11078 ASCII/ISO-8859-1 strings and may require less space than internal | |
11079 encodings such as the one described above. In addition, some encodings | |
11080 (e.g. JIS) keep all characters (except the ESC used to switch modes) in | |
11081 the printing ASCII range 0x20 - 0x7e, which results in a much higher | |
11082 probability that the data will avoid being garbled in transmission. | |
11083 Externally-formatted data is generally not very convenient to work with, | |
11084 however, and for this reason is usually converted to internal format | |
11085 before any work is done on the string. | |
11086 | |
11087 NOTE: filenames need to be in external format so that ISO-8859-1 | |
11088 characters come out correctly. | |
11089 @end table | |
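
The two properties of internal format listed under Ibyte above follow directly from the byte ranges: any byte below 0xa0 starts a character, and any byte at or above 0xa0 continues one. The following standalone C sketch illustrates this under exactly those assumptions. The function names are hypothetical; the real macros live in @file{text.h}.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if B can begin a character in the internal format described
   above: ASCII (0x00 - 0x7f) or a leading byte (0x80 - 0x9f). */
int
sketch_starts_character (unsigned char b)
{
  return b < 0xA0;
}

/* Given P at the beginning of a character, return the beginning of the
   next character (or END if there is none). */
const unsigned char *
sketch_next_char (const unsigned char *p, const unsigned char *end)
{
  assert (p < end && sketch_starts_character (*p));
  do
    p++;
  while (p < end && !sketch_starts_character (*p));
  return p;
}

/* Given P at the beginning of a character (or at the end of the text),
   return the beginning of the previous character.  P must be > START. */
const unsigned char *
sketch_prev_char (const unsigned char *p, const unsigned char *start)
{
  assert (p > start);
  do
    p--;
  while (p > start && !sketch_starts_character (*p));
  return p;
}
```

Each loop is bounded by the maximum length of a single character, which is why the movement is constant-time. Because bytes in the range 0x00 - 0x7f never occur inside a multi-byte character, an ordinary byte search such as @code{memchr()} for an ASCII character cannot produce a false match.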
11090 | |
11091 @node Buffer Positions, Other Typedefs, Different Ways of Seeing Internal Text, Byte/Character Types; Buffer Positions; Other Typedefs | |
11092 @subsection Buffer Positions | |
11093 @cindex buffer positions | |
11094 | |
11095 There are three possible ways to specify positions in a buffer. All | |
11096 of these are one-based: the beginning of the buffer is position or | |
11097 index 1, and 0 is not a valid position. | |
11098 | |
11099 As a "buffer position" (typedef Charbpos): | |
11100 | |
11101 This is an index specifying an offset in characters from the | |
11102 beginning of the buffer. Note that buffer positions are | |
11103 logically @strong{between} characters, not on a character. The | |
11104 difference between two buffer positions specifies the number of | |
11105 characters between those positions. Buffer positions are the | |
11106 only kind of position externally visible to the user. | |
11107 | |
11108 As a "byte index" (typedef Bytebpos): | |
11109 | |
11110 This is an index over the bytes used to represent the characters | |
11111 in the buffer. If there is no Mule support, this is identical | |
11112 to a buffer position, because each character is represented | |
11113 using one byte. However, with Mule support, many characters | |
11114 require two or more bytes for their representation, and so a | |
11115 byte index may be greater than the corresponding buffer | |
11116 position. | |
11117 | |
11118 As a "memory index" (typedef Membpos): | |
11119 | |
11120 This is the byte index adjusted for the gap. For positions | |
11121 before the gap, this is identical to the byte index. For | |
11122 positions after the gap, this is the byte index plus the gap | |
11123 size. There are two possible memory indices for the gap | |
11124 position; the memory index at the beginning of the gap should | |
11125 always be used, except in code that deals with manipulating the | |
11126 gap, where both indices may be seen. The address of the | |
11127 character "at" (i.e. following) a particular position can be | |
11128 obtained from the formula | |
11129 | |
11130 buffer_start_address + memory_index(position) - 1 | |
11131 | |
11132 except in the case of characters at the gap position. | |
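
The relationship between byte indices and memory indices can be sketched in standalone C as follows. This is an illustration only: the structure and function names are hypothetical and do not correspond to the actual buffer structures. All indices are one-based, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* A toy gap buffer: the text occupies the storage at START, with a
   hole of GAP_SIZE bytes beginning at one-based byte index GAP_POS. */
struct sketch_gap_buffer
{
  unsigned char *start;
  size_t gap_pos;
  size_t gap_size;
};

/* Byte index -> memory index.  At the gap position itself, the index
   at the beginning of the gap is returned, per the convention above. */
size_t
sketch_byte_to_mem (const struct sketch_gap_buffer *b, size_t bytind)
{
  assert (bytind >= 1);
  return bytind > b->gap_pos ? bytind + b->gap_size : bytind;
}

/* Address of the byte "at" (i.e. following) byte index BYTIND. */
unsigned char *
sketch_byte_address (const struct sketch_gap_buffer *b, size_t bytind)
{
  size_t memind = sketch_byte_to_mem (b, bytind);
  /* The exceptional case: at the gap position, the byte following the
     position lives after the gap, so step over it. */
  if (bytind == b->gap_pos)
    memind += b->gap_size;
  return b->start + memind - 1;
}
```

With the storage laid out as "ab", a three-byte gap, then "cd", the gap position is 3: byte indices 1 and 2 map to memory indices 1 and 2, byte index 3 maps to memory index 3 (the beginning of the gap), and byte index 4 maps to memory index 7.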
11133 | |
11134 @node Other Typedefs, Usage of the Various Representations, Buffer Positions, Byte/Character Types; Buffer Positions; Other Typedefs | |
11135 @subsection Other Typedefs | |
11136 @cindex other typedefs | |
11137 | |
11138 Charcount: | |
11139 ---------- | |
11140 This typedef represents a count of characters, such as | |
11141 a character offset into a string or the number of | |
11142 characters between two positions in a buffer. The | |
11143 difference between two Charbpos's is a Charcount, and | |
11144 character positions in a string are represented using | |
11145 a Charcount. | |
11146 | |
11147 Textcount: | |
11148 ---------- | |
11149 #### Document me. | |
11150 | |
11151 Bytecount: | |
11152 ---------- | |
11153 Similar to a Charcount but represents a count of bytes. | |
11154 The difference between two Bytebpos's is a Bytecount. | |
11155 | |
11156 | |
11157 @node Usage of the Various Representations, Working With the Various Representations, Other Typedefs, Byte/Character Types; Buffer Positions; Other Typedefs | |
11158 @subsection Usage of the Various Representations | |
11159 @cindex usage of the various representations | |
11160 | |
11161 Memory indices are used in low-level functions in insdel.c and for | |
11162 extent endpoints and marker positions. The reason for this is that | |
11163 this way, the extents and markers don't need to be updated for most | |
11164 insertions, which merely shrink the gap and don't move any | |
11165 characters around in memory. | |
11166 | |
11167 (The beginning-of-gap memory index simplifies insertions w.r.t. | |
11168 markers, because text usually gets inserted after markers. For | |
11169 extents, it is merely for consistency, because text can get | |
11170 inserted either before or after an extent's endpoint depending on | |
11171 the open/closedness of the endpoint.) | |
11172 | |
11173 Byte indices are used in other code that needs to be fast, | |
11174 such as the searching, redisplay, and extent-manipulation code. | |
11175 | |
11176 Buffer positions are used in all other code. This is because this | |
11177 representation is easiest to work with (especially since Lisp | |
11178 code always uses buffer positions), necessitates the fewest | |
11179 changes to existing code, and is the safest (e.g. if the text gets | |
11180 shifted underneath a buffer position, it will still point to a | |
11181 character; if text is shifted under a byte index, it might point | |
11182 to the middle of a character, which would be bad). | |
11183 | |
11184 Similarly, Charcounts are used in all code that deals with strings | |
11185 except for code that needs to be fast, which uses Bytecounts. | |
11186 | |
11187 Strings are always passed around internally using internal format. | |
11188 Conversions to and from external format are performed at the time | |
11189 that the data goes in or out of Emacs. | |
11190 | |
11191 @node Working With the Various Representations, , Usage of the Various Representations, Byte/Character Types; Buffer Positions; Other Typedefs | |
11192 @subsection Working With the Various Representations | |
11193 @cindex working with the various representations | |
11194 | |
11195 We write things this way because it's very important that | |
11196 MAX_BYTEBPOS_GAP_SIZE_3 is a multiple of 3. (As it happens, | |
11197 65535 is a multiple of 3, but this may not always be the | |
11198 case.) #### unfinished | |
11199 | |
11200 @node Internal Text API's, Coding for Mule, Byte/Character Types; Buffer Positions; Other Typedefs, Multilingual Support | |
11201 @section Internal Text API's | |
11202 @cindex internal text API's | |
11203 @cindex text API's, internal | |
11204 @cindex API's, text, internal | |
11205 | |
11206 @strong{NOTE}: The most current documentation for these API's is in | |
11207 @file{text.h}. In case of error, assume that file is correct and this | |
11208 one wrong. | |
11209 | |
11210 @menu | |
11211 * Basic internal-format API's:: | |
11212 * The DFC API:: | |
11213 * The Eistring API:: | |
11214 @end menu | |
11215 | |
11216 @node Basic internal-format API's, The DFC API, Internal Text API's, Internal Text API's | |
11217 @subsection Basic internal-format API's | |
11218 @cindex basic internal-format API's | |
11219 @cindex internal-format API's, basic | |
11220 @cindex API's, basic internal-format | |
11221 | |
11222 These are simple functions and macros to convert between text | |
11223 representation and characters, move forward and back in text, etc. | |
11224 | |
11225 #### Finish the rest of this. | |
11226 | |
11227 Use the following functions/macros on contiguous text in any of the | |
11228 internal formats. Those that take a format arg work on all internal | |
11229 formats; the others work only on the default (variable-width under Mule) | |
11230 format. If the text you're operating on is known to come from a buffer, | |
11231 use the buffer-level functions in buffer.h, which automatically know the | |
11232 correct format and handle the gap. | |
11233 | |
11234 Some terminology: | |
11235 | |
11236 "itext" appearing in the macros means "internal-format text" -- type | |
11237 @code{Ibyte *}. Operations on such pointers themselves, rather than on the | |
11238 text being pointed to, have "itext" instead of "itext" in the macro | |
11239 name. "ichar" in the macro names means an Ichar -- the representation | |
11240 of a character as a single integer rather than a series of bytes, as part | |
11241 of "itext". Many of the macros below are for converting between the | |
11242 two representations of characters. | |
11243 | |
11244 Note also that we try to consistently distinguish between an "Ichar" and | |
11245 a Lisp character. Stuff working with Lisp characters often just says | |
11246 "char", so we consistently use "Ichar" when that's what we're working | |
11247 with. | |
11248 | |
11249 @node The DFC API, The Eistring API, Basic internal-format API's, Internal Text API's | |
11250 @subsection The DFC API | |
11251 @cindex DFC API | |
11252 @cindex API, DFC | |
11253 | |
11254 This is for conversion between internal and external text. Note that | |
11255 there is also the "new DFC" API, which @strong{returns} a pointer to the | |
11256 converted text (in alloca space), rather than storing it into a | |
11257 variable. | |
11258 | |
11259 The macros below are used for converting data between different formats. | |
11260 Generally, the data is textual, and the formats are related to | |
11261 internationalization (e.g. converting between internal-format text and | |
11262 UTF-8) -- but the mechanism is general, and could be used for anything, | |
11263 e.g. decoding gzipped data. | |
11264 | |
11265 In general, conversion involves a source of data, a sink, the existing | |
11266 format of the source data, and the desired format of the sink. The | |
11267 macros below, however, always require that either the source or sink is | |
11268 internal-format text. Therefore, in practice the conversions below | |
11269 involve source, sink, an external format (specified by a coding system), | |
11270 and the direction of conversion (internal->external or vice-versa). | |
11271 | |
11272 Sources and sinks can be raw data (sized or unsized -- when unsized, | |
11273 input data is assumed to be null-terminated [double null-terminated for | |
11274 Unicode-format data], and on output the length is not stored anywhere), | |
11275 Lisp strings, Lisp buffers, lstreams, and opaque data objects. When the | |
11276 output is raw data, the result can be allocated either with @code{alloca()} or | |
11277 @code{malloc()}. (There is currently no provision for writing into a fixed | |
11278 buffer. If you want this, use @code{alloca()} output and then copy the data -- | |
11279 but be careful with the size! Unless you are very sure of the encoding | |
11280 being used, upper bounds for the size are not in general computable.) | |
11281 The obvious restrictions on source and sink types apply (e.g. Lisp | |
11282 strings are a source and sink only for internal data). | |
11283 | |
11284 All raw data outputted will contain an extra null byte (two bytes for | |
11285 Unicode -- currently, in fact, all output data, whether internal or | |
11286 external, is double-null-terminated, but you can't count on this; see | |
11287 below). This means that enough space is allocated to contain the extra | |
11288 nulls; however, these nulls are not reflected in the returned output | |
11289 size. | |
11290 | |
11291 The most basic macros are TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT. | |
11292 These can be used to convert between any kinds of sources or sinks. | |
11293 However, 99% of conversions involve raw data or Lisp strings as both | |
11294 source and sink, and usually data is output as @code{alloca()} rather than | |
11295 @code{malloc()}. For this reason, convenience macros are defined for many types | |
11296 of conversions involving raw data and/or Lisp strings, especially when | |
11297 the output is an @code{alloca()}ed string. (When the destination is a | |
11298 Lisp_String, there are other functions that should be used instead -- | |
11299 @code{build_ext_string()} and @code{make_ext_string()}, for example.) The convenience | |
11300 macros are of two types -- the older kind that store the result into a | |
11301 specified variable, and the newer kind that return the result. The newer | |
11302 kind of macros don't exist when the output is sized data, because that | |
11303 would have two return values. NOTE: All convenience macros are | |
11304 ultimately defined in terms of TO_EXTERNAL_FORMAT and TO_INTERNAL_FORMAT. | |
11305 Thus, any comments below about the workings of these macros also apply to | |
11306 all convenience macros. | |
11307 | |
11308 @example | |
11309 TO_EXTERNAL_FORMAT (source_type, source, sink_type, sink, codesys) | |
11310 TO_INTERNAL_FORMAT (source_type, source, sink_type, sink, codesys) | |
11311 @end example | |
11312 | |
11313 Typical use is | |
11314 | |
11315 @example | |
11316 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name); | |
11317 @end example | |
11318 | |
11319 which means that the contents of the lisp string @var{str} are written | |
11320 to a malloc'ed memory area which will be pointed to by @var{ptr}, after the | |
11321 function returns. The conversion will be done using the @code{file-name} | |
11322 coding system (which will be controlled by the user indirectly by | |
11323 setting or binding the variable @code{file-name-coding-system}). | |
11324 | |
11325 Some sources and sinks require two C variables to specify. We use | |
11326 some preprocessor magic to allow different source and sink types, and | |
11327 even different numbers of arguments to specify different types of | |
11328 sources and sinks. | |
11329 | |
11330 So we can have a call that looks like | |
11331 | |
11332 @example | |
11333 TO_INTERNAL_FORMAT (DATA, (ptr, len), | |
11334 MALLOC, (ptr, len), | |
11335 coding_system); | |
11336 @end example | |
11337 | |
11338 The parenthesized argument pairs are required to make the | |
11339 preprocessor magic work. | |
11340 | |
11341 NOTE: GC is inhibited during the entire operation of these macros. This | |
11342 is because frequently the data to be converted comes from strings but | |
11343 gets passed in as just DATA, and GC may move around the string data. If | |
11344 we didn't inhibit GC, there'd have to be a lot of messy recoding, | |
11345 alloca-copying of strings and other annoying stuff. | |
11346 | |
11347 The source or sink can be specified in one of these ways: | |
11348 | |
11349 @example | |
11350 DATA, (ptr, len), // input data is a fixed buffer of size len | |
11351 ALLOCA, (ptr, len), // output data is in a @code{ALLOCA()}ed buffer of size len | |
11352 MALLOC, (ptr, len), // output data is in a @code{malloc()}ed buffer of size len | |
11353 C_STRING_ALLOCA, ptr, // equivalent to ALLOCA (ptr, len_ignored) on output | |
11354 C_STRING_MALLOC, ptr, // equivalent to MALLOC (ptr, len_ignored) on output | |
11355 C_STRING, ptr, // equivalent to DATA, (ptr, strlen/wcslen (ptr)) | |
11356 // on input (the Unicode version is used when correct) | |
11357 LISP_STRING, string, // input or output is a Lisp_Object of type string | |
11358 LISP_BUFFER, buffer, // output is written to (point) in lisp buffer | |
11359 LISP_LSTREAM, lstream, // input or output is a Lisp_Object of type lstream | |
11360 LISP_OPAQUE, object, // input or output is a Lisp_Object of type opaque | |
11361 @end example | |
11362 | |
11363 When specifying the sink, use lvalues, since the macro will assign to them, | |
11364 except when the sink is an lstream or a lisp buffer. | |
11365 | |
11366 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the resulting text is | |
11367 stored in a stack-allocated buffer, which is automatically freed on | |
11368 returning from the function. However, the sink types @code{MALLOC} and | |
11369 @code{C_STRING_MALLOC} return @code{xmalloc()}ed memory. The caller is responsible | |
11370 for freeing this memory using @code{xfree()}. | |
11371 | |
11372 The macros accept the kinds of sources and sinks appropriate for | |
11373 internal and external data representation. See the type_checking_assert | |
11374 macros below for the actual allowed types. | |
11375 | |
11376 Since some sources and sinks use one argument (a Lisp_Object) to | |
11377 specify them, while others take a (pointer, length) pair, we use | |
11378 some C preprocessor trickery to allow pair arguments to be specified | |
11379 by parenthesizing them, as in the examples above. | |
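
The general idea behind this kind of preprocessor trickery can be shown in a small standalone C sketch. This is not the actual implementation (see @file{text.h} for that); every macro name beginning with SKETCH_ is hypothetical. The type keyword is pasted into a macro name, and a parenthesized pair is unpacked by placing a function-like macro name directly in front of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unpack a (ptr, len) pair: writing SKETCH_PAIR_PTR in front of the
   parenthesized pair turns it into an ordinary two-argument call. */
#define SKETCH_PAIR_PTR(ptr, len) (ptr)
#define SKETCH_PAIR_LEN(ptr, len) (len)

/* Per-type accessors.  DATA takes a parenthesized pair; C_STRING
   takes a single pointer and computes the length itself. */
#define SKETCH_SRC_DATA_PTR(arg) SKETCH_PAIR_PTR arg
#define SKETCH_SRC_DATA_LEN(arg) SKETCH_PAIR_LEN arg
#define SKETCH_SRC_C_STRING_PTR(arg) (arg)
#define SKETCH_SRC_C_STRING_LEN(arg) strlen (arg)

/* Dispatch on the type keyword by token pasting. */
#define SKETCH_SOURCE_POINTER(type, arg) SKETCH_SRC_##type##_PTR (arg)
#define SKETCH_SOURCE_LENGTH(type, arg) SKETCH_SRC_##type##_LEN (arg)
```

Here SKETCH_SOURCE_LENGTH (DATA, (buf, 3)) expands to (3), while SKETCH_SOURCE_LENGTH (C_STRING, buf) expands to strlen (buf). Without the parentheses around the pair, the comma would split it into two separate macro arguments, which is why the parenthesization is required.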
11380 | |
11381 Anything prefixed by dfc_ (`data format conversion') is private. | |
11382 They are only used to implement these macros. | |
11383 | |
11384 [[Using C_STRING* is appropriate for using with external APIs that | |
11385 take null-terminated strings. For internal data, we should try to | |
11386 be '\0'-clean - i.e. allow arbitrary data to contain embedded '\0'. | |
11387 | |
11388 Sometime in the future we might allow output to C_STRING_ALLOCA or | |
11389 C_STRING_MALLOC _only_ with @code{TO_EXTERNAL_FORMAT()}, not | |
11390 @code{TO_INTERNAL_FORMAT()}.]] | |
11391 | |
11392 The above comments are not true. Frequently (most of the time, in | |
11393 fact), external strings come as zero-terminated entities, where the | |
11394 zero-termination is the only way to find out the length. Even in | |
11395 cases where you can get the length, most of the time the system will | |
11396 still use the null to signal the end of the string, and there will | |
11397 still be no way to either send in or receive a string with embedded | |
11398 nulls. In such situations, it's pointless to track the length | |
11399 because null bytes can never be in the string. We have a lot of | |
11400 operations that make it easy to operate on zero-terminated strings, | |
11401 and forcing the user to deal with the length everywhere would only | |
11402 make the code uglier and more complicated, for no gain. --ben | |
11403 | |
11404 There is no problem using the same lvalue for source and sink. | |
11405 | |
11406 Also, when pointers are required, the code (currently at least) is | |
11407 lax and allows any pointer types, either in the source or the sink. | |
11408 This makes it possible, e.g., to deal with internal format data held | |
11409 in char *'s or external format data held in WCHAR * (i.e. Unicode). | |
11410 | |
11411 Finally, whenever storage allocation is called for, extra space is | |
11412 allocated for a terminating zero, and such a zero is stored in the | |
11413 appropriate place, regardless of whether the source data was | |
11414 specified using a length or was specified as zero-terminated. This | |
11415 allows you to freely pass the resulting data, no matter how | |
11416 obtained, to a routine that expects zero termination (modulo, of | |
11417 course, that any embedded zeros in the resulting text will cause | |
11418 truncation). In fact, currently two terminating zeros are allocated | |
11419 and stored after the data result. This is to allow for the | |
11420 possibility of storing a Unicode value on output, which needs the | |
11421 two zeros. Currently, however, the two zeros are stored regardless | |
11422 of whether the conversion is internal or external and regardless of | |
11423 whether the external coding system is in fact Unicode. This | |
11424 behavior may change in the future, and you cannot rely on this -- | |
11425 the most you can rely on is that sink data in Unicode format will | |
11426 have two terminating nulls, which combine to form one Unicode null | |
11427 character. | |
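
The termination convention just described can be illustrated with a standalone C sketch. The function name is hypothetical and the real allocation is done inside the DFC macros. The point is only that the two zero bytes are allocated and stored but are not counted in the data length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy LEN bytes of DATA into freshly malloc()ed storage, followed by
   two zero bytes that are NOT counted in LEN.  Returns NULL on
   allocation failure.  The caller frees the result with free(). */
unsigned char *
sketch_copy_with_terminator (const void *data, size_t len)
{
  unsigned char *out = malloc (len + 2);
  if (!out)
    return NULL;
  if (len > 0)
    memcpy (out, data, len);
  out[len] = 0;       /* terminator for 1-byte-unit data */
  out[len + 1] = 0;   /* second zero, so 16-bit data is terminated too */
  return out;
}
```

If the five bytes "ab\0cd" are copied this way, the result still holds all five bytes plus the two terminators, but a routine such as @code{strlen()} that relies on zero termination sees only "ab". This is the truncation caused by embedded zeros mentioned above.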
11428 | |
11429 NOTE: You might ask, why are these not written as functions that | |
11430 @strong{RETURN} the converted string, since that would allow them to be used | |
11431 much more conveniently, without having to constantly declare temporary | |
11432 variables? The answer is that in fact I originally did write the | |
11433 routines that way, but that required either | |
11434 | |
11435 @itemize @bullet | |
11436 @item | |
11437 (a) calling @code{alloca()} inside of a function call, or | |
11438 @item | |
11439 (b) using expressions separated by commas and a global temporary variable, or | |
11440 @item | |
11441 (c) using the GCC extension (@{ ... @}). | |
11442 @end itemize | |
11443 | |
11444 Turned out that all of the above had bugs, all caused by GCC (hence the | |
11445 comments about "those GCC wankers" and "ream gcc up the ass"). As for | |
11446 (a), some versions of GCC (especially on Intel platforms) had | |
11447 buggy implementations of @code{alloca()} that couldn't handle being called | |
11448 inside of a function call -- they just decremented the stack right in the | |
11449 middle of pushing args. Oops, crash with stack trashing, very bad. (b) | |
11450 was an attempt to fix (a), and that led to further GCC crashes, esp. when | |
11451 you had two such calls in a single subexpression, because GCC couldn't be | |
11452 counted upon to follow even a minimally reasonable order of execution. | |
11453 True, you can't count on one argument being evaluated before another, but | |
11454 GCC would actually interleave them so that the temp var got stomped on by | |
11455 one while the other was accessing it. So I tried (c), which was | |
11456 problematic because that GCC extension has more bugs in it than a | |
11457 termite's nest. | |
11458 | |
11459 So reluctantly I converted to the current way. Now, that was awhile ago | |
11460 (c. 1994), and it appears that the bug involving alloca in function calls | |
11461 has long since been fixed. More recently, I defined the new-dfc routines | |
11462 down below, which DO allow exactly such convenience of returning your | |
11463 args rather than storing them in temp variables, and I also wrote a | |
11464 configure check to see whether @code{alloca()} causes crashes inside of function | |
11465 calls, and if so use the portable @code{alloca()} implementation in alloca.c. | |
11466 If you define TEST_NEW_DFC, the old routines get written in terms of the | |
11467 new ones, and I've had a beta put out with this on, and | |
11468 this appears to cause no problems -- so we should consider | |
11469 switching, and feel no compunctions about writing further such function- | |
11470 like @code{alloca()} routines in lieu of statement-like ones. --ben | |
11471 | |
11472 @node The Eistring API, , The DFC API, Internal Text API's | |
11473 @subsection The Eistring API | |
11474 @cindex Eistring API | |
11475 @cindex API, Eistring | |
11476 | |
11477 (This API is currently under-used.) When doing simple things with | |
11478 internal text, the basic internal-format API's are enough. But to do | |
11479 things like delete or replace a substring, concatenate various strings, | |
11480 etc. is difficult to do cleanly because of the allocation issues. | |
11481 The Eistring API is designed to deal with this, and provides a clean | |
11482 way of modifying and building up internal text. (Note that the former | |
11483 lack of this API has meant that some code uses Lisp strings to do | |
11484 similar manipulations, resulting in excess garbage and increased | |
11485 garbage collection.) | |
11486 | |
11487 NOTE: The Eistring API is (or should be) Mule-correct even without | |
11488 an ASCII-compatible internal representation. | |
11489 | |
11490 @example | |
11491 #### NOTE: This is a work in progress. Neither the API nor especially | |
11492 the implementation is finished. | |
11493 | |
11494 NOTE: An Eistring is a structure that makes it easy to work with | |
11495 internally-formatted strings of data. It provides operations similar | |
11496 in feel to the standard @code{strcpy()}, @code{strcat()}, @code{strlen()}, etc., but | |
11497 | |
11498 (a) it is Mule-correct | |
11499 (b) it does dynamic allocation so you never have to worry about size | |
11500 restrictions | |
11501 (c) it comes in an @code{ALLOCA()} variety (all allocation is stack-local, | |
11502 so there is no need to explicitly clean up) as well as a @code{malloc()} | |
11503 variety | |
11504 (d) it knows its own length, so it does not suffer from standard null | |
11505 byte brain-damage -- but it null-terminates the data anyway, so | |
11506 it can be passed to standard routines | |
11507 (e) it provides a much more powerful set of operations and knows about | |
11508 all the standard places where string data might reside: Lisp_Objects, | |
11509 other Eistrings, Ibyte * data with or without an explicit length, | |
11510 ASCII strings, Ichars, etc. | |
11511 (f) it provides easy operations to convert to/from externally-formatted | |
11512 data, and is easier to use than the standard TO_INTERNAL_FORMAT | |
11513 and TO_EXTERNAL_FORMAT macros. (An Eistring can store both the internal | |
11514 and external version of its data, but the external version is only | |
11515 initialized or changed when you call @code{eito_external()}.) | |
11516 | |
11517 The idea is to make it as easy to write Mule-correct string manipulation | |
11518 code as it is to write normal string manipulation code. We also make | |
11519 the API sufficiently general that it can handle multiple internal data | |
11520 formats (e.g. some fixed-width optimizing formats and a default variable | |
11521 width format) and allows for @strong{ANY} data format we might choose in the | |
11522 future for the default format, including UCS2. (In other words, we can't | |
11523 assume that the internal format is ASCII-compatible and we can't assume | |
11524 it doesn't have embedded null bytes. We do assume, however, that any | |
11525 chosen format will have the concept of null-termination.) All of this is | |
11526 hidden from the user. | |
11527 | |
11528 #### It is really too bad that we don't have a real object-oriented | |
11529 language, or at least a language with polymorphism! | |
11530 | |
11531 | |
11532 ********************************************** | |
11533 * Declaration * | |
11534 ********************************************** | |
11535 | |
11536 To declare an Eistring, either put one of the following in the local | |
11537 variable section: | |
11538 | |
11539 DECLARE_EISTRING (name); | |
11540 Declare a new Eistring and initialize it to the empty string. This | |
11541 is a standard local variable declaration and can go anywhere in the | |
11542 variable declaration section. NAME itself is declared as an | |
11543 Eistring *, and its storage declared on the stack. | |
11544 | |
11545 DECLARE_EISTRING_MALLOC (name); | |
11546 Declare and initialize a new Eistring, which uses @code{malloc()}ed | |
11547 instead of @code{ALLOCA()}ed data. This is a standard local variable | |
11548 declaration and can go anywhere in the variable declaration | |
11549 section. Once you initialize the Eistring, you will have to free | |
11550 it using @code{eifree()} to avoid memory leaks. You will need to use this | |
11551 form if you are passing an Eistring to any function that modifies | |
11552 it (otherwise, the modified data may be in stack space and get | |
11553 overwritten when the function returns). | |
11554 | |
11555 or use | |
11556 | |
11557 Eistring ei; | |
11558 void eiinit (Eistring *ei); | |
11559 void eiinit_malloc (Eistring *einame); | |
11560 If you need to put an Eistring elsewhere than in a local variable | |
11561 declaration (e.g. in a structure), declare it as shown and then | |
11562 call one of the init macros. | |
11563 | |
11564 Also note: | |
11565 | |
11566 void eifree (Eistring *ei); | |
11567 If you declared an Eistring to use @code{malloc()} to hold its data, | |
11568 or converted it to the heap using @code{eito_malloc()}, then this | |
11569 releases any data in it and afterwards resets the Eistring | |
11570 using @code{eiinit_malloc()}. Otherwise, it just resets the Eistring | |
11571 using @code{eiinit()}. | |
11572 | |
11573 | |
11574 ********************************************** | |
11575 * Conventions * | |
11576 ********************************************** | |
11577 | |
11578 - The names of the functions have been chosen, where possible, to | |
11579 match the names of @code{str*()} functions in the standard C API. | |
11580 - | |
11581 | |
11582 | |
11583 ********************************************** | |
11584 * Initialization * | |
11585 ********************************************** | |
11586 | |
11587 void eireset (Eistring *eistr); | |
11588 Initialize the Eistring to the empty string. | |
11589 | |
11590 void eicpy_* (Eistring *eistr, ...); | |
11591 Initialize the Eistring from somewhere: | |
11592 | |
11593 void eicpy_ei (Eistring *eistr, Eistring *eistr2); | |
11594 ... from another Eistring. | |
11595 void eicpy_lstr (Eistring *eistr, Lisp_Object lisp_string); | |
11596 ... from a Lisp_Object string. | |
11597 void eicpy_ch (Eistring *eistr, Ichar ch); | |
11598 ... from an Ichar (this can be a conventional C character). | |
11599 | |
11600 void eicpy_lstr_off (Eistring *eistr, Lisp_Object lisp_string, | |
11601 Bytecount off, Charcount charoff, | |
11602 Bytecount len, Charcount charlen); | |
11603 ... from a section of a Lisp_Object string. | |
11604 void eicpy_lbuf (Eistring *eistr, Lisp_Object lisp_buf, | |
11605 Bytecount off, Charcount charoff, | |
11606 Bytecount len, Charcount charlen); | |
11607 ... from a section of a Lisp_Object buffer. | |
11608 void eicpy_raw (Eistring *eistr, const Ibyte *data, Bytecount len); | |
11609 ... from raw internal-format data in the default internal format. | |
11610 void eicpy_rawz (Eistring *eistr, const Ibyte *data); | |
11611 ... from raw internal-format data in the default internal format | |
11612 that is "null-terminated" (the meaning of this depends on the nature | |
11613 of the default internal format). | |
11614 void eicpy_raw_fmt (Eistring *eistr, const Ibyte *data, Bytecount len, | |
11615 Internal_Format intfmt, Lisp_Object object); | |
11616 ... from raw internal-format data in the specified format. | |
11617 void eicpy_rawz_fmt (Eistring *eistr, const Ibyte *data, | |
11618 Internal_Format intfmt, Lisp_Object object); | |
11619 ... from raw internal-format data in the specified format that is | |
11620 "null-terminated" (the meaning of this depends on the nature of | |
11621 the specific format). | |
11622 void eicpy_c (Eistring *eistr, const Ascbyte *c_string); | |
11623 ... from an ASCII null-terminated string. Non-ASCII characters in | |
11624 the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined). | |
11625 void eicpy_c_len (Eistring *eistr, const Ascbyte *c_string, Bytecount len); | |
11626 ... from an ASCII string, with length specified. Non-ASCII characters | |
11627 in the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined). | |
11628 void eicpy_ext (Eistring *eistr, const Extbyte *extdata, | |
11629 Lisp_Object codesys); | |
11630 ... from external null-terminated data, with coding system specified. | |
11631 void eicpy_ext_len (Eistring *eistr, const Extbyte *extdata, | |
11632 Bytecount extlen, Lisp_Object codesys); | |
11633 ... from external data, with length and coding system specified. | |
11634 void eicpy_lstream (Eistring *eistr, Lisp_Object lstream); | |
11635 ... from an lstream; reads data till eof. Data must be in default | |
11636 internal format; otherwise, interpose a decoding lstream. | |
11637 | |
11638 | |
11639 ********************************************** | |
11640 * Getting the data out of the Eistring * | |
11641 ********************************************** | |
11642 | |
11643 Ibyte *eidata (Eistring *eistr); | |
11644 Return a pointer to the raw data in an Eistring. This is NOT | |
11645 a copy. | |
11646 | |
11647 Lisp_Object eimake_string (Eistring *eistr); | |
11648 Make a Lisp string out of the Eistring. | |
11649 | |
11650 Lisp_Object eimake_string_off (Eistring *eistr, | |
11651 Bytecount off, Charcount charoff, | |
11652 Bytecount len, Charcount charlen); | |
11653 Make a Lisp string out of a section of the Eistring. | |
11654 | |
11655 void eicpyout_alloca (Eistring *eistr, LVALUE: Ibyte *ptr_out, | |
11656 LVALUE: Bytecount len_out); | |
11657 Make an @code{ALLOCA()} copy of the data in the Eistring, using the | |
11658 default internal format. Due to the nature of @code{ALLOCA()}, this | |
11659 must be a macro, with all lvalues passed in as parameters. | |
11660 (More specifically, not all compilers correctly handle using | |
11661 @code{ALLOCA()} as the argument to a function call -- GCC on x86 | |
11662 didn't used to, for example.) A pointer to the @code{ALLOCA()}ed data | |
11663 is stored in PTR_OUT, and the length of the data (not including | |
11664 the terminating zero) is stored in LEN_OUT. | |
11665 | |
11666 void eicpyout_alloca_fmt (Eistring *eistr, LVALUE: Ibyte *ptr_out, | |
11667 LVALUE: Bytecount len_out, | |
11668 Internal_Format intfmt, Lisp_Object object); | |
11669 Like @code{eicpyout_alloca()}, but converts to the specified internal | |
11670 format. (No formats other than FORMAT_DEFAULT are currently | |
11671 implemented, and you get an assertion failure if you try.) | |
11672 | |
11673 Ibyte *eicpyout_malloc (Eistring *eistr, Bytecount *intlen_out); | |
11674 Make a @code{malloc()} copy of the data in the Eistring, using the | |
11675 default internal format. This is a real function. No lvalues | |
11676 passed in. Returns the new data, and stores the length (not | |
11677 including the terminating zero) using INTLEN_OUT, unless it's | |
11678 a NULL pointer. | |
11679 | |
11680 Ibyte *eicpyout_malloc_fmt (Eistring *eistr, Internal_Format intfmt, | |
11681 Bytecount *intlen_out, Lisp_Object object); | |
11682 Like @code{eicpyout_malloc()}, but converts to the specified internal | |
11683 format. (No formats other than FORMAT_DEFAULT are currently | |
11684 implemented, and you get an assertion failure if you try.) | |
11685 | |
11686 | |
11687 ********************************************** | |
11688 * Moving to the heap * | |
11689 ********************************************** | |
11690 | |
11691 void eito_malloc (Eistring *eistr); | |
11692 Move this Eistring to the heap. Its data will be stored in a | |
11693 @code{malloc()}ed block rather than the stack. Subsequent changes to | |
11694 this Eistring will @code{realloc()} the block as necessary. Use this | |
11695 when you want the Eistring to remain in scope past the end of | |
11696 this function call. You will have to manually free the data | |
11697 in the Eistring using @code{eifree()}. | |
11698 | |
11699 void eito_alloca (Eistring *eistr); | |
11700 Move this Eistring back to the stack, if it was moved to the | |
11701 heap with @code{eito_malloc()}. This will automatically free any | |
11702 heap-allocated data. | |
11703 | |
11704 | |
11705 | |
11706 ********************************************** | |
11707 * Retrieving the length * | |
11708 ********************************************** | |
11709 | |
11710 Bytecount eilen (Eistring *eistr); | |
11711 Return the length of the internal data, in bytes. See also | |
11712 @code{eiextlen()}, below. | |
11713 Charcount eicharlen (Eistring *eistr); | |
11714 Return the length of the internal data, in characters. | |
11715 | |
11716 | |
11717 ********************************************** | |
11718 * Working with positions * | |
11719 ********************************************** | |
11720 | |
11721 Bytecount eicharpos_to_bytepos (Eistring *eistr, Charcount charpos); | |
11722 Convert a char offset to a byte offset. | |
11723 Charcount eibytepos_to_charpos (Eistring *eistr, Bytecount bytepos); | |
11724 Convert a byte offset to a char offset. | |
11725 Bytecount eiincpos (Eistring *eistr, Bytecount bytepos); | |
11726 Increment the given position by one character. | |
11727 Bytecount eiincpos_n (Eistring *eistr, Bytecount bytepos, Charcount n); | |
11728 Increment the given position by N characters. | |
11729 Bytecount eidecpos (Eistring *eistr, Bytecount bytepos); | |
11730 Decrement the given position by one character. | |
11731 Bytecount eidecpos_n (Eistring *eistr, Bytecount bytepos, Charcount n); | |
11732 Decrement the given position by N characters. | |
11733 | |
11734 | |
11735 ********************************************** | |
11736 * Getting the character at a position * | |
11737 ********************************************** | |
11738 | |
11739 Ichar eigetch (Eistring *eistr, Bytecount bytepos); | |
11740 Return the character at a particular byte offset. | |
11741 Ichar eigetch_char (Eistring *eistr, Charcount charpos); | |
11742 Return the character at a particular character offset. | |
11743 | |
11744 | |
11745 ********************************************** | |
11746 * Setting the character at a position * | |
11747 ********************************************** | |
11748 | |
11749 Ichar eisetch (Eistring *eistr, Bytecount bytepos, Ichar chr); | |
11750 Set the character at a particular byte offset. | |
11751 Ichar eisetch_char (Eistring *eistr, Charcount charpos, Ichar chr); | |
11752 Set the character at a particular character offset. | |
11753 | |
11754 | |
11755 ********************************************** | |
11756 * Concatenation * | |
11757 ********************************************** | |
11758 | |
11759 void eicat_* (Eistring *eistr, ...); | |
11760 Concatenate onto the end of the Eistring, with data coming from the | |
11761 same places as above: | |
11762 | |
11763 void eicat_ei (Eistring *eistr, Eistring *eistr2); | |
11764 ... from another Eistring. | |
11765 void eicat_c (Eistring *eistr, Ascbyte *c_string); | |
11766 ... from an ASCII null-terminated string. Non-ASCII characters in | |
11767 the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined). | |
11768 void eicat_raw (Eistring *eistr, const Ibyte *data, Bytecount len); | |
11769 ... from raw internal-format data in the default internal format. | |
11770 void eicat_rawz (Eistring *eistr, const Ibyte *data); | |
11771 ... from raw internal-format data in the default internal format | |
11772 that is "null-terminated" (the meaning of this depends on the nature | |
11773 of the default internal format). | |
11774 void eicat_lstr (Eistring *eistr, Lisp_Object lisp_string); | |
11775 ... from a Lisp_Object string. | |
11776 void eicat_ch (Eistring *eistr, Ichar ch); | |
11777 ... from an Ichar. | |
11778 | |
11779 (All except the first variety are convenience functions. | |
11780 In the general case, create another Eistring from the source.) | |
11781 | |
11782 | |
11783 ********************************************** | |
11784 * Replacement * | |
11785 ********************************************** | |
11786 | |
11787 void eisub_* (Eistring *eistr, Bytecount off, Charcount charoff, | |
11788 Bytecount len, Charcount charlen, ...); | |
11789 Replace a section of the Eistring, specifically: | |
11790 | |
11791 void eisub_ei (Eistring *eistr, Bytecount off, Charcount charoff, | |
11792 Bytecount len, Charcount charlen, Eistring *eistr2); | |
11793 ... with another Eistring. | |
11794 void eisub_c (Eistring *eistr, Bytecount off, Charcount charoff, | |
11795 Bytecount len, Charcount charlen, Ascbyte *c_string); | |
11796 ... with an ASCII null-terminated string. Non-ASCII characters in | |
11797 the string are @strong{ILLEGAL} (read @code{abort()} with error-checking defined). | |
11798 void eisub_ch (Eistring *eistr, Bytecount off, Charcount charoff, | |
11799 Bytecount len, Charcount charlen, Ichar ch); | |
11800 ... with an Ichar. | |
11801 | |
11802 void eidel (Eistring *eistr, Bytecount off, Charcount charoff, | |
11803 Bytecount len, Charcount charlen); | |
11804 Delete a section of the Eistring. | |
11805 | |
11806 | |
11807 ********************************************** | |
11808 * Converting to an external format * | |
11809 ********************************************** | |
11810 | |
11811 void eito_external (Eistring *eistr, Lisp_Object codesys); | |
11812 Convert the Eistring to an external format and store the result | |
11813 in the string. NOTE: Further changes to the Eistring will @strong{NOT} | |
11814 change the external data stored in the string. You will have to | |
11815 call @code{eito_external()} again in such a case if you want the external | |
11816 data. | |
11817 | |
11818 Extbyte *eiextdata (Eistring *eistr); | |
11819 Return a pointer to the external data stored in the Eistring as | |
11820 a result of a prior call to @code{eito_external()}. | |
11821 | |
11822 Bytecount eiextlen (Eistring *eistr); | |
11823 Return the length in bytes of the external data stored in the | |
11824 Eistring as a result of a prior call to @code{eito_external()}. | |
11825 | |
11826 | |
11827 ********************************************** | |
11828 * Searching in the Eistring for a character * | |
11829 ********************************************** | |
11830 | |
11831 Bytecount eichr (Eistring *eistr, Ichar chr); | |
11832 Charcount eichr_char (Eistring *eistr, Ichar chr); | |
11833 Bytecount eichr_off (Eistring *eistr, Ichar chr, Bytecount off, | |
11834 Charcount charoff); | |
11835 Charcount eichr_off_char (Eistring *eistr, Ichar chr, Bytecount off, | |
11836 Charcount charoff); | |
11837 Bytecount eirchr (Eistring *eistr, Ichar chr); | |
11838 Charcount eirchr_char (Eistring *eistr, Ichar chr); | |
11839 Bytecount eirchr_off (Eistring *eistr, Ichar chr, Bytecount off, | |
11840 Charcount charoff); | |
11841 Charcount eirchr_off_char (Eistring *eistr, Ichar chr, Bytecount off, | |
11842 Charcount charoff); | |
11843 | |
11844 | |
11845 ********************************************** | |
11846 * Searching in the Eistring for a string * | |
11847 ********************************************** | |
11848 | |
11849 Bytecount eistr_ei (Eistring *eistr, Eistring *eistr2); | |
11850 Charcount eistr_ei_char (Eistring *eistr, Eistring *eistr2); | |
11851 Bytecount eistr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off, | |
11852 Charcount charoff); | |
11853 Charcount eistr_ei_off_char (Eistring *eistr, Eistring *eistr2, | |
11854 Bytecount off, Charcount charoff); | |
11855 Bytecount eirstr_ei (Eistring *eistr, Eistring *eistr2); | |
11856 Charcount eirstr_ei_char (Eistring *eistr, Eistring *eistr2); | |
11857 Bytecount eirstr_ei_off (Eistring *eistr, Eistring *eistr2, Bytecount off, | |
11858 Charcount charoff); | |
11859 Charcount eirstr_ei_off_char (Eistring *eistr, Eistring *eistr2, | |
11860 Bytecount off, Charcount charoff); | |
11861 | |
11862 Bytecount eistr_c (Eistring *eistr, Ascbyte *c_string); | |
11863 Charcount eistr_c_char (Eistring *eistr, Ascbyte *c_string); | |
11864 Bytecount eistr_c_off (Eistring *eistr, Ascbyte *c_string, Bytecount off, | |
11865 Charcount charoff); | |
11866 Charcount eistr_c_off_char (Eistring *eistr, Ascbyte *c_string, | |
11867 Bytecount off, Charcount charoff); | |
11868 Bytecount eirstr_c (Eistring *eistr, Ascbyte *c_string); | |
11869 Charcount eirstr_c_char (Eistring *eistr, Ascbyte *c_string); | |
11870 Bytecount eirstr_c_off (Eistring *eistr, Ascbyte *c_string, | |
11871 Bytecount off, Charcount charoff); | |
11872 Charcount eirstr_c_off_char (Eistring *eistr, Ascbyte *c_string, | |
11873 Bytecount off, Charcount charoff); | |
11874 | |
11875 | |
11876 ********************************************** | |
11877 * Comparison * | |
11878 ********************************************** | |
11879 | |
11880 int eicmp_* (Eistring *eistr, ...); | |
11881 int eicmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff, | |
11882 Bytecount len, Charcount charlen, ...); | |
11883 int eicasecmp_* (Eistring *eistr, ...); | |
11884 int eicasecmp_off_* (Eistring *eistr, Bytecount off, Charcount charoff, | |
11885 Bytecount len, Charcount charlen, ...); | |
11886 int eicasecmp_i18n_* (Eistring *eistr, ...); | |
11887 int eicasecmp_i18n_off_* (Eistring *eistr, Bytecount off, Charcount charoff, | |
11888 Bytecount len, Charcount charlen, ...); | |
11889 | |
11890 Compare the Eistring with the other data. Return value same as | |
11891 from strcmp. The `*' is either `ei' for another Eistring (in | |
11892 which case `...' is an Eistring), or `c' for a pure-ASCII string | |
11893 (in which case `...' is a pointer to that string). For anything | |
11894 more complex, first create an Eistring out of the source. | |
11895 Comparison is either simple (`eicmp_...'), ASCII case-folding | |
11896 (`eicasecmp_...'), or multilingual case-folding | |
11897 (`eicasecmp_i18n_...'). | |
11898 | |
11899 | |
11900 More specifically, the prototypes are: | |
11901 | |
11902 int eicmp_ei (Eistring *eistr, Eistring *eistr2); | |
11903 int eicmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff, | |
11904 Bytecount len, Charcount charlen, Eistring *eistr2); | |
11905 int eicasecmp_ei (Eistring *eistr, Eistring *eistr2); | |
11906 int eicasecmp_off_ei (Eistring *eistr, Bytecount off, Charcount charoff, | |
11907 Bytecount len, Charcount charlen, Eistring *eistr2); | |
11908 int eicasecmp_i18n_ei (Eistring *eistr, Eistring *eistr2); | |
11909 int eicasecmp_i18n_off_ei (Eistring *eistr, Bytecount off, | |
11910 Charcount charoff, Bytecount len, | |
11911 Charcount charlen, Eistring *eistr2); | |
11912 | |
11913 int eicmp_c (Eistring *eistr, Ascbyte *c_string); | |
11914 int eicmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff, | |
11915 Bytecount len, Charcount charlen, Ascbyte *c_string); | |
11916 int eicasecmp_c (Eistring *eistr, Ascbyte *c_string); | |
11917 int eicasecmp_off_c (Eistring *eistr, Bytecount off, Charcount charoff, | |
11918 Bytecount len, Charcount charlen, | |
11919 Ascbyte *c_string); | |
11920 int eicasecmp_i18n_c (Eistring *eistr, Ascbyte *c_string); | |
11921 int eicasecmp_i18n_off_c (Eistring *eistr, Bytecount off, Charcount charoff, | |
11922 Bytecount len, Charcount charlen, | |
11923 Ascbyte *c_string); | |
11924 | |
11925 | |
11926 ********************************************** | |
11927 * Case-changing the Eistring * | |
11928 ********************************************** | |
11929 | |
11930 void eilwr (Eistring *eistr); | |
11931 Convert all characters in the Eistring to lowercase. | |
11932 void eiupr (Eistring *eistr); | |
11933 Convert all characters in the Eistring to uppercase. | |
11934 @end example | |
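
Properties (b) and (d) listed above can be illustrated with a small stand-alone sketch. This is not the Eistring implementation: the type @code{sketch_str} and the functions @code{sketch_cat()}, @code{sketch_del()} and @code{sketch_free()} are hypothetical names invented for this illustration, the sketch works on plain bytes only, and it always uses @code{malloc()}. It shows only the idea of a string that records its own length, grows on demand, and still keeps a terminating null byte so that the data can be passed to standard routines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration type; NOT the real Eistring structure. */
typedef struct
{
  unsigned char *data;  /* malloc()ed storage, null-terminated once set */
  size_t len;           /* length in bytes, excluding the terminator */
  size_t cap;           /* allocated size in bytes */
} sketch_str;

/* Make sure there is room for NEEDED bytes plus the terminator. */
void
sketch_reserve (sketch_str *s, size_t needed)
{
  if (needed + 1 > s->cap)
    {
      size_t newcap = s->cap ? s->cap : 16;
      unsigned char *newdata;

      while (newcap < needed + 1)
        newcap *= 2;
      newdata = (unsigned char *) realloc (s->data, newcap);
      assert (newdata != NULL);
      s->data = newdata;
      s->cap = newcap;
    }
}

/* Append LEN bytes.  The source may contain embedded null bytes,
   because the length is tracked explicitly. */
void
sketch_cat (sketch_str *s, const void *bytes, size_t len)
{
  sketch_reserve (s, s->len + len);
  memcpy (s->data + s->len, bytes, len);
  s->len += len;
  s->data[s->len] = '\0';
}

/* Delete LEN bytes starting at byte offset OFF. */
void
sketch_del (sketch_str *s, size_t off, size_t len)
{
  assert (off <= s->len && len <= s->len - off);
  if (len == 0)
    return;
  /* Move the tail down, including the terminating null byte. */
  memmove (s->data + off, s->data + off + len, s->len - off - len + 1);
  s->len -= len;
}

/* Release the storage and return to the empty state. */
void
sketch_free (sketch_str *s)
{
  free (s->data);
  s->data = NULL;
  s->len = s->cap = 0;
}
```

A caller would start from a zero-initialized @code{sketch_str}, build the text with @code{sketch_cat()}, and finish with @code{sketch_free()}, which mirrors the discipline described above for Eistrings declared with @code{DECLARE_EISTRING_MALLOC}.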
11935 | |
11936 @node Coding for Mule, CCL, Internal Text API's, Multilingual Support | |
11937 @section Coding for Mule | |
11938 @cindex coding for Mule | |
11939 @cindex Mule, coding for | |
11940 | |
11941 Although Mule support is not compiled by default in XEmacs, many people | |
11942 are using it, and we consider it crucial that new code works correctly | |
11943 with multibyte characters. This is not hard; it is only a matter of | |
11944 following several simple user-interface guidelines. Even if you never | |
11945 compile with Mule, with a little practice you will find it quite easy | |
11946 to code Mule-correctly. | |
11947 | |
11948 Note that these guidelines are not necessarily tied to the current Mule | |
11949 implementation; they are also a good idea to follow on the grounds of | |
11950 code generalization for future I18N work. | |
11951 | |
11952 @menu | |
11953 * Character-Related Data Types:: | |
11954 * Working With Character and Byte Positions:: | |
11955 * Conversion to and from External Data:: | |
11956 * General Guidelines for Writing Mule-Aware Code:: | |
11957 * An Example of Mule-Aware Code:: | |
11958 * Mule-izing Code:: | |
11959 @end menu | |
11960 | |
11961 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule | |
11962 @subsection Character-Related Data Types | |
11963 @cindex character-related data types | |
11964 @cindex data types, character-related | |
11965 | |
11966 First, let's review the basic character-related datatypes used by | |
11967 XEmacs. Note that some of the separate @code{typedef}s are not | |
11968 mandatory, but they improve clarity of code a great deal, because one | |
11969 glance at the declaration can tell the intended use of the variable. | |
11970 | |
11971 @table @code | |
11972 @item Ichar | |
11973 @cindex Ichar | |
11974 An @code{Ichar} holds a single Emacs character. | |
11975 | |
11976 Obviously, the equality between characters and bytes is lost in the Mule | |
11977 world. Characters can be represented by one or more bytes in the | |
11978 buffer, and @code{Ichar} is a C type large enough to hold any | |
11979 character. (This currently isn't quite true for ISO 10646, which | |
11980 defines a character as a 31-bit non-negative quantity, while XEmacs | |
11981 characters are only 30 bits. This is irrelevant, unless you are | |
11982 considering using the ISO 10646 private groups to support really large | |
11983 private character sets---in particular, the Mule character set!---in | |
11984 a version of XEmacs using Unicode internally.) | |
11985 | |
11986 Without Mule support, an @code{Ichar} is equivalent to an | |
11987 @code{unsigned char}. [[This doesn't seem to be true; @file{lisp.h} | |
11988 unconditionally @samp{typedef}s @code{Ichar} to @code{int}.]] | |
11989 | |
11990 @item Ibyte | |
11991 @cindex Ibyte | |
11992 The data representing the text in a buffer or string is logically a set | |
11993 of @code{Ibyte}s. | |
11994 | |
11995 XEmacs does not work with the same character formats all the time; when | |
11996 reading characters from the outside, it decodes them to an internal | |
11997 format, and likewise encodes them when writing. @code{Ibyte} (in fact | |
11998 @code{unsigned char}) is the basic unit of the XEmacs internal buffer and | |
11999 string format. An @code{Ibyte *} is the type that points at text | |
12000 encoded in the variable-width internal encoding. | |
12001 | |
12002 One character can correspond to one or more @code{Ibyte}s. In the | |
12003 current Mule implementation, an ASCII character is represented by the | |
12004 same @code{Ibyte}, and other characters are represented by a sequence | |
12005 of two or more @code{Ibyte}s. (This will also be true of an | |
12006 implementation using UTF-8 as the internal encoding. In fact, only code | |
12007 that implements character code conversions and a very few macros used to | |
12008 implement motion by whole characters will notice the difference between | |
12009 UTF-8 and the Mule encoding.) | |
12010 | |
12011 Without Mule support, there are exactly 256 characters, implicitly | |
12012 Latin-1, and each character is represented using one @code{Ibyte}, and | |
12013 there is a one-to-one correspondence between @code{Ibyte}s and | |
12014 @code{Ichar}s. | |
12015 | |
12016 @item Charxpos | |
12017 @itemx Charbpos | |
12018 @itemx Charcount | |
12019 @cindex Charxpos | |
12020 @cindex Charbpos | |
12021 @cindex Charcount | |
12022 A @code{Charbpos} represents a character position in a buffer. A | |
12023 @code{Charcount} represents a number (count) of characters. Logically, | |
12024 subtracting two @code{Charbpos} values yields a @code{Charcount} value. | |
12025 When representing a character position in a string, we just use | |
12026 @code{Charcount} directly. The reason for having a separate typedef for | |
12027 buffer positions is that they are 1-based, whereas string positions are | |
12028 0-based and hence string counts and positions can be freely intermixed (a | |
12029 string position is equivalent to the count of characters from the | |
12030 beginning). When representing a character position that could be either | |
12031 in a buffer or string (for example, in the extent code), @code{Charxpos} | |
12032 is used. Although all of these are @code{typedef}ed to | |
12033 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make | |
12034 it clear what sort of position is being used. | |
12035 | |
12036 @code{Charxpos}, @code{Charbpos} and @code{Charcount} values are the | |
12037 only ones that are ever visible to Lisp. | |
12038 | |
12039 @item Bytexpos | |
12040 @itemx Bytecount | |
12041 @cindex Bytebpos | |
12042 @cindex Bytecount | |
12043 A @code{Bytebpos} represents a byte position in a buffer. A | |
12044 @code{Bytecount} represents the distance between two positions, in | |
12045 bytes. Byte positions in strings use @code{Bytecount}, and for byte | |
12046 positions that can be either in a buffer or string, @code{Bytexpos} is | |
12047 used. The relationship between @code{Bytexpos}, @code{Bytebpos} and | |
12048 @code{Bytecount} is the same as the relationship between | |
12049 @code{Charxpos}, @code{Charbpos} and @code{Charcount}. | |
12050 | |
12051 @item Extbyte | |
12052 @cindex Extbyte | |
12053 When dealing with the outside world, XEmacs works with @code{Extbyte}s, | |
12054 which are equivalent to @code{char}. The distance between two | |
12055 @code{Extbyte}s is a @code{Bytecount}, since external text is a | |
12056 byte-by-byte encoding. Extbytes occur mainly at the transition point | |
12057 between internal text and external functions. XEmacs code should not, | |
12058 if it can possibly avoid it, do any actual manipulation using external | |
12059 text, since its format is completely unpredictable (it might not even be | |
12060 ASCII-compatible). | |
12061 @end table | |
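
The difference between character quantities and byte quantities can be seen in a small stand-alone sketch. UTF-8 is used here purely as a stand-in for a variable-width internal encoding. The typedefs are simplified local stand-ins for the XEmacs types (@code{long} is assumed in place of @code{EMACS_INT}), and the helper names are hypothetical, not XEmacs functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified local stand-ins for the XEmacs typedefs (assumptions). */
typedef unsigned char Ibyte;
typedef long Bytecount;
typedef long Charcount;

/* In UTF-8, a byte starts a character unless it looks like 10xxxxxx. */
int
sketch_is_lead_byte (Ibyte b)
{
  return (b & 0xC0) != 0x80;
}

/* Number of characters in LEN bytes of well-formed text. */
Charcount
sketch_bytecount_to_charcount (const Ibyte *text, Bytecount len)
{
  Charcount chars = 0;
  Bytecount i;

  assert (text != NULL && len >= 0);
  for (i = 0; i < len; i++)
    if (sketch_is_lead_byte (text[i]))
      chars++;
  return chars;
}

/* Byte offset of the character at 0-based character offset CHARPOS.
   CHARPOS may equal the number of characters, which yields LEN. */
Bytecount
sketch_charpos_to_bytepos (const Ibyte *text, Bytecount len,
                           Charcount charpos)
{
  Bytecount i = 0;

  assert (text != NULL && charpos >= 0);
  while (charpos > 0 && i < len)
    {
      /* Step over the lead byte, then over any continuation bytes. */
      i++;
      while (i < len && !sketch_is_lead_byte (text[i]))
        i++;
      charpos--;
    }
  assert (charpos == 0);  /* CHARPOS was within the text */
  return i;
}
```

The conversion has to walk the text, which is why code that already has a byte position should keep working in bytes instead of converting back and forth.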
12062 | |
12063 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule | |
12064 @subsection Working With Character and Byte Positions | |
12065 @cindex character and byte positions, working with | |
12066 @cindex byte positions, working with character and | |
12067 @cindex positions, working with character and byte | |
12068 | |
12069 Now that we have defined the basic character-related types, we can look | |
12070 at the macros and functions designed for work with them and for | |
12071 conversion between them. Most of these macros are defined in | |
12072 @file{buffer.h}, and we don't discuss all of them here, but only the | |
12073 most important ones. Examining the existing code is the best way to | |
12074 learn about them. | |
12075 | |
12076 @table @code | |
12077 @item MAX_ICHAR_LEN | |
12078 @cindex MAX_ICHAR_LEN | |
12079 This preprocessor constant is the maximum number of buffer bytes needed to | |
12080 represent an Emacs character in the variable width internal encoding. | |
12081 It is useful when allocating temporary strings to hold a known number of | |
12082 characters. For instance: | |
12083 | |
12084 @example | |
12085 @group | |
12086 @{ | |
12087 Charcount cclen; | |
12088 ... | |
12089 @{ | |
12090 /* Allocate place for @var{cclen} characters. */ | |
12091 Ibyte *buf = (Ibyte *) alloca (cclen * MAX_ICHAR_LEN); | |
12092 ... | |
12093 @end group | |
12094 @end example | |
12095 | |
12096 If you followed the previous section, you can guess that, logically, | |
12097 multiplying a @code{Charcount} value by @code{MAX_ICHAR_LEN} produces | |
12098 a @code{Bytecount} value. | |
12099 | |
12100 In the current Mule implementation, @code{MAX_ICHAR_LEN} equals 4. | |
12101 Without Mule, it is 1. In a mature Unicode-based XEmacs, it will also | |
12102 be 4 (since all Unicode characters can be encoded in UTF-8 in 4 bytes or | |
12103 fewer), but some versions may use up to 6, in order to use the large | |
12104 private space provided by ISO 10646 to ``mirror'' the Mule code space. | |
12105 | |
12106 @item itext_ichar | |
12107 @itemx set_itext_ichar | |
12108 @cindex itext_ichar | |
12109 @cindex set_itext_ichar | |
12110 The @code{itext_ichar} macro takes a @code{Ibyte} pointer and | |
12111 returns the @code{Ichar} stored at that position. If it were a | |
12112 function, its prototype would be: | |
12113 | |
12114 @example | |
12115 Ichar itext_ichar (Ibyte *p); | |
12116 @end example | |
12117 | |
12118 @code{set_itext_ichar} stores an @code{Ichar} to the specified byte | |
12119 position. It returns the number of bytes stored: | |
12120 | |
12121 @example | |
12122 Bytecount set_itext_ichar (Ibyte *p, Ichar c); | |
12123 @end example | |
12124 | |
12125 It is important to note that @code{set_itext_ichar} is safe only for | |
12126 appending a character at the end of a buffer, not for overwriting a | |
12127 character in the middle. This is because the width of characters | |
12128 varies, and @code{set_itext_ichar} cannot resize the string if it | |
12129 writes, say, a two-byte character where a single-byte character used to | |
12130 reside. | |
12131 | |
12132 A typical use of @code{set_itext_ichar} can be demonstrated by this | |
12133 example, which copies characters from buffer @var{buf} to a temporary | |
12134 string of Ibytes. | |
12135 | |
12136 @example | |
12137 @group | |
12138 @{ | |
12139 Charbpos pos; | |
12140 for (pos = beg; pos < end; pos++) | |
12141 @{ | |
12142 Ichar c = BUF_FETCH_CHAR (buf, pos); | |
12143 p += set_itext_ichar (p, c); | |
12144 @} | |
12145 @} | |
12146 @end group | |
12147 @end example | |
12148 | |
12149 Note how @code{set_itext_ichar} is used to store the @code{Ichar} | |
12150 and advance the pointer @code{p} at the same time. | |
12151 | |
12152 @item INC_IBYTEPTR | |
12153 @itemx DEC_IBYTEPTR | |
12154 @cindex INC_IBYTEPTR | |
12155 @cindex DEC_IBYTEPTR | |
12156 These two macros increment and decrement an @code{Ibyte} pointer, | |
12157 respectively. They will adjust the pointer by the appropriate number of | |
12158 bytes according to the byte length of the character stored there. Both | |
12159 macros assume that the memory address is located at the beginning of a | |
12160 valid character. | |
12161 | |
12162 Without Mule support, @code{INC_IBYTEPTR (p)} and @code{DEC_IBYTEPTR (p)} | |
12163 simply expand to @code{p++} and @code{p--}, respectively. | |
12164 | |
12165 @item bytecount_to_charcount | |
12166 @cindex bytecount_to_charcount | |
12167 Given a pointer to a text string and a length in bytes, return the | |
12168 equivalent length in characters. | |
12169 | |
12170 @example | |
12171 Charcount bytecount_to_charcount (Ibyte *p, Bytecount bc); | |
12172 @end example | |
12173 | |
12174 @item charcount_to_bytecount | |
12175 @cindex charcount_to_bytecount | |
12176 Given a pointer to a text string and a length in characters, return the | |
12177 equivalent length in bytes. | |
12178 | |
12179 @example | |
12180 Bytecount charcount_to_bytecount (Ibyte *p, Charcount cc); | |
12181 @end example | |
12182 | |
12183 @item itext_n_addr | |
12184 @cindex itext_n_addr | |
12185 Return a pointer to the beginning of the character offset @var{cc} (in | |
12186 characters) from @var{p}. | |
12187 | |
12188 @example | |
12189 Ibyte *itext_n_addr (Ibyte *p, Charcount cc); | |
12190 @end example | |
12191 @end table | |
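The counting functions above can be illustrated with a small standalone sketch. It does not use the XEmacs headers: UTF-8 stands in for the variable-width internal encoding, and the names @code{sketch_char_len}, @code{sketch_bytecount_to_charcount} and @code{sketch_charcount_to_bytecount} are hypothetical, chosen only to mirror the real functions. The pointer step inside the loops plays the role that @code{INC_IBYTEPTR} plays in real code.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of a character, judged from its lead byte.
   ASSUMPTION: UTF-8 is used as a stand-in for the Mule internal
   encoding; the real encoding differs in detail. */
size_t
sketch_char_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Hypothetical analogue of bytecount_to_charcount: count the
   characters in the first BC bytes at P.  P must point at the
   beginning of a character and BC must end on a character boundary. */
size_t
sketch_bytecount_to_charcount (const unsigned char *p, size_t bc)
{
  const unsigned char *end = p + bc;
  size_t cc = 0;

  while (p < end)
    {
      p += sketch_char_len (*p);   /* step over one whole character */
      cc++;
    }
  assert (p == end);               /* BC did not split a character */
  return cc;
}

/* Hypothetical analogue of charcount_to_bytecount: number of bytes
   occupied by the first CC characters at P. */
size_t
sketch_charcount_to_bytecount (const unsigned char *p, size_t cc)
{
  const unsigned char *start = p;

  while (cc-- > 0)
    p += sketch_char_len (*p);
  return (size_t) (p - start);
}
```

For the four-byte text consisting of @samp{a}, a two-byte character and @samp{z}, the first function reports three characters, and the second reports that the first two characters occupy three bytes. Neither conversion can be done by arithmetic alone; both have to walk the text.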
12192 | |
12193 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule | |
12194 @subsection Conversion to and from External Data | |
12195 @cindex conversion to and from external data | |
12196 @cindex external data, conversion to and from | |
12197 | |
12198 When an external function, such as a C library function, returns a | |
12199 @code{char} pointer, you should almost never treat it as @code{Ibyte}. | |
12200 This is because these returned strings may contain 8-bit characters that | |
12201 can be misinterpreted by XEmacs and cause a crash. Likewise, when | |
12202 exporting a piece of internal text to the outside world, you should | |
12203 always convert it to an appropriate external encoding, lest the internal | |
12204 stuff (such as the infamous \201 characters) leak out. | |
12205 | |
12206 The interface to conversion between the internal and external | |
12207 representations of text is the set of conversion macros defined in | |
12208 @file{buffer.h}. There used to be a fixed set of external formats | |
12209 supported by these macros, but now any coding system can be used with | |
12210 them. The coding system alias mechanism is used to create the | |
12211 following logical coding systems, which replace the fixed external | |
12212 formats. The @code{dontusethis-set-symbol-value-handler} mechanism was | |
12213 enhanced to make this possible (more work on that is needed). | |
12214 | |
12215 Often useful coding systems: | |
12216 | |
12217 @table @code | |
12218 @item Qbinary | |
12219 This is the simplest format and is what we use in the absence of a more | |
12220 appropriate format. This converts according to the @code{binary} coding | |
12221 system: | |
12222 | |
12223 @enumerate a | |
12224 @item | |
12225 On input, bytes 0--255 are converted into (implicitly Latin-1) | |
12226 characters 0--255. A non-Mule XEmacs doesn't really know about | |
12227 different character sets and the fonts to display them, so the bytes can | |
12228 be treated as text in different 1-byte encodings by simply setting the | |
12229 appropriate fonts. So in a sense, non-Mule XEmacs is a multi-lingual | |
12230 editor if, for example, different fonts are used to display text in | |
12231 different buffers, faces, or windows. The specifier mechanism gives the | |
12232 user complete control over this kind of behavior. | |
12233 @item | |
12234 On output, characters 0--255 are converted into bytes 0--255 and other | |
12235 characters are converted into @samp{~}. | |
12236 @end enumerate | |
12237 | |
12238 @item Qnative | |
12239 Format used for the external Unix environment---@code{argv[]}, stuff | |
12240 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. | |
12241 This is encoded according to the encoding specified by the current locale. | |
12242 [[This is dangerous; current locale is user preference, and the system | |
12243 is probably going to be something else. Is there anything we can do | |
12244 about it?]] | |
12245 | |
12246 @item Qfile_name | |
12247 Format used for filenames. This is normally the same as @code{Qnative}, | |
12248 but the two should be distinguished for clarity and possible future | |
12249 separation -- and also because @code{Qfile_name} can be changed using either | |
12250 the @code{file-name-coding-system} or @code{pathname-coding-system} (now | |
12251 obsolete) variables. | |
12252 | |
12253 @item Qctext | |
12254 Compound-text format. This is the standard X11 format used for data | |
12255 stored in properties, selections, and the like. This is an 8-bit | |
12256 no-lock-shift ISO2022 coding system. This is a real coding system, | |
12257 unlike @code{Qfile_name}, which is user-definable. | |
12258 | |
12259 @item Qmswindows_tstr | |
12260 Used for external data in all MS Windows functions that are declared to | |
12261 accept data of type @code{LPTSTR} or @code{LPCTSTR}. This maps to either | |
12262 @code{Qmswindows_multibyte} (a locale-specific encoding, same as | |
12263 @code{Qnative}) or @code{Qmswindows_unicode}, depending on whether | |
12264 XEmacs is being run under Windows 9X or Windows NT/2000/XP. | |
12265 @end table | |
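The two directions of the @code{binary} conversion described under @code{Qbinary} can be sketched in a few lines of standalone C. This is only an illustration of the mapping, not the XEmacs implementation: characters are modelled as plain @code{uint32_t} numbers, and the names @code{sketch_binary_decode} and @code{sketch_binary_encode} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Input direction: each external byte 0-255 becomes the character
   with the same number.  DST must have room for LEN characters.
   ASSUMPTION: characters are modelled as bare uint32_t values. */
void
sketch_binary_decode (const unsigned char *src, size_t len, uint32_t *dst)
{
  assert (src != NULL && dst != NULL);
  for (size_t i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Output direction: characters 0-255 become the byte with the same
   number; every other character becomes '~'.  DST must have room
   for LEN bytes. */
void
sketch_binary_encode (const uint32_t *src, size_t len, unsigned char *dst)
{
  assert (src != NULL && dst != NULL);
  for (size_t i = 0; i < len; i++)
    dst[i] = src[i] <= 0xFF ? (unsigned char) src[i] : (unsigned char) '~';
}
```

Decoding followed by encoding returns the original bytes unchanged. The reverse round trip loses information for any character above 255, which is why @code{Qbinary} is a fallback rather than a general-purpose choice.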
12266 | |
12267 Many other coding systems are provided by default. | |
12268 | |
12269 There are two fundamental macros to convert between external and | |
12270 internal format, as well as various convenience macros to simplify the | |
12271 most common operations. | |
12272 | |
12273 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and | |
12274 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments | |
12275 each of these receives are a source type, a source, a sink type, a sink, | |
12276 and a coding system (or a symbol naming a coding system). | |
12277 | |
12278 A typical call looks like | |
12279 @example | |
12280 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name); | |
12281 @end example | |
12282 | |
12283 which means that the contents of the lisp string @code{str} are written | |
12284 to a malloc'ed memory area which will be pointed to by @code{ptr}, after | |
12285 the function returns. The conversion will be done using the | |
12286 @code{file-name} coding system, which will be controlled by the user | |
12287 indirectly by setting or binding the variable | |
12288 @code{file-name-coding-system}. | |
12289 | |
12290 Some sources and sinks require two C variables to specify. We use some | |
12291 preprocessor magic to allow different source and sink types, and even | |
12292 different numbers of arguments to specify different types of sources and | |
12293 sinks. | |
12294 | |
12295 So we can have a call that looks like | |
12296 @example | |
12297 TO_INTERNAL_FORMAT (DATA, (ptr, len), | |
12298 MALLOC, (ptr, len), | |
12299 coding_system); | |
12300 @end example | |
12301 | |
12302 The parenthesized argument pairs are required to make the preprocessor | |
12303 magic work. | |
12304 | |
12305 Here are the different source and sink types: | |
12306 | |
12307 @table @code | |
12308 @item @code{DATA, (ptr, len),} | |
12309 input data is a fixed buffer of size @var{len} at address @var{ptr} | |
12310 @item @code{ALLOCA, (ptr, len),} | |
12311 output data is placed in an @code{alloca()}ed buffer of size @var{len} pointed to by @var{ptr} | |
12312 @item @code{MALLOC, (ptr, len),} | |
12313 output data is in a @code{malloc()}ed buffer of size @var{len} pointed to by @var{ptr} | |
12314 @item @code{C_STRING_ALLOCA, ptr,} | |
12315 equivalent to @code{ALLOCA, (ptr, len_ignored)} on output | |
12316 @item @code{C_STRING_MALLOC, ptr,} | |
12317 equivalent to @code{MALLOC, (ptr, len_ignored)} on output | |
12318 @item @code{C_STRING, ptr,} | |
12319 equivalent to @code{DATA, (ptr, strlen/wcslen (ptr))} on input | |
12320 @item @code{LISP_STRING, string,} | |
12321 input or output is a Lisp_Object of type string | |
12322 @item @code{LISP_BUFFER, buffer,} | |
12323 output is written to @code{(point)} in lisp buffer @var{buffer} | |
12324 @item @code{LISP_LSTREAM, lstream,} | |
12325 input or output is a Lisp_Object of type lstream | |
12326 @item @code{LISP_OPAQUE, object,} | |
12327 input or output is a Lisp_Object of type opaque | |
12328 @end table | |
12329 | |
12330 A source type of @code{C_STRING} or a sink type of | |
12331 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate where | |
12332 the external API is not '\0'-byte-clean -- i.e. it expects strings to be | |
12333 terminated with a null byte. For external API's that are in fact | |
12334 '\0'-byte-clean, we should of course not use these. | |
12335 | |
12336 The sinks to be specified must be lvalues, unless they are the lisp | |
12337 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}. | |
12338 | |
12339 There is no problem using the same lvalue for source and sink. | |
12340 | |
12341 Garbage collection is inhibited during these conversion operations, so | |
12342 it is OK to pass in data from Lisp strings using @code{XSTRING_DATA}. | |
12343 | |
12344 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the | |
12345 resulting text is stored in a stack-allocated buffer, which is | |
12346 automatically freed on returning from the function. However, the sink | |
12347 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed | |
12348 memory. The caller is responsible for freeing this memory using | |
12349 @code{xfree()}. | |
12350 | |
12351 Note that it doesn't make sense for @code{LISP_STRING} to be a source | |
12352 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}. | |
12353 You'll get an assertion failure if you try. | |
12354 | |
12355 99% of conversions involve raw data or Lisp strings as both source and | |
12356 sink, and usually data is output into @code{alloca()}ed, or sometimes | |
12357 @code{xmalloc()}ed, memory. For this reason, convenience macros are defined for | |
12358 many types of conversions involving raw data and/or Lisp strings, | |
12359 especially when the output is an @code{alloca()}ed string. (When the | |
12360 destination is a Lisp string, there are other functions that should be | |
12361 used instead -- @code{build_ext_string()} and @code{make_ext_string()}, | |
12362 for example.) The convenience macros are of two types -- the older kind | |
12363 that store the result into a specified variable, and the newer kind that | |
12364 return the result. The newer kind of macros don't exist when the output | |
12365 is sized data, because that would have two return values. NOTE: All | |
12366 convenience macros are ultimately defined in terms of | |
12367 @code{TO_EXTERNAL_FORMAT} and @code{TO_INTERNAL_FORMAT}. Thus, any | |
12368 comments above about the workings of these macros also apply to all | |
12369 convenience macros. | |
12370 | |
12371 A typical old-style convenience macro is | |
12372 | |
12373 @example | |
12374 C_STRING_TO_EXTERNAL (in, out, codesys); | |
12375 @end example | |
12376 | |
12377 This is equivalent to | |
12378 | |
12379 @example | |
12380 TO_EXTERNAL_FORMAT (C_STRING, in, C_STRING_ALLOCA, out, codesys); | |
12381 @end example | |
12382 | |
12383 but is easier to write and somewhat clearer, since it clearly identifies | |
12384 the arguments without the clutter of having the preprocessor types mixed | |
12385 in. | |
12386 | |
12387 The new-style equivalent is @code{NEW_C_STRING_TO_EXTERNAL (src, | |
12388 codesys)}, which @emph{returns} the converted data (still in | |
12389 @code{alloca()} space). This is far more convenient for most | |
12390 operations. | |
12391 | |
12392 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule | |
12393 @subsection General Guidelines for Writing Mule-Aware Code | |
12394 @cindex writing Mule-aware code, general guidelines for | |
12395 @cindex Mule-aware code, general guidelines for writing | |
12396 @cindex code, general guidelines for writing Mule-aware | |
12397 | |
12398 This section contains some general guidance on how to write Mule-aware | |
12399 code, as well as some pitfalls you should avoid. | |
12400 | |
12401 @table @emph | |
12402 @item Never use @code{char} and @code{char *}. | |
12403 In XEmacs, the use of @code{char} and @code{char *} is almost always a | |
12404 mistake. If you want to manipulate an Emacs character from ``C'', use | |
12405 @code{Ichar}. If you want to examine a specific octet in the internal | |
12406 format, use @code{Ibyte}. If you want a Lisp-visible character, use a | |
12407 @code{Lisp_Object} and @code{make_char}. If you want a pointer to move | |
12408 through the internal text, use @code{Ibyte *}. Also note that you | |
12409 almost certainly do not need @code{Ichar *}. Other typedefs to clarify | |
12410 the use of @code{char} are @code{Char_ASCII}, @code{Char_Binary}, | |
12411 @code{UChar_Binary}, and @code{CIbyte}. | |
12412 | |
12413 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, @code{Charbpos} and @code{Bytebpos}. | |
12414 The whole point of using different types is to avoid confusion about the | |
12415 use of certain variables. Lest this effect be nullified, you need to be | |
12416 careful about using the right types. | |
12417 | |
12418 @item Always convert external data | |
12419 It is extremely important to always convert external data, because | |
12420 XEmacs can crash if unexpected 8-bit sequences are copied to its internal | |
12421 buffers literally. | |
12422 | |
12423 This means that when a system function, such as @code{readdir}, returns | |
12424 a string, you normally need to convert it using one of the conversion macros | |
12425 described in the previous section, before passing it further to Lisp. | |
12426 | |
12427 Actually, most of the basic system functions that accept '\0'-terminated | |
12428 string arguments, like @code{stat()} and @code{open()}, have | |
12429 @strong{encapsulated} equivalents that do the internal to external | |
12430 conversion themselves. The encapsulated equivalents have a @code{qxe_} | |
12431 prefix and have string arguments of type @code{Ibyte *}, and you can | |
12432 pass internally encoded data to them, often from a Lisp string using | |
12433 @code{XSTRING_DATA}. (A better design might be to provide versions that | |
12434 accept Lisp strings directly.) [[Really? Then they'd either take | |
12435 @code{Lisp_Object}s and need to check type, or they'd take | |
12436 @code{Lisp_String}s, and violate the rules about passing any of the | |
12437 specific Lisp types.]] | |
12438 | |
12439 Also note that many internal functions, such as @code{make_string}, | |
12440 accept Ibytes, which removes the need for them to convert the data they | |
12441 receive. This increases efficiency because that way external data needs | |
12442 to be decoded only once, when it is read. After that, it is passed | |
12443 around in internal format. | |
12444 | |
12445 @item Do all work in internal format | |
12446 External-formatted data is completely unpredictable in its format. It | |
12447 may be fixed-width Unicode (not even ASCII compatible); it may be a | |
12448 modal encoding, in | |
12449 which case some occurrences of (e.g.) the slash character may be part of | |
12450 two-byte Asian-language characters, and a naive attempt to split apart a | |
12451 pathname by slashes will fail; etc. Internal-format text should be | |
12452 converted to external format only at the point where an external API is | |
12453 actually called, and the first thing done after receiving | |
12454 external-format text from an external API should be to convert it to | |
12455 internal text. | |
12456 @end table | |
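The pathname-splitting pitfall mentioned in the last guideline can be made concrete with a standalone sketch. It assumes a much simplified ISO-2022-JP stream in which only two escape sequences occur: @code{ESC $ B} (switch to two-byte JIS X 0208) and @code{ESC ( B} (switch back to ASCII). Both function names are hypothetical, and real code should convert to internal format rather than parse external text like this.

```c
#include <assert.h>
#include <stddef.h>

/* Naive approach: treat every 0x2F byte as a slash. */
size_t
sketch_count_slashes_naive (const unsigned char *s, size_t len)
{
  size_t n = 0;

  assert (s != NULL);
  for (size_t i = 0; i < len; i++)
    if (s[i] == '/')
      n++;
  return n;
}

/* Mode-aware approach for the simplified stream described above.
   ASSUMPTION: only ESC $ B and ESC ( B appear as escape sequences. */
size_t
sketch_count_slashes_modal (const unsigned char *s, size_t len)
{
  size_t n = 0;
  size_t i = 0;
  int two_byte = 0;              /* nonzero while in JIS X 0208 mode */

  assert (s != NULL);
  while (i < len)
    {
      if (s[i] == 0x1B && i + 2 < len
          && (s[i + 1] == '$' || s[i + 1] == '(')
          && s[i + 2] == 'B')
        {
          two_byte = (s[i + 1] == '$');
          i += 3;                /* skip the escape sequence */
        }
      else if (two_byte)
        i += 2;                  /* both bytes belong to one character */
      else
        {
          if (s[i] == '/')
            n++;
          i++;
        }
    }
  return n;
}
```

Given the bytes @samp{a/}, @code{ESC $ B}, the two-byte character 0x30 0x2F, @code{ESC ( B} and @samp{/b}, the naive count is 3 while the mode-aware count is 2: the extra ``slash'' is the second byte of a two-byte character. After conversion to internal format the problem disappears, because an ASCII byte in internal text always stands for itself.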
12457 | |
12458 @node An Example of Mule-Aware Code, Mule-izing Code, General Guidelines for Writing Mule-Aware Code, Coding for Mule | |
12459 @subsection An Example of Mule-Aware Code | |
12460 @cindex code, an example of Mule-aware | |
12461 @cindex Mule-aware code, an example of | |
12462 | |
12463 As an example of Mule-aware code, we will analyze the @code{string} | |
12464 function, which conses up a Lisp string from the character arguments it | |
12465 receives. Here is the definition, pasted from @file{alloc.c}: | |
12466 | |
12467 @example | |
12468 @group | |
12469 DEFUN ("string", Fstring, 0, MANY, 0, /* | |
12470 Concatenate all the argument characters and make the result a string. | |
12471 */ | |
12472 (int nargs, Lisp_Object *args)) | |
12473 @{ | |
12474 Ibyte *storage = alloca_array (Ibyte, nargs * MAX_ICHAR_LEN); | |
12475 Ibyte *p = storage; | |
12476 | |
12477 for (; nargs; nargs--, args++) | |
12478 @{ | |
12479 Lisp_Object lisp_char = *args; | |
12480 CHECK_CHAR_COERCE_INT (lisp_char); | |
12481 p += set_itext_ichar (p, XCHAR (lisp_char)); | |
12482 @} | |
12483 return make_string (storage, p - storage); | |
12484 @} | |
12485 @end group | |
12486 @end example | |
12487 | |
12488 Now we can analyze the source line by line. | |
12489 | |
12490 Obviously, the string will contain as many characters as there are arguments to the | |
12491 function. This is why we allocate @code{MAX_ICHAR_LEN} * @var{nargs} | |
12492 bytes on the stack, i.e. the worst-case number of bytes for @var{nargs} | |
12493 @code{Ichar}s to fit in the string. | |
12494 | |
12495 Then, the loop checks that each element is a character, converting | |
12496 integers in the process. Like many other functions in XEmacs, this | |
12497 function silently accepts integers where characters are expected, for | |
12498 historical and compatibility reasons. Unless you know what you are | |
12499 doing, @code{CHECK_CHAR} will also suffice. @code{XCHAR (lisp_char)} | |
12500 extracts the @code{Ichar} from the @code{Lisp_Object}, and | |
12501 @code{set_itext_ichar} stores it to storage, increasing @code{p} in | |
12502 the process. | |
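
The same allocate-for-the-worst-case pattern can be tried outside XEmacs with a standalone sketch. Everything in it is an assumption made for illustration: UTF-8 stands in for the internal encoding, @code{SKETCH_MAX_CHAR_LEN} stands in for @code{MAX_ICHAR_LEN}, @code{sketch_set_char} mirrors the role of @code{set_itext_ichar}, and @code{malloc} replaces the stack allocation so that the result can be returned to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_CHAR_LEN 4    /* worst case per character in UTF-8 */

/* Store character C at P and return the number of bytes written,
   in the manner of set_itext_ichar.  ASSUMPTION: UTF-8 encoding;
   surrogate code points are not rejected in this sketch. */
size_t
sketch_set_char (unsigned char *p, uint32_t c)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (c >> 6));
      p[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (unsigned char) (0xE0 | (c >> 12));
      p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (unsigned char) (0xF0 | (c >> 18));
  p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Concatenate NCHARS characters into a freshly malloc'ed buffer and
   store its length in bytes in *LEN_OUT.  The caller frees the
   result.  Returns NULL if allocation fails. */
unsigned char *
sketch_build_string (const uint32_t *chars, size_t nchars, size_t *len_out)
{
  /* Worst case: every character needs the maximum number of bytes.
     The extra byte avoids a zero-sized allocation. */
  unsigned char *storage = malloc (nchars * SKETCH_MAX_CHAR_LEN + 1);
  unsigned char *p = storage;

  if (storage == NULL)
    return NULL;
  for (size_t i = 0; i < nchars; i++)
    p += sketch_set_char (p, chars[i]);   /* store and advance */
  *len_out = (size_t) (p - storage);
  return storage;
}
```

As in @code{Fstring}, the buffer is sized by character count times the maximum width, while the final length is the pointer difference, which is usually much smaller.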
12503 | |
12504 Other instructive examples of correct coding under Mule can be found all | |
12505 over the XEmacs code. For starters, I recommend | |
12506 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have | |
12507 understood this section of the manual and studied the examples, you can | |
12508 proceed writing new Mule-aware code. | |
12509 | |
12510 @node Mule-izing Code, , An Example of Mule-Aware Code, Coding for Mule | |
12511 @subsection Mule-izing Code | |
12512 | |
12513 A lot of code is written without Mule in mind, and needs to be made | |
12514 Mule-correct or ``Mule-ized''. There is really no substitute for | |
12515 line-by-line analysis when doing this, but the following checklist can | |
12516 help: | |
12517 | |
12518 @itemize @bullet | |
12519 @item | |
12520 Check all uses of @code{XSTRING_DATA}. | |
12521 @item | |
12522 Check all uses of @code{build_string} and @code{make_string}. | |
12523 @item | |
12524 Check all uses of @code{tolower} and @code{toupper}. | |
12525 @item | |
12526 Check object print methods. | |
12527 @item | |
12528 Check for use of functions such as @code{write_c_string}, | |
12529 @code{write_fmt_string}, @code{stderr_out}, @code{stdout_out}. | |
12530 @item | |
12531 Check all occurrences of @code{char} and correct to one of the other | |
12532 typedefs described above. | |
12533 @item | |
12534 Check all existing uses of @code{TO_EXTERNAL_FORMAT}, | |
12535 @code{TO_INTERNAL_FORMAT}, and any convenience macros (grep for | |
12536 @samp{EXTERNAL_TO}, @samp{TO_EXTERNAL}, and @samp{TO_SIZED_EXTERNAL}). | |
12537 @item | |
12538 In Windows code, string literals may need to be encapsulated with @code{XETEXT}. | |
12539 @end itemize | |
12540 | |
12541 @node CCL, Modules for Internationalization, Coding for Mule, Multilingual Support | |
9424 @section CCL | 12542 @section CCL |
9425 @cindex CCL | 12543 @cindex CCL |
9426 | 12544 |
9427 @example | 12545 @example |
9428 CCL PROGRAM SYNTAX: | |
9429 CCL_PROGRAM := (CCL_MAIN_BLOCK | |
9430 [ CCL_EOF_BLOCK ]) | |
9431 | |
9432 CCL_MAIN_BLOCK := CCL_BLOCK | |
9433 CCL_EOF_BLOCK := CCL_BLOCK | |
9434 | |
9435 CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...]) | |
9436 STATEMENT := | |
9437 SET | IF | BRANCH | LOOP | REPEAT | BREAK | |
9438 | READ | WRITE | |
9439 | |
9440 SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION) | |
9441 | INT-OR-CHAR | |
9442 | |
9443 EXPRESSION := ARG | (EXPRESSION OP ARG) | |
9444 | |
9445 IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK) | |
9446 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...]) | |
9447 LOOP := (loop STATEMENT [STATEMENT ...]) | |
9448 BREAK := (break) | |
9449 REPEAT := (repeat) | |
9450 | (write-repeat [REG | INT-OR-CHAR | string]) | |
9451 | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?) | |
9452 READ := (read REG) | (read REG REG) | |
9453 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK) | |
9454 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...]) | |
9455 WRITE := (write REG) | (write REG REG) | |
9456 | (write INT-OR-CHAR) | (write STRING) | STRING | |
9457 | (write REG ARRAY) | |
9458 END := (end) | |
9459 | |
9460 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | |
9461 ARG := REG | INT-OR-CHAR | |
9462 OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | // | |
9463 | < | > | == | <= | >= | != | |
9464 SELF_OP := | |
9465 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>= | |
9466 ARRAY := '[' INT-OR-CHAR ... ']' | |
9467 INT-OR-CHAR := INT | CHAR | |
9468 | |
9469 MACHINE CODE: | 12546 MACHINE CODE: |
9470 | 12547 |
9471 The machine code consists of a vector of 32-bit words. | 12548 The machine code consists of a vector of 32-bit words. |
9472 The first such word specifies the start of the EOF section of the code; | 12549 The first such word specifies the start of the EOF section of the code; |
9473 this is the code executed to handle any stuff that needs to be done | 12550 this is the code executed to handle any stuff that needs to be done |
9585 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR | 12662 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR |
9586 ............rrr | 12663 ............rrr |
9587 ..........AAAAA | 12664 ..........AAAAA |
9588 @end example | 12665 @end example |
9589 | 12666 |
9590 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top | 12667 @node Modules for Internationalization, , CCL, Multilingual Support |
12668 @section Modules for Internationalization | |
12669 @cindex modules for internationalization | |
12670 @cindex internationalization, modules for | |
12671 | |
12672 @example | |
12673 @file{mule-canna.c} | |
12674 @file{mule-ccl.c} | |
12675 @file{mule-charset.c} | |
12676 @file{mule-charset.h} | |
12677 @file{file-coding.c} | |
12678 @file{file-coding.h} | |
12679 @file{mule-coding.c} | |
12680 @file{mule-mcpath.c} | |
12681 @file{mule-mcpath.h} | |
12682 @file{mule-wnnfns.c} | |
12683 @file{mule.c} | |
12684 @end example | |
12685 | |
12686 These files implement the MULE (Asian-language) support. Note that MULE | |
12687 actually provides a general interface for all sorts of languages, not | |
12688 just Asian languages (although they are generally the most complicated | |
12689 to support). This code is still in beta. | |
12690 | |
12691 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the | |
12692 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset} | |
12693 Lisp object type, which encapsulates a character set (an ordered one- or | |
12694 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese | |
12695 Kanji). | |
12696 | |
12697 @file{file-coding.*} implements the @dfn{coding-system} Lisp object | |
12698 type, which encapsulates a method of converting between different | |
12699 encodings. An encoding is a representation of a stream of characters, | |
12700 possibly from multiple character sets, using a stream of bytes or words, | |
12701 and defines (e.g.) which escape sequences are used to specify particular | |
12702 character sets, how the indices for a character are converted into bytes | |
12703 (sometimes this involves setting the high bit; sometimes complicated | |
12704 rearranging of the values takes place, as in the Shift-JIS encoding), | |
12705 etc. It also contains some generic coding system implementations, such | |
12706 as the binary (no-conversion) coding system and a sample gzip coding system. | |
12707 | |
12708 @file{mule-coding.c} contains the implementations of text coding systems. | |
12709 | |
12710 @file{mule-ccl.c} provides the CCL (Code Conversion Language) | |
12711 interpreter. CCL is similar in spirit to Lisp byte code and is used to | |
12712 implement converters for custom encodings. | |
12713 | |
12714 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to | |
12715 external programs used to implement the Canna and WNN input methods, | |
12716 respectively. This is currently in beta. | |
12717 | |
12718 @file{mule-mcpath.c} provides some functions to allow for pathnames | |
12719 containing extended characters. This code is fragmentary, obsolete, and | |
12720 completely non-working. Instead, @code{file-name-coding-system} is used | |
12721 to specify conversions of names of files and directories. The standard | |
12722 C I/O functions like @samp{open()} are wrapped so that conversion occurs | |
12723 automatically. | |
12724 | |
12725 @file{mule.c} contains a few miscellaneous things. It currently seems | |
12726 to be unused and probably should be removed. | |
12727 | |
12728 | |
12729 | |
12730 @example | |
12731 @file{intl.c} | |
12732 @end example | |
12733 | |
12734 This provides some miscellaneous internationalization code for | |
12735 implementing message translation and interfacing to the Ximp input | |
12736 method. None of this code is currently working. | |
12737 | |
12738 | |
12739 | |
12740 @example | |
12741 @file{iso-wide.h} | |
12742 @end example | |
12743 | |
12744 This contains leftover code from an earlier implementation of | |
12745 Asian-language support, and is not currently used. | |
12746 | |
12747 | |
12748 @node The Lisp Reader and Compiler, Lstreams, Multilingual Support, Top | |
9591 @chapter The Lisp Reader and Compiler | 12749 @chapter The Lisp Reader and Compiler |
9592 @cindex Lisp reader and compiler, the | 12750 @cindex Lisp reader and compiler, the |
9593 @cindex reader and compiler, the Lisp | 12751 @cindex reader and compiler, the Lisp |
9594 @cindex compiler, the Lisp reader and | 12752 @cindex compiler, the Lisp reader and |
9595 | 12753 |
9614 * Lstream Types:: Different sorts of things that are streamed. | 12772 * Lstream Types:: Different sorts of things that are streamed. |
9615 * Lstream Functions:: Functions for working with lstreams. | 12773 * Lstream Functions:: Functions for working with lstreams. |
9616 * Lstream Methods:: Creating new lstream types. | 12774 * Lstream Methods:: Creating new lstream types. |
9617 @end menu | 12775 @end menu |
9618 | 12776 |
9619 @node Creating an Lstream | 12777 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams |
9620 @section Creating an Lstream | 12778 @section Creating an Lstream |
9621 @cindex lstream, creating an | 12779 @cindex lstream, creating an |
9622 | 12780 |
9623 Lstreams come in different types, depending on what is being interfaced | 12781 Lstreams come in different types, depending on what is being interfaced |
9624 to. Although the primitive for creating new lstreams is | 12782 to. Although the primitive for creating new lstreams is |
9646 Open for reading, but ``read'' never returns partial MULE characters. | 12804 Open for reading, but ``read'' never returns partial MULE characters. |
9647 @item "wc" | 12805 @item "wc" |
9648 Open for writing, but never writes partial MULE characters. | 12806 Open for writing, but never writes partial MULE characters. |
9649 @end table | 12807 @end table |
9650 | 12808 |
9651 @node Lstream Types | 12809 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams |
9652 @section Lstream Types | 12810 @section Lstream Types |
9653 @cindex lstream types | 12811 @cindex lstream types |
9654 @cindex types, lstream | 12812 @cindex types, lstream |
9655 | 12813 |
9656 @table @asis | 12814 @table @asis |
9673 @item decoding | 12831 @item decoding |
9674 | 12832 |
9675 @item encoding | 12833 @item encoding |
9676 @end table | 12834 @end table |
9677 | 12835 |
9678 @node Lstream Functions | 12836 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams |
9679 @section Lstream Functions | 12837 @section Lstream Functions |
9680 @cindex lstream functions | 12838 @cindex lstream functions |
9681 | 12839 |
9682 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) | 12840 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode}) |
9683 Allocate and return a new Lstream. This function is not really meant to | 12841 Allocate and return a new Lstream. This function is not really meant to |
9757 | 12915 |
9758 @deftypefun void Lstream_rewind (Lstream *@var{stream}) | 12916 @deftypefun void Lstream_rewind (Lstream *@var{stream}) |
9759 Rewind the stream to the beginning. | 12917 Rewind the stream to the beginning. |
9760 @end deftypefun | 12918 @end deftypefun |
9761 | 12919 |
9762 @node Lstream Methods | 12920 @node Lstream Methods, , Lstream Functions, Lstreams |
9763 @section Lstream Methods | 12921 @section Lstream Methods |
9764 @cindex lstream methods | 12922 @cindex lstream methods |
9765 | 12923 |
9766 @deftypefn {Lstream Method} Bytecount reader (Lstream *@var{stream}, unsigned char *@var{data}, Bytecount @var{size}) | 12924 @deftypefn {Lstream Method} Bytecount reader (Lstream *@var{stream}, unsigned char *@var{data}, Bytecount @var{size}) |
9767 Read some data from the stream's end and store it into @var{data}, which | 12925 Read some data from the stream's end and store it into @var{data}, which |
9831 @cindex devices; frames; windows, consoles; | 12989 @cindex devices; frames; windows, consoles; |
9832 @cindex frames; windows, consoles; devices; | 12990 @cindex frames; windows, consoles; devices; |
9833 @cindex windows, consoles; devices; frames; | 12991 @cindex windows, consoles; devices; frames; |
9834 | 12992 |
9835 @menu | 12993 @menu |
9836 * Introduction to Consoles; Devices; Frames; Windows:: | 12994 * Introduction to Consoles; Devices; Frames; Windows:: |
9837 * Point:: | 12995 * Point:: |
9838 * Window Hierarchy:: | 12996 * Window Hierarchy:: |
9839 * The Window Object:: | 12997 * The Window Object:: |
12998 * Modules for the Basic Displayable Lisp Objects:: | |
9840 @end menu | 12999 @end menu |
9841 | 13000 |
9842 @node Introduction to Consoles; Devices; Frames; Windows | 13001 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows |
9843 @section Introduction to Consoles; Devices; Frames; Windows | 13002 @section Introduction to Consoles; Devices; Frames; Windows |
9844 @cindex consoles; devices; frames; windows, introduction to | 13003 @cindex consoles; devices; frames; windows, introduction to |
9845 @cindex devices; frames; windows, introduction to consoles; | 13004 @cindex devices; frames; windows, introduction to consoles; |
9846 @cindex frames; windows, introduction to consoles; devices; | 13005 @cindex frames; windows, introduction to consoles; devices; |
9847 @cindex windows, introduction to consoles; devices; frames; | 13006 @cindex windows, introduction to consoles; devices; frames; |
9883 window, but every frame remembers the last window in it that was | 13042 window, but every frame remembers the last window in it that was |
9884 selected, and changing the selected frame causes the remembered window | 13043 selected, and changing the selected frame causes the remembered window |
9885 within it to become the selected window. Similar relationships apply | 13044 within it to become the selected window. Similar relationships apply |
9886 for consoles to devices and devices to frames. | 13045 for consoles to devices and devices to frames. |
9887 | 13046 |
9888 @node Point | 13047 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows |
9889 @section Point | 13048 @section Point |
9890 @cindex point | 13049 @cindex point |
9891 | 13050 |
9892 Recall that every buffer has a current insertion position, called | 13051 Recall that every buffer has a current insertion position, called |
9893 @dfn{point}. Now, two or more windows may be displaying the same buffer, | 13052 @dfn{point}. Now, two or more windows may be displaying the same buffer, |
9905 want to retrieve the correct value of @code{point} for a window, | 13064 want to retrieve the correct value of @code{point} for a window, |
9906 you must special-case on the selected window and retrieve the | 13065 you must special-case on the selected window and retrieve the |
9907 buffer's point instead. This is related to why @code{save-window-excursion} | 13066 buffer's point instead. This is related to why @code{save-window-excursion} |
9908 does not save the selected window's value of @code{point}. | 13067 does not save the selected window's value of @code{point}. |
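The special case described above can be sketched in C. This is a toy model only: the `toy_buffer` and `toy_window` structures and the `toy_window_point` function are hypothetical names invented for illustration and do not correspond to the actual XEmacs structures or accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real buffer and window objects. */
struct toy_buffer {
  long point;               /* the buffer's own, always-current point */
};

struct toy_window {
  struct toy_buffer *buffer;
  long saved_point;         /* per-window point; may be stale while selected */
};

/* Return the correct value of point for window W.  For the selected
   window the buffer's point is authoritative; for every other window
   the per-window value is used. */
long
toy_window_point (const struct toy_window *w,
                  const struct toy_window *selected)
{
  assert (w != NULL && w->buffer != NULL);
  if (w == selected)
    return w->buffer->point;
  return w->saved_point;
}
```

Calling `toy_window_point (w, selected)` with `w == selected` ignores `saved_point` entirely, which is the behavior the paragraph above asks callers to implement.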
9909 | 13068 |
9910 @node Window Hierarchy | 13069 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows |
9911 @section Window Hierarchy | 13070 @section Window Hierarchy |
9912 @cindex window hierarchy | 13071 @cindex window hierarchy |
9913 @cindex hierarchy of windows | 13072 @cindex hierarchy of windows |
9914 | 13073 |
9915 If a frame contains multiple windows (panes), they are always created | 13074 If a frame contains multiple windows (panes), they are always created |
10003 frames have no root window, and the @code{next} of the minibuffer window | 13162 frames have no root window, and the @code{next} of the minibuffer window |
10004 is @code{nil} but the @code{prev} points to itself. (#### This is an | 13163 is @code{nil} but the @code{prev} points to itself. (#### This is an |
10005 artifact that should be fixed.) | 13164 artifact that should be fixed.) |
10006 @end enumerate | 13165 @end enumerate |
10007 | 13166 |
10008 @node The Window Object | 13167 @node The Window Object, Modules for the Basic Displayable Lisp Objects, Window Hierarchy, Consoles; Devices; Frames; Windows |
10009 @section The Window Object | 13168 @section The Window Object |
10010 @cindex window object, the | 13169 @cindex window object, the |
10011 @cindex object, the window | 13170 @cindex object, the window |
10012 | 13171 |
10013 Windows have the following accessible fields: | 13172 Windows have the following accessible fields: |
10110 If the region (or part of it) is highlighted in this window, this field | 13269 If the region (or part of it) is highlighted in this window, this field |
10111 holds the mark position that made one end of that region. Otherwise, | 13270 holds the mark position that made one end of that region. Otherwise, |
10112 this field is @code{nil}. | 13271 this field is @code{nil}. |
10113 @end table | 13272 @end table |
10114 | 13273 |
13274 @node Modules for the Basic Displayable Lisp Objects, , The Window Object, Consoles; Devices; Frames; Windows | |
13275 @section Modules for the Basic Displayable Lisp Objects | |
13276 @cindex modules for the basic displayable Lisp objects | |
13277 @cindex displayable Lisp objects, modules for the basic | |
13278 @cindex Lisp objects, modules for the basic displayable | |
13279 @cindex objects, modules for the basic displayable Lisp | |
13280 | |
13281 @example | |
13282 @file{console-msw.c} | |
13283 @file{console-msw.h} | |
13284 @file{console-stream.c} | |
13285 @file{console-stream.h} | |
13286 @file{console-tty.c} | |
13287 @file{console-tty.h} | |
13288 @file{console-x.c} | |
13289 @file{console-x.h} | |
13290 @file{console.c} | |
13291 @file{console.h} | |
13292 @end example | |
13293 | |
13294 These modules implement the @dfn{console} Lisp object type. A console | |
13295 contains multiple display devices, but only one keyboard and mouse. | |
13296 Most of the time, a console will contain exactly one device. | |
13297 | |
13298 Consoles are the top of a Lisp object inclusion hierarchy. Consoles | |
13299 contain devices, which contain frames, which contain windows. | |
13300 | |
13301 | |
13302 | |
13303 @example | |
13304 @file{device-msw.c} | |
13305 @file{device-tty.c} | |
13306 @file{device-x.c} | |
13307 @file{device.c} | |
13308 @file{device.h} | |
13309 @end example | |
13310 | |
13311 These modules implement the @dfn{device} Lisp object type. This | |
13312 abstracts a particular screen or connection on which frames are | |
13313 displayed. As with Lisp objects, event interfaces, and other | |
13314 subsystems, the device code is separated into a generic component that | |
13315 contains a standardized interface (in the form of a set of methods) onto | |
13316 particular device types. | |
13317 | |
13318 The device subsystem defines all the methods and provides method | |
13319 services for not only device operations but also for the frame, window, | |
13320 menubar, scrollbar, toolbar, and other displayable-object subsystems. | |
13321 The reason for this is that all of these subsystems have the same | |
13322 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do. | |
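As a rough illustration of the method-table idea, the following C sketch shows generic code calling a frame-level operation through a table owned by the device type. All names here (`toy_device_methods`, `toy_x_methods`, `toy_tty_methods`, `toy_frame_is_pixel_based`) are hypothetical and chosen for the example; the real XEmacs method tables are much larger and are declared differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method table: one slot per type-specific operation.
   Device-level and frame-level methods live in the same table, since
   both share the same set of subtypes. */
struct toy_device_methods {
  const char *type_name;
  int (*device_is_graphical) (void);     /* device-level method */
  int (*frame_is_pixel_based) (void);    /* frame-level method */
};

static int yes (void) { return 1; }
static int no (void) { return 0; }

/* One table per device type. */
const struct toy_device_methods toy_x_methods = { "x", yes, yes };
const struct toy_device_methods toy_tty_methods = { "tty", no, no };

struct toy_device {
  const struct toy_device_methods *methods;
};

/* Generic code: it never tests the device type directly, it only
   calls through the table attached to the device. */
int
toy_frame_is_pixel_based (const struct toy_device *d)
{
  assert (d != NULL && d->methods != NULL);
  return d->methods->frame_is_pixel_based ();
}
```

Supporting a new device type in this sketch means filling in one more table; the generic function stays unchanged.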
13323 | |
13324 | |
13325 | |
13326 @example | |
13327 @file{frame-msw.c} | |
13328 @file{frame-tty.c} | |
13329 @file{frame-x.c} | |
13330 @file{frame.c} | |
13331 @file{frame.h} | |
13332 @end example | |
13333 | |
13334 Each device contains one or more frames in which objects (e.g. text) are | |
13335 displayed. A frame corresponds to a window in the window system; | |
13336 usually this is a top-level window but it could potentially be one of a | |
13337 number of overlapping child windows within a top-level window, using the | |
13338 MDI (Multiple Document Interface) protocol in Microsoft Windows or a | |
13339 similar scheme. | |
13340 | |
13341 The @file{frame-*} files implement the @dfn{frame} Lisp object type and | |
13342 provide the generic and device-type-specific operations on frames | |
13343 (e.g. raising, lowering, resizing, moving, etc.). | |
13344 | |
13345 | |
13346 | |
13347 @example | |
13348 @file{window.c} | |
13349 @file{window.h} | |
13350 @end example | |
13351 | |
13352 @cindex window (in Emacs) | |
13353 @cindex pane | |
13354 Each frame consists of one or more non-overlapping @dfn{windows} (better | |
13355 known as @dfn{panes} in standard window-system terminology) in which a | |
13356 buffer's text can be displayed. Windows can also have scrollbars | |
13357 displayed around their edges. | |
13358 | |
13359 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp | |
13360 object type and provide code to manage windows. Since windows have no | |
13361 associated resources in the window system (the window system knows only | |
13362 about the frame; no child windows or anything are used for XEmacs | |
13363 windows), there is no device-type-specific code here; all of that code | |
13364 is part of the redisplay mechanism or the code for particular object | |
13365 types such as scrollbars. | |
13366 | |
10115 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top | 13367 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top |
10116 @chapter The Redisplay Mechanism | 13368 @chapter The Redisplay Mechanism |
10117 @cindex redisplay mechanism, the | 13369 @cindex redisplay mechanism, the |
10118 | 13370 |
10119 The redisplay mechanism is one of the most complicated sections of | 13371 The redisplay mechanism is one of the most complicated sections of |
10133 @item | 13385 @item |
10134 It Is Better To Be Fast Than Not To Be. | 13386 It Is Better To Be Fast Than Not To Be. |
10135 @end enumerate | 13387 @end enumerate |
10136 | 13388 |
10137 @menu | 13389 @menu |
10138 * Critical Redisplay Sections:: | 13390 * Critical Redisplay Sections:: |
10139 * Line Start Cache:: | 13391 * Line Start Cache:: |
10140 * Redisplay Piece by Piece:: | 13392 * Redisplay Piece by Piece:: |
13393 * Modules for the Redisplay Mechanism:: | |
13394 * Modules for other Display-Related Lisp Objects:: | |
10141 @end menu | 13395 @end menu |
10142 | 13396 |
10143 @node Critical Redisplay Sections | 13397 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism |
10144 @section Critical Redisplay Sections | 13398 @section Critical Redisplay Sections |
10145 @cindex redisplay sections, critical | 13399 @cindex redisplay sections, critical |
10146 @cindex critical redisplay sections | 13400 @cindex critical redisplay sections |
10147 | 13401 |
10148 Within this section, we are defenseless and assume that the | 13402 Within this section, we are defenseless and assume that the |
10171 we simply return. #### We should abort instead. | 13425 we simply return. #### We should abort instead. |
10172 | 13426 |
10173 #### If a frame-size change does occur we should probably | 13427 #### If a frame-size change does occur we should probably |
10174 actually be preempting redisplay. | 13428 actually be preempting redisplay. |
10175 | 13429 |
10176 @node Line Start Cache | 13430 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism |
10177 @section Line Start Cache | 13431 @section Line Start Cache |
10178 @cindex line start cache | 13432 @cindex line start cache |
10179 | 13433 |
10180 The traditional scrolling code in Emacs breaks in a variable height | 13434 The traditional scrolling code in Emacs breaks in a variable height |
10181 world. It depends on the key assumption that the number of lines that | 13435 world. It depends on the key assumption that the number of lines that |
10232 @end itemize | 13486 @end itemize |
10233 | 13487 |
10234 In case you're wondering, the Second Golden Rule of Redisplay is not | 13488 In case you're wondering, the Second Golden Rule of Redisplay is not |
10235 applicable. | 13489 applicable. |
10236 | 13490 |
10237 @node Redisplay Piece by Piece | 13491 @node Redisplay Piece by Piece, Modules for the Redisplay Mechanism, Line Start Cache, The Redisplay Mechanism |
10238 @section Redisplay Piece by Piece | 13492 @section Redisplay Piece by Piece |
10239 @cindex redisplay piece by piece | 13493 @cindex redisplay piece by piece |
10240 | 13494 |
10241 As you can begin to see redisplay is complex and also not well | 13495 As you can begin to see redisplay is complex and also not well |
10242 documented. Chuck no longer works on XEmacs so this section is my take | 13496 documented. Chuck no longer works on XEmacs so this section is my take |
10282 a string we cannot use @code{create_text_block}. Instead we use | 13536 a string we cannot use @code{create_text_block}. Instead we use |
10283 @code{create_text_string_block} which performs the same function as | 13537 @code{create_text_string_block} which performs the same function as |
10284 @code{create_text_block} but for strings. Many of the complexities of | 13538 @code{create_text_block} but for strings. Many of the complexities of |
10285 @code{create_text_block} to do with cursor handling and selective | 13539 @code{create_text_block} to do with cursor handling and selective |
10286 display have been removed. | 13540 display have been removed. |
13541 | |
13542 @node Modules for the Redisplay Mechanism, Modules for other Display-Related Lisp Objects, Redisplay Piece by Piece, The Redisplay Mechanism | |
13543 @section Modules for the Redisplay Mechanism | |
13544 @cindex modules for the redisplay mechanism | |
13545 @cindex redisplay mechanism, modules for the | |
13546 | |
13547 @example | |
13548 @file{redisplay-output.c} | |
13549 @file{redisplay-msw.c} | |
13550 @file{redisplay-tty.c} | |
13551 @file{redisplay-x.c} | |
13552 @file{redisplay.c} | |
13553 @file{redisplay.h} | |
13554 @end example | |
13555 | |
13556 These files provide the redisplay mechanism. As with many other | |
13557 subsystems in XEmacs, there is a clean separation between the general | |
13558 and device-specific support. | |
13559 | |
13560 @file{redisplay.c} contains the bulk of the redisplay engine. These | |
13561 functions update the redisplay structures (which describe how the screen | |
13562 is to appear) to reflect any changes made to the state of any | |
13563 displayable objects (buffer, frame, window, etc.) since the last time | |
13564 that redisplay was called. These functions are highly optimized to | |
13565 avoid doing more work than necessary (since redisplay is called | |
13566 extremely often and is potentially a huge time sink), and depend heavily | |
13567 on notifications from the objects themselves that changes have occurred, | |
13568 so that redisplay doesn't explicitly have to check each possible object. | |
13569 The redisplay mechanism also contains a great deal of caching to further | |
13570 speed things up; some of this caching is contained within the various | |
13571 displayable objects. | |
13572 | |
13573 @file{redisplay-output.c} goes through the redisplay structures and converts | |
13574 them into calls to device-specific methods to actually output the screen | |
13575 changes. | |
13576 | |
13577 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations | |
13578 of these redisplay output methods, for X frames and TTY frames, | |
13579 respectively. | |
13580 | |
13581 | |
13582 | |
13583 @example | |
13584 @file{indent.c} | |
13585 @end example | |
13586 | |
13587 This module contains various functions and Lisp primitives for | |
13588 converting between buffer positions and screen positions. These | |
13589 functions call the redisplay mechanism to do most of the work, and then | |
13590 examine the redisplay structures to get the necessary information. This | |
13591 module needs work. | |
13592 | |
13593 | |
13594 | |
13595 @example | |
13596 @file{termcap.c} | |
13597 @file{terminfo.c} | |
13598 @file{tparam.c} | |
13599 @end example | |
13600 | |
13601 These files contain functions for working with the termcap (BSD-style) | |
13602 and terminfo (System V style) databases of terminal capabilities and | |
13603 escape sequences, used when XEmacs is displaying in a TTY. | |
13604 | |
13605 | |
13606 | |
13607 @example | |
13608 @file{cm.c} | |
13609 @file{cm.h} | |
13610 @end example | |
13611 | |
13612 These files provide some miscellaneous TTY-output functions and should | |
13613 probably be merged into @file{redisplay-tty.c}. | |
13614 | |
13615 | |
13616 | |
13617 @node Modules for other Display-Related Lisp Objects, , Modules for the Redisplay Mechanism, The Redisplay Mechanism | |
13618 @section Modules for other Display-Related Lisp Objects | |
13619 @cindex modules for other display-related Lisp objects | |
13620 @cindex display-related Lisp objects, modules for other | |
13621 @cindex Lisp objects, modules for other display-related | |
13622 | |
13623 @example | |
13624 @file{faces.c} | |
13625 @file{faces.h} | |
13626 @end example | |
13627 | |
13628 | |
13629 | |
13630 @example | |
13631 @file{bitmaps.h} | |
13632 @file{glyphs-eimage.c} | |
13633 @file{glyphs-msw.c} | |
13634 @file{glyphs-msw.h} | |
13635 @file{glyphs-widget.c} | |
13636 @file{glyphs-x.c} | |
13637 @file{glyphs-x.h} | |
13638 @file{glyphs.c} | |
13639 @file{glyphs.h} | |
13640 @end example | |
13641 | |
13642 | |
13643 | |
13644 @example | |
13645 @file{objects-msw.c} | |
13646 @file{objects-msw.h} | |
13647 @file{objects-tty.c} | |
13648 @file{objects-tty.h} | |
13649 @file{objects-x.c} | |
13650 @file{objects-x.h} | |
13651 @file{objects.c} | |
13652 @file{objects.h} | |
13653 @end example | |
13654 | |
13655 | |
13656 | |
13657 @example | |
13658 @file{menubar-msw.c} | |
13659 @file{menubar-msw.h} | |
13660 @file{menubar-x.c} | |
13661 @file{menubar.c} | |
13662 @file{menubar.h} | |
13663 @end example | |
13664 | |
13665 | |
13666 | |
13667 @example | |
13668 @file{scrollbar-msw.c} | |
13669 @file{scrollbar-msw.h} | |
13670 @file{scrollbar-x.c} | |
13671 @file{scrollbar-x.h} | |
13672 @file{scrollbar.c} | |
13673 @file{scrollbar.h} | |
13674 @end example | |
13675 | |
13676 | |
13677 | |
13678 @example | |
13679 @file{toolbar-msw.c} | |
13680 @file{toolbar-x.c} | |
13681 @file{toolbar.c} | |
13682 @file{toolbar.h} | |
13683 @end example | |
13684 | |
13685 | |
13686 | |
13687 @example | |
13688 @file{font-lock.c} | |
13689 @end example | |
13690 | |
13691 This file provides C support for syntax highlighting---i.e. | |
13692 highlighting different syntactic constructs of a source file in | |
13693 different colors, for easy reading. The C support is provided so that | |
13694 this is fast. | |
13695 | |
13696 | |
13697 | |
13698 @example | |
13699 @file{dgif_lib.c} | |
13700 @file{gif_err.c} | |
13701 @file{gif_lib.h} | |
13702 @file{gifalloc.c} | |
13703 @end example | |
13704 | |
13705 These modules decode GIF-format image files, for use with glyphs. | |
13706 These files were removed due to Unisys patent infringement concerns. | |
13707 | |
10287 | 13708 |
10288 @node Extents, Faces, The Redisplay Mechanism, Top | 13709 @node Extents, Faces, The Redisplay Mechanism, Top |
10289 @chapter Extents | 13710 @chapter Extents |
10290 @cindex extents | 13711 @cindex extents |
10291 | 13712 |
10296 * Zero-Length Extents:: A weird special case. | 13717 * Zero-Length Extents:: A weird special case. |
10297 * Mathematics of Extent Ordering:: A rigorous foundation. | 13718 * Mathematics of Extent Ordering:: A rigorous foundation. |
10298 * Extent Fragments:: Cached information useful for redisplay. | 13719 * Extent Fragments:: Cached information useful for redisplay. |
10299 @end menu | 13720 @end menu |
10300 | 13721 |
10301 @node Introduction to Extents | 13722 @node Introduction to Extents, Extent Ordering, Extents, Extents |
10302 @section Introduction to Extents | 13723 @section Introduction to Extents |
10303 @cindex extents, introduction to | 13724 @cindex extents, introduction to |
10304 | 13725 |
10305 Extents are regions over a buffer, with a start and an end position | 13726 Extents are regions over a buffer, with a start and an end position |
10306 denoting the region of the buffer included in the extent. In | 13727 denoting the region of the buffer included in the extent. In |
10319 automatically go inside or out of extents as necessary with no | 13740 automatically go inside or out of extents as necessary with no |
10320 further work needing to be done. It didn't work out that way, | 13741 further work needing to be done. It didn't work out that way, |
10321 however, and just ended up complexifying and buggifying all the | 13742 however, and just ended up complexifying and buggifying all the |
10322 rest of the code.) | 13743 rest of the code.) |
10323 | 13744 |
10324 @node Extent Ordering | 13745 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents |
10325 @section Extent Ordering | 13746 @section Extent Ordering |
10326 @cindex extent ordering | 13747 @cindex extent ordering |
10327 | 13748 |
10328 Extents are compared using memory indices. There are two orderings | 13749 Extents are compared using memory indices. There are two orderings |
10329 for extents and both orders are kept current at all times. The normal | 13750 for extents and both orders are kept current at all times. The normal |
10354 The display order and the e-order are complementary orders: any | 13775 The display order and the e-order are complementary orders: any |
10355 theorem about the display order also applies to the e-order if you swap | 13776 theorem about the display order also applies to the e-order if you swap |
10356 all occurrences of ``display order'' and ``e-order'', ``less than'' and | 13777 all occurrences of ``display order'' and ``e-order'', ``less than'' and |
10357 ``greater than'', and ``extent start'' and ``extent end''. | 13778 ``greater than'', and ``extent start'' and ``extent end''. |
10358 | 13779 |
10359 @node Format of the Extent Info | 13780 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents |
10360 @section Format of the Extent Info | 13781 @section Format of the Extent Info |
10361 @cindex extent info, format of the | 13782 @cindex extent info, format of the |
10362 | 13783 |
10363 An extent-info structure consists of a list of the buffer or string's | 13784 An extent-info structure consists of a list of the buffer or string's |
10364 extents and a @dfn{stack of extents} that lists all of the extents over | 13785 extents and a @dfn{stack of extents} that lists all of the extents over |
10417 An alternative would be balanced binary trees, which have guaranteed | 13838 An alternative would be balanced binary trees, which have guaranteed |
10418 @math{O(log N)} time for all operations (although the constant factors | 13839 @math{O(log N)} time for all operations (although the constant factors |
10419 are not as good, and repeated localized operations will be slower than | 13840 are not as good, and repeated localized operations will be slower than |
10420 for a gap array). Such code is quite tricky to write, however. | 13841 for a gap array). Such code is quite tricky to write, however. |
10421 | 13842 |
10422 @node Zero-Length Extents | 13843 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents |
10423 @section Zero-Length Extents | 13844 @section Zero-Length Extents |
10424 @cindex zero-length extents | 13845 @cindex zero-length extents |
10425 @cindex extents, zero-length | 13846 @cindex extents, zero-length |
10426 | 13847 |
10427 Extents can be zero-length, and will end up that way if their endpoints | 13848 Extents can be zero-length, and will end up that way if their endpoints |
10448 | 13869 |
10449 Note that closed-open, non-detachable zero-length extents behave | 13870 Note that closed-open, non-detachable zero-length extents behave |
10450 exactly like markers and that open-closed, non-detachable zero-length | 13871 exactly like markers and that open-closed, non-detachable zero-length |
10451 extents behave like the ``point-type'' marker in Mule. | 13872 extents behave like the ``point-type'' marker in Mule. |
10452 | 13873 |
10453 @node Mathematics of Extent Ordering | 13874 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents |
10454 @section Mathematics of Extent Ordering | 13875 @section Mathematics of Extent Ordering |
10455 @cindex mathematics of extent ordering | 13876 @cindex mathematics of extent ordering |
10456 @cindex extent mathematics | 13877 @cindex extent mathematics |
10457 @cindex extent ordering | 13878 @cindex extent ordering |
10458 | 13879 |
10576 Proof: If @math{F2} does not include @math{I} then its start index is | 13997 Proof: If @math{F2} does not include @math{I} then its start index is |
10577 greater than @math{I} and thus it is greater than any extent in | 13998 greater than @math{I} and thus it is greater than any extent in |
10578 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} | 13999 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} |
10579 and thus is in @math{S}, and thus @math{F2 >= F}. | 14000 and thus is in @math{S}, and thus @math{F2 >= F}. |
10580 | 14001 |
10581 @node Extent Fragments | 14002 @node Extent Fragments, , Mathematics of Extent Ordering, Extents |
10582 @section Extent Fragments | 14003 @section Extent Fragments |
10583 @cindex extent fragments | 14004 @cindex extent fragments |
10584 @cindex fragments, extent | 14005 @cindex fragments, extent |
10585 | 14006 |
10586 Imagine that the buffer is divided up into contiguous, non-overlapping | 14007 Imagine that the buffer is divided up into contiguous, non-overlapping |
10759 @chapter Specifiers | 14180 @chapter Specifiers |
10760 @cindex specifiers | 14181 @cindex specifiers |
10761 | 14182 |
10762 Not yet documented. | 14183 Not yet documented. |
10763 | 14184 |
14185 Specifiers are documented in depth in the Lisp Reference manual. | |
14186 @xref{Specifiers,,, lispref, XEmacs Lisp Reference Manual}. The code in | |
14187 @file{specifier.c} is pretty straightforward. | |
14188 | |
10764 @node Menus, Subprocesses, Specifiers, Top | 14189 @node Menus, Subprocesses, Specifiers, Top |
10765 @chapter Menus | 14190 @chapter Menus |
10766 @cindex menus | 14191 @cindex menus |
10767 | 14192 |
10768 A menu is set by setting the value of the variable | 14193 A menu is set by setting the value of the variable |
10812 @code{menubar_selection_callback()} enqueues a menu event, putting in it | 14237 @code{menubar_selection_callback()} enqueues a menu event, putting in it |
10813 a function to call (either @code{eval} or @code{call-interactively}) and | 14238 a function to call (either @code{eval} or @code{call-interactively}) and |
10814 its argument, which is the callback function or form given in the menu's | 14239 its argument, which is the callback function or form given in the menu's |
10815 description. | 14240 description. |
10816 | 14241 |
10817 @node Subprocesses, Interface to the X Window System, Menus, Top | 14242 @node Subprocesses, Interface to MS Windows, Menus, Top |
10818 @chapter Subprocesses | 14243 @chapter Subprocesses |
10819 @cindex subprocesses | 14244 @cindex subprocesses |
10820 | 14245 |
10821 The fields of a process are: | 14246 The fields of a process are: |
10822 | 14247 |
10886 @item tty_name | 14311 @item tty_name |
10887 The name of the terminal that the subprocess is using, | 14312 The name of the terminal that the subprocess is using, |
10888 or @code{nil} if it is using pipes. | 14313 or @code{nil} if it is using pipes. |
10889 @end table | 14314 @end table |
10890 | 14315 |
10891 @node Interface to the X Window System, Index, Subprocesses, Top | 14316 @node Interface to MS Windows, Interface to the X Window System, Subprocesses, Top |
14317 @chapter Interface to MS Windows | |
14318 @cindex MS Windows, interface to | |
14319 @cindex Windows, interface to | |
14320 | |
14321 @menu | |
14322 * Different kinds of Windows environments:: | |
14323 * Windows Build Flags:: | |
14324 * Windows I18N Introduction:: | |
14325 * Modules for Interfacing with MS Windows:: | |
14326 @end menu | |
14327 | |
14328 @node Different kinds of Windows environments, Windows Build Flags, Interface to MS Windows, Interface to MS Windows | |
14329 @section Different kinds of Windows environments | |
14330 @cindex different kinds of Windows environments | |
14331 @cindex Windows environments, different kinds of | |
14332 @cindex MS Windows environments, different kinds of | |
14333 | |
14334 @subsubheading (a) operating system (OS) vs. window system vs. Win32 API vs. C runtime library (CRT) vs. compiler | |
14335 | |
14336 There are various Windows operating systems (Windows NT, 2000, XP, 95, | |
14337 98, ME, etc.), which come in two basic classes: Windows NT (NT, 2000, | |
14338 XP, and all future versions) and 9x (95, 98, ME). 9x-class operating | |
14339 systems are a kind of hodgepodge of a 32-bit upper layer on top of a | |
14340 16-bit MS-DOS-compatible lower layer. NT-class operating systems are | |
14341 written from the ground up as 32-bit (there are also 64-bit versions | |
14342 available now), and provide many more features and much greater | |
14343 stability, since there is full memory protection between all processes | |
14344 and between processes and the system. NT-class operating systems | |
14345 also provide emulation for DOS programs inside of a "sandbox" (i.e. a | |
14346 walled-off environment in which one DOS program can screw up another | |
14347 one, but there is theoretically no way for a DOS program to screw up the | |
14348 OS itself). From the perspective of XEmacs, the difference between NT | |
14349 and 9x is very important in Unicode support (not really provided under | |
14350 9x -- see @file{intl-win32.c}) and subprocess creation, among other things. | |
14351 | |
14352 The operating system provides the framework for accessing files and | |
14353 devices and running programs. From the perspective of a program, the | |
14354 operating system provides a set of services. At the lowest level, the | |
14355 way to call these services is dependent on the processor the OS is | |
14356 running on, but a portable interface is provided to C programs through | |
14357 functions called "system calls". Under Windows, this interface is called | |
14358 the Win32 API, and includes file-manipulation calls such as @code{CreateFile()} | |
14359 and @code{ReadFile()}, process-creation calls such as @code{CreateProcess()}, etc. | |
14360 | |
14361 This concept of system calls goes back to Unix, where similar services | |
14362 are available but through routines with different, simpler names, such | |
14363 as @code{open()}, @code{read()}, @code{fork()}, @code{execve()}, etc. In addition, Unix provides | |
14364 a higher layer of routines, called the C Runtime Library (CRT), which | |
14365 provide higher-level, more convenient versions of the same services (e.g. | |
14366 "stream-oriented" file routines such as @code{fopen()} and @code{fread()}) as well | |
14367 as various other utility functions, such as string-manipulation routines | |
14368 (e.g. @code{strcpy()} and @code{strcmp()}). | |
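
The layering described above can be sketched as follows. This is a minimal illustration for a Unix-like system, not code taken from XEmacs: the function names @code{read_with_syscalls}, @code{read_with_stdio} and @code{write_sample} are hypothetical. The first routine reads a file with the system calls @code{open()} and @code{read()}; the second does the same job with the stream-oriented CRT routines @code{fopen()} and @code{fread()}.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read up to CAP bytes of PATH using the Unix system calls directly.
   Returns the number of bytes read, or -1 on error.  */
long
read_with_syscalls (const char *path, char *buf, size_t cap)
{
  long total = 0;
  ssize_t n = 0;
  int fd;

  assert (path != NULL && buf != NULL);
  fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;
  /* read() may return fewer bytes than requested, so loop until
     end-of-file, an error, or a full buffer.  */
  while ((size_t) total < cap
         && (n = read (fd, buf + total, cap - (size_t) total)) > 0)
    total += n;
  close (fd);
  return n < 0 ? -1 : total;
}

/* The same operation using the stream-oriented CRT routines.  */
long
read_with_stdio (const char *path, char *buf, size_t cap)
{
  FILE *fp;
  size_t n;

  assert (path != NULL && buf != NULL);
  fp = fopen (path, "rb");
  if (fp == NULL)
    return -1;
  n = fread (buf, 1, cap, fp);
  fclose (fp);
  return (long) n;
}

/* Helper for the demonstration: create a small file.
   Returns 0 on success.  */
int
write_sample (const char *path, const char *text)
{
  FILE *fp = fopen (path, "wb");

  if (fp == NULL)
    return -1;
  fputs (text, fp);
  return fclose (fp);
}
```

Both routines return the same bytes for the same file; the CRT version simply hides the file descriptor and the read loop behind a @code{FILE} stream.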
14369 | |
14370 For compatibility, a C Runtime Library (CRT) is also provided under | |
14371 Windows, which provides a partial implementation of both the Unix CRT | |
14372 and the Unix system-call API, implemented using the Win32 API. The CRT | |
14373 sources come with Visual C++ (VC++). For example, under VC++ 6, look in | |
14374 the CRT/SRC directory, e.g. for me (ben): /Program Files/Microsoft | |
14375 Visual Studio/VC98/CRT/SRC. The CRT is provided using either MSVCRT | |
14376 (dynamically linked) or @file{LIBC.LIB} (statically linked). | |
14377 | |
14378 The window system provides the framework for creating overlapped windows | |
14379 and unifying signals provided by various devices (input devices such as | |
14380 the keyboard and mouse, timers, etc.) into a single event queue (or | |
14381 "message queue", under Windows). Like the operating system, the window | |
14382 system can be viewed from the perspective of a program as a set of | |
14383 services provided by an API of function calls. Under Windows, | |
14384 window-system services are also available through the Win32 API, while | |
14385 under UNIX the window system is typically a separate component (e.g. the | |
14386 X Window System, aka X Windows or X11). The term "GUI" ("graphical | |
14387 user interface") is often used to refer to the services provided by the | |
14388 window system, or to a windowing interface provided by a program. | |
14389 | |
14390 The Win32 API is implemented by various dynamic libraries, or DLL's. | |
14391 The most important are KERNEL32, USER32, and GDI32. KERNEL32 implements | |
14392 the basic file-system and process services. USER32 implements the | |
14393 fundamental window-system services such as creating windows and handling | |
14394 messages. GDI32 implements higher-level drawing capabilities -- fonts, | |
14395 colors, lines, etc. | |
14396 | |
14397 C programs are compiled into executables using a compiler. Under Unix, | |
14398 a compiler usually comes as part of the operating system, but not under | |
14399 Windows, where the compiler is a separate product. Even under Unix, | |
14400 people often install their own compilers, such as gcc. Under Windows, | |
14401 the Microsoft-standard compiler is Visual C++ (VC++). | |
14402 | |
14403 It is possible to provide an emulation of any API using any other, as | |
14404 long as the underlying API provides the suitable functionality. This is | |
14405 what Cygwin (www.cygwin.com) does. It provides a fairly complete POSIX | |
14406 emulation layer (POSIX is an IEEE standard for Unix behavior) on | |
14407 top of MS Windows -- in particular, providing the file-system, process, | |
14408 tty, and signal semantics that are part of a modern, standard Unix | |
14409 operating system. Cygwin does this using its own DLL, @file{cygwin1.dll}, | |
14410 which makes calls to the Win32 API services in @file{kernel32.dll}. Cygwin | |
14411 also provides its own implementation of the C runtime library, called | |
14412 @code{newlib} (@file{libcygwin.a}; @file{libc.a} and @file{libm.a} are symlinked to it), which is | |
14413 implemented on top of the Unix system calls provided in @file{cygwin1.dll}. In | |
14414 addition, Cygwin provides static import libraries that give you direct | |
14415 access to the Win32 API -- XEmacs uses this to provide GUI support under | |
14416 Cygwin. Cygwin provides a version of GCC (the GNU Project C compiler) | |
14417 that is set up to automatically link with the appropriate Cygwin | |
14418 libraries. Cygwin also provides, as optional components, pre-compiled | |
14419 binaries for a great number of open-source programs compiled under the | |
14420 Cygwin environment. This includes all of the standard Unix file-system, | |
14421 text-manipulation, development, networking, database, etc. utilities, a | |
14422 version of X Windows that uses the Win32 API underneath (see below), | |
14423 and compilations of nearly all other common open-source packages | |
14424 (Apache, TeX, [X]Emacs, Ghostscript, GTK, ImageMagick, etc.). | |
14425 | |
14426 Similarly, you can emulate the functionality of X Windows using the | |
14427 windowing component of the Win32 API. Cygwin provides a package to do this, | |
14428 from the XFree86 project. Other versions of X under Windows also exist, | |
14429 such as the MicroImages MI/X server. Each version can potentially | |
14430 come with its own header and library files, allowing you to compile | |
14431 X-Windows programs. | |
14432 | |
14433 All of these different operating system and emulation layers can make | |
14434 for a fair amount of confusion, so: | |
14435 | |
14436 @subsubheading (b) CRT is not the same as VC++ | |
14437 | |
14438 Note that the CRT is @strong{NOT} (completely) part of VC++. True, if you link | |
14439 statically, the CRT (in the form of @file{LIBC.LIB}, which comes with VC++) | |
14440 will be inserted into the executable (.EXE), but otherwise the CRT will | |
14441 be separate. The dynamic version of the CRT is provided by @file{MSVCRT.DLL} | |
14442 (or @file{MSVCRTD.DLL}, for debugging), which comes with Windows. Hence, it's | |
14443 possible to use a different compiler and still link with MSVCRT -- which | |
14444 is exactly what MinGW does. | |
14445 | |
14446 @subsubheading (c) CRT is not the same as the Win32 API | |
14447 | |
14448 Note also that the CRT is totally separate from the Win32 API. They | |
14449 provide different functions and are implemented in different DLL's. | |
14450 They are also different levels -- the CRT is implemented on top of | |
14451 Win32. Sometimes the CRT and Win32 both have their own versions of | |
14452 similar concepts, such as locales. These are typically maintained | |
14453 separately, and can get out of sync. Do not assume that changing a | |
14454 setting in the CRT will have any effect on Win32 API routines using a | |
14455 similar concept unless the CRT docs specifically say so. Do not assume | |
14456 that behavior described for CRT functions applies to Win32 API or | |
14457 vice-versa. Note also that the CRT knows about and is implemented on | |
14458 top of the Win32 API, while the Win32 API knows nothing about the CRT. | |
14459 | |
14460 @subsubheading (d) MinGW is not the same as Cygwin | |
14461 | |
14462 As described in (b), Microsoft's version of the CRT (@file{MSVCRT.DLL}) is | |
14463 provided as part of Windows, separate from VC++, which must be | |
14464 purchased. Hence, it is possible to use MSVCRT to provide CRT | |
14465 services without using VC++. This is what MinGW (www.mingw.org) does -- | |
14466 it is a port of GCC that will use MSVCRT. The reason one might want to | |
14467 do this is (a) it is free, and (b) it does not require a separately | |
14468 installed DLL, as Cygwin does. (#### Maybe MinGW targets CRTDLL, not | |
14469 MSVCRT? If so, what is CRTDLL, and how does it differ from MSVCRT and | |
14470 @file{LIBC.LIB}?) Primarily, what MinGW provides is patches to GCC (now | |
14471 integrated into the standard distribution) and its own header files and | |
14472 import libraries that are compatible with MSVCRT. The best way to think | |
14473 of MinGW is as simply another Windows compiler, like how there used to | |
14474 be Microsoft and Borland compilers. Because MinGW programs use all the | |
14475 same libraries as VC++ programs, and hence the same services are | |
14476 available, programs that compile under VC++ should compile under MinGW | |
14477 with very little change, whereas programs that compile under Cygwin will | |
14478 look quite different. | |
14479 | |
14480 The confusion between MinGW and Cygwin is the confusion between the | |
14481 environment that a compiler runs under and the target environment of a | |
14482 program, i.e. the environment that a program is compiled to run under. | |
14483 It's theoretically possible, for example, to compile a program under | |
14484 Windows and generate a binary that can only be run under Linux, or | |
14485 vice-versa -- or, for that matter, to use Windows, running on an Intel | |
14486 machine to write and compile a program that will run on the Mac OS, | |
14487 running on a PowerPC machine. This is called cross-compiling, and while | |
14488 it may seem rather esoteric, it is quite normal when you want to | |
14489 generate a program for a machine that you cannot develop on -- for | |
14490 example, a program that will run on a Palm Pilot. Originally, this is | |
14491 how MinGW worked -- you needed to run GCC under a Cygwin environment and | |
14492 give it appropriate flags, telling it to use the MinGW headers and | |
14493 target @file{MSVCRT.DLL} rather than @file{CYGWIN1.DLL}. (In fact, | |
14494 Cygwin standardly comes with MinGW's header files.) This was because GCC | |
14495 was written with Unix in mind and relied on a large amount of | |
14496 Unix-specific functionality. To port GCC to Windows without using a | |
14497 POSIX emulation layer would mean a lot of rewriting of GCC. Eventually, | |
14498 however, this was done, and GCC was itself compiled using MinGW. The | |
14499 result is that currently you can develop MinGW applications either under | |
14500 Cygwin or under native Windows. | |
14501 | |
14502 @subsubheading (e) Operating system is not the same as window system | |
14503 | |
14504 As per the above discussion, we can use either Native Windows (the OS | |
14505 part of Win32 provided by @file{KERNEL32.DLL} and the Windows CRT as | |
14506 provided by MSVCRT or CLL) or Cygwin to provide operating-system | |
14507 functionality, and we can use either Native Windows (the windowing part | |
14508 of Win32 as provided by @file{USER32.DLL} and @file{GDI32.DLL}) or X11 | |
14509 to provide window-system functionality. This gives us four possible | |
14510 build environments. It's currently possible to build XEmacs with at | |
14511 least three of these combinations -- as far as I know native + X11 is no | |
14512 longer supported, although it used to be (support used to exist in | |
14513 @file{xemacs.mak} for linking with some X11 libraries available from | |
14514 somewhere, but it was bit-rotting and you could always use Cygwin; #### | |
14515 what happens if we try to compile with MinGW, native OS + X11?). This | |
14516 may still seem confusing, so: | |
14517 | |
14518 @table @asis | |
14519 @item Native OS + native windowing | |
14520 We call @code{CreateProcess()} to run subprocesses | |
14521 (@file{process-nt.c}), and @code{CreateWindowEx()} to create a top-level | |
14522 window (@file{frame-msw.c}). We use @file{nt/xemacs.mak} to compile | |
14523 with VC++, linking with the Windows CRT (@file{MSVCRT.DLL} or | |
14524 @file{LIBC.LIB}) and with the various Win32 DLL's (@file{KERNEL32.DLL}, | |
14525 @file{USER32.DLL}, @file{GDI32.DLL}); or we use | |
14526 @file{src/Makefile[.in.in]} to compile with GCC, telling it | |
14527 (e.g. -mno-cygwin, see @file{s/mingw32.h}) to use MinGW (which will end | |
14528 up linking with @file{MSVCRT.DLL}), and linking GCC with -lshell32 | |
14529 -lgdi32 -luser32 etc. (see @file{configure.in}). | |
14530 | |
14531 @item Cygwin + native windowing | |
14532 We call @code{fork()}/@code{execve()} to run subprocesses | |
14533 (@file{process-unix.c}), and @code{CreateWindowEx()} to create a | |
14534 top-level window (@file{frame-msw.c}). We use | |
14535 @file{src/Makefile[.in.in]} to compile with GCC (it will end up linking | |
14536 with @file{CYGWIN1.DLL}) and link GCC with -lshell32 -lgdi32 -luser32 | |
14537 etc. (see @file{configure.in}). | |
14538 | |
14539 @item Cygwin + X11 | |
14540 We call @code{fork()}/@code{execve()} to run subprocesses | |
14541 (@file{process-unix.c}), and @code{XtCreatePopupShell()} to create a | |
14542 top-level window (@file{frame-x.c}). We use @file{src/Makefile[.in.in]} | |
14543 to compile with GCC (it will end up linking with @file{CYGWIN1.DLL}) and | |
14544 link GCC with -lXt, -lX11, etc. (see @file{configure.in}). | |
14545 | |
14546 Finally, if native OS + X11 were possible, it might look something like | |
14547 | |
14548 @item [Native OS + X11] | |
14549 We call @code{CreateProcess()} to run subprocesses | |
14550 (@file{process-nt.c}), and @code{XtCreatePopupShell()} to create a | |
14551 top-level window (@file{frame-x.c}). We use @file{nt/xemacs.mak} to | |
14552 compile with VC++, linking with the Windows CRT (@file{MSVCRT.DLL} or | |
14553 @file{LIBC.LIB}) and with the various X11 DLL's (@file{XT.DLL}, | |
14554 @file{XLIB.DLL}, etc.); or we use @file{src/Makefile[.in.in]} to compile with | |
14555 GCC, telling it (e.g. -mno-cygwin, see @file{s/mingw32.h}) to use MinGW | |
14556 (which will end up linking with @file{MSVCRT.DLL}), and linking GCC with | |
14557 -lXt, -lX11, etc. (see @file{configure.in}). | |
14558 @end table | |
14559 | |
14560 One of the reasons that we maintain the ability to build under Cygwin | |
14561 and X11 on Windows, when we have native support, is that it allows | |
14562 Windows developers to test under a Unix-like environment. | |
14563 | |
14564 @node Windows Build Flags, Windows I18N Introduction, Different kinds of Windows environments, Interface to MS Windows | |
14565 @section Windows Build Flags | |
14566 @cindex Windows build flags | |
14567 @cindex MS Windows build flags | |
14568 @cindex build flags, Windows | |
14569 | |
14570 @table @code | |
14571 @item CYGWIN | |
14572 for Cygwin-only stuff. | |
14573 @item WIN32_NATIVE | |
14574 Win32 native OS-level stuff (files, process, etc.). Applies whenever | |
14575 linking against the native C libraries -- i.e. all compilations with | |
14576 VC++ and with MINGW, but never Cygwin. | |
14577 @item HAVE_X_WINDOWS | |
14578 for X Windows (regardless of whether under MS Win) | |
14579 @item HAVE_MS_WINDOWS | |
14580 MS Windows native windowing system (anything related to the appearance | |
14581 of the graphical screen). May or may not apply to any of VC++, MINGW, | |
14582 Cygwin. | |
14583 @end table | |
14584 | |
14585 Finally, there's also the MINGW build environment, which uses GCC | |
14586 (similar to Cygwin), but native MS Windows libraries rather than a | |
14587 POSIX emulation layer (the Cygwin approach). This environment defines | |
14588 WIN32_NATIVE, but also defines MINGW, which is used mostly because | |
14589 MinGW uses its own include files (related to Cygwin), which have a few | |
14590 things messed up. | |
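
As an illustration of how these flags partition code, here is a minimal sketch; it is not taken from the XEmacs sources. The flag names are the ones listed above, while the enum and the function names (@code{current_os_layer}, @code{os_layer_name}, @code{window_system_count}) are hypothetical. The OS-layer flags (@code{WIN32_NATIVE}, @code{CYGWIN}, @code{MINGW}) and the window-system flags (@code{HAVE_MS_WINDOWS}, @code{HAVE_X_WINDOWS}) are tested independently of each other.

```c
#include <assert.h>

/* Hypothetical enumeration of the OS layers discussed above.  */
enum os_layer
{
  OS_UNIX,
  OS_CYGWIN,
  OS_WIN32_VC,
  OS_WIN32_MINGW
};

/* Which operating-system layer was this file compiled for?
   Exactly one branch is selected at compile time.  */
enum os_layer
current_os_layer (void)
{
#if defined (WIN32_NATIVE)
# if defined (MINGW)
  return OS_WIN32_MINGW;   /* native libraries, GCC compiler */
# else
  return OS_WIN32_VC;      /* native libraries, VC++ compiler */
# endif
#elif defined (CYGWIN)
  return OS_CYGWIN;        /* POSIX emulation layer */
#else
  return OS_UNIX;
#endif
}

/* Human-readable name for an OS layer.  */
const char *
os_layer_name (enum os_layer layer)
{
  static const char *const names[] =
    { "Unix", "Cygwin", "native Windows (VC++)", "native Windows (MinGW)" };

  assert ((int) layer >= 0 && (int) layer < 4);
  return names[layer];
}

/* How many window systems were compiled in?  The windowing flags are
   checked separately from the OS-layer flags; nothing here assumes
   that they are mutually exclusive.  */
int
window_system_count (void)
{
  int n = 0;
#ifdef HAVE_MS_WINDOWS
  n++;
#endif
#ifdef HAVE_X_WINDOWS
  n++;
#endif
  return n;
}
```

Because these flags are set by the build system rather than predefined by the compiler, compiling this sketch on its own with no flags given selects the plain Unix branch and reports zero window systems.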
14591 | |
14592 Formerly, we had a whole host of flags. Here's the conversion, for porting | |
14593 code from GNU Emacs and such: | |
14594 | |
14595 @c @multitable {Old Constant} {determine whether this code is really specific to MS-DOS (and not Windows -- e.g. DJGPP code} | |
14596 @multitable @columnfractions .25 .75 | |
14597 @item Old Constant @tab New Constant | |
14598 @item ---------------------------------------------------------------- | |
14599 @item @code{WINDOWSNT} | |
14600 @tab @code{WIN32_NATIVE} | |
14601 @item @code{WIN32} | |
14602 @tab @code{WIN32_NATIVE} | |
14603 @item @code{_WIN32} | |
14604 @tab @code{WIN32_NATIVE} | |
14605 @item @code{HAVE_WIN32} | |
14606 @tab @code{WIN32_NATIVE} | |
14607 @item @code{DOS_NT} | |
14608 @tab @code{WIN32_NATIVE} | |
14609 @item @code{HAVE_NTGUI} | |
14610 @tab @code{WIN32_NATIVE}, unless it ends up already bracketed by this | |
14611 @item @code{HAVE_FACES} | |
14612 @tab always true | |
14613 @item @code{MSDOS} | |
14614 @tab determine whether this code is really specific to MS-DOS (and not | |
14615 Windows -- e.g. DJGPP code); if so, delete the code; otherwise, | |
14616 convert to @code{WIN32_NATIVE} (we do not support MS-DOS w/DOS Extender | |
14617 under XEmacs) | |
14618 @item @code{__CYGWIN__} | |
14619 @tab @code{CYGWIN} | |
14620 @item @code{__CYGWIN32__} | |
14621 @tab @code{CYGWIN} | |
14622 @item @code{__MINGW32__} | |
14623 @tab @code{MINGW} | |
14624 @end multitable | |
14625 | |
14626 @node Windows I18N Introduction, Modules for Interfacing with MS Windows, Windows Build Flags, Interface to MS Windows | |
14627 @section Windows I18N Introduction | |
14628 @cindex Windows I18N | |
14629 @cindex I18N, Windows | |
14630 @cindex MS Windows I18N | |
14631 | |
14632 @strong{Abstract:} This page provides an overview of the aspects of the | |
14633 Win32 internationalization API that are relevant to XEmacs, including | |
14634 the basic distinction between multibyte and Unicode encodings. Also | |
14635 included are pointers to how XEmacs should make use of this API. | |
14636 | |
14637 The Win32 API is quite well-designed in its handling of strings encoded | |
14638 for various character sets. The API is geared around the idea that two | |
14639 different methods of encoding strings should be supported. These | |
14640 methods are called multibyte and Unicode, respectively. The multibyte | |
14641 encoding is compatible with ASCII strings and is a more efficient | |
14642 representation when dealing with strings containing primarily ASCII | |
14643 characters, but it has a great number of serious deficiencies and | |
14644 limitations, including that it is very difficult and error-prone to work | |
14645 with strings in this encoding, and any particular string in a multibyte | |
14646 encoding can only contain characters from a very limited number of | |
14647 character sets. The Unicode encoding rectifies all of these | |
14648 deficiencies, but it is not compatible with ASCII strings (in other | |
14649 words, an existing program will not be able to handle the encoded | |
14650 strings unless it is explicitly modified to do so), and it takes up | |
14651 twice as much memory space as multibyte encodings when encoding a purely | |
14652 ASCII string. | |
14653 | |
14654 Multibyte encodings use a variable number of bytes (either one or two) | |
14655 to represent characters. ASCII characters are always represented by a | |
14656 single byte with its high bit not set, and non-ASCII characters are | |
14657 represented by one or two bytes, the first of which always has its high | |
14658 bit set. (The second byte, when it exists, may or may not have its high | |
14659 bit set.) There is no single multibyte encoding. Instead, there is | |
14660 generally one encoding per non-ASCII character set. Such an encoding is | |
14661 capable of representing (besides ASCII characters, of course) only | |
14662 characters from one (or possibly two) particular character sets. | |
14663 | |
14664 Multibyte encoding makes processing of strings very difficult. For | |
14665 example, given a pointer to the beginning of a character within a | |
14666 string, finding the pointer to the beginning of the previous character | |
14667 may require backing up all the way to the beginning of the string, and | |
14668 then moving forward. Also, an operation such as separating out the | |
14669 components of a path by searching for backslashes will fail if it's | |
14670 implemented in the simplest (but not multibyte-aware) fashion, because | |
14671 it may find what appears to be a backslash, but which is actually the | |
14672 second byte of a two-byte character. Also, the limited number of | |
14673 character sets that any particular multibyte encoding can represent | |
14674 means that loss of data is likely if a string is converted from the | |
14675 XEmacs internal format into a multibyte format. | |
14676 | |
14677 For these reasons, the C code in XEmacs should never do any sort of work | |
14678 with multibyte encoded strings (or with strings in any external encoding | |
14679 for that matter). Strings should always be maintained in the internal | |
14680 encoding, which is predictable, and converted to an external encoding | |
14681 only at the point where the string moves from the XEmacs C code and | |
14682 enters a system library function. Similarly, when a string is returned | |
14683 from a system library function, it should be immediately converted into | |
14684 the internal coding before any operations are done on it. | |
14685 | |
14686 Unicode, unlike multibyte encodings, is a fixed-width encoding where | |
14687 every character is represented using 16 bits. It is also capable of | |
14688 encoding all the characters from all the character sets in common use in | |
14689 the world. The predictability and completeness of the Unicode encoding | |
14690 makes it a very good encoding for strings that may contain characters | |
14691 from many character sets mixed up with each other. At the same time, of | |
14692 course, it is incompatible with routines that expect ASCII characters | |
14693 and also incompatible with general string manipulation routines, which | |
14694 will encounter a great number of what would appear to be embedded nulls | |
14695 in the string. It also takes twice as much room to encode strings | |
14696 containing primarily ASCII characters. This is why XEmacs does not use | |
14697 Unicode or similar encoding internally for buffers. | |
14698 | |
14699 The Win32 API cleverly deals with the issue of 8 bit vs. 16 bit | |
14700 characters by declaring a type called @code{@dfn{TCHAR}} which specifies | |
14701 a generic character, either 8 bits or 16 bits. Generally @code{TCHAR} | |
14702 is defined to be the same as the simple C type @code{char}, unless the | |
14703 preprocessor constant @code{UNICODE} is defined, in which case | |
14704 @code{TCHAR} is defined to be @code{WCHAR}, which is a 16 bit type. | |
14705 Nearly all functions in the Win32 API that take strings are defined to | |
14706 take strings that are actually arrays of @code{TCHAR}s. There is a type | |
14707 @code{LPTSTR} which is defined to be a string of @code{TCHAR}s and | |
14708 another type @code{LPCTSTR} which is a const string of @code{TCHAR}s. | |
14709 The theory is that any program that uses @code{TCHAR}s exclusively to | |
14710 represent characters and does not make assumptions about the size of a | |
14711 @code{TCHAR} or the way that the characters are encoded should work | |
14712 transparently regardless of whether the @code{UNICODE} preprocessor | |
14713 constant is defined, which is to say, regardless of whether 8 bit | |
14714 multibyte or 16 bit Unicode characters are being used. The way that | |
14715 this is actually implemented is that every Win32 API function that takes | |
14716 a string as an argument actually maps to one of two functions which are | |
14717 suffixed with an @code{A} (which stands for ANSI, and means multibyte | |
14718 strings) or @code{W} (which stands for wide, and means Unicode strings). | |
14719 The mapping is, of course, controlled by the same @code{UNICODE} | |
14720 preprocessor constant. Generally all structures containing strings in | |
14721 them actually map to one of two different kinds of structures, with | |
14722 either an @code{A} or a @code{W} suffix after the structure name. | |
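
The mechanism can be imitated in portable C. The sketch below does not use the real Windows headers; @code{DEMO_UNICODE}, @code{demo_tchar}, @code{DEMO_TEXT} and the @code{demo_length} functions are hypothetical stand-ins for @code{UNICODE}, @code{TCHAR}, @code{TEXT} and an A/W function pair. Note also that @code{wchar_t} is 16 bits under Windows but is commonly 32 bits on other systems, so only the shape of the mapping carries over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* One preprocessor constant selects the character type, the string
   literal form, and which of the two functions a generic name maps to.  */
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
# define DEMO_TEXT(s) L##s
# define demo_length demo_length_w
#else
typedef char demo_tchar;
# define DEMO_TEXT(s) s
# define demo_length demo_length_a
#endif

/* The "A" version: takes a string of ordinary chars.  */
size_t
demo_length_a (const char *s)
{
  assert (s != NULL);
  return strlen (s);
}

/* The "W" version: takes a string of wide chars.  */
size_t
demo_length_w (const wchar_t *s)
{
  assert (s != NULL);
  return wcslen (s);
}

/* Code written purely in terms of demo_tchar, DEMO_TEXT and the
   generic function name compiles unchanged either way.  */
size_t
window_class_name_length (void)
{
  const demo_tchar *name = DEMO_TEXT ("XEmacsDemo");

  return demo_length (name);
}
```

Compiling with or without @code{-DDEMO_UNICODE} changes which function @code{window_class_name_length} ends up calling, but not its source text or its result, which is the property the @code{TCHAR} scheme is designed to provide.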
14723 | |
14724 Unfortunately, not all of the implementations of the Win32 API | |
14725 implement all of the functionality described above. In particular, | |
14726 Windows 95 does not implement very much Unicode functionality. It | |
14727 does implement functions to convert multibyte-encoded strings to and | |
14728 from Unicode strings, and provides Unicode versions of certain | |
14729 low-level functions like @code{ExtTextOut()}. In fact, all of | |
14730 the rest of the Unicode versions of API functions are just stubs that | |
14731 return an error. Conversely, all versions of Windows NT completely | |
14732 implement all the Unicode functionality, but some versions (especially | |
14733 versions before Windows NT 4.0) don't implement much of the multibyte | |
14734 functionality. For this reason, as well as for general code | |
14735 cleanliness, XEmacs needs to be written in such a way that it works | |
14736 with or without the @code{UNICODE} preprocessor constant being | |
14737 defined. | |
14738 | |
14739 Getting XEmacs to run when all strings are Unicode primarily | |
14740 involves removing any assumptions made about the size of characters. | |
14741 Remember what I said earlier about how the point of conversion between | |
14742 internally and externally encoded strings should occur at the point of | |
14743 entry or exit into or out of a library function. With this in mind, | |
14744 an externally encoded string in XEmacs can be treated simply as an | |
14745 arbitrary sequence of bytes of some length which has no particular | |
14746 relationship to the length of the string in the internal encoding. | |
14747 | |
14748 #### The rest of this is @strong{out-of-date} and needs to be written | |
14749 to reference the actual coding systems or aliases that we currently use. | |
14750 | |
14751 [[ To facilitate this, the enum @code{external_data_format}, which is | |
14752 declared in @file{lisp.h}, is expanded to contain three new formats, | |
14753 which are @code{FORMAT_LOCALE}, @code{FORMAT_UNICODE} and | |
14754 @code{FORMAT_TSTR}. @code{FORMAT_LOCALE} always causes encoding into a | |
14755 multibyte string consistent with the encoding of the current locale. | |
14756 The functions to handle locales are different under Unix and Windows and | |
14757 locales are a process property under Unix and a thread property under | |
14758 Windows, but the concepts are basically the same. @code{FORMAT_UNICODE} | |
14759 of course causes encoding into Unicode and @code{FORMAT_TSTR} logically | |
14760 maps to either @code{FORMAT_LOCALE} or @code{FORMAT_UNICODE} depending | |
14761 on the @code{UNICODE} preprocessor constant. | |
14762 | |
14763 Under Unix the behavior of @code{FORMAT_TSTR} is undefined and this | |
14764 particular format should not be used. Under Windows however | |
14765 @code{FORMAT_TSTR} should be used for pretty much all of the Win32 API | |
14766 calls. The other two formats should only be used in particular APIs | |
14767 that specifically call for a multibyte or Unicode encoded string | |
14768 regardless of the @code{UNICODE} preprocessor constant. String | |
14769 constants that are to be passed directly to Win32 API functions, such as | |
14770 the names of window classes, need to be bracketed in their definition | |
14771 with a call to the macro @code{TEXT}. This awfully named macro, which | |
14772 comes out of the Win32 API, appropriately makes a string of either | |
14773 regular or wide chars, which is to say this string may be prepended with | |
14774 an @code{L} (causing it to be a wide string) depending on the | |
14775 @code{UNICODE} preprocessor constant. | |
14776 | |
14777 By the way, if you're wondering what happened to @code{FORMAT_OS}, I | |
14778 think that this format should go away entirely because it is too vague | |
14779 and should be replaced by more specific formats as they are defined. | |
14780 ]] | |
14781 | |
14782 Use Qnative for Unix conversion, Qmswindows_tstr for Windows ... | |
14783 | |
14784 String constants that are to be passed directly to Win32 API functions, | |
14785 such as the names of window classes, need to be bracketed in their | |
14786 definition with a call to the macro XETEXT. This appropriately makes a | |
14787 string of either regular or wide chars, which is to say this string may be | |
14788 prepended with an L (causing it to be a wide string) depending on | |
14789 XEUNICODE_P. | |
14790 | |
14791 @node Modules for Interfacing with MS Windows, , Windows I18N Introduction, Interface to MS Windows | |
14792 @section Modules for Interfacing with MS Windows | |
14793 @cindex modules for interfacing with MS Windows | |
14794 @cindex interfacing with MS Windows, modules for | |
14795 @cindex MS Windows, modules for interfacing with | |
14796 @cindex Windows, modules for interfacing with | |
14797 | |
14798 There are two different general Windows-related include files in src. | |
14799 | |
14800 Uses are approximately: | |
14801 | |
14802 @table @file | |
14803 @item syswindows.h | |
14804 Wrapper around @file{<windows.h>}, including missing defines as | |
14805 necessary. Includes stuff needed on both Cygwin and native Windows, | |
14806 regardless of window system chosen. Includes definitions needed for | |
14807 Unicode conversion/encapsulation, and other Mule-related stuff, plus | |
14808 various other prototypes and Windows-specific, but not GUI-specific, | |
14809 stuff. | |
14810 | |
14811 @item console-msw.h | |
14812 Used on both Cygwin and native Windows, but only when native window | |
14813 system (as opposed to X) chosen. Includes @file{syswindows.h}. | |
14814 @end table | |
14815 | |
14816 Summary of files: | |
14817 | |
14818 @table @file | |
14819 @item console-msw.h | |
14820 include file for native windowing (otherwise, @file{console-x.h}, etc.) | |
14821 @item console-msw.c, frame-msw.c, etc. | |
14822 native windowing, as above | |
14823 @item process-nt.c | |
14824 subprocess support for native OS (otherwise, @file{process-unix.c}) | |
14825 @item nt.c | |
14826 support routines used under native OS | |
14827 @item win32.c | |
14828 support routines used under both OS environments | |
14829 @item syswindows.h | |
14830 support header for both environments | |
14831 @item nt/xemacs.mak | |
14832 Makefile for VC++ (otherwise, @file{src/Makefile.in.in}) | |
14833 @item s/windowsnt.h | |
14834 s header for basic native-OS defines, VC++ compiler | |
14835 @item s/mingw32.h | |
14836 s header for basic native-OS defines, GCC/MinGW compiler | |
14837 @item s/cygwin.h | |
14838 s header for basic Cygwin defines | |
14839 @item s/win32-native.h | |
14840 s header for basic native-OS defines, all compilers | |
14841 @item s/win32-common.h | |
14842 s header for defines for both OS environments | |
14843 @item intl-win32.c | |
14844 internationalization functions for both OS environments | |
14845 @item intl-encap-win32.c | |
14846 Unicode encapsulation functions for both OS environments | |
14847 @item intl-auto-encap-win32.c | |
14848 Auto-generated Unicode encapsulation functions | |
14849 @item intl-auto-encap-win32.h | |
14850 Auto-generated Unicode encapsulation headers | |
14851 @end table | |
14852 | |
14853 @node Interface to the X Window System, Future Work, Interface to MS Windows, Top | |
10892 @chapter Interface to the X Window System | 14854 @chapter Interface to the X Window System |
10893 @cindex X Window System, interface to the | 14855 @cindex X Window System, interface to the |
10894 | 14856 |
10895 Mostly undocumented. | 14857 Mostly undocumented. |
10896 | 14858 |
10897 @menu | 14859 @menu |
10898 * Lucid Widget Library:: An interface to various widget sets. | 14860 * Lucid Widget Library:: An interface to various widget sets. |
14861 * Modules for Interfacing with X Windows:: | |
10899 @end menu | 14862 @end menu |
10900 | 14863 |
10901 @node Lucid Widget Library | 14864 @node Lucid Widget Library, Modules for Interfacing with X Windows, Interface to the X Window System, Interface to the X Window System |
10902 @section Lucid Widget Library | 14865 @section Lucid Widget Library |
10903 @cindex Lucid Widget Library | 14866 @cindex Lucid Widget Library |
10904 @cindex widget library, Lucid | 14867 @cindex widget library, Lucid |
10905 @cindex library, Lucid Widget | 14868 @cindex library, Lucid Widget |
10906 | 14869 |
10922 not know which widget set has been used to build the graphical user | 14885 not know which widget set has been used to build the graphical user |
10923 interface. | 14886 interface. |
10924 | 14887 |
10925 @menu | 14888 @menu |
10926 * Generic Widget Interface:: The lwlib generic widget interface. | 14889 * Generic Widget Interface:: The lwlib generic widget interface. |
10927 * Scrollbars:: | 14890 * Scrollbars:: |
10928 * Menubars:: | 14891 * Menubars:: |
10929 * Checkboxes and Radio Buttons:: | 14892 * Checkboxes and Radio Buttons:: |
10930 * Progress Bars:: | 14893 * Progress Bars:: |
10931 * Tab Controls:: | 14894 * Tab Controls:: |
10932 @end menu | 14895 @end menu |
10933 | 14896 |
10934 @node Generic Widget Interface | 14897 @node Generic Widget Interface, Scrollbars, Lucid Widget Library, Lucid Widget Library |
10935 @subsection Generic Widget Interface | 14898 @subsection Generic Widget Interface |
10936 @cindex widget interface, generic | 14899 @cindex widget interface, generic |
10937 | 14900 |
10938 In general in any toolkit a widget may be a composite object. In Xt, | 14901 In general in any toolkit a widget may be a composite object. In Xt, |
10939 all widgets have an X window that they manage, but typically a complex | 14902 all widgets have an X window that they manage, but typically a complex |
11010 | 14973 |
11011 The @code{widget_instance} structure also contains a pointer to the root | 14974 The @code{widget_instance} structure also contains a pointer to the root |
11012 of its tree. Widget instances are further confi | 14975 of its tree. Widget instances are further confi |
11013 | 14976 |
11014 | 14977 |
11015 @node Scrollbars | 14978 @node Scrollbars, Menubars, Generic Widget Interface, Lucid Widget Library |
11016 @subsection Scrollbars | 14979 @subsection Scrollbars |
11017 @cindex scrollbars | 14980 @cindex scrollbars |
11018 | 14981 |
11019 @node Menubars | 14982 @node Menubars, Checkboxes and Radio Buttons, Scrollbars, Lucid Widget Library |
11020 @subsection Menubars | 14983 @subsection Menubars |
11021 @cindex menubars | 14984 @cindex menubars |
11022 | 14985 |
11023 @node Checkboxes and Radio Buttons | 14986 @node Checkboxes and Radio Buttons, Progress Bars, Menubars, Lucid Widget Library |
11024 @subsection Checkboxes and Radio Buttons | 14987 @subsection Checkboxes and Radio Buttons |
11025 @cindex checkboxes and radio buttons | 14988 @cindex checkboxes and radio buttons |
11026 @cindex radio buttons, checkboxes and | 14989 @cindex radio buttons, checkboxes and |
11027 @cindex buttons, checkboxes and radio | 14990 @cindex buttons, checkboxes and radio |
11028 | 14991 |
11029 @node Progress Bars | 14992 @node Progress Bars, Tab Controls, Checkboxes and Radio Buttons, Lucid Widget Library |
11030 @subsection Progress Bars | 14993 @subsection Progress Bars |
11031 @cindex progress bars | 14994 @cindex progress bars |
11032 @cindex bars, progress | 14995 @cindex bars, progress |
11033 | 14996 |
11034 @node Tab Controls | 14997 @node Tab Controls, , Progress Bars, Lucid Widget Library |
11035 @subsection Tab Controls | 14998 @subsection Tab Controls |
11036 @cindex tab controls | 14999 @cindex tab controls |
11037 | 15000 |
11038 @include index.texi | 15001 |
15002 @node Modules for Interfacing with X Windows, , Lucid Widget Library, Interface to the X Window System | |
15003 @section Modules for Interfacing with X Windows | |
15004 @cindex modules for interfacing with X Windows | |
15005 @cindex interfacing with X Windows, modules for | |
15006 @cindex X Windows, modules for interfacing with | |
15007 | |
15008 @example | |
15009 Emacs.ad.h | |
15010 @end example | |
15011 | |
15012 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied | |
15013 fallback resources (so that XEmacs has pretty defaults). | |
15014 | |
15015 | |
15016 | |
15017 @example | |
15018 EmacsFrame.c | |
15019 EmacsFrame.h | |
15020 EmacsFrameP.h | |
15021 @end example | |
15022 | |
15023 These modules implement an Xt widget class that encapsulates a frame. | |
15024 This is for ease in integrating with Xt. The EmacsFrame widget covers | |
15025 the entire X window except for the menubar; the scrollbars are | |
15026 positioned on top of the EmacsFrame widget. | |
15027 | |
15028 @strong{Warning:} Abandon hope, all ye who enter here. This code took | |
15029 an ungodly amount of time to get right, and is likely to fall apart | |
15030 mercilessly at the slightest change. Such is life under Xt. | |
15031 | |
15032 | |
15033 | |
15034 @example | |
15035 EmacsManager.c | |
15036 EmacsManager.h | |
15037 EmacsManagerP.h | |
15038 @end example | |
15039 | |
15040 These modules implement a simple Xt manager (i.e. composite) widget | |
15041 class that simply lets its children set whatever geometry they want. | |
15042 It's amazing that Xt doesn't provide this standardly, but on second | |
15043 thought, it makes sense, considering how amazingly broken Xt is. | |
15044 | |
15045 | |
15046 @example | |
15047 EmacsShell-sub.c | |
15048 EmacsShell.c | |
15049 EmacsShell.h | |
15050 EmacsShellP.h | |
15051 @end example | |
15052 | |
15053 These modules implement two Xt widget classes that are subclasses of | |
15054 the TopLevelShell and TransientShell classes. This is necessary to deal | |
15055 with more brokenness that Xt has sadistically thrust onto the backs of | |
15056 developers. | |
15057 | |
15058 | |
15059 | |
15060 @example | |
15061 xgccache.c | |
15062 xgccache.h | |
15063 @end example | |
15064 | |
15065 These modules provide functions for maintenance and caching of GC's | |
15066 (graphics contexts) under the X Window System. This code is junky and | |
15067 needs to be rewritten. | |
15068 | |
15069 | |
15070 | |
15071 @example | |
15072 select-msw.c | |
15073 select-x.c | |
15074 select.c | |
15075 select.h | |
15076 @end example | |
15077 | |
15078 @cindex selections | |
15079 This module provides an interface to the X Window System's concept of | |
15080 @dfn{selections}, the standard way for X applications to communicate | |
15081 with each other. | |
15082 | |
15083 | |
15084 | |
15085 @example | |
15086 xintrinsic.h | |
15087 xintrinsicp.h | |
15088 xmmanagerp.h | |
15089 xmprimitivep.h | |
15090 @end example | |
15091 | |
15092 These header files are similar in spirit to the @file{sys*.h} files and buffer | |
15093 against different implementations of Xt and Motif. | |
15094 | |
15095 @itemize @bullet | |
15096 @item | |
15097 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}. | |
15098 @item | |
15099 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}. | |
15100 @item | |
15101 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}. | |
15102 @item | |
15103 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}. | |
15104 @end itemize | |
15105 | |
15106 | |
15107 | |
15108 @example | |
15109 xmu.c | |
15110 xmu.h | |
15111 @end example | |
15112 | |
15113 These files provide an emulation of the Xmu library for those systems | |
15114 (e.g. HPUX) that don't provide it as a standard part of X. | |
15115 | |
15116 | |
15117 | |
15118 @example | |
15119 ExternalClient-Xlib.c | |
15120 ExternalClient.c | |
15121 ExternalClient.h | |
15122 ExternalClientP.h | |
15123 ExternalShell.c | |
15124 ExternalShell.h | |
15125 ExternalShellP.h | |
15126 extw-Xlib.c | |
15127 extw-Xlib.h | |
15128 extw-Xt.c | |
15129 extw-Xt.h | |
15130 @end example | |
15131 | |
15132 @cindex external widget | |
15133 These files provide the @dfn{external widget} interface, which allows an | |
15134 XEmacs frame to appear as a widget in another application. To do this, | |
15135 you have to configure with @samp{--external-widget}. | |
15136 | |
15137 @file{ExternalShell*} provides the server (XEmacs) side of the | |
15138 connection. | |
15139 | |
15140 @file{ExternalClient*} provides the client (other application) side of | |
15141 the connection. These files are not compiled into XEmacs but are | |
15142 compiled into libraries that are then linked into your application. | |
15143 | |
15144 @file{extw-*} is common code that is used for both the client and server. | |
15145 | |
15146 Don't touch this code; something is liable to break if you do. | |
15147 | |
15148 | |
15149 @node Future Work, Future Work Discussion, Interface to the X Window System, Top | |
15150 @chapter Future Work | |
15151 @cindex future work | |
15152 | |
15153 @menu | |
15154 * Future Work -- Elisp Compatibility Package:: | |
15155 * Future Work -- Drag-n-Drop:: | |
15156 * Future Work -- Standard Interface for Enabling Extensions:: | |
15157 * Future Work -- Better Initialization File Scheme:: | |
15158 * Future Work -- Keyword Parameters:: | |
15159 * Future Work -- Property Interface Changes:: | |
15160 * Future Work -- Toolbars:: | |
15161 * Future Work -- Menu API Changes:: | |
15162 * Future Work -- Removal of Misc-User Event Type:: | |
15163 * Future Work -- Mouse Pointer:: | |
15164 * Future Work -- Extents:: | |
15165 * Future Work -- Version Number and Development Tree Organization:: | |
15166 * Future Work -- Improvements to the @code{xemacs.org} Website:: | |
15167 * Future Work -- Keybindings:: | |
15168 * Future Work -- Byte Code Snippets:: | |
15169 * Future Work -- Lisp Stream API:: | |
15170 * Future Work -- Multiple Values:: | |
15171 * Future Work -- Macros:: | |
15172 * Future Work -- Specifiers:: | |
15173 * Future Work -- Display Tables:: | |
15174 * Future Work -- Making Elisp Function Calls Faster:: | |
15175 * Future Work -- Lisp Engine Replacement:: | |
15176 @end menu | |
15177 | |
15178 @ignore | |
15179 Macro to convert a single line containing a heading into the format of | |
15180 all headings in the Future Work section. | |
15181 | |
15182 (setq last-kbd-macro (read-kbd-macro | |
15183 "<S-end> <f3> <home> @node SPC <end> RET @section SPC <f4> <home> <up> <C-right> <right> Future SPC Work SPC - - SPC <home> <down> <C-right> <right> Future SPC Work SPC - - SPC <end> RET @cindex SPC future SPC work, SPC <f4> C-r , RET C-x C-x M-l RET @cindex SPC <f4> <home> <C-right> <S-end> M-l , SPC future SPC work RET")) | |
15184 @end ignore | |
15185 | |
15186 @node Future Work -- Elisp Compatibility Package, Future Work -- Drag-n-Drop, Future Work, Future Work | |
15187 @section Future Work -- Elisp Compatibility Package | |
15188 @cindex future work, elisp compatibility package | |
15189 @cindex elisp compatibility package, future work | |
15190 | |
15191 A while ago I created a package called Sysdep, which aimed to be a | |
15192 forward compatibility package for Elisp. The idea was that instead of | |
15193 having to write your package using the oldest version of Emacs that you | |
15194 wanted to support, you could use the newest XEmacs API, and then simply | |
15195 load the Sysdep package, which would automatically define the new API in | |
15196 terms of older APIs as necessary. The idea of this package was good, | |
15197 but its design wasn't perfect, and it wasn't widely adopted. I propose | |
15198 a new package called Compat that corrects the design flaws in Sysdep, | |
15199 and hopefully will be adopted by most of the major packages. | |
15200 | |
15201 In addition, this package will provide macros that can be used to | |
15202 bracket code as necessary to disable byte compiler warnings generated as | |
15203 a result of supporting the APIs of different versions of Emacs; or | |
15204 rather the Compat package strives to provide useful constructs to make | |
15205 doing this support easier, and these constructs have the side effect of | |
15206 not causing spurious byte compiler warnings. The idea here is that it | |
15207 should be possible to create well-written, clean, and understandable | |
15208 Elisp that supports both older and newer APIs, and has no byte compiler | |
15209 warnings. Currently many warnings are unavoidable, and as a result, | |
15210 they are simply ignored, which also causes a lot of legitimate warnings | |
15211 to be ignored. | |
15212 | |
15213 The approach taken by the Sysdep package to make sure that the newest | |
15214 API was always supported was fairly simple: when the Sysdep package was | |
15215 loaded, it checked for the existence of new API functions, and if they | |
15216 weren't defined, it defined them in terms of older API functions that | |
15217 were defined. This had the advantage that the checks for which API | |
15218 functions were defined were done only once at load time rather than each | |
15219 time the function was called. However, the fact that the new APIs were | |
15220 globally defined caused a lot of problems with unwanted interactions, | |
15221 both with other versions of the Sysdep package provided as part of other | |
15222 packages, and simply with compatibility code of other sorts in packages | |
15223 that would determine whether an API existed by checking for the | |
15224 existence of certain functions within that API. In addition, the Sysdep | |
15225 package did not scale well because it defined all of the functions that | |
15226 it supported, regardless of whether or not they were used. | |
15227 | |
15228 The Compat package remedies the first problem by ensuring that the new | |
15229 APIs are defined only within the lexical scope of the packages that | |
15230 actually make use of the Compat package. It remedies the second problem | |
15231 by ensuring that only definitions of functions that are actually used | |
15232 are loaded. This all works roughly according to the following scheme: | |
15233 | |
15234 @enumerate | |
15235 @item | |
15236 | |
15237 Part of the Compat package is a module called the Compat generator. | |
15238 This module is actually run as an additional step during byte | |
15239 compilation of a package that uses Compat. This can happen either | |
15240 through the makefile or through the use of an @code{eval-when-compile} | |
15241 call within the package code itself. What the generator does is scan | |
15242 all of the Lisp code in the package, determine which function calls are | |
15243 made that the Compat package knows about, and generates custom | |
15244 @code{compat} code that conditionally defines just these functions when | |
15245 the package is loaded. The custom @code{compat} code can either be | |
15246 written to a separate Lisp file (for use with multi-file packages), or | |
15247 inserted into the beginning of the Lisp file of a single file package. | |
15248 (In the latter case, the package indicates where this generated code | |
15249 should go through the use of magic comments that mark the beginning and | |
15250 end of the section. Some will say that doing this trick is bad juju, | |
15251 but I have done this sort of thing before, and it works very well in | |
15252 practice). | |
15253 @item | |
15254 | |
15255 The functions in the custom @code{compat} code have their names prefixed | |
15256 with both the name of the package and the word @code{compat}, ensuring | |
15257 that there will be no name space conflicts with other functions in the | |
15258 same package, or with other packages that make use of the Compat | |
15259 package. | |
15260 @item | |
15261 | |
15262 The actual definitions of the functions in the custom @code{compat} code | |
15263 are determined at run time. When the equivalent API already exists, the | |
15264 wrapper functions are simply defined directly in terms of the actual | |
15265 functions, so that the only run time overhead from using the Compat | |
15266 package is one additional function call. (Alternatively, even this | |
15267 small overhead could be avoided by retrieving the definitions of the | |
15268 actual functions and supplying them as the definitions of the wrapper | |
15269 functions. However, this appears to me to not be completely safe. For | |
15270 example, it might have bad interactions with the advice package). | |
15271 @item | |
15272 | |
15273 The code that wants to make use of the custom @code{compat} code is | |
15274 bracketed by a call to the construct @code{compat-execute}. What this | |
15275 actually does is lexically bind all of the function names that are being | |
15276 redefined with macro functions by using the Common Lisp macro @code{macrolet}. | |
15277 (The definition of this macro is in the CL package, but in order for | |
15278 things to work on all platforms, the definition of this macro will | |
15279 presumably have to be copied and inserted into the custom @code{compat} | |
15280 code). | |
15281 | |
15282 @end enumerate | |
15283 | |
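As a rough illustration of steps 2 through 4, the generated code for a
single function might look like the following sketch.  The package name
@code{mypkg} and the function @code{hypothetical-new-sum} are invented
for the example, and @code{+} merely stands in for whatever older API
would really be used; none of this is existing Compat code.

@example
(require 'cl)                           ; provides macrolet

;; Steps 2 and 3: a wrapper, prefixed with the package name and
;; "compat", whose definition is chosen when the package is loaded.
(if (fboundp 'hypothetical-new-sum)
    (defun mypkg-compat-hypothetical-new-sum (&rest args)
      (apply 'hypothetical-new-sum args))
  (defun mypkg-compat-hypothetical-new-sum (&rest args)
    (apply '+ args)))                   ; fallback via the older API

;; Step 4: roughly what compat-execute could expand into.  Inside
;; the body, calls to the new name are rewritten to the wrapper.
(macrolet ((hypothetical-new-sum (&rest args)
             (cons 'mypkg-compat-hypothetical-new-sum args)))
  (hypothetical-new-sum 1 2 3))
@end example

Because the rewriting is confined to the body of the @code{macrolet}
form, nothing is defined globally under the new name, which is the
property the scheme depends on.
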
15284 In addition, the Compat package should define the macro | |
15285 @code{compat-if-fboundp}. (Similar macros such as | |
15286 @code{compile-when-fboundp} and @code{compile-case-fboundp} could be | |
15287 defined using similar principles). The @code{compat-if-fboundp} macro | |
15288 behaves just like an @code{(if (fboundp ...) ...)} clause when executed, | |
15289 but in addition, when it's compiled, it ensures that the code inside the | |
15290 @code{if-true} sub-block will not cause any byte compiler warnings about | |
15291 the function in question being unbound. I think that the way to | |
15292 implement this would be to make @code{compat-if-fboundp} be a macro that | |
15293 does what it's supposed to do, but which defines its own byte code | |
15294 handler, which ensures that the particular warning in question will be | |
15295 suppressed. (Actually ensuring that just the warning in question is | |
15296 suppressed, and not any others, might be rather tricky. It certainly | |
15297 requires further thought). | |
15298 | |
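A minimal sketch of the run-time half of @code{compat-if-fboundp}
follows.  It assumes the macro takes an unquoted function name, which
the proposal does not specify, and it makes no attempt at the harder
part, suppressing the byte compiler warning.

@example
(defmacro compat-if-fboundp (function then &rest else)
  "Evaluate THEN if FUNCTION is fbound, otherwise the ELSE forms.
FUNCTION is an unquoted symbol (an assumption of this sketch)."
  (append (list 'if (list 'fboundp (list 'quote function)) then)
          else))

;; Usage: `hypothetical-new-function' is a made-up name.
(compat-if-fboundp hypothetical-new-function
    (hypothetical-new-function 1)
  (message "Falling back to the older API"))
@end example
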
15299 Note: An alternative way of avoiding both warnings about unbound | |
15300 functions and warnings about obsolete functions is to just call the | |
15301 function in question by using @code{funcall}, instead of calling the | |
15302 function directly. This seems rather inelegant to me, though, and | |
15303 doesn't make it obvious why the function is being called in such a | |
15304 roundabout manner. Perhaps the Compat package should also provide a | |
15305 macro @code{compat-funcall}, which works exactly like @code{funcall}, | |
15306 but which indicates to anyone reading the code why the code is expressed | |
15307 in such a fashion. | |
15308 | |
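Such a macro could be as small as the following sketch, which simply
expands into a plain @code{funcall}; the only thing it adds is a name
that documents the intent.

@example
(defmacro compat-funcall (function &rest args)
  "Call FUNCTION with ARGS exactly as `funcall' does.
The indirection exists only to avoid byte compiler warnings
about functions that may be unbound or obsolete."
  (cons 'funcall (cons function args)))

;; Usage: behaves like (funcall '+ 1 2).
(compat-funcall '+ 1 2)
@end example
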
15309 If you're wondering how to implement the part of the Compat generator | |
15310 where it scans Lisp code to find function calls for functions that it | |
15311 wants to do something about, I think the best way is to simply process | |
15312 the code using the Lisp function @code{read} and recursively descend any | |
15313 lists looking for function names as the first element of any list | |
15314 encountered. This might extract out a few more functions than are | |
15315 actually called, but it is almost certainly safer than doing anything | |
15316 trickier like byte compiling the code, and attempting to look for | |
15317 function calls in the result. (It could also be argued that the names | |
15318 of the functions should be extracted, not only from the first element of | |
15319 lists, but anywhere a symbol occurs, for example, to catch places | |
15320 where a function is called using @code{funcall} or @code{apply}. | |
15321 However, such uses of functions would not be affected by the surrounding | |
15322 @code{macrolet} call, and so there doesn't appear to be any point in extracting | |
15323 them). | |
15324 | |
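The scanning approach described above can be sketched as follows.  The
function names @code{compat-scan-form} and @code{compat-scan-buffer}
are hypothetical, and @var{known} stands for the list of function names
the Compat package knows how to provide.

@example
(defun compat-scan-form (form known found)
  "Return FOUND plus each symbol in KNOWN that heads a list in FORM."
  (if (not (consp form))
      found
    (if (and (symbolp (car form))
             (memq (car form) known)
             (not (memq (car form) found)))
        (setq found (cons (car form) found)))
    ;; Descend into every element; FORM may be a dotted list.
    (while (consp form)
      (setq found (compat-scan-form (car form) known found))
      (setq form (cdr form)))
    found))

(defun compat-scan-buffer (known)
  "Scan the Lisp forms in the current buffer for symbols in KNOWN."
  (let ((found nil)
        (more t))
    (save-excursion
      (goto-char (point-min))
      (while more
        (condition-case nil
            (setq found (compat-scan-form (read (current-buffer))
                                          known found))
          ;; `read' signals end-of-file when no forms remain.
          (end-of-file (setq more nil)))))
    found))
@end example

As the text notes, this over-collects slightly: a quoted list such as
@code{'(foo 1)} is treated as if it were a call to @code{foo}.
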
15325 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
15326 | |
15327 @node Future Work -- Drag-n-Drop, Future Work -- Standard Interface for Enabling Extensions, Future Work -- Elisp Compatibility Package, Future Work | |
15328 @section Future Work -- Drag-n-Drop | |
15329 @cindex future work, drag-n-drop | |
15330 @cindex drag-n-drop, future work | |
15331 | |
15332 @strong{Abstract:} I propose completely redoing the drag-n-drop | |
15333 interface to make it powerful and extensible enough to support such | |
15334 concepts as drag over and drag under visuals and context menus invoked | |
15335 when a drag is done with the right mouse button, to allow drop handlers | |
15336 to be defined for all sorts of graphical elements including buffers, | |
15337 extents, mode lines, toolbar items, menubar items, glyphs, etc., and to | |
15338 allow different packages to add and remove drop handlers for the same | |
15339 drop sites without interfering with each other. The changes are | |
15340 extensive enough that I think they can only be implemented in version | |
15341 22, and the drag-n-drop interface should remain experimental until then. | |
15342 | |
15343 The new drag-n-drop interface centers around the twin concepts of | |
15344 @dfn{drop site} and @dfn{drop handler}. A @dfn{drop site} specifies a | |
15345 particular graphical element onto which an object can be dropped, and a | |
15346 @dfn{drop handler} encapsulates all of the behavior that happens when | |
15347 such an object is dragged over and dropped onto a drop site. | |
15348 | |
15349 Each drop site has an object associated with it which is passed to | |
15350 functions that are part of the drop handlers associated with that site. | |
15351 The type of this object depends on the graphical element that comprises | |
15352 the drop site. The drop site object can be a buffer, an extent, a | |
15353 glyph, a menu path, a toolbar item path, etc. (These last two object | |
15354 types are defined in @uref{lisp-interface.html,Lisp Interface Changes} | |
15355 in the sections on menu and toolbar API changes. If we wanted to allow | |
15356 drops onto other kinds of drop sites, for example mode lines, we would | |
15357 have to create corresponding path objects). Each such object type | |
15358 should be able to be accessed using the generalized property interface | |
15359 defined above, and should have a property called @code{drop-handlers} | |
15360 associated with it that specifies all of the drop handlers associated | |
15361 with the drop site. Normally, this property is not accessed directly, | |
15362 but instead by using the drop handler API defined below, and Lisp | |
15363 packages should not make any assumptions about the format of the data | |
15364 contained in the @code{drop-handlers} property. | |
15365 | |
15366 Each drop handler has an object of type @code{drop-handler} associated | |
15367 with it, whose primary purpose is to be a container for the various | |
15368 properties associated with a particular drop handler. These could | |
15369 include, for example, a function invoked when the drop occurs, a context | |
15370 menu invoked when a drop occurs as a result of a drag with the right | |
15371 mouse button, functions invoked when a dragged object enters, leaves, or | |
15372 moves within a drop site, the shape that the mouse pointer changes to | |
15373 when an object is dragged over a drop site that allows this particular | |
15374 object to be dropped onto it, the MIME types (actually a regular | |
15375 expression matching the MIME types) of the allowable objects that can be | |
15376 dropped onto the drop site, a @dfn{package tag} (a symbol specifying the | |
15377 package that created the drop handler, used for identification | |
15378 purposes), etc. The drop handler object is passed to the functions that | |
15379 are invoked as a result of a drag or a drop, most likely indirectly as | |
15380 one of the properties of the drag or drop event passed to the function. | |
15381 Properties of a drop handler object are accessed and modified in the | |
15382 standard fashion using the generalized property interface. | |
15383 | |
15384 A drop handler is added to a drop site using the @code{add-drop-handler} | |
15385 function. The drop handler itself can either be created separately | |
15386 using the @code{make-drop-handler} function and then passed in as one of | |
15387 the parameters to @code{add-drop-handler}, or it will be created | |
15388 automatically by the @code{add-drop-handler} function, if the drop | |
15389 handler argument is omitted, but keyword arguments corresponding to the | |
15390 valid keyword properties for a drop handler are specified in the | |
15391 @code{add-drop-handler} call. Other functions, such as | |
15392 @code{find-drop-handler}, @code{add-drop-handler} (when specifying a | |
15393 drop handler before which the drop handler in question is to be added), | |
15394 @code{remove-drop-handler} etc. should be defined with obvious | |
15395 semantics. All of these functions take or return a drop site object | |
15396 which, as mentioned above, can be one of several object types | |
15397 corresponding to graphical elements. Defined drop handler functions | |
15398 locate a particular drop handler using either the @code{MIME-type} or | |
15399 @code{package-tag} property of the drop handler, as defined above. | |
15400 | |
15401 Logically, the drop handlers associated with a particular drop site are | |
15402 an ordered list. The first drop handler whose specified MIME type | |
15403 matches the MIME type of the object being dragged or dropped controls | |
15404 what happens to this object. This is important particularly because the | |
15405 specified MIME type of the drop handler can be a regular expression | |
15406 that, for example, matches all audio objects with any sub-type. | |
15407 | |
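The first-match rule can be sketched as below.  The proposal
deliberately leaves the format of the @code{drop-handlers} property
unspecified, so the list of (@var{regexp} . @var{handler}) pairs used
here, and the function and handler names, are assumptions made purely
for illustration.

@example
(defun drop-site-find-handler (handlers mime-type)
  "Return the first pair in HANDLERS whose regexp matches MIME-TYPE.
HANDLERS is an ordered list of (REGEXP . HANDLER) pairs."
  (let ((result nil))
    (while (and handlers (not result))
      (if (string-match (car (car handlers)) mime-type)
          (setq result (car handlers)))
      (setq handlers (cdr handlers)))
    result))

;; The audio handler comes first and matches any audio subtype,
;; so it is the one selected for "audio/x-wav".
(drop-site-find-handler
 '(("\\`audio/" . hypothetical-play-sound)
   ("\\`text/plain\\'" . hypothetical-insert-text))
 "audio/x-wav")
@end example
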
15408 In the current drag-n-drop API, there is a distinction made between | |
15409 objects with an associated MIME type and objects with an associated URL. | |
15410 I think that this distinction is arbitrary, and should not exist. All | |
15411 objects should have a MIME type associated with them, and a new | |
15412 XEmacs-specific MIME type should be defined for URLs, file names, | |
15413 etc. as necessary. I am not even sure that this is necessary, however, | |
15414 as the MIME specification may specify a general concept of a pointer or | |
15415 link to an object, which is exactly what we want. Also in some cases | |
15416 (for example, the name of a file that is locally available), the pointer | |
15417 or link will have another MIME type associated with it, which is the | |
15418 type of the object that is being pointed to. I am not quite sure how we | |
15419 should handle URL and file name objects being dragged, but I am positive | |
15420 that it needs to be integrated with the mechanism used when an object | |
15421 itself is being dragged or dropped. | |
15422 | |
15423 As is described in @uref{misc-user-event.html,a separate page}, the | |
15424 @code{misc-user-event} event type should be removed and split up into a | |
15425 number of separate event types. Two such event types would be | |
15426 @code{drag-event} and @code{drop-event}. A drop event is used when an | |
15427 object is actually dropped, and a drag event is used if a function is | |
15428 invoked as part of the dragging process. (Such a function would | |
15429 typically be used to control what are called @dfn{drag under visuals}, | |
15430 which are changes to the appearance of the drop site reflecting the fact | |
15431 that a compatible object is being dragged over it). The drag events and | |
15432 drop events encapsulate all of the information that is pertinent to the | |
15433 drag or drop action occurring, including such information as the actual | |
15434 MIME type of the object in question, the drop handler that caused a | |
15435 function to be invoked, the mouse event (or possibly even a keyboard | |
15436 event) corresponding to the user's action that is causing the drag or | |
15437 drop, etc. This event is always passed to any function that is invoked | |
15438 as a result of the drag or drop. There should never be any need to | |
15439 refer to the @code{current-mouse-event} variable, and in fact, this | |
15440 variable should not be changed at all during a drag or a drop. | |
15441 | |
15442 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
15443 | |
15444 @node Future Work -- Standard Interface for Enabling Extensions, Future Work -- Better Initialization File Scheme, Future Work -- Drag-n-Drop, Future Work | |
15445 @section Future Work -- Standard Interface for Enabling Extensions | |
15446 @cindex future work, standard interface for enabling extensions | |
15447 @cindex standard interface for enabling extensions, future work | |
15448 | |
15449 @strong{Abstract:} Apparently, if you know the name of a package (for | |
15450 example, @code{fusion}), you can load it using the @code{require} | |
15451 function, but there's no standard way to turn it on or turn it off. The | |
15452 only way to figure out how to do that is to go read the source file, | |
15453 where hopefully the comments at the start tell you the appropriate magic | |
15454 incantations that you need to run in order to turn the extension on or | |
15455 off. There really need to be standard functions, such as | |
15456 @code{enable-extension} and @code{disable-extension}, to do this sort of | |
15457 thing. It seems like a glaring omission that this isn't currently | |
15458 present, and it's really surprising to me that nobody has remarked on | |
15459 this. | |
15460 | |
15461 The easy part of this is defining the interface, and I think it should | |
15462 be done as soon as possible. When the package is loaded, it simply | |
15463 calls some standard function in the package system, and passes it the | |
15464 names of enable and disable functions, or perhaps just one function that | |
15465 takes an argument specifying whether to enable or disable. In any case, | |
15466 this data is kept in a table which is used by the | |
15467 @code{enable-extension} and @code{disable-extension} functions. There | |
15468 should also be functions such as @code{extension-enabled-p} and | |
15469 @code{enabled-extension-list}, and so on with obvious semantics. The | |
15470 hard part is actually getting packages to obey this standard interface, | |
15471 but this is mitigated by the fact that the changes needed to support | |
15472 this interface are so simple. | |
15473 | |
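A minimal sketch of such a table, assuming a registration function
called @code{register-extension} (a name invented here) that each
package calls when it is loaded:

@example
(defvar extension-alist nil
  "Alist of (NAME ENABLE-FUNCTION . DISABLE-FUNCTION).")

(defvar extension-enabled nil
  "List of the names of currently enabled extensions.")

(defun register-extension (name enable-fn disable-fn)
  "Record ENABLE-FN and DISABLE-FN for the extension NAME."
  (setq extension-alist
        (cons (cons name (cons enable-fn disable-fn))
              (delq (assq name extension-alist) extension-alist))))

(defun enable-extension (name)
  "Enable the extension NAME unless it is already enabled."
  (let ((entry (assq name extension-alist)))
    (or entry (error "Unknown extension: %s" name))
    (if (memq name extension-enabled)
        nil
      (funcall (car (cdr entry)))
      (setq extension-enabled (cons name extension-enabled)))))

(defun disable-extension (name)
  "Disable the extension NAME if it is currently enabled."
  (let ((entry (assq name extension-alist)))
    (or entry (error "Unknown extension: %s" name))
    (if (memq name extension-enabled)
        (progn
          (funcall (cdr (cdr entry)))
          (setq extension-enabled
                (delq name extension-enabled))))))

(defun extension-enabled-p (name)
  "Return t if the extension NAME is currently enabled."
  (and (memq name extension-enabled) t))

(defun enabled-extension-list ()
  "Return a fresh list of the enabled extension names."
  (copy-sequence extension-enabled))
@end example
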
15474 I have been conceiving of these enabling and disabling functions as | |
15475 turning the feature on or off globally. It's probably also useful to | |
15476 have a standard interface for turning an extension on or off in just a | |
15477 particular buffer. Perhaps then the appropriate interface would involve | |
15478 registering a single function that takes an argument that specifies | |
15479 various things, such as turn off globally, turn on globally, turn on or | |
15480 off in the current buffer, etc. | |
15481 | |
15482 Part of this interface should specify the correct way to define global | |
15483 key bindings. The correct rule for this, of course, is that the key | |
15484 bindings should not happen when the package is loaded, which is often | |
15485 how things are currently done, but only when the extension is actually | |
15486 enabled. The key bindings should go away when the extension is | |
15487 disabled. I think that in order to support this properly, we should | |
15488 expand the keymap interface slightly, so that, in addition to the other | |
15489 properties associated with each key binding, there is a list of shadow | |
15490 bindings. Then there should be a function called | |
15491 @code{define-key-shadowing}, which is just like @code{define-key} but | |
15492 which also remembers the previous key binding in a shadow list. Then | |
15493 there can be another function, something like @code{undefine-key}, which | |
15494 restores the binding to the most recently added item on the shadow list. | |
15495 There are already hash tables associated with each key binding, and it | |
15496 should be easy to stuff additional values, such as a shadow list, into | |
15497 the hash table. Probably there should also be functions called | |
15498 @code{global-set-key-shadowing} and @code{global-unset-key-shadowing} | |
15499 with obvious semantics. | |
15500 | |
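The shadowing behavior can be sketched without touching the keymap
internals.  Note that this sketch keeps the shadow lists in a separate
alist rather than in the per-binding hash tables suggested above, so it
illustrates the semantics only, not the proposed implementation.

@example
(defvar key-shadow-table nil
  "Alist mapping a keymap to an alist of (KEY . SHADOWED-BINDINGS).")

(defun define-key-shadowing (keymap key def)
  "Like `define-key', but remember the binding being replaced."
  (let ((map-entry (assq keymap key-shadow-table))
        key-entry)
    (if (not map-entry)
        (progn
          (setq map-entry (list keymap))
          (setq key-shadow-table
                (cons map-entry key-shadow-table))))
    (setq key-entry (assoc key (cdr map-entry)))
    (if (not key-entry)
        (progn
          (setq key-entry (list key))
          (setcdr map-entry (cons key-entry (cdr map-entry)))))
    ;; Push the previous binding onto this key's shadow list.
    (setcdr key-entry
            (cons (lookup-key keymap key) (cdr key-entry)))
    (define-key keymap key def)))

(defun undefine-key (keymap key)
  "Restore the most recently shadowed binding of KEY in KEYMAP."
  (let* ((map-entry (assq keymap key-shadow-table))
         (key-entry (and map-entry
                         (assoc key (cdr map-entry)))))
    (if (and key-entry (cdr key-entry))
        (progn
          (define-key keymap key (car (cdr key-entry)))
          (setcdr key-entry (cdr (cdr key-entry))))
      ;; Nothing was shadowed: just remove the binding.
      (define-key keymap key nil))))
@end example
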
15501 Once this interface is defined, it should be easy to expand the custom | |
15502 package so it knows about this interface. Then it will be possible to | |
15503 put all sorts of extensions on the options menu so that they could be | |
15504 turned off and turned on very easily, and then when you save the options | |
15505 out to a file, the desired settings for whether these extensions are | |
15506 enabled or not are saved out with it. A whole lot of custom junk that's | |
15507 been added to a lot of different packages could be removed. After doing | |
15508 this, we might want to think of a way to classify extensions according | |
15509 to how likely we think the user will want to use them. This way we can | |
15510 avoid the problem of having a list of 100 extensions and the user not | |
15511 being able to figure out which ones might be useful. Perhaps the most | |
15512 useful extensions would appear immediately on the extensions menu, and | |
15513 the less useful ones would appear in a submenu of that, and another | |
15514 submenu might contain even less useful extensions. Of course the | |
15515 package authors might not be too happy with this, but the users probably | |
15516 will be. I think this at least deserves a thought, although it's | |
15517 possible you might simply want to maintain a list of extensions on the | |
15518 web site, with a judgment on, first, how commonly a user might want | |
15519 each extension and, second, how well written and bug-free | |
15520 the package is. Both of these sorts of judgments could be obtained by | |
15521 doing user surveys if need be. | |
15522 | |
15523 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
15524 | |
15525 @node Future Work -- Better Initialization File Scheme, Future Work -- Keyword Parameters, Future Work -- Standard Interface for Enabling Extensions, Future Work | |
15526 @section Future Work -- Better Initialization File Scheme | |
15527 @cindex future work, better initialization file scheme | |
15528 @cindex better initialization file scheme, future work | |
15529 | |
15530 @strong{Abstract:} A proposal is outlined for converting XEmacs to use | |
15531 the @code{.xemacs} subdirectory for its initialization files instead of | |
15532 putting them in the user's home directory. In the process, a general | |
15533 pre-initialization scheme is created whereby all of the initialization | |
15534 parameters, such as the location of the initialization files, whether | |
15535 these files are loaded or not, where the initial frame is created, | |
15536 etc., that are currently specified by command line arguments, by | |
15537 environment variables, and by other means, can be specified in a uniform | |
15538 way using Lisp code. Reasonable default behavior for everything will | |
15539 still be provided, and the older, simpler means can be used if desired. | |
15540 Compatibility with the current location and name of the initialization | |
15541 file, and the current ill-chosen use for the @code{.xemacs} directory is | |
15542 maintained, and the problem of how to gracefully migrate a user from the | |
15543 old scheme into the new scheme while still allowing the user to use GNU | |
15544 Emacs or older versions of XEmacs is solved. A proposal for changing | |
15545 the way that the initial frame is mapped is also outlined; this would | |
15546 allow the user's initialization file to control the way that the initial | |
15547 frame appears without resorting to hacks, while still making echo area | |
15548 messages visible as they appear, and allowing the user to debug errors | |
15549 in the initialization file. | |
15550 | |
15551 @subheading Principles in the new scheme | |
15552 | |
15553 @enumerate | |
15554 @item | |
15555 | |
15556 XEmacs has a defined @dfn{pre-initialization process}. This process, | |
15557 whose purpose is to compute the values of the parameters that control | |
15558 how the initialization process proceeds, occurs as early as possible | |
15559 after the Lisp engine has been initialized, and in particular, it occurs | |
15560 before any devices have been opened, or before any initialization | |
15561 parameters are set that could reasonably be expected to be changed. In | |
15562 fact, the pre-initialization process should take care of setting these | |
15563 parameters. The code that implements the pre-initialization process | |
15564 should be written in Lisp and should be called from the Lisp function | |
15565 @code{normal-top-level}, and the general way that the user customizes | |
15566 this process should also be done using Lisp code. | |
15567 | |
15568 @item | |
15569 | |
15570 The pre-initialization process involves a number of properties, for | |
15571 example the directory containing the user initialization files (normally | |
15572 the @code{.xemacs} subdirectory), the name of the user init file, the | |
15573 name of the custom init file, where and what type the initial device is, | |
15574 whether and when the initial frame is mapped, etc. A standard interface | |
15575 is provided for getting and setting the values of these properties using | |
15576 functions such as @code{set-pre-init-property}, | |
15577 @code{pre-init-property}, etc. At various points during the | |
15578 pre-initialization process, the value of many of these properties can be | |
15579 undecided, which means that at the end of the process, the value of | |
15580 these properties will be derived from other properties in some fashion | |
15581 that is specific to each property. | |
15582 | |
15583 @item | |
15584 | |
15585 The default values of these properties are set first from the registry | |
15586 under Windows, then from environment variables, then from command line | |
15587 switches, such as @code{-q} and @code{-nw}. | |
15588 | |
15589 @item | |
15590 | |
15591 One of the command line switches is @code{-pre-init}, whose value is a | |
15592 Lisp expression to be evaluated at pre-initialization time, similar to | |
15593 the @code{-eval} command line switch. This allows any | |
15594 pre-initialization property to be set from the command line. | |
15595 | |
15596 @item | |
15597 | |
15598 Let's define the term @dfn{to determine a pre-initialization property} to | |
15599 mean that if the value of a property is undetermined, it is computed and set | |
15600 according to a rule that is specific to the property. Then, after the | |
15601 pre-init properties are initialized from the registry, from the | |
15602 environment variables, and from command line arguments, two of the pre-init | |
15603 properties (specifically the init file directory and the location of the | |
15604 @dfn{pre-init file}) are determined. The purpose of the pre-init file is | |
15605 to contain Lisp code that is run at pre-initialization time, and to | |
15606 control how the initialization proceeds. It is a bit similar to the | |
15607 standard init file, but the code in the pre-init file shouldn't do | |
15608 anything other than set pre-init properties. Executing any code that | |
15609 does I/O might not produce expected results because the only device that | |
15610 will exist at the time is probably a stream device connected to the | |
15611 standard I/O of the XEmacs process. | |
15612 | |
15613 @item | |
15614 | |
15615 After the pre-init file has been run, all of the rest of the pre-init | |
15616 properties are determined, and these values are then used to control the | |
15617 initialization process. Some of the rules used in determining specific | |
15618 properties are: | |
15619 | |
15620 @enumerate | |
15621 @item | |
15622 | |
15623 If the @code{.xemacs} sub-directory exists, and it's not obviously a | |
15624 package root (which probably means that it contains a file like | |
15625 @code{init.el} or @code{pre-init.el} or, if neither of those files is | |
15626 present, that it doesn't contain any sub-directories or files that look | |
15627 like what would be in a package root), then it becomes the value of the | |
15628 init file directory. Otherwise the user's home directory is used. | |
15629 @item | |
15630 | |
15631 | |
15632 If the init file directory is the user's home directory, then the init | |
15633 file is called @code{.emacs}. Otherwise, it's called @code{init.el}. | |
15634 @item | |
15635 | |
15636 | |
15637 If the init file directory is the user's home directory, then the | |
15638 pre-init file is called @code{.xemacs-pre-init.el}. Otherwise it's | |
15639 called @code{pre-init.el}. (One of the reasons for this rule has to do | |
15640 with the dialog box that might be displayed at startup. This will be | |
15641 described below.) | |
15642 @item | |
15643 | |
15644 | |
15645 If the init file directory is the user's home directory, then the custom | |
15646 init file is called @code{.xemacs-custom-init.el}. Otherwise, it's | |
15647 called @code{custom-init.el}. | |
15648 | |
15649 @end enumerate | |
15650 | |
15651 @item | |
15652 | |
15653 After the first normal device is created, but before any frames are | |
15654 created on it, the XEmacs initialization code checks to see if the old | |
15655 init file scheme is being used, which is to say that the init file | |
15656 directory is the same as the user's home directory. If that's the case, | |
15657 then normally a dialog box comes up (or a question is asked on the | |
15658 terminal if XEmacs is being run in a non-windowing mode) which asks if | |
15659 the user wants to migrate his initialization files to the new scheme. | |
15660 The possible responses are @strong{Yes}, @strong{No}, and @strong{No, | |
15661 and don't ask this again}. If this last response is chosen, then the | |
15662 file @code{.xemacs-pre-init.el} in the user's home directory is created | |
15663 or appended to with a line of Lisp code that sets up a pre-init property | |
15664 indicating that this dialog box shouldn't come up again. If the | |
15665 @strong{Yes} option is chosen, then any package root files in | |
15666 @code{.xemacs} are moved into @code{.xemacs/packages}, the file | |
15667 @code{.emacs} is moved into @code{.xemacs/init.el} and @code{.emacs} in | |
15668 the home directory becomes a symlink to this file. This way some | |
15669 compatibility is still maintained with GNU Emacs and older versions of | |
15670 XEmacs. The code that implements this has to be written very carefully | |
15671 to make sure that it doesn't accidentally delete or mess up any of the | |
15672 files that get moved around. | |
15673 | |
15674 @end enumerate | |
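
The file-naming rules in the nested list above can be summarized in a
short C sketch. It is illustrative only: the structure and function
names are hypothetical, paths are joined with @samp{/} as on Unix, the
buffer size is arbitrary, and the decision whether the @code{.xemacs}
directory exists and is not a package root is assumed to have been made
by the caller and passed in as a flag.

```c
#include <stdio.h>

#define PATH_LEN 512   /* arbitrary buffer size for this sketch */

struct init_files
{
  char dir[PATH_LEN];       /* init file directory */
  char init[PATH_LEN];      /* user init file */
  char pre_init[PATH_LEN];  /* pre-init file */
  char custom[PATH_LEN];    /* custom init file */
};

/* Fill OUT with the file locations implied by the rules in the text.
   HOME is the user's home directory.  DOT_XEMACS_USABLE is nonzero when
   HOME/.xemacs exists and is not obviously a package root. */
void
determine_init_files (const char *home, int dot_xemacs_usable,
                      struct init_files *out)
{
  if (dot_xemacs_usable)
    {
      /* New scheme: everything lives in the .xemacs subdirectory. */
      snprintf (out->dir, PATH_LEN, "%s/.xemacs", home);
      snprintf (out->init, PATH_LEN, "%s/.xemacs/init.el", home);
      snprintf (out->pre_init, PATH_LEN, "%s/.xemacs/pre-init.el", home);
      snprintf (out->custom, PATH_LEN, "%s/.xemacs/custom-init.el", home);
    }
  else
    {
      /* Old scheme: dot files directly in the home directory. */
      snprintf (out->dir, PATH_LEN, "%s", home);
      snprintf (out->init, PATH_LEN, "%s/.emacs", home);
      snprintf (out->pre_init, PATH_LEN, "%s/.xemacs-pre-init.el", home);
      snprintf (out->custom, PATH_LEN, "%s/.xemacs-custom-init.el", home);
    }
}
```

In the proposal these values are pre-init properties that the pre-init
file or the command line may already have set, so a real implementation
would apply each rule only to a property that is still undetermined.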
15675 | |
15676 @subheading The custom init file | |
15677 | |
15678 The @dfn{custom init file} is where the custom package writes its | |
15679 options. This obviously needs to be a separate file from the standard | |
15680 init file. It should also be loaded before the init file rather than | |
15681 after, as is usually done currently, so that the init file can override | |
15682 these options if it wants to. | |
15683 | |
15684 @subheading Frame mapping | |
15685 | |
15686 In addition to the above scheme, the way that XEmacs handles mapping the | |
15687 initial frame should be changed. However, this change perhaps should be | |
15688 delayed to a later version of XEmacs because of the user visible changes | |
15689 that it entails and the possible breakage in people's init files that | |
15690 might occur. (For example, if the rest of the scheme is implemented in | |
15691 21.2, then this part of the scheme might want to be delayed until | |
15692 version 22.) The basic idea is that the initial frame is not created | |
15693 before the initialization file is run, but instead a banner frame is | |
15694 created containing the XEmacs logo, a button that allows the user to | |
15695 cancel the execution of the init file, and an area where messages that | |
15696 are output in the process of running this file are displayed. This area | |
15697 should contain a number of lines, which makes it better than the current | |
15698 scheme where only the last message is visible. After the init file is | |
15699 done, the initial frame is mapped. This way the init file can make face | |
15700 changes and other such modifications that affect the initial frame and then | |
15701 have the initial frame correctly come up with these changes and not see | |
15702 any frame dancing or other problems that exist currently. | |
15703 | |
15704 There should be a function that allows the initialization file to | |
15705 explicitly create and map the first frame if it wants to. There should | |
15706 also be a pre-init property that controls whether the banner frame | |
15707 appears (of course it defaults to true), a property controlling when the | |
15708 initial frame is created (before or after the init file, defaulting to | |
15709 after), and a property controlling whether the initial frame is mapped | |
15710 (normally true, but will be false if the @code{-unmapped} command line | |
15711 argument is given). | |
15712 | |
15713 If an error occurs in the init file, then the initial frame should | |
15714 always be created and mapped at that time so that the error is displayed | |
15715 and the debugger has a place to be invoked. | |
15716 | |
15717 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
15718 | |
15719 @node Future Work -- Keyword Parameters, Future Work -- Property Interface Changes, Future Work -- Better Initialization File Scheme, Future Work | |
15720 @section Future Work -- Keyword Parameters | |
15721 @cindex future work, keyword parameters | |
15722 @cindex keyword parameters, future work | |
15723 | |
15724 NOTE: These changes are partly motivated by the various user-interface | |
15725 changes elsewhere in this document, and partly for Mule support. In | |
15726 general the various APIs in this document would benefit greatly from | |
15727 built-in keywords. | |
15728 | |
15729 I would like to make keyword parameters an integral part of Elisp. The | |
15730 idea here is that you use the @code{&key} identifier in the | |
15731 parameter list of a function and all of the following parameters | |
15732 specified are keyword parameters. This means that when these arguments | |
15733 are specified in a function call, they are immediately preceded in the | |
15734 argument list by a @dfn{keyword}, which is a symbol beginning with the | |
15735 `:' character. This allows any argument to be specified independently | |
15736 of any other argument with no need to place the arguments in any | |
15737 particular order. This is particularly useful for functions that take | |
15738 many optional parameters; using keyword parameters makes the code much | |
15739 cleaner and easier to understand. | |
15740 | |
15741 The @code{cl} package already provides keyword parameters of a sort, but | |
15742 I would like to make this more integrated and useable in a standard | |
15743 fashion. The interface that I am proposing is essentially compatible | |
15744 with the keyword interface in Common Lisp, but it may be a subset of the | |
15745 Common Lisp functionality, especially in the first implementation. | |
15746 There is one departure from the Common Lisp specification that I would | |
15747 like to make in order to make it much easier to add keyword parameters | |
15748 to existing functions with optional parameters, and in general, to make | |
15749 optional and keyword parameters coexist more easily. The Common Lisp | |
15750 specification indicates that if a function has both optional and keyword | |
15751 parameters, the optional parameters are always processed before the | |
15752 keyword parameters. This means, for example, that if a function has | |
15753 three required parameters, two optional parameters, and some number of | |
15754 keyword parameters following, and the program attempts to call this | |
15755 function by passing in the three required arguments, and then some | |
15756 keyword arguments, the first keyword specified and the argument | |
15757 following it get assigned to the first and second optional parameters as | |
15758 specified in the function definition. This is certainly not what is | |
15759 intended, and means that if a function defines both optional and keyword | |
15760 parameters, any calls of this function must specify @code{nil} for all | |
15761 of the optional arguments before using any keywords. If the function | |
15762 definition is later changed to add more optional parameters, all | |
15763 existing calls to this function that use any keyword arguments will | |
15764 break. This problem goes away if we simply process keyword parameters | |
15765 before the optional parameters. | |
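
To make the difference concrete, here is a small C model of argument
binding in which keywords are processed before optional parameters. It
is one possible reading of the rule, not the XEmacs @code{funcall}
implementation: arguments are modeled as strings, an argument counts as
a keyword when it starts with @samp{:}, the first keyword after the
required arguments is assumed to start the keyword/value pairs, and all
names (@code{struct lambda_list}, @code{struct bound_args},
@code{bind_args}) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 8   /* arbitrary limit; counts below must not exceed it */

struct lambda_list
{
  int n_required;
  int n_optional;
  int n_keys;
  const char *key_names[MAX_PARAMS];   /* e.g. ":size" */
};

struct bound_args
{
  const char *required[MAX_PARAMS];
  const char *optional[MAX_PARAMS];    /* NULL models nil */
  const char *key_values[MAX_PARAMS];  /* NULL models nil */
};

/* Bind ARGV against LL, handling keywords before optional parameters.
   Returns 0 on success, -1 for a malformed call. */
int
bind_args (const struct lambda_list *ll, int argc, const char **argv,
           struct bound_args *out)
{
  int i, k, first_key;

  for (i = 0; i < MAX_PARAMS; i++)
    out->required[i] = out->optional[i] = out->key_values[i] = NULL;

  if (argc < ll->n_required)
    return -1;
  for (i = 0; i < ll->n_required; i++)
    out->required[i] = argv[i];

  /* Keywords first: locate the start of the keyword/value pairs. */
  first_key = argc;
  for (i = ll->n_required; i < argc; i++)
    if (argv[i][0] == ':')
      {
        first_key = i;
        break;
      }

  for (i = first_key; i < argc; i += 2)
    {
      if (i + 1 >= argc)
        return -1;                     /* keyword without a value */
      for (k = 0; k < ll->n_keys; k++)
        if (strcmp (argv[i], ll->key_names[k]) == 0)
          break;
      if (k == ll->n_keys)
        return -1;                     /* unknown keyword */
      out->key_values[k] = argv[i + 1];
    }

  /* Only what lies between the required arguments and the first
     keyword is bound to the optional parameters. */
  if (first_key - ll->n_required > ll->n_optional)
    return -1;
  for (i = ll->n_required; i < first_key; i++)
    out->optional[i - ll->n_required] = argv[i];
  return 0;
}
```

With three required and two optional parameters, a call that passes
three arguments followed by @code{:size 10} leaves both optional slots
at nil and binds @code{:size}, instead of consuming @code{:size} and
@code{10} as the optional arguments. A consequence of this reading is
that an optional argument can never itself be a keyword symbol.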
15766 | |
15767 The primary changes needed to support the keyword syntax are: | |
15768 | |
15769 @enumerate | |
15770 @item | |
15771 | |
15772 The subr object type needs to be modified to contain additional slots | |
15773 for the number and names of any keyword parameters. | |
15774 @item | |
15775 | |
15776 | |
15777 The implementation of the @code{funcall} function needs to be modified | |
15778 so that it knows how to process keyword parameters. This is the only | |
15779 place that will require very much intricate coding, and much of the | |
15780 logic that would need to be added can be lifted directly from the | |
15781 @code{cl} code. | |
15782 @item | |
15783 | |
15784 | |
15785 A new macro, similar to the @code{DEFUN} macro, and probably called | |
15786 @code{DEFUN_WITH_KEYWORDS}, needs to be defined so that built-in Lisp | |
15787 primitives containing keywords can be created. Now, the | |
15788 @code{DEFUN_WITH_KEYWORDS} macro should take an additional parameter | |
15789 which is a string, which consists of the part of the lambda list | |
15790 declaration for this primitive that begins with the @code{&key} | |
15791 specifier. This string is parsed in the @code{DEFSUBR} macro during | |
15792 XEmacs initialization, and is converted into the appropriate structure | |
15793 that needs to be stored into the subr object. In addition, the | |
15794 @var{max_args} parameter of the @code{DEFUN} macro needs to be | |
15795 incremented by the number of keyword parameters and these parameters are | |
15796 passed to the C function simply as extra parameters at the end. The | |
15797 @code{DEFSUBR} macro can sort out the actual number of required, | |
15798 optional and keyword parameters that the function takes, once it has | |
15799 parsed the keyword parameter string. (An alternative that might make | |
15800 the declaration of a primitive a little bit easier to understand would | |
15801 involve adding another parameter to the @code{DEFUN_WITH_KEYWORDS} macro | |
15802 that specifies the number of keyword parameters. However, this would | |
15803 require some additional complexity in the preprocessor definition of the | |
15804 @code{DEFUN_WITH_KEYWORDS} macro, and probably isn't worth | |
15805 implementing.) | |
15806 @item | |
15807 | |
15808 | |
15809 The byte compiler would have to be modified slightly so that it knows | |
15810 about keyword parameters when it parses the parameter declaration of a | |
15811 function, for example so that it issues the correct warnings | |
15812 concerning calls to that function with incorrect arguments. | |
15813 @item | |
15814 | |
15815 | |
15816 The @code{make-docfile} program would have to be modified so that it | |
15817 generates the correct parameter lists for primitives defined using the | |
15818 @code{DEFUN_WITH_KEYWORDS} macro. | |
15819 @item | |
15820 | |
15821 | |
15822 Possibly other aspects of the help system that deal with function | |
15823 descriptions might have to be modified. | |
15824 @item | |
15825 | |
15826 | |
15827 A helper function might need to be defined to make it easier for | |
15828 primitives that use both the @code{&rest} and @code{&key} | |
15829 specifiers to parse their argument lists. | |
15830 | |
15831 @end enumerate | |
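
The parsing step described for @code{DEFSUBR} in the list above might
look roughly like the following stand-alone C sketch. It is not XEmacs
code: the names @code{struct keyword_spec}, @code{parse_keyword_spec}
and @code{c_arg_count} and the size limits are hypothetical, and the
keyword string is assumed to be a plain list of names after
@code{&key}, with no default-value forms.

```c
#include <ctype.h>
#include <string.h>

#define MAX_KEYWORDS 16   /* arbitrary limits for this sketch */
#define MAX_NAME_LEN 32

struct keyword_spec
{
  int n_keys;
  char names[MAX_KEYWORDS][MAX_NAME_LEN];
};

/* Parse a lambda-list tail such as "&key ichi ni san" into OUT.
   Returns 0 on success, -1 if SPEC does not start with "&key" or
   exceeds the limits above. */
int
parse_keyword_spec (const char *spec, struct keyword_spec *out)
{
  const char *p = spec;

  out->n_keys = 0;
  while (isspace ((unsigned char) *p))
    p++;
  if (strncmp (p, "&key", 4) != 0
      || (p[4] != '\0' && !isspace ((unsigned char) p[4])))
    return -1;
  p += 4;

  for (;;)
    {
      size_t len = 0;

      while (isspace ((unsigned char) *p))
        p++;
      if (*p == '\0')
        break;
      while (p[len] != '\0' && !isspace ((unsigned char) p[len]))
        len++;
      if (len >= MAX_NAME_LEN || out->n_keys == MAX_KEYWORDS)
        return -1;
      memcpy (out->names[out->n_keys], p, len);
      out->names[out->n_keys][len] = '\0';
      out->n_keys++;
      p += len;
    }
  return 0;
}

/* Number of parameters the C function receives when the keyword
   values are passed as extra parameters after the MAX_ARGS ordinary
   ones, as the text describes. */
int
c_arg_count (int max_args, const struct keyword_spec *ks)
{
  return max_args + ks->n_keys;
}
```

For a primitive with five ordinary parameters and the string
@code{"&key ichi ni san"}, this model yields three keyword names and a
C function of eight parameters.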
15832 | |
15833 @subheading Internal API for C primitives with keywords - necessary for many of the new Mule APIs being defined. | |
15834 | |
15835 @example | |
15836 DEFUN_WITH_KEYWORDS (Ffoo, "foo", 2, 5, 6, ALLOW_OTHER_KEYWORDS, | |
15837 (ichi, ARG_NIL), (ni, ARG_NIL), (san, ARG_UNBOUND), 0, | |
15838 (arg1, arg2, arg3, arg4, arg5) | |
15839 ) | |
15840 @{ | |
15841 ... | |
15842 @} | |
15843 | |
15844 -> C fun of 12 args: | |
15845 | |
15846 (arg1, ... arg5, ichi, ..., roku, other keywords) | |
15847 | |
15848 Circled in blue is actual example declaration | |
15849 | |
15850 DEFUN_WITH_KEYWORDS (Ffoo, "foo", 1,2,0 (bar, baz) <- arg list | |
15851 [ MIN ARGS, MAX ARGS, something that could be REST, SPECIFY_DEFAULT or | |
15852 REST_SPEC] | |
15853 | |
15854 [#KEYWORDS [ ALLOW_OTHER, SPECIFY_DEFAULT, ALLOW_OTHER_SPECIFY_DEFAULT | |
15855 6, ALLOW_OTHER_SPECIFY_DEFAULT, | |
15856 | |
15857 (ichi, 0) (ni, 0), (san, DEFAULT_UNBOUND), (shi, "t"), (go, "5"), | |
15858 (roku, "(current-buffer)") | |
15859 <- specifies arguments, default values (string to be read into Lisp | |
15860 data during init; then forms evalled at fn ref time. | |
15861 | |
15862 ,0 <- [INTERACTIVE SPEC] ) | |
15863 | |
15864 LO = Lisp_Object | |
15865 | |
15866 -> LO Ffoo (LO bar, LO baz, LO ichi, LO ni, LO san, LO shi, LO go, | |
15867 LO roku, int numkeywords, LO *other_keywords) | |
15868 | |
15869 #define DEFUN_WITH_KEYWORDS (fun, funstr, minargs, maxargs, argspec, \ | |
15870 #args, num_keywords, keywordspec, keywords, intspec) \ | |
15871 LO fun (DWK_ARGS (maxargs, args) \ | |
15872 DWK_KEYWORDS (num_keywords, keywordspec, keywords)) | |
15873 | |
15874 #define DWK_KEYWORDS (num_keywords, keywordspec, keywords) \ | |
15875 DWK_KEYWORDS ## keywordspec (keywords) | |
15876 DWK_OTHER_KEYWORDS ## keywordspec) | |
15877 | |
15878 #define DWK_KEYWORDS_ALLOW_OTHER (x,y) | |
15879 DWK_KEYWORDS (x,y) | |
15880 | |
15881 #define DWK_KEYWORDS_ALLOW_OTHER_SPECIFICATIONS (x,y) | |
15882 DWK_KEYWORDS_SPECIFY_DEFAULT (x,y) | |
15883 | |
15884 #define DWK_KEYWORDS_SPECIFY_DEFAULT (numkey, key) | |
15885 ARGLIST_CAR ## numkey key | |
15886 | |
15887 #define ARGLT_GRZ (x,y) LO CAR x, LO CAR y | |
15888 @end example | |
15889 | |
15890 @node Future Work -- Property Interface Changes, Future Work -- Toolbars, Future Work -- Keyword Parameters, Future Work | |
15891 @section Future Work -- Property Interface Changes | |
15892 @cindex future work, property interface changes | |
15893 @cindex property interface changes, future work | |
15894 | |
15895 In my past work on XEmacs, I already expanded the standard property | |
15896 functions of @code{get}, @code{put}, and @code{remprop} to work on | |
15897 objects other than symbols and defined an additional function | |
15898 @code{object-plist} for this interface. I'd like to expand this | |
15899 interface further and advertise it as the standard way to make property | |
15900 changes in objects, especially the new objects that are going to be | |
15901 defined in order to support the added user interface features of version | |
15902 22. My proposed changes are as follows: | |
15903 | |
15904 @enumerate | |
15905 @item | |
15906 | |
15907 A new concept associated with each property called a @dfn{default value} | |
15908 is introduced. (This concept already exists, but not in a well-defined | |
15909 way.) The default value is the value that the property assumes for | |
15910 certain value retrieval functions such as @code{get} when it is | |
15911 @dfn{unbound}, which is to say that its value has not been explicitly | |
15912 specified. Note: the way to make a property unbound is to call | |
15913 @code{remprop}. Note also that for some built-in properties, setting | |
15914 the property to its default value is equivalent to making it unbound. | |
15915 @item | |
15916 | |
15917 | |
15918 The behavior of the @code{get} function is modified. If the @code{get} | |
15919 function is called on a property that is unbound and the third, optional | |
15920 @var{default} argument is @code{nil}, then the default value of the | |
15921 property is returned. If the @var{default} argument is not @code{nil}, | |
15922 then whatever was specified as the value of this argument is returned. | |
15923 For the most part, this is upwardly compatible with the existing | |
15924 definition of @code{get} because all user-defined properties have an | |
15925 initial default value of @code{nil}. Code that calls the @code{get} | |
15926 function and specifies @code{nil} for the @var{default} argument, and | |
15927 expects to get @code{nil} returned if the property is unbound, is almost | |
15928 certainly wrong anyway. | |
15929 @item | |
15930 | |
15931 | |
15932 A new function, @code{get1}, is defined. This function does not take a | |
15933 default argument like the @code{get} function. Instead, if the property | |
15934 is unbound, an error is signaled. Note: @code{get} can be implemented | |
15935 in terms of @code{get1}. | |
15936 @item | |
15937 | |
15938 | |
15939 New functions @code{property-default-value} and @code{property-bound-p} | |
15940 are defined with the obvious semantics. | |
15941 @item | |
15942 | |
15943 | |
15944 An additional function @code{property-built-in-p} is defined which takes | |
15945 two arguments, the first one being a symbol naming an object type, and | |
15946 the second one specifying a property, and indicates whether the property | |
15947 name has a built-in meaning for objects of that type. | |
15948 @item | |
15949 | |
15950 | |
15951 It is not necessary, or even desirable, for all object types to allow | |
15952 user-defined properties. It is always possible to simulate user-defined | |
15953 properties for an object by using a weak hash table. Therefore, whether | |
15954 an object allows a user to define properties or not should depend on the | |
15955 meaning of the object. If an object does not allow user-defined | |
15956 properties, the @code{put} function should signal an error, such as | |
15957 @code{undefined-property}, when given any property other than those that | |
15958 are predefined. | |
15959 @item | |
15960 | |
15961 | |
15962 A function called @code{user-defined-properties-allowed-p} should be | |
15963 defined with the obvious semantics. (See the previous item.) | |
15964 @item | |
15965 | |
15966 | |
15967 Three more functions should be defined, called | |
15968 @code{built-in-property-name-list}, @code{property-name-list}, and | |
15969 @code{user-defined-property-name-list}. | |
15970 | |
15971 @end enumerate | |
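
The intended interplay of bound values, default values, @code{get} and
@code{get1} can be modeled in a few lines of C. This is a sketch under
stated assumptions rather than the actual property code: a single
property is represented by a hypothetical @code{struct property},
values are strings with @code{NULL} standing in for @code{nil}, the
functions carry a @code{prop_} prefix, and signaling an error is
modeled by a return value of -1.

```c
#include <stddef.h>

struct property
{
  const char *name;
  const char *default_value;  /* value assumed while unbound */
  const char *value;          /* meaningful only when BOUND is nonzero */
  int bound;
};

/* Model of get1: store the value in *VALUE_OUT and return 0 if the
   property is bound; return -1 (standing in for an error) if not. */
int
prop_get1 (const struct property *p, const char **value_out)
{
  if (!p->bound)
    return -1;
  *value_out = p->value;
  return 0;
}

/* Model of get, written in terms of get1.  For an unbound property,
   a non-nil DFLT wins; otherwise the property's default value is used. */
const char *
prop_get (const struct property *p, const char *dflt)
{
  const char *v;

  if (prop_get1 (p, &v) == 0)
    return v;
  return dflt != NULL ? dflt : p->default_value;
}

/* Model of put: give the property an explicit value. */
void
prop_put (struct property *p, const char *value)
{
  p->value = value;
  p->bound = 1;
}

/* Model of remprop: make the property unbound again. */
void
prop_remprop (struct property *p)
{
  p->value = NULL;
  p->bound = 0;
}

/* Model of property-bound-p. */
int
prop_bound_p (const struct property *p)
{
  return p->bound;
}
```

For an unbound property whose default value is @code{"80"}, the model
returns @code{"80"} when no default argument is given and the caller's
value when one is; after a @code{put}, the explicit value is returned
in both cases.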
15972 | |
15973 Another idea: | |
15974 | |
15975 @example | |
15976 (define-property-method | |
15977 predicate object-type | |
15978 predicate cons :(KEYWORD) (all lists beginning with KEYWORD) | |
15979 | |
15980 :put putfun | |
15981 :get | |
15982 :remprop | |
15983 :object-props | |
15984 :clear-properties | |
15985 :map-properties | |
15986 | |
15987 e.g. (define-property-method 'hash-table | |
15988 :put #'(lambda (obj key value) (puthash key obj value))) | |
15989 @end example | |
15990 | |
15991 | |
15992 @node Future Work -- Toolbars, Future Work -- Menu API Changes, Future Work -- Property Interface Changes, Future Work | |
15993 @section Future Work -- Toolbars | |
15994 @cindex future work, toolbars | |
15995 @cindex toolbars | |
15996 | |
15997 @menu | |
15998 * Future Work -- Easier Toolbar Customization:: | |
15999 * Future Work -- Toolbar Interface Changes:: | |
16000 @end menu | |
16001 | |
16002 @node Future Work -- Easier Toolbar Customization, Future Work -- Toolbar Interface Changes, Future Work -- Toolbars, Future Work -- Toolbars | |
16003 @subsection Future Work -- Easier Toolbar Customization | |
16004 @cindex future work, easier toolbar customization | |
16005 @cindex easier toolbar customization, future work | |
16006 | |
16007 @strong{Abstract:} One of XEmacs' greatest strengths is its ability to | |
16008 be customized endlessly. Unfortunately, it is often too difficult to | |
16009 figure out how to do this. There has been some recent work like the | |
16010 Custom package, which helps in this regard, but I think there's a lot | |
16011 more work that needs to be done. Here are some ideas (which certainly | |
16012 could use some more thought). | |
16013 | |
16014 Although there is currently an @code{edit-toolbar} package, it is not | |
16015 well integrated with XEmacs, and in general it is much too hard to | |
16016 customize the way toolbars look. I would like to see an interface that | |
16017 works a bit like the way things work under Windows, where you can | |
16018 right-click on a toolbar to get a menu of options that allows you to | |
16019 change aspects of the toolbar. The general idea is that if you | |
16020 right-click on an item itself, you can do things to that item, whereas | |
16021 if you right-click on a blank part of a toolbar, you can change the | |
16022 properties of the toolbar. Some of the items on the right-click menu | |
16023 for a particular toolbar button should be specified by the button | |
16024 itself. Others should be standard. For example, there should be an | |
16025 @strong{Execute} item which simply does what would happen if you | |
16026 left-click on a toolbar button. There should probably be a | |
16027 @strong{Delete} item to get rid of the toolbar button and a | |
16028 @strong{Properties} item, which brings up a property sheet that allows | |
16029 you to do things like change the icon and the command string that's | |
16030 associated with the toolbar button. | |
16031 | |
16032 The options to change the appearance of the toolbar itself should | |
16033 probably appear both on the context menu for specific buttons, and on | |
16034 the menu that appears when you click on a blank part of the toolbar. | |
16035 That way, if there isn't a blank part of the toolbar, you can still | |
16036 change the toolbar appearance. As for what appears in these items, in | |
16037 Outlook Express, for example, there are three different menu items. The | |
16038 first is called @strong{Buttons}; it pops up a window that allows you | |
16039 to edit the toolbar, which for us could be a new frame running | |
16040 @code{edit-toolbar.el}. The second item is | |
16041 called @strong{Align}, which contains a submenu that says @strong{Top}, | |
16042 @strong{Bottom}, @strong{Left}, and @strong{Right}, which will be just | |
16043 like setting the default toolbar position. The third one says | |
16044 @strong{Text Labels}, which would just let you select whether there are | |
16045 captions or not. I think all three of these are useful and are easy to | |
16046 implement in XEmacs. These things also need to be integrated with | |
16047 custom so that a user can control whether these options apply to all | |
16048 sessions, and in such a case can save the settings out to an options | |
16049 file. @code{edit-toolbar.el} in particular needs to integrate with | |
16050 custom. Currently it has some sort of hokey stuff of its own, which it | |
16051 saves out to a @code{.toolbar} file. Another useful option to have, | |
16052 once we draw the captions dynamically rather than using pre-generated | |
16053 ones, would be the ability to change the font size of the captions. I'm | |
16054 sure that Kyle, for one, would appreciate this. | |
16055 | |
16056 (This is incomplete.....) | |
16057 | |
16058 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
16059 | |
16060 @node Future Work -- Toolbar Interface Changes, , Future Work -- Easier Toolbar Customization, Future Work -- Toolbars | |
16061 @subsection Future Work -- Toolbar Interface Changes | |
16062 @cindex future work, toolbar interface changes | |
16063 @cindex toolbar interface changes, future work | |
16064 | |
16065 I propose changing the way that toolbars are specified to make them more | |
16066 flexible. | |
16067 | |
16068 @enumerate | |
16069 @item | |
16070 | |
16071 A new format for the vector that specifies a toolbar item is allowed. | |
16072 In this format, the first three items of the vector are required and | |
16073 are, respectively, a caption, a glyph list, and a callback. The glyph | |
16074 list and callback arguments are the same as in the current toolbar item | |
16075 specification, and the caption is a string specifying the caption text | |
16076 placed below the toolbar glyph. The caption text is required so that | |
16077 toolbar items can be identified for the purpose of retrieving and | |
16078 changing their property values. Putting the caption first also makes it | |
16079 easy to distinguish between the new and the old toolbar item vector | |
16080 formats. In the old format, the first item, the glyph list, is either a | |
16081 list or a symbol. In the new format, the first item is a string. In | |
16082 the new format, following the three required items, are optional keyword | |
16083 items specified using keywords in the same format as the menu item | |
16084 vector format. The keywords that should be predefined are: | |
16085 @code{:help-echo}, @code{:context-menu}, @code{:drop-handlers}, and | |
16086 @code{:enabled-p}. The @code{:enabled-p} and @code{:help-echo} keyword | |
16087 arguments are the same as the third and fourth items in the old toolbar | |
16088 item vector format. The @code{:context-menu} keyword is a list in | |
16089 standard menu format that specifies additional items that will appear | |
16090 when the context menu for the toolbar item is popped up. (Typically, | |
16091 this happens when the right mouse button is clicked on the toolbar | |
16092 item). The @code{:drop-handlers} keyword is for use by the new | |
16093 drag-n-drop interface (see @uref{drag-n-drop.html,Drag-n-Drop Interface | |
16094 Changes} ), and is not normally specified or modified directly. | |
16095 @item | |
16096 | |
16097 | |
16098 Conceivably, there could also be keywords that are associated with a | |
16099 toolbar itself, rather than with a particular toolbar item. These | |
16100 keyword properties would be specified using keywords and arguments that | |
16101 occur before any toolbar item vectors, similarly to how things are done | |
16102 in menu specifications. Possible properties could include | |
16103 @code{:captioned-p} (whether the captions are visible under the | |
16104 toolbar), @code{:glyphs-visible-p} (whether the toolbar glyphs are | |
16105 visible), and @code{:context-menu} (additional items that will appear on | |
16106 the context menus for all toolbar items and additionally will appear on | |
16107 the context menu that is popped up when the right mouse button is | |
16108 clicked over a portion of the toolbar that does not have any toolbar | |
16109 buttons in it). The current standard practice with regards to such | |
16110 properties seems to be to have separate specifiers, such as | |
16111 @code{left-toolbar-width}, @code{right-toolbar-width}, | |
16112 @code{left-toolbar-visible-p}, @code{right-toolbar-visible-p}, etc. It | |
16113 could easily be argued that there should be no such toolbar specifiers | |
16114 and that all such properties should be part of the toolbar instantiator | |
16115 itself. In this scheme, the only separate specifiers that would exist | |
16116 for individual properties would be default values. There are a lot of | |
16117 reasons why an interface change like this makes sense. For example, | |
16118 currently when VM sets its toolbar, it also sets the toolbar width and | |
16119 similar properties. If you change which edge of the frame the VM | |
16120 toolbar occurs in, VM will also have to go and modify all of the | |
16121 position-specific toolbar specifiers for all of the other properties | |
16122 associated with a toolbar. It doesn't really seem to make sense to me | |
16123 for the user to be specifying the width and visibility and such of | |
16124 specific toolbars that are attached to specific edges because the user | |
16125 should be free to move the toolbars around and expect that all of the | |
16126 toolbar properties automatically move with the toolbar. (It is also easy | |
16127 to imagine, for example, that a toolbar might not be attached to the | |
16128 edge of the frame at all, but might be floating somewhere on the user's | |
16129 screen). With an interface where these properties are separate | |
16130 specifiers, this has to be done manually. Currently, having the various | |
16131 toolbar properties be inside of toolbar instantiators makes them | |
16132 difficult to modify, but this will be different with the API that I | |
16133 propose below. | |
16134 @item | |
16135 | |
16136 | |
16137 I propose an API for modifying toolbar and toolbar item properties, as | |
16138 well as making other changes to toolbar instantiators, such as inserting | |
16139 or deleting toolbar items. This API is based around the concept of a | |
16140 path. There are two kinds of paths here -- @dfn{toolbar paths} and | |
16141 @dfn{toolbar item paths}. Each kind of path is an object (of type | |
16142 @code{toolbar-path} and @code{toolbar-item-path}, respectively) whose | |
16143 properties specify the location in a toolbar instantiator where changes | |
16144 to the instantiator can be made. A toolbar path, for example, would be | |
16145 created using the @code{make-toolbar-path} function, which takes a | |
16146 toolbar specifier (or optionally, a symbol, such as @code{left}, | |
16147 @code{right}, @code{default}, or @code{nil}, which refers to a | |
16148 particular toolbar), and optionally, parameters such as the locale and | |
16149 the tag set, which specify which actual instantiator inside of the | |
16150 toolbar specifier is to be modified. A toolbar item path is created | |
16151 similarly using a function called @code{make-toolbar-item-path}, which | |
16152 takes a toolbar specifier and a string naming the caption of the toolbar | |
16153 item to be modified, as well as, of course, optionally the locale and | |
16154 tag set parameters and such. | |
16155 | |
16156 The usefulness of these path objects is as arguments to functions that | |
16157 will use them as pointers to the place in a toolbar instantiator where | |
16158 the modification should be made. Recall, for example, the generalized | |
16159 property interface described above. If a function such as @code{get} or | |
16160 @code{put} is called on a toolbar path or toolbar item path, it will use | |
16161 the information contained in the path object to retrieve or modify a | |
16162 property located at the end of the path. The toolbar path objects can | |
16163 also be passed to new functions that I propose defining, such as | |
16164 @code{add-toolbar-item}, @code{delete-toolbar-item}, and | |
16165 @code{find-toolbar-item}. These functions should be parallel to the | |
16166 functions for inserting, deleting, finding, etc. items in a menu. The | |
16167 toolbar item path objects can also be passed to the drop-handler | |
16168 functions defined in @uref{drag-n-drop.html,Drag-n-Drop Interface | |
16169 Changes} to retrieve or modify the drop handlers that are associated | |
16170 with a toolbar item. (The idea here is that you can drag an object and | |
16171 drop it onto a toolbar item, just as you could onto a buffer, an extent, | |
16172 a menu item, or any other graphical element). | |
16173 @item | |
16174 | |
16175 | |
16176 We should at least think about allowing for separate default and | |
16177 buffer-local toolbars. The user should either be able to position these | |
16178 toolbars one above the other, or side by side, occupying a single | |
16179 toolbar line. In the latter case, the boundary between the toolbars | |
16180 should be draggable, and if a toolbar takes up more room than is | |
16181 allocated for it, there should be arrows that appear on one or both | |
16182 sides of the toolbar so that the items in the toolbar can be scrolled | |
16183 left or right. (For that matter, this sort of interface should exist | |
16184 even when there is only one toolbar that is on a particular toolbar | |
16185 line, because the toolbar may very well have more items than can be | |
16186 displayed at once, and it's silly in such a case if it's impossible to | |
16187 access the items that are not currently visible). | |
16188 @item | |
16189 | |
16190 | |
16191 The default context menu for toolbars (which should be specified using a | |
16192 specifier called @code{default-toolbar-context-menu} according to the | |
16193 rules defined above) should contain entries allowing the user to modify | |
16194 the appearance of a toolbar. Entries would include, for example, | |
16195 whether the toolbar is captioned, whether the glyphs for the toolbar are | |
16196 visible (if the toolbar is captioned but its glyphs are not visible, the | |
16197 toolbar appears as nothing but text; you can set things up this way, for | |
16198 example, in Netscape), an option that brings up a package for editing | |
16199 the contents of a toolbar, an option to allow the caption face to be | |
16200 changed (perhaps through an @code{edit-faces} or @code{custom} | |
16201 interface), etc. | |
16202 | |
16203 @end enumerate | |
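
As a rough illustration of the first item above, the check that tells the two toolbar item vector formats apart could look something like the following C sketch. The tag enum and the name @code{sketch_toolbar_item_format} are hypothetical stand-ins for the real Lisp object type tests, and the only rules encoded are the ones stated above: a string in the first slot means the new format, which requires at least three items, while a list or symbol means the old format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags standing in for real Lisp object type tests. */
enum sketch_tag { SKETCH_STRING, SKETCH_LIST, SKETCH_SYMBOL, SKETCH_OTHER };

enum sketch_item_format {
    SKETCH_ITEM_OLD,      /* first item is the glyph list (list or symbol) */
    SKETCH_ITEM_NEW,      /* first item is the caption string */
    SKETCH_ITEM_INVALID
};

/* Classify a toolbar item vector from the type tags of its elements. */
enum sketch_item_format
sketch_toolbar_item_format(const enum sketch_tag *elts, size_t n)
{
    assert(elts != NULL || n == 0);
    if (n == 0)
        return SKETCH_ITEM_INVALID;
    switch (elts[0]) {
    case SKETCH_STRING:
        /* New format: caption, glyph list and callback are all required. */
        return n >= 3 ? SKETCH_ITEM_NEW : SKETCH_ITEM_INVALID;
    case SKETCH_LIST:
    case SKETCH_SYMBOL:
        return SKETCH_ITEM_OLD;
    default:
        return SKETCH_ITEM_INVALID;
    }
}
```

The sketch deliberately does not validate the rest of an old-format vector; it only shows that the first slot is enough to decide which parser to use.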
16204 | |
16205 @node Future Work -- Menu API Changes, Future Work -- Removal of Misc-User Event Type, Future Work -- Toolbars, Future Work | |
16206 @section Future Work -- Menu API Changes | |
16207 @cindex future work, menu API changes | |
16208 @cindex menu API changes, future work | |
16209 | |
16210 | |
16211 @enumerate | |
16212 @item | |
16213 | |
16214 I propose making a specifier for the menubar associated with the frame. | |
16215 The specifier should be called @code{default-menubar} and should replace | |
16216 the existing @code{current-menubar} variable. This would increase the | |
16217 power of the menubar interface and bring it in line with the toolbar | |
16218 interface. (In order to provide proper backward compatibility, we might | |
16219 have to @uref{symbol-value-handlers.html,complete the symbol value | |
16220 handler mechanism}) | |
16221 @item | |
16222 | |
16223 | |
16224 I propose an API for modifying menu instantiators similar to the API | |
16225 proposed above for toolbar instantiators. A new object called a | |
16226 @dfn{menu path} (of type @code{menu-path}) can be created using the | |
16227 @code{make-menu-path} function, and specifies a location in a particular | |
16228 menu instantiator where changes can be made. The first argument to | |
16229 @code{make-menu-path} specifies which menu to modify and can be a | |
16230 specifier, a value such as @code{nil} (which means to modify the default | |
16231 menubar associated with the selected frame), or perhaps some other kind | |
16232 of specification referring to some other menu, such as the context menus | |
16233 invoked by the right mouse button. The second argument to | |
16234 @code{make-menu-path}, also required, is a list of zero or more strings | |
16235 that specifies the particular menu or menu item in the instantiator that | |
16236 is being referred to. The remaining arguments are optional and would be | |
16237 a locale, a tag set, etc. The menu path object can be passed to | |
16238 @code{get}, @code{put} or other standard property functions to access or | |
16239 modify particular properties of a menu or a menu item. It can also be | |
16240 passed to expanded versions of the existing functions such as | |
16241 @code{find-menu-item}, @code{delete-menu-item}, @code{add-menu-button}, | |
16242 etc. (It is really a shame that @code{add-menu-item} is an obsolete | |
16243 function because it is a much better name than @code{add-menu-button}). | |
16244 Finally, the menu path object can be passed to the drop-handler | |
16245 functions described in @uref{drag-n-drop.html,Drag-n-Drop Interface | |
16246 Changes} to access or modify the drop handlers that are associated with | |
16247 a particular menu item. | |
16248 @item | |
16249 | |
16250 | |
16251 New keyword properties should be added to the menu item vector. These | |
16252 include @code{:help-echo}, @code{:context-menu} and | |
16253 @code{:drop-handlers}, with similar semantics to the corresponding | |
16254 keywords for toolbar items. (It may seem a bit strange at first to have | |
16255 a context menu associated with a particular menu item, but it is a user | |
16256 interface concept that exists both in Open Look and in Windows, and | |
16257 really makes a lot of sense if you give it a bit of thought). These | |
16258 properties may not actually be implemented at first, but at least the | |
16259 keywords for them should be defined. | |
16260 | |
16261 @end enumerate | |
16262 | |
16263 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
16264 | |
16265 @node Future Work -- Removal of Misc-User Event Type, Future Work -- Mouse Pointer, Future Work -- Menu API Changes, Future Work | |
16266 @section Future Work -- Removal of Misc-User Event Type | |
16267 @cindex future work, removal of misc-user event type | |
16268 @cindex removal of misc-user event type, future work | |
16269 | |
16270 @strong{Abstract:} This page describes why the misc-user event type | |
16271 should be split up into a number of different event types, and how to do | |
16272 this. | |
16273 | |
16274 The misc-user event should not exist as a single event type. It should | |
16275 be split up into a number of different event types: one for scrollbar | |
16276 events, one for menu events, and one or two for drag-n-drop events. | |
16277 Possibly there will be other event types created in the future. The | |
16278 reason for this is that the misc-user event was a bad design choice when | |
16279 I made it, and it has only gotten worse with Oliver's attempts to add | |
16280 features to it to make it be used for drag-n-drop. I know that there | |
16281 was originally a separate drag-n-drop event type, and it was folded into | |
16282 the misc-user event type on my recommendation, but I have now realized | |
16283 the error of my ways. I had originally created a single event type in | |
16284 an attempt to prevent some Lisp programs from breaking because they | |
16285 might have a case statement over various event types, and would not be | |
16286 able to handle new event types appearing. I think now that these | |
16287 programs simply need to be written in a way to handle new event types | |
16288 appearing. It's not very hard to do this. You just use predicates | |
16289 instead of doing a case statement over the event type. If we preserve | |
16290 the existing predicate called @code{misc-user-event-p}, and just make | |
16291 sure that it evaluates to true when given any user event type other than | |
16292 the standard simple ones, then most existing code will not break either | |
16293 when we split the event types up like this, or if we add any new event | |
16294 types in the future. | |
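
The predicate idea can be sketched in C as follows. This is only an illustration under assumptions: the enum lists a handful of made-up user event types (it is not the real XEmacs event-type enum and ignores non-user events entirely), and @code{sketch_misc_user_event_p} is a hypothetical stand-in for the compatibility behaviour of @code{misc-user-event-p}. The point is that the predicate is defined negatively, so any type added later is covered without touching callers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user event types; NOT the real XEmacs enum. */
enum sketch_event_type {
    SKETCH_KEY_PRESS,
    SKETCH_BUTTON_PRESS,
    SKETCH_BUTTON_RELEASE,
    SKETCH_POINTER_MOTION,
    /* Types that would be split out of the old misc-user event: */
    SKETCH_SCROLLBAR,
    SKETCH_MENU,
    SKETCH_DRAG,
    SKETCH_DROP
};

/* True for the standard simple user event types. */
bool sketch_simple_user_event_p(enum sketch_event_type t)
{
    switch (t) {
    case SKETCH_KEY_PRESS:
    case SKETCH_BUTTON_PRESS:
    case SKETCH_BUTTON_RELEASE:
    case SKETCH_POINTER_MOTION:
        return true;
    default:
        return false;
    }
}

/* Compatibility predicate: true for every user event type other than the
   simple ones, so newly added types are accepted automatically. */
bool sketch_misc_user_event_p(enum sketch_event_type t)
{
    return !sketch_simple_user_event_p(t);
}
```

Code that dispatches with such predicates, rather than with an exhaustive case statement over the type, keeps working when the type list grows.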
16295 | |
16296 More specifically, the only clean way to design the misc-user event type | |
16297 would be to add a sub-type field to it, and then have the nature of all | |
16298 the other fields in the event type be dependent on this sub-type. But | |
16299 then in essence, we'd just be reimplementing the whole event-type scheme | |
16300 inside of misc-user events, which would be rather pointless. | |
16301 | |
16302 @node Future Work -- Mouse Pointer, Future Work -- Extents, Future Work -- Removal of Misc-User Event Type, Future Work | |
16303 @section Future Work -- Mouse Pointer | |
16304 @cindex future work, mouse pointer | |
16305 @cindex mouse pointer, future work | |
16306 | |
16307 @menu | |
16308 * Future Work -- Abstracted Mouse Pointer Interface:: | |
16309 * Future Work -- Busy Pointer:: | |
16310 @end menu | |
16311 | |
16312 @node Future Work -- Abstracted Mouse Pointer Interface, Future Work -- Busy Pointer, Future Work -- Mouse Pointer, Future Work -- Mouse Pointer | |
16313 @subsection Future Work -- Abstracted Mouse Pointer Interface | |
16314 @cindex future work, abstracted mouse pointer interface | |
16315 @cindex abstracted mouse pointer interface, future work | |
16316 | |
16317 @strong{Abstract:} We need to create a new image format that allows | |
16318 standard pointer shapes to be specified in a way that works on all | |
16319 window systems. I suggest that this be called @code{pointer}, which | |
16320 has one tag associated with it, named @code{:data}, and whose value is a | |
16321 string. The possible strings that can be specified here are predefined | |
16322 by XEmacs, and are guaranteed to work across all window systems. This | |
16323 means that we may need to provide our own definition for pointer shapes | |
16324 that are not standard on some systems. In particular, there are a lot | |
16325 more standard pointer shapes under X than under Windows, and most of | |
16326 these pointer shapes are fairly useful. There are also a few pointer | |
16327 shapes (I think the hand, for example) on Windows, but not on X. | |
16328 Converting the X pointer shapes to Windows should be easy because the | |
16329 definitions of the pointer shapes are simply XBM files, which we can | |
16330 read under Windows. Going the other way might be a little bit more | |
16331 difficult, but it should still not be that hard. | |
16332 | |
16333 While we're at it, we should change the image format currently called | |
16334 @code{cursor-font} to @code{x-cursor-font}, because it only works under | |
16335 X Windows. We also need to change the format called @code{resource} to | |
16336 be @code{mswindows-resource}. At least in the case of | |
16337 @code{cursor-font}, the old value should be maintained for compatibility | |
16338 as an obsolete alias. The @code{resource} format was added so recently | |
16339 that it's possible that we can just change it. | |
16340 | |
16341 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
16342 | |
16343 @node Future Work -- Busy Pointer, , Future Work -- Abstracted Mouse Pointer Interface, Future Work -- Mouse Pointer | |
16344 @subsection Future Work -- Busy Pointer | |
16345 @cindex future work, busy pointer | |
16346 @cindex busy pointer, future work | |
16347 | |
16348 Automatically make the mouse pointer switch to a busy shape (watch | |
16349 signal) when XEmacs has been "busy" for more than, e.g. 2 seconds. | |
16350 Define the @dfn{busy time} as the time since the last time that XEmacs was | |
16351 ready to receive input from the user. An implementation might be: | |
16352 | |
16353 @enumerate | |
16354 @item | |
16355 Set up an asynchronous timeout, to signal after the busy time; these | |
16356 are triggered through a call to QUIT so they will be triggered even | |
16357 when the code is busy doing something. | |
16358 @item | |
16359 We already have an "emacs_is_blocking" flag when we are waiting for | |
16360 input. In the same place, when we are about to block and wait for | |
16361 input (regardless of whether input is already present), maybe call a | |
16362 hook, which in this case would remove the timer and put back the | |
16363 normal mouse shape. Then when we exit the blocking stage (we got | |
16364 some input), call another hook, which in this case will start the | |
16365 timer. Note that we don't want these "blocking" hooks to be triggered | |
16366 just because of an accept-process-output or some similar thing that | |
16367 retrieves events, only to put them back onto a queue for later | |
16368 processing. Maybe we want some sort of flag that's bound by those | |
16369 routines saying that we aren't really waiting for input. Making | |
16370 that flag Lisp-accessible allows it to be set by similar sorts of | |
16371 Lisp routines (if there are any?) that loop retrieving events but | |
16372 defer them, or only drain the queue, or whatnot. #### Think about | |
16373 whether it would make some sense to try and be more clever in our | |
16374 determinations of what counts as "real waiting for user input", e.g. | |
16375 whether the event gets dispatched (unfortunately this occurs way too | |
16376 late, we want to know to remove the busy cursor @strong{before} getting an | |
16377 event), maybe whether there are any events waiting to be processed or | |
16378 we'll truly block, etc. (e.g. one possibility if there is input on | |
16379 the queue already when we "block" for input, don't remove the busy- | |
16380 wait pointer, but trigger the removal of it when we dispatch a user | |
16381 event). | |
16382 @end enumerate | |
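
The two steps above can be sketched as a small state machine in C. Everything here is an assumption made for illustration: the struct, the function names and the explicit @code{now} argument are hypothetical, no real timer or QUIT machinery is used, and the 2-second threshold is simply the example value from the text. The @code{not_really_waiting} field plays the role of the proposed flag bound by routines that only drain or re-queue events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BUSY_THRESHOLD 2.0   /* seconds; example value from the text */

struct sketch_busy_state {
    bool timer_armed;          /* is the busy timeout pending? */
    double deadline;           /* time at which the timeout fires */
    bool busy_pointer_shown;   /* is the busy shape currently displayed? */
    bool not_really_waiting;   /* set by routines that only drain events */
};

/* Hook run when we are about to block waiting for user input:
   cancel the timer and restore the normal pointer shape. */
void sketch_about_to_block(struct sketch_busy_state *s)
{
    assert(s != NULL);
    if (s->not_really_waiting)
        return;
    s->timer_armed = false;
    s->busy_pointer_shown = false;
}

/* Hook run when blocking ends because input arrived: start the timer. */
void sketch_done_blocking(struct sketch_busy_state *s, double now)
{
    assert(s != NULL);
    if (s->not_really_waiting)
        return;
    s->timer_armed = true;
    s->deadline = now + SKETCH_BUSY_THRESHOLD;
}

/* Called from a periodic check (standing in for QUIT): fire the timeout
   if the busy time has elapsed. */
void sketch_quit_check(struct sketch_busy_state *s, double now)
{
    assert(s != NULL);
    if (s->timer_armed && now >= s->deadline) {
        s->timer_armed = false;
        s->busy_pointer_shown = true;
    }
}
```

The sketch does not attempt the cleverer heuristics mentioned in the second item, such as leaving the busy pointer up when input is already queued.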
16383 | |
16384 @node Future Work -- Extents, Future Work -- Version Number and Development Tree Organization, Future Work -- Mouse Pointer, Future Work | |
16385 @section Future Work -- Extents | |
16386 @cindex future work, extents | |
16387 @cindex extents, future work | |
16388 | |
16389 @menu | |
16390 * Future Work -- Everything should obey duplicable extents:: | |
16391 @end menu | |
16392 | |
16393 @node Future Work -- Everything should obey duplicable extents, , Future Work -- Extents, Future Work -- Extents | |
16394 @subsection Future Work -- Everything should obey duplicable extents | |
16395 @cindex future work, everything should obey duplicable extents | |
16396 @cindex everything should obey duplicable extents, future work | |
16397 | |
16398 A lot of functions don't properly track duplicable extents. For | |
16399 example, the @code{concat} function does, but the @code{format} function | |
16400 does not, and extents in keymap prompts are not displayed either. All | |
16401 of the functions that generate strings or string-like entities should | |
16402 track the extents that are associated with the strings. Currently this | |
16403 is difficult because there is no general mechanism implemented for doing | |
16404 this. I propose such a general mechanism, which would not be hard to | |
16405 implement, and would be easy to use in other functions that build up | |
16406 strings. | |
16407 | |
16408 The basic idea is that we create a C structure that is analogous to a | |
16409 Lisp string in that it contains string data and lists of extents for | |
16410 that data. Unlike standard Lisp strings, however, this structure (let's | |
16411 call it @code{lisp_string_struct}) can be incrementally updated and its | |
16412 allocation is handled explicitly so that no garbage is generated. (This | |
16413 is important for example, in the event-handling code which would want to | |
16414 use this structure, but needs to not generate any garbage for efficiency | |
16415 reasons). Both the string data and the list of extents in this string | |
16416 are handled using dynarrs so that it is easy to incrementally update | |
16417 this structure. Functions should exist to create and destroy instances | |
16418 of @code{lisp_string_struct}, to generate a Lisp string from a | |
16419 @code{lisp_string_struct} and vice versa, to append a substring of a | |
16420 Lisp string to a @code{lisp_string_struct}, to just append characters to | |
16421 a @code{lisp_string_struct}, etc. The only thing possibly tricky about | |
16422 implementing these functions is implementing the copying of extents from | |
16423 a Lisp string into a @code{lisp_string_struct}. However, there is | |
16424 already a function @code{copy_string_extents()} that does basically this | |
16425 exact thing, and it should be easy to create a modified version of this | |
16426 function. | |
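
A minimal sketch of such a structure, under stated assumptions, might look like this. Plain @code{realloc}-grown arrays stand in for dynarrs, @code{struct sketch_extent} is a hypothetical simplification of a real extent (just a range and an id), and all the @code{lss_} function names are invented for illustration. Conversion to and from real Lisp strings is omitted because it depends on the Lisp object layer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified extent: a half-open range [start, end). */
struct sketch_extent { size_t start, end; int id; };

struct lisp_string_struct {
    char *data;                    /* NUL-terminated string data */
    size_t len, cap;
    struct sketch_extent *extents; /* extents over the data */
    size_t n_extents, cap_extents;
};

struct lisp_string_struct *lss_create(void)
{
    return calloc(1, sizeof(struct lisp_string_struct));
}

void lss_destroy(struct lisp_string_struct *s)
{
    if (s == NULL)
        return;
    free(s->data);
    free(s->extents);
    free(s);
}

/* Make room for EXTRA more bytes plus the terminating NUL. */
static int lss_reserve(struct lisp_string_struct *s, size_t extra)
{
    size_t need = s->len + extra + 1;
    if (need <= s->cap)
        return 0;
    size_t cap = s->cap ? s->cap : 16;
    while (cap < need)
        cap *= 2;
    char *p = realloc(s->data, cap);
    if (p == NULL)
        return -1;
    s->data = p;
    s->cap = cap;
    return 0;
}

/* Append N raw characters; returns 0 on success, -1 on allocation failure. */
int lss_append_chars(struct lisp_string_struct *s, const char *src, size_t n)
{
    if (lss_reserve(s, n) != 0)
        return -1;
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Append src[from, to) and copy the parts of the source extents that
   overlap that range, shifted to their new positions in DST.  Extents
   that become empty after clipping are dropped. */
int lss_append_substring(struct lisp_string_struct *dst, const char *src,
                         const struct sketch_extent *src_ext, size_t n_src_ext,
                         size_t from, size_t to)
{
    assert(from <= to);
    size_t base = dst->len;
    if (lss_append_chars(dst, src + from, to - from) != 0)
        return -1;
    for (size_t i = 0; i < n_src_ext; i++) {
        size_t a = src_ext[i].start > from ? src_ext[i].start : from;
        size_t b = src_ext[i].end < to ? src_ext[i].end : to;
        if (a >= b)
            continue;
        if (dst->n_extents == dst->cap_extents) {
            size_t cap = dst->cap_extents ? dst->cap_extents * 2 : 4;
            struct sketch_extent *p = realloc(dst->extents, cap * sizeof *p);
            if (p == NULL)
                return -1;
            dst->extents = p;
            dst->cap_extents = cap;
        }
        dst->extents[dst->n_extents].start = base + (a - from);
        dst->extents[dst->n_extents].end = base + (b - from);
        dst->extents[dst->n_extents].id = src_ext[i].id;
        dst->n_extents++;
    }
    return 0;
}
```

Because the caller owns the structure and frees it with @code{lss_destroy}, building up a string this way creates nothing for the garbage collector to reclaim, which is the property the event-handling code would need.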
16427 | |
16428 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
16429 | |
16430 @node Future Work -- Version Number and Development Tree Organization, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Extents, Future Work | |
16431 @section Future Work -- Version Number and Development Tree Organization | |
16432 @cindex future work, version number and development tree organization | |
16433 @cindex version number and development tree organization, future work | |
16434 | |
16435 @strong{Abstract:} The purpose of this proposal is to present a coherent | |
16436 plan for how development branches in XEmacs are managed. This will | |
16437 cover such issues as stable versus experimental branches, creating new | |
16438 branches, synchronizing patches between branches, and how version | |
16439 numbers are assigned to branches. | |
16440 | |
16441 A development branch is defined to be a linear series of releases of the | |
16442 XEmacs code base, each of which is derived from the previous one. When | |
16443 the XEmacs development tree is forked and two branches are created where | |
16444 there used to be one, the branch that is intended to be more stable and | |
16445 have fewer changes made to it is considered the one that inherits the | |
16446 parent branch, and the other branch is considered to have begun at the | |
16447 branching point. The less stable of the two branches will eventually be | |
16448 forked again, while this will not happen usually to the more stable of | |
16449 the two branches, and its development will eventually come to an end. | |
16450 This means that every branch has a definite ending point. For example, | |
16451 the 20.x branch began at the point when the released | |
16452 19.13 code tree was split into a 19.x and a 20.x branch, and the 20.x | |
16453 branch will end when the last 20.x release (probably numbered 20.5 or | |
16454 20.6) is released. | |
16455 | |
16456 I think that there should always be three active development branches at | |
16457 any time. These branches can be designated the stable, the semi-stable, | |
16458 and the experimental branches. This situation has existed in the | |
16459 current code tree ever since the 21.0 development branch was split. In | |
16460 this situation, the stable branch is the 20.x series. The semi-stable | |
16461 branch is the 21.0 release and the stability releases that follow. The | |
16462 experimental branch is the branch that was created as the result of the | |
16463 21.0 development branch split. Typically, the stable branch has been | |
16464 released for a long period of time. The semi-stable branch has been | |
16465 released for a short period of time, or is about to be released, and the | |
16466 experimental branch has not yet been released, and will probably not be | |
16467 released for a while. The conditions that should hold in all | |
16468 circumstances are: | |
16469 | |
16470 @enumerate | |
16471 @item | |
16472 | |
16473 There should be three active branches. | |
16474 @item | |
16475 | |
16476 The experimental branch should never be in feature freeze. | |
16477 | |
16478 @end enumerate | |
16479 | |
16480 The reason for the second condition is to ensure that active development | |
16481 can always proceed and is never throttled, as is happening currently at | |
16482 the end of the 21.0 release cycle. What this means is that as soon as | |
16483 the experimental branch is deemed to be stable enough to go into feature | |
16484 freeze: | |
16485 | |
16486 @enumerate | |
16487 @item | |
16488 | |
16489 The current stable branch is made inactive and all further development | |
16490 on it ceases. | |
16491 @item | |
16492 | |
16493 The semi-stable branch, which by now should have been released for a | |
16494 fair amount of time, and should be fairly stable, gets renamed to the | |
16495 stable branch. | |
16496 @item | |
16497 | |
16498 The experimental branch is forked into two branches, one of which | |
16499 becomes the semi-stable branch, and the other, the experimental branch. | |
16500 | |
16501 @end enumerate | |
16502 | |
16503 The stable branch is always in high resistance, which is to say that the | |
16504 only changes that can be made to the code are important bug fixes | |
16505 involving a small amount of code where it should be clear just by | |
16506 reading the code that no destabilizing code has been introduced. The | |
16507 semi-stable branch is in low resistance, which means that no major | |
16508 features can be added, but except right before a release fairly major | |
16509 code changes are allowed. Features can be added if they are | |
16510 sufficiently small, if they are deemed sufficiently critical due to | |
16511 severe problems that would exist if the features were not added (for | |
16512 example, replacement of the unexec mechanism with a portable solution | |
16513 would be a feature that could be added to the semi-stable branch | |
16514 provided that it did not involve an overly radical code re-architecture, | |
16515 because otherwise it might be impossible to build XEmacs on some | |
16516 architectures or with some compilers), or if the primary purpose of the | |
16517 new feature is to remedy an incompleteness in a recent architectural | |
16518 change that was not finished in a prior release due to lack of time (for | |
16519 example, abstracting the mouse pointer and list-of-colors interfaces, | |
16520 which were left out of 21.0). There is no feature resistance in place | |
16521 in the experimental branch, which allows full development to proceed at | |
16522 all times. | |
16523 | |
16524 In general, both the stable and semi-stable branches will contain | |
16525 previous net releases. In addition, there will be beta releases in all | |
16526 three branches, and possibly development snapshots between the beta | |
16527 releases. It's obviously necessary to have a good version numbering | |
16528 scheme in order to keep everything straight. | |
16529 | |
16530 First of all, it needs to be immediately clear from the version number | |
16531 whether the release is a beta release or a net release. Steve has | |
16532 proposed getting rid of the beta version numbering system, which I think | |
16533 would be a big mistake. Furthermore, the net release version number and | |
16534 beta release version number should be kept separate, just as they are | |
16535 now, to make it completely clear where any particular release stands. | |
16536 There may be alternate ways of phrasing a beta release other than | |
16537 something like 21.0 beta 34, but in all such systems, the beta number | |
16538 needs to be zero for any release version. Three possible alternative | |
16539 systems, none of which I like very much, are: | |
16540 | |
16541 @enumerate | |
16542 @item | |
16543 | |
16544 The beta number is simply an extra number in the regular version number. | |
16545 Then, for example, 21.0 beta 34 becomes 21.0.34. The problem is that | |
16546 the release version, which would simply be called 21.0, appears to be | |
16547 earlier than 21.0 beta 34. | |
16548 @item | |
16549 | |
16550 The beta releases appear as later revisions of earlier releases. Then, | |
16551 for example, 21.1 beta 34 becomes 21.0.34, and 21.0 beta 34 would have | |
16552 to become 21.-1.34. This has both the obvious ugliness of negative | |
16553 version numbers and the problem that it makes beta releases appear to be | |
16554 associated with their previous releases, when in fact they are more | |
16555 closely associated with the following release. | |
16556 @item | |
16557 | |
16558 Simply make the beta version number be negative. In this scheme, you'd | |
16559 start with something like -1000 as the first beta, and then 21.0 beta 34 | |
16560 would get renumbered to 21.0.-967. Obviously, this is a crazy and | |
16561 convoluted scheme as well, and we would be best to avoid it. | |
16562 | |
16563 @end enumerate | |
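
To make the ordering requirement concrete, here is a small C sketch of a comparison function for the existing scheme, in which the beta number is kept separate and is zero for a net release. The struct and function names are hypothetical; the only rules encoded are the ones stated above, namely that a release has beta number zero and that a release must sort after every beta of the same version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version triple; beta == 0 marks a net release. */
struct sketch_version {
    int major;
    int minor;
    int beta;
};

/* Returns a negative value, zero, or a positive value as A is earlier
   than, the same as, or later than B. */
int sketch_version_compare(const struct sketch_version *a,
                           const struct sketch_version *b)
{
    assert(a != NULL && b != NULL);
    if (a->major != b->major)
        return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor)
        return a->minor < b->minor ? -1 : 1;
    if (a->beta == b->beta)
        return 0;
    if (a->beta == 0)          /* a is the release, b is one of its betas */
        return 1;
    if (b->beta == 0)          /* b is the release, a is one of its betas */
        return -1;
    return a->beta < b->beta ? -1 : 1;
}
```

With this comparison, 21.0 beta 34 sorts before 21.0, which is exactly what the first alternative scheme gets wrong when 21.0.34 is compared numerically against 21.0.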
16564 | |
16565 Currently, the between-beta snapshots are not numbered, but I think that | |
16566 they probably should be. If appropriate scripts are written to automate | |
16567 beta releases, it should be very easy to have a version number | |
16568 automatically updated whenever a snapshot is made. The number could be | |
16569 added either as a separate snapshot number, and you'd have 21.0 beta 34 | |
16570 pre 1, which comes before 21.0 beta 34; or we could make the beta | |
16571 number be floating point, and then the same snapshot would have to be | |
16572 called 21.0 beta 33.1. The latter solution seems quite kludgey to me. | |
16573 | |
16574 There also needs to be a clear way to distinguish, when a net release is | |
16575 made, which branch the release is a part of. Again, three solutions | |
16576 come to mind: | |
16577 | |
16578 @enumerate | |
16579 @item | |
16580 | |
16581 The major version number reflects which development branch the release | |
16582 is in and the minor version number indicates how many releases have been | |
16583 made along this branch. In this scheme, 21.0 is always the first | |
16584 release of the 21 series development branch, and when this branch is | |
16585 split, the child branch that becomes the experimental branch gets | |
16586 version numbers starting with 22. This scheme is the simplest, and it's | |
16587 the one I like best. | |
16588 @item | |
16589 | |
16590 We move to a three-part version number. In this scheme, the first two | |
16591 numbers indicate the branch, and the third number indicates the release | |
16592 along the branch. In this scheme, we have numbers like 21.0.1, which | |
16593 would be the second release in the 21.0 series branch, and 21.1.2, which | |
16594 would be the third release in the | |
16595 21.1 series branch. The major version number then gets increased | |
16596 only very occasionally, and only when a sufficiently major architectural | |
16597 change has been made, particularly one that causes compatibility | |
16598 problems with code written for previous branches. I think schemes like | |
16599 this are unnecessary in most circumstances, because usually either the | |
16600 major version number ends up changing so often that the second number is | |
16601 always either zero or one, or the major version number never changes, | |
16602 and as such becomes useless. By the time the major version number would | |
16603 change, the product itself has changed so much that it often gets | |
16604 renamed. Furthermore, it is clear that the two version number scheme | |
16605 has been used throughout most of the history of Emacs, and recently we | |
16606 have been following the two number scheme also. If we introduced a | |
16607 third revision number, at this point it would both confuse existing code | |
16608 that assumed there were two numbers, and would look rather silly given | |
16609 that the major version number is so high and would probably remain at | |
16610 the same place for quite a long time. | |
16611 @item | |
16612 | |
16613 A third scheme that would attempt to cross the two schemes would keep | |
16614 the same concept of major version number as for the three number scheme, | |
16615 and would compress the second and third numbers of the three number | |
16616 scheme into one number by using increments of ten. For example, the | |
16617 current 21.x branch would have releases No. 21.0, 21.1, etc. The next | |
16618 branch would be No. 21.10, 21.11, etc. I don't like this scheme very | |
16619 much because it seems rather kludgey, and also because it is not used in | |
16620 any other product as far as I know. | |
16621 @item | |
16622 | |
16623 Another scheme that would combine the second and third numbers in the | |
16624 three number scheme would be to have the releases in the current 21.x | |
16625 series be numbered 21.0, then 21.01, then 21.02, etc. The next series | |
16626 is 21.1, then 21.11, then 21.12, etc. This is similar to the way that | |
16627 version numbers are done for DOS and Windows. I also think that this | |
16628 scheme is fairly silly because, like the previous scheme, its only | |
16629 purpose is to avoid increasing the major version number very much. But | |
16630 given that we already have a fairly large major version number, | |
16631 there doesn't seem to be any particular problem with increasing this | |
16632 number by one every year or two. Some people will object that by doing | |
16633 this, it becomes impossible to tell when a change is so major that it | |
16634 causes a lot of code breakage, but past releases have not been accurate | |
16635 indicators of this. For example, | |
16636 19.12 caused a lot of code breakage, but 20.0 caused less, and 21.0 | |
16637 caused less still. In the GNU Emacs world, there were byte code changes | |
16638 made between 19.28 and 19.29, but as far as I know, not between 19.29 | |
16639 and 20.0. | |
16640 | |
16641 @end enumerate | |
16642 | |
16643 With three active development branches, synchronizing code changes | |
16644 between the branches is obviously somewhat of a problem. To make things | |
16645 easier, I propose a few general guidelines: | |
16646 | |
16647 @enumerate | |
16648 @item | |
16649 | |
16650 Merging between different branches need not happen that often. It | |
16651 should not happen more often than necessary to avoid undue burden on the | |
16652 maintainer, but needs to be done at all defined checkpoints. These | |
16653 checkpoints need to be noted in all of the places that track changes | |
16654 along the branch, for example, in all of the change logs and in all of | |
16655 the CVS tags. | |
16656 @item | |
16657 | |
16658 Every code change that can be considered a self-contained unit, no | |
16659 matter how large or small, needs to have a change log entry, preferably | |
16660 a single change log entry associated with it. This is an absolute | |
16661 requirement. There should be no code changes without an associated | |
16662 change log entry. Otherwise, it is highly likely that patches will not | |
16663 be correctly synchronized across all versions, and will get lost. There | |
16664 is no need for change log entries to contain unnecessary detail though, | |
16665 and it is important that there be no more change log entries than | |
16666 necessary, which means that two or more change log entries associated | |
16667 with a single patch need to be grouped together if possible. This might | |
16668 imply that there should be one global change log instead of change logs | |
16669 in each directory, or at the very least, the number of separate change | |
16670 logs should be kept to a minimum. | |
16671 @item | |
16672 | |
16673 The patch that is associated with each change log entry needs to be kept | |
16674 around somewhere. The reason for this is that when synchronizing code | |
16675 from some branch to some earlier branch, it is necessary to go through | |
16676 each change log entry and decide whether a change is worthy to make it | |
16677 into a more stable branch. If so, the patch associated with this change | |
16678 needs to be individually applied to the earlier branch. | |
16679 @item | |
16680 | |
16681 All changes made in more stable branches get merged into less stable | |
16682 branches unless the change really is completely unnecessary in the less | |
16683 stable branch because it is superseded by some other change. This will | |
16684 probably mean more developers making changes to the semi-stable branch | |
16685 than to the experimental branch. This means that developers should | |
16686 strive to do their development in the most stable branch that they | |
16687 expect their code to go into. An alternative to this which is perhaps | |
16688 more workable is simply to insist that all developers make all patches | |
16689 based off of the experimental branch, and then later merge these patches | |
16690 down to the more stable branches as necessary. This means, however, | |
16691 that submitted patches should never be combinations of two or more | |
16692 unrelated changes. Whenever such patches are submitted, they should | |
16693 either be rejected (which should apply to anybody who should know | |
16694 better, which probably means everybody on the beta list and anybody else | |
16695 who is a regular contributor), or the maintainer or some other | |
16696 designated party needs to filter the combined patch into separate | |
16697 patches, one per logical change. | |
16698 @item | |
16699 | |
16700 The maintainer should keep all the patches around in some database, and | |
16701 the patches should be given an identifier consisting of the author of | |
16702 the patch, the date the patch was submitted, and some other identifying | |
16703 characteristic, such as a number, in case there is more than one patch | |
16704 on the same date by the same author. The database should hopefully be | |
16705 correctly marked at all times with something indicating which branches | |
16706 the patch has been applied to, and this database should hopefully be | |
16707 publicly visible so that patch authors can determine whether their | |
16708 patches have been applied, and whether their patches have been received, | |
16709 so that patches do not get needlessly resubmitted. | |
16710 @item | |
16711 | |
16712 Global automatable changes such as textual renaming, reordering, and | |
16713 additions or deletions of parameters in function calls should still be | |
16714 allowed, even with multiple development branches. (Sometimes these are | |
16715 necessary for code cleanliness, and in the long run, they save a lot of | |
16716 time, even though they may cause some headaches in the short term.) In | |
16717 general, when such changes are made, they should occur in a separate | |
16718 beta version that contains only such changes and no other patches, and | |
16719 the changes should be made in both the semi-stable and experimental | |
16720 branches at the same time. The description of the beta version should | |
16721 make it very clear that the beta consists only of such changes. The | |
16722 reason for doing these things is to make it easier for people to diff | |
16723 between beta versions in order to figure out the changes that were made | |
16724 without the diff getting cluttered up by these code cleanliness changes | |
16725 that don't change any actual behavior. | |
16726 | |
16727 @end enumerate | |
16728 | |
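The patch database described in the guidelines above could be modelled
along the following lines. This is a minimal sketch under stated
assumptions, not a description of any existing tool: the names
@code{my-patch-id}, @code{my-patch-db}, @code{my-patch-record},
@code{my-patch-mark-applied} and @code{my-patch-applied-p}, the
identifier format, and the branch symbols are all hypothetical.

@example
(defvar my-patch-db nil
  "Hypothetical alist mapping a patch identifier to applied branches.")

(defun my-patch-id (author date seq)
  "Build an identifier from AUTHOR, DATE and the sequence number SEQ.
SEQ distinguishes patches by the same author on the same date."
  (format "%s-%s-%d" author date seq))

(defun my-patch-record (id)
  "Return the database entry for ID, creating it if necessary."
  (or (assoc id my-patch-db)
      (car (setq my-patch-db (cons (list id) my-patch-db)))))

(defun my-patch-mark-applied (id branch)
  "Note that patch ID has been applied to BRANCH.
Return the list of branches the patch has been applied to."
  (let ((entry (my-patch-record id)))
    (or (memq branch (cdr entry))
        (setcdr entry (cons branch (cdr entry))))
    (cdr entry)))

(defun my-patch-applied-p (id branch)
  "Return t if patch ID has been applied to BRANCH."
  (and (memq branch (cdr (assoc id my-patch-db))) t))

;; A patch submitted to the experimental branch, not yet merged down:
(my-patch-mark-applied (my-patch-id "ben" "1998-12-01" 2) 'experimental)
(my-patch-applied-p "ben-1998-12-01-2" 'experimental) ; returns t
(my-patch-applied-p "ben-1998-12-01-2" 'semi-stable)  ; returns nil
@end example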
16729 @uref{../../www.666.com/ben,Ben Wing} | |
16730 | |
16731 @node Future Work -- Improvements to the @code{xemacs.org} Website, Future Work -- Keybindings, Future Work -- Version Number and Development Tree Organization, Future Work | |
16732 @section Future Work -- Improvements to the @code{xemacs.org} Website | |
16733 @cindex future work, improvements to the @code{xemacs.org} website | |
16734 @cindex improvements to the @code{xemacs.org} website, future work | |
16735 | |
16736 The @code{xemacs.org} web site is the face that XEmacs presents to the | |
16737 outside world. In my opinion, its most important function is to present | |
16738 information about XEmacs in a way that attracts new XEmacs users | |
16739 and co-contributors. Existing members of the XEmacs community can | |
16740 probably find out most of the information they want to know about XEmacs | |
16741 regardless of what shape the web site is in, or for that matter, perhaps | |
16742 even if the web site doesn't exist at all. However, potential new users | |
16743 and co-contributors who go to the XEmacs web site and find it out of | |
16744 date and/or lacking the information that they need are likely to be | |
16745 turned away and may never return. For this reason, I think it's | |
16746 extremely important that the web site be up-to-date, well-organized, and | |
16747 full of information that an inquisitive visitor is likely to want to | |
16748 know. | |
16749 | |
16750 The current XEmacs web site needs a lot of work if it is to meet these | |
16751 standards. I don't think it's reasonable to expect one person to do all | |
16752 of this work and make continual updates as needed, especially given the | |
16753 dismal record that the XEmacs web site has had. The proper thing to do | |
16754 is to place the web site itself under CVS and allow many of the core | |
16755 members to remotely check files in and out. This way, for example, | |
16756 Steve could update the part of the site that contains the current | |
16757 release status of XEmacs. (Much of this could be done by a script that | |
16758 Steve executes when he sends out a beta release announcement which | |
16759 automatically HTML-izes the mail message and puts it in the appropriate | |
16760 place on the web site. There are programs that are specifically | |
16761 designed to convert email messages into HTML, for example | |
16762 @code{mhonarc}.) Meanwhile, the @code{xemacs.org} mailing list | |
16763 administrator (currently Jason Mastaler, I think) could maintain the | |
16764 part of the site that describes the various mailing lists and other | |
16765 addresses at @code{xemacs.org}. Someone like me (perhaps through a | |
16766 proxy typist) could maintain the part of the site that specifies the | |
16767 future directions that XEmacs is going in, etc., etc. | |
16768 | |
16769 Here are some things that I think it's very important to add to the web | |
16770 site. | |
16771 | |
16772 @enumerate | |
16773 @item | |
16774 | |
16775 A page describing in detail how to get involved in the XEmacs | |
16776 development process, how to submit and where to submit various patches | |
16777 to the XEmacs core or associated packages, how to contact the | |
16778 maintainers and core developers of XEmacs and the maintainers of various | |
16779 packages, etc. | |
16780 @item | |
16781 | |
16782 A page describing exactly how to download, compile, and install XEmacs, | |
16783 and how to download and install the various binary distributions. This | |
16784 page should particularly cover in detail how exactly the package system | |
16785 works from an installation standpoint and how to correctly compile and | |
16786 install under Microsoft Windows and Cygwin. This latter section should | |
16787 cover what compilers are needed under Microsoft Windows and Cygwin, and | |
16788 how to get and install the Cygwin components that are needed. | |
16789 @item | |
16790 | |
16791 A page describing where to get the various ancillary libraries that can | |
16792 be linked with XEmacs, such as the JPEG, TIFF, PNG, X-Face, DBM, and | |
16793 other libraries. This page should also cover how to correctly compile | |
16794 and install these libraries, including under Microsoft Windows (or at | |
16795 least it should contain pointers to where this information can be | |
16796 found). Also, it should describe anything that needs to be specified as | |
16797 an option to @code{configure} in order for XEmacs to link with and make | |
16798 use of these libraries or of Motif or CDE. Finally, this page should | |
16799 list which versions of the various libraries are required for use with | |
16800 the various different beta versions of XEmacs. (Remember, this can | |
16801 change from beta to beta, and someone needs to keep a watchful eye on | |
16802 this). | |
16803 @item | |
16804 | |
16805 Pointers to any other sites containing information on XEmacs. This | |
16806 would include, for example, Hrvoje's XEmacs on Windows FAQ and my | |
16807 Architecting XEmacs web site. (Presumably, most of the information in | |
16808 this section will be temporary. Eventually, these pages should be | |
16809 integrated into the main XEmacs web site). | |
16810 @item | |
16811 | |
16812 A page listing the various sub-projects in the XEmacs development | |
16813 process and who is responsible for each of these sub-projects, for | |
16814 example development of the package system, administration of the mailing | |
16815 lists, maintenance of stable XEmacs versions, maintenance of the CVS web | |
16816 interface, etc. This page should also list all of the packages that are | |
16817 archived at @code{xemacs.org} and who is the maintainer or maintainers | |
16818 for each of these packages. | |
16819 | |
16820 @end enumerate | |
16821 | |
16822 @subheading Other Places with an XEmacs Presence | |
16823 | |
16824 We should try to keep an XEmacs presence in all of the major places on | |
16825 the web that are devoted to free software or to the "open source" | |
16826 community. This includes, for example, the open source web site at | |
16827 @uref{../../opensource.oreilly.com/default.htm,http://opensource.oreilly.com} | |
16828 (I'm already in the process of contacting this site), the Freshmeat site | |
16829 at @uref{../../www.freshmeat.net/default.htm,http://www.freshmeat.net}, | |
16830 the various announcement news groups (for example, | |
16831 @uref{news:comp.os.linux.announce,comp.os.linux.announce}, and the | |
16832 Windows announcement news group) etc. | |
16833 | |
16834 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
16835 | |
16836 @node Future Work -- Keybindings, Future Work -- Byte Code Snippets, Future Work -- Improvements to the @code{xemacs.org} Website, Future Work | |
16837 @section Future Work -- Keybindings | |
16838 @cindex future work, keybindings | |
16839 @cindex keybindings, future work | |
16840 | |
16841 @menu | |
16842 * Future Work -- Keybinding Schemes:: | |
16843 * Future Work -- Better Support for Windows Style Key Bindings:: | |
16844 * Future Work -- Misc Key Binding Ideas:: | |
16845 @end menu | |
16846 | |
16847 @node Future Work -- Keybinding Schemes, Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Keybindings, Future Work -- Keybindings | |
16848 @subsection Future Work -- Keybinding Schemes | |
16849 @cindex future work, keybinding schemes | |
16850 @cindex keybinding schemes, future work | |
16851 | |
16852 @strong{Abstract:} We need a standard mechanism that allows different | |
16853 global key binding schemes to be defined. Ideally, this would be the | |
16854 @uref{keyboard-actions.html,keyboard action interface} that I have | |
16855 proposed; however, this would require a lot of work from the maintainers | |
16856 of modes and other external Elisp packages and will not be ready in | |
16857 the short term. So I propose a very kludgy interface, along the lines | |
16858 of what is done in Viper currently. Perhaps we can rip that key munging | |
16859 code out of Viper and make a separate extension that implements a global | |
16860 key binding scheme munging feature. This way a key binding scheme could | |
16861 rearrange all the default keys and have all sorts of other code, which | |
16862 depends on the standard keys being in their default location, still | |
16863 work. | |
16864 | |
16865 @node Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Misc Key Binding Ideas, Future Work -- Keybinding Schemes, Future Work -- Keybindings | |
16866 @subsection Future Work -- Better Support for Windows Style Key Bindings | |
16867 @cindex future work, better support for windows style key bindings | |
16868 @cindex better support for windows style key bindings, future work | |
16869 | |
16870 @strong{Abstract:} This page describes how we could create an XEmacs | |
16871 extension that modifies the global key bindings so that a Windows user | |
16872 would feel at home when using the keyboard in XEmacs. Some of these | |
16873 bindings don't conflict with standard XEmacs keybindings and should be | |
16874 added by default, or at the very least under Windows, and probably under | |
16875 X Windows as well. Other key bindings would need to be implemented in a | |
16876 Windows compatibility extension which can be enabled and disabled on the | |
16877 fly, following the conventions outlined in | |
16878 @uref{enabling-extensions.html,Standard interface for enabling | |
16879 extensions}. Ideally, this should be implemented using the | |
16880 @uref{keyboard-actions.html,keyboard action interface}, but this will not | |
16881 be available in the short term, so we will have to resort to some awful | |
16882 kludges, following the model of Michael Kifer's Viper mode. | |
16883 | |
16884 We really need to make XEmacs provide standard Windows key bindings as | |
16885 much as possible. Currently, for example, there are at least two | |
16886 packages that allow the user to make a selection using the shifted arrow | |
16887 keys, and neither package works all that well, or is maintained. There | |
16888 should be one well-written piece of code that does this, and it should | |
16889 be a standard part of XEmacs. In fact, it should be turned on by | |
16890 default under Windows, and probably under X as well. (As an aside here, | |
16891 one point of contention in how to implement this involves what happens | |
16892 if you select a region using the shifted arrow keys and then hit the | |
16893 regular arrow keys. Does the region remain selected or not? I think | |
16894 there should be a variable that controls which of these two behaviors | |
16895 you want. We can argue over what the default value of this variable | |
16896 should be. The standard Windows behavior here is to keep the region | |
16897 selected, but move the insertion point elsewhere, which is unfortunately | |
16898 impossible to implement in XEmacs.) | |
16899 | |
16900 Some thought should be given to what to do about the standard Windows | |
16901 control and alt key bindings. Under NTEmacs, there is a variable that | |
16902 controls whether the alt key behaves like the Emacs meta key, or whether | |
16903 it is passed on to the menu as in standard Windows programs. We should | |
16904 surely implement this and put this option on the @strong{Options} menu. | |
16905 Making @kbd{Alt-f}, for example, invoke the @strong{File} menu, is not | |
16906 all that disruptive in XEmacs, because the user can always type @kbd{ESC | |
16907 f} to get the meta key functionality. Making @kbd{Control-x}, for | |
16908 example, do @strong{Cut}, is much, much more problematic, of course, but | |
16909 we should consider how to implement this anyway. One possibility would | |
16910 be to move all of the current Emacs control key bindings onto | |
16911 control-shift plus a key, and to make the simple control keys follow the | |
16912 Windows standard as much as possible. This would mean, for example, | |
16913 that we would have the following keybindings:@* @kbd{Control-x} ==> | |
16914 @strong{Cut} @* @kbd{Control-c} ==> @strong{Copy} @* @kbd{Control-v} ==> | |
16915 @strong{Paste} @* @kbd{Control-z} ==> @strong{Undo}@* @kbd{Control-f} | |
16916 ==> @strong{Find} @* @kbd{Control-a} ==> @strong{Select All}@* | |
16917 @kbd{Control-s} ==> @strong{Save}@* @kbd{Control-p} ==> @strong{Print}@* | |
16918 @kbd{Control-y} ==> @strong{Redo}@* (this functionality @emph{is} | |
16919 available in XEmacs with Kyle Jones' @code{redo.el} package, but it | |
16920 should be better integrated)@* @kbd{Control-n} ==> @strong{New} @* | |
16921 @kbd{Control-o} ==> @strong{Open}@* @kbd{Control-w} ==> @strong{Close | |
16922 Window}@* | |
16923 | |
16924 The changes described in the previous paragraph should be put into an | |
16925 extension named @code{windows-keys.el} (see | |
16926 @uref{enabling-extensions.html,Standard interface for enabling | |
16927 extensions}) so that it can be enabled and disabled on the fly using a | |
16928 menu item and can be selected as the default for a particular user in | |
16929 their custom options file. Once this is implemented, the Windows | |
16930 installer should also be modified so that it brings up a dialog box that | |
16931 allows the user to make a selection of which key binding scheme they | |
16932 would prefer as the default, either the XEmacs standard bindings, Vi | |
16933 bindings (which would be Viper mode), Windows-style bindings, Brief, | |
16934 CodeWright, Visual C++, or whatever we manage to implement. | |
16935 | |
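As a rough illustration of an extension that can be switched on and off
on the fly, the sketch below installs a handful of the bindings listed
above through @code{minor-mode-map-alist}. It is not the proposed
@code{windows-keys.el}: the names @code{my-windows-keys-mode},
@code{my-windows-keys-map} and @code{my-windows-keys-toggle} are
hypothetical, and @kbd{Control-x} and @kbd{Control-c} are deliberately
left alone because rebinding those prefix keys is the hard part
discussed above.

@example
(defvar my-windows-keys-mode nil
  "Non-nil when the hypothetical Windows-style bindings are active.")

(defvar my-windows-keys-map
  (let ((map (make-sparse-keymap)))
    (define-key map "\C-v" 'yank)              ; Paste
    (define-key map "\C-z" 'undo)              ; Undo
    (define-key map "\C-a" 'mark-whole-buffer) ; Select All
    (define-key map "\C-s" 'save-buffer)       ; Save
    (define-key map "\C-o" 'find-file)         ; Open
    (define-key map "\C-f" 'isearch-forward)   ; Find
    map)
  "Keymap consulted while `my-windows-keys-mode' is non-nil.")

;; Register the keymap once; it only takes effect while the
;; controlling variable is non-nil.
(or (assq 'my-windows-keys-mode minor-mode-map-alist)
    (setq minor-mode-map-alist
          (cons (cons 'my-windows-keys-mode my-windows-keys-map)
                minor-mode-map-alist)))

(defun my-windows-keys-toggle ()
  "Toggle the hypothetical Windows-style key bindings on or off."
  (interactive)
  (setq my-windows-keys-mode (not my-windows-keys-mode))
  (message "Windows-style keys %s"
           (if my-windows-keys-mode "enabled" "disabled")))
@end example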
16936 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
16937 | |
16938 @node Future Work -- Misc Key Binding Ideas, , Future Work -- Better Support for Windows Style Key Bindings, Future Work -- Keybindings | |
16939 @subsection Future Work -- Misc Key Binding Ideas | |
16940 @cindex future work, misc key binding ideas | |
16941 @cindex misc key binding ideas, future work | |
16942 | |
16943 @itemize | |
16944 @item | |
16945 M-123 ... do digit arg | |
16946 | |
16947 @item | |
16948 However, M-( group commands together until M-) | |
16949 | |
16950 @item | |
16951 Nested M-() are allowed. | |
16952 | |
16953 @item | |
16954 Number repeating plus () repeats N times each group of commands as a | |
16955 unit. | |
16956 | |
16957 @item | |
16958 M-() by itself forms an anonymous macro, and there should be a | |
16959 command to repeat, like VI (execute macro), but when no () before, | |
16960 it repeats the last command of same amount of complication - or more | |
16961 like, somewhere there is a repeats all command back to make to act | |
16962 that stopping like VI's dot command. | |
16963 | |
16964 @item | |
16965 C-numbers switches to a particular window. maybe 1-3 or 1-4 does | |
16966 this. | |
16967 | |
16968 @item | |
16969 C-4 or 5 to 9 (or ()? maybe reserved) switches to a particular frame. | |
16970 | |
16971 @item | |
16972 Possibly C-Sh-numbers select more windows or frames. | |
16973 | |
16974 @item | |
16975 M-C-1 | |
16976 M-C-2 | |
16977 M-C-3 | |
16978 M-C-4 | |
16979 M-C-5 | |
16980 M-C-6 | |
16981 M-C-7 | |
16982 M-C-8 | |
16983 M-C-9 | |
16984 M-C-0 | |
16985 | |
16986 maybe should be execute anonymous macros (other possibility is insert | |
16987 register but you can easily simulate with a keyboard macro) | |
16988 | |
16989 @item | |
16990 What about C-S M-C-S M-S?? | |
16991 | |
16992 @item | |
16993 I think there should be default function key bindings for @strong{ILLEGIBLE} | |
16994 similar to what I have - load, save, cut, copy, paste, kill line, | |
16995 start/end macro, do macro | |
16996 @end itemize | |
16997 | |
16998 @node Future Work -- Byte Code Snippets, Future Work -- Lisp Stream API, Future Work -- Keybindings, Future Work | |
16999 @section Future Work -- Byte Code Snippets | |
17000 @cindex future work, byte code snippets | |
17001 @cindex byte code snippets, future work | |
17002 | |
17003 @itemize | |
17004 @item | |
17005 For use in time critical (e.g. redisplay) places such as display | |
17006 tables - a simple piece of code is evalled, e.g. | |
17007 @example | |
17008 (int-to-char (1+ c)) | |
17009 @end example | |
17010 where c is the arg, specbound. | |
17011 | |
17012 @item | |
17013 can be compiled like | |
17014 @example | |
17015 (byte-compile-snippet (int-to-char (1+ c)) (c)) | |
17016 ^^^ | |
17017 environment of local vars | |
17018 @end example | |
17019 | |
17020 @item | |
17021 need eval with bindings (not hard to implement) | |
17022 (extendable when lexical scoping present) | |
17023 | |
17024 @item | |
17025 What's the return value of byte-compile-snippet? | |
17026 (Look to see how this might be implemented) | |
17027 @end itemize | |
17028 | |
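The ``eval with bindings'' idea from the list above can be approximated
in plain Lisp by wrapping the form in a @code{let}. This is only a
sketch of the concept and says nothing about how
@code{byte-compile-snippet} would be implemented; the name
@code{my-eval-with-bindings} is hypothetical, and no byte compilation
takes place here.

@example
(defun my-eval-with-bindings (form bindings)
  "Evaluate FORM with each (VARIABLE . VALUE) pair in BINDINGS bound.
Each VALUE is used as-is, without being evaluated."
  (eval (list 'let
              (mapcar (lambda (pair)
                        ;; Quote the value so it is not evaluated again.
                        (list (car pair) (list 'quote (cdr pair))))
                      bindings)
              form)))

;; Evaluate (1+ c) with c bound to 65:
(my-eval-with-bindings '(1+ c) '((c . 65)))   ; returns 66
@end example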
17029 @menu | |
17030 * Future Work -- Autodetection:: | |
17031 * Future Work -- Conversion Error Detection:: | |
17032 * Future Work -- BIDI Support:: | |
17033 * Future Work -- Localized Text/Messages:: | |
17034 @end menu | |
17035 | |
17036 @node Future Work -- Autodetection, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets, Future Work -- Byte Code Snippets | |
17037 @subsection Future Work -- Autodetection | |
17038 @cindex future work, autodetection | |
17039 @cindex autodetection, future work | |
17040 | |
17041 There are various proposals contained here. | |
17042 | |
17043 @subsection New Implementation of Autodetection Mechanism | |
17044 | |
17045 The current auto detection mechanism in XEmacs Mule has many | |
17046 problems. For one thing, it is wrong too much of the time. Another | |
17047 problem, although easily fixed, is that priority lists are fixed rather | |
17048 than varying, depending on the particular locale; and finally, it | |
17049 doesn't warn the user when it's not sure of the encoding or when there's | |
17050 a mistake made during decoding. In both of these situations the user | |
17051 should be presented with a list of likely encodings and given the | |
17052 choice, rather than simply proceeding anyway and giving a result that is | |
17053 likely to be wrong and may result in data corruption when the file is | |
17054 saved out again. | |
17055 | |
17056 All coding systems are categorized according to their type. Currently | |
17057 this includes ISO2022, Big 5, Shift-JIS, UTF8 and a few others. In | |
17058 the future there will be many more types defined and this mechanism | |
17059 will be generalized so that it is easily extendable by the Lisp | |
17060 programmer. | |
17061 | |
17062 In general, each coding system type defines a series of subtypes which | |
17063 are handled differently for the purpose of detection. For example, ISO | |
17064 2022 defines many different subtypes such as 7 bit, 8 bit, locking | |
17065 shift, designating and so on. UCS2 may define subtypes such as normal | |
17066 and byte reversed. | |
17067 | |
17068 The detection engine works conceptually by calling the detection | |
17069 methods of all of the defined coding system types in parallel on | |
17070 successive chunks of data (which may, for example, be 4K in size, but | |
17071 where the size makes no difference except for optimization purposes) | |
17072 and watching the results until either a definite answer is determined | |
17073 or the end of data is reached. The way the definite answer is | |
17074 determined will be defined below. The detection method of the coding | |
17075 system type is passed some data and a chunk of memory, which the | |
17076 method uses to store its current state (and which is maintained | |
17077 separately for each coding system type by the detection engine between | |
17078 successive calls to the coding system type's detection method). Its | |
17079 return value should be an alist consisting of a list of all of the | |
17080 defined subtypes for that coding system type along with a level of | |
17081 likelihood and a list of additional properties indicating certain | |
17082 features detected in the data. The extra properties returned are | |
17083 defined entirely by the particular coding system type and are used | |
17084 only in the algorithm described below under "user control." However, | |
17085 the levels of likelihood have a standard meaning as follows: | |
17086 | |
17087 Level 4 means "near certainty" and typically indicates that a | |
17088 signature has been detected, usually at the beginning of the data, | |
17089 indicating that the data is encoded in this particular coding system | |
17090 type. An example of this would be the byte order mark at the beginning | |
17091 of UCS2 encoded data or the GZIP mark at the beginning of GZIP data. | |
17092 | |
17093 Level 3 means "highly likely" and indicates that tell-tale signs have | |
17094 been discovered in the data that are characteristic of this particular | |
17095 coding system type. Examples of this might be ISO 2022 escape | |
17096 sequences or the current Unicode end of line markers at regular | |
17097 intervals. | |
17098 | |
17099 Level 2 means "strongly statistically likely" indicating that | |
17100 statistical analysis concludes that there's a high chance that this | |
17101 data is encoded according to this particular type. For example, this | |
17102 might mean that for UCS2 data, there is a high proportion of null bytes | |
17103 or other repeated bytes in the odd-numbered bytes of the data and a | |
17104 high variance in the even-numbered bytes of the data. For Shift-JIS, | |
17105 this might indicate that there were no illegal Shift-JIS sequences | |
17106 and a fairly high occurrence of common Shift-JIS characters. | |
17107 | |
17108 Level 1 means "weak statistical likelihood" meaning that there is some | |
17109 indication that the data is encoded in this coding system type. In | |
17110 fact, there is a reasonable chance that it may be some other type as | |
17111 well. This means, for example, that no illegal sequences were | |
17112 encountered and at least some data was encountered that is purposely | |
17113 not in other coding system types. For Shift-JIS data, this might mean | |
17114 that some bytes in the range 128 to 159 were encountered in the data. | |
17115 | |
17116 Level 0 means "neutral" which is to say that there's either not enough | |
17117 data to make any decision or that the data could well be interpreted | |
17118 as this type (meaning no illegal sequences), but there is little or no | |
17119 indication of anything particular to this particular type. | |
17120 | |
17121 Level -1 means "weakly unlikely" meaning that some data was | |
17122 encountered that could conceivably be part of the coding system type | |
17123 but is probably not. For example, excessively long line lengths or | |
17124 very rarely-encountered sequences. | |
17125 | |
17126 Level -2 means "strongly unlikely" meaning that typically a number | |
17127 of illegal sequences were encountered. | |
17128 | |
17129 The algorithm to determine when to stop and indicate that the data has | |
17130 been detected as a particular coding system uses a priority list, | |
17131 which is typically specified as part of the language environment | |
17132 determined from the current locale or the user's choice. This priority | |
17133 list consists of a list of coding system subtypes, along with a | |
17134 minimum level required for positive detection and optionally | |
17135 additional properties that need to be present. Using the return values | |
17136 from all of the detection methods called, the detection engine looks | |
17137 through this priority list until it finds a positive match. In this | |
17138 priority list, along with each subtype is a particular coding system | |
17139 to return when the subtype is encountered. (For example, in a | |
17140 Japanese-language environment particular subtypes of ISO 2022 will be | |
17141 associated with the Japanese coding system version of those | |
17142 subtypes). It is perfectly legal, and in fact quite common, to list the | |
17143 same subtype more than once in the priority list with successively | |
17144 lower requirements. Other facts that can be listed in the priority | |
17145 list for a subtype are "reject", meaning that the data should never be | |
17146 detected as this subtype, or "ask", meaning that if the data is | |
17147 detected to be this subtype, the user will be asked whether they | |
17148 actually mean this. This latter property could be used, for example, | |
17149 towards the bottom of the priority list. | |
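The lookup through the priority list described above might be sketched as follows. This is only an illustration in Python under stated assumptions, not the XEmacs implementation: the names @code{PriorityEntry} and @code{pick_coding_system}, the action strings and the subtype names are all hypothetical, and a subtype with no detector result is assumed to be at level 0.

```python
from typing import NamedTuple

class PriorityEntry(NamedTuple):
    subtype: str            # coding system subtype, e.g. "iso2022-7bit"
    min_level: int          # minimum level required for a positive match
    coding_system: str      # coding system to return for this subtype
    action: str = "accept"  # "accept", "reject" or "ask" (hypothetical names)

def pick_coding_system(priority_list, levels):
    """Return (coding_system, must_ask) for the first positive match, or None.

    levels maps each subtype to the level its detection method returned.
    """
    # "reject" means the data is never detected as that subtype.
    rejected = set(e.subtype for e in priority_list if e.action == "reject")
    for entry in priority_list:
        if entry.subtype in rejected:
            continue
        # Assumption: a subtype with no reported result counts as level 0.
        if levels.get(entry.subtype, 0) >= entry.min_level:
            return entry.coding_system, entry.action == "ask"
    return None

# The same subtype may be listed twice with successively lower requirements.
japanese = [
    PriorityEntry("iso2022-7bit", 3, "iso-2022-jp"),
    PriorityEntry("shift-jis", 2, "shift-jis"),
    PriorityEntry("iso2022-7bit", 1, "iso-2022-jp"),
    PriorityEntry("shift-jis", 1, "shift-jis", "ask"),
]
```

With these entries, a Shift-JIS result at level 2 is accepted outright, while a result at level 1 is only reached through the final "ask" entry.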
17150 | |
17151 In addition there is a global variable which specifies the minimum | |
17152 number of characters required before any positive match is | |
17153 reported. There may actually be more than one such variable for | |
17154 different sources of data, for example, detection of files versus | |
17155 detection of subprocess data. | |
17156 | |
17157 Whenever a file is opened and detected to be a particular coding | |
17158 system, the subtype, the coding system and the associated level of | |
17159 likelihood will be prominently displayed either in the echo area or in | |
17160 a status box somewhere. | |
17161 | |
17162 If no positive match is found according to the priority list, or if | |
17163 the matches that are found have the "ask" property on them, then the | |
17164 user will be presented with a list of choices of possible encodings | |
17165 and asked to choose one. This list is typically sorted first by level | |
17166 of likelihood, and then within this, by the order in which the | |
17167 subtypes appear in the priority list. This list is displayed in a | |
17168 special kind of dialog box or other buffer allowing the user, in | |
17169 addition to just choosing a particular encoding, to view what the | |
17170 file would look like if it were decoded according to the type. | |
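The ordering of the list of choices can be sketched in a few lines. This is a hypothetical illustration in Python; @code{rank_choices} is not an existing function, and placing subtypes that are missing from the priority list last is an assumption.

```python
def rank_choices(levels, priority_order):
    """Sort subtypes by descending level, then by priority-list position."""
    def position(subtype):
        if subtype in priority_order:
            return priority_order.index(subtype)
        return len(priority_order)  # unlisted subtypes sort last (assumption)
    return sorted(levels, key=lambda s: (-levels[s], position(s)))
```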
17171 | |
17172 Furthermore, whenever a file is decoded according to a particular | |
17173 type, the decoding engine keeps track of status values that are output | |
17174 by the coding system type's decoding method. Generally, this status | |
17175 will be in the form of errors or warnings of various levels, some of | |
17176 which may be severe enough to stop the decoding entirely, and some of | |
17177 which may either indicate definitely malformed data but from which | |
17178 it's possible to recover, or simply data that appears rather | |
17179 questionable. If any of these status values are reported during | |
17180 decoding, the user will be informed of this and asked "are you sure?" | |
17181 As part of the "are you sure" dialog box or question, the user can | |
17182 display the results of the decoding to make sure it's correct. If the | |
17183 user says "no, they're not sure," then the same list of choices as | |
17184 previously mentioned will be presented. | |
17185 | |
17186 @subheading Implementation of Coding System Priority Lists in Various Locales | |
17187 | |
17188 @example | |
17189 @enumerate | |
17190 @item | |
17191 Default locale | |
17192 | |
17193 @enumerate | |
17194 @item | |
17195 Some Unicode (fixed width; maybe UTF-8, too?) may optionally | |
17196 be detected by the byte-order-mark magic (if the first two | |
17197 bytes are 0xFE 0xFF, the file is Unicode text, if 0xFF 0xFE, | |
17198 it is wrong-endian Unicode; if legal in UTF-8, it would be | |
17199 0xEF 0xBB 0xBF, either-endian). This is probably an | |
17200 optimization that should not be on by default yet. | |
17201 | |
17202 @item | |
17203 ISO-2022 encodings will be detected as long as they use | |
17204 explicit designation of all non-ASCII character sets. This | |
17205 means that many 7-bit ISO-2022 encodings would be detected | |
17206 (eg, ISO-2022-JP), but EUC-JP and X Compound Text would not, | |
17207 because they implicitly designate character sets. | |
17208 | |
17209 N.B. Latin-1 will be detected as binary, as for any Latin-*. | |
17210 | |
17211 N.B. An explicit ISO-2022 designation is semantically | |
17212 equivalent to a Content-Type: header. It is more dangerous | |
17213 because shorter, but I think we should recognize them by | |
17214 default despite the slight risk; XEmacs is a text editor. | |
17215 | |
17216 N.B. This is unlikely to be as dangerous as it looks at first | |
17217 glance. Any file that includes an 8-bit-set byte before the | |
17218 first valid designation should be detected as binary. | |
17219 | |
17220 @item | |
17221 Binary files will be detected (eg, presence of NULs, other | |
17222 non-whitespace control characters, absurdly long lines, and | |
17223 presence of bytes >127). | |
17224 | |
17225 @item | |
17226 Everything else is ASCII. | |
17227 | |
17228 @item | |
17229 Newlines will be detected in text files. | |
17230 @end enumerate | |
17231 | |
17232 @item | |
17233 European locales | |
17234 | |
17235 @enumerate | |
17236 @item | |
17237 Unicode may optionally be detected by the byte-order-mark | |
17238 magic. | |
17239 | |
17240 @item | |
17241 ISO-2022 encodings will be detected as long as they use | |
17242 explicit designation of all non-ASCII character sets. | |
17243 | |
17244 @item | |
17245 A locale-specific class of 1-byte character sets (eg, | |
17246 '(Latin-1)) will be detected. | |
17247 | |
17248 N.B. The reason for permitting a class is for cases like | |
17249 Cyrillic where there are both ISO-8859 encodings and | |
17250 incompatible encodings (KOI-8r) in common use. If you want to | |
17251 write a Latin-1 v. Latin-2 detector, be my guest, but I don't | |
17252 think it would be easy or accurate. | |
17253 | |
17254 @item | |
17255 Binary files will be detected as in the default locale, except that | |
17256 only 8-bit bytes out of the encoding's range imply binary. | |
17257 | |
17258 @item | |
17259 Everything else is ASCII. | |
17260 | |
17261 @item | |
17262 Newlines will be detected in text files. | |
17263 @end enumerate | |
17264 | |
17265 @item | |
17266 CJK locales | |
17267 | |
17268 @enumerate | |
17269 @item | |
17270 Unicode may optionally be detected by the byte-order-mark | |
17271 magic. | |
17272 | |
17273 @item | |
17274 ISO-2022 encodings will be detected as long as they use | |
17275 explicit designation of all non-ASCII character sets. | |
17276 | |
17277 @item | |
17278 A locale-specific class of multi-byte and wide-character | |
17279 encodings will be detected. | |
17280 N.B. No 1-byte character sets (eg, Latin-1) will be detected. | |
17281 The reason for a class is to allow the Japanese to let Mule do | |
17282 the work of choosing EUC v. SJIS. | |
17283 | |
17284 @item | |
17285 Binary files will be detected as in European locales. | |
17286 | |
17287 @item | |
17288 Everything else is ASCII. | |
17289 | |
17290 @item | |
17291 Newlines will be detected in text files. | |
17292 @end enumerate | |
17293 | |
17294 @item | |
17295 Unicode and general locales; multilingual use | |
17296 | |
17297 @enumerate | |
17298 @item | |
17299 Hopefully a system general enough to handle the locales above will | |
17300 handle these, too, but we should watch out for gotchas like | |
17301 Unicode "plane 14" tags which (I think _both_ Ben and Olivier | |
17302 will agree) have no place in the internal representation, and | |
17303 thus must be treated as out-of-band control sequences. I | |
17304 don't know if all such gotchas will be as easy to dispose of. | |
17305 | |
17306 @item | |
17307 An explicit coding system priority list will be provided to | |
17308 allow multilingual users to autodetect both Shift JIS and Big | |
17309 5, say, but this ability is not promised by Mule, since it | |
17310 would involve (eg) heuristics like picking a set of code | |
17311 points that are frequent in Shift JIS and uncommon in Big 5 | |
17312 and betting that a file containing many characters from that | |
17313 set is Shift JIS. | |
17314 @end enumerate | |
17315 @end enumerate | |
17316 @end example | |
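The byte-order-mark check mentioned for each locale could look roughly like this. It is a minimal sketch in Python, not code from XEmacs; @code{SIGNATURES} and @code{sniff_bom} are hypothetical names. The signature bytes themselves are standard: FE FF for big-endian UTF-16, FF FE for little-endian UTF-16, and EF BB BF for UTF-8.

```python
# (signature bytes, label) pairs. If UTF-32 were added, its little-endian
# mark FF FE 00 00 would have to be tested before the UTF-16 mark FF FE.
SIGNATURES = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

def sniff_bom(data):
    """Return (label, payload offset) if data starts with a known signature."""
    for mark, label in SIGNATURES:
        if data.startswith(mark):
            return label, len(mark)
    return None
```

A caller would skip the returned offset before decoding, and would fall through to the other detectors when the result is @code{None}.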
17317 | |
17318 @subheading Better Algorithm, More Flexibility, Different Levels of Certainty | |
17319 | |
17320 @subheading Much More Flexible Coding System Priority List, per-Language Environment | |
17321 | |
17322 @subheading User Ability to Select Encoding when System Unsure or Encounters Errors | |
17323 | |
17324 @subheading Another Autodetection Proposal | |
17325 | |
17326 In general, however, the detection code has major problems and needs lots | |
17327 of work: | |
17328 | |
17329 @itemize @bullet | |
17330 @item | |
17331 instead of merely "yes" or "no" for particular categories, we need a | |
17332 more flexible system, with various levels of likelihood. Currently | |
17333 I've created a system with six levels, as follows: | |
17334 | |
17335 [see file-coding.h] | |
17336 | |
17337 Let's consider what this might mean for an ASCII text detector. (In | |
17338 order to have accurate detection, especially given the iteration I | |
17339 proposed below, we need active detectors for @strong{all} types of data we | |
17340 might reasonably encounter, such as ASCII text files, binary files, | |
17341 and possibly other sorts of ASCII files, and not assume that simply | |
17342 "falling back to no detection" will work at all well.) | |
17343 | |
17344 An ASCII text detector DOES NOT report ASCII text as level 0, since | |
17345 that's what the detector is looking for. Such a detector ideally | |
17346 wants all bytes in the range 0x20 - 0x7E (no high bytes!), except for | |
17347 whitespace control chars and perhaps a few others; LF, CR, or CRLF | |
17348 sequences at regular intervals (where "regular" might mean an average | |
17349 < 100 chars and 99% < 300 for code and other stuff of the "text file | |
17350 w/line breaks" variety, but for the "text file w/o line breaks" | |
17351 variety, excluding blank lines, averages could easily be 600 or more | |
17352 with 2000-3000 char "lines" not so uncommon); similar statistical | |
17353 variance between odds and evens (not Unicode); frequent occurrences of | |
17354 the space character; letters more common than non-letters; etc. Also | |
17355 checking for too little variability between frequencies of characters | |
17356 and for exclusion of particular characters based on character ranges | |
17357 can catch ASCII encodings like base-64, UUEncode, UTF-7, etc. | |
17358 Granted, this doesn't even apply to everything called "ASCII", and we | |
17359 could potentially split off ASCII for code, ASCII for text, | |
17360 etc. as separate categories. However, it does give us a lot to work | |
17361 off of, in deciding what likelihood to choose -- and it shows there's | |
17362 in fact a lot of detectable patterns to look for even in something | |
17363 seemingly so generic as ASCII. The detector would report most text | |
17364 files in level 1 or level 2. EUC encodings, Shift-JIS, etc. probably | |
17365 go to level -1 because they also pass the EOL test and all other tests | |
17366 for the ASCII part of the text, but have lots of high bytes, which in | |
17367 essence turn them into binary. Aberrant text files like something in | |
17368 BASE64 encoding might get placed in level 0, because they pass most | |
17369 tests but fail dramatically the frequency test; but they should not be | |
17370 reported as any lower, because that would cause explicit prompting, | |
17371 and the user should be able to open any valid text file without prompting. | |
17372 The escape sequences and the base-64-type checks might send 7-bit | |
17373 iso2022 to 0, but probably not -1, for similar reasons. | |
17374 | |
17375 @item | |
17376 The assumed algorithm for the above detection levels is to in essence | |
17377 sort categories first by detection level and then by priority. | |
17378 Perhaps, however, we would want smarter algorithms, or at least | |
17379 something user-controllable -- in particular, when (other than no | |
17380 category at level 0 or greater) do we prompt the user to pick a | |
17381 category? | |
17382 | |
17383 @item | |
17384 Improvements in how the detection algorithm works: we want to handle | |
17385 lots of different ways something could be encoded, including multiple | |
17386 stacked encodings. trying to specify a series of detection levels | |
17387 (check for base64 first, then check for gzip, then check for an i18n | |
17388 decoding, then for crlf) won't generally work. for example, what | |
17389 about the same encoding appearing more than once? for example, take | |
17390 euc-jp, base64'd, then gzip'd, then base64'd again: this could well | |
17391 happen, and you could specify the encodings specifically as | |
17392 base64|gzip|base64|euc-jp, but we'd like to autodetect it without | |
17393 worrying about exactly what order these things appear in. we should | |
17394 allow for iterating over detection/decoding cycles until we reach | |
17395 some maximum (we got stuck in a loop, due to incorrect category | |
17396 tables or detection algorithms), have no reported detection levels | |
17397 over -1, or we end up with no change after a decoding pass (i.e. the | |
17398 coding system associated with a chosen category was @code{no-conversion} | |
17399 or something equivalent). it might make sense to divide things into | |
17400 two phases (internal and external), where the internal phase has a | |
17401 separate category list and would probably mostly end up handling EOL | |
17402 detection; but the more i think about it, the more i disagree. with | |
17403 properly written detectors, and properly organized tables (in | |
17404 general, those decodings that are more "distinctive" and thus | |
17405 detectable with greater certainty go lower on the list), we shouldn't | |
17406 need two phases. for example, let's say the example above was also | |
17407 in CRLF format. The EOL detector (which really detects *plain text* | |
17408 with a particular EOL type) would return at most level 0 for all | |
17409 results until the text file is reached, whereas the base64, gzip or | |
17410 euc-jp decoders will return higher. Once the text file is reached, | |
17411 the EOL detector will return 0 or higher for the CRLF encoding, and | |
17412 all other detectors will return 0 or lower; thus, we will successfully | |
17413 proceed through CRLF decoding, or at worst prompt the user. (The only | |
17414 external-vs-internal distinction that might make sense here is to | |
17415 favor coding systems of the correct source type over those that | |
17416 require conversion between external and internal; if done right, this | |
17417 could allow the CRLF detector to return level 1 for all CRLF-encoded | |
17418 text files, even those that look like Base-64 or similar encoding, so | |
17419 that CRLF encoding will always get decoded without prompting, but not | |
17420 interfere with other decoders. On the other hand, this | |
17421 external-vs-internal distinction may not matter at all -- with | |
17422 automatic internal-external conversion, CRLF decoding can occur | |
17423 before or after decoding of euc-jp, base64, iso2022, or similar, | |
17424 without any difference in the final results.) | |
17425 | |
17426 #### What are we trying to say? In base64, the CRLF decoding before | |
17427 base64 decoding is irrelevant, they will be thrown out as whitespace | |
17428 is not significant in base64. | |
17429 | |
17430 [sjt considers all of this to be rather bogus. Ideas like "greater | |
17431 certainty" and "distinctive" can and should be quantified. The issue | |
17432 of proper table organization should be a question of optimization.] | |
17433 | |
17434 [sjt wonders if it might not be a good idea to use Unicode's newline | |
17435 character as the internal representation so that (for non-Unicode | |
17436 coding systems) we can catch EOL bugs on Unix too.] | |
17437 | |
17438 @item | |
17439 There need to be two priority lists and two | |
17440 category->coding-system lists. One is general, the other | |
17441 langenv-specific. The user sets the former, the langenv | |
17442 the latter. The langenv-specific entries take precedence | |
17443 over the others. This works similarly to the | |
17444 Unicode charset priority list. | |
17445 | |
17446 @item | |
17447 The simple list of coding categories per detectors is not enough. | |
17448 Instead of coding categories, we need parameters. For example, | |
17449 Unicode might have separate detectors for UTF-8, UTF-7, UTF-16, | |
17450 and perhaps UCS-4; or UTF-16/UCS-4 would be one detection type. | |
17451 UTF-16 would have parameters such as "little-endian" and "needs BOM", | |
17452 and possibly another one like "collapse/expand/leave alone composite | |
17453 sequences" once we add this support. Usually these parameters | |
17454 correspond directly to a coding system parameter. Different | |
17455 likelihood values can be specified for each parameter as well as for | |
17456 the detection type as a whole. The user can specify particular | |
17457 coding systems for a particular combination of detection type and | |
17458 parameters, or can give "default parameters" associated with a | |
17459 detection type. In the latter case, we create a new coding system as | |
17460 necessary that corresponds to the detected type and parameters. | |
17461 | |
17462 @item | |
17463 a better means of presentation. rather than just coming up | |
17464 with the new file decoded according to the detected coding | |
17465 system, allow the user to browse through the file and | |
17466 conveniently reject it if it looks wrong; then detection | |
17467 starts again, but with that possibility removed. in cases where | |
17468 certainty is low and thus more than one possibility is presented, | |
17469 the user can browse each one and select one or reject them all. | |
17470 | |
17471 @item | |
17472 fail-safe: even after the user has made a choice, if they | |
17473 later on realize they have the wrong coding system, they can | |
17474 go back, and we've squirreled away the original data so they | |
17475 can start the process over. this may be tricky. | |
17476 | |
17477 @item | |
17478 using a larger buffer for detection. we use just a small | |
17479 piece, which can give quite random results. we may need to | |
17480 buffer up all the data we look through because we can't | |
17481 necessarily rewind. the idea is we proceed until we get a | |
17482 result that's at least at a certain level of certainty | |
17483 (e.g. "probable") or we reached a maximum limit of how much | |
17484 we want to buffer. | |
17485 | |
17486 @item | |
17487 dealing with interactive systems. we might need to go ahead | |
17488 and present the data before we've finished detection, and | |
17489 then re-decode it, perhaps multiple times, as we get better | |
17490 detection results. | |
17491 | |
17492 @item | |
17493 Clearly some of these are more important than others. at the | |
17494 very least, the "better means of presentation" should be | |
17495 implemented as soon as possible, along with a very simple means | |
17496 of fail-safe whenever the data is readily available, e.g. it's | |
17497 coming from a file, which is the most common scenario. | |
17498 @end itemize | |
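The idea of iterating over detection/decoding cycles, as in the euc-jp, base64, gzip, base64 example from the list above, can be sketched as follows. This is a toy illustration in Python with two deliberately crude detectors, not a proposal for the real detection methods; @code{peel_layers}, @code{try_base64} and @code{looks_like_gzip} are hypothetical names. In particular, the base64 test here accepts any short run of letters, which is exactly the kind of false positive the statistical checks are meant to rule out, and a corrupt gzip stream would raise an exception that the sketch does not handle.

```python
import base64
import binascii
import gzip

def looks_like_gzip(data):
    # The gzip magic number is the two bytes 1F 8B.
    return data[:2] == b"\x1f\x8b"

def try_base64(data):
    """Return the decoded bytes if data is plausibly base64, else None."""
    stripped = b"".join(data.split())  # whitespace is not significant
    if len(stripped) < 8 or len(stripped) % 4:
        return None
    try:
        return base64.b64decode(stripped, validate=True)
    except binascii.Error:
        return None

def peel_layers(data, max_passes=10):
    """Repeat detect/decode passes until nothing fires or the limit is hit."""
    layers = []
    for _ in range(max_passes):  # the maximum guards against looping forever
        if looks_like_gzip(data):
            data = gzip.decompress(data)
            layers.append("gzip")
            continue
        decoded = try_base64(data)
        if decoded is not None:
            data = decoded
            layers.append("base64")
            continue
        break  # no detector fired, so a further pass would change nothing
    return layers, data
```

The order of the layers is discovered rather than specified, which is the point of the iteration; the final euc-jp step would be left to the ordinary priority-list lookup.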
17499 | |
17500 ben [at least that's what sjt thinks] | |
17501 | |
17502 ***** | |
17503 | |
17504 While this is clearly something of an improvement over earlier designs, | |
17505 it doesn't deal with the most important issue: to do better than categories | |
17506 (which in the medium term is mostly going to mean "which flavor of Unicode | |
17507 is this?"), we need to look at statistical behavior rather than ruling out | |
17508 categories via presence of specific sequences. This means the stream | |
17509 processor should | |
17510 | |
17511 @enumerate | |
17512 @item | |
17513 keep octet distributions (octet, 2-, 3-, 4- octet sequences) | |
17514 @item | |
17515 in some kind of compressed form | |
17516 @item | |
17517 look for "skip features" (eg, characteristic behavior of leading | |
17518 bytes for UTF-7, UTF-8, UTF-16, Mule code) | |
17519 @item | |
17520 pick up certain "simple" regexps | |
17521 @item | |
17522 provide "triggers" to determine when statistical detectors should be | |
17523 invoked, such as octet count | |
17524 @item | |
17525 and "magic" like Unicode signatures or file(1) magic. | |
17526 @end enumerate | |
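Two of the statistics mentioned, octet n-gram distributions and the odd/even null-byte asymmetry that is typical of UCS-2, are cheap to compute. The following Python sketch only shows the bookkeeping; the function names are hypothetical, and no thresholds for turning the numbers into detection levels are suggested here.

```python
from collections import Counter

def ngram_counts(data, n):
    """Histogram of all n-octet sequences in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def null_fraction_by_parity(data):
    """Fraction of zero octets at even and at odd offsets (0-based)."""
    even = data[0::2]
    odd = data[1::2]
    def frac(part):
        return part.count(0) / len(part) if part else 0.0
    return frac(even), frac(odd)
```

For ASCII text encoded as little-endian UTF-16, the second fraction is 1.0 and the first is 0.0, whereas plain ASCII gives 0.0 for both.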
17527 | |
17528 --sjt | |
17529 | |
17530 @node Future Work -- Conversion Error Detection, Future Work -- BIDI Support, Future Work -- Autodetection, Future Work -- Byte Code Snippets | |
17531 @subsection Future Work -- Conversion Error Detection | |
17532 @cindex future work, conversion error detection | |
17533 @cindex conversion error detection, future work | |
17534 | |
17535 @subheading "No Corruption" Scheme for Preserving External Encoding when Non-Invertible Transformation Applied | |
17536 | |
17537 A preliminary and simple implementation is: | |
17538 | |
17539 @quotation | |
17540 But you could implement it much more simply and usefully by just | |
17541 determining, for any text being decoded into mule-internal, can we go | |
17542 back and read the source again? If not, remember the entire file | |
17543 (GNUS message, etc) in text properties. Then, implement the UI | |
17544 interface (like Netscape's) on top of that. This way, you have | |
17545 something that at least works, but it might be inefficient. All we | |
17546 would need to do is work on making the underlying implementation more | |
17547 efficient. | |
17548 @end quotation | |
17549 | |
17550 A more detailed proposal for avoiding binary file corruption is: | |
17551 | |
17552 @quotation | |
17553 Basic idea: A coding system is a filter converting an entire input | |
17554 stream into an output stream. The resulting stream can be said to be | |
17555 "correspondent to" the input stream. Similarly, smaller units can | |
17556 correspond. These could potentially include zero width intervals on | |
17557 either side, but we avoid this. Specifically, the coding system works | |
17558 like: | |
17559 | |
17560 @example | |
17561 loop (input) @{ | |
17562 | |
17563 Read bytes till we have enough to generate a translated character or characters. | |
17564 | |
17565 This establishes a "correspondence" between the whole input and | |
17566 output more or less in minimal chunks. | |
17567 | |
17568 @} | |
17569 @end example | |
17570 | |
17571 We then do the following processing: | |
17572 | |
17573 @enumerate | |
17574 @item | |
17575 Eliminate correspondences where one or the other of the I/O streams | |
17576 has a zero interval by combining with an adjacent interval; | |
17577 | |
17578 @item | |
17579 Group together all adjacent "identity" correspondences into as | |
17580 large groups as possible; | |
17581 | |
17582 @item | |
17583 Use text properties to store the non-identity correspondences on | |
17584 the characters. For identity correspondences, use a simple text | |
17585 property on all that contains no data but just indicates that the | |
17586 whole string of text is identity corresponded. (How do we define | |
17587 "identity"? Latin 1 or could it be something else? For example, | |
17588 Latin 2)? | |
17589 | |
17590 @item | |
17591 Figure out the procedures when text is inserted/deleted and copied | |
17592 or pasted. | |
17593 | |
17594 @item | |
17595 Figure out how to save the file out making use of the | |
17596 correspondences. Allow ways of saving without correspondences, and | |
17597 doing a "save to buffer with and without correspondences." Need to | |
17598 be clever when dealing with modal coding systems to parse the | |
17599 correspondences to get the internal state right. | |
17600 @end enumerate | |
17601 @end quotation | |
17602 | |
17603 @subheading Another Error-Catching Idea | |
17604 | |
17605 Nov 4, 1999 | |
17606 | |
17607 Finally, I don't think "save the input" is as hard as you make it out to | |
17608 be. Conceptually, in fact, it's simple: for each minimal group of bytes | |
17609 where you cannot absolutely guarantee that an external->internal | |
17610 transformation is reversible, you put a text property on the | |
17611 corresponding internal character indicating the bytes that generated | |
17612 this character. We also put a text property on every character, | |
17613 indicating the coding system that caused the transformation. This | |
17614 latter text property is extremely efficient (e.g. in a buffer with no | |
17615 data pasted from elsewhere, it will map to a single extent over all the | |
17616 buffer), and the former cases should not be prevalent enough to cause a | |
17617 lot of inefficiency, esp. if we define what "reversible" means for each | |
17618 coding system in such a way that it correctly handles the most common | |
17619 cases. The hardest part, in fact, is making all the string/text | |
17620 handling in XEmacs be robust w.r.t. text properties. | |
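The "remember the bytes that generated this character" idea can be illustrated outside XEmacs with a plain dictionary standing in for the text property. This is a sketch in Python under stated assumptions: "not reversible" is narrowed to "could not be decoded at all", the replacement character U+FFFD marks such spots, only UTF-8 has been considered, and both function names are hypothetical.

```python
def decode_keeping_bytes(data, encoding="utf-8"):
    """Decode data; return (text, raw), raw mapping char index to bytes."""
    pieces = []   # decoded text fragments
    raw = dict()  # character index -> original bytes it stands for
    length = 0    # number of characters produced so far
    pos = 0
    while pos < len(data):
        try:
            pieces.append(data[pos:].decode(encoding))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into the slice data[pos:].
            good = data[pos:pos + err.start].decode(encoding)
            pieces.append(good + "\ufffd")
            raw[length + len(good)] = data[pos + err.start:pos + err.end]
            length += len(good) + 1
            pos += err.end
    return "".join(pieces), raw

def encode_restoring_bytes(text, raw, encoding="utf-8"):
    """Encode text, writing back the saved bytes for annotated characters."""
    out = bytearray()
    for index, ch in enumerate(text):
        out += raw[index] if index in raw else ch.encode(encoding)
    return bytes(out)
```

As long as the text is not edited, encoding reproduces the input exactly. Keeping the indices valid across insertion and deletion is the part that text properties (extents) would take care of, and is not addressed by this sketch.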
17621 | |
17622 @subheading Strategies for Error Annotation and Coding Orthogonalization | |
17623 | |
17624 From sjt (?): | |
17625 | |
17626 We really want to separate out a number of things. Conceptually, | |
17627 there is a nested syntax. | |
17628 | |
17629 At the top level is the ISO 2022 extension syntax, including charset | |
17630 designation and invocation, and certain auxiliary controls such as the | |
17631 ISO 6429 direction specification. These are octet-oriented, with the | |
17632 single exception (AFAIK) of the "exit Unicode" sequence which uses the | |
17633 UTF's natural width (1 byte for UTF-7 and UTF-8, 2 bytes for UCS-2 and | |
17634 UTF-16, and 4 bytes for UCS-4 and UTF-32). This will be treated as a | |
17635 (deprecated) special case in Unicode processing. | |
17636 | |
17637 The middle layer is ISO 2022 character interpretation. This will depend | |
17638 on the current state of the ISO 2022 registers, and assembles octets | |
17639 into the character's internal representation. | |
17640 | |
17641 The lowest level is translating system control conventions. At present | |
17642 this is restricted to newline translation, but one could imagine doing | |
17643 tab conversion or line wrapping here. "Escape from Unicode" processing | |
17644 would be done at this level. | |
17645 | |
17646 At each level the parser will verify the syntax. In the case of a | |
17647 syntax error or warning (such as a redundant escape sequence that affects | |
17648 no characters), the parser will take some action, typically inserting the | |
17649 erroneous octets directly into the output and creating an annotation | |
17650 which can be used by higher level I/O to mark the affected region. | |
17651 | |
17652 This should make it possible to do something sensible about separating | |
17653 newline convention processing from character construction, and about | |
17654 preventing ISO 2022 escape sequences from being recognized | |
17655 inappropriately. | |
17656 | |
17657 The basic strategy will be to have octet classification tables, and | |
17658 switch processing according to the table entry. | |
17659 | |
17660 It's possible that, by doing the processing with tables of functions or | |
17661 the like, the parser can be used for both detection and translation. | |
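An octet classification table of the kind suggested here might look as follows. The sketch is in Python and makes simplifying assumptions: it uses only the coarse ISO 2022 partition into C0, GL, C1 and GR with ESC singled out, it does not treat 0x20 and 0x7F specially, and the names @code{CLASS_TABLE} and @code{classify_runs} are hypothetical.

```python
# Octet classes, following the usual ISO 2022 partition of the code space.
C0, ESC, GL, C1, GR = "C0", "ESC", "GL", "C1", "GR"

def build_class_table():
    table = []
    for octet in range(256):
        if octet == 0x1B:
            table.append(ESC)
        elif octet < 0x20:
            table.append(C0)
        elif octet < 0x80:
            table.append(GL)
        elif octet < 0xA0:
            table.append(C1)
        else:
            table.append(GR)
    return table

CLASS_TABLE = build_class_table()

def classify_runs(data):
    """Split data into (class, bytes) runs with one table lookup per octet."""
    runs = []
    for octet in data:
        cls = CLASS_TABLE[octet]
        if runs and runs[-1][0] == cls:
            runs[-1][1].append(octet)
        else:
            runs.append((cls, bytearray([octet])))
    return [(cls, bytes(chunk)) for cls, chunk in runs]
```

A translator or a detector would then switch on the class of each run, for instance handing ESC runs to the designation parser and GR runs to the current character set.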
17662 | |
17663 @subheading Handling Writing a File Safely, Without Data Loss | |
17664 | |
17665 From ben: | |
17666 | |
17667 @quotation | |
17668 When writing a file, we need error detection; otherwise somebody | |
17669 will create a Unicode file without realizing the coding system | |
17670 of the buffer is Raw, and then lose all the non-ASCII/Latin-1 | |
17671 text when it's written out. We need two levels: | |
17672 | |
17673 @enumerate | |
17674 @item | |
17675 first, a "safe-charset" level that checks before any actual | |
17676 encoding to see if all characters in the document can safely | |
17677 be represented using the given coding system. FSF has a | |
17678 "safe-charset" property of coding systems, but it's stupid | |
17679 because this information can be automatically derived from | |
17680 the coding system, at least the vast majority of the time. | |
17681 What we need is some sort of | |
17682 alternative-coding-system-precedence-list, langenv-specific, | |
17683 where everything on it can be checked for safe charsets and | |
17684 then the user given a list of possibilities. When the user | |
17685 does "save with specified encoding", they should see the same | |
17686 precedence list. Again like with other precedence lists, | |
17687 there's also a global one, and presumably all coding systems | |
17688 not on the other list get appended to the end (and perhaps not | |
17689 checked at all when doing safe-checking?). safe-checking | |
17690 should work something like this: compile a list of all | |
17691 charsets used in the buffer, along with a count of chars | |
17692 used. that way, "slightly unsafe" coding systems can perhaps | |
17693 be presented at the end, which will lose only a few characters | |
17694 and are perhaps what the users were looking for. | |
17695 | |
17696 [sjt sez this whole step is a crock. If a universal coding system | |
17697 is unacceptable, the user had better know what he/she is doing, | |
17698 and explicitly specify a lossy encoding. | |
17699 In principle, we can simply check for characters being writable as | |
17700 we go along. Eg, via an "unrepresentable character handler." We | |
17701 still have the buffer contents. If we can't successfully save, | |
17702 then ask the user what to do. (Do we ever simply destroy previous | |
17703 file version before completing a write?)] | |
17704 | |
17705 @item | |
17706 when actually writing out, we need error checking in case an | |
17707 individual char in a charset can't be written even though the | |
17708 charsets are safe. again, the user gets the choice of other | |
17709 reasonable coding systems. | |
17710 | |
17711 [sjt -- something is very confused, here; safe charsets should be | |
17712 defined as those charsets all of whose characters can be encoded.] | |
17713 | |
17714 @item | |
17715 same thing (error checking, list of alternatives, etc.) needs | |
17716 to happen when reading! all of this will be a lot of work! | |
17717 @end enumerate | |
17718 @end quotation | |
17719 | |
17720 --ben | |
17721 | |
17722 I don't much like Ben's scheme. First, this isn't an issue of I/O, | |
17723 it's a coding issue. It can happen in many places, not just on stream | |
17724 I/O. Error checking should take place on all translations. Second, | |
17725 the two-pass algorithm should be avoided if possible. In some cases | |
17726 (eg, output to a tty) we won't be able to go back and change the | |
17727 previously output data. Third, the whole idea of having a buffer full | |
17728 of arbitrary characters which we're going to somehow shoehorn into a | |
17729 file based on some twit user's less than informed idea of a coding system | |
17730 is kind of laughable from the start. If we're going to say that a buffer | |
17731 has a coding system, shouldn't we enforce restrictions on what you can | |
17732 put into it? Fourth, what's the point of having safe charsets if some | |
17733 of the characters in them are unsafe? Fifth, what makes you think we're | |
17734 going to have a list of charsets? It seems to me that there might be | |
17735 reasons to have user-defined charsets (eg, "German" vs "French" subsets | |
17736 of ISO 8859/15). Sixth, the idea of having language environment determine | |
17737 precedence doesn't seem very useful to me. Users who are working with a | |
17738 language that corresponds to the language environment are not going to | |
17739 run into safe charsets problems. It's users who are outside of their | |
17740 usual language environment who run into trouble. Also, the reason for | |
17741 specifying anything other than a universal coding system is normally | |
17742 restrictions imposed by other users or applications. Seventh, the | |
17743 statistical feedback isn't terribly useful. Users rarely "want" a | |
17744 coding system, they want their file saved in a useful way. We could | |
17745 add a FORCE argument to conversions for those who really want a specific | |
17746 coding system. But mostly, a user might want to edit out a few unsafe | |
17747 characters. So (up to some maximum) we should keep a list of unsafe | |
17748 text positions, and provide a convenient function for traversing them. | |
17749 | |
17750 --sjt | |
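
As a rough illustration of sjt's last point, here is a minimal C sketch of collecting unsafe text positions up to some maximum. Everything in it is hypothetical: the names collect_unsafe, struct unsafe_list and MAX_UNSAFE are made up for this example, and the Latin-1 test in encodable_p merely stands in for whatever check the real coding system would perform.

```c
#include <stddef.h>

#define MAX_UNSAFE 64   /* hypothetical cap on recorded positions */

struct unsafe_list
{
  size_t pos[MAX_UNSAFE]; /* text positions of unsafe characters */
  size_t count;           /* number of positions recorded in pos[] */
  size_t total;           /* unsafe characters seen, recorded or not */
};

/* Stand-in for "can the target coding system encode C?".
   Assumption for this sketch: the target is Latin-1.  */
static int
encodable_p (unsigned long c)
{
  return c <= 0xFF;
}

/* Scan LEN characters of TEXT once, remembering where the unsafe
   characters are, up to MAX_UNSAFE of them.  */
void
collect_unsafe (const unsigned long *text, size_t len,
                struct unsafe_list *out)
{
  size_t i;

  out->count = 0;
  out->total = 0;
  for (i = 0; i < len; i++)
    if (!encodable_p (text[i]))
      {
        if (out->count < MAX_UNSAFE)
          out->pos[out->count++] = i;
        out->total++;
      }
}
```

A traversal command could then step through the recorded positions, and comparing total with count would tell the user whether more unsafe characters exist than were recorded.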
17751 | |
17752 @node Future Work -- BIDI Support, Future Work -- Localized Text/Messages, Future Work -- Conversion Error Detection, Future Work -- Byte Code Snippets | |
17753 @subsection Future Work -- BIDI Support | |
17754 @cindex future work, bidi support | |
17755 @cindex bidi support, future work | |
17756 | |
17757 @enumerate | |
17758 @item | |
17759 Use text properties to handle nesting levels and overrides: | |
17760 BIDI-specific text properties (as per the Unicode BIDI algorithm) | |
17761 are computed at text insertion time. | |
17762 | |
17763 @item | |
17764 Lisp API for reordering a display line at redisplay time, | |
17765 possibly substitution of different glyphs (esp. mirroring of | |
17766 glyphs). | |
17767 | |
17768 @item | |
17769 Lisp API called after a display line is laid out, but only when | |
17770 reordering may be necessary (display engine checks for | |
17771 non-uniform BIDI text properties; can handle internally a line | |
17772 that's completely in one direction) | |
17773 | |
17774 @item | |
17775 Default direction is a buffer-local variable | |
17776 | |
17777 @item | |
17778 We concentrate on implementing Unicode BIDI algorithm. | |
17779 | |
17780 @item | |
17781 Display support for mirroring of entire window | |
17782 | |
17783 @item | |
17784 Display code keeps track of mirroring junctures so it can | |
17785 display double cursor. | |
17786 | |
17787 @item | |
17788 Entire layout of screen (on a per window basis) is exported as a | |
17789 Lisp API, for visual editing (also very useful for other | |
17790 purposes e.g. proper handling of word wrapping with proportional | |
17791 fonts, complex Lisp layout engines e.g. W3) | |
17792 | |
17793 @item | |
17794 Logical, visual, etc. cursor movement handled entirely in Lisp, | |
17795 using aforementioned API, plus a specifier for controlling how | |
17796 cursor is shown (e.g. split or not). | |
17797 @end enumerate | |
17798 | |
17799 @node Future Work -- Localized Text/Messages, , Future Work -- BIDI Support, Future Work -- Byte Code Snippets | |
17800 @subsection Future Work -- Localized Text/Messages | |
17801 @cindex future work, localized text/messages | |
17802 @cindex localized text/messages, future work | |
17803 | |
17804 NOTE: There is existing message translation in X Windows of menu names. | |
17805 This is handled through X resources. The files are in | |
17806 @file{PACKAGES/mule-packages/locale/app-defaults/@var{locale}/Emacs}, where | |
17807 @var{locale} is @samp{ja}, @samp{fr}, etc. | |
17808 | |
17809 See @file{lib-src/make-msgfile.lex}. | |
17810 | |
17811 Long comment from jwz, some additions from ben marked "ben": | |
17812 | |
17813 (much of this comment is outdated, and a lot of it is actually | |
17814 implemented) | |
17815 | |
17816 @subsection Proposal for How This All Ought to Work | |
17817 | |
17818 this isn't implemented yet, but this is the plan-in-progress | |
17819 | |
17820 In general, it's accepted that the best way to internationalize is for all | |
17821 messages to be referred to by a symbolic name (or number) and come out of a | |
17822 table or tables, which are easy to change. | |
17823 | |
17824 However, with Emacs, we've got the task of internationalizing a huge body | |
17825 of existing code, which already contains messages internally. | |
17826 | |
17827 For the C code we've got two options: | |
17828 | |
17829 @itemize @bullet | |
17830 @item | |
17831 Use a Sun-like @code{gettext()} form, which takes an "english" string which | |
17832 appears literally in the source, and uses that as a hash key to find | |
17833 a translated string; | |
17834 @item | |
17835 Rip all of the strings out and put them in a table. | |
17836 @end itemize | |
17837 | |
17838 In this case, it's desirable to make as few changes as possible to the C | |
17839 code, to make it easier to merge the code with the FSF version of emacs | |
17840 which won't ever have these changes made to it. So we should go with the | |
17841 former option. | |
17842 | |
17843 The way it has been done (between 19.8 and 19.9) was to use @code{gettext()}, but | |
17844 @strong{also} to make massive changes to the source code. The goal now is to use | |
17845 @code{gettext()} at run-time and yet not require a textual change to every line | |
17846 in the C code which contains a string constant. A possible way to do this | |
17847 is described below. | |
17848 | |
17849 (@code{gettext()} can be implemented in terms of @code{catgets()} for non-Sun systems, so | |
17850 that in itself isn't a problem.) | |
17851 | |
17852 For the Lisp code, we've got basically the same options: put everything in | |
17853 a table, or translate things implicitly. | |
17854 | |
17855 Another kink that lisp code introduces is that there are thousands of third- | |
17856 party packages, so changing the source for all of those is simply not an | |
17857 option. | |
17858 | |
17859 Is it a goal that if some third party package displays a message which is | |
17860 one we know how to translate, then we translate it? I think this is a | |
17861 worthy goal. It remains to be seen how well it will work in practice. | |
17862 | |
17863 So, we should endeavor to minimize the impact on the lisp code. Certain | |
17864 primitive lisp routines (the stuff in lisp/prim/, and especially in | |
17865 cmdloop.el and minibuf.el) may need to be changed to know about translation, | |
17866 but that's an ideologically clean thing to do because those are considered | |
17867 a part of the emacs substrate. | |
17868 | |
17869 However, if we find ourselves wanting to make changes to, say, RMAIL, then | |
17870 something has gone wrong. (Except to do things like remove assumptions | |
17871 about the order of words within a sentence, or how pluralization works.) | |
17872 | |
17873 There are two parts to the task of displaying translated strings to the | |
17874 user: the first is to extract the strings which need to be translated from | |
17875 the sources; and the second is to make some call which will translate those | |
17876 strings before they are presented to the user. | |
17877 | |
17878 The old way was to use the same form to do both, that is, @code{GETTEXT()} was both | |
17879 the tag that we searched for to build a catalog, and was the form which did | |
17880 the translation. The new plan is to separate these two things more: the | |
17881 tags that we search for to build the catalog will be stuff that was in there | |
17882 already, and the translation will get done in some more centralized, lower | |
17883 level place. | |
17884 | |
17885 This program (make-msgfile.c) addresses the first part, extracting the | |
17886 strings. | |
17887 | |
17888 For the emacs C code, we need to recognize the following patterns: | |
17889 | |
17890 @example | |
17891 message ("string" ... ) | |
17892 error ("string") | |
17893 report_file_error ("string" ... ) | |
17894 signal_simple_error ("string" ... ) | |
17895 signal_simple_error_2 ("string" ... ) | |
17896 | |
17897 build_translated_string ("string") | |
17898 #### add this and use it instead of @code{build_string()} in some places. | |
17899 | |
17900 yes_or_no_p ("string" ... ) | |
17901 #### add this instead of funcalling Qyes_or_no_p directly. | |
17902 | |
17903 barf_or_query_if_file_exists #### restructure this | |
17904 check all callers of Fsignal #### restructure these | |
17905 signal_error (Qerror ... ) #### change all of these to @code{error()} | |
17906 | |
17907 And we also parse out the @code{interactive} prompts from @code{DEFUN()} forms. | |
17908 | |
17909 #### When we've got a string which is a candidate for translation, we | |
17910 should ignore it if it contains only format directives, that is, if | |
17911 there are no alphabetic characters in it that are not a part of a `%' | |
17912 directive. (Careful not to translate either "%s%s" or "%s: ".) | |
17913 @end example | |
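
The format-directive rule above could be checked along the following lines. This is only a sketch: has_translatable_text is a hypothetical helper, not something make-msgfile provides, and it assumes printf-style directives (flags, width, precision, length modifiers, then one conversion character).

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if S contains at least one alphabetic character that is
   not part of a printf-style `%' directive, else 0.
   Hypothetical helper for illustration only.  */
int
has_translatable_text (const char *s)
{
  while (*s)
    {
      if (*s == '%')
        {
          s++;
          if (*s == '%')        /* "%%" is a literal percent sign */
            {
              s++;
              continue;
            }
          /* Skip flags, field width, precision and argument index.  */
          while (*s && strchr ("-+ #0123456789.*$", *s))
            s++;
          /* Skip length modifiers.  */
          while (*s && strchr ("hlLqjzt", *s))
            s++;
          /* Skip the conversion character itself, if present.  */
          if (*s)
            s++;
          continue;
        }
      if (isalpha ((unsigned char) *s))
        return 1;
      s++;
    }
  return 0;
}
```

With this rule, neither "%s%s" nor "%s: " is a candidate for the catalog, while "Delete %s? " is.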
17914 | |
17915 For the emacs Lisp code, we need to recognize the following patterns: | |
17916 | |
17917 @example | |
17918 (message "string" ... ) | |
17919 (error "string" ... ) | |
17920 (format "string" ... ) | |
17921 (read-from-minibuffer "string" ... ) | |
17922 (read-shell-command "string" ... ) | |
17923 (y-or-n-p "string" ... ) | |
17924 (yes-or-no-p "string" ... ) | |
17925 (read-file-name "string" ... ) | |
17926 (temp-minibuffer-message "string") | |
17927 (query-replace-read-args "string" ... ) | |
17928 @end example | |
17929 | |
17930 I expect there will be a lot like the above; basically, any function which | |
17931 is a commonly used wrapper around an eventual call to @code{message} or | |
17932 @code{read-from-minibuffer} needs to be recognized by this program. | |
17933 | |
17934 | |
17935 @example | |
17936 (dgettext "domain-name" "string") #### do we still need this? | |
17937 | |
17938 things that should probably be restructured: | |
17939 @code{princ} in cmdloop.el | |
17940 @code{insert} in debug.el | |
17941 face-interactive | |
17942 help.el, syntax.el all messed up | |
17943 @end example | |
17944 | |
17945 ben: (format) is a tricky case. If I use format to create a string | |
17946 that I then send to a file, I probably don't want the string translated. | |
17947 On the other hand, if the string gets used as an argument to (y-or-n-p) | |
17948 or some such function, I do want it translated, and it needs to be | |
17949 translated before the %s and such are replaced. The proper solution | |
17950 here is for (format) and other functions that call gettext but don't | |
17951 immediately output the string to the user to add the translated (and | |
17952 formatted) string as a string property of the object, and have | |
17953 functions that output potentially translated strings look for a | |
17954 "translated string" property. Of course, this will fail if someone | |
17955 does something like | |
17956 | |
17957 @example | |
17958 (y-or-n-p (concat (if you-p "Do you " "Does he ") | |
17959 (format "want to delete %s? " filename))) | |
17960 @end example | |
17961 | |
17962 But you shouldn't be doing things like this anyway. | |
17963 | |
17964 ben: Also, to avoid excessive translating, strings should be marked | |
17965 as translated once they get translated, and further calls to gettext | |
17966 don't do any more translating. Otherwise, a call like | |
17967 | |
17968 @example | |
17969 (y-or-n-p (format "Delete %s? " filename)) | |
17970 @end example | |
17971 | |
17972 would cause translation on both the pre-formatted and post-formatted | |
17973 strings, which could lead to weird results in some cases (y-or-n-p | |
17974 has to translate its argument because someone could pass a string to | |
17975 it directly). Note that the "translating too much" solution outlined | |
17976 below could be implemented by just marking all strings that don't | |
17977 come from a .el or .elc file as already translated. | |
17978 | |
17979 Menu descriptors: one way to extract the strings in menu labels would be | |
17980 to teach this program about "^(defvar .*menu\n" forms; that's probably | |
17981 kind of hard, though, so perhaps a better approach would be to make this | |
17982 program recognize lines of the form | |
17983 | |
17984 @example | |
17985 "string" ... ;###translate | |
17986 @end example | |
17987 | |
17988 where the magic token ";###translate" on a line means that the string | |
17989 constant on this line should go into the message catalog. This is analogous | |
17990 to the magic ";###autoload" comments, and to the magic comments used in the | |
17991 EPSF structuring conventions. | |
17992 | |
17993 ----- | |
17994 So this program manages to build up a catalog of strings to be translated. | |
17995 To address the second part of the problem, of actually looking up the | |
17996 translations, there are hooks in a small number of low level places in | |
17997 emacs. | |
17998 | |
17999 Assume the existence of a C function @code{gettext(str)} which returns the | |
18000 translation of @var{str} if there is one, otherwise returns @var{str}. | |
18001 | |
18002 @itemize @bullet | |
18003 @item | |
18004 @code{message()} takes a char* as its argument, and always filters it through | |
18005 @code{gettext()} before displaying it. | |
18006 | |
18007 @item | |
18008 errors are printed by running the lisp function @code{display-error} which | |
18009 doesn't call @code{message} directly (it princ's to streams), so it must be | |
18010 carefully coded to translate its arguments. This is only a few lines | |
18011 of code. | |
18012 | |
18013 @item | |
18014 @code{Fread_minibuffer_internal()} is the lowest level interface to all minibuf | |
18015 interactions, so it is responsible for translating the value that will go | |
18016 into Vminibuf_prompt. | |
18017 | |
18018 @item | |
18019 Fpopup_menu filters the menu titles through @code{gettext()}. | |
18020 | |
18021 The above take care of 99% of all messages the user ever sees. | |
18022 | |
18023 @item | |
18024 The lisp function temp-minibuffer-message translates its arg. | |
18025 | |
18026 @item | |
18027 query-replace-read-args is funny; it does | |
18028 (setq from (read-from-minibuffer (format "%s: " string) ... )) | |
18029 (setq to (read-from-minibuffer (format "%s %s with: " string from) ... )) | |
18030 @end itemize | |
18031 | |
18032 What should we do about this? We could hack query-replace-read-args to | |
18033 translate its args, but might this be a more general problem? I don't | |
18034 think we ought to translate all calls to format. We could just change | |
18035 the calling sequence, since this is odd in that the first %s wants to be | |
18036 translated but the second doesn't. | |
18037 | |
18038 Solving the "translating too much" problem: | |
18039 | |
18040 The concern has been raised that in this situation: | |
18041 | |
18042 @itemize @bullet | |
18043 @item | |
18044 "Help" is a string for which we know a translation; | |
18045 @item | |
18046 someone visits a file called Help, and someone does something | |
18047 contrived like (error buffer-file-name) | |
18048 @end itemize | |
18049 | |
18050 then we would display the translation of Help, which would not be correct. | |
18051 We can solve this by adding a bit to Lisp_String objects which identifies | |
18052 them as having been read as literal constants from a .el or .elc file (as | |
18053 opposed to having been constructed at run time, as it would be in the above | |
18054 case). With that bit in place: | |
18055 | |
18056 @example | |
18057 - @code{Fmessage()} takes a lisp string as its first argument. | |
18058 If that string is a constant, that is, was read from a source file | |
18059 as a literal, then it calls @code{message()} with it, which translates. | |
18060 Otherwise, it calls @code{message_no_translate()}, which does not translate. | |
18061 | |
18062 - @code{Ferror()} (actually, @code{Fsignal()} when condition is Qerror) works similarly. | |
18063 @end example | |
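
A minimal sketch of the literal-constant bit, under stated assumptions: struct lstring, make_lstring, lookup_translation and text_to_display are hypothetical stand-ins (the real object would be a Lisp_String and the real lookup would go through gettext()), and the two-entry catalog is invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Lisp string carrying the proposed bit.  */
struct lstring
{
  const char *data;
  int literal_p;  /* nonzero if read as a literal from a .el/.elc file */
};

/* Toy message catalog; a real build would call gettext() instead.  */
static const char *const catalog[][2] = {
  { "Help", "Hilfe" },
  { "Quit", "Beenden" },
};

/* Return the translation of STR if there is one, otherwise STR.  */
static const char *
lookup_translation (const char *str)
{
  size_t i;

  for (i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    if (strcmp (catalog[i][0], str) == 0)
      return catalog[i][1];
  return str;
}

struct lstring
make_lstring (const char *data, int literal_p)
{
  struct lstring s;

  s.data = data;
  s.literal_p = literal_p;
  return s;
}

/* Text that a message()-like routine would show for S: only strings
   marked as source literals are passed through the catalog.  */
const char *
text_to_display (struct lstring s)
{
  return s.literal_p ? lookup_translation (s.data) : s.data;
}
```

A literal "Help" read from a source file comes out translated, while a run-time string that happens to spell "Help" (such as a file name) is displayed unchanged.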
18064 | |
18065 More specifically, we do: | |
18066 | |
18067 @quotation | |
18068 Scan specified C and Lisp files, extracting the following messages: | |
18069 | |
18070 @example | |
18071 C files: | |
18072 GETTEXT (...) | |
18073 DEFER_GETTEXT (...) | |
18074 DEFUN interactive prompts | |
18075 Lisp files: | |
18076 (gettext ...) | |
18077 (dgettext "domain-name" ...) | |
18078 (defer-gettext ...) | |
18079 (interactive ...) | |
18080 @end example | |
18081 | |
18082 The arguments given to this program are all the C and Lisp source files | |
18083 of GNU Emacs. .el and .c files are allowed. There is no support for .elc | |
18084 files at this time, but they may be specified; the corresponding .el file | |
18085 will be used. Similarly, .o files can also be specified, and the corresponding | |
18086 .c file will be used. This helps the makefile pass the correct list of files. | |
18087 | |
18088 The results, which go to standard output or to a file specified with -a or -o | |
18089 (-a to append, -o to start from nothing), are quoted strings wrapped in | |
18090 gettext(...). The results can be passed to xgettext to produce a .po message | |
18091 file. | |
18092 | |
18093 However, we also need to do the following: | |
18094 | |
18095 @enumerate | |
18096 @item | |
18097 Definition of Arg below won't handle a generalized argument | |
18098 as might appear in a function call. This is fine for DEFUN | |
18099 and friends, because only simple arguments appear there; but | |
18100 it might run into problems if Arg is used for other sorts | |
18101 of functions. | |
18102 @item | |
18103 @code{snarf()} should be modified so that it doesn't output null | |
18104 strings and non-textual strings (see the comment at the top | |
18105 of make-msgfile.c). | |
18106 @item | |
18107 parsing of (insert) should snarf all of the arguments. | |
18108 @item | |
18109 need to add set-keymap-prompt and deal with gettext of that. | |
18110 @item | |
18111 parsing of arguments should snarf all strings anywhere within | |
18112 the arguments, rather than just looking for a string as the | |
18113 argument. This allows if statements as arguments to get parsed. | |
18114 @item | |
18115 @code{begin_paren_counting()} et al. should handle recursive entry. | |
18116 @item | |
18117 handle set-window-buffer and other such functions that take | |
18118 a buffer as the other-than-first argument. | |
18119 @item | |
18120 there is a fair amount of work to be done on the C code. | |
18121 Look through the code for #### comments associated with | |
18122 '#ifdef I18N3' or with an I18N3 nearby. | |
18123 @item | |
18124 Deal with @code{get-buffer-process} et al. | |
18125 @item | |
18126 Many of the changes in the Lisp code marked | |
18127 'rewritten for I18N3 snarfing' should be undone once (5) is | |
18128 implemented. | |
18129 @item | |
18130 Go through the Lisp code in prim and make sure that all | |
18131 strings are gettexted as necessary. This may reveal more | |
18132 things to implement. | |
18133 @item | |
18134 Do the equivalent of (8) for the Lisp code. | |
18135 @item | |
18136 Deal with parsing of menu specifications. | |
18137 @end enumerate | |
18138 @end quotation | |
18139 | |
18140 @node Future Work -- Lisp Stream API, Future Work -- Multiple Values, Future Work -- Byte Code Snippets, Future Work | |
18141 @section Future Work -- Lisp Stream API | |
18142 @cindex future work, Lisp stream API | |
18143 @cindex Lisp stream API, future work | |
18144 | |
18145 Expose XEmacs internal lstreams to Lisp as stream objects. (In | |
18146 addition to the functions given below, each stream object has | |
18147 properties that can be associated with it using the standard put, get | |
18148 etc. API. For GNU Emacs, where put and get have not been extended to | |
18149 be general property functions, but work only on symbols, we would have | |
18150 to create functions set-stream-property, stream-property, | |
18151 remove-stream-property, and stream-properties. These provide the same | |
18152 functionality as the generic put, get, remprop, and object-plist | |
18153 functions under XEmacs.) | |
18154 | |
18155 (Implement properties using a hash table, and @strong{generalize} this so | |
18156 that it is extremely easy to add a property interface onto any kind | |
18157 of object) | |
18158 | |
18159 @example | |
18160 (write-stream STREAM STRING) | |
18161 @end example | |
18162 | |
18163 Write the STRING to the STREAM. This will signal an error if all the | |
18164 bytes cannot be written. | |
18165 | |
18166 @example | |
18167 (read-stream STREAM &optional N SEQUENCE) | |
18168 @end example | |
18169 | |
18170 Reads data from STREAM. N specifies the number of bytes or | |
18171 characters, depending on the stream. SEQUENCE specifies where to | |
18172 write the data into. If N is not specified, data is read until end of | |
18173 file. If SEQUENCE is not specified, the data is returned as a string. | |
18174 If SEQUENCE is specified, the SEQUENCE must be large enough to hold | |
18175 the data. | |
18176 | |
18177 @example | |
18178 (push-stream-marker STREAM) | |
18179 @end example | |
18180 | |
18181 Returns an ID, probably a stream marker object. | |
18182 | |
18183 @example | |
18184 (pop-stream-marker STREAM) | |
18185 @end example | |
18186 | |
18187 Backs up the stream to the last marker. | |
18188 | |
18189 @example | |
18190 (unread-stream STREAM STRING) | |
18191 @end example | |
18192 | |
18193 STREAM must be an input stream, in which case the data in | |
18194 STRING is pushed back and will be read ahead of all other data. In | |
18195 general, there is no limit to the amount of data that can be unread or | |
18196 the number of times that unread-stream can be called before another | |
18197 read. | |
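
The unlimited push-back described above can be sketched in C as a growable stack of unread bytes. This is an illustration only; struct pushback, pushback_unread and pushback_getchar are hypothetical names, and a real lstream would consult this stack before touching the underlying device.

```c
#include <stdlib.h>
#include <string.h>

/* Growable stack of unread bytes; the next byte to be read is
   data[len - 1].  Start with all fields zero.  */
struct pushback
{
  unsigned char *data;
  size_t len;
  size_t cap;
};

/* Push the LEN bytes at S back.  They will be read, in order, ahead
   of everything else.  Returns 0 on success, -1 if out of memory.  */
int
pushback_unread (struct pushback *p, const char *s, size_t len)
{
  size_t i;

  if (p->len + len > p->cap)
    {
      size_t new_cap = (p->len + len) * 2;
      unsigned char *new_data = realloc (p->data, new_cap);

      if (!new_data)
        return -1;
      p->data = new_data;
      p->cap = new_cap;
    }
  /* Store in reverse so that the first byte of S ends up on top.  */
  for (i = 0; i < len; i++)
    p->data[p->len + i] = (unsigned char) s[len - 1 - i];
  p->len += len;
  return 0;
}

/* Return the next pushed-back byte, or -1 if there is none, in which
   case the caller reads from the underlying stream instead.  */
int
pushback_getchar (struct pushback *p)
{
  if (p->len == 0)
    return -1;
  return p->data[--p->len];
}
```

Because the stack grows on demand, unread can be called any number of times before the next read, and the most recently unread string is the first to come back.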
18198 | |
18199 @example | |
18200 (stream-available-chars STREAM) | |
18201 @end example | |
18202 | |
18203 This returns the number of characters (or bytes) that can definitely | |
18204 be read from the stream without an error. This can be useful, for | |
18205 example, when dealing with non-blocking streams when an attempt to | |
18206 read too much data will result in a blocking error. | |
18207 | |
18208 @example | |
18209 (stream-seekable-p STREAM) | |
18210 @end example | |
18211 | |
18212 Returns true if the stream is seekable. If false, operations such as | |
18213 seek-stream and stream-position will signal an error. However, the | |
18214 functions set-stream-marker and seek-stream-marker will still succeed | |
18215 for an input stream. | |
18216 | |
18217 @example | |
18218 (stream-position STREAM) | |
18219 @end example | |
18220 | |
18221 If STREAM is a seekable stream, returns a position which can be passed | |
18222 to seek-stream. | |
18223 | |
18224 @example | |
18225 (seek-stream STREAM N) | |
18226 @end example | |
18227 | |
18228 If STREAM is a seekable stream, move to the position indicated by N, | |
18229 otherwise signal an error. | |
18230 | |
18231 @example | |
18232 (set-stream-marker STREAM) | |
18233 @end example | |
18234 | |
18235 If STREAM is an input stream, create a marker at the current position, | |
18236 which can later be moved back to. The stream does not need to be a | |
18237 seekable stream. In this case, all successive data will be buffered | |
18238 to simulate the effect of a seekable stream. Therefore use this | |
18239 function with care. | |
18240 | |
18241 @example | |
18242 (seek-stream-marker STREAM MARKER) | |
18243 @end example | |
18244 | |
18245 Move the stream back to the position that was stored in the marker | |
18246 object. (This is generally an opaque object of type stream-marker.) | |
18247 | |
18248 @example | |
18249 (delete-stream-marker MARKER) | |
18250 @end example | |
18251 | |
18252 Destroy the stream marker and, if the stream is a non-seekable stream | |
18253 and there are no other stream markers pointing to an earlier position, | |
18254 free up some buffering information. | |
18255 | |
18256 @example | |
18257 (delete-stream STREAM N) | |
18258 @end example | |
18259 | |
18260 @example | |
18261 (delete-stream-marker STREAM ID) | |
18262 @end example | |
18263 | |
18264 @example | |
18265 (close-stream STREAM) | |
18266 @end example | |
18267 | |
18268 Writes any remaining data to the stream and closes it and the object | |
18269 to which it's attached. This also happens automatically when the | |
18270 stream is garbage collected. | |
18271 | |
18272 @example | |
18273 (getchar-stream STREAM) | |
18274 @end example | |
18275 | |
18276 Return a single character from the stream. (This may be a single byte | |
18277 depending on the nature of the stream). This is actually a macro with | |
18278 an extremely efficient implementation (as efficient as you can get in | |
18279 Emacs Lisp), so that this can be used without fear in a loop. The | |
18280 implementation works by reading a large amount of data into a vector | |
18281 and then simply using the function AREF to read characters one by one | |
18282 from the vector. Because AREF is one of the primitives handled | |
18283 specially by the byte interpreter, this will be very efficient. The | |
18284 actual implementation may in fact use the function | |
18285 call-with-condition-handler to avoid the necessity of checking for | |
18286 overflow. Its typical implementation is to fetch the vector | |
18287 containing the characters as a stream property, as well as the index | |
18288 into that vector. Then it retrieves the character and increments the | |
18289 value and stores it back in the stream. As a first implementation, we | |
18290 check to see when we are reading the character whether the character | |
18291 would be out of range. If so, we read another 4096 characters, | |
18292 storing them into the same vector, setting the index back to the | |
18293 beginning, and then proceeding with the rest of the getchar algorithm. | |
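
The "first implementation" described above, written out as a C sketch rather than as the Lisp macro the text has in mind. All names here (struct char_reader, char_reader_getchar, mem_fill, sum_bytes) are hypothetical; the fill callback stands in for the real read from the lstream, and the buffer plays the role of the vector stored as a stream property.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096

/* Refill BUF with up to CAP bytes; return the number of bytes stored,
   0 meaning end of data.  */
typedef size_t (*fill_fn) (void *source, unsigned char *buf, size_t cap);

struct char_reader
{
  unsigned char buf[CHUNK_SIZE];
  size_t index;   /* next byte to hand out */
  size_t count;   /* valid bytes in buf */
  fill_fn fill;
  void *source;
};

void
char_reader_init (struct char_reader *r, fill_fn fill, void *source)
{
  r->index = 0;
  r->count = 0;
  r->fill = fill;
  r->source = source;
}

/* Return the next character, or -1 at end of data.  The range check
   triggers a refill of the same buffer and resets the index.  */
int
char_reader_getchar (struct char_reader *r)
{
  if (r->index >= r->count)
    {
      r->count = r->fill (r->source, r->buf, CHUNK_SIZE);
      r->index = 0;
      if (r->count == 0)
        return -1;
    }
  return r->buf[r->index++];
}

/* Example source: an in-memory byte range.  */
struct mem_source
{
  const unsigned char *data;
  size_t len;
  size_t pos;
};

size_t
mem_fill (void *source, unsigned char *buf, size_t cap)
{
  struct mem_source *m = source;
  size_t n = m->len - m->pos;

  if (n > cap)
    n = cap;
  memcpy (buf, m->data + m->pos, n);
  m->pos += n;
  return n;
}

/* Usage example: add up all bytes of DATA, one getchar at a time.  */
unsigned long
sum_bytes (const unsigned char *data, size_t len)
{
  struct mem_source src = { data, len, 0 };
  struct char_reader r;
  unsigned long sum = 0;
  int c;

  char_reader_init (&r, mem_fill, &src);
  while ((c = char_reader_getchar (&r)) != -1)
    sum += (unsigned long) c;
  return sum;
}
```

The per-character cost is one comparison plus an array access; the expensive refill happens only once every CHUNK_SIZE characters, which is the point of the design.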
18294 | |
18295 @example | |
18296 (putchar-stream STREAM CHAR) | |
18297 @end example | |
18298 | |
18299 This is similar to getchar-stream but it writes data instead of | |
18300 reading data. | |
18301 | |
18302 @example | |
18303 Function make-stream | |
18304 @end example | |
18305 | |
18306 There are actually two stream-creation functions, which are: | |
18307 | |
18308 @example | |
18309 (make-input-stream TYPE PROPERTIES) | |
18310 (make-output-stream TYPE PROPERTIES) | |
18311 @end example | |
18312 | |
18313 These can be used to create a stream that reads data, or writes data, | |
18314 respectively. PROPERTIES is a property list and the allowable | |
18315 properties in it are defined by the type. Possible types are: | |
18316 | |
18317 @enumerate | |
18318 @item | |
18319 @code{file} (this reads data from a file or writes to a file) | |
18320 | |
18321 Allowable properties are: | |
18322 | |
18323 @table @code | |
18324 @item :file-name | |
18325 (the name of the file) | |
18326 | |
18327 @item :create | |
18328 (for output streams only, creates the file if it doesn't | |
18329 already exist) | |
18330 | |
18331 @item :exclusive | |
18332 (for output streams only, fails if the file already | |
18333 exists) | |
18334 | |
18335 @item :append | |
18336 (for output streams only; starts appending to the end | |
18337 of the file rather than overwriting the file) | |
18338 | |
18339 @item :offset | |
18340 (position in bytes in the file where reading or writing | |
18341 should begin. If unspecified, defaults to the beginning of the | |
18342 file or to the end of the file when :append is specified) | |
18343 | |
18344 @item :count | |
18345 (for input streams only, the number of bytes to read from | |
18346 the file before signaling "end of file". If nil or omitted, the | |
18347 number of bytes is unlimited) | |
18348 | |
18349 @item :non-blocking | |
18350 (if true, reads or writes will fail if the operation | |
18351 would block. This only makes sense for non-regular files). | |
18352 @end table | |
18353 | |
18354 @item | |
18355 @code{process} (For output streams only, send data to a process.) | |
18356 | |
18357 Allowable properties are: | |
18358 | |
18359 @table @code | |
18360 @item :process | |
18361 (the process object) | |
18362 @end table | |
18363 | |
18364 @item | |
18365 @code{buffer} (Read from or write to a buffer.) | |
18366 | |
18367 Allowable properties are: | |
18368 | |
18369 @table @code | |
18370 @item :buffer | |
18371 (the name of the buffer or the buffer object.) | |
18372 | |
18373 @item :start | |
18374 (the position to start reading from or writing to. If nil, | |
18375 use the buffer point. If true, use the buffer's point and move | |
18376 point beyond the end of the data read or written.) | |
18377 | |
18378 @item :end | |
18379 (only for input streams, the position to stop reading at. If | |
18380 nil, continue to the end of the buffer.) | |
18381 | |
18382 @item :ignore-accessible | |
18383 (if true, the defaults for :start and :end | |
18384 ignore any narrowing of the buffer.) | |
18385 @end table | |
18386 | |
18387 @item | |
18388 @code{stream} (read from or write to a lisp stream) | |
18389 | |
18390 Allowable properties are: | |
18391 | |
18392 @table @code | |
18393 @item :stream | |
18394 (the stream object) | |
18395 | |
18396 @item :offset | |
18397 (the position to begin to be reading from or writing to) | |
18398 | |
18399 @item :length | |
18400 (For input streams only, the amount of data to read, | |
18401 defaulting to the rest of the data in the string. Revise string | |
18402 for output streams only if true, the stream is resized as | |
18403 necessary to accommodate data written off the end, otherwise the | |
18404 writes will fail.) | |
18405 @end table | |
18406 | |
18407 @item | |
18408 @code{memory} (For output only, writes data to an internal memory | |
18409 buffer. This is more lightweight than using a Lisp buffer. The | |
18410 function memory-stream-string can be used to convert the memory | |
18411 into a string.) | |
18412 | |
18413 @item | |
18414 @code{debugging} (For output streams only, write data to the debugging | |
18415 output.) | |
18416 | |
18417 @item | |
18418 @code{stream-device} (During non-interactive invocations only, read | |
18419 from or write to the initial stream terminal device.) | |
18420 | |
18421 @item | |
18422 @code{function} (For output streams only, send data by calling a | |
18423 function, exactly as with the STREAM argument to the print | |
18424 primitive.) | |
18425 | |
18426 Allowable properties are: | |
18427 | |
18428 @table @code | |
18429 @item :function | |
18430 (the function to call. The function is called with one | |
18431 argument, the stream.) | |
18432 @end table | |
18433 | |
18434 @item | |
18435 @code{marker} (Write data to the location pointed to by a marker and | |
18436 move the marker past the data.) | |
18437 | |
18438 Allowable properties are: | |
18439 | |
18440 @table @code | |
18441 @item :marker | |
18442 (the marker object.) | |
18443 @end table | |
18444 | |
18445 @item | |
18446 @code{decoding} (As an input stream, reads data from another stream and | |
18447 decodes it according to a coding system. As an output stream, | |
18448 decodes the data written to it according to a coding system and | |
18449 then writes the results to another stream.) | |
18450 | |
18451 Properties are: | |
18452 | |
18453 @table @code | |
18454 @item :coding-system | |
18455 (the symbol of coding system object, which defines the | |
18456 decoding.) | |
18457 | |
18458 @item :stream | |
18459 (the stream on the other end.) | |
18460 @end table | |
18461 | |
18462 @item | |
18463 @code{encoding} (As an input stream, reads data from another stream and | |
18464 encodes it according to a coding system. As an output stream, | |
18465 encodes the data written to it according to a coding system and | |
18466 then writes the results to another stream.) | |
18467 | |
18468 Properties are: | |
18469 | |
18470 @table @code | |
18471 @item :coding-system | |
18472 (the symbol of coding system object, which defines the | |
18473 encoding.) | |
18474 | |
18475 @item :stream | |
18476 (the stream on the other end.) | |
18477 @end table | |
18478 @end enumerate | |
18479 | |
18480 Consider | |
18481 | |
18482 @example | |
18483 (define-stream-type 'type | |
18484 :read-function | |
18485 :write-function | |
18486 :rewind- | |
18487 :seek- | |
18488 :tell- | |
18489 (?:buffer) | |
18490 @end example | |
18491 | |
18492 Old Notes: | |
18493 | |
18494 Expose lstreams as a hash table (with put, get, etc. for properties). | |
18495 | |
18496 @example | |
18497 (write-stream stream string) | |
18498 (read-stream stream &optional n sequence) | |
18499 (make-stream ...) | |
18500 (push-stream-marker stream) | |
18501 returns ID prob a stream marker object | |
18502 (pop-stream-marker stream) | |
18503 backs up stream to last marker | |
18504 (unread-stream stream string) | |
18505 (stream-available-chars stream) | |
18506 (seek-stream stream n) | |
18507 (delete-stream stream n) | |
18508 (delete-stream-marker stream ic) can always be poe only nested if you | |
18509 have set stream marker | |
18510 | |
18511 (get-char-stream @strong{generalizes} stream) | |
18512 | |
18513 a macro that tries to be efficient perhaps by reading the next | |
18514 e.g. 512 characters into a vector and arefing them. Might check aref | |
18515 optimization for vectors in the byte interpreter. | |
18516 | |
18517 (make-stream 'process :process ... :type write) | |
18518 | |
18519 Consider | |
18520 | |
18521 (define-stream-type 'type | |
18522 :read-function | |
18523 :write-function | |
18524 :rewind- | |
18525 :seek- | |
18526 :tell- | |
18527 (?:buffer) | |
18528 @end example | |
18529 | |
18530 @node Future Work -- Multiple Values, Future Work -- Macros, Future Work -- Lisp Stream API, Future Work | |
18531 @section Future Work -- Multiple Values | |
18532 @cindex future work, multiple values | |
18533 @cindex multiple values, future work | |
18534 | |
18535 At a low level, all funs that can return multiple values are defined | |
18536 with DEFUN_MULTIPLE_VALUES and have an extra parameter, a struct | |
18537 mv_context *. | |
18538 | |
18539 It has to be this way to ensure that only the fun itself, and no called | |
18540 funs, think they're called in an mv context. | |
18541 | |
18542 apply, funcall, eval might propagate their mv context to their | |
18543 children? | |
18544 | |
18545 Might need eval-mv to implement calling a fun in an mv context. Maybe | |
18546 also funcall_mv? apply_mv? | |
18547 | |
18548 Generally, just set up context appropriately. Call fun (noticing | |
18549 whether it's an mv-aware fun) and bind values on the way back or | |
18550 pass them out (e.g. to multiple-value-bind). | |
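
A minimal sketch of the calling convention described above, not an
existing XEmacs interface. The layout of @code{struct mv_context} and the
function @code{toy_floor} are assumptions made up for illustration; the
notes above name only the type. Passing a null context models a call
that is not in an mv context, so functions called further down never
inherit it by accident.

```c
#include <assert.h>
#include <stddef.h>

#define MV_MAX 8

/* Hypothetical layout: the notes only say that such a struct exists. */
struct mv_context
{
  size_t expected;      /* how many values the caller is prepared to take */
  size_t count;         /* how many values the callee actually produced */
  int values[MV_MAX];   /* values[0] mirrors the ordinary return value */
};

/* An mv-aware function (hypothetical): returns the floored quotient,
   and the remainder as a second value when the caller asked for one. */
static int
toy_floor (int a, int b, struct mv_context *mv)
{
  assert (b != 0);
  int q = a / b;
  int r = a % b;

  /* C division truncates toward zero; adjust to floor semantics. */
  if (r != 0 && ((r < 0) != (b < 0)))
    {
      q--;
      r += b;
    }

  if (mv != NULL)
    {
      mv->values[0] = q;
      mv->count = 1;
      if (mv->expected > 1)
        {
          mv->values[1] = r;
          mv->count = 2;
        }
    }
  return q;
}
```

A caller that wants only the primary value passes a null pointer and
pays nothing beyond the extra argument.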
18551 | |
18552 @subheading Common Lisp multiple values, required for specifier improvements. | |
18553 | |
18554 The multiple return values from get-specifier should allow the | |
18555 specifier value to be modified in the correct fashion (i.e. should | |
18556 interact correctly with all manner of changes from other callers) | |
18557 using set-specifier. We should check this and see if we need other | |
18558 return values. (how-to-add? inst-list?) | |
18559 | |
18560 In C, call multiple-values-context to get number of expected values, | |
18561 and multiple-value-set (#, value) to get values other than the first. | |
18562 | |
18563 (Returns Qno_value, or something, if there are no values.) | |
18564 | |
18565 #### Or should throw? Probably not. | |
18566 #### What happens if a fn returns no values but the caller expects a | |
18567 #### value? | |
18568 | |
18569 Something like @code{funcall_with_multiple_values()} for setting up the | |
18570 context. | |
18571 | |
18572 For efficiency, byte code could notice Ffuncall to m.v. functions and | |
18573 sub in special opcodes during load in processing, if it mattered. | |
18574 | |
18575 @node Future Work -- Macros, Future Work -- Specifiers, Future Work -- Multiple Values, Future Work | |
18576 @section Future Work -- Macros | |
18577 @cindex future work, macros | |
18578 @cindex macros, future work | |
18579 | |
18580 @enumerate | |
18581 @item | |
18582 Option to control whether beep really kills a macro execution. | |
18583 @item | |
18584 Recently defined macros are remembered on a stack, so accidentally | |
18585 defining another one doesn't fuck you up. You can "rotate" | |
18586 anonymous macros or just pick one (numbered) to put on tags, so it | |
18587 works with execute macro - menu shows the anonymous macro, and | |
18588 lists some keystrokes. Normally numbered but you can easily assign | |
18589 to named fun or to keyboard sequence or give it a number (or give | |
18590 it a letter accelerator?) | |
18591 @end enumerate | |
18592 | |
18593 @node Future Work -- Specifiers, Future Work -- Display Tables, Future Work -- Macros, Future Work | |
18594 @section Future Work -- Specifiers | |
18595 @cindex future work, specifiers | |
18596 @cindex specifiers, future work | |
18597 | |
18598 @subheading Ideas To Work On When Their Time Has Come | |
18599 | |
18600 @itemize | |
18601 @item | |
18602 specifier-instance returns additional params (multiple-value) - the instantiator | |
18603 used, the associated tag set, the locale found in, a code that can | |
18604 be passed in as an additional param RESTART to restart an | |
18605 instantiation process, e.g. to allow an instantiator to "inherit" | |
18606 from another one higher up. Also, domain can be 'global (look only | |
18607 in global specs) or "complex" - a list of the actual locales to look | |
18608 in (e.g. a buffer - frame - a device - 'global) | |
18609 | |
18610 @item | |
18611 pragmatic-specifier-domain (locale) | |
18612 Converts a locale into a domain in a way that's "pragmatic" - does | |
18613 what most users expect will happen, but is not clean. In | |
18614 particular, handling of "buffer" requires trickiness, as mentioned | |
18615 before. | |
18616 | |
18617 @item | |
18618 ensure-instantiator-exists (specifier locale) | |
18619 Ensures an actual instantiator exists in a locale, so that it can | |
18620 later be futzed with. If none exists, one is constructed by first | |
18621 calling pragmatic-specifier-domain and then specifier-instance and | |
18622 fetching out the instantiator for this call. | |
18623 | |
18624 @item | |
18625 map-modifying-instantiators (specifier fun &optional locale tag-set) | |
18626 Same args as map-specifier, but use the return value from the fun to | |
18627 replace the instantiator. Called with three args (instantiator | |
18628 locale tag-set) | |
18629 | |
18630 @item | |
18631 map-modifying-instantiators-force (specifier fun &optional locale tag-set) | |
18632 Same as previous, but calls ensure-instantiator-exists on each | |
18633 locale before processing. | |
18634 @end itemize | |
18635 | |
18636 NOTE: Can do preliminary implementation without Multiple Values - | |
18637 instead create fun specifier-instance - that returns a list (and will | |
18638 be deleted at some point) | |
18639 | |
18640 @subheading specifier &c changes for glyphs | |
18641 | |
18642 @enumerate | |
18643 @item | |
18644 @itemize @bullet | |
18645 @item | |
18646 resizable vectors with funs to insert, delete elements (elements | |
18647 shift accordingly) | |
18648 @item | |
18649 gap array vectors as an implementation of resizing vectors. | |
18650 @end itemize | |
18651 | |
18652 @item | |
18653 You can @code{put}, @code{get}, etc. on vectors to modify properties within | |
18654 them. | |
18655 | |
18656 @item | |
18657 copy-over routines | |
18658 routines that carefully copy one complex item OVER another one, | |
18659 destroying the second in the process. I wrote one for lists. Need | |
18660 a general copy-over-tree. | |
18661 | |
18662 @item | |
18663 improvement to specifier mapping routines e.g. | |
18664 | |
18665 map-modifying-instantiator and its force versions below, so that we | |
18666 could implement in turns. | |
18667 | |
18668 @item | |
18669 put-specifier-property (specifier key value &opt locale tag-set) | |
18670 which finds the instantiator in the locale, possibly creating one | |
18671 if necessary, and goes into the vector, changes it, and | |
18672 puts it back into the specifier. | |
18673 | |
18674 @item | |
18675 Smarter add-spec-to-specifier | |
18676 | |
18677 If it notices that it's just replacing one instantiator with | |
18678 another, instead of just copy-tree the first one and throw away the | |
18679 other, use copy-over-tree to save lots of garbage when repeatedly | |
18680 called. | |
18681 | |
18682 ILLEGIBLE: GOTO LOO BUI BUGS LAST PNOTE | |
18683 | |
18684 @item | |
18685 When at image instantiate: | |
18686 @itemize @bullet | |
18687 @item | |
18688 Some properties in the instantiators could be implemented through | |
18689 dynamically modifying an existing image instance (e.g. when the | |
18690 value of a slider or progress bar or text in a text field | |
18691 changes). So when we hash, we only hash the part of the | |
18692 instantiator that cannot be dynamically modified (We might need | |
18693 to do something tricky here - allowing a :key property in hash | |
18694 tables or @strong{ILLEGIBLE}). Anyway, so we need to generate an image | |
18695 instance, and we mask off the dynamic properties and look up in | |
18696 our hash table, and we get something back! But is it ours to | |
18697 modify? (We already checked to see it wasn't exactly the same | |
18698 dynamic properties that it had) Thus --- | |
18699 @end itemize | |
18700 | |
18701 @item | |
18702 Reference counting. Somehow or other, each image instance in the | |
18703 cache needs to keep track of the instantiators that generated it. | |
18704 @end enumerate | |
18705 | |
18706 It might do this through some sort of special instantiator-reference | |
18707 object. This points to the instantiator, where in the hierarchy the | |
18708 instantiator is etc. When an instantiator gets removed, this | |
18709 gu*ILLEGIBLE* values report not attached. Somehow that gets | |
18710 communicated back to the image instance in the cache. So somehow or | |
18711 other, the image instance in the cache knows who's using them and so | |
18712 when you go and keep updating the slider value, by simply modifying an | |
18713 instantiator, which efficiently changes the internal structure of this | |
18714 specifier - eventually image instantiate notices that the image | |
18715 instance it points to has no other user and just modifies it, but in | |
18716 complex situations, some optimizations get lost, but everything is | |
18717 still correct. | |
18718 | |
18719 vs. | |
18720 | |
18721 Andy's set-image-instance-property, which achieves the same | |
18722 optimizations much more easily, but | |
18723 | |
18724 @enumerate | |
18725 @item | |
18726 falls apart in any more complicated system | |
18727 | |
18728 @item | |
18729 only works because of the way the caching system in XEmacs works. | |
18730 Any change (e.g. @strong{ILLEGIBLE} more of making the caches GQ instead | |
18731 of GQ) is likely to make things stop working right in all but the | |
18732 simplest situation. | |
18733 @end enumerate | |
18734 | |
18735 @subheading Specifier improvements for support of specifier inheritance (necessary for the new font mapping API) | |
18736 | |
18737 'Fallback should be a locale/domain. | |
18738 | |
18739 @example | |
18740 (get-specifier specifier &optional locale) | |
18741 | |
18742 #### If locale is omitted, should it be (current-buffer) or 'global? | |
18743 #### Should argument not be optional? | |
18744 @end example | |
18745 | |
18746 If a buffer is specified: find a window showing buffer by looking | |
18747 | |
18748 @itemize @bullet | |
18749 @item | |
18750 at selected window | |
18751 @item | |
18752 at other windows on selected frame | |
18753 @item | |
18754 at selected windows on other frames in selected device | |
18755 @item | |
18756 at other windows on "" | |
18757 @item | |
18758 at selected windows on selected frames on other devices in selected | |
18759 console. | |
18760 @item | |
18761 other windows sel frame other devices sel con | |
18762 @item | |
18763 "" oth "" sel | |
18764 @item | |
18765 sel win sel frame sel dev oth con | |
18766 @item | |
18767 oth win sel frame sel dev oth con | |
18768 @item | |
18769 sel win oth frame sel dev oth con | |
18770 @item | |
18771 oth win oth frame sel dev oth con | |
18772 @item | |
18773 sel win sel frame oth dev oth con | |
18774 @item | |
18775 oth win sel frame oth dev oth con | |
18776 @item | |
18777 oth win oth frame oth dev oth con | |
18778 @end itemize | |
18779 | |
18780 If none, use buffer -> sel frame -> etc. | |
18781 | |
18782 @example | |
18783 Returns multiple values | |
18784 second is instantiator | |
18785 third is locale containing inst. | |
18786 fourth is tag set | |
18787 | |
18788 (restart-specifier-instance ...) | |
18789 @end example | |
18790 | |
18791 like specifier-instance, but allows restarting the lookup, for | |
18792 implementing inheritance, etc. Obsoletes | |
18793 specifier-matching-find-charset, or whatever it is. The restart | |
18794 argument is opaque, and is returned as a multiple value of | |
18795 restart-specifier-instance. (It's actually an integer with the low | |
18796 bits holding the locale and the other bits a count into the list | |
18797 attached to the locale.) | |
18798 | |
18799 @node Future Work -- Display Tables, Future Work -- Making Elisp Function Calls Faster, Future Work -- Specifiers, Future Work | |
18800 @section Future Work -- Display Tables | |
18801 @cindex future work, display tables | |
18802 @cindex display tables, future work | |
18803 | |
18804 #### It would also be really nice if you could specify that the | |
18805 characters come out in hex instead of in octal. Mule does that by | |
18806 adding a @code{ctl-hexa} variable similar to @code{ctl-arrow}, but | |
18807 that's bogus -- we need a more general solution. I think you need to | |
18808 extend the concept of display tables into a more general conversion | |
18809 mechanism. Ideally you could specify a Lisp function that converts | |
18810 characters, but this violates the Second Golden Rule and besides would | |
18811 make things way way way way slow. | |
18812 | |
18813 So instead, we extend the display-table concept, which was historically | |
18814 limited to 256-byte vectors, to one of the following: | |
18815 | |
18816 @enumerate | |
18817 @item | |
18818 A 256-entry vector, for backward compatibility; | |
18819 @item | |
18820 char-table, mapping characters to values; | |
18821 @item | |
18822 range-table, mapping ranges of characters to values; | |
18823 @item | |
18824 a list of the above. | |
18825 @end enumerate | |
18826 | |
18827 The fourth option allows you to specify multiple display tables instead | |
18828 of just one. Each display table can specify conversions for some | |
18829 characters and leave others unchanged. The way the character gets | |
18830 displayed is determined by the first display table with a binding for | |
18831 that character. This way, you could call a function | |
18832 @code{enable-hex-display} that adds a hex display-table to the list of | |
18833 display tables for the current buffer. | |
18834 | |
18835 #### ...not yet implemented... Also, we extend the concept of "mapping" | |
18836 to include a printf-like spec. Thus you can make all extended | |
18837 characters show up as hex with a display table like this: | |
18838 | |
18839 @example | |
18840 #s(range-table data ((256 524288) (format "%x"))) | |
18841 @end example | |
18842 | |
18843 Since more than one display table is possible, you have | |
18844 great flexibility in mapping ranges of characters. | |
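
The lookup rule described above (the first display table with a binding
for a character wins) can be sketched as follows. This is only an
illustration under stated assumptions: @code{struct range_entry},
@code{struct display_table}, @code{lookup_display_format} and
@code{render_char} are hypothetical names, ranges are assumed to be
half-open, and the printf-like spec is the not-yet-implemented extension
mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical range-table entry: characters in [lo, hi) use FMT. */
struct range_entry
{
  unsigned lo, hi;
  const char *fmt;      /* printf-like spec taking one unsigned, e.g. "%x" */
};

/* Hypothetical display table: an array of range entries. */
struct display_table
{
  const struct range_entry *entries;
  size_t count;
};

/* Return the format from the first table that has a binding for CH,
   or NULL if no table in the list binds it. */
static const char *
lookup_display_format (const struct display_table *tables, size_t ntables,
                       unsigned ch)
{
  for (size_t t = 0; t < ntables; t++)
    for (size_t i = 0; i < tables[t].count; i++)
      if (ch >= tables[t].entries[i].lo && ch < tables[t].entries[i].hi)
        return tables[t].entries[i].fmt;
  return NULL;
}

/* Render CH into BUF, falling back to the character itself when no
   table binds it.  Returns the number of characters produced. */
static int
render_char (const struct display_table *tables, size_t ntables,
             unsigned ch, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  const char *fmt = lookup_display_format (tables, ntables, ch);
  if (fmt != NULL)
    return snprintf (buf, size, fmt, ch);
  return snprintf (buf, size, "%c", (int) ch);
}
```

Because earlier tables shadow later ones, a function in the spirit of
@code{enable-hex-display} would only need to put its table at the front
of the list.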
18845 | |
18846 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
18847 | |
18848 @node Future Work -- Making Elisp Function Calls Faster, Future Work -- Lisp Engine Replacement, Future Work -- Display Tables, Future Work | |
18849 @section Future Work -- Making Elisp Function Calls Faster | |
18850 @cindex future work, making Elisp function calls faster | |
18851 @cindex making Elisp function calls faster, future work | |
18852 | |
18853 @strong{Abstract: }This page describes many optimizations that can be | |
18854 made to the existing Elisp function call mechanism without too much | |
18855 effort. The most important optimizations can probably be implemented | |
18856 with only a day or two of work. I think it's important to do this work | |
18857 regardless of whether we eventually decide to replace the Lisp engine. | |
18858 | |
18859 Many complaints have been made about the speed of Elisp, and in | |
18860 particular about the slowness in executing function calls, and rightly | |
18861 so. If you look at the implementation of the @code{funcall} function, | |
18862 you'll notice that it does an incredible amount of work. Now logically, | |
18863 it doesn't need to be so. Let's look first from the theoretical | |
18864 standpoint at what absolutely needs to be done to call a Lisp function. | |
18865 | |
18866 First, let's look at the situation that would exist if we were smart | |
18867 enough to have made lexical scoping be the default language policy. We | |
18868 know at compile time exactly which code can reference the variables that | |
18869 are the formal parameters for the function being called (specifically, | |
18870 only the code that is part of that function's definition) and where | |
18871 these references are. As a result, we can simply push all the values of | |
18872 the variables onto a stack, and convert all the variable references in | |
18873 the function definition into stack references. Therefore, binding | |
18874 lexically-scoped parameters in preparation for a function call involves | |
18875 nothing more than pushing the values of the parameters onto a stack and | |
18876 then setting a new value for the frame pointer, at the same time | |
18877 remembering the old one. Because the byte-code interpreter has a | |
18878 stack-based architecture, however, the parameter values have already | |
18879 been pushed onto the stack at the time of the function call invocation. | |
18880 Therefore, binding the variables involves doing nothing at all, other | |
18881 than dealing with the frame pointer. | |
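
As a rough sketch of the lexical case just described, assuming a toy
integer stack rather than the real byte-code interpreter: the caller has
already pushed the arguments, so entering the function only moves the
frame pointer. All names here (@code{enter_frame}, @code{leave_frame},
@code{param}) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE 256

static int stack[STACK_SIZE];
static size_t sp;       /* index of the next free slot */
static size_t fp;       /* base of the current function's parameters */

static void
push (int value)
{
  assert (sp < STACK_SIZE);
  stack[sp++] = value;
}

/* Enter a function whose NARGS arguments were just pushed by the caller.
   No values are copied; only the frame pointer changes.  Returns the old
   frame pointer so that it can be restored on exit. */
static size_t
enter_frame (size_t nargs)
{
  assert (nargs <= sp);
  size_t old_fp = fp;
  fp = sp - nargs;
  return old_fp;
}

/* A parameter reference compiled into a stack reference. */
static int
param (size_t index)
{
  return stack[fp + index];
}

/* Leave the function: drop its parameters and restore the caller's frame. */
static void
leave_frame (size_t old_fp)
{
  sp = fp;
  fp = old_fp;
}
```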
18882 | |
18883 With dynamic scoping, the situation is somewhat more complicated. | |
18884 Because the parameters can be referenced anywhere, and these references | |
18885 cannot be located at compile time, their values have to be stored into a | |
18886 global table that maps the name of the parameter to its current value. | |
18887 In Elisp, this table is called the @dfn{obarray}. Variable binding in | |
18888 Elisp is done using the C function @code{specbind()}. (This stands for | |
18889 "special variable binding" where @dfn{special} is the standard Lisp | |
18890 terminology for a dynamically-scoped variable.) What @code{specbind()} | |
18891 does, essentially, is retrieve the old value of the variable out of the | |
18892 obarray, remember the value by pushing it, along with the name of the | |
18893 variable, onto what's called the @dfn{specpdl} stack, and then store the | |
18894 new value into the obarray. The term "specpdl" means @dfn{Special | |
18895 Variable Pushdown List}, where @dfn{Pushdown List} is an archaic computer | |
18896 science term for a stack; the term used to be popular at MIT. These binding | |
18897 operations, however, should still not take very much time because of the | |
18898 use of symbols, i.e. because the location in the obarray where the | |
18899 variable's value is stored has already been determined (specifically, it | |
18900 was determined at the time that the byte code was loaded and the symbol | |
18901 created), so no expensive hash table lookups need to be performed. | |
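
The essential bind and unbind cycle can be sketched as below. This is a
deliberately simplified model, not the real @code{specbind()}: the
@code{struct symbol} layout, the fixed-size stack and the @code{toy_}
names are assumptions for illustration, and values are plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified symbol: a name and a value cell. */
struct symbol
{
  const char *name;
  int value;
};

/* One saved binding: which symbol was rebound and what it held before. */
struct specpdl_entry
{
  struct symbol *sym;
  int old_value;
};

#define SPECPDL_SIZE 64

static struct specpdl_entry specpdl[SPECPDL_SIZE];
static size_t specpdl_depth;

/* Dynamically bind SYM to NEW_VALUE: remember the old value on the
   specpdl stack, then store the new value in the symbol's value cell. */
static void
toy_specbind (struct symbol *sym, int new_value)
{
  assert (specpdl_depth < SPECPDL_SIZE);
  specpdl[specpdl_depth].sym = sym;
  specpdl[specpdl_depth].old_value = sym->value;
  specpdl_depth++;
  sym->value = new_value;
}

/* Undo bindings, most recent first, until the stack is back at DEPTH. */
static void
toy_unbind_to (size_t depth)
{
  while (specpdl_depth > depth)
    {
      specpdl_depth--;
      specpdl[specpdl_depth].sym->value = specpdl[specpdl_depth].old_value;
    }
}
```

Note that no lookup by name happens at bind time: the caller already
holds a pointer to the symbol, which is the point made above about the
location having been determined when the byte code was loaded.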
18902 | |
18903 An actual function invocation in Elisp does a great deal more work, | |
18904 however, than was just outlined above. Let's just take a look at what | |
18905 happens when one byte-compiled function invokes another byte-compiled | |
18906 function, checking for places where unnecessary work is being done and | |
18907 determining how to optimize these places. | |
18908 | |
18909 @enumerate | |
18910 @item | |
18911 | |
18912 The byte-compiled function's parameter list is stored in exactly the | |
18913 format that the programmer entered it in, which is to say as a Lisp | |
18914 list, complete with @code{&optional} and @code{&rest} keywords. | |
18915 This list has to be parsed for @emph{every} function invocation, which | |
18916 means that for every element in a list, the element is checked to see | |
18917 whether it's the @code{&optional} or @code{&rest} keyword, its | |
18918 surrounding cons cell is checked to make sure that it is indeed a cons | |
18919 cell, the @code{QUIT} macro is called, etc. What should be happening | |
18920 here is that the argument list is parsed exactly once, at the time that | |
18921 the byte code is loaded, and converted into a C array. The C array | |
18922 should be stored as part of the byte-code object. The C array should | |
18923 also contain, in addition to the symbols themselves, the number of | |
18924 required and optional arguments. At function call time, the C array can | |
18925 be very quickly retrieved and processed. | |
18926 @item | |
18927 | |
18928 For every variable that is to be bound, the @code{specbind()} function | |
18929 is called. This actually does quite a lot of things, including: | |
18930 | |
18931 @enumerate | |
18932 @item | |
18933 | |
18934 Checking the symbol argument to the function to make sure it's actually | |
18935 a symbol. | |
18936 @item | |
18937 | |
18938 Checking for specpdl stack overflow, and increasing its size as | |
18939 necessary. | |
18940 @item | |
18941 | |
18942 Calling @code{symbol_value_buffer_local_info()} to retrieve buffer local | |
18943 information for the symbol, and then processing the return value from | |
18944 this function in a series of if statements. | |
18945 @item | |
18946 | |
18947 Actually storing the old value onto the specpdl stack. | |
18948 @item | |
18949 | |
18950 Calling @code{Fset()} to change the variable's value. | |
18951 | |
18952 @end enumerate | |
18953 | |
18954 | |
18955 @end enumerate | |
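
The one-time argument list parse proposed in the first item above could
look roughly like this. It is a sketch under simplifying assumptions:
parameters are given as C strings instead of Lisp symbols, and
@code{struct parsed_arglist} and @code{parse_arglist} are hypothetical
names, not existing XEmacs code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 16

/* Hypothetical precomputed form of a parameter list. */
struct parsed_arglist
{
  const char *names[MAX_PARAMS];  /* parameter names, keywords removed */
  size_t n_required;
  size_t n_optional;
  int has_rest;                   /* nonzero if a &rest parameter follows */
};

/* Parse PARAMS (N entries) once, at load time.  Returns 0 on success
   and -1 if the list is malformed or too long. */
static int
parse_arglist (const char *const *params, size_t n,
               struct parsed_arglist *out)
{
  enum { REQUIRED, OPTIONAL, REST } state = REQUIRED;
  size_t count = 0;

  assert (out != NULL);
  memset (out, 0, sizeof *out);

  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (params[i], "&optional") == 0)
        {
          if (state != REQUIRED)
            return -1;
          state = OPTIONAL;
        }
      else if (strcmp (params[i], "&rest") == 0)
        {
          if (state == REST)
            return -1;
          state = REST;
        }
      else
        {
          if (count >= MAX_PARAMS)
            return -1;
          if (state == REST && out->has_rest)
            return -1;            /* only one &rest parameter is allowed */
          out->names[count++] = params[i];
          if (state == REQUIRED)
            out->n_required++;
          else if (state == OPTIONAL)
            out->n_optional++;
          else
            out->has_rest = 1;
        }
    }

  /* A trailing "&rest" with no parameter after it is malformed. */
  if (state == REST && !out->has_rest)
    return -1;
  return 0;
}
```

At call time the interpreter would then compare the argument count
against @code{n_required} and @code{n_optional} and walk the array,
with no keyword tests or cons-cell checks left in the hot path.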
18956 | |
18957 | |
18958 | |
18959 The entire series of calls to @code{specbind()} should be inlined and | |
18960 merged into the argument processing code as a single tight loop, with no | |
18961 function calls in the vast majority of cases. The @code{specbind()} | |
18962 logic should be streamlined as follows: | |
18963 | |
18964 @enumerate | |
18965 @item | |
18966 | |
18967 The symbol argument type checking is unnecessary. | |
18968 @item | |
18969 | |
18970 The check for the specpdl stack overflow needs to be done only once, not | |
18971 once per argument. | |
18972 @item | |
18973 | |
18974 All of the remaining logic should be boiled down as follows: | |
18975 | |
18976 @enumerate | |
18977 @item | |
18978 | |
18979 Retrieve the old value from the symbol's value cell. | |
18980 @item | |
18981 | |
18982 If this value is a symbol-value-magic object, then call the real | |
18983 @code{specbind()} to do the work. | |
18984 @item | |
18985 | |
18986 Otherwise, we know that nothing complicated needs to be done, so we | |
18987 simply push the symbol and its value onto the specpdl stack, and then | |
18988 replace the value in the symbol's value cell. | |
18989 @item | |
18990 | |
18991 The only logic that we are omitting is the code in @code{Fset()} that | |
18992 checks to make sure a constant isn't being set. These checks should be | |
18993 made at the time that the byte code for the function is loaded and the C | |
18994 array of parameters to the function is created. (Whether a symbol is | |
18995 constant or not is generally known at XEmacs compile time. The only | |
18996 issue here is with symbols whose names begin with a colon. These | |
18997 symbols should simply be disallowed completely as parameter names.) | |
18998 | |
18999 @end enumerate | |
19000 | |
19001 | |
19002 @end enumerate | |
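
The streamlined logic above can be sketched as a single loop. Again this
is an illustration with made-up types: @code{VALUE_MAGIC} stands in for a
symbol-value-magic object, @code{slow_specbind} stands in for the real
@code{specbind()}, and values are plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag: VALUE_MAGIC stands in for symbol-value-magic. */
enum value_kind { VALUE_PLAIN, VALUE_MAGIC };

struct symbol
{
  enum value_kind kind;
  int value;
};

struct binding
{
  struct symbol *sym;
  int old_value;
};

#define STACK_SIZE 64

static struct binding stack[STACK_SIZE];
static size_t depth;
static int slow_path_calls;     /* counts how often the slow path ran */

/* Stand-in for the real specbind(), which handles the general case. */
static void
slow_specbind (struct symbol *sym, int new_value)
{
  assert (depth < STACK_SIZE);
  slow_path_calls++;
  stack[depth].sym = sym;
  stack[depth].old_value = sym->value;
  depth++;
  sym->value = new_value;
}

/* Bind N parameters to ARGS in one tight loop.
   Returns 0 on success, -1 if the binding stack would overflow. */
static int
bind_params (struct symbol **params, const int *args, size_t n)
{
  /* The overflow check is done once, not once per argument. */
  if (n > STACK_SIZE - depth)
    return -1;

  for (size_t i = 0; i < n; i++)
    {
      struct symbol *sym = params[i];
      if (sym->kind == VALUE_MAGIC)
        slow_specbind (sym, args[i]);   /* rare: defer to the full logic */
      else
        {
          /* Common case: push the old value, replace the value cell. */
          stack[depth].sym = sym;
          stack[depth].old_value = sym->value;
          depth++;
          sym->value = args[i];
        }
    }
  return 0;
}
```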
19003 | |
19004 | |
19005 | |
19006 Other optimizations that could be done are: | |
19007 | |
19008 @itemize | |
19009 @item | |
19010 | |
19011 At the beginning of the function that implements the byte-code | |
19012 interpreter (this is the Lisp primitive @code{byte-code}), the string | |
19013 containing the actual byte code is converted into an array of integers. | |
19014 I added this code specifically for MULE so that the byte-code engine | |
19015 didn't have to deal with the complexities of the internal string format | |
19016 for text. This conversion, however, is generally useful because on | |
19017 modern processors accessing 32-bit values out of an array is | |
19018 significantly faster than accessing unaligned 8-bit values. This | |
19019 conversion takes time, though, and should be done once at load time | |
19020 rather than each time the byte code is executed. This array should be | |
19021 stored in the byte-code object. Currently, this is a bit tricky to do, | |
19022 because @code{byte-code} is not actually passed the byte-code object, | |
19023 but rather three of its elements. We can't just change @code{byte-code} | |
19024 so that it is directly passed the byte-code object because this | |
19025 function, with its existing argument calling pattern, is called directly | |
19026 from compiled Elisp files. What we can and should do, however, is | |
19027 create a subfunction that does take a byte-code object and actually | |
19028 implements the byte-code interpreter engine. Whenever the C code wants | |
19029 to execute byte code, it calls this subfunction. @code{byte-code} | |
19030 itself also calls this subfunction after conjuring up an appropriate | |
19031 byte-code object and storing its arguments into this object. With a | |
19032 small amount of work, it's possible to do this conjuring in such a way | |
19033 that it doesn't generate any garbage. | |
19034 @item | |
19035 | |
19036 At the end of a function call, the parameter bindings that have been | |
19037 done need to be undone. This is standardly done by calling | |
19038 @code{unbind_to()}. Just as with @code{specbind()}, this function does | |
19039 a lot of work that is unnecessary in the vast majority of cases, and it | |
19040 could also be inlined and streamlined. | |
19041 @item | |
19042 | |
19043 As part of each Elisp function call, a whole bunch of checks are done | |
19044 for a series of unlikely but possible conditions that may occur. These | |
19045 include, for example, | |
19046 | |
19047 @itemize | |
19048 @item | |
19049 | |
19050 Calling the @code{QUIT} macro, which essentially involves | |
19051 checking a global volatile variable to see whether additional processing | |
19052 needs to be done. | |
19053 @item | |
19054 | |
19055 Checking whether a garbage collection needs to be done. | |
19056 @item | |
19057 | |
19058 Checking the variable @code{debug_on_next_call}. | |
19059 @item | |
19060 | |
19061 Checking whether Elisp profiling is active. (An additional | |
19062 optimization that's perhaps not worth the effort is to do some | |
19063 post-processing on the array of integers after it has been converted. | |
19064 For example, whenever a 16-bit value occurs in the byte code, it has | |
19065 to be encoded as two separate 8-bit values. These values could be | |
19066 combined. The tricky part here is that all of the places where a goto | |
19067 occurs across the place where this modification is made would have to | |
19068 have their offsets changed. Other such optimizations can easily be | |
19069 imagined as well.) | |
19070 | |
19071 @end itemize | |
19072 | |
19073 @item | |
19074 | |
19075 With a little bit smarter code, it should be possible to make a | |
19076 single trip variable that indicates whether any of these conditions is | |
19077 true. This variable would be updated by any code that changes the | |
19078 actual variables whose values are checked in the various checks just | |
19079 mentioned. (By the way, all of this is occurring in the C function | |
19080 @code{funcall_recording_as()}.) There is a little bit of code | |
19081 between each of the checks. This code would simply have to be | |
19082 duplicated between the two cases where this general trip variable is | |
19083 true and is false. (Note: the optimization detailed in this item is | |
19084 probably not worth doing on the first pass.) | |
19085 | |
19086 @end itemize | |
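
The single trip variable from the last item might be sketched like this.
The flag names and setter functions are hypothetical stand-ins for the
real conditions (quit, garbage collection, @code{debug_on_next_call},
profiling); the only point illustrated is that every writer refreshes
one summary flag, so the common call path tests a single variable.

```c
#include <assert.h>

/* Hypothetical stand-ins for the individual conditions. */
static volatile int quit_requested;
static int gc_needed;
static int debug_next_call;
static int profiling_active;

/* Summary flag: nonzero if any of the conditions above holds. */
static volatile int something_tripped;

static int slow_checks_run;     /* counts trips into the slow path */

static void
update_trip (void)
{
  something_tripped = (quit_requested || gc_needed
                       || debug_next_call || profiling_active);
}

/* Every piece of code that changes a condition goes through a setter
   that also refreshes the summary flag. */
static void set_quit_requested (int v) { quit_requested = v; update_trip (); }
static void set_gc_needed (int v) { gc_needed = v; update_trip (); }
static void set_debug_next_call (int v) { debug_next_call = v; update_trip (); }
static void set_profiling_active (int v) { profiling_active = v; update_trip (); }

/* A trivial function to call in place of a Lisp function. */
static int
add_one (int x)
{
  return x + 1;
}

/* The call path: one test in the common case. */
static int
call_function (int (*fn) (int), int arg)
{
  assert (fn != 0);
  if (something_tripped)
    {
      /* Slow path: here each individual condition would be examined
         and handled in turn. */
      slow_checks_run++;
    }
  return fn (arg);
}
```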
19087 | |
19088 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
19089 | |
19090 @node Future Work -- Lisp Engine Replacement, , Future Work -- Making Elisp Function Calls Faster, Future Work | |
19091 @section Future Work -- Lisp Engine Replacement | |
19092 @cindex future work, lisp engine replacement | |
19093 @cindex lisp engine replacement, future work | |
19094 | |
19095 @menu | |
19096 * Future Work -- Lisp Engine Discussion:: | |
19097 * Future Work -- Lisp Engine Replacement -- Implementation:: | |
19098 @end menu | |
19099 | |
19100 @node Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement -- Implementation, Future Work -- Lisp Engine Replacement, Future Work -- Lisp Engine Replacement | |
19101 @subsection Future Work -- Lisp Engine Discussion | |
19102 @cindex future work, lisp engine discussion | |
19103 @cindex lisp engine discussion, future work | |
19104 | |
19105 | |
19106 @strong{Abstract: }Recently there has been a great deal of talk on the | |
19107 XEmacs mailing lists about potential changes to the XEmacs Lisp engine. | |
19108 Usually the discussion has centered around the question which is better, | |
19109 Common Lisp or Scheme? This is certainly an interesting debate topic, | |
19110 but it didn't seem to have much practical relevance to me, so I vowed to | |
19111 stay out of the discussion. Recently, however, it seems that people are | |
19112 losing sight of the broader picture. For example, nobody seems to be | |
19113 asking the question, ``Would an extension language other than Lisp or | |
19114 Scheme (perhaps not a Lisp variant at all) be more appropriate?'' Nor | |
19115 does anybody seem to be addressing what I consider to be the most | |
19116 fundamental question, is changing the extension language a good thing to | |
19117 do? | |
19118 | |
19119 I think it would be a mistake at this point in XEmacs development to | |
19120 begin any project involving fundamental changes to the Lisp engine or to | |
19121 the XEmacs Lisp language itself. It would take a huge amount of effort | |
19122 to complete even part of this project, and would be a major drain on the | |
19123 already-insufficient resources of the XEmacs development community. | |
19124 Most of the gains that are purported to stem from a project such as this | |
19125 could be obtained with far less effort by making more incremental | |
19126 changes to the XEmacs core. I think it would be an even bigger mistake | |
19127 to change the actual XEmacs extension language (as opposed to just | |
19128 changing the Lisp engine, making few, if any, externally visible | |
19129 changes). The only language change that I could possibly imagine | |
19130 justifying would involve switching to some ubiquitous web language, such | |
19131 as Java, JavaScript, or Perl. (Even among those, I think Java would | |
19132 be the only possibility that really makes sense.) | |
19133 | |
19134 In the rest of this document I'll present the broader issues that would | |
19135 be involved in changing the Lisp engine or extension language. This | |
19136 should make clear why I've come to believe as I do. | |
19137 | |
19138 @subheading Is everyone clear on the difference between interface and implementation? | |
19139 | |
19140 There seems to be a great deal of confusion concerning the difference | |
19141 between interface and implementation. In the context of XEmacs, | |
19142 changing the interface means switching to a different extension language | |
19143 such as Common Lisp, Scheme, Java, etc. Changing the implementation | |
19144 means using a different Lisp engine. There is obviously some relation | |
19145 between these two issues, but there is no particular requirement that | |
19146 one be changed if the other is changed. It is quite possible, for | |
19147 example, to imagine taking the underlying engine for any of the various | |
19148 Lisp dialects in existence, and adapting it so that it implements the | |
19149 same Elisp extension language that currently exists. The vast majority | |
19150 of the purported benefits that we would get from changing the extension | |
19151 language could just as easily be obtained while making minimal changes | |
19152 to the external Elisp interface. This way nearly all existing Elisp | |
19153 programs would continue to work, there would be no need to translate | |
19154 Elisp programs into some other language or to simultaneously support two | |
19155 incompatible Lisp variants, and there would be no need for users or | |
19156 package authors to learn a new extension language that would be just as | |
19157 unfamiliar to the vast majority of them as Elisp is. | |
19158 | |
19159 @subheading Why should we change the Lisp engine? | |
19160 | |
19161 Let's go over the possible reasons for changing the Lisp engine. | |
19162 | |
19163 @subsubheading Speed. | |
19164 | |
19165 Changing the Lisp engine might make XEmacs faster. However, | |
19166 consider the following. | |
19167 | |
19168 @enumerate | |
19169 @item | |
19170 | |
19171 XEmacs will get faster over time without any development effort at all | |
19172 because computers will get faster. | |
19173 @item | |
19174 | |
19175 Perhaps the biggest causes of the slowness of XEmacs are not related to | |
19176 the Lisp engine at all. It has been asserted, for example, that the | |
19177 slowness of XEmacs is primarily due to the redisplay mechanism, to the | |
19178 handling of insertion and deletion of text in a buffer, to the event | |
19179 loop, etc. Nobody has done any real studies to determine what the | |
19180 actual cause of slowness is. | |
19181 @item | |
19182 | |
19183 Emacs 18 seems plenty fast enough to most people. However, Emacs 18 | |
19184 also had a worse Lisp engine and a worse byte compiler than XEmacs. | |
19185 @item | |
19186 | |
19187 Significant speed increases in the execution of Lisp code could be | |
19188 achieved without too much effort by working on the existing byte code | |
19189 interpreter and function call mechanism a bit. | |
19190 | |
19191 @end enumerate | |
19192 | |
19193 @subsubheading Memory usage. | |
19194 | |
19195 A new Lisp engine with a better garbage collection mechanism might make | |
19196 more efficient use of memory; for example, through the use of a | |
19197 relocating garbage collector. However, consider this: | |
19198 | |
19199 @enumerate | |
19200 @item | |
19201 | |
19202 A new Lisp engine would probably have a larger memory footprint, perhaps | |
19203 a significantly larger one. | |
19204 @item | |
19205 | |
19206 The worst memory problems might not be due to Lisp object inefficiency | |
19207 at all. The problems could simply be due mainly to the inefficient | |
19208 buffer representation. Nobody has come up with any concrete numbers on | |
19209 where the real problem lies. | |
19210 | |
19211 @end enumerate | |
19212 | |
19213 @subsubheading Robustness. | |
19214 | |
19215 A new Lisp engine might well be more robust. (On the other hand, it | |
19216 might not be. It is not always easy to tell). However, I think that | |
19217 the biggest problems with robustness are in the part of the C code that | |
19218 is not concerned with implementing the Lisp engine. The redisplay | |
19219 mechanism and the unexec mechanism are probably the biggest sources of | |
19220 robustness problems. I think the biggest robustness problems that are | |
19221 related to the Lisp engine concern the use of GCPRO declarations. The | |
19222 entire GCPRO mechanism is ill-conceived and unsafe. The only real way | |
19223 to make this safe would be to do conservative garbage collection over | |
19224 the C stack and to eliminate the GCPRO declarations entirely. But how | |
19225 many of the Lisp engines that are being considered have such a mechanism | |
19226 built into them? | |
19227 | |
19228 | |
19229 @subsubheading Maintainability. | |
19230 | |
19231 A new Lisp engine might well improve the maintainability of XEmacs by | |
19232 offloading the maintenance of the Lisp engine. However, we need to make | |
19233 very sure that this is, in fact, the case before embarking on a project | |
19234 like this. We would almost certainly have to make significant | |
19235 modifications to any Lisp engine that we choose to integrate, and | |
19236 without the active and committed support and cooperation of the | |
19237 developers of that Lisp engine, the maintainability problem would | |
19238 actually get worse. | |
19239 | |
19240 @subsubheading Features. | |
19241 | |
19242 A new Lisp engine might have built in support for various features that | |
19243 we would like to add to the XEmacs extension language, such as lexical | |
19244 scoping and an object system. | |
19245 | |
19246 @subheading Why would we want to change the extension language? | |
19247 | |
19248 Possible reasons for changing the extension language include: | |
19249 | |
19250 @subsubheading More standard. | |
19251 | |
19252 Switching to a language that is more standard and more commonly in use | |
19253 would be beneficial for various reasons. First of all, the language | |
19254 that is more commonly used and more familiar would make it easier for | |
19255 users to write their own extensions and in general, increase the | |
19256 acceptance of XEmacs. Also, an accepted standard probably has had a lot | |
19257 more thought put into it than any language interface created by the | |
19258 XEmacs developers themselves. Furthermore, if our extension language is | |
19259 being actively developed and supported, much of the work that we would | |
19260 otherwise have to do ourselves is transferred elsewhere. | |
19261 | |
19262 However, both Scheme and Common Lisp flunk the familiarity test. | |
19263 Neither language is being actively used for program development outside | |
19264 of small research communities, and few prospective authors of XEmacs | |
19265 extensions will be familiar with any Lisp variant for real world uses. | |
19266 (I consider the argument that Scheme is often used in introductory | |
19267 programming courses to be irrelevant. Many existing programmers were | |
19268 taught Pascal in their introductory programming courses. How many of | |
19269 them would actually be comfortable writing a program in Pascal?) | |
19270 Furthermore, someone who wants to learn Lisp can't exactly go to their | |
19271 neighborhood bookstore and pick up a book on this topic. | |
19272 | |
19273 @subsubheading Ease of use. | |
19274 | |
19275 There are endless arguments about which language is easiest to use. In | |
19276 practice, this largely boils down to which languages are most familiar. | |
19277 | |
19278 @subsubheading Object oriented. | |
19279 | |
19280 The object-oriented paradigm is the dominant one in use today for new | |
19281 languages. User interface concepts in particular are expressed very | |
19282 naturally in an object-oriented system. However, neither Scheme nor | |
19283 Common Lisp has been designed with object orientation in mind. There is | |
19284 a standard object system for Common Lisp, but it is extremely complex | |
19285 and difficult to understand. | |
19286 | |
19287 | |
19288 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
19289 | |
19290 | |
19291 @node Future Work -- Lisp Engine Replacement -- Implementation, , Future Work -- Lisp Engine Discussion, Future Work -- Lisp Engine Replacement | |
19292 @subsection Future Work -- Lisp Engine Replacement -- Implementation | |
19293 @cindex future work, lisp engine replacement, implementation | |
19294 @cindex lisp engine replacement, implementation, future work | |
19295 | |
19296 Let's take a look at the sort of work that would be required if we were | |
19297 to replace the existing Elisp engine in XEmacs with some other engine, | |
19298 for example, the CLISP engine. I'm assuming here, of course, that we | |
19299 are not going to be changing the interface at the same time, which | |
19300 is to say that we will be keeping the same Elisp language that we | |
19301 currently have as the extension language for XEmacs, except perhaps for | |
19302 incremental changes that we will make, such as lexical scoping and | |
19303 proper structure support in an attempt to gradually move the language | |
19304 towards an upwardly-compatible goal, such as Common Lisp. I am writing | |
19305 this page primarily as food for thought. I feel fairly strongly that | |
19306 actually doing this work would be a big waste of effort that would | |
19307 inevitably become a huge time sink on the part of nearly everyone | |
19308 involved in XEmacs development, and not only for the ones who were | |
19309 supposed to be actually doing the engine change. I feel that most of | |
19310 the desired changes that we want for the language and/or the engine can | |
19311 be achieved with much less effort and time through incremental changes | |
19312 to the existing code base. | |
19313 | |
19314 First of all, in order to make a successful Lisp engine change in | |
19315 XEmacs, it is vitally important that the work be done through a series | |
19316 of incremental stages where at the end of each stage XEmacs can be | |
19317 compiled and run, and it works. It is tempting to try to make the | |
19318 change all at once, but this would be disastrous. If the resulting | |
19319 product worked at all, it would inevitably contain a huge number of | |
19320 subtle and extremely difficult to track down bugs, and it would be next | |
19321 to impossible to determine which of the myriad changes made introduced | |
19322 the bug. | |
19323 | |
19324 Now let's look at what the possible stages of implementation could be. | |
19325 | |
19326 @subsubheading An Extra C Preprocessing Stage | |
19327 | |
19328 The first step would be to introduce another preprocessing stage for the | |
19329 XEmacs C code, which is done before the C compiler itself is invoked on | |
19330 the code, and before the standard C preprocessor runs. The C | |
19331 preprocessor is simply not powerful enough to do many of the things we | |
19332 would like to do in the C code. The existing results of this have been | |
19333 a combination of a lot of hacked up and tricky-to-maintain stuff (such | |
19334 as the @code{DEFUN} macro, and the associated @code{DEFSUBR}), as well | |
19335 as code constructs that are difficult to write (consider, for example, | |
19336 attempting to do structured exception handling, such as catch/throw and | |
19337 unwind-protect constructs), as well as code that is potentially or | |
19338 actually unsafe (such as the uses of @code{alloca}, which could easily | |
19339 cause stack overflow with large amounts of memory allocated in this | |
19340 fashion). The problem is that the C preprocessor does not allow macros | |
19341 to have the power of an actual language, such as C or Lisp. What our | |
19342 own preprocessor should do is allow us to define macros, whose | |
19343 definitions are simply functions written in some language which are | |
19344 executed at compile time, and whose arguments are the actual argument | |
19345 for the macro call, as well as an environment which should have a data | |
19346 structure representation of the C code in the file and allow this | |
19347 environment to be queried and modified. It can be debated what the | |
19348 language should be that these extensions are written in. Whatever the | |
19349 language chosen, it needs to be a very standard language and a language | |
19350 whose compiler or interpreter is available on all of the platforms that | |
19351 we could ever possibly consider porting XEmacs to, which is basically to | |
19352 say all the platforms in existence. One obvious choice is C, because | |
19353 there will obviously be a C compiler available, because it is needed to | |
19354 compile XEmacs itself. Another possibility is Perl, which is already | |
19355 installed on most systems, and is universally available on all others. | |
19356 This language has powerful text processing facilities which would | |
19357 probably make it possible to implement the macro definitions more | |
19358 quickly and easily; however, this might also encourage bad coding | |
19359 practices in the macros (often simple text processing is not | |
19360 appropriate, and more sophisticated parsing or recursive data structure | |
19361 processing needs to be done instead), and we'd have to make sure that | |
19362 the nested data structure that comprises the environment could be | |
19363 represented well in Perl. Elisp would not be a good choice because it | |
19364 would create a bootstrapping problem. Other possible languages, such as | |
19365 Python, are not appropriate, because most programmers are unfamiliar | |
19366 with this language (creating a maintainability problem) and the Python | |
19367 interpreter would have to be included and compiled as part of the XEmacs | |
19368 compilation process (another maintainability problem). Java is still | |
19369 too much in flux to be considered at this point. | |
19370 | |
19371 The macro facility that we will provide needs to add two features to the | |
19372 language: the ability to define a macro, and the ability to call a | |
19373 macro. One good way of doing this would be to make use of special | |
19374 characters that have no meaning in the C language (or in C++ for that | |
19375 matter), and thus can never appear in a C file outside of comments and | |
19376 strings. Two obvious characters are the @@ sign and the $ sign. We | |
19377 could, for example, use @code{@@define} to define new macros, and the | |
19378 @code{$} sign followed by the macro name to call a macro. (Proponents | |
19379 of Perl will note that both of these characters have a meaning in Perl. | |
19380 This should not be a problem, however, because the way that macros are | |
19381 defined and called inside of another macro should not be through the use | |
19382 of any special characters which would in effect be extending the macro | |
19383 language, but through function calls made in the normal way for the | |
19384 language.) | |
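
To make the idea concrete, here is a minimal sketch in C of macros whose definitions are ordinary functions run at preprocessing time. Everything in it is hypothetical: the names @code{macro_env}, @code{macro_fn}, @code{expand_text} and the sample @code{SQUARE} macro are invented for illustration, the environment is reduced to a file name and a counter rather than a parsed representation of the file, and the scanner ignores nesting, strings and comments, which a real preprocessor would have to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical environment handed to every macro; a real one would hold
   the parsed representation of the whole C file.  */
struct macro_env
{
  const char *file_name;
  int expansions;               /* number of macro calls expanded so far */
};

/* A macro definition is an ordinary function run at preprocessing time.  */
typedef void (*macro_fn) (struct macro_env *env, const char *arg,
                          char *out, size_t out_size);

void
expand_square (struct macro_env *env, const char *arg,
               char *out, size_t out_size)
{
  env->expansions++;
  snprintf (out, out_size, "((%s) * (%s))", arg, arg);
}

struct macro_entry { const char *name; macro_fn fn; };

static const struct macro_entry macro_table[] = {
  { "SQUARE", expand_square },
};

macro_fn
lookup_macro (const char *name, size_t len)
{
  size_t i;
  for (i = 0; i < sizeof macro_table / sizeof macro_table[0]; i++)
    if (strlen (macro_table[i].name) == len
        && strncmp (macro_table[i].name, name, len) == 0)
      return macro_table[i].fn;
  return NULL;
}

/* Copy SRC to DST, replacing each `$NAME(arg)' with the macro's output.
   Returns 0 on success, -1 on an unknown macro or lack of space.  */
int
expand_text (struct macro_env *env, const char *src,
             char *dst, size_t dst_size)
{
  size_t used = 0;
  if (dst_size == 0)
    return -1;
  while (*src)
    {
      if (*src == '$')
        {
          const char *name = src + 1, *p = name, *arg, *close;
          char argbuf[128], outbuf[512];
          macro_fn fn;
          while (isalnum ((unsigned char) *p) || *p == '_')
            p++;
          fn = lookup_macro (name, (size_t) (p - name));
          close = (*p == '(') ? strchr (p, ')') : NULL;
          if (!fn || !close || (size_t) (close - p - 1) >= sizeof argbuf)
            return -1;
          arg = p + 1;
          memcpy (argbuf, arg, (size_t) (close - arg));
          argbuf[close - arg] = '\0';
          fn (env, argbuf, outbuf, sizeof outbuf);
          if (used + strlen (outbuf) >= dst_size)
            return -1;
          strcpy (dst + used, outbuf);
          used += strlen (outbuf);
          src = close + 1;
        }
      else
        {
          if (used + 1 >= dst_size)
            return -1;
          dst[used++] = *src++;
        }
    }
  dst[used] = '\0';
  return 0;
}

/* Returns 0 if the sample call expands as expected.  */
int
demo_expand (void)
{
  struct macro_env env = { "example.c", 0 };
  char out[128];
  if (expand_text (&env, "int y = $SQUARE(x + 1);", out, sizeof out) != 0)
    return -1;
  return (strcmp (out, "int y = ((x + 1) * (x + 1));") == 0
          && env.expansions == 1) ? 0 : 1;
}
```

Here @code{demo_expand} checks that the call @code{$SQUARE(x + 1)} is replaced by @code{((x + 1) * (x + 1))}. Because the macro body is plain C, it can inspect and update the environment, which is what the C preprocessor cannot offer.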
19385 | |
19386 The program that actually implements this extra preprocessing stage | |
19387 needs to know a certain amount about how to parse C code. In | |
19388 particular, it needs to know how to recognize comments, strings, | |
19389 character constants, and perhaps certain other kinds of C tokens, and | |
19390 needs to be able to parse C code down to the statement level. (This is | |
19391 to say it needs to be able to parse function definitions and to separate | |
19392 out the statements, @code{if} blocks, @code{while} blocks, etc. within | |
19393 these definitions. It probably doesn't, however, need to parse the | |
19394 contents of a C expression.) The preprocessing program should work | |
19395 first by parsing the entire file into a data structure (which may just | |
19396 contain expressions in the form of literal strings rather than a data | |
19397 structure representing the parsed expression). This data structure | |
19398 should become the environment parameter that is passed as an argument to | |
19399 macros as mentioned above. The implementation of the parsing could and | |
19400 probably should be done using @code{lex} and @code{yacc}. One good idea | |
19401 is simply to steal some of the @code{lex} and @code{yacc} code that is | |
19402 part of GCC. | |
19403 | |
19404 Here are some possibilities that could be implemented as part of the | |
19405 preprocessing: | |
19406 | |
19407 @enumerate | |
19408 @item | |
19409 | |
19410 A proper way of doing the @code{DEFUN} macros. These could, for | |
19411 example, take an argument list in the form of a Lisp argument list | |
19412 (complete with keyword parameters and other complex features) and | |
19413 automatically generate the appropriate @code{subr} structure, the | |
19414 appropriate C function definition header, and the appropriate call to | |
19415 the @code{DEFSUBR} initialization function. | |
19416 @item | |
19417 | |
19418 A truly safe and easy to use implementation of the @code{alloca} | |
19419 function. This could allocate the memory in any fashion it chooses | |
19420 (calling @code{malloc}, using a large global array or a series of such | |
19421 arrays, etc.) and insert code in the appropriate places to | |
19422 automatically free up this memory. (Appropriate places here would be at | |
19423 the end of the function and before any return statements. Non-local | |
19424 exits can be handled in the function that actually implements the | |
19425 non-local exit.) | |
19426 @item | |
19427 | |
19428 If we allow for the possibility of having an arbitrary Lisp engine, we | |
19429 can't necessarily assume that we can call Lisp primitives implemented in | |
19430 C from other C functions by simply making a function call. Perhaps | |
19431 something special needs to happen when this is done. This could be | |
19432 handled fairly easily by having our new and improved @code{DEFUN} macro | |
19433 define a new macro for use when calling a primitive. | |
19434 @end enumerate | |
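
As a rough illustration of the second possibility, the following C sketch shows the kind of code such a preprocessor might emit in place of @code{alloca}. It is only one of the allocation strategies mentioned above (a single global array used as a stack), and all of the names (@code{scratch_mark}, @code{scratch_alloc}, @code{scratch_release}, @code{count_upper}) are hypothetical. A real implementation would presumably fall back to @code{malloc} when the array is exhausted, and would release the memory in the non-local exit code as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scratch stack standing in for the "large global array".
   The union only exists to give the byte array a generous alignment.  */
#define SCRATCH_SIZE 4096
static union
{
  unsigned char bytes[SCRATCH_SIZE];
  long double align;
} scratch;
static size_t scratch_top;

/* Remember the current top of the scratch stack.  */
size_t
scratch_mark (void)
{
  return scratch_top;
}

/* Free everything allocated since MARK was taken.  */
void
scratch_release (size_t mark)
{
  scratch_top = mark;
}

/* Allocate SIZE bytes from the scratch stack, or return NULL if full.
   Sizes are rounded up to a multiple of 16 bytes.  */
void *
scratch_alloc (size_t size)
{
  size_t rounded = (size + 15) & ~(size_t) 15;
  void *p;
  if (rounded < size || rounded > SCRATCH_SIZE - scratch_top)
    return NULL;
  p = scratch.bytes + scratch_top;
  scratch_top += rounded;
  return p;
}

/* What the preprocessor might emit for a function that used alloca():
   a mark on entry and a release before every return statement.  */
int
count_upper (const char *s)
{
  size_t mark = scratch_mark ();
  size_t len = strlen (s), i;
  int n = 0;
  char *copy = scratch_alloc (len + 1);
  if (!copy)
    {
      scratch_release (mark);   /* inserted before this return */
      return -1;
    }
  memcpy (copy, s, len + 1);
  for (i = 0; i < len; i++)
    if (copy[i] >= 'A' && copy[i] <= 'Z')
      n++;
  scratch_release (mark);       /* inserted before the final return */
  return n;
}
```

The author of @code{count_upper} would have written only the allocation; the mark and the two releases are what the preprocessing stage would add. After the function returns, the scratch stack is back where it started.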
19435 | |
19436 | |
19437 @subsubheading Make the Existing Lisp Engine be Self-contained. | |
19438 | |
19439 The goal of this stage is to gradually build up a self-contained Lisp | |
19440 engine out of the existing XEmacs core, which has no dependencies on any | |
19441 of the code elsewhere in the XEmacs core, and has a well-defined and | |
19442 black box-style interface. (This is to say that the rest of the C code | |
19443 should not be able to access the implementation of the Lisp engine, and | |
19444 should make as few assumptions as possible about how this implementation | |
19445 works). The Lisp engine could, and probably should, be built up as a | |
19446 separate library which can be compiled on its own without any of the | |
19447 rest of the XEmacs C code, and can be tested in this configuration as | |
19448 well. | |
19449 | |
19450 The creation of this engine library should be done as a series of | |
19451 sub-steps, each of which moves more code out of the XEmacs core and into | |
19452 the engine library, and XEmacs should be compilable and runnable between | |
19453 each sub-step. One possible series of sub-steps would be to first | |
19454 create an engine that does only object allocation and garbage | |
19455 collection, then as a second sub-step, move in the code that handles | |
19456 symbols, symbol values, and simple binding, and then finally move in the | |
19457 code that handles control structures, function calling, @code{byte-code} | |
19458 execution, exception handling, etc. (It might well be possible to | |
19459 further separate this last sub-step). | |
19460 | |
19461 @subsubheading Removal of Assumptions About the Lisp Engine Implementation | |
19462 | |
19463 Currently, the XEmacs C code makes all sorts of assumptions about the | |
19464 implementation of the Lisp engine, particularly in the areas of object | |
19465 allocation, object representation, and garbage collection. A different | |
19466 Lisp engine may well have different ways of doing these implementations, | |
19467 and thus the XEmacs C code must be rid of any assumptions about these | |
19468 implementations. This is a tough and tedious job, but it needs to be | |
19469 done. Here are some examples: | |
19470 | |
19471 @enumerate | |
19472 @item | |
19473 | |
19474 @code{GCPRO} must go. The @code{GCPRO} mechanism is tedious, | |
19475 error-prone, unmaintainable, and fundamentally unsafe. As anyone who | |
19476 has worked on the C core of XEmacs knows, figuring out where to insert | |
19477 the @code{GCPRO} calls is an exercise in black magic, and debugging | |
19478 crashes as a result of incorrect @code{GCPROing} is an absolute | |
19479 nightmare. Furthermore, the entire mechanism is fundamentally unsafe. | |
19480 Even if we were to use the extra preprocessing stage detailed above to | |
19481 automatically generate @code{GCPRO} and @code{UNGCPRO} calls for all | |
19482 Lisp object variables occurring anywhere in the C code, there are still | |
19483 places where we could be bitten. Consider, for example, code which | |
19484 calls @code{cons} and where the two arguments to this function are both | |
19485 calls to the @code{append} function. Now the @code{append} function | |
19486 generates new Lisp objects, and it also calls @code{QUIT}, which could | |
19487 potentially execute arbitrary Lisp code and cause a garbage collection | |
19488 before returning control to the @code{append} function. Now in order to | |
19489 generate the arguments to the @code{cons} function, the @code{append} | |
19490 function is called twice in a row. When the first @code{append} call | |
19491 returns, new Lisp data has been created, but has no @code{GCPRO} | |
19492 pointers to it. If the second @code{append} call causes a garbage | |
19493 collection, the Lisp data from the first @code{append} call will be | |
19494 collected and recycled, which is likely to lead to obscure and | |
19495 impossible-to-debug crashes. The only way around this would be to | |
19496 rewrite all function calls whose parameters are Lisp objects in terms of | |
19497 temporary variables, so that no such function calls ever contain other | |
19498 function calls as arguments. This would not only be annoying to | |
19499 implement, even in a smart preprocessor, but would make the C code | |
19500 become incredibly slow because of all the constant updating of the | |
19501 @code{GCPRO} lists. | |
19502 @item | |
19503 | |
19504 The only proper solution here is to completely do away with the | |
19505 @code{GCPRO} mechanism and simply do conservative garbage collection | |
19506 over the C stack. There are already portable implementations of | |
19507 conservative pointer marking over the C stack, and these could easily be | |
19508 adapted for use in the Elisp garbage collector. If, as outlined above, | |
19509 we use an extra preprocessing stage to create a new version of | |
19510 @code{alloca} that allocates its memory elsewhere than actually on the C | |
19511 stack, and we ensure that we don't declare any large arrays as local | |
19512 variables, but instead use @code{alloca}, then we can be guaranteed that | |
19513 the C stack is small and thus that the conservative pointer marking | |
19514 stage will be fast and not very likely to find false matches. | |
19515 @item | |
19516 | |
19517 Removing the @code{GCPRO} declarations as just outlined would also | |
19518 remove the assumption currently made that garbage collection can occur | |
19519 only in certain places in the C code, rather than in any arbitrary spot. | |
19520 (For example, any time an allocation of Lisp data happens). In order to | |
19521 make things really safe, however, we also have to remove another | |
19522 assumption as detailed in the following item. | |
19523 @item | |
19524 | |
19525 Lisp objects might be relocatable. Currently, the C code assumes that | |
19526 Lisp objects other than string data are not relocatable and therefore | |
19527 it's safe to pass around and hold onto the actual pointers for the C | |
19528 structures that implement the Lisp objects. Current code, for example, | |
19529 assumes that a @code{Lisp_Object} of type buffer and a C pointer to a | |
19530 @code{struct buffer} mean basically the same thing, and indiscriminately | |
19531 passes the two kinds of buffer pointers around. With relocatable Lisp | |
19532 objects, the pointers to the C structures might change at any time. | |
19533 (Remember, we are now assuming that a garbage collection can happen at | |
19534 basically any point). All of the C code needs to be changed so that | |
19535 Lisp objects are always passed around using a Lisp object type, and the | |
19536 underlying pointers are only retrieved at the time when a particular | |
19537 data element out of the structure is needed. (As an aside, here's | |
19538 another reason why Lisp objects, instead of pointers, should always be | |
19539 passed around. If pointers are passed around, it's conceivable that at | |
19540 the time a garbage collection occurs, the only reference to a Lisp | |
19541 object (for example, a deleted buffer) would be in the form of a C | |
19542 pointer rather than a Lisp object. In such a case, the conservative | |
19543 pointer marking mechanism might not notice the reference, especially if, | |
19544 in an attempt to eliminate false matches and make the code generally | |
19545 more efficient, it will be written so that it will look for actual Lisp | |
19546 object references.) | |
19547 @item | |
19548 | |
19549 I would go a step further and completely eliminate the macros that | |
19550 convert a Lisp object reference into a C pointer. This way the only way | |
19551 to access an element out of a Lisp object would be to use the macro for | |
19552 that element, which in one atomic operation de-references the Lisp | |
19553 object reference and retrieves the value contained in the element. We | |
19554 probably do need the ability to retrieve actual C pointers, though. For | |
19555 example, in the case where an array is stored in a Lisp object, or | |
19556 simply for efficiency purposes where we might want some code to retrieve | |
19557 the C pointer for a Lisp object, and work on that directly to avoid a | |
19558 whole bunch of extra indirections. I think the way to do this would be | |
19559 through the use of a special locking construct implemented as part of | |
19560 the extra preprocessor stage mentioned above. This would essentially be | |
19561 what you might call a @dfn{lock block}, just like a @code{while} block. | |
19562 You'd write the word @code{lock} followed by a parenthesized expression | |
19563 that retrieves the C pointer and stores it into a variable that is | |
19564 scoped only within the lock block and followed in turn by some code in | |
19565 braces, which is the actual code associated with the lock block, and | |
19566 which can make use of this pointer. While the code inside the lock | |
19567 block is executing, that particular pointer and the object pointed to by | |
19568 it are guaranteed not to be relocated. | |
19569 @item | |
19570 | |
19571 If all the XEmacs C code were converted according to these rules, there | |
19572 would be no restrictions on the sorts of implementations that can be | |
19573 used for the garbage collector. It would be possible, for example, to | |
19574 have an incremental asynchronous relocating garbage collector that | |
19575 operated continuously in another thread while XEmacs was running. | |
19576 @item | |
19577 | |
19578 The C implementation of Lisp objects might not, and probably should not, | |
19579 be visible to the rest of the XEmacs C code. It should theoretically be | |
19580 possible, for example, to implement Lisp objects entirely in terms of | |
19581 association lists, rather than using C structures in the standard way. | |
19582 (This may be an extreme example, but it's good to keep in mind an | |
19583 example such as this when cleaning up the XEmacs C code). The changes | |
19584 mentioned in the previous item would go a long way towards removing this | |
19585 assumption. The only places where this assumption might still be made | |
19586 would be inside of the lock blocks where an actual pointer is retrieved. | |
19587 (Also, of course, we'd have to change the way that Lisp objects are | |
19588 defined in C so that this is done with some function calls and new and | |
19589 improved macros rather than by having the XEmacs C code actually define | |
19590 the structures. This sort of thing would probably have to be done in | |
19591 any case once the allocation mechanism is moved into a separate | |
19592 library.) With some thought it should be possible to define the lock | |
19593 block interface in such a way as to remove any assumptions about the | |
19594 implementation of Lisp objects. | |
19595 @item | |
19596 | |
19597 C code may not be able to call Lisp primitives that are defined in C | |
19598 simply by making standard C function calls. There might need to be some | |
19599 wrapper around all such calls. This could be achieved cleanly through | |
19600 the extra preprocessing step mentioned above, in line with the example | |
19601 described there. | |
19602 | |
19603 @end enumerate | |
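
The accessor-only interface and the lock block described in the list above can be sketched in plain C. This is a toy model under stated assumptions, not XEmacs code: objects are fixed-size integer vectors, a handle is an index into an engine-private table, the "collector" simply moves every unpinned object to fresh storage, and the lock block is emulated with a C99 @code{for} macro instead of a real preprocessor construct. All names (@code{lisp_handle}, @code{engine_pin}, @code{LOCK_BLOCK}, and so on) are hypothetical. Unlike a real lock block, the macro does not unpin the object if the body leaves through @code{break} or @code{return}.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int lisp_handle;        /* hypothetical opaque object reference */

#define MAX_OBJECTS 16
#define VECTOR_LEN 4

/* Engine-private table; callers never see these pointers directly.  */
static struct slot { int *data; int pins; } slots[MAX_OBJECTS];
static int n_slots;

/* Create a zero-filled vector; returns -1 on failure.  */
lisp_handle
engine_make_vector (void)
{
  if (n_slots == MAX_OBJECTS)
    return -1;
  slots[n_slots].data = calloc (VECTOR_LEN, sizeof (int));
  if (!slots[n_slots].data)
    return -1;
  slots[n_slots].pins = 0;
  return n_slots++;
}

/* Accessors dereference the handle and touch the element in one step,
   so no raw pointer escapes to the caller.  */
int
engine_vector_ref (lisp_handle h, int i)
{
  return slots[h].data[i];
}

void
engine_vector_set (lisp_handle h, int i, int v)
{
  slots[h].data[i] = v;
}

/* Pinning hands out the raw pointer and forbids relocation.  */
int *
engine_pin (lisp_handle h)
{
  slots[h].pins++;
  return slots[h].data;
}

void
engine_unpin (lisp_handle h)
{
  slots[h].pins--;
}

/* Stand-in for a relocating collector: move every unpinned object to
   new storage.  Returns the number of objects moved.  */
int
engine_relocate (void)
{
  int i, moved = 0;
  for (i = 0; i < n_slots; i++)
    if (slots[i].pins == 0)
      {
        int *fresh = malloc (VECTOR_LEN * sizeof (int));
        if (!fresh)
          continue;
        memcpy (fresh, slots[i].data, VECTOR_LEN * sizeof (int));
        free (slots[i].data);
        slots[i].data = fresh;
        moved++;
      }
  return moved;
}

/* Emulated lock block: VAR is valid, and the object pinned, only for
   the duration of the body, which runs exactly once.  */
#define LOCK_BLOCK(var, handle)                                   \
  for (int *var = engine_pin (handle); var != NULL;               \
       engine_unpin (handle), var = NULL)

/* Returns 0 if the object stayed put while locked and moved afterwards.  */
int
demo_lock_block (void)
{
  lisp_handle v = engine_make_vector ();
  int total, moved_while_locked = -1, moved_after;
  if (v < 0)
    return -1;
  total = n_slots;
  engine_vector_set (v, 0, 42);
  LOCK_BLOCK (p, v)
    {
      p[1] = p[0] + 1;                         /* direct pointer access */
      moved_while_locked = engine_relocate (); /* V is pinned here */
    }
  moved_after = engine_relocate ();            /* V is unpinned again */
  return (moved_while_locked == total - 1 && moved_after == total
          && engine_vector_ref (v, 1) == 43) ? 0 : 1;
}
```

In @code{demo_lock_block}, the relocation attempted inside the lock block skips the pinned vector, while the one attempted afterwards moves it, and the value written through the raw pointer is still visible through the accessor. Code outside a lock block never holds a pointer that a relocation could invalidate, which is the property the items above are after.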
19604 | |
19605 @subsubheading Actually Replacing the Engine. | |
19606 | |
19607 Once we've done all of the work mentioned in the previous steps (and | |
19608 admittedly, this is quite a lot of work), we should have an XEmacs that | |
19609 still uses what is essentially the old and previously existing Lisp | |
19610 engine, but which is ready to have its Lisp engine replaced. The | |
19611 replacement might proceed as follows: | |
19612 | |
19613 @enumerate | |
19614 @item | |
19615 | |
19616 Identify any further changes that need to be made to the engine | |
19617 interface that we have defined as a result of the previous steps so that | |
19618 features and idiosyncrasies of various Lisp engines that we examine | |
19619 could be properly supported. | |
19620 @item | |
19621 | |
19622 Pick a Lisp engine and write an interface layer that sits on top of this | |
19623 Lisp engine and makes it adhere to what I'll now call the XEmacs Lisp | |
19624 engine interface. | |
19625 @item | |
19626 | |
19627 Strongly consider creating, if we haven't already done so, a test suite | |
19628 that can test the XEmacs Lisp engine interface when used with a | |
19629 stand-alone Lisp engine. | |
19630 @item | |
19631 | |
19632 Test the hell out of the Lisp engine that we've chosen when combined | |
19633 with its XEmacs Lisp engine interface layer as a stand-alone program. | |
19634 @item | |
19635 | |
19636 Now finally attach this stand-alone program to XEmacs itself. Debug and | |
19637 fix any further problems that ensue (and there inevitably will be such | |
19638 problems), updating the test suite as we go along so that if it were run | |
19639 again on the old and buggy interfaced Lisp engine, it would note the | |
19640 bug. | |
19641 | |
19642 @end enumerate | |
19643 | |
19644 | |
19645 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
19646 | |
19647 @node Future Work Discussion, Old Future Work, Future Work, Top | |
19648 @chapter Future Work Discussion | |
19649 @cindex future work, discussion | |
19650 @cindex discussion, future work | |
19651 | |
19652 This chapter includes (mostly) email discussions about particular design | |
19653 issues, edited to include only relevant and useful stuff. Ideally over | |
19654 time these could be condensed down to a single design document to go | |
19655 into the normal Future Work section. | |
19656 | |
19657 @menu | |
19658 * Discussion -- garbage collection:: | |
19659 * Discussion -- glyphs:: | |
19660 @end menu | |
19661 | |
19662 @node Discussion -- garbage collection, Discussion -- glyphs, Future Work Discussion, Future Work Discussion | |
19663 @section Discussion -- garbage collection | |
19664 @cindex discussion, garbage collection | |
19665 @cindex garbage collection, discussion | |
19666 | |
19667 | |
19668 @example | |
19669 On Tue, Oct 12, 1999 at 03:36:59AM -0700, Ben Wing wrote: | |
19670 @end example | |
19671 | |
19672 So what am I missing here? | |
19673 | |
19674 @example | |
19675 In response, Olivier Galibert wrote: | |
19676 @end example | |
19677 | |
19678 Two things: | |
19679 @enumerate | |
19680 @item | |
19681 The purespace is gone | |
19682 | |
19683 I mean absolutely, completely and utterly removed. Fpurecopy is a | |
19684 no-op now (and has been for some time). Readonly objects are gone | |
19685 too. Having fewer checks to do in Fsetcar, Fsetcdr, Faset and some | |
19686 others is probably a good thing, speedwise. I removed it some | |
19687 time ago because it does not make sense when using a portable dumper | |
19688 to copy data into a special area of memory at dump time and I wanted | |
19689 to be sure that suppressing the copying from Fpurecopy wouldn't break | |
19690 things. | |
19691 | |
19692 Now, we want to get the post-dumping data sharing back, of course. In | |
19693 today's systems, it is quite easy: you just have to map the file | |
19694 MAP_PRIVATE and avoid writing to the subset of pages you want to keep | |
19695 shared. Copy-on-write does the job for you. It has the nice side | |
19696 effect of completely avoiding bus errors due to trying to write to | |
19697 readonly memory zones. | |
19698 | |
19699 Avoiding writing to the "pure" objects themselves is already done, of | |
19700 course. Had Lisp code written to the purecopied parts of the | |
19701 dumped data, it would have exploded long ago. So there is nothing | |
19702 to do in this area. So the only remaining thing is the markbit. Two | |
19703 possible strategies: | |
19704 | |
19705 @itemize @bullet | |
19706 @item | |
19707 have Fpurecopy somehow mark the lrecords it would have copied in the | |
19708 good old times. Post-dump, use this mark as an "always marked, don't | |
19709 touch, don't look into, don't free" flag, the same way CHECK_PURE | |
19710 was used. | |
19711 @item | |
19712 move the markbit outside of the lrecord. | |
19713 @end itemize | |
19714 | |
19715 | |
19716 The second solution is more appealing to me for a bunch of reasons: | |
19717 @itemize @bullet | |
19718 @item | |
19719 more things are shared than only what is purecopied (not yet used | |
19720 functions come to mind) | |
19721 @item | |
19722 no more "the only references to this non-purecopied object are from | |
19723 purecopied objects, XEmacs will self-destruct in ten seconds" kind | |
19724 of bugs. | |
19725 @item | |
19726 removing flags goes the right way towards implementing Jan's | |
19727 allocator ideas. | |
19728 @item | |
19729 it probably becomes easier to experiment with the GC code | |
19730 @end itemize | |
19731 | |
19732 @item | |
19733 Finding all the dumped objects in order to unmark them sucks | |
19734 | |
19735 Not having to rebuild a list of all the dumped objects in order to | |
19736 find them all and ensure that all are unmarked simplifies things for | |
19737 me. Errr, ok, now that I really think of it, I can rebuild this list | |
19738 easily, in fact. And I'm probably going to have to manage it, since I | |
19739 feel like the lack of calls to the finalizers for the dumped objects | |
19740 is going to someday turn over and bite me in the face. But anyways, | |
19741 it makes my life easier for now. | |
19742 | |
19743 So no, it's not a _necessity_. But it helps. And the automatic | |
19744 sharing of all objects until you write to them explicitly is, I | |
19745 think, really cool. | |
19746 @end enumerate | |
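
The copy-on-write behaviour of MAP_PRIVATE that this message relies on can be seen in a small, self-contained C sketch. It is only an illustration: the function name, the temporary file and its contents are hypothetical stand-ins, and none of it is taken from the actual portable dumper code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo, not XEmacs code.  Writes a small "dump file",
   maps it MAP_PRIVATE, modifies the mapping, and checks that the file
   itself is unchanged.  Returns 1 if the file was left untouched,
   0 if it was modified, and -1 on any system call failure. */
int private_write_leaves_file_untouched(void)
{
    char path[] = "/tmp/cow-demo-XXXXXX";
    static const char original[] = "dumped-data";
    char readback[sizeof original];
    char *map;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away once fd is closed */

    if (write(fd, original, sizeof original) != (ssize_t) sizeof original)
        goto done;

    map = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto done;

    map[0] = 'D';               /* the kernel copies this page for us */
    assert(map[0] == 'D');      /* our private copy sees the write */

    /* Read the file through the descriptor: it must still hold the
       original bytes, because private writes are never written back. */
    if (pread(fd, readback, sizeof readback, 0) == (ssize_t) sizeof readback)
        result = memcmp(readback, original, sizeof original) == 0;

    munmap(map, sizeof original);
done:
    close(fd);
    return result;
}
```

Pages that are never written stay backed by the file and can be shared between processes; only the pages that are written get private copies.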
19747 | |
19748 | |
19749 @example | |
19750 On 10/12/1999 5:49 PM Ben Wing wrote: | |
19751 | |
19752 Subject: Re: hashtable-based marking and cleanups | |
19753 @end example | |
19754 | |
19755 OK, I can see the advantages. But: | |
19756 | |
19757 @enumerate | |
19758 @item | |
19759 There will be an inevitable loss of speed using a large hashtable. If | |
19760 it's large, I say that it's just not worth it. There are things that are | |
19761 so much more important than futzing around with the garbage collector | |
19762 (e.g. fixing the god damn user interface), things which if not fixed will | |
19763 sooner or later cause XEmacs to die entirely. If we are causing a major | |
19764 slowdown in the name of some not-so-important work that may or may not get | |
19765 done, we shouldn't do it. (On the other hand, if the slowdown is | |
19766 negligible, I have no problems with this.) | |
19767 | |
19768 @item | |
19769 I think you should @strong{expand} the concept of read-only objects so | |
19770 that @strong{any} object (especially strings and cons cells) can get | |
19771 marked read-only by the C code if it wants. (Perhaps you could use the | |
19772 now-unused mark bit to hold a read-only flag.) This is important because | |
19773 it allows C code to directly return internal lists (e.g. from the | |
19774 specifiers and various object property lists) without having to do a | |
19775 copy, as is now done (and similarly, potentially to directly accept | |
19776 lists from a Lisp call without copying them for internal use, if the | |
19777 Lisp caller is made aware that the list might become read-only) -- if | |
19778 the copy weren't done and some piece of Lisp code went and modified the | |
19779 list, XEmacs might very well crash. Thus, this read-only flag would be | |
19780 a huge efficiency gain in terms of the garbage collection overhead saved | |
19781 as well as the speed of copying a large list. The extra checks in | |
19782 @code{Fsetcar()}, etc. for this that you mention are in fact negligible | |
19783 in their speed overhead -- one or two instructions -- and these | |
19784 functions are not used all that commonly, either. With the changes I | |
19785 have proposed in Architecting XEmacs, the case of returning an internal | |
19786 list will become more and more common as the power of the user interface | |
19787 would be greatly increased and along with it are lots and lots of lists | |
19788 of info that need to be retrievable from Lisp. | |
19789 @end enumerate | |
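
A rough C sketch of the read-only flag proposed in the second point follows. The structure layout and the sketch_ names are hypothetical, chosen only to show how small the extra check is; the real Fsetcar() operates on Lisp_Object values and would signal a Lisp error rather than return a status code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cons cell with a one-bit read-only flag (for instance
   the bit freed up by moving the mark bit elsewhere). */
typedef struct cons_cell {
    void *car;
    struct cons_cell *cdr;
    unsigned read_only : 1;
} cons_cell;

/* Stand-in for Fsetcar(): the extra cost is a single test of the flag.
   Returns 0 on success and -1 if the cell is read-only. */
int sketch_setcar(cons_cell *cell, void *value)
{
    assert(cell != NULL);
    if (cell->read_only)
        return -1;
    cell->car = value;
    return 0;
}

/* C code that wants to hand an internal list straight to Lisp would
   mark every cell of the list read-only first, instead of copying it. */
void sketch_make_list_read_only(cons_cell *list)
{
    for (; list != NULL; list = list->cdr)
        list->read_only = 1;
}
```

With this in place, an attempt by Lisp code to modify a returned internal list fails cleanly instead of corrupting the internal state.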
19790 | |
19791 BTW there is a wonderful book all about garbage collection by Jones and | |
19792 Lins. Ever seen it? | |
19793 | |
19794 @example | |
19795 http://www.amazon.com/exec/obidos/ASIN/0471941484/qid=939775572/sr=1-1/002-3092633-2509405 | |
19796 @end example | |
19797 | |
19798 @node Discussion -- glyphs, , Discussion -- garbage collection, Future Work Discussion | |
19799 @section Discussion -- glyphs | |
19800 @cindex discussion, glyphs | |
19801 @cindex glyphs, discussion | |
19802 | |
19803 Some comments (not always pretty!) by Ben: | |
19804 | |
19805 @example | |
19806 March 20, 2000 | |
19807 | |
19808 Andy, I use the tab widgets but I've been having lots of problems. | |
19809 | |
19810 1] Sometimes clicking on them does nothing. | |
19811 | |
19812 2] There's a design flaw: I frequently use M-C-l to switch to the | |
19813 previous buffer. If I use this in conjunction with the tabs, things get | |
19814 all screwed up because selecting a buffer with the tab does not bring it | |
19815 to the front of the buffer list, like it should. It looks like you're | |
19816 doing this to avoid having the order of the tabs change, but this is | |
19817 wrong: If you don't reorder the buffer list, everything else gets | |
19818 screwed up. If you want the order of the tabs not to change, you need | |
19819 to decouple this order from the buffer list order. | |
19820 @end example | |
19821 | |
19822 @example | |
19823 March 23, 2000 | |
19824 | |
19825 I'm very confused. The SIGIO timer is used @strong{only} for C-g. It has | |
19826 nothing to do with any other events. (sit-for 0) ought to | |
19827 | |
19828 (1) cause all pending non-command events to get executed, and | |
19829 (2) do redisplay | |
19830 | |
19831 However, sit-for gets preempted by input coming in. | |
19832 | |
19833 What about (sit-for 0.1)? | |
19834 | |
19835 I suppose a solution along the lines of dispatch-non-command-events | |
19836 might be OK if you've tried everything else and it doesn't work, but i'm | |
19837 leery of introducing new Lisp functions to deal with specific problems. | |
19838 Pretty soon we end up with a whole bevy of such ill-defined functions, | |
19839 like we already have. I think instead, you should introduce the | |
19840 following primitive: | |
19841 | |
19842 (wait-for-event redisplay &rest event-specs) | |
19843 | |
19844 Waits for one of the event specifications specified to happen. Returns | |
19845 something about what happened. | |
19846 | |
19847 REDISPLAY controls the behavior of redisplay during waiting. Something | |
19848 like | |
19849 | |
19850 - nil (never redisplay), | |
19851 - t (redisplay when it seems appropriate), etc. | |
19852 | |
19853 EVENT-SPECS could be | |
19854 | |
19855 t -- drain all non-user events, and then return | |
19856 any-process -- wait till input or state change on any process | |
19857 process -- wait till input or state change on process | |
19858 time -- wait till such-and-such time has elapsed | |
19859 'user -- wait till user event has happened | |
19860 '(user predicate) -- wait till user event matching the predicate has | |
19861 happened | |
19862 'event -- wait till any event has happened | |
19863 '(event predicate) -- wait till event matching the predicate has happened | |
19864 | |
19865 The existing functions @code{next-event}, @code{next-command-event}, | |
19866 @code{accept-process-output}, @code{sit-for}, @code{sleep-for}, etc. could all be | |
19867 written in terms of this new command. You could use this command inside | |
19868 of your glyph code to ensure that the events get processed that need to be | |
19869 in order for widget updates to happen. | |
19870 | |
19871 But you said something about needing a magic event to invoke redisplay? | |
19872 Why is that? | |
19873 @end example | |
19874 | |
19875 @example | |
19876 April 2, 2000 | |
19877 | |
19878 the internal distinction between "widget" and "layout" is bogus. there | |
19879 exist widgets that do drawing and do layout of their children, | |
19880 e.g. group-box widgets and proper tab widgets. the only sensible | |
19881 distinction is between widgets with children and those without children. | |
19882 @end example | |
19883 | |
19884 @example | |
19885 April 5, 2000 | |
19886 | |
19887 andy, i'm not sure i really believe that you need to cycle the event | |
19888 code to get widgets to redisplay, but in any case you should | |
19889 | |
19890 @enumerate | |
19891 @item | |
19892 hide the logic to do this in the c code; the lisp code should do | |
19893 nothing other than call (redisplay widget) | |
19894 | |
19895 @item | |
19896 make sure your event-cycling code processes @strong{NO} events at all. this | |
19897 includes non-user events. queue the events instead. | |
19898 @end enumerate | |
19899 | |
19900 in other words, dispatch-non-command-events must go, and i am proposing | |
19901 a general function (redisplay OBJECT) to replace the existing ad-hoc | |
19902 functions. | |
19903 @end example | |
19904 | |
19905 @example | |
19906 April 6, 2000 | |
19907 | |
19908 the tab widget code should simply be able to create a whole lot of tabs | |
19909 without regard to the size of the gutter, and the surrounding layout | |
19910 widget (please please make layouts be proper widgets!) should | |
19911 automatically map and unmap them as necessary, to fill up the available | |
19912 space. perhaps this already works and what you're doing is just for | |
19913 optimization? but i get the feeling this is not the case. | |
19914 @end example | |
19915 | |
19916 @example | |
19917 April 6, 2000 | |
19918 | |
19919 the function make-gutter-only-dialog-frame is bogus. the use of the | |
19920 gutter here to hold widgets is an implementation detail and should not | |
19921 be exposed in the interface. similarly, make-search-dialog should not | |
19922 have to do all the futzing that it does. creating the frame unmapped, | |
19923 creating an extent and messing with the gutter: all this stuff should be | |
19924 hidden. you should have a simple function make-dialog-frame that takes | |
19925 a dialog specification, and that's all you need to do. | |
19926 | |
19927 also, these dialog boxes, and this function make-dialog-frame, should | |
19928 | |
19929 a] be in dialog.el, not gutter-items.el. | |
19930 b] when possible, be placed in the interactive spec of standard lisp | |
19931 functions rather than accessed directly from menubar-items.el | |
19932 c] wrapped in calls to should-use-dialog-box-p, so the user has control | |
19933 over when dialog boxes appear. | |
19934 @end example | |
19935 | |
19936 @example | |
19937 April 7, 2000 | |
19938 | |
19939 hmmm ... in that case, the whitespace absolutely needs to be specified | |
19940 as properties of the layout widget (e.g. :border-width and | |
19941 :border-height), rather than setting an overall size. you have no idea | |
19942 what the correct size should be if the user changes font size or uses | |
19943 translations in a different language. | |
19944 | |
19945 Your modus operandi should be "hardcoded pixel sizes are @strong{always} bad." | |
19946 @end example | |
19947 | |
19948 @example | |
19949 April 7, 2000 | |
19950 | |
19951 you mean the number of tabs adjusts, or the size of each tab adjusts (by | |
19952 making the font smaller or something)? if the size of a single tab is | |
19953 not related to the total space the tabs can fit into, then it should be | |
19954 possible to simply specify as many tabs as exist for buffers, and have | |
19955 the layout manager decide how many can fit into the available space. | |
19956 this does @strong{not} mean the layout manager will resize the tabs, because | |
19957 query-geometry on the tabs should find out that the tabs don't want to | |
19958 be any size other than they are. | |
19959 | |
19960 the point here is that you should not @strong{have} to worry about pixel | |
19961 heights and widths @strong{anywhere} in Lisp-level code. The layout managers | |
19962 should take care of everything for you. The only exceptions may be in | |
19963 some text fields, which will be blank by default and you want to specify | |
19964 a maximum width (which should be done in 'n' sizes, not in pixels!). | |
19965 | |
19966 i won't stop complaining until i see nearly every one of those | |
19967 pixel-width and pixel-height parameters gone, and the remaining ones | |
19968 there for a very, very good reason. | |
19969 @end example | |
19970 | |
19971 @example | |
19972 April 7, 2000 | |
19973 | |
19974 Andy Piper wrote: | |
19975 | |
19976 > At 03:51 PM 4/6/00 -0700, Ben Wing wrote: | |
19977 > >[the function make-gutter-only-dialog-frame is bogus] | |
19978 > | |
19979 > The problem is that some of the callbacks and such need access to the | |
19980 > @strong{created} frame, so you end up in a catch 22 unless you do what I've done. | |
19981 | |
19982 [Ben proposes other ways to avoid exposing all the guts, as in | |
19983 @code{make-gutter-only-dialog-frame}:] | |
19984 | |
19985 @enumerate | |
19986 @item | |
19987 Instead of passing in the actual glyph spec or glyph, pass in a | |
19988 function of two args (the dialog frame and its parents), which when | |
19989 called, creates and returns the appropriate glyph. | |
19990 | |
19991 @item | |
19992 [Better] Provide a way for callbacks to determine where they were | |
19993 invoked at. This is much more general and is what you should really | |
19994 do. For example, have the code that calls the callbacks bind some | |
19995 global variables such as widget-callback-current-glyph and | |
19996 widget-callback-current-channel, which contain the glyph whose | |
19997 callback is being invoked, and the window or frame of the glyph | |
19998 (depending on where the glyph is) where the invocation actually | |
19999 happened. That way, the callbacks can easily figure out the dialog | |
20000 box and its parent, and not have to worry about embedding it in at | |
20001 creation time. | |
20002 @end enumerate | |
20003 @end example | |
20004 | |
20005 @example | |
20006 April 15, 2000 | |
20007 I don't understand when you say "the various types of callback". Are | |
20008 you using the callback for various different purposes? | |
20009 | |
20010 Your widget callbacks should work just like any other callback: they | |
20011 take two arguments, one indicating the object to which the callback was | |
20012 attached (an image instance, i think), and the event that caused the | |
20013 callback to be invoked. | |
20014 @end example | |
20015 | |
20016 @example | |
20017 April 17, 2000 | |
20018 | |
20019 I am completely vetoing widget-callback-current-channel. How about you | |
20020 create a new keyword, :new-callback, that is a function of two args, | |
20021 like i specified before. | |
20022 | |
20023 btw if you really are calling your callback using call-interactively, | |
20024 why don't you declare a function (interactive "e") and then call | |
20025 event-channel on the resulting event? that should get you the same | |
20026 result as widget-callback-current-channel. | |
20027 | |
20028 the problem with this and everything you've proposed is that there's no | |
20029 way, of course, to get at the actual widget that you were invoked from. | |
20030 would you propose adding widget-callback-current-widget? | |
20031 @end example | |
20032 | |
20033 @node Old Future Work, Index, Future Work Discussion, Top | |
20034 @chapter Old Future Work | |
20035 @cindex old future work | |
20036 @cindex future work, old | |
20037 | |
20038 This chapter includes proposals for future work that were later | |
20039 implemented. These proposals are included because they may describe to | |
20040 some extent the actual workings of the implemented code, and because | |
20041 they may discuss relevant design issues, alternative implementations, or | |
20042 work still to be done. | |
20043 | |
20044 | |
20045 @menu | |
20046 * Future Work -- A Portable Unexec Replacement:: | |
20047 * Future Work -- Indirect Buffers:: | |
20048 * Future Work -- Improvements in support for non-ASCII (European) keysyms under X:: | |
20049 * Future Work -- xemacs.org Mailing Address Changes:: | |
20050 * Future Work -- Lisp callbacks from critical areas of the C code:: | |
20051 @end menu | |
20052 | |
20053 @node Future Work -- A Portable Unexec Replacement, Future Work -- Indirect Buffers, Old Future Work, Old Future Work | |
20054 @section Future Work -- A Portable Unexec Replacement | |
20055 @cindex future work, a portable unexec replacement | |
20056 @cindex a portable unexec replacement, future work | |
20057 | |
20058 @strong{Abstract:} Currently, during the build stage of XEmacs, a bare | |
20059 version of the program (called @dfn{temacs}) is run, which loads up a | |
20060 bunch of Lisp data and then writes out a modified executable file. This | |
20061 process is very tricky to implement and highly system-dependent. It can | |
20062 be replaced by a simple, mostly portable, and easy to implement scheme | |
20063 where the Lisp data is written out to a separate data file. | |
20064 | |
20065 The scheme makes only three assumptions about the memory layout of a | |
20066 running XEmacs process, which, as far as I know, are met by all current | |
20067 implementations of XEmacs (and they're also requirements of the existing | |
20068 unexec scheme): | |
20069 | |
20070 @enumerate | |
20071 @item | |
20072 | |
20073 The initialized data segments of the various XEmacs modules are all laid | |
20074 out contiguously in memory and are separated from the initialized data | |
20075 segments of libraries that are linked with XEmacs; likewise for | |
20076 uninitialized data segments. | |
20077 @item | |
20078 | |
20079 The beginning and end of the XEmacs portion of the combined initialized | |
20080 data segment can be programmatically determined; likewise for the | |
20081 uninitialized data segment. | |
20082 @item | |
20083 | |
20084 The XEmacs portion of the initialized and uninitialized data segments | |
20085 are always loaded at the same place in memory. | |
20086 | |
20087 @end enumerate | |
20088 | |
20089 Assumption number three means that this scheme is non-relocatable, which | |
20090 is a disadvantage as compared to other, relocatable schemes that have | |
20091 been proposed. However, the advantage of this scheme over them is that | |
20092 it is much easier to implement and requires minimal changes to the | |
20093 XEmacs code base. | |
20094 | |
20095 First, let's go over the theory behind the dumping mechanism. The | |
20096 principles that we would like to follow are: | |
20097 | |
20098 @enumerate | |
20099 @item | |
20100 | |
20101 We write out to disk all of the data structures and all of their | |
20102 sub-structures that we have created ourselves, except for data that is | |
20103 expected to change from invocation to invocation (in particular, data | |
20104 that is extracted from the external environment at run time). | |
20105 @item | |
20106 | |
20107 We don't write out to disk any data structures created or initialized by | |
20108 system libraries, by the kernel or by any other code that we didn't | |
20109 create ourselves, because we can't count on that code working in the way | |
20110 that we want it to. | |
20111 @item | |
20112 | |
20113 At the beginning of the next invocation of our program, we read in all | |
20114 those data structures that we have written out to disk, and then | |
20115 continue as if we had just created and initialized all of that data | |
20116 ourselves. | |
20117 @item | |
20118 | |
20119 We make sure that our own data structures don't have any pointers to | |
20120 system data, or if they do, that we note all of these pointers so that | |
20121 we can re-create the system data and set up pointers to the data again | |
20122 in the next invocation. | |
20123 @item | |
20124 | |
20125 During the next invocation of our program, we re-create all of our own | |
20126 data structures that are derived from the external environment. | |
20127 | |
20128 @end enumerate | |
20129 | |
20130 XEmacs, of course, is already set up to adhere to most of these | |
20131 principles. | |
20132 | |
20133 In fact, the current dumping process that we are replacing handles a few | |
20134 of these principles slightly differently and adds a few extra ones of its own: | |
20135 | |
20136 @enumerate | |
20137 @item | |
20138 | |
20139 All data structures of all sorts, including system data, are written | |
20140 out. This is the cause of no end of problems, and it is avoidable, | |
20141 because we can ensure that our own data and the system data are | |
20142 physically separated in memory. | |
20143 @item | |
20144 | |
20145 Our own data structures that we derive from the external environment are | |
20146 in fact written out and read in, but then are simply overwritten during | |
20147 the next invocation with new data. Before dumping, we make sure to free | |
20148 any such data structure that would cause memory leaks. | |
20149 @item | |
20150 | |
20151 XEmacs carefully arranges things so that all static variables in the | |
20152 initialized data are never written to after the dumping stage has | |
20153 completed. This allows for an additional optimization in which we can | |
20154 make static initialized data segments in pre-dumped invocations of | |
20155 XEmacs be read-only and shared among all XEmacs processes on a single | |
20156 machine. | |
20157 | |
20158 @end enumerate | |
20159 | |
20160 The difficult part in this process is figuring out where our data | |
20161 structures lie in memory so that we can correctly write them out and | |
20162 read them back in. The trick that we use to make this problem solvable | |
20163 is to ensure that the heap that is used for all dynamically allocated | |
20164 data structures that are created during the dumping process is located | |
20165 inside the memory of a large, statically declared array. This ensures | |
20166 that all of our own data structures are contained (at least at the time | |
20167 that we dump out our data) inside the static initialized and | |
20168 uninitialized data segments, which are physically separated in memory | |
20169 from any data created by system libraries and whose starting and ending | |
20170 points are known and unchanging (we know that all of these things are | |
20171 true because we require them to be so, as preconditions of being able to | |
20172 make use of this method of dumping). | |
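
The static-array heap described in the preceding paragraph can be sketched in C as a simple bump allocator. This is a minimal illustration under stated assumptions: the names dump_heap, dump_heap_alloc() and dump_heap_contains() and the array size are hypothetical, and the sketch never frees memory, which a real allocator would have to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_HEAP_SIZE (64 * 1024)   /* hypothetical; a real heap is far larger */

/* The whole heap lives inside one statically declared array, so
   everything allocated from it sits in the program's own static data
   and not in memory owned by system libraries.  The union forces
   suitable alignment for any object type. */
static union {
    max_align_t force_alignment;
    unsigned char bytes[DUMP_HEAP_SIZE];
} dump_heap;

static size_t dump_heap_used;

/* Allocate SIZE bytes from the static array, or return NULL if the
   array is exhausted. */
void *dump_heap_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t rounded;
    void *p;

    if (size > DUMP_HEAP_SIZE)
        return NULL;
    rounded = (size + align - 1) / align * align;
    if (rounded > DUMP_HEAP_SIZE - dump_heap_used)
        return NULL;

    p = dump_heap.bytes + dump_heap_used;
    assert((uintptr_t) p % align == 0);
    dump_heap_used += rounded;
    return p;
}

/* True if P points into the static heap, that is, at data we created
   ourselves and are therefore allowed to dump. */
int dump_heap_contains(const void *p)
{
    uintptr_t addr = (uintptr_t) p;
    uintptr_t start = (uintptr_t) dump_heap.bytes;

    return addr >= start && addr < start + DUMP_HEAP_SIZE;
}
```

Because the array has a known start and a fixed size, the range of memory to write out at dump time is simply the array itself plus the rest of the static data.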
20173 | |
20174 In order to implement this method of heap allocation, we change the | |
20175 memory allocation function that we use for our own data. (It's | |
20176 extremely important that this function not be used to allocate system | |
20177 data. This means that we must not redefine the @code{malloc} function | |
20178 using the linker, but instead we need to achieve this using the C | |
20179 preprocessor, or by simply using a different name, such as | |
20180 @code{xmalloc}. It's also very important that we use the correct | |
20181 @code{free} function when freeing dynamically-allocated data, depending | |
20182 on whether this data was allocated by us or by the | |
20183 | |
20184 @node Future Work -- Indirect Buffers, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- A Portable Unexec Replacement, Old Future Work | |
20185 @section Future Work -- Indirect Buffers | |
20186 @cindex future work, indirect buffers | |
20187 @cindex indirect buffers, future work | |
20188 | |
20189 An indirect buffer is a buffer that shares its text with some other | |
20190 buffer, but has its own version of all of the buffer properties, | |
20191 including markers, extents, buffer local variables, etc. Indirect | |
20192 buffers are not currently implemented in XEmacs, but they are in GNU | |
20193 Emacs, and some people have asked for this feature. I consider this | |
20194 feature somewhat extent-related because much of the work required to | |
20195 implement this feature involves tracking extents properly. | |
20196 | |
20197 In a world with indirect buffers, some buffers are direct, and some | |
20198 buffers are indirect. This only matters when there is more than one | |
20199 buffer sharing the same text. In such a case, one of the buffers can be | |
20200 considered the canonical buffer for the text in question. This buffer | |
20201 is a direct buffer, and all other buffers sharing the text are indirect | |
20202 buffers. These two kinds of buffers are created differently. One of | |
20203 them is created simply using the @code{make_buffer()} function (or | |
20204 perhaps the @code{Fget_buffer_create()} function), and the other kind is | |
20205 created using the @code{make_indirect_buffer()} function, which takes | |
20206 another buffer as an argument which specifies the text of the indirect | |
20207 buffer being created. Every indirect buffer keeps track of the direct | |
20208 buffer that is its parent, and every direct buffer keeps a list of all | |
20209 of its indirect buffer children. This list is modified as buffers are | |
20210 created and deleted. Because buffers are permanent objects, there is no | |
20211 special garbage collection-related trickery involved in these parent and | |
20212 children pointers. There should never be an indirect buffer whose | |
20213 parent is also an indirect buffer. If the user attempts to set up such | |
20214 a situation using @code{make_indirect_buffer()}, either an error should | |
20215 be signaled or the parent of the indirect buffer should automatically | |
20216 become the direct buffer that actually is responsible for the text. | |
20217 Deleting a direct buffer should perhaps cause all of the indirect buffer | |
20218 children to be deleted automatically. There should be Lisp functions | |
20219 for determining whether a buffer is direct or indirect, and other | |
20220 functions for retrieving the parents, or the children of the buffer, | |
20221 depending on which is appropriate. (The scheme being described here is | |
20222 similar to symbolic links. Another possible scheme would be analogous | |
20223 to hard links, and would make no distinction between direct and indirect | |
20224 buffers. In that case, the text of the buffer logically exists as an | |
20225 object separate from the buffer itself and only goes away when the last | |
20226 buffer pointing to this text is deleted.) | |
20227 | |
20228 Other than keeping track of parent and child pointers, the only remaining | |
20229 thing required to implement indirect buffers is to ensure that changes | |
20230 to the text of the buffer trigger the same sorts of effect in all the | |
20231 buffers that share that text. Luckily there are only three functions in | |
20232 XEmacs that actually make changes to the text of the buffer, and they | |
20233 are all located in the file @file{insdel.c}. | |
20234 | |
20235 These three functions are called @code{buffer_insert_string_1()}, | |
20236 @code{buffer_delete_range()}, and @code{buffer_replace_char()}. All of | |
20237 the subfunctions called by these functions are also in @file{insdel.c}. | |
20238 | |
20239 The first thing that each of these three functions needs to do is check | |
20240 to see if its buffer argument is an indirect buffer, and if so, convert | |
20241 it to the indirect buffer's parent. Once that is done, the functions | |
20242 need to be modified so that everything they do other than actually | |
20243 changing the buffer's text, such as calling | |
20244 before-change-functions and after-change-functions and updating extents | |
20245 and markers, is done for the buffer being modified and also for all of | |
20246 the buffers that are its indirect children. | |
20247 Each step in the process needs to be iterated for all of | |
20248 the buffers in question before proceeding to the next step. For | |
20249 example, in @code{buffer_insert_string_1()}, | |
20250 @code{prepare_to_modify_buffer()} needs to be called in turn, for all of | |
20251 the buffers sharing the text being modified. Then the text itself is | |
20252 modified, then @code{insert_invalidate_line_number_cache()} is called | |
20253 for all of the buffers, then @code{record_insert()} is called for all of | |
20254 the buffers, etc. Essentially, the operation is being done on all of | |
20255 the buffers in parallel, rather than each buffer being processed in | |
20256 series. This is necessary because many of the steps can quit or call | |
20257 Lisp code and each step depends on the previous step, and some steps are | |
20258 done only once, rather than on each buffer. I imagine it would be | |
20259 significantly easier to implement this, if a macro were created for | |
20260 iterating over a buffer, and then all of the indirect children of that | |
20261 buffer. | |
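
The iteration macro suggested at the end of the preceding paragraph might look roughly like the C sketch below. Everything in it is hypothetical (the sketch_buffer structure, the macro name and the step log); the log merely stands in for calls such as prepare_to_modify_buffer() and record_insert() so that the step-by-step ordering across all sharing buffers is visible.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STEPS 32

/* Hypothetical, much-reduced buffer structure. */
struct sketch_buffer {
    struct sketch_buffer *base;         /* parent; NULL for a direct buffer */
    struct sketch_buffer *first_child;  /* used by direct buffers only */
    struct sketch_buffer *next_sibling; /* links the children of one parent */
};

/* Visit the direct buffer BASE_BUF first, then each indirect child.
   BASE_BUF is evaluated more than once, so it should be a plain variable. */
#define FOR_EACH_SHARING_BUFFER(var, base_buf)                       \
    for ((var) = (base_buf); (var) != NULL;                          \
         (var) = ((var) == (base_buf)) ? (base_buf)->first_child     \
                                       : (var)->next_sibling)

enum step { STEP_PREPARE, STEP_MODIFY_TEXT, STEP_RECORD_INSERT };

enum step step_log[MAX_STEPS];
size_t step_count;

static void log_step(enum step s)
{
    assert(step_count < MAX_STEPS);
    step_log[step_count++] = s;
}

/* Skeleton of an insertion.  The buffer argument is first converted to
   its parent; then each per-buffer step is run over every sharing
   buffer before the next step starts, and the text is changed once. */
void sketch_insert(struct sketch_buffer *buf)
{
    struct sketch_buffer *base = buf->base ? buf->base : buf;
    struct sketch_buffer *b;

    FOR_EACH_SHARING_BUFFER(b, base)
        log_step(STEP_PREPARE);        /* e.g. prepare_to_modify_buffer() */

    log_step(STEP_MODIFY_TEXT);        /* the shared text, exactly once */

    FOR_EACH_SHARING_BUFFER(b, base)
        log_step(STEP_RECORD_INSERT);  /* e.g. record_insert() */
}
```

For a direct buffer with two indirect children, the log then holds three prepare entries, one text modification, and three record entries, in that order, whichever of the three buffers was passed in.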
20262 | |
20263 @node Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Future Work -- xemacs.org Mailing Address Changes, Future Work -- Indirect Buffers, Old Future Work | |
20264 @section Future Work -- Improvements in support for non-ASCII (European) keysyms under X | |
20265 @cindex future work, improvements in support for non-ascii (european) keysyms under x | |
20266 @cindex improvements in support for non-ascii (european) keysyms under x, future work | |
20267 | |
20268 From Martin Buchholz. | |
20269 | |
20270 If a user has a keyboard with known standard non-ASCII character | |
20271 equivalents, typically for European users, then Emacs' default | |
20272 binding should be self-insert-command, with the obvious character | |
20273 inserted. For example, if a user has a keyboard with | |
20274 | |
20275 xmodmap -e "keycode 54 = scaron" | |
20276 | |
20277 then pressing that key on the keyboard will insert the (Latin-2) | |
20278 character corresponding to "scaron" into the buffer. | |
20279 | |
20280 Note: Emacs 20.6 does NOTHING when pressing such a key (not even an | |
20281 error), i.e. even (read-event) ignores this key, which means it can't | |
20282 even be bound to anything by a user trying to customize it. | |
20283 | |
20284 This is implemented by maintaining a table of translations between all | |
20285 the known X keysym names and the corresponding (charset, octet) pairs. | |
20286 | |
20287 For every key on the keyboard that has a known character correspondence, | |
20288 we define the ascii-character property of the keysym, and make the | |
20289 default binding for the key be self-insert-command. | |
20290 | |
20291 The following magic is basically intimate knowledge of X11/keysymdef.h. | |
20292 The keysym mappings defined by X11 are based on the iso8859 standards, | |
20293 except for Cyrillic and Greek. | |
20294 | |
20295 In a non-Mule world, a user can still have a multi-lingual editor, by doing | |
20296 (set-face-font 'default "...-iso8859-2" (current-buffer)) | |
20297 for all their Latin-2 buffers, etc. | |
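
The translation table mentioned above might look something like the following sketch. The @code{enum charset} tags, the struct layout, and @code{lookup_keysym()} are hypothetical names chosen for illustration. The four rows use the standard keysym names together with their ISO 8859-1 and ISO 8859-2 code points, and a real table would cover every keysym with a known character equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical charset tags; not the real Mule charset objects. */
enum charset { CHARSET_NONE, CHARSET_LATIN_1, CHARSET_LATIN_2 };

struct keysym_translation {
    const char *keysym_name;    /* name as it would be given to xmodmap */
    enum charset charset;
    unsigned char octet;        /* code point within that charset */
};

/* A few illustrative rows only. */
static const struct keysym_translation keysym_table[] = {
    { "eacute",  CHARSET_LATIN_1, 0xE9 },
    { "ssharp",  CHARSET_LATIN_1, 0xDF },
    { "scaron",  CHARSET_LATIN_2, 0xB9 },
    { "lstroke", CHARSET_LATIN_2, 0xB3 },
};

/* Return the table entry for NAME, or NULL if the keysym has no
   known character equivalent. */
static const struct keysym_translation *lookup_keysym(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof keysym_table / sizeof keysym_table[0]; i++)
        if (strcmp(keysym_table[i].keysym_name, name) == 0)
            return &keysym_table[i];
    return NULL;
}
```

A key whose keysym is found in the table would get self-insert-command as its default binding; a key whose keysym is not found would be left alone.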
20298 | |
20299 @node Future Work -- xemacs.org Mailing Address Changes, Future Work -- Lisp callbacks from critical areas of the C code, Future Work -- Improvements in support for non-ASCII (European) keysyms under X, Old Future Work | |
20300 @section Future Work -- xemacs.org Mailing Address Changes | |
20301 @cindex future work, xemacs.org mailing address changes | |
20302 @cindex xemacs.org mailing address changes, future work | |
20303 | |
20304 @subheading Personal addresses | |
20305 | |
20306 @enumerate | |
20307 @item | |
20308 | |
20309 Everyone who is contributing or has ever contributed code to the XEmacs | |
20310 core, or to any of the packages archived at xemacs.org, should have a mailing | |
20311 address there, even if they don't have an account on any machine at xemacs.org. In fact, | |
20312 all of these people should have two mailing addresses at xemacs.org, one | |
20313 of which is their actual login name (or potential login name if they | |
20314 were ever to have an account), and the other one is in the form of first | |
20315 name/last name, similar to the way things are done at Sun. For example, | |
20316 Martin would have two addresses at xemacs.org, @code{martin@@xemacs.org}, | |
20317 and @code{martin.buchholz@@xemacs.org}, with the latter one simply being | |
20318 an alias for the former. The idea is that in all cases, if you simply | |
20319 know the name of any past or present contributor to XEmacs, and you want | |
20320 to mail them, you will know immediately how to do this without having to | |
20321 do any complicated searching on the Web or in XEmacs documentation. | |
20322 @item | |
20323 | |
20324 Furthermore, I think that all of the email addresses mentioned anywhere | |
20325 in the XEmacs source code or documentation should be changed to be the | |
20326 corresponding ones at xemacs.org, instead of any other email addresses | |
20327 that any contributors might have. | |
20328 @item | |
20329 | |
20330 All the places in the source code where a contributor's name is | |
20331 mentioned, but no email address is attached, should be found, and the | |
20332 correct xemacs.org address should be attached. | |
20333 @item | |
20334 | |
20335 The alias file mapping people's addresses at xemacs.org to their actual | |
20336 addresses elsewhere (in the case, as will be true for the majority of | |
20337 addresses, where the contributor does not actually have an account at | |
20338 xemacs.org, but simply a forwarding pointer), should be viewable on the | |
20339 xemacs.org web site through a CGI script that reads the alias file and | |
20340 turns it into an HTML table. | |
20341 | |
20342 @end enumerate | |
20343 | |
20344 @subheading Package addresses | |
20345 | |
20346 I also think that for every package archived at xemacs.org, there should | |
20347 be three corresponding email addresses at xemacs.org. For example, | |
20348 consider a package such as @code{lazy-shot}. The addresses associated | |
20349 with this package would be: | |
20350 | |
20351 @table @code | |
20352 @item lazy-shot@@xemacs.org | |
20353 This is a discussion mailing list about the @code{lazy-shot} package, | |
20354 and it should be controlled by Majordomo in the standard fashion. | |
20355 @item lazy-shot-patches@@xemacs.org | |
20356 This is where patches to the @code{lazy-shot} package are sent. This | |
20357 should go to various people who are interested in such patches: for | |
20358 example, the maintainer of @code{lazy-shot}, perhaps the maintainer of | |
20359 XEmacs itself, and probably other people who have volunteered to do | |
20360 code review for this package, or for a larger group of packages that | |
20361 this package is in. Perhaps this list should also be maintained by | |
20362 Majordomo. | |
20363 @item lazy-shot-maintainer@@xemacs.org | |
20364 This address is for mailing the maintainer directly. It is possible | |
20365 that this will go to more than one person. This would particularly be | |
20366 the case, for example, if the maintainer is dormant or does not appear | |
20367 very responsive to patches. In this case, the address would also point | |
20368 to someone like Steve, who is acting in the maintainer's stead, and who | |
20369 will himself apply patches or make other changes to the package as | |
20370 maintained in the CVS archive on xemacs.org. | |
20371 @end table | |
20372 | |
20373 It may take a bit of work to track down the current addresses for the | |
20374 various package maintainers, and may in general seem like a lot of work | |
20375 to set up all of these mail addresses, but I think it's very important | |
20376 to make it as easy as possible for random XEmacs users to be able to | |
20377 submit patches and report bugs in an orderly fashion. The general idea | |
20378 that I'm striving for is to create as much momentum as possible in the | |
20379 XEmacs development community, and I think having the system of mail | |
20380 addresses set up will make it much easier for this momentum to be built | |
20381 up and to remain. | |
20382 | |
20383 @uref{../../www.666.com/ben/default.htm,Ben Wing} | |
20384 | |
20385 @node Future Work -- Lisp callbacks from critical areas of the C code, , Future Work -- xemacs.org Mailing Address Changes, Old Future Work | |
20386 @section Future Work -- Lisp callbacks from critical areas of the C code | |
20387 @cindex future work, lisp callbacks from critical areas of the c code | |
20388 @cindex lisp callbacks from critical areas of the c code, future work | |
20389 | |
20390 @example | |
20391 There are many places in the XEmacs C code where Lisp functions are | |
20392 called, usually because the Lisp function is acting as a callback, | |
20393 hook, process filter, or the like. The Lisp code is often called in | |
20394 places where some Lisp operations are dangerous. Currently there are | |
20395 a lot of ad-hoc schemes implemented to try to prevent these dangerous | |
20396 operations from causing problems. I've added a lot of them myself, | |
20397 for example, the @code{call*_trapping_errors()} functions. Other places, | |
20398 such as the pre-gc- and post-gc-hooks, do their own ad hoc processing. | |
20399 I'm proposing a scheme that would generalize all of this ad hoc code | |
20400 and allow Lisp code to be called in all sorts of sensitive areas of | |
20401 the C code, including even within redisplay. | |
20402 | |
20403 Basically, we define a set of operations that are disallowable because | |
20404 they are dangerous. We essentially assign a bit flag to all of these | |
20405 operations. Whenever any sensitive C code wants to call Lisp code, | |
20406 instead of using the standard call* functions, it uses a new set of | |
20407 functions, call*_critical, which takes an extra parameter, which is a | |
20408 bit mask specifying the set of operations which are disallowed. The | |
20409 basic operation of these functions is simply to set a global variable | |
20410 corresponding to the bit mask (more specifically, the functions store | |
20411 the previous value of this global variable in an unwind_protect, and | |
20412 use bitwise-or to combine the previous value with the new bit mask | |
20413 that was passed in). (Actually, we should first implement a slightly | |
20414 lower level function which is called @code{enter_sensitive_code_section()}, | |
20415 which simply sets up the global variable and the @code{unwind_protect()}, and | |
20416 returns a @code{specbind()} value, but doesn't actually call any Lisp code. | |
20417 There is a corresponding function @code{exit_sensitive_code_section()}, which | |
20418 takes the specbind value as an argument, and unwinds the | |
20419 unwind_protect. The call*_critical functions are trivially | |
20420 implemented in terms of these lower level functions.) | |
20421 | |
20422 The sets of dangerous operations which can be prohibited are listed | |
20423 below. Each entry is preceded by the C name of its bit flag. | |
20424 | |
20425 | |
20426 OPERATION_GC_PROHIBITED | |
20427 1. garbage collection. When this flag is set, and the garbage | |
20428 collection threshold is reached, garbage collection simply doesn't | |
20429 happen. It will happen at the next opportunity that it is allowed. | |
20430 Similarly, explicitly calling the Lisp function garbage-collect | |
20431 simply does nothing. | |
20432 | |
20433 OPERATION_CATCH_ERRORS | |
20434 2. signalling an error. When the bit flag corresponding to this | |
20435 prohibited operation is passed to | |
20436 @code{enter_sensitive_code_section()}, a catch is set up which | |
20437 catches all errors, signals a warning with | |
20438 @code{warn_when_safe()}, and then simply | |
20439 continues. This is exactly the same behavior you now get with the | |
20440 @code{call*_trapping_errors()} functions. (There should also be some way | |
20441 of specifying a warning level and class here, similar to the | |
20442 @code{call*_trapping_errors()} functions. This is not completely | |
20443 important, however, because a standard warning level and class | |
20444 could simply be chosen.) | |
20445 | |
20446 OPERATION_NO_UNSAFE_OBJECT_DELETION | |
20447 3. This flag prohibits deletion of any permanent object (i.e. any | |
20448 object that does not automatically disappear when created, such as | |
20449 buffers, frames, devices, windows, etc...) unless they were created | |
20450 after this bit flag was set. This would be implemented using a | |
20451 list which stores all of the permanent objects created after this | |
20452 bit flag was set. This list is reset to its previous value when | |
20453 the call to @code{exit_sensitive_code_section()} occurs. The motivation | |
20454 here is to allow Lisp callbacks to create their own temporary | |
20455 buffers or frames, and later delete them, but not allow any other | |
20456 permanent objects to be deleted, because C code might be working | |
20457 with them, and not expect them to change. | |
20458 | |
20459 OPERATION_NO_BUFFER_MODIFICATION | |
20460 4. This flag disallows modifications to the text, extents, or any other | |
20461 properties of any buffers except those created after this flag was | |
20462 set, just like in the previous entry. | |
20463 | |
20464 OPERATION_NO_REDISPLAY | |
20465 5. This bit flag inhibits any redisplay-related operations from | |
20466 happening, more specifically, any entry into the redisplay-related | |
20467 code. This includes, for example, the Lisp functions sit-for, | |
20468 force-redisplay, force-cursor-redisplay, window-end with certain | |
20469 arguments to it, and various other functions. When this flag is | |
20470 set, instead of entering the redisplay code, the calling function | |
20471 should simply make sure not to enter the redisplay code, (for | |
20472 example, in the case of window-end), or postpone the redisplay | |
20473 until such a time when it's safe (for example, with sit-for and | |
20474 force-redisplay). | |
20475 | |
20476 OPERATION_NO_REDISPLAY_SETTINGS_CHANGE | |
20477 6. This flag prohibits any modifications to faces, glyphs, specifiers, | |
20478 extents, or any other settings that will affect the way that any | |
20479 window is displayed. | |
20480 | |
20481 | |
20482 The idea here is that it will finally be safe to call Lisp code from | |
20483 nearly any part of the C code, simply by setting any combination of | |
20484 restricted operation bit flags. This even includes calls from within | |
20485 redisplay (in which case all of the bit flags need to be set). The | |
20486 reason that I thought of this is that some coding system translations | |
20487 might cause Lisp code to be invoked and C code often invokes these | |
20488 translations in sensitive places. | |
20489 @end example | |
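
As a rough sketch of the mechanism proposed above, the bit mask and the enter/exit pair might be arranged as follows. The flag names and the two section functions come from the proposal; the numeric flag values, the variable @code{prohibited_operations}, and the helpers @code{operation_prohibited_p()}, @code{maybe_garbage_collect()} and @code{call0_critical()} are assumptions made for illustration. The returned integer merely stands in for the specbind value: a real version would register an unwind_protect so that a non-local exit also restores the mask.

```c
#include <assert.h>

/* Flag names are from the proposal; the numeric values are assumed. */
enum {
    OPERATION_GC_PROHIBITED                = 1 << 0,
    OPERATION_CATCH_ERRORS                 = 1 << 1,
    OPERATION_NO_UNSAFE_OBJECT_DELETION    = 1 << 2,
    OPERATION_NO_BUFFER_MODIFICATION       = 1 << 3,
    OPERATION_NO_REDISPLAY                 = 1 << 4,
    OPERATION_NO_REDISPLAY_SETTINGS_CHANGE = 1 << 5
};

/* Hypothetical global holding the currently prohibited operations. */
static int prohibited_operations;

/* Combine MASK with the current mask using bitwise-or and return the
   previous mask so that the caller can restore it later. */
static int enter_sensitive_code_section(int mask)
{
    int previous = prohibited_operations;
    prohibited_operations = previous | mask;
    return previous;
}

/* Restore the mask saved by the matching enter call. */
static void exit_sensitive_code_section(int previous)
{
    /* Sections nest, so the saved mask is a subset of the current one. */
    assert((previous & ~prohibited_operations) == 0);
    prohibited_operations = previous;
}

static int operation_prohibited_p(int operation)
{
    return (prohibited_operations & operation) != 0;
}

/* Stand-in for the collector: does nothing while GC is prohibited. */
static int gc_runs;
static void maybe_garbage_collect(void)
{
    if (operation_prohibited_p(OPERATION_GC_PROHIBITED))
        return;
    gc_runs++;
}

/* Hypothetical zero-argument member of the call*_critical family. */
static void call0_critical(void (*fn)(void), int mask)
{
    int previous = enter_sensitive_code_section(mask);
    fn();
    exit_sensitive_code_section(previous);
}
```

Because each section ORs its mask into the previous one, a callback invoked from an already restricted section can only become more restricted, never less, and leaving the inner section restores exactly the outer restrictions.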
20490 | |
20491 @c Indexing guidelines | |
20492 | |
20493 @c I assume that all indexes will be combined. | |
20494 @c Therefore, if a generated findex and permutations | |
20495 @c cover the ways an index user would look up the entry, | |
20496 @c then no cindex is added. | |
20497 @c Concept index (cindex) entries will also be permuted. Therefore, they | |
20498 @c have no commas and few irrelevant connectives in them. | |
20499 | |
20500 @c I tried to include words in a cindex that give the context of the entry, | |
20501 @c particularly if there is more than one entry for the same concept. | |
20502 @c For example, "nil in keymap" | |
20503 @c Similarly for explicit findex and vindex entries, e.g. "print example". | |
20504 | |
20505 @c Error codes are given cindex entries, e.g. "end-of-file error". | |
20506 | |
20507 @c pindex is used for .el files and Unix programs | |
20508 | |
20509 @node Index, , Old Future Work, Top | |
20510 @unnumbered Index | |
20511 | |
20512 @ignore | |
20513 All variables, functions, keys, programs, files, and concepts are | |
20514 in this one index. | |
20515 | |
20516 All names and concepts are permuted, so they appear several times, one | |
20517 for each permutation of the parts of the name. For example, | |
20518 @code{function-name} would appear as @b{function-name} and @b{name, | |
20519 function-}. Key entries are not permuted, however. | |
20520 @end ignore | |
20521 | |
20522 @c Print the indices | |
20523 | |
20524 @printindex fn | |
11039 | 20525 |
11040 @c Print the tables of contents | 20526 @c Print the tables of contents |
11041 @summarycontents | 20527 @summarycontents |
11042 @contents | 20528 @contents |
11043 @c That's all | 20529 @c That's all |