comparison man/internals/internals.texi @ 0:376386a54a3c r19-14
Import from CVS: tag r19-14
author | cvs |
---|---|
date | Mon, 13 Aug 2007 08:45:50 +0200 |
parents | |
children | ac2d302a0011 |
-1:000000000000 | 0:376386a54a3c |
---|---|
1 \input texinfo @c -*-texinfo-*- | |
2 @c %**start of header | |
3 @setfilename ../../info/internals.info | |
4 @settitle XEmacs Internals Manual | |
5 @c %**end of header | |
6 | |
7 @ifinfo | |
8 | |
9 Copyright @copyright{} 1992 - 1996 Ben Wing. | |
10 Copyright @copyright{} 1996 Sun Microsystems. | |
11 Copyright @copyright{} 1994, 1995 Free Software Foundation. | |
12 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. | |
13 | |
14 | |
15 Permission is granted to make and distribute verbatim copies of this | |
16 manual provided the copyright notice and this permission notice are | |
17 preserved on all copies. | |
18 | |
19 @ignore | |
20 Permission is granted to process this file through TeX and print the | |
21 results, provided the printed document carries copying permission notice | |
22 identical to this one except for the removal of this paragraph (this | |
23 paragraph not being relevant to the printed manual). | |
24 | |
25 @end ignore | |
26 Permission is granted to copy and distribute modified versions of this | |
27 manual under the conditions for verbatim copying, provided that the | |
28 entire resulting derived work is distributed under the terms of a | |
29 permission notice identical to this one. | |
30 | |
31 Permission is granted to copy and distribute translations of this manual | |
32 into another language, under the above conditions for modified versions, | |
33 except that this permission notice may be stated in a translation | |
34 approved by the Foundation. | |
35 | |
36 Permission is granted to copy and distribute modified versions of this | |
37 manual under the conditions for verbatim copying, provided also that the | |
38 section entitled ``GNU General Public License'' is included exactly as | |
39 in the original, and provided that the entire resulting derived work is | |
40 distributed under the terms of a permission notice identical to this | |
41 one. | |
42 | |
43 Permission is granted to copy and distribute translations of this manual | |
44 into another language, under the above conditions for modified versions, | |
45 except that the section entitled ``GNU General Public License'' may be | |
46 included in a translation approved by the Free Software Foundation | |
47 instead of in the original English. | |
48 @end ifinfo | |
49 | |
50 @c Combine indices. | |
51 @synindex cp fn | |
52 @syncodeindex vr fn | |
53 @syncodeindex ky fn | |
54 @syncodeindex pg fn | |
55 @syncodeindex tp fn | |
56 | |
57 @setchapternewpage odd | |
58 @finalout | |
59 | |
60 @titlepage | |
61 @title XEmacs Internals Manual | |
62 @subtitle Version 1.0, March 1996 | |
63 | |
64 @author Ben Wing | |
65 @page | |
66 @vskip 0pt plus 1fill | |
67 | |
68 @noindent | |
69 Copyright @copyright{} 1992 - 1996 Ben Wing. @* | |
70 Copyright @copyright{} 1996 Sun Microsystems, Inc. @* | |
71 Copyright @copyright{} 1994 Free Software Foundation. @* | |
72 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. | |
73 | |
74 @sp 2 | |
75 Version 1.0 @* | |
76 March, 1996.@* | |
77 | |
78 Permission is granted to make and distribute verbatim copies of this | |
79 manual provided the copyright notice and this permission notice are | |
80 preserved on all copies. | |
81 | |
82 Permission is granted to copy and distribute modified versions of this | |
83 manual under the conditions for verbatim copying, provided also that the | |
84 section entitled ``GNU General Public License'' is included | |
85 exactly as in the original, and provided that the entire resulting | |
86 derived work is distributed under the terms of a permission notice | |
87 identical to this one. | |
88 | |
89 Permission is granted to copy and distribute translations of this manual | |
90 into another language, under the above conditions for modified versions, | |
91 except that the section entitled ``GNU General Public License'' may be | |
92 included in a translation approved by the Free Software Foundation | |
93 instead of in the original English. | |
94 @end titlepage | |
95 @page | |
96 | |
97 @node Top, A History of Emacs, (dir), (dir) | |
98 | |
99 @ifinfo | |
100 This Info file contains v1.0 of the XEmacs Internals Manual. | |
101 @end ifinfo | |
102 | |
103 @menu | |
104 * A History of Emacs:: Times, dates, important events. | |
105 * XEmacs From the Outside:: A broad conceptual overview. | |
106 * The Lisp Language:: An overview. | |
107 * XEmacs From the Perspective of Building:: | |
108 * XEmacs From the Inside:: | |
109 * The XEmacs Object System (Abstractly Speaking):: | |
110 * How Lisp Objects Are Represented in C:: | |
111 * Rules When Writing New C Code:: | |
112 * A Summary of the Various XEmacs Modules:: | |
113 * Allocation of Objects in XEmacs Lisp:: | |
114 * Events and the Event Loop:: | |
115 * Evaluation; Stack Frames; Bindings:: | |
116 * Symbols and Variables:: | |
117 * Buffers and Textual Representation:: | |
118 * MULE Character Sets and Encodings:: | |
119 * The Lisp Reader and Compiler:: | |
120 * Lstreams:: | |
121 * Consoles; Devices; Frames; Windows:: | |
122 * The Redisplay Mechanism:: | |
123 * Extents:: | |
124 * Faces and Glyphs:: | |
125 * Specifiers:: | |
126 * Menus:: | |
127 * Subprocesses:: | |
128 * Interface to X Windows:: | |
129 * Index:: Index including concepts, functions, variables, | |
130 and other terms. | |
131 | |
132 --- The Detailed Node Listing --- | |
133 | |
134 Here are other nodes that are inferiors of those already listed, | |
135 mentioned here so you can get to them in one step: | |
136 | |
137 A History of Emacs | |
138 | |
139 * Through Version 18:: Unification prevails. | |
140 * Lucid Emacs:: One version 19 Emacs. | |
141 * GNU Emacs 19:: The other version 19 Emacs. | |
142 * XEmacs:: The continuation of Lucid Emacs. | |
143 | |
144 Rules When Writing New C Code | |
145 | |
146 * General Coding Rules:: | |
147 * Writing Lisp Primitives:: | |
148 * Adding Global Lisp Variables:: | |
149 | |
150 A Summary of the Various XEmacs Modules | |
151 | |
152 * Low-Level Modules:: | |
153 * Basic Lisp Modules:: | |
154 * Modules for Standard Editing Operations:: | |
155 * Editor-Level Control Flow Modules:: | |
156 * Modules for the Basic Displayable Lisp Objects:: | |
157 * Modules for other Display-Related Lisp Objects:: | |
158 * Modules for the Redisplay Mechanism:: | |
159 * Modules for Interfacing with the File System:: | |
160 * Modules for Other Aspects of the Lisp Interpreter and Object System:: | |
161 * Modules for Interfacing with the Operating System:: | |
162 * Modules for Interfacing with X Windows:: | |
163 * Modules for Internationalization:: | |
164 | |
165 Allocation of Objects in XEmacs Lisp | |
166 | |
167 * Introduction to Allocation:: | |
168 * Garbage Collection:: | |
169 * GCPROing:: | |
170 * Integers and Characters:: | |
171 * Allocation from Frob Blocks:: | |
172 * lrecords:: | |
173 * Low-level allocation:: | |
174 * Pure Space:: | |
175 * Cons:: | |
176 * Vector:: | |
177 * Bit Vector:: | |
178 * Symbol:: | |
179 * Marker:: | |
180 * String:: | |
181 * Bytecode:: | |
182 | |
183 Events and the Event Loop | |
184 | |
185 * Introduction to Events:: | |
186 * Main Loop:: | |
187 * Specifics of the Event Gathering Mechanism:: | |
188 * Specifics About the Emacs Event:: | |
189 * The Event Stream Callback Routines:: | |
190 * Other Event Loop Functions:: | |
191 * Converting Events:: | |
192 * Dispatching Events; The Command Builder:: | |
193 | |
194 Evaluation; Stack Frames; Bindings | |
195 | |
196 * Evaluation:: | |
197 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: | |
198 * Simple Special Forms:: | |
199 * Catch and Throw:: | |
200 | |
201 Symbols and Variables | |
202 | |
203 * Introduction to Symbols:: | |
204 * Obarrays:: | |
205 * Symbol Values:: | |
206 | |
207 Buffers and Textual Representation | |
208 | |
209 * Introduction to Buffers:: A buffer holds a block of text such as a file. | |
210 * A Buffer@'s Text:: Representation of the text in a buffer. | |
211 * Buffer Lists:: Keeping track of all buffers. | |
212 * Markers and Extents:: Tagging locations within a buffer. | |
213 * Bufbytes and Emchars:: Representation of individual characters. | |
214 * The Buffer Object:: The Lisp object corresponding to a buffer. | |
215 | |
216 MULE Character Sets and Encodings | |
217 | |
218 * Character Sets:: | |
219 * Encodings:: | |
220 * Internal Mule Encodings:: | |
221 | |
222 Encodings | |
223 | |
224 * Japanese EUC (Extended Unix Code):: | |
225 * JIS7:: | |
226 | |
227 Internal Mule Encodings | |
228 | |
229 * Internal String Encoding:: | |
230 * Internal Character Encoding:: | |
231 | |
232 The Lisp Reader and Compiler | |
233 | |
234 Lstreams | |
235 | |
236 Consoles; Devices; Frames; Windows | |
237 | |
238 * Introduction to Consoles; Devices; Frames; Windows:: | |
239 * Point:: | |
240 * Window Hierarchy:: | |
241 | |
242 The Redisplay Mechanism | |
243 | |
244 * Critical Redisplay Sections:: | |
245 * Line Start Cache:: | |
246 | |
247 Extents | |
248 | |
249 * Introduction to Extents:: Extents are ranges over text, with properties. | |
250 * Extent Ordering:: How extents are ordered internally. | |
251 * Format of the Extent Info:: The extent information in a buffer or string. | |
252 * Zero-Length Extents:: A weird special case. | |
253 * Mathematics of Extent Ordering:: A rigorous foundation. | |
254 * Extent Fragments:: Cached information useful for redisplay. | |
255 | |
256 Faces and Glyphs | |
257 | |
258 Specifiers | |
259 | |
260 Menus | |
261 | |
262 Subprocesses | |
263 | |
264 Interface to X Windows | |
265 | |
266 @end menu | |
267 | |
268 @node A History of Emacs, XEmacs From the Outside, Top, Top | |
269 @chapter A History of Emacs | |
270 @cindex history of Emacs | |
271 @cindex Hackers (Steven Levy) | |
272 @cindex Levy, Steven | |
273 @cindex ITS (Incompatible Timesharing System) | |
274 @cindex Stallman, Richard | |
275 @cindex RMS | |
276 @cindex MIT | |
277 @cindex TECO | |
278 @cindex FSF | |
279 @cindex Free Software Foundation | |
280 | |
281 XEmacs is a powerful, customizable text editor and development | |
282 environment. It began as Lucid Emacs, which was in turn derived from | |
283 GNU Emacs, a program written by Richard Stallman of the Free Software | |
284 Foundation. GNU Emacs dates back to the 1970's, and was modelled | |
285 after a package called ``Emacs'', written in 1976, that was a set of | |
286 macros on top of TECO, an old, old text editor written at MIT on the | |
287 DEC PDP-10 under one of the earliest time-sharing operating systems, | |
288 ITS (Incompatible Timesharing System). (ITS dates back well before | |
289 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT | |
290 who called themselves ``hackers'', who shared an idealistic belief | |
291 system about the free exchange of information and were fanatical in | |
292 their devotion to and time spent with computers. (The hacker | |
293 subculture dates back to the late 1950's at MIT and is described in | |
294 detail in Steven Levy's book @cite{Hackers}. This book also includes | |
295 a lot of information about Stallman himself and the development of | |
296 Lisp, a programming language developed at MIT that underlies Emacs.) | |
297 | |
298 @menu | |
299 * Through Version 18:: Unification prevails. | |
300 * Lucid Emacs:: One version 19 Emacs. | |
301 * GNU Emacs 19:: The other version 19 Emacs. | |
302 * XEmacs:: The continuation of Lucid Emacs. | |
303 @end menu | |
304 | |
305 @node Through Version 18 | |
306 @section Through Version 18 | |
307 @cindex Gosling, James | |
308 @cindex Great Usenet Renaming | |
309 | |
310 Although the history of the early versions of GNU Emacs is unclear, | |
311 the history is well-known from the middle of 1985. A time line is: | |
312 | |
313 @itemize @bullet | |
314 @item | |
315 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and | |
316 shared some code with a version of Emacs written by James Gosling (the | |
317 same James Gosling who later created the Java language). | |
318 @item | |
319 GNU Emacs version 16 (first released version was 16.56) was released on | |
320 July 15, 1985. All Gosling code was removed due to potential copyright | |
321 problems with the code. | |
322 @item | |
323 version 16.57: released on September 16, 1985. | |
324 @item | |
325 versions 16.58, 16.59: released on September 17, 1985. | |
326 @item | |
327 version 16.60: released on September 19, 1985. These later version 16's | |
328 incorporated patches from the net, especially for getting Emacs to work under | |
329 System V. | |
330 @item | |
331 version 17.36 (first official v17 release) released on December 20, | |
332 1985. Included a TeX-able user manual. First official unpatched | |
333 version that worked on vanilla System V machines. | |
334 @item | |
335 version 17.43 (second official v17 release) released on January 25, | |
336 1986. | |
337 @item | |
338 version 17.45 released on January 30, 1986. | |
339 @item | |
340 version 17.46 released on February 4, 1986. | |
341 @item | |
342 version 17.48 released on February 10, 1986. | |
343 @item | |
344 version 17.49 released on February 12, 1986. | |
345 @item | |
346 version 17.55 released on March 18, 1986. | |
347 @item | |
348 version 17.57 released on March 27, 1986. | |
349 @item | |
350 version 17.58 released on April 4, 1986. | |
351 @item | |
352 version 17.61 released on April 12, 1986. | |
353 @item | |
354 version 17.63 released on May 7, 1986. | |
355 @item | |
356 version 17.64 released on May 12, 1986. | |
357 @item | |
358 version 18.24 (a beta version) released on October 2, 1986. | |
359 @item | |
360 version 18.30 (a beta version) released on November 15, 1986. | |
361 @item | |
362 version 18.31 (a beta version) released on November 23, 1986. | |
363 @item | |
364 version 18.32 (a beta version) released on December 7, 1986. | |
365 @item | |
366 version 18.33 (a beta version) released on December 12, 1986. | |
367 @item | |
368 version 18.35 (a beta version) released on January 5, 1987. | |
369 @item | |
370 version 18.36 (a beta version) released on January 21, 1987. | |
371 @item | |
372 version 18.37 (a beta version) released on February 12, 1987. | |
373 @item | |
374 version 18.38 (a beta version) released on March 3, 1987. | |
375 @item | |
376 version 18.39 (a beta version) released on March 14, 1987. | |
377 @item | |
378 version 18.40 (a beta version) released on March 18, 1987. | |
379 @item | |
380 version 18.41 (the first ``official'' release) released on March 22, | |
381 1987. | |
382 @item | |
383 version 18.45 released on June 2, 1987. | |
384 @item | |
385 version 18.46 released on June 9, 1987. | |
386 @item | |
387 version 18.47 released on June 18, 1987. | |
388 @item | |
389 version 18.48 released on September 3, 1987. | |
390 @item | |
391 version 18.49 released on September 18, 1987. | |
392 @item | |
393 version 18.50 released on February 13, 1988. | |
394 @item | |
395 version 18.51 released on May 7, 1988. | |
396 @item | |
397 version 18.52 released on September 1, 1988. | |
398 @item | |
399 January 27, 1989: The Great Usenet Renaming. net.emacs is now | |
400 comp.emacs. | |
401 @item | |
402 version 18.53 released on February 24, 1989. | |
403 @item | |
404 version 18.54 released on April 26, 1989. | |
405 @item | |
406 version 18.55 released on August 23, 1989. This is the earliest version | |
407 that is still available by FTP. | |
408 @item | |
409 version 18.56 released on January 17, 1991. | |
410 @item | |
411 version 18.57 released late January, 1991. | |
412 @item | |
413 version 18.58 released ?????. | |
414 @item | |
415 version 18.59 released October 31, 1992. | |
416 @end itemize | |
417 | |
418 @node Lucid Emacs | |
419 @section Lucid Emacs | |
420 @cindex Lucid Emacs | |
421 @cindex Lucid Inc. | |
422 @cindex Energize | |
423 @cindex Epoch | |
424 | |
425 Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of | |
426 C++ and Lisp development environments. It began when Lucid decided they | |
427 wanted to use Emacs as the editor and cornerstone of their C++ | |
428 development environment (called ``Energize''). They needed many features | |
429 that were not available in the existing version of GNU Emacs (version | |
430 18.5something), in particular good and integrated support for GUI | |
431 elements such as mouse support, multiple fonts, multiple window-system | |
432 windows, etc. A branch of GNU Emacs called Epoch, written at the | |
433 University of Illinois, existed that supplied many of these features; | |
434 however, Lucid needed more than what existed in Epoch. At the time, the | |
435 Free Software Foundation was working on version 19 of Emacs (this was | |
436 sometime around 1991), which was planned to have similar features, and | |
437 so Lucid decided to work with the Free Software Foundation. Their plan | |
438 was to add features that they needed, and coordinate with the FSF so | |
439 that the features would get included back into Emacs version 19. | |
440 | |
441 Delays in the release of version 19 occurred, however (resulting in it | |
442 finally being released more than a year after what was initially | |
443 planned), and Lucid encountered unexpected technical resistance in | |
444 getting their changes merged back into version 19, so they decided to | |
445 release their own version of Emacs, which became Lucid Emacs 19.0. | |
446 | |
447 @cindex Zawinski, Jamie | |
448 @cindex Sexton, Harlan | |
449 @cindex Benson, Eric | |
450 @cindex Devin, Matthieu | |
451 The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton, | |
452 and Eric Benson, and the work was later taken over by Jamie Zawinski, | |
453 who became ``Mr. Lucid Emacs'' for many releases. | |
454 | |
455 A time line for Lucid Emacs/XEmacs is: | |
456 | |
457 @itemize @bullet | |
458 @item | |
459 version 19.0 shipped with Energize 1.0, April 1992. | |
460 @item | |
461 version 19.1 released June 4, 1992. | |
462 @item | |
463 version 19.2 released June 19, 1992. | |
464 @item | |
465 version 19.3 released September 9, 1992. | |
466 @item | |
467 version 19.4 released January 21, 1993. | |
468 @item | |
469 version 19.5 was a repackaging of 19.4 with a few bug fixes and | |
470 shipped with Energize 2.0. Never released to the net. | |
471 @item | |
472 version 19.6 released April 9, 1993. | |
473 @item | |
474 version 19.7 was a repackaging of 19.6 with a few bug fixes and | |
475 shipped with Energize 2.1. Never released to the net. | |
476 @item | |
477 version 19.8 released September 6, 1993. | |
478 @item | |
479 version 19.9 released January 12, 1994. | |
480 @item | |
481 version 19.10 released May 27, 1994. | |
482 @item | |
483 version 19.11 (first XEmacs) released September 13, 1994. | |
484 @item | |
485 version 19.12 released June 23, 1995. | |
486 @item | |
487 version 19.13 released September 1, 1995. | |
488 @end itemize | |
489 | |
490 @node GNU Emacs 19 | |
491 @section GNU Emacs 19 | |
492 @cindex GNU Emacs 19 | |
493 @cindex FSF Emacs | |
494 | |
495 About a year after the initial release of Lucid Emacs, the FSF | |
496 released a beta of their version of Emacs 19 (referred to here as ``GNU | |
497 Emacs''). By this time, the current version of Lucid Emacs was | |
498 19.6. (Strangely, the first released beta from the FSF was GNU Emacs | |
499 19.7.) A time line for GNU Emacs version 19 is: | |
500 | |
501 @itemize @bullet | |
502 @item | |
503 version 19.8 (beta) released May 27, 1993. | |
504 @item | |
505 version 19.9 (beta) released May 27, 1993. | |
506 @item | |
507 version 19.10 (beta) released May 30, 1993. | |
508 @item | |
509 version 19.11 (beta) released June 1, 1993. | |
510 @item | |
511 version 19.12 (beta) released June 2, 1993. | |
512 @item | |
513 version 19.13 (beta) released June 8, 1993. | |
514 @item | |
515 version 19.14 (beta) released June 17, 1993. | |
516 @item | |
517 version 19.15 (beta) released June 19, 1993. | |
518 @item | |
519 version 19.16 (beta) released July 6, 1993. | |
520 @item | |
521 version 19.17 (beta) released late July, 1993. | |
522 @item | |
523 version 19.18 (beta) released August 9, 1993. | |
524 @item | |
525 version 19.19 (beta) released August 15, 1993. | |
526 @item | |
527 version 19.20 (beta) released November 17, 1993. | |
528 @item | |
529 version 19.21 (beta) released November 17, 1993. | |
530 @item | |
531 version 19.22 (beta) released November 28, 1993. | |
532 @item | |
533 version 19.23 (beta) released May 17, 1994. | |
534 @item | |
535 version 19.24 (beta) released May 16, 1994. | |
536 @item | |
537 version 19.25 (beta) released June 3, 1994. | |
538 @item | |
539 version 19.26 (beta) released September 11, 1994. | |
540 @item | |
541 version 19.27 (beta) released September 14, 1994. | |
542 @item | |
543 version 19.28 (first ``official'' release) released November 1, 1994. | |
544 @item | |
545 version 19.29 released June 21, 1995. | |
546 @end itemize | |
547 | |
548 @cindex Mlynarik, Richard | |
549 In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways, | |
550 worse. Lucid soon began incorporating features from GNU Emacs 19 into | |
551 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been | |
552 working on and using GNU Emacs for a long time (back as far as version | |
553 16 or 17). | |
554 | |
555 @node XEmacs | |
556 @section XEmacs | |
557 @cindex XEmacs | |
558 | |
559 @cindex Sun Microsystems | |
560 @cindex University of Illinois | |
561 @cindex Illinois, University of | |
562 @cindex SPARCWorks | |
563 @cindex Andreessen, Marc | |
564 @cindex Kaplan, Simon | |
565 @cindex Wing, Ben | |
566 @cindex Thompson, Chuck | |
567 @cindex Win-Emacs | |
568 @cindex Epoch | |
569 @cindex Amdahl Corporation | |
570 Around the time that Lucid was developing Energize, Sun Microsystems | |
571 was developing their own development environment (called ``SPARCWorks'') | |
572 and also decided to use Emacs. They joined forces with the Epoch team | |
573 at the University of Illinois and later with Lucid. The maintainer of | |
574 the last-released version of Epoch was Marc Andreessen, but he dropped | |
575 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson | |
576 away from a system administration job to become the primary Lucid Emacs | |
577 author for Epoch and Sun. Chuck's area of specialty became the | |
578 redisplay engine (he replaced the old Lucid Emacs redisplay engine with | |
579 a ported version from Epoch and then later rewrote it from scratch). | |
580 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs | |
581 to Microsoft Windows 3.1) in 1993, for what was initially a one-month | |
582 contract to fix some event problems but later became a many-year | |
583 involvement, punctuated by a six-month contract with Amdahl Corporation. | |
584 | |
585 @cindex rename to XEmacs | |
586 In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name | |
587 that favors neither company); the first release called XEmacs was | |
588 version 19.11. In June 1994, Lucid folded and Jamie quit to work for | |
589 the newly formed Mosaic Communications Corp., later Netscape | |
590 Communications Corp. (co-founded by the same Marc Andreessen, who had | |
591 quit his Epoch job to work on a graphical browser for the World Wide | |
592 Web). Chuck then became the primary maintainer of XEmacs, and put out | |
593 versions 19.11, 19.12, and 19.13 in conjunction with Ben. For 19.12 and | |
594 19.13, Chuck added the new redisplay and many other display improvements | |
595 and Ben added MULE support (support for Asian and other languages) and | |
596 redesigned most of the internal Lisp subsystems to better support the | |
597 MULE work and the various other features being added to XEmacs. | |
598 | |
599 @cindex merging attempts | |
600 Many attempts have been made to merge XEmacs and GNU Emacs, but they | |
601 have consistently run into the same technical disagreements and other | |
602 problems that Lucid ran into when originally attempting to merge Lucid | |
603 Emacs into GNU Emacs. | |
604 | |
605 A more detailed history is contained in the XEmacs About page. | |
606 | |
607 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top | |
608 @chapter XEmacs From the Outside | |
609 @cindex read-eval-print | |
610 | |
611 XEmacs appears to the outside world as an editor, but it is really a | |
612 Lisp environment. At its heart is a Lisp interpreter; it also | |
613 ``happens'' to contain many specialized object types (e.g. buffers, | |
614 windows, frames, events) that are useful for implementing an editor. | |
615 Some of these objects (in particular windows and frames) have | |
616 displayable representations, and XEmacs provides a function | |
617 @code{redisplay()} that ensures that the display of all such objects | |
618 matches their internal state. Most of the time, a standard Lisp | |
619 environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp | |
620 code, execute it, and print the results''. XEmacs has a similar loop: | |
621 | |
622 @itemize @bullet | |
623 @item | |
624 read an event | |
625 @item | |
626 dispatch the event (i.e. ``do it'') | |
627 @item | |
628 redisplay | |
629 @end itemize | |
630 | |
631 Reading an event is done using the Lisp function @code{next-event}, | |
632 which waits for something to happen (typically, the user presses a key | |
633 or moves the mouse) and returns an event object describing this. | |
634 Dispatching an event is done using the Lisp function | |
635 @code{dispatch-event}, which looks up the event in a keymap object (a | |
636 particular kind of object that associates an event with a Lisp function) | |
637 and calls that function. The function ``does'' what the user has | |
638 requested by changing the state of particular frame objects, buffer | |
639 objects, etc. Finally, @code{redisplay()} is called, which updates the | |
640 display to reflect those changes just made. Thus is an ``editor'' born. | |
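
As a rough illustration only, the three steps of this loop can be mimicked in a few lines of C. This is a toy sketch, not the actual XEmacs implementation: @code{struct editor}, @code{lookup_key}, @code{command_loop} and the two commands are hypothetical names, events are reduced to single characters read from a string, the ``keymap'' is a hard-coded function, and @code{redisplay} here is only a stand-in that copies the buffer to a second string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy editor state: the text plus what is "on screen". */
struct editor {
    char buffer[64];   /* internal state */
    char display[64];  /* what the user currently sees */
};

typedef void (*command_fn)(struct editor *ed, char key);

/* Command: append the typed character to the buffer (if it fits). */
static void self_insert(struct editor *ed, char key)
{
    size_t len = strlen(ed->buffer);
    if (len + 1 < sizeof ed->buffer) {
        ed->buffer[len] = key;
        ed->buffer[len + 1] = '\0';
    }
}

/* Command: delete the last character of the buffer, if any. */
static void delete_backward(struct editor *ed, char key)
{
    size_t len = strlen(ed->buffer);
    (void) key;
    if (len > 0)
        ed->buffer[len - 1] = '\0';
}

/* The "keymap": associates an event (here, a key) with a command.
   '<' stands in for the backspace key; everything else inserts itself. */
static command_fn lookup_key(char key)
{
    return key == '<' ? delete_backward : self_insert;
}

/* Make the display match the internal state. */
static void redisplay(struct editor *ed)
{
    strcpy(ed->display, ed->buffer);
}

/* Handle every event in INPUT; return the number of events handled. */
int command_loop(struct editor *ed, const char *input)
{
    int handled = 0;
    assert(ed != NULL && input != NULL);
    for (; *input != '\0'; input++) {
        char event = *input;              /* 1. read an event      */
        lookup_key(event)(ed, event);     /* 2. dispatch the event */
        redisplay(ed);                    /* 3. redisplay          */
        handled++;
    }
    return handled;
}
```

Starting from an empty editor, running @code{command_loop} on the input @code{"abx<c"} handles five events and leaves @code{"abc"} in both the buffer and the display.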
641 | |
642 @cindex bridge, playing | |
643 @cindex taxes, doing | |
644 @cindex pi, calculating | |
645 Note that you do not have to use XEmacs as an editor; you could just | |
646 as well make it do your taxes, compute pi, play bridge, etc. You'd just | |
647 have to write functions to do those operations in Lisp. | |
648 | |
649 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top | |
650 @chapter The Lisp Language | |
651 @cindex Lisp vs. C | |
652 @cindex C vs. Lisp | |
653 @cindex Lisp vs. Java | |
654 @cindex Java vs. Lisp | |
655 @cindex dynamic scoping | |
656 @cindex scoping, dynamic | |
657 @cindex dynamic types | |
658 @cindex types, dynamic | |
659 @cindex Java | |
660 @cindex Common Lisp | |
661 @cindex Gosling, James | |
662 | |
663 Lisp is a general-purpose language that is higher-level than C and in | |
664 many ways more powerful than C. Powerful dialects of Lisp such as | |
665 Common Lisp are probably much better languages for writing very large | |
666 applications than is C. (Unfortunately, for many non-technical | |
667 reasons C and its successor C++ have become the dominant languages for | |
668 application development. These languages are both inadequate for | |
669 extremely large applications, which is evidenced by the fact that newer, | |
670 larger programs are becoming ever harder to write and are requiring ever | |
671 more programmers despite great improvements in C development environments; | |
672 and by the fact that, although hardware speeds and reliability have been | |
673 growing at an exponential rate, most software is still generally | |
674 considered to be slow and buggy.) | |
675 | |
676 The new Java language holds promise as a better general-purpose | |
677 development language than C. Java has many features in common with | |
678 Lisp that are not shared by C (this is not a coincidence, since | |
679 Java was designed by James Gosling, a former Lisp hacker). This | |
680 will be discussed more later. | |
681 | |
682 For those used to C, here is a summary of the basic differences between | |
683 C and Lisp: | |
684 | |
685 @enumerate | |
686 @item | |
687 Lisp has an extremely regular syntax. Every function, expression, | |
688 and control statement is written in the form | |
689 | |
690 @example | |
691 (@var{func} @var{arg1} @var{arg2} ...) | |
692 @end example | |
693 | |
694 This is as opposed to C, which writes functions as | |
695 | |
696 @example | |
697 func(@var{arg1}, @var{arg2}, ...) | |
698 @end example | |
699 | |
700 but writes expressions involving operators as (e.g.) | |
701 | |
702 @example | |
703 @var{arg1} + @var{arg2} | |
704 @end example | |
705 | |
706 and writes control statements as (e.g.) | |
707 | |
708 @example | |
709 while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @} | |
710 @end example | |
711 | |
712 Lisp equivalents of the latter two would be | |
713 | |
714 @example | |
715 (+ @var{arg1} @var{arg2} ...) | |
716 @end example | |
717 | |
718 and | |
719 | |
720 @example | |
721 (while @var{expr} @var{statement1} @var{statement2} ...) | |
722 @end example | |
723 | |
724 @item | |
725 Lisp is a safe language. Assuming there are no bugs in the Lisp | |
726 interpreter/compiler, it is impossible to write a program that ``core | |
727 dumps'' or otherwise causes the machine to execute an illegal | |
728 instruction. This is very different from C, where perhaps the most | |
729 common outcome of a bug is exactly such a crash. A corollary of this is that | |
730 the C operation of casting a pointer is impossible (and unnecessary) in | |
731 Lisp, and that it is impossible to access memory outside the bounds of | |
732 an array. | |
733 | |
734 @item | |
735 Programs and data are written in the same form. The | |
736 parenthesis-enclosing form described above for statements is the same | |
737 form used for the most common data type in Lisp, the list. Thus, it is | |
738 possible to represent any Lisp program using Lisp data types, and for | |
739 one program to construct Lisp statements and then dynamically | |
740 @dfn{evaluate} them, or cause them to execute. | |
741 | |
742 @item | |
743 All objects are @dfn{dynamically typed}. This means that part of every | |
744 object is an indication of what type it is. A Lisp program can | |
745 manipulate an object without knowing what type it is, and can query an | |
746 object to determine its type. This means that, correspondingly, | |
747 variables and function parameters can hold objects of any type and are | |
748 not normally declared as being of any particular type. This is opposed | |
749 to the @dfn{static typing} of C, where variables can hold exactly one | |
750 type of object and must be declared as such, and objects do not contain | |
751 an indication of their type because it's implicit in the variables they | |
752 are stored in. It is possible in C to have a variable hold different | |
753 types of objects (e.g. through the use of @code{void *} pointers or | |
754 variable-argument functions), but the type information must then be | |
755 passed explicitly in some other fashion, leading to additional program | |
756 complexity. | |
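
To make the last point concrete, here is a minimal sketch of what passing the type information explicitly can look like in C. It assumes a hypothetical tagged structure, @code{struct obj}, with invented helpers @code{make_int}, @code{make_string}, @code{type_name} and @code{obj_equal}. It is not how XEmacs itself represents Lisp objects; it only shows the bookkeeping that a dynamically typed language does for you.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum obj_type { OBJ_INT, OBJ_STRING };

/* A hypothetical tagged value: the type tag travels with the data,
   which is what happens implicitly for every Lisp object. */
struct obj {
    enum obj_type type;
    union {
        long integer;
        const char *string;
    } u;
};

struct obj make_int(long n)
{
    struct obj o;
    o.type = OBJ_INT;
    o.u.integer = n;
    return o;
}

struct obj make_string(const char *s)
{
    struct obj o;
    assert(s != NULL);
    o.type = OBJ_STRING;
    o.u.string = s;
    return o;
}

/* Query an object's type at run time. */
const char *type_name(struct obj o)
{
    switch (o.type) {
    case OBJ_INT:    return "integer";
    case OBJ_STRING: return "string";
    }
    return "unknown";
}

/* One function accepts objects of any type and inspects the tags:
   objects of different types are never equal, objects of the same
   type are compared by value. */
int obj_equal(struct obj a, struct obj b)
{
    if (a.type != b.type)
        return 0;
    switch (a.type) {
    case OBJ_INT:
        return a.u.integer == b.u.integer;
    case OBJ_STRING:
        return strcmp(a.u.string, b.u.string) == 0;
    }
    return 0;
}
```

Every function that touches a @code{struct obj} has to consult the tag by hand, and nothing stops a careless caller from reading the wrong member of the union; this is the extra complexity referred to above.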
757 | |
758 @item | |
759 Allocated memory is automatically reclaimed when it is no longer in use. | |
760 This operation is called @dfn{garbage collection} and involves looking | |
761 through all variables to see what memory is being pointed to, and | |
762 reclaiming any memory that is not pointed to and is thus | |
763 ``inaccessible'' and out of use. This is as opposed to C, in which | |
764 allocated memory must be explicitly reclaimed using @code{free()}. If | |
765 you simply drop all pointers to memory without freeing it, it becomes | |
766 ``leaked'' memory that still takes up space. Over a long period of | |
767 time, this can cause your program to grow and grow until it runs out of | |
768 memory. | |
769 | |
770 @item | |
771 Lisp has built-in facilities for handling errors and exceptions. In C, | |
772 when an error occurs, usually either the program exits entirely or the | |
773 routine in which the error occurs returns a value indicating this. If | |
774 an error occurs in a deeply-nested routine, then every routine currently | |
775 called must unwind itself normally and return an error value back up to | |
776 the next routine. This means that every routine must explicitly check | |
777 for an error in all the routines it calls; if it does not do so, | |
778 unexpected and often random behavior results. This is an extremely | |
779 common source of bugs in C programs. An alternative would be to do a | |
780 non-local exit using @code{longjmp()}, but that is often very dangerous | |
781 because the routines that were exited past had no opportunity to clean | |
782 up after themselves and may leave things in an inconsistent state, | |
783 causing a crash shortly afterwards. | |
784 | |
785 Lisp provides mechanisms to make such non-local exits safe. When an | |
786 error occurs, a routine simply signals that an error of a particular | |
787 class has occurred, and a non-local exit takes place. Any routine can | |
788 trap errors occurring in routines it calls by registering an error | |
789 handler for some or all classes of errors. (If no handler is registered, | |
790 a default handler, generally installed by the top-level event loop, is | |
791 executed; this prints out the error and continues.) Routines can also | |
792 specify cleanup code (called an @dfn{unwind-protect}) that will be | |
793 called when control exits from a block of code, no matter how that exit | |
794 occurs -- i.e. even if a function deeply nested below it causes a | |
795 non-local exit back to the top level. | |
796 | |
797 Note that this facility has appeared in some recent vintages of C, in | |
798 particular Visual C++ and other PC compilers written for the Microsoft | |
799 Win32 API. | |
800 | |
801 @item | |
802 In Emacs Lisp, local variables are @dfn{dynamically scoped}. This means | |
803 that if you declare a local variable in a particular function, and then | |
804 call another function, that subfunction can ``see'' the local variable | |
805 you declared. This is actually considered a bug in Emacs Lisp and in | |
806 all other early dialects of Lisp, and was corrected in Common Lisp. (In | |
807 Common Lisp, you can still declare dynamically scoped variables if you | |
808 want to -- they are sometimes useful -- but variables by default are | |
809 @dfn{lexically scoped} as in C.) | |
810 @end enumerate | |
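
The safe non-local exit described above can be approximated in C, although only with considerable discipline. The following is a minimal sketch, not XEmacs code: all of the names (@code{push_cleanup}, @code{signal_error}, @code{call_with_handler} and so on) are hypothetical, and it assumes a single thread and a small fixed-size cleanup stack. Cleanup routines registered below a handler are run before @code{longjmp()} transfers control, which is the essence of an unwind-protect.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical illustration only; none of these names come from XEmacs. */
typedef void (*cleanup_fn)(void *arg);

#define MAX_CLEANUPS 16

static struct { cleanup_fn fn; void *arg; } cleanup_stack[MAX_CLEANUPS];
static int cleanup_depth = 0;

static jmp_buf *current_handler = NULL; /* innermost registered handler */
static int handler_depth = 0;           /* cleanup_depth when it was registered */
static int pending_error = 0;           /* class of the error being signalled */

/* Register cleanup code to run when control leaves the current block. */
void push_cleanup(cleanup_fn fn, void *arg) {
    assert(cleanup_depth < MAX_CLEANUPS);
    cleanup_stack[cleanup_depth].fn = fn;
    cleanup_stack[cleanup_depth].arg = arg;
    cleanup_depth++;
}

/* Remove the newest cleanup and run it (used on normal and error exits). */
void pop_cleanup(void) {
    assert(cleanup_depth > 0);
    cleanup_depth--;
    cleanup_stack[cleanup_depth].fn(cleanup_stack[cleanup_depth].arg);
}

/* Signal an error: run every cleanup registered since the innermost
   handler was installed, then jump to that handler. */
void signal_error(int error_class) {
    assert(current_handler != NULL && error_class != 0);
    while (cleanup_depth > handler_depth)
        pop_cleanup();
    pending_error = error_class;
    longjmp(*current_handler, 1);
}

/* Run body(arg) with a handler installed.  Returns 0 if the body
   completed normally, or the class of the error that was signalled. */
int call_with_handler(void (*body)(void *), void *arg) {
    jmp_buf here;
    jmp_buf *saved_handler = current_handler;
    int saved_depth = handler_depth;

    current_handler = &here;
    handler_depth = cleanup_depth;
    if (setjmp(here) == 0) {
        body(arg);
        pending_error = 0;
    }
    current_handler = saved_handler;
    handler_depth = saved_depth;
    return pending_error;
}

/* Demonstration helpers. */
void count_cleanup(void *counter) { ++*(int *)counter; }

void deeply_nested(void) { signal_error(7); }

void protected_body(void *counter) {
    push_cleanup(count_cleanup, counter);
    deeply_nested();   /* exits non-locally; the cleanup still runs */
    pop_cleanup();     /* normal exit path, skipped in this demonstration */
}
```

Calling @code{call_with_handler(protected_body, &counter)} returns the error class 7 and leaves @code{counter} incremented exactly once, because the cleanup ran during the unwind even though @code{protected_body} never reached its own @code{pop_cleanup()} call.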
811 | |
812 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an | |
813 early dialect of Lisp developed at MIT (no relation to the Macintosh | |
814 computer). There is a Common Lisp compatibility package available for | |
815 Emacs that provides many of the features of Common Lisp. | |
816 | |
817 The Java language is derived in many ways from C, and shares a similar | |
818 syntax, but has the following features in common with Lisp (and different | |
819 from C): | |
820 | |
821 @enumerate | |
822 @item | |
823 Java is a safe language, like Lisp. | |
824 @item | |
825 Java provides garbage collection, like Lisp. | |
826 @item | |
827 Java has built-in facilities for handling errors and exceptions, like | |
828 Lisp. | |
829 @item | |
830 Java has a type system that combines the best advantages of both static | |
831 and dynamic typing. Objects (except very simple types) are explicitly | |
832 marked with their type, as in dynamic typing; but there is a hierarchy | |
833 of types and functions are declared to accept only certain types, thus | |
834 providing the increased compile-time error-checking of static typing. | |
835 @end enumerate | |
836 | |
837 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top | |
838 @chapter XEmacs From the Perspective of Building | |
839 | |
840 The heart of XEmacs is the Lisp environment, which is written in C. | |
841 This is contained in the @file{src/} subdirectory. Underneath | |
842 @file{src/} are two subdirectories of header files: @file{s/} (header | |
843 files for particular operating systems) and @file{m/} (header files for | |
844 particular machine types). In practice the distinction between the two | |
845 types of header files is blurred. These header files define or undefine | |
846 certain preprocessor constants and macros to indicate particular | |
847 characteristics of the associated machine or operating system. As part | |
848 of the configure process, one @file{s/} file and one @file{m/} file are | |
849 identified for the particular environment in which XEmacs is being | |
850 built. | |
851 | |
852 XEmacs also contains a great deal of Lisp code. This implements the | |
853 operations that make XEmacs useful as an editor as well as just a | |
854 Lisp environment, and also contains many add-on packages that allow | |
855 XEmacs to browse directories, act as a mail and Usenet news reader, | |
856 compile Lisp code, etc. There is actually a lot more Lisp code than | |
857 C code associated with XEmacs, but much of the Lisp code is | |
858 peripheral to the actual operation of the editor. The Lisp code | |
859 all lies in subdirectories underneath the @file{lisp/} directory. | |
860 | |
861 The @file{lwlib/} directory contains C code that implements a | |
862 generalized interface onto different X widget toolkits and also | |
863 implements some widgets of its own that behave like Motif widgets but | |
864 are faster, free, and in some cases more powerful. The code in this | |
865 directory compiles into a library and is mostly independent from XEmacs. | |
866 | |
867 The @file{etc/} directory contains various data files associated with | |
868 XEmacs. Some of them are actually read by XEmacs at startup; others | |
869 merely contain useful information of various sorts. | |
870 | |
871 The @file{lib-src/} directory contains C code for various auxiliary | |
872 programs that are used in connection with XEmacs. Some of them are used | |
873 during the build process; others are used to perform certain functions | |
874 that cannot conveniently be placed in the XEmacs executable (e.g. the | |
875 @file{movemail} program for fetching mail out of @file{/var/spool/mail}, which | |
876 must be setgid to @file{mail} on many systems; and the @file{gnuclient} | |
877 program, which allows an external script to communicate with a running | |
878 XEmacs process). | |
879 | |
880 The @file{man/} directory contains the sources for the XEmacs | |
881 documentation. It is mostly in a form called Texinfo, which can be | |
882 converted into either a printed document (by passing it through TeX) or | |
883 into on-line documentation called @dfn{info files}. | |
884 | |
885 The @file{info/} directory contains the results of formatting the | |
886 XEmacs documentation as @dfn{info files}, for on-line use. These files | |
887 are used when you enter the Info system using @kbd{C-h i} or through the | |
888 Help menu. | |
889 | |
890 The @file{dynodump/} directory contains auxiliary code used to build | |
891 XEmacs on Solaris platforms. | |
892 | |
893 The other directories contain various miscellaneous code and | |
894 information that is not normally used or needed. | |
895 | |
896 The first step of building involves running the @file{configure} | |
897 program and passing it various parameters to specify any optional | |
898 features you want and compiler arguments and such, as described in the | |
899 @file{INSTALL} file. This determines what the build environment is, | |
900 chooses the appropriate @file{s/} and @file{m/} files, and runs a series | |
901 of tests to determine many details about your environment, such as which | |
902 library functions are available and exactly how they work. (The | |
903 @file{s/} and @file{m/} files only contain information that cannot be | |
904 conveniently detected in this fashion.) The reason for running these | |
905 tests is that it allows XEmacs to be compiled on a much wider variety of | |
906 platforms than those that the XEmacs developers happen to be familiar | |
907 with, including various sorts of hybrid platforms. This is especially | |
908 important now that many operating systems give you a great deal of | |
909 control over exactly what features you want installed, and allow for | |
910 easy upgrading of parts of a system without upgrading the rest. It | |
911 would be impossible to pre-determine and pre-specify the information for | |
912 all possible configurations. | |
913 | |
914 When @file{configure} is done running, it generates @file{Makefile}s and the | |
915 file @file{config.h} (which describes the features of your system) from | |
916 template files. You then run @file{make}, which compiles the auxiliary | |
917 code and programs in @file{lib-src/} and @file{lwlib/} and the main | |
918 XEmacs executable in @file{src/}. The result of this is an executable | |
919 called @file{temacs}, which is @emph{not} the XEmacs executable. | |
920 @file{temacs} by itself cannot function as an editor or even display any | |
921 windows on the screen, and if you simply run it, it will exit | |
922 immediately. The Makefile runs @file{temacs} with certain options that | |
923 cause it to initialize itself, read in a number of basic Lisp files, and | |
924 then dump itself out into a new executable called @file{xemacs}. This | |
925 new executable has been pre-initialized and contains pre-digested Lisp | |
926 code that is necessary for the editor to function (this includes some | |
927 extremely basic Lisp functions, e.g. @code{not}, that can be defined in | |
928 terms of other Lisp primitives; some initialization code that is called | |
929 when certain objects, such as frames, are created; and all of the | |
930 standard keybindings and code for the actions they result in). This | |
931 executable, @file{xemacs}, is the executable that you run to use the | |
932 XEmacs editor. | |
933 | |
934 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top | |
935 @chapter XEmacs From the Inside | |
936 | |
937 Internally, XEmacs is quite complex, and can be very confusing. To | |
938 simplify things, it can be useful to think of XEmacs as containing an | |
939 event loop that ``drives'' everything, and a number of other subsystems, | |
940 such as a Lisp engine and a redisplay mechanism. Each of these other | |
941 subsystems exists simultaneously in XEmacs, and each has a certain | |
942 state. The flow of control continually passes in and out of these | |
943 different subsystems in the course of normal operation of the editor. | |
944 | |
945 It is important to keep in mind that, most of the time, the editor is | |
946 ``driven'' by the event loop. Except during initialization and batch | |
947 mode, all subsystems are entered directly or indirectly through the | |
948 event loop, and ultimately, control exits out of all subsystems back up | |
949 to the event loop. This cycle of entering a subsystem, exiting back out | |
950 to the event loop, and starting another iteration of the event loop | |
951 occurs once each keystroke, mouse motion, etc. | |
952 | |
953 If you're trying to understand a particular subsystem (other than the | |
954 event loop), think of it as a ``daemon'' process or ``servant'' that is | |
955 responsible for one particular aspect of a larger system, and | |
956 periodically receives commands or environment changes that cause it to | |
957 do something. Ultimately, these commands and environment changes are | |
958 always triggered by the event loop. For example: | |
959 | |
960 @itemize @bullet | |
961 @item | |
962 The window and frame mechanism is responsible for keeping track of what | |
963 windows and frames exist, what buffers are in them, etc. It is | |
964 periodically given commands (usually from the user) to make a change to | |
965 the current window/frame state: i.e. create a new frame, delete a | |
966 window, etc. | |
967 | |
968 @item | |
969 The buffer mechanism is responsible for keeping track of what buffers | |
970 exist and what text is in them. It is periodically given commands | |
971 (usually from the user) to insert or delete text, create a buffer, etc. | |
972 When it receives a textual-change command, it tells the redisplay | |
973 mechanism about this. | |
974 | |
975 @item | |
976 The redisplay mechanism is responsible for making sure that windows and | |
977 frames are displayed correctly. It is periodically told (by the event | |
978 loop) to actually ``do its job'', i.e. snoop around and see what the | |
979 current state of the environment (mostly of the currently-existing | |
980 windows, frames, and buffers) is, and make sure that that state matches | |
981 what's actually displayed. It keeps lots and lots of information around | |
982 (such as what is actually being displayed currently, and what the | |
983 environment was last time it checked) so that it can minimize the work | |
984 it has to do. It is also helped along in that whenever a relevant | |
985 change to the environment occurs, the redisplay mechanism is told about | |
986 this, so it has a pretty good idea of where it has to look to find | |
987 possible changes and doesn't have to look everywhere. | |
988 | |
989 @item | |
990 The Lisp engine is responsible for executing the Lisp code in which most | |
991 user commands are written. It is entered through a call to @code{eval} | |
992 or @code{funcall}, which occurs as a result of dispatching an event from | |
993 the event loop. The functions it calls issue commands to the buffer | |
994 mechanism, the window/frame subsystem, etc. | |
995 | |
996 @item | |
997 The Lisp allocation subsystem is responsible for keeping track of Lisp | |
998 objects. It is given commands from the Lisp engine to allocate objects, | |
999 garbage collect, etc. | |
1000 @end itemize | |
1001 | |
1002 etc. | |
1003 | |
1004 The important idea here is that there are a number of independent | |
1005 subsystems each with their own responsibility and persistent state, just | |
1006 like different employees in a company, and each subsystem is | |
1007 periodically given commands from other subsystems. Commands can flow | |
1008 from any one subsystem to any other, but there is usually some sort of | |
1009 hierarchy, with all commands originating from the event subsystem. | |
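
As a rough illustration of this flow of control, the following C sketch shows an event loop driving two toy subsystems, each with its own persistent state. It is an assumption-laden simplification, not the code in @file{keyboard.c}: the event kinds, structures, and function names are all hypothetical, and every key event is treated as a self-inserting character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds and subsystem state; not XEmacs's real types. */
enum event_kind { EVENT_KEY, EVENT_QUIT };

struct event { enum event_kind kind; char key; };

struct buffer_state { char text[64]; size_t length; int changed; };
struct redisplay_state { size_t shown_length; int redraws; };

/* Buffer subsystem: carry out an insertion command and note that a
   textual change occurred, so redisplay knows where to look. */
void buffer_insert(struct buffer_state *buf, char c) {
    if (buf->length + 1 < sizeof buf->text) {
        buf->text[buf->length++] = c;
        buf->text[buf->length] = '\0';
        buf->changed = 1;
    }
}

/* Redisplay subsystem: do work only when it was told something changed. */
void redisplay(struct redisplay_state *rd, struct buffer_state *buf) {
    if (buf->changed) {
        rd->shown_length = buf->length;
        rd->redraws++;
        buf->changed = 0;
    }
}

/* Event loop: every command originates here, and control returns here
   after each one.  Returns the number of events processed. */
size_t event_loop(const struct event *queue, size_t count,
                  struct buffer_state *buf, struct redisplay_state *rd) {
    size_t i;
    assert(queue != NULL && buf != NULL && rd != NULL);
    for (i = 0; i < count; i++) {
        if (queue[i].kind == EVENT_QUIT)
            break;
        buffer_insert(buf, queue[i].key);  /* enter a subsystem... */
        redisplay(rd, buf);                /* ...then let redisplay catch up */
    }
    return i;
}
```

Feeding the loop two key events followed by a quit event leaves the buffer holding two characters and the redisplay state showing two redraws, one per iteration.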
1010 | |
1011 XEmacs is entered in @code{main()}, which is in @file{emacs.c}. When | |
1012 this is called the first time (in a properly-invoked @file{temacs}), it | |
1013 does the following: | |
1014 | |
1015 @enumerate | |
1016 @item | |
1017 It does some very basic environment initializations, such as determining | |
1018 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside | |
1019 and setting up signal handlers. | |
1020 @item | |
1021 It initializes the entire Lisp interpreter. | |
1022 @item | |
1023 It sets the initial values of many built-in variables (including many | |
1024 variables that are visible to Lisp programs), such as the global keymap | |
1025 object and the built-in faces (a face is an object that describes the | |
1026 display characteristics of text). This involves creating Lisp objects | |
1027 and thus is dependent on step (2). | |
1028 @item | |
1029 It performs various other initializations that are relevant to the | |
1030 particular environment it is running in, such as retrieving environment | |
1031 variables, determining the current date and the user who is running the | |
1032 program, examining its standard input, creating any necessary file | |
1033 descriptors, etc. | |
1034 @item | |
1035 At this point, the C initialization is complete. A Lisp program that | |
1036 was specified on the command line (usually @file{loadup.el}) is called | |
1037 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}). | |
1038 @file{loadup.el} loads all of the other Lisp files that are needed for | |
1039 the operation of the editor, calls the @code{dump-emacs} function to | |
1040 write out @file{xemacs}, and then kills the temacs process. | |
1041 @end enumerate | |
1042 | |
1043 When @file{xemacs} is then run, it only redoes steps (1) and (4) | |
1044 above; all variables already contain the values they were set to when | |
1045 the executable was dumped, and all memory that was allocated with | |
1046 @code{malloc()} is still around. (XEmacs knows whether it is being run | |
1047 as @file{xemacs} or @file{temacs} because it sets the global variable | |
1048 @code{initialized} to 1 after step (4) above.) At this point, | |
1049 @file{xemacs} calls a Lisp function to do any further initialization, | |
1050 which includes parsing the command-line (the C code can only do limited | |
1051 command-line parsing, which includes looking for the @samp{-batch} and | |
1052 @samp{-l} flags and a few other flags that it needs to know about before | |
1053 initialization is complete), creating the first frame (or @dfn{window} | |
1054 in standard window-system parlance), running the user's init file | |
1055 (usually the file @file{.emacs} in the user's home directory), etc. The | |
1056 function to do this is usually called @code{normal-top-level}; | |
1057 @file{loadup.el} tells the C code about this function by setting its | |
1058 name as the value of the Lisp variable @code{top-level}. | |
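
The role of the @code{initialized} flag can be sketched in C as follows. This is only a schematic model under stated assumptions: @code{startup()}, @code{run_step()}, and @code{steps_run} are hypothetical names, the steps are reduced to counters, and calling @code{startup()} a second time in the same process stands in for running the dumped executable, since in both cases static data keeps the values it had before.

```c
#include <assert.h>

/* Hypothetical model of the startup sequence; not the code in emacs.c. */
static int initialized = 0;  /* preserved as 1 inside the dumped image */
static int steps_run[5];     /* steps_run[n - 1] counts executions of step n */

static void run_step(int n) {
    assert(n >= 1 && n <= 5);
    steps_run[n - 1]++;
}

/* Returns 1 when behaving like temacs (full initialization followed by
   loading Lisp and dumping), 0 when behaving like a dumped xemacs. */
int startup(void) {
    int first_time = !initialized;

    run_step(1);             /* basic environment setup: always redone */
    if (first_time) {
        run_step(2);         /* initialize the Lisp interpreter */
        run_step(3);         /* set initial values of built-in variables */
    }
    run_step(4);             /* per-run environment setup: always redone */
    initialized = 1;
    if (first_time)
        run_step(5);         /* load the Lisp files and dump */
    return first_time;
}
```

After two calls, steps (1) and (4) have each run twice while steps (2), (3), and (5) have run only once, matching the description above.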
1059 | |
1060 When the Lisp initialization code is done, the C code enters the event | |
1061 loop, and stays there for the duration of the XEmacs process. The code | |
1062 for the event loop is contained in @file{keyboard.c}, and is called | |
1063 @code{Fcommand_loop_1()}. Note that this event loop could very well be | |
1064 written in Lisp, and in fact a Lisp version exists; but apparently, | |
1065 doing this makes XEmacs run noticeably slower. | |
1066 | |
1067 Notice how much of the initialization is done in Lisp, not in C. | |
1068 In general, XEmacs tries to move as much code as is possible | |
1069 into Lisp. Code that remains in C is code that implements the | |
1070 Lisp interpreter itself, or code that needs to be very fast, or | |
1071 code that needs to do system calls or other such stuff that | |
1072 needs to be done in C, or code that needs to have access to | |
1073 ``forbidden'' structures. (One conscious aspect of the design of | |
1074 Lisp under XEmacs is a clean separation between the external | |
1075 interface to a Lisp object's functionality and its internal | |
1076 implementation. Part of this design is that Lisp programs | |
1077 are forbidden from accessing the contents of the object other | |
1078 than through using a standard API. In this respect, XEmacs Lisp | |
1079 is similar to modern Lisp dialects but differs from GNU Emacs, | |
1080 which tends to expose the implementation and allow Lisp | |
1081 programs to look at it directly. The major advantage of | |
1082 hiding the implementation is that it allows the implementation | |
1083 to be redesigned without affecting any Lisp programs, including | |
1084 those that might want to be ``clever'' by looking directly at | |
1085 the object's contents and possibly manipulating them.) | |
1086 | |
1087 Moving code into Lisp makes the code easier to debug and maintain and | |
1088 makes it much easier for people who are not XEmacs developers to | |
1089 customize XEmacs, because they can make a change with much less chance | |
1090 of obscure and unwanted interactions occurring than if they were to | |
1091 change the C code. | |
1092 | |
1093 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top | |
1094 @chapter The XEmacs Object System (Abstractly Speaking) | |
1095 | |
1096 At the heart of the Lisp interpreter is its management of objects. | |
1097 XEmacs Lisp contains many built-in objects, some of which are | |
1098 simple and others of which can be very complex; and some of which | |
1099 are very common, and others of which are rarely used or are only | |
1100 used internally. (Since the Lisp allocation system, with its | |
1101 automatic reclamation of unused storage, is so much more convenient | |
1102 than @code{malloc()} and @code{free()}, the C code makes extensive use of it | |
1103 in its internal operations.) | |
1104 | |
1105 The basic Lisp objects are | |
1106 | |
1107 @table @code | |
1108 @item integer | |
1109 28 bits of precision, or 60 bits on 64-bit machines; the reason for this | |
1110 is described below when the internal Lisp object representation is | |
1111 described. | |
1112 @item float | |
1113 Same precision as a double in C. | |
1114 @item cons | |
1115 A simple container for two Lisp objects, used to implement lists and | |
1116 most other data structures in Lisp. | |
1117 @item char | |
1118 An object representing a single character of text; chars behave like | |
1119 integers in many ways but are logically considered text rather than | |
1120 numbers and have a different read syntax. (The read syntax for a char | |
1121 contains the char itself or some textual encoding of it -- for example, | |
1122 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the | |
1123 ISO-2022 encoding standard -- rather than the numerical representation | |
1124 of the char; this way, if the mapping between chars and integers | |
1125 changes, which is quite possible for Kanji characters and other extended | |
1126 characters, the same character will still be created. Note that some | |
1127 primitives confuse chars and integers. The worst culprit is @code{eq}, | |
1128 which makes a special exception and considers a char to be @code{eq} to | |
1129 its integer equivalent, even though in no other case are objects of two | |
1130 different types @code{eq}. The reason for this monstrosity is | |
1131 compatibility with existing code; the separation of char from integer | |
1132 came fairly recently.) | |
1133 @item symbol | |
1134 An object that contains Lisp objects and is referred to by name; | |
1135 symbols are used to implement variables and named functions | |
1136 and to provide the equivalent of preprocessor constants in C. | |
1137 @item vector | |
1138 A one-dimensional array of Lisp objects providing constant-time access | |
1139 to any of the objects; access to an arbitrary object in a vector is | |
1140 faster than for lists, but the operations that can be done on a vector | |
1141 are more limited. | |
1142 @item string | |
1143 Self-explanatory; behaves much like a vector of chars | |
1144 but has a different read syntax and is stored and manipulated | |
1145 more compactly and efficiently. | |
1146 @item bit-vector | |
1147 A vector of bits; similar to a string in spirit. | |
1148 @item compiled-function | |
1149 An object describing compiled Lisp code, known as @dfn{byte code}. | |
1150 @item subr | |
1151 An object describing a Lisp primitive. | |
1152 @end table | |
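
To see why an integer might have 28 bits of precision on a 32-bit machine, consider a scheme in which a few bits of every machine word are set aside for bookkeeping. The C sketch below assumes that four bits are reserved and that they sit at the low end of the word; the tag value, the bit positions, and all of the names are hypothetical choices made for illustration, and the actual layout is the one given later in this manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 4 bookkeeping bits in the low end of a 32-bit word. */
#define TAG_BITS   4
#define VALUE_BITS (32 - TAG_BITS)          /* 28 bits left for the integer */
#define TAG_MASK   ((1u << TAG_BITS) - 1)
#define TAG_INT    1u                       /* arbitrary illustrative tag */

#define MOST_POSITIVE_FIXNUM ((int32_t)((1u << (VALUE_BITS - 1)) - 1))
#define MOST_NEGATIVE_FIXNUM (-MOST_POSITIVE_FIXNUM - 1)

typedef uint32_t lisp_word;

/* Pack a signed integer into the upper 28 bits of a tagged word. */
lisp_word make_int(int32_t n) {
    assert(n >= MOST_NEGATIVE_FIXNUM && n <= MOST_POSITIVE_FIXNUM);
    return ((uint32_t)n << TAG_BITS) | TAG_INT;
}

/* Query the type of a word, as dynamic typing requires. */
int is_int(lisp_word w) {
    return (w & TAG_MASK) == TAG_INT;
}

/* Recover the integer, sign-extending from 28 bits. */
int32_t int_value(lisp_word w) {
    uint32_t payload = w >> TAG_BITS;           /* zero-extended 28 bits */
    uint32_t sign = 1u << (VALUE_BITS - 1);
    assert(is_int(w));
    return (int32_t)(payload ^ sign) - (int32_t)sign;
}
```

Under these assumptions the representable range is -134217728 through 134217727. The same arithmetic with a 64-bit word leaves 60 bits for the value.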
1153 | |
1154 @cindex closure | |
1155 Note that there is no basic ``function'' type, as in more powerful | |
1156 versions of Lisp (where it's called a @dfn{closure}). XEmacs Lisp does | |
1157 not provide the closure semantics implemented by Common Lisp and Scheme. | |
1158 The guts of a function in XEmacs Lisp are represented in one of four | |
1159 ways: a symbol specifying another function (when one function is an | |
1160 alias for another), a list containing the function's source code, a | |
1161 compiled-function object, or a subr object. (In other words, given a symbol | |
1162 specifying the name of a function, calling @code{symbol-function} to | |
1163 retrieve the contents of the symbol's function cell will return one of | |
1164 these types of objects.) | |
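
Because a function cell may hold a symbol naming another function, finding the code that will actually run can require following a chain of aliases. The following C sketch models that lookup. The structure, the enumeration, and @code{resolve_function()} are hypothetical, and the step limit used to catch circular aliases is an assumption of this sketch rather than a statement about how XEmacs detects them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of what a symbol's function cell can contain. */
enum function_kind {
    FN_NONE,          /* no function definition */
    FN_SYMBOL_ALIAS,  /* a symbol specifying another function */
    FN_LAMBDA_LIST,   /* a list containing the function's source code */
    FN_COMPILED,      /* a compiled-function (byte code) object */
    FN_SUBR           /* a primitive implemented in C */
};

struct symbol {
    const char *name;
    enum function_kind kind;
    const struct symbol *alias;   /* meaningful only for FN_SYMBOL_ALIAS */
};

#define MAX_ALIAS_STEPS 100       /* arbitrary guard against alias cycles */

/* Follow aliases until reaching a symbol whose function cell holds real
   code.  Returns NULL for an undefined function or a circular chain. */
const struct symbol *resolve_function(const struct symbol *sym) {
    int steps = 0;
    assert(sym != NULL);
    while (sym->kind == FN_SYMBOL_ALIAS) {
        if (sym->alias == NULL || ++steps > MAX_ALIAS_STEPS)
            return NULL;
        sym = sym->alias;
    }
    return sym->kind == FN_NONE ? NULL : sym;
}
```

For example, if a symbol @code{first} is an alias for a subr named @code{car}, resolving @code{first} yields the @code{car} symbol, while a symbol aliased to itself resolves to nothing.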
1165 | |
1166 XEmacs Lisp also contains numerous specialized objects used to | |
1167 implement the editor: | |
1168 | |
1169 @table @asis | |
1170 @item buffer | |
1171 Stores text like a string, but is optimized for insertion and deletion | |
1172 and has certain other properties that can be set. | |
1173 @item frame | |
1174 An object with various properties whose displayable representation is a | |
1175 @dfn{window} in window-system parlance. | |
1176 @item window | |
1177 A section of a frame that displays the contents of a buffer; | |
1178 often called a @dfn{pane} in window-system parlance. | |
1179 @item window-configuration | |
1180 An object that represents a saved configuration of windows in a frame. | |
1181 @item device | |
1182 An object representing a screen on which frames can be displayed; | |
1183 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in | |
1184 character mode. | |
1185 @item face | |
1186 An object specifying the appearance of text or graphics; it contains | |
1187 characteristics such as font, foreground color, and background color. | |
1188 @item marker | |
1189 An object that refers to a particular position in a buffer and moves | |
1190 around as text is inserted and deleted to stay in the same relative | |
1191 position to the text around it. | |
1192 @item extent | |
1193 Similar to a marker but covers a range of text in a buffer; can also | |
1194 specify properties of the text, such as a face in which the text is to | |
1195 be displayed, whether the text is invisible or unmodifiable, etc. | |
1196 @item event | |
1197 Generated by calling @code{next-event} and contains information | |
1198 describing a particular event happening in the system, such as the user | |
1199 pressing a key or a process terminating. | |
1200 @item keymap | |
1201 An object that maps from events (described using lists, vectors, and | |
1202 symbols rather than with an event object because the mapping is for | |
1203 classes of events, rather than individual events) to functions to | |
1204 execute or other events to recursively look up; the functions are | |
1205 described by name, using a symbol, or using lists to specify the | |
1206 function's code. | |
1207 @item glyph | |
1208 An object that describes the appearance of an image (e.g. pixmap) on | |
1209 the screen; glyphs can be attached to the beginning or end of extents | |
1210 and in some future version of XEmacs will be able to be inserted | |
1211 directly into a buffer. | |
1212 @item process | |
1213 An object that describes a connection to an externally-running process. | |
1214 @end table | |
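
The way a marker stays in the same relative position as text is inserted and deleted can be sketched in C. This is a simplified model with hypothetical names, not the XEmacs implementation. It assumes zero-based character positions, and it assumes that text inserted exactly at a marker's position goes after the marker, so the marker does not move.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: just a character position within one buffer. */
struct marker { size_t pos; };

/* `count' characters were inserted at position `at'. */
void marker_adjust_for_insert(struct marker *m, size_t at, size_t count) {
    assert(m != NULL);
    if (m->pos > at)
        m->pos += count;     /* text before the marker grew */
}

/* The characters in the half-open range [from, to) were deleted. */
void marker_adjust_for_delete(struct marker *m, size_t from, size_t to) {
    assert(m != NULL && from <= to);
    if (m->pos >= to)
        m->pos -= to - from; /* the whole deletion was before the marker */
    else if (m->pos > from)
        m->pos = from;       /* the marker was inside the deleted text */
}
```

A marker at position 10 moves to 13 after three characters are inserted at position 4, and collapses to position 8 if the range from 8 to 12 is then deleted around it.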
1215 | |
1216 There are some other, less-commonly-encountered general objects: | |
1217 | |
1218 @table @asis | |
1219 @item hashtable | |
1220 An object that maps from an arbitrary Lisp object to another arbitrary | |
1221 Lisp object, using hashing for fast lookup. | |
1222 @item obarray | |
1223 A limited form of hashtable that maps from strings to symbols; obarrays | |
1224 are used to look up a symbol given its name and are not actually their | |
1225 own object type but are kludgily represented using vectors with hidden | |
1226 fields (this representation derives from GNU Emacs). | |
1227 @item specifier | |
1228 A complex object used to specify the value of a display property; a | |
1229 default value is given and different values can be specified for | |
1230 particular frames, buffers, windows, devices, or classes of device. | |
1231 @item char-table | |
1232 An object that maps from chars or classes of chars to arbitrary Lisp | |
1233 objects; internally char tables use a complex nested-vector | |
1234 representation that is optimized to the way characters are represented | |
1235 as integers. | |
1236 @item range-table | |
1237 An object that maps from ranges of integers to arbitrary Lisp objects. | |
1238 @end table | |
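
As an illustration of what a range table does, the following C sketch looks up a key in a sorted array of non-overlapping, inclusive integer ranges using binary search. The representation and the names are assumptions made for this example; nothing here describes how XEmacs actually stores range tables, and plain @code{int} values stand in for arbitrary Lisp objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: the inclusive range [first, last] maps to `value'. */
struct range_entry { long first; long last; int value; };

/* Look up `key' in a table sorted by `first' whose ranges do not overlap.
   Returns the matching entry, or NULL if no range contains the key. */
const struct range_entry *range_table_lookup(const struct range_entry *table,
                                             size_t count, long key) {
    size_t lo = 0, hi = count;
    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (key < table[mid].first)
            hi = mid;            /* key lies below this range */
        else if (key > table[mid].last)
            lo = mid + 1;        /* key lies above this range */
        else
            return &table[mid];  /* first <= key <= last */
    }
    return NULL;
}
```

With the ranges 0 to 9, 20 to 29, and 30 to 30, the key 25 finds the second entry and the key 15 finds nothing.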
1239 | |
1240 And some strange special-purpose objects: | |
1241 | |
1242 @table @asis | |
1243 @item charset | |
1244 @itemx coding-system | |
1245 Objects used when MULE, or multi-lingual/Asian-language, support is | |
1246 enabled. | |
1247 @item color-instance | |
1248 @itemx font-instance | |
1249 @itemx image-instance | |
1250 Objects that encapsulate window-system resources; instances are | |
1251 mostly used internally but are exposed on the Lisp level for cleanness | |
1252 of the specifier model and because it's occasionally useful for Lisp | |
1253 programs to create or query the properties of instances. | |
1254 @item subwindow | |
1255 An object that encapsulates a @dfn{subwindow} resource, i.e. a | |
1256 window-system child window that is drawn into by an external process; | |
1257 this object should be integrated into the glyph system but isn't yet, | |
1258 and may change form when this is done. | |
1259 @item tooltalk-message | |
1260 @itemx tooltalk-pattern | |
1261 Objects that represent resources used in the ToolTalk interprocess | |
1262 communication protocol. | |
1263 @item toolbar-button | |
1264 An object used in conjunction with the toolbar. | |
1265 @item x-resource | |
1266 An object that encapsulates certain miscellaneous resources in the X | |
1267 window system, used only when Epoch support is enabled. | |
1268 @end table | |
1269 | |
1270 And objects that are only used internally: | |
1271 | |
1272 @table @asis | |
1273 @item opaque | |
1274 A generic object for encapsulating arbitrary memory; this allows you the | |
1275 generality of @code{malloc()} and the convenience of the Lisp object | |
1276 system. | |
1277 @item lstream | |
1278 A buffering I/O stream, used to provide a unified interface to anything | |
1279 that can accept output or provide input, such as a file descriptor, a | |
1280 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.; | |
1281 it's a Lisp object to make its memory management more convenient. | |
1282 @item char-table-entry | |
1283 Subsidiary objects in the internal char-table representation. | |
1284 @item extent-auxiliary | |
1285 @itemx menubar-data | |
1286 @itemx toolbar-data | |
1287 Various special-purpose objects that are basically just used to | |
1288 encapsulate memory for particular subsystems, similar to the more | |
1289 general ``opaque'' object. | |
1290 @item symbol-value-forward | |
1291 @itemx symbol-value-buffer-local | |
1292 @itemx symbol-value-varalias | |
1293 @itemx symbol-value-lisp-magic | |
1294 Special internal-only objects that are placed in the value cell of a | |
1295 symbol to indicate that there is something special with this variable -- | |
1296 e.g. it has no value, it mirrors another variable, or it mirrors some C | |
1297 variable; there is really only one kind of object, called a | |
1298 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into | |
1299 semi-different object types. | |
1300 @end table | |
1301 | |
1302 @cindex permanent objects | |
1303 @cindex temporary objects | |
1304 Some types of objects are @dfn{permanent}, meaning that once created, | |
1305 they do not disappear until explicitly destroyed, using a function such | |
1306 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc. | |
1307 Others will disappear once they are no longer used, through the garbage | |
1308 collection mechanism. Buffers, frames, windows, devices, and processes | |
1309 are among the objects that are permanent. Note that some objects can go | |
1310 both ways: Faces can be created either way; extents are normally | |
1311 permanent, but detached extents (extents not referring to any text, as | |
1312 happens to some extents when the text they are referring to is deleted) | |
1313 are temporary. Note that some permanent objects, such as faces and | |
1314 coding systems, cannot be deleted. Note also that windows are unique in | |
1315 that they can be @emph{undeleted} after having previously been | |
1316 deleted. (This happens as a result of restoring a window configuration.) | |
1317 | |
1318 @cindex read syntax | |
1319 Note that many types of objects have a @dfn{read syntax}, i.e. a way of | |
1320 specifying an object of that type in Lisp code. When you load a Lisp | |
1321 file, or type in code to be evaluated, what really happens is that the | |
1322 function @code{read} is called, which reads some text and creates an object | |
1323 based on the syntax of that text; then @code{eval} is called, which | |
1324 possibly does something special; then this loop repeats until there's | |
1325 no more text to read. (@code{eval} only actually does something special | |
1326 with symbols, which causes the symbol's value to be returned, | |
1327 similar to referencing a variable; and with conses [i.e. lists], | |
1328 which cause a function invocation. All other values are returned | |
1329 unchanged.) | |
1330 | |
1331 The read syntax | |
1332 | |
1333 @example | |
1334 17297 | |
1335 @end example | |
1336 | |
1337 converts to an integer whose value is 17297. | |
1338 | |
1339 @example | |
1340 1.983e-4 | |
1341 @end example | |
1342 | |
1343 converts to a float whose value is 1.983e-4, or .0001983. | |
1344 | |
1345 @example | |
1346 ?b | |
1347 @end example | |
1348 | |
1349 converts to a char that represents the lowercase letter b. | |
1350 | |
1351 @example | |
1352 ?^[$(B#&^[(B | |
1353 @end example | |
1354 | |
1355 (where @samp{^[} actually is an @samp{ESC} character) converts to a | |
1356 particular Kanji character. (To decode this gook: @samp{ESC} begins an | |
1357 escape sequence; @samp{ESC $ (} is a class of escape sequences meaning | |
1358 ``switch to a 94x94 character set''; @samp{ESC $ ( B} means ``switch to | |
1359 Japanese Kanji''; @samp{#} and @samp{&} collectively index into a | |
1360 94-by-94 array of characters [subtract 33 from the ASCII value of each | |
1361 character to get the corresponding index]; @samp{ESC (} is a class of | |
1362 escape sequences meaning ``switch to a 94 character set''; @samp{ESC (B} | |
1363 means ``switch to US ASCII''. It is a coincidence that the letter | |
1364 @samp{B} is used to denote both Japanese Kanji and US ASCII. If the | |
1365 first @samp{B} were replaced with an @samp{A}, you'd be requesting a | |
1366 Chinese Hanzi character from the GB2312 character set.) | |
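
The index arithmetic described above (subtract 33 from each byte) can be
sketched in C as follows. The helper name @code{toy_index_94} is
hypothetical and does not come from the XEmacs sources; it only
illustrates how the two bytes @samp{#} and @samp{&} turn into a row and
a column of the 94-by-94 array.

```c
#include <assert.h>

/* Hypothetical helper: map one byte of a 94x94 character code to its
   zero-based row or column index.  Valid bytes are the 94 printing
   ASCII characters 33..126; anything else yields -1. */
static int
toy_index_94 (unsigned char byte)
{
  if (byte < 33 || byte > 126)
    return -1;
  return byte - 33;
}
```

For the example above, @samp{#} (ASCII 35) gives index 2 and @samp{&}
(ASCII 38) gives index 5.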
1367 | |
1368 @example | |
1369 "foobar" | |
1370 @end example | |
1371 | |
1372 converts to a string. | |
1373 | |
1374 @example | |
1375 foobar | |
1376 @end example | |
1377 | |
1378 converts to a symbol whose name is @code{"foobar"}. This is done by | |
1379 looking up the string equivalent in the global variable | |
1380 @code{obarray}, whose contents should be an obarray. If no symbol | |
1381 is found, a new symbol with the name @code{"foobar"} is automatically | |
1382 created and added to @code{obarray}; this process is called | |
1383 @dfn{interning} the symbol. | |
1384 @cindex interning | |
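
Interning can be pictured as a lookup-or-create operation on a hash
table keyed by name. The following is only a minimal sketch under that
assumption: the names @code{toy_symbol}, @code{toy_obarray} and
@code{toy_intern} are invented for illustration, and the real obarray
code differs in its details.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 64

/* Hypothetical stand-in for a symbol: just a name and a bucket link. */
struct toy_symbol
{
  char *name;
  struct toy_symbol *next;      /* next symbol in the same bucket */
};

static struct toy_symbol *toy_obarray[TOY_OBARRAY_SIZE];

static unsigned int
toy_hash (const char *name)
{
  unsigned int h = 0;
  while (*name)
    h = h * 31 + (unsigned char) *name++;
  return h % TOY_OBARRAY_SIZE;
}

/* Return the unique symbol called NAME, creating it on first use. */
static struct toy_symbol *
toy_intern (const char *name)
{
  unsigned int bucket = toy_hash (name);
  struct toy_symbol *sym;

  for (sym = toy_obarray[bucket]; sym; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;               /* already interned */

  sym = malloc (sizeof (*sym));
  assert (sym != NULL);
  sym->name = malloc (strlen (name) + 1);
  assert (sym->name != NULL);
  strcpy (sym->name, name);
  sym->next = toy_obarray[bucket];
  toy_obarray[bucket] = sym;
  return sym;
}
```

The property that matters is that reading the same name twice yields the
very same object, so symbols can be compared by pointer.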
1385 | |
1386 @example | |
1387 (foo . bar) | |
1388 @end example | |
1389 | |
1390 converts to a cons cell containing the symbols @code{foo} and @code{bar}. | |
1391 | |
1392 @example | |
1393 (1 a 2.5) | |
1394 @end example | |
1395 | |
1396 converts to a three-element list containing the specified objects | |
1397 (note that a list is actually a set of nested conses; see the | |
1398 XEmacs Lisp Reference). | |
1399 | |
1400 @example | |
1401 [1 a 2.5] | |
1402 @end example | |
1403 | |
1404 converts to a three-element vector containing the specified objects. | |
1405 | |
1406 @example | |
1407 #[... ... ... ...] | |
1408 @end example | |
1409 | |
1410 converts to a compiled-function object (the actual contents are not | |
1411 shown since they are not relevant here; look at a file that ends with | |
1412 @file{.elc} for examples). | |
1413 | |
1414 @example | |
1415 #*01110110 | |
1416 @end example | |
1417 | |
1418 converts to a bit-vector. | |
1419 | |
1420 @example | |
1421 #s(range-table ... ...) | |
1422 @end example | |
1423 | |
1424 converts to a range table (the actual contents are not shown). | |
1425 | |
1426 @example | |
1427 #s(char-table ... ...) | |
1428 @end example | |
1429 | |
1430 converts to a char table (the actual contents are not shown). | |
1431 (Note that the #s syntax is the general syntax for structures, | |
1432 which are not really implemented in XEmacs Lisp but should be.) | |
1433 | |
1434 When an object is printed out (using @code{print} or a related | |
1435 function), the read syntax is used, so that the same object can be read | |
1436 in again. | |
1437 | |
1438 The other objects do not have read syntaxes, usually because it does | |
1439 not really make sense to create them in this fashion (e.g. processes, | |
1440 where it doesn't make sense to have a subprocess created as a side | |
1441 effect of reading some Lisp code), or because they can't be created at | |
1442 all (e.g. subrs). Permanent objects, as a rule, do not have a read | |
1443 syntax; nor do most complex objects, which contain too much state to be | |
1444 easily initialized through a read syntax. | |
1445 | |
1446 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top | |
1447 @chapter How Lisp Objects Are Represented in C | |
1448 | |
1449 Lisp objects are represented in C using a 32- or 64-bit machine word | |
1450 (depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and | |
1451 most other processors use 32-bit Lisp objects). The representation | |
1452 stuffs a pointer together with a tag, as follows: | |
1453 | |
1454 @example | |
1455 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] | |
1456 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] | |
1457 | |
1458   ^ <---> <-----------------------------------------------------> | |
1459   |  tag          a pointer to a structure, or an integer | |
1460   | | |
1461   `---> mark bit | |
1462 @end example | |
1463 | |
1464 The tag describes the type of the Lisp object. For integers and | |
1465 chars, the lower 28 bits contain the value of the integer or char; for | |
1466 all others, the lower 28 bits contain a pointer. The mark bit is used | |
1467 during garbage-collection, and is always 0 when garbage collection is | |
1468 not happening. Many macros that extract out parts of a Lisp object | |
1469 expect that the mark bit is 0, and will produce incorrect results if | |
1470 it's not. (The way that garbage collection works, basically, is that it | |
1471 loops over all places where Lisp objects could exist -- this includes | |
1472 all global variables in C that contain Lisp objects [including | |
1473 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all | |
1474 Lisp variables will get marked], plus various other places -- and | |
1475 recursively scans through the Lisp objects, marking each object it finds | |
1476 by setting the mark bit. Then it goes through the lists of all objects | |
1477 allocated, freeing the ones that are not marked and turning off the | |
1478 mark bit of the ones that are marked.) | |
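
The 32-bit layout just described (mark bit in bit 31, a three-bit tag in
bits 30 to 28, and a 28-bit value field) can be sketched with plain
shifts and masks. All names beginning with @code{toy_} or @code{TOY_}
are hypothetical; the real macros live in @file{lisp.h} and are not
reproduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_object;    /* stand-in for a 32-bit Lisp_Object */

#define TOY_VALBITS 28
#define TOY_TAGBITS 3
#define TOY_VALMASK ((UINT32_C (1) << TOY_VALBITS) - 1)
#define TOY_TAGMASK ((UINT32_C (1) << TOY_TAGBITS) - 1)
#define TOY_MARKBIT (UINT32_C (1) << 31)

/* Build an object from a tag and a 28-bit value; the mark bit is 0. */
static toy_object
toy_make (unsigned int tag, uint32_t value)
{
  assert (tag <= TOY_TAGMASK);
  return ((uint32_t) tag << TOY_VALBITS) | (value & TOY_VALMASK);
}

static unsigned int
toy_tag (toy_object obj)
{
  return (obj >> TOY_VALBITS) & TOY_TAGMASK;
}

static uint32_t
toy_value (toy_object obj)
{
  return obj & TOY_VALMASK;
}

/* What the garbage collector does to an object it has reached ... */
static toy_object
toy_mark (toy_object obj)
{
  return obj | TOY_MARKBIT;
}

/* ... and what the sweep phase does to a surviving object. */
static toy_object
toy_unmark (toy_object obj)
{
  return obj & ~TOY_MARKBIT;
}

static int
toy_marked_p (toy_object obj)
{
  return (obj & TOY_MARKBIT) != 0;
}
```

Note that a comparison such as @code{toy_mark (obj) == obj} fails in
this sketch, which is one way to see why code that runs outside of
garbage collection expects the mark bit to be 0.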
1479 | |
1480 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type | |
1481 used for the Lisp object can vary. It can be either a simple type | |
1482 (@code{long} on the DEC Alpha, @code{int} on other machines) or a | |
1483 structure whose fields are bit fields that line up properly (actually, | |
1484 it's a union of structures that's used). Generally the simple integral | |
1485 type is preferable because it ensures that the compiler will actually | |
1486 use a machine word to represent the object (some compilers will use more | |
1487 general and less efficient code for unions and structs even if they can | |
1488 fit in a machine word). The union type, however, has the advantage of | |
1489 stricter type checking (if you accidentally pass an integer where a Lisp | |
1490 object is desired, you get a compile error), and it makes it easier to | |
1491 decode Lisp objects when debugging. The choice of which type to use is | |
1492 determined by the presence or absence of the preprocessor constant | |
1493 @code{NO_UNION_TYPE}. (Shouldn't it be @code{USE_UNION_TYPE}, with | |
1494 opposite semantics? ``Hysterical reasons'', of course.) | |
1495 | |
1496 @cindex record type | |
1497 Note that there are only eight types that the tag can represent, | |
1498 but many more actual types than this. This is handled by having | |
1499 one of the tag types specify a meta-object called a @dfn{record}; | |
1500 for all such objects, the first four bytes of the pointed-to | |
1501 structure indicate what the actual type is. | |
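
A rough sketch of the record idea follows: every record structure starts
with a small header, and code that only knows it has ``some record''
reads the header to find the actual type. The structure and enumeration
names are hypothetical, and the header is reduced to a single 32-bit
type code for the sake of the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes stored in the first four bytes of a record. */
enum toy_record_type { TOY_RECORD_BUFFER, TOY_RECORD_WINDOW };

struct toy_record_header
{
  int32_t type;
};

/* Every record structure begins with the header. */
struct toy_buffer
{
  struct toy_record_header header;
  const char *name;
};

struct toy_window
{
  struct toy_record_header header;
  int width;
};

/* Given a pointer to any record, report which kind it is by looking
   only at the header at the start of the structure. */
static const char *
toy_record_type_name (const void *record)
{
  const struct toy_record_header *header = record;

  assert (record != NULL);
  switch (header->type)
    {
    case TOY_RECORD_BUFFER:
      return "buffer";
    case TOY_RECORD_WINDOW:
      return "window";
    default:
      return "unknown";
    }
}
```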
1502 | |
1503 Note also that having 28 bits for pointers and integers restricts a | |
1504 lot of things to 256 megabytes of memory. (Basically, enough pointers | |
1505 and indices and whatnot get stuffed into Lisp objects that the total | |
1506 amount of memory used by XEmacs can't grow above 256 megabytes. In | |
1507 older versions of XEmacs and GNU Emacs, the tag was 5 bits wide, | |
1508 allowing for 32 types, which was more than the actual number of types | |
1509 that existed at the time, and no ``record'' type was necessary. | |
1510 However, this limited the editor to 64 megabytes total, which some users | |
1511 who edited large files might conceivably exceed.) | |
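
The two limits quoted above follow directly from the number of bits left
over for the value field. As a small worked check (the function name is
made up for this example):

```c
#include <assert.h>

/* Number of megabytes addressable by the value field of a Lisp object
   that is WORD_BITS wide and spends one bit on the mark bit and
   TAG_BITS bits on the tag. */
static unsigned long long
toy_addressable_megabytes (int word_bits, int tag_bits)
{
  int value_bits = word_bits - 1 - tag_bits;

  assert (value_bits >= 20 && value_bits < 64);
  return (1ULL << value_bits) >> 20;    /* 2^20 bytes per megabyte */
}
```

With a 32-bit word, a 3-bit tag leaves 28 bits, or 256 megabytes; a
5-bit tag leaves 26 bits, or 64 megabytes.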
1512 | |
1513 Also, note that there is an implicit assumption here that all pointers | |
1514 are low enough that the top bits are all zero and can just be chopped | |
1515 off. On standard machines that allocate memory from the bottom up (and | |
1516 give each process its own address space), this works fine. Some | |
1517 machines, however, put the data space somewhere else in memory | |
1518 (e.g. beginning at 0x80000000). Those machines cope by defining | |
1519 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to | |
1520 the proper mask. Then, pointers retrieved from Lisp objects are | |
1521 automatically OR'ed with this value prior to being used. | |
1522 | |
1523 A corollary of the previous paragraph is that @strong{stack-allocated | |
1524 structures cannot be put into Lisp objects}. The stack is generally | |
1525 located near the top of memory; if you put such a pointer into a Lisp | |
1526 object, it will get its top bits chopped off, and you will lose. | |
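
The effect of chopping off the top bits and OR'ing them back in can be
sketched as follows, assuming a 32-bit machine whose data space begins
at 0x80000000. The macro @code{TOY_DATA_SEG_BITS} and both functions are
hypothetical stand-ins, and addresses are modelled as plain 32-bit
integers so that the example does not depend on the host machine.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VALMASK       UINT32_C (0x0FFFFFFF)  /* low 28 bits */
#define TOY_DATA_SEG_BITS UINT32_C (0x80000000)  /* assumed data base */

/* Storing a pointer keeps only the low 28 bits of the address. */
static uint32_t
toy_pack_pointer (uint32_t address)
{
  return address & TOY_VALMASK;
}

/* Retrieving it ORs the fixed high bits back in. */
static uint32_t
toy_unpack_pointer (uint32_t field)
{
  return (field & TOY_VALMASK) | TOY_DATA_SEG_BITS;
}
```

An address inside the data space, such as 0x80012340, survives the round
trip. A stack address such as 0xEFFFF000 comes back as 0x8FFFF000, which
is the failure the warning about stack-allocated structures refers to.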
1527 | |
1528 Various macros are used to construct Lisp objects and extract the | |
1529 components. Macros of the form @code{XINT()}, @code{XCHAR()}, | |
1530 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer | |
1531 field and cast it to the appropriate type. All of the macros that | |
1532 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if | |
1533 necessary. @code{XINT()} needs to be a bit tricky so that negative | |
1534 numbers are properly sign-extended: Usually it does this by shifting the | |
1535 number four bits to the left and then four bits to the right. This | |
1536 assumes that the right-shift operator does an arithmetic shift (i.e. it | |
1537 leaves the most-significant bit as-is rather than shifting in a zero, so | |
1538 that it mimics a divide-by-two even for negative numbers). Not all | |
1539 machines/compilers do this, and on the ones that don't, a more | |
1540 complicated definition is selected by defining | |
1541 @code{EXPLICIT_SIGN_EXTEND}. | |
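
Both approaches to sign extension can be sketched for a 28-bit value
field in a 32-bit word. The function names are hypothetical, and the
tag value 1 used for integers is an arbitrary choice for the example.
The shift-based version relies on implementation-defined behavior
(arithmetic right shift of negative values), which is exactly the
assumption discussed above; the explicit version does not.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VALMASK UINT32_C (0x0FFFFFFF)   /* low 28 bits */

/* Pack a small integer under an arbitrary tag (here 1) in bits 30..28. */
static uint32_t
toy_make_int (int32_t n)
{
  assert (n >= -134217728 && n <= 134217727);   /* must fit in 28 bits */
  return (UINT32_C (1) << 28) | ((uint32_t) n & TOY_VALMASK);
}

/* Shift left to discard the mark and tag bits, then shift right to
   bring the value back.  Correct only if >> on a negative signed value
   is an arithmetic shift. */
static int32_t
toy_xint_shift (uint32_t obj)
{
  return (int32_t) (uint32_t) (obj << 4) >> 4;
}

/* The kind of definition needed when the shift is not arithmetic:
   test the sign bit of the field (bit 27) and adjust by hand. */
static int32_t
toy_xint_explicit (uint32_t obj)
{
  uint32_t field = obj & TOY_VALMASK;

  if (field & UINT32_C (0x08000000))
    return (int32_t) field - 0x10000000;
  return (int32_t) field;
}
```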
1542 | |
1543 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor | |
1544 macros become more complicated -- they check the tag bits and/or the | |
1545 type field in the first four bytes of a record type to ensure that the | |
1546 object is really of the correct type. This is great for catching places | |
1547 where an incorrect type is being dereferenced -- this typically results | |
1548 in a pointer being dereferenced as the wrong type of structure, with | |
1549 unpredictable (and sometimes not easily traceable) results. | |
1550 | |
1551 There are similar @code{XSET()} macros that construct a Lisp object. | |
1552 These macros are of the form @code{XSET (@var{lvalue}, @var{result})}, | |
1553 i.e. they have to be a statement rather than just used in an expression. | |
1554 The reason for this is that standard C doesn't let you ``construct'' a | |
1555 structure (but GCC does). Granted, this sometimes isn't too convenient; | |
1556 for the case of integers, at least, you can use the function | |
1557 @code{make_number()}, which constructs and @emph{returns} an integer | |
1558 Lisp object. Note that the @code{XSET()} macros are also affected by | |
1559 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the right | |
1560 type in the case of record types, where the type is contained in | |
1561 the structure. | |
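
The difference between a statement-style setter and a function that
returns an object can be sketched with a structure type. Everything
here is hypothetical (including the two-field layout, which does not
mirror the real bit fields); the point is only that the macro must be
used as a statement on an lvalue, while the function can appear inside
an expression.

```c
#include <assert.h>

enum toy_tag { TOY_INT, TOY_STRING };

/* Simplified structure-style object; not the real layout. */
struct toy_object
{
  enum toy_tag tag;
  long val;
};

/* Statement form: fills in an existing lvalue. */
#define TOY_XSETINT(lvalue, n)  \
  do                            \
    {                           \
      (lvalue).tag = TOY_INT;   \
      (lvalue).val = (n);       \
    }                           \
  while (0)

/* Expression form: constructs and returns an integer object. */
static struct toy_object
toy_make_number (long n)
{
  struct toy_object obj;

  TOY_XSETINT (obj, n);
  return obj;
}
```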
1562 | |
1563 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top | |
1564 @chapter Rules When Writing New C Code | |
1565 | |
1566 The XEmacs C code is extremely complex and intricate, and there are | |
1567 many rules that are more or less consistently followed throughout the code. | |
1568 Many of these rules are not obvious, so they are explained here. It is | |
1569 of the utmost importance that you follow them. If you don't, you may get | |
1570 something that appears to work, but which will crash in odd situations, | |
1571 often in code far away from where the actual breakage is. | |
1572 | |
1573 @menu | |
1574 * General Coding Rules:: | |
1575 * Writing Lisp Primitives:: | |
1576 * Adding Global Lisp Variables:: | |
1577 @end menu | |
1578 | |
1579 @node General Coding Rules | |
1580 @section General Coding Rules | |
1581 | |
1582 Almost every module contains a @code{syms_of_*()} function and a | |
1583 @code{vars_of_*()} function. The former declares any Lisp primitives | |
1584 you have defined and defines any symbols you will be using. The latter | |
1585 declares any global Lisp variables you have added and initializes global | |
1586 C variables in the module. For each such function, declare it in | |
1587 @file{symsinit.h} and make sure it's called in the appropriate place in | |
1588 @code{main()}. @strong{Important}: There are stringent requirements on | |
1589 exactly what can go into these functions. See the comment in | |
1590 @code{main()}. The reason for this is to avoid obscure unwanted | |
1591 interactions during initialization. If you don't follow these rules, | |
1592 you'll be sorry! If you want to do anything that isn't allowed, create | |
1593 a @code{complex_vars_of_*()} function for it. Doing this is tricky, | |
1594 though: You have to make sure your function is called at the right time | |
1595 so that all the initialization dependencies work out. | |
1596 | |
1597 Every module includes @file{<config.h>} (angle brackets so that | |
1598 @samp{--srcdir} works correctly) and @file{lisp.h}. @file{config.h} | |
1599 should always be included before any other header files (including | |
1600 system header files) to ensure that certain tricks played by various | |
1601 @file{s/} and @file{m/} files work out correctly. | |
1602 | |
1603 @strong{All global and static variables that are to be modifiable must | |
1604 be declared uninitialized.} This means that you may not use the ``declare | |
1605 with initializer'' form for these variables, such as @code{int | |
1606 some_variable = 0;}. The reason for this has to do with some kludges | |
1607 done during the dumping process: If possible, the initialized data | |
1608 segment is re-mapped so that it becomes part of the (unmodifiable) code | |
1609 segment in the dumped executable. This allows this memory to be shared | |
1610 among multiple running XEmacs processes. XEmacs is careful to place as | |
1611 much constant data as possible into initialized variables (in | |
1612 particular, into what's called the @dfn{pure space} -- see below) during | |
1613 the @file{temacs} phase. | |
1614 | |
1615 @cindex copy-on-write | |
1616 @strong{Note:} This kludge only works on a few systems nowadays, and is | |
1617 rapidly becoming irrelevant because most modern operating systems provide | |
1618 @dfn{copy-on-write} semantics. All data is initially shared between | |
1619 processes, and a private copy is automatically made (on a page-by-page | |
1620 basis) when a process first attempts to write to a page of memory. | |
1621 | |
1622 Formerly, there was a requirement that static variables not be | |
1623 declared inside of functions. This had to do with another hack in | |
1624 the same vein as what was just described: old USG systems put | |
1625 statically-declared variables in the initialized data space, so those | |
1626 header files had a @code{#define static} declaration. (That way, the | |
1627 data-segment remapping described above could still work.) This fails | |
1628 badly on static variables inside of functions, which suddenly become | |
1629 automatic variables; therefore, you weren't supposed to have any of | |
1630 them. This awful kludge has been removed in XEmacs because | |
1631 | |
1632 @enumerate | |
1633 @item | |
1634 almost all of the systems that used this kludge ended up having | |
1635 to disable the data-segment remapping anyway; | |
1636 @item | |
1637 the only systems that didn't were extremely outdated ones; | |
1638 @item | |
1639 this hack completely messed up inline functions. | |
1640 @end enumerate | |
1641 | |
1642 @node Writing Lisp Primitives | |
1643 @section Writing Lisp Primitives | |
1644 | |
1645 Lisp primitives are Lisp functions implemented in C. The details of | |
1646 interfacing the C function so that Lisp can call it are handled by a few | |
1647 C macros. The only way to really understand how to write new C code is | |
1648 to read the source, but we can explain some things here. | |
1649 | |
1650 An example of a special form is the definition of @code{or}, from | |
1651 @file{eval.c}. (An ordinary function would have the same general | |
1652 appearance.) | |
1653 | |
1654 @cindex garbage collection protection | |
1655 @smallexample | |
1656 @group | |
1657 DEFUN ("or", For, Sor, 0, UNEVALLED, 0 /* | |
1658 Eval args until one of them yields non-nil, then return that value. | |
1659 The remaining args are not evalled at all. | |
1660 @end group | |
1661 @group | |
1662 If all args return nil, return nil. | |
1663 */ ) | |
1664 (args) | |
1665 Lisp_Object args; | |
1666 @{ | |
1667 /* This function can GC */ | |
1668 REGISTER Lisp_Object val; | |
1669 Lisp_Object args_left; | |
1670 struct gcpro gcpro1; | |
1671 @end group | |
1672 | |
1673 @group | |
1674 if (NILP (args)) | |
1675 return Qnil; | |
1676 | |
1677 args_left = args; | |
1678 GCPRO1 (args_left); | |
1679 @end group | |
1680 | |
1681 @group | |
1682 do | |
1683 @{ | |
1684 val = Feval (Fcar (args_left)); | |
1685 if (!NILP (val)) | |
1686 break; | |
1687 args_left = Fcdr (args_left); | |
1688 @} | |
1689 while (!NILP (args_left)); | |
1690 @end group | |
1691 | |
1692 @group | |
1693 UNGCPRO; | |
1694 return val; | |
1695 @} | |
1696 @end group | |
1697 @end smallexample | |
1698 | |
1699 Let's start with a precise explanation of the arguments to the | |
1700 @code{DEFUN} macro. Here is a template for them: | |
1701 | |
1702 @example | |
1703 DEFUN (@var{lname}, @var{fname}, @var{sname}, @var{min}, @var{max}, @var{interactive} /* @var{doc} */ ) | |
1704 @end example | |
1705 | |
1706 @table @var | |
1707 @item lname | |
1708 This is the name of the Lisp symbol to define as the function name; in | |
1709 the example above, it is @code{or}. | |
1710 | |
1711 @item fname | |
1712 This is the C function name for this function. This is | |
1713 the name that is used in C code for calling the function. The name is, | |
1714 by convention, @samp{F} prepended to the Lisp name, with all dashes | |
1715 (@samp{-}) in the Lisp name changed to underscores. Thus, to call this | |
1716 function from C code, call @code{For}. Remember that the arguments must | |
1717 be of type @code{Lisp_Object}; various macros and functions for creating | |
1718 values of type @code{Lisp_Object} are declared in the file | |
1719 @file{lisp.h}. | |
1720 | |
1721 Primitives whose names are special characters (e.g. @code{+} or | |
1722 @code{<}) are named by spelling out, in some fashion, the special | |
1723 character: e.g. @code{Fplus()} or @code{Flss()}. Primitives whose names | |
1724 begin with normal alphanumeric characters but also contain special | |
1725 characters are spelled out in some creative way, e.g. @code{let*} | |
1726 becomes @code{FletX()}. | |
1727 | |
1728 @item sname | |
1729 This is a C variable name to use for a structure that holds the data for | |
1730 the subr object that represents the function in Lisp. This structure | |
1731 conveys the Lisp symbol name to the initialization routine that will | |
1732 create the symbol and store the subr object as its definition. By | |
1733 convention, this name is always @var{fname} with @samp{F} replaced with | |
1734 @samp{S}. | |
1735 | |
1736 @item min | |
1737 This is the minimum number of arguments that the function requires. The | |
1738 function @code{or} allows a minimum of zero arguments. | |
1739 | |
1740 @item max | |
1741 This is the maximum number of arguments that the function accepts, if | |
1742 there is a fixed maximum. Alternatively, it can be @code{UNEVALLED}, | |
1743 indicating a special form that receives unevaluated arguments, or | |
1744 @code{MANY}, indicating an unlimited number of evaluated arguments (the | |
1745 equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY} are | |
1746 macros. If @var{max} is a number, it may not be less than @var{min} and | |
1747 it may not be greater than 12. (If you need to add a function with | |
1748 more than 12 arguments, either use the @code{MANY} form or edit the | |
1749 definition of @code{DEFUN} in @file{lisp.h}. If you do the latter, | |
1750 make sure to also add another clause to the switch statement in | |
1751 @code{primitive_funcall()}.) | |
1752 | |
1753 @item interactive | |
1754 This is an interactive specification, a string such as might be used as | |
1755 the argument of @code{interactive} in a Lisp function. In the case of | |
1756 @code{or}, it is 0 (a null pointer), indicating that @code{or} cannot be | |
1757 called interactively. A value of @code{""} indicates a function that | |
1758 should receive no arguments when called interactively. | |
1759 | |
1760 @item doc | |
1761 This is the documentation string. It is written just like a | |
1762 documentation string for a function defined in Lisp; in particular, | |
1763 the first line should be a single sentence. Note how the documentation | |
1764 string is enclosed in a comment, none of the documentation is placed | |
1765 on the same lines as the comment-start and comment-end characters, and | |
1766 the comment-start characters are on the same line as the interactive | |
1767 specification. @file{make-docfile}, which scans the C files for | |
1768 documentation strings, is very particular about what it looks for, | |
1769 and will not properly note the doc string if it's not in this exact | |
1770 format. | |
1771 @end table | |
1772 | |
1773 You are free to put the various arguments to @code{DEFUN} on separate | |
1774 lines to avoid overly long lines. However, make sure to put the | |
1775 comment-start characters for the doc string on the same line as the | |
1776 interactive specification, and put a newline directly after them | |
1777 (and before the comment-end characters). | |
1778 | |
1779 After the call to the @code{DEFUN} macro, you must write the argument | |
1780 name list that every C function must have, followed by ordinary C | |
1781 declarations for the arguments. For a function with a fixed maximum | |
1782 number of arguments, declare a C argument for each Lisp argument, and | |
1783 give them all type @code{Lisp_Object}. When a Lisp function has no | |
1784 upper limit on the number of arguments, its implementation in C actually | |
1785 receives exactly two arguments: the first is the number of Lisp | |
1786 arguments, and the second is the address of a block containing their | |
1787 values. They have types @code{int} and @w{@code{Lisp_Object *}}. | |
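
The calling convention for a function with no upper limit on its
arguments can be sketched without the @code{DEFUN} machinery. In this
example @code{toy_object} is a plain @code{long} standing in for
@code{Lisp_Object}, and @code{toy_plus} is a hypothetical function, not
the real implementation of @code{+}.

```c
#include <assert.h>

typedef long toy_object;        /* stand-in for Lisp_Object */

/* Receives the argument count and the address of a block holding the
   argument values, in the style of a MANY primitive. */
static toy_object
toy_plus (int nargs, toy_object *args)
{
  toy_object sum = 0;
  int i;

  assert (nargs >= 0);
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```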
1788 | |
1789 The names of the C arguments will be used as the names of the arguments | |
1790 to the Lisp primitive as displayed in its documentation, modulo the | |
1791 same concerns described above for @code{F...} names (in particular, | |
1792 underscores in the C arguments become dashes in the Lisp arguments). | |
1793 There is one additional kludge: A C argument called @code{defalt} | |
1794 becomes the Lisp argument @code{default}. This deliberate misspelling | |
1795 is done because @code{default} is a reserved word in the C language. | |
1796 | |
1797 Note that you @emph{must} use old-style prototypes for the arguments | |
1798 to @code{DEFUN}, even though all other functions in the C code use | |
1799 new-style prototypes. | |
1800 | |
1801 Within the function @code{For} itself, note the use of the macros | |
1802 @code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect'' | |
1803 a variable from garbage collection---to inform the garbage collector | |
1804 that it must look in that variable and regard its contents as an | |
1805 accessible object. This is necessary whenever you call @code{Feval} or | |
1806 anything that can directly or indirectly call @code{Feval} (this | |
1807 includes the @code{QUIT} macro!). At such a time, any Lisp object that | |
1808 you intend to refer to again must be protected somehow. @code{UNGCPRO} | |
1809 cancels the protection of the variables that are protected in the | |
1810 current function. It is necessary to do this explicitly. | |
1811 | |
1812 The macro @code{GCPRO1} protects just one local variable. If you want | |
1813 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will | |
1814 not work. Macros @code{GCPRO3} and @code{GCPRO4} also exist. | |
1815 | |
1816 These macros implicitly use local variables such as @code{gcpro1}; you | |
1817 must declare these explicitly, with type @code{struct gcpro}. Thus, if | |
1818 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}. | |
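
One way such macros can be implemented is as a linked list of
stack-allocated records that the garbage collector walks. The sketch
below assumes that design; all @code{toy_} and @code{TOY_} names are
hypothetical, and the real definitions may differ. It also suggests why
repeating the one-variable macro cannot work: in this sketch a second
@code{TOY_GCPRO1} would overwrite @code{gcpro1} and make it point to
itself.

```c
#include <assert.h>

typedef long toy_object;        /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;       /* previously registered record */
  toy_object *var;              /* address of the protected variable */
  int nvars;                    /* number of consecutive variables */
};

/* Head of the chain the collector would scan. */
static struct toy_gcpro *toy_gcprolist;

#define TOY_GCPRO1(v)                                               \
  do                                                                \
    {                                                               \
      gcpro1.next = toy_gcprolist; gcpro1.var = &(v);               \
      gcpro1.nvars = 1; toy_gcprolist = &gcpro1;                    \
    }                                                               \
  while (0)

#define TOY_GCPRO2(v1, v2)                                          \
  do                                                                \
    {                                                               \
      gcpro1.next = toy_gcprolist; gcpro1.var = &(v1);              \
      gcpro1.nvars = 1;                                             \
      gcpro2.next = &gcpro1; gcpro2.var = &(v2);                    \
      gcpro2.nvars = 1; toy_gcprolist = &gcpro2;                    \
    }                                                               \
  while (0)

/* Pop everything this function pushed. */
#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* Count the variables currently visible to the collector. */
static int
toy_count_protected (void)
{
  struct toy_gcpro *p;
  int count = 0;

  for (p = toy_gcprolist; p; p = p->next)
    count += p->nvars;
  return count;
}

/* Protect two locals, look at the chain, then unprotect. */
static int
toy_demo (void)
{
  toy_object a = 1, b = 2;
  struct toy_gcpro gcpro1, gcpro2;
  int during;

  TOY_GCPRO2 (a, b);
  during = toy_count_protected ();
  TOY_UNGCPRO;
  return during;
}
```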
1819 | |
1820 @cindex caller-protects (@code{GCPRO} rule) | |
1821 Note also that the general rule is @dfn{caller-protects}; i.e. you | |
1822 are only responsible for protecting those Lisp objects that you create. | |
1823 Any objects passed to you as parameters should have been protected | |
1824 by whoever created them, so you don't in general have to protect them. | |
1825 @code{For} is an exception; it protects its parameters to provide | |
1826 extra assurance against Lisp primitives elsewhere that are incorrectly | |
1827 written, and against malicious self-modifying code. There are a few | |
1828 other standard functions that also do this. | |
1829 | |
1830 @code{GCPRO}ing is perhaps the trickiest and most error-prone part | |
1831 of XEmacs coding. It is @strong{extremely} important that you get this | |
1832 right and use a great deal of discipline when writing this code. | |
1833 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this. | |
1834 | |
1835 What @code{DEFUN} actually does is declare a global structure of | |
1836 type @code{Lisp_Subr} whose name begins with a capital @samp{S} and | |
1837 which contains information about the primitive (e.g. a pointer to the | |
1838 function, its minimum and maximum allowed arguments, a string describing | |
1839 its Lisp name); @code{DEFUN} then begins a normal C function | |
1840 declaration using the @code{F...} name. The Lisp subr object that is | |
1841 the function definition of a primitive (i.e. the object in the function | |
1842 slot of the symbol that names the primitive) actually points to this | |
1843 @samp{S} structure; when @code{Feval} encounters a subr, it looks in the | |
1844 structure to find out how to call the C function. | |
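
A much simplified macro in the same spirit is sketched below. It is not
the real @code{DEFUN}: it omits the interactive specification and the
doc string, handles only one-argument functions, and uses invented names
(@code{TOY_DEFUN}, @code{struct toy_subr}). It shows the two things the
paragraph describes: a structure holding the primitive's data, and the
start of an ordinary C function definition.

```c
#include <assert.h>
#include <string.h>

typedef long toy_object;        /* stand-in for Lisp_Object */

/* Hypothetical counterpart of the subr structure. */
struct toy_subr
{
  const char *name;             /* Lisp name */
  int min_args, max_args;
  toy_object (*fn) (toy_object);
};

/* Declare the function, define its subr structure, then begin the
   function definition; the caller supplies the parameter list and
   body. */
#define TOY_DEFUN(lname, fname, sname, min, max)                    \
  static toy_object fname (toy_object);                             \
  static struct toy_subr sname = { lname, min, max, fname };        \
  static toy_object fname

TOY_DEFUN ("add-one", Fadd_one, Sadd_one, 1, 1) (toy_object n)
{
  return n + 1;
}

/* What an evaluator might do on meeting a subr: consult the structure
   to find the C function, then call it. */
static toy_object
toy_call_subr (const struct toy_subr *subr, toy_object arg)
{
  assert (subr->min_args <= 1 && subr->max_args >= 1);
  return subr->fn (arg);
}
```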
1845 | |
1846 Defining the C function is not enough to make a Lisp primitive | |
1847 available; you must also create the Lisp symbol for the primitive (the | |
1848 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr | |
1849 object in its function cell. (If you don't do this, the primitive won't | |
1850 be seen by Lisp code.) The code looks like this: | |
1851 | |
1852 @example | |
1853 defsubr (&@var{subr-structure-name}); | |
1854 @end example | |
1855 | |
1856 @noindent | |
1857 Here @var{subr-structure-name} is the name you used as the third | |
1858 argument to @code{DEFUN}. | |
1859 | |
1860 This call to @code{defsubr} should go in the @code{syms_of_*()} | |
1861 function at the end of the module. If no such function exists, create | |
1862 it and make sure to also declare it in @file{symsinit.h} and call it | |
1863 from the appropriate spot in @code{main()}. @xref{General Coding | |
1864 Rules}. | |
1865 | |
1866 Note that C code cannot call functions by name unless they are defined | |
1867 in C. The way to call a function written in Lisp is to use | |
1868 @code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since | |
1869 the Lisp function @code{funcall} accepts an unlimited number of | |
1870 arguments, in C it takes two: the number of Lisp-level arguments, and a | |
1871 one-dimensional array containing their values. The first Lisp-level | |
1872 argument is the Lisp function to call, and the rest are the arguments to | |
1873 pass to it. Since @code{Ffuncall} can call the evaluator, you must | |
1874 protect pointers from garbage collection around the call to | |
1875 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of | |
1876 its parameters, so you don't have to protect any pointers passed | |
1877 as parameters to it.) | |
1878 | |
1879 The C functions @code{call0}, @code{call1}, @code{call2}, and so on, | |
1880 provide a convenient way to call a Lisp function with a fixed | |
1881 number of arguments. They work by calling @code{Ffuncall}. | |
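
The relationship between the count-plus-array convention and the
fixed-argument helpers can be sketched as follows. The object type here
is a toy structure that holds either a number or a C function pointer;
it and all the @code{toy_} functions are hypothetical and only imitate
the shape of @code{Ffuncall} and @code{call2}.

```c
#include <assert.h>

typedef struct toy_object toy_object;
typedef toy_object (*toy_fn) (int nargs, toy_object *args);

/* Toy object: either a number or a callable function. */
struct toy_object
{
  long number;
  toy_fn function;
};

static toy_object
toy_number (long n)
{
  toy_object obj;
  obj.number = n;
  obj.function = 0;
  return obj;
}

static toy_object
toy_function_object (toy_fn fn)
{
  toy_object obj;
  obj.number = 0;
  obj.function = fn;
  return obj;
}

/* ARGS[0] is the function to call; the rest are its arguments. */
static toy_object
toy_funcall (int nargs, toy_object *args)
{
  assert (nargs >= 1 && args[0].function != 0);
  return args[0].function (nargs - 1, args + 1);
}

/* Fixed-argument convenience wrapper that builds the array. */
static toy_object
toy_call2 (toy_object fn, toy_object arg1, toy_object arg2)
{
  toy_object args[3];

  args[0] = fn;
  args[1] = arg1;
  args[2] = arg2;
  return toy_funcall (3, args);
}

/* A sample callee that adds up its arguments. */
static toy_object
toy_sum (int nargs, toy_object *args)
{
  long total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i].number;
  return toy_number (total);
}
```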
1882 | |
1883 @file{eval.c} is a very good file to look through for examples; | |
1884 @file{lisp.h} contains the definitions for some important macros and | |
1885 functions. | |
1886 | |
1887 @node Adding Global Lisp Variables | |
1888 @section Adding Global Lisp Variables | |
1889 | |
1890 Global variables whose names begin with @samp{Q} are constants whose | |
1891 value is a symbol of a particular name. The name of the variable should | |
1892 be derived from the name of the symbol using the same rules as for Lisp | |
1893 primitives. These variables are initialized using a call to | |
1894 @code{defsymbol()} in the @code{syms_of_*()} function. (This call | |
1895 interns a symbol, sets the C variable to the resulting Lisp object, and | |
1896 calls @code{staticpro()} on the C variable to tell the | |
1897 garbage-collection mechanism about this variable. What | |
1898 @code{staticpro()} does is add a pointer to the variable to a large | |
1899 global array; when garbage-collection happens, all pointers listed in | |
1900 the array are used as starting points for marking Lisp objects. This is | |
1901 important because it's quite possible that the only current reference to | |
1902 the object is the C variable. In the case of symbols, the | |
1903 @code{staticpro()} doesn't matter all that much because the symbol is | |
1904 contained in @code{obarray}, which is itself @code{staticpro()}ed. | |
1905 However, it's possible that a naughty user could do something like | |
1906 uninterning the symbol out of @code{obarray} or even setting | |
1907 @code{obarray} to a different value [although this is likely to make | |
1908 XEmacs crash!].) | |
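
The mechanism described for @code{staticpro()} can be sketched as an
array of pointers to variables. Because the array stores the address of
the variable rather than its value, whatever object the variable holds
when garbage collection runs is the one that gets marked. The names and
the fixed array size below are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap object with a mark flag. */
struct toy_obj
{
  int marked;
};
typedef struct toy_obj *toy_object;     /* stand-in for Lisp_Object */

#define TOY_NSTATICS 16

/* Addresses of the C variables registered as roots. */
static toy_object *toy_staticvec[TOY_NSTATICS];
static int toy_staticidx;

static void
toy_staticpro (toy_object *var)
{
  assert (toy_staticidx < TOY_NSTATICS);
  toy_staticvec[toy_staticidx++] = var;
}

/* Start of a collection: mark whatever each root currently holds. */
static void
toy_mark_roots (void)
{
  int i;

  for (i = 0; i < toy_staticidx; i++)
    if (*toy_staticvec[i] != NULL)
      (*toy_staticvec[i])->marked = 1;
}
```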
1909 | |
1910 @strong{Note:} It is potentially deadly if you declare a @samp{Q...} | |
1911 variable in two different modules. The two calls to @code{defsymbol()} | |
1912 are no problem, but some linkers will complain about multiply-defined | |
1913 symbols. The most insidious aspect of this is that often the link will | |
1914 succeed anyway, but then the resulting executable will sometimes crash | |
1915 in obscure ways during certain operations! To avoid this problem, | |
1916 declare any symbols with common names (such as @code{text}) that are not | |
1917 obviously associated with this particular module in the module | |
1918 @file{general.c}. | |
1919 | |
1920 Global variables whose names begin with @samp{V} are variables that | |
1921 contain Lisp objects. The convention here is that all global variables | |
1922 of type @code{Lisp_Object} begin with @samp{V}, and all others don't | |
1923 (including integer and boolean variables that have Lisp | |
1924 equivalents). Most of the time, these variables have equivalents in | |
1925 Lisp, but some don't. Those that do are declared this way by a call to | |
1926 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the | |
1927 module. What this does is create a special @dfn{symbol-value-forward} | |
1928 Lisp object that contains a pointer to the C variable, intern a symbol | |
1929 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set | |
1930 its value to the symbol-value-forward Lisp object; it also calls | |
1931 @code{staticpro()} on the C variable to tell the garbage-collection | |
1932 mechanism about the variable. When @code{eval} (or actually | |
1933 @code{symbol-value}) encounters this special object in the process of | |
1934 retrieving a variable's value, it follows the indirection to the C | |
1935 variable and gets its value. @code{setq} does similar things so that | |
1936 the C variable gets changed. | |
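
The forwarding indirection described above can be illustrated with a small standalone sketch. Everything here is hypothetical (@code{toy_symbol}, @code{toy_defvar}, @code{toy_symbol_value}, @code{toy_set}); the real symbol-value-forward object and @code{DEFVAR_LISP()} macro are more involved, and this sketch leaves out garbage-collection registration entirely. It only shows how reads and writes through the symbol end up in the C variable.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value type; a real Lisp_Object is more involved. */
typedef long toy_value;

/* A symbol either stores its value directly or forwards to a C variable. */
typedef struct {
  const char *name;
  toy_value   direct;   /* used when forward is NULL */
  toy_value  *forward;  /* address of the C variable when non-NULL */
} toy_symbol;

#define MAX_SYMBOLS 8
static toy_symbol symbol_table[MAX_SYMBOLS];
static size_t symbol_count;

/* Hypothetical analogue of DEFVAR_LISP(): tie a symbol name to *c_var. */
toy_symbol *toy_defvar(const char *name, toy_value *c_var)
{
  toy_symbol *sym;
  assert(symbol_count < MAX_SYMBOLS);
  sym = &symbol_table[symbol_count++];
  sym->name = name;
  sym->direct = 0;
  sym->forward = c_var;
  return sym;
}

/* Reading follows the indirection to the C variable, if there is one. */
toy_value toy_symbol_value(const toy_symbol *sym)
{
  return sym->forward ? *sym->forward : sym->direct;
}

/* Writing follows the same indirection, so the C variable changes. */
void toy_set(toy_symbol *sym, toy_value v)
{
  if (sym->forward)
    *sym->forward = v;
  else
    sym->direct = v;
}
```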
1937 | |
1938 Whether or not you @code{DEFVAR_LISP()} a variable, you need to | |
1939 initialize it in the @code{vars_of_*()} function; otherwise it will end | |
1940 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and | |
1941 this is probably not what you want. Also, if the variable is not | |
1942 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the | |
1943 C variable in the @code{vars_of_*()} function. Otherwise, the | |
1944 garbage-collection mechanism won't know that the object in this variable | |
1945 is in use, and will happily collect it and reuse its storage for another | |
1946 Lisp object, and you will be the one who's unhappy when you can't figure | |
1947 out how your variable got overwritten. | |
1948 | |
1949 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top | |
1950 @chapter A Summary of the Various XEmacs Modules | |
1951 | |
1952 This is accurate as of XEmacs 20.0. | |
1953 | |
1954 @menu | |
1955 * Low-Level Modules:: | |
1956 * Basic Lisp Modules:: | |
1957 * Modules for Standard Editing Operations:: | |
1958 * Editor-Level Control Flow Modules:: | |
1959 * Modules for the Basic Displayable Lisp Objects:: | |
1960 * Modules for other Display-Related Lisp Objects:: | |
1961 * Modules for the Redisplay Mechanism:: | |
1962 * Modules for Interfacing with the File System:: | |
1963 * Modules for Other Aspects of the Lisp Interpreter and Object System:: | |
1964 * Modules for Interfacing with the Operating System:: | |
1965 * Modules for Interfacing with X Windows:: | |
1966 * Modules for Internationalization:: | |
1967 @end menu | |
1968 | |
1969 @node Low-Level Modules | |
1970 @section Low-Level Modules | |
1971 | |
1972 @example | |
1973 size name | |
1974 ------- --------------------- | |
1975 18150 config.h | |
1976 @end example | |
1977 | |
1978 This is automatically generated from @file{config.h.in} based on the | |
1979 results of configure tests and user-selected optional features and | |
1980 contains preprocessor definitions specifying the nature of the | |
1981 environment in which XEmacs is being compiled. | |
1982 | |
1983 | |
1984 | |
1985 @example | |
1986 2347 paths.h | |
1987 @end example | |
1988 | |
1989 This is automatically generated from @file{paths.h.in} based on supplied | |
1990 configure values, and allows for non-standard installed configurations | |
1991 of the XEmacs directories. It's currently broken, though. | |
1992 | |
1993 | |
1994 | |
1995 @example | |
1996 47878 emacs.c | |
1997 20239 signal.c | |
1998 @end example | |
1999 | |
2000 @file{emacs.c} contains @code{main()} and other code that performs the most | |
2001 basic environment initializations and handles shutting down the XEmacs | |
2002 process (this includes @code{kill-emacs}, the normal way that XEmacs is | |
2003 exited; @code{dump-emacs}, which is used during the build process to | |
2004 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can | |
2005 be used to start XEmacs directly when temacs has finished loading all | |
2006 the Lisp code; and emergency code to handle crashes [XEmacs tries to | |
2007 auto-save all files before it crashes]). | |
2008 | |
2009 Low-level code that directly interacts with the Unix signal mechanism, | |
2010 however, is in @file{signal.c}. Note that this code does not handle system | |
2011 dependencies in interfacing to signals; that is handled using the | |
2012 @file{syssignal.h} header file, described in section J below. | |
2013 | |
2014 | |
2015 | |
2016 @example | |
2017 23458 unexaix.c | |
2018 9893 unexalpha.c | |
2019 11302 unexapollo.c | |
2020 16544 unexconvex.c | |
2021 31967 unexec.c | |
2022 30959 unexelf.c | |
2023 35791 unexelfsgi.c | |
2024 3207 unexencap.c | |
2025 7276 unexenix.c | |
2026 20539 unexfreebsd.c | |
2027 1153 unexfx2800.c | |
2028 13432 unexhp9k3.c | |
2029 11049 unexhp9k800.c | |
2030 9165 unexmips.c | |
2031 8981 unexnext.c | |
2032 1673 unexsol2.c | |
2033 19261 unexsunos4.c | |
2034 @end example | |
2035 | |
2036 These modules contain code for dumping out the XEmacs executable on various | |
2037 different systems. (This process is highly machine-specific and | |
2038 requires intimate knowledge of the executable format and the memory map | |
2039 of the process.) Only one of these modules is actually used; this is | |
2040 chosen by @file{configure}. | |
2041 | |
2042 | |
2043 | |
2044 @example | |
2045 15715 crt0.c | |
2046 1484 lastfile.c | |
2047 1115 pre-crt0.c | |
2048 @end example | |
2049 | |
2050 These modules are used in conjunction with the dump mechanism. On some | |
2051 systems, an alternative version of the C startup code (the actual code | |
2052 that receives control from the operating system when the process is | |
2053 started, and which calls @code{main()}) is required so that the dumping | |
2054 process works properly; @file{crt0.c} provides this. | |
2055 | |
2056 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and | |
2057 very last file linked, respectively. (Actually, this is not really true. | |
2058 @file{lastfile.c} should be after all Emacs modules whose initialized | |
2059 data should be made constant, and before all other Emacs files and all | |
2060 libraries. In particular, the allocation modules @file{gmalloc.c}, | |
2061 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and | |
2062 all of the files that implement Xt widget classes @emph{must} be placed | |
2063 after @file{lastfile.c} because they contain various structures that | |
2064 must be statically initialized and into which Xt writes at various | |
2065 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols | |
2066 that are used to determine the start and end of XEmacs's initialized | |
2067 data space when dumping. | |
2068 | |
2069 | |
2070 | |
2071 @example | |
2072 14786 alloca.c | |
2073 16678 free-hook.c | |
2074 1692 getpagesize.h | |
2075 41936 gmalloc.c | |
2076 25141 malloc.c | |
2077 3802 mem-limits.h | |
2078 39011 ralloc.c | |
2079 3436 vm-limit.c | |
2080 @end example | |
2081 | |
2082 These handle basic C allocation of memory. @file{alloca.c} is an emulation of | |
2083 the stack allocation function @code{alloca()} on machines that lack | |
2084 this. (XEmacs makes extensive use of @code{alloca()} in its code.) | |
2085 | |
2086 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C | |
2087 functions @code{malloc()}, @code{realloc()} and @code{free()}. They are | |
2088 often used in place of the standard system-provided @code{malloc()} | |
2089 because they usually provide a much faster implementation, at the | |
2090 expense of additional memory use. @file{gmalloc.c} is a newer implementation | |
2091 that is much more memory-efficient for large allocations than @file{malloc.c}, | |
2092 and should always be preferred if it works. (At one point, @file{gmalloc.c} | |
2093 didn't work on some systems where @file{malloc.c} worked; but this should be | |
2094 fixed now.) | |
2095 | |
2096 @cindex relocating allocator | |
2097 @file{ralloc.c} is the @dfn{relocating allocator}. It provides functions | |
2098 similar to @code{malloc()}, @code{realloc()} and @code{free()} that allocate | |
2099 memory that can be dynamically relocated in memory. The advantage of | |
2100 this is that allocated memory can be shuffled around to place all the | |
2101 free memory at the end of the heap, and the heap can then be shrunk, | |
2102 releasing the memory back to the operating system. The use of this can | |
2103 be controlled with the configure option @code{--rel-alloc}; if enabled, memory allocated for | |
2104 buffers will be relocatable, so that if a very large file is visited and | |
2105 the buffer is later killed, the memory can be released to the operating | |
2106 system. (The disadvantage of this mechanism is that it can be very | |
2107 slow. On systems with the @code{mmap()} system call, the XEmacs version | |
2108 of @file{ralloc.c} uses this to move memory around without actually having to | |
2109 block-copy it, which can speed things up; but it can still cause | |
2110 noticeable performance degradation.) | |
2111 | |
2112 @file{free-hook.c} contains some debugging functions for checking for invalid | |
2113 arguments to @code{free()}. | |
2114 | |
2115 @file{vm-limit.c} contains some functions that warn the user when memory is | |
2116 getting low. These are callback functions that are called by @file{gmalloc.c} | |
2117 and @file{malloc.c} at appropriate times. | |
2118 | |
2119 @file{getpagesize.h} provides a uniform interface for retrieving the size of a | |
2120 page in virtual memory. @file{mem-limits.h} provides a uniform interface for | |
2121 retrieving the total amount of available virtual memory. Both are | |
2122 similar in spirit to the @file{sys*.h} files described in section J, below. | |
2123 | |
2124 | |
2125 | |
2126 @example | |
2127 2659 blocktype.c | |
2128 1410 blocktype.h | |
2129 7194 dynarr.c | |
2130 2671 dynarr.h | |
2131 @end example | |
2132 | |
2133 These implement a couple of basic C data types to facilitate memory | |
2134 allocation. The @code{Blocktype} type efficiently manages the | |
2135 allocation of fixed-size blocks by minimizing the number of times that | |
2136 @code{malloc()} and @code{free()} are called. It allocates memory in | |
2137 large chunks, subdivides the chunks into blocks of the proper size, and | |
2138 returns the blocks as requested. When blocks are freed, they are placed | |
2139 onto a linked list, so they can be efficiently reused. This data type | |
2140 is not much used in XEmacs currently, because it's a fairly new | |
2141 addition. | |
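
A fixed-size block allocator of the kind described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code in @file{blocktype.c}: the names (@code{toy_blocktype}, @code{BLOCKS_PER_CHUNK}, and so on) are invented, the chunk size is arbitrary, and chunks are never returned to the system.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCKS_PER_CHUNK 64  /* arbitrary chunk size for this sketch */

/* A free block is reused as a list link; the union keeps blocks aligned. */
typedef union free_node {
  union free_node *next;
  long double      align_ld;
  long long        align_ll;
} free_node;

typedef struct {
  size_t     block_size;  /* element size rounded up to the link size */
  free_node *free_list;   /* blocks available for reuse */
  size_t     chunks;      /* number of malloc() calls made so far */
} toy_blocktype;

void toy_blocktype_init(toy_blocktype *bt, size_t elsize)
{
  size_t unit = sizeof(free_node);
  if (elsize == 0)
    elsize = 1;
  bt->block_size = ((elsize + unit - 1) / unit) * unit;
  bt->free_list = NULL;
  bt->chunks = 0;
}

/* Hand out one block, carving up a fresh chunk only when the list is empty. */
void *toy_blocktype_alloc(toy_blocktype *bt)
{
  free_node *node;
  if (bt->free_list == NULL) {
    char *chunk = malloc(bt->block_size * BLOCKS_PER_CHUNK);
    size_t i;
    if (chunk == NULL)
      return NULL;
    bt->chunks++;
    for (i = 0; i < BLOCKS_PER_CHUNK; i++) {
      node = (free_node *)(chunk + i * bt->block_size);
      node->next = bt->free_list;
      bt->free_list = node;
    }
  }
  node = bt->free_list;
  bt->free_list = node->next;
  return node;
}

/* Freeing just pushes the block back onto the linked list. */
void toy_blocktype_free(toy_blocktype *bt, void *block)
{
  free_node *node = block;
  node->next = bt->free_list;
  bt->free_list = node;
}
```

In this sketch, 64 allocations cost a single call to @code{malloc()}, and a freed block is the first one handed out again.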
2142 | |
2143 @cindex dynamic array | |
2144 The @code{Dynarr} type implements a @dfn{dynamic array}, which is | |
2145 similar to a standard C array but has no fixed limit on the number of | |
2146 elements it can contain. Dynamic arrays can hold elements of any type, | |
2147 and when you add a new element, the array automatically resizes itself | |
2148 if it isn't big enough. Dynarrs are extensively used in the redisplay | |
2149 mechanism. | |
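
The automatic resizing can be sketched with a small generic array. The interface below (@code{toy_dynarr_init}, @code{toy_dynarr_add}, @code{toy_dynarr_at}, @code{toy_dynarr_free}) is hypothetical and is not the @code{Dynarr} API in @file{dynarr.h}; the doubling growth policy is likewise an assumption chosen for simplicity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  char  *data;      /* element storage */
  size_t elsize;    /* size of one element in bytes */
  size_t count;     /* elements in use */
  size_t capacity;  /* elements allocated */
} toy_dynarr;

void toy_dynarr_init(toy_dynarr *d, size_t elsize)
{
  d->data = NULL;
  d->elsize = elsize;
  d->count = 0;
  d->capacity = 0;
}

/* Append a copy of *elem, growing the storage when it is full.
   Returns 0 on success, -1 if memory could not be obtained. */
int toy_dynarr_add(toy_dynarr *d, const void *elem)
{
  if (d->count == d->capacity) {
    size_t new_cap = d->capacity ? d->capacity * 2 : 8;
    char *p = realloc(d->data, new_cap * d->elsize);
    if (p == NULL)
      return -1;
    d->data = p;
    d->capacity = new_cap;
  }
  memcpy(d->data + d->count * d->elsize, elem, d->elsize);
  d->count++;
  return 0;
}

/* Address of element i; valid only until the next add. */
void *toy_dynarr_at(toy_dynarr *d, size_t i)
{
  assert(i < d->count);
  return d->data + i * d->elsize;
}

void toy_dynarr_free(toy_dynarr *d)
{
  free(d->data);
  d->data = NULL;
  d->count = 0;
  d->capacity = 0;
}
```

Note that a pointer obtained from @code{toy_dynarr_at()} may be invalidated by a later add, since growing the array can move the storage.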
2150 | |
2151 | |
2152 | |
2153 @example | |
2154 2058 inline.c | |
2155 @end example | |
2156 | |
2157 This module is used in connection with inline functions (available in | |
2158 some compilers). Often, inline functions need to have a corresponding | |
2159 non-inline function that does the same thing. This module is where they | |
2160 reside. It contains no actual code, but defines some special flags that | |
2161 cause inline functions defined in header files to be rendered as actual | |
2162 functions. It then includes all header files that contain any inline | |
2163 function definitions, so that each one gets a real function equivalent. | |
2164 | |
2165 | |
2166 | |
2167 @example | |
2168 6489 debug.c | |
2169 2267 debug.h | |
2170 @end example | |
2171 | |
2172 These functions provide a system for doing internal consistency checks | |
2173 during code development. This system is not currently used; instead the | |
2174 simpler @code{assert()} macro is used along with the various checks | |
2175 provided by the @samp{--error-check-*} configuration options. | |
2176 | |
2177 | |
2178 | |
2179 @example | |
2180 1643 prefix-args.c | |
2181 @end example | |
2182 | |
2183 This is actually the source for a small, self-contained program | |
2184 used during building. | |
2185 | |
2186 | |
2187 @example | |
2188 904 universe.h | |
2189 @end example | |
2190 | |
2191 This is not currently used. | |
2192 | |
2193 | |
2194 | |
2195 @node Basic Lisp Modules | |
2196 @section Basic Lisp Modules | |
2197 | |
2198 @example | |
2199 size name | |
2200 ------- --------------------- | |
2201 70167 emacsfns.h | |
2202 6305 lisp-disunion.h | |
2203 7086 lisp-union.h | |
2204 54929 lisp.h | |
2205 14235 lrecord.h | |
2206 10728 symsinit.h | |
2207 @end example | |
2208 | |
2209 These are the basic header files for all XEmacs modules. Each module | |
2210 includes @file{lisp.h}, which brings the other header files in. | |
2211 @file{lisp.h} contains the definitions of the structures and extractor | |
2212 and constructor macros for the basic Lisp objects and various other | |
2213 basic definitions for the Lisp environment, as well as some | |
2214 general-purpose definitions (e.g. @code{min()} and @code{max()}). | |
2215 @file{lisp.h} includes either @file{lisp-disunion.h} or | |
2216 @file{lisp-union.h}, depending on whether @code{NO_UNION_TYPE} is | |
2217 defined. These files define the typedef of the Lisp object itself (as | |
2218 described above) and the low-level macros that hide the actual | |
2219 implementation of the Lisp object. All extractor and constructor macros | |
2220 for particular types of Lisp objects are defined in terms of these | |
2221 low-level macros. | |
2222 | |
2223 As a general rule, all typedefs should go into the typedefs section of | |
2224 @file{lisp.h} rather than into a module-specific header file even if the | |
2225 structure is defined elsewhere. This allows function prototypes that | |
2226 use the typedef to be placed into @file{emacsfns.h}. Forward structure | |
2227 declarations (i.e. a simple declaration like @code{struct foo;} where | |
2228 the structure itself is defined elsewhere) should be placed into the | |
2229 typedefs section as necessary. | |
2230 | |
2231 @file{lrecord.h} contains the basic structures and macros that implement | |
2232 all record-type Lisp objects -- i.e. all objects whose type is a field | |
2233 in their C structure, which includes all objects except the few most | |
2234 basic ones. | |
2235 | |
2236 @file{emacsfns.h} contains prototypes for most of the exported functions | |
2237 in the various modules. (In particular, prototypes for Lisp primitives | |
2238 should always go in this header file. Prototypes for other functions | |
2239 can either go here or in a module-specific header file, depending on how | |
2240 general-purpose the function is and whether it has special-purpose | |
2241 argument types requiring definitions not in @file{lisp.h}.) All | |
2242 initialization functions are prototyped in @file{symsinit.h}. | |
2243 | |
2244 | |
2245 | |
2246 @example | |
2247 120478 alloc.c | |
2248 1029 pure.c | |
2249 2506 puresize.h | |
2250 @end example | |
2251 | |
2252 The large module @file{alloc.c} implements all of the basic allocation and | |
2253 garbage collection for Lisp objects. The most commonly used Lisp | |
2254 objects are allocated in chunks, similar to the Blocktype data type | |
2255 described above; others are allocated in individually @code{malloc()}ed | |
2256 blocks. This module provides the foundation on which all other aspects | |
2257 of the Lisp environment sit, and is the first module initialized at | |
2258 startup. | |
2259 | |
2260 Note that @file{alloc.c} provides a series of generic functions that are | |
2261 not dependent on any particular object type, and interfaces to | |
2262 particular types of objects using a standardized interface of | |
2263 type-specific methods. This scheme is a fundamental principle of | |
2264 object-oriented programming and is heavily used throughout XEmacs. The | |
2265 great advantage of this is that it allows for a clean separation of | |
2266 functionality into different modules -- new classes of Lisp objects, new | |
2267 event interfaces, new device types, new stream interfaces, etc. can be | |
2268 added transparently without affecting code anywhere else in XEmacs. | |
2269 Because the different subsystems are divided into general and specific | |
2270 code, adding a new subtype within a subsystem will in general not | |
2271 require changes to the generic subsystem code or affect any of the other | |
2272 subtypes in the subsystem; this provides a great deal of robustness to | |
2273 the XEmacs code. | |
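
The split between generic code and type-specific methods can be sketched with a table of function pointers. The names here (@code{toy_header}, @code{toy_type_methods}, @code{toy_pair}, @code{toy_leaf}) are invented for illustration and do not correspond to the actual structures in @file{lrecord.h}; only two methods are shown.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_header toy_header;

/* Per-type method table: the only thing generic code knows about a type. */
typedef struct {
  const char *name;
  void   (*mark)(toy_header *obj);        /* may be NULL: no references */
  size_t (*size)(const toy_header *obj);
} toy_type_methods;

/* Every object starts with this header. */
struct toy_header {
  const toy_type_methods *methods;
  int marked;
};

/* Generic code: dispatches through the table, never names a type. */
void toy_generic_mark(toy_header *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  if (obj->methods->mark)
    obj->methods->mark(obj);
}

size_t toy_generic_size(const toy_header *obj)
{
  return obj->methods->size(obj);
}

/* Type-specific code for a pair of references. */
typedef struct {
  toy_header  header;
  toy_header *first;
  toy_header *second;
} toy_pair;

static void pair_mark(toy_header *obj)
{
  toy_pair *p = (toy_pair *)obj;
  toy_generic_mark(p->first);
  toy_generic_mark(p->second);
}

static size_t pair_size(const toy_header *obj)
{
  (void)obj;
  return sizeof(toy_pair);
}

const toy_type_methods toy_pair_methods = { "pair", pair_mark, pair_size };

/* Type-specific code for a leaf that holds no references. */
typedef struct {
  toy_header header;
  long       value;
} toy_leaf;

static size_t leaf_size(const toy_header *obj)
{
  (void)obj;
  return sizeof(toy_leaf);
}

const toy_type_methods toy_leaf_methods = { "leaf", NULL, leaf_size };
```

Adding a third type in this sketch means writing its methods and one more table; @code{toy_generic_mark()} and @code{toy_generic_size()} stay untouched.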
2274 | |
2275 @cindex pure space | |
2276 @file{pure.c} contains the declaration of the @dfn{purespace} array. | |
2277 Pure space is a hack used to place some constant Lisp data into the code | |
2278 segment of the XEmacs executable, even though the data needs to be | |
2279 initialized through function calls. (See above in section VIII for more | |
2280 info about this.) During startup, certain sorts of data are | |
2281 automatically copied into pure space, and other data is copied manually | |
2282 in some of the basic Lisp files by calling the function @code{purecopy}, | |
2283 which copies the object if possible (this only works in temacs, of | |
2284 course) and returns the new object. In particular, while temacs is | |
2285 executing, the Lisp reader automatically copies all compiled-function | |
2286 objects that it reads into pure space. Since compiled-function objects | |
2287 are large, are never modified, and typically comprise the majority of | |
2288 the contents of a compiled-Lisp file, this works well. While XEmacs is | |
2289 running, any attempt to modify an object that resides in pure space | |
2290 causes an error. Objects in pure space are never garbage collected -- | |
2291 almost all of the time, they're intended to be permanent, and in any | |
2292 case you can't write into pure space to set the mark bits. | |
2293 | |
2294 @file{puresize.h} contains the declaration of the size of the pure space | |
2295 array. This depends on the optional features that are compiled in, any | |
2296 extra purespace requested by the user at compile time, and certain other | |
2297 factors (e.g. 64-bit machines need more pure space because their Lisp | |
2298 objects are larger). The smallest size that suffices should be used, so | |
2299 that there's no wasted space. If there's not enough pure space, you | |
2300 will get an error during the build process, specifying how much more | |
2301 pure space is needed. | |
2302 | |
2303 | |
2304 | |
2305 @example | |
2306 122243 eval.c | |
2307 2305 backtrace.h | |
2308 @end example | |
2309 | |
2310 This module contains all of the functions to handle the flow of control. | |
2311 This includes the mechanisms of defining functions, calling functions, | |
2312 traversing stack frames, and binding variables; the control primitives | |
2313 and other special forms such as @code{while}, @code{if}, @code{eval}, | |
2314 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of | |
2315 non-local exits, unwind-protects, and exception handlers; entering the | |
2316 debugger; methods for the subr Lisp object type; etc. It does | |
2317 @emph{not} include the @code{read} function, the @code{print} function, | |
2318 or the handling of symbols and obarrays. | |
2319 | |
2320 @file{backtrace.h} contains some structures related to stack frames and the | |
2321 flow of control. | |
2322 | |
2323 | |
2324 | |
2325 @example | |
2326 64949 lread.c | |
2327 @end example | |
2328 | |
2329 This module implements the Lisp reader and the @code{read} function, | |
2330 which converts text into Lisp objects, according to the read syntax of | |
2331 the objects, as described above. This is similar to the parser that is | |
2332 a part of all compilers. | |
2333 | |
2334 | |
2335 | |
2336 @example | |
2337 40900 print.c | |
2338 @end example | |
2339 | |
2340 This module implements the Lisp print mechanism and the @code{print} | |
2341 function and related functions. This is the inverse of the Lisp reader | |
2342 -- it converts Lisp objects to a printed, textual representation. | |
2343 (Hopefully something that can be read back in using @code{read} to get | |
2344 an equivalent object.) | |
2345 | |
2346 | |
2347 | |
2348 @example | |
2349 4518 general.c | |
2350 60220 symbols.c | |
2351 9966 symeval.h | |
2352 @end example | |
2353 | |
2354 @file{symbols.c} implements the handling of symbols, obarrays, and | |
2355 retrieving the values of symbols. Much of the code is devoted to | |
2356 handling the special @dfn{symbol-value-magic} objects that define | |
2357 special types of variables -- this includes buffer-local variables, | |
2358 variable aliases, variables that forward into C variables, etc. This | |
2359 module is initialized extremely early (right after @file{alloc.c}), | |
2360 because it is here that the basic symbols @code{t} and @code{nil} are | |
2361 created, and those symbols are used everywhere throughout XEmacs. | |
2362 | |
2363 @file{symeval.h} contains the definitions of symbol structures and the | |
2364 @code{DEFVAR_LISP()} and related macros for declaring variables. | |
2365 | |
2366 | |
2367 | |
2368 @example | |
2369 48973 data.c | |
2370 25694 floatfns.c | |
2371 71049 fns.c | |
2372 @end example | |
2373 | |
2374 These modules implement the methods and standard Lisp primitives for all | |
2375 the basic Lisp object types other than symbols (which are described | |
2376 above). @file{data.c} contains all the predicates (primitives that return | |
2377 whether an object is of a particular type); the integer arithmetic | |
2378 functions; and the basic accessor and mutator primitives for the various | |
2379 object types. @file{fns.c} contains all the standard primitives for working | |
2380 with sequences (where, abstractly speaking, a sequence is an ordered set | |
2381 of objects, and can be represented by a list, string, vector, or | |
2382 bit-vector); it also contains @code{equal}, perhaps on the grounds that | |
2383 the bulk of the operation of @code{equal} is comparing sequences. | |
2384 @file{floatfns.c} contains methods and primitives for floats and floating-point | |
2385 arithmetic. | |
2386 | |
2387 | |
2388 | |
2389 @example | |
2390 23555 bytecode.c | |
2391 3358 bytecode.h | |
2392 @end example | |
2393 | |
2394 @file{bytecode.c} implements the byte-code interpreter, and @file{bytecode.h} contains | |
2395 associated structures. Note that the byte-code @emph{compiler} is | |
2396 written in Lisp. | |
2397 | |
2398 | |
2399 | |
2400 | |
2401 @node Modules for Standard Editing Operations | |
2402 @section Modules for Standard Editing Operations | |
2403 | |
2404 @example | |
2405 size name | |
2406 ------- --------------------- | |
2407 82900 buffer.c | |
2408 60964 buffer.h | |
2409 6059 bufslots.h | |
2410 @end example | |
2411 | |
2412 @file{buffer.c} implements the buffer Lisp object type. This includes | |
2413 functions that create and destroy buffers; retrieve buffers by name or | |
2414 by other properties; manipulate lists of buffers (remember that buffers | |
2415 are permanent objects and stored in various ordered lists); retrieve or | |
2416 change buffer properties; etc. It also contains the definitions of all | |
2417 the built-in buffer-local variables (which can be viewed as buffer | |
2418 properties). It does @emph{not} contain code to manipulate buffer-local | |
2419 variables (that's in @file{symbols.c}, described above); or code to manipulate | |
2420 the text in a buffer. | |
2421 | |
2422 @file{buffer.h} defines the structures associated with a buffer and the various | |
2423 macros for retrieving text from a buffer and special buffer positions | |
2424 (e.g. @code{point}, the default location for text insertion). It also | |
2425 contains macros for working with buffer positions and converting between | |
2426 their representations as character offsets and as byte offsets (under | |
2427 MULE, they are different, because characters can be multi-byte). It is | |
2428 one of the largest header files. | |
2429 | |
2430 @file{bufslots.h} defines the fields in the buffer structure that correspond to | |
2431 the built-in buffer-local variables. It is its own header file because | |
2432 it is included many times in @file{buffer.c}, as a way of iterating over all | |
2433 the built-in buffer-local variables. | |
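
The technique of expanding one slot list several times can be sketched in a single file with a list macro. This is an assumption-laden illustration: the real @file{bufslots.h} is a separate header that is re-included, whereas this sketch uses a hypothetical @code{TOY_BUFFER_SLOTS} macro and invented slot names to get the same effect without a second file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the slot-list header: one entry per slot. */
#define TOY_BUFFER_SLOTS(SLOT) \
  SLOT(fill_column)            \
  SLOT(tab_width)              \
  SLOT(case_fold_search)

/* Expansion 1: the slots become fields of the buffer structure. */
typedef struct {
#define DECLARE_SLOT(name) long name;
  TOY_BUFFER_SLOTS(DECLARE_SLOT)
#undef DECLARE_SLOT
} toy_buffer;

/* Expansion 2: the same list becomes a table of slot names. */
const char *const toy_slot_names[] = {
#define NAME_SLOT(name) #name,
  TOY_BUFFER_SLOTS(NAME_SLOT)
#undef NAME_SLOT
};

#define TOY_SLOT_COUNT (sizeof toy_slot_names / sizeof toy_slot_names[0])

/* Expansion 3: the same list becomes code that visits every slot. */
void toy_buffer_reset(toy_buffer *b)
{
#define RESET_SLOT(name) b->name = 0;
  TOY_BUFFER_SLOTS(RESET_SLOT)
#undef RESET_SLOT
}
```

Adding a slot to the list updates the structure, the name table and the reset function together, which is the benefit of keeping the list in one place.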
2434 | |
2435 | |
2436 | |
2437 @example | |
2438 79888 insdel.c | |
2439 6103 insdel.h | |
2440 @end example | |
2441 | |
2442 @file{insdel.c} contains low-level functions for inserting and deleting text in | |
2443 a buffer, keeping track of changed regions for use by redisplay, and | |
2444 calling any before-change and after-change functions that may have been | |
2445 registered for the buffer. It also contains the actual functions that | |
2446 convert between byte offsets and character offsets. | |
2447 | |
2448 @file{insdel.h} contains associated headers. | |
2449 | |
2450 | |
2451 | |
2452 @example | |
2453 10975 marker.c | |
2454 @end example | |
2455 | |
2456 This module implements the marker Lisp object type, which conceptually | |
2457 is a pointer to a text position in a buffer that moves around as text is | |
2458 inserted and deleted, so as to remain in the same relative position. | |
2459 This module doesn't actually move the markers around -- that's handled | |
2460 in @file{insdel.c}. This module just creates them and implements the | |
2461 primitives for working with them. As markers are simple objects, this | |
2462 does not entail much. | |
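
How marker positions might be adjusted as text changes can be sketched as below. This is a simplified model, not the code in @file{insdel.c}: @code{toy_marker} and both functions are hypothetical, positions are plain integers, and the choice that a marker sitting exactly at the insertion point stays put is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  long pos;  /* position of the marker within the text */
} toy_marker;

/* After inserting len characters at position at, markers strictly
   after the insertion point move forward by len. */
void toy_adjust_for_insert(toy_marker *m, size_t n, long at, long len)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (m[i].pos > at)
      m[i].pos += len;
}

/* After deleting the range [from, to), markers at or past the end move
   back by the deleted length; markers inside collapse to from. */
void toy_adjust_for_delete(toy_marker *m, size_t n, long from, long to)
{
  size_t i;
  for (i = 0; i < n; i++) {
    if (m[i].pos >= to)
      m[i].pos -= to - from;
    else if (m[i].pos > from)
      m[i].pos = from;
  }
}
```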
2463 | |
2464 Note that the standard arithmetic primitives (e.g. @code{+}) accept | |
2465 markers in place of integers and automatically substitute the value of | |
2466 @code{marker-position} for the marker, i.e. an integer describing the | |
2467 current buffer position of the marker. | |
2468 | |
2469 | |
2470 | |
2471 @example | |
2472 193714 extents.c | |
2473 15686 extents.h | |
2474 @end example | |
2475 | |
2476 This module implements the extent Lisp object type, which is like a | |
2477 marker that works over a range of text rather than a single position. | |
2478 Extents are also much more complex and powerful than markers and have a | |
2479 more efficient (and more algorithmically complex) implementation. The | |
2480 implementation is described in detail in comments in @file{extents.c}. | |
2481 | |
2482 The code in @file{extents.c} works closely with @file{insdel.c} so that | |
2483 extents are properly moved around as text is inserted and deleted. | |
2484 There is also code in @file{extents.c} that provides information needed | |
2485 by the redisplay mechanism for efficient operation. (Remember that | |
2486 extents can have display properties that affect [sometimes drastically, | |
2487 as in the @code{invisible} property] the display of the text they | |
2488 cover.) | |
2489 | |
2490 | |
2491 | |
2492 @example | |
2493 60155 editfns.c | |
2494 @end example | |
2495 | |
2496 @file{editfns.c} contains the standard Lisp primitives for working with | |
2497 a buffer's text, and calls the low-level functions in @file{insdel.c}. | |
2498 It also contains primitives for working with @code{point} (the default | |
2499 buffer insertion location). | |
2500 | |
2501 @file{editfns.c} also contains functions for retrieving various | |
2502 characteristics from the external environment: the current time, the | |
2503 process ID of the running XEmacs process, the name of the user who ran | |
2504 this XEmacs process, etc. It's not clear why this code is in | |
2505 @file{editfns.c}. | |
2506 | |
2507 | |
2508 | |
2509 @example | |
2510 26081 callint.c | |
2511 12577 cmds.c | |
2512 2749 commands.h | |
2513 @end example | |
2514 | |
2515 @cindex interactive | |
2516 These modules implement the basic @dfn{interactive} commands, | |
2517 i.e. user-callable functions. Commands, as opposed to other functions, | |
2518 have special ways of getting their parameters interactively (by querying | |
2519 the user), as opposed to having them passed in a normal function | |
2520 invocation. Many commands are not really meant to be called from other | |
2521 Lisp functions, because they modify global state in a way that's often | |
2522 undesired as part of other Lisp functions. | |
2523 | |
2524 @file{callint.c} implements the mechanism for querying the user for | |
2525 parameters and calling interactive commands. The bulk of this module is | |
2526 code that parses the interactive spec that is supplied with an | |
2527 interactive command. | |
2528 | |
2529 @file{cmds.c} implements the basic, most commonly used editing commands: | |
2530 commands to move around the current buffer and insert and delete | |
2531 characters. These commands are implemented using the Lisp primitives | |
2532 defined in @file{editfns.c}. | |
2533 | |
2534 @file{commands.h} contains associated structure definitions and prototypes. | |
2535 | |
2536 | |
2537 | |
2538 @example | |
2539 194863 regex.c | |
2540 18968 regex.h | |
2541 79800 search.c | |
2542 @end example | |
2543 | |
2544 @file{search.c} implements the Lisp primitives for searching for text in | |
2545 a buffer, and some of the low-level algorithms for doing this. In | |
2546 particular, the fast fixed-string Boyer-Moore search algorithm is | |
2547 implemented in @file{search.c}. The low-level algorithms for doing | |
2548 regular-expression searching, however, are implemented in @file{regex.c} | |
2549 and @file{regex.h}. These two modules are largely independent of | |
2550 XEmacs, and are similar to (and based upon) the regular-expression | |
2551 routines used in @file{grep} and other GNU utilities. | |
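
To give a feel for fixed-string searching in the Boyer-Moore family, here is a sketch of the simpler Horspool variant, which keeps only the bad-character shift table. It is not the implementation in @file{search.c}; the function name @code{toy_horspool_search} and its interface are invented, and it works on plain byte strings with no case folding or translation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first occurrence of pat (length m) in
   text (length n), or -1 if there is none. */
long toy_horspool_search(const char *text, size_t n,
                         const char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];
  size_t i, j, pos;

  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* Bytes absent from the pattern allow a jump of the full length. */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  /* Other bytes: distance from their last occurrence (excluding the
     final pattern position) to the end of the pattern. */
  for (i = 0; i + 1 < m; i++)
    shift[(unsigned char)pat[i]] = m - 1 - i;

  pos = 0;
  while (pos <= n - m) {
    /* Compare right to left within the current window. */
    j = m;
    while (j > 0 && text[pos + j - 1] == pat[j - 1])
      j--;
    if (j == 0)
      return (long)pos;
    /* Slide by the shift of the byte under the window's last position. */
    pos += shift[(unsigned char)text[pos + m - 1]];
  }
  return -1;
}
```

The speed comes from the shift table: on a mismatch the window usually moves several positions at once, so many bytes of the text are never examined.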
2552 | |
2553 | |
2554 | |
2555 @example | |
2556 20476 doprnt.c | |
2557 @end example | |
2558 | |
2559 @file{doprnt.c} implements formatted-string processing, similar to | |
2560 the @code{printf()} function in C. | |
2561 | |
2562 | |
2563 | |
2564 @example | |
2565 15372 undo.c | |
2566 @end example | |
2567 | |
2568 This module implements the undo mechanism for tracking buffer changes. | |
2569 Most of this could be implemented in Lisp. | |
2570 | |
2571 | |
2572 | |
2573 @node Editor-Level Control Flow Modules | |
2574 @section Editor-Level Control Flow Modules | |
2575 | |
2576 @example | |
2577 size name | |
2578 ------- --------------------- | |
2579 84546 event-Xt.c | |
2580 121483 event-stream.c | |
2581 6658 event-tty.c | |
2582 49271 events.c | |
2583 14459 events.h | |
2584 @end example | |
2585 | |
2586 These implement the handling of events (user input and other system | |
2587 notifications). | |
2588 | |
2589 @file{events.c} and @file{events.h} define the event Lisp object type | |
2590 and primitives for manipulating it. | |
2591 | |
2592 @file{event-stream.c} implements the basic functions for working with | |
2593 event queues, dispatching an event by looking it up in relevant keymaps | |
2594 and such, and handling timeouts; this includes the primitives | |
2595 @code{next-event} and @code{dispatch-event}, as well as related | |
2596 primitives such as @code{sit-for}, @code{sleep-for}, and | |
2597 @code{accept-process-output}. (@file{event-stream.c} is one of the | |
2598 hairiest and trickiest modules in XEmacs. Beware! You can easily mess | |
2599 things up here.) | |
2600 | |
2601 @file{event-Xt.c} and @file{event-tty.c} implement the low-level | |
2602 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's | |
2603 (using @code{read()} and @code{select()}), respectively. The event | |
2604 interface enforces a clean separation between the specific code for | |
2605 interfacing with the operating system and the generic code for working | |
2606 with events, by defining an API of basic, low-level event methods; | |
2607 @file{event-Xt.c} and @file{event-tty.c} are two different | |
2608 implementations of this API. To add support for a new operating system | |
2609 (e.g. NeXTstep), one merely needs to provide another implementation of | |
2610 those API functions. | |
2611 | |
2612 Note that the choice of whether to use @file{event-Xt.c} or | |
2613 @file{event-tty.c} is made at compile time! Or at the very latest, it | |
2614 is made at startup time. @file{event-Xt.c} handles events for | |
2615 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X | |
2616 support is not compiled into XEmacs. The reason for this is that there | |
2617 is only one event loop in XEmacs: thus, it needs to be able to receive | |
2618 events from all different kinds of frames. | |
2619 | |
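The separation between generic event code and a per-system
implementation can be pictured as a table of function pointers. The
following is a much simplified sketch under that assumption; the
structure, the two toy backends, and all names in it are hypothetical
and do not correspond to the real method names in XEmacs.

```c
/* Hypothetical, much simplified event record.  */
struct toy_event
{
  int type;      /* 0 = no event, 1 = key press */
  int keycode;
};

/* The low-level event API: a table of function pointers that each
   backend fills in.  */
struct toy_event_methods
{
  const char *name;
  int (*event_pending_p) (void);
  void (*next_event) (struct toy_event *ev);
};

/* Backend A: pretends that the key 'x' is always waiting.  */
static int backend_a_pending (void) { return 1; }
static void
backend_a_next (struct toy_event *ev)
{
  ev->type = 1;
  ev->keycode = 'x';
}

/* Backend B: never has any input.  */
static int backend_b_pending (void) { return 0; }
static void
backend_b_next (struct toy_event *ev)
{
  ev->type = 0;
  ev->keycode = 0;
}

const struct toy_event_methods backend_a =
  { "a", backend_a_pending, backend_a_next };
const struct toy_event_methods backend_b =
  { "b", backend_b_pending, backend_b_next };

/* Generic code: it knows only the method table, never the backend.
   Returns the key code of the next key press, or -1 if none.  */
int
toy_read_key (const struct toy_event_methods *methods)
{
  struct toy_event ev;

  if (!methods->event_pending_p ())
    return -1;
  methods->next_event (&ev);
  return ev.type == 1 ? ev.keycode : -1;
}
```

Supporting another system then amounts to supplying one more filled-in
table; @code{toy_read_key} does not change.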
2620 | |
2621 | |
2622 @example | |
2623 129583 keymap.c | |
2624 2621 keymap.h | |
2625 @end example | |
2626 | |
2627 @file{keymap.c} and @file{keymap.h} define the keymap Lisp object type | |
2628 and associated methods and primitives. (Remember that keymaps are | |
2629 objects that associate event descriptions with functions to be called to | |
2630 ``execute'' those events; @code{dispatch-event} looks up events in the | |
2631 relevant keymaps.) | |
2632 | |
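As a rough illustration of the lookup-then-call step, here is a toy
keymap in C. It assumes a flat table keyed by a single character, which
is far simpler than real keymaps (no prefix keys, no inheritance, no
Lisp objects); every name in it is invented for the example.

```c
#include <stddef.h>

/* A "command" in this sketch is a C function acting on a counter.  */
typedef void (*toy_command) (int *counter);

struct toy_binding
{
  int key;
  toy_command command;
};

static void cmd_increment (int *counter) { (*counter)++; }
static void cmd_reset (int *counter) { *counter = 0; }

/* The keymap: associates event descriptions (here, plain key codes)
   with the functions that "execute" them.  */
static const struct toy_binding toy_keymap[] = {
  { '+', cmd_increment },
  { '0', cmd_reset },
};

/* Return the command bound to KEY, or NULL if KEY is unbound.  */
toy_command
toy_lookup_key (int key)
{
  size_t i;

  for (i = 0; i < sizeof toy_keymap / sizeof toy_keymap[0]; i++)
    if (toy_keymap[i].key == key)
      return toy_keymap[i].command;
  return NULL;
}

/* Dispatch: look the key up and call its command if there is one.
   Returns 1 if a command ran, 0 if the key was unbound.  */
int
toy_dispatch_key (int key, int *counter)
{
  toy_command command = toy_lookup_key (key);

  if (command == NULL)
    return 0;
  command (counter);
  return 1;
}
```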
2633 | |
2634 | |
2635 @example | |
2636 25212 keyboard.c | |
2637 @end example | |
2638 | |
2639 @file{keyboard.c} contains functions that implement the actual editor | |
2640 command loop -- i.e. the event loop that cyclically retrieves and | |
2641 dispatches events. This code is also rather tricky, just like | |
2642 @file{event-stream.c}. | |
2643 | |
2644 | |
2645 | |
2646 @example | |
2647 9973 macros.c | |
2648 1397 macros.h | |
2649 @end example | |
2650 | |
2651 These two modules contain the basic code for defining keyboard macros. | |
2652 These functions don't actually do much; most of the code that handles keyboard | |
2653 macros is mixed in with the event-handling code in @file{event-stream.c}. | |
2654 | |
2655 | |
2656 | |
2657 @example | |
2658 23234 minibuf.c | |
2659 @end example | |
2660 | |
2661 This contains some miscellaneous code related to the minibuffer (most of | |
2662 the minibuffer code was moved into Lisp by Richard Mlynarik). This | |
2663 includes the primitives for completion (although filename completion is | |
2664 in @file{dired.c}), the lowest-level interface to the minibuffer (if the | |
2665 command loop were cleaned up, this too could be in Lisp), and code for | |
2666 dealing with the echo area (this, too, was mostly moved into Lisp, and | |
2667 the only code remaining is code to call out to Lisp or provide simple | |
2668 bootstrapping implementations early in temacs, before the echo-area Lisp | |
2669 code is loaded). | |
2670 | |
2671 | |
2672 | |
2673 @node Modules for the Basic Displayable Lisp Objects | |
2674 @section Modules for the Basic Displayable Lisp Objects | |
2675 | |
2676 @example | |
2677 size name | |
2678 ------- --------------------- | |
2679 985 device-ns.h | |
2680 6454 device-stream.c | |
2681 1196 device-stream.h | |
2682 9526 device-tty.c | |
2683 8660 device-tty.h | |
2684 43798 device-x.c | |
2685 11667 device-x.h | |
2686 26056 device.c | |
2687 22993 device.h | |
2688 @end example | |
2689 | |
2690 These modules implement the device Lisp object type. This abstracts a | |
2691 particular screen or connection on which frames are displayed. As with | |
2692 Lisp objects, event interfaces, and other subsystems, the device code is | |
2693 separated into a generic component, which defines a standardized | |
2694 interface (in the form of a set of methods), and device-type-specific | |
2695 components that implement that interface for particular device types. | |
2696 | |
2697 The device subsystem defines all the methods and provides method | |
2698 services for not only device operations but also for the frame, window, | |
2699 menubar, scrollbar, toolbar, and other displayable-object subsystems. | |
2700 The reason for this is that all of these subsystems have the same | |
2701 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do. | |
2702 | |
2703 | |
2704 | |
2705 @example | |
2706 934 frame-ns.h | |
2707 2303 frame-tty.c | |
2708 69205 frame-x.c | |
2709 5976 frame-x.h | |
2710 68175 frame.c | |
2711 15080 frame.h | |
2712 @end example | |
2713 | |
2714 Each device contains one or more frames in which objects (e.g. text) are | |
2715 displayed. A frame corresponds to a window in the window system; | |
2716 usually this is a top-level window but it could potentially be one of a | |
2717 number of overlapping child windows within a top-level window, using the | |
2718 MDI (Multiple Document Interface) protocol in Microsoft Windows or a | |
2719 similar scheme. | |
2720 | |
2721 The @file{frame-*} files implement the frame Lisp object type and provide the | |
2722 generic and device-type-specific operations on frames (e.g. raising, | |
2723 lowering, resizing, moving, etc.). | |
2724 | |
2725 | |
2726 | |
2727 @example | |
2728 160783 window.c | |
2729 15974 window.h | |
2730 @end example | |
2731 | |
2732 @cindex window (in Emacs) | |
2733 @cindex pane | |
2734 Each frame consists of one or more non-overlapping @dfn{windows} (better | |
2735 known as @dfn{panes} in standard window-system terminology) in which a | |
2736 buffer's text can be displayed. Windows can also have scrollbars | |
2737 displayed around their edges. | |
2738 | |
2739 @file{window.c} and @file{window.h} implement the window Lisp object | |
2740 type and provide code to manage windows. Since windows have no | |
2741 associated resources in the window system (the window system knows only | |
2742 about the frame; no child windows or anything are used for XEmacs | |
2743 windows), there is no device-type-specific code here; all of that code | |
2744 is part of the redisplay mechanism or the code for particular object | |
2745 types such as scrollbars. | |
2746 | |
2747 | |
2748 | |
2749 @node Modules for other Display-Related Lisp Objects | |
2750 @section Modules for other Display-Related Lisp Objects | |
2751 | |
2752 @example | |
2753 size name | |
2754 ------- --------------------- | |
2755 54397 faces.c | |
2756 15173 faces.h | |
2757 @end example | |
2758 | |
2759 | |
2760 | |
2761 @example | |
2762 4961 bitmaps.h | |
2763 954 glyphs-ns.h | |
2764 105345 glyphs-x.c | |
2765 4288 glyphs-x.h | |
2766 72102 glyphs.c | |
2767 16356 glyphs.h | |
2768 @end example | |
2769 | |
2770 | |
2771 | |
2772 @example | |
2773 952 objects-ns.h | |
2774 9971 objects-tty.c | |
2775 1465 objects-tty.h | |
2776 32326 objects-x.c | |
2777 2806 objects-x.h | |
2778 31944 objects.c | |
2779 6809 objects.h | |
2780 @end example | |
2781 | |
2782 | |
2783 | |
2784 @example | |
2785 57511 menubar-x.c | |
2786 11243 menubar.c | |
2787 @end example | |
2788 | |
2789 | |
2790 | |
2791 @example | |
2792 25012 scrollbar-x.c | |
2793 2554 scrollbar-x.h | |
2794 26954 scrollbar.c | |
2795 2778 scrollbar.h | |
2796 @end example | |
2797 | |
2798 | |
2799 | |
2800 @example | |
2801 23117 toolbar-x.c | |
2802 43456 toolbar.c | |
2803 4280 toolbar.h | |
2804 @end example | |
2805 | |
2806 | |
2807 | |
2808 @example | |
2809 25070 font-lock.c | |
2810 @end example | |
2811 | |
2812 This file provides C support for syntax highlighting -- i.e. | |
2813 highlighting different syntactic constructs of a source file in | |
2814 different colors, for easy reading. The C support is provided so that | |
2815 this is fast. | |
2816 | |
2817 | |
2818 | |
2819 @example | |
2820 32180 dgif_lib.c | |
2821 3999 gif_err.c | |
2822 10697 gif_lib.h | |
2823 9371 gifalloc.c | |
2824 @end example | |
2825 | |
2826 These modules decode GIF-format image files, for use with glyphs. | |
2827 | |
2828 | |
2829 | |
2830 @node Modules for the Redisplay Mechanism | |
2831 @section Modules for the Redisplay Mechanism | |
2832 | |
2833 @example | |
2834 size name | |
2835 ------- --------------------- | |
2836 38692 redisplay-output.c | |
2837 40835 redisplay-tty.c | |
2838 65069 redisplay-x.c | |
2839 234142 redisplay.c | |
2840 17026 redisplay.h | |
2841 @end example | |
2842 | |
2843 These files provide the redisplay mechanism. As with many other | |
2844 subsystems in XEmacs, there is a clean separation between the general | |
2845 and device-specific support. | |
2846 | |
2847 @file{redisplay.c} contains the bulk of the redisplay engine. These | |
2848 functions update the redisplay structures (which describe how the screen | |
2849 is to appear) to reflect any changes made to the state of any | |
2850 displayable objects (buffer, frame, window, etc.) since the last time | |
2851 that redisplay was called. These functions are highly optimized to | |
2852 avoid doing more work than necessary (since redisplay is called | |
2853 extremely often and is potentially a huge time sink), and depend heavily | |
2854 on notifications from the objects themselves that changes have occurred, | |
2855 so that redisplay doesn't explicitly have to check each possible object. | |
2856 The redisplay mechanism also contains a great deal of caching to further | |
2857 speed things up; some of this caching is contained within the various | |
2858 displayable objects. | |
2859 | |
2860 @file{redisplay-output.c} goes through the redisplay structures and converts | |
2861 them into calls to device-specific methods to actually output the screen | |
2862 changes. | |
2863 | |
2864 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations | |
2865 of these redisplay output methods, for X frames and TTY frames, | |
2866 respectively. | |
2867 | |
2868 | |
2869 | |
2870 @example | |
2871 14129 indent.c | |
2872 @end example | |
2873 | |
2874 This module contains various functions and Lisp primitives for | |
2875 converting between buffer positions and screen positions. These | |
2876 functions call the redisplay mechanism to do most of the work, and then | |
2877 examine the redisplay structures to get the necessary information. This | |
2878 module needs work. | |
2879 | |
2880 | |
2881 | |
2882 @example | |
2883 14754 termcap.c | |
2884 2141 terminfo.c | |
2885 7253 tparam.c | |
2886 @end example | |
2887 | |
2888 These files contain functions for working with the termcap (BSD-style) | |
2889 and terminfo (System V style) databases of terminal capabilities and | |
2890 escape sequences, used when XEmacs is displaying in a TTY. | |
2891 | |
2892 | |
2893 | |
2894 @example | |
2895 10869 cm.c | |
2896 5876 cm.h | |
2897 @end example | |
2898 | |
2899 These files provide some miscellaneous TTY-output functions and should | |
2900 probably be merged into @file{redisplay-tty.c}. | |
2901 | |
2902 | |
2903 | |
2904 @node Modules for Interfacing with the File System | |
2905 @section Modules for Interfacing with the File System | |
2906 | |
2907 @example | |
2908 size name | |
2909 ------- --------------------- | |
2910 43362 lstream.c | |
2911 14240 lstream.h | |
2912 @end example | |
2913 | |
2914 These modules implement the stream Lisp object type. This is an | |
2915 internal-only Lisp object that implements a generic buffering stream. | |
2916 The idea is to provide a uniform interface onto all sources and sinks of | |
2917 data, including file descriptors, stdio streams, chunks of memory, Lisp | |
2918 buffers, Lisp strings, etc. That way, I/O functions can be written to | |
2919 the stream interface and can transparently handle all possible sources | |
2920 and sinks. (For example, the @code{read} function can read data from a | |
2921 file, a string, a buffer, or even a function that is called repeatedly | |
2922 to return data, without worrying about where the data is coming from or | |
2923 what-size chunks it is returned in.) | |
2924 | |
2925 @cindex lstream | |
2926 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp | |
2927 streams'') to distinguish them from other kinds of streams, e.g. stdio | |
2928 streams and C++ I/O streams. | |
2929 | |
2930 Similar to other subsystems in XEmacs, lstreams are separated into | |
2931 generic functions and a set of methods for the different types of | |
2932 lstreams. @file{lstream.c} provides implementations of many different | |
2933 types of streams; others are provided, e.g., in @file{mule-coding.c}. | |
2934 | |
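The split between generic stream functions and per-type methods might
look roughly like the sketch below. It is only an illustration under
simplifying assumptions (read-only streams, a single @code{reader}
method, no buffering); the names are hypothetical and are not the
lstream API.

```c
#include <stddef.h>
#include <string.h>

struct toy_stream
{
  /* Method: copy up to SIZE bytes into BUF and return the number of
     bytes copied; 0 means end of data.  */
  size_t (*reader) (struct toy_stream *s, char *buf, size_t size);
  const char *data;   /* backing memory for both stream types */
  size_t len, pos;
};

/* Stream type 1: hands out as much of a memory chunk as requested.  */
static size_t
memory_reader (struct toy_stream *s, char *buf, size_t size)
{
  size_t avail = s->len - s->pos;
  size_t n = size < avail ? size : avail;

  memcpy (buf, s->data + s->pos, n);
  s->pos += n;
  return n;
}

/* Stream type 2: returns at most one byte per call, mimicking a
   source that delivers data in small chunks.  */
static size_t
trickle_reader (struct toy_stream *s, char *buf, size_t size)
{
  return memory_reader (s, buf, size > 1 ? 1 : size);
}

void
toy_stream_init (struct toy_stream *s, const char *data, int trickle)
{
  s->reader = trickle ? trickle_reader : memory_reader;
  s->data = data;
  s->len = strlen (data);
  s->pos = 0;
}

/* Generic function: keep calling the method until SIZE bytes have
   been read or the source is exhausted.  The caller never sees what
   chunk sizes the underlying source produced.  */
size_t
toy_stream_read (struct toy_stream *s, char *buf, size_t size)
{
  size_t total = 0;

  while (total < size)
    {
      size_t n = s->reader (s, buf + total, size - total);
      if (n == 0)
        break;
      total += n;
    }
  return total;
}
```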
2935 | |
2936 | |
2937 @example | |
2938 126926 fileio.c | |
2939 @end example | |
2940 | |
2941 This implements the basic primitives for interfacing with the file | |
2942 system. This includes primitives for reading files into buffers, | |
2943 writing buffers into files, checking for the presence or accessibility | |
2944 of files, canonicalizing file names, etc. Note that these primitives | |
2945 are usually not invoked directly by the user: There is a great deal of | |
2946 higher-level Lisp code that implements the user commands such as | |
2947 @code{find-file} and @code{save-buffer}. This is similar to the | |
2948 distinction between the lower-level primitives in @file{editfns.c} and | |
2949 the higher-level user commands in @file{commands.c} and | |
2950 @file{simple.el}. | |
2951 | |
2952 | |
2953 | |
2954 @example | |
2955 10960 filelock.c | |
2956 @end example | |
2957 | |
2958 This file provides functions for detecting clashes between different | |
2959 processes (e.g. XEmacs and some external process, or two different | |
2960 XEmacs processes) modifying the same file. (XEmacs can optionally use | |
2961 the @file{lock/} subdirectory to provide a form of ``locking'' between | |
2962 different XEmacs processes.) This module is also used by the low-level | |
2963 functions in @file{insdel.c} to ensure that, if the first modification | |
2964 is being made to a buffer whose corresponding file has been externally | |
2965 modified, the user is made aware of this so that the buffer can be | |
2966 synched up with the external changes if necessary. | |
2967 | |
2968 | |
2969 @example | |
2970 4527 filemode.c | |
2971 @end example | |
2972 | |
2973 This file provides some miscellaneous functions that construct a | |
2974 @samp{rwxr-xr-x}-type permissions string (as might appear in an | |
2975 @file{ls}-style directory listing) given the information returned by the | |
2976 @code{stat()} system call. | |
2977 | |
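A minimal sketch of building such a string from the permission bits of
a @code{st_mode} value follows. It is not the code in
@file{filemode.c}: the function name is hypothetical, and the leading
file-type character and the setuid, setgid, and sticky bits are left
out.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Fill OUT (which must hold at least 10 bytes) with a string such as
   "rwxr-xr-x" describing the permission bits of MODE.  */
void
toy_mode_string (mode_t mode, char out[10])
{
  static const mode_t bits[9] = {
    S_IRUSR, S_IWUSR, S_IXUSR,
    S_IRGRP, S_IWGRP, S_IXGRP,
    S_IROTH, S_IWOTH, S_IXOTH
  };
  static const char letters[] = "rwxrwxrwx";
  int i;

  /* One character per permission bit: the letter if set, else '-'.  */
  for (i = 0; i < 9; i++)
    out[i] = (mode & bits[i]) ? letters[i] : '-';
  out[9] = '\0';
}
```

In practice the @code{mode} argument would be the @code{st_mode} field
filled in by @code{stat()}.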
2978 | |
2979 | |
2980 @example | |
2981 22855 dired.c | |
2982 2094 ndir.h | |
2983 @end example | |
2984 | |
2985 These files implement the XEmacs interface to directory searching. This | |
2986 includes a number of primitives for determining the files in a directory | |
2987 and for doing filename completion. (Remember that generic completion is | |
2988 handled by a different mechanism, in @file{minibuf.c}.) | |
2989 | |
2990 @file{ndir.h} is a header file used for the directory-searching | |
2991 emulation functions provided in @file{sysdep.c} | |
2992 (@pxref{Modules for Interfacing with the Operating System}), for | |
2993 systems that don't provide any directory-searching functions. (On those systems, directories can be read directly as files, and parsed.) | |
2994 | |
2995 | |
2996 | |
2997 @example | |
2998 4311 realpath.c | |
2999 @end example | |
3000 | |
3001 This file provides an implementation of the @code{realpath()} function | |
3002 for expanding symbolic links, on systems that don't implement it or have | |
3003 a broken implementation. | |
3004 | |
3005 | |
3006 | |
3007 @node Modules for Other Aspects of the Lisp Interpreter and Object System | |
3008 @section Modules for Other Aspects of the Lisp Interpreter and Object System | |
3009 | |
3010 @example | |
3011 size name | |
3012 ------- --------------------- | |
3013 22290 elhash.c | |
3014 2454 elhash.h | |
3015 12169 hash.c | |
3016 3369 hash.h | |
3017 @end example | |
3018 | |
3019 These files implement the hashtable Lisp object type. @file{hash.c} and | |
3020 @file{hash.h} provide a generic C implementation of hash tables (which | |
3021 can stand independently of XEmacs), and @file{elhash.c} and | |
3022 @file{elhash.h} provide a Lisp interface onto the C hash tables using | |
3023 the hashtable Lisp object type. | |
3024 | |
3025 | |
3026 | |
3027 @example | |
3028 95691 specifier.c | |
3029 11167 specifier.h | |
3030 @end example | |
3031 | |
3032 This module implements the specifier Lisp object type. This is | |
3033 primarily used for displayable properties, and allows for values that | |
3034 are specific to a particular buffer, window, frame, device, or device | |
3035 class, as well as for a default value. This is used, for example, | |
3036 to control the height of the horizontal scrollbar or the appearance of | |
3037 the @code{default}, @code{bold}, or other faces. The specifier object | |
3038 consists of a number of specifications, each of which maps from a | |
3039 buffer, window, etc. to a value. The function @code{specifier-instance} | |
3040 looks up a value given a window (from which a buffer, frame, and device | |
3041 can be derived). | |
3042 | |
3043 | |
3044 @example | |
3045 43058 chartab.c | |
3046 6503 chartab.h | |
3047 9918 casetab.c | |
3048 @end example | |
3049 | |
3050 @file{chartab.c} and @file{chartab.h} implement the char table Lisp | |
3051 object type, which maps from characters or certain sorts of character | |
3052 ranges to Lisp objects. The implementation of this object is optimized | |
3053 for the internal representation of characters. Char tables come in | |
3054 different types, which affect the allowed object types to which a | |
3055 character can be mapped and also dictate certain other properties of the | |
3056 char table. | |
3057 | |
3058 @cindex case table | |
3059 @file{casetab.c} implements one sort of char table, the @dfn{case | |
3060 table}, which maps characters to other characters of possibly different | |
3061 case. These are used by XEmacs to implement case-changing primitives | |
3062 and to do case-insensitive searching. | |
3063 | |
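To make the use of a case table concrete, here is a small sketch of a
case-insensitive comparison driven by a lookup table. It assumes
single-byte ASCII characters and a plain C array, which is far simpler
than a real char table; the names are hypothetical.

```c
/* Maps every byte to its lower-case equivalent (ASCII letters only;
   all other bytes map to themselves).  */
static unsigned char toy_downcase_table[256];

void
toy_init_case_table (void)
{
  int c;

  for (c = 0; c < 256; c++)
    toy_downcase_table[c] = (unsigned char) c;
  for (c = 'A'; c <= 'Z'; c++)
    toy_downcase_table[c] = (unsigned char) (c - 'A' + 'a');
}

/* Return 1 if A and B are equal once both are mapped through the case
   table, 0 otherwise.  toy_init_case_table() must be called first.  */
int
toy_equal_ignoring_case (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (toy_downcase_table[(unsigned char) *a]
          != toy_downcase_table[(unsigned char) *b])
        return 0;
      a++;
      b++;
    }
  /* Equal only if both strings ended at the same time.  */
  return *a == *b;
}
```

Because the mapping is data rather than code, swapping in a different
table changes the notion of case without touching the comparison loop.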
3064 | |
3065 | |
3066 @example | |
3067 49593 syntax.c | |
3068 10200 syntax.h | |
3069 @end example | |
3070 | |
3071 @cindex scanner | |
3072 This module implements syntax tables, another sort of char table that | |
3073 maps characters into syntax classes that define the syntax of these | |
3074 characters (e.g. a parenthesis belongs to a class of @samp{open} characters | |
3075 that have corresponding @samp{close} characters and can be nested). | |
3076 This module also implements the Lisp @dfn{scanner}, a set of primitives | |
3077 for scanning over text based on syntax tables. This is used, for | |
3078 example, to find the matching parenthesis in a command such as | |
3079 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings, | |
3080 comments, etc. | |
3081 | |
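The kind of scan used to find a matching parenthesis can be sketched as
follows. This is a toy version under strong assumptions: the ``syntax
table'' is a hard-coded @code{switch}, only open and close classes
exist, strings and comments are not recognized, and the scan does not
check that the bracket kinds agree. All names are made up.

```c
#include <stddef.h>

enum toy_syntax { TOY_OTHER, TOY_OPEN, TOY_CLOSE };

/* A tiny stand-in for a syntax table: character -> syntax class.  */
static enum toy_syntax
toy_syntax_class (char c)
{
  switch (c)
    {
    case '(': case '[':
      return TOY_OPEN;
    case ')': case ']':
      return TOY_CLOSE;
    default:
      return TOY_OTHER;
    }
}

/* TEXT[START] should be an open character.  Return the index of the
   close character that balances it, or -1 if the text ends first or
   an unbalanced close character is met.  */
long
toy_scan_forward (const char *text, size_t len, size_t start)
{
  size_t i;
  long depth = 0;

  for (i = start; i < len; i++)
    {
      enum toy_syntax cls = toy_syntax_class (text[i]);

      if (cls == TOY_OPEN)
        depth++;
      else if (cls == TOY_CLOSE)
        {
          depth--;
          if (depth == 0)
            return (long) i;
          if (depth < 0)
            return -1;
        }
    }
  return -1;
}
```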
3082 | |
3083 | |
3084 @example | |
3085 10438 casefiddle.c | |
3086 @end example | |
3087 | |
3088 This module implements various Lisp primitives for upcasing, downcasing | |
3089 and capitalizing strings or regions of buffers. | |
3090 | |
3091 | |
3092 | |
3093 @example | |
3094 20234 rangetab.c | |
3095 @end example | |
3096 | |
3097 This module implements the range table Lisp object type, which provides | |
3098 for a mapping from ranges of integers to arbitrary Lisp objects. | |
3099 | |
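One plausible way to represent such a mapping is a sorted array of
non-overlapping ranges searched by bisection. The sketch below assumes
exactly that layout, with inclusive bounds and C strings standing in
for Lisp objects; it is not taken from @file{rangetab.c}, and the names
are hypothetical.

```c
#include <stddef.h>

struct toy_range
{
  long first, last;     /* inclusive bounds */
  const char *value;    /* stands in for an arbitrary Lisp object */
};

/* TABLE must be sorted by FIRST and its ranges must not overlap.
   Return the value for the range containing KEY, or NULL if no range
   contains it.  */
const char *
toy_range_lookup (const struct toy_range *table, size_t count, long key)
{
  size_t lo = 0, hi = count;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (key < table[mid].first)
        hi = mid;
      else if (key > table[mid].last)
        lo = mid + 1;
      else
        return table[mid].value;
    }
  return NULL;
}
```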
3100 | |
3101 | |
3102 @example | |
3103 3201 opaque.c | |
3104 2206 opaque.h | |
3105 @end example | |
3106 | |
3107 This module implements the opaque Lisp object type, an internal-only | |
3108 Lisp object that encapsulates an arbitrary block of memory so that it | |
3109 can be managed by the Lisp allocation system. To create an opaque | |
3110 object, you call @code{make_opaque()}, passing a pointer to a block of | |
3111 memory. An object is created that is big enough to hold the memory, | |
3112 which is copied into the object's storage. The object will then stick | |
3113 around as long as you keep pointers to it, after which it will be | |
3114 automatically reclaimed. | |
3115 | |
3116 @cindex mark method | |
3117 Opaque objects can also have an arbitrary @dfn{mark method} associated | |
3118 with them, in case the block of memory contains other Lisp objects that | |
3119 need to be marked for garbage-collection purposes. (If you need other | |
3120 object methods, such as a finalize method, you should just go ahead and | |
3121 create a new Lisp object type -- it's not hard.) | |
3122 | |
3123 | |
3124 | |
3125 @example | |
3126 8783 abbrev.c | |
3127 @end example | |
3128 | |
3129 This module provides a few primitives for doing dynamic abbreviation | |
3130 expansion. In XEmacs, most of the code for this has been moved into | |
3131 Lisp. Some C code remains for speed and because the primitive | |
3132 @code{self-insert-command} (which is executed for all self-inserting | |
3133 characters) hooks into the abbrev mechanism. (@code{self-insert-command} | |
3134 is itself in C only for speed.) | |
3135 | |
3136 | |
3137 | |
3138 @example | |
3139 21934 doc.c | |
3140 @end example | |
3141 | |
3142 This module provides primitives for retrieving the documentation | |
3143 strings of functions and variables. These documentation strings contain | |
3144 certain special markers that get dynamically expanded (e.g. a | |
3145 reverse-lookup is performed on some named functions to retrieve their | |
3146 current key bindings). Some documentation strings (in particular, for | |
3147 the built-in primitives and pre-loaded Lisp functions) are stored | |
3148 externally in a file @file{DOC} in the @file{lib-src/} directory and | |
3149 need to be fetched from that file. (Part of the build stage involves | |
3150 building this file, and another part involves constructing an index for | |
3151 this file and embedding it into the executable, so that the functions in | |
3152 @file{doc.c} do not have to search the entire @file{DOC} file to find | |
3153 the appropriate documentation string.) | |
3154 | |
3155 | |
3156 | |
3157 @example | |
3158 13197 md5.c | |
3159 @end example | |
3160 | |
3161 This module provides a Lisp primitive that implements the MD5 secure | |
3162 hashing scheme, used to create a large hash value of a string of data such that | |
3163 the data cannot be derived from the hash value. This is used for | |
3164 various security applications on the Internet. | |
3165 | |
3166 | |
3167 | |
3168 @example | |
3169 7000 mocklisp.c | |
3170 @end example | |
3171 | |
3172 This module provides some emulation of MockLisp, a version of Lisp | |
3173 provided in Gosling Emacs (aka Unipress Emacs), from which some old | |
3174 versions of GNU Emacs were derived. You have to explicitly enable this | |
3175 code with a configure option and shouldn't normally, because it changes | |
3176 the semantics of XEmacs Lisp in ways that are not desirable for normal | |
3177 Lisp programs. | |
3178 | |
3179 | |
3180 | |
3181 @node Modules for Interfacing with the Operating System | |
3182 @section Modules for Interfacing with the Operating System | |
3183 | |
3184 @example | |
3185 size name | |
3186 ------- --------------------- | |
3187 33533 callproc.c | |
3188 89697 process.c | |
3189 4663 process.h | |
3190 @end example | |
3191 | |
3192 These modules allow XEmacs to spawn and communicate with subprocesses | |
3193 and network connections. | |
3194 | |
3195 @cindex synchronous subprocesses | |
3196 @cindex subprocesses, synchronous | |
3197 @file{callproc.c} implements (through the @code{call-process} | |
3198 primitive) what are called @dfn{synchronous subprocesses}. This means | |
3199 that XEmacs runs a program, waits till it's done, and retrieves its | |
3200 output. A typical example might be calling the @file{ls} program to get | |
3201 a directory listing. | |
3202 | |
3203 @cindex asynchronous subprocesses | |
3204 @cindex subprocesses, asynchronous | |
3205 @file{process.c} and @file{process.h} implement @dfn{asynchronous | |
3206 subprocesses}. This means that XEmacs starts a program and then | |
3207 continues normally, not waiting for the process to finish. Data can be | |
3208 sent to the process or retrieved from it as it's running. This is used | |
3209 for the @code{shell} command (which provides a front end onto a shell | |
3210 program such as @file{csh}), the mail and news readers implemented in | |
3211 XEmacs, etc. The result of calling @code{start-process} to start a | |
3212 subprocess is a process object, a particular kind of object used to | |
3213 communicate with the subprocess. You can send data to the process by | |
3214 passing the process object and the data to @code{send-process}, and you | |
3215 can specify what happens to data retrieved from the process by setting | |
3216 properties of the process object. (When the process sends data, XEmacs | |
3217 receives a process event, which says that there is data ready. When | |
3218 @code{dispatch-event} is called on this event, it reads the data from | |
3219 the process and does something with it, as specified by the process | |
3220 object's properties. Typically, this means inserting the data into a | |
3221 buffer or calling a function.) Another property of the process object is | |
3222 called the @dfn{sentinel}, which is a function that is called when the | |
3223 process terminates. | |
3224 | |
3225 @cindex network connections | |
3226 Process objects are also used for network connections (connections to a | |
3227 process running on another machine). Network connections are started | |
3228 with @code{open-network-stream} but otherwise work just like | |
3229 subprocesses. | |
3230 | |
3231 | |
3232 | |
3233 @example | |
3234 136029 sysdep.c | |
3235 5986 sysdep.h | |
3236 @end example | |
3237 | |
3238 These modules implement most of the low-level, messy operating-system | |
3239 interface code. This includes various device control (ioctl) operations | |
3240 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff | |
3241 is fairly system-dependent; thus the name of this module), and emulation | |
3242 of standard library functions and system calls on systems that don't | |
3243 provide them or have broken versions. | |
3244 | |
3245 | |
3246 | |
3247 @example | |
3248 3605 sysdir.h | |
3249 6708 sysfile.h | |
3250 2027 sysfloat.h | |
3251 2918 sysproc.h | |
3252 745 syspwd.h | |
3253 7643 syssignal.h | |
3254 6892 systime.h | |
3255 12477 systty.h | |
3256 3487 syswait.h | |
3257 @end example | |
3258 | |
3259 These header files provide consistent interfaces onto system-dependent | |
3260 header files and system calls. The idea is that, instead of including a | |
3261 standard header file like @file{<sys/param.h>} (which may or may not | |
3262 exist on various systems) or having to worry about whether all systems | |
3263 provide a particular preprocessor constant, or having to deal with the | |
3264 four different paradigms for manipulating signals, you just include the | |
3265 appropriate @file{sys*.h} header file, which includes all the right | |
3266 system header files, defines any missing preprocessor constants, | |
3267 provides a uniform interface onto system calls, etc. | |
3268 | |
3269 @file{sysdir.h} provides a uniform interface onto directory-querying | |
3270 functions. (In some cases, this is in conjunction with emulation | |
3271 functions in @file{sysdep.c}.) | |
3272 | |
3273 @file{sysfile.h} includes all the necessary header files for standard | |
3274 system calls (e.g. @code{read()}), ensures that all necessary | |
3275 @code{open()} and @code{stat()} preprocessor constants are defined, and | |
3276 possibly (usually) substitutes sugared versions of @code{read()}, | |
3277 @code{write()}, etc. that automatically restart interrupted I/O | |
3278 operations. | |
3279 | |
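A wrapper that restarts an interrupted @code{read()} can be as small as
the sketch below. The name @code{toy_read_retry} is hypothetical and
the sketch assumes a POSIX system; it only illustrates the retry on
@code{EINTR} and is not the macro or function that @file{sysfile.h}
actually substitutes.

```c
#include <errno.h>
#include <unistd.h>

/* Like read(), but try again whenever the call fails because a signal
   interrupted it before any data was transferred.  */
ssize_t
toy_read_retry (int fd, void *buf, size_t count)
{
  ssize_t n;

  do
    n = read (fd, buf, count);
  while (n == -1 && errno == EINTR);
  return n;
}
```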
3280 @file{sysfloat.h} includes the necessary header files for floating-point | |
3281 operations. | |
3282 | |
3283 @file{sysproc.h} includes the necessary header files for calling | |
3284 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and | |
3285 the like, and ensures that the @code{FD_*()} macros for descriptor-set | |
3286 manipulations are available. | |
3287 | |
3288 @file{syspwd.h} includes the necessary header files for obtaining | |
3289 information from @file{/etc/passwd} (the functions are emulated under | |
3290 VMS). | |
3291 | |
3292 @file{syssignal.h} includes the necessary header files for | |
3293 signal-handling and provides a uniform interface onto the different | |
3294 signal-handling and signal-blocking paradigms. | |
3295 | |
3296 @file{systime.h} includes the necessary header files and provides | |
3297 uniform interfaces for retrieving the time of day, setting file | |
3298 access/modification times, getting the amount of time used by the XEmacs | |
3299 process, etc. | |
3300 | |
3301 @file{systty.h} buffers against the infinitude of different ways of | |
3302 controlling TTY's. | |
3303 | |
3304 @file{syswait.h} provides a uniform way of retrieving the exit status | |
3305 from a @code{wait()}ed-on process (some systems use a union, others use | |
3306 an int). | |
3307 | |
3308 | |
3309 | |
3310 @example | |
3311 7940 hpplay.c | |
3312 10920 libsst.c | |
3313 1480 libsst.h | |
3314 3260 libst.h | |
3315 15355 linuxplay.c | |
3316 15849 nas.c | |
3317 19133 sgiplay.c | |
3318 15411 sound.c | |
3319 7358 sunplay.c | |
3320 @end example | |
3321 | |
3322 These files implement the ability to play various sounds on some types | |
3323 of computers. You have to configure your XEmacs with sound support in | |
3324 order to get this capability. | |
3325 | |
3326 @file{sound.c} provides the generic interface. It implements various | |
3327 Lisp primitives and variables that let you specify which sounds should | |
3328 be played in certain conditions. (The conditions are identified by | |
3329 symbols, which are passed to @code{ding} to make a sound. Various | |
3330 standard functions call this function at certain times; if sound support | |
3331 does not exist, a simple beep results.) | |
3332 | |
3333 @cindex native sound | |
3334 @cindex sound, native | |
3335 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and | |
3336 @file{linuxplay.c} interface to the machine's speaker for various | |
3337 different kinds of machines. This is called @dfn{native} sound. | |
3338 | |
3339 @cindex sound, network | |
3340 @cindex network sound | |
3341 @cindex NAS | |
3342 @file{nas.c} interfaces to a computer somewhere else on the network | |
3343 using the NAS (Network Audio Server) protocol, playing sounds on that | |
3344 machine. This allows you to run XEmacs on a remote machine, with its | |
3345 display set to your local machine, and have the sounds be made on your | |
3346 local machine, provided that you have a NAS server running on your local | |
3347 machine. | |
3348 | |
3349 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some | |
3350 additional functions for playing sound on a Sun SPARC but are not | |
3351 currently in use. | |
3352 | |
3353 | |
3354 | |
3355 @example | |
3356 44368 tooltalk.c | |
3357 2137 tooltalk.h | |
3358 @end example | |
3359 | |
3360 These two modules implement an interface to the ToolTalk protocol, which | |
3361 is an interprocess communication protocol implemented on some versions | |
3362 of Unix. ToolTalk is a high-level protocol that allows processes to | |
3363 register themselves as providers of particular services; other processes | |
3364 can then request a service without knowing or caring exactly who is | |
3365 providing the service. It is similar in spirit to the DDE protocol | |
3366 provided under Microsoft Windows. ToolTalk is a part of the new CDE | |
3367 (Common Desktop Environment) specification and is used to connect the | |
3368 parts of the SPARCWorks development environment. | |
3369 | |
3370 | |
3371 | |
3372 @example | |
3373 22695 getloadavg.c | |
3374 @end example | |
3375 | |
3376 This module provides the ability to retrieve the system's current load | |
3377 average. (The way to do this is highly system-specific, unfortunately, | |
3378 and requires a lot of special-case code.) | |
3379 | |
3380 | |
3381 | |
3382 @example | |
3383 148520 energize.c | |
3384 6896 energize.h | |
3385 @end example | |
3386 | |
3387 This module provides code to interface to an Energize server (when | |
3388 XEmacs is used as part of Lucid's Energize development environment) and | |
3389 provides some other Energize-specific functions. Much of the code in | |
3390 this module should be made more general-purpose and moved elsewhere, but | |
3391 is no longer very relevant now that Lucid is defunct. It also hasn't | |
3392 worked since version 19.12, since nobody has been maintaining it. | |
3393 | |
3394 | |
3395 | |
3396 @example | |
3397 2861 sunpro.c | |
3398 @end example | |
3399 | |
3400 This module provides a small amount of code used internally at Sun to | |
3401 keep statistics on the usage of XEmacs. | |
3402 | |
3403 | |
3404 | |
3405 @example | |
3406 5548 broken-sun.h | |
3407 3468 strcmp.c | |
3408 2179 strcpy.c | |
3409 1650 sunOS-fix.c | |
3410 @end example | |
3411 | |
3412 These files provide replacement functions and prototypes to fix numerous | |
3413 bugs in early releases of SunOS 4.1. | |
3414 | |
3415 | |
3416 | |
3417 @example | |
3418 11669 hftctl.c | |
3419 @end example | |
3420 | |
3421 This module provides some terminal-control code necessary on versions of | |
3422 AIX prior to 4.1. | |
3423 | |
3424 | |
3425 | |
3426 @example | |
3427 1776 acldef.h | |
3428 1602 chpdef.h | |
3429 9032 uaf.h | |
3430 105 vlimit.h | |
3431 7145 vms-pp.c | |
3432 1158 vms-pwd.h | |
3433 26532 vmsfns.c | |
3434 6038 vmsmap.c | |
3435 695 vmspaths.h | |
3436 17482 vmsproc.c | |
3437 469 vmsproc.h | |
3438 @end example | |
3439 | |
3440 All of these files are used for VMS support, which has never worked in | |
3441 XEmacs. | |
3442 | |
3443 | |
3444 | |
3445 @example | |
3446 28316 msdos.c | |
3447 1472 msdos.h | |
3448 @end example | |
3449 | |
3450 These modules are used for MS-DOS support, which does not work in | |
3451 XEmacs. | |
3452 | |
3453 | |
3454 | |
3455 @node Modules for Interfacing with X Windows | |
3456 @section Modules for Interfacing with X Windows | |
3457 | |
3458 @example | |
3459 size name | |
3460 ------- --------------------- | |
3461 3196 Emacs.ad.h | |
3462 @end example | |
3463 | |
3464 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied | |
3465 fallback resources (so that XEmacs has pretty defaults). | |
3466 | |
3467 | |
3468 | |
3469 @example | |
3470 24242 EmacsFrame.c | |
3471 6979 EmacsFrame.h | |
3472 3351 EmacsFrameP.h | |
3473 @end example | |
3474 | |
3475 These modules implement an Xt widget class that encapsulates a frame. | |
3476 This is for ease in integrating with Xt. The EmacsFrame widget covers | |
3477 the entire X window except for the menubar; the scrollbars are | |
3478 positioned on top of the EmacsFrame widget. | |
3479 | |
3480 @strong{Warning:} Abandon hope, all ye who enter here. This code took | |
3481 an ungodly amount of time to get right, and is likely to fall apart | |
3482 mercilessly at the slightest change. Such is life under Xt. | |
3483 | |
3484 | |
3485 | |
3486 @example | |
3487 8178 EmacsManager.c | |
3488 1967 EmacsManager.h | |
3489 1895 EmacsManagerP.h | |
3490 @end example | |
3491 | |
3492 These modules implement a simple Xt manager (i.e. composite) widget | |
3493 class that simply lets its children set whatever geometry they want. | |
3494 It's amazing that Xt doesn't provide this standardly, but on second | |
3495 thought, it makes sense, considering how amazingly broken Xt is. | |
3496 | |
3497 | |
3498 @example | |
3499 13188 EmacsShell-sub.c | |
3500 4588 EmacsShell.c | |
3501 2180 EmacsShell.h | |
3502 3133 EmacsShellP.h | |
3503 @end example | |
3504 | |
3505 These modules implement two Xt widget classes that are subclasses of | |
3506 the TopLevelShell and TransientShell classes. This is necessary to deal | |
3507 with more brokenness that Xt has sadistically thrust onto the backs of | |
3508 developers. | |
3509 | |
3510 | |
3511 | |
3512 @example | |
3513 9673 xgccache.c | |
3514 1111 xgccache.h | |
3515 @end example | |
3516 | |
3517 These modules provide functions for maintenance and caching of GC's | |
3518 (graphics contexts) under the X Window System. This code is junky and | |
3519 needs to be rewritten. | |
3520 | |
3521 | |
3522 | |
3523 @example | |
3524 69181 xselect.c | |
3525 @end example | |
3526 | |
3527 @cindex selections | |
3528 This module provides an interface to the X Window System's concept of | |
3529 @dfn{selections}, the standard way for X applications to communicate | |
3530 with each other. | |
3531 | |
3532 | |
3533 | |
3534 @example | |
3535 929 xintrinsic.h | |
3536 1038 xintrinsicp.h | |
3537 1579 xmmanagerp.h | |
3538 1585 xmprimitivep.h | |
3539 @end example | |
3540 | |
3541 These header files are similar in spirit to the @file{sys*.h} files and buffer | |
3542 against different implementations of Xt and Motif. | |
3543 | |
3544 @itemize @bullet | |
3545 @item | |
3546 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}. | |
3547 @item | |
3548 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}. | |
3549 @item | |
3550 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}. | |
3551 @item | |
3552 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}. | |
3553 @end itemize | |
3554 | |
3555 | |
3556 | |
3557 @example | |
3558 16930 xmu.c | |
3559 936 xmu.h | |
3560 @end example | |
3561 | |
3562 These files provide an emulation of the Xmu library for those systems | |
3563 (e.g. HPUX) that don't provide it as a standard part of X. | |
3564 | |
3565 | |
3566 | |
3567 @example | |
3568 4201 ExternalClient-Xlib.c | |
3569 18083 ExternalClient.c | |
3570 2035 ExternalClient.h | |
3571 2104 ExternalClientP.h | |
3572 22684 ExternalShell.c | |
3573 1709 ExternalShell.h | |
3574 1971 ExternalShellP.h | |
3575 2478 extw-Xlib.c | |
3576 1481 extw-Xlib.h | |
3577 6565 extw-Xt.c | |
3578 1430 extw-Xt.h | |
3579 @end example | |
3580 | |
3581 @cindex external widget | |
3582 These files provide the @dfn{external widget} interface, which allows an | |
3583 XEmacs frame to appear as a widget in another application. To do this, | |
3584 you have to configure with @samp{--external-widget}. | |
3585 | |
3586 @file{ExternalShell*} provides the server (XEmacs) side of the | |
3587 connection. | |
3588 | |
3589 @file{ExternalClient*} provides the client (other application) side of | |
3590 the connection. These files are not compiled into XEmacs but are | |
3591 compiled into libraries that are then linked into your application. | |
3592 | |
3593 @file{extw-*} is common code that is used for both the client and server. | |
3594 | |
3595 Don't touch this code; something is liable to break if you do. | |
3596 | |
3597 | |
3598 | |
3599 @example | |
3600 31014 epoch.c | |
3601 @end example | |
3602 | |
3603 This file provides some additional, Epoch-compatible, functionality for | |
3604 interfacing to the X Window System. | |
3605 | |
3606 | |
3607 | |
3608 @node Modules for Internationalization | |
3609 @section Modules for Internationalization | |
3610 | |
3611 @example | |
3612 size name | |
3613 ------- --------------------- | |
3614 42836 mule-canna.c | |
3615 16737 mule-ccl.c | |
3616 41080 mule-charset.c | |
3617 30176 mule-charset.h | |
3618 146844 mule-coding.c | |
3619 16588 mule-coding.h | |
3620 6996 mule-mcpath.c | |
3621 2899 mule-mcpath.h | |
3622 57158 mule-wnnfns.c | |
3623 3351 mule.c | |
3624 @end example | |
3625 | |
3626 These files implement the MULE (Asian-language) support. Note that MULE | |
3627 actually provides a general interface for all sorts of languages, not | |
3628 just Asian languages (although they are generally the most complicated | |
3629 to support). This code is still in beta. | |
3630 | |
3631 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the | |
3632 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset} Lisp object, | |
3633 which encapsulates a character set (an ordered one- or two-dimensional | |
3634 set of characters, such as US ASCII or JISX0208 Japanese Kanji). | |
3635 @file{mule-coding.*} implements the coding-system Lisp object, which | |
3636 encapsulates a method of converting between different encodings. An | |
3637 encoding is a representation of a stream of characters from multiple | |
3638 character sets using a stream of bytes or words and defines (e.g.) which | |
3639 escape sequences are used to specify particular character sets, how the | |
3640 indices for a character are converted into bytes (sometimes this | |
3641 involves setting the high bit; sometimes complicated rearranging of the | |
3642 values takes place, as in the Shift-JIS encoding), etc. | |
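
The two byte-level conversions mentioned above can be illustrated with a
small sketch. This is not code from @file{mule-coding.c}; the function
names are hypothetical, and the input is assumed to be a JISX0208
character given as two 7-bit index bytes, each in the range 0x21 to 0x7E.
The first function only sets the high bit of each byte (as EUC-style
encodings do); the second performs the rearrangement used by Shift-JIS.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only.  J1 and J2 are the two
   7-bit index bytes of a JISX0208 character (0x21..0x7E each).  */

/* EUC style: each index byte simply gets its high bit set. */
void
toy_jisx0208_to_euc (unsigned char j1, unsigned char j2,
                     unsigned char out[2])
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  out[0] = (unsigned char) (j1 | 0x80);
  out[1] = (unsigned char) (j2 | 0x80);
}

/* Shift-JIS: two rows share one lead byte, so the row number is halved
   and the second byte says which of the two rows was meant.  */
void
toy_jisx0208_to_sjis (unsigned char j1, unsigned char j2,
                      unsigned char out[2])
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  out[0] = (unsigned char) (((j1 - 1) >> 1) + (j1 <= 0x5E ? 0x71 : 0xB1));
  if (j1 & 1)
    /* Odd row: trail bytes 0x40..0x9E, skipping 0x7F. */
    out[1] = (unsigned char) (j2 + (j2 < 0x60 ? 0x1F : 0x20));
  else
    /* Even row: trail bytes 0x9F..0xFC. */
    out[1] = (unsigned char) (j2 + 0x7E);
}
```

For the character with indices 0x30, 0x21, the first function yields the
bytes 0xB0 0xA1 and the second yields 0x88 0x9F.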
3643 | |
3644 @file{mule-ccl.c} provides the CCL (Code Conversion Language) | |
3645 interpreter. CCL is similar in spirit to Lisp byte code and is used to | |
3646 implement converters for custom encodings. | |
3647 | |
3648 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to | |
3649 external programs used to implement the Canna and WNN input methods, | |
3650 respectively. This is currently broken. | |
3651 | |
3652 @file{mule-mcpath.c} provides some functions to allow for pathnames | |
3653 containing extended characters. This code is fragmentary and completely | |
3654 non-working. | |
3655 | |
3656 @file{mule.c} provides a few miscellaneous things that should probably | |
3657 be elsewhere. | |
3658 | |
3659 | |
3660 | |
3661 @example | |
3662 9400 intl.c | |
3663 @end example | |
3664 | |
3665 This provides some miscellaneous internationalization code for | |
3666 implementing message translation and interfacing to the Ximp input | |
3667 method. None of this code is currently working. | |
3668 | |
3669 | |
3670 | |
3671 @example | |
3672 1764 iso-wide.h | |
3673 @end example | |
3674 | |
3675 This contains leftover code from an earlier implementation of | |
3676 Asian-language support, and is not currently used. | |
3677 | |
3678 | |
3679 | |
3680 | |
3681 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top | |
3682 @chapter Allocation of Objects in XEmacs Lisp | |
3683 | |
3684 @menu | |
3685 * Introduction to Allocation:: | |
3686 * Garbage Collection:: | |
3687 * GCPROing:: | |
3688 * Integers and Characters:: | |
3689 * Allocation from Frob Blocks:: | |
3690 * lrecords:: | |
3691 * Low-level allocation:: | |
3692 * Pure Space:: | |
3693 * Cons:: | |
3694 * Vector:: | |
3695 * Bit Vector:: | |
3696 * Symbol:: | |
3697 * Marker:: | |
3698 * String:: | |
3699 * Bytecode:: | |
3700 @end menu | |
3701 | |
3702 @node Introduction to Allocation | |
3703 @section Introduction to Allocation | |
3704 | |
3705 Emacs Lisp, like all Lisps, has garbage collection. This means that | |
3706 the programmer never has to explicitly free (destroy) an object; it | |
3707 happens automatically when the object becomes inaccessible. Most | |
3708 experts agree that garbage collection is a necessity in a modern, | |
3709 high-level language. Its omission from C stems from the fact that C was | |
3710 originally designed to be a nice abstract layer on top of assembly | |
3711 language, for writing kernels and basic system utilities rather than | |
3712 large applications. | |
3713 | |
3714 Lisp objects can be created by any of a number of Lisp primitives. | |
3715 Most object types have one or a small number of basic primitives | |
3716 for creating objects. For conses, the basic primitive is @code{cons}; | |
3717 for vectors, the primitives are @code{make-vector} and @code{vector}; for | |
3718 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. | |
3719 Some Lisp objects, especially those that are primarily used internally, | |
3720 have no corresponding Lisp primitives. Every Lisp object, though, | |
3721 has at least one C primitive for creating it. | |
3722 | |
3723 Recall from section (VII) that a Lisp object, as stored in a 32-bit | |
3724 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that | |
3725 occupies the remainder of the bits. We can separate the different | |
3726 Lisp object types into four broad categories: | |
3727 | |
3728 @itemize @bullet | |
3729 @item | |
3730 (a) Those for whom the value directly represents the contents of the | |
3731 Lisp object. Only two types are in this category: integers and | |
3732 characters. No special allocation or garbage collection is necessary | |
3733 for such objects. | |
3734 @end itemize | |
3735 | |
3736 In the remaining three categories, the value is a pointer to a | |
3737 structure. | |
3738 | |
3739 @itemize @bullet | |
3740 @item | |
3741 @cindex frob block | |
3742 (b) Those for whom the tag directly specifies the type. Recall that | |
3743 there are only three tag bits; this means that at most five types can be | |
3744 specified this way. The most commonly-used types are stored in this | |
3745 format; this includes conses, strings, vectors, and sometimes symbols. | |
3746 With the exception of vectors, objects in this category are allocated in | |
3747 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into | |
3748 individual objects. This saves a lot on malloc overhead, since there | |
3749 are typically quite a lot of these objects around, and the objects are | |
3750 small. (A cons, for example, occupies 8 bytes on 32-bit machines -- 4 | |
3751 bytes for each of the two objects it contains.) Vectors are individually | |
3752 @code{malloc()}ed since they are of variable size. (It would be | |
3753 possible, and desirable, to allocate vectors of certain small sizes out | |
3754 of frob blocks, but it isn't currently done.) Strings are handled | |
3755 specially: Each string is allocated in two parts, a fixed size structure | |
3756 containing a length and a data pointer, and the actual data of the | |
3757 string. The former structure is allocated in frob blocks as usual, and | |
3758 the latter data is stored in @dfn{string chars blocks} and is relocated | |
3759 during garbage collection to eliminate holes. | |
3760 @end itemize | |
3761 | |
3762 In the remaining two categories, the type is stored in the object | |
3763 itself. The tag for all such objects is the generic @dfn{lrecord} | |
3764 (Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines) | |
3765 of the object's structure are a pointer to a structure that describes | |
3766 the object's type, which includes method pointers and a pointer to a | |
3767 string naming the type. Note that it's possible to save some space by | |
3768 using a one- or two-byte tag, rather than a four- or eight-byte pointer | |
3769 to store the type, but it's not clear it's worth making the change. | |
3770 | |
3771 @itemize @bullet | |
3772 @item | |
3773 (c) Those lrecords that are allocated in frob blocks (see above). This | |
3774 includes the objects that are most common and relatively small, and | |
3775 includes floats, bytecodes, symbols (when not in category (b)), extents, | |
3776 events, and markers. With the cleanup of frob blocks done in 19.12, | |
3777 it's not terribly hard to add more objects to this category, but it's a | |
3778 bit trickier than adding an object type to type (d) (esp. if the object | |
3779 needs a finalization method), and is not likely to save much space | |
3780 unless the object is small and there are many of them. (In fact, if | |
3781 there are very few of them, it might actually waste space.) | |
3782 @item | |
3783 (d) Those lrecords that are individually @code{malloc()}ed. These are | |
3784 called @dfn{lcrecords}. All other types are in this category. Adding a | |
3785 new type to this category is comparatively easy, and all types added | |
3786 since 19.8 (when the current allocation scheme was devised, by Richard | |
3787 Mlynarik), with the exception of the character type, have been in this | |
3788 category. | |
3789 @end itemize | |
3790 | |
3791 Note that bit vectors are a bit of a special case. They are | |
3792 simple lrecords as in category (c), but are individually @code{malloc()}ed | |
3793 like vectors. You can basically view them as exactly like vectors | |
3794 except that their type is stored in lrecord fashion rather than | |
3795 in directly-tagged fashion. | |
3796 | |
3797 Note that FSF Emacs redesigned their object system in 19.29 to follow | |
3798 a similar scheme. However, given RMS's expressed dislike for data | |
3799 abstraction, the FSF scheme is not nearly as clean or as easy to | |
3800 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type | |
3801 (d) @code{Lisp_Vectorlike}, with separate tags for each, although | |
3802 @code{Lisp_Vectorlike} is also used for vectors.) | |
3803 | |
3804 @node Garbage Collection | |
3805 @section Garbage Collection | |
3806 @cindex garbage collection | |
3807 | |
3808 @cindex mark and sweep | |
3809 Garbage collection is simple in theory but tricky to implement. | |
3810 Emacs Lisp uses the oldest garbage collection method, called | |
3811 @dfn{mark and sweep}. Garbage collection begins by starting with | |
3812 all accessible locations (i.e. all variables and other slots where | |
3813 Lisp objects might occur) and recursively traversing all objects | |
3814 accessible from those slots, marking each one that is found. | |
3815 We then go through all of memory and free each object that is | |
3816 not marked, and unmark each object that is marked. Note | |
3817 that ``all of memory'' means all currently allocated objects. | |
3818 Traversing all these objects means traversing all frob blocks, | |
3819 all vectors (which are chained in one big list), and all | |
3820 lcrecords (which are likewise chained). | |
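
The mark and sweep cycle described above can be sketched with a toy heap
that contains only cons-like cells. Everything here (the
@code{toy_} names, the explicit @code{marked} field, the single chain of
all cells) is an assumption made for illustration; the real collector in
@file{alloc.c} handles many object types and hides the mark in
type-specific places.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_cons
{
  struct toy_cons *car, *cdr;  /* NULL plays the role of nil */
  int marked;                  /* the mark lives in the object itself */
  struct toy_cons *all_next;   /* chain of every allocated cell */
};

static struct toy_cons *toy_all_cells;

struct toy_cons *
toy_cons_new (struct toy_cons *car, struct toy_cons *cdr)
{
  struct toy_cons *c = malloc (sizeof *c);
  if (!c)
    abort ();
  c->car = car;
  c->cdr = cdr;
  c->marked = 0;
  c->all_next = toy_all_cells;
  toy_all_cells = c;
  return c;
}

/* Mark phase: traverse everything reachable from C.  Already-marked
   cells are skipped, so cycles terminate.  */
void
toy_mark (struct toy_cons *c)
{
  while (c && !c->marked)
    {
      c->marked = 1;
      toy_mark (c->car);  /* recurse on the car ... */
      c = c->cdr;         /* ... and loop on the cdr */
    }
}

/* Sweep phase: free unmarked cells, unmark the survivors.
   Returns the number of cells freed.  */
size_t
toy_sweep (void)
{
  size_t freed = 0;
  struct toy_cons **link = &toy_all_cells;
  while (*link)
    {
      struct toy_cons *c = *link;
      if (c->marked)
        {
          c->marked = 0;
          link = &c->all_next;
        }
      else
        {
          *link = c->all_next;
          free (c);
          freed++;
        }
    }
  return freed;
}

size_t
toy_collect (struct toy_cons **roots, size_t nroots)
{
  assert (roots != NULL || nroots == 0);
  for (size_t i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  return toy_sweep ();
}
```

Calling @code{toy_collect()} with an array of root pointers frees every
cell that is not reachable from those roots and leaves all surviving
cells unmarked, ready for the next cycle.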
3821 | |
3822 Note that, when an object is marked, the mark has to occur | |
3823 inside of the object's structure, rather than in the 32-bit | |
3824 @code{Lisp_Object} holding the object's pointer; i.e. you can't just | |
3825 set the pointer's mark bit. This is because there may be many | |
3826 pointers to the same object. This means that the method of | |
3827 marking an object can differ depending on the type. The | |
3828 different marking methods are approximately as follows: | |
3829 | |
3830 @enumerate | |
3831 @item | |
3832 For conses, the mark bit of the car is set. | |
3833 @item | |
3834 For strings, the mark bit of the string's plist is set. | |
3835 @item | |
3836 For symbols when not lrecords, the mark bit of the | |
3837 symbol's plist is set. | |
3838 @item | |
3839 For vectors, the length is negated after adding 1. | |
3840 @item | |
3841 For lrecords, the pointer to the structure describing | |
3842 the type is changed (see below). | |
3843 @item | |
3844 Integers and characters do not need to be marked, since | |
3845 no allocation occurs for them. | |
3846 @end enumerate | |
3847 | |
3848 The details of this are in the @code{mark_object()} function. | |
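
As an illustration of one of the marking methods listed above, here is a
minimal sketch of the vector trick, in which the mark is encoded by
negating the length after adding 1. The structure and function names are
hypothetical and are not the ones used by @code{mark_object()}. Adding 1
before negating is what lets a zero-length vector be marked, since
negating 0 alone would change nothing.

```c
#include <assert.h>

/* Hypothetical stand-in for a vector header; the contents are omitted. */
struct toy_vector
{
  long size;  /* >= 0 when unmarked, < 0 when marked */
};

int
toy_vector_marked_p (const struct toy_vector *v)
{
  return v->size < 0;
}

void
toy_mark_vector (struct toy_vector *v)
{
  assert (v->size >= 0);
  v->size = -(v->size + 1);  /* 0 becomes -1, 5 becomes -6, ... */
}

void
toy_unmark_vector (struct toy_vector *v)
{
  assert (v->size < 0);
  v->size = -v->size - 1;
}

/* Length that is correct whether or not the vector is marked. */
long
toy_vector_length (const struct toy_vector *v)
{
  return v->size < 0 ? -v->size - 1 : v->size;
}
```

Code that reads the size field directly during garbage collection sees
the munged value; only an accessor written like
@code{toy_vector_length()} gives the right answer in both states.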
3849 | |
3850 Note that any code that operates during garbage collection has | |
3851 to be especially careful because of the fact that some objects | |
3852 may be marked and as such may not look like they normally do. | |
3853 In particular: | |
3854 | |
3855 @itemize @bullet | |
3856 @item Some object pointers may have their mark bit set. This will make | |
3857 @code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with | |
3858 this. | |
3859 @item | |
3860 Even if you clear the mark bit, @code{FOOBARP()} will still fail | |
3861 for lrecords because the implementation pointer has been | |
3862 changed (see below). @code{GC_FOOBARP()} will correctly deal with | |
3863 this. | |
3864 @item | |
3865 Vectors have their size field munged, so anything that | |
3866 looks at this field will fail. | |
3867 @item | |
3868 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object | |
3869 pointers with their mark bit set, because the logical shift operations | |
3870 that remove the tag also remove the mark bit. | |
3871 @end itemize | |
3872 | |
3873 Finally, note that garbage collection can be invoked explicitly | |
3874 by calling @code{garbage-collect} but is also called automatically | |
3875 by @code{eval}, once a certain amount of memory has been allocated | |
3876 since the last garbage collection (according to @code{gc-cons-threshold}). | |
3877 | |
3878 @node GCPROing | |
3879 @section @code{GCPRO}ing | |
3880 | |
3881 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs | |
3882 internals. The basic idea is that whenever garbage collection | |
3883 occurs, all in-use objects must be reachable somehow or | |
3884 other from one of the roots of accessibility. The roots | |
3885 of accessibility are: | |
3886 | |
3887 @enumerate | |
3888 @item | |
3889 All objects that have been @code{staticpro()}d. This is used for | |
3890 any global C variables that hold Lisp objects. A call to | |
3891 @code{staticpro()} happens implicitly as a result of any symbols | |
3892 declared with @code{defsymbol()} and any variables declared with | |
3893 @code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()} | |
3894 (in the @code{vars_of_foo()} method of a module) for other global | |
3895 C variables holding Lisp objects. (This typically includes | |
3896 internal lists and such things.) | |
3897 | |
3898 Note that @code{obarray} is one of the @code{staticpro()}d things. | |
3899 Therefore, all functions and variables get marked through this. | |
3900 @item | |
3901 Any shadowed bindings that are sitting on the specpdl stack. | |
3902 @item | |
3903 Any objects sitting in currently active stack frames, | |
3904 catches, and condition cases. | |
3905 @item | |
3906 A couple of special-case places where active objects are | |
3907 located. | |
3908 @item | |
3909 Anything currently marked with @code{GCPRO}. | |
3910 @end enumerate | |
3911 | |
3912 Marking with @code{GCPRO} is necessary because some C functions (quite | |
3913 a lot, in fact) allocate objects during their operation. Quite | |
3914 frequently, there will be no other pointer to the object while the | |
3915 function is running, and if a garbage collection occurs and the object | |
3916 needs to be referenced again, bad things will happen. The solution is | |
3917 to mark those objects with @code{GCPRO}. Unfortunately this is easy to | |
3918 forget, and there is basically no way around this problem. Here are | |
3919 some rules, though: | |
3920 | |
3921 @enumerate | |
3922 @item | |
3923 For every @code{GCPRO@var{n}}, there have to be declarations of | |
3924 @code{struct gcpro gcpro1, gcpro2}, etc. | |
3925 | |
3926 @item | |
3927 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you | |
3928 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed. Getting | |
3929 either of these wrong will lead to crashes, often in completely random | |
3930 places unrelated to where the problem lies. | |
3931 | |
3932 @item | |
3933 The way this actually works is that all currently active @code{GCPRO}s | |
3934 are chained through the @code{struct gcpro} local variables, with the | |
3935 variable @samp{gcprolist} pointing to the head of the list and the nth | |
3936 local @code{gcpro} variable pointing to the first @code{gcpro} variable | |
3937 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an | |
3938 lvalue, and the @code{struct gcpro} local variable contains a pointer to | |
3939 this lvalue. This is why things will mess up badly if you don't pair up | |
3940 the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with | |
3941 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local | |
3942 @code{Lisp_Object} variables in no-longer-active stack frames. | |
3943 | |
3944 @item | |
3945 It is actually possible for a single @code{struct gcpro} to | |
3946 protect a contiguous array of any number of values, rather than | |
3947 just a single lvalue. To effect this, call @code{GCPRO@var{n}} as usual on | |
3948 the first object in the array and then set @code{gcpro@var{n}.nvars} to the number of values in the array. | |
3949 | |
3950 @item | |
3951 @strong{Strings are relocated.} What this means in practice is that the | |
3952 pointer obtained using @code{string_data()} is liable to change at any | |
3953 time, and you should never keep it around past any function call, or | |
3954 pass it as an argument to any function that might cause a garbage | |
3955 collection. This is why a number of functions accept either a | |
3956 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string, | |
3957 and only access the Lisp string's data at the very last minute. In some | |
3958 cases, you may end up having to @code{alloca()} some space and copy the | |
3959 string's data into it. | |
3960 | |
3961 @item | |
3962 By convention, if you have to nest @code{GCPRO}'s, use @code{NGCPRO@var{n}} | |
3963 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}}, | |
3964 etc. This avoids compiler warnings about shadowed locals. | |
3965 | |
3966 @item | |
3967 It is @emph{always} better to err on the side of extra @code{GCPRO}s | |
3968 rather than too few. The extra cycles spent on this are | |
3969 almost never going to make a whit of difference in the | |
3970 speed of anything. | |
3971 | |
3972 @item | |
3973 The general rule to follow is that caller, not callee, @code{GCPRO}s. | |
3974 That is, you should not have to explicitly @code{GCPRO} any Lisp objects | |
3975 that are passed in as parameters, but if you create any Lisp objects | |
3976 (remember, this happens in all sorts of circumstances, e.g. with | |
3977 @code{Fcons()}, etc.), you are responsible for @code{GCPRO}ing the | |
3978 objects unless you are @emph{absolutely sure} that there's no | |
3979 possibility that a garbage-collection can occur while you need to use | |
3980 the object. Even then, consider @code{GCPRO}ing. | |
3981 | |
3982 @item | |
3983 A garbage collection can occur whenever anything calls @code{Feval}, or | |
3984 whenever a QUIT can occur where execution can continue past | |
3985 this. (Remember, this is almost anywhere.) | |
3986 | |
3987 @item | |
3988 If you have the @emph{least smidgeon of doubt} about whether | |
3989 you need to @code{GCPRO}, you should @code{GCPRO}. | |
3990 | |
3991 @item | |
3992 Beware of @code{GCPRO}ing something that is uninitialized. If you have | |
3993 any shade of doubt about this, initialize all your variables to @code{Qnil}. | |
3994 | |
3995 @item | |
3996 Be careful of traps, like calling @code{Fcons()} in the argument to | |
3997 another function. By the ``caller protects'' law, you should be | |
3998 @code{GCPRO}ing the newly-created cons, but you aren't. A certain | |
3999 number of functions that are commonly called on freshly created stuff | |
4000 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects'' | |
4001 law and go ahead and @code{GCPRO} their arguments so as to simplify | |
4002 things, but make sure and check if it's OK whenever doing something like | |
4003 this. | |
4004 | |
4005 @item | |
4006 Once again, remember to @code{GCPRO}! Bugs resulting from insufficient | |
4007 @code{GCPRO}ing are intermittent and extremely difficult to track down, | |
4008 often showing up in crashes inside of @code{garbage-collect} or in | |
4009 weirdly corrupted objects or even in incorrect values in a totally | |
4010 different section of code. | |
4011 @end enumerate | |
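
The chaining described in the rules above can be modelled with a small
self-contained sketch. It is not the XEmacs definition: the
@code{toy_} names, the use of @code{long} as a stand-in for
@code{Lisp_Object}, and the sanity check inside the unprotect macro are
all assumptions made for illustration. It shows the list head, the
per-frame @code{gcpro1} records, an array protected through
@code{nvars}, and why every protect must be paired with an unprotect in
the same frame.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_object;  /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;  /* record of the enclosing protection */
  toy_object *var;         /* address of the protected lvalue(s) */
  int nvars;               /* number of contiguous objects at var */
};

struct toy_gcpro *toy_gcprolist;  /* head of the chain */

/* Both macros expect a local `struct toy_gcpro gcpro1' in scope. */
#define TOY_GCPRO1(v)                 \
  do {                                \
    gcpro1.next = toy_gcprolist;      \
    gcpro1.var = &(v);                \
    gcpro1.nvars = 1;                 \
    toy_gcprolist = &gcpro1;          \
  } while (0)

/* The assertion catches an unbalanced protect/unprotect pair. */
#define TOY_UNGCPRO \
  (assert (toy_gcprolist == &gcpro1), toy_gcprolist = gcpro1.next)

/* What a collector would do with the chain: visit every protected slot.
   Here we merely count them.  */
int
toy_count_protected (void)
{
  int n = 0;
  for (struct toy_gcpro *p = toy_gcprolist; p; p = p->next)
    n += p->nvars;
  return n;
}

int
toy_inner (toy_object arg)
{
  struct toy_gcpro gcpro1;
  toy_object temp = arg + 1;  /* pretend this is a fresh object */
  int seen;

  TOY_GCPRO1 (temp);
  seen = toy_count_protected ();  /* a collection here would see temp */
  TOY_UNGCPRO;
  return seen;
}

int
toy_outer (void)
{
  struct toy_gcpro gcpro1;
  toy_object vec[3] = { 1, 2, 3 };
  int seen;

  TOY_GCPRO1 (vec[0]);
  gcpro1.nvars = 3;  /* protect the whole array, not just vec[0] */
  seen = toy_inner (vec[0]);
  TOY_UNGCPRO;
  return seen;
}
```

While @code{toy_inner()} runs, the chain holds its own record followed by
the caller's, so four slots are visible; once @code{toy_outer()} returns,
the list is empty again. Omitting either unprotect would leave the list
head pointing into a dead stack frame.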
4012 | |
4013 @cindex garbage collection, conservative | |
4014 @cindex conservative garbage collection | |
4015 Given the extremely error-prone nature of the @code{GCPRO} scheme, and | |
4016 the difficulty of tracking down the resulting bugs, it should be considered a deficiency | |
4017 in the XEmacs code. A solution to this problem would involve | |
4018 implementing so-called @dfn{conservative} garbage collection for the C | |
4019 stack. That involves looking through all of stack memory and treating | |
4020 anything that looks like a reference to an object as a reference. This | |
4021 will result in a few objects not getting collected when they should, but | |
4022 it obviates the need for @code{GCPRO}ing, and allows garbage collection | |
4023 to happen at any point at all, such as during object allocation. | |
4024 | |
4025 @node Integers and Characters | |
4026 @section Integers and Characters | |
4027 | |
4028 Integer and character Lisp objects are created from integers using the | |
4029 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent | |
4030 functions @code{make_int()} and @code{make_char()}. (These are actually | |
4031 macros on most systems.) These functions basically just do some moving | |
4032 of bits around, since the integral value of the object is stored | |
4033 directly in the @code{Lisp_Object}. | |
4034 | |
4035 @code{XSETINT()} and the like will truncate values given to them that | |
4036 are too big; i.e. you won't get the value you expected but the tag bits | |
4037 will at least be correct. | |
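
The bit-moving and the truncation behaviour can be sketched as follows.
The layout used here (one mark bit, three tag bits, 28 value bits in a
32-bit word) and the tag numbers are assumptions chosen for
illustration; the real macros may place the fields differently. The
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy layout: bit 31 = mark, bits 28..30 = tag, bits 0..27 = value. */
typedef uint32_t toy_object;

#define TOY_VALBITS  28
#define TOY_VALMASK  ((UINT32_C(1) << TOY_VALBITS) - 1)
#define TOY_TAG_INT  UINT32_C(0)  /* hypothetical tag numbers */
#define TOY_TAG_CHAR UINT32_C(1)

toy_object
toy_make (uint32_t tag, int32_t value)
{
  assert (tag < 8);
  /* Value bits that do not fit are dropped, so they can never
     spill into the tag or the mark bit.  */
  return (tag << TOY_VALBITS) | ((uint32_t) value & TOY_VALMASK);
}

uint32_t
toy_tag (toy_object obj)
{
  return (obj >> TOY_VALBITS) & 7;
}

int32_t
toy_int_value (toy_object obj)
{
  /* Sign-extend the 28-bit value field. */
  uint32_t v = obj & TOY_VALMASK;
  uint32_t sign = UINT32_C(1) << (TOY_VALBITS - 1);
  return (int32_t) (v ^ sign) - (int32_t) sign;
}
```

With this layout, 42 and -5 survive a round trip unchanged, while a value
one bit too wide, such as @code{(1 << 28) + 7}, comes back as 7 with the
tag still intact.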
4038 | |
4039 @node Allocation from Frob Blocks | |
4040 @section Allocation from Frob Blocks | |
4041 | |
4042 The uninitialized memory required by a @code{Lisp_Object} of a particular type | |
4043 is allocated using | |
4044 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the | |
4045 lowest-level object-creating functions in @file{alloc.c}: | |
4046 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()}, | |
4047 @code{Fmake_symbol()}, @code{allocate_extent()}, | |
4048 @code{allocate_event()}, @code{Fmake_marker()}, and | |
4049 @code{make_uninit_string()}. The idea is that, for each type, there are | |
4050 a number of frob blocks (each 2K in size); each frob block is divided up | |
4051 into object-sized chunks. Each frob block will have some of these | |
4052 chunks that are currently assigned to objects, and perhaps some that are | |
4053 free. (If a frob block has nothing but free chunks, it is freed at the | |
4054 end of the garbage collection cycle.) The free chunks are stored in a | |
4055 free list, which is chained by storing a pointer in the first four bytes | |
4056 of the chunk. (Except for the free chunks at the end of the last frob | |
4057 block, which are handled using an index which points past the end of the | |
4058 last-allocated chunk in the last frob block.) | |
4059 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the | |
4060 free list; if that fails, it calls | |
4061 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the | |
4062 last frob block for space, and creates a new frob block if there is | |
4063 none. (There are actually two versions of these macros, one of which is | |
4064 more defensive but less efficient and is used for error-checking.) | |
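
The allocation strategy just described can be sketched as below. This is
a simplified model, not the @code{ALLOCATE_FIXED_TYPE()} macros
themselves: the names are hypothetical, a single object type is assumed,
and the freeing of completely empty blocks at the end of garbage
collection is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK_BYTES 2048

/* One chunk.  While it is free, its first bytes hold the free-list link. */
union toy_chunk
{
  union toy_chunk *next_free;  /* valid only while the chunk is free */
  double value;                /* valid only while the chunk is in use */
};

#define TOY_CHUNKS_PER_BLOCK \
  ((TOY_BLOCK_BYTES - sizeof (void *)) / sizeof (union toy_chunk))

struct toy_block
{
  struct toy_block *prev;  /* previously filled block */
  union toy_chunk chunks[TOY_CHUNKS_PER_BLOCK];
};

static struct toy_block *toy_current_block;
/* Index past the last-allocated chunk of the current block.  It starts
   out "full" so that the first allocation creates a block.  */
static size_t toy_next_index = TOY_CHUNKS_PER_BLOCK;
static union toy_chunk *toy_free_list;

union toy_chunk *
toy_allocate (void)
{
  /* First choice: reuse a chunk from the free list. */
  if (toy_free_list)
    {
      union toy_chunk *c = toy_free_list;
      toy_free_list = c->next_free;
      return c;
    }
  /* Otherwise take the next unused chunk at the end of the last
     block, creating a new block if that one is full.  */
  if (toy_next_index == TOY_CHUNKS_PER_BLOCK)
    {
      struct toy_block *b = malloc (sizeof *b);
      if (!b)
        abort ();
      b->prev = toy_current_block;
      toy_current_block = b;
      toy_next_index = 0;
    }
  return &toy_current_block->chunks[toy_next_index++];
}

void
toy_release (union toy_chunk *c)
{
  assert (c != NULL);
  c->next_free = toy_free_list;
  toy_free_list = c;
}
```

Consecutive allocations come from adjacent chunks of the same block, and
a released chunk is the first one handed out again, which is the
behaviour the free list is meant to provide.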
4065 | |
4066 @node lrecords | |
4067 @section lrecords | |
4068 | |
4069 [see @file{lrecord.h}] | |
4070 | |
4071 All lrecords have at the beginning of their structure a @code{struct | |
4072 lrecord_header}. This just contains a pointer to a @code{struct | |
4073 lrecord_implementation}, which is a structure containing method pointers | |
4074 and such. There is one of these for each type, and it is a global, | |
4075 constant, statically-declared structure that is declared in the | |
4076 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually | |
4077 declares an array of two @code{struct lrecord_implementation} | |
4078 structures. The first one contains all the standard method pointers, | |
4079 and is used in all normal circumstances. During garbage collection, | |
4080 however, the lrecord is @dfn{marked} by bumping its implementation | |
4081 pointer by one, so that it points to the second structure in the array. | |
4082 This structure contains a special indication in it that it's a | |
4083 @dfn{marked-object} structure: the finalize method is the special | |
4084 function @code{this_marks_a_marked_record()}, and all other methods are | |
4085 null pointers. At the end of garbage collection, all lrecords will | |
4086 either be reclaimed or unmarked by decrementing their implementation | |
4087 pointers, so this second structure pointer will never remain past | |
4088 garbage collection.) | |
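As a rough sketch of the pointer-bumping trick, assuming a much-reduced implementation structure (the real @code{struct lrecord_implementation} has more methods, and every @code{demo_} name here is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

struct demo_header;

/* Reduced stand-in for struct lrecord_implementation. */
typedef struct demo_implementation {
  const char *name;
  void (*finalizer) (struct demo_header *);
} demo_implementation;

/* Stand-in for struct lrecord_header. */
typedef struct demo_header {
  const demo_implementation *implementation;
} demo_header;

/* Sentinel finalizer: only its address matters, it is never called. */
static void
demo_marks_a_marked_record (demo_header *h)
{
  (void) h;
}

/* Slot 0 holds the normal methods; slot 1 is the "marked" twin whose
   only non-null member is the sentinel finalizer. */
static const demo_implementation demo_float_impl[2] = {
  { "float", NULL },
  { NULL, demo_marks_a_marked_record }
};

static int
demo_marked_p (const demo_header *h)
{
  return h->implementation->finalizer == demo_marks_a_marked_record;
}

static void
demo_mark (demo_header *h)
{
  if (!demo_marked_p (h))
    h->implementation++;        /* now points at slot 1 */
}

static void
demo_unmark (demo_header *h)
{
  if (demo_marked_p (h))
    h->implementation--;        /* back to slot 0 */
}
```

The appeal of this design is that marking needs no extra bit in the object: the header pointer itself carries the mark.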
4089 | |
4090 Simple lrecords (of type (c) above) just have a @code{struct | |
4091 lrecord_header} at their beginning. lcrecords, however, actually have a | |
4092 @code{struct lcrecord_header}. This, in turn, has a @code{struct | |
4093 lrecord_header} at its beginning, so sanity is preserved; but it also | |
4094 has a pointer used to chain all lcrecords together, and a special ID | |
4095 field used to distinguish one lcrecord from another. (This field is used | |
4096 only for debugging and could be removed, but the space gain is not | |
4097 significant.) | |
4098 | |
4099 Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just | |
4100 like for other frob blocks. The only change is that the implementation | |
4101 pointer must be initialized correctly. (The implementation structure for | |
4102 an lrecord, or rather the pointer to it, is named @code{lrecord_float}, | |
4103 @code{lrecord_extent}, @code{lrecord_buffer}, etc.) | |
4104 | |
4105 lcrecords are created using @code{alloc_lcrecord()}. This takes a | |
4106 size to allocate and an implementation pointer. (The size needs to be | |
4107 passed because some lcrecords, such as window configurations, are of | |
4108 variable size.) This basically just @code{malloc()}s the storage, | |
4109 initializes the @code{struct lcrecord_header}, and chains the lcrecord | |
4110 onto the head of the list of all lcrecords, which is stored in the | |
4111 variable @code{all_lcrecords}. The calls to @code{alloc_lcrecord()} | |
4112 generally occur in the lowest-level allocation function for each lrecord | |
4113 type. | |
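The header nesting and the chaining step might look roughly like this. It is a sketch under stated assumptions: @code{demo_alloc_lcrecord()} and the structure layouts are illustrative only, and the real function goes through @code{allocate_lisp_storage()} rather than calling @code{calloc()} directly.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_impl { const char *name; } demo_impl;

typedef struct demo_lrecord_header {
  const demo_impl *implementation;
} demo_lrecord_header;

typedef struct demo_lcrecord_header {
  demo_lrecord_header lheader;          /* must come first */
  struct demo_lcrecord_header *next;    /* chain of all lcrecords */
  unsigned int uid;                     /* debugging aid only */
} demo_lcrecord_header;

static demo_lcrecord_header *demo_all_lcrecords;
static unsigned int demo_uid_counter;

/* SIZE is passed in because some lcrecord types are variable-size. */
static void *
demo_alloc_lcrecord (size_t size, const demo_impl *impl)
{
  demo_lcrecord_header *h;

  assert (size >= sizeof *h);
  h = calloc (1, size);
  if (!h)
    abort ();
  h->lheader.implementation = impl;
  h->uid = demo_uid_counter++;
  h->next = demo_all_lcrecords;         /* push onto the head of the list */
  demo_all_lcrecords = h;
  return h;
}
```

Because the @code{demo_lrecord_header} sits at offset zero, code that only knows about lrecords can still read the implementation pointer of an lcrecord.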
4114 | |
4115 Whenever you create an lrecord, you need to call either | |
4116 @code{DEFINE_LRECORD_IMPLEMENTATION()} or | |
4117 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be | |
4118 specified in a C file, at the top level. What this actually does is | |
4119 define and initialize the implementation structure for the lrecord. (And | |
4120 possibly declares a function @code{error_check_foo()} that implements | |
4121 the @code{XFOO()} macro when error-checking is enabled.) The arguments | |
4122 to the macros are the actual type name (this is used to construct the C | |
4123 variable name of the lrecord implementation structure and related | |
4124 structures using the @samp{##} macro concatenation operator), a string | |
4125 that names the type on the Lisp level (this may not be the same as the C | |
4126 type name; typically, the C type name has underscores, while the Lisp | |
4127 string has dashes), various method pointers, and the name of the C | |
4128 structure that contains the object. The methods are used to encapsulate | |
4129 type-specific information about the object, such as how to print it or | |
4130 mark it for garbage collection, so that it's easy to add new object | |
4131 types without having to add a specific case for each new type in a bunch | |
4132 of different places. | |
4133 | |
4134 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and | |
4135 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is | |
4136 used for fixed-size object types and the latter is for variable-size | |
4137 object types. Most object types are fixed-size; some complex | |
4138 types, however (e.g. window configurations), are variable-size. | |
4139 Variable-size object types have an extra method, which is called | |
4140 to determine the actual size of a particular object of that type. | |
4141 (Currently this is only used for keeping allocation statistics.) | |
4142 | |
4143 For the purpose of keeping allocation statistics, the allocation | |
4144 engine keeps a list of all the different types that exist. Note that, | |
4145 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is | |
4146 specified at top-level, there is no way for it to add to the list of all | |
4147 existing types. What happens instead is that each implementation | |
4148 structure contains in it a dynamically assigned number that is | |
4149 particular to that type. (Or rather, it contains a pointer to another | |
4150 structure that contains this number. This evasiveness is done so that | |
4151 the implementation structure can be declared const.) In the sweep stage | |
4152 of garbage collection, each lrecord is examined to see if its | |
4153 implementation structure has its dynamically-assigned number set. If | |
4154 not, it must be a new type, and it is added to the list of known types | |
4155 and a new number assigned. The number is used to index into an array | |
4156 holding the number of objects of each type and the total memory | |
4157 allocated for objects of that type. The statistics in this array are | |
4158 also computed during the sweep stage. These statistics are returned by | |
4159 the call to @code{garbage-collect} and are printed out at the end of the | |
4160 loadup phase. | |
4161 | |
4162 Note that for every type defined with a @code{DEFINE_LRECORD_*()} | |
4163 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()} | |
4164 somewhere in a @file{.h} file, and this @file{.h} file needs to be | |
4165 included by @file{inline.c}. | |
4166 | |
4167 Furthermore, there should generally be a set of @code{XFOOBAR()}, | |
4168 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c}) | |
4169 file. To create one of these, copy an existing model and modify as | |
4170 necessary. | |
4171 | |
4172 The various methods in the lrecord implementation structure are: | |
4173 | |
4174 @enumerate | |
4175 @item | |
4176 @cindex mark method | |
4177 A @dfn{mark} method. This is called during the marking stage and passed | |
4178 a function pointer (usually the @code{mark_object()} function), which is | |
4179 used to mark an object. All Lisp objects that are contained within the | |
4180 object need to be marked by applying this function to them. The mark | |
4181 method should also return a Lisp object, which should be either nil or | |
4182 an object to mark. (This can be used in lieu of calling | |
4183 @code{mark_object()} on the object, to reduce the recursion depth, and | |
4184 consequently should be the most heavily nested sub-object, such as a | |
4185 long list.) | |
4186 | |
4187 @strong{Note}: When the mark method is called, garbage collection | |
4188 is in progress, and special precautions need to be taken | |
4189 when accessing objects; see section (B) above. | |
4190 | |
4191 If your mark method does not need to do anything, it can be | |
4192 @code{NULL}. | |
4193 | |
4194 @item | |
4195 A @dfn{print} method. This is called to create a printed representation | |
4196 of the object, whenever @code{princ}, @code{prin1}, or the like is | |
4197 called. It is passed the object, a stream to which the output is to be | |
4198 directed, and an @code{escapeflag} which indicates whether the object's | |
4199 printed representation should be @dfn{escaped} so that it is | |
4200 readable. (This corresponds to the difference between @code{princ} and | |
4201 @code{prin1}.) Basically, @dfn{escaped} means that strings will have | |
4202 quotes around them and confusing characters in the strings such as | |
4203 quotes, backslashes, and newlines will be backslashed; and that special | |
4204 care will be taken to make symbols print in a readable fashion | |
4205 (e.g. symbols that look like numbers will be backslashed). Other | |
4206 readable objects should perhaps pass @code{escapeflag} on when | |
4207 sub-objects are printed, so that readability is preserved when necessary | |
4208 (or if not, always pass in a 1 for @code{escapeflag}). Non-readable | |
4209 objects should in general ignore @code{escapeflag}, except that some use | |
4210 it as an indication that more verbose output should be given. | |
4211 | |
4212 Sub-objects are printed using @code{print_internal()}, which takes | |
4213 exactly the same arguments as are passed to the print method. | |
4214 | |
4215 Literal C strings should be printed using @code{write_c_string()}, | |
4216 or @code{write_string_1()} for non-null-terminated strings. | |
4217 | |
4218 Object types that do not have a readable representation should check the | |
4219 @code{print_readably} flag and signal an error if it is set. | |
4220 | |
4221 If you specify @code{NULL} for the print method, the | |
4222 @code{default_object_printer()} will be used. | |
4223 | |
4224 @item | |
4225 A @dfn{finalize} method. This is called at the beginning of the sweep | |
4226 stage on lcrecords that are about to be freed, and should be used to | |
4227 perform any extra object cleanup. This typically involves freeing any | |
4228 extra @code{malloc()}ed memory associated with the object, releasing any | |
4229 operating-system and window-system resources associated with the object | |
4230 (e.g. pixmaps, fonts), etc. | |
4231 | |
4232 The finalize method can be @code{NULL} if nothing needs to be done. | |
4233 | |
4234 WARNING #1: The finalize method is also called at the end of the dump | |
4235 phase; this time with the @code{for_disksave} parameter set to non-zero. The | |
4236 object is @emph{not} about to disappear, so you have to make sure to | |
4237 @emph{not} free any extra @code{malloc()}ed memory if you're going to | |
4238 need it later. (Also, signal an error if there are any operating-system | |
4239 and window-system resources here, because they can't be dumped.) | |
4240 | |
4241 Finalize methods should, as a rule, set to zero any pointers after | |
4242 they've been freed, and check to make sure pointers are not zero before | |
4243 freeing. Although I'm pretty sure that finalize methods are not called | |
4244 twice on the same object (except for the @code{for_disksave} proviso), | |
4245 we've gotten nastily burned in some cases by not doing this. | |
4246 | |
4247 WARNING #2: The finalize method is @emph{only} called for | |
4248 lcrecords, @emph{not} for simple lrecords. If you need a | |
4249 finalize method for simple lrecords, you have to stick | |
4250 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}. | |
4251 | |
4252 WARNING #3: Things are in an @emph{extremely} bizarre state | |
4253 when @code{ADDITIONAL_FREE_foo()} is called, so you have to | |
4254 be incredibly careful when writing one of these functions. | |
4255 See the comment in @code{gc_sweep()}. If you ever have to add | |
4256 one of these, consider using an lcrecord or dealing with | |
4257 the problem in a different fashion. | |
4258 | |
4259 @item | |
4260 An @dfn{equal} method. This compares the two objects for similarity, | |
4261 when @code{equal} is called. It should compare the contents of the | |
4262 objects in some reasonable fashion. It is passed the two objects and a | |
4263 @dfn{depth} value, which is used to catch circular objects. To compare | |
4264 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value | |
4265 by one. If this value gets too high, a @code{circular-object} error | |
4266 will be signaled. | |
4267 | |
4268 If this is @code{NULL}, objects are @code{equal} only when they are @code{eq}, | |
4269 i.e. identical. | |
4270 | |
4271 @item | |
4272 A @dfn{hash} method. This is used to hash objects when they are to be | |
4273 compared with @code{equal}. The rule here is that if two objects are | |
4274 @code{equal}, they @emph{must} hash to the same value; i.e. your hash | |
4275 function should use some subset of the sub-fields of the object that are | |
4276 compared in the ``equal'' method. If you specify this method as | |
4277 @code{NULL}, the object's pointer will be used as the hash, which will | |
4278 @emph{fail} if the object has an @code{equal} method, so don't do this. | |
4279 | |
4280 To hash a sub-Lisp-object, call @code{internal_hash()}. Bump the | |
4281 depth by one, just like in the ``equal'' method. | |
4282 | |
4283 To convert a Lisp object directly into a hash value (using | |
4284 its pointer), use @code{LISP_HASH()}. This is what happens when | |
4285 the hash method is @code{NULL}. | |
4286 | |
4287 To hash two or more values together into a single value, use | |
4288 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc. | |
4289 | |
4290 @item | |
4291 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods. | |
4292 These are used for object types that have properties. I don't feel like | |
4293 documenting them here. If you create one of these objects, you have to | |
4294 use different macros to define them, | |
4295 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or | |
4296 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}. | |
4297 | |
4298 @item | |
4299 A @dfn{size_in_bytes} method, when the object is of variable size | |
4300 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro). This should | |
4301 simply return the object's size in bytes, exactly as you might expect. | |
4302 For an example, see the methods for window configurations and opaques. | |
4303 @end enumerate | |
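To illustrate the rule tying the equal and hash methods together, here is a small sketch for a made-up object type. The signatures are simplified (no depth argument, plain C structures instead of Lisp objects), and @code{DEMO_HASH2()} is only a stand-in for the real @code{HASH2()} macro, whose definition is not reproduced here.

```c
#include <assert.h>

/* Hypothetical object type: a point whose label is deliberately
   ignored when comparing. */
typedef struct demo_point {
  int x, y;
  const char *label;
} demo_point;

/* Stand-in combiner; not the real HASH2() definition. */
#define DEMO_HASH2(a, b) ((unsigned long) (a) * 31UL + (unsigned long) (b))

static int
demo_point_equal (const demo_point *p, const demo_point *q)
{
  return p->x == q->x && p->y == q->y;
}

/* Hashes only fields that the equal method compares, so two equal
   points always hash to the same value. */
static unsigned long
demo_point_hash (const demo_point *p)
{
  return DEMO_HASH2 (p->x, p->y);
}
```

Had the hash included @code{label}, or the object's address, two points that compare equal could hash differently, which is exactly the failure the hash item above warns about.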
4304 | |
4305 @node Low-level allocation | |
4306 @section Low-level allocation | |
4307 | |
4308 Memory that you want to allocate directly should be allocated using | |
4309 @code{xmalloc()} rather than @code{malloc()}. This implements | |
4310 error-checking on the return value, and once upon a time did some more | |
4311 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary). | |
4312 Free using @code{xfree()}, and realloc using @code{xrealloc()}. Note | |
4313 that @code{xmalloc()} will do a non-local exit if the memory can't be | |
4314 allocated. (Many functions, however, do not expect this, and thus XEmacs | |
4315 will likely crash if this happens. @strong{This is a bug.} If you can, | |
4316 you should strive to make your function handle this OK. However, it's | |
4317 difficult in the general circumstance, perhaps requiring extra | |
4318 unwind-protects and such.) | |
4319 | |
4320 Note that XEmacs provides two separate replacements for the standard | |
4321 @code{malloc()} library function. These are called @dfn{old GNU malloc} | |
4322 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}), | |
4323 respectively. New GNU malloc is better in pretty much every way than | |
4324 old GNU malloc, and should be used if possible. (It used to be that on | |
4325 some systems, the old one worked but the new one didn't. I think this | |
4326 was due specifically to a bug in SunOS, which the new one now works | |
4327 around; so I don't think the old one ever has to be used any more.) The | |
4328 primary difference between both of these mallocs and the standard system | |
4329 malloc is that they are much faster, at the expense of increased space. | |
4330 The basic idea is that memory is allocated in fixed chunks of powers of | |
4331 two. This allows for basically constant malloc time, since the various | |
4332 chunks can just be kept on a number of free lists. (The standard system | |
4333 malloc typically allocates arbitrary-sized chunks and has to spend some | |
4334 time, sometimes a significant amount of time, walking the heap looking | |
4335 for a free block to use and cleaning things up.) The new GNU malloc | |
4336 improves on things by allocating large objects in chunks of 4096 bytes | |
4337 rather than in ever larger powers of two, since the latter wastes ever | |
4338 more space as objects grow. There is a slight speed loss here, but it's of doubtful | |
4339 significance. | |
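The difference in rounding policy can be made concrete with a little arithmetic. In this sketch the minimum chunk size and the point at which a request counts as ``large'' are assumptions chosen for illustration, not values taken from @file{malloc.c} or @file{gmalloc.c}.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_CHUNK 8      /* assumed smallest chunk size */
#define DEMO_PAGE 4096        /* large requests use multiples of this */

/* Power-of-two policy: round every request up to a power of two.
   (Not guarded against overflow for absurdly large N.) */
static size_t
demo_round_pow2 (size_t n)
{
  size_t size = DEMO_MIN_CHUNK;
  while (size < n)
    size <<= 1;
  return size;
}

/* Mixed policy: powers of two for small requests, multiples of
   DEMO_PAGE for anything bigger than one page (assumed cutoff). */
static size_t
demo_round_new (size_t n)
{
  if (n <= DEMO_PAGE)
    return demo_round_pow2 (n);
  return (n + DEMO_PAGE - 1) / DEMO_PAGE * DEMO_PAGE;
}
```

For a 9000-byte request, the first policy reserves 16384 bytes while the second reserves 12288, and the gap widens as requests grow.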
4340 | |
4341 NOTE: Apparently there is a third-generation GNU malloc that is | |
4342 significantly better than the new GNU malloc, and should probably | |
4343 be included in XEmacs. | |
4344 | |
4345 There is also the relocating allocator, @file{ralloc.c}. This actually | |
4346 moves blocks of memory around so that the @code{sbrk()} pointer can be shrunk | |
4347 and virtual memory released back to the system. On some systems, | |
4348 this is a big win. On all systems, it causes a noticeable (and | |
4349 sometimes huge) speed penalty, so I turn it off by default. | |
4350 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}. | |
4351 There are also two versions of @file{ralloc.c}, one of which uses @code{mmap()} | |
4352 rather than block copies to move data around. This purports to | |
4353 be faster, although that depends on the amount of data that would | |
4354 have had to be block copied and the system-call overhead for | |
4355 @code{mmap()}. I don't know exactly how this works, except that the | |
4356 relocating-allocation routines are pretty much used only for | |
4357 the memory allocated for a buffer, which is the biggest consumer | |
4358 of space, esp. of space that may get freed later. | |
4359 | |
4360 Note that the GNU mallocs have some ``memory warning'' facilities. | |
4361 XEmacs taps into them and issues a warning through the standard | |
4362 warning system, when memory gets to 75%, 85%, and 95% full. | |
4363 (On some systems, the memory warnings are not functional.) | |
4364 | |
4365 Allocated memory that is going to be used to make a Lisp object | |
4366 is created using @code{allocate_lisp_storage()}. This calls @code{xmalloc()} | |
4367 but also verifies that the pointer to the memory can fit into | |
4368 a Lisp word (remember that some bits are taken away for a type | |
4369 tag and a mark bit). If not, an error is issued through @code{memory_full()}. | |
4370 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()}, | |
4371 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation | |
4372 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the | |
4373 appropriate times; this keeps statistics on how much memory is | |
4374 allocated, so that garbage-collection can be invoked when the | |
4375 threshold is reached. | |
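The ``fits into a Lisp word'' test amounts to checking that no address bits fall where the tag and mark bits live. A minimal sketch, assuming a 32-bit Lisp word with 28 bits left over for the pointer (the constant and the function name are illustrative, not taken from the sources):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: 28 value bits remain after the
   type tag and the mark bit are taken out of a 32-bit word. */
#define DEMO_VALBITS 28

/* Nonzero if ADDRESS can be stored in the value field unchanged. */
static int
demo_pointer_fits (uintptr_t address)
{
  return (address >> DEMO_VALBITS) == 0;
}
```

An allocator following the description above would call something like this on the result of @code{xmalloc()} and report the failure through @code{memory_full()} when the check does not pass.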
4376 | |
4377 @node Pure Space | |
4378 @section Pure Space | |
4379 | |
4380 Not yet documented. | |
4381 | |
4382 @node Cons | |
4383 @section Cons | |
4384 | |
4385 Conses are allocated in standard frob blocks. The only thing to | |
4386 note is that conses can be explicitly freed using @code{free_cons()} | |
4387 and associated functions @code{free_list()} and @code{free_alist()}. This | |
4388 immediately puts the conses onto the cons free list, and decrements | |
4389 the statistics on memory allocation appropriately. This is used | |
4390 to good effect by some extremely commonly-used code, to avoid | |
4391 generating extra objects and thereby triggering GC sooner. | |
4392 However, you have to be @emph{extremely} careful when doing this. | |
4393 If you mess this up, you will get BADLY BURNED, and it has happened | |
4394 before. | |
4395 | |
4396 @node Vector | |
4397 @section Vector | |
4398 | |
4399 As mentioned above, each vector is @code{malloc()}ed individually, and | |
4400 all are threaded through the variable @code{all_vectors}. Vectors are | |
4401 marked strangely during garbage collection, by kludging the size field. | |
4402 Note that the @code{struct Lisp_Vector} is declared with its contents | |
4403 being an array of one element. It is actually @code{malloc()}ed with | |
4404 the right size, however, and access to any element through the contents | |
4405 array works fine. | |
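The one-element-array declaration is the classic C ``struct hack''. The sketch below shows the idea with hypothetical names; it relies on the long-standing practice of indexing past a trailing one-element array, which compilers accept but which ISO C only blesses in the C99 flexible-array-member form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef long demo_lisp_object;          /* stand-in for Lisp_Object */

struct demo_vector {
  long size;
  struct demo_vector *next;             /* chain through all vectors */
  demo_lisp_object contents[1];         /* really "size" elements */
};

static struct demo_vector *demo_all_vectors;

static struct demo_vector *
demo_make_vector (long length, demo_lisp_object init)
{
  /* Room for the fixed fields plus LENGTH elements (at least one, so
     the allocation is never smaller than the declared structure). */
  size_t bytes = offsetof (struct demo_vector, contents)
    + (length > 0 ? (size_t) length : 1) * sizeof (demo_lisp_object);
  struct demo_vector *v = malloc (bytes);
  long i;

  if (!v)
    abort ();
  v->size = length;
  for (i = 0; i < length; i++)
    v->contents[i] = init;
  v->next = demo_all_vectors;           /* thread onto the global list */
  demo_all_vectors = v;
  return v;
}
```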
4406 | |
4407 @node Bit Vector | |
4408 @section Bit Vector | |
4409 | |
4410 Bit vectors work exactly like vectors, except for more complicated | |
4411 code to access an individual bit, and except for the fact that bit | |
4412 vectors are lrecords while vectors are not. (The only difference here is | |
4413 that there's an lrecord implementation pointer at the beginning and the | |
4414 tag field in bit vector Lisp words is ``lrecord'' rather than | |
4415 ``vector''.) | |
4416 | |
4417 @node Symbol | |
4418 @section Symbol | |
4419 | |
4420 Symbols are also allocated in frob blocks. Note that the code | |
4421 exists for symbols to be either lrecords (category (c) above) | |
4422 or simple types (category (b) above); they are lrecords by | |
4423 default (I think), although there is no good reason for this. | |
4424 | |
4425 Note that symbols in the awful horrible obarray structure are | |
4426 chained through their @code{next} field. | |
4427 | |
4428 Remember that @code{intern} looks up a symbol in an obarray, creating | |
4429 one if necessary. | |
4430 | |
4431 @node Marker | |
4432 @section Marker | |
4433 | |
4434 Markers are allocated in frob blocks, as usual. They are kept | |
4435 in a buffer unordered, but in a doubly-linked list so that they | |
4436 can easily be removed. (Formerly this was a singly-linked list, | |
4437 but in some cases garbage collection took an extraordinarily | |
4438 long time due to the O(N^2) time required to remove lots of | |
4439 markers from a buffer.) Markers are removed from a buffer in | |
4440 the finalize stage, in @code{ADDITIONAL_FREE_marker()}. | |
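The reason the doubly-linked list helps is that a marker can be unlinked without searching for its predecessor. A minimal sketch with hypothetical structure and function names:

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_marker {
  struct demo_marker *prev, *next;
  long position;
} demo_marker;

typedef struct demo_buffer {
  demo_marker *markers;                 /* unordered list head */
} demo_buffer;

/* Push M onto the front of BUF's marker list. */
static void
demo_attach (demo_buffer *buf, demo_marker *m)
{
  m->prev = NULL;
  m->next = buf->markers;
  if (m->next)
    m->next->prev = m;
  buf->markers = m;
}

/* Unlink M in constant time; no walk from the list head is needed. */
static void
demo_detach (demo_buffer *buf, demo_marker *m)
{
  if (m->prev)
    m->prev->next = m->next;
  else
    buf->markers = m->next;
  if (m->next)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```

With a singly-linked list each removal has to scan for the predecessor, so freeing N markers from one buffer costs on the order of N^2 steps; here each removal is constant-time.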
4441 | |
4442 @node String | |
4443 @section String | |
4444 | |
4445 As mentioned above, strings are a special case. A string is logically | |
4446 two parts, a fixed-size object (containing the length, property list, | |
4447 and a pointer to the actual data), and the actual data in the string. | |
4448 The fixed-size object is a @code{struct Lisp_String} and is allocated in | |
4449 frob blocks, as usual. The actual data is stored in special | |
4450 @dfn{string-chars blocks}, which are 8K blocks of memory. | |
4451 Currently-allocated strings are simply laid end to end in these | |
4452 string-chars blocks, with a pointer back to the @code{struct Lisp_String} | |
4453 stored before each string in the string-chars block. When a new string | |
4454 needs to be allocated, the remaining space at the end of the last | |
4455 string-chars block is used if there's enough, and a new string-chars | |
4456 block is created otherwise. | |
4457 | |
4458 There are never any holes in the string-chars blocks due to the string | |
4459 compaction and relocation that happens at the end of garbage collection. | |
4460 During the sweep stage of garbage collection, when objects are | |
4461 reclaimed, the garbage collector goes through all string-chars blocks, | |
4462 looking for unused strings. Each chunk of string data is preceded by a | |
4463 pointer to the corresponding @code{struct Lisp_String}, which indicates | |
4464 both whether the string is used and how big the string is, i.e. how to | |
4465 get to the next chunk of string data. Holes are compressed by | |
4466 block-copying the next string into the empty space and relocating the | |
4467 pointer stored in the corresponding @code{struct Lisp_String}. | |
4468 @strong{This means you have to be careful with strings in your code.} | |
4469 See the section above on @code{GCPRO}ing. | |
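The compaction pass can be sketched as below. Everything here is a simplification with hypothetical names: a single block, a plain @code{marked} flag instead of the real mark mechanism, and back pointers copied with @code{memcpy()} so that the sketch does not depend on the block's alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct demo_string {
  size_t length;
  char *data;        /* points into the string-chars block */
  int marked;        /* stand-in for "still in use" */
} demo_string;

#define DEMO_HDR sizeof (demo_string *)   /* size of the back pointer */

/* Bytes used by one record: back pointer, data, NUL, padding. */
static size_t
demo_record_size (size_t length)
{
  return (DEMO_HDR + length + 1 + DEMO_HDR - 1) / DEMO_HDR * DEMO_HDR;
}

/* Append TEXT for string S at offset FILL; returns the new fill. */
static size_t
demo_store (unsigned char *block, size_t fill, demo_string *s,
            const char *text)
{
  s->length = strlen (text);
  memcpy (block + fill, &s, DEMO_HDR);            /* back pointer */
  s->data = (char *) block + fill + DEMO_HDR;
  memcpy (s->data, text, s->length + 1);
  return fill + demo_record_size (s->length);
}

/* Slide live records toward the start of the block and relocate each
   owner's data pointer.  Returns the new fill offset. */
static size_t
demo_compact (unsigned char *block, size_t fill)
{
  size_t from = 0, to = 0;

  while (from < fill)
    {
      demo_string *owner;
      size_t size;

      memcpy (&owner, block + from, DEMO_HDR);
      size = demo_record_size (owner->length);    /* how far to skip */
      if (owner->marked)
        {
          if (to != from)
            memmove (block + to, block + from, size);
          owner->data = (char *) block + to + DEMO_HDR;
          to += size;
        }
      from += size;
    }
  return to;
}
```

The last assignment in the loop is why raw @code{char *} pointers into string data cannot be held across a garbage collection: the data may have moved.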
4470 | |
4471 Note that there is one situation not handled: a string that is too big | |
4472 to fit into a string-chars block. Such strings, called @dfn{big | |
4473 strings}, are all @code{malloc()}ed as their own block. (#### Although it | |
4474 would make more sense for the threshold for big strings to be somewhat | |
4475 lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that | |
4476 this was indeed the case formerly -- indeed, the threshold was set at | |
4477 1/8 -- but Mly forgot about this when rewriting things for 19.8.) | |
4478 | |
4479 Note also that the string data in string-chars blocks is padded as | |
4480 necessary so that proper alignment constraints on the @code{struct | |
4481 Lisp_String} back pointers are maintained. | |
4482 | |
4483 Finally, strings can be resized. This happens in Mule when a | |
4484 character is substituted with a different-length character, or during | |
4485 modeline frobbing. (You could also export this to Lisp, but it's not | |
4486 done so currently.) Resizing a string is a potentially tricky process. | |
4487 If the change is small enough that the padding can absorb it, nothing | |
4488 other than a simple memory move needs to be done. Keep in mind, | |
4489 however, that the string can't shrink too much because the offset to the | |
4490 next string in the string-chars block is computed by looking at the | |
4491 length and rounding to the nearest multiple of four or eight. If the | |
4492 string would shrink or expand beyond the correct padding, new string | |
4493 data needs to be allocated at the end of the last string-chars block and | |
4494 the data moved appropriately. This leaves some dead string data, which | |
4495 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct | |
4496 Lisp_String} pointer before the data (there's no real @code{struct | |
4497 Lisp_String} to point to and relocate), and storing the size of the dead | |
4498 string data (which would normally be obtained from the now-non-existent | |
4499 @code{struct Lisp_String}) at the beginning of the dead string data gap. | |
4500 The string compactor recognizes this special 0xFFFFFFFF marker and | |
4501 handles it correctly. | |
4502 | |
4503 @node Bytecode | |
4504 @section Bytecode | |
4505 | |
4506 Not yet documented. | |
4507 | |
4508 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top | |
4509 @chapter Events and the Event Loop | |
4510 | |
4511 @menu | |
4512 * Introduction to Events:: | |
4513 * Main Loop:: | |
4514 * Specifics of the Event Gathering Mechanism:: | |
4515 * Specifics About the Emacs Event:: | |
4516 * The Event Stream Callback Routines:: | |
4517 * Other Event Loop Functions:: | |
4518 * Converting Events:: | |
4519 * Dispatching Events; The Command Builder:: | |
4520 @end menu | |
4521 | |
4522 @node Introduction to Events | |
4523 @section Introduction to Events | |
4524 | |
4525 An event is an object that encapsulates information about an | |
4526 interesting occurrence in the operating system. Events are | |
4527 generated either by user action, direct (e.g. typing on the | |
4528 keyboard or moving the mouse) or indirect (moving another | |
4529 window, thereby generating an expose event on an Emacs frame), | |
4530 or as a result of some other typically asynchronous action happening, | |
4531 such as output from a subprocess being ready or a timer expiring. | |
4532 Events come into the system in an asynchronous fashion (typically | |
4533 through a callback being called) and are converted into a | |
4534 synchronous event queue (first-in, first-out) in a process that | |
4535 we will call @dfn{collection}. | |
4536 | |
4537 Note that each application has its own event queue. (It is | |
4538 immaterial whether the collection process directly puts the | |
4539 events in the proper application's queue, or puts them into | |
4540 a single system queue, which is later split up.) | |
4541 | |
4542 The most basic level of event collection is done by the | |
4543 operating system or window system. Typically, XEmacs does | |
4544 its own event collection as well. Often there are multiple | |
4545 layers of collection in XEmacs, with events from various | |
4546 sources being collected into a queue, which is then combined | |
4547 with other sources to go into another queue (i.e. a second | |
4548 level of collection), with perhaps another level on top of | |
4549 this, etc. | |
4550 | |
4551 XEmacs has its own types of events (called @dfn{Emacs events}), | |
4552 which provide an abstract layer on top of the system-dependent | |
4553 nature of the most basic events that are received. Part of the | |
4554 complex nature of the XEmacs event collection process involves | |
4555 converting from the operating-system events into the proper | |
4556 Emacs events -- there may not be a one-to-one correspondence. | |
4557 | |
4558 Emacs events are documented in @file{events.h}; I'll discuss them | |
4559 later. | |
4560 | |
4561 @node Main Loop | |
4562 @section Main Loop | |
4563 | |
4564 The @dfn{command loop} is the top-level loop that the editor is always | |
4565 running. It loops endlessly, calling @code{next-event} to retrieve an | |
4566 event and @code{dispatch-event} to execute it. @code{dispatch-event} does | |
4567 the appropriate thing with non-user events (process, timeout, | |
4568 magic, eval, mouse motion); this involves calling a Lisp handler | |
4569 function, redrawing a newly-exposed part of a frame, reading | |
4570 subprocess output, etc. For user events, @code{dispatch-event} | |
4571 looks up the event in relevant keymaps or menubars; when a | |
4572 full key sequence or menubar selection is reached, the appropriate | |
4573 function is executed. @code{dispatch-event} may have to keep state | |
4574 across calls; this is done in the ``command-builder'' structure | |
4575 associated with each console (remember, there's usually only | |
4576 one console), and the engine that looks up keystrokes and | |
4577 constructs full key sequences is called the @dfn{command builder}. | |
4578 This is documented elsewhere. | |
4579 | |
4580 The guts of the command loop are in @code{command_loop_1()}. This | |
4581 function doesn't catch errors, though -- that's the job of | |
4582 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping) | |
4583 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never | |
4584 returns, but may get thrown out of. | |
4585 | |
4586 When an error occurs, @code{cmd_error()} is called, which usually | |
4587 invokes the Lisp error handler in @code{command-error}; however, a | |
4588 default error handler is provided if @code{command-error} is @code{nil} | |
4589 (e.g. during startup). The purpose of the error handler is simply to | |
4590 display the error message and do associated cleanup; it does not need to | |
4591 throw anywhere. When the error handler finishes, the condition-case in | |
4592 @code{command_loop_2()} will finish and @code{command_loop_2()} will | |
4593 reinvoke @code{command_loop_1()}. | |
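The relationship between the two loops can be modeled in plain C. This is only an analogy: it uses @code{setjmp()}/@code{longjmp()} directly as a stand-in for the condition-case machinery, the ``events'' are just a countdown, and unlike the real command loop it returns once the events run out so that it can be exercised.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf demo_error_catch;
static int demo_errors_handled;
static int demo_events_left;

/* Stand-in for signaling a Lisp error: unwind to the handler. */
static void
demo_signal_error (void)
{
  longjmp (demo_error_catch, 1);
}

/* Inner loop: processes events and does not trap errors itself. */
static void
demo_command_loop_1 (void)
{
  while (demo_events_left > 0)
    {
      int event = demo_events_left--;
      if (event % 2 == 0)               /* pretend even events fail */
        demo_signal_error ();
    }
}

/* Outer loop: traps the error, runs the handler, and reinvokes the
   inner loop. */
static void
demo_command_loop_2 (void)
{
  for (;;)
    {
      if (setjmp (demo_error_catch) == 0)
        {
          demo_command_loop_1 ();
          return;                       /* the real loop never gets here */
        }
      demo_errors_handled++;            /* "display the error message" */
    }
}
```

As in the description above, the handler does not need to throw anywhere: once it finishes, the outer loop simply re-enters the inner one.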
4594 | |
4595 @code{command_loop_2()} is invoked from three places: from | |
4596 @code{initial_command_loop()} (called from @code{main()} at the end of | |
4597 internal initialization), from the Lisp function @code{recursive-edit}, | |
4598 and from @code{call_command_loop()}. | |
4599 | |
4600 @code{call_command_loop()} is called when a macro is started and when | |
4601 the minibuffer is entered; normal termination of the macro or minibuffer | |
4602 causes a throw out of the recursive command loop. (To | |
4603 @code{execute-kbd-macro} for macros and @code{exit} for minibuffers. | |
4604 Note also that the low-level minibuffer-entering function, | |
4605 @code{read-minibuffer-internal}, provides its own error handling and | |
4606 does not need @code{command_loop_2()}'s error encapsulation; so it tells | |
4607 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.) | |
4608 | |
4609 Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a | |
4610 catch for @code{exit}; this is why @code{abort-recursive-edit}, which | |
4611 throws to this catch, exits out of either one. | |
4612 | |
4613 @code{initial_command_loop()}, called from @code{main()}, sets up a | |
4614 catch for @code{top-level} when invoking @code{command_loop_2()}, | |
4615 allowing functions to throw all the way to the top level if they really | |
4616 need to. Before invoking @code{command_loop_2()}, | |
4617 @code{initial_command_loop()} calls @code{top_level_1()}, which handles | |
4618 all of the startup stuff (creating the initial frame, handling the | |
4619 command-line options, loading the user's @file{.emacs} file, etc.). The | |
4620 function that actually does this is in Lisp and is pointed to by the | |
4621 variable @code{top-level}; normally this function is | |
4622 @code{normal-top-level}. @code{top_level_1()} is just an error-handling | |
4623 wrapper similar to @code{command_loop_2()}. Note also that | |
4624 @code{initial_command_loop()} sets up a catch for @code{top-level} when | |
4625 invoking @code{top_level_1()}, just like when it invokes | |
4626 @code{command_loop_2()}. | |
4627 | |
4628 @node Specifics of the Event Gathering Mechanism | |
4629 @section Specifics of the Event Gathering Mechanism | |
4630 | |
4631 Here is an approximate diagram of the collection processes | |
4632 at work in XEmacs, under TTY's (TTY's are simpler than X | |
4633 so we'll look at this first): | |
4634 | |
4635 @noindent | |
4636 @example | |
   asynch.    asynch.    asynch.   asynch.          [Collectors in
 kbd events kbd events   process   process             the OS]
      |          |       output    output
      |          |          |         |
      |          |          |         |     SIGINT,   [signal handlers
      |          |          |         |     SIGQUIT,   in XEmacs]
      V          V          V         V     SIGWINCH,
     file       file       file      file    SIGALRM
    desc.      desc.      desc.     desc.       |
    (TTY)      (TTY)     (pipe)    (pipe)       |
      |          |          |         |      fake    timeouts
      |          |          |         |      file        |
      |          |          |         |      desc.       |
      |          |          |         |     (pipe)       |
      |          |          |         |        |         |
      |          |          |         |        |         |
      |          |          |         |        |         |
      V          V          V         V        V         V
      ------>-----------<----------------<----------------
                   |
                   |
                   | [collected using select() in emacs_tty_next_event()
                   |  and converted to the appropriate Emacs event]
                   |
                   |
                   V          (above this line is TTY-specific)
                 Emacs ------------------------------------------------
                 event (below this line is the generic event mechanism)
                   |
                   |
was there    if not, call
a SIGINT? emacs_tty_next_event()
    |              |
    |              |
    |              |
    V              V
    --->-------<----
           |
           | [collected in event_stream_next_event();
           |  SIGINT is converted using maybe_read_quit_event()]
           V
         Emacs
         event
           |
           \---->------>----- maybe_kbd_translate() ---->---\
                                                            |
                                                            |
                                                            |
     command event queue                                    |
                                                   if not from command
  (contains events that were                        event queue, call
  read earlier but not processed,               event_stream_next_event()
  typically when waiting in a                               |
  sit-for, sleep-for, etc. for                              |
  a particular event to be received)                        |
               |                                            |
               |                                            |
               V                                            V
               ---->------------------------------------<----
                                               |
                                               | [collected in
                                               |  next_event_internal()]
                                               |
 unread-    unread-       event from           |
 command-   command-       keyboard       else, call
 events     event            macro   next_event_internal()
   |           |               |               |
   |           |               |               |
   |           |               |               |
   V           V               V               V
   --------->----------------------<------------
                     |
                     | [collected in `next-event', which may loop
                     |  more than once if the event it gets is on
                     |  a dead frame, device, etc.]
                     |
                     |
                     V
            feed into top-level event loop,
            which repeatedly calls `next-event'
            and then dispatches the event
            using `dispatch-event'
4719 @end example | |
4720 | |
4721 Notice the separation between TTY-specific and generic event mechanism. | |
4722 When using the Xt-based event loop, the TTY-specific stuff is replaced | |
4723 but the rest stays the same. | |
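
The generic (lower) half of the diagram amounts to trying a fixed series
of event sources in turn.  The following is a rough sketch of that
ordering, not the actual implementation: every @code{toy_} name is
hypothetical, events are reduced to per-source counters, and the
left-to-right priority among the three sources feeding
@code{next-event} is an assumption read off the diagram.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event sources, in the order they are consulted. */
enum toy_source {
    TOY_UNREAD_COMMAND_EVENTS,  /* the list `unread-command-events' */
    TOY_UNREAD_COMMAND_EVENT,   /* the single `unread-command-event' */
    TOY_KEYBOARD_MACRO,         /* an executing keyboard macro */
    TOY_COMMAND_EVENT_QUEUE,    /* events read earlier, not yet processed */
    TOY_EVENT_STREAM,           /* system-specific loop (TTY or Xt) */
    TOY_NUM_SOURCES
};

/* pending[i] is the number of events waiting in source i. */
struct toy_state {
    int pending[TOY_NUM_SOURCES];
};

/* Mirrors next_event_internal(): command event queue first,
   otherwise fall through to the event stream. */
static enum toy_source toy_next_event_internal(struct toy_state *s)
{
    assert(s != NULL);
    if (s->pending[TOY_COMMAND_EVENT_QUEUE] > 0) {
        s->pending[TOY_COMMAND_EVENT_QUEUE]--;
        return TOY_COMMAND_EVENT_QUEUE;
    }
    return TOY_EVENT_STREAM;    /* the real code would block here */
}

/* Mirrors `next-event': unread events and macros take precedence. */
static enum toy_source toy_next_event(struct toy_state *s)
{
    int i;

    assert(s != NULL);
    for (i = TOY_UNREAD_COMMAND_EVENTS; i <= TOY_KEYBOARD_MACRO; i++) {
        if (s->pending[i] > 0) {
            s->pending[i]--;
            return (enum toy_source) i;
        }
    }
    return toy_next_event_internal(s);
}
```

Only the last source differs between the TTY and Xt event loops;
everything above it in the sketch is shared.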
4724 | |
It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and that it must be able to
receive all kinds of events simultaneously.  For the two existing
event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
respectively), the TTY event loop @emph{only} handles TTY consoles,
while the Xt event loop handles @emph{both} TTY and X consoles.  This
situation is different from all of the output handlers, where you simply
have one per console type.
4733 | |
4734 Here's the Xt Event Loop Diagram (notice that below a certain point, | |
4735 it's the same as the above diagram): | |
4736 | |
4737 @example | |
asynch. asynch. asynch. asynch.                 [Collectors in
 kbd     kbd    process process                    the OS]
events  events  output  output
  |       |       |       |
  |       |       |       |    asynch.  asynch. [Collectors in the
  |       |       |       |       X        X     OS and X Window System]
  |       |       |       |     events   events
  |       |       |       |       |        |
  |       |       |       |       |        |
  |       |       |       |       |        |    SIGINT,   [signal handlers
  |       |       |       |       |        |    SIGQUIT,   in XEmacs]
  |       |       |       |       |        |    SIGWINCH,
  |       |       |       |       |        |    SIGALRM
  |       |       |       |       |        |       |
  |       |       |       |       |        |       |
  |       |       |       |       |        |       |      timeouts
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       V          |
  V       V       V       V       V        V     fake         |
 file    file    file    file    file     file   file         |
desc.   desc.   desc.   desc.   desc.    desc.   desc.        |
(TTY)   (TTY)  (pipe)  (pipe)  (socket) (socket) (pipe)       |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  V       V       V       V       V        V       V          V
  --->----------------------------------------<---------<------
       |              |               |
       |              |               | [collected using select() in
       |              |               |  _XtWaitForSomething(), called
       |              |               |  from XtAppProcessEvent(), called
       |              |               |  in emacs_Xt_next_event();
       |              |               |  dispatched to various callbacks]
       |              |               |
       |              |               |
  emacs_Xt_     p_s_callback(),       | [popup_selection_callback]
event_handler()    x_u_v_s_callback(),| [x_update_vertical_scrollbar_
       |           x_u_h_s_callback(),|  callback]
       |       search_callback()      | [x_update_horizontal_scrollbar_
       |              |               |  callback]
       |              |               |
       |              |               |
  enqueue_Xt_   signal_special_       |
dispatch_event() Xt_user_event()      |
[maybe multiple       |               |
 times, maybe 0       |               |
 times]               |               |
       |         enqueue_Xt_          |
       |       dispatch_event()       |
       |              |               |
       |              |               |
       V              V               |
       -->----------<--               |
              |                       |
              |                       |
          dispatch           Xt_what_callback()
            event                sets flags
            queue                     |
              |                       |
              |                       |
              |                       |
              |                       |
              ---->-----------<--------
                   |
                   |
                   | [collected and converted as appropriate in
                   |  emacs_Xt_next_event()]
                   |
                   |
                   V          (above this line is Xt-specific)
                 Emacs ------------------------------------------------
                 event (below this line is the generic event mechanism)
                   |
                   |
was there    if not, call
a SIGINT? emacs_Xt_next_event()
    |              |
    |              |
    |              |
    V              V
    --->-------<----
           |
           | [collected in event_stream_next_event();
           |  SIGINT is converted using maybe_read_quit_event()]
           V
         Emacs
         event
           |
           \---->------>----- maybe_kbd_translate() -->-----\
                                                            |
                                                            |
                                                            |
     command event queue                                    |
                                                   if not from command
  (contains events that were                        event queue, call
  read earlier but not processed,               event_stream_next_event()
  typically when waiting in a                               |
  sit-for, sleep-for, etc. for                              |
  a particular event to be received)                        |
               |                                            |
               |                                            |
               V                                            V
               ---->----------------------------------<------
                                               |
                                               | [collected in
                                               |  next_event_internal()]
                                               |
 unread-    unread-       event from           |
 command-   command-       keyboard       else, call
 events     event            macro   next_event_internal()
   |           |               |               |
   |           |               |               |
   |           |               |               |
   V           V               V               V
   --------->----------------------<------------
                     |
                     | [collected in `next-event', which may loop
                     |  more than once if the event it gets is on
                     |  a dead frame, device, etc.]
                     |
                     |
                     V
            feed into top-level event loop,
            which repeatedly calls `next-event'
            and then dispatches the event
            using `dispatch-event'
4865 @end example | |
4866 | |
4867 @node Specifics About the Emacs Event | |
4868 @section Specifics About the Emacs Event | |
4869 | |
4870 @node The Event Stream Callback Routines | |
4871 @section The Event Stream Callback Routines | |
4872 | |
4873 @node Other Event Loop Functions | |
4874 @section Other Event Loop Functions | |
4875 | |
4876 @code{detect_input_pending()} and @code{input-pending-p} look for | |
4877 input by calling @code{event_stream->event_pending_p} and looking in | |
4878 @code{[V]unread-command-event} and the @code{command_event_queue} (they | |
4879 do not check for an executing keyboard macro, though). | |
4880 | |
4881 @code{discard-input} cancels any command events pending (and any | |
4882 keyboard macros currently executing), and puts the others onto the | |
4883 @code{command_event_queue}. There is a comment about a ``race | |
4884 condition'', which is not a good sign. | |
4885 | |
@code{next-command-event} and @code{read-char} are higher-level
interfaces to @code{next-event}.  @code{next-command-event} gets the
next @dfn{command} event (i.e. keypress, mouse event, or menu
selection), calling @code{dispatch-event} on any others.
@code{read-char} calls @code{next-command-event} and uses
@code{event_to_character()} to return the ASCII equivalent.
4892 | |
4893 @node Converting Events | |
4894 @section Converting Events | |
4895 | |
4896 @code{character_to_event()}, @code{event_to_character()}, | |
4897 @code{event-to-character}, and @code{character-to-event} convert between | |
4898 ASCII characters and keypresses corresponding to the characters. If the | |
4899 event was not a keypress, @code{event_to_character()} returns -1 and | |
4900 @code{event-to-character} returns @code{nil}. These functions convert | |
4901 between ASCII representation and the split-up event representation | |
4902 (keysym plus mod keys). | |
4903 | |
4904 @node Dispatching Events; The Command Builder | |
4905 @section Dispatching Events; The Command Builder | |
4906 | |
4907 Not yet documented. | |
4908 | |
4909 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top | |
4910 @chapter Evaluation; Stack Frames; Bindings | |
4911 | |
4912 @menu | |
4913 * Evaluation:: | |
4914 * Dynamic Binding; The specbinding Stack; Unwind-Protects:: | |
4915 * Simple Special Forms:: | |
4916 * Catch and Throw:: | |
4917 @end menu | |
4918 | |
4919 @node Evaluation | |
4920 @section Evaluation | |
4921 | |
@code{Feval()} evaluates the form (a Lisp object) that is passed to
it.  Note that evaluation is only non-trivial for two types of objects:
symbols and conses.  Under normal circumstances (i.e. not mocklisp) a
symbol is evaluated simply by calling @code{symbol-value} on it and
returning the value.
4927 | |
4928 Evaluating a cons means calling a function. First, @code{eval} checks | |
4929 to see if garbage-collection is necessary, and calls | |
4930 @code{Fgarbage_collect()} if so. It then increases the evaluation depth | |
4931 by 1 (@code{lisp_eval_depth}, which is always less than @code{max_lisp_eval_depth}) and adds an | |
4932 element to the linked list of @code{struct backtrace}'s | |
4933 (@code{backtrace_list}). Each such structure contains a pointer to the | |
4934 function being called plus a list of the function's arguments. | |
4935 Originally these values are stored unevalled, and as they are evaluated, | |
4936 the backtrace structure is updated. Garbage collection pays attention | |
4937 to the objects pointed to in the backtrace structures (garbage | |
4938 collection might happen while a function is being called or while an | |
4939 argument is being evaluated, and there could easily be no other | |
4940 references to the arguments in the argument list; once an argument is | |
4941 evaluated, however, the unevalled version is not needed by eval, and so | |
4942 the backtrace structure is changed). | |
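
The bookkeeping described above (bump the depth, push a backtrace
record that lives in the C stack frame, undo both on exit) can be
sketched as follows.  This is a minimal sketch with hypothetical
@code{toy_} names: it records only a function name, does not model the
argument list or garbage collection, and returns @code{-1} where the
real code would signal an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down counterpart of struct backtrace. */
struct toy_backtrace {
    struct toy_backtrace *next;  /* enclosing call, or NULL at top level */
    const char *function;        /* name of the function being called */
};

static struct toy_backtrace *toy_backtrace_list;
static int toy_eval_depth;
static const int toy_max_eval_depth = 200;   /* arbitrary toy limit */

/* Number of records currently on the backtrace list. */
static int toy_backtrace_length(void)
{
    const struct toy_backtrace *bt;
    int n = 0;

    for (bt = toy_backtrace_list; bt != NULL; bt = bt->next)
        n++;
    return n;
}

/* "Evaluate" a call nested LEVELS deep.  Returns the length of the
   backtrace list at the innermost call, or -1 if the depth limit
   would be exceeded. */
static int toy_eval(int levels)
{
    struct toy_backtrace bt;     /* lives in this C stack frame */
    int result;

    assert(levels >= 1);
    if (toy_eval_depth + 1 > toy_max_eval_depth)
        return -1;

    toy_eval_depth++;
    bt.next = toy_backtrace_list;
    bt.function = "toy-function";
    toy_backtrace_list = &bt;

    result = (levels > 1) ? toy_eval(levels - 1) : toy_backtrace_length();

    toy_backtrace_list = bt.next;   /* pop on the way out */
    toy_eval_depth--;
    return result;
}
```

Because each record is an automatic variable, the list is exactly as
long as the chain of active evaluations and needs no separate
allocation.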
4943 | |
4944 At this point, the function to be called is determined by looking at | |
4945 the car of the cons (if this is a symbol, its function definition is | |
4946 retrieved and the process repeated). The function should then consist | |
4947 of either a Lisp_Subr (built-in function), a Lisp_Compiled object, or a | |
4948 cons whose car is the symbol @code{autoload}, @code{macro}, | |
4949 @code{lambda}, or @code{mocklisp}. | |
4950 | |
4951 If the function is a Lisp_Subr, the lisp object points to a struct | |
4952 Lisp_Subr (created by @code{DEFUN()}), which contains a pointer to the C | |
4953 function, a minimum and maximum number of arguments (possibly the | |
4954 special constants @code{MANY} or @code{UNEVALLED}), a pointer to the | |
4955 symbol referring to that subr, and a couple of other things. If the | |
4956 subr wants its arguments @code{UNEVALLED}, they are passed raw as a | |
4957 list. Otherwise, an array of evaluated arguments is created and put | |
4958 into the backtrace structure, and either passed whole (@code{MANY}) or | |
4959 each argument is passed as a C argument. | |
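
The three calling conventions can be summarized as a small decision
function.  This is a sketch under assumptions: the @code{toy_} names
and the numeric values chosen for the two special constants are
illustrative, and the real structure carries more fields than are shown
here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values for the special max-argument constants. */
#define TOY_MANY      (-2)
#define TOY_UNEVALLED (-1)

/* Hypothetical, reduced counterpart of struct Lisp_Subr. */
struct toy_subr {
    const char *name;
    int min_args;
    int max_args;       /* a count, TOY_MANY, or TOY_UNEVALLED */
};

enum toy_call_style {
    TOY_PASS_RAW_LIST,     /* UNEVALLED: unevaluated argument list */
    TOY_PASS_ARRAY,        /* MANY: array of evaluated arguments */
    TOY_PASS_SEPARATE,     /* each argument as its own C argument */
    TOY_WRONG_ARG_COUNT    /* the real code would signal an error */
};

/* Decide how a subr called with NARGS arguments receives them. */
static enum toy_call_style
toy_classify_call(const struct toy_subr *subr, int nargs)
{
    assert(subr != NULL && nargs >= 0);
    if (subr->max_args == TOY_UNEVALLED)
        return TOY_PASS_RAW_LIST;
    if (nargs < subr->min_args)
        return TOY_WRONG_ARG_COUNT;
    if (subr->max_args == TOY_MANY)
        return TOY_PASS_ARRAY;
    if (nargs > subr->max_args)
        return TOY_WRONG_ARG_COUNT;
    return TOY_PASS_SEPARATE;
}
```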
4960 | |
4961 If the function is a Lisp_Compiled object or a lambda, | |
4962 @code{apply_lambda()} is called. If the function is a macro, | |
4963 [..... fill in] is done. If the function is an autoload, | |
4964 @code{do_autoload()} is called to load the definition and then eval | |
4965 starts over [explain this more]. If the function is a mocklisp, | |
4966 @code{ml_apply()} is called. | |
4967 | |
4968 When @code{Feval} exits, the evaluation depth is reduced by one, the | |
4969 debugger is called if appropriate, and the current backtrace structure | |
4970 is removed from the list. | |
4971 | |
4972 @code{apply_lambda()} is passed a function, a list of arguments, and a | |
4973 flag indicating whether to evaluate the arguments. It creates an array | |
4974 of (possibly) evaluated arguments and fixes up the backtrace structure, | |
4975 just like eval does. Then it calls @code{funcall_lambda()}. | |
4976 | |
@code{funcall_lambda()} goes through the formal arguments to the
function and binds them to the actual arguments, checking for
@code{&rest} and @code{&optional} symbols in the formal arguments and
making sure the number of actual arguments is correct.  Then either
@code{progn} or @code{byte-code} is called to actually execute the body
and return a value.
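
The argument-count check can be illustrated in isolation.  The sketch
below is an assumption-laden toy: formal arguments are plain C strings
rather than Lisp symbols, no binding is performed, and
@code{toy_arg_count_ok()} is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if NACTUALS arguments are acceptable for the formal
   argument list FORMALS (an array of NFORMALS names), else 0. */
static int toy_arg_count_ok(const char *const *formals, int nformals,
                            int nactuals)
{
    int required = 0, optional = 0, rest = 0, in_optional = 0, i;

    assert(formals != NULL && nformals >= 0 && nactuals >= 0);
    for (i = 0; i < nformals; i++) {
        if (strcmp(formals[i], "&optional") == 0)
            in_optional = 1;
        else if (strcmp(formals[i], "&rest") == 0)
            rest = 1;        /* the next formal collects the leftovers */
        else if (rest)
            break;           /* this is the &rest variable itself */
        else if (in_optional)
            optional++;
        else
            required++;
    }
    if (nactuals < required)
        return 0;            /* too few arguments */
    return rest || nactuals <= required + optional;
}
```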
4983 | |
4984 @code{Ffuncall()} implements Lisp @code{funcall}. @code{(funcall fun | |
4985 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote | |
4986 x2) (quote x3) ...))}. @code{Ffuncall()} contains its own code to do | |
4987 the evaluation, however, and is almost identical to eval. | |
4988 | |
4989 @code{Fapply()} implements Lisp @code{apply}, which is very similar to | |
4990 funcall except that if the last argument is a list, the result is the | |
4991 same as if each of the arguments in the list had been passed separately. | |
4992 @code{Fapply()} does some business to expand the last argument if it's a | |
4993 list, then calls @code{Ffuncall()} to do the work. | |
4994 | |
4995 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and | |
4996 @code{call3()} call a function, passing it the argument(s) given (the | |
4997 arguments are given as separate C arguments rather than being passed as | |
4998 an array). @code{apply1()} uses @code{apply} while the others use | |
4999 @code{funcall}. | |
5000 | |
5001 @node Dynamic Binding; The specbinding Stack; Unwind-Protects | |
5002 @section Dynamic Binding; The specbinding Stack; Unwind-Protects | |
5003 | |
5004 @example | |
5005 struct specbinding | |
5006 @{ | |
5007 Lisp_Object symbol, old_value; | |
5008 Lisp_Object (*func) (); | |
5009 Lisp_Object unused; /* Dividing by 16 is faster than by 12 */ | |
5010 @}; | |
5011 @end example | |
5012 | |
5013 @code{struct specbinding} is used for local-variable bindings and | |
5014 unwind-protects. @code{specpdl} holds an array of @code{struct specbinding}'s, | |
5015 @code{specpdl_ptr} points to the beginning of the free bindings in the | |
5016 array, @code{specpdl_size} specifies the total number of binding slots | |
5017 in the array, and @code{max_specpdl_size} specifies the maximum number | |
5018 of bindings the array can be expanded to hold. @code{grow_specpdl()} | |
5019 increases the size of the specpdl array, multiplying its size by 2 but | |
5020 never exceeding max_specpdl_size (except that if this number is less | |
5021 than 400, it is first set to 400). | |
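
The growth rule just described can be written as a pure function on
sizes.  This is a sketch only: @code{toy_grow_specpdl_size()} is a
hypothetical name, no reallocation is done, and @code{-1} stands in for
the error the real code would signal when the array is already at its
limit.

```c
#include <assert.h>
#include <stddef.h>

/* Given the current slot count SIZE and a pointer to the maximum,
   return the new slot count, or -1 if the array cannot grow.
   A maximum below 400 is first raised to 400. */
static int toy_grow_specpdl_size(int size, int *max_size)
{
    assert(size > 0 && max_size != NULL);
    if (size >= *max_size) {
        if (*max_size < 400)
            *max_size = 400;
        if (size >= *max_size)
            return -1;       /* the real code would signal an error */
    }
    size *= 2;
    if (size > *max_size)
        size = *max_size;    /* never exceed the maximum */
    return size;
}
```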
5022 | |
@code{specbind()} binds a symbol to a value and is used for local
variables and @code{let} forms.  The symbol and its old value (which
might be @code{Qunbound}, indicating no prior value) are recorded in the
next free slot of the specpdl array, and @code{specpdl_ptr} is advanced
by one slot.
5027 | |
5028 @code{record_unwind_protect()} implements an @dfn{unwind-protect}, | |
5029 which, when placed around a section of code, ensures that some specified | |
5030 cleanup routine will be executed even if the code exits abnormally | |
5031 (e.g. through a throw or quit). @code{record_unwind_protect()} simply | |
5032 adds a new specbinding to the specpdl array and stores the appropriate | |
5033 information in it. The cleanup routine can either be a C function, | |
5034 which is stored in the @code{func} field, or a progn form, which is stored in | |
5035 the @code{old_value} field. | |
5036 | |
5037 @code{unbind_to()} removes specbindings from the specpdl array until | |
5038 the specified position is reached. The specbinding can be one of three | |
5039 types: | |
5040 | |
5041 @enumerate | |
5042 @item | |
5043 an unwind-protect with a C cleanup function (@code{func} is not 0 -- | |
5044 @code{old_value} holds an argument to be passed to the function); | |
5045 @item | |
5046 an unwind-protect with a Lisp form (@code{func} is 0 and @code{symbol} | |
5047 is @code{nil} -- @code{old_value} holds the form to be executed with | |
5048 @code{Fprogn()}); or | |
5049 @item | |
5050 a local-variable binding (@code{func} is 0 and @code{symbol} is not | |
5051 @code{nil} -- @code{old_value} holds the old value, which is stored as | |
5052 the symbol's value). | |
5053 @end enumerate | |
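
The three cases can be seen together in a toy version of the stack.
The sketch below rests on several assumptions: Lisp objects are reduced
to @code{long} values, a @code{NULL} symbol pointer plays the role of
@code{nil}, the array has a fixed size, the Lisp-form case merely bumps
a counter instead of calling @code{Fprogn()}, and all @code{toy_} names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_object;

struct toy_symbol {
    const char *name;
    toy_object value;
};

struct toy_specbinding {
    struct toy_symbol *symbol;       /* NULL plays the role of nil */
    toy_object old_value;
    void (*func)(toy_object);        /* C cleanup function, or NULL */
};

#define TOY_SPECPDL_SIZE 16
static struct toy_specbinding toy_specpdl[TOY_SPECPDL_SIZE];
static int toy_specpdl_depth;        /* index of the first free slot */
static int toy_forms_run;            /* counts Lisp-form cleanups */
static toy_object toy_last_cleanup_arg;

/* Example C cleanup routine: just remembers its argument. */
static void toy_cleanup(toy_object arg)
{
    toy_last_cleanup_arg = arg;
}

/* Bind SYM to VALUE, remembering the old value. */
static void toy_specbind(struct toy_symbol *sym, toy_object value)
{
    struct toy_specbinding *b;

    assert(sym != NULL && toy_specpdl_depth < TOY_SPECPDL_SIZE);
    b = &toy_specpdl[toy_specpdl_depth++];
    b->symbol = sym;
    b->old_value = sym->value;
    b->func = NULL;
    sym->value = value;
}

/* Arrange for FUNC to be called with ARG when the stack unwinds. */
static void toy_record_unwind_protect(void (*func)(toy_object),
                                      toy_object arg)
{
    struct toy_specbinding *b;

    assert(func != NULL && toy_specpdl_depth < TOY_SPECPDL_SIZE);
    b = &toy_specpdl[toy_specpdl_depth++];
    b->symbol = NULL;
    b->old_value = arg;
    b->func = func;
}

/* Pop entries until the stack is back down to COUNT entries. */
static void toy_unbind_to(int count)
{
    assert(count >= 0 && count <= toy_specpdl_depth);
    while (toy_specpdl_depth > count) {
        struct toy_specbinding *b = &toy_specpdl[--toy_specpdl_depth];

        if (b->func != NULL)
            b->func(b->old_value);             /* C cleanup function */
        else if (b->symbol == NULL)
            toy_forms_run++;                   /* would run the form */
        else
            b->symbol->value = b->old_value;   /* restore the variable */
    }
}
```

A caller records the depth before binding anything and passes that
depth to the unwinding function afterwards, which undoes the entries in
reverse order of creation.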
5054 | |
5055 @node Simple Special Forms | |
5056 @section Simple Special Forms | |
5057 | |
5058 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn}, | |
5059 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function}, | |
5060 @code{let*}, @code{let}, @code{while} | |
5061 | |
5062 All of these are very simple and work as expected, calling | |
5063 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of | |
5064 @code{let} and @code{let*}) using @code{specbind()} to create bindings | |
5065 and @code{unbind_to()} to undo the bindings when finished. Note that | |
5066 these functions do a lot of @code{GCPRO}ing to protect their arguments | |
5067 from garbage collection because they call @code{Feval()} (@pxref{Garbage | |
5068 Collection}). | |
5069 | |
5070 @node Catch and Throw | |
5071 @section Catch and Throw | |
5072 | |
5073 @example | |
5074 struct catchtag | |
5075 @{ | |
5076 Lisp_Object tag; | |
5077 Lisp_Object val; | |
5078 struct catchtag *next; | |
5079 struct gcpro *gcpro; | |
5080 jmp_buf jmp; | |
5081 struct backtrace *backlist; | |
5082 int lisp_eval_depth; | |
5083 int pdlcount; | |
5084 @}; | |
5085 @end example | |
5086 | |
@code{catch} is a Lisp function that places a catch around a body of
code.  A catch is a means of non-local exit from the code.  When a catch
is created, a tag is specified, and executing a @code{throw} to this tag
will exit from the body of code caught with this tag; the value of the
@code{catch} form is then the value given in the call to @code{throw}.
If there is no such call, the body is executed normally and its value is
returned.
5093 | |
5094 Information pertaining to a catch is held in a @code{struct catchtag}, | |
5095 which is placed at the head of a linked list pointed to by | |
5096 @code{catchlist}. @code{internal_catch()} is passed a C function to | |
5097 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to | |
5098 give it, and places a catch around the function. Each @code{struct | |
5099 catchtag} is held in the stack frame of the @code{internal_catch()} | |
5100 instance that created the catch. | |
5101 | |
5102 @code{internal_catch()} is fairly straightforward. It stores into the | |
5103 @code{struct catchtag} the tag name and the current values of | |
5104 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the | |
5105 offset into the specpdl array, sets a jump point with @code{_setjmp()} | |
5106 (storing the jump point into the @code{struct catchtag}), and calls the | |
5107 function. Control will return to @code{internal_catch()} either when | |
5108 the function exits normally or through a @code{_longjmp()} to this jump | |
5109 point. In the latter case, @code{throw} will store the value to be | |
5110 returned into the @code{struct catchtag} before jumping. When it's | |
5111 done, @code{internal_catch()} removes the @code{struct catchtag} from | |
5112 the catchlist and returns the proper value. | |
5113 | |
5114 @code{Fthrow()} goes up through the catchlist until it finds one with | |
5115 a matching tag. It then calls @code{unbind_catch()} to restore | |
5116 everything to what it was when the appropriate catch was set, stores the | |
5117 return value in the @code{struct catchtag}, and jumps (with | |
5118 @code{_longjmp()}) to its jump point. | |
5119 | |
@code{unbind_catch()} removes all catches from the catchlist until it
finds the correct one.  Some of the catches might have been placed for
error-trapping, and if so, the appropriate entries on the handlerlist
must be removed (see ``errors'').  @code{unbind_catch()} also restores
the values of @code{gcprolist}, @code{backtrace_list}, and
@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any
specbindings created since the catch.
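
The core of the mechanism fits in a few lines of portable C.  The
following is a minimal sketch, not the real code: tags and values are
plain integers, only the tag, value, link, and jump point are kept in
the structure, standard @code{setjmp()}/@code{longjmp()} replace the
underscore variants, and every @code{toy_} name is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical, reduced counterpart of struct catchtag. */
struct toy_catchtag {
    int tag;
    volatile int val;            /* written by the thrower before jumping */
    struct toy_catchtag *next;
    jmp_buf jmp;
};

static struct toy_catchtag *toy_catchlist;

/* Throw VAL to the innermost catch whose tag is TAG. */
static void toy_throw(int tag, int val)
{
    struct toy_catchtag *c;

    for (c = toy_catchlist; c != NULL; c = c->next) {
        if (c->tag == tag) {
            toy_catchlist = c;   /* discard the inner catches */
            c->val = val;
            longjmp(c->jmp, 1);
        }
    }
    /* No matching catch: the real code signals an error here. */
}

/* Call FUNC(ARG) with a catch for TAG around it. */
static int toy_internal_catch(int tag, int (*func)(int), int arg)
{
    struct toy_catchtag c;       /* held in this stack frame */

    assert(func != NULL);
    c.tag = tag;
    c.val = 0;
    c.next = toy_catchlist;
    toy_catchlist = &c;

    if (setjmp(c.jmp) == 0)
        c.val = func(arg);       /* normal return */
    /* otherwise we arrived here via longjmp() and c.val is set */

    toy_catchlist = c.next;
    return c.val;
}

/* Example bodies used to exercise the catch. */
static int toy_body_normal(int x) { return x + 1; }
static int toy_body_throws(int x) { toy_throw(1, x * 2); return -1; }
static int toy_body_nested(int x)
{
    return toy_internal_catch(2, toy_body_throws, x);
}
```

In the nested case the throw to tag 1 passes straight through the inner
catch for tag 2, which is simply dropped from the list.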
5127 | |
5128 | |
5129 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top | |
5130 @chapter Symbols and Variables | |
5131 | |
5132 @menu | |
5133 * Introduction to Symbols:: | |
5134 * Obarrays:: | |
5135 * Symbol Values:: | |
5136 @end menu | |
5137 | |
5138 @node Introduction to Symbols | |
5139 @section Introduction to Symbols | |
5140 | |
5141 A symbol is basically just an object with four fields: a name (a | |
5142 string), a value (some Lisp object), a function (some Lisp object), and | |
5143 a property list (usually a list of alternating keyword/value pairs). | |
5144 What makes symbols special is that there is usually only one symbol with | |
5145 a given name, and the symbol is referred to by name. This makes a | |
5146 symbol a convenient way of calling up data by name, i.e. of implementing | |
5147 variables. (The variable's value is stored in the @dfn{value slot}.) | |
5148 Similarly, functions are referenced by name, and the definition of the | |
5149 function is stored in a symbol's @dfn{function slot}. This means that | |
5150 there can be a distinct function and variable with the same name. The | |
5151 property list is used as a more general mechanism of associating | |
5152 additional values with particular names, and once again the namespace is | |
5153 independent of the function and variable namespaces. | |
5154 | |
5155 @node Obarrays | |
5156 @section Obarrays | |
5157 | |
5158 The identity of symbols with their names is accomplished through a | |
5159 structure called an obarray, which is just a poorly-implemented hash | |
5160 table mapping from strings to symbols whose name is that string. (I say | |
5161 ``poorly implemented'' because an obarray appears in Lisp as a vector | |
5162 with some hidden fields rather than as its own opaque type. This is an | |
5163 Emacs Lisp artifact that should be fixed.) | |
5164 | |
5165 Obarrays are implemented as a vector of some fixed size (which should | |
5166 be a prime for best results), where each ``bucket'' of the vector | |
5167 contains one or more symbols, threaded through a hidden @code{next} | |
5168 field in the symbol. Lookup of a symbol in an obarray, and adding a | |
5169 symbol to an obarray, is accomplished through standard hash-table | |
5170 techniques. | |
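
A bucket vector with symbols chained through a @code{next} field can be
sketched as below.  This is an illustration only: the hash function,
the bucket count, and all @code{toy_} names are assumptions, and the toy
never frees the symbols it allocates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 17      /* a prime, as recommended */

struct toy_symbol {
    char *name;
    struct toy_symbol *next;     /* next symbol in the same bucket */
};

struct toy_obarray {
    struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

/* Simple string hash, reduced to a bucket index. */
static unsigned toy_hash(const char *name)
{
    unsigned h = 0;

    while (*name != '\0')
        h = h * 31u + (unsigned char) *name++;
    return h % TOY_OBARRAY_SIZE;
}

/* Look NAME up without creating anything; NULL if absent. */
static struct toy_symbol *
toy_intern_soft(const struct toy_obarray *ob, const char *name)
{
    struct toy_symbol *sym;

    assert(ob != NULL && name != NULL);
    for (sym = ob->buckets[toy_hash(name)]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;
    return NULL;
}

/* Look NAME up, creating and adding a new symbol if absent.
   Returns NULL only if memory runs out. */
static struct toy_symbol *
toy_intern(struct toy_obarray *ob, const char *name)
{
    struct toy_symbol *sym = toy_intern_soft(ob, name);
    unsigned h;

    if (sym != NULL)
        return sym;
    sym = malloc(sizeof *sym);
    if (sym == NULL)
        return NULL;
    sym->name = malloc(strlen(name) + 1);
    if (sym->name == NULL) {
        free(sym);
        return NULL;
    }
    strcpy(sym->name, name);
    h = toy_hash(name);
    sym->next = ob->buckets[h];  /* thread onto the front of the bucket */
    ob->buckets[h] = sym;
    return sym;
}
```

Interning the same name twice in one obarray yields the same symbol,
while interning it in two different obarrays yields two distinct symbols
with the same name.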
5171 | |
5172 The standard Lisp function for working with symbols and obarrays is | |
5173 @code{intern}. This looks up a symbol in an obarray given its name; if | |
5174 it's not found, a new symbol is automatically created with the specified | |
5175 name, added to the obarray, and returned. This is what happens when the | |
5176 Lisp reader encounters a symbol (or more precisely, encounters the name | |
5177 of a symbol) in some text that it is reading. There is a standard | |
5178 obarray called @code{obarray} that is used for this purpose, although | |
5179 the Lisp programmer is free to create his own obarrays and @code{intern} | |
5180 symbols in them. | |
5181 | |
5182 Note that, once a symbol is in an obarray, it stays there until | |
5183 something is done about it, and the standard obarray @code{obarray} | |
5184 always stays around, so once you use any particular variable name, a | |
5185 corresponding symbol will stay around in @code{obarray} until you exit | |
5186 XEmacs. | |
5187 | |
5188 Note that @code{obarray} itself is a variable, and as such there is a | |
5189 symbol in @code{obarray} whose name is @code{"obarray"} and which | |
5190 contains @code{obarray} as its value. | |
5191 | |
5192 Note also that this call to @code{intern} occurs only when in the Lisp | |
5193 reader, not when the code is executed (at which point the symbol is | |
5194 already around, stored as such in the definition of the function). | |
5195 | |
5196 You can create your own obarray using @code{make-vector} (this is | |
5197 horrible but is an artifact) and intern symbols into that obarray. | |
5198 Doing that will result in two or more symbols with the same name. | |
5199 However, at most one of these symbols is in the standard @code{obarray}: | |
5200 You cannot have two symbols of the same name in any particular obarray. | |
5201 Note that you cannot add a symbol to an obarray in any fashion other | |
5202 than using @code{intern}: i.e. you can't take an existing symbol and put | |
5203 it in an existing obarray. Nor can you change the name of an existing | |
5204 symbol. (Since obarrays are vectors, you can violate the consistency of | |
5205 things by storing directly into the vector, but let's ignore that | |
5206 possibility.) | |
5207 | |
5208 Usually symbols are created by @code{intern}, but if you really want, | |
5209 you can explicitly create a symbol using @code{make-symbol}, giving it | |
5210 some name. The resulting symbol is not in any obarray (i.e. it is | |
5211 @dfn{uninterned}), and you can't add it to any obarray. Therefore its | |
5212 primary purpose is as a carrier of information. (Cons cells could | |
5213 probably be used just as well.) | |
5214 | |
5215 You can also use @code{intern-soft} to look up a symbol but not create | |
5216 a new one, and @code{unintern} to remove a symbol from an obarray. This | |
5217 returns the removed symbol. (Remember: You can't put the symbol back | |
5218 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols | |
5219 in an obarray. | |
5220 | |
5221 @node Symbol Values | |
5222 @section Symbol Values | |
5223 | |
5224 The value field of a symbol normally contains a Lisp object. However, | |
5225 a symbol can be @dfn{unbound}, meaning that it logically has no value. | |
5226 This is internally indicated by storing a special Lisp object, called | |
5227 @dfn{the unbound marker} and stored in the global variable | |
5228 @code{Qunbound}. The unbound marker is of a special Lisp object type | |
5229 called @dfn{symbol-value-magic}. It is impossible for the Lisp | |
5230 programmer to directly create or access any object of this type. | |
5231 | |
5232 @strong{You must not let any ``symbol-value-magic'' object escape to | |
5233 the Lisp level.} Printing any of these objects will cause the message | |
5234 @samp{INTERNAL EMACS BUG} to appear as part of the print representation. | |
5235 (You may see this normally when you call @code{debug_print()} from the | |
5236 debugger on a Lisp object.) If you let one of these objects escape to | |
5237 the Lisp level, you will violate a number of assumptions contained in | |
5238 the C code and make the unbound marker not function right. | |
5239 | |
When a symbol is created, its value field (and function field) are set
to @code{Qunbound}.  The Lisp programmer can restore these conditions
later using @code{makunbound} or @code{fmakunbound}, and can query to
see whether the value or function fields are @dfn{bound} (i.e. have a
value other than @code{Qunbound}) using @code{boundp} and
@code{fboundp}.  The fields are set to a normal Lisp object using
@code{set} (or @code{setq}) and @code{fset}.
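
The idea of a sentinel object standing for ``no value'' can be sketched
as follows.  This is a toy under stated assumptions: the
@code{toy_} names are hypothetical, objects are bare structures, and the
marker is just a distinguished address rather than an object of a
special Lisp type.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object {
    int payload;
};

/* The one distinguished object that means "unbound". */
static struct toy_object toy_unbound_marker;
#define TOY_QUNBOUND (&toy_unbound_marker)

struct toy_symbol {
    const char *name;
    struct toy_object *value;
    struct toy_object *function;
};

/* A fresh symbol has both fields unbound. */
static void toy_init_symbol(struct toy_symbol *sym, const char *name)
{
    assert(sym != NULL);
    sym->name = name;
    sym->value = TOY_QUNBOUND;
    sym->function = TOY_QUNBOUND;
}

static int toy_boundp(const struct toy_symbol *sym)
{
    return sym->value != TOY_QUNBOUND;
}

static int toy_fboundp(const struct toy_symbol *sym)
{
    return sym->function != TOY_QUNBOUND;
}

/* Setting a value must never store the marker itself: the marker is
   not allowed to escape to user-visible code. */
static void toy_set(struct toy_symbol *sym, struct toy_object *value)
{
    assert(sym != NULL && value != TOY_QUNBOUND);
    sym->value = value;
}

static void toy_makunbound(struct toy_symbol *sym)
{
    assert(sym != NULL);
    sym->value = TOY_QUNBOUND;
}
```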
5247 | |
5248 Other symbol-value-magic objects are used as special markers to | |
5249 indicate variables that have non-normal properties. This includes any | |
5250 variables that are tied into C variables (setting the variable magically | |
5251 sets some global variable in the C code, and likewise for retrieving the | |
5252 variable's value), variables that magically tie into slots in the | |
5253 current buffer, variables that are buffer-local, etc. The | |
5254 symbol-value-magic object is stored in the value cell in place of | |
5255 a normal object, and the code to retrieve a symbol's value | |
5256 (i.e. @code{symbol-value}) knows how to do special things with them. | |
5257 This means that you should not just fetch the value cell directly if you | |
5258 want a symbol's value. | |
5259 | |
5260 The exact workings of this are rather complex and involved and are | |
5261 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and | |
5262 @file{lisp.h}. | |
5263 | |
5264 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top | |
5265 @chapter Buffers and Textual Representation | |
5266 | |
5267 @menu | |
5268 * Introduction to Buffers:: A buffer holds a block of text such as a file. | |
5269 * A Buffer@'s Text:: Representation of the text in a buffer. | |
5270 * Buffer Lists:: Keeping track of all buffers. | |
5271 * Markers and Extents:: Tagging locations within a buffer. | |
5272 * Bufbytes and Emchars:: Representation of individual characters. | |
5273 * The Buffer Object:: The Lisp object corresponding to a buffer. | |
5274 @end menu | |
5275 | |
5276 @node Introduction to Buffers | |
5277 @section Introduction to Buffers | |
5278 | |
5279 A buffer is logically just a Lisp object that holds some text. | |
5280 In this, it is like a string, but a buffer is optimized for | |
5281 frequent insertion and deletion, while a string is not. Furthermore: | |
5282 | |
5283 @enumerate | |
5284 @item | |
5285 Buffers are @dfn{permanent} objects, i.e. once you create them, they | |
5286 remain around, and need to be explicitly deleted before they go away. | |
5287 @item | |
5288 Each buffer has a unique name, which is a string. Buffers are | |
5289 normally referred to by name. In this respect, they are like | |
5290 symbols. | |
5291 @item | |
5292 Buffers have a default insertion position, called @dfn{point}. | |
5293 Inserting text (unless you explicitly give a position) goes at point, | |
5294 and moves point forward past the text. This is what is going on when | |
5295 you type text into Emacs. | |
5296 @item | |
5297 Buffers have lots of extra properties associated with them. | |
5298 @item | |
5299 Buffers can be @dfn{displayed}. What this means is that there | |
5300 exist a number of @dfn{windows}, which are objects that correspond | |
5301 to some visible section of your display, and each window has | |
5302 an associated buffer, and the current contents of the buffer | |
5303 are shown in that section of the display. The redisplay mechanism | |
5304 (which takes care of doing this) knows how to look at the | |
5305 text of a buffer and come up with some reasonable way of displaying | |
5306 this. Many of the properties of a buffer control how the | |
5307 buffer's text is displayed. | |
5308 @item | |
5309 One buffer is distinguished and called the @dfn{current buffer}. It is | |
5310 stored in the variable @code{current_buffer}. Buffer operations operate | |
5311 on this buffer by default. When you are typing text into a buffer, the | |
5312 buffer you are typing into is always @code{current_buffer}. Switching | |
5313 to a different window changes the current buffer. Note that Lisp code | |
5314 can temporarily change the current buffer using @code{set-buffer} (often | |
5315 enclosed in a @code{save-excursion} so that the former current buffer | |
5316 gets restored when the code is finished). However, calling | |
5317 @code{set-buffer} will NOT cause a permanent change in the current | |
5318 buffer. The reason for this is that the top-level event loop sets the | |
5319 current buffer to the buffer of the selected window each time it | |
5320 finishes executing a user command. | |
5321 @end enumerate | |
5322 | |
5323 Make sure you understand the distinction between @dfn{current buffer} | |
5324 and @dfn{buffer of the selected window}, and the distinction between | |
5325 @dfn{point} of the current buffer and @dfn{window-point} of the selected | |
5326 window. (This latter distinction is explained in detail in the section | |
5327 on windows.) | |
5328 | |
5329 @node A Buffer@'s Text | |
5330 @section A Buffer's Text | |
5331 | |
5332 The text in a buffer consists of a sequence of zero or more | |
5333 characters. A @dfn{character} is an integer that logically represents | |
5334 a letter, number, space, or other unit of text. Most of the characters | |
5335 that you will typically encounter belong to the ASCII set of characters, | |
5336 but there are also characters for various sorts of accented letters, | |
5337 special symbols, Chinese and Japanese characters (e.g. Kanji, Katakana, | |
5338 etc.), Cyrillic and Greek letters, etc. The actual number of possible | |
5339 characters is quite large. | |
5340 | |
5341 For now, we can view a character as some non-negative integer that | |
5342 has some shape that defines how it typically appears (e.g. as an | |
5343 uppercase A). (The exact way in which a character appears depends | |
5344 on the font of the character.) The internal type of characters in | |
5345 the C code is an Emchar; this is just an int, but using a symbolic | |
5346 type makes the code clearer. | |
5347 | |
5348 Between every character in a buffer is a @dfn{buffer position} or | |
5349 @dfn{character position}. We can speak of the character before or after | |
5350 a particular buffer position, and when you insert a character at a | |
5351 particular position, all characters after that position end up at new | |
5352 positions. When we speak of the character @dfn{at} a position, we | |
5353 really mean the character after the position. (This schizophrenia | |
5354 between a buffer position being ``between'' a character and ``on'' a | |
5355 character is rampant in Emacs.) | |
5356 | |
5357 Buffer positions are numbered starting at 1. This means that | |
5358 position 1 is before the first character, and position 0 is not | |
5359 valid. If there are N characters in a buffer, then buffer | |
5360 position N+1 is after the last one, and position N+2 is not valid. | |
5361 | |
5362 The internal makeup of the Emchar integer varies depending on whether | |
5363 we have compiled with MULE support. If not, the Emchar integer is an | |
5364 8-bit integer with possible values from 0 - 255. 0 - 127 are the | |
5365 standard ASCII characters, while 128 - 255 are the characters from the | |
5366 ISO-8859-1 character set. If we have compiled with MULE support, an | |
5367 Emchar is a 19-bit integer, with the various bits having meanings | |
5368 according to a complex scheme that will be detailed later. The | |
5369 characters numbered 0 - 255 still have the same meanings as for the | |
5370 non-MULE case, though. | |
5371 | |
5372 Internally, the text in a buffer is represented in a fairly simple | |
5373 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size | |
5374 in the middle. Although the gap is of some substantial size in bytes, | |
5375 there is no text contained within it: From the perspective of the text | |
5376 in the buffer, it does not exist. The gap logically sits at some buffer | |
5377 position, between two characters (or possibly at the beginning or end of | |
5378 the buffer). Insertion of text in a buffer at a particular position is | |
5379 always accomplished by first moving the gap to that position | |
5380 (i.e. through some block moving of text), then writing the text into the | |
5381 beginning of the gap, thereby shrinking the gap. If the gap shrinks | |
5382 down to nothing, a new gap is created. (What actually happens is that a | |
5383 new gap is ``created'' at the end of the buffer's text, which requires | |
5384 nothing more than changing a couple of indices; then the gap is | |
5385 ``moved'' to the position where the insertion needs to take place by | |
5386 moving up in memory all the text after that position.) Similarly, | |
5387 deletion occurs by moving the gap to the place where the text is to be | |
5388 deleted, and then simply expanding the gap to include the deleted text. | |
5389 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means | |
5390 just that the internal indices that keep track of where the gap is | |
5391 located are changed.) | |
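
The gap mechanism described above can be sketched in a few lines of C.
This is only an illustration under simplifying assumptions: the
@code{GapBuf} structure and the @code{gb_} functions are hypothetical
names, offsets are 0-based, and one character is taken to be one byte.
The real code in @file{insdel.c} is considerably more involved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer for illustration; not the XEmacs structures. */
typedef struct {
    unsigned char *data;  /* text before the gap, the gap, text after it */
    size_t gap_start;     /* 0-based offset of the first gap byte */
    size_t gap_size;      /* number of unused bytes in the gap */
    size_t total;         /* allocated size of data, gap included */
} GapBuf;

static int gb_init(GapBuf *b, size_t cap)
{
    if (cap == 0)
        cap = 1;
    b->data = malloc(cap);
    b->gap_start = 0;
    b->gap_size = cap;
    b->total = cap;
    return b->data ? 0 : -1;
}

/* Length of the text; the gap does not count. */
static size_t gb_length(const GapBuf *b)
{
    return b->total - b->gap_size;
}

/* Move the gap so that it begins at text offset pos (block move of text). */
static void gb_move_gap(GapBuf *b, size_t pos)
{
    if (pos < b->gap_start)
        memmove(b->data + pos + b->gap_size, b->data + pos,
                b->gap_start - pos);
    else if (pos > b->gap_start)
        memmove(b->data + b->gap_start,
                b->data + b->gap_start + b->gap_size,
                pos - b->gap_start);
    b->gap_start = pos;
}

/* Insert n bytes at text offset pos: move the gap there, then fill it. */
static int gb_insert(GapBuf *b, size_t pos, const char *s, size_t n)
{
    if (pos > gb_length(b))
        return -1;
    if (b->gap_size < n) {
        /* Make a new, larger gap at the end of the text first. */
        size_t extra = n - b->gap_size + 16;
        unsigned char *p;
        gb_move_gap(b, gb_length(b));
        p = realloc(b->data, b->total + extra);
        if (!p)
            return -1;
        b->data = p;
        b->gap_size += extra;
        b->total += extra;
    }
    gb_move_gap(b, pos);
    memcpy(b->data + b->gap_start, s, n);
    b->gap_start += n;   /* the gap shrinks from its beginning */
    b->gap_size -= n;
    return 0;
}

/* Delete n bytes starting at pos: the gap simply swallows them. */
static int gb_delete(GapBuf *b, size_t pos, size_t n)
{
    if (pos > gb_length(b) || n > gb_length(b) - pos)
        return -1;
    gb_move_gap(b, pos);
    b->gap_size += n;
    return 0;
}

/* Fetch the byte at text offset pos, skipping over the gap. */
static unsigned char gb_char_at(const GapBuf *b, size_t pos)
{
    return pos < b->gap_start ? b->data[pos] : b->data[pos + b->gap_size];
}
```

Note that @code{gb_delete} copies no text once the gap is in place, and
that repeated insertions at the same spot (the usual case when typing)
never need to move the gap at all.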
5392 | |
5393 Note that the total amount of memory allocated for a buffer text never | |
5394 decreases while the buffer is live. Therefore, if you load up a | |
5395 20-megabyte file and then delete all but one character, there will be a | |
5396 20-megabyte gap, which won't get any smaller (except by inserting | |
5397 characters back again). Once the buffer is killed, the memory allocated | |
5398 for the buffer text will be freed, but it will still be sitting on the | |
5399 heap, taking up virtual memory, and will not be released back to the | |
5400 operating system. (However, if you have compiled XEmacs with rel-alloc, | |
5401 the situation is different. In this case, the space @emph{will} be | |
5402 released back to the operating system. However, this tends to effect a | |
5403 noticeable speed penalty.) | |
5404 | |
5405 Astute readers may notice that the text in a buffer is represented as | |
5406 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is | |
5407 a 19-bit integer, which clearly cannot fit in a byte. This means (of | |
5408 course) that the text in a buffer uses a different representation from | |
5409 an Emchar: specifically, the 19-bit Emchar becomes a series of one to | |
5410 four bytes. The conversion between these two representations is complex | |
5411 and will be described later. | |
5412 | |
5413 In the non-MULE case, everything is very simple: An Emchar | |
5414 is an 8-bit value, which fits neatly into one byte. | |
5415 | |
5416 If we are given a buffer position and want to retrieve the | |
5417 character at that position, we need to follow these steps: | |
5418 | |
5419 @enumerate | |
5420 @item | |
5421 Pretend there's no gap, and convert the buffer position into a @dfn{byte | |
5422 index} that indexes to the appropriate byte in the buffer's stream of | |
5423 textual bytes. By convention, byte indices begin at 1, just like buffer | |
5424 positions. In the non-MULE case, byte indices and buffer positions are | |
5425 identical, since one character equals one byte. | |
5426 @item | |
5427 Convert the byte index into a @dfn{memory index}, which takes the gap | |
5428 into account. The memory index is a direct index into the block of | |
5429 memory that stores the text of a buffer. This basically just involves | |
5430 checking to see if the byte index is past the gap, and if so, adding the | |
5431 size of the gap to it. By convention, memory indices begin at 1, just | |
5432 like buffer positions and byte indices, and when referring to the | |
5433 position that is @dfn{at} the gap, we always use the memory position at | |
5434 the @emph{beginning}, not at the end, of the gap. | |
5435 @item | |
5436 Fetch the appropriate bytes at the determined memory position. | |
5437 @item | |
5438 Convert these bytes into an Emchar. | |
5439 @end enumerate | |
5440 | |
5441 In the non-MULE case, (3) and (4) boil down to a simple one-byte | |
5442 memory access. | |
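
As a rough illustration, the four steps might look as follows for the
non-MULE case. The structure @code{text_sketch} and the function names
are assumptions made for this example, not the macros XEmacs actually
uses. The sketch also assumes that, when fetching the character
@dfn{at} a position that coincides with the gap, the byte wanted is the
one that follows the gap.

```c
#include <stddef.h>

typedef int Bufpos;              /* character position, 1-based */
typedef int Bytind;              /* byte index, 1-based, gap ignored */
typedef int Memind;              /* memory index, 1-based, gap included */
typedef unsigned char Bufbyte;
typedef int Emchar;

/* Hypothetical view of a buffer's text block. */
struct text_sketch {
    const Bufbyte *mem;          /* start of the block holding the text */
    Bytind gap_pos;              /* byte index at which the gap sits */
    int gap_size;                /* size of the gap in bytes */
};

/* Step 1 (non-MULE only): one character is one byte. */
static Bytind bufpos_to_bytind(Bufpos pos)
{
    return (Bytind) pos;
}

/* Step 2: memory index of the byte that follows byte index bi. */
static Memind bytind_to_memind_for_fetch(const struct text_sketch *t,
                                         Bytind bi)
{
    return bi >= t->gap_pos ? bi + t->gap_size : bi;
}

/* Steps 3 and 4: in the non-MULE case, a single one-byte access. */
static Emchar fetch_char(const struct text_sketch *t, Bufpos pos)
{
    Memind mi = bytind_to_memind_for_fetch(t, bufpos_to_bytind(pos));
    return (Emchar) t->mem[mi - 1];   /* indices are 1-based */
}
```

With MULE support, step 1 would have to count variable-width characters
and step 4 would have to assemble an Emchar from up to four bytes.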
5443 | |
5444 Note that we have defined three types of positions in a buffer: | |
5445 | |
5446 @enumerate | |
5447 @item | |
5448 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos} | |
5449 @item | |
5450 @dfn{byte indices}, typedef @code{Bytind} | |
5451 @item | |
5452 @dfn{memory indices}, typedef @code{Memind} | |
5453 @end enumerate | |
5454 | |
5455 All three typedefs are just ints, but defining them this way makes | |
5456 things a lot clearer. | |
5457 | |
5458 Most code works with buffer positions. In particular, all Lisp code | |
5459 that refers to text in a buffer uses buffer positions. Lisp code does | |
5460 not know that byte indices or memory indices exist. | |
5461 | |
5462 Finally, we have a typedef for the bytes in a buffer. This is a | |
5463 @code{Bufbyte}, which is an unsigned char. Referring to them as | |
5464 Bufbytes underscores the fact that we are working with a string of bytes | |
5465 in the internal Emacs buffer representation rather than in one of a | |
5466 number of possible alternative representations (e.g. EUC-coded text, | |
5467 etc.). | |
5468 | |
5469 @node Buffer Lists | |
5470 @section Buffer Lists | |
5471 | |
5472 Recall from earlier that buffers are @dfn{permanent} objects, i.e. that | |
5473 they remain around until explicitly deleted. This entails that there is | |
5474 a list of all the buffers in existence. This list is actually an | |
5475 assoc-list (mapping from the buffer's name to the buffer) and is stored | |
5476 in the global variable @code{Vbuffer_alist}. | |
5477 | |
5478 The order of the buffers in the list is important: the buffers are | |
5479 ordered approximately from most-recently-used to least-recently-used. | |
5480 Switching to a buffer using @code{switch-to-buffer}, | |
5481 @code{pop-to-buffer}, etc. and switching windows using | |
5482 @code{other-window}, etc. usually brings the new current buffer to the | |
5483 front of the list. @code{switch-to-buffer}, @code{other-buffer}, | |
5484 etc. look at the beginning of the list to find an alternative buffer to | |
5485 suggest. You can also explicitly move a buffer to the end of the list | |
5486 using @code{bury-buffer}. | |
5487 | |
5488 In addition to the global ordering in @code{Vbuffer_alist}, each frame | |
5489 has its own ordering of the list. These lists always contain the same | |
5490 elements as in @code{Vbuffer_alist} although possibly in a different | |
5491 order. @code{buffer-list} normally returns the list for the selected | |
5492 frame. This allows you to work in separate frames without things | |
5493 interfering with each other. | |
5494 | |
5495 The standard way to look up a buffer given a name is | |
5496 @code{get-buffer}, and the standard way to create a new buffer is | |
5497 @code{get-buffer-create}, which looks up a buffer with a given name, | |
5498 creating a new one if necessary. These operations correspond exactly | |
5499 with the symbol operations @code{intern-soft} and @code{intern}, | |
5500 respectively. You can also force a new buffer to be created using | |
5501 @code{generate-new-buffer}, which takes a name and (if necessary) makes | |
5502 a unique name from this by appending a number, and then creates the | |
5503 buffer. This is basically like the symbol operation @code{gensym}. | |
5504 | |
5505 @node Markers and Extents | |
5506 @section Markers and Extents | |
5507 | |
5508 Among the things associated with a buffer are things that are | |
5509 logically attached to certain buffer positions. This can be used to | |
5510 keep track of a buffer position when text is inserted and deleted, so | |
5511 that it remains at the same spot relative to the text around it; to | |
5512 assign properties to particular sections of text; etc. There are two | |
5513 such objects that are useful in this regard: they are @dfn{markers} and | |
5514 @dfn{extents}. | |
5515 | |
5516 A @dfn{marker} is simply a flag placed at a particular buffer | |
5517 position, which is moved around as text is inserted and deleted. | |
5518 Markers are used for all sorts of purposes, such as the @code{mark} that | |
5519 is the other end of textual regions to be cut, copied, etc. | |
5520 | |
5521 An @dfn{extent} is similar to two markers plus some associated | |
5522 properties, and is used to keep track of regions in a buffer as text is | |
5523 inserted and deleted, and to add properties (e.g. fonts) to particular | |
5524 regions of text. The external interface of extents is explained | |
5525 elsewhere. | |
5526 | |
5527 The important thing here is that markers and extents simply contain | |
5528 buffer positions in them as integers, and every time text is inserted or | |
5529 deleted, these positions must be updated. In order to minimize the | |
5530 amount of shuffling that needs to be done, the positions in markers and | |
5531 extents (there's one per marker, two per extent) are stored as Meminds. | |
5532 This means that they only need to be moved when the text is physically | |
5533 moved in memory; since the gap structure tries to minimize this, it also | |
5534 minimizes the number of marker and extent indices that need to be | |
5535 adjusted. Look in @file{insdel.c} for the details of how this works. | |
5536 | |
5537 One other important distinction is that markers are @dfn{temporary} | |
5538 while extents are @dfn{permanent}. This means that markers disappear as | |
5539 soon as there are no more pointers to them, and correspondingly, there | |
5540 is no way to determine what markers are in a buffer if you are just | |
5541 given the buffer. Extents remain in a buffer until they are detached | |
5542 (which could happen as a result of text being deleted) or the buffer is | |
5543 deleted, and primitives do exist to enumerate the extents in a buffer. | |
5544 | |
5545 @node Bufbytes and Emchars | |
5546 @section Bufbytes and Emchars | |
5547 | |
5548 Not yet documented. | |
5549 | |
5550 @node The Buffer Object | |
5551 @section The Buffer Object | |
5552 | |
5553 Buffers contain fields not directly accessible by the Lisp programmer. | |
5554 We describe them here, naming them by the names used in the C code. | |
5555 Many are accessible indirectly in Lisp programs via Lisp primitives. | |
5556 | |
5557 @table @code | |
5558 @item name | |
5559 The buffer name is a string that names the buffer. It is guaranteed to | |
5560 be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's | |
5561 Manual}. | |
5562 | |
5563 @item save_modified | |
5564 This field contains the time when the buffer was last saved, as an | |
5565 integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's | |
5566 Manual}. | |
5567 | |
5568 @item modtime | |
5569 This field contains the modification time of the visited file. It is | |
5570 set when the file is written or read. Every time the buffer is written | |
5571 to the file, this field is compared to the modification time of the | |
5572 file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's | |
5573 Manual}. | |
5574 | |
5575 @item auto_save_modified | |
5576 This field contains the time when the buffer was last auto-saved. | |
5577 | |
5578 @item last_window_start | |
5579 This field contains the @code{window-start} position in the buffer as of | |
5580 the last time the buffer was displayed in a window. | |
5581 | |
5582 @item undo_list | |
5583 This field points to the buffer's undo list. @xref{Undo,,, lispref, | |
5584 XEmacs Lisp Programmer's Manual}. | |
5585 | |
5586 @item syntax_table_v | |
5587 This field contains the syntax table for the buffer. @xref{Syntax | |
5588 Tables,,, lispref, XEmacs Lisp Programmer's Manual}. | |
5589 | |
5590 @item downcase_table | |
5591 This field contains the conversion table for converting text to lower | |
5592 case. @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}. | |
5593 | |
5594 @item upcase_table | |
5595 This field contains the conversion table for converting text to upper | |
5596 case. @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}. | |
5597 | |
5598 @item case_canon_table | |
5599 This field contains the conversion table for canonicalizing text for | |
5600 case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp | |
5601 Programmer's Manual}. | |
5602 | |
5603 @item case_eqv_table | |
5604 This field contains the equivalence table for case-folding search. | |
5605 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}. | |
5606 | |
5607 @item display_table | |
5608 This field contains the buffer's display table, or @code{nil} if it | |
5609 doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp | |
5610 Programmer's Manual}. | |
5611 | |
5612 @item markers | |
5613 This field contains the chain of all markers that currently point into | |
5614 the buffer. Deletion of text in the buffer, and motion of the buffer's | |
5615 gap, must check each of these markers and perhaps update it. | |
5616 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}. | |
5617 | |
5618 @item backed_up | |
5619 This field is a flag that tells whether a backup file has been made for | |
5620 the visited file of this buffer. | |
5621 | |
5622 @item mark | |
5623 This field contains the mark for the buffer. The mark is a marker, | |
5624 hence it is also included on the list @code{markers}. @xref{The Mark,,, | |
5625 lispref, XEmacs Lisp Programmer's Manual}. | |
5626 | |
5627 @item mark_active | |
5628 This field is non-@code{nil} if the buffer's mark is active. | |
5629 | |
5630 @item local_var_alist | |
5631 This field contains the association list describing the variables local | |
5632 in this buffer, and their values, with the exception of local variables | |
5633 that have special slots in the buffer object. (Those slots are omitted | |
5634 from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp | |
5635 Programmer's Manual}. | |
5636 | |
5637 @item modeline_format | |
5638 This field contains a Lisp object which controls how to display the mode | |
5639 line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp | |
5640 Programmer's Manual}. | |
5641 | |
5642 @item base_buffer | |
5643 This field holds the buffer's base buffer (if it is an indirect buffer), | |
5644 or @code{nil}. | |
5645 @end table | |
5646 | |
5647 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top | |
5648 @chapter MULE Character Sets and Encodings | |
5649 | |
5650 Recall that there are two primary ways that text is represented in | |
5651 XEmacs. The @dfn{buffer} representation sees the text as a series of | |
5652 bytes (Bufbytes), with a variable number of bytes used per character. | |
5653 The @dfn{character} representation sees the text as a series of integers | |
5654 (Emchars), one per character. The character representation is a cleaner | |
5655 representation from a theoretical standpoint, and is thus used in many | |
5656 cases when lots of manipulations on a string need to be done. However, | |
5657 the buffer representation is the standard representation used in both | |
5658 Lisp strings and buffers, and because of this, it is the ``default'' | |
5659 representation that text comes in. The reason for using this | |
5660 representation is that it's compact and is compatible with ASCII. | |
5661 | |
5662 @menu | |
5663 * Character Sets:: | |
5664 * Encodings:: | |
5665 * Internal Mule Encodings:: | |
5666 * CCL:: | |
5667 @end menu | |
5668 | |
5669 @node Character Sets | |
5670 @section Character Sets | |
5671 | |
5672 A character set (or @dfn{charset}) is an ordered set of characters. A | |
5673 particular character in a charset is indexed using one or more | |
5674 @dfn{position codes}, which are non-negative integers. The number of | |
5675 position codes needed to identify a particular character in a charset is | |
5676 called the @dfn{dimension} of the charset. In XEmacs/Mule, all charsets | |
5677 have dimension 1 or 2, and the size of all charsets (except for a few | |
5678 special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of | |
5679 position codes used to index characters from any of these types of | |
5680 character sets is as follows: | |
5681 | |
5682 @example | |
5683 Charset type     Position code 1     Position code 2 | |
5684 ------------------------------------------------------------ | |
5685 94               33 - 126            N/A | |
5686 96               32 - 127            N/A | |
5687 94x94            33 - 126            33 - 126 | |
5688 96x96            32 - 127            32 - 127 | |
5689 @end example | |
5690 | |
5691 Note that in the above cases position codes do not start at an | |
5692 expected value such as 0 or 1. The reason for this will become clear | |
5693 later. | |
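
The ranges in the table can be expressed as a small validity check.
The enumeration and function names below are invented for this sketch
and do not correspond to anything in the XEmacs sources.

```c
/* Hypothetical names for the four regular charset shapes. */
enum charset_type { CHARSET_94, CHARSET_96, CHARSET_94X94, CHARSET_96X96 };

/* Number of position codes needed to index a character. */
static int charset_dimension(enum charset_type t)
{
    return (t == CHARSET_94 || t == CHARSET_96) ? 1 : 2;
}

/* A 94-type code lies in 33 - 126, a 96-type code in 32 - 127. */
static int position_code_in_range(int pc, int is96)
{
    return is96 ? (pc >= 32 && pc <= 127) : (pc >= 33 && pc <= 126);
}

/* Return 1 if the position codes are valid for the charset type.
   pc2 is ignored for dimension-1 charsets. */
static int valid_position_codes(enum charset_type t, int pc1, int pc2)
{
    int is96 = (t == CHARSET_96 || t == CHARSET_96X96);

    if (!position_code_in_range(pc1, is96))
        return 0;
    if (charset_dimension(t) == 1)
        return 1;
    return position_code_in_range(pc2, is96);
}
```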
5694 | |
5695 For example, Latin-1 is a 96-character charset, and JISX0208 (the | |
5696 Japanese national character set) is a 94x94-character charset. | |
5697 | |
5698 [Note that, although the ranges above define the @emph{valid} position | |
5699 codes for a charset, some of the slots in a particular charset may in | |
5700 fact be empty. This is the case for JISX0208, for example, where (e.g.) | |
5701 all the slots whose first position code is in the range 118 - 126 are | |
5702 empty.] | |
5703 | |
5704 There are three charsets that do not follow the above rules. All of | |
5705 them have one dimension, and have ranges of position codes as follows: | |
5706 | |
5707 @example | |
5708 Charset name     Position code 1 | |
5709 ------------------------------------ | |
5710 ASCII            0 - 127 | |
5711 Control-1        0 - 31 | |
5712 Composite        0 - some large number | |
5713 @end example | |
5714 | |
5715 (The upper bound of the position code for composite characters has not | |
5716 yet been determined, but it will probably be at least 16,383). | |
5717 | |
5718 ASCII is the union of two subsidiary character sets: Printing-ASCII | |
5719 (the printing ASCII character set, consisting of position codes 33 - | |
5720 126, like for a standard 94-character charset) and Control-ASCII (the | |
5721 non-printing characters that would appear in a binary file with codes 0 | |
5722 - 32 and 127). | |
5723 | |
5724 Control-1 contains the non-printing characters that would appear in a | |
5725 binary file with codes 128 - 159. | |
5726 | |
5727 Composite contains characters that are generated by overstriking one | |
5728 or more characters from other charsets. | |
5729 | |
5730 Note that some characters in ASCII, and all characters in Control-1, | |
5731 are @dfn{control} (non-printing) characters. These have no printed | |
5732 representation but instead control some other function of the printing | |
5733 (e.g. TAB or 9 moves the current character position to the next tab | |
5734 stop). All other characters in all charsets are @dfn{graphic} | |
5735 (printing) characters. | |
5736 | |
5737 When a binary file is read in, the bytes in the file are assigned to | |
5738 character sets as follows: | |
5739 | |
5740 @example | |
5741 Bytes         Character set     Range | |
5742 -------------------------------------------------- | |
5743 0 - 127       ASCII             0 - 127 | |
5744 128 - 159     Control-1         0 - 31 | |
5745 160 - 255     Latin-1           32 - 127 | |
5746 @end example | |
5747 | |
5748 This is a bit ad-hoc but gets the job done. | |
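
In C, the mapping in the table amounts to two comparisons and a
subtraction. The type and function names here are made up for the
example.

```c
/* Hypothetical result type: a charset plus a position code. */
enum bin_charset { BIN_ASCII, BIN_CONTROL_1, BIN_LATIN_1 };

struct bin_char {
    enum bin_charset charset;
    int position_code;
};

/* Assign one byte of a binary file to a charset, as in the table. */
static struct bin_char classify_binary_byte(unsigned char b)
{
    struct bin_char c;

    if (b < 128) {
        c.charset = BIN_ASCII;
        c.position_code = b;          /* 0 - 127 */
    } else if (b < 160) {
        c.charset = BIN_CONTROL_1;
        c.position_code = b - 128;    /* 0 - 31 */
    } else {
        c.charset = BIN_LATIN_1;
        c.position_code = b - 128;    /* 32 - 127 */
    }
    return c;
}
```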
5749 | |
5750 @node Encodings | |
5751 @section Encodings | |
5752 | |
5753 An @dfn{encoding} is a way of numerically representing characters from | |
5754 one or more character sets. If an encoding only encompasses one | |
5755 character set, then the position codes for the characters in that | |
5756 character set could be used directly. This is not possible, however, if | |
5757 more than one character set is to be used in the encoding. | |
5758 | |
5759 For example, the conversion detailed above between bytes in a binary | |
5760 file and characters is effectively an encoding that encompasses the | |
5761 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit | |
5762 bytes. | |
5763 | |
5764 Thus, an encoding can be viewed as a way of encoding characters from a | |
5765 specified group of character sets using a stream of bytes, each of which | |
5766 contains a fixed number of bits (but not necessarily 8, as in the common | |
5767 usage of ``byte''). | |
5768 | |
5769 Here are descriptions of a couple of common | |
5770 encodings: | |
5771 | |
5772 @menu | |
5773 * Japanese EUC (Extended Unix Code):: | |
5774 * JIS7:: | |
5775 @end menu | |
5776 | |
5777 @node Japanese EUC (Extended Unix Code) | |
5778 @subsection Japanese EUC (Extended Unix Code) | |
5779 | |
5780 This encompasses the character sets Printing-ASCII, Japanese (aka | |
5781 JISX0208), and Japanese-Kana (half-width katakana, the right half of | |
5782 JISX0201). It uses 8-bit bytes. | |
5783 | |
5784 Note that Printing-ASCII and Japanese-Kana are 94-character charsets, | |
5785 while Japanese is a 94x94-character charset. | |
5786 | |
5787 The encoding is as follows: | |
5788 | |
5789 @example | |
5790 Character set      Representation (PC=position-code) | |
5791 -------------      -------------- | |
5792 Printing-ASCII     PC1 | |
5793 Japanese           PC1 + 0x80 | PC2 + 0x80 | |
5794 Japanese-Kana      0x8E | PC1 + 0x80 | |
5795 @end example | |
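
A minimal encoder following the table might look like this. The names
are hypothetical, and the sketch handles only the three character sets
listed above.

```c
/* Hypothetical tags for the three charsets covered by the table. */
enum euc_charset { EUC_PRINTING_ASCII, EUC_JAPANESE, EUC_JAPANESE_KANA };

/* All three charsets use position codes in the 94-type range. */
static int euc_pc_ok(int pc)
{
    return pc >= 33 && pc <= 126;
}

/* Encode one character; returns the number of bytes written to out
   (1 or 2), or 0 if a position code is out of range.
   pc2 is only used for the two-dimensional Japanese charset. */
static int euc_jp_encode(enum euc_charset cs, int pc1, int pc2,
                         unsigned char out[2])
{
    if (!euc_pc_ok(pc1))
        return 0;
    switch (cs) {
    case EUC_PRINTING_ASCII:
        out[0] = (unsigned char) pc1;
        return 1;
    case EUC_JAPANESE:
        if (!euc_pc_ok(pc2))
            return 0;
        out[0] = (unsigned char) (pc1 + 0x80);
        out[1] = (unsigned char) (pc2 + 0x80);
        return 2;
    case EUC_JAPANESE_KANA:
        out[0] = 0x8E;
        out[1] = (unsigned char) (pc1 + 0x80);
        return 2;
    }
    return 0;
}
```

Because every byte of a Japanese or Japanese-Kana character has its high
bit set, such bytes can never be mistaken for Printing-ASCII.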
5796 | |
5797 | |
5798 @node JIS7 | |
5799 @subsection JIS7 | |
5800 | |
5801 This encompasses the character sets Printing-ASCII, | |
5802 Japanese-Roman (the left half of JISX0201; this character | |
5803 set is very similar to Printing-ASCII and is a 94-character | |
5804 charset), Japanese, and Japanese-Kana. It uses 7-bit bytes. | |
5805 | |
5806 Unlike Japanese EUC, this is a @dfn{modal} encoding, which | |
5807 means that there are multiple states that the encoding can | |
5808 be in, which affect how the bytes are to be interpreted. | |
5809 Special sequences of bytes (called @dfn{escape sequences}) | |
5810 are used to change states. | |
5811 | |
5812 The encoding is as follows: | |
5813 | |
5814 @example | |
5815 Character set       Representation (PC=position-code) | |
5816 -------------       -------------- | |
5817 Printing-ASCII      PC1 | |
5818 Japanese-Roman      PC1 | |
5819 Japanese            PC1 PC2 | |
5820 Japanese-Kana       PC1 | |
5821 | |
5822 | |
5823 Escape sequence     ASCII equivalent     Meaning | |
5824 ---------------     ----------------     ------- | |
5825 0x1B 0x28 0x4A      ESC ( J              invoke Japanese-Roman | |
5826 0x1B 0x24 0x42      ESC $ B              invoke Japanese | |
5827 0x1B 0x28 0x49      ESC ( I              invoke Japanese-Kana | |
5828 0x1B 0x28 0x42      ESC ( B              invoke Printing-ASCII | |
5829 @end example | |
5830 | |
5831 Initially, Printing-ASCII is invoked. | |
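
The modal nature of this encoding is easiest to see in a decoder, which
must carry the current state from byte to byte. The following is a
simplified sketch with invented names. It recognizes only the four
escape sequences listed above and does not treat control characters
such as newline specially.

```c
#include <stddef.h>

/* The decoder state is simply the currently invoked charset. */
enum jis7_state {
    JIS7_PRINTING_ASCII, JIS7_JAPANESE_ROMAN, JIS7_JAPANESE, JIS7_JAPANESE_KANA
};

struct jis7_char {
    enum jis7_state charset;
    int pc1, pc2;                /* pc2 is 0 for one-byte charsets */
};

/* Decode in[0..len) into out[0..cap).  Returns the number of characters
   decoded, or -1 on malformed input or if out is too small. */
static int jis7_decode(const unsigned char *in, size_t len,
                       struct jis7_char *out, size_t cap)
{
    enum jis7_state st = JIS7_PRINTING_ASCII;   /* initial state */
    size_t i = 0, n = 0;

    while (i < len) {
        if (in[i] == 0x1B) {                    /* escape sequence */
            unsigned char a, b;
            if (i + 2 >= len)
                return -1;
            a = in[i + 1];
            b = in[i + 2];
            if (a == 0x28 && b == 0x4A)      st = JIS7_JAPANESE_ROMAN;
            else if (a == 0x24 && b == 0x42) st = JIS7_JAPANESE;
            else if (a == 0x28 && b == 0x49) st = JIS7_JAPANESE_KANA;
            else if (a == 0x28 && b == 0x42) st = JIS7_PRINTING_ASCII;
            else return -1;
            i += 3;
            continue;
        }
        if (n >= cap)
            return -1;
        out[n].charset = st;
        out[n].pc1 = in[i++];
        out[n].pc2 = 0;
        if (st == JIS7_JAPANESE) {              /* second position code */
            if (i >= len)
                return -1;
            out[n].pc2 = in[i++];
        }
        n++;
    }
    return (int) n;
}
```

Note that the same byte value yields a different character depending on
the state, which is why a decoder cannot start in the middle of a JIS7
stream without scanning back for the last escape sequence.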
5832 | |
5833 @node Internal Mule Encodings | |
5834 @section Internal Mule Encodings | |
5835 | |
5836 In XEmacs/Mule, each character set is assigned a unique number, | |
5837 called a @dfn{leading byte}. This is used in the encodings of a | |
5838 character. Leading bytes are in the range 0x80 - 0xFF | |
5839 (except for ASCII, which has a leading byte of 0), although | |
5840 some leading bytes are reserved. | |
5841 | |
5842 Charsets whose leading byte is in the range 0x80 - 0x9F are | |
5843 called @dfn{official} and are used for built-in charsets. | |
5844 Other charsets are called @dfn{private} and have leading bytes | |
5845 in the range 0xA0 - 0xFF; these are user-defined charsets. | |
5846 | |
5847 More specifically: | |
5848 | |
5849 @example | |
5850 Character set            Leading byte | |
5851 -------------            ------------ | |
5852 ASCII                    0 | |
5853 Composite                0x80 | |
5854 Dimension-1 Official     0x81 - 0x8D | |
5855                          (0x8E is free) | |
5856 Control-1                0x8F | |
5857 Dimension-2 Official     0x90 - 0x99 | |
5858                          (0x9A - 0x9D are free; | |
5859                           0x9E and 0x9F are reserved) | |
5860 Dimension-1 Private      0xA0 - 0xEF | |
5861 Dimension-2 Private      0xF0 - 0xFF | |
5862 @end example | |
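
The table translates directly into a classification function. The
enumeration below is an assumption of this sketch; it lumps the free
and reserved values together with all other values that are not
leading bytes.

```c
/* Hypothetical classification of leading-byte values. */
enum lb_class {
    LB_ASCII, LB_COMPOSITE, LB_DIM1_OFFICIAL, LB_CONTROL_1,
    LB_DIM2_OFFICIAL, LB_DIM1_PRIVATE, LB_DIM2_PRIVATE, LB_UNASSIGNED
};

static enum lb_class classify_leading_byte(int lb)
{
    if (lb == 0x00)               return LB_ASCII;
    if (lb == 0x80)               return LB_COMPOSITE;
    if (lb >= 0x81 && lb <= 0x8D) return LB_DIM1_OFFICIAL;
    if (lb == 0x8F)               return LB_CONTROL_1;
    if (lb >= 0x90 && lb <= 0x99) return LB_DIM2_OFFICIAL;
    if (lb >= 0xA0 && lb <= 0xEF) return LB_DIM1_PRIVATE;
    if (lb >= 0xF0 && lb <= 0xFF) return LB_DIM2_PRIVATE;
    /* 0x8E and 0x9A - 0x9D are free, 0x9E and 0x9F are reserved,
       and anything else is not a leading byte at all. */
    return LB_UNASSIGNED;
}
```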
5863 | |
5864 There are two internal encodings for characters in XEmacs/Mule. One | |
5865 is called @dfn{string encoding} and is an 8-bit encoding that is used | |
5866 for representing characters in a buffer or string. It uses 1 to 4 bytes | |
5867 per character. The other is called @dfn{character encoding} and is a | |
5868 19-bit encoding that is used for representing characters individually in | |
5869 a variable. | |
5870 | |
5871 (In the following descriptions, we'll ignore composite | |
5872 characters for the moment. We also give a general (structural) | |
5873 overview first, followed later by the exact details.) | |
5874 | |
5875 @menu | |
5876 * Internal String Encoding:: | |
5877 * Internal Character Encoding:: | |
5878 @end menu | |
5879 | |
5880 @node Internal String Encoding | |
5881 @subsection Internal String Encoding | |
5882 | |
5883 ASCII characters are encoded using their position code directly. | |
5884 Other characters are encoded using their leading byte followed | |
5885 by their position code(s) with the high bit set. Characters | |
5886 in private character sets have their leading byte prefixed with | |
5887 a @dfn{leading byte prefix}, which is either 0x9E or 0x9F. (No | |
5888 character sets are ever assigned these leading bytes.) Specifically: | |
5889 | |
5890 @example | |
5891 Character set            Encoding (PC=position-code, LB=leading-byte) | |
5892 -------------            -------- | |
5893 ASCII                    PC1 | |
5894 Control-1                LB   | PC1 + 0xA0 | |
5895 Dimension-1 official     LB   | PC1 + 0x80 | |
5896 Dimension-1 private      0x9E | LB         | PC1 + 0x80 | |
5897 Dimension-2 official     LB   | PC1 + 0x80 | PC2 + 0x80 | |
5898 Dimension-2 private      0x9F | LB         | PC1 + 0x80 | PC2 + 0x80 | |
5899 @end example | |
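
The table can be turned into an encoder along the following lines.
This is a sketch only: the function name is invented, the charset type
is inferred from the leading-byte ranges given earlier, position codes
are assumed to be valid already, and composite characters are left out.

```c
/* Encode one character in string encoding.  lb is the charset's leading
   byte (0 for ASCII); pc2 is ignored for dimension-1 charsets.
   Returns the number of bytes written to out (1 to 4), or 0 if lb is
   not a leading byte handled by this sketch. */
static int mule_string_encode(int lb, int pc1, int pc2, unsigned char out[4])
{
    if (lb == 0x00) {                           /* ASCII */
        out[0] = (unsigned char) pc1;
        return 1;
    }
    if (lb == 0x8F) {                           /* Control-1 */
        out[0] = (unsigned char) lb;
        out[1] = (unsigned char) (pc1 + 0xA0);
        return 2;
    }
    if (lb >= 0x81 && lb <= 0x8D) {             /* dimension-1 official */
        out[0] = (unsigned char) lb;
        out[1] = (unsigned char) (pc1 + 0x80);
        return 2;
    }
    if (lb >= 0x90 && lb <= 0x99) {             /* dimension-2 official */
        out[0] = (unsigned char) lb;
        out[1] = (unsigned char) (pc1 + 0x80);
        out[2] = (unsigned char) (pc2 + 0x80);
        return 3;
    }
    if (lb >= 0xA0 && lb <= 0xEF) {             /* dimension-1 private */
        out[0] = 0x9E;                          /* leading byte prefix */
        out[1] = (unsigned char) lb;
        out[2] = (unsigned char) (pc1 + 0x80);
        return 3;
    }
    if (lb >= 0xF0 && lb <= 0xFF) {             /* dimension-2 private */
        out[0] = 0x9F;                          /* leading byte prefix */
        out[1] = (unsigned char) lb;
        out[2] = (unsigned char) (pc1 + 0x80);
        out[3] = (unsigned char) (pc2 + 0x80);
        return 4;
    }
    return 0;
}
```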
5900 | |
5901 The basic characteristic of this encoding is that the first byte | |
5902 of every character is in the range 0x00 - 0x9F, and the second and | |
5903 following bytes of every character are in the range 0xA0 - 0xFF. | |
5904 This means that it is impossible to get out of sync, or more | |
5905 specifically: | |
5906 | |
5907 @enumerate | |
5908 @item | |
5909 Given any byte position, the beginning of the character it is | |
5910 within can be determined in constant time. | |
5911 @item | |
5912 Given any byte position at the beginning of a character, the | |
5913 beginning of the next character can be determined in constant | |
5914 time. | |
5915 @item | |
5916 Given any byte position at the beginning of a character, the | |
5917 beginning of the previous character can be determined in constant | |
5918 time. | |
5919 @item | |
5920 Textual searches can simply treat encoded strings as if they | |
5921 were encoded in a one-byte-per-character fashion rather than | |
5922 the actual multi-byte encoding. | |
5923 @end enumerate | |
5924 | |
5925 None of the standard non-modal encodings meet all of these | |
5926 conditions. For example, EUC satisfies only (2) and (3), while | |
5927 Shift-JIS and Big5 (not yet described) satisfy only (2). (All | |
5928 non-modal encodings must satisfy (2), in order to be unambiguous.) | |
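
Properties (1) through (3) follow directly from the byte ranges: a byte
below 0xA0 always starts a character, and a byte at or above 0xA0 never
does.  The fragment below is a minimal sketch of the three operations
under that assumption.  The function names are hypothetical, and the
text is assumed to be well-formed, so that each loop runs for at most
the length of one character (four bytes in the table above).

```c
#include <stddef.h>

/* Nonzero if BYTE can begin a character (range 0x00 - 0x9F). */
int
is_first_byte (unsigned char byte)
{
  return byte < 0xA0;
}

/* Property (1): return the position of the first byte of the
   character that contains position POS. */
size_t
char_start (const unsigned char *text, size_t pos)
{
  while (pos > 0 && !is_first_byte (text[pos]))
    pos--;
  return pos;
}

/* Property (2): POS is at the beginning of a character; return the
   beginning of the next one, or LEN if there is none. */
size_t
next_char (const unsigned char *text, size_t len, size_t pos)
{
  pos++;
  while (pos < len && !is_first_byte (text[pos]))
    pos++;
  return pos;
}

/* Property (3): POS is at the beginning of a character and is
   greater than 0; return the beginning of the previous character. */
size_t
prev_char (const unsigned char *text, size_t pos)
{
  return char_start (text, pos - 1);
}
```

No table of character lengths is consulted in either direction, which
is what distinguishes this encoding from one such as Shift-JIS, where
moving backward requires rescanning from a known character boundary.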
5929 | |
5930 @node Internal Character Encoding | |
5931 @subsection Internal Character Encoding | |
5932 | |
5933 One 19-bit word represents a single character. The word is | |
5934 separated into three fields: | |
5935 | |
5936 @example | |
Bit number:   18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
              <------------> <------------------> <------------------>
Field:              1                 2                    3
5940 @end example | |
5941 | |
5942 Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits. | |
5943 | |
5944 @example | |
Character set           Field 1         Field 2         Field 3
-------------           -------         -------         -------
ASCII                      0               0              PC1
   range:                                              (00 - 7F)
Control-1                  0               1              PC1
   range:                                              (00 - 1F)
Dimension-1 official       0           LB - 0x80          PC1
   range:                              (01 - 0D)       (20 - 7F)
Dimension-1 private        0           LB - 0x80          PC1
   range:                              (20 - 6F)       (20 - 7F)
Dimension-2 official   LB - 0x8F          PC1             PC2
   range:              (01 - 0A)       (20 - 7F)       (20 - 7F)
Dimension-2 private    LB - 0xE1          PC1             PC2
   range:              (0F - 1E)       (20 - 7F)       (20 - 7F)
Composite                0x1F              ?               ?
5960 @end example | |
5961 | |
5962 Note that character codes 0 - 255 are the same as the ``binary encoding'' | |
5963 described above. | |
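
The field layout amounts to a few shifts and masks.  The following is a
minimal sketch, assuming only the bit positions shown in the diagram
above (field 1 in bits 18-14, field 2 in bits 13-7, field 3 in bits
6-0); the function names are hypothetical and are not the macros used
in the XEmacs sources.

```c
/* Pack three field values into one 19-bit character word.  Values
   that do not fit their field are truncated. */
unsigned long
make_char_word (unsigned int f1, unsigned int f2, unsigned int f3)
{
  return ((unsigned long) (f1 & 0x1F) << 14)
         | ((unsigned long) (f2 & 0x7F) << 7)
         | (unsigned long) (f3 & 0x7F);
}

/* Extract field 1 (5 bits, bits 18-14). */
unsigned int
char_field1 (unsigned long c)
{
  return (unsigned int) ((c >> 14) & 0x1F);
}

/* Extract field 2 (7 bits, bits 13-7). */
unsigned int
char_field2 (unsigned long c)
{
  return (unsigned int) ((c >> 7) & 0x7F);
}

/* Extract field 3 (7 bits, bits 6-0). */
unsigned int
char_field3 (unsigned long c)
{
  return (unsigned int) (c & 0x7F);
}
```

With this layout an ASCII character (fields 0, 0, PC1) has the same
value as its position code, and a Control-1 character (fields 0, 1,
PC1) has the value 0x80 + PC1, which is consistent with the note that
codes 0 - 255 match the binary encoding.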
5964 | |
5965 @node CCL | |
5966 @section CCL | |
5967 | |
5968 @example | |
5969 CCL PROGRAM SYNTAX: | |
5970 CCL_PROGRAM := (CCL_MAIN_BLOCK | |
5971 [ CCL_EOF_BLOCK ]) | |
5972 | |
5973 CCL_MAIN_BLOCK := CCL_BLOCK | |
5974 CCL_EOF_BLOCK := CCL_BLOCK | |
5975 | |
5976 CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...]) | |
5977 STATEMENT := | |
5978 SET | IF | BRANCH | LOOP | REPEAT | BREAK | |
5979 | READ | WRITE | |
5980 | |
5981 SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION) | |
5982 | INT-OR-CHAR | |
5983 | |
5984 EXPRESSION := ARG | (EXPRESSION OP ARG) | |
5985 | |
5986 IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK) | |
5987 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...]) | |
5988 LOOP := (loop STATEMENT [STATEMENT ...]) | |
5989 BREAK := (break) | |
5990 REPEAT := (repeat) | |
5991 | (write-repeat [REG | INT-OR-CHAR | string]) | |
5992 | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?) | |
5993 READ := (read REG) | (read REG REG) | |
5994 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK) | |
5995 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...]) | |
5996 WRITE := (write REG) | (write REG REG) | |
5997 | (write INT-OR-CHAR) | (write STRING) | STRING | |
5998 | (write REG ARRAY) | |
5999 END := (end) | |
6000 | |
6001 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | |
6002 ARG := REG | INT-OR-CHAR | |
6003 OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | // | |
6004 | < | > | == | <= | >= | != | |
6005 SELF_OP := | |
6006 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>= | |
6007 ARRAY := '[' INT-OR-CHAR ... ']' | |
6008 INT-OR-CHAR := INT | CHAR | |
6009 | |
6010 MACHINE CODE: | |
6011 | |
6012 The machine code consists of a vector of 32-bit words. | |
6013 The first such word specifies the start of the EOF section of the code; | |
6014 this is the code executed to handle any stuff that needs to be done | |
6015 (e.g. designating back to ASCII and left-to-right mode) after all | |
6016 other encoded/decoded data has been written out. This is not used for | |
6017 charset CCL programs. | |
6018 | |
REGISTER: 0..7  -- referred to by RRR or rrr
6020 | |
6021 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT | |
6022 TTTTT (5-bit): operator type | |
6023 RRR (3-bit): register number | |
        XXXXXXXXXXXXXXX (15-bit):
6025 CCCCCCCCCCCCCCC: constant or address | |
6026 000000000000rrr: register number | |
6027 | |
AAAAA:  00000 +
        00001 -
        00010 *
        00011 /
        00100 %
        00101 &
        00110 |
        00111 ^
6036 | |
6037 01000 << | |
6038 01001 >> | |
6039 01010 <8 | |
6040 01011 >8 | |
6041 01100 // | |
6042 01101 not used | |
6043 01110 not used | |
6044 01111 not used | |
6045 | |
6046 10000 < | |
6047 10001 > | |
6048 10010 == | |
6049 10011 <= | |
6050 10100 >= | |
6051 10101 != | |
6052 | |
6053 OPERATORS: TTTTT RRR XX.. | |
6054 | |
6055 SetCS: 00000 RRR C...C RRR = C...C | |
6056 SetCL: 00001 RRR ..... RRR = c...c | |
6057 c.............c | |
6058 SetR: 00010 RRR ..rrr RRR = rrr | |
6059 SetA: 00011 RRR ..rrr RRR = array[rrr] | |
6060 C.............C size of array = C...C | |
6061 c.............c contents = c...c | |
6062 | |
6063 Jump: 00100 000 c...c jump to c...c | |
6064 JumpCond: 00101 RRR c...c if (!RRR) jump to c...c | |
6065 WriteJump: 00110 RRR c...c Write1 RRR, jump to c...c | |
6066 WriteReadJump: 00111 RRR c...c Write1, Read1 RRR, jump to c...c | |
6067 WriteCJump: 01000 000 c...c Write1 C...C, jump to c...c | |
6068 C...C | |
6069 WriteCReadJump: 01001 RRR c...c Write1 C...C, Read1 RRR, | |
6070 C.............C and jump to c...c | |
6071 WriteSJump: 01010 000 c...c WriteS, jump to c...c | |
6072 C.............C | |
6073 S.............S | |
6074 ... | |
6075 WriteSReadJump: 01011 RRR c...c WriteS, Read1 RRR, jump to c...c | |
6076 C.............C | |
6077 S.............S | |
6078 ... | |
6079 WriteAReadJump: 01100 RRR c...c WriteA, Read1 RRR, jump to c...c | |
6080 C.............C size of array = C...C | |
6081 c.............c contents = c...c | |
6082 ... | |
6083 Branch: 01101 RRR C...C if (RRR >= 0 && RRR < C..) | |
6084 c.............c branch to (RRR+1)th address | |
6085 Read1: 01110 RRR ... read 1-byte to RRR | |
6086 Read2: 01111 RRR ..rrr read 2-byte to RRR and rrr | |
6087 ReadBranch: 10000 RRR C...C Read1 and Branch | |
6088 c.............c | |
6089 ... | |
6090 Write1: 10001 RRR ..... write 1-byte RRR | |
6091 Write2: 10010 RRR ..rrr write 2-byte RRR and rrr | |
6092 WriteC: 10011 000 ..... write 1-char C...CC | |
6093 C.............C | |
6094 WriteS: 10100 000 ..... write C..-byte of string | |
6095 C.............C | |
6096 S.............S | |
6097 ... | |
6098 WriteA: 10101 RRR ..... write array[RRR] | |
6099 C.............C size of array = C...C | |
6100 c.............c contents = c...c | |
6101 ... | |
6102 End: 10110 000 ..... terminate the execution | |
6103 | |
6104 SetSelfCS: 10111 RRR C...C RRR AAAAA= C...C | |
6105 ..........AAAAA | |
6106 SetSelfCL: 11000 RRR ..... RRR AAAAA= c...c | |
6107 c.............c | |
6108 ..........AAAAA | |
6109 SetSelfR: 11001 RRR ..Rrr RRR AAAAA= rrr | |
6110 ..........AAAAA | |
6111 SetExprCL: 11010 RRR ..Rrr RRR = rrr AAAAA c...c | |
6112 c.............c | |
6113 ..........AAAAA | |
6114 SetExprR: 11011 RRR ..rrr RRR = rrr AAAAA Rrr | |
6115 ............Rrr | |
6116 ..........AAAAA | |
6117 JumpCondC: 11100 RRR c...c if !(RRR AAAAA C..) jump to c...c | |
6118 C.............C | |
6119 ..........AAAAA | |
6120 JumpCondR: 11101 RRR c...c if !(RRR AAAAA rrr) jump to c...c | |
6121 ............rrr | |
6122 ..........AAAAA | |
6123 ReadJumpCondC: 11110 RRR c...c Read1 and JumpCondC | |
6124 C.............C | |
6125 ..........AAAAA | |
6126 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR | |
6127 ............rrr | |
6128 ..........AAAAA | |
6129 @end example | |
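
The first word of each operator can be taken apart with shifts and
masks.  The following is a sketch only: it assumes that the layout
@code{XXXXXXXXXXXXXXX RRR TTTTT} is written with the most significant
bits first, so that the operator type occupies the low 5 bits, the
register number the next 3 bits, and the 15-bit constant, address or
second register the bits above those.  That bit order is an assumption
of this example, and @code{ccl_insn}, @code{decode_ccl_word} and
@code{encode_ccl_word} are hypothetical names.

```c
/* The three parts of the first word of a CCL operator. */
struct ccl_insn
{
  unsigned int type;    /* TTTTT: operator type */
  unsigned int reg;     /* RRR: register number */
  unsigned long arg;    /* 15-bit constant, address or register */
};

/* Split WORD into its fields, assuming TTTTT is in the low bits. */
struct ccl_insn
decode_ccl_word (unsigned long word)
{
  struct ccl_insn insn;

  insn.type = (unsigned int) (word & 0x1F);
  insn.reg = (unsigned int) ((word >> 5) & 0x07);
  insn.arg = (word >> 8) & 0x7FFF;
  return insn;
}

/* Inverse of decode_ccl_word. */
unsigned long
encode_ccl_word (unsigned int type, unsigned int reg, unsigned long arg)
{
  return ((arg & 0x7FFF) << 8)
         | ((unsigned long) (reg & 0x07) << 5)
         | (unsigned long) (type & 0x1F);
}
```

Operators such as SetCL or WriteS occupy further words after the first
one; an interpreter built on this sketch would fetch those words
according to the operator type.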
6130 | |
6131 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top | |
6132 @chapter The Lisp Reader and Compiler | |
6133 | |
6134 Not yet documented. | |
6135 | |
6136 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top | |
6137 @chapter Lstreams | |
6138 | |
6139 An @dfn{lstream} is an internal Lisp object that provides a generic | |
6140 buffering stream implementation. Conceptually, you send data to the | |
6141 stream or read data from the stream, not caring what's on the other end | |
6142 of the stream. The other end could be another stream, a file | |
6143 descriptor, a stdio stream, a fixed block of memory, a reallocating | |
6144 block of memory, etc. The main purpose of the stream is to provide a | |
6145 standard interface and to do buffering. Macros are defined to read or | |
6146 write characters, so the calling functions do not have to worry about | |
6147 blocking data together in order to achieve efficiency. | |
6148 | |
6149 @menu | |
6150 * Creating an Lstream:: Creating an lstream object. | |
6151 * Lstream Types:: Different sorts of things that are streamed. | |
6152 * Lstream Functions:: Functions for working with lstreams. | |
6153 * Lstream Methods:: Creating new lstream types. | |
6154 @end menu | |
6155 | |
6156 @node Creating an Lstream | |
6157 @section Creating an Lstream | |
6158 | |
6159 Lstreams come in different types, depending on what is being interfaced | |
6160 to. Although the primitive for creating new lstreams is | |
6161 @code{Lstream_new()}, generally you do not call this directly. Instead, | |
6162 you call some type-specific creation function, which creates the lstream | |
6163 and initializes it as appropriate for the particular type. | |
6164 | |
All lstream creation functions take a @var{mode} argument, specifying
the mode in which the lstream should be opened.  This controls whether the
lstream is for input or output, and optionally whether data should be
blocked up in units of MULE characters.  Note that some types of
6169 lstreams can only be opened for input; others only for output; and | |
6170 others can be opened either way. #### Richard Mlynarik thinks that | |
6171 there should be a strict separation between input and output streams, | |
6172 and he's probably right. | |
6173 | |
6174 @var{mode} is a string, one of | |
6175 | |
6176 @table @code | |
6177 @item "r" | |
6178 Open for reading. | |
6179 @item "w" | |
6180 Open for writing. | |
6181 @item "rc" | |
6182 Open for reading, but ``read'' never returns partial MULE characters. | |
6183 @item "wc" | |
6184 Open for writing, but never writes partial MULE characters. | |
6185 @end table | |
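
A creation function has to turn the @var{mode} string into flags of
some kind.  The following is a minimal sketch of one way to do so; the
@code{lstream_mode_flags} structure and @code{parse_lstream_mode}
function are hypothetical and do not describe how @file{lstream.c}
stores the mode.

```c
#include <string.h>

/* Hypothetical decoded form of a mode string. */
struct lstream_mode_flags
{
  int reading;       /* stream is open for reading */
  int writing;       /* stream is open for writing */
  int whole_chars;   /* never split a MULE character */
};

/* Decode MODE into FLAGS.  Return 0 on success, or -1 if MODE is not
   one of "r", "w", "rc" or "wc". */
int
parse_lstream_mode (const char *mode, struct lstream_mode_flags *flags)
{
  flags->reading = 0;
  flags->writing = 0;
  flags->whole_chars = 0;

  if (strcmp (mode, "r") == 0)
    flags->reading = 1;
  else if (strcmp (mode, "w") == 0)
    flags->writing = 1;
  else if (strcmp (mode, "rc") == 0)
    {
      flags->reading = 1;
      flags->whole_chars = 1;
    }
  else if (strcmp (mode, "wc") == 0)
    {
      flags->writing = 1;
      flags->whole_chars = 1;
    }
  else
    return -1;
  return 0;
}
```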
6186 | |
6187 @node Lstream Types | |
6188 @section Lstream Types | |
6189 | |
6190 @table @asis | |
6191 @item stdio | |
6192 | |
6193 @item filedesc | |
6194 | |
6195 @item lisp-string | |
6196 | |
6197 @item fixed-buffer | |
6198 | |
6199 @item resizing-buffer | |
6200 | |
6201 @item dynarr | |
6202 | |
6203 @item lisp-buffer | |
6204 | |
6205 @item print | |
6206 | |
6207 @item decoding | |
6208 | |
6209 @item encoding | |
6210 @end table | |
6211 | |
6212 @node Lstream Functions | |
6213 @section Lstream Functions | |
6214 | |
6215 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode}) | |
6216 Allocate and return a new Lstream. This function is not really meant to | |
6217 be called directly; rather, each stream type should provide its own | |
6218 stream creation function, which creates the stream and does any other | |
6219 necessary creation stuff (e.g. opening a file). | |
6220 @end deftypefun | |
6221 | |
6222 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size}) | |
6223 Change the buffering of a stream. See @file{lstream.h}. By default the | |
6224 buffering is @code{STREAM_BLOCK_BUFFERED}. | |
6225 @end deftypefun | |
6226 | |
6227 @deftypefun int Lstream_flush (Lstream *@var{lstr}) | |
6228 Flush out any pending unwritten data in the stream. Clear any buffered | |
6229 input data. Returns 0 on success, -1 on error. | |
6230 @end deftypefun | |
6231 | |
6232 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c}) | |
6233 Write out one byte to the stream. This is a macro and so it is very | |
6234 efficient. The @var{c} argument is only evaluated once but the @var{stream} | |
6235 argument is evaluated more than once. Returns 0 on success, -1 on | |
6236 error. | |
6237 @end deftypefn | |
6238 | |
6239 @deftypefn Macro int Lstream_getc (Lstream *@var{stream}) | |
6240 Read one byte from the stream. This is a macro and so it is very | |
6241 efficient. The @var{stream} argument is evaluated more than once. Return | |
6242 value is -1 for EOF or error. | |
6243 @end deftypefn | |
6244 | |
6245 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c}) | |
6246 Push one byte back onto the input queue. This will be the next byte | |
6247 read from the stream. Any number of bytes can be pushed back and will | |
6248 be read in the reverse order they were pushed back -- most recent | |
6249 first. (This is necessary for consistency -- if there are a number of | |
6250 bytes that have been unread and I read and unread a byte, it needs to be | |
6251 the first to be read again.) This is a macro and so it is very | |
6252 efficient. The @var{c} argument is only evaluated once but the @var{stream} | |
6253 argument is evaluated more than once. | |
6254 @end deftypefn | |
6255 | |
6256 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c}) | |
6257 @deftypefunx int Lstream_fgetc (Lstream *@var{stream}) | |
6258 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c}) | |
6259 Function equivalents of the above macros. | |
6260 @end deftypefun | |
6261 | |
6262 @deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size}) | |
6263 Read @var{size} bytes of @var{data} from the stream. Return the number | |
6264 of bytes read. 0 means EOF. -1 means an error occurred and no bytes | |
6265 were read. | |
6266 @end deftypefun | |
6267 | |
6268 @deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size}) | |
6269 Write @var{size} bytes of @var{data} to the stream. Return the number | |
6270 of bytes written. -1 means an error occurred and no bytes were written. | |
6271 @end deftypefun | |
6272 | |
6273 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size}) | |
6274 Push back @var{size} bytes of @var{data} onto the input queue. The next | |
6275 call to @code{Lstream_read()} with the same size will read the same | |
6276 bytes back. Note that this will be the case even if there is other | |
6277 pending unread data. | |
6278 @end deftypefun | |
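
The ordering rules for @code{Lstream_ungetc()} and
@code{Lstream_unread()} are those of a stack.  The following is a
minimal sketch of such a pushback stack, not the lstream
implementation: the names are hypothetical, and the fixed capacity is a
simplification made only for this example (the real functions accept
any number of bytes).

```c
#include <stddef.h>

#define PUSHBACK_MAX 64   /* arbitrary cap, for this sketch only */

/* Bytes that have been pushed back; the most recent is on top. */
struct pushback
{
  unsigned char bytes[PUSHBACK_MAX];
  size_t count;
};

/* Push one byte back.  Return 0 on success, -1 if the stack is full. */
int
pushback_ungetc (struct pushback *pb, int c)
{
  if (pb->count == PUSHBACK_MAX)
    return -1;
  pb->bytes[pb->count++] = (unsigned char) c;
  return 0;
}

/* Return the most recently pushed byte, or -1 if there is none. */
int
pushback_getc (struct pushback *pb)
{
  if (pb->count == 0)
    return -1;
  return pb->bytes[--pb->count];
}

/* Push back SIZE bytes so that they are read again in their original
   order: the last byte is pushed first, the first byte last. */
int
pushback_unread (struct pushback *pb, const unsigned char *data,
                 size_t size)
{
  if (size > PUSHBACK_MAX - pb->count)
    return -1;
  while (size > 0)
    pb->bytes[pb->count++] = data[--size];
  return 0;
}
```

Because a block is pushed in reverse, reading it back yields the same
bytes in the same order even when older unread data lies beneath it.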
6279 | |
6280 @deftypefun int Lstream_close (Lstream *@var{stream}) | |
6281 Close the stream. All data will be flushed out. | |
6282 @end deftypefun | |
6283 | |
6284 @deftypefun void Lstream_reopen (Lstream *@var{stream}) | |
6285 Reopen a closed stream. This enables I/O on it again. This is not | |
6286 meant to be called except from a wrapper routine that reinitializes | |
6287 variables and such -- the close routine may well have freed some | |
6288 necessary storage structures, for example. | |
6289 @end deftypefun | |
6290 | |
6291 @deftypefun void Lstream_rewind (Lstream *@var{stream}) | |
6292 Rewind the stream to the beginning. | |
6293 @end deftypefun | |
6294 | |
6295 @node Lstream Methods | |
6296 @section Lstream Methods | |
6297 | |
6298 @deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size}) | |
6299 Read some data from the stream's end and store it into @var{data}, which | |
6300 can hold @var{size} bytes. Return the number of bytes read. A return | |
6301 value of 0 means no bytes can be read at this time. This may be because | |
6302 of an EOF, or because there is a granularity greater than one byte that | |
6303 the stream imposes on the returned data, and @var{size} is less than | |
6304 this granularity. (This will happen frequently for streams that need to | |
6305 return whole characters, because @code{Lstream_read()} calls the reader | |
6306 function repeatedly until it has the number of bytes it wants or until 0 | |
6307 is returned.) The lstream functions do not treat a 0 return as EOF or | |
6308 do anything special; however, the calling function will interpret any 0 | |
6309 it gets back as EOF. This will normally not happen unless the caller | |
6310 calls @code{Lstream_read()} with a very small size. | |
6311 | |
6312 This function can be @code{NULL} if the stream is output-only. | |
6313 @end deftypefn | |
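
The interaction between a reader method and its caller can be sketched
as follows.  This is an illustration under stated assumptions, not the
body of @code{Lstream_read()}: the stream is passed as an opaque
pointer, buffering is left out, and @code{read_with_method},
@code{pair_source} and @code{pair_reader} are hypothetical names.  The
example reader hands out data only in whole 2-byte units, to imitate a
stream whose granularity is greater than one byte.

```c
#include <string.h>

/* Shape of a reader method, with the stream as an opaque pointer. */
typedef int (*reader_method) (void *stream, unsigned char *data,
                              int size);

/* Call READER repeatedly until SIZE bytes have been collected or it
   returns 0.  Return the number of bytes read, or -1 if an error
   occurred before any byte was read. */
int
read_with_method (reader_method reader, void *stream,
                  unsigned char *data, int size)
{
  int total = 0;

  while (total < size)
    {
      int n = reader (stream, data + total, size - total);

      if (n < 0)
        return total > 0 ? total : -1;
      if (n == 0)
        break;                  /* nothing available right now */
      total += n;
    }
  return total;
}

/* Example source: a block of memory read in 2-byte units. */
struct pair_source
{
  const unsigned char *data;
  int len;
  int pos;
};

/* Example reader method for struct pair_source. */
int
pair_reader (void *stream, unsigned char *data, int size)
{
  struct pair_source *src = (struct pair_source *) stream;
  int avail = src->len - src->pos;
  int n = size < avail ? size : avail;

  n -= n % 2;                   /* round down to whole units */
  memcpy (data, src->data + src->pos, (size_t) n);
  src->pos += n;
  return n;
}
```

With six bytes available, a request for three bytes returns two, and a
following request for one byte returns 0 even though data remains.
This is the small-size case described above, in which a caller that
treats 0 as EOF would be misled.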
6314 | |
6315 @deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size}) | |
6316 Send some data to the stream's end. Data to be sent is in @var{data} | |
and is @var{size} bytes long.  Return the number of bytes sent.  This
function can send and return fewer bytes than are passed in; in that
case, the function will just be called again until there is no data left
6320 or 0 is returned. A return value of 0 means that no more data can be | |
6321 currently stored, but there is no error; the data will be squirreled | |
6322 away until the writer can accept data. (This is useful, e.g., if you're | |
6323 dealing with a non-blocking file descriptor and are getting | |
6324 @code{EWOULDBLOCK} errors.) This function can be @code{NULL} if the | |
6325 stream is input-only. | |
6326 @end deftypefn | |
6327 | |
6328 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream}) | |
6329 Rewind the stream. If this is @code{NULL}, the stream is not seekable. | |
6330 @end deftypefn | |
6331 | |
6332 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream}) | |
6333 Indicate whether this stream is seekable -- i.e. it can be rewound. | |
6334 This method is ignored if the stream does not have a rewind method. If | |
6335 this method is not present, the result is determined by whether a rewind | |
6336 method is present. | |
6337 @end deftypefn | |
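
The way the @code{rewinder} and @code{seekable_p} methods combine can
be written out as a short decision function.  The following is a sketch
under the rules just stated; the @code{seek_methods} structure and the
function names are hypothetical, and the stream is again an opaque
pointer.

```c
#include <stddef.h>

/* The two methods that bear on seekability; either may be NULL. */
struct seek_methods
{
  int (*rewinder) (void *stream);
  int (*seekable_p) (void *stream);
};

/* Return 1 if the stream can be rewound, 0 otherwise. */
int
stream_is_seekable (const struct seek_methods *methods, void *stream)
{
  if (methods->rewinder == NULL)
    return 0;                   /* seekable_p is ignored */
  if (methods->seekable_p == NULL)
    return 1;                   /* presence of rewinder decides */
  return methods->seekable_p (stream) != 0;
}

/* Trivial example methods, used only to exercise the function. */
int
example_rewinder (void *stream)
{
  (void) stream;
  return 0;
}

int
example_not_seekable (void *stream)
{
  (void) stream;
  return 0;
}
```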
6338 | |
6339 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream}) | |
6340 Perform any additional operations necessary to flush the data in this | |
6341 stream. | |
6342 @end deftypefn | |
6343 | |
6344 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream}) | |
6345 @end deftypefn | |
6346 | |
6347 @deftypefn {Lstream Method} int closer (Lstream *@var{stream}) | |
6348 Perform any additional operations necessary to close this stream down. | |
6349 May be @code{NULL}. This function is called when @code{Lstream_close()} | |
6350 is called or when the stream is garbage-collected. When this function | |
6351 is called, all pending data in the stream will already have been written | |
6352 out. | |
6353 @end deftypefn | |
6354 | |
6355 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object)) | |
6356 Mark this object for garbage collection. Same semantics as a standard | |
6357 @code{Lisp_Object} marker. This function can be @code{NULL}. | |
6358 @end deftypefn | |
6359 | |
6360 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top | |
6361 @chapter Consoles; Devices; Frames; Windows | |
6362 | |
6363 @menu | |
6364 * Introduction to Consoles; Devices; Frames; Windows:: | |
6365 * Point:: | |
6366 * Window Hierarchy:: | |
6367 * The Window Object:: | |
6368 @end menu | |
6369 | |
6370 @node Introduction to Consoles; Devices; Frames; Windows | |
6371 @section Introduction to Consoles; Devices; Frames; Windows | |
6372 | |
6373 A window-system window that you see on the screen is called a | |
6374 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or | |
6375 more non-overlapping panes, called (confusingly) @dfn{windows}. Each | |
6376 window displays the text of a buffer in it. (See above on Buffers.) Note | |
6377 that buffers and windows are independent entities: Two or more windows | |
6378 can be displaying the same buffer (potentially in different locations), | |
6379 and a buffer can be displayed in no windows. | |
6380 | |
6381 A single display screen that contains one or more frames is called | |
6382 a @dfn{display}. Under most circumstances, there is only one display. | |
6383 However, more than one display can exist, for example if you have | |
6384 a @dfn{multi-headed} console, i.e. one with a single keyboard but | |
6385 multiple displays. (Typically in such a situation, the various | |
6386 displays act like one large display, in that the mouse is only | |
6387 in one of them at a time, and moving the mouse off of one moves | |
6388 it into another.) In some cases, the different displays will | |
6389 have different characteristics, e.g. one color and one mono. | |
6390 | |
6391 XEmacs can display frames on multiple displays. It can even deal | |
6392 simultaneously with frames on multiple keyboards (called @dfn{consoles} in | |
6393 XEmacs terminology). Here is one case where this might be useful: You | |
6394 are using XEmacs on your workstation at work, and leave it running. | |
6395 Then you go home and dial in on a TTY line, and you can use the | |
6396 already-running XEmacs process to display another frame on your local | |
6397 TTY. | |
6398 | |
6399 Thus, there is a hierarchy console -> display -> frame -> window. | |
6400 There is a separate Lisp object type for each of these four concepts. | |
6401 Furthermore, there is logically a @dfn{selected console}, | |
6402 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. | |
The selected object is distinguished in various ways; for example,
it is the default object for various functions that act
on objects of that type.  Note that every containing object
remembers the ``selected'' object among the objects that it
6407 contains: e.g. not only is there a selected window, but | |
6408 every frame remembers the last window in it that was selected, | |
6409 and changing the selected frame causes the remembered window | |
6410 within it to become the selected window. Similar relationships | |
6411 apply for consoles to devices and devices to frames. | |
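
The remembering of selected objects can be sketched for the
frame/window level as follows.  This is an illustration only: the
structures hold just the fields needed for the example, all names are
hypothetical, and the corresponding levels above frames would follow
the same pattern.

```c
#include <stddef.h>

struct sel_window
{
  int id;
};

/* Each frame remembers the window last selected within it. */
struct sel_frame
{
  struct sel_window *selected_window;
};

/* The global notion of "the selected frame" and "the selected
   window". */
struct sel_state
{
  struct sel_frame *selected_frame;
  struct sel_window *selected_window;
};

/* Select window W, which lives in frame F. */
void
sel_select_window (struct sel_state *st, struct sel_frame *f,
                   struct sel_window *w)
{
  f->selected_window = w;
  st->selected_frame = f;
  st->selected_window = w;
}

/* Select frame F; its remembered window becomes the selected
   window. */
void
sel_select_frame (struct sel_state *st, struct sel_frame *f)
{
  st->selected_frame = f;
  st->selected_window = f->selected_window;
}
```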
6412 | |
6413 @node Point | |
6414 @section Point | |
6415 | |
6416 Recall that every buffer has a current insertion position, called | |
6417 @dfn{point}. Now, two or more windows may be displaying the same buffer, | |
6418 and the text cursor in the two windows (i.e. @code{point}) can be in | |
6419 two different places. You may ask, how can that be, since each | |
6420 buffer has only one value of @code{point}? The answer is that each window | |
also has a value of @code{point} that is squirreled away in it.  There
is only one selected window, and the value of @code{point} in the buffer
it displays corresponds to that window.  When the selected window is changed
6424 from one window to another displaying the same buffer, the old | |
6425 value of @code{point} is stored into the old window's ``point'' and the | |
6426 value of @code{point} from the new window is retrieved and made the | |
6427 value of @code{point} in the buffer. This means that @code{window-point} | |
6428 for the selected window is potentially inaccurate, and if you | |
6429 want to retrieve the correct value of @code{point} for a window, | |
6430 you must special-case on the selected window and retrieve the | |
6431 buffer's point instead. This is related to why @code{save-window-excursion} | |
6432 does not save the selected window's value of @code{point}. | |
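
The exchange of values described above can be sketched in a few lines.
This is a toy model, not the XEmacs implementation: positions are plain
integers rather than markers, and the @code{toy_} names are
hypothetical.

```c
#include <stddef.h>

struct toy_buffer
{
  long point;                   /* the buffer's own value of point */
};

struct toy_window
{
  struct toy_buffer *buffer;    /* buffer displayed in this window */
  long pointm;                  /* stashed value of point */
};

struct toy_window *toy_selected_window = NULL;

/* Make W the selected window: stash the buffer's point in the old
   window, then load W's stashed point into its buffer. */
void
toy_select_window (struct toy_window *w)
{
  if (toy_selected_window != NULL)
    toy_selected_window->pointm = toy_selected_window->buffer->point;
  w->buffer->point = w->pointm;
  toy_selected_window = w;
}

/* The correct value of point for W: the buffer's point if W is
   selected, the stashed value otherwise. */
long
toy_window_point (const struct toy_window *w)
{
  return w == toy_selected_window ? w->buffer->point : w->pointm;
}
```

Here @code{toy_window_point} performs the special-casing on the
selected window that the text calls for; reading @code{pointm}
directly would return a stale value for the selected window once point
has moved.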
6433 | |
6434 @node Window Hierarchy | |
6435 @section Window Hierarchy | |
6436 @cindex window hierarchy | |
6437 @cindex hierarchy of windows | |
6438 | |
6439 If a frame contains multiple windows (panes), they are always created | |
6440 by splitting an existing window along the horizontal or vertical axis. | |
6441 Terminology is a bit confusing here: to @dfn{split a window | |
6442 horizontally} means to create two side-by-side windows, i.e. to make a | |
6443 @emph{vertical} cut in a window. Likewise, to @dfn{split a window | |
6444 vertically} means to create two windows, one above the other, by making | |
6445 a @emph{horizontal} cut. | |
6446 | |
6447 If you split a window and then split again along the same axis, you | |
6448 will end up with a number of panes all arranged along the same axis. | |
6449 The precise way in which the splits were made should not be important, | |
6450 and this is reflected internally. Internally, all windows are arranged | |
6451 in a tree, consisting of two types of windows, @dfn{combination} windows | |
6452 (which have children, and are covered completely by those children) and | |
6453 @dfn{leaf} windows, which have no children and are visible. Every | |
6454 combination window has two or more children, all arranged along the same | |
6455 axis. There are (logically) two subtypes of windows, depending on | |
6456 whether their children are horizontally or vertically arrayed. There is | |
6457 always one root window, which is either a leaf window (if the frame | |
6458 contains only one window) or a combination window (if the frame contains | |
6459 more than one window). In the latter case, the root window will have | |
6460 two or more children, either horizontally or vertically arrayed, and | |
6461 each of those children will be either a leaf window or another | |
6462 combination window. | |
6463 | |
6464 Here are some rules: | |
6465 | |
6466 @enumerate | |
6467 @item | |
6468 Horizontal combination windows can never have children that | |
6469 are horizontal combination windows; same for vertical. | |
6470 | |
6471 @item | |
6472 Only leaf windows can be split (obviously) and this splitting does one | |
6473 of two things: (a) turns the leaf window into a combination window and | |
6474 creates two new leaf children, or (b) turns the leaf window into one of | |
6475 the two new leaves and creates the other leaf. Rule (1) dictates which | |
6476 of these two outcomes happens. | |
6477 | |
6478 @item | |
6479 Every combination window must have at least two children. | |
6480 | |
6481 @item | |
6482 Leaf windows can never become combination windows. They can be deleted, | |
6483 however. If this results in a violation of (3), the parent combination | |
6484 window also gets deleted. | |
6485 | |
6486 @item | |
6487 All functions that accept windows must be prepared to accept combination | |
6488 windows, and do something sane (e.g. signal an error if so). | |
6489 Combination windows @emph{do} escape to the Lisp level. | |
6490 | |
6491 @item | |
6492 All windows have three fields governing their contents: | |
6493 these are @dfn{hchild} (a list of horizontally-arrayed children), | |
6494 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer} | |
6495 (the buffer contained in a leaf window). Exactly one of | |
these will be non-nil.  Remember that @dfn{horizontally-arrayed}
means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.
6499 | |
6500 @item | |
6501 Leaf windows also have markers in their @code{start} (the | |
6502 first buffer position displayed in the window) and @code{pointm} | |
6503 (the window's stashed value of @code{point} -- see above) fields, | |
6504 while combination windows have nil in these fields. | |
6505 | |
6506 @item | |
6507 The list of children for a window is threaded through the | |
6508 @code{next} and @code{prev} fields of each child window. | |
6509 | |
6510 @item | |
6511 @strong{Deleted windows can be undeleted}. This happens as a result of | |
6512 restoring a window configuration, and is unlike frames, displays, and | |
6513 consoles, which, once deleted, can never be restored. Deleting a window | |
6514 does nothing except set a special @code{dead} bit to 1 and clear out the | |
6515 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for | |
6516 GC purposes. | |
6517 | |
6518 @item | |
6519 Most frames actually have two top-level windows -- one for the | |
6520 minibuffer and one (the @dfn{root}) for everything else. The modeline | |
6521 (if present) separates these two. The @code{next} field of the root | |
6522 points to the minibuffer, and the @code{prev} field of the minibuffer | |
6523 points to the root. The other @code{next} and @code{prev} fields are | |
6524 @code{nil}, and the frame points to both of these windows. | |
6525 Minibuffer-less frames have no minibuffer window, and the @code{next} | |
6526 and @code{prev} of the root window are @code{nil}. Minibuffer-only | |
6527 frames have no root window, and the @code{next} of the minibuffer window | |
6528 is @code{nil} but the @code{prev} points to itself. (#### This is an | |
6529 artifact that should be fixed.) | |
6530 @end enumerate | |
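
Several of these rules can be expressed directly as code over the
window tree.  The following is a minimal sketch, assuming only the
@code{hchild}, @code{vchild}, @code{buffer}, @code{next} and
@code{prev} fields described above; the @code{tree_window} structure is
a simplified stand-in for the real window object, with plain C pointers
in place of Lisp objects, and the function names are hypothetical.

```c
#include <stddef.h>

struct tree_window
{
  struct tree_window *hchild;   /* first of side-by-side children */
  struct tree_window *vchild;   /* first of stacked children */
  struct tree_window *next;     /* next sibling */
  struct tree_window *prev;     /* previous sibling */
  const char *buffer;           /* stand-in for the buffer object */
};

/* Rule 6: exactly one of hchild, vchild and buffer is set. */
int
tree_window_fields_ok (const struct tree_window *w)
{
  return (w->hchild != NULL) + (w->vchild != NULL)
         + (w->buffer != NULL) == 1;
}

/* Count the leaf windows at or below W.  The children of a
   combination window are reached through its hchild or vchild field
   and then through the next fields (rule 8). */
int
count_leaf_windows (const struct tree_window *w)
{
  const struct tree_window *child;
  int n = 0;

  if (w->buffer != NULL)
    return 1;
  child = w->hchild != NULL ? w->hchild : w->vchild;
  for (; child != NULL; child = child->next)
    n += count_leaf_windows (child);
  return n;
}
```

For a frame split vertically into a top window and a bottom area that
is itself split into two side-by-side windows, the root is a vertical
combination window with two children, the second of which is a
horizontal combination window, and @code{count_leaf_windows} applied to
the root returns 3.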
6531 | |
6532 @node The Window Object | |
6533 @section The Window Object | |
6534 | |
6535 Windows have the following accessible fields: | |
6536 | |
6537 @table @code | |
6538 @item frame | |
6539 The frame that this window is on. | |
6540 | |
6541 @item mini_p | |
6542 Non-@code{nil} if this window is a minibuffer window. | |
6543 | |
6544 @item buffer | |
6545 The buffer that the window is displaying. This may change often during | |
6546 the life of the window. | |
6547 | |
6548 @item dedicated | |
6549 Non-@code{nil} if this window is dedicated to its buffer. | |
6550 | |
6551 @item pointm | |
6552 @cindex window point internals | |
6553 This is the value of point in the current buffer when this window is | |
6554 selected; when it is not selected, it retains its previous value. | |
6555 | |
6556 @item start | |
6557 The position in the buffer that is the first character to be displayed | |
6558 in the window. | |
6559 | |
6560 @item force_start | |
6561 If this flag is non-@code{nil}, it says that the window has been | |
6562 scrolled explicitly by the Lisp program. This affects what the next | |
6563 redisplay does if point is off the screen: instead of scrolling the | |
6564 window to show the text around point, it moves point to a location that | |
6565 is on the screen. | |
6566 | |
6567 @item last_modified | |
6568 The @code{modified} field of the window's buffer, as of the last time | |
6569 a redisplay completed in this window. | |
6570 | |
6571 @item last_point | |
6572 The buffer's value of point, as of the last time | |
6573 a redisplay completed in this window. | |
6574 | |
6575 @item left | |
6576 This is the left-hand edge of the window, measured in columns. (The | |
6577 leftmost column on the screen is @w{column 0}.) | |
6578 | |
6579 @item top | |
6580 This is the top edge of the window, measured in lines. (The top line on | |
6581 the screen is @w{line 0}.) | |
6582 | |
6583 @item height | |
6584 The height of the window, measured in lines. | |
6585 | |
6586 @item width | |
6587 The width of the window, measured in columns. | |
6588 | |
6589 @item next | |
6590 This is the window that is the next in the chain of siblings. It is | |
6591 @code{nil} in a window that is the rightmost or bottommost of a group of | |
6592 siblings. | |
6593 | |
6594 @item prev | |
6595 This is the window that is the previous in the chain of siblings. It is | |
6596 @code{nil} in a window that is the leftmost or topmost of a group of | |
6597 siblings. | |
6598 | |
6599 @item parent | |
6600 Internally, XEmacs arranges windows in a tree; each group of siblings has | |
6601 a parent window whose area includes all the siblings. This field points | |
6602 to a window's parent. | |
6603 | |
6604 Parent windows do not display buffers, and play little role in display | |
6605 except to shape their child windows. Emacs Lisp programs usually have | |
6606 no access to the parent windows; they operate on the windows at the | |
6607 leaves of the tree, which actually display buffers. | |
6608 | |
6609 @item hscroll | |
6610 This is the number of columns that the display in the window is scrolled | |
6611 horizontally to the left. Normally, this is 0. | |
6612 | |
6613 @item use_time | |
6614 This is the last time that the window was selected. The function | |
6615 @code{get-lru-window} uses this field. | |
6616 | |
6617 @item display_table | |
6618 The window's display table, or @code{nil} if none is specified for it. | |
6619 | |
6620 @item update_mode_line | |
6621 Non-@code{nil} means this window's mode line needs to be updated. | |
6622 | |
6623 @item base_line_number | |
6624 The line number of a certain position in the buffer, or @code{nil}. | |
6625 This is used for displaying the line number of point in the mode line. | |
6626 | |
6627 @item base_line_pos | |
6628 The position in the buffer for which the line number is known, or | |
6629 @code{nil} meaning none is known. | |
6630 | |
6631 @item region_showing | |
6632 If the region (or part of it) is highlighted in this window, this field | |
6633 holds the mark position that made one end of that region. Otherwise, | |
6634 this field is @code{nil}. | |
6635 @end table | |
6636 | |
6637 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top | |
6638 @chapter The Redisplay Mechanism | |
6639 | |
6640 The redisplay mechanism is one of the most complicated sections of | |
6641 XEmacs, especially from a conceptual standpoint. This is doubly so | |
6642 because, unlike for the basic aspects of the Lisp interpreter, the | |
6643 computer science theories of how to efficiently handle redisplay are not | |
6644 well-developed. | |
6645 | |
6646 When working with the redisplay mechanism, remember the Golden Rules | |
6647 of Redisplay: | |
6648 | |
6649 @enumerate | |
6650 @item | |
6651 It Is Better To Be Correct Than Fast. | |
6652 @item | |
6653 Thou Shalt Not Run Elisp From Within Redisplay. | |
6654 @item | |
6655 It Is Better To Be Fast Than Not To Be. | |
6656 @end enumerate | |
6657 | |
6658 @menu | |
6659 * Critical Redisplay Sections:: | |
6660 * Line Start Cache:: | |
6661 @end menu | |
6662 | |
6663 @node Critical Redisplay Sections | |
6664 @section Critical Redisplay Sections | |
6665 @cindex critical redisplay sections | |
6666 | |
6667 Within a critical redisplay section, we are defenseless and assume that the | |
6668 following cannot happen: | |
6669 | |
6670 @enumerate | |
6671 @item | |
6672 garbage collection | |
6673 @item | |
6674 Lisp code evaluation | |
6675 @item | |
6676 frame size changes | |
6677 @end enumerate | |
6678 | |
6679 We ensure (3) by calling @code{hold_frame_size_changes()}, which | |
6680 will cause any pending frame size changes to get put on hold | |
6681 till after the end of the critical section. (1) follows | |
6682 automatically if (2) is met. #### Unfortunately, there are | |
6683 some places where Lisp code can be called within this section. | |
6684 We need to remove them. | |
6685 | |
6686 If @code{Fsignal()} is called during this critical section, we | |
6687 will @code{abort()}. | |
6688 | |
6689 If garbage collection is triggered during this critical section, | |
6690 we simply return without collecting. #### We should abort instead. | |
6691 | |
6692 #### If a frame-size change does occur we should probably | |
6693 actually be preempting redisplay. | |
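
The deferral of frame size changes can be pictured with a small model.
This is only a sketch under stated assumptions:
@code{hold_frame_size_changes()} is the only name taken from the text
above; the @code{toy_} structure and functions are hypothetical and do
not reflect the actual XEmacs data structures.

```c
#include <stdbool.h>

/* Hypothetical model of a frame; not the real XEmacs frame structure. */
struct toy_frame {
    int width, height;                  /* size currently in effect */
    int pending_width, pending_height;  /* size requested while held */
    bool size_change_pending;
};

/* Nonzero while some critical section is active. */
static int toy_hold_depth = 0;

/* Enter a critical section: later size requests are only recorded. */
void toy_hold_frame_size_changes(void)
{
    toy_hold_depth++;
}

/* Request a new size; applied at once only outside a critical section. */
void toy_change_frame_size(struct toy_frame *f, int width, int height)
{
    if (toy_hold_depth > 0) {
        f->pending_width = width;
        f->pending_height = height;
        f->size_change_pending = true;
    } else {
        f->width = width;
        f->height = height;
    }
}

/* Leave a critical section and apply any size change recorded for F. */
void toy_unhold_frame_size_changes(struct toy_frame *f)
{
    if (toy_hold_depth > 0)
        toy_hold_depth--;
    if (toy_hold_depth == 0 && f->size_change_pending) {
        f->width = f->pending_width;
        f->height = f->pending_height;
        f->size_change_pending = false;
    }
}
```

In this model a resize requested inside the critical section leaves the
frame untouched until the matching unhold call, so redisplay sees one
consistent size throughout.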
6694 | |
6695 @node Line Start Cache | |
6696 @section Line Start Cache | |
6697 @cindex line start cache | |
6698 | |
6699 The traditional scrolling code in Emacs breaks in a variable height | |
6700 world. It depends on the key assumption that the number of lines that | |
6701 can be displayed at any given time is fixed. This led to a complete | |
6702 separation of the scrolling code from the redisplay code. In order to | |
6703 fully support variable height lines, the scrolling code must actually be | |
6704 tightly integrated with redisplay. Only redisplay can determine how | |
6705 many lines will be displayed on a screen for any given starting point. | |
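
The dependence on the starting point is easy to see in a small model.
The following is a minimal sketch, assuming that the pixel height of
every display line is already known; the function name and the array of
heights are hypothetical and stand in for what redisplay would compute.

```c
#include <stddef.h>

/* Count how many whole display lines fit in a window WINDOW_HEIGHT pixels
   tall when display starts at line FIRST.  HEIGHTS[i] is the pixel height
   of display line i; N is the number of lines. */
size_t toy_lines_that_fit(const int *heights, size_t n, size_t first,
                          int window_height)
{
    size_t count = 0;
    int used = 0;

    for (size_t i = first; i < n; i++) {
        if (used + heights[i] > window_height)
            break;              /* the next line would be clipped */
        used += heights[i];
        count++;
    }
    return count;
}
```

With line heights of 10, 30, 30, 10 and 10 pixels and a 50-pixel window,
two lines fit when display starts at the first line but three fit when
it starts at the third, so no fixed line count describes the window.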
6706 | |
6707 What is ideally wanted is a complete list of the starting buffer | |
6708 position for every possible display line of a buffer along with the | |
6709 height of that display line. Maintaining such a full list would be very | |
6710 expensive. We settle for having it include information for all areas | |
6711 which we happen to generate anyhow (i.e. the region currently being | |
6712 displayed) and for those areas we need to work with. | |
6713 | |
6714 In order to ensure that the cache accurately represents what redisplay | |
6715 would actually show, it is necessary to invalidate it in many | |
6716 situations. If the buffer changes, the starting positions may no longer | |
6717 be correct. If a face or an extent has changed then the line heights | |
6718 may have altered. These events happen frequently enough that the cache | |
6719 can end up being constantly invalidated. Given this potentially constant | |
6720 invalidation, when is the cache ever useful? | |
6721 | |
6722 Even if the cache is invalidated before every single usage, it is | |
6723 necessary. Scrolling often requires knowledge about display lines which | |
6724 are actually above or below the visible region. The cache provides a | |
6725 convenient light-weight method of storing this information for multiple | |
6726 display regions. This knowledge is necessary for the scrolling code to | |
6727 always obey the First Golden Rule of Redisplay. | |
6728 | |
6729 If the cache already contains all of the information that the scrolling | |
6730 routines happen to need so that it doesn't have to go generate it, then | |
6731 we are able to obey the Third Golden Rule of Redisplay. The first thing | |
6732 we do to help out the cache is to always add the displayed region. This | |
6733 region had to be generated anyway, so the cache ends up getting the | |
6734 information basically for free. In those cases where a user is simply | |
6735 scrolling around viewing a buffer there is a high probability that this | |
6736 is sufficient to always provide the needed information. The second | |
6737 thing we can do is be smart about invalidating the cache. | |
6738 | |
6739 TODO -- Be smart about invalidating the cache. Potential places: | |
6740 | |
6741 @itemize @bullet | |
6742 @item | |
6743 Insertions at end-of-line which don't cause line-wraps do not alter the | |
6744 starting positions of any display lines. These types of buffer | |
6745 modifications should not invalidate the cache. This is actually a large | |
6746 optimization for redisplay speed as well. | |
6747 @item | |
6748 Buffer modifications frequently only affect the display of lines at and | |
6749 below where they occur. In these situations we should only invalidate | |
6750 the part of the cache starting at where the modification occurs. | |
6751 @end itemize | |
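
Invalidating only from the point of modification onward might look like
the sketch below. It assumes a hypothetical cache layout (an array of
entries sorted by buffer position, each covering an inclusive range of
positions); the real line start cache may be organized differently.

```c
#include <stddef.h>

/* Hypothetical cache entry: one display line. */
struct toy_line_start {
    long start;     /* first buffer position on the line */
    long end;       /* last buffer position on the line (inclusive) */
    int height;     /* pixel height of the line */
};

/* Return how many leading entries of CACHE remain valid after a buffer
   modification at MODPOS.  Every entry that ends at or after MODPOS is
   treated as stale, so the result is the index of the first such entry. */
size_t toy_cache_valid_prefix(const struct toy_line_start *cache, size_t len,
                              long modpos)
{
    size_t lo = 0, hi = len;

    /* Binary search for the first entry whose end is >= MODPOS. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cache[mid].end < modpos)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

A caller would keep the first entries up to the returned count and
regenerate the rest on demand, instead of discarding the whole cache.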
6752 | |
6753 In case you're wondering, the Second Golden Rule of Redisplay is not | |
6754 applicable. | |
6755 | |
6756 @node Extents, Faces and Glyphs, The Redisplay Mechanism, Top | |
6757 @chapter Extents | |
6758 | |
6759 @menu | |
6760 * Introduction to Extents:: Extents are ranges over text, with properties. | |
6761 * Extent Ordering:: How extents are ordered internally. | |
6762 * Format of the Extent Info:: The extent information in a buffer or string. | |
6763 * Zero-Length Extents:: A weird special case. | |
6764 * Mathematics of Extent Ordering:: A rigorous foundation. | |
6765 * Extent Fragments:: Cached information useful for redisplay. | |
6766 @end menu | |
6767 | |
6768 @node Introduction to Extents | |
6769 @section Introduction to Extents | |
6770 | |
6771 Extents are regions over a buffer, with a start and an end position | |
6772 denoting the region of the buffer included in the extent. In | |
6773 addition, either end can be closed or open, meaning that the endpoint | |
6774 is or is not logically included in the extent. Insertion of a character | |
6775 at a closed endpoint causes the character to go inside the extent; | |
6776 insertion at an open endpoint causes the character to go outside. | |
6777 | |
6778 Extent endpoints are stored using memory indices (see @file{insdel.c}), | |
6779 to minimize the amount of adjusting that needs to be done when | |
6780 characters are inserted or deleted. | |
6781 | |
6782 (Formerly, extent endpoints at the gap could be either before or | |
6783 after the gap, depending on the open/closedness of the endpoint. | |
6784 The intent of this was to make it so that insertions would | |
6785 automatically go inside or out of extents as necessary with no | |
6786 further work needing to be done. It didn't work out that way, | |
6787 however, and just ended up complexifying and buggifying all the | |
6788 rest of the code.) | |
6789 | |
6790 @node Extent Ordering | |
6791 @section Extent Ordering | |
6792 | |
6793 Extents are compared using memory indices. There are two orderings | |
6794 for extents and both orders are kept current at all times. The normal | |
6795 or @dfn{display} order is as follows: | |
6796 | |
6797 @example | |
6798 Extent A is ``less than'' extent B, that is, earlier in the display order, | |
6799 if: A-start < B-start, | |
6800 or if: A-start = B-start, and A-end > B-end | |
6801 @end example | |
6802 | |
6803 So if two extents begin at the same position, the larger of them is the | |
6804 earlier one in the display order (@code{EXTENT_LESS} is true). | |
6805 | |
6806 For the e-order, the analogous rule holds, with the roles of start and end swapped: | |
6807 | |
6808 @example | |
6809 Extent A is ``less than'' extent B, that is, earlier in the e-order, | |
6810 if: A-end < B-end, | |
6811 or if: A-end = B-end, and A-start > B-start | |
6812 @end example | |
6813 | |
6814 So if two extents end at the same position, the smaller of them is the | |
6815 earlier one in the e-order (@code{EXTENT_E_LESS} is true). | |
6816 | |
6817 The display order and the e-order are complementary orders: any | |
6818 theorem about the display order also applies to the e-order if you swap | |
6819 all occurrences of ``display order'' and ``e-order'', ``less than'' and | |
6820 ``greater than'', and ``extent start'' and ``extent end''. | |
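
As an illustration, the two orderings can be written as a pair of
predicates. This is a sketch only: @code{EXTENT_LESS} and
@code{EXTENT_E_LESS} are the names used above, but the structure and
functions below are hypothetical stand-ins, not their actual
definitions.

```c
#include <stdbool.h>

/* Hypothetical extent model: only the two endpoints matter here. */
struct toy_extent {
    long start, end;
};

/* Display order: earlier start first; on a tie, the longer extent first. */
bool toy_extent_less(const struct toy_extent *a, const struct toy_extent *b)
{
    return a->start < b->start
        || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; on a tie, the shorter extent first. */
bool toy_extent_e_less(const struct toy_extent *a, const struct toy_extent *b)
{
    return a->end < b->end
        || (a->end == b->end && a->start > b->start);
}
```

The second predicate is the first with @code{start} and @code{end}
exchanged, which is the symmetry the preceding paragraph describes.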
6821 | |
6822 @node Format of the Extent Info | |
6823 @section Format of the Extent Info | |
6824 | |
6825 An extent-info structure consists of a list of the buffer or string's | |
6826 extents and a @dfn{stack of extents} that lists all of the extents over | |
6827 a particular position. The stack-of-extents info is used for | |
6828 optimization purposes -- it basically caches some info that might | |
6829 be expensive to compute. Certain otherwise hard computations are easy | |
6830 given the stack of extents over a particular position, and if the | |
6831 stack of extents over a nearby position is known (because it was | |
6832 calculated at some prior point in time), it's easy to move the stack | |
6833 of extents to the proper position. | |
6834 | |
6835 Given that the stack of extents is an optimization, and given that | |
6836 it requires memory, a string's stack of extents is wiped out each | |
6837 time a garbage collection occurs. Therefore, any time you retrieve | |
6838 the stack of extents, it might not be there. If you need it to | |
6839 be there, use the @code{_force} version. | |
6840 | |
6841 Similarly, a string may or may not have an extent_info structure. | |
6842 (Generally it won't if there haven't been any extents added to the | |
6843 string.) So use the @code{_force} version if you need the extent_info | |
6844 structure to be there. | |
6845 | |
6846 A list of extents is maintained as a double gap array: one gap array | |
6847 is ordered by start index (the @dfn{display order}) and the other is | |
6848 ordered by end index (the @dfn{e-order}). Note that positions in an | |
6849 extent list should logically be conceived of as referring @emph{to} a | |
6850 particular extent (as is the norm in programs) rather than sitting | |
6851 between two extents. Note also that callers of these functions should | |
6852 not be aware of the fact that the extent list is implemented as an | |
6853 array, except for the fact that positions are integers (this should be | |
6854 generalized to handle integers and linked lists equally well). | |
6855 | |
6856 @node Zero-Length Extents | |
6857 @section Zero-Length Extents | |
6858 | |
6859 Extents can be zero-length, and will end up that way if their endpoints | |
6860 are explicitly set that way or if their @code{detachable} property is @code{nil} | |
6861 and all the text in the extent is deleted. (The exception is open-open | |
6862 zero-length extents, which are barred from existing because there is | |
6863 no sensible way to define their properties. Deletion of the text in | |
6864 an open-open extent causes it to be converted into a closed-open | |
6865 extent.) Zero-length extents are primarily used to represent | |
6866 annotations, and behave as follows: | |
6867 | |
6868 @enumerate | |
6869 @item | |
6870 Insertion at the position of a zero-length extent expands the extent | |
6871 if both endpoints are closed; goes after the extent if it is closed-open; | |
6872 and goes before the extent if it is open-closed. | |
6873 | |
6874 @item | |
6875 Deletion of a character on a side of a zero-length extent whose | |
6876 corresponding endpoint is closed causes the extent to be detached if | |
6877 it is detachable; if the extent is not detachable or the corresponding | |
6878 endpoint is open, the extent remains in the buffer, moving as necessary. | |
6879 @end enumerate | |
6880 | |
6881 Note that closed-open, non-detachable zero-length extents behave | |
6882 exactly like markers and that open-closed, non-detachable zero-length | |
6883 extents behave like the ``point-type'' marker in Mule. | |
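
The insertion rule for zero-length extents can be sketched as follows.
The structure and function names are hypothetical, and the model covers
insertion only; detachment on deletion is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent model; positions are buffer positions. */
struct toy_zl_extent {
    long start, end;
    bool start_open, end_open;
};

/* Adjust zero-length extent E for COUNT characters inserted at its
   position. */
void toy_insert_at_zero_length(struct toy_zl_extent *e, long count)
{
    assert(e->start == e->end);                 /* zero-length only */
    assert(!(e->start_open && e->end_open));    /* open-open cannot exist */

    if (!e->start_open && !e->end_open) {
        /* closed-closed: the new text goes inside, so the extent grows */
        e->end += count;
    } else if (e->start_open && !e->end_open) {
        /* open-closed: the new text goes before the extent, pushing it on */
        e->start += count;
        e->end += count;
    }
    /* closed-open: the new text goes after the extent, which stays put */
}
```

The closed-open case leaves the extent where it was, which is the
marker-like behavior noted above.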
6884 | |
6885 @node Mathematics of Extent Ordering | |
6886 @section Mathematics of Extent Ordering | |
6887 @cindex extent mathematics | |
6888 @cindex mathematics of extents | |
6889 @cindex extent ordering | |
6890 | |
6891 @cindex display order of extents | |
6892 @cindex extents, display order | |
6893 The extents in a buffer are ordered by ``display order'' because that | |
6894 is the order in which the redisplay mechanism needs to process them. | |
6895 The e-order is an auxiliary ordering used to facilitate operations | |
6896 over extents. The operations that can be performed on the ordered | |
6897 list of extents in a buffer are | |
6898 | |
6899 @enumerate | |
6900 @item | |
6901 Locate where an extent would go if inserted into the list. | |
6902 @item | |
6903 Insert an extent into the list. | |
6904 @item | |
6905 Remove an extent from the list. | |
6906 @item | |
6907 Map over all the extents that overlap a range. | |
6908 @end enumerate | |
6909 | |
6910 (4) requires being able to determine the first and last extents | |
6911 that overlap a range. | |
6912 | |
6913 NOTE: @dfn{overlap} is used as follows: | |
6914 | |
6915 @itemize @bullet | |
6916 @item | |
6917 two ranges overlap if they have at least one point in common. | |
6918 Whether the endpoints are open or closed makes a difference here. | |
6919 @item | |
6920 a point overlaps a range if the point is contained within the | |
6921 range; this is equivalent to treating a point @math{P} as the range | |
6922 @math{[P, P]}. | |
6923 @item | |
6924 In the case of an @emph{extent} overlapping a point or range, the extent | |
6925 is normally treated as having closed endpoints. This applies | |
6926 consistently in the discussion of stacks of extents and such below. | |
6927 Note that this definition of overlap is not necessarily consistent with | |
6928 the extents that @code{map-extents} maps over, since @code{map-extents} | |
6929 sometimes pays attention to whether the endpoints of an extent are open | |
6930 or closed. But for our purposes, it greatly simplifies things to treat | |
6931 all extents as having closed endpoints. | |
6932 @end itemize | |
6933 | |
6934 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents | |
6935 to mean comparison according to the display order. Comparison between | |
6936 an extent @math{E} and an index @math{I} means comparison between | |
6937 @math{E} and the range @math{[I, I]}. | |
6938 | |
6939 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison | |
6940 according to the e-order. | |
6941 | |
6942 For any range @math{R}, define @math{R(0)} to be the starting index of | |
6943 the range and @math{R(1)} to be the ending index of the range. | |
6944 | |
6945 For any extent @math{E}, define @math{E(next)} to be the extent directly | |
6946 following @math{E}, and @math{E(prev)} to be the extent directly | |
6947 preceding @math{E}. Assume @math{E(next)} and @math{E(prev)} can be | |
6948 determined from @math{E} in constant time. (This is because we store | |
6949 the extent list as a doubly linked list.) | |
6950 | |
6951 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the | |
6952 extents directly following and preceding @math{E} in the e-order. | |
6953 | |
6954 Now: | |
6955 | |
6956 Let @math{R} be a range. | |
6957 Let @math{F} be the first extent overlapping @math{R}. | |
6958 Let @math{L} be the last extent overlapping @math{R}. | |
6959 | |
6960 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)}, | |
6961 i.e. @math{L <= R(1) < L(next)}. | |
6962 | |
6963 This follows easily from the definition of display order. The | |
6964 basic reason that this theorem applies is that the display order | |
6965 sorts by increasing starting index. | |
6966 | |
6967 Therefore, we can determine @math{L} just by looking at where we would | |
6968 insert @math{R(1)} into the list, and if we know @math{F} and are moving | |
6969 forward over extents, we can easily determine when we've hit @math{L} by | |
6970 comparing the extent we're at to @math{R(1)}. | |
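
Locating where @math{R(1)} would be inserted can be sketched as a binary
search over an array kept in display order. This is an illustration
under assumptions, not the XEmacs code: the array representation and all
names are hypothetical, extents are treated as closed ranges, the scan
starts at the head of the list rather than at @math{F} for brevity, and
every candidate before the bound is still tested for overlap, which is a
deliberately conservative choice.

```c
#include <stddef.h>

/* Hypothetical extent model: a closed range [start, end], start <= end. */
struct toy_range {
    long start, end;
};

/* Return the index at which the point range [pos, pos] would be inserted
   after all extents that are <= it in display order.  EXT must be sorted
   in display order.  Because end >= start, an extent is <= [pos, pos]
   exactly when its start is <= pos. */
size_t toy_display_bound(const struct toy_range *ext, size_t n, long pos)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ext[mid].start <= pos)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count the extents that overlap the closed range [r0, r1]. */
size_t toy_count_overlapping(const struct toy_range *ext, size_t n,
                             long r0, long r1)
{
    size_t bound = toy_display_bound(ext, n, r1);
    size_t count = 0;

    for (size_t i = 0; i < bound; i++)
        if (ext[i].end >= r0)   /* starts by r1 and ends at or after r0 */
            count++;
    return count;
}
```

No extent at or beyond the bound can overlap the range, since each of
them starts after @math{R(1)}; that is what lets a forward scan stop
early.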
6971 | |
6972 @example | |
6973 Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}. | |
6974 @end example | |
6975 | |
6976 This is the analog of Theorem 1, and applies because the e-order | |
6977 sorts by increasing ending index. | |
6978 | |
6979 Therefore, @math{F} can be found in the same amount of time as | |
6980 operation (1), i.e. the time that it takes to locate where an extent | |
6981 would go if inserted into the e-order list. | |
6982 | |
6983 If the lists were stored as balanced binary trees, then operation (1) | |
6984 would take logarithmic time, which is usually quite fast. However, | |
6985 currently they're stored as simple doubly-linked lists, and instead we | |
6986 do some caching to try to speed things up. | |
6987 | |
6988 Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents | |
6989 (ordered in the display order) that overlap an index @math{I}, together | |
6990 with the SOE's @dfn{previous} extent, which is an extent that precedes | |
6991 @math{I} in the e-order. (Hopefully there will not be very many extents | |
6992 between @math{I} and the previous extent.) | |
6993 | |
6994 Now: | |
6995 | |
6996 Let @math{I} be an index, let @math{S} be the stack of extents on | |
6997 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P} | |
6998 be @math{S}'s previous extent. | |
6999 | |
7000 Theorem 3: The first extent in @math{S} is the first extent that overlaps | |
7001 any range @math{[I, J]}. | |
7002 | |
7003 Proof: Any extent that overlaps @math{[I, J]} but does not include | |
7004 @math{I} must have a start index @math{> I}, and thus be greater than | |
7005 any extent in @math{S}. | |
7006 | |
7007 Therefore, finding the first extent that overlaps a range @math{R} is | |
7008 the same as finding the first extent that overlaps @math{R(0)}. | |
7009 | |
7010 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let | |
7011 @math{F2} be the first extent that overlaps @math{I2}. Then, either | |
7012 @math{F2} is in @math{S} or @math{F2} is greater than any extent in | |
7013 @math{S}. | |
7014 | |
7015 Proof: If @math{F2} does not include @math{I} then its start index is | |
7016 greater than @math{I} and thus it is greater than any extent in | |
7017 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I} | |
7018 and thus is in @math{S}, and thus @math{F2 >= F}. | |
7019 | |
7020 @node Extent Fragments | |
7021 @section Extent Fragments | |
7022 @cindex extent fragment | |
7023 | |
7024 Imagine that the buffer is divided up into contiguous, non-overlapping | |
7025 @dfn{runs} of text such that no extent starts or ends within a run | |
7026 (extents that abut the run don't count). | |
7027 | |
7028 An extent fragment is a structure that holds data about the run that | |
7029 contains a particular buffer position (if the buffer position is at the | |
7030 junction of two runs, the run after the position is used) -- the | |
7031 beginning and end of the run, a list of all of the extents in that run, | |
7032 the @dfn{merged face} that results from merging all of the faces | |
7033 corresponding to those extents, the begin and end glyphs at the | |
7034 beginning of the run, etc. This is the information that redisplay needs | |
7035 in order to display this run. | |
7036 | |
7037 Extent fragments have to be very quick to update to a new buffer | |
7038 position when moving linearly through the buffer. They rely on the | |
7039 stack-of-extents code, which does the heavy-duty algorithmic work of | |
7040 determining which extents overlap a particular position. | |
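
The notion of a run can be made concrete with a short sketch. It is a
simplified model under stated assumptions: all names are hypothetical,
positions are treated as boundaries between characters, and the run is
found by a plain linear scan over every extent rather than through the
stack-of-extents code that the real implementation relies on.

```c
#include <stddef.h>

/* Hypothetical extent model: start and end positions. */
struct toy_range {
    long start, end;
};

/* A run of text within which no extent starts or ends. */
struct toy_run {
    long start, end;
};

/* Return the run containing POS in a buffer spanning [buf_start, buf_end].
   A run boundary is any extent start or end.  If POS is itself a boundary,
   the run after it is returned. */
struct toy_run toy_run_at(const struct toy_range *ext, size_t n, long pos,
                          long buf_start, long buf_end)
{
    struct toy_run run = { buf_start, buf_end };

    for (size_t i = 0; i < n; i++) {
        long bounds[2] = { ext[i].start, ext[i].end };
        for (int j = 0; j < 2; j++) {
            long b = bounds[j];
            if (b <= pos && b > run.start)
                run.start = b;      /* nearest boundary at or before POS */
            if (b > pos && b < run.end)
                run.end = b;        /* nearest boundary after POS */
        }
    }
    return run;
}
```

For extents covering positions 5 to 20 and 10 to 15 in a buffer running
from 1 to 30, the runs are 1 to 5, 5 to 10, 10 to 15, 15 to 20 and 20 to
30; a position of 15 falls in the run from 15 to 20.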
7041 | |
7042 @node Faces and Glyphs, Specifiers, Extents, Top | |
7043 @chapter Faces and Glyphs | |
7044 | |
7045 Not yet documented. | |
7046 | |
7047 @node Specifiers, Menus, Faces and Glyphs, Top | |
7048 @chapter Specifiers | |
7049 | |
7050 Not yet documented. | |
7051 | |
7052 @node Menus, Subprocesses, Specifiers, Top | |
7053 @chapter Menus | |
7054 | |
7055 A menu is set by setting the value of the variable | |
7056 @code{current-menubar} (which may be buffer-local) and then calling | |
7057 @code{set-menubar-dirty-flag} to signal a change. This will cause the | |
7058 menu to be redrawn at the next redisplay. The format of the data in | |
7059 @code{current-menubar} is described in @file{menubar.c}. | |
7060 | |
7061 Internally the data in @code{current-menubar} is parsed into a tree of | |
7062 @code{widget_value}s (defined in @file{lwlib.h}); this is accomplished | |
7063 by the recursive function @code{menu_item_descriptor_to_widget_value()}, | |
7064 called by @code{compute_menubar_data()}. Such a tree is deallocated | |
7065 using @code{free_widget_value()}. | |
7066 | |
7067 @code{update_screen_menubars()} is one of the external entry points. | |
7068 This checks to see, for each screen, if that screen's menubar needs to | |
7069 be updated. This is the case if any of the following is true: | |
7070 | |
7071 @enumerate | |
7072 @item | |
7073 @code{set-menubar-dirty-flag} was called since the last redisplay. (This | |
7074 function sets the C variable @code{menubar_has_changed}.) | |
7075 @item | |
7076 The buffer displayed in the screen has changed. | |
7077 @item | |
7078 The screen has no menubar currently displayed. | |
7079 @end enumerate | |
7080 | |
7081 @code{set_screen_menubar()} is called for each such screen. This | |
7082 function calls @code{compute_menubar_data()} to create the tree of | |
7083 @code{widget_value}s, then calls @code{lw_create_widget()}, | |
7084 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()} | |
7085 to create the X-Toolkit widget associated with the menu. | |
7086 | |
7087 @code{update_psheets()}, the other external entry point, actually | |
7088 changes the menus being displayed. It uses the widgets fixed by | |
7089 @code{update_screen_menubars()} and calls various X functions to ensure | |
7090 that the menus are displayed properly. | |
7091 | |
7092 The menubar widget is set up so that @code{pre_activate_callback()} is | |
7093 called when the menu is first selected (i.e. mouse button goes down), | |
7094 and @code{menubar_selection_callback()} is called when an item is | |
7095 selected. @code{pre_activate_callback()} calls the function in | |
7096 @code{activate-menubar-hook}, which can change the menubar (this is described | |
7097 in @file{menubar.c}). If the menubar is changed, | |
7098 @code{set_screen_menubars()} is called. | |
7099 @code{menubar_selection_callback()} enqueues a menu event, putting in it | |
7100 a function to call (either @code{eval} or @code{call-interactively}) and | |
7101 its argument, which is the callback function or form given in the menu's | |
7102 description. | |
7103 | |
7104 @node Subprocesses, Interface to X Windows, Menus, Top | |
7105 @chapter Subprocesses | |
7106 | |
7107 The fields of a process are: | |
7108 | |
7109 @table @code | |
7110 @item name | |
7111 A string, the name of the process. | |
7112 | |
7113 @item command | |
7114 A list containing the command arguments that were used to start this | |
7115 process. | |
7116 | |
7117 @item filter | |
7118 A function used to accept output from the process instead of a buffer, | |
7119 or @code{nil}. | |
7120 | |
7121 @item sentinel | |
7122 A function called whenever the process receives a signal, or @code{nil}. | |
7123 | |
7124 @item buffer | |
7125 The associated buffer of the process. | |
7126 | |
7127 @item pid | |
7128 An integer, the Unix process @sc{id}. | |
7129 | |
7130 @item childp | |
7131 A flag, non-@code{nil} if this is really a child process. | |
7132 It is @code{nil} for a network connection. | |
7133 | |
7134 @item mark | |
7135 A marker indicating the position of the end of the last output from this | |
7136 process inserted into the buffer. This is often but not always the end | |
7137 of the buffer. | |
7138 | |
7139 @item kill_without_query | |
7140 If this is non-@code{nil}, killing XEmacs while this process is still | |
7141 running does not ask for confirmation about killing the process. | |
7142 | |
7143 @item raw_status_low | |
7144 @itemx raw_status_high | |
7145 These two fields record 16 bits each of the process status returned by | |
7146 the @code{wait} system call. | |
7147 | |
7148 @item status | |
7149 The process status, as @code{process-status} should return it. | |
7150 | |
7151 @item tick | |
7152 @itemx update_tick | |
7153 If these two fields are not equal, a change in the status of the process | |
7154 needs to be reported, either by running the sentinel or by inserting a | |
7155 message in the process buffer. | |
7156 | |
7157 @item pty_flag | |
7158 Non-@code{nil} if communication with the subprocess uses a @sc{pty}; | |
7159 @code{nil} if it uses a pipe. | |
7160 | |
7161 @item infd | |
7162 The file descriptor for input from the process. | |
7163 | |
7164 @item outfd | |
7165 The file descriptor for output to the process. | |
7166 | |
7167 @item subtty | |
7168 The file descriptor for the terminal that the subprocess is using. (On | |
7169 some systems, there is no need to record this, so the value is | |
7170 @code{-1}.) | |
7171 | |
7172 @item tty_name | |
7173 The name of the terminal that the subprocess is using, | |
7174 or @code{nil} if it is using pipes. | |
7175 @end table | |
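
The split of the status into @code{raw_status_low} and
@code{raw_status_high} can be illustrated as follows. This is a sketch
with hypothetical names, assuming the status fits in 32 bits; it only
shows how two 16-bit halves carry one status word, and says nothing
about how XEmacs itself stores or decodes them.

```c
#include <stdint.h>

/* Hypothetical pair of fields holding one wait status. */
struct toy_raw_status {
    uint16_t low;   /* bits 0..15 of the status */
    uint16_t high;  /* bits 16..31 of the status */
};

/* Split a status word into its two 16-bit halves. */
struct toy_raw_status toy_split_status(int status)
{
    uint32_t bits = (uint32_t) status;
    struct toy_raw_status raw;

    raw.low = (uint16_t) (bits & 0xFFFFu);
    raw.high = (uint16_t) (bits >> 16);
    return raw;
}

/* Reassemble the status word from its two halves. */
int toy_join_status(struct toy_raw_status raw)
{
    uint32_t bits = ((uint32_t) raw.high << 16) | raw.low;
    return (int) bits;
}
```

Splitting and then joining returns the original value, so nothing is
lost by storing the status as two smaller integers.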
7176 | |
7177 @node Interface to X Windows, Index, Subprocesses, Top | |
7178 @chapter Interface to X Windows | |
7179 | |
7180 Not yet documented. | |
7181 | |
7182 @include index.texi | |
7183 | |
7184 @c Print the tables of contents | |
7185 @summarycontents | |
7186 @contents | |
7187 @c That's all | |
7188 | |
7189 @bye | |
7190 |