Mercurial > hg > xemacs-beta
diff man/internals/internals.texi @ 44:8d2a9b52c682 r19-15prefinal
Import from CVS: tag r19-15prefinal
author | cvs |
---|---|
date | Mon, 13 Aug 2007 08:55:10 +0200 |
parents | d620409f5eb8 |
children | ee648375d8d6 |
--- a/man/internals/internals.texi Mon Aug 13 08:54:52 2007 +0200 +++ b/man/internals/internals.texi Mon Aug 13 08:55:10 2007 +0200 @@ -7,7 +7,7 @@ @ifinfo Copyright @copyright{} 1992 - 1996 Ben Wing. -Copyright @copyright{} 1996 Sun Microsystems. +Copyright @copyright{} 1996, 1997 Sun Microsystems. Copyright @copyright{} 1994, 1995 Free Software Foundation. Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. @@ -59,9 +59,10 @@ @titlepage @title XEmacs Internals Manual -@subtitle Version 1.0, March 1996 +@subtitle Version 1.1, March 1997 @author Ben Wing +@author Martin Buchholz @page @vskip 0pt plus 1fill @@ -72,8 +73,8 @@ Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. @sp 2 -Version 1.0 @* -March, 1996.@* +Version 1.1 @* +March, 1997.@* Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are @@ -870,7 +871,7 @@ operations that make XEmacs useful as an editor as well as just a Lisp environment, and also contains many add-on packages that allow XEmacs to browse directories, act as a mail and Usenet news reader, -compile Lisp code, etc. There is actually a lot more Lisp code than +compile Lisp code, etc. There is actually more Lisp code than C code associated with XEmacs, but much of the Lisp code is peripheral to the actual operation of the editor. The Lisp code all lies in subdirectories underneath the @file{lisp/} directory. @@ -889,15 +890,15 @@ programs that are used in connection with XEmacs. Some of them are used during the build process; others are used to perform certain functions that cannot conveniently be placed in the XEmacs executable (e.g. the -@file{movemail} program for fetching mail out of /var/spool/mail, which -must be setgid to @file{mail} on many systems; and the 'gnuclient' -program, which allows an external script to communicate with a running -XEmacs process). 
+@file{movemail} program for fetching mail out of @file{/var/spool/mail}, +which must be setgid to @file{mail} on many systems; and the +@file{gnuclient} program, which allows an external script to communicate +with a running XEmacs process). The @file{man/} directory contains the sources for the XEmacs documentation. It is mostly in a form called Texinfo, which can be -converted into either a printed document (by passing it through TeX) or -into on-line documentation called @dfn{info files}. +converted into either a printed document (by passing it through @TeX{}) +or into on-line documentation called @dfn{info files}. The @file{info/} directory contains the results of formatting the XEmacs documentation as @dfn{info files}, for on-line use. These files @@ -940,13 +941,12 @@ cause it to initialize itself, read in a number of basic Lisp files, and then dump itself out into a new executable called @file{xemacs}. This new executable has been pre-initialized and contains pre-digested Lisp -code that is necessary for the editor to function (this includes some -extremely basic Lisp functions, e.g. @code{not}, that can be defined in -terms of other Lisp primitives; some initialization code that is called -when certain objects, such as frames, are created; and all of the -standard keybindings and code for the actions they result in). This -executable, @file{xemacs}, is the executable that you run to use the -XEmacs editor. +code that is necessary for the editor to function (this includes most +basic Lisp functions, e.g. @code{not}, that can be defined in terms of +other Lisp primitives; some initialization code that is called when +certain objects, such as frames, are created; and all of the standard +keybindings and code for the actions they result in). This executable, +@file{xemacs}, is the executable that you run to use the XEmacs editor. 
@node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top @chapter XEmacs From the Inside @@ -954,7 +954,7 @@ Internally, XEmacs is quite complex, and can be very confusing. To simplify things, it can be useful to think of XEmacs as containing an event loop that ``drives'' everything, and a number of other subsystems, -such as a Lisp engine and a redisplay mechanism. Each of these others +such as a Lisp engine and a redisplay mechanism. Each of these other subsystems exists simultaneously in XEmacs, and each has a certain state. The flow of control continually passes in and out of these different subsystems in the course of normal operation of the editor. @@ -986,8 +986,8 @@ The buffer mechanism is responsible for keeping track of what buffers exist and what text is in them. It is periodically given commands (usually from the user) to insert or delete text, create a buffer, etc. -When it receives a textual-change command, it tells the redisplay -mechanism about this. +When it receives a text-change command, it notifies the redisplay +mechanism. @item The redisplay mechanism is responsible for making sure that windows and @@ -1183,7 +1183,7 @@ XEmacs Lisp also contains numerous specialized objects used to implement the editor: -@table @asis +@table @code @item buffer Stores text like a string, but is optimized for insertion and deletion and has certain other properties that can be set. @@ -1232,7 +1232,7 @@ There are some other, less-commonly-encountered general objects: -@table @asis +@table @code @item hashtable An object that maps from an arbitrary Lisp object to another arbitrary Lisp object, using hashing for fast lookup. 
@@ -1256,7 +1256,7 @@ And some strange special-purpose objects: -@table @asis +@table @code @item charset @itemx coding-system Objects used when MULE, or multi-lingual/Asian-language, support is @@ -1370,17 +1370,18 @@ @end example (where @samp{^[} actually is an @samp{ESC} character) converts to a -particular Kanji character. (To decode this gook: @samp{ESC} begins an -escape sequence; @samp{ESC $ (} is a class of escape sequences meaning -``switch to a 94x94 character set''; @samp{ESC $ ( B} means ``switch to -Japanese Kanji''; @samp{#} and @samp{&} collectively index into a -94-by-94 array of characters [subtract 33 from the ASCII value of each -character to get the corresponding index]; @samp{ESC (} is a class of -escape sequences meaning ``switch to a 94 character set''; @samp{ESC (B} -means ``switch to US ASCII''. It is a coincidence that the letter -@samp{B} is used to denote both Japanese Kanji and US ASCII. If the -first @samp{B} were replaced with an @samp{A}, you'd be requesting a -Chinese Hanzi character from the GB2312 character set.) +particular Kanji character when using an ISO2022-based coding system for +input. (To decode this gook: @samp{ESC} begins an escape sequence; +@samp{ESC $ (} is a class of escape sequences meaning ``switch to a +94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese +Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array +of characters [subtract 33 from the ASCII value of each character to get +the corresponding index]; @samp{ESC (} is a class of escape sequences +meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch +to US ASCII''. It is a coincidence that the letter @samp{B} is used to +denote both Japanese Kanji and US ASCII. If the first @samp{B} were +replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character +from the GB2312 character set.) 
@example "foobar" @@ -1513,7 +1514,7 @@ @cindex record type Note that there are only eight types that the tag can represent, but many more actual types than this. This is handled by having -one of the tag types specify a meta-object called a @dfn{record}; +one of the tag types specify a meta-type called a @dfn{record}; for all such objects, the first four bytes of the pointed-to structure indicate what the actual type is. @@ -1537,10 +1538,11 @@ the proper mask. Then, pointers retrieved from Lisp objects are automatically OR'ed with this value prior to being used. - A corollary of the previous paragraph is that @strong{stack-allocated -structures cannot be put into Lisp objects}. The stack is generally -located near the top of memory; if you put such a pointer into a Lisp -object, it will get its top bits chopped off, and you will lose. + A corollary of the previous paragraph is that @strong{(pointers to) +stack-allocated structures cannot be put into Lisp objects}. The stack +is generally located near the top of memory; if you put such a pointer +into a Lisp object, it will get its top bits chopped off, and you will +lose. Various macros are used to construct Lisp objects and extract the components. Macros of the form @code{XINT()}, @code{XCHAR()}, @@ -1565,17 +1567,17 @@ in a pointer being dereferenced as the wrong type of structure, with unpredictable (and sometimes not easily traceable) results. - There are similar @code{XSET()} macros that construct a Lisp object. -These macros are of the form @code{XSET (@var{lvalue}, @var{result})}, + There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp object. +These macros are of the form @code{XSET@var{TYPE} (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather than just used in an expression. The reason for this is that standard C doesn't let you ``construct'' a structure (but GCC does). 
Granted, this sometimes isn't too convenient; for the case of integers, at least, you can use the function @code{make_number()}, which constructs and @emph{returns} an integer -Lisp object. Note that the @code{XSET()} macros are also affected by -@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the right -type in the case of record types, where the type is contained in -the structure. +Lisp object. Note that the @code{XSET@var{TYPE}()} macros are also +affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the +structure is of the right type in the case of record types, where the +type is contained in the structure. @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top @chapter Rules When Writing New C Code @@ -1603,9 +1605,9 @@ declares any global Lisp variables you have added and initializes global C variables in the module. For each such function, declare it in @file{symsinit.h} and make sure it's called in the appropriate place in -@code{main()}. @strong{Important}: There are stringent requirements on +@file{emacs.c}. @strong{Important}: There are stringent requirements on exactly what can go into these functions. See the comment in -@code{main()}. The reason for this is to avoid obscure unwanted +@file{emacs.c}. The reason for this is to avoid obscure unwanted interactions during initialization. If you don't follow these rules, you'll be sorry! If you want to do anything that isn't allowed, create a @code{complex_vars_of_*()} function for it. Doing this is tricky, @@ -1613,7 +1615,8 @@ so that all the initialization dependencies work out. Every module includes @file{<config.h>} (angle brackets so that -@samp{--srcdir} works correctly) and @file{lisp.h}. @file{config.h} +@samp{--srcdir} works correctly; @file{config.h} may or may not be in +the same directory as the C sources) and @file{lisp.h}. 
@file{config.h} should always be included before any other header files (including system header files) to ensure that certain tricks played by various @file{s/} and @file{m/} files work out correctly. @@ -1672,42 +1675,33 @@ @cindex garbage collection protection @smallexample @group -DEFUN ("or", For, Sor, 0, UNEVALLED, 0 /* +DEFUN ("or", For, 0, UNEVALLED, 0, /* Eval args until one of them yields non-nil, then return that value. The remaining args are not evalled at all. -@end group -@group If all args return nil, return nil. -*/ ) - (args) - Lisp_Object args; +*/ + (args)) @{ /* This function can GC */ REGISTER Lisp_Object val; Lisp_Object args_left; struct gcpro gcpro1; -@end group - -@group + if (NILP (args)) return Qnil; args_left = args; GCPRO1 (args_left); -@end group - -@group + do @{ val = Feval (Fcar (args_left)); if (!NILP (val)) - break; + break; args_left = Fcdr (args_left); @} while (!NILP (args_left)); -@end group - -@group + UNGCPRO; return val; @} @@ -1718,23 +1712,25 @@ @code{DEFUN} macro. Here is a template for them: @example -DEFUN (@var{lname}, @var{fname}, @var{sname}, @var{min}, @var{max}, @var{interactive} /* @var{doc} */ ) +DEFUN (@var{lname}, @var{fname}, @var{min}, @var{max}, @var{interactive}, /* +@var{docstring} +*/ + (@var{arglist}) ) @end example @table @var @item lname -This is the name of the Lisp symbol to define as the function name; in -the example above, it is @code{or}. +This string is the name of the Lisp symbol to define as the function +name; in the example above, it is @code{"or"}. @item fname -This is the C function name for this function. This is -the name that is used in C code for calling the function. The name is, -by convention, @samp{F} prepended to the Lisp name, with all dashes -(@samp{-}) in the Lisp name changed to underscores. Thus, to call this -function from C code, call @code{For}. 
Remember that the arguments must -be of type @code{Lisp_Object}; various macros and functions for creating -values of type @code{Lisp_Object} are declared in the file -@file{lisp.h}. +This is the C function name for this function. This is the name that is +used in C code for calling the function. The name is, by convention, +@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the +Lisp name changed to underscores. Thus, to call this function from C +code, call @code{For}. Remember that the arguments are of type +@code{Lisp_Object}; various macros and functions for creating values of +type @code{Lisp_Object} are declared in the file @file{lisp.h}. Primitives whose names are special characters (e.g. @code{+} or @code{<}) are named by spelling out, in some fashion, the special @@ -1743,13 +1739,13 @@ characters are spelled out in some creative way, e.g. @code{let*} becomes @code{FletX()}. -@item sname -This is a C variable name to use for a structure that holds the data for +Each function also has an associated structure that holds the data for the subr object that represents the function in Lisp. This structure conveys the Lisp symbol name to the initialization routine that will -create the symbol and store the subr object as its definition. By -convention, this name is always @var{fname} with @samp{F} replaced with -@samp{S}. +create the symbol and store the subr object as its definition. The C +variable name of this structure is always @samp{S} prepended to the +@var{fname}. You hardly ever need to be aware of the existence of this +structure. @item min This is the minimum number of arguments that the function requires. The @@ -1762,8 +1758,8 @@ @code{MANY}, indicating an unlimited number of evaluated arguments (the equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY} are macros. If @var{max} is a number, it may not be less than @var{min} and -it may not be greater than 12. 
(If you need to add a function with -more than 12 arguments, either use the @code{MANY} form or edit the +it may not be greater than 8. (If you need to add a function with +more than 8 arguments, either use the @code{MANY} form or edit the definition of @code{DEFUN} in @file{lisp.h}. If you do the latter, make sure to also add another clause to the switch statement in @code{primitive_funcall().}) @@ -1775,46 +1771,51 @@ called interactively. A value of @code{""} indicates a function that should receive no arguments when called interactively. -@item doc +@item docstring This is the documentation string. It is written just like a -documentation string for a function defined in Lisp; in particular, -the first line should be a single sentence. Note how the documentation -string is enclosed in a comment, none of the documentation is placed -on the same lines as the comment-start and comment-end characters, and -the comment-start characters are on the same line as the interactive +documentation string for a function defined in Lisp; in particular, the +first line should be a single sentence. Note how the documentation +string is enclosed in a comment, none of the documentation is placed on +the same lines as the comment-start and comment-end characters, and the +comment-start characters are on the same line as the interactive specification. @file{make-docfile}, which scans the C files for -documentation strings, is very particular about what it looks for, -and will not properly note the doc string if it's not in this exact -format. -@end table - - You are free to put the various arguments to @code{DEFUN} on separate +documentation strings, is very particular about what it looks for, and +will not properly extract the doc string if it's not in this exact format. + +You are free to put the various arguments to @code{DEFUN} on separate lines to avoid overly long lines. 
However, make sure to put the comment-start characters for the doc string on the same line as the -interactive specification, and put a newline directly after them -(and before the comment-end characters). - - After the call to the @code{DEFUN} macro, you must write the argument -name list that every C function must have, followed by ordinary C -declarations for the arguments. For a function with a fixed maximum -number of arguments, declare a C argument for each Lisp argument, and -give them all type @code{Lisp_Object}. When a Lisp function has no -upper limit on the number of arguments, its implementation in C actually -receives exactly two arguments: the first is the number of Lisp -arguments, and the second is the address of a block containing their -values. They have types @code{int} and @w{@code{Lisp_Object *}}. - - The names of the C arguments will be used as the names of the arguments -to the Lisp primitive as displayed in its documentation, modulo the -same concerns described above for @code{F...} names (in particular, +interactive specification, and put a newline directly after them (and +before the comment-end characters). + +@item arglist +This is the comma-separated list of arguments to the C function. For a +function with a fixed maximum number of arguments, provide a C argument +for each Lisp argument. In this case, unlike regular C functions, the +types of the arguments are not declared; they are simply always of type +@code{Lisp_Object}. + +The names of the C arguments will be used as the names of the arguments +to the Lisp primitive as displayed in its documentation, modulo the same +concerns described above for @code{F...} names (in particular, underscores in the C arguments become dashes in the Lisp arguments). There is one additional kludge: A C argument called @code{defalt} becomes the Lisp argument @code{default}. This deliberate misspelling is done because @code{default} is a reserved word in the C language. 
- Note that you @emph{must} use old-style prototypes for the arguments -to @code{DEFUN}, even though all other functions in the C code use -new-style prototypes. +A Lisp function with @w{@var{max} = @code{UNEVALLED}} is a +@w{@dfn{special form}}; its arguments are not evaluated. Instead it +receives one argument of type @code{Lisp_Object}, a (Lisp) list of the +unevaluated arguments, conventionally named @code{(args)}. + +When a Lisp function has no upper limit on the number of arguments, +specify @w{@var{max} = @code{MANY}}. In this case its implementation in +C actually receives exactly two arguments: the number of Lisp arguments +(an @code{int}) and the address of a block containing their values (a +@w{@code{Lisp_Object *}}). In this case only are the C types specified +in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}. + +@end table Within the function @code{For} itself, note the use of the macros @code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect'' @@ -1851,14 +1852,14 @@ @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this. What @code{DEFUN} actually does is declare a global structure of -type @code{Lisp_Subr} whose name begins with a capital @samp{S} and +type @code{Lisp_Subr} whose name begins with capital @samp{SF} and which contains information about the primitive (e.g. a pointer to the function, its minimum and maximum allowed arguments, a string describing its Lisp name); @code{DEFUN} then begins a normal C function declaration using the @code{F...} name. The Lisp subr object that is the function definition of a primitive (i.e. the object in the function slot of the symbol that names the primitive) actually points to this -@samp{S} structure; when @code{Feval} encounters a subr, it looks in the +@samp{SF} structure; when @code{Feval} encounters a subr, it looks in the structure to find out how to call the C function. 
Defining the C function is not enough to make a Lisp primitive @@ -1868,21 +1869,21 @@ be seen by Lisp code.) The code looks like this: @example -defsubr (&@var{subr-structure-name}); +DEFSUBR (@var{fname}); @end example -@noindent -Here @var{subr-structure-name} is the name you used as the third -argument to @code{DEFUN}. - - This call to @code{defsubr} should go in the @code{syms_of_*()} +@noindent +Here @var{fname} is the name you used as the second argument to +@code{DEFUN}. + + This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function at the end of the module. If no such function exists, create it and make sure to also declare it in @file{symsinit.h} and call it from the appropriate spot in @code{main()}. @xref{General Coding Rules}. Note that C code cannot call functions by name unless they are defined -in C. The way to call a function written in Lisp is to use +in C. The way to call a function written in Lisp from C is to use @code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since the Lisp function @code{funcall} accepts an unlimited number of arguments, in C it takes two: the number of Lisp-level arguments, and a @@ -2108,7 +2109,7 @@ after @file{lastfile.c} because they contain various structures that must be statically initialized and into which Xt writes at various times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols -that are used to determine the start and end of XEmacs's initialized +that are used to determine the start and end of XEmacs' initialized data space when dumping. @@ -3092,13 +3093,13 @@ 9918 casetab.c @end example -@file{chartab.c} and @file{chartab.h} implement the char table Lisp -object type, which maps from characters or certain sorts of character -ranges to Lisp objects. The implementation of this object is optimized -for the internal representation of characters. 
Char tables come in -different types, which affect the allowed object types to which a -character can be mapped and also dictate certain other properties of the -char table. +@file{chartab.c} and @file{chartab.h} implement the @dfn{char table} +Lisp object type, which maps from characters or certain sorts of +character ranges to Lisp objects. The implementation of this object +type is optimized for the internal representation of characters. Char +tables come in different types, which affect the allowed object types to +which a character can be mapped and also dictate certain other +properties of the char table. @cindex case table @file{casetab.c} implements one sort of char table, the @dfn{case @@ -3114,13 +3115,13 @@ @end example @cindex scanner -This module implements syntax tables, another sort of char table that -maps characters into syntax classes that define the syntax of these -characters (e.g. a parenthesis belongs to a class of @samp{open} characters -that have corresponding @samp{close} characters and can be nested). -This module also implements the Lisp @dfn{scanner}, a set of primitives -for scanning over text based on syntax tables. This is used, for -example, to find the matching parenthesis in a command such as +This module implements @dfn{syntax tables}, another sort of char table +that maps characters into syntax classes that define the syntax of these +characters (e.g. a parenthesis belongs to a class of @samp{open} +characters that have corresponding @samp{close} characters and can be +nested). This module also implements the Lisp @dfn{scanner}, a set of +primitives for scanning over text based on syntax tables. This is used, +for example, to find the matching parenthesis in a command such as @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings, comments, etc. @@ -3682,9 +3683,9 @@ @file{mule-coding.*} implements the @dfn{coding-system} Lisp object type, which encapsulates a method of converting between different -encodings. 
An encoding is a representation of a stream of characters -from multiple character sets using a stream of bytes or words and -defines (e.g.) which escape sequences are used to specify particular +encodings. An encoding is a representation of a stream of characters, +possibly from multiple character sets, using a stream of bytes or words, +and defines (e.g.) which escape sequences are used to specify particular character sets, how the indices for a character are converted into bytes (sometimes this involves setting the high bit; sometimes complicated rearranging of the values takes place, as in the Shift-JIS encoding), @@ -3696,11 +3697,14 @@ @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to external programs used to implement the Canna and WNN input methods, -respectively. This is currently broken. - -@file{mule-mcpatch.c} provides some functions to allow for pathnames -containing extended characters. This code is fragmentary and completely -non-working. +respectively. This is currently in beta. + +@file{mule-mcpath.c} provides some functions to allow for pathnames +containing extended characters. This code is fragmentary, obsolete, and +completely non-working. Instead, @var{pathname-coding-system} is used +to specify conversions of names of files and directories. The standard +C I/O functions like @samp{open()} are wrapped so that conversion occurs +automatically. @file{mule.c} provides a few miscellaneous things that should probably be elsewhere. @@ -3779,7 +3783,8 @@ (a) Those for whom the value directly represents the contents of the Lisp object. Only two types are in this category: integers and characters. No special allocation or garbage collection is necessary -for such objects. +for such objects. Lisp objects of these types do not need to be +@code{GCPRO}ed. @end itemize In the remaining three categories, the value is a pointer to a @@ -3949,7 +3954,7 @@ @item Any shadowed bindings that are sitting on the specpdl stack. 
@item -Any objects sitting in currently active stack frames, +Any objects sitting in currently active (Lisp) stack frames, catches, and condition cases. @item A couple of special-case places where active objects are @@ -3998,7 +4003,7 @@ @item @strong{Strings are relocated.} What this means in practice is that the -pointer obtained using @code{string_data()} is liable to change at any +pointer obtained using @code{XSTRING_DATA()} is liable to change at any time, and you should never keep it around past any function call, or pass it as an argument to any function that might cause a garbage collection. This is why a number of functions accept either a @@ -4039,7 +4044,7 @@ @item Beware of @code{GCPRO}ing something that is uninitialized. If you have -any shade of doubt about this, initialize all your variables to Qnil. +any shade of doubt about this, initialize all your variables to @code{Qnil}. @item Be careful of traps, like calling @code{Fcons()} in the argument to @@ -4448,10 +4453,10 @@ As mentioned above, each vector is @code{malloc()}ed individually, and all are threaded through the variable @code{all_vectors}. Vectors are marked strangely during garbage collection, by kludging the size field. -Note that the @code{struct Lisp_Vector} is declared with its contents -being an array of one element. It is actually @code{malloc()}ed with -the right size, however, and access to any element through the contents -array works fine. +Note that the @code{struct Lisp_Vector} is declared with its +@code{contents} field being a @emph{stretchy} array of one element. It +is actually @code{malloc()}ed with the right size, however, and access +to any element through the @code{contents} array works fine. @node Bit Vector @section Bit Vector @@ -4934,20 +4939,22 @@ @code{next-command-event} and @code{read-char} are higher-level interfaces to @code{next-event}. @code{next-command-event} gets the -next @dfn{command} event (i.e. 
keypress, mouse event, or menu -selection), calling dispatch-event on any others. @code{read-char} -calls @code{next-command-event} and uses @code{event_to_character()} to -return the ASCII equivalent. +next @dfn{command} event (i.e. keypress, mouse event, menu selection, +or scrollbar action), calling @code{dispatch-event} on any others. +@code{read-char} calls @code{next-command-event} and uses +@code{event_to_character()} to return the character equivalent. With +the right kind of input method support, it is possible for (read-char) +to return a Kanji character. @node Converting Events @section Converting Events @code{character_to_event()}, @code{event_to_character()}, @code{event-to-character}, and @code{character-to-event} convert between -ASCII characters and keypresses corresponding to the characters. If the +characters and keypress events corresponding to the characters. If the event was not a keypress, @code{event_to_character()} returns -1 and @code{event-to-character} returns @code{nil}. These functions convert -between ASCII representation and the split-up event representation +between character representation and the split-up event representation (keysym plus mod keys). @node Dispatching Events; The Command Builder @@ -4993,21 +5000,21 @@ At this point, the function to be called is determined by looking at the car of the cons (if this is a symbol, its function definition is retrieved and the process repeated). The function should then consist -of either a Lisp_Subr (built-in function), a Lisp_Compiled object, or a -cons whose car is the symbol @code{autoload}, @code{macro}, -@code{lambda}, or @code{mocklisp}. - - If the function is a Lisp_Subr, the lisp object points to a struct -Lisp_Subr (created by @code{DEFUN()}), which contains a pointer to the C -function, a minimum and maximum number of arguments (possibly the -special constants @code{MANY} or @code{UNEVALLED}), a pointer to the -symbol referring to that subr, and a couple of other things. 
If the -subr wants its arguments @code{UNEVALLED}, they are passed raw as a -list. Otherwise, an array of evaluated arguments is created and put -into the backtrace structure, and either passed whole (@code{MANY}) or -each argument is passed as a C argument. - - If the function is a Lisp_Compiled object or a lambda, +of either a @code{Lisp_Subr} (built-in function), a +@code{Lisp_Compiled_Function} object, or a cons whose car is the symbol +@code{autoload}, @code{macro}, @code{lambda}, or @code{mocklisp}. + +If the function is a @code{Lisp_Subr}, the lisp object points to a +@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a +pointer to the C function, a minimum and maximum number of arguments +(possibly the special constants @code{MANY} or @code{UNEVALLED}), a +pointer to the symbol referring to that subr, and a couple of other +things. If the subr wants its arguments @code{UNEVALLED}, they are +passed raw as a list. Otherwise, an array of evaluated arguments is +created and put into the backtrace structure, and either passed whole +(@code{MANY}) or each argument is passed as a C argument. + + If the function is a @code{Lisp_Compiled_Function} object or a lambda, @code{apply_lambda()} is called. If the function is a macro, [..... fill in] is done. If the function is an autoload, @code{do_autoload()} is called to load the definition and then eval @@ -5027,8 +5034,8 @@ function and binds them to the actual arguments, checking for @code{&rest} and @code{&optional} symbols in the formal arguments and making sure the number of actual arguments is correct. Then either -progn or byte-code is called to actually execute the body and return a -value. +@code{progn} or @code{byte-code} is called to actually execute the body +and return a value. @code{Ffuncall()} implements Lisp @code{funcall}. 
@code{(funcall fun x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote @@ -5076,27 +5083,27 @@ @code{record_unwind_protect()} implements an @dfn{unwind-protect}, which, when placed around a section of code, ensures that some specified cleanup routine will be executed even if the code exits abnormally -(e.g. through a throw or quit). @code{record_unwind_protect()} simply -adds a new specbinding to the specpdl array and stores the appropriate -information in it. The cleanup routine can either be a C function, -which is stored in the @code{func} field, or a progn form, which is stored in -the @code{old_value} field. +(e.g. through a @code{throw} or quit). @code{record_unwind_protect()} +simply adds a new specbinding to the specpdl array and stores the +appropriate information in it. The cleanup routine can either be a C +function, which is stored in the @code{func} field, or a @code{progn} +form, which is stored in the @code{old_value} field. @code{unbind_to()} removes specbindings from the specpdl array until -the specified position is reached. The specbinding can be one of three +the specified position is reached. 
Each specbinding can be one of three types: @enumerate @item -an unwind-protect with a C cleanup function (@code{func} is not 0 -- +an unwind-protect with a C cleanup function (@code{func} is not 0, and @code{old_value} holds an argument to be passed to the function); @item -an unwind-protect with a Lisp form (@code{func} is 0 and @code{symbol} -is @code{nil} -- @code{old_value} holds the form to be executed with +an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol} +is @code{nil}, and @code{old_value} holds the form to be executed with @code{Fprogn()}); or @item -a local-variable binding (@code{func} is 0 and @code{symbol} is not -@code{nil} -- @code{old_value} holds the old value, which is stored as +a local-variable binding (@code{func} is 0, @code{symbol} is not +@code{nil}, and @code{old_value} holds the old value, which is stored as the symbol's value). @end enumerate @@ -5257,8 +5264,9 @@ you can explicitly create a symbol using @code{make-symbol}, giving it some name. The resulting symbol is not in any obarray (i.e. it is @dfn{uninterned}), and you can't add it to any obarray. Therefore its -primary purpose is as a carrier of information. (Cons cells could -probably be used just as well.) +primary purpose is as a symbol to use in macros to avoid namespace +pollution. It can also be used as a carrier of information, but cons +cells could probably be used just as well. You can also use @code{intern-soft} to look up a symbol but not create a new one, and @code{unintern} to remove a symbol from an obarray. This @@ -5330,7 +5338,7 @@ @enumerate @item -Buffers are @dfn{permanent} objects, i.e. one you create them, they +Buffers are @dfn{permanent} objects, i.e. once you create them, they remain around, and need to be explicitly deleted before they go away. @item Each buffer has a unique name, which is a string. Buffers are @@ -5364,8 +5372,8 @@ gets restored when the code is finished). 
However, calling @code{set-buffer} will NOT cause a permanent change in the current buffer. The reason for this is that the top-level event loop sets -current buffer to the buffer of the selected window, each time it -finishes executing a user command. +@code{current_buffer} to the buffer of the selected window, each time +it finishes executing a user command. @end enumerate Make sure you understand the distinction between @dfn{current buffer} @@ -5388,10 +5396,10 @@ For now, we can view a character as some non-negative integer that has some shape that defines how it typically appears (e.g. as an -uppercase A). (The exact way in which a character appears depends -on the font of the character.) The internal type of characters in -the C code is an Emchar; this is just an int, but using a symbolic -type makes the code clearer. +uppercase A). (The exact way in which a character appears depends on the +font used to display the character.) The internal type of characters in +the C code is an @code{Emchar}; this is just an @code{int}, but using a +symbolic type makes the code clearer. Between every character in a buffer is a @dfn{buffer position} or @dfn{character position}. We can speak of the character before or after @@ -5447,7 +5455,7 @@ heap, taking up virtual memory, and will not be released back to the operating system. (However, if you have compiled XEmacs with rel-alloc, the situation is different. In this case, the space @emph{will} be -released back to the operating system. However, this tends to effect a +released back to the operating system. However, this tends to result in a noticeable speed penalty.) Astute readers may notice that the text in a buffer is represented as @@ -5500,7 +5508,7 @@ @dfn{memory indices}, typedef @code{Memind} @end enumerate - All three typedefs are just ints, but defining them this way makes + All three typedefs are just @code{int}s, but defining them this way makes things a lot clearer. Most code works with buffer positions. 
In particular, all Lisp code @@ -5511,7 +5519,7 @@ @code{Bufbyte}, which is an unsigned char. Referring to them as Bufbytes underscores the fact that we are working with a string of bytes in the internal Emacs buffer representation rather than in one of a -number of possible alternative representations (e.g. EUC-coded text, +number of possible alternative representations (e.g. EUC-encoded text, etc.). @node Buffer Lists @@ -5825,33 +5833,34 @@ @node Japanese EUC (Extended Unix Code) @subsection Japanese EUC (Extended Unix Code) - This encompasses the character sets Printing-ASCII, Japanese (aka -JISX0208), and Japanese-Kana (half-width katakana, the right half of +This encompasses the character sets Printing-ASCII, Japanese-JISSX0201, +and Japanese-JISX0208-Kana (half-width katakana, the right half of JISX0201). It uses 8-bit bytes. - Note that Printing-ASCII and Japanese-Kana are 94-character charsets, -while Japanese is a 94x94-character charset. - - The encoding is as follows: +Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character +charsets, while Japanese-JISX0208 is a 94x94-character charset. + +The encoding is as follows: @example -Character set Representation (PC=position-code) -------------- -------------- -Printing-ASCII PC1 -Japanese PC1 + 0x80 | PC2 + 0x80 -Japanese-Kana 0x8E | PC1 + 0x80 +Character set Representation (PC=position-code) +------------- -------------- +Printing-ASCII PC1 +Japanese-JISX0201-Kana 0x8E | PC1 + 0x80 +Japanese-JISX0208 PC1 + 0x80 | PC2 + 0x80 +Japanese-JISX0212 PC1 + 0x80 | PC2 + 0x80 @end example @node JIS7 @subsection JIS7 - This encompasses the character sets Printing-ASCII, -Japanese-Roman (the left half of JISX0201; this character -set is very similar to Printing-ASCII and is a 94-character -charset), Japanese, and Japanese-Kana. It uses 7-bit bytes. 
- - Unlike Japanese EUC, this is a @dfn{modal} encoding, which +This encompasses the character sets Printing-ASCII, +Japanese-JISX0201-Roman (the left half of JISX0201; this character set +is very similar to Printing-ASCII and is a 94-character charset), +Japanese-JISX0208, and Japanese-JISX0201-Kana. It uses 7-bit bytes. + +Unlike Japanese EUC, this is a @dfn{modal} encoding, which means that there are multiple states that the encoding can be in, which affect how the bytes are to be interpreted. Special sequences of bytes (called @dfn{escape sequences}) @@ -5860,19 +5869,19 @@ The encoding is as follows: @example -Character set Representation (PC=position-code) -------------- -------------- -Printing-ASCII PC1 -Japanese-Roman PC1 -Japanese PC1 PC2 -Japanese-Kana PC1 +Character set Representation (PC=position-code) +------------- -------------- +Printing-ASCII PC1 +Japanese-JISX0201-Roman PC1 +Japanese-JISX0201-Kana PC1 +Japanese-JISX0208 PC1 PC2 Escape sequence ASCII equivalent Meaning --------------- ---------------- ------- -0x1B 0x28 0x4A ESC ( J invoke Japanese-Roman -0x1B 0x24 0x42 ESC $ B invoke Japanese -0x1B 0x28 0x49 ESC ( I invoke Japanese-Kana +0x1B 0x28 0x4A ESC ( J invoke Japanese-JISX0201-Roman +0x1B 0x28 0x49 ESC ( I invoke Japanese-JISX0201-Kana +0x1B 0x24 0x42 ESC $ B invoke Japanese-JISX0208 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII @end example @@ -5881,16 +5890,15 @@ @node Internal Mule Encodings @section Internal Mule Encodings - In XEmacs/Mule, each character set is assigned a unique number, -called a @dfn{leading byte}. This is used in the encodings of a -character. Leading bytes are in the range 0x80 - 0xFF -(except for ASCII, which has a leading byte of 0), although -some leading bytes are reserved. - - Charsets whose leading byte is in the range 0x80 - 0x9F are -called @dfn{official} and are used for built-in charsets. 
-Other charsets are called @dfn{private} and have leading bytes -in the range 0xA0 - 0xFF; these are user-defined charsets. +In XEmacs/Mule, each character set is assigned a unique number, called a +@dfn{leading byte}. This is used in the encodings of a character. +Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has +a leading byte of 0), although some leading bytes are reserved. + +Charsets whose leading byte is in the range 0x80 - 0x9F are called +@dfn{official} and are used for built-in charsets. Other charsets are +called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF; +these are user-defined charsets. More specifically: @@ -5909,16 +5917,16 @@ Dimension-2 Private 0xF0 - 0xFF @end example - There are two internal encodings for characters in XEmacs/Mule. One -is called @dfn{string encoding} and is an 8-bit encoding that is used -for representing characters in a buffer or string. It uses 1 to 4 bytes -per character. The other is called @dfn{character encoding} and is a -19-bit encoding that is used for representing characters individually in -a variable. - - (In the following descriptions, we'll ignore composite -characters for the moment. We also give a general (structural) -overview first, followed later by the exact details.) +There are two internal encodings for characters in XEmacs/Mule. One is +called @dfn{string encoding} and is an 8-bit encoding that is used for +representing characters in a buffer or string. It uses 1 to 4 bytes per +character. The other is called @dfn{character encoding} and is a 19-bit +encoding that is used for representing characters individually in a +variable. + +(In the following descriptions, we'll ignore composite characters for +the moment. We also give a general (structural) overview first, +followed later by the exact details.) 
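As a rough illustration of the leading-byte ranges described above, the following C sketch classifies a leading byte as ASCII, official, or private. The names `lb_class`, `classify_leading_byte`, and `check_leading_byte_ranges` are hypothetical and are not taken from the XEmacs sources. The sketch only encodes the ranges stated in the text (0 for ASCII, 0x80 - 0x9F official, 0xA0 - 0xFF private) and does not model the reserved leading bytes that the text mentions.

```c
#include <assert.h>

/* Hypothetical classification of a Mule leading byte.  These names are
   illustrative only and do not come from the XEmacs sources. */
enum lb_class
{
  LB_ASCII,     /* leading byte 0 */
  LB_OFFICIAL,  /* 0x80 - 0x9F: built-in charsets */
  LB_PRIVATE,   /* 0xA0 - 0xFF: user-defined charsets */
  LB_INVALID    /* anything outside the ranges given in the text */
};

/* Classify LB using only the ranges stated in the manual text.
   Reserved leading bytes are not treated specially here. */
enum lb_class
classify_leading_byte (unsigned int lb)
{
  if (lb == 0x00)
    return LB_ASCII;
  if (lb >= 0x80 && lb <= 0x9F)
    return LB_OFFICIAL;
  if (lb >= 0xA0 && lb <= 0xFF)
    return LB_PRIVATE;
  return LB_INVALID;
}

/* Check the boundaries of each range. */
void
check_leading_byte_ranges (void)
{
  assert (classify_leading_byte (0x00) == LB_ASCII);
  assert (classify_leading_byte (0x80) == LB_OFFICIAL);
  assert (classify_leading_byte (0x9F) == LB_OFFICIAL);
  assert (classify_leading_byte (0xA0) == LB_PRIVATE);
  assert (classify_leading_byte (0xFF) == LB_PRIVATE);
  assert (classify_leading_byte (0x41) == LB_INVALID);
}
```

A caller would pass the leading byte of a charset to `classify_leading_byte()` and branch on the result. A fuller version would presumably also reject the reserved values.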
@menu * Internal String Encoding:: @@ -5928,12 +5936,12 @@ @node Internal String Encoding @subsection Internal String Encoding - ASCII characters are encoded using their position code directly. -Other characters are encoded using their leading byte followed -by their position code(s) with the high bit set. Characters -in private character sets have their leading byte prefixed with -a @dfn{leading byte prefix}, which is either 0x9E or 0x9F. (No -character sets are ever assigned these leading bytes.) Specifically: +ASCII characters are encoded using their position code directly. Other +characters are encoded using their leading byte followed by their +position code(s) with the high bit set. Characters in private character +sets have their leading byte prefixed with a @dfn{leading byte prefix}, +which is either 0x9E or 0x9F. (No character sets are ever assigned these +leading bytes.) Specifically: @example Character set Encoding (PC=position-code, LB=leading-byte)