xemacs-beta: man/internals/internals.texi comparison

comparison man/internals/internals.texi @ 44:8d2a9b52c682 r19-15prefinal

Import from CVS: tag r19-15prefinal

author	cvs
date	Mon, 13 Aug 2007 08:55:10 +0200
parents	d620409f5eb8
children	ee648375d8d6

-:23cafc5d2038
+:8d2a9b52c682
 @c %**end of header
 @ifinfo
 Copyright @copyright{} 1992 - 1996 Ben Wing.
-Copyright @copyright{} 1996 Sun Microsystems.
+Copyright @copyright{} 1996, 1997 Sun Microsystems.
 Copyright @copyright{} 1994, 1995 Free Software Foundation.
 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
 Permission is granted to make and distribute verbatim copies of this
 @setchapternewpage odd
 @finalout
 @titlepage
 @title XEmacs Internals Manual
-@subtitle Version 1.0, March 1996
+@subtitle Version 1.1, March 1997
 @author Ben Wing
+@author Martin Buchholz
 @page
 @vskip 0pt plus 1fill
 @noindent
 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
 Copyright @copyright{} 1996 Sun Microsystems, Inc. @*
 Copyright @copyright{} 1994 Free Software Foundation. @*
 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
 @sp 2
-Version 1.0 @*
+Version 1.1 @*
-March, 1996.@*
+March, 1997.@*
 Permission is granted to make and distribute verbatim copies of this
 manual provided the copyright notice and this permission notice are
 preserved on all copies.
 XEmacs also contains a great deal of Lisp code.  This implements the
 operations that make XEmacs useful as an editor as well as just a
 Lisp environment, and also contains many add-on packages that allow
 XEmacs to browse directories, act as a mail and Usenet news reader,
-compile Lisp code, etc.  There is actually a lot more Lisp code than
+compile Lisp code, etc.  There is actually more Lisp code than
 C code associated with XEmacs, but much of the Lisp code is
 peripheral to the actual operation of the editor.  The Lisp code
 all lies in subdirectories underneath the @file{lisp/} directory.
 The @file{lwlib/} directory contains C code that implements a
 The @file{lib-src/} directory contains C code for various auxiliary
 programs that are used in connection with XEmacs.  Some of them are used
 during the build process; others are used to perform certain functions
 that cannot conveniently be placed in the XEmacs executable (e.g. the
-@file{movemail} program for fetching mail out of /var/spool/mail, which
+@file{movemail} program for fetching mail out of @file{/var/spool/mail},
-must be setgid to @file{mail} on many systems; and the 'gnuclient'
+which must be setgid to @file{mail} on many systems; and the
-program, which allows an external script to communicate with a running
+@file{gnuclient} program, which allows an external script to communicate
-XEmacs process).
+with a running XEmacs process).
 The @file{man/} directory contains the sources for the XEmacs
 documentation.  It is mostly in a form called Texinfo, which can be
-converted into either a printed document (by passing it through TeX) or
+converted into either a printed document (by passing it through @TeX{})
-into on-line documentation called @dfn{info files}.
+or into on-line documentation called @dfn{info files}.
 The @file{info/} directory contains the results of formatting the
 XEmacs documentation as @dfn{info files}, for on-line use.  These files
 are used when you enter the Info system using @kbd{C-h i} or through the
 Help menu.
 windows on the screen, and if you simply run it, it will exit
 immediately.  The Makefile runs @file{temacs} with certain options that
 cause it to initialize itself, read in a number of basic Lisp files, and
 then dump itself out into a new executable called @file{xemacs}.  This
 new executable has been pre-initialized and contains pre-digested Lisp
-code that is necessary for the editor to function (this includes some
+code that is necessary for the editor to function (this includes most
-extremely basic Lisp functions, e.g. @code{not}, that can be defined in
+basic Lisp functions, e.g. @code{not}, that can be defined in terms of
-terms of other Lisp primitives; some initialization code that is called
+other Lisp primitives; some initialization code that is called when
-when certain objects, such as frames, are created; and all of the
+certain objects, such as frames, are created; and all of the standard
-standard keybindings and code for the actions they result in).  This
+keybindings and code for the actions they result in).  This executable,
-executable, @file{xemacs}, is the executable that you run to use the
+@file{xemacs}, is the executable that you run to use the XEmacs editor.
-XEmacs editor.
 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
 @chapter XEmacs From the Inside
 Internally, XEmacs is quite complex, and can be very confusing.  To
 simplify things, it can be useful to think of XEmacs as containing an
 event loop that ``drives'' everything, and a number of other subsystems,
-such as a Lisp engine and a redisplay mechanism.  Each of these others
+such as a Lisp engine and a redisplay mechanism.  Each of these other
 subsystems exists simultaneously in XEmacs, and each has a certain
 state.  The flow of control continually passes in and out of these
 different subsystems in the course of normal operation of the editor.
 It is important to keep in mind that, most of the time, the editor is
 @item
 The buffer mechanism is responsible for keeping track of what buffers
 exist and what text is in them.  It is periodically given commands
 (usually from the user) to insert or delete text, create a buffer, etc.
-When it receives a textual-change command, it tells the redisplay
+When it receives a text-change command, it notifies the redisplay
-mechanism about this.
+mechanism.
 @item
 The redisplay mechanism is responsible for making sure that windows and
 frames are displayed correctly.  It is periodically told (by the event
 loop) to actually ``do its job'', i.e. snoop around and see what the
 these types of objects.)
 XEmacs Lisp also contains numerous specialized objects used to
 implement the editor:
-@table @asis
+@table @code
 @item buffer
 Stores text like a string, but is optimized for insertion and deletion
 and has certain other properties that can be set.
 @item frame
 An object with various properties whose displayable representation is a
 An object that describes a connection to an externally-running process.
 @end table
 There are some other, less-commonly-encountered general objects:
-@table @asis
+@table @code
 @item hashtable
 An object that maps from an arbitrary Lisp object to another arbitrary
 Lisp object, using hashing for fast lookup.
 @item obarray
 A limited form of hashtable that maps from strings to symbols; obarrays
 An object that maps from ranges of integers to arbitrary Lisp objects.
 @end table
 And some strange special-purpose objects:
-@table @asis
+@table @code
 @item charset
 @itemx coding-system
 Objects used when MULE, or multi-lingual/Asian-language, support is
 enabled.
 @item color-instance
 @example
 ?^[$(B#&^[(B
 @end example
 (where @samp{^[} actually is an @samp{ESC} character) converts to a
-particular Kanji character. (To decode this gook: @samp{ESC} begins an
+particular Kanji character when using an ISO2022-based coding system for
-escape sequence; @samp{ESC $ (} is a class of escape sequences meaning
+input. (To decode this gook: @samp{ESC} begins an escape sequence;
-``switch to a 94x94 character set''; @samp{ESC $ ( B} means ``switch to
+@samp{ESC $ (} is a class of escape sequences meaning ``switch to a
-Japanese Kanji''; @samp{#} and @samp{&} collectively index into a
+94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
-94-by-94 array of characters [subtract 33 from the ASCII value of each
+Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
-character to get the corresponding index]; @samp{ESC (} is a class of
+of characters [subtract 33 from the ASCII value of each character to get
-escape sequences meaning ``switch to a 94 character set''; @samp{ESC (B}
+the corresponding index]; @samp{ESC (} is a class of escape sequences
-means ``switch to US ASCII''.  It is a coincidence that the letter
+meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
-@samp{B} is used to denote both Japanese Kanji and US ASCII.  If the
+to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
-first @samp{B} were replaced with an @samp{A}, you'd be requesting a
+denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
-Chinese Hanzi character from the GB2312 character set.)
+replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
+from the GB2312 character set.)
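The index arithmetic described in the parenthetical above can be sketched in C.  This is only an illustration: the function names @code{sketch_iso2022_index} and @code{sketch_iso2022_offset} are hypothetical and do not appear in the XEmacs sources.

```c
#include <assert.h>

/* Hypothetical helper: map one position byte of a 94x94 character code
   to its index (0..93) by subtracting 33, as the text describes. */
int
sketch_iso2022_index (unsigned char byte)
{
  /* Position bytes run from 0x21 ('!') through 0x7E ('~'). */
  assert (byte >= 0x21 && byte <= 0x7E);
  return byte - 33;
}

/* Hypothetical helper: flatten a (row, column) byte pair into a single
   row-major offset within the 94-by-94 array. */
int
sketch_iso2022_offset (unsigned char row_byte, unsigned char col_byte)
{
  return sketch_iso2022_index (row_byte) * 94
    + sketch_iso2022_index (col_byte);
}
```

For the pair @samp{#} and @samp{&} (ASCII 35 and 38) this gives row index 2 and column index 5.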
 @example
 "foobar"
 @end example
 opposite semantics?  ``Hysterical reasons'', of course.)
 @cindex record type
 Note that there are only eight types that the tag can represent,
 but many more actual types than this.  This is handled by having
-one of the tag types specify a meta-object called a @dfn{record};
+one of the tag types specify a meta-type called a @dfn{record};
 for all such objects, the first four bytes of the pointed-to
 structure indicate what the actual type is.
 Note also that having 28 bits for pointers and integers restricts a
 lot of things to 256 megabytes of memory. (Basically, enough pointers
 (e.g. beginning at 0x80000000).  Those machines cope by defining
 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
 the proper mask.  Then, pointers retrieved from Lisp objects are
 automatically OR'ed with this value prior to being used.
-A corollary of the previous paragraph is that @strong{stack-allocated
+A corollary of the previous paragraph is that @strong{(pointers to)
-structures cannot be put into Lisp objects}.  The stack is generally
+stack-allocated structures cannot be put into Lisp objects}.  The stack
-located near the top of memory; if you put such a pointer into a Lisp
+is generally located near the top of memory; if you put such a pointer
-object, it will get its top bits chopped off, and you will lose.
+into a Lisp object, it will get its top bits chopped off, and you will
+lose.
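To see why a stack address does not survive the round trip, here is a minimal sketch that models only the 28-bit pointer field and ignores the tag bits.  The names @code{SKETCH_VALBITS}, @code{sketch_pack} and @code{sketch_unpack} are hypothetical, and the addresses used to exercise it are invented for illustration.

```c
#include <stdint.h>

/* Assumption: the pointer field is 28 bits wide, as stated above. */
#define SKETCH_VALBITS 28
#define SKETCH_VALMASK ((UINT32_C (1) << SKETCH_VALBITS) - 1)

/* Store an address in the pointer field: the top four bits are lost. */
uint32_t
sketch_pack (uint32_t address)
{
  return address & SKETCH_VALMASK;
}

/* Recover an address by OR'ing the field with a DATA_SEG_BITS-style mask. */
uint32_t
sketch_unpack (uint32_t field, uint32_t data_seg_bits)
{
  return field | data_seg_bits;
}
```

With a mask of 0x80000000, a heap address such as 0x80001000 comes back unchanged, but a stack address such as 0xBFFFF000 comes back as 0x8FFFF000 and no longer points at the original structure.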
 Various macros are used to construct Lisp objects and extract the
 components.  Macros of the form @code{XINT()}, @code{XCHAR()},
 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
 field and cast it to the appropriate type.  All of the macros that
 object is really of the correct type.  This is great for catching places
 where an incorrect type is being dereferenced -- this typically results
 in a pointer being dereferenced as the wrong type of structure, with
 unpredictable (and sometimes not easily traceable) results.
-There are similar @code{XSET()} macros that construct a Lisp object.
+There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp object.
-These macros are of the form @code{XSET (@var{lvalue}, @var{result})},
+These macros are of the form @code{XSET@var{TYPE} (@var{lvalue}, @var{result})},
 i.e. they have to be a statement rather than just used in an expression.
 The reason for this is that standard C doesn't let you ``construct'' a
 structure (but GCC does).  Granted, this sometimes isn't too convenient;
 for the case of integers, at least, you can use the function
 @code{make_number()}, which constructs and @emph{returns} an integer
-Lisp object.  Note that the @code{XSET()} macros are also affected by
+Lisp object.  Note that the @code{XSET@var{TYPE}()} macros are also
-@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the right
+affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
-type in the case of record types, where the type is contained in
+structure is of the right type in the case of record types, where the
-the structure.
+type is contained in the structure.
 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
 @chapter Rules When Writing New C Code
 The XEmacs C Code is extremely complex and intricate, and there are
 @code{vars_of_*()} function.  The former declares any Lisp primitives
 you have defined and defines any symbols you will be using.  The latter
 declares any global Lisp variables you have added and initializes global
 C variables in the module.  For each such function, declare it in
 @file{symsinit.h} and make sure it's called in the appropriate place in
-@code{main()}.  @strong{Important}: There are stringent requirements on
+@file{emacs.c}.  @strong{Important}: There are stringent requirements on
 exactly what can go into these functions.  See the comment in
-@code{main()}.  The reason for this is to avoid obscure unwanted
+@file{emacs.c}.  The reason for this is to avoid obscure unwanted
 interactions during initialization.  If you don't follow these rules,
 you'll be sorry!  If you want to do anything that isn't allowed, create
 a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
 though: You have to make sure your function is called at the right time
 so that all the initialization dependencies work out.
 Every module includes @file{<config.h>} (angle brackets so that
-@samp{--srcdir} works correctly) and @file{lisp.h}.  @file{config.h}
+@samp{--srcdir} works correctly; @file{config.h} may or may not be in
+the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
 should always be included before any other header files (including
 system header files) to ensure that certain tricks played by various
 @file{s/} and @file{m/} files work out correctly.
 @strong{All global and static variables that are to be modifiable must
 appearance.)
 @cindex garbage collection protection
 @smallexample
 @group
-DEFUN ("or", For, Sor, 0, UNEVALLED, 0 /*
+DEFUN ("or", For, 0, UNEVALLED, 0, /*
 Eval args until one of them yields non-nil, then return that value.
 The remaining args are not evalled at all.
-@end group
-@group
 If all args return nil, return nil.
-*/ )
+*/
-(args)
+(args))
-Lisp_Object args;
 @{
 /* This function can GC */
 REGISTER Lisp_Object val;
 Lisp_Object args_left;
 struct gcpro gcpro1;
-@end group
-@group
 if (NILP (args))
 return Qnil;
 args_left = args;
 GCPRO1 (args_left);
-@end group
-@group
 do
 @{
 val = Feval (Fcar (args_left));
 if (!NILP (val))
-	break;
+break;
 args_left = Fcdr (args_left);
 @}
 while (!NILP (args_left));
-@end group
-@group
 UNGCPRO;
 return val;
 @}
 @end group
 @end smallexample
 Let's start with a precise explanation of the arguments to the
 @code{DEFUN} macro.  Here is a template for them:
 @example
-DEFUN (@var{lname}, @var{fname}, @var{sname}, @var{min}, @var{max}, @var{interactive} /* @var{doc} */ )
+DEFUN (@var{lname}, @var{fname}, @var{min}, @var{max}, @var{interactive}, /*
+@var{docstring}
+*/
+(@var{arglist}) )
 @end example
 @table @var
 @item lname
-This is the name of the Lisp symbol to define as the function name; in
+This string is the name of the Lisp symbol to define as the function
-the example above, it is @code{or}.
+name; in the example above, it is @code{"or"}.
 @item fname
-This is the C function name for this function.  This is
+This is the C function name for this function.  This is the name that is
-the name that is used in C code for calling the function.  The name is,
+used in C code for calling the function.  The name is, by convention,
-by convention, @samp{F} prepended to the Lisp name, with all dashes
+@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
-(@samp{-}) in the Lisp name changed to underscores.  Thus, to call this
+Lisp name changed to underscores.  Thus, to call this function from C
-function from C code, call @code{For}.  Remember that the arguments must
+code, call @code{For}.  Remember that the arguments are of type
-be of type @code{Lisp_Object}; various macros and functions for creating
+@code{Lisp_Object}; various macros and functions for creating values of
-values of type @code{Lisp_Object} are declared in the file
+type @code{Lisp_Object} are declared in the file @file{lisp.h}.
-@file{lisp.h}.
 Primitives whose names are special characters (e.g. @code{+} or
 @code{<}) are named by spelling out, in some fashion, the special
 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
 begin with normal alphanumeric characters but also contain special
 characters are spelled out in some creative way, e.g. @code{let*}
 becomes @code{FletX()}.
-@item sname
+Each function also has an associated structure that holds the data for
-This is a C variable name to use for a structure that holds the data for
 the subr object that represents the function in Lisp.  This structure
 conveys the Lisp symbol name to the initialization routine that will
-create the symbol and store the subr object as its definition.  By
+create the symbol and store the subr object as its definition.  The C
-convention, this name is always @var{fname} with @samp{F} replaced with
+variable name of this structure is always @samp{S} prepended to the
-@samp{S}.
+@var{fname}.  You hardly ever need to be aware of the existence of this
+structure.
 @item min
 This is the minimum number of arguments that the function requires.  The
 function @code{or} allows a minimum of zero arguments.
 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
 indicating a special form that receives unevaluated arguments, or
 @code{MANY}, indicating an unlimited number of evaluated arguments (the
 equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY} are
 macros.  If @var{max} is a number, it may not be less than @var{min} and
-it may not be greater than 12. (If you need to add a function with
+it may not be greater than 8. (If you need to add a function with
-more than 12 arguments, either use the @code{MANY} form or edit the
+more than 8 arguments, either use the @code{MANY} form or edit the
 definition of @code{DEFUN} in @file{lisp.h}.  If you do the latter,
 make sure to also add another clause to the switch statement in
 @code{primitive_funcall().})
 @item interactive
 the argument of @code{interactive} in a Lisp function.  In the case of
 @code{or}, it is 0 (a null pointer), indicating that @code{or} cannot be
 called interactively.  A value of @code{""} indicates a function that
 should receive no arguments when called interactively.
-@item doc
+@item docstring
 This is the documentation string.  It is written just like a
-documentation string for a function defined in Lisp; in particular,
+documentation string for a function defined in Lisp; in particular, the
-the first line should be a single sentence.  Note how the documentation
+first line should be a single sentence.  Note how the documentation
-string is enclosed in a comment, none of the documentation is placed
+string is enclosed in a comment, none of the documentation is placed on
-on the same lines as the comment-start and comment-end characters, and
+the same lines as the comment-start and comment-end characters, and the
-the comment-start characters are on the same line as the interactive
+comment-start characters are on the same line as the interactive
 specification.  @file{make-docfile}, which scans the C files for
-documentation strings, is very particular about what it looks for,
+documentation strings, is very particular about what it looks for, and
-and will not properly note the doc string if it's not in this exact
+will not properly extract the doc string if it's not in this exact format.
-format.
-@end table
+You are free to put the various arguments to @code{DEFUN} on separate
-You are free to put the various arguments to @code{DEFUN} on separate
 lines to avoid overly long lines.  However, make sure to put the
 comment-start characters for the doc string on the same line as the
-interactive specification, and put a newline directly after them
+interactive specification, and put a newline directly after them (and
-(and before the comment-end characters).
+before the comment-end characters).
-After the call to the @code{DEFUN} macro, you must write the argument
+@item arglist
-name list that every C function must have, followed by ordinary C
+This is the comma-separated list of arguments to the C function.  For a
-declarations for the arguments.  For a function with a fixed maximum
+function with a fixed maximum number of arguments, provide a C argument
-number of arguments, declare a C argument for each Lisp argument, and
+for each Lisp argument.  In this case, unlike regular C functions, the
-give them all type @code{Lisp_Object}.  When a Lisp function has no
+types of the arguments are not declared; they are simply always of type
-upper limit on the number of arguments, its implementation in C actually
+@code{Lisp_Object}.
-receives exactly two arguments: the first is the number of Lisp
-arguments, and the second is the address of a block containing their
+The names of the C arguments will be used as the names of the arguments
-values.  They have types @code{int} and @w{@code{Lisp_Object *}}.
+to the Lisp primitive as displayed in its documentation, modulo the same
+concerns described above for @code{F...} names (in particular,
-The names of the C arguments will be used as the names of the arguments
-to the Lisp primitive as displayed in its documentation, modulo the
-same concerns described above for @code{F...} names (in particular,
 underscores in the C arguments become dashes in the Lisp arguments).
 There is one additional kludge: A C argument called @code{defalt}
 becomes the Lisp argument @code{default}.  This deliberate misspelling
 is done because @code{default} is a reserved word in the C language.
-Note that you @emph{must} use old-style prototypes for the arguments
+A Lisp function with @w{@var{max} = @code{UNEVALLED}} is a
-to @code{DEFUN}, even though all other functions in the C code use
+@w{@dfn{special form}}; its arguments are not evaluated.  Instead it
-new-style prototypes.
+receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
+unevaluated arguments, conventionally named @code{(args)}.
+When a Lisp function has no upper limit on the number of arguments,
+specify @w{@var{max} = @code{MANY}}.  In this case its implementation in
+C actually receives exactly two arguments: the number of Lisp arguments
+(an @code{int}) and the address of a block containing their values (a
+@w{@code{Lisp_Object *}}).  In this case only are the C types specified
+in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
+@end table
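The @code{MANY} calling convention can be illustrated outside of XEmacs with a stand-in type.  In this sketch @code{Lisp_Object} is simply a @code{long} (an assumption made so the example compiles on its own; the real type is defined in @file{lisp.h}), and @code{sketch_sum} is a hypothetical function, not an XEmacs primitive and not wrapped in @code{DEFUN}.

```c
/* Stand-in for the real Lisp_Object type, for illustration only. */
typedef long Lisp_Object;

/* A function in the MANY style: it receives the argument count and
   the address of a block holding the argument values. */
Lisp_Object
sketch_sum (int nargs, Lisp_Object *args)
{
  Lisp_Object total = 0;
  int i;

  /* Walk the block; nargs may legitimately be zero. */
  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```

A caller builds the block itself and passes its length, which is the same shape a C caller uses for any primitive declared with @w{@var{max} = @code{MANY}}.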
 Within the function @code{For} itself, note the use of the macros
 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
 a variable from garbage collection---to inform the garbage collector
 that it must look in that variable and regard its contents as an
 of XEmacs coding.  It is @strong{extremely} important that you get this
 right and use a great deal of discipline when writing this code.
 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
 What @code{DEFUN} actually does is declare a global structure of
-type @code{Lisp_Subr} whose name begins with a capital @samp{S} and
+type @code{Lisp_Subr} whose name begins with capital @samp{SF} and
 which contains information about the primitive (e.g. a pointer to the
 function, its minimum and maximum allowed arguments, a string describing
 its Lisp name); @code{DEFUN} then begins a normal C function
 declaration using the @code{F...} name.  The Lisp subr object that is
 the function definition of a primitive (i.e. the object in the function
 slot of the symbol that names the primitive) actually points to this
-@samp{S} structure; when @code{Feval} encounters a subr, it looks in the
+@samp{SF} structure; when @code{Feval} encounters a subr, it looks in the
 structure to find out how to call the C function.
 Defining the C function is not enough to make a Lisp primitive
 available; you must also create the Lisp symbol for the primitive (the
 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
 object in its function cell. (If you don't do this, the primitive won't
 be seen by Lisp code.) The code looks like this:
 @example
-defsubr (&@var{subr-structure-name});
+DEFSUBR (@var{fname});
 @end example
 @noindent
-Here @var{subr-structure-name} is the name you used as the third
+Here @var{fname} is the name you used as the second argument to
-argument to @code{DEFUN}.
+@code{DEFUN}.
-This call to @code{defsubr} should go in the @code{syms_of_*()}
+This call to @code{DEFSUBR} should go in the @code{syms_of_*()}
 function at the end of the module.  If no such function exists, create
 it and make sure to also declare it in @file{symsinit.h} and call it
 from the appropriate spot in @code{main()}.  @xref{General Coding
 Rules}.
 Note that C code cannot call functions by name unless they are defined
-in C.  The way to call a function written in Lisp is to use
+in C.  The way to call a function written in Lisp from C is to use
 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
 the Lisp function @code{funcall} accepts an unlimited number of
 arguments, in C it takes two: the number of Lisp-level arguments, and a
 one-dimensional array containing their values.  The first Lisp-level
 argument is the Lisp function to call, and the rest are the arguments to
 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
 all of the files that implement Xt widget classes @emph{must} be placed
 after @file{lastfile.c} because they contain various structures that
 must be statically initialized and into which Xt writes at various
 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
-that are used to determine the start and end of XEmacs's initialized
+that are used to determine the start and end of XEmacs' initialized
 data space when dumping.
 @example
 43058  chartab.c
 6503  chartab.h
 9918  casetab.c
 @end example
-@file{chartab.c} and @file{chartab.h} implement the char table Lisp
+@file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
-object type, which maps from characters or certain sorts of character
+Lisp object type, which maps from characters or certain sorts of
-ranges to Lisp objects.  The implementation of this object is optimized
+character ranges to Lisp objects.  The implementation of this object
-for the internal representation of characters.  Char tables come in
+type is optimized for the internal representation of characters.  Char
-different types, which affect the allowed object types to which a
+tables come in different types, which affect the allowed object types to
-character can be mapped and also dictate certain other properties of the
+which a character can be mapped and also dictate certain other
-char table.
+properties of the char table.
 @cindex case table
 @file{casetab.c} implements one sort of char table, the @dfn{case
 table}, which maps characters to other characters of possibly different
 case.  These are used by XEmacs to implement case-changing primitives
 49593  syntax.c
 10200  syntax.h
 @end example
 @cindex scanner
-This module implements syntax tables, another sort of char table that
+This module implements @dfn{syntax tables}, another sort of char table
-maps characters into syntax classes that define the syntax of these
+that maps characters into syntax classes that define the syntax of these
-characters (e.g. a parenthesis belongs to a class of @samp{open} characters
+characters (e.g. a parenthesis belongs to a class of @samp{open}
-that have corresponding @samp{close} characters and can be nested).
+characters that have corresponding @samp{close} characters and can be
-This module also implements the Lisp @dfn{scanner}, a set of primitives
+nested).  This module also implements the Lisp @dfn{scanner}, a set of
-for scanning over text based on syntax tables.  This is used, for
+primitives for scanning over text based on syntax tables.  This is used,
-example, to find the matching parenthesis in a command such as
+for example, to find the matching parenthesis in a command such as
 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
 comments, etc.
 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
 Kanji).
 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object
 type, which encapsulates a method of converting between different
-encodings.  An encoding is a representation of a stream of characters
+encodings.  An encoding is a representation of a stream of characters,
-from multiple character sets using a stream of bytes or words and
+possibly from multiple character sets, using a stream of bytes or words,
-defines (e.g.) which escape sequences are used to specify particular
+and defines (e.g.) which escape sequences are used to specify particular
 character sets, how the indices for a character are converted into bytes
 (sometimes this involves setting the high bit; sometimes complicated
 rearranging of the values takes place, as in the Shift-JIS encoding),
 etc.
 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
 implement converters for custom encodings.
 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
 external programs used to implement the Canna and WNN input methods,
-respectively.  This is currently broken.
+respectively.  This is currently in beta.
-@file{mule-mcpatch.c} provides some functions to allow for pathnames
+@file{mule-mcpath.c} provides some functions to allow for pathnames
-containing extended characters.  This code is fragmentary and completely
+containing extended characters.  This code is fragmentary, obsolete, and
-non-working.
+completely non-working.  Instead, @code{pathname-coding-system} is used
+to specify conversions of names of files and directories.  The standard
+C I/O functions like @samp{open()} are wrapped so that conversion occurs
+automatically.
 @file{mule.c} provides a few miscellaneous things that should probably
 be elsewhere.
 @itemize @bullet
 @item
 (a) Those for whom the value directly represents the contents of the
 Lisp object.  Only two types are in this category: integers and
 characters.  No special allocation or garbage collection is necessary
-for such objects.
+for such objects.  Lisp objects of these types do not need to be
+@code{GCPRO}ed.
 @end itemize
 In the remaining three categories, the value is a pointer to a
 structure.
 Note that @code{obarray} is one of the @code{staticpro()}d things.
 Therefore, all functions and variables get marked through this.
 @item
 Any shadowed bindings that are sitting on the specpdl stack.
 @item
-Any objects sitting in currently active stack frames,
+Any objects sitting in currently active (Lisp) stack frames,
 catches, and condition cases.
 @item
 A couple of special-case places where active objects are
 located.
 @item
 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
 the first object in the array and then set @code{gcpro@var{n}.nvars}.
 @item
 @strong{Strings are relocated.}  What this means in practice is that the
-pointer obtained using @code{string_data()} is liable to change at any
+pointer obtained using @code{XSTRING_DATA()} is liable to change at any
 time, and you should never keep it around past any function call, or
 pass it as an argument to any function that might cause a garbage
 collection.  This is why a number of functions accept either a
 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
 and only access the Lisp string's data at the very last minute.  In some
 If you have the @emph{least smidgeon of doubt} about whether
 you need to @code{GCPRO}, you should @code{GCPRO}.
 @item
 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
-any shade of doubt about this, initialize all your variables to Qnil.
+any shade of doubt about this, initialize all your variables to @code{Qnil}.
 @item
 Be careful of traps, like calling @code{Fcons()} in the argument to
 another function.  By the ``caller protects'' law, you should be
 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
 @section Vector
 As mentioned above, each vector is @code{malloc()}ed individually, and
 all are threaded through the variable @code{all_vectors}.  Vectors are
 marked strangely during garbage collection, by kludging the size field.
-Note that the @code{struct Lisp_Vector} is declared with its contents
+Note that the @code{struct Lisp_Vector} is declared with its
-being an array of one element.  It is actually @code{malloc()}ed with
+@code{contents} field being a @emph{stretchy} array of one element.  It
-the right size, however, and access to any element through the contents
+is actually @code{malloc()}ed with the right size, however, and access
-array works fine.
+to any element through the @code{contents} array works fine.
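The stretchy-array idiom described above can be sketched in C roughly as follows. This is only an illustration: @code{struct demo_vector} and @code{make_demo_vector()} are hypothetical names, not the real @code{struct Lisp_Vector} or its allocator. (C99 flexible array members are the standardized form of the same idea.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct Lisp_Vector; field names are assumptions. */
struct demo_vector {
    long size;
    long contents[1];   /* declared with one element, allocated with more */
};

/* Allocate a vector with room for n elements (n >= 1), all set to zero.
   Returns NULL if malloc() fails. */
struct demo_vector *make_demo_vector(long n)
{
    size_t bytes;
    struct demo_vector *v;
    long i;

    assert(n >= 1);
    /* Header up to the start of contents, plus n elements. */
    bytes = offsetof(struct demo_vector, contents) + (size_t) n * sizeof(long);
    v = malloc(bytes);
    if (v == NULL)
        return NULL;
    v->size = n;
    for (i = 0; i < n; i++)
        v->contents[i] = 0;
    return v;
}
```

A caller would index @code{v->contents[i]} for any @code{i} below @code{v->size} and release the whole object with a single @code{free(v)}.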
 @node Bit Vector
 @section Bit Vector
 Bit vectors work exactly like vectors, except for more complicated
 @code{command_event_queue}.  There is a comment about a ``race
 condition'', which is not a good sign.
 @code{next-command-event} and @code{read-char} are higher-level
 interfaces to @code{next-event}.  @code{next-command-event} gets the
-next @dfn{command} event (i.e.  keypress, mouse event, or menu
+next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
-selection), calling dispatch-event on any others.  @code{read-char}
+or scrollbar action), calling @code{dispatch-event} on any others.
-calls @code{next-command-event} and uses @code{event_to_character()} to
+@code{read-char} calls @code{next-command-event} and uses
-return the ASCII equivalent.
+@code{event_to_character()} to return the character equivalent.  With
+the right kind of input method support, it is possible for @code{(read-char)}
+to return a Kanji character.
 @node Converting Events
 @section Converting Events
 @code{character_to_event()}, @code{event_to_character()},
 @code{event-to-character}, and @code{character-to-event} convert between
-ASCII characters and keypresses corresponding to the characters.  If the
+characters and keypress events corresponding to the characters.  If the
 event was not a keypress, @code{event_to_character()} returns -1 and
 @code{event-to-character} returns @code{nil}.  These functions convert
-between ASCII representation and the split-up event representation
+between character representation and the split-up event representation
 (keysym plus mod keys).
 @node Dispatching Events; The Command Builder
 @section Dispatching Events; The Command Builder
 the backtrace structure is changed).
 At this point, the function to be called is determined by looking at
 the car of the cons (if this is a symbol, its function definition is
 retrieved and the process repeated).  The function should then consist
-of either a Lisp_Subr (built-in function), a Lisp_Compiled object, or a
+of either a @code{Lisp_Subr} (built-in function), a
-cons whose car is the symbol @code{autoload}, @code{macro},
+@code{Lisp_Compiled_Function} object, or a cons whose car is the symbol
-@code{lambda}, or @code{mocklisp}.
+@code{autoload}, @code{macro}, @code{lambda}, or @code{mocklisp}.
-If the function is a Lisp_Subr, the lisp object points to a struct
+If the function is a @code{Lisp_Subr}, the lisp object points to a
-Lisp_Subr (created by @code{DEFUN()}), which contains a pointer to the C
+@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
-function, a minimum and maximum number of arguments (possibly the
+pointer to the C function, a minimum and maximum number of arguments
-special constants @code{MANY} or @code{UNEVALLED}), a pointer to the
+(possibly the special constants @code{MANY} or @code{UNEVALLED}), a
-symbol referring to that subr, and a couple of other things.  If the
+pointer to the symbol referring to that subr, and a couple of other
-subr wants its arguments @code{UNEVALLED}, they are passed raw as a
+things.  If the subr wants its arguments @code{UNEVALLED}, they are
-list.  Otherwise, an array of evaluated arguments is created and put
+passed raw as a list.  Otherwise, an array of evaluated arguments is
-into the backtrace structure, and either passed whole (@code{MANY}) or
+created and put into the backtrace structure, and either passed whole
-each argument is passed as a C argument.
+(@code{MANY}) or each argument is passed as a C argument.
-If the function is a Lisp_Compiled object or a lambda,
+If the function is a @code{Lisp_Compiled_Function} object or a lambda,
 @code{apply_lambda()} is called.  If the function is a macro,
 [..... fill in] is done.  If the function is an autoload,
 @code{do_autoload()} is called to load the definition and then eval
 starts over [explain this more].  If the function is a mocklisp,
 @code{ml_apply()} is called.
 @code{funcall_lambda()} goes through the formal arguments to the
 function and binds them to the actual arguments, checking for
 @code{&rest} and @code{&optional} symbols in the formal arguments and
 making sure the number of actual arguments is correct.  Then either
-progn or byte-code is called to actually execute the body and return a
+@code{progn} or @code{byte-code} is called to actually execute the body
-value.
+and return a value.
 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
 the evaluation, however, and is almost identical to eval.
 specpdl array, and @code{specpdl_size} is increased by 1.
 @code{record_unwind_protect()} implements an @dfn{unwind-protect},
 which, when placed around a section of code, ensures that some specified
 cleanup routine will be executed even if the code exits abnormally
-(e.g. through a throw or quit).  @code{record_unwind_protect()} simply
+(e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
-adds a new specbinding to the specpdl array and stores the appropriate
+simply adds a new specbinding to the specpdl array and stores the
-information in it.  The cleanup routine can either be a C function,
+appropriate information in it.  The cleanup routine can either be a C
-which is stored in the @code{func} field, or a progn form, which is stored in
+function, which is stored in the @code{func} field, or a @code{progn}
-the @code{old_value} field.
+form, which is stored in the @code{old_value} field.
 @code{unbind_to()} removes specbindings from the specpdl array until
-the specified position is reached.  The specbinding can be one of three
+the specified position is reached.  Each specbinding can be one of three
 types:
 @enumerate
 @item
-an unwind-protect with a C cleanup function (@code{func} is not 0 --
+an unwind-protect with a C cleanup function (@code{func} is not 0, and
 @code{old_value} holds an argument to be passed to the function);
 @item
-an unwind-protect with a Lisp form (@code{func} is 0 and @code{symbol}
+an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
-is @code{nil} -- @code{old_value} holds the form to be executed with
+is @code{nil}, and @code{old_value} holds the form to be executed with
 @code{Fprogn()}); or
 @item
-a local-variable binding (@code{func} is 0 and @code{symbol} is not
+a local-variable binding (@code{func} is 0, @code{symbol} is not
-@code{nil} -- @code{old_value} holds the old value, which is stored as
+@code{nil}, and @code{old_value} holds the old value, which is stored as
 the symbol's value).
 @end enumerate
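The three cases listed above can be told apart with two tests, as this C sketch suggests. It is a simplification under stated assumptions: Lisp objects are modelled as plain pointers, @code{NULL} stands in for @code{nil}, and every @code{demo_} name is hypothetical. Only the field names @code{symbol}, @code{old_value}, and @code{func} come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a Lisp object is modelled as a pointer; NULL plays nil. */
typedef void *demo_obj;

struct demo_specbinding {
    demo_obj symbol;
    demo_obj old_value;
    demo_obj (*func)(demo_obj);
};

enum demo_kind {
    DEMO_C_UNWIND,          /* unwind-protect with a C cleanup function */
    DEMO_LISP_UNWIND,       /* unwind-protect with a Lisp form */
    DEMO_VARIABLE_BINDING   /* local-variable binding */
};

/* Hypothetical cleanup routine, present only so func can be non-null. */
demo_obj demo_cleanup(demo_obj arg)
{
    return arg;
}

/* Decide which of the three specbinding types b represents. */
enum demo_kind classify_specbinding(const struct demo_specbinding *b)
{
    assert(b != NULL);
    if (b->func != NULL)
        return DEMO_C_UNWIND;        /* old_value is the argument to func */
    if (b->symbol == NULL)
        return DEMO_LISP_UNWIND;     /* old_value is the form to execute */
    return DEMO_VARIABLE_BINDING;    /* old_value is restored into symbol */
}
```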
 @node Simple Special Forms
 @section Simple Special Forms
 Usually symbols are created by @code{intern}, but if you really want,
 you can explicitly create a symbol using @code{make-symbol}, giving it
 some name.  The resulting symbol is not in any obarray (i.e. it is
 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
-primary purpose is as a carrier of information. (Cons cells could
+primary purpose is as a symbol to use in macros to avoid namespace
-probably be used just as well.)
+pollution.  It can also be used as a carrier of information, but cons
+cells could probably be used just as well.
 You can also use @code{intern-soft} to look up a symbol but not create
 a new one, and @code{unintern} to remove a symbol from an obarray.  This
 returns the removed symbol. (Remember: You can't put the symbol back
 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
 In this, it is like a string, but a buffer is optimized for
 frequent insertion and deletion, while a string is not.  Furthermore:
 @enumerate
 @item
-Buffers are @dfn{permanent} objects, i.e. one you create them, they
+Buffers are @dfn{permanent} objects, i.e. once you create them, they
 remain around, and need to be explicitly deleted before they go away.
 @item
 Each buffer has a unique name, which is a string.  Buffers are
 normally referred to by name.  In this respect, they are like
 symbols.
 can temporarily change the current buffer using @code{set-buffer} (often
 enclosed in a @code{save-excursion} so that the former current buffer
 gets restored when the code is finished).  However, calling
 @code{set-buffer} will NOT cause a permanent change in the current
 buffer.  The reason for this is that the top-level event loop sets
-current buffer to the buffer of the selected window, each time it
+@code{current_buffer} to the buffer of the selected window, each time
-finishes executing a user command.
+it finishes executing a user command.
 @end enumerate
 Make sure you understand the distinction between @dfn{current buffer}
 and @dfn{buffer of the selected window}, and the distinction between
 @dfn{point} of the current buffer and @dfn{window-point} of the selected
 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
 characters is quite large.
 For now, we can view a character as some non-negative integer that
 has some shape that defines how it typically appears (e.g. as an
-uppercase A). (The exact way in which a character appears depends
+uppercase A). (The exact way in which a character appears depends on the
-on the font of the character.) The internal type of characters in
+font used to display the character.) The internal type of characters in
-the C code is an Emchar; this is just an int, but using a symbolic
+the C code is an @code{Emchar}; this is just an @code{int}, but using a
-type makes the code clearer.
+symbolic type makes the code clearer.
 Between every character in a buffer is a @dfn{buffer position} or
 @dfn{character position}.  We can speak of the character before or after
 a particular buffer position, and when you insert a character at a
 particular position, all characters after that position end up at new
 characters back again).  Once the buffer is killed, the memory allocated
 for the buffer text will be freed, but it will still be sitting on the
 heap, taking up virtual memory, and will not be released back to the
 operating system. (However, if you have compiled XEmacs with rel-alloc,
 the situation is different.  In this case, the space @emph{will} be
-released back to the operating system.  However, this tends to effect a
+released back to the operating system.  However, this tends to result in a
 noticeable speed penalty.)
 Astute readers may notice that the text in a buffer is represented as
 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
 @dfn{byte indices}, typedef @code{Bytind}
 @item
 @dfn{memory indices}, typedef @code{Memind}
 @end enumerate
-All three typedefs are just ints, but defining them this way makes
+All three typedefs are just @code{int}s, but defining them this way makes
 things a lot clearer.
 Most code works with buffer positions.  In particular, all Lisp code
 that refers to text in a buffer uses buffer positions.  Lisp code does
 not know that byte indices or memory indices exist.
 Finally, we have a typedef for the bytes in a buffer.  This is a
 @code{Bufbyte}, which is an unsigned char.  Referring to them as
 Bufbytes underscores the fact that we are working with a string of bytes
 in the internal Emacs buffer representation rather than in one of a
-number of possible alternative representations (e.g. EUC-coded text,
+number of possible alternative representations (e.g. EUC-encoded text,
 etc.).
 @node Buffer Lists
 @section Buffer Lists
 @end menu
 @node Japanese EUC (Extended Unix Code)
 @subsection Japanese EUC (Extended Unix Code)
-This encompasses the character sets Printing-ASCII, Japanese (aka
+This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
-JISX0208), and Japanese-Kana (half-width katakana, the right half of
+and Japanese-JISX0201-Kana (half-width katakana, the right half of
 JISX0201).  It uses 8-bit bytes.
-Note that Printing-ASCII and Japanese-Kana are 94-character charsets,
+Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
-while Japanese is a 94x94-character charset.
+charsets, while Japanese-JISX0208 is a 94x94-character charset.
 The encoding is as follows:
 @example
-Character set   Representation (PC=position-code)
+Character set            Representation (PC=position-code)
--------------   --------------
+-------------            --------------
-Printing-ASCII  PC1
+Printing-ASCII           PC1
-Japanese        PC1 + 0x80 | PC2 + 0x80
+Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
-Japanese-Kana   0x8E       | PC1 + 0x80
+Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
+Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
 @end example
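As a small worked illustration of the table, the following C sketch encodes one character for three of the listed character sets. The enum and function names are hypothetical, and position codes are assumed to lie in the usual 0x21 - 0x7E range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for three of the charsets in the table. */
enum demo_charset { DEMO_ASCII, DEMO_JISX0201_KANA, DEMO_JISX0208 };

/* Write the EUC bytes for one character into out (room for 2 bytes)
   and return the number of bytes written.  pc2 is ignored for the
   94-character charsets. */
size_t demo_euc_encode(enum demo_charset cs, unsigned pc1, unsigned pc2,
                       unsigned char *out)
{
    assert(out != NULL);
    switch (cs) {
    case DEMO_ASCII:
        out[0] = (unsigned char) pc1;
        return 1;
    case DEMO_JISX0201_KANA:
        out[0] = 0x8E;                           /* single-shift prefix */
        out[1] = (unsigned char) (pc1 + 0x80);
        return 2;
    case DEMO_JISX0208:
        out[0] = (unsigned char) (pc1 + 0x80);
        out[1] = (unsigned char) (pc2 + 0x80);
        return 2;
    }
    return 0;   /* unknown charset tag */
}
```

For instance, the JISX0208 position codes 0x24 0x22 come out as the bytes 0xA4 0xA2, and the half-width katakana position code 0x31 comes out as 0x8E 0xB1.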
 @node JIS7
 @subsection JIS7
 This encompasses the character sets Printing-ASCII,
-Japanese-Roman (the left half of JISX0201; this character
+Japanese-JISX0201-Roman (the left half of JISX0201; this character set
-set is very similar to Printing-ASCII and is a 94-character
+is very similar to Printing-ASCII and is a 94-character charset),
-charset), Japanese, and Japanese-Kana.  It uses 7-bit bytes.
+Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
 means that there are multiple states that the encoding can
 be in, which affect how the bytes are to be interpreted.
 Special sequences of bytes (called @dfn{escape sequences})
 are used to change states.
 The encoding is as follows:
 @example
-Character set     Representation (PC=position-code)
+Character set              Representation (PC=position-code)
--------------     --------------
+-------------              --------------
-Printing-ASCII    PC1
+Printing-ASCII             PC1
-Japanese-Roman    PC1
+Japanese-JISX0201-Roman    PC1
-Japanese          PC1 PC2
+Japanese-JISX0201-Kana     PC1
-Japanese-Kana     PC1
+Japanese-JISX0208          PC1 PC2
 Escape sequence   ASCII equivalent   Meaning
 ---------------   ----------------   -------
-0x1B 0x28 0x4A    ESC ( J            invoke Japanese-Roman
+0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
-0x1B 0x24 0x42    ESC $ B            invoke Japanese
+0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
-0x1B 0x28 0x49    ESC ( I            invoke Japanese-Kana
+0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
 @end example
 Initially, Printing-ASCII is invoked.
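Because JIS7 is modal, a reader has to carry the currently invoked character set along while scanning. The C sketch below counts the characters in a JIS7 byte string using only the four escape sequences in the table; the function and enum names are hypothetical, and anything else after an ESC is treated as an error.

```c
#include <assert.h>
#include <stddef.h>

/* The four states correspond to the four charsets that can be invoked. */
enum demo_jis7_state {
    DEMO_JIS7_ASCII, DEMO_JIS7_ROMAN, DEMO_JIS7_KANA, DEMO_JIS7_JISX0208
};

/* Count the characters in s[0..len-1].  Returns (size_t) -1 on an
   unrecognized escape sequence or on truncated input. */
size_t demo_jis7_count_chars(const unsigned char *s, size_t len)
{
    enum demo_jis7_state state = DEMO_JIS7_ASCII;   /* initial state */
    size_t i = 0, count = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        if (s[i] == 0x1B) {                         /* escape sequence */
            if (len - i < 3)
                return (size_t) -1;
            if (s[i + 1] == 0x28 && s[i + 2] == 0x4A)
                state = DEMO_JIS7_ROMAN;
            else if (s[i + 1] == 0x28 && s[i + 2] == 0x49)
                state = DEMO_JIS7_KANA;
            else if (s[i + 1] == 0x24 && s[i + 2] == 0x42)
                state = DEMO_JIS7_JISX0208;
            else if (s[i + 1] == 0x28 && s[i + 2] == 0x42)
                state = DEMO_JIS7_ASCII;
            else
                return (size_t) -1;
            i += 3;
        } else if (state == DEMO_JIS7_JISX0208) {   /* two bytes: PC1 PC2 */
            if (len - i < 2)
                return (size_t) -1;
            i += 2;
            count++;
        } else {                                    /* one byte: PC1 */
            i += 1;
            count++;
        }
    }
    return count;
}
```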
 @node Internal Mule Encodings
 @section Internal Mule Encodings
-In XEmacs/Mule, each character set is assigned a unique number,
+In XEmacs/Mule, each character set is assigned a unique number, called a
-called a @dfn{leading byte}.  This is used in the encodings of a
+@dfn{leading byte}.  This is used in the encodings of a character.
-character.  Leading bytes are in the range 0x80 - 0xFF
+Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
-(except for ASCII, which has a leading byte of 0), although
+a leading byte of 0), although some leading bytes are reserved.
-some leading bytes are reserved.
+Charsets whose leading byte is in the range 0x80 - 0x9F are called
-Charsets whose leading byte is in the range 0x80 - 0x9F are
+@dfn{official} and are used for built-in charsets.  Other charsets are
-called @dfn{official} and are used for built-in charsets.
+called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
-Other charsets are called @dfn{private} and have leading bytes
+these are user-defined charsets.
-in the range 0xA0 - 0xFF; these are user-defined charsets.
 More specifically:
 @example
 Character set           Leading byte
 0x9E and 0x9F are reserved)
 Dimension-1 Private     0xA0 - 0xEF
 Dimension-2 Private     0xF0 - 0xFF
 @end example
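The leading-byte ranges given in the text can be expressed as a simple classifier. The C sketch below is illustrative only; the enum and function names are hypothetical, and it does not subdivide the official range by dimension.

```c
#include <assert.h>

enum demo_lb_class {
    DEMO_LB_ASCII,       /* leading byte 0 */
    DEMO_LB_OFFICIAL,    /* 0x80 - 0x9D, built-in charsets */
    DEMO_LB_RESERVED,    /* 0x9E and 0x9F, never assigned to a charset */
    DEMO_LB_PRIVATE_1,   /* 0xA0 - 0xEF, dimension-1 private */
    DEMO_LB_PRIVATE_2,   /* 0xF0 - 0xFF, dimension-2 private */
    DEMO_LB_INVALID      /* 0x01 - 0x7F, not a leading byte */
};

/* Classify a leading byte value (0 - 0xFF) by range. */
enum demo_lb_class classify_leading_byte(unsigned lb)
{
    assert(lb <= 0xFF);
    if (lb == 0x00)
        return DEMO_LB_ASCII;
    if (lb < 0x80)
        return DEMO_LB_INVALID;
    if (lb == 0x9E || lb == 0x9F)
        return DEMO_LB_RESERVED;
    if (lb <= 0x9F)
        return DEMO_LB_OFFICIAL;
    if (lb <= 0xEF)
        return DEMO_LB_PRIVATE_1;
    return DEMO_LB_PRIVATE_2;
}
```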
-There are two internal encodings for characters in XEmacs/Mule.  One
+There are two internal encodings for characters in XEmacs/Mule.  One is
-is called @dfn{string encoding} and is an 8-bit encoding that is used
+called @dfn{string encoding} and is an 8-bit encoding that is used for
-for representing characters in a buffer or string.  It uses 1 to 4 bytes
+representing characters in a buffer or string.  It uses 1 to 4 bytes per
-per character.  The other is called @dfn{character encoding} and is a
+character.  The other is called @dfn{character encoding} and is a 19-bit
-19-bit encoding that is used for representing characters individually in
+encoding that is used for representing characters individually in a
-a variable.
+variable.
-(In the following descriptions, we'll ignore composite
+(In the following descriptions, we'll ignore composite characters for
-characters for the moment.  We also give a general (structural)
+the moment.  We also give a general (structural) overview first,
-overview first, followed later by the exact details.)
+followed later by the exact details.)
 @menu
 * Internal String Encoding::
 * Internal Character Encoding::
 @end menu
 @node Internal String Encoding
 @subsection Internal String Encoding
-ASCII characters are encoded using their position code directly.
+ASCII characters are encoded using their position code directly.  Other
-Other characters are encoded using their leading byte followed
+characters are encoded using their leading byte followed by their
-by their position code(s) with the high bit set.  Characters
+position code(s) with the high bit set.  Characters in private character
-in private character sets have their leading byte prefixed with
+sets have their leading byte prefixed with a @dfn{leading byte prefix},
-a @dfn{leading byte prefix}, which is either 0x9E or 0x9F. (No
+which is either 0x9E or 0x9F. (No character sets are ever assigned these
-character sets are ever assigned these leading bytes.) Specifically:
+leading bytes.) Specifically:
 @example
 Character set           Encoding (PC=position-code, LB=leading-byte)
 -------------           --------
 ASCII                   PC-1 |
