Mercurial > hg > xemacs-beta
changeset 2640:a4040d921acc
[xemacs-hg @ 2005-03-09 05:36:28 by stephent]
internals and lispref <871xapfkkq.fsf@tleepslib.sk.tsukuba.ac.jp>
author | stephent |
---|---|
date | Wed, 09 Mar 2005 05:36:50 +0000 |
parents | cd00e5eeb22a |
children | f7e2b977e15c |
files | man/ChangeLog man/internals/internals.texi man/lispref/mule.texi |
diffstat | 3 files changed, 430 insertions(+), 21 deletions(-) |
--- a/man/ChangeLog Wed Mar 09 04:59:31 2005 +0000 +++ b/man/ChangeLog Wed Mar 09 05:36:50 2005 +0000 @@ -1,3 +1,26 @@ +2005-01-19 Aidan Kehoe <kehoea@parhasard.net> + + * lispref/mule.texi (CCL Example): Detail an implementation of the + web's URL encoding as a CCL coding system example. + +2005-02-22 Stephen J. Turnbull <stephen@xemacs.org> + + * internals/internals.texi (The version.sh Script): New node. + (XEmacs from the Perspective of Building): + (Low-Level Modules): + (The Build Configuration System): + (Adding Configurable Features): + Add or update references to the version.sh node and/or file. + + (XEmacs from the Perspective of Building): Improve text. + + +2005-01-22 Stephen J. Turnbull <stephen@xemacs.org> + + * internals/internals.texi (XEmacs): Add XEmacs 21.4.16 to list. + (The XEmacs Split): Add comments on untrue legal factoids. + (The XEmacs Split): Add some @urefs for Jamie's commentary. + 2005-02-23 Aidan Kehoe <kehoea@parhasard.net> * lispref/searching.texi (Syntax of Regexps):
--- a/man/internals/internals.texi Wed Mar 09 04:59:31 2005 +0000 +++ b/man/internals/internals.texi Wed Mar 09 05:36:50 2005 +0000 @@ -1822,6 +1822,8 @@ @item XEmacs 21.4.15 "Security Through Obscurity" released February 2, 2004. @item +XEmacs 21.4.16 "Successful IPO" released December 5, 2004. +@item version 21.5.0 "alfalfa" released April 18, 2001. @item version 21.5.1 "anise" released May 9, 2001. @@ -1907,19 +1909,26 @@ @itemize @bullet @item -By doing so you essentially give up all control over your code. You can -no longer release your code under a different license. If you want to +By doing so you essentially give up all control over your code. You can +no longer release your code under a different license. If you want to use your code that you've contributed to the FSF in a project of your own, and that project is not released under the GPL, you are not allowed -to do this. Obviously, large companies tend to want to reuse their code -in many different projects and as a result feel very uncomfortable about -signing legal papers. +to do this. (This is supposed to be avoided by the standard assignment +contract used by the FSF, which either automatically relicenses the code +to the author for any purpose under any license, or promises to do so, +depending on the version -- stephen.) Obviously, large companies tend +to want to reuse their code in many different projects and as a result +feel very uncomfortable about signing legal papers. @item One of the dangers of assigning copyright to the FSF is that if the FSF happens to be taken over by some evil corporate identity or anyone with different ideas than RMS, they will own all copyright-assigned code, and -can revoke the GPL and enforce any license they please. If the code has -many different copyright holders, this is much less likely of a +can revoke the GPL and enforce any license they please. 
(This is false, +according to RMS; the FSF's covenants and the assignment contracts +require that it or any successors may release the code only under +copyleft. Thus, the only real loophole is if the FSF goes bankrupt, +somehow leaving the code in the public domain -- stephen.) If the code +has many different copyright holders, this is much less likely of a scenario. @end itemize @@ -2074,9 +2083,12 @@ 1993, comprise the bulk (if not the entirety) of the public discussions between the Lucid and FSF camps on why the split happened and why a merger never did. +@uref{http://www.jwz.org/doc/lemacs.html,The Lucid Emacs Split}. The current XEmacs maintainers have a much more pusillanimous summary -of this history on their XEmacs versus GNU Emacs page. +of this history on +@uref{http://www.xemacs.org/About/XEmacsVsGNUemacs.html,their XEmacs +versus GNU Emacs page}. -- jwz, 11-Feb-2000. @@ -2415,8 +2427,8 @@ This determines what the build environment is, chooses the appropriate @file{s/} and @file{m/} file, and runs a series of tests to determine many details about your environment, such as which library -functions are available and exactly how they work. The reason for -running these tests is that it allows XEmacs to be compiled on a much +functions are available and exactly how they work. +Running these tests allows XEmacs to be compiled on a much wider variety of platforms than those that the XEmacs developers happen to be familiar with, including various sorts of hybrid platforms. This is especially important now that many operating systems give you a great @@ -2425,12 +2437,17 @@ would be impossible to pre-determine and pre-specify the information for all possible configurations. -In fact, the @file{s/} and @file{m/} files are basically @emph{evil}, -since they contain unmaintainable platform-specific hard-coded -information. 
XEmacs has been moving in the direction of having all +Thus, the @file{s/} and @file{m/} files are basically @emph{evil}, +since they contain platform-specific hard-coded +information. XEmacs is moving in the direction of having all system-specific information be determined dynamically by @file{configure}. Perhaps someday we can @code{rm -rf src/s src/m}. +@file{configure} also parses the version information from +@file{version.sh} and adds it to @file{config.h} as C preprocessor +macros. These macros in turn are used to initialize some Lisp +variables, such as @samp{emacs-version}. @xref{The version.sh Script}. + When configure is done running, it generates @file{Makefile}s and @file{GNUmakefile}s and the file @file{src/config.h} (which describes the features of your system) from template files. You then run @@ -3054,6 +3071,7 @@ @file{configure} @file{config.h.in} @file{Makefile.in.in} +@file{version.sh} @end example @example @@ -3061,6 +3079,7 @@ @file{configure.in} @end example +@xref{The version.sh Script}. @xref{The configure Script}. @@ -3096,6 +3115,15 @@ @cindex modules, low-level @example +@file{version.sh} +@end example + +This is a Bourne shell script which sets version-related variables. It +is updated in the release process by the maintainer of each series or +branch, and may also be automatically updated. +@xref{The version.sh Script}. + +@example @file{config.h} @end example @@ -4138,6 +4166,7 @@ macros in @file{configure.in} and @file{configure.ac}. 
@menu +* The version.sh Script:: * Adding Configurable Features:: * The configure Script:: * The Makefile Precursors:: @@ -4145,7 +4174,98 @@ -@node Adding Configurable Features, The configure Script, The Build Configuration System, The Build Configuration System +@node The version.sh Script, Adding Configurable Features, The Build Configuration System, The Build Configuration System +@section The version.sh Script +@cindex version.sh script +@cindex scripts, version.sh + +The @file{version.sh} script is a snippet of Bourne shell script which +sets version variables. By convention, these variables are given +descriptive names, all in lower case ASCII letters, with words separated +by underscores (@samp{_}, ASCII 0x5F). They are converted to C +preprocessor macro definitions and added to @file{src/config.h} by +@file{configure}. Thus each must have a corresponding @samp{#undef} in +@file{src/config.h.in}. Each macro's name is the same as the shell +variable's, converted to all uppercase. Finally, the macros are used to +initialize Lisp variables defined in @file{src/emacs.c}. These Lisp +variables have the same name as the shell variables and preprocessor +macros, except that they obey the Lisp conventions that Lisp variable +names are all lowercase with words separated by hyphens (@samp{-}, ASCII +0x2D), while the C implementations are the same as the shell variable +with the letter @samp{V} (ASCII 0x56) prepended. + +The file is updated by various release engineers and their scripts. +Other developers should have no need to edit this file. The main +exception would be to add a branch tag and possibly other information to +@samp{xemacs_extra_name} to describe informal releases from a private +branch. In particular, @samp{xemacs_release_date} and the +@samp{emacs_*_version} variables should refer to the most recent release +in the parent branch, so ``private branch'' maintainers should not +update them. 
If the branch is significant and long-lasting, you might +enjoy assigning your own codenames. (Of course, if you have no intent +of merging your changes to the mainline, you can do what you want with +any of the variables. But in that case you should change the name of +the program, as well, in version strings and the like.) + +Regarding the syntax of the file, it is simply a sequence of shell +variable assignments. So the only thing that you can rely on is that +the shebang (the shell's interpreter comment, @code{#!/bin/sh}) will +occupy the first line of the file. You should not count on order or +other comments being preserved. On the other hand, some maintainers' +tools do depend on the order, so as much as possible your tools should +preserve the order of assignments. + +Here is a table of the currently defined variables and their meanings (as +of February 2005): + +@table @samp +@item #!/bin/sh +The shebang, making this an executable script on Unix. + +@item emacs_is_beta +Set to @samp{t} when the release is a beta test release, otherwise null. + +@item emacs_major_version +@itemx emacs_minor_version +@itemx emacs_beta_version +Strings containing decimal numbers representing the components of the +version of the source tree. The name @samp{emacs_beta_version} is a +relic of the time when XEmacs had a two component version for public +releases. Since XEmacs 21.1, both the beta series and the stable series +have three-component version numbers, and @samp{emacs_beta_version} holds +the lowest-order component of the stable series as well as the beta series. + +@item xemacs_codename +An optional string containing a codename for the release. Recent +maintainers have chosen humorous themes for their codenames, and +typically the names are used in alphabetical order. + +@item emacs_kit_version +An optional string used for special branches. (This should be +deprecated in favor of xemacs_extra_name.) 
+ +@item infodock_major_version +@itemx infodock_minor_version +@itemx infodock_build_version +Strings containing decimal numbers representing the components of the +version of the Infodock applied to the source tree. (The Infodock +project has been in hibernation since XEmacs 21.1.9 or so; these +variables are unused in current XEmacsen.) + +@item xemacs_extra_name +A string containing arbitrary additional information. If length is +positive, it is automatically added to the version string after the +codename. + +@item xemacs_release_date +A string containing the date of the latest release in the series in ISO +8601 format. The time zone should not be present, it is defined to be +UTC. Time is optional. Not currently used in the version string. +@end table + + + +@node Adding Configurable Features, The configure Script, The version.sh Script, The Build Configuration System @section Adding Configurable Features @cindex adding configurable features @cindex configurable features, adding @@ -4161,7 +4281,7 @@ @node The configure Script, The Makefile Precursors, Adding Configurable Features, The Build Configuration System @section The configure Script @cindex configure script -@cindex script, configure +@cindex scripts, configure At the heart of the XEmacs build configuration system is the @file{configure} script. This beast is maintained using the Autoconf
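The naming conventions stated in the new version.sh node (shell variable, C preprocessor macro, Lisp variable, C variable) can be sketched outside the build system. The Python snippet below is only an illustration of those rules as the node words them; the function name `related_names` is hypothetical, and it assumes every variable follows the conventions exactly, which the text does not guarantee for each entry in the table.

```python
def related_names(shell_var):
    """Derive the names the version.sh node associates with one shell variable.

    Hypothetical helper: it only applies the naming rules quoted in the node
    and does not read version.sh, config.h or emacs.c.
    """
    return {
        # Name as assigned in version.sh (lower case, underscores).
        "shell": shell_var,
        # Preprocessor macro added to src/config.h: same name, upper case.
        "c_macro": shell_var.upper(),
        # Lisp variable: lower case, words separated by hyphens.
        "lisp": shell_var.replace("_", "-"),
        # C implementation of the Lisp variable: shell name with "V" prepended.
        "c_variable": "V" + shell_var,
    }


if __name__ == "__main__":
    for kind, name in related_names("emacs_major_version").items():
        print(kind, name)
```

For `emacs_major_version` this gives `EMACS_MAJOR_VERSION`, `emacs-major-version` and `Vemacs_major_version`.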
--- a/man/lispref/mule.texi Wed Mar 09 04:59:31 2005 +0000 +++ b/man/lispref/mule.texi Wed Mar 09 05:36:50 2005 +0000 @@ -1765,7 +1765,7 @@ * CCL Statements:: Semantics of CCL statements. * CCL Expressions:: Operators and expressions in CCL. * Calling CCL:: Running CCL programs. -* CCL Examples:: The encoding functions for Big5 and KOI-8. +* CCL Example:: A trivial program to transform the Web's URL encoding. @end menu @node CCL Syntax, CCL Statements, , CCL @@ -1986,7 +1986,7 @@ Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to represent the SJIS operations in infix form. -@node Calling CCL, CCL Examples, CCL Expressions, CCL +@node Calling CCL, CCL Example, CCL Expressions, CCL @comment Node, Next, Previous, Up @subsection Calling CCL @@ -2052,11 +2052,277 @@ Resets the CCL interpreter's internal elapsed time registers. @end defun -@node CCL Examples, , Calling CCL, CCL +@node CCL Example, , Calling CCL, CCL @comment Node, Next, Previous, Up -@subsection CCL Examples - - This section is not yet written. +@subsection CCL Example + + In this section, we describe the implementation of a trivial coding +system to transform from the Web's URL encoding to XEmacs' internal +coding. Many people will have been first exposed to URL encoding when +they saw ``%20'' where they expected a space in a file's name on their +local hard disk; this can happen when a browser saves a file from the +web and doesn't encode the name, as passed from the server, properly. + + URL encoding itself is underspecified with regard to encodings beyond +ASCII. The relevant document, RFC 1738, explicitly doesn't give any +information on how to encode non-ASCII characters, and the ``obvious'' +way---use the %xx values for the octets of the eight bit MIME character +set in which the page was served---breaks when a user types a character +outside that character set. 
Best practice for web development is to +serve all pages as UTF-8 and treat incoming form data as using that +coding system. (Oh, and gamble that your clients won't ever want to +type anything outside Unicode. But that's not so much of a gamble with +today's client operating systems.) We don't treat non-ASCII in this +example, as dealing with @samp{(read-multibyte-character ...)} and +errors therewith would make it much harder to understand. + + Since CCL isn't a very rich language, we move much of the logic that +would ordinarily be computed from operations like @code{(member ..)}, +@code{(and ...)} and @code{(or ...)} into tables, from which register +values are read and written, and on which @code{if} statements are +predicated. Much more of the implementation of this coding system is +occupied with constructing these tables---in normal Emacs Lisp---than it +is with actual CCL code. + + All the @code{defvar} statements we deal with in the next few sections +are surrounded by a @code{(eval-and-compile ...)}, which means that the +logic which initializes these variables executes at compile time, and if +XEmacs loads the compiled version of the file, these variables are +initialized as constants. + +@menu +* Four bits to ASCII:: Two tables used for getting hex digits from ASCII. +* URI Encoding constants:: Useful predefined characters. +* Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL. +* Characters to be preserved:: No transformation needed for these characters. +* The program to decode to internal format:: . +* The program to encode from internal format:: . + +@end menu + +@node Four bits to ASCII, URI Encoding constants, , CCL Example +@subsubsection Four bits to ASCII + + The first @code{defvar} is for +@code{url-coding-high-order-nybble-as-ascii}, a 256-entry table that +maps from an octet's value to the ASCII encoding for the hex value of +its most significant four bits. 
That might sound complex, but it isn't; +for decimal 65, hex value @samp{#x41}, the entry in the table is the +ASCII encoding of `4'. For decimal 122, ASCII `z', hex value +@code{#x7a}, @code{(elt url-coding-high-order-nybble-as-ascii #x7a)} +after this file is loaded gives the ASCII encoding of 7. + +@example +(defvar url-coding-high-order-nybble-as-ascii + (let ((val (make-vector 256 0)) + (i 0)) + (while (< i (length val)) + (aset val i (char-int (aref (format "%02X" i) 0))) + (setq i (1+ i))) + val) + "Table to find an ASCII version of an octet's most significant 4 bits.") +@end example + + The next table, @code{url-coding-low-order-nybble-as-ascii} is almost +the same thing, but this time it has a map for the hex encoding of the +low-order four bits. So the entry at offset 65 (@samp{#x41}) is +the ASCII encoding of `1', and the entry at offset 122 +(@samp{#x7a}) is the ASCII encoding of `A'. + +@example +(defvar url-coding-low-order-nybble-as-ascii + (let ((val (make-vector 256 0)) + (i 0)) + (while (< i (length val)) + (aset val i (char-int (aref (format "%02X" i) 1))) + (setq i (1+ i))) + val) + "Table to find an ASCII version of an octet's least significant 4 bits.") +@end example + +@node URI Encoding constants, Numeric to ASCII-hexadecimal conversion, Four bits to ASCII, CCL Example +@subsubsection URI Encoding constants + + Next, we have a couple of variables that make the CCL code more +readable. The first is the ASCII encoding of the percentage sign; this +character is used as an escape code, to start the encoding of a +non-printable character. For historical reasons, URL encoding allows +the space character to be encoded as a plus sign--it does make typing +URLs like @samp{http://google.com/search?q=XEmacs+home+page} easier--and +as such, we have to check when decoding for this value, and map it to +the space character. When doing this in CCL, we use the +@code{url-coding-escaped-space-code} variable. 
+ +@example +(defvar url-coding-escape-character-code (char-int ?%) + "The code point for the percentage sign, in ASCII.") + +(defvar url-coding-escaped-space-code (char-int ?+) + "The URL-encoded value of the space character, that is, +.") +@end example + +@node Numeric to ASCII-hexadecimal conversion +@subsubsection Numeric to ASCII-hexadecimal conversion + + Now, we have a couple of utility tables that wouldn't be necessary in +a more expressive programming language than is CCL. The first is sixteen +in length, and maps a hexadecimal number to the ASCII encoding of that +number; so zero maps to ASCII `0', ten maps to ASCII `A.' The second +does the reverse; that is, it maps an ASCII character to its value when +interpreted as a hexadecimal digit. ('A' => 10, 'c' => 12, '2' => 2, as +a few examples.) + +@example +(defvar url-coding-hex-digit-table + (let ((i 0) + (val (make-vector 16 0))) + (while (< i 16) + (aset val i (char-int (aref (format "%X" i) 0))) + (setq i (1+ i))) + val) + "A map from a hexadecimal digit's numeric value to its encoding in ASCII.") + +(defvar url-coding-latin-1-as-hex-table + (let ((val (make-vector 256 0)) + (i 0)) + (while (< i (length val)) + ;; Get a hex val for this ASCII character. + (aset val i (string-to-int (format "%c" i) 16)) + (setq i (1+ i))) + val) + "A map from Latin 1 code points to their values as hexadecimal digits.") +@end example + +@node Characters to be preserved +@subsubsection Characters to be preserved + + And finally, the last of these tables. URL encoding says that +alphanumeric characters, the underscore, hyphen and the full stop +@footnote{That's what the standards call it, though my North American +readers will be more familiar with it as the period character.} retain +their ASCII encoding, and don't undergo transformation. +@code{url-coding-should-preserve-table} is an array in which the entries +are one if the corresponding ASCII character should be left as-is, and +zero if they should be transformed. 
So the entries for all the control +and most of the punctuation characters are zero. Lisp programmers will +observe that this initialization is particularly inefficient, but +they'll also be aware that this is a long way from an inner loop where +every nanosecond counts. + +@example +(defvar url-coding-should-preserve-table + (let ((preserve + (list ?- ?_ ?. ?a ?b ?c ?d ?e ?f ?g ?h ?i ?j ?k ?l ?m ?n ?o + ?p ?q ?r ?s ?t ?u ?v ?w ?x ?y ?z ?A ?B ?C ?D ?E ?F ?G + ?H ?I ?J ?K ?L ?M ?N ?O ?P ?Q ?R ?S ?T ?U ?V ?W ?X ?Y + ?Z ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9)) + (i 0) + (res (make-vector 256 0))) + (while (< i 256) + (when (member (int-char i) preserve) + (aset res i 1)) + (setq i (1+ i))) + res) + "A 256-entry array of flags, indicating whether or not to preserve an +octet as its ASCII encoding.") +@end example + +@node The program to decode to internal format +@subsubsection The program to decode to internal format + + After the almost interminable tables, we get to the CCL. The first +CCL program, @code{ccl-decode-urlcoding} decodes from the URL coding to +our internal format; since this version of CCL doesn't have support for +error checking on the input, we don't do any verification on it. + +The buffer magnification--approximate ratio of the size of the output +buffer to the size of the input buffer--is declared as one, because +fractional values aren't allowed. (Since all those %20's will map to +` ', the length of the output text will be less than that of the input +text.) + +So, first we read an octet from the input buffer into register +@samp{r0}, to set up the loop. Next, we start the loop, with a +@code{(loop ...)} statement, and we check if the value in @samp{r0} is a +percentage sign. 
(Note the comma before +@code{url-coding-escape-character-code}; since CCL is a Lisp macro +language, we can break out of the macro evaluation with a comma, and as +such, ``@code{,url-coding-escape-character-code}'' will be evaluated as a +literal `37.') + +If it is a percentage sign, we read the next two octets into @samp{r2} +and @samp{r3}, and convert them into their hexadecimal numeric values, +using the @code{url-coding-latin-1-as-hex-table} array declared above. +(But again, it'll be interpreted as a literal array.) We then left +shift the first by four bits, combine the two with a bitwise OR, and write the +result to the output buffer. + +If it isn't a percentage sign, and it is a `+' sign, we write a +space--hexadecimal 20--to the output buffer. + +If none of those things are true, we pass the octet to the output buffer +untransformed. (This could be a place to put error checking, in a more +expressive language.) We then read one more octet from the input +buffer, and move to the next iteration of the loop. + +@example +(define-ccl-program ccl-decode-urlcoding + `(1 + ((read r0) + (loop + (if (r0 == ,url-coding-escape-character-code) + ((read r2 r3) + ;; Replace r2 and r3 with the values at those offsets in + ;; url-coding-latin-1-as-hex-table. + (r2 = r2 ,url-coding-latin-1-as-hex-table) + (r3 = r3 ,url-coding-latin-1-as-hex-table) + (r2 <<= 4) + (r3 |= r2) + (write r3)) + (if (r0 == ,url-coding-escaped-space-code) + (write #x20) + (write r0))) + (read r0) + (repeat)))) + "CCL program to take URI-encoded ASCII text and transform it to our +internal encoding. ") +@end example + +@node The program to encode from internal format +@subsubsection The program to encode from internal format + + Next, we see the CCL program to encode ASCII text as URL coded text. +Here, the buffer magnification is specified as three, to account for ` ' +mapping to %20, etc. As before, we read an octet from the input into +@samp{r0}, and move into the body of the loop. 
Next, we check if we +should preserve the value of this octet, by reading from offset +@samp{r0} in the @code{url-coding-should-preserve-table} into @samp{r1}. +Then we have an @samp{if} statement predicated on the value in +@samp{r1}; for the true branch, we write the input octet directly. For +the false branch, we write a percentage sign, the ASCII encoding of the +high four bits in hex, and then the ASCII encoding of the low four bits +in hex. + +We then read an octet from the input into @samp{r0}, and repeat the loop. + +@example +(define-ccl-program ccl-encode-urlcoding + `(3 + ((read r0) + (loop + (r1 = r0 ,url-coding-should-preserve-table) + ;; If we should preserve the value, just write the octet directly. + (if r1 + (write r0) + ;; else, write a percentage sign, and the hex value of the octet, in + ;; an ASCII-friendly format. + ((write ,url-coding-escape-character-code) + (write r0 ,url-coding-high-order-nybble-as-ascii) + (write r0 ,url-coding-low-order-nybble-as-ascii))) + (read r0) + (repeat)))) + "CCL program to encode octets (almost) according to RFC 1738") +@end example @node Category Tables, Unicode Support, CCL, MULE @section Category Tables
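The table-driven approach of the CCL example can be checked outside XEmacs with a small Python analogue. This is a sketch, not a translation of the committed code: the names `HIGH_NYBBLE`, `LOW_NYBBLE`, `AS_HEX`, `PRESERVE`, `encode` and `decode` are hypothetical, non-hex octets are assumed to map to 0 in the hex-value table, and a truncated `%` escape at the end of the input is assumed to end decoding, since the text does not say what the CCL `read` does there.

```python
import string

ESCAPE = ord("%")   # escape character code, 37
PLUS = ord("+")     # URL-encoded form of the space character

# 256-entry tables: ASCII code of the high and low hex digit of each octet.
HIGH_NYBBLE = [ord(format(i, "02X")[0]) for i in range(256)]
LOW_NYBBLE = [ord(format(i, "02X")[1]) for i in range(256)]

# Value of each octet when read as one hex digit; 0 for anything else
# (assumption about how non-digits should be treated).
AS_HEX = [int(chr(i), 16) if chr(i) in string.hexdigits else 0
          for i in range(256)]

# 1 for alphanumerics, hyphen, underscore and full stop; 0 otherwise.
_KEEP = string.ascii_letters + string.digits + "-_."
PRESERVE = [1 if chr(i) in _KEEP else 0 for i in range(256)]


def decode(data):
    """Decode URL-coded octets; no validation, like the CCL program."""
    out = bytearray()
    i = 0
    while i < len(data):
        octet = data[i]
        if octet == ESCAPE:
            if i + 2 >= len(data):
                break  # truncated escape: stop (assumed behaviour)
            out.append((AS_HEX[data[i + 1]] << 4) | AS_HEX[data[i + 2]])
            i += 3
        elif octet == PLUS:
            out.append(0x20)
            i += 1
        else:
            out.append(octet)
            i += 1
    return bytes(out)


def encode(data):
    """Encode octets, escaping everything the preserve table marks as 0."""
    out = bytearray()
    for octet in data:
        if PRESERVE[octet]:
            out.append(octet)
        else:
            out += bytes([ESCAPE, HIGH_NYBBLE[octet], LOW_NYBBLE[octet]])
    return bytes(out)
```

As in the CCL encoder, a space is written as `%20` rather than `+`, so `decode` accepts a form that `encode` never produces.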