changeset 868:48eed784e93a
[xemacs-hg @ 2002-06-05 12:00:40 by ben]
To: xemacs-patches@xemacs.org
internals/internals.texi:
author | ben |
---|---|
date | Wed, 05 Jun 2002 12:01:11 +0000 |
parents | 804517e16990 |
children | a07667553efc |
files | man/ChangeLog man/internals/internals.texi src/ChangeLog src/README.global-renaming src/README.integral-types |
diffstat | 5 files changed, 565 insertions(+), 352 deletions(-) |
--- a/man/ChangeLog Wed Jun 05 09:58:45 2002 +0000
+++ b/man/ChangeLog Wed Jun 05 12:01:11 2002 +0000
@@ -1,3 +1,13 @@
+2002-06-05 Ben Wing <ben@xemacs.org>
+
+	* internals/internals.texi (Top):
+	* internals/internals.texi (The XEmacs Object System (Abstractly Speaking)):
+	* internals/internals.texi (How Lisp Objects Are Represented in C):
+	* internals/internals.texi (Major Textual Changes):
+	* internals/internals.texi (Great Integral Type Renaming):
+	* internals/internals.texi (Text/Char Type Renaming):
+	* internals/internals.texi (files): New.
+
 2002-05-04 Stephen J. Turnbull <stephen@xemacs.org>
 
 	* custom.texi (The Init File): Rewrite completely.
--- a/man/internals/internals.texi Wed Jun 05 09:58:45 2002 +0000
+++ b/man/internals/internals.texi Wed Jun 05 12:01:11 2002 +0000
@@ -116,6 +116,7 @@
 * XEmacs From the Inside::
 * The XEmacs Object System (Abstractly Speaking)::
 * How Lisp Objects Are Represented in C::
+* Major Textual Changes::
 * Rules When Writing New C Code::
 * CVS Techniques::
 * A Summary of the Various XEmacs Modules::
@@ -1759,7 +1760,7 @@
 nor do most complex objects, which contain too much state to be easily
 initialized through a read syntax.
 
-@node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
+@node How Lisp Objects Are Represented in C, Major Textual Changes, The XEmacs Object System (Abstractly Speaking), Top
 @chapter How Lisp Objects Are Represented in C
 @cindex Lisp objects are represented in C, how
 @cindex objects are represented in C, how Lisp
@@ -1846,7 +1847,335 @@
 nothing unless the corresponding configure error checking flag was
 specified.
 
-@node Rules When Writing New C Code, CVS Techniques, How Lisp Objects Are Represented in C, Top
+@node Major Textual Changes, Rules When Writing New C Code, How Lisp Objects Are Represented in C, Top
+@chapter Major Textual Changes
+@cindex textual changes, major
+@cindex major textual changes
+
+Sometimes major textual changes are made to the source. This means that
+a search-and-replace is done to change type names and such. Some people
+disagree with such changes, and certainly, if made without good reason,
+they just lead to headaches. But it's important to keep the code clean
+and understandable, and consistent naming goes a long way towards this.
+
+An example of the right way to do this was the so-called "great integral
+type renaming".
+
+@menu
+* Great Integral Type Renaming::
+* Text/Char Type Renaming::
+@end menu
+
+@node Great Integral Type Renaming
+@section Great Integral Type Renaming
+@cindex Great Integral Type Renaming
+@cindex integral type renaming, great
+@cindex type renaming, integral
+@cindex renaming, integral types
+
+The purpose of this is to rationalize the names used for various
+integral types, so that they match their intended uses and follow
+consistent conventions, and eliminate types that were not semantically
+different from each other.
+
+The conventions are:
+
+@itemize @bullet
+@item
+All integral types that measure quantities of anything are signed. Some
+people disagree vociferously with this, but their arguments are mostly
+theoretical, and are vastly outweighed by the practical headaches of
+mixing signed and unsigned values, and more importantly by the far
+greater likelihood of inadvertent bugs: Because of the broken "viral"
+nature of unsigned quantities in C (operations involving mixed
+signed/unsigned are done unsigned, when exactly the opposite is nearly
+always wanted), even a single error in declaring a quantity unsigned
+that should be signed, or the even more subtle error of comparing
+signed and unsigned values and forgetting the necessary cast, can be
+catastrophic, as comparisons will yield wrong results. -Wsign-compare
+is turned on specifically to catch this, but this tends to result in a
+great number of warnings when mixing signed and unsigned, and the casts
+are annoying. More has been written on this elsewhere.
+
+@item
+All such quantity types just mentioned boil down to EMACS_INT, which is
+32 bits on 32-bit machines and 64 bits on 64-bit machines. This is
+guaranteed to be the same size as Lisp objects of type `int', and (as
+far as I can tell) of size_t (unsigned!) and ssize_t. The only type
+below that is not an EMACS_INT is Hashcode, which is an unsigned value
+of the same size as EMACS_INT.
+
+@item
+Type names should be relatively short (no more than 10 characters or
+so), with the first letter capitalized and no underscores if they can at
+all be avoided.
+
+@item
+"count" == a zero-based measurement of some quantity. Includes sizes,
+offsets, and indexes.
+
+@item
+"bpos" == a one-based measurement of a position in a buffer. "Charbpos"
+and "Bytebpos" count text in the buffer, rather than bytes in memory;
+thus Bytebpos does not directly correspond to the memory representation.
+Use "Membpos" for this.
+
+@item
+"Char" refers to internal-format characters, not to the C type "char",
+which is really a byte.
+@end itemize
+
+For the actual name changes, see the script below.
+
+I ran the following script to do the conversion. (NOTE: This script is
+idempotent. You can safely run it multiple times and it will not screw
+up previous results -- in fact, it will do nothing if nothing has
+changed. Thus, it can be run repeatedly as necessary to handle patches
+coming in from old workspaces, or old branches.) There are two tags,
+just before and just after the change: @samp{pre-integral-type-rename}
+and @samp{post-integral-type-rename}. When merging code from the main
+trunk into a branch, the best thing to do is first merge up to
+@samp{pre-integral-type-rename}, then apply the script and associated
+changes, then merge from @samp{post-integral-type-rename} to the
+present. (Alternatively, just do the merging in one operation; but you
+may then have a lot of conflicts needing to be resolved by hand.)
+
+Script @samp{fixtypes.sh} follows:
+
+@example
+----------------------------------- cut ------------------------------------
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+gr Memory_Count Bytecount $files
+gr Lstream_Data_Count Bytecount $files
+gr Element_Count Elemcount $files
+gr Hash_Code Hashcode $files
+gr extcount bytecount $files
+gr bufpos charbpos $files
+gr bytind bytebpos $files
+gr memind membpos $files
+gr bufbyte intbyte $files
+gr Extcount Bytecount $files
+gr Bufpos Charbpos $files
+gr Bytind Bytebpos $files
+gr Memind Membpos $files
+gr Bufbyte Intbyte $files
+gr EXTCOUNT BYTECOUNT $files
+gr BUFPOS CHARBPOS $files
+gr BYTIND BYTEBPOS $files
+gr MEMIND MEMBPOS $files
+gr BUFBYTE INTBYTE $files
+gr MEMORY_COUNT BYTECOUNT $files
+gr LSTREAM_DATA_COUNT BYTECOUNT $files
+gr ELEMENT_COUNT ELEMCOUNT $files
+gr HASH_CODE HASHCODE $files
+----------------------------------- cut ------------------------------------
+@end example
+
+The @samp{gr} script, and the scripts it uses, are documented in
+@file{README.global-renaming}, because if placed in this file they would
+need to have their @@ characters doubled, meaning you couldn't easily
+cut and paste from the source.
+
+In addition to those programs, I needed to fix up a few other
+things, particularly relating to the duplicate definitions of
+types, now that some types merged with others. Specifically:
+
+@enumerate
+@item
+in lisp.h, removed duplicate declarations of Bytecount. The changed
+code should now look like this: (In each code snippet below, the first
+and last lines are the same as the original, as are all lines outside of
+those lines. That allows you to locate the section to be replaced, and
+replace the stuff in that section, verifying that there isn't anything
+new added that would need to be kept.)
+
+@example
+--------------------------------- snip -------------------------------------
+/* Counts of bytes or chars */
+typedef EMACS_INT Bytecount;
+typedef EMACS_INT Charcount;
+
+/* Counts of elements */
+typedef EMACS_INT Elemcount;
+
+/* Hash codes */
+typedef unsigned long Hashcode;
+
+/* ------------------------ dynamic arrays ------------------- */
+--------------------------------- snip -------------------------------------
+@end example
+
+@item
+in lstream.h, removed duplicate declaration of Bytecount. Rewrote the
+comment about this type. The changed code should now look like this:
+
+@example
+--------------------------------- snip -------------------------------------
+#endif
+
+/* There have been some arguments over what the type should be that
+   specifies a count of bytes in a data block to be written out or read in,
+   using Lstream_read(), Lstream_write(), and related functions.
+   Originally it was long, which worked fine; Martin "corrected" these to
+   size_t and ssize_t on the grounds that this is theoretically cleaner and
+   is in keeping with the C standards. Unfortunately, this practice is
+   horribly error-prone due to design flaws in the way that mixed
+   signed/unsigned arithmetic happens. In fact, by doing this change,
+   Martin introduced a subtle but fatal error that caused the operation of
+   sending large mail messages to the SMTP server under Windows to fail.
+   By putting all values back to be signed, avoiding any signed/unsigned
+   mixing, the bug immediately went away. The type then in use was
+   Lstream_Data_Count, so that it could be reverted cleanly if a vote came
+   to that. Now it is Bytecount.
+
+   Some earlier comments about why the type must be signed: This MUST BE
+   SIGNED, since it also is used in functions that return the number of
+   bytes actually read from or written to in an operation, and these
+   functions can return -1 to signal error.
+
+   Note that the standard Unix read() and write() functions define the
+   count going in as a size_t, which is UNSIGNED, and the count going
+   out as an ssize_t, which is SIGNED. This is a horrible design
+   flaw. Not only is it highly likely to lead to logic errors when a
+   -1 gets interpreted as a large positive number, but operations are
+   bound to fail in all sorts of horrible ways when a number in the
+   upper half of the size_t range is passed in -- this number is
+   unrepresentable as an ssize_t, so code that checks to see how many
+   bytes are actually written (which is mandatory if you are dealing
+   with certain types of devices) will get completely screwed up.
+
+   --ben
+*/
+
+typedef enum lstream_buffering
+--------------------------------- snip -------------------------------------
+@end example
+
+@item
+in dumper.c, there are four places, all inside of switch() statements,
+where XD_BYTECOUNT appears twice as a case tag. In each case, the two
+case blocks contain identical code, and you should *REMOVE THE SECOND*
+and leave the first.
+@end enumerate
+
+@node Text/Char Type Renaming
+@section Text/Char Type Renaming
+@cindex Text/Char Type Renaming
+@cindex type renaming, text/char
+@cindex renaming, text/char types
+
+The purpose of this was:
+
+@enumerate
+@item
+To distinguish between ``charptr'' when it refers to operations on
+the pointer itself and when it refers to operations on text
+@item
+To use consistent naming for everything referring to internal format, i.e.
+@end enumerate
+
+@example
+  Itext == text in internal format
+  Ibyte == a byte in such text
+  Ichar == a char as represented in internal character format
+@end example
+
+Thus, for example:
+
+@example
+  set_charptr_emchar -> set_itext_ichar
+@end example
+
+This was done using a script like this:
+
+@example
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+gr Intbyte Ibyte $files
+gr INTBYTE IBYTE $files
+gr intbyte ibyte $files
+gr EMCHAR ICHAR $files
+gr emchar ichar $files
+gr Emchar Ichar $files
+gr INC_CHARPTR INC_IBYTEPTR $files
+gr DEC_CHARPTR DEC_IBYTEPTR $files
+gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files
+gr valid_charptr valid_ibyteptr $files
+gr CHARPTR ITEXT $files
+gr charptr itext $files
+gr Charptr Itext $files
+@end example
+
+The source to @samp{gr} is in @file{README.global-renaming}.
+
+As in the integral-types change, there are pre and post tags before and
+after the change:
+
+@example
+  pre-internal-format-textual-renaming
+  post-internal-format-textual-renaming
+@end example
+
+When merging a large branch, follow the same sort of procedure
+documented above, using these tags -- essentially sync up to the pre
+tag, then apply the script yourself, then sync from the post tag to the
+present. You can probably do the same if you don't have a separate
+workspace, but do have lots of outstanding changes and you'd rather not
+just merge all the textual changes directly. Use something like this:
+
+(WARNING: I'm not a CVS guru; before trying this, or any large operation
+that might potentially mess things up, *DEFINITELY* make a backup of
+your existing workspace.)
+
+@example
+cup -r pre-internal-format-textual-renaming
+<apply script>
+cup -A -j post-internal-format-textual-renaming -j HEAD
+@end example
+
+This might also work:
+
+@example
+cup -j pre-internal-format-textual-renaming
+<apply script>
+cup -j post-internal-format-textual-renaming -j HEAD
+@end example
+
+ben
+
+The following is a script to go in the opposite direction:
+
+@example
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+
+# Evidently Perl considers _ to be a word char as far as \b is concerned,
+# even though XEmacs doesn't. We need to be careful here with ibyte/ichar
+# because of words like Richard, eicharlen(), multibyte, HIBYTE, etc.
+
+gr Ibyte Intbyte $files
+gr '\bIBYTEPTR' CHARPTR $files
+gr '\bibyteptr' charptr $files
+gr '\bIBYTE' INTBYTE $files
+gr '\bibyte' intbyte $files
+gr '\bICHAR' EMCHAR $files
+gr '\bichar' emchar $files
+gr '\bIchar' Emchar $files
+gr '\bITEXT' CHARPTR $files
+gr '\bitext' charptr $files
+gr '\bItext' Charptr $files
+
+gr '_IBYTEPTR' _CHARPTR $files
+gr '_ibyteptr' _charptr $files
+gr '_IBYTE' _INTBYTE $files
+gr '_ibyte' _intbyte $files
+gr '_ICHAR' _EMCHAR $files
+gr '_ichar' _emchar $files
+gr '_Ichar' _Emchar $files
+gr '_ITEXT' _CHARPTR $files
+gr '_itext' _charptr $files
+gr '_Itext' _Charptr $files
+@end example
+
+@node Rules When Writing New C Code, CVS Techniques, Major Textual Changes, Top
 @chapter Rules When Writing New C Code
 @cindex writing new C code, rules when
 @cindex C code, rules when writing new
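The Internals Manual section above argues that every quantity type should be signed because C performs mixed signed/unsigned operations as unsigned. The minimal C sketch below illustrates that claim under stated assumptions. `Bytecount` is typedef'd to `long` purely as a stand-in for the EMACS_INT-based typedef, whose real width is chosen by configure. `mixed_less`, `signed_less`, `error_as_size`, `size_fits_bytecount`, and `to_bytecount` are hypothetical helpers, not XEmacs functions. On the common ILP32, LP64, and LLP64 data models, comparing a negative `long` with a `size_t` converts the signed operand to unsigned, so -1 does not compare as less than 10.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the XEmacs typedef (really based on EMACS_INT, whose
   width is chosen at configure time).  Assumed to be long here. */
typedef long Bytecount;

/* Mixed comparison: the usual arithmetic conversions turn OFFSET into an
   unsigned value, so a negative OFFSET becomes a huge number.
   gcc and clang flag this line with -Wsign-compare. */
static bool mixed_less (Bytecount offset, size_t len)
{
  return offset < len;
}

/* All-signed comparison, as the naming convention requires. */
static bool signed_less (Bytecount offset, Bytecount len)
{
  return offset < len;
}

/* What a -1 error return turns into when stored in an unsigned size:
   conversion to an unsigned type wraps it to SIZE_MAX. */
static size_t error_as_size (void)
{
  Bytecount err = -1;
  return (size_t) err;
}

/* The upper half of the size_t range does not fit in a signed count,
   so check before converting back. */
static bool size_fits_bytecount (size_t n)
{
  return n <= (size_t) LONG_MAX;
}

static Bytecount to_bytecount (size_t n)
{
  assert (size_fits_bytecount (n));
  return (Bytecount) n;
}
```

With both operands signed, as in `signed_less`, the comparison gives the mathematically expected answer with no casts. That practical point is the core of the argument for signed counts.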
--- a/src/ChangeLog Wed Jun 05 09:58:45 2002 +0000
+++ b/src/ChangeLog Wed Jun 05 12:01:11 2002 +0000
@@ -1,3 +1,15 @@
+2002-06-05 Ben Wing <ben@xemacs.org>
+
+	* README.integral-types: Removed.
+	* README.global-renaming: Added.
+
+	Stuff specific to the integral types rename was moved to the
+	Internals Manual. The general scripts, suitable for any type
+	of global search-and-replace, were moved to README.global-renaming.
+	(In the internals manual, they need to be munged by replacing @
+	with @@, and this precludes just cutting and pasting from the source
+	file, which is what people are naturally going to do.)
+
 2002-06-05 Ben Wing <ben@xemacs.org>
 
 	* abbrev.c (abbrev_match_mapper):
@@ -6054,7 +6066,12 @@
 	* dumper.c: remove duplicate case tag XD_BYTECOUNT, and the
 	accompanying duplicate code, from 4 switch statements.
 
-	See README.integral-types in this directory for more details.
+	[[See README.integral-types in this directory for more
+	details.]] --invalid.
+
+	See the Internals Manual, under Major Textual Changes, and also
+	README.global-renaming.
+
 
 2001-09-17 Ben Wing <ben@xemacs.org>
 
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/src/README.global-renaming Wed Jun 05 12:01:11 2002 +0000
@@ -0,0 +1,206 @@
+README.global-renaming
+
+This file documents the generic scripts that have been used to implement
+the recent type renamings, e.g. the "great integral type renaming" and the
+"text/char type renaming". More information about these changes can be
+found in the Internals manual.
+
+A sample script to do such renaming is this (used in the great integral
+type renaming):
+
+----------------------------------- cut ------------------------------------
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+gr Memory_Count Bytecount $files
+gr Lstream_Data_Count Bytecount $files
+gr Element_Count Elemcount $files
+gr Hash_Code Hashcode $files
+gr extcount bytecount $files
+gr bufpos charbpos $files
+gr bytind bytebpos $files
+gr memind membpos $files
+gr bufbyte intbyte $files
+gr Extcount Bytecount $files
+gr Bufpos Charbpos $files
+gr Bytind Bytebpos $files
+gr Memind Membpos $files
+gr Bufbyte Intbyte $files
+gr EXTCOUNT BYTECOUNT $files
+gr BUFPOS CHARBPOS $files
+gr BYTIND BYTEBPOS $files
+gr MEMIND MEMBPOS $files
+gr BUFBYTE INTBYTE $files
+gr MEMORY_COUNT BYTECOUNT $files
+gr LSTREAM_DATA_COUNT BYTECOUNT $files
+gr ELEMENT_COUNT ELEMCOUNT $files
+gr HASH_CODE HASHCODE $files
+----------------------------------- cut ------------------------------------
+
+
+The sample script above (`fixtypes.sh') is a Bourne-shell script; it uses `gr':
+
+
+----------------------------------- cut ------------------------------------
+#!/bin/sh
+
+# Usage is like this:
+
+# gr FROM TO FILES ...
+
+# globally replace FROM with TO in FILES. FROM and TO are regular expressions.
+# backup files are stored in a `backup.orig' subdirectory next to each file.
+from="$1"
+to="$2"
+shift 2
+echo ${1+"$@"} | xargs global-replace "s/$from/$to/g"
+----------------------------------- cut ------------------------------------
+
+
+`gr' in turn uses a Perl script to do its real work, `global-replace',
+which follows:
+
+
+----------------------------------- cut ------------------------------------
+: #-*- Perl -*-
+
+### global-replace --- modify the contents of a file by a Perl expression
+
+## Copyright (C) 1999 Martin Buchholz.
+## Copyright (C) 2001, 2002 Ben Wing.
+
+## Authors: Martin Buchholz <martin@xemacs.org>, Ben Wing <ben@xemacs.org>
+## Maintainer: Ben Wing <ben@xemacs.org>
+## Current Version: 1.2, March 12, 2002
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with XEmacs; see the file COPYING. If not, write to the Free
+# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+# 02111-1307, USA.
+
+eval 'exec perl -w -S $0 ${1+"$@"}'
+    if 0;
+
+use strict;
+use FileHandle;
+use Carp;
+use Getopt::Long;
+use File::Basename;
+
+(my $myName = $0) =~ s@.*/@@; my $usage="
+Usage: $myName [--help] [--backup-dir=DIR] [--line-mode] [--hunk-mode]
+       PERLEXPR FILE ...
+
+Globally modify a file, either line by line or in one big hunk.
+
+Typical usage is like this:
+
+[with GNU find, GNU xargs: guaranteed to handle spaces, quotes, etc.
+ in file names]
+
+find . -name '*.[ch]' -print0 | xargs -0 $0 's/\\bCONST\\b/const/g'\n
+
+[with non-GNU find, xargs]
+
+find . -name '*.[ch]' -print | xargs $0 's/\\bCONST\\b/const/g'\n
+
+
+The file is read in, either line by line (with --line-mode specified)
+or in one big hunk (with --hunk-mode specified; it's the default), and
+the Perl expression is then evalled with \$_ set to the line or hunk of
+text, including the terminating newline if there is one. It should
+destructively modify the value there, storing the changed result in \$_.
+
+Files in which any modifications are made are backed up to the directory
+specified using --backup-dir, or to `backup.orig' by default. To disable
+this, use --backup-dir= with no argument.
+
+Hunk mode is the default because it is MUCH MUCH faster than line-by-line.
+Use line-by-line only when it matters, e.g. you want to do a replacement
+only once per line (the default without the `g' argument). Conversely,
+when using hunk mode, *ALWAYS* use `g'; otherwise, you will only make one
+replacement in the entire file!
+";
+
+my %options = ();
+$Getopt::Long::ignorecase = 0;
+&GetOptions (
+  \%options,
+  'help', 'backup-dir=s', 'line-mode', 'hunk-mode',
+);
+
+
+die $usage if $options{"help"} or @ARGV <= 1;
+my $code = shift;
+
+die $usage if grep (-d || ! -w, @ARGV);
+
+sub SafeOpen {
+  my $fh = new FileHandle;
+  open ($fh, $_[0]) or confess "Can't open $_[0]: $!";
+  return $fh;
+}
+
+sub SafeClose {
+  close $_[0] or confess "Can't close $_[0]: $!";
+}
+
+sub FileContents {
+  my $fh = SafeOpen ("< $_[0]");
+  my $olddollarslash = $/;
+  local $/ = undef;
+  my $contents = <$fh>;
+  $/ = $olddollarslash;
+  return $contents;
+}
+
+sub WriteStringToFile {
+  my $fh = SafeOpen ("> $_[0]");
+  binmode $fh;
+  print $fh $_[1] or confess "$_[0]: $!\n";
+  SafeClose $fh;
+}
+
+foreach my $file (@ARGV) {
+  my $changed_p = 0;
+  my $new_contents = "";
+  if ($options{"line-mode"}) {
+    my $fh = SafeOpen $file;
+    while (<$fh>) {
+      my $save_line = $_;
+      eval $code;
+      $changed_p = 1 if $save_line ne $_;
+      $new_contents .= $_;
+    }
+  } else {
+    my $orig_contents = $_ = FileContents $file;
+    eval $code;
+    if ($_ ne $orig_contents) {
+      $changed_p = 1;
+      $new_contents = $_;
+    }
+  }
+
+  if ($changed_p) {
+    my $backdir = $options{"backup-dir"};
+    $backdir = "backup.orig" if !defined ($backdir);
+    if ($backdir) {
+      my ($name, $path, $suffix) = fileparse ($file, "");
+      my $backfulldir = $path . $backdir;
+      my $backfile = "$backfulldir/$name";
+      mkdir $backfulldir, 0755 unless -d $backfulldir;
+      print "modifying $file (original saved in $backfile)\n";
+      rename $file, $backfile;
+    }
+    WriteStringToFile ($file, $new_contents);
+  }
+}
+----------------------------------- cut ------------------------------------
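Scripts such as `fixtypes.sh` run a sequence of `gr` commands, and each substitution sees the output of the previous one. The C sketch below models that under a simplifying assumption. It uses literal (non-regex) patterns only, while the real `gr` passes Perl regular expressions to `global-replace`. `struct rule`, `replace_all`, and `apply_rules` are hypothetical names. The sketch illustrates two properties. First, a rule list in which no replacement can create a fresh match for any rule can be re-run without further effect (the idempotence the Internals Manual claims for `fixtypes.sh`). Second, when one pattern is contained in another (`_IBYTE` inside `_IBYTEPTR`), the longer pattern must run first, or the shorter rule consumes its matches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One literal rename rule, like a single `gr FROM TO' line,
   but without regular expressions. */
struct rule
{
  const char *from;
  const char *to;
};

/* Return a malloc'd copy of S with every non-overlapping occurrence of
   FROM replaced by TO, scanning left to right like s/FROM/TO/g does for
   a literal pattern.  Returns NULL if allocation fails. */
static char *replace_all (const char *s, const char *from, const char *to)
{
  size_t flen = strlen (from), tlen = strlen (to), count = 0;
  const char *p;
  char *out, *q;

  assert (flen > 0);
  for (p = strstr (s, from); p; p = strstr (p + flen, from))
    count++;

  /* strlen (s) >= count * flen, so this subtraction cannot wrap. */
  out = malloc (strlen (s) - count * flen + count * tlen + 1);
  if (!out)
    return NULL;

  q = out;
  while ((p = strstr (s, from)) != NULL)
    {
      memcpy (q, s, (size_t) (p - s));  /* text before the match */
      q += p - s;
      memcpy (q, to, tlen);             /* the replacement */
      q += tlen;
      s = p + flen;                     /* resume after the match */
    }
  strcpy (q, s);                        /* tail plus terminating NUL */
  return out;
}

/* Apply RULES in order, each to the previous result, the way a script
   of successive gr commands does.  Returns a malloc'd string or NULL. */
static char *apply_rules (const char *s, const struct rule *rules, size_t n)
{
  size_t i;
  char *cur = malloc (strlen (s) + 1);

  if (!cur)
    return NULL;
  strcpy (cur, s);
  for (i = 0; i < n; i++)
    {
      char *next = replace_all (cur, rules[i].from, rules[i].to);
      free (cur);
      if (!next)
        return NULL;
      cur = next;
    }
  return cur;
}
```

For example, applying `{"_IBYTE", "_INTBYTE"}` before `{"_IBYTEPTR", "_CHARPTR"}` turns `INC_IBYTEPTR` into `INC_INTBYTEPTR`, and the second rule then never matches. Swapping the two rules yields `INC_CHARPTR`. The sketch uses `size_t` only because `strlen` and `malloc` do. The signed-count convention applies to XEmacs's own quantity types.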
--- a/src/README.integral-types Wed Jun 05 09:58:45 2002 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,349 +0,0 @@
-README.integral-types
-
-The great integral types renaming.
-
-#### The content of this file was originally posted as a ChangeLog and
-should be moved to the Internals manual.
-
-The purpose of this is to rationalize the names used for various
-integral types, so that they match their intended uses and follow
-consist conventions, and eliminate types that were not semantically
-different from each other.
-
-The conventions are:
-
--- All integral types that measure quantities of anything are
-   signed. Some people disagree vociferously with this, but their
-   arguments are mostly theoretical, and are vastly outweighed by
-   the practical headaches of mixing signed and unsigned values,
-   and more importantly by the far increased likelihood of
-   inadvertent bugs: Because of the broken "viral" nature of
-   unsigned quantities in C (operations involving mixed
-   signed/unsigned are done unsigned, when exactly the opposite is
-   nearly always wanted), even a single error in declaring a
-   quantity unsigned that should be signed, or even the even more
-   subtle error of comparing signed and unsigned values and
-   forgetting the necessary cast, can be catastrophic, as
-   comparisons will yield wrong results. -Wsign-compare is turned
-   on specifically to catch this, but this tends to result in a
-   great number of warnings when mixing signed and unsigned, and
-   the casts are annoying. More has been written on this
-   elsewhere.
-
--- All such quantity types just mentioned boil down to EMACS_INT,
-   which is 32 bits on 32-bit machines and 64 bits on 64-bit
-   machines. This is guaranteed to be the same size as Lisp
-   objects of type `int', and (as far as I can tell) of size_t
-   (unsigned!) and ssize_t. The only type below that is not an
-   EMACS_INT is Hashcode, which is an unsigned value of the same
-   size as EMACS_INT.
-
--- Type names should be relatively short (no more than 10
-   characters or so), with the first letter capitalized and no
-   underscores if they can at all be avoided.
-
--- "count" == a zero-based measurement of some quantity. Includes
-   sizes, offsets, and indexes.
-
--- "bpos" == a one-based measurement of a position in a buffer.
-   "Charbpos" and "Bytebpos" count text in the buffer, rather than
-   bytes in memory; thus Bytebpos does not directly correspond to
-   the memory representation. Use "Membpos" for this.
-
--- "Char" refers to internal-format characters, not to the C type
-   "char", which is really a byte.
-
--- For the actual name changes, see the script below.
-
-I ran the following script to do the conversion. (NOTE: This script
-is idempotent. You can safely run it multiple times and it will
-not screw up previous results -- in fact, it will do nothing if
-nothing has changed. Thus, it can be run repeatedly as necessary
-to handle patches coming in from old workspaces, or old branches.)
-There are two tags, just before and just after the change:
-`pre-integral-type-rename' and `post-integral-type-rename'. When
-merging code from the main trunk into a branch, the best thing to
-do is first merge up to `pre-integral-type-rename', then apply the
-script and associated changes, then merge from
-`post-integral-type-change' to the present. (Alternatively, just do
-the merging in one operation; but you may then have a lot of
-conflicts needing to be resolved by hand.)
-
-Script `fixtypes.sh' follows:
-
-
------------------------------------- cut ------------------------------------
-files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
-gr Memory_Count Bytecount $files
-gr Lstream_Data_Count Bytecount $files
-gr Element_Count Elemcount $files
-gr Hash_Code Hashcode $files
-gr extcount bytecount $files
-gr bufpos charbpos $files
-gr bytind bytebpos $files
-gr memind membpos $files
-gr bufbyte intbyte $files
-gr Extcount Bytecount $files
-gr Bufpos Charbpos $files
-gr Bytind Bytebpos $files
-gr Memind Membpos $files
-gr Bufbyte Intbyte $files
-gr EXTCOUNT BYTECOUNT $files
-gr BUFPOS CHARBPOS $files
-gr BYTIND BYTEBPOS $files
-gr MEMIND MEMBPOS $files
-gr BUFBYTE INTBYTE $files
-gr MEMORY_COUNT BYTECOUNT $files
-gr LSTREAM_DATA_COUNT BYTECOUNT $files
-gr ELEMENT_COUNT ELEMCOUNT $files
-gr HASH_CODE HASHCODE $files
------------------------------------- cut ------------------------------------
-
-
- `fixtypes.sh' is a Bourne-shell script; it uses 'gr':
-
-
------------------------------------- cut ------------------------------------
-#!/bin/sh
-
-# Usage is like this:
-
-# gr FROM TO FILES ...
-
-# globally replace FROM with TO in FILES. FROM and TO are regular expressions.
-# backup files are stored in the `backup' directory.
-from="$1"
-to="$2"
-shift 2
-echo ${1+"$@"} | xargs global-replace "s/$from/$to/g"
------------------------------------- cut ------------------------------------
-
-
- `gr' in turn uses a Perl script to do its real work,
- `global-replace', which follows:
-
-
------------------------------------- cut ------------------------------------
-: #-*- Perl -*-
-
-### global-modify --- modify the contents of a file by a Perl expression
-
-## Copyright (C) 1999 Martin Buchholz.
-## Copyright (C) 2001 Ben Wing.
-
-## Authors: Martin Buchholz <martin@xemacs.org>, Ben Wing <ben@xemacs.org>
-## Maintainer: Ben Wing <ben@xemacs.org>
-## Current Version: 1.0, May 5, 2001
-
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2, or (at your option)
-# any later version.
-#
-# This program is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with XEmacs; see the file COPYING. If not, write to the Free
-# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
-# 02111-1307, USA.
-
-eval 'exec perl -w -S $0 ${1+"$@"}'
-    if 0;
-
-use strict;
-use FileHandle;
-use Carp;
-use Getopt::Long;
-use File::Basename;
-
-(my $myName = $0) =~ s@.*/@@; my $usage="
-Usage: $myName [--help] [--backup-dir=DIR] [--line-mode] [--hunk-mode]
-       PERLEXPR FILE ...
-
-Globally modify a file, either line by line or in one big hunk.
-
-Typical usage is like this:
-
-[with GNU print, GNU xargs: guaranteed to handle spaces, quotes, etc.
- in file names]
-
-find . -name '*.[ch]' -print0 | xargs -0 $0 's/\bCONST\b/const/g'\n
-
-[with non-GNU print, xargs]
-
-find . -name '*.[ch]' -print | xargs $0 's/\bCONST\b/const/g'\n
-
-
-The file is read in, either line by line (with --line-mode specified)
-or in one big hunk (with --hunk-mode specified; it's the default), and
-the Perl expression is then evalled with \$_ set to the line or hunk of
-text, including the terminating newline if there is one. It should
-destructively modify the value there, storing the changed result in \$_.
-
-Files in which any modifications are made are backed up to the directory
-specified using --backup-dir, or to `backup' by default. To disable this,
-use --backup-dir= with no argument.
-
-Hunk mode is the default because it is MUCH MUCH faster than line-by-line.
-Use line-by-line only when it matters, e.g. you want to do a replacement
-only once per line (the default without the `g' argument). Conversely,
-when using hunk mode, *ALWAYS* use `g'; otherwise, you will only make one
-replacement in the entire file!
-";
-
-my %options = ();
-$Getopt::Long::ignorecase = 0;
-&GetOptions (
-  \%options,
-  'help', 'backup-dir=s', 'line-mode', 'hunk-mode',
-);
-
-
-die $usage if $options{"help"} or @ARGV <= 1;
-my $code = shift;
-
-die $usage if grep (-d || ! -w, @ARGV);
-
-sub SafeOpen {
-  open ((my $fh = new FileHandle), $_[0]);
-  confess "Can't open $_[0]: $!" if ! defined $fh;
-  return $fh;
-}
-
-sub SafeClose {
-  close $_[0] or confess "Can't close $_[0]: $!";
-}
-
-sub FileContents {
-  my $fh = SafeOpen ("< $_[0]");
-  my $olddollarslash = $/;
-  local $/ = undef;
-  my $contents = <$fh>;
-  $/ = $olddollarslash;
-  return $contents;
-}
-
-sub WriteStringToFile {
-  my $fh = SafeOpen ("> $_[0]");
-  binmode $fh;
-  print $fh $_[1] or confess "$_[0]: $!\n";
-  SafeClose $fh;
-}
-
-foreach my $file (@ARGV) {
-  my $changed_p = 0;
-  my $new_contents = "";
-  if ($options{"line-mode"}) {
-    my $fh = SafeOpen $file;
-    while (<$fh>) {
-      my $save_line = $_;
-      eval $code;
-      $changed_p = 1 if $save_line ne $_;
-      $new_contents .= $_;
-    }
-  } else {
-    my $orig_contents = $_ = FileContents $file;
-    eval $code;
-    if ($_ ne $orig_contents) {
-      $changed_p = 1;
-      $new_contents = $_;
-    }
-  }
-
-  if ($changed_p) {
-    my $backdir = $options{"backup-dir"};
-    $backdir = "backup" if !defined ($backdir);
-    if ($backdir) {
-      my ($name, $path, $suffix) = fileparse ($file, "");
-      my $backfulldir = $path . $backdir;
-      my $backfile = "$backfulldir/$name";
-      mkdir $backfulldir, 0755 unless -d $backfulldir;
-      print "modifying $file (original saved in $backfile)\n";
-      rename $file, $backfile;
-    }
-    WriteStringToFile ($file, $new_contents);
-  }
-}
------------------------------------- cut ------------------------------------
-
-
-In addition to those programs, I needed to fix up a few other
-things, particularly relating to the duplicate definitions of
-types, now that some types merged with others. Specifically:
-
-1. in lisp.h, removed duplicate declarations of Bytecount. The
-   changed code should now look like this: (In each code snippet
-   below, the first and last lines are the same as the original, as
-   are all lines outside of those lines. That allows you to locate
-   the section to be replaced, and replace the stuff in that
-   section, verifying that there isn't anything new added that
-   would need to be kept.)
-
----------------------------------- snip -------------------------------------
-/* Counts of bytes or chars */
-typedef EMACS_INT Bytecount;
-typedef EMACS_INT Charcount;
-
-/* Counts of elements */
-typedef EMACS_INT Elemcount;
-
-/* Hash codes */
-typedef unsigned long Hashcode;
-
-/* ------------------------ dynamic arrays ------------------- */
----------------------------------- snip -------------------------------------
-
-2. in lstream.h, removed duplicate declaration of Bytecount.
-   Rewrote the comment about this type. The changed code should
-   now look like this:
-
-
----------------------------------- snip -------------------------------------
-#endif
-
-/* The have been some arguments over the what the type should be that
-   specifies a count of bytes in a data block to be written out or read in,
-   using Lstream_read(), Lstream_write(), and related functions.
-   Originally it was long, which worked fine; Martin "corrected" these to
-   size_t and ssize_t on the grounds that this is theoretically cleaner and
-   is in keeping with the C standards. Unfortunately, this practice is
-   horribly error-prone due to design flaws in the way that mixed
-   signed/unsigned arithmetic happens. In fact, by doing this change,
-   Martin introduced a subtle but fatal error that caused the operation of
-   sending large mail messages to the SMTP server under Windows to fail.
-   By putting all values back to be signed, avoiding any signed/unsigned
-   mixing, the bug immediately went away. The type then in use was
-   Lstream_Data_Count, so that it be reverted cleanly if a vote came to
-   that. Now it is Bytecount.
-
-   Some earlier comments about why the type must be signed: This MUST BE
-   SIGNED, since it also is used in functions that return the number of
-   bytes actually read to or written from in an operation, and these
-   functions can return -1 to signal error.
-
-   Note that the standard Unix read() and write() functions define the
-   count going in as a size_t, which is UNSIGNED, and the count going
-   out as an ssize_t, which is SIGNED. This is a horrible design
-   flaw. Not only is it highly likely to lead to logic errors when a
-   -1 gets interpreted as a large positive number, but operations are
-   bound to fail in all sorts of horrible ways when a number in the
-   upper-half of the size_t range is passed in -- this number is
-   unrepresentable as an ssize_t, so code that checks to see how many
-   bytes are actually written (which is mandatory if you are dealing
-   with certain types of devices) will get completely screwed up.
-
-   --ben
-*/
-
-typedef enum lstream_buffering
----------------------------------- snip -------------------------------------
-
-
-3. in dumper.c, there are four places, all inside of switch()
-   statements, where XD_BYTECOUNT appears twice as a case tag. In
-   each case, the two case blocks contain identical code, and you
-   should *REMOVE THE SECOND* and leave the first.
-
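The lstream.h comment in this changeset makes two claims. Byte counts passed to and returned from Lstream_read() and Lstream_write() must be signed so that -1 can signal an error. Code must also check how many bytes were actually written. The C sketch below illustrates that pattern with a retry loop over short writes. Everything in it is hypothetical: `struct sink`, `sink_write`, and `write_all` are illustrative names, not the Lstream API, and `Bytecount` is assumed to be `long`.

```c
#include <assert.h>
#include <string.h>

typedef long Bytecount;  /* assumed stand-in for the EMACS_INT-based type */

/* Hypothetical output device: accepts at most CHUNK bytes per call
   (a "short write") and reports an error with -1 once at least
   FAIL_AFTER bytes have been stored in it. */
struct sink
{
  char buf[64];
  Bytecount used;
  Bytecount chunk;
  Bytecount fail_after;
};

static Bytecount sink_write (struct sink *s, const char *data, Bytecount size)
{
  Bytecount room = (Bytecount) sizeof s->buf - s->used;
  Bytecount n = size < s->chunk ? size : s->chunk;

  if (s->used >= s->fail_after)
    return -1;
  if (n > room)
    n = room;
  memcpy (s->buf + s->used, data, (size_t) n);
  s->used += n;
  return n;
}

/* Write all SIZE bytes, retrying after short writes.  Returns the number
   of bytes written, or -1 on error.  The `n < 0' test works only because
   Bytecount is signed; with an unsigned count, -1 would look like a huge
   successful write. */
static Bytecount write_all (struct sink *s, const char *data, Bytecount size)
{
  Bytecount done = 0;

  assert (size >= 0);
  while (done < size)
    {
      Bytecount n = sink_write (s, data + done, size - done);
      if (n < 0)
        return -1;
      if (n == 0)   /* device full: report what was written so far */
        break;
      done += n;
    }
  return done;
}
```

Because every count stays signed, the error check, the short-write check, and the `done < size` loop condition need no casts.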