diff src/README.integral-types @ 734:8bd30fae1bce
[xemacs-hg @ 2002-01-25 16:46:24 by stephent]
Per patch <87665q9yfh.fsf@tleepslib.sk.tsukuba.ac.jp>.
author | stephent |
---|---|
date | Fri, 25 Jan 2002 16:46:26 +0000 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/src/README.integral-types Fri Jan 25 16:46:26 2002 +0000
@@ -0,0 +1,349 @@
+README.integral-types
+
+The great integral types renaming.
+
+#### The content of this file was originally posted as a ChangeLog and
+should be moved to the Internals manual.
+
+The purpose of this is to rationalize the names used for various
+integral types, so that they match their intended uses and follow
+consistent conventions, and eliminate types that were not semantically
+different from each other.
+
+The conventions are:
+
+-- All integral types that measure quantities of anything are
+   signed. Some people disagree vociferously with this, but their
+   arguments are mostly theoretical, and are vastly outweighed by
+   the practical headaches of mixing signed and unsigned values,
+   and more importantly by the greatly increased likelihood of
+   inadvertent bugs: Because of the broken "viral" nature of
+   unsigned quantities in C (operations involving mixed
+   signed/unsigned are done unsigned, when exactly the opposite is
+   nearly always wanted), even a single error in declaring a
+   quantity unsigned that should be signed, or the even more
+   subtle error of comparing signed and unsigned values and
+   forgetting the necessary cast, can be catastrophic, as
+   comparisons will yield wrong results. -Wsign-compare is turned
+   on specifically to catch this, but this tends to result in a
+   great number of warnings when mixing signed and unsigned, and
+   the casts are annoying. More has been written on this
+   elsewhere.
+
+-- All such quantity types just mentioned boil down to EMACS_INT,
+   which is 32 bits on 32-bit machines and 64 bits on 64-bit
+   machines. This is guaranteed to be the same size as Lisp
+   objects of type `int', and (as far as I can tell) of size_t
+   (unsigned!) and ssize_t. The only type below that is not an
+   EMACS_INT is Hashcode, which is an unsigned value of the same
+   size as EMACS_INT.
+
+-- Type names should be relatively short (no more than 10
+   characters or so), with the first letter capitalized and no
+   underscores if they can at all be avoided.
+
+-- "count" == a zero-based measurement of some quantity. Includes
+   sizes, offsets, and indexes.
+
+-- "bpos" == a one-based measurement of a position in a buffer.
+   "Charbpos" and "Bytebpos" count text in the buffer, rather than
+   bytes in memory; thus Bytebpos does not directly correspond to
+   the memory representation. Use "Membpos" for this.
+
+-- "Char" refers to internal-format characters, not to the C type
+   "char", which is really a byte.
+
+-- For the actual name changes, see the script below.
+
+I ran the following script to do the conversion. (NOTE: This script
+is idempotent. You can safely run it multiple times and it will
+not screw up previous results -- in fact, it will do nothing if
+nothing has changed. Thus, it can be run repeatedly as necessary
+to handle patches coming in from old workspaces, or old branches.)
+There are two tags, just before and just after the change:
+`pre-integral-type-rename' and `post-integral-type-rename'. When
+merging code from the main trunk into a branch, the best thing to
+do is first merge up to `pre-integral-type-rename', then apply the
+script and associated changes, then merge from
+`post-integral-type-rename' to the present. (Alternatively, just do
+the merging in one operation; but you may then have a lot of
+conflicts needing to be resolved by hand.)
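
The signed/unsigned hazard behind the first convention above can be
sketched in C. This is a minimal illustration, not code from XEmacs:
the `Bytecount' typedef below is a hypothetical stand-in (the real
one is defined in terms of EMACS_INT), and `fits_mixed' and
`fits_signed' are made-up names.

```c
#include <stddef.h>

/* Hypothetical stand-in for a signed count type such as Bytecount. */
typedef long Bytecount;

/* Mixed comparison.  The cast spells out the conversion that C applies
   on its own when a signed value is compared with a size_t; written
   without the cast, the result is the same and -Wsign-compare warns.
   A negative `remaining' turns into a huge positive number, so the
   test wrongly succeeds. */
int fits_mixed (Bytecount remaining, size_t needed)
{
  return (size_t) remaining >= needed;
}

/* Both operands signed: the comparison means what it says. */
int fits_signed (Bytecount remaining, Bytecount needed)
{
  return remaining >= needed;
}
```

With a negative count such as -1 (an error return, say),
fits_mixed (-1, 10) claims that the data fits, while
fits_signed (-1, 10) reports that it does not.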
+
+Script `fixtypes.sh' follows:
+
+
+----------------------------------- cut ------------------------------------
+files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
+gr Memory_Count Bytecount $files
+gr Lstream_Data_Count Bytecount $files
+gr Element_Count Elemcount $files
+gr Hash_Code Hashcode $files
+gr extcount bytecount $files
+gr bufpos charbpos $files
+gr bytind bytebpos $files
+gr memind membpos $files
+gr bufbyte intbyte $files
+gr Extcount Bytecount $files
+gr Bufpos Charbpos $files
+gr Bytind Bytebpos $files
+gr Memind Membpos $files
+gr Bufbyte Intbyte $files
+gr EXTCOUNT BYTECOUNT $files
+gr BUFPOS CHARBPOS $files
+gr BYTIND BYTEBPOS $files
+gr MEMIND MEMBPOS $files
+gr BUFBYTE INTBYTE $files
+gr MEMORY_COUNT BYTECOUNT $files
+gr LSTREAM_DATA_COUNT BYTECOUNT $files
+gr ELEMENT_COUNT ELEMCOUNT $files
+gr HASH_CODE HASHCODE $files
+----------------------------------- cut ------------------------------------
+
+
+ `fixtypes.sh' is a Bourne-shell script; it uses `gr':
+
+
+----------------------------------- cut ------------------------------------
+#!/bin/sh
+
+# Usage is like this:
+
+# gr FROM TO FILES ...
+
+# globally replace FROM with TO in FILES. FROM is a Perl regular
+# expression; TO is the replacement text.
+# backup files are stored in the `backup' directory.
+from="$1"
+to="$2"
+shift 2
+echo ${1+"$@"} | xargs global-replace "s/$from/$to/g"
+----------------------------------- cut ------------------------------------
+
+
+ `gr' in turn uses a Perl script to do its real work,
+ `global-replace', which follows:
+
+
+----------------------------------- cut ------------------------------------
+: #-*- Perl -*-
+
+### global-modify --- modify the contents of a file by a Perl expression
+
+## Copyright (C) 1999 Martin Buchholz.
+## Copyright (C) 2001 Ben Wing.
+
+## Authors: Martin Buchholz <martin@xemacs.org>, Ben Wing <ben@xemacs.org>
+## Maintainer: Ben Wing <ben@xemacs.org>
+## Current Version: 1.0, May 5, 2001
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with XEmacs; see the file COPYING. If not, write to the Free
+# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+# 02111-1307, USA.
+
+eval 'exec perl -w -S $0 ${1+"$@"}'
+  if 0;
+
+use strict;
+use FileHandle;
+use Carp;
+use Getopt::Long;
+use File::Basename;
+
+(my $myName = $0) =~ s@.*/@@; my $usage="
+Usage: $myName [--help] [--backup-dir=DIR] [--line-mode] [--hunk-mode]
+       PERLEXPR FILE ...
+
+Globally modify a file, either line by line or in one big hunk.
+
+Typical usage is like this:
+
+[with GNU print, GNU xargs: guaranteed to handle spaces, quotes, etc.
+ in file names]
+
+find . -name '*.[ch]' -print0 | xargs -0 $0 's/\bCONST\b/const/g'\n
+
+[with non-GNU print, xargs]
+
+find . -name '*.[ch]' -print | xargs $0 's/\bCONST\b/const/g'\n
+
+
+The file is read in, either line by line (with --line-mode specified)
+or in one big hunk (with --hunk-mode specified; it's the default), and
+the Perl expression is then evalled with \$_ set to the line or hunk of
+text, including the terminating newline if there is one. It should
+destructively modify the value there, storing the changed result in \$_.
+
+Files in which any modifications are made are backed up to the directory
+specified using --backup-dir, or to `backup' by default. To disable this,
+use --backup-dir= with no argument.
+
+Hunk mode is the default because it is MUCH MUCH faster than line-by-line.
+Use line-by-line only when it matters, e.g. you want to do a replacement
+only once per line (the default without the `g' argument). Conversely,
+when using hunk mode, *ALWAYS* use `g'; otherwise, you will only make one
+replacement in the entire file!
+";
+
+my %options = ();
+$Getopt::Long::ignorecase = 0;
+&GetOptions (
+  \%options,
+  'help', 'backup-dir=s', 'line-mode', 'hunk-mode',
+);
+
+
+die $usage if $options{"help"} or @ARGV <= 1;
+my $code = shift;
+
+die $usage if grep (-d || ! -w, @ARGV);
+
+sub SafeOpen {
+  # Check the result of open itself; the handle object is always defined.
+  open ((my $fh = new FileHandle), $_[0])
+    or confess "Can't open $_[0]: $!";
+  return $fh;
+}
+
+sub SafeClose {
+  close $_[0] or confess "Can't close $_[0]: $!";
+}
+
+sub FileContents {
+  my $fh = SafeOpen ("< $_[0]");
+  my $olddollarslash = $/;
+  local $/ = undef;
+  my $contents = <$fh>;
+  $/ = $olddollarslash;
+  return $contents;
+}
+
+sub WriteStringToFile {
+  my $fh = SafeOpen ("> $_[0]");
+  binmode $fh;
+  print $fh $_[1] or confess "$_[0]: $!\n";
+  SafeClose $fh;
+}
+
+foreach my $file (@ARGV) {
+  my $changed_p = 0;
+  my $new_contents = "";
+  if ($options{"line-mode"}) {
+    my $fh = SafeOpen $file;
+    while (<$fh>) {
+      my $save_line = $_;
+      eval $code;
+      $changed_p = 1 if $save_line ne $_;
+      $new_contents .= $_;
+    }
+  } else {
+    my $orig_contents = $_ = FileContents $file;
+    eval $code;
+    if ($_ ne $orig_contents) {
+      $changed_p = 1;
+      $new_contents = $_;
+    }
+  }
+
+  if ($changed_p) {
+    my $backdir = $options{"backup-dir"};
+    $backdir = "backup" if !defined ($backdir);
+    if ($backdir) {
+      my ($name, $path, $suffix) = fileparse ($file, "");
+      my $backfulldir = $path . $backdir;
+      my $backfile = "$backfulldir/$name";
+      mkdir $backfulldir, 0755 unless -d $backfulldir;
+      print "modifying $file (original saved in $backfile)\n";
+      rename $file, $backfile;
+    }
+    WriteStringToFile ($file, $new_contents);
+  }
+}
+----------------------------------- cut ------------------------------------
+
+
+In addition to those programs, I needed to fix up a few other
+things, particularly relating to the duplicate definitions of
+types, now that some types have been merged with others. Specifically:
+
+1. in lisp.h, removed duplicate declarations of Bytecount. The
+   changed code should now look like this: (In each code snippet
+   below, the first and last lines are the same as the original, as
+   are all lines outside of those lines. That allows you to locate
+   the section to be replaced, and replace the stuff in that
+   section, verifying that there isn't anything new added that
+   would need to be kept.)
+
+--------------------------------- snip -------------------------------------
+/* Counts of bytes or chars */
+typedef EMACS_INT Bytecount;
+typedef EMACS_INT Charcount;
+
+/* Counts of elements */
+typedef EMACS_INT Elemcount;
+
+/* Hash codes */
+typedef unsigned long Hashcode;
+
+/* ------------------------ dynamic arrays ------------------- */
+--------------------------------- snip -------------------------------------
+
+2. in lstream.h, removed duplicate declaration of Bytecount.
+   Rewrote the comment about this type. The changed code should
+   now look like this:
+
+
+--------------------------------- snip -------------------------------------
+#endif
+
+/* There have been some arguments over what the type should be that
+   specifies a count of bytes in a data block to be written out or read in,
+   using Lstream_read(), Lstream_write(), and related functions.
+   Originally it was long, which worked fine; Martin "corrected" these to
+   size_t and ssize_t on the grounds that this is theoretically cleaner and
+   is in keeping with the C standards. Unfortunately, this practice is
+   horribly error-prone due to design flaws in the way that mixed
+   signed/unsigned arithmetic happens. In fact, by doing this change,
+   Martin introduced a subtle but fatal error that caused the operation of
+   sending large mail messages to the SMTP server under Windows to fail.
+   By putting all values back to be signed, avoiding any signed/unsigned
+   mixing, the bug immediately went away. The type then in use was
+   Lstream_Data_Count, so that it could be reverted cleanly if a vote
+   came to that. Now it is Bytecount.
+
+   Some earlier comments about why the type must be signed: This MUST BE
+   SIGNED, since it also is used in functions that return the number of
+   bytes actually read or written in an operation, and these
+   functions can return -1 to signal error.
+
+   Note that the standard Unix read() and write() functions define the
+   count going in as a size_t, which is UNSIGNED, and the count going
+   out as an ssize_t, which is SIGNED. This is a horrible design
+   flaw. Not only is it highly likely to lead to logic errors when a
+   -1 gets interpreted as a large positive number, but operations are
+   bound to fail in all sorts of horrible ways when a number in the
+   upper-half of the size_t range is passed in -- this number is
+   unrepresentable as an ssize_t, so code that checks to see how many
+   bytes are actually written (which is mandatory if you are dealing
+   with certain types of devices) will get completely screwed up.
+
+   --ben
+*/
+
+typedef enum lstream_buffering
+--------------------------------- snip -------------------------------------
+
+
+3. in dumper.c, there are four places, all inside of switch()
+   statements, where XD_BYTECOUNT appears twice as a case tag. In
+   each case, the two case blocks contain identical code, and you
+   should *REMOVE THE SECOND* and leave the first.
+
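
Why the second block in step 3 has to go: C requires every case
label within one switch to have a distinct value, so a switch in
which the same tag appears twice does not compile. Presumably the
duplicates arose because two formerly distinct tags were renamed to
the same name. A minimal sketch follows; the enum and the function
`demo_tag_size' are hypothetical and are not the definitions used in
dumper.c.

```c
#include <stddef.h>

/* Hypothetical descriptor tags, loosely modelled on the XD_* names. */
enum demo_tag { DEMO_INT, DEMO_LONG, DEMO_BYTECOUNT };

/* Size of the C object described by TAG, or 0 for an unknown tag. */
size_t demo_tag_size (enum demo_tag tag)
{
  switch (tag)
    {
    case DEMO_INT:
      return sizeof (int);
    case DEMO_LONG:
      return sizeof (long);
    case DEMO_BYTECOUNT:        /* the first block is kept */
      return sizeof (long);
#if 0
    case DEMO_BYTECOUNT:        /* a second, identical block: enabling it
                                   is rejected as a duplicate case value */
      return sizeof (long);
#endif
    default:
      return 0;
    }
}
```

Because the two blocks are identical, deleting the second one changes
nothing about the behavior of the switch.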