comparison src/README.integral-types @ 734:8bd30fae1bce
[xemacs-hg @ 2002-01-25 16:46:24 by stephent]
Per patch <87665q9yfh.fsf@tleepslib.sk.tsukuba.ac.jp>.
| author | stephent |
|---|---|
| date | Fri, 25 Jan 2002 16:46:26 +0000 |
| 733:b1f74adcc1ff | 734:8bd30fae1bce |
|---|---|
| 1 README.integral-types | |
| 2 | |
| 3 The great integral types renaming. | |
| 4 | |
| 5 #### The content of this file was originally posted as a ChangeLog and | |
| 6 should be moved to the Internals manual. | |
| 7 | |
| 8 The purpose of this is to rationalize the names used for various | |
| 9 integral types, so that they match their intended uses and follow | |
| 10 consistent conventions, and eliminate types that were not semantically | |
| 11 different from each other. | |
| 12 | |
| 13 The conventions are: | |
| 14 | |
| 15 -- All integral types that measure quantities of anything are | |
| 16 signed. Some people disagree vociferously with this, but their | |
| 17 arguments are mostly theoretical, and are vastly outweighed by | |
| 18 the practical headaches of mixing signed and unsigned values, | |
| 19 and more importantly by the greatly increased likelihood of | |
| 20 inadvertent bugs: Because of the broken "viral" nature of | |
| 21 unsigned quantities in C (operations involving mixed | |
| 22 signed/unsigned are done unsigned, when exactly the opposite is | |
| 23 nearly always wanted), even a single error in declaring a | |
| 24 quantity unsigned that should be signed, or the even more | |
| 25 subtle error of comparing signed and unsigned values and | |
| 26 forgetting the necessary cast, can be catastrophic, as | |
| 27 comparisons will yield wrong results. -Wsign-compare is turned | |
| 28 on specifically to catch this, but this tends to result in a | |
| 29 great number of warnings when mixing signed and unsigned, and | |
| 30 the casts are annoying. More has been written on this | |
| 31 elsewhere. | |
| 32 | |
| 33 -- All such quantity types just mentioned boil down to EMACS_INT, | |
| 34 which is 32 bits on 32-bit machines and 64 bits on 64-bit | |
| 35 machines. This is guaranteed to be the same size as Lisp | |
| 36 objects of type `int', and (as far as I can tell) of size_t | |
| 37 (unsigned!) and ssize_t. The only type below that is not an | |
| 38 EMACS_INT is Hashcode, which is an unsigned value of the same | |
| 39 size as EMACS_INT. | |
| 40 | |
| 41 -- Type names should be relatively short (no more than 10 | |
| 42 characters or so), with the first letter capitalized and no | |
| 43 underscores if they can at all be avoided. | |
| 44 | |
| 45 -- "count" == a zero-based measurement of some quantity. Includes | |
| 46 sizes, offsets, and indexes. | |
| 47 | |
| 48 -- "bpos" == a one-based measurement of a position in a buffer. | |
| 49 "Charbpos" and "Bytebpos" count text in the buffer, rather than | |
| 50 bytes in memory; thus Bytebpos does not directly correspond to | |
| 51 the memory representation. Use "Membpos" for this. | |
| 52 | |
| 53 -- "Char" refers to internal-format characters, not to the C type | |
| 54 "char", which is really a byte. | |
| 55 | |
| 56 -- For the actual name changes, see the script below. | |
| 57 | |
| 58 I ran the following script to do the conversion. (NOTE: This script | |
| 59 is idempotent. You can safely run it multiple times and it will | |
| 60 not screw up previous results -- in fact, it will do nothing if | |
| 61 nothing has changed. Thus, it can be run repeatedly as necessary | |
| 62 to handle patches coming in from old workspaces, or old branches.) | |
| 63 There are two tags, just before and just after the change: | |
| 64 `pre-integral-type-rename' and `post-integral-type-rename'. When | |
| 65 merging code from the main trunk into a branch, the best thing to | |
| 66 do is first merge up to `pre-integral-type-rename', then apply the | |
| 67 script and associated changes, then merge from | |
| 68 `post-integral-type-rename' to the present. (Alternatively, just do | |
| 69 the merging in one operation; but you may then have a lot of | |
| 70 conflicts needing to be resolved by hand.) | |
| 71 | |
| 72 Script `fixtypes.sh' follows: | |
| 73 | |
| 74 | |
| 75 ----------------------------------- cut ------------------------------------ | |
| 76 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" | |
| 77 gr Memory_Count Bytecount $files | |
| 78 gr Lstream_Data_Count Bytecount $files | |
| 79 gr Element_Count Elemcount $files | |
| 80 gr Hash_Code Hashcode $files | |
| 81 gr extcount bytecount $files | |
| 82 gr bufpos charbpos $files | |
| 83 gr bytind bytebpos $files | |
| 84 gr memind membpos $files | |
| 85 gr bufbyte intbyte $files | |
| 86 gr Extcount Bytecount $files | |
| 87 gr Bufpos Charbpos $files | |
| 88 gr Bytind Bytebpos $files | |
| 89 gr Memind Membpos $files | |
| 90 gr Bufbyte Intbyte $files | |
| 91 gr EXTCOUNT BYTECOUNT $files | |
| 92 gr BUFPOS CHARBPOS $files | |
| 93 gr BYTIND BYTEBPOS $files | |
| 94 gr MEMIND MEMBPOS $files | |
| 95 gr BUFBYTE INTBYTE $files | |
| 96 gr MEMORY_COUNT BYTECOUNT $files | |
| 97 gr LSTREAM_DATA_COUNT BYTECOUNT $files | |
| 98 gr ELEMENT_COUNT ELEMCOUNT $files | |
| 99 gr HASH_CODE HASHCODE $files | |
| 100 ----------------------------------- cut ------------------------------------ | |
| 101 | |
| 102 | |
| 103 `fixtypes.sh' is a Bourne-shell script; it uses `gr': | |
| 104 | |
| 105 | |
| 106 ----------------------------------- cut ------------------------------------ | |
| 107 #!/bin/sh | |
| 108 | |
| 109 # Usage is like this: | |
| 110 | |
| 111 # gr FROM TO FILES ... | |
| 112 | |
| 113 # globally replace FROM with TO in FILES. FROM is a Perl regular expression; TO is the replacement text. | |
| 114 # backup files are stored in the `backup' directory. | |
| 115 from="$1" | |
| 116 to="$2" | |
| 117 shift 2 | |
| 118 echo ${1+"$@"} | xargs global-replace "s/$from/$to/g" | |
| 119 ----------------------------------- cut ------------------------------------ | |
| 120 | |
| 121 | |
| 122 `gr' in turn uses a Perl script to do its real work, | |
| 123 `global-replace', which follows: | |
| 124 | |
| 125 | |
| 126 ----------------------------------- cut ------------------------------------ | |
| 127 : #-*- Perl -*- | |
| 128 | |
| 129 ### global-modify --- modify the contents of a file by a Perl expression | |
| 130 | |
| 131 ## Copyright (C) 1999 Martin Buchholz. | |
| 132 ## Copyright (C) 2001 Ben Wing. | |
| 133 | |
| 134 ## Authors: Martin Buchholz <martin@xemacs.org>, Ben Wing <ben@xemacs.org> | |
| 135 ## Maintainer: Ben Wing <ben@xemacs.org> | |
| 136 ## Current Version: 1.0, May 5, 2001 | |
| 137 | |
| 138 # This program is free software; you can redistribute it and/or modify | |
| 139 # it under the terms of the GNU General Public License as published by | |
| 140 # the Free Software Foundation; either version 2, or (at your option) | |
| 141 # any later version. | |
| 142 # | |
| 143 # This program is distributed in the hope that it will be useful, but | |
| 144 # WITHOUT ANY WARRANTY; without even the implied warranty of | |
| 145 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |
| 146 # General Public License for more details. | |
| 147 # | |
| 148 # You should have received a copy of the GNU General Public License | |
| 149 # along with XEmacs; see the file COPYING. If not, write to the Free | |
| 150 # Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA | |
| 151 # 02111-1307, USA. | |
| 152 | |
| 153 eval 'exec perl -w -S $0 ${1+"$@"}' | |
| 154 if 0; | |
| 155 | |
| 156 use strict; | |
| 157 use FileHandle; | |
| 158 use Carp; | |
| 159 use Getopt::Long; | |
| 160 use File::Basename; | |
| 161 | |
| 162 (my $myName = $0) =~ s@.*/@@; my $usage=" | |
| 163 Usage: $myName [--help] [--backup-dir=DIR] [--line-mode] [--hunk-mode] | |
| 164 PERLEXPR FILE ... | |
| 165 | |
| 166 Globally modify a file, either line by line or in one big hunk. | |
| 167 | |
| 168 Typical usage is like this: | |
| 169 | |
| 170 [with GNU find, GNU xargs: guaranteed to handle spaces, quotes, etc. | |
| 171 in file names] | |
| 172 | |
| 173 find . -name '*.[ch]' -print0 | xargs -0 $0 's/\\bCONST\\b/const/g'\n | |
| 174 | |
| 175 [with non-GNU find, xargs] | |
| 176 | |
| 177 find . -name '*.[ch]' -print | xargs $0 's/\\bCONST\\b/const/g'\n | |
| 178 | |
| 179 | |
| 180 The file is read in, either line by line (with --line-mode specified) | |
| 181 or in one big hunk (with --hunk-mode specified; it's the default), and | |
| 182 the Perl expression is then evalled with \$_ set to the line or hunk of | |
| 183 text, including the terminating newline if there is one. It should | |
| 184 destructively modify the value there, storing the changed result in \$_. | |
| 185 | |
| 186 Files in which any modifications are made are backed up to the directory | |
| 187 specified using --backup-dir, or to `backup' by default. To disable this, | |
| 188 use --backup-dir= with no argument. | |
| 189 | |
| 190 Hunk mode is the default because it is MUCH MUCH faster than line-by-line. | |
| 191 Use line-by-line only when it matters, e.g. you want to do a replacement | |
| 192 only once per line (the default without the `g' argument). Conversely, | |
| 193 when using hunk mode, *ALWAYS* use `g'; otherwise, you will only make one | |
| 194 replacement in the entire file! | |
| 195 "; | |
| 196 | |
| 197 my %options = (); | |
| 198 $Getopt::Long::ignorecase = 0; | |
| 199 &GetOptions ( | |
| 200 \%options, | |
| 201 'help', 'backup-dir=s', 'line-mode', 'hunk-mode', | |
| 202 ); | |
| 203 | |
| 204 | |
| 205 die $usage if $options{"help"} or @ARGV <= 1; | |
| 206 my $code = shift; | |
| 207 | |
| 208 die $usage if grep (-d || ! -w, @ARGV); | |
| 209 | |
| 210 sub SafeOpen { | |
| 211 my $fh = new FileHandle; | |
| 212 open ($fh, $_[0]) or confess "Can't open $_[0]: $!"; | |
| 213 return $fh; | |
| 214 } | |
| 215 | |
| 216 sub SafeClose { | |
| 217 close $_[0] or confess "Can't close $_[0]: $!"; | |
| 218 } | |
| 219 | |
| 220 sub FileContents { | |
| 221 my $fh = SafeOpen ("< $_[0]"); | |
| 222 my $olddollarslash = $/; | |
| 223 local $/ = undef; | |
| 224 my $contents = <$fh>; | |
| 225 $/ = $olddollarslash; | |
| 226 return $contents; | |
| 227 } | |
| 228 | |
| 229 sub WriteStringToFile { | |
| 230 my $fh = SafeOpen ("> $_[0]"); | |
| 231 binmode $fh; | |
| 232 print $fh $_[1] or confess "$_[0]: $!\n"; | |
| 233 SafeClose $fh; | |
| 234 } | |
| 235 | |
| 236 foreach my $file (@ARGV) { | |
| 237 my $changed_p = 0; | |
| 238 my $new_contents = ""; | |
| 239 if ($options{"line-mode"}) { | |
| 240 my $fh = SafeOpen $file; | |
| 241 while (<$fh>) { | |
| 242 my $save_line = $_; | |
| 243 eval $code; | |
| 244 $changed_p = 1 if $save_line ne $_; | |
| 245 $new_contents .= $_; | |
| 246 } | |
| 247 } else { | |
| 248 my $orig_contents = $_ = FileContents $file; | |
| 249 eval $code; | |
| 250 if ($_ ne $orig_contents) { | |
| 251 $changed_p = 1; | |
| 252 $new_contents = $_; | |
| 253 } | |
| 254 } | |
| 255 | |
| 256 if ($changed_p) { | |
| 257 my $backdir = $options{"backup-dir"}; | |
| 258 $backdir = "backup" if !defined ($backdir); | |
| 259 if ($backdir) { | |
| 260 my ($name, $path, $suffix) = fileparse ($file, ""); | |
| 261 my $backfulldir = $path . $backdir; | |
| 262 my $backfile = "$backfulldir/$name"; | |
| 263 mkdir $backfulldir, 0755 unless -d $backfulldir; | |
| 264 print "modifying $file (original saved in $backfile)\n"; | |
| 265 rename $file, $backfile or confess "Can't rename $file to $backfile: $!"; | |
| 266 } | |
| 267 WriteStringToFile ($file, $new_contents); | |
| 268 } | |
| 269 } | |
| 270 ----------------------------------- cut ------------------------------------ | |
| 271 | |
| 272 | |
| 273 In addition to those programs, I needed to fix up a few other | |
| 274 things, particularly relating to the duplicate definitions of | |
| 275 types, now that some types merged with others. Specifically: | |
| 276 | |
| 277 1. in lisp.h, removed duplicate declarations of Bytecount. The | |
| 278 changed code should now look like this: (In each code snippet | |
| 279 below, the first and last lines are the same as the original, as | |
| 280 are all lines outside of those lines. That allows you to locate | |
| 281 the section to be replaced, and replace the stuff in that | |
| 282 section, verifying that there isn't anything new added that | |
| 283 would need to be kept.) | |
| 284 | |
| 285 --------------------------------- snip ------------------------------------- | |
| 286 /* Counts of bytes or chars */ | |
| 287 typedef EMACS_INT Bytecount; | |
| 288 typedef EMACS_INT Charcount; | |
| 289 | |
| 290 /* Counts of elements */ | |
| 291 typedef EMACS_INT Elemcount; | |
| 292 | |
| 293 /* Hash codes */ | |
| 294 typedef unsigned long Hashcode; | |
| 295 | |
| 296 /* ------------------------ dynamic arrays ------------------- */ | |
| 297 --------------------------------- snip ------------------------------------- | |
| 298 | |
| 299 2. in lstream.h, removed duplicate declaration of Bytecount. | |
| 300 Rewrote the comment about this type. The changed code should | |
| 301 now look like this: | |
| 302 | |
| 303 | |
| 304 --------------------------------- snip ------------------------------------- | |
| 305 #endif | |
| 306 | |
| 307 /* There have been some arguments over what the type should be that | |
| 308 specifies a count of bytes in a data block to be written out or read in, | |
| 309 using Lstream_read(), Lstream_write(), and related functions. | |
| 310 Originally it was long, which worked fine; Martin "corrected" these to | |
| 311 size_t and ssize_t on the grounds that this is theoretically cleaner and | |
| 312 is in keeping with the C standards. Unfortunately, this practice is | |
| 313 horribly error-prone due to design flaws in the way that mixed | |
| 314 signed/unsigned arithmetic happens. In fact, by doing this change, | |
| 315 Martin introduced a subtle but fatal error that caused the operation of | |
| 316 sending large mail messages to the SMTP server under Windows to fail. | |
| 317 By putting all values back to be signed, avoiding any signed/unsigned | |
| 318 mixing, the bug immediately went away. The type then in use was | |
| 319 Lstream_Data_Count, so that it could be reverted cleanly if a vote came to | |
| 320 that. Now it is Bytecount. | |
| 321 | |
| 322 Some earlier comments about why the type must be signed: This MUST BE | |
| 323 SIGNED, since it also is used in functions that return the number of | |
| 324 bytes actually read or written in an operation, and these | |
| 325 functions can return -1 to signal error. | |
| 326 | |
| 327 Note that the standard Unix read() and write() functions define the | |
| 328 count going in as a size_t, which is UNSIGNED, and the count going | |
| 329 out as an ssize_t, which is SIGNED. This is a horrible design | |
| 330 flaw. Not only is it highly likely to lead to logic errors when a | |
| 331 -1 gets interpreted as a large positive number, but operations are | |
| 332 bound to fail in all sorts of horrible ways when a number in the | |
| 333 upper half of the size_t range is passed in -- this number is | |
| 334 unrepresentable as an ssize_t, so code that checks to see how many | |
| 335 bytes are actually written (which is mandatory if you are dealing | |
| 336 with certain types of devices) will get completely screwed up. | |
| 337 | |
| 338 --ben | |
| 339 */ | |
| 340 | |
| 341 typedef enum lstream_buffering | |
| 342 --------------------------------- snip ------------------------------------- | |
| 343 | |
| 344 | |
| 345 3. in dumper.c, there are four places, all inside of switch() | |
| 346 statements, where XD_BYTECOUNT appears twice as a case tag. In | |
| 347 each case, the two case blocks contain identical code, and you | |
| 348 should *REMOVE THE SECOND* and leave the first. | |
| 349 |
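
As an illustration of the signed/unsigned hazard described in the conventions above, here is a minimal C sketch. It is not taken from the XEmacs sources: the function names fits_mixed, fits_signed and as_unsigned are hypothetical, and plain `long' stands in for EMACS_INT. It shows that a negative count compared against an unsigned limit is converted to unsigned first, so the comparison gives the wrong answer, while the all-signed version behaves as expected.

```c
#include <stddef.h>

/* Hypothetical helper: compares a signed count against an UNSIGNED limit.
   The usual arithmetic conversions turn `count' into an unsigned long
   before comparing, so a negative count (for example -1 used as an error
   return) becomes a very large positive number.  This is the kind of
   comparison that -Wsign-compare is meant to flag. */
int
fits_mixed (long count, unsigned long limit)
{
  return count < limit;
}

/* Hypothetical helper: the same test with both operands signed.  No
   conversion takes place, so negative values compare as expected. */
int
fits_signed (long count, long limit)
{
  return count < limit;
}

/* Hypothetical helper: what a signed error return looks like once it is
   stored in a size_t.  -1 wraps around to the largest size_t value. */
size_t
as_unsigned (long result)
{
  return (size_t) result;
}
```

Under these assumptions, fits_mixed (-1, 10) returns 0 whereas fits_signed (-1, 10) returns 1, and as_unsigned (-1) yields the largest value a size_t can hold. Keeping every quantity type signed avoids the first and third cases altogether.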
