README.integral-types

The great integral types renaming.

#### The content of this file was originally posted as a ChangeLog and
     should be moved to the Internals manual.

The purpose of this is to rationalize the names used for various
integral types, so that they match their intended uses and follow
consistent conventions, and to eliminate types that were not
semantically different from each other.

The conventions are:

-- All integral types that measure quantities of anything are
   signed. Some people disagree vociferously with this, but their
   arguments are mostly theoretical, and are vastly outweighed by
   the practical headaches of mixing signed and unsigned values,
   and more importantly by the far greater likelihood of
   inadvertent bugs: Because of the broken "viral" nature of
   unsigned quantities in C (operations involving mixed
   signed/unsigned are done unsigned, when exactly the opposite is
   nearly always wanted), even a single error in declaring a
   quantity unsigned that should be signed, or the even more
   subtle error of comparing signed and unsigned values and
   forgetting the necessary cast, can be catastrophic, as
   comparisons will yield wrong results. -Wsign-compare is turned
   on specifically to catch this, but this tends to result in a
   great number of warnings when mixing signed and unsigned, and
   the casts are annoying. More has been written on this
   elsewhere.

-- All such quantity types just mentioned boil down to EMACS_INT,
   which is 32 bits on 32-bit machines and 64 bits on 64-bit
   machines. This is guaranteed to be the same size as Lisp
   objects of type `int', and (as far as I can tell) of size_t
   (unsigned!) and ssize_t. The only type below that is not an
   EMACS_INT is Hashcode, which is an unsigned value of the same
   size as EMACS_INT.

-- Type names should be relatively short (no more than 10
   characters or so), with the first letter capitalized and no
   underscores if they can at all be avoided.

-- "count" == a zero-based measurement of some quantity. Includes
   sizes, offsets, and indexes.

-- "bpos" == a one-based measurement of a position in a buffer.
   "Charbpos" and "Bytebpos" count text in the buffer, rather than
   bytes in memory; thus Bytebpos does not directly correspond to
   the memory representation. Use "Membpos" for this.

-- "Char" refers to internal-format characters, not to the C type
   "char", which is really a byte.

-- For the actual name changes, see the script below.
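
To illustrate the first convention, here is a minimal sketch of the
comparison problem. It is not code from XEmacs; the function names are
made up for this example. The first function spells out, with an
explicit cast, what C does implicitly when an int is compared against
an unsigned int; the second keeps both operands signed.

```c
/* What C does implicitly for `count < limit' when LIMIT is unsigned:
   COUNT is converted to unsigned before the comparison.  The cast is
   written out here so the conversion is visible.  (Hypothetical
   function, for illustration only.) */
int
below_limit_unsigned (int count, unsigned int limit)
{
  return (unsigned int) count < limit;
}

/* The same test with both operands signed.  (Hypothetical function,
   for illustration only.) */
int
below_limit_signed (int count, int limit)
{
  return count < limit;
}
```

With count = -1 and limit = 10, the unsigned version reports that -1
is not below 10, because -1 converts to UINT_MAX; the signed version
gives the expected answer.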

I ran the following script to do the conversion. (NOTE: This script
is idempotent. You can safely run it multiple times and it will
not screw up previous results -- in fact, it will do nothing if
nothing has changed. Thus, it can be run repeatedly as necessary
to handle patches coming in from old workspaces, or old branches.)
There are two tags, just before and just after the change:
`pre-integral-type-rename' and `post-integral-type-rename'. When
merging code from the main trunk into a branch, the best thing to
do is first merge up to `pre-integral-type-rename', then apply the
script and associated changes, then merge from
`post-integral-type-rename' to the present. (Alternatively, just do
the merging in one operation; but you may then have a lot of
conflicts needing to be resolved by hand.)

Script `fixtypes.sh' follows:

----------------------------------- cut ------------------------------------
files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
gr Memory_Count Bytecount $files
gr Lstream_Data_Count Bytecount $files
gr Element_Count Elemcount $files
gr Hash_Code Hashcode $files
gr extcount bytecount $files
gr bufpos charbpos $files
gr bytind bytebpos $files
gr memind membpos $files
gr bufbyte intbyte $files
gr Extcount Bytecount $files
gr Bufpos Charbpos $files
gr Bytind Bytebpos $files
gr Memind Membpos $files
gr Bufbyte Intbyte $files
gr EXTCOUNT BYTECOUNT $files
gr BUFPOS CHARBPOS $files
gr BYTIND BYTEBPOS $files
gr MEMIND MEMBPOS $files
gr BUFBYTE INTBYTE $files
gr MEMORY_COUNT BYTECOUNT $files
gr LSTREAM_DATA_COUNT BYTECOUNT $files
gr ELEMENT_COUNT ELEMCOUNT $files
gr HASH_CODE HASHCODE $files
----------------------------------- cut ------------------------------------

`fixtypes.sh' is a Bourne-shell script; it uses `gr':

----------------------------------- cut ------------------------------------
#!/bin/sh

# Usage is like this:

# gr FROM TO FILES ...

# globally replace FROM with TO in FILES. FROM is a regular expression;
# TO is the replacement text.
# backup files are stored in the `backup' directory.
from="$1"
to="$2"
shift 2
echo ${1+"$@"} | xargs global-replace "s/$from/$to/g"
----------------------------------- cut ------------------------------------

`gr' in turn uses a Perl script to do its real work,
`global-replace', which follows:

----------------------------------- cut ------------------------------------
: #-*- Perl -*-

### global-modify --- modify the contents of a file by a Perl expression

## Copyright (C) 1999 Martin Buchholz.
## Copyright (C) 2001 Ben Wing.

## Authors: Martin Buchholz <martin@xemacs.org>, Ben Wing <ben@xemacs.org>
## Maintainer: Ben Wing <ben@xemacs.org>
## Current Version: 1.0, May 5, 2001

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with XEmacs; see the file COPYING. If not, write to the Free
# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
# 02111-1307, USA.

eval 'exec perl -w -S $0 ${1+"$@"}'
  if 0;

use strict;
use FileHandle;
use Carp;
use Getopt::Long;
use File::Basename;

(my $myName = $0) =~ s@.*/@@; my $usage="
Usage: $myName [--help] [--backup-dir=DIR] [--line-mode] [--hunk-mode]
       PERLEXPR FILE ...

Globally modify a file, either line by line or in one big hunk.

Typical usage is like this:

[with GNU print, GNU xargs: guaranteed to handle spaces, quotes, etc.
 in file names]

    find . -name '*.[ch]' -print0 | xargs -0 $0 's/\\bCONST\\b/const/g'\n

[with non-GNU print, xargs]

    find . -name '*.[ch]' -print | xargs $0 's/\\bCONST\\b/const/g'\n


The file is read in, either line by line (with --line-mode specified)
or in one big hunk (with --hunk-mode specified; it's the default), and
the Perl expression is then evalled with \$_ set to the line or hunk of
text, including the terminating newline if there is one. It should
destructively modify the value there, storing the changed result in \$_.

Files in which any modifications are made are backed up to the directory
specified using --backup-dir, or to `backup' by default. To disable this,
use --backup-dir= with no argument.

Hunk mode is the default because it is MUCH MUCH faster than line-by-line.
Use line-by-line only when it matters, e.g. you want to do a replacement
only once per line (the default without the `g' argument). Conversely,
when using hunk mode, *ALWAYS* use `g'; otherwise, you will only make one
replacement in the entire file!
";

my %options = ();
$Getopt::Long::ignorecase = 0;
&GetOptions (
             \%options,
             'help', 'backup-dir=s', 'line-mode', 'hunk-mode',
);
203
|
|
204
|
|
205 die $usage if $options{"help"} or @ARGV <= 1;
|
|
206 my $code = shift;
|
|
207
|
|
208 die $usage if grep (-d || ! -w, @ARGV);
|
|
209
|
|
210 sub SafeOpen {
|
|
211 open ((my $fh = new FileHandle), $_[0]);
|
|
212 confess "Can't open $_[0]: $!" if ! defined $fh;
|
|
213 return $fh;
|
|
214 }
|
|
215
|
|
216 sub SafeClose {
|
|
217 close $_[0] or confess "Can't close $_[0]: $!";
|
|
218 }
|
|
219
|
|
220 sub FileContents {
|
|
221 my $fh = SafeOpen ("< $_[0]");
|
|
222 my $olddollarslash = $/;
|
|
223 local $/ = undef;
|
|
224 my $contents = <$fh>;
|
|
225 $/ = $olddollarslash;
|
|
226 return $contents;
|
|
227 }
|
|
228
|
|
229 sub WriteStringToFile {
|
|
230 my $fh = SafeOpen ("> $_[0]");
|
|
231 binmode $fh;
|
|
232 print $fh $_[1] or confess "$_[0]: $!\n";
|
|
233 SafeClose $fh;
|
|
234 }
|
|
235
|
|
236 foreach my $file (@ARGV) {
|
|
237 my $changed_p = 0;
|
|
238 my $new_contents = "";
|
|
239 if ($options{"line-mode"}) {
|
|
240 my $fh = SafeOpen $file;
|
|
241 while (<$fh>) {
|
|
242 my $save_line = $_;
|
|
243 eval $code;
|
|
244 $changed_p = 1 if $save_line ne $_;
|
|
245 $new_contents .= $_;
|
|
246 }
|
|
247 } else {
|
|
248 my $orig_contents = $_ = FileContents $file;
|
|
249 eval $code;
|
|
250 if ($_ ne $orig_contents) {
|
|
251 $changed_p = 1;
|
|
252 $new_contents = $_;
|
|
253 }
|
|
254 }
|
|
255
|
|
256 if ($changed_p) {
|
|
257 my $backdir = $options{"backup-dir"};
|
|
258 $backdir = "backup" if !defined ($backdir);
|
|
259 if ($backdir) {
|
|
260 my ($name, $path, $suffix) = fileparse ($file, "");
|
|
261 my $backfulldir = $path . $backdir;
|
|
262 my $backfile = "$backfulldir/$name";
|
|
263 mkdir $backfulldir, 0755 unless -d $backfulldir;
|
|
264 print "modifying $file (original saved in $backfile)\n";
|
|
265 rename $file, $backfile;
|
|
266 }
|
|
267 WriteStringToFile ($file, $new_contents);
|
|
268 }
|
|
269 }
|
|
270 ----------------------------------- cut ------------------------------------

In addition to those programs, I needed to fix up a few other
things, particularly relating to the duplicate definitions of
types, now that some types have been merged with others. Specifically:

1. in lisp.h, removed duplicate declarations of Bytecount. The
   changed code should now look like this: (In each code snippet
   below, the first and last lines are the same as the original, as
   are all lines outside of those lines. That allows you to locate
   the section to be replaced, and replace the stuff in that
   section, verifying that there isn't anything new added that
   would need to be kept.)

--------------------------------- snip -------------------------------------
/* Counts of bytes or chars */
typedef EMACS_INT Bytecount;
typedef EMACS_INT Charcount;

/* Counts of elements */
typedef EMACS_INT Elemcount;

/* Hash codes */
typedef unsigned long Hashcode;

/* ------------------------ dynamic arrays ------------------- */
--------------------------------- snip -------------------------------------

2. in lstream.h, removed duplicate declaration of Bytecount.
   Rewrote the comment about this type. The changed code should
   now look like this:

--------------------------------- snip -------------------------------------
#endif

/* There have been some arguments over what the type should be that
   specifies a count of bytes in a data block to be written out or read in,
   using Lstream_read(), Lstream_write(), and related functions.
   Originally it was long, which worked fine; Martin "corrected" these to
   size_t and ssize_t on the grounds that this is theoretically cleaner and
   is in keeping with the C standards. Unfortunately, this practice is
   horribly error-prone due to design flaws in the way that mixed
   signed/unsigned arithmetic happens. In fact, by doing this change,
   Martin introduced a subtle but fatal error that caused the operation of
   sending large mail messages to the SMTP server under Windows to fail.
   By putting all values back to be signed, avoiding any signed/unsigned
   mixing, the bug immediately went away. The type then in use was
   Lstream_Data_Count, so that it could be reverted cleanly if a vote came
   to that. Now it is Bytecount.

   Some earlier comments about why the type must be signed: This MUST BE
   SIGNED, since it also is used in functions that return the number of
   bytes actually read or written in an operation, and these
   functions can return -1 to signal error.

   Note that the standard Unix read() and write() functions define the
   count going in as a size_t, which is UNSIGNED, and the count going
   out as an ssize_t, which is SIGNED. This is a horrible design
   flaw. Not only is it highly likely to lead to logic errors when a
   -1 gets interpreted as a large positive number, but operations are
   bound to fail in all sorts of horrible ways when a number in the
   upper-half of the size_t range is passed in -- this number is
   unrepresentable as an ssize_t, so code that checks to see how many
   bytes are actually written (which is mandatory if you are dealing
   with certain types of devices) will get completely screwed up.

   --ben
*/

typedef enum lstream_buffering
--------------------------------- snip -------------------------------------

3. in dumper.c, there are four places, all inside of switch()
   statements, where XD_BYTECOUNT appears twice as a case tag. In
   each case, the two case blocks contain identical code, and you
   should *REMOVE THE SECOND* and leave the first.
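
The hazard described in the lstream.h comment in item 2 can be
sketched as follows. This is an illustration only, not XEmacs code:
failing_read() is a hypothetical stand-in for a read function that
reports an error, and Bytecount is assumed here to be a plain long
rather than the real EMACS_INT-based definition.

```c
#include <stddef.h>

/* Assumption: stand-in for the real EMACS_INT-based typedef. */
typedef long Bytecount;

/* Hypothetical reader that always fails, returning -1 to signal an
   error in the same way read() does. */
Bytecount
failing_read (void)
{
  return -1;
}

/* Running total kept in an unsigned type: the -1 error return is
   converted to a huge positive value and added in, so the error goes
   unnoticed and the total silently wraps around. */
size_t
add_read_unsigned (size_t total)
{
  total += (size_t) failing_read ();
  return total;
}

/* Running total kept in a signed type: the error is visible and can
   be passed back to the caller. */
Bytecount
add_read_signed (Bytecount total)
{
  Bytecount n = failing_read ();
  if (n < 0)
    return -1;
  return total + n;
}
```

Starting from a running total of 100, the unsigned version silently
ends up at 99 (adding the converted -1 wraps around), while the signed
version passes the error back to the caller.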