src/README.integral-types (xemacs-beta, revision 734:8bd30fae1bce)

[xemacs-hg @ 2002-01-25 16:46:24 by stephent]
Per patch <87665q9yfh.fsf@tleepslib.sk.tsukuba.ac.jp>.

author: stephent
date:   Fri, 25 Jan 2002 16:46:26 +0000

README.integral-types

The great integral types renaming.

#### The content of this file was originally posted as a ChangeLog and
should be moved to the Internals manual.

The purpose of this is to rationalize the names used for various
integral types, so that they match their intended uses and follow
consistent conventions, and to eliminate types that were not
semantically different from each other.

The conventions are:

-- All integral types that measure quantities of anything are
   signed. Some people disagree vociferously with this, but their
   arguments are mostly theoretical, and are vastly outweighed by
   the practical headaches of mixing signed and unsigned values,
   and more importantly by the far greater likelihood of
   inadvertent bugs: Because of the broken "viral" nature of
   unsigned quantities in C (operations involving mixed
   signed/unsigned are done unsigned, when exactly the opposite is
   nearly always wanted), even a single error in declaring a
   quantity unsigned that should be signed, or the even more
   subtle error of comparing signed and unsigned values and
   forgetting the necessary cast, can be catastrophic, as
   comparisons will yield wrong results. -Wsign-compare is turned
   on specifically to catch this, but it produces a great number of
   warnings wherever signed and unsigned values are mixed, and the
   casts needed to silence them are annoying. More has been written
   on this elsewhere.

-- All such quantity types just mentioned boil down to EMACS_INT,
   which is 32 bits on 32-bit machines and 64 bits on 64-bit
   machines. This is guaranteed to be the same size as Lisp
   objects of type `int', and (as far as I can tell) of size_t
   (unsigned!) and ssize_t. The only type below that is not an
   EMACS_INT is Hashcode, which is an unsigned value of the same
   size as EMACS_INT.

-- Type names should be relatively short (no more than 10
   characters or so), with the first letter capitalized and no
   underscores if they can at all be avoided.

-- "count" == a zero-based measurement of some quantity. Includes
   sizes, offsets, and indexes.

-- "bpos" == a one-based measurement of a position in a buffer.
   "Charbpos" and "Bytebpos" count text in the buffer, rather than
   bytes in memory; thus Bytebpos does not directly correspond to
   the memory representation. Use "Membpos" for this.

-- "Char" refers to internal-format characters, not to the C type
   "char", which is really a byte.

-- For the actual name changes, see the script below.
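
A minimal sketch of the comparison trap described in the first
convention. It assumes, for illustration only, that EMACS_INT can be
modelled as C99 `intptr_t' (signed and pointer-sized, matching the
32/64-bit description above); `mixed_less' and `signed_less' are
hypothetical helpers, not XEmacs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: EMACS_INT modelled as intptr_t, a signed
   integer as wide as a pointer.  Not XEmacs's actual definition. */
typedef intptr_t EMACS_INT;
typedef EMACS_INT Bytecount;

/* Mixed signed/unsigned comparison: LEN is converted to size_t first,
   so a negative LEN (such as a -1 error return) becomes a huge positive
   value and the result is wrong.  -Wsign-compare warns here. */
static int
mixed_less (Bytecount len, size_t limit)
{
  return len < limit;
}

/* All-signed comparison: -1 stays -1 and the result is as intended. */
static int
signed_less (Bytecount len, Bytecount limit)
{
  assert (limit >= 0);
  return len < limit;
}
```

On common platforms, where `intptr_t' and `size_t' have the same
width, `mixed_less (-1, 10)' returns 0 because -1 is converted to the
largest `size_t' value, while `signed_less (-1, 10)' returns 1.
Compiling with -Wsign-compare flags the comparison in `mixed_less'.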

I ran the following script to do the conversion. (NOTE: This script
is idempotent. You can safely run it multiple times and it will
not screw up previous results -- in fact, it will do nothing if
there is nothing left to rename. Thus, it can be run repeatedly as
necessary to handle patches coming in from old workspaces or old
branches.) There are two tags, just before and just after the
change: `pre-integral-type-rename' and `post-integral-type-rename'.
When merging code from the main trunk into a branch, the best thing
to do is first merge up to `pre-integral-type-rename', then apply the
script and associated changes, then merge from
`post-integral-type-rename' to the present. (Alternatively, just do
the merging in one operation; but you may then have a lot of
conflicts needing to be resolved by hand.)

Script `fixtypes.sh' follows:


----------------------------------- cut ------------------------------------
#!/bin/sh
# Run from the src/ directory.  $files is deliberately left unquoted
# in the gr calls below so that the shell expands the globs.
files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
gr Memory_Count Bytecount $files
gr Lstream_Data_Count Bytecount $files
gr Element_Count Elemcount $files
gr Hash_Code Hashcode $files
gr extcount bytecount $files
gr bufpos charbpos $files
gr bytind bytebpos $files
gr memind membpos $files
gr bufbyte intbyte $files
gr Extcount Bytecount $files
gr Bufpos Charbpos $files
gr Bytind Bytebpos $files
gr Memind Membpos $files
gr Bufbyte Intbyte $files
gr EXTCOUNT BYTECOUNT $files
gr BUFPOS CHARBPOS $files
gr BYTIND BYTEBPOS $files
gr MEMIND MEMBPOS $files
gr BUFBYTE INTBYTE $files
gr MEMORY_COUNT BYTECOUNT $files
gr LSTREAM_DATA_COUNT BYTECOUNT $files
gr ELEMENT_COUNT ELEMCOUNT $files
gr HASH_CODE HASHCODE $files
----------------------------------- cut ------------------------------------
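
Every FROM pattern in `fixtypes.sh' consists only of letters and
underscores, so each `gr' call behaves like a plain, case-sensitive
string substitution. The C sketch below models that with a
hypothetical `replace_all' helper (mirroring Perl's `s/FROM/TO/g') and
applies a few of the script's pairs in the script's order. A second
run changes nothing, largely because none of the new names contains
any of the old ones.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of S in which every
   non-overlapping occurrence of FROM (a literal string, scanned left to
   right) is replaced by TO, like Perl's s/FROM/TO/g.  NULL on failure. */
static char *
replace_all (const char *s, const char *from, const char *to)
{
  size_t from_len = strlen (from), to_len = strlen (to), count = 0;
  const char *p;
  char *out, *q;

  assert (from_len > 0);
  for (p = s; (p = strstr (p, from)) != NULL; p += from_len)
    count++;
  /* strlen (s) >= count * from_len, so this cannot underflow. */
  out = malloc (strlen (s) - count * from_len + count * to_len + 1);
  if (out == NULL)
    return NULL;
  for (q = out; (p = strstr (s, from)) != NULL; s = p + from_len)
    {
      memcpy (q, s, (size_t) (p - s));   /* text before the match */
      q += p - s;
      memcpy (q, to, to_len);            /* the replacement */
      q += to_len;
    }
  strcpy (q, s);                         /* text after the last match */
  return out;
}

/* A few of the FROM/TO pairs from fixtypes.sh, in the script's order. */
static const char *const renames[][2] = {
  { "Memory_Count", "Bytecount" },
  { "Lstream_Data_Count", "Bytecount" },
  { "extcount", "bytecount" },
  { "bufpos", "charbpos" },
  { "Bufpos", "Charbpos" },
  { "Bufbyte", "Intbyte" },
};

/* Apply every pair in order, as successive gr calls would. */
static char *
apply_renames (const char *text)
{
  size_t i, n = sizeof renames / sizeof renames[0];
  char *cur = replace_all (text, renames[0][0], renames[0][1]);

  for (i = 1; cur != NULL && i < n; i++)
    {
      char *next = replace_all (cur, renames[i][0], renames[i][1]);
      free (cur);
      cur = next;
    }
  return cur;
}
```

For example, `apply_renames' turns "Bufpos pos; Memory_Count len;
Bufbyte *p; extcount n;" into "Charbpos pos; Bytecount len; Intbyte
*p; bytecount n;", and applying it again to that result returns the
same string.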


`fixtypes.sh' is a Bourne-shell script; it uses `gr':


----------------------------------- cut ------------------------------------
#!/bin/sh

# Usage is like this:

# gr FROM TO FILES ...

# Globally replace FROM with TO in FILES.  FROM is a Perl regular
# expression and TO its replacement text; neither may contain `/'.
# Backup files are stored in the `backup' directory.
# Requires `global-replace' (below) to be on the PATH.
from="$1"
to="$2"
shift 2
echo ${1+"$@"} | xargs global-replace "s/$from/$to/g"
----------------------------------- cut ------------------------------------


`gr' in turn relies on a Perl script, `global-replace' (its header
comment calls it `global-modify'), to do the real work:


----------------------------------- cut ------------------------------------
: #-*- Perl -*-

### global-modify --- modify the contents of a file by a Perl expression

## Copyright (C) 1999 Martin Buchholz.
## Copyright (C) 2001 Ben Wing.

## Authors: Martin Buchholz <martin@xemacs.org>, Ben Wing <ben@xemacs.org>
## Maintainer: Ben Wing <ben@xemacs.org>
## Current Version: 1.0, May 5, 2001

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with XEmacs; see the file COPYING.  If not, write to the Free
# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
# 02111-1307, USA.

eval 'exec perl -w -S $0 ${1+"$@"}'
  if 0;

use strict;
use FileHandle;
use Carp;
use Getopt::Long;
use File::Basename;

# \\b is needed below: inside a double-quoted string, a bare \b is a
# backspace character, not a regexp word boundary.
(my $myName = $0) =~ s@.*/@@; my $usage="
Usage: $myName [--help] [--backup-dir=DIR] [--line-mode] [--hunk-mode]
       PERLEXPR FILE ...

Globally modify a file, either line by line or in one big hunk.

Typical usage is like this:

[with GNU find, GNU xargs: guaranteed to handle spaces, quotes, etc.
 in file names]

find . -name '*.[ch]' -print0 | xargs -0 $0 's/\\bCONST\\b/const/g'\n

[with non-GNU find, xargs]

find . -name '*.[ch]' -print | xargs $0 's/\\bCONST\\b/const/g'\n


The file is read in, either line by line (with --line-mode specified)
or in one big hunk (with --hunk-mode specified; it's the default), and
the Perl expression is then evalled with \$_ set to the line or hunk of
text, including the terminating newline if there is one.  It should
destructively modify the value there, storing the changed result in \$_.

Files in which any modifications are made are backed up to the directory
specified using --backup-dir, or to `backup' by default.  To disable this,
use --backup-dir= with no argument.

Hunk mode is the default because it is MUCH MUCH faster than line-by-line.
Use line-by-line only when it matters, e.g. you want to do a replacement
only once per line (the default without the `g' modifier).  Conversely,
when using hunk mode, *ALWAYS* use `g'; otherwise, you will only make one
replacement in the entire file!
";

my %options = ();
$Getopt::Long::ignorecase = 0;
&GetOptions (
             \%options,
             'help', 'backup-dir=s', 'line-mode', 'hunk-mode',
            );


die $usage if $options{"help"} or @ARGV <= 1;
my $code = shift;

die $usage if grep (-d || ! -w, @ARGV);

sub SafeOpen {
  # Check open's return value: the FileHandle object itself is always
  # defined, so testing $fh would never catch a failed open.
  open ((my $fh = new FileHandle), $_[0]) or confess "Can't open $_[0]: $!";
  return $fh;
}

sub SafeClose {
  close $_[0] or confess "Can't close $_[0]: $!";
}

sub FileContents {
  my $fh = SafeOpen ("< $_[0]");
  local $/ = undef;             # slurp the whole file; restored on return
  my $contents = <$fh>;
  SafeClose $fh;
  return $contents;
}

sub WriteStringToFile {
  my $fh = SafeOpen ("> $_[0]");
  binmode $fh;
  print $fh $_[1] or confess "$_[0]: $!\n";
  SafeClose $fh;
}

foreach my $file (@ARGV) {
  my $changed_p = 0;
  my $new_contents = "";
  if ($options{"line-mode"}) {
    my $fh = SafeOpen $file;
    while (<$fh>) {
      my $save_line = $_;
      eval $code;
      die "$myName: error in PERLEXPR: $@" if $@;
      $changed_p = 1 if $save_line ne $_;
      $new_contents .= $_;
    }
    SafeClose $fh;              # close before the file is renamed below
  } else {
    my $orig_contents = $_ = FileContents $file;
    eval $code;
    die "$myName: error in PERLEXPR: $@" if $@;
    if ($_ ne $orig_contents) {
      $changed_p = 1;
      $new_contents = $_;
    }
  }

  if ($changed_p) {
    my $backdir = $options{"backup-dir"};
    $backdir = "backup" if !defined ($backdir);
    if ($backdir) {
      my ($name, $path, $suffix) = fileparse ($file, "");
      my $backfulldir = $path . $backdir;
      my $backfile = "$backfulldir/$name";
      mkdir $backfulldir, 0755 unless -d $backfulldir;
      print "modifying $file (original saved in $backfile)\n";
      rename $file, $backfile or confess "Can't rename $file to $backfile: $!";
    }
    WriteStringToFile ($file, $new_contents);
  }
}
----------------------------------- cut ------------------------------------


In addition to running those programs, I needed to fix up a few other
things, particularly the duplicate type definitions left over now
that some types have been merged with others. Specifically:

1. In lisp.h, removed duplicate declarations of Bytecount. The
   changed code should now look like this. (In each code snippet
   below, the first and last lines are the same as in the original,
   as are all lines outside of them. That lets you locate the
   section to be replaced and replace its contents, after verifying
   that nothing new has been added there that would need to be
   kept.)

--------------------------------- snip -------------------------------------
/* Counts of bytes or chars */
typedef EMACS_INT Bytecount;
typedef EMACS_INT Charcount;

/* Counts of elements */
typedef EMACS_INT Elemcount;

/* Hash codes */
typedef unsigned long Hashcode;

/* ------------------------ dynamic arrays ------------------- */
--------------------------------- snip -------------------------------------
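
The conventions behind these typedefs can be checked mechanically at
compile time. A minimal C11 sketch, again assuming (hypothetically)
that EMACS_INT is modelled as `intptr_t' rather than XEmacs's actual
definition; the other typedefs mirror the snippet above.

```c
#include <stdint.h>

/* Assumption for illustration: EMACS_INT modelled as intptr_t.  The
   remaining typedefs mirror the lisp.h snippet above. */
typedef intptr_t EMACS_INT;
typedef EMACS_INT Bytecount;
typedef EMACS_INT Charcount;
typedef EMACS_INT Elemcount;
typedef unsigned long Hashcode;

/* C11 compile-time checks: quantity types are signed and pointer-sized,
   and Hashcode is unsigned.  A violation stops the build. */
_Static_assert ((Bytecount) -1 < 0, "Bytecount must be signed");
_Static_assert ((Charcount) -1 < 0, "Charcount must be signed");
_Static_assert ((Elemcount) -1 < 0, "Elemcount must be signed");
_Static_assert (sizeof (Bytecount) == sizeof (void *),
                "Bytecount must be as wide as a pointer");
_Static_assert ((Hashcode) -1 > 0, "Hashcode must be unsigned");
```

The sketch deliberately does not assert that Hashcode is as wide as
EMACS_INT: on LLP64 platforms such as 64-bit Windows, `unsigned long'
is 32 bits while pointers are 64 bits, so that check would fail there.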

2. In lstream.h, removed the duplicate declaration of Bytecount and
   rewrote the comment about this type. The changed code should
   now look like this:


--------------------------------- snip -------------------------------------
#endif

/* There have been some arguments over what the type should be that
   specifies a count of bytes in a data block to be written out or read in,
   using Lstream_read(), Lstream_write(), and related functions.
   Originally it was long, which worked fine; Martin "corrected" these to
   size_t and ssize_t on the grounds that this is theoretically cleaner and
   is in keeping with the C standards.  Unfortunately, this practice is
   horribly error-prone due to design flaws in the way that mixed
   signed/unsigned arithmetic happens.  In fact, by doing this change,
   Martin introduced a subtle but fatal error that caused the operation of
   sending large mail messages to the SMTP server under Windows to fail.
   By putting all values back to be signed, avoiding any signed/unsigned
   mixing, the bug immediately went away.  The type was then named
   Lstream_Data_Count so that it could be reverted cleanly if a vote came
   to that.  Now it is Bytecount.

   Some earlier comments about why the type must be signed: This MUST BE
   SIGNED, since it also is used in functions that return the number of
   bytes actually read or written in an operation, and these functions
   can return -1 to signal error.

   Note that the standard Unix read() and write() functions define the
   count going in as a size_t, which is UNSIGNED, and the count going
   out as an ssize_t, which is SIGNED.  This is a horrible design
   flaw.  Not only is it highly likely to lead to logic errors when a
   -1 gets interpreted as a large positive number, but operations are
   bound to fail in all sorts of horrible ways when a number in the
   upper half of the size_t range is passed in -- this number is
   unrepresentable as an ssize_t, so code that checks to see how many
   bytes are actually written (which is mandatory if you are dealing
   with certain types of devices) will get completely screwed up.

   --ben
*/

typedef enum lstream_buffering
--------------------------------- snip -------------------------------------
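
To illustrate the comment's point about checking how many bytes were
actually written, here is a minimal sketch of a write-everything loop
that keeps its counts signed. It calls POSIX `write()' directly rather
than the Lstream functions; `write_all' and the `intptr_t' stand-in
for Bytecount are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Assumption for illustration: Bytecount modelled as intptr_t. */
typedef intptr_t Bytecount;

/* Hypothetical helper: write all LEN bytes of BUF to FD with POSIX
   write(), retrying after short writes and EINTR.  Returns LEN on
   success or -1 on error; keeping the count signed lets callers test
   "< 0" and compare against LEN without casts. */
static Bytecount
write_all (int fd, const unsigned char *buf, Bytecount len)
{
  Bytecount done = 0;

  assert (len >= 0);
  while (done < len)
    {
      /* The remaining count is non-negative, so converting it to the
         size_t that write() expects is safe. */
      ssize_t n = write (fd, buf + done, (size_t) (len - done));

      if (n < 0 && errno == EINTR)
        continue;               /* interrupted by a signal: retry */
      if (n <= 0)
        return -1;              /* error, or no progress: report failure */
      done += n;                /* short write: loop for the rest */
    }
  return done;
}
```

The -1 error return and the `done < len' loop test stay in signed
arithmetic throughout; only the argument handed to `write()' is
converted to `size_t', at a point where it is known to be non-negative.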


3. In dumper.c, there are four places, all inside switch()
   statements, where XD_BYTECOUNT appears twice as a case tag. In
   each case, the two case blocks contain identical code, and you
   should *REMOVE THE SECOND* and leave the first.
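
The duplicates presumably arose because `fixtypes.sh' maps
MEMORY_COUNT, LSTREAM_DATA_COUNT and EXTCOUNT all to BYTECOUNT, so tags
that used to be distinct now share one name, and C forbids two case
labels with the same value in one switch. A minimal sketch of a switch
after the cleanup; the three-member `xd_type' enum and
`xd_field_size' are hypothetical, not the real dumper.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration: EMACS_INT modelled as intptr_t, and a
   hypothetical three-member subset of the dumper's XD_ tags. */
typedef intptr_t EMACS_INT;
typedef EMACS_INT Bytecount;
typedef EMACS_INT Elemcount;
typedef unsigned long Hashcode;

enum xd_type { XD_BYTECOUNT, XD_ELEMCOUNT, XD_HASHCODE };

/* Hypothetical helper: storage size of a field of kind TYPE.  Each tag
   may appear only once per switch; a second `case XD_BYTECOUNT:' here
   would be rejected by the compiler as a duplicate case value. */
static size_t
xd_field_size (enum xd_type type)
{
  switch (type)
    {
    case XD_BYTECOUNT:
      return sizeof (Bytecount);
    case XD_ELEMCOUNT:
      return sizeof (Elemcount);
    case XD_HASHCODE:
      return sizeof (Hashcode);
    }
  assert (!"unknown xd_type");
  return 0;
}
```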