comparison man/internals/internals.texi @ 868:48eed784e93a

[xemacs-hg @ 2002-06-05 12:00:40 by ben] To: xemacs-patches@xemacs.org internals/internals.texi:
author ben
date Wed, 05 Jun 2002 12:01:11 +0000
parents 19dfb459d51a
children e51bd28995c0
867:804517e16990 868:48eed784e93a
114 * The Lisp Language:: An overview. 114 * The Lisp Language:: An overview.
115 * XEmacs From the Perspective of Building:: 115 * XEmacs From the Perspective of Building::
116 * XEmacs From the Inside:: 116 * XEmacs From the Inside::
117 * The XEmacs Object System (Abstractly Speaking):: 117 * The XEmacs Object System (Abstractly Speaking)::
118 * How Lisp Objects Are Represented in C:: 118 * How Lisp Objects Are Represented in C::
119 * Major Textual Changes::
119 * Rules When Writing New C Code:: 120 * Rules When Writing New C Code::
120 * CVS Techniques:: 121 * CVS Techniques::
121 * A Summary of the Various XEmacs Modules:: 122 * A Summary of the Various XEmacs Modules::
122 * Allocation of Objects in XEmacs Lisp:: 123 * Allocation of Objects in XEmacs Lisp::
123 * Dumping:: 124 * Dumping::
1757 reading some Lisp code), or because they can't be created at all 1758 reading some Lisp code), or because they can't be created at all
1758 (e.g. subrs). Permanent objects, as a rule, do not have a read syntax; 1759 (e.g. subrs). Permanent objects, as a rule, do not have a read syntax;
1759 nor do most complex objects, which contain too much state to be easily 1760 nor do most complex objects, which contain too much state to be easily
1760 initialized through a read syntax. 1761 initialized through a read syntax.
1761 1762
1762 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top 1763 @node How Lisp Objects Are Represented in C, Major Textual Changes, The XEmacs Object System (Abstractly Speaking), Top
1763 @chapter How Lisp Objects Are Represented in C 1764 @chapter How Lisp Objects Are Represented in C
1764 @cindex Lisp objects are represented in C, how 1765 @cindex Lisp objects are represented in C, how
1765 @cindex objects are represented in C, how Lisp 1766 @cindex objects are represented in C, how Lisp
1766 @cindex represented in C, how Lisp objects are 1767 @cindex represented in C, how Lisp objects are
1767 1768
1844 performance is an issue, use @code{type_checking_assert}, 1845 performance is an issue, use @code{type_checking_assert},
1845 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do 1846 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
1846 nothing unless the corresponding configure error checking flag was 1847 nothing unless the corresponding configure error checking flag was
1847 specified. 1848 specified.
1848 1849
1849 @node Rules When Writing New C Code, CVS Techniques, How Lisp Objects Are Represented in C, Top 1850 @node Major Textual Changes, Rules When Writing New C Code, How Lisp Objects Are Represented in C, Top
1851 @chapter Major Textual Changes
1852 @cindex textual changes, major
1853 @cindex major textual changes
1854
1855 Sometimes major textual changes are made to the source. This means that
1856 a search-and-replace is done to change type names and such. Some people
1857 disagree with such changes, and certainly, if done without good reason,
1858 they will just lead to headaches. But it's important to keep the code clean
1859 and understandable, and consistent naming goes a long way towards this.
1860
1861 An example of the right way to do this was the so-called "great integral
1862 type renaming".
1863
1864 @menu
1865 * Great Integral Type Renaming::
1866 * Text/Char Type Renaming::
1867 @end menu
1868
1869 @node Great Integral Type Renaming
1870 @section Great Integral Type Renaming
1871 @cindex Great Integral Type Renaming
1872 @cindex integral type renaming, great
1873 @cindex type renaming, integral
1874 @cindex renaming, integral types
1875
1876 The purpose of this is to rationalize the names used for various
1877 integral types, so that they match their intended uses and follow
1878 consistent conventions, and eliminate types that were not semantically
1879 different from each other.
1880
1881 The conventions are:
1882
1883 @itemize @bullet
1884 @item
1885 All integral types that measure quantities of anything are signed. Some
1886 people disagree vociferously with this, but their arguments are mostly
1887 theoretical, and are vastly outweighed by the practical headaches of
1888 mixing signed and unsigned values, and more importantly by the far
1889 increased likelihood of inadvertent bugs: Because of the broken "viral"
1890 nature of unsigned quantities in C (operations involving mixed
1891 signed/unsigned are done unsigned, when exactly the opposite is nearly
1892 always wanted), even a single error in declaring a quantity unsigned
1893 that should be signed, or the still more subtle error of comparing
1894 signed and unsigned values and forgetting the necessary cast, can be
1895 catastrophic, as comparisons will yield wrong results. -Wsign-compare
1896 is turned on specifically to catch this, but this tends to result in a
1897 great number of warnings when mixing signed and unsigned, and the casts
1898 are annoying. More has been written on this elsewhere.
1899
1900 @item
1901 All such quantity types just mentioned boil down to EMACS_INT, which is
1902 32 bits on 32-bit machines and 64 bits on 64-bit machines. This is
1903 guaranteed to be the same size as Lisp objects of type `int', and (as
1904 far as I can tell) of size_t (unsigned!) and ssize_t. The only type
1905 below that is not an EMACS_INT is Hashcode, which is an unsigned value
1906 of the same size as EMACS_INT.
1907
1908 @item
1909 Type names should be relatively short (no more than 10 characters or
1910 so), with the first letter capitalized and no underscores if they can at
1911 all be avoided.
1912
1913 @item
1914 "count" == a zero-based measurement of some quantity. Includes sizes,
1915 offsets, and indexes.
1916
1917 @item
1918 "bpos" == a one-based measurement of a position in a buffer. "Charbpos"
1919 and "Bytebpos" count text in the buffer, rather than bytes in memory;
1920 thus Bytebpos does not directly correspond to the memory representation.
1921 Use "Membpos" for this.
1922
1923 @item
1924 "Char" refers to internal-format characters, not to the C type "char",
1925 which is really a byte.
1926 @end itemize
1927
1928 For the actual name changes, see the script below.
1929
1930 I ran the following script to do the conversion. (NOTE: This script is
1931 idempotent. You can safely run it multiple times and it will not screw
1932 up previous results -- in fact, it will do nothing if nothing has
1933 changed. Thus, it can be run repeatedly as necessary to handle patches
1934 coming in from old workspaces, or old branches.) There are two tags,
1935 just before and just after the change: @samp{pre-integral-type-rename}
1936 and @samp{post-integral-type-rename}. When merging code from the main
1937 trunk into a branch, the best thing to do is first merge up to
1938 @samp{pre-integral-type-rename}, then apply the script and associated
1939 changes, then merge from @samp{post-integral-type-rename} to the
1940 present. (Alternatively, just do the merging in one operation; but you
1941 may then have a lot of conflicts needing to be resolved by hand.)
1942
1943 Script @samp{fixtypes.sh} follows:
1944
1945 @example
1946 ----------------------------------- cut ------------------------------------
1947 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
1948 gr Memory_Count Bytecount $files
1949 gr Lstream_Data_Count Bytecount $files
1950 gr Element_Count Elemcount $files
1951 gr Hash_Code Hashcode $files
1952 gr extcount bytecount $files
1953 gr bufpos charbpos $files
1954 gr bytind bytebpos $files
1955 gr memind membpos $files
1956 gr bufbyte intbyte $files
1957 gr Extcount Bytecount $files
1958 gr Bufpos Charbpos $files
1959 gr Bytind Bytebpos $files
1960 gr Memind Membpos $files
1961 gr Bufbyte Intbyte $files
1962 gr EXTCOUNT BYTECOUNT $files
1963 gr BUFPOS CHARBPOS $files
1964 gr BYTIND BYTEBPOS $files
1965 gr MEMIND MEMBPOS $files
1966 gr BUFBYTE INTBYTE $files
1967 gr MEMORY_COUNT BYTECOUNT $files
1968 gr LSTREAM_DATA_COUNT BYTECOUNT $files
1969 gr ELEMENT_COUNT ELEMCOUNT $files
1970 gr HASH_CODE HASHCODE $files
1971 ----------------------------------- cut ------------------------------------
1972 @end example
1973
1974 The @samp{gr} script, and the scripts it uses, are documented in
1975 @file{README.global-renaming}, because if placed in this file they would
1976 need to have their @@ characters doubled, meaning you couldn't easily
1977 cut and paste from the source.
1978
1979 In addition to those programs, I needed to fix up a few other
1980 things, particularly relating to the duplicate definitions of
1981 types, now that some types merged with others. Specifically:
1982
1983 @enumerate
1984 @item
1985 in lisp.h, removed duplicate declarations of Bytecount. The changed
1986 code should now look like this: (In each code snippet below, the first
1987 and last lines are the same as the original, as are all lines outside of
1988 those lines. That allows you to locate the section to be replaced, and
1989 replace the stuff in that section, verifying that there isn't anything
1990 new added that would need to be kept.)
1991
1992 @example
1993 --------------------------------- snip -------------------------------------
1994 /* Counts of bytes or chars */
1995 typedef EMACS_INT Bytecount;
1996 typedef EMACS_INT Charcount;
1997
1998 /* Counts of elements */
1999 typedef EMACS_INT Elemcount;
2000
2001 /* Hash codes */
2002 typedef unsigned long Hashcode;
2003
2004 /* ------------------------ dynamic arrays ------------------- */
2005 --------------------------------- snip -------------------------------------
2006 @end example
2007
2008 @item
2009 in lstream.h, removed duplicate declaration of Bytecount. Rewrote the
2010 comment about this type. The changed code should now look like this:
2011
2012 @example
2013 --------------------------------- snip -------------------------------------
2014 #endif
2015
2016 /* There have been some arguments over what the type should be that
2017 specifies a count of bytes in a data block to be written out or read in,
2018 using Lstream_read(), Lstream_write(), and related functions.
2019 Originally it was long, which worked fine; Martin "corrected" these to
2020 size_t and ssize_t on the grounds that this is theoretically cleaner and
2021 is in keeping with the C standards. Unfortunately, this practice is
2022 horribly error-prone due to design flaws in the way that mixed
2023 signed/unsigned arithmetic happens. In fact, by doing this change,
2024 Martin introduced a subtle but fatal error that caused the operation of
2025 sending large mail messages to the SMTP server under Windows to fail.
2026 By putting all values back to be signed, avoiding any signed/unsigned
2027 mixing, the bug immediately went away. The type then in use was
2028 Lstream_Data_Count, so that it could be reverted cleanly if a vote came to
2029 that. Now it is Bytecount.
2030
2031 Some earlier comments about why the type must be signed: This MUST BE
2032 SIGNED, since it also is used in functions that return the number of
2033 bytes actually read or written in an operation, and these
2034 functions can return -1 to signal error.
2035
2036 Note that the standard Unix read() and write() functions define the
2037 count going in as a size_t, which is UNSIGNED, and the count going
2038 out as an ssize_t, which is SIGNED. This is a horrible design
2039 flaw. Not only is it highly likely to lead to logic errors when a
2040 -1 gets interpreted as a large positive number, but operations are
2041 bound to fail in all sorts of horrible ways when a number in the
2042 upper-half of the size_t range is passed in -- this number is
2043 unrepresentable as an ssize_t, so code that checks to see how many
2044 bytes are actually written (which is mandatory if you are dealing
2045 with certain types of devices) will get completely screwed up.
2046
2047 --ben
2048 */
2049
2050 typedef enum lstream_buffering
2051 --------------------------------- snip -------------------------------------
2052 @end example
2053
2054 @item
2055 in dumper.c, there are four places, all inside of switch() statements,
2056 where XD_BYTECOUNT appears twice as a case tag. In each case, the two
2057 case blocks contain identical code, and you should *REMOVE THE SECOND*
2058 and leave the first.
2059 @end enumerate
2060
2061 @node Text/Char Type Renaming
2062 @section Text/Char Type Renaming
2063 @cindex Text/Char Type Renaming
2064 @cindex type renaming, text/char
2065 @cindex renaming, text/char types
2066
2067 The purpose of this was
2068
2069 @enumerate
2070 @item
2071 To distinguish between ``charptr'' when it refers to operations on
2072 the pointer itself and when it refers to operations on text
2073 @item
2074 To use consistent naming for everything referring to internal format, i.e.
2075 @end enumerate
2076
2077 @example
2078 Itext == text in internal format
2079 Ibyte == a byte in such text
2080 Ichar == a char as represented in internal character format
2081 @end example
2082
2083 Thus e.g.
2084
2085 @example
2086 set_charptr_emchar -> set_itext_ichar
2087 @end example
2088
2089 This was done using a script like this:
2090
2091 @example
2092 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
2093 gr Intbyte Ibyte $files
2094 gr INTBYTE IBYTE $files
2095 gr intbyte ibyte $files
2096 gr EMCHAR ICHAR $files
2097 gr emchar ichar $files
2098 gr Emchar Ichar $files
2099 gr INC_CHARPTR INC_IBYTEPTR $files
2100 gr DEC_CHARPTR DEC_IBYTEPTR $files
2101 gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files
2102 gr valid_charptr valid_ibyteptr $files
2103 gr CHARPTR ITEXT $files
2104 gr charptr itext $files
2105 gr Charptr Itext $files
2106 @end example
2107
2108 See @file{README.global-renaming}, mentioned above, for the source to @samp{gr}.
2109
2110 As in the integral-types change, there are pre and post tags before and
2111 after the change:
2112
2113 @example
2114 pre-internal-format-textual-renaming
2115 post-internal-format-textual-renaming
2116 @end example
2117
2118 When merging a large branch, follow the same sort of procedure
2119 documented above, using these tags -- essentially sync up to the pre
2120 tag, then apply the script yourself, then sync from the post tag to the
2121 present. You can probably do the same if you don't have a separate
2122 workspace, but do have lots of outstanding changes and you'd rather not
2123 just merge all the textual changes directly. Use something like this:
2124
2125 (WARNING: I'm not a CVS guru; before trying this, or any large operation
2126 that might potentially mess things up, *DEFINITELY* make a backup of
2127 your existing workspace.)
2128
2129 @example
2130 cup -r pre-internal-format-textual-renaming
2131 <apply script>
2132 cup -A -j post-internal-format-textual-renaming -j HEAD
2133 @end example
2134
2135 This might also work:
2136
2137 @example
2138 cup -j pre-internal-format-textual-renaming
2139 <apply script>
2140 cup -j post-internal-format-textual-renaming -j HEAD
2141 @end example
2142
2143 ben
2144
2145 The following is a script to go in the opposite direction:
2146
2147 @example
2148 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
2149
2150 # Evidently Perl considers _ to be a word char ala \b, even though XEmacs
2151 # doesn't. We need to be careful here with ibyte/ichar because of words
2152 # like Richard, eicharlen(), multibyte, HIBYTE, etc.
2153
2154 gr Ibyte Intbyte $files
2155 gr '\bIBYTE' INTBYTE $files
2156 gr '\bibyte' intbyte $files
2157 gr '\bICHAR' EMCHAR $files
2158 gr '\bichar' emchar $files
2159 gr '\bIchar' Emchar $files
2160 gr '\bIBYTEPTR' CHARPTR $files
2161 gr '\bibyteptr' charptr $files
2162 gr '\bITEXT' CHARPTR $files
2163 gr '\bitext' charptr $files
2164 gr '\bItext' Charptr $files
2165
2166 gr '_IBYTE' _INTBYTE $files
2167 gr '_ibyte' _intbyte $files
2168 gr '_ICHAR' _EMCHAR $files
2169 gr '_ichar' _emchar $files
2170 gr '_Ichar' _Emchar $files
2171 gr '_IBYTEPTR' _CHARPTR $files
2172 gr '_ibyteptr' _charptr $files
2173 gr '_ITEXT' _CHARPTR $files
2174 gr '_itext' _charptr $files
2175 gr '_Itext' _Charptr $files
2176 @end example
2177
2178 @node Rules When Writing New C Code, CVS Techniques, Major Textual Changes, Top
1850 @chapter Rules When Writing New C Code 2179 @chapter Rules When Writing New C Code
1851 @cindex writing new C code, rules when 2180 @cindex writing new C code, rules when
1852 @cindex C code, rules when writing new 2181 @cindex C code, rules when writing new
1853 @cindex code, rules when writing new C 2182 @cindex code, rules when writing new C
1854 2183