comparison man/internals/internals.texi @ 868:48eed784e93a
[xemacs-hg @ 2002-06-05 12:00:40 by ben]
To: xemacs-patches@xemacs.org
internals/internals.texi:
author | ben |
---|---|
date | Wed, 05 Jun 2002 12:01:11 +0000 |
parents | 19dfb459d51a |
children | e51bd28995c0 |
867:804517e16990 | 868:48eed784e93a |
---|---|
114 * The Lisp Language:: An overview. | 114 * The Lisp Language:: An overview. |
115 * XEmacs From the Perspective of Building:: | 115 * XEmacs From the Perspective of Building:: |
116 * XEmacs From the Inside:: | 116 * XEmacs From the Inside:: |
117 * The XEmacs Object System (Abstractly Speaking):: | 117 * The XEmacs Object System (Abstractly Speaking):: |
118 * How Lisp Objects Are Represented in C:: | 118 * How Lisp Objects Are Represented in C:: |
119 * Major Textual Changes:: | |
119 * Rules When Writing New C Code:: | 120 * Rules When Writing New C Code:: |
120 * CVS Techniques:: | 121 * CVS Techniques:: |
121 * A Summary of the Various XEmacs Modules:: | 122 * A Summary of the Various XEmacs Modules:: |
122 * Allocation of Objects in XEmacs Lisp:: | 123 * Allocation of Objects in XEmacs Lisp:: |
123 * Dumping:: | 124 * Dumping:: |
1757 reading some Lisp code), or because they can't be created at all | 1758 reading some Lisp code), or because they can't be created at all |
1758 (e.g. subrs). Permanent objects, as a rule, do not have a read syntax; | 1759 (e.g. subrs). Permanent objects, as a rule, do not have a read syntax; |
1759 nor do most complex objects, which contain too much state to be easily | 1760 nor do most complex objects, which contain too much state to be easily |
1760 initialized through a read syntax. | 1761 initialized through a read syntax. |
1761 | 1762 |
1762 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top | 1763 @node How Lisp Objects Are Represented in C, Major Textual Changes, The XEmacs Object System (Abstractly Speaking), Top |
1763 @chapter How Lisp Objects Are Represented in C | 1764 @chapter How Lisp Objects Are Represented in C |
1764 @cindex Lisp objects are represented in C, how | 1765 @cindex Lisp objects are represented in C, how |
1765 @cindex objects are represented in C, how Lisp | 1766 @cindex objects are represented in C, how Lisp |
1766 @cindex represented in C, how Lisp objects are | 1767 @cindex represented in C, how Lisp objects are |
1767 | 1768 |
1844 performance is an issue, use @code{type_checking_assert}, | 1845 performance is an issue, use @code{type_checking_assert}, |
1845 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do | 1846 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do |
1846 nothing unless the corresponding configure error checking flag was | 1847 nothing unless the corresponding configure error checking flag was |
1847 specified. | 1848 specified. |
1848 | 1849 |
1849 @node Rules When Writing New C Code, CVS Techniques, How Lisp Objects Are Represented in C, Top | 1850 @node Major Textual Changes, Rules When Writing New C Code, How Lisp Objects Are Represented in C, Top |
1851 @chapter Major Textual Changes | |
1852 @cindex textual changes, major | |
1853 @cindex major textual changes | |
1854 | |
1855 Sometimes major textual changes are made to the source. This means that | |
1856 a search-and-replace is done to change type names and such. Some people | |
1857 disagree with such changes, and certainly, if done without good reason, they | |
1858 will just lead to headaches. But it's important to keep the code clean | |
1859 and understandable, and consistent naming goes a long way towards this. | |
1860 | |
1861 An example of the right way to do this was the so-called "great integral | |
1862 type renaming". | |
1863 | |
1864 @menu | |
1865 * Great Integral Type Renaming:: | |
1866 * Text/Char Type Renaming:: | |
1867 @end menu | |
1868 | |
1869 @node Great Integral Type Renaming | |
1870 @section Great Integral Type Renaming | |
1871 @cindex Great Integral Type Renaming | |
1872 @cindex integral type renaming, great | |
1873 @cindex type renaming, integral | |
1874 @cindex renaming, integral types | |
1875 | |
1876 The purpose of this is to rationalize the names used for various | |
1877 integral types, so that they match their intended uses and follow | |
1878 consistent conventions, and eliminate types that were not semantically | |
1879 different from each other. | |
1880 | |
1881 The conventions are: | |
1882 | |
1883 @itemize @bullet | |
1884 @item | |
1885 All integral types that measure quantities of anything are signed. Some | |
1886 people disagree vociferously with this, but their arguments are mostly | |
1887 theoretical, and are vastly outweighed by the practical headaches of | |
1888 mixing signed and unsigned values, and more importantly by the greatly | |
1889 increased likelihood of inadvertent bugs. Because of the broken "viral" | |
1890 nature of unsigned quantities in C (operations involving mixed | |
1891 signed/unsigned are done unsigned, when exactly the opposite is nearly | |
1892 always wanted), even a single error in declaring a quantity unsigned | |
1893 that should be signed, or the more subtle error of comparing | |
1894 signed and unsigned values and forgetting the necessary cast, can be | |
1895 catastrophic, as comparisons will yield wrong results. -Wsign-compare | |
1896 is turned on specifically to catch this, but this tends to result in a | |
1897 great number of warnings when mixing signed and unsigned, and the casts | |
1898 are annoying. More has been written on this elsewhere. | |
1899 | |
1900 @item | |
1901 All such quantity types just mentioned boil down to EMACS_INT, which is | |
1902 32 bits on 32-bit machines and 64 bits on 64-bit machines. This is | |
1903 guaranteed to be the same size as Lisp objects of type `int', and (as | |
1904 far as I can tell) of size_t (unsigned!) and ssize_t. The only type | |
1905 below that is not an EMACS_INT is Hashcode, which is an unsigned value | |
1906 of the same size as EMACS_INT. | |
1907 | |
1908 @item | |
1909 Type names should be relatively short (no more than 10 characters or | |
1910 so), with the first letter capitalized and no underscores if they can at | |
1911 all be avoided. | |
1912 | |
1913 @item | |
1914 "count" == a zero-based measurement of some quantity. Includes sizes, | |
1915 offsets, and indexes. | |
1916 | |
1917 @item | |
1918 "bpos" == a one-based measurement of a position in a buffer. "Charbpos" | |
1919 and "Bytebpos" count text in the buffer, rather than bytes in memory; | |
1920 thus Bytebpos does not directly correspond to the memory representation. | |
1921 Use "Membpos" for this. | |
1922 | |
1923 @item | |
1924 "Char" refers to internal-format characters, not to the C type "char", | |
1925 which is really a byte. | |
1926 @end itemize | |
1927 | |
1928 For the actual name changes, see the script below. | |
1929 | |
1930 I ran the following script to do the conversion. (NOTE: This script is | |
1931 idempotent. You can safely run it multiple times and it will not screw | |
1932 up previous results -- in fact, it will do nothing if nothing has | |
1933 changed. Thus, it can be run repeatedly as necessary to handle patches | |
1934 coming in from old workspaces, or old branches.) There are two tags, | |
1935 just before and just after the change: @samp{pre-integral-type-rename} | |
1936 and @samp{post-integral-type-rename}. When merging code from the main | |
1937 trunk into a branch, the best thing to do is first merge up to | |
1938 @samp{pre-integral-type-rename}, then apply the script and associated | |
1939 changes, then merge from @samp{post-integral-type-rename} to the | |
1940 present. (Alternatively, just do the merging in one operation; but you | |
1941 may then have a lot of conflicts needing to be resolved by hand.) | |
1942 | |
1943 Script @samp{fixtypes.sh} follows: | |
1944 | |
1945 @example | |
1946 ----------------------------------- cut ------------------------------------ | |
1947 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" | |
1948 gr Memory_Count Bytecount $files | |
1949 gr Lstream_Data_Count Bytecount $files | |
1950 gr Element_Count Elemcount $files | |
1951 gr Hash_Code Hashcode $files | |
1952 gr extcount bytecount $files | |
1953 gr bufpos charbpos $files | |
1954 gr bytind bytebpos $files | |
1955 gr memind membpos $files | |
1956 gr bufbyte intbyte $files | |
1957 gr Extcount Bytecount $files | |
1958 gr Bufpos Charbpos $files | |
1959 gr Bytind Bytebpos $files | |
1960 gr Memind Membpos $files | |
1961 gr Bufbyte Intbyte $files | |
1962 gr EXTCOUNT BYTECOUNT $files | |
1963 gr BUFPOS CHARBPOS $files | |
1964 gr BYTIND BYTEBPOS $files | |
1965 gr MEMIND MEMBPOS $files | |
1966 gr BUFBYTE INTBYTE $files | |
1967 gr MEMORY_COUNT BYTECOUNT $files | |
1968 gr LSTREAM_DATA_COUNT BYTECOUNT $files | |
1969 gr ELEMENT_COUNT ELEMCOUNT $files | |
1970 gr HASH_CODE HASHCODE $files | |
1971 ----------------------------------- cut ------------------------------------ | |
1972 @end example | |
1973 | |
1974 The @samp{gr} script, and the scripts it uses, are documented in | |
1975 @file{README.global-renaming}, because if placed in this file they would | |
1976 need to have their @@ characters doubled, meaning you couldn't easily | |
1977 cut and paste from the source. | |
1978 | |
1979 In addition to those programs, I needed to fix up a few other | |
1980 things, particularly relating to the duplicate definitions of | |
1981 types, now that some types merged with others. Specifically: | |
1982 | |
1983 @enumerate | |
1984 @item | |
1985 in lisp.h, removed duplicate declarations of Bytecount. The changed | |
1986 code should now look like this: (In each code snippet below, the first | |
1987 and last lines are the same as the original, as are all lines outside of | |
1988 those lines. That allows you to locate the section to be replaced, and | |
1989 replace the stuff in that section, verifying that there isn't anything | |
1990 new added that would need to be kept.) | |
1991 | |
1992 @example | |
1993 --------------------------------- snip ------------------------------------- | |
1994 /* Counts of bytes or chars */ | |
1995 typedef EMACS_INT Bytecount; | |
1996 typedef EMACS_INT Charcount; | |
1997 | |
1998 /* Counts of elements */ | |
1999 typedef EMACS_INT Elemcount; | |
2000 | |
2001 /* Hash codes */ | |
2002 typedef unsigned long Hashcode; | |
2003 | |
2004 /* ------------------------ dynamic arrays ------------------- */ | |
2005 --------------------------------- snip ------------------------------------- | |
2006 @end example | |
2007 | |
2008 @item | |
2009 in lstream.h, removed duplicate declaration of Bytecount. Rewrote the | |
2010 comment about this type. The changed code should now look like this: | |
2011 | |
2012 @example | |
2013 --------------------------------- snip ------------------------------------- | |
2014 #endif | |
2015 | |
2016 /* There have been some arguments over what the type should be that | |
2017 specifies a count of bytes in a data block to be written out or read in, | |
2018 using Lstream_read(), Lstream_write(), and related functions. | |
2019 Originally it was long, which worked fine; Martin "corrected" these to | |
2020 size_t and ssize_t on the grounds that this is theoretically cleaner and | |
2021 is in keeping with the C standards. Unfortunately, this practice is | |
2022 horribly error-prone due to design flaws in the way that mixed | |
2023 signed/unsigned arithmetic happens. In fact, by doing this change, | |
2024 Martin introduced a subtle but fatal error that caused the operation of | |
2025 sending large mail messages to the SMTP server under Windows to fail. | |
2026 By putting all values back to be signed, avoiding any signed/unsigned | |
2027 mixing, the bug immediately went away. The type then in use was | |
2028 Lstream_Data_Count, so that it could be reverted cleanly if a vote came to | |
2029 that. Now it is Bytecount. | |
2030 | |
2031 Some earlier comments about why the type must be signed: This MUST BE | |
2032 SIGNED, since it also is used in functions that return the number of | |
2033 bytes actually read to or written from in an operation, and these | |
2034 functions can return -1 to signal error. | |
2035 | |
2036 Note that the standard Unix read() and write() functions define the | |
2037 count going in as a size_t, which is UNSIGNED, and the count going | |
2038 out as an ssize_t, which is SIGNED. This is a horrible design | |
2039 flaw. Not only is it highly likely to lead to logic errors when a | |
2040 -1 gets interpreted as a large positive number, but operations are | |
2041 bound to fail in all sorts of horrible ways when a number in the | |
2042 upper-half of the size_t range is passed in -- this number is | |
2043 unrepresentable as an ssize_t, so code that checks to see how many | |
2044 bytes are actually written (which is mandatory if you are dealing | |
2045 with certain types of devices) will get completely screwed up. | |
2046 | |
2047 --ben | |
2048 */ | |
2049 | |
2050 typedef enum lstream_buffering | |
2051 --------------------------------- snip ------------------------------------- | |
2052 @end example | |
2053 | |
2054 @item | |
2055 in dumper.c, there are four places, all inside of switch() statements, | |
2056 where XD_BYTECOUNT appears twice as a case tag. In each case, the two | |
2057 case blocks contain identical code, and you should *REMOVE THE SECOND* | |
2058 and leave the first. | |
2059 @end enumerate | |
2060 | |
2061 @node Text/Char Type Renaming | |
2062 @section Text/Char Type Renaming | |
2063 @cindex Text/Char Type Renaming | |
2064 @cindex type renaming, text/char | |
2065 @cindex renaming, text/char types | |
2066 | |
2067 The purpose of this was | |
2068 | |
2069 @enumerate | |
2070 @item | |
2071 To distinguish between ``charptr'' when it refers to operations on | |
2072 the pointer itself and when it refers to operations on text | |
2073 @item | |
2074 To use consistent naming for everything referring to internal format, i.e. | |
2075 @end enumerate | |
2076 | |
2077 @example | |
2078 Itext == text in internal format | |
2079 Ibyte == a byte in such text | |
2080 Ichar == a char as represented in internal character format | |
2081 @end example | |
2082 | |
2083 Thus e.g. | |
2084 | |
2085 @example | |
2086 set_charptr_emchar -> set_itext_ichar | |
2087 @end example | |
2088 | |
2089 This was done using a script like this: | |
2090 | |
2091 @example | |
2092 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" | |
2093 gr Intbyte Ibyte $files | |
2094 gr INTBYTE IBYTE $files | |
2095 gr intbyte ibyte $files | |
2096 gr EMCHAR ICHAR $files | |
2097 gr emchar ichar $files | |
2098 gr Emchar Ichar $files | |
2099 gr INC_CHARPTR INC_IBYTEPTR $files | |
2100 gr DEC_CHARPTR DEC_IBYTEPTR $files | |
2101 gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files | |
2102 gr valid_charptr valid_ibyteptr $files | |
2103 gr CHARPTR ITEXT $files | |
2104 gr charptr itext $files | |
2105 gr Charptr Itext $files | |
2106 @end example | |
2107 | |
2108 As noted above, @samp{gr} is documented in @file{README.global-renaming}. | |
2109 | |
2110 As in the integral-types change, there are pre and post tags before and | |
2111 after the change: | |
2112 | |
2113 @example | |
2114 pre-internal-format-textual-renaming | |
2115 post-internal-format-textual-renaming | |
2116 @end example | |
2117 | |
2118 When merging a large branch, follow the same sort of procedure | |
2119 documented above, using these tags -- essentially sync up to the pre | |
2120 tag, then apply the script yourself, then sync from the post tag to the | |
2121 present. You can probably do the same if you don't have a separate | |
2122 workspace, but do have lots of outstanding changes and you'd rather not | |
2123 just merge all the textual changes directly. Use something like this: | |
2124 | |
2125 (WARNING: I'm not a CVS guru; before trying this, or any large operation | |
2126 that might potentially mess things up, *DEFINITELY* make a backup of | |
2127 your existing workspace.) | |
2128 | |
2129 @example | |
2130 cup -r pre-internal-format-textual-renaming | |
2131 <apply script> | |
2132 cup -A -j post-internal-format-textual-renaming -j HEAD | |
2133 @end example | |
2134 | |
2135 This might also work: | |
2136 | |
2137 @example | |
2138 cup -j pre-internal-format-textual-renaming | |
2139 <apply script> | |
2140 cup -j post-internal-format-textual-renaming -j HEAD | |
2141 @end example | |
2142 | |
2143 ben | |
2144 | |
2145 The following is a script to go in the opposite direction: | |
2146 | |
2147 @example | |
2148 files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]" | |
2149 | |
2150 # Evidently Perl considers _ to be a word char ala \b, even though XEmacs | |
2151 # doesn't. We need to be careful here with ibyte/ichar because of words | |
2152 # like Richard, eicharlen(), multibyte, HIBYTE, etc. | |
2153 | |
2154 gr Ibyte Intbyte $files | |
2155 gr '\bIBYTE' INTBYTE $files | |
2156 gr '\bibyte' intbyte $files | |
2157 gr '\bICHAR' EMCHAR $files | |
2158 gr '\bichar' emchar $files | |
2159 gr '\bIchar' Emchar $files | |
2160 gr '\bIBYTEPTR' CHARPTR $files | |
2161 gr '\bibyteptr' charptr $files | |
2162 gr '\bITEXT' CHARPTR $files | |
2163 gr '\bitext' charptr $files | |
2164 gr '\bItext' Charptr $files | |
2165 | |
2166 gr '_IBYTE' _INTBYTE $files | |
2167 gr '_ibyte' _intbyte $files | |
2168 gr '_ICHAR' _EMCHAR $files | |
2169 gr '_ichar' _emchar $files | |
2170 gr '_Ichar' _Emchar $files | |
2171 gr '_IBYTEPTR' _CHARPTR $files | |
2172 gr '_ibyteptr' _charptr $files | |
2173 gr '_ITEXT' _CHARPTR $files | |
2174 gr '_itext' _charptr $files | |
2175 gr '_Itext' _Charptr $files | |
2176 @end example | |
2177 | |
2178 @node Rules When Writing New C Code, CVS Techniques, Major Textual Changes, Top | |
1850 @chapter Rules When Writing New C Code | 2179 @chapter Rules When Writing New C Code |
1851 @cindex writing new C code, rules when | 2180 @cindex writing new C code, rules when |
1852 @cindex C code, rules when writing new | 2181 @cindex C code, rules when writing new |
1853 @cindex code, rules when writing new C | 2182 @cindex code, rules when writing new C |
1854 | 2183 |