comparison man/internals/internals.texi @ 5096:e0587c615e8b
Updates to internals.texi
-------------------- ChangeLog entries follow: --------------------
man/ChangeLog addition:
2010-03-04 Ben Wing <ben@xemacs.org>
* internals/internals.texi (Top):
* internals/internals.texi (list-to-texinfo): Removed.
* internals/internals.texi (convert-list-to-texinfo): New.
* internals/internals.texi (table-to-texinfo): Removed.
* internals/internals.texi (convert-table-to-texinfo): New.
Update Lisp functions at top to newest versions.
* internals/internals.texi (A History of Emacs):
* internals/internals.texi (Through Version 18):
* internals/internals.texi (Lucid Emacs):
* internals/internals.texi (XEmacs):
* internals/internals.texi (The XEmacs Split):
* internals/internals.texi (Modules for Other Aspects of the Lisp Interpreter and Object System):
* internals/internals.texi (Introduction to Writing C Code):
* internals/internals.texi (Writing Good Comments):
* internals/internals.texi (Writing Macros):
* internals/internals.texi (Major Textual Changes):
* internals/internals.texi (Great Integral Type Renaming):
* internals/internals.texi (How to Regression-Test):
* internals/internals.texi (Creating a Branch):
* internals/internals.texi (Dynamic Arrays):
* internals/internals.texi (Allocation by Blocks):
* internals/internals.texi (mark_object):
* internals/internals.texi (gc_sweep):
* internals/internals.texi (Byte-Char Position Conversion):
* internals/internals.texi (Searching and Matching):
* internals/internals.texi (Introduction to Multilingual Issues #3):
* internals/internals.texi (Byte Types):
* internals/internals.texi (Different Ways of Seeing Internal Text):
* internals/internals.texi (Buffer Positions):
* internals/internals.texi (Basic internal-format APIs):
* internals/internals.texi (The DFC API):
* internals/internals.texi (General Guidelines for Writing Mule-Aware Code):
* internals/internals.texi (Mule-izing Code):
* internals/internals.texi (Locales):
* internals/internals.texi (More about code pages):
* internals/internals.texi (More about locales):
* internals/internals.texi (Unicode support under Windows):
* internals/internals.texi (The Frame):
* internals/internals.texi (The Non-Client Area):
* internals/internals.texi (The Client Area):
* internals/internals.texi (The Paned Area):
* internals/internals.texi (Text Areas):
* internals/internals.texi (The Displayable Area):
* internals/internals.texi (Event Queues):
* internals/internals.texi (Event Stream Callback Routines):
* internals/internals.texi (Focus Handling):
* internals/internals.texi (Future Work -- Autodetection):
Replace " with ``, '' (not complete, maybe about halfway through).
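Most of this change replaces straight double quotes with Texinfo's ``...'' pairs. The C sketch below shows the mechanical part of that substitution by alternating opening and closing quotes. It is only an illustration: the source does not describe how the replacement was actually done, and `texinfo_quotes` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of TEXT in which each
   straight double quote becomes Texinfo's `` (opening) or '' (closing),
   alternating, starting with an opening quote.  The caller frees the
   result.  Returns NULL if allocation fails. */
char *
texinfo_quotes (const char *text)
{
  size_t len = strlen (text);
  /* Each " expands to two characters, so 2 * len + 1 bytes always suffice. */
  char *out = malloc (2 * len + 1);
  char *p = out;
  int opening = 1;

  if (!out)
    return NULL;
  for (; *text; text++)
    {
      if (*text == '"')
        {
          char q = opening ? '`' : '\'';
          *p++ = q;
          *p++ = q;
          opening = !opening;
        }
      else
        *p++ = *text;
    }
  *p = '\0';
  return out;
}
```

Blind alternation is not safe on the whole file. Double quotes that must stay literal, such as the Lisp string delimiters in the helper functions near the top of the file, would be converted too, so each replacement still needs review.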
author | Ben Wing <ben@xemacs.org> |
---|---|
date | Thu, 04 Mar 2010 07:19:03 -0600 |
parents | 0ca81354c4c7 |
children | 7be849cb8828 |
5095:cb4f2e1bacc4 | 5096:e0587c615e8b |
---|---|
159 that has been formatted into ASCII lists and tables. | 159 that has been formatted into ASCII lists and tables. |
160 | 160 |
161 Note: to define these routines, put point after the end of the definition | 161 Note: to define these routines, put point after the end of the definition |
162 and type C-x C-e. | 162 and type C-x C-e. |
163 | 163 |
164 (defun list-to-texinfo (b e) | 164 (defun convert-list-to-texinfo (b e) |
165 "Convert the selected region from an ASCII list to a Texinfo list." | 165 "Convert the selected region from an ASCII list to a Texinfo list." |
166 (interactive "r") | 166 (interactive "r") |
167 (save-restriction | 167 (save-restriction |
168 (narrow-to-region b e) | 168 (narrow-to-region b e) |
169 (goto-char (point-min)) | 169 (goto-char (point-min)) |
170 (let ((dash-type "^ *-+ +") | 170 (let ((dash-type "^ *\\(-+\\|o\\) +") |
171 ;; allow single-letter numbering or roman numerals | 171 ;; allow single-letter numbering or roman numerals |
172 (letter-type "^ *[[(]?\\([a-zA-Z]\\|[IVXivx]+\\)[]).] +") | 172 (letter-type "^ *[[(]?\\([a-zA-Z]\\|[IVXivx]+\\)[]).] +") |
173 (num-type "^ *[[(]?[0-9]+[]).] +") | 173 (num-type "^ *[[(]?[0-9]+[]).] +") |
174 dash regexp) | 174 dash regexp) |
175 (save-excursion | 175 (save-excursion |
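Apart from the rename, the only change to `convert-list-to-texinfo` in this hunk is `dash-type`, which now also accepts `o` as a bullet. The C sketch below compares what the old and new patterns accept. It assumes a translation of the Emacs regexps into POSIX extended regular expressions, where Emacs `\\(` and `\\|` become ERE `(` and `|`. For patterns this simple the two dialects should behave the same. The matching uses the standard `regcomp`/`regexec` calls.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* POSIX ERE translations (an assumption) of the old and new dash-type
   regexps in convert-list-to-texinfo. */
static const char *const OLD_DASH_TYPE = "^ *-+ +";
static const char *const NEW_DASH_TYPE = "^ *(-+|o) +";

/* Return true if LINE starts with a bullet matched by PATTERN. */
static bool
bullet_matches (const char *pattern, const char *line)
{
  regex_t re;
  bool matched;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

One side effect of the new pattern is that a line beginning with the one-letter word "o" followed by a space is also treated as a bullet.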
237 (insert-char ?\ (- min (current-column))) | 237 (insert-char ?\ (- min (current-column))) |
238 (beginning-of-line) | 238 (beginning-of-line) |
239 (forward-char min)) | 239 (forward-char min)) |
240 (kill-rectangle b (point)))))) | 240 (kill-rectangle b (point)))))) |
241 | 241 |
242 (defun table-to-texinfo (b e) | 242 (defun convert-table-to-texinfo (b e) |
243 "Convert the selected region from an ASCII table to a Texinfo table. | 243 "Convert the selected region from an ASCII table to a Texinfo table. |
244 Assumes entries are separated by a blank line, and the first sexp in | 244 Assumes entries are separated by a blank line, and the first sexp in |
245 each entry is the table heading." | 245 each entry is the table heading." |
246 (interactive "r") | 246 (interactive "r") |
247 (save-restriction | 247 (save-restriction |
281 If the region is active, do the region; otherwise, go from point to the end | 281 If the region is active, do the region; otherwise, go from point to the end |
282 of the buffer. This query-replaces for various kinds of conventions used | 282 of the buffer. This query-replaces for various kinds of conventions used |
283 in text: @code{} surrounded by ` and ' or followed by a (); @strong{} | 283 in text: @code{} surrounded by ` and ' or followed by a (); @strong{} |
284 surrounded by *'s; @file{} something that looks like a file name." | 284 surrounded by *'s; @file{} something that looks like a file name." |
285 (interactive) | 285 (interactive) |
286 (if (and (not no-narrow) (region-active-p)) | 286 (save-excursion |
287 (save-restriction | 287 (if (and (not no-narrow) (region-active-p)) |
288 (narrow-to-region (region-beginning) (region-end)) | 288 (save-restriction |
289 (convert-text-to-texinfo t)) | 289 (narrow-to-region (region-beginning) (region-end)) |
290 (let ((p (point)) | 290 (goto-char (region-beginning)) |
291 (case-replace nil)) | 291 (zmacs-deactivate-region) |
292 (query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil) | 292 (convert-text-to-texinfo t)) |
293 (goto-char p) | 293 (let ((p (point)) |
294 (query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil) | 294 (case-replace nil)) |
295 (goto-char p) | 295 (message "Point is %d" (point)) |
296 (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil) | 296 (query-replace-regexp "`\\([^']+\\)'\\([^']\\)" "@code{\\1}\\2" nil) |
297 (goto-char p) | 297 (goto-char p) |
298 (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil) | 298 (query-replace-regexp "\\(\\Sw\\)\\*\\(\\(?:\\s_\\|\\sw\\)+\\)\\*\\([^A-Za-z.}]\\)" "\\1@strong{\\2}\\3" nil) |
299 ))) | 299 (goto-char p) |
| | 300 (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+()\\)\\([^}]\\)" "@code{\\1}\\3" nil) |
| | 301 (goto-char p) |
| | 302 (query-replace-regexp "\\(\\(\\s_\\|\\sw\\)+\\.[A-Za-z]+\\)\\([^A-Za-z.}]\\)" "@file{\\1}\\3" nil) |
| | 303 )))) |
300 | 304 |
301 4. Adding new sections: | 305 4. Adding new sections: |
302 ----------------------- | 306 ----------------------- |
303 | 307 |
304 NOTE: These are in the form of macros. #### FIXME Convert them to | 308 NOTE: These are in the form of macros. #### FIXME Convert them to |
1236 XEmacs is a powerful, customizable text editor and development | 1240 XEmacs is a powerful, customizable text editor and development |
1237 environment. It began in 1991 as Lucid Emacs, which was in turn | 1241 environment. It began in 1991 as Lucid Emacs, which was in turn |
1238 derived from GNU Emacs, a program written by Richard Stallman of the | 1242 derived from GNU Emacs, a program written by Richard Stallman of the |
1239 Free Software Foundation. GNU Emacs dates back to 1985 and was | 1243 Free Software Foundation. GNU Emacs dates back to 1985 and was |
1240 modelled after Unipress Emacs, an editor written by James Gosling in | 1244 modelled after Unipress Emacs, an editor written by James Gosling in |
1241 1981 and based on a series of other "Emacs"-like editors, including | 1245 1981 and based on a series of other ``Emacs''-like editors, including |
1242 EINE (EINE Is Not EMACS), c. 1976, by Dan Weinreb, which run on the | 1246 EINE (EINE Is Not EMACS), c. 1976, by Dan Weinreb, which run on the |
1243 MIT Lisp Machine and was the first Emacs written in Lisp; ZWEI (ZWEI | 1247 MIT Lisp Machine and was the first Emacs written in Lisp; ZWEI (ZWEI |
1244 Was EINE Initially), c. 1978, by Dan Weinreb and Mike McMahon; Multics | 1248 Was EINE Initially), c. 1978, by Dan Weinreb and Mike McMahon; Multics |
1245 Emacs, c. 1978, by Bernie Greenberg, which was written in MacLisp and | 1249 Emacs, c. 1978, by Bernie Greenberg, which was written in MacLisp and |
1246 also used Lisp as its extension language; and ZMACS, c. 1980, a direct | 1250 also used Lisp as its extension language; and ZMACS, c. 1980, a direct |
1247 descendant of ZWEI that on ran the Symbolics LM-2, LMI LispM, and | 1251 descendant of ZWEI that on ran the Symbolics LM-2, LMI LispM, and |
1248 later, TI Explorer (1983-1989). These in turn were inspired by the | 1252 later, TI Explorer (1983-1989). These in turn were inspired by the |
1249 first Emacs, a package called EMACS, written in 1976 by Richard | 1253 first Emacs, a package called EMACS, written in 1976 by Richard |
1250 Stallman, Guy Steele, and Dave Moon. This was a merger of TECMAC and | 1254 Stallman, Guy Steele, and Dave Moon. This was a merger of TECMAC and |
1251 TMACS, a pair of "TECO-macro realtime editors" written by Guy Steele, | 1255 TMACS, a pair of ``TECO-macro realtime editors'' written by Guy Steele, |
1252 Dave Moon, Richard Greenblatt, Charles Frankston, et al., and added a | 1256 Dave Moon, Richard Greenblatt, Charles Frankston, et al., and added a |
1253 dynamic loader and Meta-key cmds. It ran under ITS (the Incompatible | 1257 dynamic loader and Meta-key cmds. It ran under ITS (the Incompatible |
1254 Timesharing System) on a DEC PDP 10 and under TWENEX on a Tops-20 and | 1258 Timesharing System) on a DEC PDP 10 and under TWENEX on a Tops-20 and |
1255 was written in TECO and PDP 10 assembly. ITS was one of the first | 1259 was written in TECO and PDP 10 assembly. ITS was one of the first |
1256 time-sharing operating systems and dates back well before Unix. ITS, | 1260 time-sharing operating systems and dates back well before Unix. ITS, |
1284 M. Stallman (RMS) and James Gosling (the creator of Java); its extension | 1288 M. Stallman (RMS) and James Gosling (the creator of Java); its extension |
1285 language was known as @dfn{Mocklisp}. This version of Emacs-in-C formed | 1289 language was known as @dfn{Mocklisp}. This version of Emacs-in-C formed |
1286 the basis for the early versions of GNU Emacs and also for Gosling's | 1290 the basis for the early versions of GNU Emacs and also for Gosling's |
1287 Unipress Emacs, a commercial product. Because of bad blood between the | 1291 Unipress Emacs, a commercial product. Because of bad blood between the |
1288 two over the issue of commercialism, RMS pretty much disowned this | 1292 two over the issue of commercialism, RMS pretty much disowned this |
1289 collaboration, referring to it as "Gosling Emacs". | 1293 collaboration, referring to it as ``Gosling Emacs''. |
1290 | 1294 |
1291 At this point we pick up with a time line of events. (A broader timeline | 1295 At this point we pick up with a time line of events. (A broader timeline |
1292 is available at @uref{http://www.jwz.org/doc/emacs-timeline.html, | 1296 is available at @uref{http://www.jwz.org/doc/emacs-timeline.html, |
1293 ``Emacs Timeline''}.) | 1297 ``Emacs Timeline''}.) |
1294 | 1298 |
1575 redisplay code, preliminary I18N support, code merged from GNU Emacs | 1579 redisplay code, preliminary I18N support, code merged from GNU Emacs |
1576 19.8 beta) | 1580 19.8 beta) |
1577 @item | 1581 @item |
1578 Version 19.9 released January 12, 1994. (Scrollbars, Athena.) | 1582 Version 19.9 released January 12, 1994. (Scrollbars, Athena.) |
1579 @item | 1583 @item |
1580 Version 19.10 released May 27, 1994. (Uses `configure'; code merged | 1584 Version 19.10 released May 27, 1994. (Uses @code{configure}; code merged |
1581 from GNU Emacs 19.23 beta and further merging with Epoch 4.0) Known as | 1585 from GNU Emacs 19.23 beta and further merging with Epoch 4.0) Known as |
1582 "Lucid Emacs" when shipped by Lucid, and as "XEmacs" when shipped by | 1586 ``Lucid Emacs'' when shipped by Lucid, and as ``XEmacs'' when shipped by |
1583 Sun; but Lucid went out of business a few days later and it's unclear | 1587 Sun; but Lucid went out of business a few days later and it's unclear |
1584 very many copies of 19.10 were released by Lucid. (Last release by | 1588 very many copies of 19.10 were released by Lucid. (Last release by |
1585 Jamie Zawinski.) | 1589 Jamie Zawinski.) |
1586 @end itemize | 1590 @end itemize |
1587 | 1591 |
1887 rewritten redisplay, TTY support, multi-device support, device and | 1891 rewritten redisplay, TTY support, multi-device support, device and |
1888 console objects, specifiers, glyphs, toolbars, horizontal scrollbars, | 1892 console objects, specifiers, glyphs, toolbars, horizontal scrollbars, |
1889 Lucid scrollbar widget, 3-d modeline, stay-up Lucid menus, resizable | 1893 Lucid scrollbar widget, 3-d modeline, stay-up Lucid menus, resizable |
1890 minibuffer, echo area is a true buffer, MD5 hashing support, expanded | 1894 minibuffer, echo area is a true buffer, MD5 hashing support, expanded |
1891 menubar, redone menu specification format (including menu filters), | 1895 menubar, redone menu specification format (including menu filters), |
1892 rewritten extents, renamed "screen" to "frame", misc-user events, | 1896 rewritten extents, renamed ``screen'' to ``frame'', misc-user events, |
1893 rewritten face code, rewritten mouse code, warnings system, CL | 1897 rewritten face code, rewritten mouse code, warnings system, CL |
1894 backquote syntax, critical C-g, code merging with GNU Emacs 19.28. | 1898 backquote syntax, critical C-g, code merging with GNU Emacs 19.28. |
1895 New packages Hyperbole, OOBR, hm--html-menus, viper, lazy-lock, | 1899 New packages Hyperbole, OOBR, hm--html-menus, viper, lazy-lock, |
1896 ksh-mode, rsz-minibuf.) | 1900 ksh-mode, rsz-minibuf.) |
1897 @item | 1901 @item |
1935 version 20.4 released February 28, 1998. | 1939 version 20.4 released February 28, 1998. |
1936 @item | 1940 @item |
1937 version 21.0.60 released December 10, 1998. (The version naming scheme was | 1941 version 21.0.60 released December 10, 1998. (The version naming scheme was |
1938 changed at this point: [a] the second version number is odd for stable | 1942 changed at this point: [a] the second version number is odd for stable |
1939 versions, even for beta versions; [b] a third version number is added, | 1943 versions, even for beta versions; [b] a third version number is added, |
1940 replacing the "beta xxx" ending for beta versions and allowing for | 1944 replacing the ``beta xxx'' ending for beta versions and allowing for |
1941 periodic maintenance releases for stable versions. Therefore, 21.0 was | 1945 periodic maintenance releases for stable versions. Therefore, 21.0 was |
1942 never "officially" released; similarly for 21.2, etc.) | 1946 never ``officially'' released; similarly for 21.2, etc.) |
1943 @item | 1947 @item |
1944 version 21.0.61 released January 4, 1999. | 1948 version 21.0.61 released January 4, 1999. |
1945 @item | 1949 @item |
1946 version 21.0.63 released February 3, 1999. | 1950 version 21.0.63 released February 3, 1999. |
1947 @item | 1951 @item |
1953 @item | 1957 @item |
1954 version 21.0.67 released March 25, 1999. | 1958 version 21.0.67 released March 25, 1999. |
1955 @item | 1959 @item |
1956 version 21.1.2 released May 14, 1999. (This is the followup to 21.0.67. | 1960 version 21.1.2 released May 14, 1999. (This is the followup to 21.0.67. |
1957 The second version number was bumped to indicate the beginning of the | 1961 The second version number was bumped to indicate the beginning of the |
1958 "stable" series.) | 1962 ``stable'' series.) |
1959 @item | 1963 @item |
1960 version 21.1.3 released June 26, 1999. | 1964 version 21.1.3 released June 26, 1999. |
1961 @item | 1965 @item |
1962 version 21.1.4 released July 8, 1999. | 1966 version 21.1.4 released July 8, 1999. |
1963 @item | 1967 @item |
2043 @item | 2047 @item |
2044 version 21.2.39 released December 31, 2000. | 2048 version 21.2.39 released December 31, 2000. |
2045 @item | 2049 @item |
2046 version 21.2.40 released January 8, 2001. | 2050 version 21.2.40 released January 8, 2001. |
2047 @item | 2051 @item |
2048 version 21.2.41 "Polyhymnia" released January 17, 2001. | 2052 version 21.2.41 ``Polyhymnia'' released January 17, 2001. |
2049 @item | 2053 @item |
2050 version 21.2.42 "Poseidon" released January 20, 2001. | 2054 version 21.2.42 ``Poseidon'' released January 20, 2001. |
2051 @item | 2055 @item |
2052 version 21.2.43 "Terspichore" released January 26, 2001. | 2056 version 21.2.43 ``Terspichore'' released January 26, 2001. |
2053 @item | 2057 @item |
2054 version 21.2.44 "Thalia" released February 8, 2001. | 2058 version 21.2.44 ``Thalia'' released February 8, 2001. |
2055 @item | 2059 @item |
2056 version 21.2.45 "Thelxepeia" released February 23, 2001. | 2060 version 21.2.45 ``Thelxepeia'' released February 23, 2001. |
2057 @item | 2061 @item |
2058 version 21.2.46 "Urania" released March 21, 2001. | 2062 version 21.2.46 ``Urania'' released March 21, 2001. |
2059 @item | 2063 @item |
2060 version 21.2.47 "Zephir" released April 14, 2001. | 2064 version 21.2.47 ``Zephir'' released April 14, 2001. |
2061 @item | 2065 @item |
2062 XEmacs 21.4.0 "Solid Vapor" released April 16, 2001. | 2066 XEmacs 21.4.0 ``Solid Vapor'' released April 16, 2001. |
2063 @item | 2067 @item |
2064 XEmacs 21.4.1 "Copyleft" released April 19, 2001. | 2068 XEmacs 21.4.1 ``Copyleft'' released April 19, 2001. |
2065 @item | 2069 @item |
2066 XEmacs 21.4.2 "Developer-Friendly Unix APIs" released May 10, 2001. | 2070 XEmacs 21.4.2 ``Developer-Friendly Unix APIs'' released May 10, 2001. |
2067 @item | 2071 @item |
2068 XEmacs 21.4.3 "Academic Rigor" released May 17, 2001. | 2072 XEmacs 21.4.3 ``Academic Rigor'' released May 17, 2001. |
2069 @item | 2073 @item |
2070 XEmacs 21.4.4 "Artificial Intelligence" released July 28, 2001. | 2074 XEmacs 21.4.4 ``Artificial Intelligence'' released July 28, 2001. |
2071 @item | 2075 @item |
2072 XEmacs 21.4.5 "Civil Service" released October 23, 2001. | 2076 XEmacs 21.4.5 ``Civil Service'' released October 23, 2001. |
2073 @item | 2077 @item |
2074 XEmacs 21.4.6 "Common Lisp" released December 17, 2001. | 2078 XEmacs 21.4.6 ``Common Lisp'' released December 17, 2001. |
2075 @item | 2079 @item |
2076 XEmacs 21.4.7 "Economic Science" released May 4, 2002. | 2080 XEmacs 21.4.7 ``Economic Science'' released May 4, 2002. |
2077 @item | 2081 @item |
2078 XEmacs 21.4.8 "Honest Recruiter" released May 9, 2002. | 2082 XEmacs 21.4.8 ``Honest Recruiter'' released May 9, 2002. |
2079 @item | 2083 @item |
2080 XEmacs 21.4.9 "Informed Management" released August 23, 2002. | 2084 XEmacs 21.4.9 ``Informed Management'' released August 23, 2002. |
2081 @item | 2085 @item |
2082 XEmacs 21.4.10 "Military Intelligence" released November 2, 2002. | 2086 XEmacs 21.4.10 ``Military Intelligence'' released November 2, 2002. |
2083 @item | 2087 @item |
2084 XEmacs 21.4.11 "Native Windows TTY Support" released January 3, 2003. | 2088 XEmacs 21.4.11 ``Native Windows TTY Support'' released January 3, 2003. |
2085 @item | 2089 @item |
2086 XEmacs 21.4.12 "Portable Code" released January 15, 2003. | 2090 XEmacs 21.4.12 ``Portable Code'' released January 15, 2003. |
2087 @item | 2091 @item |
2088 XEmacs 21.4.13 "Rational FORTRAN" released May 25, 2003. | 2092 XEmacs 21.4.13 ``Rational FORTRAN'' released May 25, 2003. |
2089 @item | 2093 @item |
2090 XEmacs 21.4.14 "Reasonable Discussion" released September 3, 2003. | 2094 XEmacs 21.4.14 ``Reasonable Discussion'' released September 3, 2003. |
2091 @item | 2095 @item |
2092 XEmacs 21.4.15 "Security Through Obscurity" released February 2, 2004. | 2096 XEmacs 21.4.15 ``Security Through Obscurity'' released February 2, 2004. |
2093 @item | 2097 @item |
2094 XEmacs 21.4.16 "Successful IPO" released December 5, 2004. | 2098 XEmacs 21.4.16 ``Successful IPO'' released December 5, 2004. |
2095 @item | 2099 @item |
2096 version 21.5.0 "alfalfa" released April 18, 2001. | 2100 version 21.5.0 ``alfalfa'' released April 18, 2001. |
2097 @item | 2101 @item |
2098 version 21.5.1 "anise" released May 9, 2001. | 2102 version 21.5.1 ``anise'' released May 9, 2001. |
2099 @item | 2103 @item |
2100 version 21.5.2 "artichoke" released July 28, 2001. | 2104 version 21.5.2 ``artichoke'' released July 28, 2001. |
2101 @item | 2105 @item |
2102 version 21.5.3 "asparagus" released September 7, 2001. | 2106 version 21.5.3 ``asparagus'' released September 7, 2001. |
2103 @item | 2107 @item |
2104 version 21.5.4 "bamboo" released January 8, 2002. | 2108 version 21.5.4 ``bamboo'' released January 8, 2002. |
2105 @item | 2109 @item |
2106 version 21.5.5 "beets" released March 5, 2002. | 2110 version 21.5.5 ``beets'' released March 5, 2002. |
2107 @item | 2111 @item |
2108 version 21.5.6 "bok choi" released April 5, 2002. | 2112 version 21.5.6 ``bok choi'' released April 5, 2002. |
2109 @item | 2113 @item |
2110 version 21.5.7 "broccoflower" released July 2, 2002. | 2114 version 21.5.7 ``broccoflower'' released July 2, 2002. |
2111 @item | 2115 @item |
2112 version 21.5.8 "broccoli" released July 27, 2002. | 2116 version 21.5.8 ``broccoli'' released July 27, 2002. |
2113 @item | 2117 @item |
2114 version 21.5.9 "brussels sprouts" released August 30, 2002. | 2118 version 21.5.9 ``brussels sprouts'' released August 30, 2002. |
2115 @item | 2119 @item |
2116 version 21.5.10 "burdock" released January 4, 2003. | 2120 version 21.5.10 ``burdock'' released January 4, 2003. |
2117 @item | 2121 @item |
2118 version 21.5.11 "cabbage" released February 16, 2003. | 2122 version 21.5.11 ``cabbage'' released February 16, 2003. |
2119 @item | 2123 @item |
2120 version 21.5.12 "carrot" released April 24, 2003. | 2124 version 21.5.12 ``carrot'' released April 24, 2003. |
2121 @item | 2125 @item |
2122 version 21.5.13 "cauliflower" released May 10, 2003. | 2126 version 21.5.13 ``cauliflower'' released May 10, 2003. |
2123 @item | 2127 @item |
2124 version 21.5.14 "cassava" released June 1, 2003. | 2128 version 21.5.14 ``cassava'' released June 1, 2003. |
2125 @item | 2129 @item |
2126 version 21.5.15 "celery" released September 3, 2003. | 2130 version 21.5.15 ``celery'' released September 3, 2003. |
2127 @item | 2131 @item |
2128 version 21.5.16 "celeriac" released September 26, 2003. | 2132 version 21.5.16 ``celeriac'' released September 26, 2003. |
2129 @item | 2133 @item |
2130 version 21.5.17 "chayote" released March 22, 2004. | 2134 version 21.5.17 ``chayote'' released March 22, 2004. |
2131 @item | 2135 @item |
2132 version 21.5.18 "chestnut" released October 22, 2004. | 2136 version 21.5.18 ``chestnut'' released October 22, 2004. |
2133 @end itemize | 2137 @end itemize |
2134 | 2138 |
2135 @node The XEmacs Split, XEmacs from the Outside, A History of Emacs, Top | 2139 @node The XEmacs Split, XEmacs from the Outside, A History of Emacs, Top |
2136 @chapter The XEmacs Split | 2140 @chapter The XEmacs Split |
2137 @cindex XEmacs split | 2141 @cindex XEmacs split |
2151 to cooperate a bit with RMS, and the two versions of Emacs will merge. In | 2155 to cooperate a bit with RMS, and the two versions of Emacs will merge. In |
2152 fact there have been six to seven major attempts at merging, each running | 2156 fact there have been six to seven major attempts at merging, each running |
2153 hundreds of messages long and all of them coming from the XEmacs side. All | 2157 hundreds of messages long and all of them coming from the XEmacs side. All |
2154 have failed because they have eventually come to the same conclusion, which | 2158 have failed because they have eventually come to the same conclusion, which |
2155 is that RMS has no real interest in cooperation at all. If you work with | 2159 is that RMS has no real interest in cooperation at all. If you work with |
2156 him, you have to do it his way -- "my way or the highway". Specifically: | 2160 him, you have to do it his way -- ``my way or the highway''. Specifically: |
2157 | 2161 |
2158 @enumerate | 2162 @enumerate |
2159 @item | 2163 @item |
2160 | 2164 |
2161 RMS insists on having legal papers signed for every bit of code that goes | 2165 RMS insists on having legal papers signed for every bit of code that goes |
4046 zero or more Kanji characters followed by zero or more | 4050 zero or more Kanji characters followed by zero or more |
4047 Hiragana characters. | 4051 Hiragana characters. |
4048 @end display | 4052 @end display |
4049 | 4053 |
4050 Then, the problem is that now we can't say that a sequence of | 4054 Then, the problem is that now we can't say that a sequence of |
4051 word-constituents makes up a word. For instance, both Hiragana "A" | 4055 word-constituents makes up a word. For instance, both Hiragana ``A'' |
4052 and Kanji "KAN" are word-constituents but the sequence of these two | 4056 and Kanji ``KAN'' are word-constituents but the sequence of these two |
4053 letters can't be a single word. | 4057 letters can't be a single word. |
4054 | 4058 |
4055 So, we introduced Sextword for Japanese letters. | 4059 So, we introduced Sextword for Japanese letters. |
4056 @end quotation | 4060 @end quotation |
4057 | 4061 |
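The quoted rule, "zero or more Kanji characters followed by zero or more Hiragana characters", is what keeps Hiragana ``A'' followed by Kanji ``KAN'' from forming one word. The C sketch below shows that rule on Unicode code points. It is not how XEmacs implements Sextword. The function name `sextword_end` is hypothetical, and the classification simply uses the Unicode blocks for CJK Unified Ideographs (U+4E00 to U+9FFF) and Hiragana (U+3040 to U+309F).

```c
#include <stddef.h>
#include <stdint.h>

enum script { SCRIPT_OTHER, SCRIPT_KANJI, SCRIPT_HIRAGANA };

/* Coarse classification by Unicode block. */
static enum script
classify (uint32_t c)
{
  if (c >= 0x4E00 && c <= 0x9FFF)
    return SCRIPT_KANJI;
  if (c >= 0x3040 && c <= 0x309F)
    return SCRIPT_HIRAGANA;
  return SCRIPT_OTHER;
}

/* Hypothetical: return the index just past the word that starts at START
   in TEXT[0..LEN), using the rule Kanji* Hiragana*.  A Kanji that follows
   Hiragana therefore begins a new word.  If TEXT[START] is neither Kanji
   nor Hiragana, the result is START (an empty word). */
size_t
sextword_end (const uint32_t *text, size_t len, size_t start)
{
  size_t i = start;

  while (i < len && classify (text[i]) == SCRIPT_KANJI)
    i++;
  while (i < len && classify (text[i]) == SCRIPT_HIRAGANA)
    i++;
  return i;
}
```

With Hiragana U+3042 followed by Kanji U+6F22, the first word ends after one character. With the same two characters in the opposite order, they form a single word.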
5006 @item | 5010 @item |
5007 Any header-file declarations of the sort | 5011 Any header-file declarations of the sort |
5008 | 5012 |
5009 struct foobar; | 5013 struct foobar; |
5010 | 5014 |
5011 go into the "types" section of lisp.h. | 5015 go into the ``types'' section of @file{lisp.h}. |
5012 @end itemize | 5016 @end itemize |
5013 | 5017 |
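A declaration such as `struct foobar;` introduces only the structure tag. Code that sees it can pass `struct foobar *` pointers around without knowing the layout. The C sketch below shows this split in a single file. The header part corresponds to what would go into the types section of `lisp.h`, the rest to the one module that owns the type. The field `size` and the function `foobar_size` are hypothetical.

```c
/* Header part: only the tag is declared, so users can handle pointers
   without seeing the structure's fields. */
struct foobar;
int foobar_size (const struct foobar *f);

/* Owning module: the full definition and the code that touches fields. */
struct foobar
{
  int size;
};

int
foobar_size (const struct foobar *f)
{
  return f->size;
}
```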
5014 @node Writing New Modules, Working with Lisp Objects, Introduction to Writing C Code, Rules When Writing New C Code | 5018 @node Writing New Modules, Working with Lisp Objects, Introduction to Writing C Code, Rules When Writing New C Code |
5015 @section Writing New Modules | 5019 @section Writing New Modules |
5016 @cindex writing new modules | 5020 @cindex writing new modules |
5664 correct it or flag it as incorrect, as described in the previous | 5668 correct it or flag it as incorrect, as described in the previous |
5665 paragraph. Whenever you work on a section of code, @emph{always} make | 5669 paragraph. Whenever you work on a section of code, @emph{always} make |
5666 sure to update any comments to be correct -- or, at the very least, flag | 5670 sure to update any comments to be correct -- or, at the very least, flag |
5667 them as incorrect. | 5671 them as incorrect. |
5668 | 5672 |
5669 To indicate a "todo" or other problem, use four pound signs -- | 5673 To indicate a ``todo'' or other problem, use four pound signs -- |
5670 i.e. @samp{####}. | 5674 i.e. @samp{####}. |
5671 | 5675 |
5672 @node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code | 5676 @node Adding Global Lisp Variables, Writing Macros, Writing Good Comments, Rules When Writing New C Code |
5673 @section Adding Global Lisp Variables | 5677 @section Adding Global Lisp Variables |
5674 @cindex global Lisp variables, adding | 5678 @cindex global Lisp variables, adding |
5849 @enumerate | 5853 @enumerate |
5850 @item | 5854 @item |
5851 Anything that's an lvalue can be evaluated more than once. | 5855 Anything that's an lvalue can be evaluated more than once. |
5852 @item | 5856 @item |
5853 Macros where anything else can be evaluated more than once should | 5857 Macros where anything else can be evaluated more than once should |
5854 have the word "unsafe" in their name (exceptions may be made for | 5858 have the word ``unsafe'' in their name (exceptions may be made for |
5855 large sets of macros that evaluate arguments of certain types more | 5859 large sets of macros that evaluate arguments of certain types more |
5856 than once, e.g. struct buffer * arguments, when clearly indicated in | 5860 than once, e.g. struct buffer * arguments, when clearly indicated in |
5857 the macro documentation). These macros are generally meant to be | 5861 the macro documentation). These macros are generally meant to be |
5858 called only by other macros that have already stored the calling | 5862 called only by other macros that have already stored the calling |
5859 values in temporary variables. | 5863 values in temporary variables. |
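The C sketch below illustrates the two rules above. A macro that evaluates its argument twice is named with ``UNSAFE``, and a safe wrapper first stores the argument in a temporary and passes only that temporary to the unsafe macro. All names here are hypothetical. The `next_value` helper counts its calls so that multiple evaluation can be observed.

```c
/* Evaluates X more than once, so by the naming rule it says UNSAFE. */
#define ABS_UNSAFE(x) ((x) < 0 ? -(x) : (x))

/* Safe wrapper: X is evaluated exactly once into a temporary, and only
   the temporary is handed to the unsafe macro. */
#define ABS_INTO(out, x)                  \
  do {                                    \
    int abs_into_tmp_ = (x);              \
    (out) = ABS_UNSAFE (abs_into_tmp_);   \
  } while (0)

/* Side-effecting helper used to count evaluations. */
static int next_value_calls;

static int
next_value (void)
{
  next_value_calls++;
  return -5;
}
```

`ABS_UNSAFE (next_value ())` calls `next_value` twice, once in the test and once in the chosen branch. `ABS_INTO` calls it only once.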
5881 Capitalize macros doing stuff obviously impossible with (C) | 5885 Capitalize macros doing stuff obviously impossible with (C) |
5882 functions, e.g. directly modifying arguments as if they were passed by | 5886 functions, e.g. directly modifying arguments as if they were passed by |
5883 reference. | 5887 reference. |
5884 @item | 5888 @item |
5885 Capitalize macros that evaluate @strong{any} argument more than once regardless | 5889 Capitalize macros that evaluate @strong{any} argument more than once regardless |
5886 of whether that's "allowed" (e.g. buffer arguments). | 5890 of whether that's ``allowed'' (e.g. buffer arguments). |
5887 @item | 5891 @item |
5888 Capitalize macros that directly access a field in a Lisp_Object or | 5892 Capitalize macros that directly access a field in a Lisp_Object or |
5889 its equivalent underlying structure. In such cases, access through the | 5893 its equivalent underlying structure. In such cases, access through the |
5890 Lisp_Object precedes the macro with an X, and access through the underlying | 5894 Lisp_Object precedes the macro with an X, and access through the underlying |
5891 structure doesn't. | 5895 structure doesn't. |
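The C sketch below shows the X-prefix convention for field-access macros. `FOOBAR_SIZE` works on a pointer to the underlying structure, and `XFOOBAR_SIZE` takes the Lisp_Object. Both are capitalized because they access a field directly and can be used as lvalues. This is an assumption-laden stand-in: the `Lisp_Object` typedef here is a plain pointer, whereas XEmacs's real Lisp_Object and X... conversion macros are more involved (type tagging and checking). `struct foobar` and its `size` field are hypothetical.

```c
/* Simplified stand-in; not XEmacs's actual Lisp_Object representation. */
typedef void *Lisp_Object;

struct foobar
{
  int size;
};

/* Convert a Lisp_Object to the underlying structure. */
#define XFOOBAR(obj) ((struct foobar *) (obj))

/* Field access through the underlying structure: no X prefix. */
#define FOOBAR_SIZE(f) ((f)->size)

/* Field access through the Lisp_Object: the same name with an X prefix. */
#define XFOOBAR_SIZE(obj) FOOBAR_SIZE (XFOOBAR (obj))
```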
5936 a search-and-replace is done to change type names and such. Some people | 5940 a search-and-replace is done to change type names and such. Some people |
5937 disagree with such changes, and certainly if done without good reason | 5941 disagree with such changes, and certainly if done without good reason |
5938 will just lead to headaches. But it's important to keep the code clean | 5942 will just lead to headaches. But it's important to keep the code clean |
5939 and understandable, and consistent naming goes a long way towards this. | 5943 and understandable, and consistent naming goes a long way towards this. |
5940 | 5944 |
5941 An example of the right way to do this was the so-called "great integral | 5945 An example of the right way to do this was the so-called ``great integral |
5942 type renaming". | 5946 type renaming''. |
5943 | 5947 |
5944 @menu | 5948 @menu |
5945 * Great Integral Type Renaming:: | 5949 * Great Integral Type Renaming:: |
5946 * Text/Char Type Renaming:: | 5950 * Text/Char Type Renaming:: |
5947 @end menu | 5951 @end menu |
5964 @item | 5968 @item |
5965 All integral types that measure quantities of anything are signed. Some | 5969 All integral types that measure quantities of anything are signed. Some |
5966 people disagree vociferously with this, but their arguments are mostly | 5970 people disagree vociferously with this, but their arguments are mostly |
5967 theoretical, and are vastly outweighed by the practical headaches of | 5971 theoretical, and are vastly outweighed by the practical headaches of |
5968 mixing signed and unsigned values, and more importantly by the far | 5972 mixing signed and unsigned values, and more importantly by the far |
5969 increased likelihood of inadvertent bugs: Because of the broken "viral" | 5973 increased likelihood of inadvertent bugs: Because of the broken ``viral'' |
5970 nature of unsigned quantities in C (operations involving mixed | 5974 nature of unsigned quantities in C (operations involving mixed |
5971 signed/unsigned are done unsigned, when exactly the opposite is nearly | 5975 signed/unsigned are done unsigned, when exactly the opposite is nearly |
5972 always wanted), even a single error in declaring a quantity unsigned | 5976 always wanted), even a single error in declaring a quantity unsigned |
5973 that should be signed, or the even more subtle error of comparing | 5977 that should be signed, or the even more subtle error of comparing |
5974 signed and unsigned values and forgetting the necessary cast, can be | 5978 signed and unsigned values and forgetting the necessary cast, can be |
5975 catastrophic, as comparisons will yield wrong results. -Wsign-compare | 5979 catastrophic, as comparisons will yield wrong results. @samp{-Wsign-compare} |
5976 is turned on specifically to catch this, but this tends to result in a | 5980 is turned on specifically to catch this, but this tends to result in a |
5977 great number of warnings when mixing signed and unsigned, and the casts | 5981 great number of warnings when mixing signed and unsigned, and the casts |
5978 are annoying. More has been written on this elsewhere. | 5982 are annoying. More has been written on this elsewhere. |
5979 | 5983 |
5980 @item | 5984 @item |
5989 Type names should be relatively short (no more than 10 characters or | 5993 Type names should be relatively short (no more than 10 characters or |
5990 so), with the first letter capitalized and no underscores if they can at | 5994 so), with the first letter capitalized and no underscores if they can at |
5991 all be avoided. | 5995 all be avoided. |
5992 | 5996 |
5993 @item | 5997 @item |
5994 "count" == a zero-based measurement of some quantity. Includes sizes, | 5998 ``count'' == a zero-based measurement of some quantity. Includes sizes, |
5995 offsets, and indexes. | 5999 offsets, and indexes. |
5996 | 6000 |
5997 @item | 6001 @item |
5998 "bpos" == a one-based measurement of a position in a buffer. "Charbpos" | 6002 ``bpos'' == a one-based measurement of a position in a buffer. ``Charbpos'' |
5999 and "Bytebpos" count text in the buffer, rather than bytes in memory; | 6003 and ``Bytebpos'' count text in the buffer, rather than bytes in memory; |
6000 thus Bytebpos does not directly correspond to the memory representation. | 6004 thus Bytebpos does not directly correspond to the memory representation. |
6001 Use "Membpos" for this. | 6005 Use ``Membpos'' for this. |
6002 | 6006 |
6003 @item | 6007 @item |
6004 "Char" refers to internal-format characters, not to the C type "char", | 6008 ``Char'' refers to internal-format characters, not to the C type ``char'', |
6005 which is really a byte. | 6009 which is really a byte. |
6006 @end itemize | 6010 @end itemize |
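The signed/unsigned hazard in the list above is easy to reproduce in a small standalone C sketch. The function names are hypothetical and this is not XEmacs code. It relies only on the usual arithmetic conversions of standard C, under which the comparison -1 < 1u is performed in unsigned arithmetic.

```c
#include <assert.h>

/* -1 < 1u: under the usual arithmetic conversions the int -1 is
   converted to unsigned int (becoming UINT_MAX), so this returns 0.
   gcc's -Wsign-compare warns about exactly this comparison. */
int signed_vs_unsigned_compare (void)
{
  int neg = -1;
  unsigned int one = 1u;
  return neg < one;
}

/* The same comparison with both operands signed returns 1, as intended. */
int signed_vs_signed_compare (void)
{
  int neg = -1;
  int one = 1;
  return neg < one;
}

/* Summing backwards with a signed count.  With an unsigned index the
   condition "i >= 0" would always be true, and the loop would run off
   the front of the array.  A signed index stops after i == 0. */
long sum_backwards (const int *vals, long n)
{
  long total = 0;
  long i;

  assert (n >= 0);   /* a meaningful check only because n is signed */
  for (i = n - 1; i >= 0; i--)
    total += vals[i];
  return total;
}
```

With signed counts, a precondition such as n >= 0 can actually be asserted. With an unsigned type, the same check would be vacuously true.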
6007 | 6011 |
6008 For the actual name changes, see the script below. | 6012 For the actual name changes, see the script below. |
6009 | 6013 |
6094 #endif | 6098 #endif |
6095 | 6099 |
6096 /* There have been some arguments over what the type should be that | 6100 /* There have been some arguments over what the type should be that |
6097 specifies a count of bytes in a data block to be written out or read in, | 6101 specifies a count of bytes in a data block to be written out or read in, |
6098 using @code{Lstream_read()}, @code{Lstream_write()}, and related functions. | 6102 using @code{Lstream_read()}, @code{Lstream_write()}, and related functions. |
6099 Originally it was long, which worked fine; Martin "corrected" these to | 6103 Originally it was long, which worked fine; Martin ``corrected'' these to |
6100 size_t and ssize_t on the grounds that this is theoretically cleaner and | 6104 size_t and ssize_t on the grounds that this is theoretically cleaner and |
6101 is in keeping with the C standards. Unfortunately, this practice is | 6105 is in keeping with the C standards. Unfortunately, this practice is |
6102 horribly error-prone due to design flaws in the way that mixed | 6106 horribly error-prone due to design flaws in the way that mixed |
6103 signed/unsigned arithmetic happens. In fact, by doing this change, | 6107 signed/unsigned arithmetic happens. In fact, by doing this change, |
6104 Martin introduced a subtle but fatal error that caused the operation of | 6108 Martin introduced a subtle but fatal error that caused the operation of |
6469 fixed---use the @code{Known-Bug-Expect-Failure} wrapper macro to mark | 6473 fixed---use the @code{Known-Bug-Expect-Failure} wrapper macro to mark |
6470 them. | 6474 them. |
6471 | 6475 |
6472 @deffn Macro Known-Bug-Expect-Failure body | 6476 @deffn Macro Known-Bug-Expect-Failure body |
6473 Arrange for failing tests in @var{body} to generate messages prefixed | 6477 Arrange for failing tests in @var{body} to generate messages prefixed |
6474 with "KNOWN BUG:" instead of "FAIL:". @var{body} is a @code{progn}-like | 6478 with ``KNOWN BUG:'' instead of ``FAIL:''. @var{body} is a @code{progn}-like |
6475 body, and may contain several tests. | 6479 body, and may contain several tests. |
6476 @end deffn | 6480 @end deffn |
6477 | 6481 |
6478 A lot of the tests we run push limits; suppress Ebola warning messages | 6482 A lot of the tests we run push limits; suppress Ebola warning messages |
6479 with the @code{Ignore-Ebola} wrapper macro. | 6483 with the @code{Ignore-Ebola} wrapper macro. |
6650 with added or deleted files.} If you are lucky, the operation will | 6654 with added or deleted files.} If you are lucky, the operation will |
6651 simply fail. If you are less lucky, it will proceed, but make the | 6655 simply fail. If you are less lucky, it will proceed, but make the |
6652 adds and deletes on the main line, which you do not want at all. | 6656 adds and deletes on the main line, which you do not want at all. |
6653 Therefore, you must undo all adds and deletes. To find out what is | 6657 Therefore, you must undo all adds and deletes. To find out what is |
6654 added and deleted, use something like @code{cvs -n update >&! | 6658 added and deleted, use something like @code{cvs -n update >&! |
6655 cvs.out}, which does a "dry run". (You did make a backup copy first, | 6659 cvs.out}, which does a ``dry run''. (You did make a backup copy first, |
6656 right? What if you forgot the @samp{-n}, for example, and weren't | 6660 right? What if you forgot the @samp{-n}, for example, and weren't |
6657 prepared for the sudden onslaught of merging action?) Take a look at | 6661 prepared for the sudden onslaught of merging action?) Take a look at |
6658 the output file @file{cvs.out} and check very carefully for newly | 6662 the output file @file{cvs.out} and check very carefully for newly |
6659 added files (marked with an @samp{A}) and newly removed files (marked | 6663 added files (marked with an @samp{A}) and newly removed files (marked |
6660 with an @samp{R}). Double-check that your newly added files are in | 6664 with an @samp{R}). Double-check that your newly added files are in |
6682 crw tag -b ben-mule-21-5 | 6686 crw tag -b ben-mule-21-5 |
6683 @end example | 6687 @end example |
6684 | 6688 |
6685 Note that this doesn't actually do anything to your local workspace! | 6689 Note that this doesn't actually do anything to your local workspace! |
6686 It basically just creates another tag in the repository, identical to | 6690 It basically just creates another tag in the repository, identical to |
6687 the branch point tag but internally marked as a "branch tag" rather | 6691 the branch point tag but internally marked as a ``branch tag'' rather |
6688 than a regular tag. | 6692 than a regular tag. |
6689 | 6693 |
6690 @item | 6694 @item |
6691 Now, move your workspace onto the branch: | 6695 Now, move your workspace onto the branch: |
6692 | 6696 |
7016 and when you add a new element, the array automatically resizes itself | 7020 and when you add a new element, the array automatically resizes itself |
7017 if it isn't big enough. Dynarrs are extensively used in the redisplay | 7021 if it isn't big enough. Dynarrs are extensively used in the redisplay |
7018 mechanism. | 7022 mechanism. |
7019 | 7023 |
7020 | 7024 |
7021 A "dynamic array" is a contiguous array of fixed-size elements where there | 7025 A ``dynamic array'' is a contiguous array of fixed-size elements where there |
7022 is no upper limit (except available memory) on the number of elements in the | 7026 is no upper limit (except available memory) on the number of elements in the |
7023 array. Because the elements are maintained contiguously, space is used | 7027 array. Because the elements are maintained contiguously, space is used |
7024 efficiently (no per-element pointers necessary) and random access to a | 7028 efficiently (no per-element pointers necessary) and random access to a |
7025 particular element is in constant time. At any one point, the block of memory | 7029 particular element is in constant time. At any one point, the block of memory |
7026 that holds the array has an upper limit; if this limit is exceeded, the | 7030 that holds the array has an upper limit; if this limit is exceeded, the |
7027 memory is realloc()ed into a new array that is twice as big. Assuming that | 7031 memory is @code{realloc()}ed into a new array that is twice as big. Assuming that |
7028 the time to grow the array is on the order of the new size of the array | 7032 the time to grow the array is on the order of the new size of the array |
7029 block, this scheme has a provably constant amortized time (i.e. average | 7033 block, this scheme has a provably constant amortized time (i.e. average |
7030 time over all additions). | 7034 time over all additions). |
7031 | 7035 |
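The doubling scheme described above fits in a few lines of C. This is a simplified, hypothetical array of ints, not the XEmacs Dynarr interface. The initial capacity of 8 is an arbitrary choice made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified dynamic array of ints (not the XEmacs Dynarr API). */
typedef struct
{
  int *base;     /* contiguous element storage */
  long len;      /* number of elements in use */
  long max;      /* number of elements allocated */
} int_dynarr;

void int_dynarr_init (int_dynarr *d)
{
  d->base = NULL;
  d->len = 0;
  d->max = 0;
}

/* Append EL, doubling the allocation when it is full.  Across all
   appends, each element is copied O(1) times on average (amortized
   constant time).  Returns 0 on success, -1 if memory is exhausted. */
int int_dynarr_add (int_dynarr *d, int el)
{
  if (d->len == d->max)
    {
      long newmax = d->max ? 2 * d->max : 8;
      int *newbase = realloc (d->base, (size_t) newmax * sizeof (int));
      if (!newbase)
        return -1;
      d->base = newbase;
      d->max = newmax;
    }
  d->base[d->len++] = el;
  return 0;
}

/* Constant-time random access. */
int int_dynarr_at (const int_dynarr *d, long pos)
{
  assert (pos >= 0 && pos < d->len);
  return d->base[pos];
}

void int_dynarr_free (int_dynarr *d)
{
  free (d->base);
  int_dynarr_init (d);
}
```

Appending 1000 elements grows the capacity through 8, 16, ..., 1024. Only seven reallocations occur.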
7032 When you add elements or retrieve elements, pointers are used. Note that | 7036 When you add elements or retrieve elements, pointers are used. Note that |
7130 onto a linked list, so they can be efficiently reused. This data type | 7134 onto a linked list, so they can be efficiently reused. This data type |
7131 is not much used in XEmacs currently, because it's a fairly new | 7135 is not much used in XEmacs currently, because it's a fairly new |
7132 addition. | 7136 addition. |
7133 | 7137 |
7134 | 7138 |
7135 A "block-type object" is used to efficiently allocate and free blocks | 7139 A ``block-type object'' is used to efficiently allocate and free blocks |
7136 of a particular size. Freed blocks are remembered in a free list and | 7140 of a particular size. Freed blocks are remembered in a free list and |
7137 are reused as necessary to allocate new blocks, so as to avoid as | 7141 are reused as necessary to allocate new blocks, so as to avoid as |
7138 much as possible making calls to malloc() and free(). | 7142 much as possible making calls to @code{malloc()} and @code{free()}. |
7139 | 7143 |
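The following is a minimal sketch of the free-list idea, assuming that each pool has a single fixed block size. The names (block_pool, block_pool_alloc, and so on) are hypothetical and do not reproduce the XEmacs block-type macros. The mallocs counter exists only to make the reuse observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size block allocator (not the XEmacs block-type
   API): freed blocks go on a singly linked free list and are handed out
   again before malloc() is called for a new one. */
struct free_block
{
  struct free_block *next;
};

typedef struct
{
  size_t block_size;             /* size of every block handed out */
  struct free_block *free_list;
  long mallocs;                  /* how many times malloc() was called */
} block_pool;

void block_pool_init (block_pool *p, size_t size)
{
  /* Every free block must be able to hold the free-list link. */
  p->block_size = size < sizeof (struct free_block)
                  ? sizeof (struct free_block) : size;
  p->free_list = NULL;
  p->mallocs = 0;
}

void *block_pool_alloc (block_pool *p)
{
  if (p->free_list)
    {
      struct free_block *b = p->free_list;   /* reuse a freed block */
      p->free_list = b->next;
      return b;
    }
  p->mallocs++;
  return malloc (p->block_size);
}

void block_pool_free (block_pool *p, void *block)
{
  struct free_block *b = block;
  assert (b != NULL);
  b->next = p->free_list;                    /* push onto the free list */
  p->free_list = b;
}

/* Release everything on the free list back to the system. */
void block_pool_destroy (block_pool *p)
{
  while (p->free_list)
    {
      struct free_block *b = p->free_list;
      p->free_list = b->next;
      free (b);
    }
}
```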
7140 This is a container object. Declare a block-type object of a specific type | 7144 This is a container object. Declare a block-type object of a specific type |
7141 as follows: | 7145 as follows: |
7142 | 7146 |
7143 struct mytype_blocktype @{ | 7147 struct mytype_blocktype @{ |
8275 @code{this_one_is_unmarkable} in @code{alloc.c}). | 8279 @code{this_one_is_unmarkable} in @code{alloc.c}). |
8276 | 8280 |
8277 Now, the actual marking is feasible. We do so by using the macro | 8281 Now, the actual marking is feasible. We do so by using the macro |
8278 @code{MARK_RECORD_HEADER} to mark the object itself (actually the | 8282 @code{MARK_RECORD_HEADER} to mark the object itself (actually the |
8279 special flag in the lrecord header), and calling its special marker | 8283 special flag in the lrecord header), and calling its special marker |
8280 "method" @code{marker} if available. The marker method marks every | 8284 ``method'' @code{marker} if available. The marker method marks every |
8281 other object that is in reach from our current object. Note that these | 8285 other object that is in reach from our current object. Note that these |
8282 marker methods should not call @code{mark_object} recursively, but | 8286 marker methods should not call @code{mark_object} recursively, but |
8283 instead should return the next object from where further marking has to | 8287 instead should return the next object from where further marking has to |
8284 be performed. | 8288 be performed. |
8285 | 8289 |
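The return-the-next-object protocol can be illustrated with a hypothetical two-pointer object, loosely modeled on a cons cell. The sketch makes one assumption of its own: a marker may still handle all but one child with a nested call. Only the returned child is processed by the loop in mark_object, so a long chain of cdrs does not consume C stack in proportion to its length. Setting the marked field stands in for MARK_RECORD_HEADER.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-pointer object, loosely modeled on a cons cell. */
struct obj
{
  int marked;
  struct obj *car;
  struct obj *cdr;
};

void mark_object (struct obj *o);

/* Marker method: handles the car itself and returns the one object
   still to be marked (or NULL) instead of recursing on it. */
static struct obj *mark_cell (struct obj *o)
{
  assert (o != NULL);
  if (o->car)
    mark_object (o->car);   /* sketch assumption: nested call for the car */
  return o->cdr;            /* handed back to mark_object's loop */
}

/* Mark O and everything reachable from it.  Following the returned
   object in a loop means that a long cdr-chain uses constant C stack,
   and the marked check stops the loop on cycles. */
void mark_object (struct obj *o)
{
  while (o && !o->marked)
    {
      o->marked = 1;        /* stands in for MARK_RECORD_HEADER */
      o = mark_cell (o);
    }
}
```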
8330 @code{sweep_conses}, @code{sweep_bit_vectors_1}, | 8334 @code{sweep_conses}, @code{sweep_bit_vectors_1}, |
8331 @code{sweep_compiled_functions}, @code{sweep_floats}, | 8335 @code{sweep_compiled_functions}, @code{sweep_floats}, |
8332 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and | 8336 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and |
8333 @code{sweep_events}. They are the fixed-size types cons, floats, | 8337 @code{sweep_events}. They are the fixed-size types cons, floats, |
8334 compiled-functions, symbol, marker, extent, and event stored in | 8338 compiled-functions, symbol, marker, extent, and event stored in |
8335 so-called "frob blocks", and therefore we can basically do the same on | 8339 so-called ``frob blocks'', and therefore we can basically do the same on |
8336 every type of object, using the same macros, especially defined only to | 8340 every type of object, using the same macros, especially defined only to |
8337 handle everything with respect to fixed-size blocks. The only fixed-size | 8341 handle everything with respect to fixed-size blocks. The only fixed-size |
8338 type that is not handled here is the fixed-size portion of strings, | 8342 type that is not handled here is the fixed-size portion of strings, |
8339 because we took special care of them earlier. | 8343 because we took special care of them earlier. |
8340 | 8344 |
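The following sketch shows what sweeping one frob block amounts to, under simplifying assumptions. Each cell carries explicit in_use and marked flags, and freed cells are threaded onto a free list. The struct and function names are hypothetical, and the real macros in alloc.c differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_CELLS 8

/* Hypothetical fixed-size cell stored in a "frob block". */
struct cell
{
  int in_use;               /* currently allocated to the program */
  int marked;               /* set during the mark phase */
  struct cell *next_free;
};

struct frob_block
{
  struct cell cells[BLOCK_CELLS];
};

/* Sweep one block.  Unmarked live cells become free and are pushed onto
   *FREE_LIST.  Marked cells survive and have their mark bit cleared for
   the next collection.  Returns the number of cells freed. */
int sweep_block (struct frob_block *b, struct cell **free_list)
{
  int freed = 0;

  assert (free_list != NULL);
  for (int i = 0; i < BLOCK_CELLS; i++)
    {
      struct cell *c = &b->cells[i];
      if (!c->in_use)
        continue;
      if (c->marked)
        c->marked = 0;
      else
        {
          c->in_use = 0;
          c->next_free = *free_list;
          *free_list = c;
          freed++;
        }
    }
  return freed;
}
```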
10004 complicated depending on how much information we cache. In addition to | 10008 complicated depending on how much information we cache. In addition to |
10005 the known region, we always cache the correct conversions for point, | 10009 the known region, we always cache the correct conversions for point, |
10006 BEGV, and ZV, and in addition to this we cache 16 positions where the | 10010 BEGV, and ZV, and in addition to this we cache 16 positions where the |
10007 conversion is known. We only look in the cache or update it when we | 10011 conversion is known. We only look in the cache or update it when we |
10008 need to move the known region more than a certain amount (currently 50 | 10012 need to move the known region more than a certain amount (currently 50 |
10009 chars), and then we throw away a "random" value and replace it with the | 10013 chars), and then we throw away a ``random'' value and replace it with the |
10010 newly calculated value. | 10014 newly calculated value. |
10011 | 10015 |
10012 Finally, we maintain an extra flag that tracks whether the buffer is | 10016 Finally, we maintain an extra flag that tracks whether the buffer is |
10013 entirely ASCII, to speed up the conversions even more. This flag is | 10017 entirely ASCII, to speed up the conversions even more. This flag is |
10014 actually of dubious value because in an entirely-ASCII buffer the known | 10018 actually of dubious value because in an entirely-ASCII buffer the known |
10040 track of a shifter value (0, 1, or 2) indicating how much to shift. | 10044 track of a shifter value (0, 1, or 2) indicating how much to shift. |
10041 Multiplying by 3 can be implemented by doubling and then adding the | 10045 Multiplying by 3 can be implemented by doubling and then adding the |
10042 original value. Dividing by 3, alas, cannot be implemented in any | 10046 original value. Dividing by 3, alas, cannot be implemented in any |
10043 simple shift/subtract method, as far as I know; so we just do a table | 10047 simple shift/subtract method, as far as I know; so we just do a table |
10044 lookup. For simplicity, we use a table of size 128K, which indexes the | 10048 lookup. For simplicity, we use a table of size 128K, which indexes the |
10045 "divide-by-3" values for the first 64K non-negative numbers. (Note that | 10049 ``divide-by-3'' values for the first 64K non-negative numbers. (Note that |
10046 we can increase the size up to 384K, i.e. indexing the first 192K | 10050 we can increase the size up to 384K, i.e. indexing the first 192K |
10047 non-negative numbers, while still using shorts in the array.) This also | 10051 non-negative numbers, while still using shorts in the array.) This also |
10048 means that the size of the known region can be at most 64K for | 10052 means that the size of the known region can be at most 64K for |
10049 width-three characters. | 10053 width-three characters. |
10050 @end quotation | 10054 @end quotation |
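The arithmetic in the quotation above can be checked with a short C sketch. It multiplies by 3 as a shift plus an add, and divides by 3 through a 64K-entry table of unsigned shorts (128K bytes). The names are hypothetical, and the shift assumes non-negative inputs.

```c
#include <assert.h>

/* Table of N/3 for the first 64K non-negative N, stored as unsigned
   shorts: 64K entries * 2 bytes = 128K, matching the text. */
#define DIV3_ENTRIES 65536
static unsigned short div3_table[DIV3_ENTRIES];

void init_div3_table (void)
{
  for (long n = 0; n < DIV3_ENTRIES; n++)
    div3_table[n] = (unsigned short) (n / 3);
}

/* Multiply by 3 by doubling and adding: 3x == 2x + x (x >= 0). */
long times3 (long x)
{
  assert (x >= 0);
  return (x << 1) + x;
}

/* Divide by 3 via table lookup.  This is valid only for
   0 <= n < DIV3_ENTRIES, which is the bound the text places on the
   known region for width-three characters. */
long div3 (long n)
{
  assert (n >= 0 && n < DIV3_ENTRIES);
  return div3_table[n];
}
```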
10070 @item | 10074 @item |
10071 the position of the gap | 10075 the position of the gap |
10072 @item | 10076 @item |
10073 the last value we computed | 10077 the last value we computed |
10074 @item | 10078 @item |
10075 a set of positions that are "far away" from previously computed positions | 10079 a set of positions that are ``far away'' from previously computed positions |
10076 (5000 chars currently; #### perhaps should be smaller) | 10080 (5000 chars currently; #### perhaps should be smaller) |
10077 @end itemize | 10081 @end itemize |
10078 | 10082 |
10079 For each position, we @code{CONSIDER()} it. This means: | 10083 For each position, we @code{CONSIDER()} it. This means: |
10080 | 10084 |
10096 the simple loop in FSF with the use of @code{bytecount_to_charcount()}, | 10100 the simple loop in FSF with the use of @code{bytecount_to_charcount()}, |
10097 @code{charcount_to_bytecount()}, @code{bytecount_to_charcount_down()}, or | 10101 @code{charcount_to_bytecount()}, @code{bytecount_to_charcount_down()}, or |
10098 @code{charcount_to_bytecount_down()}. (The latter two I added for this purpose.) | 10102 @code{charcount_to_bytecount_down()}. (The latter two I added for this purpose.) |
10099 These scan 4 or 8 bytes at a time through purely single-byte characters. | 10103 These scan 4 or 8 bytes at a time through purely single-byte characters. |
10100 | 10104 |
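The word-at-a-time scan can be sketched as follows. The sketch assumes, as the text implies, that single-byte characters in the internal format are exactly the bytes below 0x80. Under that assumption, one mask test clears 8 bytes at once. The function name is hypothetical, and this is not the code of bytecount_to_charcount().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the number of leading bytes in BUF[0..LEN) that are below 0x80.
   Assumption: single-byte (ASCII) characters are exactly the bytes below
   0x80, so this is also the number of leading characters that need no
   further decoding. */
long count_leading_single_byte (const unsigned char *buf, long len)
{
  const uint64_t high_bits = 0x8080808080808080ULL;
  long i = 0;

  assert (len >= 0);
  /* Test 8 bytes per iteration; memcpy avoids unaligned loads. */
  while (len - i >= 8)
    {
      uint64_t w;
      memcpy (&w, buf + i, 8);
      if (w & high_bits)
        break;              /* some byte in this word is >= 0x80 */
      i += 8;
    }
  /* Finish the tail, or locate the first high byte, one byte at a time. */
  while (i < len && buf[i] < 0x80)
    i++;
  return i;
}
```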
10101 If the amount we had to scan was more than our "far away" distance (5000 | 10105 If the amount we had to scan was more than our ``far away'' distance (5000 |
10102 characters, see above), then cache the new position. | 10106 characters, see above), then cache the new position. |
10103 | 10107 |
10104 #### Things to do: | 10108 #### Things to do: |
10105 | 10109 |
10106 @itemize @bullet | 10110 @itemize @bullet |
10107 @item | 10111 @item |
10108 Look at the most recent GNU Emacs to see whether anything has changed. | 10112 Look at the most recent GNU Emacs to see whether anything has changed. |
10109 @item | 10113 @item |
10110 Think about whether it makes sense to try to implement some sort of | 10114 Think about whether it makes sense to try to implement some sort of |
10111 known region or list of "known regions", like we had before. This would | 10115 known region or list of ``known regions'', like we had before. This would |
10112 be a region of entirely single-byte characters that we can check very | 10116 be a region of entirely single-byte characters that we can check very |
10113 quickly. (Previously I used a range of same-width characters of any | 10117 quickly. (Previously I used a range of same-width characters of any |
10114 size; but this adds extra complexity and slows down the scanning, and is | 10118 size; but this adds extra complexity and slows down the scanning, and is |
10115 probably not worth it.) As part of the scanning process in | 10119 probably not worth it.) As part of the scanning process in |
10116 @code{bytecount_to_charcount()} et al., we skip over chunks of entirely | 10120 @code{bytecount_to_charcount()} et al., we skip over chunks of entirely |
10324 In terms of reading the actual code, there are five optimizations | 10328 In terms of reading the actual code, there are five optimizations |
10325 (obfuscations, if you like) that have been done. | 10329 (obfuscations, if you like) that have been done. |
10326 | 10330 |
10327 @enumerate | 10331 @enumerate |
10328 @item | 10332 @item |
10329 An explicit "failure stack" has been substituted for recursion. | 10333 An explicit ``failure stack'' has been substituted for recursion. |
10330 | 10334 |
10331 @item | 10335 @item |
10332 The @code{match_1_operator}, @code{next_p}, and @code{next_b} functions | 10336 The @code{match_1_operator}, @code{next_p}, and @code{next_b} functions |
10333 are actually inlined into the @code{match} function for efficiency. | 10337 are actually inlined into the @code{match} function for efficiency. |
10334 Then the pointer movement is interspersed with the matching operations. | 10338 Then the pointer movement is interspersed with the matching operations. |
10337 If the operator uses buffer context, the buffer pointer movement is | 10341 If the operator uses buffer context, the buffer pointer movement is |
10338 sometimes implicit in the operations retrieving the context. | 10342 sometimes implicit in the operations retrieving the context. |
10339 | 10343 |
10340 @item | 10344 @item |
10341 Some cases are combined: each gets a short case-specific preparation, | 10345 Some cases are combined: each gets a short case-specific preparation, |
10342 followed by a "fall-through" into code shared by several cases. | 10346 followed by a ``fall-through'' into code shared by several cases. |
10343 | 10347 |
10344 @item | 10348 @item |
10345 The @code{pattern} type is not an explicit @samp{struct}. Instead, the | 10349 The @code{pattern} type is not an explicit @samp{struct}. Instead, the |
10346 data (including, @emph{e.g.}, @samp{range_table}) is inlined into the | 10350 data (including, @emph{e.g.}, @samp{range_table}) is inlined into the |
10347 compiled bytecode. This leads to bizarre code in the interpreter like | 10351 compiled bytecode. This leads to bizarre code in the interpreter like |
10356 @example | 10360 @example |
10357 ..., 'range', count, first_8_flags, second_8_flags, ..., next_op, ... | 10361 ..., 'range', count, first_8_flags, second_8_flags, ..., next_op, ... |
10358 @end example | 10362 @end example |
10359 @end enumerate | 10363 @end enumerate |
10360 | 10364 |
10361 But if you keep your eye on the "switch in a loop" structure, you | 10365 But if you keep your eye on the ``switch in a loop'' structure, you |
10362 should be able to understand the parts you need. | 10366 should be able to understand the parts you need. |
10363 | 10367 |
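The "switch in a loop" shape and the explicit failure stack can be seen together in a toy matcher. The bytecode it runs is hypothetical: the opcodes were invented for this sketch and are unrelated to the XEmacs regex opcodes. Operands are stored inline after each opcode, in the same spirit as the range layout shown above. OP_SPLIT pushes a failure point instead of recursing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; operands follow each opcode inline. */
enum op { OP_CHAR, OP_ANY, OP_SPLIT, OP_JUMP, OP_MATCH };

#define FAIL_STACK_MAX 256

/* Return 1 if PROG matches a prefix of the NUL-terminated STR, else 0.
   Backtracking uses an explicit failure stack instead of recursion. */
int toy_match (const int *prog, const char *str)
{
  struct { int pc; const char *sp; } fail_stack[FAIL_STACK_MAX];
  int depth = 0;
  int pc = 0;
  const char *sp = str;

  for (;;)
    {
      switch (prog[pc])
        {
        case OP_CHAR:          /* operand: the character to match */
          if (*sp == prog[pc + 1])
            { sp++; pc += 2; continue; }
          goto fail;
        case OP_ANY:           /* any character except end of string */
          if (*sp)
            { sp++; pc += 1; continue; }
          goto fail;
        case OP_SPLIT:         /* operands: preferred target, fallback */
          if (depth == FAIL_STACK_MAX)
            return 0;          /* sketch: treat overflow as no match */
          fail_stack[depth].pc = prog[pc + 2];
          fail_stack[depth].sp = sp;
          depth++;
          pc = prog[pc + 1];
          continue;
        case OP_JUMP:          /* operand: target */
          pc = prog[pc + 1];
          continue;
        case OP_MATCH:
          return 1;
        default:
          assert (0 && "unknown opcode");
          return 0;
        }
    fail:
      if (depth == 0)
        return 0;
      depth--;                 /* resume at the most recent failure point */
      pc = fail_stack[depth].pc;
      sp = fail_stack[depth].sp;
    }
}
```

For example, the program { OP_CHAR, 'a', OP_SPLIT, 5, 8, OP_ANY, OP_JUMP, 2, OP_CHAR, 'b', OP_MATCH } corresponds to the pattern a.*b. It matches "axxb" by first consuming greedily and then popping failure points until 'b' lines up.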
10364 @node Multilingual Support, Consoles; Devices; Frames; Windows, Text, Top | 10368 @node Multilingual Support, Consoles; Devices; Frames; Windows, Text, Top |
10365 @chapter Multilingual Support | 10369 @chapter Multilingual Support |
10366 @cindex Mule character sets and encodings | 10370 @cindex Mule character sets and encodings |
10818 a simple charset like ASCII, there is only one encoding normally used -- | 10822 a simple charset like ASCII, there is only one encoding normally used -- |
10819 each character is represented by a single byte, with the same value as | 10823 each character is represented by a single byte, with the same value as |
10820 its code point. For more complicated charsets, however, things are not | 10824 its code point. For more complicated charsets, however, things are not |
10821 so obvious. Unicode version 2, for example, is a large charset with | 10825 so obvious. Unicode version 2, for example, is a large charset with |
10822 thousands of characters, each indexed by a 16-bit number, often | 10826 thousands of characters, each indexed by a 16-bit number, often |
10823 represented in hex, e.g. 0x05D0 for the Hebrew letter "aleph". One | 10827 represented in hex, e.g. 0x05D0 for the Hebrew letter ``aleph''. One |
10824 obvious encoding uses two bytes per character (actually two encodings, | 10828 obvious encoding uses two bytes per character (actually two encodings, |
10825 depending on which of the two possible byte orderings is chosen). This | 10829 depending on which of the two possible byte orderings is chosen). This |
10826 encoding is convenient for internal processing of Unicode text; however, | 10830 encoding is convenient for internal processing of Unicode text; however, |
10827 it's incompatible with ASCII, so a different encoding, e.g. UTF-8, is | 10831 it's incompatible with ASCII, so a different encoding, e.g. UTF-8, is |
10828 usually used for external text, for example files or e-mail. UTF-8 | 10832 usually used for external text, for example files or e-mail. UTF-8 |
10839 | 10843 |
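The encodings mentioned above can be compared by encoding U+05D0 (aleph) each way. This sketch handles only BMP code points outside the surrogate range, and its function names are hypothetical. It shows that the two UTF-16 byte orders differ, and that UTF-8 keeps ASCII as single bytes while aleph becomes D7 90.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a BMP code point (below 0x10000, not a surrogate).  Each
   function writes into OUT and returns the number of bytes used. */
size_t encode_utf16be (unsigned int cp, unsigned char *out)
{
  assert (cp < 0x10000);
  out[0] = (unsigned char) (cp >> 8);     /* high byte first */
  out[1] = (unsigned char) (cp & 0xFF);
  return 2;
}

size_t encode_utf16le (unsigned int cp, unsigned char *out)
{
  assert (cp < 0x10000);
  out[0] = (unsigned char) (cp & 0xFF);   /* low byte first */
  out[1] = (unsigned char) (cp >> 8);
  return 2;
}

size_t encode_utf8 (unsigned int cp, unsigned char *out)
{
  assert (cp < 0x10000);
  if (cp < 0x80)              /* ASCII stays a single identical byte */
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)             /* 11 bits: 110xxxxx 10xxxxxx */
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
  out[0] = (unsigned char) (0xE0 | (cp >> 12));
  out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}
```

Note that UTF-16 turns even plain 'A' into two bytes, one of them zero. This is why the two-byte encodings are not ASCII-compatible.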
10840 In an ASCII or single-European-character-set world, life is very simple. | 10844 In an ASCII or single-European-character-set world, life is very simple. |
10841 There are 256 characters, and each character is represented using the | 10845 There are 256 characters, and each character is represented using the |
10842 numbers 0 through 255, which fit into a single byte. With a few | 10846 numbers 0 through 255, which fit into a single byte. With a few |
10843 exceptions (such as case-changing operations or syntax classes like | 10847 exceptions (such as case-changing operations or syntax classes like |
10844 'whitespace'), "text" is simply an array of indices into a font. You | 10848 @code{whitespace}), ``text'' is simply an array of indices into a font. You |
10845 can get different languages simply by choosing fonts with different | 10849 can get different languages simply by choosing fonts with different |
10846 8-bit character sets (ISO-8859-1, -2, special-symbol fonts, etc.), and | 10850 8-bit character sets (ISO-8859-1, -2, special-symbol fonts, etc.), and |
10847 everything will "just work" as long as anyone else receiving your text | 10851 everything will ``just work'' as long as anyone else receiving your text |
10848 uses a compatible font. | 10852 uses a compatible font. |
10849 | 10853 |
10850 In the multi-lingual world, however, it is much more complicated. There | 10854 In the multi-lingual world, however, it is much more complicated. There |
10851 are a great number of different characters which are organized in a | 10855 are a great number of different characters which are organized in a |
10852 complex fashion into various character sets. The representation to use | 10856 complex fashion into various character sets. The representation to use |
10892 text as possible. No operations should ever be performed on text encoded | 10896 text as possible. No operations should ever be performed on text encoded |
10893 in an external representation other than simple copying, because no | 10897 in an external representation other than simple copying, because no |
10894 assumptions can reliably be made about the format of this text. You | 10898 assumptions can reliably be made about the format of this text. You |
10895 cannot assume, for example, that the end of text is terminated by a null | 10899 cannot assume, for example, that the end of text is terminated by a null |
10896 byte. (For example, if the text is Unicode, it will have many null bytes | 10900 byte. (For example, if the text is Unicode, it will have many null bytes |
10897 in it.) You cannot find the next "slash" character by searching through | 10901 in it.) You cannot find the next ``slash'' character by searching through |
10898 the bytes until you find a byte that looks like a "slash" character, | 10902 the bytes until you find a byte that looks like a ``slash'' character, |
10899 because it might actually be the second byte of a Kanji character. | 10903 because it might actually be the second byte of a Kanji character. |
10900 Furthermore, all text in the internal representation must be converted, | 10904 Furthermore, all text in the internal representation must be converted, |
10901 even if it is known to be completely ASCII, because the external | 10905 even if it is known to be completely ASCII, because the external |
10902 representation may not be ASCII compatible (for example, if it is | 10906 representation may not be ASCII compatible (for example, if it is |
10903 Unicode). | 10907 Unicode). |
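Both pitfalls can be demonstrated with UTF-16, which the paragraph also mentions, rather than with the Kanji case, because the UTF-16 bytes are easy to verify. U+2F00 (KANGXI RADICAL ONE) is the byte pair 00 2F in UTF-16LE. As a result, a byte-wise search reports a slash that is not there, and strlen() stops at the very first byte. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* UTF-16LE bytes for U+2F00 (KANGXI RADICAL ONE) followed by U+0041
   ('A').  Neither character is a slash, yet the bytes contain both
   0x2F ('/') and 0x00. */
static const unsigned char utf16le_text[] = { 0x00, 0x2F, 0x41, 0x00 };

/* Naive byte-oriented search: byte index of the first '/', or -1. */
long naive_find_slash (const unsigned char *buf, long len)
{
  const void *hit = memchr (buf, '/', (size_t) len);
  return hit ? (long) ((const unsigned char *) hit - buf) : -1;
}

/* Unit-aware search over 16-bit little-endian units: index (in units)
   of the first U+002F, or -1. */
long utf16le_find_slash (const unsigned char *buf, long len)
{
  assert (len % 2 == 0);
  for (long i = 0; i + 1 < len; i += 2)
    if ((buf[i] | (buf[i + 1] << 8)) == 0x002F)
      return i / 2;
  return -1;
}
```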
10923 the structures of a particular external encoding and the methods required | 10927 the structures of a particular external encoding and the methods required |
10924 to convert to and from this encoding. A facility exists to create coding | 10928 to convert to and from this encoding. A facility exists to create coding |
10925 system aliases, which in essence gives a single coding system two | 10929 system aliases, which in essence gives a single coding system two |
10926 different names. It is effectively used in XEmacs to provide a layer of | 10930 different names. It is effectively used in XEmacs to provide a layer of |
10927 abstraction on top of the actual coding systems. For example, the coding | 10931 abstraction on top of the actual coding systems. For example, the coding |
10928 system alias "file-name" points to whichever coding system is currently | 10932 system alias ``file-name'' points to whichever coding system is currently |
10929 used for encoding and decoding file names as passed to or retrieved from | 10933 used for encoding and decoding file names as passed to or retrieved from |
10930 system calls. In general, the actual encoding will differ from system to | 10934 system calls. In general, the actual encoding will differ from system to |
10931 system, and also on the particular locale that the user is in. The use | 10935 system, and also on the particular locale that the user is in. The use |
10932 of the file-name alias effectively hides that implementation detail behind | 10936 of the file-name alias effectively hides that implementation detail behind |
10933 an abstract interface layer which provides a unified set of | 10937 an abstract interface layer which provides a unified set of |
11434 C = plain char, when the base type is unsigned | 11438 C = plain char, when the base type is unsigned |
11435 U = unsigned | 11439 U = unsigned |
11436 S = signed | 11440 S = signed |
11437 @end example | 11441 @end example |
11438 | 11442 |
11439 (Formerly I had a comment saying that type (e) "should be replaced with | 11443 (Formerly I had a comment saying that type (e) ``should be replaced with |
11440 void *". However, there are in fact many places where an unsigned char | 11444 void *''. However, there are in fact many places where an unsigned char |
11441 * might be used -- e.g. for ease in pointer computation, since void * | 11445 * might be used -- e.g. for ease in pointer computation, since void * |
11442 doesn't allow this, and for compatibility with external APIs.) | 11446 doesn't allow this, and for compatibility with external APIs.) |
11443 | 11447 |
11444 Note that these typedefs are purely for documentation purposes; from | 11448 Note that these typedefs are purely for documentation purposes; from |
11445 the C code's perspective, they are exactly equivalent to @code{char *}, | 11449 the C code's perspective, they are exactly equivalent to @code{char *}, |
11456 @node Different Ways of Seeing Internal Text, Buffer Positions, Byte Types, Byte/Character Types; Buffer Positions; Other Typedefs | 11460 @node Different Ways of Seeing Internal Text, Buffer Positions, Byte Types, Byte/Character Types; Buffer Positions; Other Typedefs |
11457 @subsection Different Ways of Seeing Internal Text | 11461 @subsection Different Ways of Seeing Internal Text |
11458 @cindex different ways of seeing internal text | 11462 @cindex different ways of seeing internal text |
11459 | 11463 |
11460 There are various ways of representing internal text. The two primary | 11464 There are various ways of representing internal text. The two primary |
11461 ways are as an "array" of individual characters and as a | 11465 ways are as an ``array'' of individual characters and as a |
11462 "stream" of bytes. In the ASCII world, where there are only 256 | 11466 ``stream'' of bytes. In the ASCII world, where there are only 256 |
11463 characters at most, things are easy because each character fits into a | 11467 characters at most, things are easy because each character fits into a |
11464 byte. In general, however, this is not true -- see the above discussion | 11468 byte. In general, however, this is not true -- see the above discussion |
11465 of characters vs. encodings. | 11469 of characters vs. encodings. |
11466 | 11470 |
11467 In some cases, it's also important to distinguish between a stream | 11471 In some cases, it's also important to distinguish between a stream |
11468 representation as a series of bytes and as a series of textual units. | 11472 representation as a series of bytes and as a series of textual units. |
11469 This is particularly important wrt Unicode. The UTF-16 representation | 11473 This is particularly important wrt Unicode. The UTF-16 representation |
11470 (sometimes referred to, rather sloppily, as simply the "Unicode" format) | 11474 (sometimes referred to, rather sloppily, as simply the ``Unicode'' format) |
11471 represents text as a series of 16-bit units. Mostly, each unit | 11475 represents text as a series of 16-bit units. Mostly, each unit |
11472 corresponds to a single character, but not necessarily, as characters | 11476 corresponds to a single character, but not necessarily, as characters |
11473 outside of the range 0-65535 (the BMP or "Basic Multilingual Plane" of | 11477 outside of the range 0-65535 (the BMP or ``Basic Multilingual Plane'' of |
11474 Unicode) require two 16-bit units, through the mechanism of | 11478 Unicode) require two 16-bit units, through the mechanism of |
11475 "surrogates". When a series of 16-bit units is serialized into a byte | 11479 ``surrogates''. When a series of 16-bit units is serialized into a byte |
11476 stream, there are at least two possible representations, little-endian | 11480 stream, there are at least two possible representations, little-endian |
11477 and big-endian, and which one is used may depend on the native format of | 11481 and big-endian, and which one is used may depend on the native format of |
11478 16-bit integers in the CPU of the machine that XEmacs is running | 11482 16-bit integers in the CPU of the machine that XEmacs is running |
11479 on. (Similarly, UTF-32 is logically a representation with 32-bit textual | 11483 on. (Similarly, UTF-32 is logically a representation with 32-bit textual |
11480 units.) | 11484 units.) |
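The following minimal C sketch illustrates the surrogate mechanism and the byte-order issue described above. It is not XEmacs code, and the `sketch_` function names are made up for this example. It encodes one code point into UTF-16 units and then serializes those units in either byte order. It assumes the input should be a Unicode scalar value, so it rejects anything above 0x10FFFF and anything in the surrogate range itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16.  Returns the number of 16-bit units
   written (1 or 2), or 0 if CP is above 0x10FFFF or is itself a
   surrogate value. */
static size_t sketch_utf16_encode(uint32_t cp, uint16_t out[2])
{
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;                    /* BMP: a single unit */
      return 1;
    }
  cp -= 0x10000;                                 /* 20 bits remain */
  out[0] = (uint16_t) (0xD800 + (cp >> 10));     /* high surrogate */
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));   /* low surrogate */
  return 2;
}

/* Serialize N 16-bit units into BYTES, big-endian if BIG_ENDIAN is
   nonzero and little-endian otherwise.  Returns the byte count. */
static size_t sketch_utf16_serialize(const uint16_t *units, size_t n,
                                     int big_endian, unsigned char *bytes)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      unsigned char hi = (unsigned char) (units[i] >> 8);
      unsigned char lo = (unsigned char) (units[i] & 0xFF);
      bytes[2 * i]     = big_endian ? hi : lo;
      bytes[2 * i + 1] = big_endian ? lo : hi;
    }
  return 2 * n;
}
```

For example, U+1F600 lies outside the BMP and becomes the surrogate pair 0xD83D 0xDE00. That pair serializes as D8 3D DE 00 big-endian and as 3D D8 00 DE little-endian. The same two 16-bit units therefore produce two different byte streams.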
11487 @item | 11491 @item |
11488 UTF-16 has 2-byte (16-bit) units. | 11492 UTF-16 has 2-byte (16-bit) units. |
11489 @item | 11493 @item |
11490 UTF-32 has 4-byte (32-bit) units. | 11494 UTF-32 has 4-byte (32-bit) units. |
11491 @item | 11495 @item |
11492 XEmacs-internal encoding (the old "Mule" encoding) has 1-byte (8-bit) | 11496 XEmacs-internal encoding (the old ``Mule'' encoding) has 1-byte (8-bit) |
11493 units. | 11497 units. |
11494 @item | 11498 @item |
11495 UTF-7 technically has 7-bit units that are within the "mail-safe" range | 11499 UTF-7 technically has 7-bit units that are within the ``mail-safe'' range |
11496 (ASCII 32 - 126 plus a few control characters), but normally is encoded | 11500 (ASCII 32 - 126 plus a few control characters), but normally is encoded |
11497 in an 8-bit stream. (UTF-7 is also a modal encoding, since it has a | 11501 in an 8-bit stream. (UTF-7 is also a modal encoding, since it has a |
11498 normal mode where printable ASCII characters represent themselves and a | 11502 normal mode where printable ASCII characters represent themselves and a |
11499 shifted mode, introduced with a plus sign, where a base-64 encoding is | 11503 shifted mode, introduced with a plus sign, where a base-64 encoding is |
11500 used.) | 11504 used.) |
11555 @table @code | 11559 @table @code |
11556 @item Ibyte | 11560 @item Ibyte |
11557 The data in a buffer or string is logically made up of Ibyte objects, | 11561 The data in a buffer or string is logically made up of Ibyte objects, |
11558 where an Ibyte takes up the same amount of space as a char. (It is | 11562 where an Ibyte takes up the same amount of space as a char. (It is |
11559 declared differently, though, to catch invalid usages.) Strings stored | 11563 declared differently, though, to catch invalid usages.) Strings stored |
11560 using Ibytes are said to be in "internal format". The important | 11564 using Ibytes are said to be in ``internal format''. The important |
11561 characteristics of internal format are | 11565 characteristics of internal format are |
11562 | 11566 |
11563 @itemize @minus | 11567 @itemize @minus |
11564 @item | 11568 @item |
11565 ASCII characters are represented as a single Ibyte, in the range 0 - | 11569 ASCII characters are represented as a single Ibyte, in the range 0 - |
11608 | 11612 |
11609 This means that Ichar values are upwardly compatible with the standard | 11613 This means that Ichar values are upwardly compatible with the standard |
11610 8-bit representation of ASCII/ISO-8859-1. | 11614 8-bit representation of ASCII/ISO-8859-1. |
11611 | 11615 |
11612 @item Extbyte | 11616 @item Extbyte |
11613 Strings that go in or out of Emacs are in "external format", typedef'ed | 11617 Strings that go in or out of Emacs are in ``external format'', typedef'ed |
11614 as an array of char or a char *. There is more than one external format | 11618 as an array of char or a char *. There is more than one external format |
11615 (JIS, EUC, etc.) but they all have similar properties. They are modal | 11619 (JIS, EUC, etc.) but they all have similar properties. They are modal |
11616 encodings, which is to say that the meaning of particular bytes is not | 11620 encodings, which is to say that the meaning of particular bytes is not |
11617 fixed but depends on what "mode" the string is currently in (e.g. bytes | 11621 fixed but depends on what ``mode'' the string is currently in (e.g. bytes |
11618 in the range 0 - 0x7f might be interpreted as ASCII, or as Hiragana, or | 11622 in the range 0 - 0x7f might be interpreted as ASCII, or as Hiragana, or |
11619 as 2-byte Kanji, depending on the current mode). The mode starts out in | 11623 as 2-byte Kanji, depending on the current mode). The mode starts out in |
11620 ASCII/ISO-8859-1 and is switched using escape sequences -- for example, | 11624 ASCII/ISO-8859-1 and is switched using escape sequences -- for example, |
11621 in the JIS encoding, 'ESC $ B' switches to a mode where pairs of bytes | 11625 in the JIS encoding, 'ESC $ B' switches to a mode where pairs of bytes |
11622 in the range 0 - 0x7f are interpreted as Kanji characters. | 11626 in the range 0 - 0x7f are interpreted as Kanji characters. |
11642 | 11646 |
11643 There are three possible ways to specify positions in a buffer. All | 11647 There are three possible ways to specify positions in a buffer. All |
11644 of these are one-based: the beginning of the buffer is position or | 11648 of these are one-based: the beginning of the buffer is position or |
11645 index 1, and 0 is not a valid position. | 11649 index 1, and 0 is not a valid position. |
11646 | 11650 |
11647 As a "buffer position" (typedef Charbpos): | 11651 As a ``buffer position'' (typedef Charbpos): |
11648 | 11652 |
11649 This is an index specifying an offset in characters from the | 11653 This is an index specifying an offset in characters from the |
11650 beginning of the buffer. Note that buffer positions are | 11654 beginning of the buffer. Note that buffer positions are |
11651 logically @strong{between} characters, not on a character. The | 11655 logically @strong{between} characters, not on a character. The |
11652 difference between two buffer positions specifies the number of | 11656 difference between two buffer positions specifies the number of |
11653 characters between those positions. Buffer positions are the | 11657 characters between those positions. Buffer positions are the |
11654 only kind of position externally visible to the user. | 11658 only kind of position externally visible to the user. |
11655 | 11659 |
11656 As a "byte index" (typedef Bytebpos): | 11660 As a ``byte index'' (typedef Bytebpos): |
11657 | 11661 |
11658 This is an index over the bytes used to represent the characters | 11662 This is an index over the bytes used to represent the characters |
11659 in the buffer. If there is no Mule support, this is identical | 11663 in the buffer. If there is no Mule support, this is identical |
11660 to a buffer position, because each character is represented | 11664 to a buffer position, because each character is represented |
11661 using one byte. However, with Mule support, many characters | 11665 using one byte. However, with Mule support, many characters |
11662 require two or more bytes for their representation, and so a | 11666 require two or more bytes for their representation, and so a |
11663 byte index may be greater than the corresponding buffer | 11667 byte index may be greater than the corresponding buffer |
11664 position. | 11668 position. |
11665 | 11669 |
11666 As a "memory index" (typedef Membpos): | 11670 As a ``memory index'' (typedef Membpos): |
11667 | 11671 |
11668 This is the byte index adjusted for the gap. For positions | 11672 This is the byte index adjusted for the gap. For positions |
11669 before the gap, this is identical to the byte index. For | 11673 before the gap, this is identical to the byte index. For |
11670 positions after the gap, this is the byte index plus the gap | 11674 positions after the gap, this is the byte index plus the gap |
11671 size. There are two possible memory indices for the gap | 11675 size. There are two possible memory indices for the gap |
11672 position; the memory index at the beginning of the gap should | 11676 position; the memory index at the beginning of the gap should |
11673 always be used, except in code that deals with manipulating the | 11677 always be used, except in code that deals with manipulating the |
11674 gap, where both indices may be seen. The address of the | 11678 gap, where both indices may be seen. The address of the |
11675 character "at" (i.e. following) a particular position can be | 11679 character ``at'' (i.e. following) a particular position can be |
11676 obtained from the formula | 11680 obtained from the formula |
11677 | 11681 |
11678 buffer_start_address + memory_index(position) - 1 | 11682 buffer_start_address + memory_index(position) - 1 |
11679 | 11683 |
11680 except in the case of characters at the gap position. | 11684 except in the case of characters at the gap position. |
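The three kinds of position can be tied together in a small sketch. Everything in it is hypothetical: the struct, its fields and the `sketch_` functions are invented for illustration, and they are not XEmacs's own `Charbpos`/`Bytebpos`/`Membpos` machinery. UTF-8 also stands in for a variable-width internal format here; XEmacs's actual internal encoding is different. Positions are 1-based, as above. The gap is described by the byte index at which it sits and by its size.

```c
/* Hypothetical gap buffer, for illustration only (not XEmacs's struct). */
struct sketch_gap_buffer
{
  const unsigned char *text; /* buffer_start_address */
  long gap_bytepos;          /* byte index at which the gap sits */
  long gap_size;             /* number of bytes in the gap */
  long nbytes;               /* bytes of text, not counting the gap */
};

/* Byte index -> memory index.  At the gap position itself this returns
   the memory index at the beginning of the gap, as recommended above. */
static long sketch_byte_to_mem_pos(const struct sketch_gap_buffer *b,
                                   long bytepos)
{
  return bytepos <= b->gap_bytepos ? bytepos : bytepos + b->gap_size;
}

/* The byte following BYTEPOS (1 <= BYTEPOS <= nbytes).  At the gap
   position, the formula with the beginning-of-gap index would point
   into the gap, so the memory index at the end of the gap is used. */
static unsigned char sketch_byte_at(const struct sketch_gap_buffer *b,
                                    long bytepos)
{
  long mem = bytepos < b->gap_bytepos ? bytepos : bytepos + b->gap_size;
  return b->text[mem - 1];   /* buffer_start_address + memind - 1 */
}

/* Character position -> byte index, by walking characters from the
   start.  A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static long sketch_char_to_byte_pos(const struct sketch_gap_buffer *b,
                                    long charpos)
{
  long bytepos = 1, c;
  for (c = 1; c < charpos; c++)
    {
      bytepos++;                          /* the character's first byte */
      while (bytepos <= b->nbytes
             && (sketch_byte_at(b, bytepos) & 0xC0) == 0x80)
        bytepos++;                        /* its continuation bytes */
    }
  return bytepos;
}
```

Consider the text a, é, b stored in memory as 'a' C3 A9, a 2-byte gap, then 'b', with the gap at byte index 4. Character position 3 maps to byte index 4, and byte index 4 maps to memory index 4, the beginning of the gap. That is exactly the case where the address formula points into the gap, which is why `sketch_byte_at()` uses the end of the gap for that one position.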
11779 use the buffer-level functions in buffer.h, which automatically know the | 11783 use the buffer-level functions in buffer.h, which automatically know the |
11780 correct format and handle the gap. | 11784 correct format and handle the gap. |
11781 | 11785 |
11782 Some terminology: | 11786 Some terminology: |
11783 | 11787 |
11784 "itext" appearing in the macros means "internal-format text" -- type | 11788 "itext" appearing in the macros means "internal-format text" -- type |
11785 @code{Ibyte *}. Operations on such pointers themselves, rather than on the | 11789 @code{Ibyte *}. Operations on such pointers themselves, rather than on the |
11786 text being pointed to, have "itext" instead of "itext" in the macro | 11790 text being pointed to, have "itext" instead of "itext" in the macro |
11787 name. "ichar" in the macro names means an Ichar -- the representation | 11791 name. "ichar" in the macro names means an Ichar -- the representation |
11788 of a character as a single integer rather than a series of bytes, as part | 11792 of a character as a single integer rather than a series of bytes, as part |
11789 of "itext". Many of the macros below are for converting between the | 11793 of "itext". Many of the macros below are for converting between the |
11988 @item | 11992 @item |
11989 (c) using the GCC extension (@{ ... @}). | 11993 (c) using the GCC extension (@{ ... @}). |
11990 @end itemize | 11994 @end itemize |
11991 | 11995 |
11992 Turned out that all of the above had bugs, all caused by GCC (hence the | 11996 Turned out that all of the above had bugs, all caused by GCC (hence the |
11993 comments about "those GCC wankers" and "ream gcc up the ass"). As for | 11997 comments about ``those GCC wankers'' and ``ream gcc up the ass''). As for |
11994 (a), some versions of GCC (especially on Intel platforms) had | 11998 (a), some versions of GCC (especially on Intel platforms) had |
11995 buggy implementations of @code{alloca()} that couldn't handle being called | 11999 buggy implementations of @code{alloca()} that couldn't handle being called |
11996 inside of a function call -- they just decremented the stack right in the | 12000 inside of a function call -- they just decremented the stack right in the |
11997 middle of pushing args. Oops, crash with stack trashing, very bad. (b) | 12001 middle of pushing args. Oops, crash with stack trashing, very bad. (b) |
11998 was an attempt to fix (a), and that led to further GCC crashes, esp. when | 12002 was an attempt to fix (a), and that led to further GCC crashes, esp. when |
12971 consistency. For example, the new Mule workspace contains Ibyte | 12975 consistency. For example, the new Mule workspace contains Ibyte |
12972 versions of the stdlib string functions. | 12976 versions of the stdlib string functions. |
12973 @item Extbyte, UExtbyte | 12977 @item Extbyte, UExtbyte |
12974 Pointer to text in some external format, which can be defined as all | 12978 Pointer to text in some external format, which can be defined as all |
12975 formats other than the internal one. The data representing a string | 12979 formats other than the internal one. The data representing a string |
12976 in "external" format (binary or any external encoding) is logically a | 12980 in ``external'' format (binary or any external encoding) is logically a |
12977 set of Extbytes. Extbyte is guaranteed to be just a char, so for | 12981 set of Extbytes. Extbyte is guaranteed to be just a char, so for |
12978 example strlen (Extbyte *) is OK. Extbyte is only a documentation | 12982 example strlen (Extbyte *) is OK. Extbyte is only a documentation |
12979 device for referring to external text. | 12983 device for referring to external text. |
12980 @item Ascbyte, UAscbyte | 12984 @item Ascbyte, UAscbyte |
12981 pure ASCII text, consisting of bytes in a string in entirely US-ASCII | 12985 pure ASCII text, consisting of bytes in a string in entirely US-ASCII |
13115 | 13119 |
13116 @node Mule-izing Code, , An Example of Mule-Aware Code, Coding for Mule | 13120 @node Mule-izing Code, , An Example of Mule-Aware Code, Coding for Mule |
13117 @subsection Mule-izing Code | 13121 @subsection Mule-izing Code |
13118 | 13122 |
13119 A lot of code is written without Mule in mind, and needs to be made | 13123 A lot of code is written without Mule in mind, and needs to be made |
13120 Mule-correct or "Mule-ized". There is really no substitute for | 13124 Mule-correct or ``Mule-ized''. There is really no substitute for |
13121 line-by-line analysis when doing this, but the following checklist can | 13125 line-by-line analysis when doing this, but the following checklist can |
13122 help: | 13126 help: |
13123 | 13127 |
13124 @itemize @bullet | 13128 @itemize @bullet |
13125 @item | 13129 @item |
13333 @item | 13337 @item |
13334 Look in the CRT sources! They come with VC++. See win32.c. | 13338 Look in the CRT sources! They come with VC++. See win32.c. |
13335 @end enumerate | 13339 @end enumerate |
13336 | 13340 |
13337 @node Locales, More about code pages, Microsoft Documentation, Microsoft Windows-Related Multilingual Issues | 13341 @node Locales, More about code pages, Microsoft Documentation, Microsoft Windows-Related Multilingual Issues |
13338 @subsection Locales, code pages, and other concepts of "language" | 13342 @subsection Locales, code pages, and other concepts of ``language'' |
13339 @cindex locales, code pages, and other concepts of "language" | 13343 @cindex locales, code pages, and other concepts of ``language'' |
13340 | 13344 |
13341 First, make sure you clearly understand the difference between the C | 13345 First, make sure you clearly understand the difference between the C |
13342 runtime library (CRT) and the Win32 API! See win32.c. | 13346 runtime library (CRT) and the Win32 API! See win32.c. |
13343 | 13347 |
13344 There are various different ways of representing the vague concept | 13348 There are various different ways of representing the vague concept |
13345 of "language", and it can be very confusing. So: | 13349 of ``language'', and it can be very confusing. So: |
13346 | 13350 |
13347 @itemize @bullet | 13351 @itemize @bullet |
13348 @item | 13352 @item |
13349 The CRT library has the concept of "locale", which is a | 13353 The CRT library has the concept of ``locale'', which is a |
13350 combination of language and country, and which controls the way | 13354 combination of language and country, and which controls the way |
13351 currency and dates are displayed, the encoding of data, etc. | 13355 currency and dates are displayed, the encoding of data, etc. |
13352 | 13356 |
13353 @item | 13357 @item |
13354 XEmacs has the concept of "language environment", more or less | 13358 XEmacs has the concept of ``language environment'', more or less |
13355 like a locale; although currently in most cases it just refers to | 13359 like a locale; although currently in most cases it just refers to |
13356 the language, and no sub-language distinctions are | 13360 the language, and no sub-language distinctions are |
13357 made. (Exceptions are with Chinese, which has different language | 13361 made. (Exceptions are with Chinese, which has different language |
13358 environments for Taiwan and mainland China, due to the different | 13362 environments for Taiwan and mainland China, due to the different |
13359 encodings and writing systems.) | 13363 encodings and writing systems.) |
13361 @item | 13365 @item |
13362 Windows has a number of different language concepts: | 13366 Windows has a number of different language concepts: |
13363 | 13367 |
13364 @enumerate | 13368 @enumerate |
13365 @item | 13369 @item |
13366 There are "languages" and "sublanguages", which correspond to | 13370 There are ``languages'' and ``sublanguages'', which correspond to |
13367 the languages and countries of the C library -- e.g. LANG_ENGLISH | 13371 the languages and countries of the C library -- e.g. LANG_ENGLISH |
13368 and SUBLANG_ENGLISH_US. These are identified by 10-bit and 6-bit integers, | 13372 and SUBLANG_ENGLISH_US. These are identified by 10-bit and 6-bit integers, |
13369 called the "primary language identifier" and "sublanguage | 13373 called the ``primary language identifier'' and ``sublanguage |
13370 identifier", respectively. These are combined into a 16-bit | 13374 identifier'', respectively. These are combined into a 16-bit |
13371 integer or "language identifier" by MAKELANGID(). | 13375 integer or ``language identifier'' by @code{MAKELANGID()}. |
13372 | 13376 |
13373 @item | 13377 @item |
13374 The language identifier in turn is combined with a "sort | 13378 The language identifier in turn is combined with a ``sort |
13375 identifier" (and optionally a "sort version") to yield a 32-bit | 13379 identifier'' (and optionally a ``sort version'') to yield a 32-bit |
13376 integer called a "locale identifier" (type LCID), which identifies | 13380 integer called a ``locale identifier'' (type LCID), which identifies |
13377 locales -- the primary means of distinguishing language/regional | 13381 locales -- the primary means of distinguishing language/regional |
13378 settings and similar to C library locales. | 13382 settings and similar to C library locales. |
13379 | 13383 |
13380 @item | 13384 @item |
13381 A "code page" combines the XEmacs concepts of "charset" and "coding | 13385 A ``code page'' combines the XEmacs concepts of ``charset'' and ``coding |
13382 system". It logically encompasses | 13386 system''. It logically encompasses |
13383 | 13387 |
13384 @itemize @minus | 13388 @itemize @minus |
13385 @item | 13389 @item |
13386 a set of supported characters | 13390 a set of supported characters |
13387 @item | 13391 @item |
13390 supported | 13394 supported |
13391 @item | 13395 @item |
13392 a way of encoding a series of characters into a string of bytes | 13396 a way of encoding a series of characters into a string of bytes |
13393 @end itemize | 13397 @end itemize |
13394 | 13398 |
13395 Note that the first two properties correspond to an XEmacs "charset" | 13399 Note that the first two properties correspond to an XEmacs ``charset'' |
13396 and the last to an XEmacs "coding system". | 13400 and the last to an XEmacs ``coding system''. |
13397 | 13401 |
13398 Traditional encodings are either simple one-byte encodings, or | 13402 Traditional encodings are either simple one-byte encodings, or |
13399 combination one-byte/two-byte encodings (aka MBCS encodings, where MBCS | 13403 combination one-byte/two-byte encodings (aka MBCS encodings, where MBCS |
13400 stands for "Multibyte Character Set") with the following properties: | 13404 stands for ``Multibyte Character Set'') with the following properties: |
13401 | 13405 |
13402 @itemize @minus | 13406 @itemize @minus |
13403 @item | 13407 @item |
13404 all characters are encoded as a one-byte or two-byte sequence | 13408 all characters are encoded as a one-byte or two-byte sequence |
13405 @item | 13409 @item |
13406 the encoding is stateless (non-modal) | 13410 the encoding is stateless (non-modal) |
13407 @item | 13411 @item |
13408 the lower 128 bytes are compatible with ASCII | 13412 the lower 128 bytes are compatible with ASCII |
13409 @item | 13413 @item |
13410 in the higher bytes, the value of the first byte ("lead byte") | 13414 in the higher bytes, the value of the first byte (``lead byte'') |
13411 determines whether a second byte follows | 13415 determines whether a second byte follows |
13412 @item | 13416 @item |
13413 the values used for second bytes may overlap those used for first | 13417 the values used for second bytes may overlap those used for first |
13414 bytes, and (in some encodings) include values in the low half; thus, | 13418 bytes, and (in some encodings) include values in the low half; thus, |
13415 moving backwards is hard, and pure-ASCII algorithms (e.g. finding the | 13419 moving backwards is hard, and pure-ASCII algorithms (e.g. finding the |
13427 Every Windows locale has four associated code pages: ANSI (an | 13431 Every Windows locale has four associated code pages: ANSI (an |
13428 international standard or some Microsoft-created approximation; the | 13432 international standard or some Microsoft-created approximation; the |
13429 native code page under Windows), OEM (a DOS encoding, still used in the | 13433 native code page under Windows), OEM (a DOS encoding, still used in the |
13430 FAT file system), Mac (an encoding used on the Macintosh) and EBCDIC (a | 13434 FAT file system), Mac (an encoding used on the Macintosh) and EBCDIC (a |
13431 non-ASCII-compatible encoding used on IBM mainframes, originally based | 13435 non-ASCII-compatible encoding used on IBM mainframes, originally based |
13432 on the BCD or "binary-coded decimal" encoding of numbers). All code | 13436 on the BCD or ``binary-coded decimal'' encoding of numbers). All code |
13433 pages associated with a locale follow (as far as I know) the properties | 13437 pages associated with a locale follow (as far as I know) the properties |
13434 listed above for traditional code pages. More than one locale can share | 13438 listed above for traditional code pages. More than one locale can share |
13435 a code page -- e.g. all the Western European languages, including | 13439 a code page -- e.g. all the Western European languages, including |
13436 English, do. | 13440 English, do. |
13437 | 13441 |
13438 @item | 13442 @item |
13439 Windows also has an "input locale identifier" (aka "keyboard | 13443 Windows also has an ``input locale identifier'' (aka ``keyboard |
13440 layout id") or HKL, which is a 32-bit integer composed of the | 13444 layout id'') or HKL, which is a 32-bit integer composed of the |
13441 16-bit language identifier and a 16-bit "device identifier", which | 13445 16-bit language identifier and a 16-bit ``device identifier'', which |
13442 originally specified a particular keyboard layout (e.g. the locale | 13446 originally specified a particular keyboard layout (e.g. the locale |
13443 "US English" can have the QWERTY layout, the Dvorak layout, etc.), | 13447 ``US English'' can have the QWERTY layout, the Dvorak layout, etc.), |
13444 but has been expanded to include speech-to-text converters and | 13448 but has been expanded to include speech-to-text converters and |
13445 other non-keyboard ways of inputting text. Note that both the HKL | 13449 other non-keyboard ways of inputting text. Note that both the HKL |
13446 and LCID share the language identifier in the lower 16 bits, and in | 13450 and LCID share the language identifier in the lower 16 bits, and in |
13447 both cases a 0 in the upper 16 bits means "default" (sort order or | 13451 both cases a 0 in the upper 16 bits means ``default'' (sort order or |
13448 device), providing a way to convert between HKLs, LCIDs, and | 13452 device), providing a way to convert between HKLs, LCIDs, and |
13449 language identifiers (i.e. language/sublanguage pairs). The | 13453 language identifiers (i.e. language/sublanguage pairs). The |
13450 default keyboard layout for a language is (as far as I can | 13454 default keyboard layout for a language is (as far as I can |
13451 determine) established using the Regional Settings control panel | 13455 determine) established using the Regional Settings control panel |
13452 applet, where you can add input locales as combinations of language | 13456 applet, where you can add input locales as combinations of language |
13460 | 13464 |
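The bit layout of these identifiers can be sketched in portable C. The macros below are local re-creations of what winnt.h defines as MAKELANGID, PRIMARYLANGID, SUBLANGID, MAKELCID and LANGIDFROMLCID. They are renamed with a made-up SKETCH_ prefix so that the example compiles without the Windows headers. In winnt.h, the primary language occupies the low 10 bits of a language identifier and the sublanguage occupies the high 6 bits. Treating an HKL as a plain 32-bit integer is a simplification, because it is really a handle type. `sketch_langid_from_hkl()` is hypothetical and simply extracts the low 16 bits, as the text above describes.

```c
#include <stdint.h>

/* Portable stand-ins for the winnt.h macros (same bit layout). */
#define SKETCH_MAKELANGID(p, s) \
  ((uint16_t) (((uint16_t) (s) << 10) | (uint16_t) (p)))
#define SKETCH_PRIMARYLANGID(id) ((uint16_t) ((id) & 0x3FF))
#define SKETCH_SUBLANGID(id)     ((uint16_t) ((uint16_t) (id) >> 10))
#define SKETCH_MAKELCID(lang, sort) \
  ((uint32_t) (((uint32_t) (uint16_t) (sort) << 16) | (uint16_t) (lang)))
#define SKETCH_LANGIDFROMLCID(lcid) ((uint16_t) (lcid))

/* Values as defined in winnt.h for LANG_ENGLISH, SUBLANG_ENGLISH_US
   and SORT_DEFAULT. */
#define SKETCH_LANG_ENGLISH        0x09
#define SKETCH_SUBLANG_ENGLISH_US  0x01
#define SKETCH_SORT_DEFAULT        0x0

/* Hypothetical helper: an HKL keeps the language identifier in its low
   16 bits and the device identifier in its high 16 bits. */
static uint16_t sketch_langid_from_hkl(uint32_t hkl)
{
  return (uint16_t) (hkl & 0xFFFF);
}
```

With these definitions, US English is language identifier 0x0409. With the default sort order, its LCID is also 0x0409. An HKL value such as 0x04090409 yields the same language identifier from its low 16 bits.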
13461 @node More about code pages, More about locales, Locales, Microsoft Windows-Related Multilingual Issues | 13465 @node More about code pages, More about locales, Locales, Microsoft Windows-Related Multilingual Issues |
13462 @subsection More about code pages | 13466 @subsection More about code pages |
13463 @cindex more about code pages | 13467 @cindex more about code pages |
13464 | 13468 |
13465 Here is what MSDN says about code pages (article "Code Pages"): | 13469 Here is what MSDN says about code pages (article ``Code Pages''): |
13466 | 13470 |
13467 @quotation | 13471 @quotation |
13468 A code page is a character set, which can include numbers, | 13472 A code page is a character set, which can include numbers, |
13469 punctuation marks, and other glyphs. Different languages and locales | 13473 punctuation marks, and other glyphs. Different languages and locales |
13470 may use different code pages. For example, ANSI code page 1252 is | 13474 may use different code pages. For example, ANSI code page 1252 is |
13502 | 13506 |
13503 -- The "C" locale is defined by ANSI to correspond to the locale in | 13507 -- The "C" locale is defined by ANSI to correspond to the locale in |
13504 which C programs have traditionally executed. The code page for the | 13508 which C programs have traditionally executed. The code page for the |
13505 "C" locale (code page) corresponds to the ASCII character | 13509 "C" locale (code page) corresponds to the ASCII character |
13506 set. For example, in the "C" locale, islower returns true for the | 13510 set. For example, in the "C" locale, islower returns true for the |
13507 values 0x61 - 0x7A only. In another locale, islower may return true | 13511 values 0x61 to 0x7A only. In another locale, islower may return true |
13508 for these as well as other values, as defined by that locale. | 13512 for these as well as other values, as defined by that locale. |
13509 | 13513 |
13510 Under "Locale-Dependent Routines" we notice the following setlocale | 13514 Under ``Locale-Dependent Routines'' we notice the following setlocale |
13511 dependencies: | 13515 dependencies: |
13512 | 13516 |
13513 atof, atoi, atol (LC_NUMERIC) | 13517 atof, atoi, atol (LC_NUMERIC) |
13514 is Routines (LC_CTYPE) | 13518 is Routines (LC_CTYPE) |
13515 isleadbyte (LC_CTYPE) | 13519 isleadbyte (LC_CTYPE) |
13538 wcstombs (LC_CTYPE) | 13542 wcstombs (LC_CTYPE) |
13539 wctomb (LC_CTYPE) | 13543 wctomb (LC_CTYPE) |
13540 _wtoi/_wtol (LC_NUMERIC) | 13544 _wtoi/_wtol (LC_NUMERIC) |
13541 @end quotation | 13545 @end quotation |
13542 | 13546 |
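The quoted claim about islower in the "C" locale is also guaranteed by ISO C itself: in the "C" locale, islower is true only for the 26 lowercase Latin letters. It can therefore be checked portably. The following is a minimal sketch that assumes an ASCII-based execution character set; the function name is made up. It only touches LC_CTYPE, the category listed above for the is routines, and it restores the caller's setting afterward. The MSVC-specific split between __lc_codepage and __mbcodepage cannot be demonstrated this way.

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Return 1 if, in the "C" locale, islower() is true for exactly the
   byte values 0x61 ('a') through 0x7A ('z'), and 0 otherwise. */
static int sketch_c_locale_islower_ascii_only(void)
{
  char saved[256] = "C";
  const char *prev = setlocale(LC_CTYPE, NULL);   /* query only */
  int c, ok = 1;

  if (prev != NULL)
    {
      /* setlocale's result may be overwritten, so keep a copy. */
      strncpy(saved, prev, sizeof saved - 1);
      saved[sizeof saved - 1] = '\0';
    }
  setlocale(LC_CTYPE, "C");
  for (c = 0; c <= 255; c++)          /* every unsigned char value */
    {
      int expected = (c >= 0x61 && c <= 0x7A);
      if ((islower(c) != 0) != expected)
        ok = 0;
    }
  setlocale(LC_CTYPE, saved);         /* restore the caller's LC_CTYPE */
  return ok;
}
```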
13543 NOTE: The above documentation doesn't clearly explain the "locale code | 13547 NOTE: The above documentation doesn't clearly explain the ``locale code |
13544 page" and "multibyte code page". These are two different values, | 13548 page'' and ``multibyte code page''. These are two different values, |
13545 maintained respectively in the CRT global variables __lc_codepage and | 13549 maintained respectively in the CRT global variables __lc_codepage and |
13546 __mbcodepage. Calling e.g. setlocale (LC_ALL, "JAPANESE") sets @strong{ONLY} | 13550 __mbcodepage. Calling e.g. setlocale (LC_ALL, "JAPANESE") sets @strong{ONLY} |
13547 __lc_codepage to 932 (the code page for Japanese), and leaves | 13551 __lc_codepage to 932 (the code page for Japanese), and leaves |
13548 __mbcodepage unchanged (usually 1252, i.e. Windows-ANSI). You'd have to | 13552 __mbcodepage unchanged (usually 1252, i.e. Windows-ANSI). You'd have to |
13549 call _setmbcp() to change __mbcodepage. Figuring out from the | 13553 call _setmbcp() to change __mbcodepage. Figuring out from the |
13550 documentation which routines use which code page is not so obvious. But: | 13554 documentation which routines use which code page is not so obvious. But: |
13551 | 13555 |
13552 @itemize @bullet | 13556 @itemize @bullet |
13553 @item | 13557 @item |
13554 from "Interpretation of Multibyte-Character Sequences" it appears that | 13558 from ``Interpretation of Multibyte-Character Sequences'' it appears that |
13555 all "multibyte-character routines" use the multibyte code page except for | 13559 all ``multibyte-character routines'' use the multibyte code page except for |
13556 mblen(), _mbstrlen(), mbstowcs(), mbtowc(), wcstombs(), and wctomb(). | 13560 @code{mblen()}, @code{_mbstrlen()}, @code{mbstowcs()}, @code{mbtowc()}, @code{wcstombs()}, and @code{wctomb()}. |
13557 | 13561 |
13558 @item | 13562 @item |
13559 from "_setmbcp": "The multibyte code page also affects | 13563 from ``_setmbcp'': ``The multibyte code page also affects |
13560 multibyte-character processing by the following run-time library | 13564 multibyte-character processing by the following run-time library |
13561 routines: _exec functions _mktemp _stat _fullpath _spawn functions | 13565 routines: _exec functions _mktemp _stat _fullpath _spawn functions |
13562 _tempnam _makepath _splitpath tmpnam. In addition, all run-time library | 13566 _tempnam _makepath _splitpath tmpnam. In addition, all run-time library |
13563 routines that receive multibyte-character argv or envp program arguments | 13567 routines that receive multibyte-character argv or envp program arguments |
13564 as parameters (such as the _exec and _spawn families) process these | 13568 as parameters (such as the _exec and _spawn families) process these |
13565 strings according to the multibyte code page. Hence these routines are | 13569 strings according to the multibyte code page. Hence these routines are |
13566 also affected by a call to _setmbcp that changes the multibyte code | 13570 also affected by a call to _setmbcp that changes the multibyte code |
13567 page." | 13571 page.'' |
13568 @end itemize | 13572 @end itemize |
13569 | 13573 |
13570 Summary: from looking at the CRT source (which comes with VC++) and | 13574 Summary: from looking at the CRT source (which comes with VC++) and |
13571 carefully looking through the docs, it appears that: | 13575 carefully looking through the docs, it appears that: |
13572 | 13576 |
13573 @itemize @bullet | 13577 @itemize @bullet |
13574 @item | 13578 @item |
13575 the "locale code page" is used by all of the routines listed above | 13579 the ``locale code page'' is used by all of the routines listed above |
13576 under "Locale-Dependent Routines" (EXCEPT _mbccpy() and _mbclen()), | 13580 under ``Locale-Dependent Routines'' (EXCEPT @code{_mbccpy()} and @code{_mbclen()}), |
13577 as well as any other place that converts between multibyte and Unicode | 13581 as well as any other place that converts between multibyte and Unicode |
13578 strings, e.g. the startup code. | 13582 strings, e.g. the startup code. |
13579 @item | 13583 @item |
13580 the "multibyte code page" is used in all of the *mb*() routines | 13584 the ``multibyte code page'' is used in all of the @code{mb*()} routines |
13581 except mblen(), _mbstrlen(), mbstowcs(), mbtowc(), wcstombs(), | 13585 except @code{mblen()}, @code{_mbstrlen()}, @code{mbstowcs()}, @code{mbtowc()}, @code{wcstombs()}, |
13582 and wctomb(); also _exec*(), _spawn*(), _mktemp(), _stat(), _fullpath(), | 13586 and @code{wctomb()}; also @code{_exec*()}, @code{_spawn*()}, @code{_mktemp()}, @code{_stat()}, @code{_fullpath()}, |
13583 _tempnam(), _makepath(), _splitpath(), tmpnam(), and similar functions | 13587 @code{_tempnam()}, @code{_makepath()}, @code{_splitpath()}, @code{tmpnam()}, and similar functions |
13584 without the leading underscore. | 13588 without the leading underscore. |
13585 @end itemize | 13589 @end itemize |
13586 | 13590 |
13587 @node More about locales, Unicode support under Windows, More about code pages, Microsoft Windows-Related Multilingual Issues | 13591 @node More about locales, Unicode support under Windows, More about code pages, Microsoft Windows-Related Multilingual Issues |
13588 @subsection More about locales | 13592 @subsection More about locales |
13591 In addition to the locale defined by the CRT, Windows (i.e. the Win32 API) | 13595 In addition to the locale defined by the CRT, Windows (i.e. the Win32 API) |
13592 defines various locales: | 13596 defines various locales: |
13593 | 13597 |
13594 @itemize @bullet | 13598 @itemize @bullet |
13595 @item | 13599 @item |
13596 The system-default locale is the locale defined under "Language | 13600 The system-default locale is the locale defined under ``Language |
13597 settings for the system" in the "Regional Options" control panel. This | 13601 settings for the system'' in the ``Regional Options'' control panel. This |
13598 is NOT user-specific, and changing it requires a reboot (at least under | 13602 is NOT user-specific, and changing it requires a reboot (at least under |
13599 Windows 2000). The ANSI code page of the system-default locale is | 13603 Windows 2000). The ANSI code page of the system-default locale is |
13600 returned by GetACP(), and you can specify this code page in calls | 13604 returned by @code{GetACP()}, and you can specify this code page in calls |
13601 e.g. to MultiByteToWideChar with the constant CP_ACP. | 13605 e.g. to MultiByteToWideChar with the constant CP_ACP. |
13602 | 13606 |
13603 @item | 13607 @item |
13604 The user-default locale is the locale defined under "Settings for the | 13608 The user-default locale is the locale defined under ``Settings for the |
13605 current user" in the "Regional Options" control panel. | 13609 current user'' in the ``Regional Options'' control panel. |
13606 | 13610 |
13607 @item | 13611 @item |
13608 There is a thread-local locale set by SetThreadLocale. #### What is this | 13612 There is a thread-local locale set by SetThreadLocale. #### What is this |
13609 used for? | 13613 used for? |
13610 @end itemize | 13614 @end itemize |
13611 | 13615 |
13612 The Win32 API has a bunch of multibyte functions -- all of those that | 13616 The Win32 API has a bunch of multibyte functions -- all of those that |
13613 end with ...A(), and on which we spend so much effort in | 13617 end with ...@code{A()}, and on which we spend so much effort in |
13614 intl-encap-win32.c. These appear to ALWAYS use the ANSI code page of | 13618 intl-encap-win32.c. These appear to ALWAYS use the ANSI code page of |
13615 the system-default locale (GetACP(), CP_ACP). Note that this applies | 13619 the system-default locale (@code{GetACP()}, CP_ACP). Note that this applies |
13616 also, for example, to the encoding of filenames in all file-handling | 13620 also, for example, to the encoding of filenames in all file-handling |
13617 routines, including the CRT ones such as open(), because they pass their | 13621 routines, including the CRT ones such as @code{open()}, because they pass their |
13618 args unchanged to the Win32 API. | 13622 args unchanged to the Win32 API. |
13619 | 13623 |
13620 @node Unicode support under Windows, The golden rules of writing Unicode-safe code, More about locales, Microsoft Windows-Related Multilingual Issues | 13624 @node Unicode support under Windows, The golden rules of writing Unicode-safe code, More about locales, Microsoft Windows-Related Multilingual Issues |
13621 @subsection Unicode support under Windows | 13625 @subsection Unicode support under Windows |
13622 @cindex unicode support under windows | 13626 @cindex unicode support under windows |
13630 table to convert the characters of that code page to and from Unicode, and | 13634 table to convert the characters of that code page to and from Unicode, and |
13631 the Win32 API itself probably (perhaps always) uses Unicode internally. | 13635 the Win32 API itself probably (perhaps always) uses Unicode internally. |
13632 | 13636 |
13633 Under Windows there are two different versions of all library routines that | 13637 Under Windows there are two different versions of all library routines that |
13634 accept or return text, those that handle Unicode text and those handling | 13638 accept or return text, those that handle Unicode text and those handling |
13635 "multibyte" text, i.e. variable-width ASCII-compatible text in some | 13639 ``multibyte'' text, i.e. variable-width ASCII-compatible text in some |
13636 national format such as EUC or Shift-JIS. Because Windows 95 basically | 13640 national format such as EUC or Shift-JIS. Because Windows 95 basically |
13637 doesn't support Unicode but Windows NT does, and Microsoft doesn't provide | 13641 doesn't support Unicode but Windows NT does, and Microsoft doesn't provide |
13638 any way of writing a single binary that will work on both systems and still | 13642 any way of writing a single binary that will work on both systems and still |
13639 use Unicode when it's available (although see below, Microsoft Layer for | 13643 use Unicode when it's available (although see below, Microsoft Layer for |
13640 Unicode), we need to provide a way of run-time conditionalizing so you | 13644 Unicode), we need to provide a way of run-time conditionalizing so you |
13641 could have one binary for both systems. "Unicode-splitting" refers to | 13645 could have one binary for both systems. ``Unicode-splitting'' refers to |
13642 writing code that will handle this properly. This means using | 13646 writing code that will handle this properly. This means using |
13643 Qmswindows_tstr as the external conversion format, calling the appropriate | 13647 Qmswindows_tstr as the external conversion format, calling the appropriate |
13644 qxe...() Unicode-split version of library functions, and doing other things | 13648 qxe...() Unicode-split version of library functions, and doing other things |
13645 in certain cases, e.g. when a qxe() function is not present. | 13649 in certain cases, e.g. when a @code{qxe()} function is not present. |
13646 | 13650 |
13647 Unicode support also requires that the various Windows APIs be | 13651 Unicode support also requires that the various Windows APIs be |
13648 "Unicode-encapsulated", so that they automatically call the ANSI or | 13652 ``Unicode-encapsulated'', so that they automatically call the ANSI or |
13649 Unicode version of the API call appropriately and handle the size | 13653 Unicode version of the API call appropriately and handle the size |
13650 differences in structures. What this means is: | 13654 differences in structures. What this means is: |
13651 | 13655 |
13652 @itemize @bullet | 13656 @itemize @bullet |
13653 @item | 13657 @item |
13654 first, note that Windows already provides a sort of encapsulation | 13658 first, note that Windows already provides a sort of encapsulation |
13655 of all APIs that deal with text. All such APIs are underlyingly | 13659 of all APIs that deal with text. All such APIs are underlyingly |
13656 provided in two versions, with an A or W suffix (ANSI or "wide" | 13660 provided in two versions, with an A or W suffix (ANSI or ``wide'' |
13657 i.e. Unicode), and the compile-time constant UNICODE controls which is | 13661 i.e. Unicode), and the compile-time constant UNICODE controls which is |
13658 selected by the unsuffixed API. Same thing happens with structures, and | 13662 selected by the unsuffixed API. Same thing happens with structures, and |
13659 also with types, where the generic types have names beginning with T -- | 13663 also with types, where the generic types have names beginning with T -- |
13660 TCHAR, LPTSTR, etc. Unfortunately, this is compile-time only, not | 13664 TCHAR, LPTSTR, etc. Unfortunately, this is compile-time only, not |
13661 run-time, so not sufficient. (Creating the necessary run-time encoding | 13665 run-time, so not sufficient. (Creating the necessary run-time encoding |
13670 such an API available internally.) | 13674 such an API available internally.) |
13671 | 13675 |
13672 @item | 13676 @item |
13673 what we do is provide an encapsulation of each standard Windows API call | 13677 what we do is provide an encapsulation of each standard Windows API call |
13674 that is split into A and W versions. Current theory is to avoid all | 13678 that is split into A and W versions. Current theory is to avoid all |
13675 preprocessor games; so we name the function with a prefix -- "qxe" | 13679 preprocessor games; so we name the function with a prefix -- ``qxe'' |
13676 currently -- and require callers to use the prefixed name. Callers need | 13680 currently -- and require callers to use the prefixed name. Callers need |
13677 to explicitly use the W version of all structures, and convert text | 13681 to explicitly use the W version of all structures, and convert text |
13678 themselves using Qmswindows_tstr. The qxe-encapsulated version will | 13682 themselves using Qmswindows_tstr. The qxe-encapsulated version will |
13679 automatically call the appropriate A or W version depending on whether | 13683 automatically call the appropriate A or W version depending on whether |
13680 we're running on 9x or NT (you can force use of the A calls on NT, | 13684 we're running on 9x or NT (you can force use of the A calls on NT, |
13730 purpose, to make the code easier to follow for someone who's not familiar | 13734 purpose, to make the code easier to follow for someone who's not familiar |
13731 with it. Until our library is really complete and bug-free, we should | 13735 with it. Until our library is really complete and bug-free, we should |
13732 think twice before doing this. | 13736 think twice before doing this. |
13733 | 13737 |
13734 According to Microsoft documentation, only the following functions are | 13738 According to Microsoft documentation, only the following functions are |
13735 provided under Windows 9x to support Unicode (see MSDN page "Windows | 13739 provided under Windows 9x to support Unicode (see MSDN page ``Windows |
13736 95/98/Me General Limitations"): | 13740 95/98/Me General Limitations''): |
13737 | 13741 |
13738 EnumResourceLanguagesW | 13742 EnumResourceLanguagesW |
13739 EnumResourceNamesW | 13743 EnumResourceNamesW |
13740 EnumResourceTypesW | 13744 EnumResourceTypesW |
13741 ExtTextOutW | 13745 ExtTextOutW |
13752 MessageBoxExW | 13756 MessageBoxExW |
13753 MultiByteToWideChar | 13757 MultiByteToWideChar |
13754 TextOutW | 13758 TextOutW |
13755 WideCharToMultiByte | 13759 WideCharToMultiByte |
13756 | 13760 |
13757 also maybe GetTextExtentExPoint? (KB Q125671 "Unicode Functions Supported | 13761 also maybe GetTextExtentExPoint? (KB Q125671 ``Unicode Functions Supported |
13758 by Windows 95") | 13762 by Windows 95'') |
13759 | 13763 |
13760 Q210341 says this in addition: | 13764 Q210341 says this in addition: |
13761 | 13765 |
13762 @quotation | 13766 @quotation |
13763 SUMMARY: | 13767 SUMMARY: |
13778 range beyond the 256 limitation of a one-byte representation. | 13782 range beyond the 256 limitation of a one-byte representation. |
13779 | 13783 |
13780 The Unicode standard offers application developers an opportunity to | 13784 The Unicode standard offers application developers an opportunity to |
13781 work with text without the limitations of character set based | 13785 work with text without the limitations of character set based |
13782 systems. For more information on the Unicode standard see the | 13786 systems. For more information on the Unicode standard see the |
13783 "References" section of this article. Windows NT is a fully Unicode | 13787 ``References'' section of this article. Windows NT is a fully Unicode |
13784 capable operating system so it may be desirable to write software that | 13788 capable operating system so it may be desirable to write software that |
13785 supports Unicode on Windows 95. | 13789 supports Unicode on Windows 95. |
13786 | 13790 |
13787 Even though Windows 95 and Windows 98 are not Unicode based, they do | 13791 Even though Windows 95 and Windows 98 are not Unicode based, they do |
13788 provide some limited Unicode functionality. Drawing of Unicode text is | 13792 provide some limited Unicode functionality. Drawing of Unicode text is |
13861 @itemize @bullet | 13865 @itemize @bullet |
13862 @item | 13866 @item |
13863 wmain() is completely supported, and appropriate Unicode-formatted argv | 13867 wmain() is completely supported, and appropriate Unicode-formatted argv |
13864 and envp will always be passed. | 13868 and envp will always be passed. |
13865 @item | 13869 @item |
13866 Likewise, wWinMain() is completely supported. (NOTE: The docs are not at | 13870 Likewise, @code{wWinMain()} is completely supported. (NOTE: The docs are not at |
13867 all clear on how these various entry points interact, and imply that | 13871 all clear on how these various entry points interact, and imply that |
13868 a windows-subsystem program "must" use WinMain(), while a console- | 13872 a windows-subsystem program ``must'' use @code{WinMain()}, while a console- |
13869 subsystem program "must" use main(), and a program compiled with UNICODE | 13873 subsystem program ``must'' use @code{main()}, and a program compiled with UNICODE |
13870 (which we don't, see above) "must" use the w*() versions, while a program | 13874 (which we don't, see above) ``must'' use the @code{w*()} versions, while a program |
13871 not compiled this way "must" use the plain versions. In fact it appears | 13875 not compiled this way ``must'' use the plain versions. In fact it appears |
13872 that the CRT provides four different compiler entry points, namely | 13876 that the CRT provides four different compiler entry points, namely |
13873 w?(main|WinMain)CRTStartup, and we simply choose the one we like using | 13877 w?(main|WinMain)CRTStartup, and we simply choose the one we like using |
13874 the appropriate link flag. | 13878 the appropriate link flag. |
13875 @item | 13879 @item |
13876 _wenviron, _wputenv | 13880 _wenviron, _wputenv |
17948 boxes are not explicitly cleared and may contain junk. | 17952 boxes are not explicitly cleared and may contain junk. |
17949 | 17953 |
17950 @node The Frame, The Non-Client Area, Intro to Window and Frame Geometry, Window and Frame Geometry | 17954 @node The Frame, The Non-Client Area, Intro to Window and Frame Geometry, Window and Frame Geometry |
17951 @section The Frame | 17955 @section The Frame |
17952 | 17956 |
17953 The "top-level window area" is the entire area of a top-level window (or | 17957 The ``top-level window area'' is the entire area of a top-level window (or |
17954 "frame"). The "client area" (a term from MS Windows) is the area of a | 17958 ``frame''). The ``client area'' (a term from MS Windows) is the area of a |
17955 top-level window that XEmacs draws into and manages with redisplay. | 17959 top-level window that XEmacs draws into and manages with redisplay. |
17956 This includes the toolbar, scrollbars, gutters, dividers, text area, | 17960 This includes the toolbar, scrollbars, gutters, dividers, text area, |
17957 modeline and minibuffer. It does not include the menubar, title or | 17961 modeline and minibuffer. It does not include the menubar, title or |
17958 outer borders. The "non-client area" is the area of a top-level window | 17962 outer borders. The ``non-client area'' is the area of a top-level window |
17959 outside of the client area and includes the menubar, title and outer | 17963 outside of the client area and includes the menubar, title and outer |
17960 borders. Internally, all frame coordinates are relative to the client | 17964 borders. Internally, all frame coordinates are relative to the client |
17961 area. | 17965 area. |
17962 | 17966 |
17963 | 17967 |
17970 @item | 17974 @item |
17971 The outer layer is the window-manager decorations: The title and | 17975 The outer layer is the window-manager decorations: The title and |
17972 borders. These are controlled by the window manager, a separate process | 17976 borders. These are controlled by the window manager, a separate process |
17973 that controls the desktop, the location of icons, etc. When a process | 17977 that controls the desktop, the location of icons, etc. When a process |
17974 tries to create a window, the window manager intercepts this action and | 17978 tries to create a window, the window manager intercepts this action and |
17975 "reparents" the window, placing another window around it which contains | 17979 ``reparents'' the window, placing another window around it which contains |
17976 the window decorations, including the title bar, outer borders used for | 17980 the window decorations, including the title bar, outer borders used for |
17977 resizing, etc. The window manager also implements any actions involving | 17981 resizing, etc. The window manager also implements any actions involving |
17978 the decorations, such as the ability to resize a window by dragging its | 17982 the decorations, such as the ability to resize a window by dragging its |
17979 borders, move a window by dragging its title bar, etc. If there is no | 17983 borders, move a window by dragging its title bar, etc. If there is no |
17980 window manager or you kill it, windows will have no decorations (and | 17984 window manager or you kill it, windows will have no decorations (and |
17981 will lose them if they previously had any) and you will not be able to | 17985 will lose them if they previously had any) and you will not be able to |
17982 move or resize them. | 17986 move or resize them. |
17983 | 17987 |
17984 @item | 17988 @item |
17985 Inside of the window-manager decorations is the "shell", which is | 17989 Inside of the window-manager decorations is the ``shell'', which is |
17986 managed by the toolkit and widget libraries your program is linked with. | 17990 managed by the toolkit and widget libraries your program is linked with. |
17987 The code in @file{*-x.c} uses the Xt toolkit and various possible widget | 17991 The code in @file{*-x.c} uses the Xt toolkit and various possible widget |
17988 libraries built on top of Xt, such as Motif, Athena, the "Lucid" | 17992 libraries built on top of Xt, such as Motif, Athena, the ``Lucid'' |
17989 widgets, etc. Another possibility is GTK (@file{*-gtk.c}), which implements | 17993 widgets, etc. Another possibility is GTK (@file{*-gtk.c}), which implements |
17990 both the toolkit and widgets. Under Xt, the "shell" window is an | 17994 both the toolkit and widgets. Under Xt, the ``shell'' window is an |
17991 EmacsShell widget, containing an EmacsManager widget of the same size, | 17995 EmacsShell widget, containing an EmacsManager widget of the same size, |
17992 which in turn contains a menubar widget and an EmacsFrame widget, inside | 17996 which in turn contains a menubar widget and an EmacsFrame widget, inside |
17993 of which is the client area. (The division into EmacsShell and | 17997 of which is the client area. (The division into EmacsShell and |
17994 EmacsManager is due to the complex and screwy geometry-management system | 17998 EmacsManager is due to the complex and screwy geometry-management system |
17995 in Xt [and X more generally]. The EmacsShell handles negotiation with | 17999 in Xt [and X more generally]. The EmacsShell handles negotiation with |
18001 | 18005 |
18002 Under Windows, the non-client area is managed by the window system. | 18006 Under Windows, the non-client area is managed by the window system. |
18003 There is no division such as under X. Part of the window-system API | 18007 There is no division such as under X. Part of the window-system API |
18004 (@file{USER.DLL}) of Win32 includes functions to control the menubars, title, | 18008 (@file{USER.DLL}) of Win32 includes functions to control the menubars, title, |
18005 etc. and implements the move and resize behavior. There @strong{is} an | 18009 etc. and implements the move and resize behavior. There @strong{is} an |
18006 equivalent of the window manager, called the "shell", but it manages | 18010 equivalent of the window manager, called the ``shell'', but it manages |
18007 only the desktop, not the windows themselves. The normal shell under | 18011 only the desktop, not the windows themselves. The normal shell under |
18008 Windows is @file{EXPLORER.EXE}; if you kill this, you will lose the bar | 18012 Windows is @file{EXPLORER.EXE}; if you kill this, you will lose the bar |
18009 containing the "Start" menu and tray and such, but the windows | 18013 containing the ``Start'' menu and tray and such, but the windows |
18010 themselves will not be affected or lose their decorations. | 18014 themselves will not be affected or lose their decorations. |
18011 | 18015 |
18012 | 18016 |
18013 @node The Client Area, The Paned Area, The Non-Client Area, Window and Frame Geometry | 18017 @node The Client Area, The Paned Area, The Non-Client Area, Window and Frame Geometry |
18014 @section The Client Area | 18018 @section The Client Area |
18015 | 18019 |
18016 Inside of the client area are the toolbars, the gutters (where the buffer | 18020 Inside of the client area are the toolbars, the gutters (where the buffer |
18017 tabs are displayed), the minibuffer, the internal border width, and one | 18021 tabs are displayed), the minibuffer, the internal border width, and one |
18018 or more non-overlapping "windows" (this is old Emacs terminology, from | 18022 or more non-overlapping ``windows'' (this is old Emacs terminology, from |
18019 before the time when frames existed at all; the standard terminology for | 18023 before the time when frames existed at all; the standard terminology for |
18020 this would be "pane"). Each window can contain a modeline, horizontal | 18024 this would be ``pane''). Each window can contain a modeline, horizontal |
18021 and/or vertical scrollbars, and (for non-rightmost windows) a vertical | 18025 and/or vertical scrollbars, and (for non-rightmost windows) a vertical |
18022 divider, surrounding a text area. | 18026 divider, surrounding a text area. |
18023 | 18027 |
18024 The dimensions of the toolbars and gutters are determined by the formula | 18028 The dimensions of the toolbars and gutters are determined by the formula |
18025 (THICKNESS + 2 * BORDER-THICKNESS), where "thickness" is a cover term | 18029 (THICKNESS + 2 * BORDER-THICKNESS), where ``thickness'' is a cover term |
18026 for height or width, as appropriate. The height and width come from | 18030 for height or width, as appropriate. The height and width come from |
18027 @code{default-toolbar-height} and @code{default-toolbar-width} and the specific | 18031 @code{default-toolbar-height} and @code{default-toolbar-width} and the specific |
18028 versions of these (@code{top-toolbar-height}, @code{left-toolbar-width}, etc.). | 18032 versions of these (@code{top-toolbar-height}, @code{left-toolbar-width}, etc.). |
18029 The border thickness comes from @code{default-toolbar-border-height} and | 18033 The border thickness comes from @code{default-toolbar-border-height} and |
18030 @code{default-toolbar-border-width}, and the specific versions of these. The | 18034 @code{default-toolbar-border-width}, and the specific versions of these. The |
18045 | 18049 |
18046 | 18050 |
18047 @node The Paned Area, Text Areas, The Client Area, Window and Frame Geometry | 18051 @node The Paned Area, Text Areas, The Client Area, Window and Frame Geometry |
18048 @section The Paned Area | 18052 @section The Paned Area |
18049 | 18053 |
18050 The area occupied by the "windows" is called the paned area. | 18054 The area occupied by the ``windows'' is called the paned area. |
18051 Unfortunately, because of the presence of the gutter @strong{between} the | 18055 Unfortunately, because of the presence of the gutter @strong{between} the |
18052 minibuffer and other windows, the bottom of the paned area is not | 18056 minibuffer and other windows, the bottom of the paned area is not |
18053 well-defined -- does it include the minibuffer (in which case it also | 18057 well-defined -- does it include the minibuffer (in which case it also |
18054 includes the bottom gutter, but none others) or does it not include | 18058 includes the bottom gutter, but none others) or does it not include |
18055 the minibuffer? (In which case not all windows are included.) It would | 18059 the minibuffer? (In which case not all windows are included.) It would |
18080 @code{horizontal-scrollbar-visible-p}, @code{vertical-scrollbar-visible-p}, | 18084 @code{horizontal-scrollbar-visible-p}, @code{vertical-scrollbar-visible-p}, |
18081 @code{vertical-divider-always-visible-p}, etc. | 18085 @code{vertical-divider-always-visible-p}, etc. |
18082 | 18086 |
18083 In addition, it is possible to set margins in the text area using the | 18087 In addition, it is possible to set margins in the text area using the |
18084 specifiers @code{left-margin-width} and @code{right-margin-width}. When this is | 18088 specifiers @code{left-margin-width} and @code{right-margin-width}. When this is |
18085 done, only the "inner text area" (the area inside of the margins) will | 18089 done, only the ``inner text area'' (the area inside of the margins) will |
18086 be used for normal display of text; the margins will be used for glyphs | 18090 be used for normal display of text; the margins will be used for glyphs |
18087 with a layout policy of @code{outside-margin} (as set on an extent containing | 18091 with a layout policy of @code{outside-margin} (as set on an extent containing |
18088 the glyph by @code{set-extent-begin-glyph-layout} or | 18092 the glyph by @code{set-extent-begin-glyph-layout} or |
18089 @code{set-extent-end-glyph-layout}). However, the calculation of the text | 18093 @code{set-extent-end-glyph-layout}). However, the calculation of the text |
18090 area size (e.g. in the function @code{window-text-area-width}) includes the | 18094 area size (e.g. in the function @code{window-text-area-width}) includes the |
18091 margins. Which margin is used depends on whether a glyph has been set | 18095 margins. Which margin is used depends on whether a glyph has been set |
18092 as the begin-glyph or end-glyph of an extent (@code{set-extent-begin-glyph} | 18096 as the begin-glyph or end-glyph of an extent (@code{set-extent-begin-glyph} |
18093 etc.), using the left and right margins, respectively. | 18097 etc.), using the left and right margins, respectively. |
18094 | 18098 |
18095 Technically, the margins outside of the inner text area are known as the | 18099 Technically, the margins outside of the inner text area are known as the |
18096 "outside margins". The "inside margins" are in the inner text area and | 18100 ``outside margins''. The ``inside margins'' are in the inner text area and |
18097 constitute the whitespace between the outside margins and the first or | 18101 constitute the whitespace between the outside margins and the first or |
18098 last non-whitespace character in a line; their width can vary from line | 18102 last non-whitespace character in a line; their width can vary from line |
18099 to line. Glyphs will be placed in the inside margin if their layout | 18103 to line. Glyphs will be placed in the inside margin if their layout |
18100 policy is @code{inside-margin} or @code{whitespace}, with @code{whitespace} glyphs on | 18104 policy is @code{inside-margin} or @code{whitespace}, with @code{whitespace} glyphs on |
18101 the inside and @code{inside-margin} glyphs on the outside. Inside-margin | 18105 the inside and @code{inside-margin} glyphs on the outside. Inside-margin |
18106 | 18110 |
18107 | 18111 |
18108 @node The Displayable Area, Which Functions Use Which?, Text Areas, Window and Frame Geometry | 18112 @node The Displayable Area, Which Functions Use Which?, Text Areas, Window and Frame Geometry |
18109 @section The Displayable Area | 18113 @section The Displayable Area |
18110 | 18114 |
18111 The "displayable area" is not so much an actual area as a convenient | 18115 The ``displayable area'' is not so much an actual area as a convenient |
18112 fiction. It is the area used to convert between pixel and character | 18116 fiction. It is the area used to convert between pixel and character |
18113 dimensions for frames. The character dimensions for a frame (e.g. as | 18117 dimensions for frames. The character dimensions for a frame (e.g. as |
18114 returned by @code{frame-width} and @code{frame-height} and set by | 18118 returned by @code{frame-width} and @code{frame-height} and set by |
18115 @code{set-frame-width} and @code{set-frame-height}) are determined from the | 18119 @code{set-frame-width} and @code{set-frame-height}) are determined from the |
18116 displayable area by dividing by the pixel size of the default font as | 18120 displayable area by dividing by the pixel size of the default font as |
18117 instantiated in the frame. (For proportional fonts, the "average" width | 18121 instantiated in the frame. (For proportional fonts, the ``average'' width |
18118 is used. Under Windows, this is a built-in property of the fonts. | 18122 is used. Under Windows, this is a built-in property of the fonts. |
18119 Under X, this is based on the width of the lowercase 'n', or if this is | 18123 Under X, this is based on the width of the lowercase 'n', or if this is |
18120 zero then the width of the default character. [We prefer 'n' to the | 18124 zero then the width of the default character. [We prefer 'n' to the |
18121 specified default character because many X fonts have a default | 18125 specified default character because many X fonts have a default |
18122 character with a zero or otherwise non-representative width.]) | 18126 character with a zero or otherwise non-representative width.]) |
18123 | 18127 |
18124 The displayable area is essentially the "theoretical" gutter area of the | 18128 The displayable area is essentially the ``theoretical'' gutter area of the |
18125 frame, excluding the rightmost and bottom-most scrollbars. That is, it | 18129 frame, excluding the rightmost and bottom-most scrollbars. That is, it |
18126 starts from the client (or "total") area and then excludes the | 18130 starts from the client (or ``total'') area and then excludes the |
18127 "theoretical" toolbars and bottom-most/rightmost scrollbars, and the | 18131 ``theoretical'' toolbars and bottom-most/rightmost scrollbars, and the |
18128 internal border width. In this context, "theoretical" means that all | 18132 internal border width. In this context, ``theoretical'' means that all |
18129 calculations are based on frame-level values for toolbar and scrollbar | 18133 calculations are based on frame-level values for toolbar and scrollbar |
18130 thicknesses. Because these thicknesses are controlled by specifiers, | 18134 thicknesses. Because these thicknesses are controlled by specifiers, |
18131 and specifiers can have window-specific and buffer-specific values, | 18135 and specifiers can have window-specific and buffer-specific values, |
18132 these calculations may or may not reflect the actual size of the paned | 18136 these calculations may or may not reflect the actual size of the paned |
18133 area or of the scrollbars when any particular window is selected. Note | 18137 area or of the scrollbars when any particular window is selected. Note |
18134 also that the "displayable area" may not even be contiguous! In | 18138 also that the ``displayable area'' may not even be contiguous! In |
18135 particular, the gutters are included, but the bottom-most and rightmost | 18139 particular, the gutters are included, but the bottom-most and rightmost |
18136 scrollbars are excluded even though they are inside of the gutters. | 18140 scrollbars are excluded even though they are inside of the gutters. |
18137 Furthermore, if the frame-level value of the horizontal scrollbar height | 18141 Furthermore, if the frame-level value of the horizontal scrollbar height |
18138 is non-zero, then the displayable area includes the paned area above and | 18142 is non-zero, then the displayable area includes the paned area above and |
18139 below the bottom horizontal scrollbar (i.e. the modeline and minibuffer) | 18143 below the bottom horizontal scrollbar (i.e. the modeline and minibuffer) |
18148 width before dividing by the default-font width, and then adding 1 to | 18152 width before dividing by the default-font width, and then adding 1 to |
18149 the result.) (The ultimate motivation for this kludge as well as the | 18153 the result.) (The ultimate motivation for this kludge as well as the |
18150 subtraction of the scrollbars, but not the minibuffer or bottom-most | 18154 subtraction of the scrollbars, but not the minibuffer or bottom-most |
18151 modeline, is to maintain compatibility with TTY's.) | 18155 modeline, is to maintain compatibility with TTY's.) |
18152 | 18156 |
18153 Despite all these concerns and kludges, however, the "displayable area" | 18157 Despite all these concerns and kludges, however, the ``displayable area'' |
18154 concept works well in practice and mostly ensures that by default the | 18158 concept works well in practice and mostly ensures that by default the |
18155 frame will actually fit 79 characters + continuation/truncation glyph. | 18159 frame will actually fit 79 characters + continuation/truncation glyph. |
18156 | 18160 |
18157 | 18161 |
18158 @node Which Functions Use Which?, , The Displayable Area, Window and Frame Geometry | 18162 @node Which Functions Use Which?, , The Displayable Area, Window and Frame Geometry |
19797 @section Event Queues | 19801 @section Event Queues |
19798 @cindex event queues | 19802 @cindex event queues |
19799 @cindex queues, event | 19803 @cindex queues, event |
19800 | 19804 |
19801 There are two event queues here -- the command event queue (#### which | 19805 There are two event queues here -- the command event queue (#### which |
19802 should be called "deferred event queue" and is in my glyph ws) and the | 19806 should be called ``deferred event queue'' and is in my glyph ws) and the |
19803 dispatch event queue. (MS Windows actually has an extra dispatch queue | 19807 dispatch event queue. (MS Windows actually has an extra dispatch queue |
19804 for non-user events and uses the generic one only for user events. This | 19808 for non-user events and uses the generic one only for user events. This |
19805 is because user and non-user events in Windows come through the same | 19809 is because user and non-user events in Windows come through the same |
19806 place -- the window procedure -- but under X, it's possible to | 19810 place -- the window procedure -- but under X, it's possible to |
19807 selectively process events such that we take all the user events before | 19811 selectively process events such that we take all the user events before |
19902 | 19906 |
19903 @item handle_magic_event_cb | 19907 @item handle_magic_event_cb |
19904 XEmacs calls this with an event structure which contains window-system | 19908 XEmacs calls this with an event structure which contains window-system |
19905 dependent information that XEmacs doesn't need to know about, but which | 19909 dependent information that XEmacs doesn't need to know about, but which |
19906 must happen in order. If the @code{next_event_cb} never returns an | 19910 must happen in order. If the @code{next_event_cb} never returns an |
19907 event of type "magic", this will never be used. | 19911 event of type ``magic'', this will never be used. |
19908 | 19912 |
19909 @item format_magic_event_cb | 19913 @item format_magic_event_cb |
19910 Called with a magic event; print a representation of the innards of the | 19914 Called with a magic event; print a representation of the innards of the |
19911 event to @var{PSTREAM}. | 19915 event to @var{PSTREAM}. |
19912 | 19916 |
19934 @item select_process_cb | 19938 @item select_process_cb |
19935 @item unselect_process_cb | 19939 @item unselect_process_cb |
19936 These callbacks tell the underlying implementation to add or remove a | 19940 These callbacks tell the underlying implementation to add or remove a |
19937 file descriptor from the list of fds which are polled for | 19941 file descriptor from the list of fds which are polled for |
19938 inferior-process input. When input becomes available on the given | 19942 inferior-process input. When input becomes available on the given |
19939 process connection, an event of type "process" should be generated. | 19943 process connection, an event of type ``process'' should be generated. |
19940 | 19944 |
19941 @item select_console_cb | 19945 @item select_console_cb |
19942 @item unselect_console_cb | 19946 @item unselect_console_cb |
19943 These callbacks tell the underlying implementation to add or remove a | 19947 These callbacks tell the underlying implementation to add or remove a |
19944 console from the list of consoles which are polled for user-input. | 19948 console from the list of consoles which are polled for user-input. |
20062 @cindex focus handling | 20066 @cindex focus handling |
20063 | 20067 |
20064 Ben's capsule lecture on focus: | 20068 Ben's capsule lecture on focus: |
20065 | 20069 |
20066 In GNU Emacs @code{select-frame} never changes the window-manager frame | 20070 In GNU Emacs @code{select-frame} never changes the window-manager frame |
20067 focus. All it does is change the "selected frame". This is similar to | 20071 focus. All it does is change the ``selected frame''. This is similar to |
20068 what happens when we call @code{select-device} or @code{select-console}. | 20072 what happens when we call @code{select-device} or @code{select-console}. |
20069 Whenever an event comes in (including a keyboard event), its frame is | 20073 Whenever an event comes in (including a keyboard event), its frame is |
20070 selected; therefore, evaluating @code{select-frame} in @samp{*scratch*} | 20074 selected; therefore, evaluating @code{select-frame} in @samp{*scratch*} |
20071 won't cause any effects because the next received event (in the same | 20075 won't cause any effects because the next received event (in the same |
20072 frame) will cause a switch back to the frame displaying | 20076 frame) will cause a switch back to the frame displaying |
20097 minibuffer, you essentially want to temporarily switch the WM focus to | 20101 minibuffer, you essentially want to temporarily switch the WM focus to |
20098 the frame with the minibuffer, and switch it back when you exit the | 20102 the frame with the minibuffer, and switch it back when you exit the |
20099 minibuffer. | 20103 minibuffer. |
20100 | 20104 |
20101 GNU Emacs solves this with the crockish @code{redirect-frame-focus}, | 20105 GNU Emacs solves this with the crockish @code{redirect-frame-focus}, |
20102 which says "for keyboard events received from FRAME, act like they're | 20106 which says ``for keyboard events received from FRAME, act like they're |
20103 coming from FOCUS-FRAME". I think what this means is that, when a | 20107 coming from FOCUS-FRAME''. I think what this means is that, when a |
20104 keyboard event comes in and the event manager is about to select the | 20108 keyboard event comes in and the event manager is about to select the |
20105 event's frame, if that frame has its focus redirected, the redirected-to | 20109 event's frame, if that frame has its focus redirected, the redirected-to |
20106 frame is selected instead. That way, if you're in a minibufferless | 20110 frame is selected instead. That way, if you're in a minibufferless |
20107 frame and enter the minibuffer, then all Lisp functions that run see the | 20111 frame and enter the minibuffer, then all Lisp functions that run see the |
20108 selected frame as the minibuffer's frame rather than the minibufferless | 20112 selected frame as the minibuffer's frame rather than the minibufferless |
20112 There's also some weird logic that switches the redirected frame focus | 20116 There's also some weird logic that switches the redirected frame focus |
20113 from one frame to another if Lisp code explicitly calls | 20117 from one frame to another if Lisp code explicitly calls |
20114 @code{select-frame} (but not if @code{handle-switch-frame} is called), | 20118 @code{select-frame} (but not if @code{handle-switch-frame} is called), |
20115 and saves and restores the frame focus in window configurations, | 20119 and saves and restores the frame focus in window configurations, |
20116 etc. etc. All of this logic is heavily @code{#if 0}'d, with lots of | 20120 etc. etc. All of this logic is heavily @code{#if 0}'d, with lots of |
20117 comments saying "No, this approach doesn't seem to work, so I'm trying | 20121 comments saying ``No, this approach doesn't seem to work, so I'm trying |
20118 this ... is it reasonable? Well, I'm not sure ..." that are a red flag | 20122 this ... is it reasonable? Well, I'm not sure ...'' that are a red flag |
20119 indicating crockishness. | 20123 indicating crockishness. |
20120 | 20124 |
20121 Because of our way of doing things, we can avoid all this crock. | 20125 Because of our way of doing things, we can avoid all this crock. |
20122 Keyboard events never cause a select-frame (who cares what frame they're | 20126 Keyboard events never cause a select-frame (who cares what frame they're |
20123 associated with? They come from a console, only). We change the actual | 20127 associated with? They come from a console, only). We change the actual |
24896 return value should be an alist consisting of a list of all of the | 24900 return value should be an alist consisting of a list of all of the |
24897 defined subtypes for that coding system type along with a level of | 24901 defined subtypes for that coding system type along with a level of |
24898 likelihood and a list of additional properties indicating certain | 24902 likelihood and a list of additional properties indicating certain |
24899 features detected in the data. The extra properties returned are | 24903 features detected in the data. The extra properties returned are |
24900 defined entirely by the particular coding system type and are used | 24904 defined entirely by the particular coding system type and are used |
24901 only in the algorithm described below under "user control." However, | 24905 only in the algorithm described below under ``user control.'' However, |
24902 the levels of likelihood have a standard meaning as follows: | 24906 the levels of likelihood have a standard meaning as follows: |
24903 | 24907 |
24904 Level 4 means "near certainty" and typically indicates that a | 24908 Level 4 means ``near certainty'' and typically indicates that a |
24905 signature has been detected, usually at the beginning of the data, | 24909 signature has been detected, usually at the beginning of the data, |
24906 indicating that the data is encoded in this particular coding system | 24910 indicating that the data is encoded in this particular coding system |
24907 type. An example of this would be the byte order mark at the beginning | 24911 type. An example of this would be the byte order mark at the beginning |
24908 of UCS2 encoded data or the GZIP mark at the beginning of GZIP data. | 24912 of UCS2 encoded data or the GZIP mark at the beginning of GZIP data. |
24909 | 24913 |
24910 Level 3 means "highly likely" and indicates that tell-tale signs have | 24914 Level 3 means ``highly likely'' and indicates that tell-tale signs have |
24911 been discovered in the data that are characteristic of this particular | 24915 been discovered in the data that are characteristic of this particular |
24912 coding system type. Examples of this might be ISO 2022 escape | 24916 coding system type. Examples of this might be ISO 2022 escape |
24913 sequences or the current Unicode end of line markers at regular | 24917 sequences or the current Unicode end of line markers at regular |
24914 intervals. | 24918 intervals. |
24915 | 24919 |
24916 Level 2 means "strongly statistically likely" indicating that | 24920 Level 2 means ``strongly statistically likely'' indicating that |
24917 statistical analysis concludes that there's a high chance that this | 24921 statistical analysis concludes that there's a high chance that this |
24918 data is encoded according to this particular type. For example, this | 24922 data is encoded according to this particular type. For example, this |
24919 might mean that for UCS2 data, there is a high proportion of null bytes | 24923 might mean that for UCS2 data, there is a high proportion of null bytes |
24920 or other repeated bytes in the odd-numbered bytes of the data and a | 24924 or other repeated bytes in the odd-numbered bytes of the data and a |
24921 high variance in the even-numbered bytes of the data. For Shift-JIS, | 24925 high variance in the even-numbered bytes of the data. For Shift-JIS, |
24922 this might indicate that there were no illegal Shift-JIS sequences | 24926 this might indicate that there were no illegal Shift-JIS sequences |
24923 and a fairly high occurrence of common Shift-JIS characters. | 24927 and a fairly high occurrence of common Shift-JIS characters. |
24924 | 24928 |
24925 Level 1 means "weak statistical likelihood" meaning that there is some | 24929 Level 1 means ``weak statistical likelihood'' meaning that there is some |
24926 indication that the data is encoded in this coding system type. In | 24930 indication that the data is encoded in this coding system type. In |
24927 fact, there is a reasonable chance that it may be some other type as | 24931 fact, there is a reasonable chance that it may be some other type as |
24928 well. This means, for example, that no illegal sequences were | 24932 well. This means, for example, that no illegal sequences were |
24929 encountered and at least some data was encountered that is purposely | 24933 encountered and at least some data was encountered that is purposely |
24930 not in other coding system types. For Shift-JIS data, this might mean | 24934 not in other coding system types. For Shift-JIS data, this might mean |
24931 that some bytes in the range 128 to 159 were encountered in the data. | 24935 that some bytes in the range 128 to 159 were encountered in the data. |
24932 | 24936 |
24933 Level 0 means "neutral" which is to say that there's either not enough | 24937 Level 0 means ``neutral'' which is to say that there's either not enough |
24934 data to make any decision or that the data could well be interpreted | 24938 data to make any decision or that the data could well be interpreted |
24935 as this type (meaning no illegal sequences), but there is little or no | 24939 as this type (meaning no illegal sequences), but there is little or no |
24936 indication of anything particular to this particular type. | 24940 indication of anything particular to this particular type. |
24937 | 24941 |
24938 Level -1 means "weakly unlikely" meaning that some data was | 24942 Level -1 means ``weakly unlikely'' meaning that some data was |
24939 encountered that could conceivably be part of the coding system type | 24943 encountered that could conceivably be part of the coding system type |
24940 but is probably not. For example, successively long line-lengths or | 24944 but is probably not. For example, successively long line-lengths or |
24941 very rarely-encountered sequences. | 24945 very rarely-encountered sequences. |
24942 | 24946 |
24943 Level -2 means "strongly unlikely" meaning that typically a number | 24947 Level -2 means ``strongly unlikely'' meaning that typically a number |
24944 of illegal sequences were encountered. | 24948 of illegal sequences were encountered. |
24945 | 24949 |
24946 The algorithm to determine when to stop and indicate that the data has | 24950 The algorithm to determine when to stop and indicate that the data has |
24947 been detected as a particular coding system uses a priority list, | 24951 been detected as a particular coding system uses a priority list, |
24948 which is typically specified as part of the language environment | 24952 which is typically specified as part of the language environment |
24957 Japanese-language environment particular subtypes of ISO 2022 will be | 24961 Japanese-language environment particular subtypes of ISO 2022 will be |
24958 associated with the Japanese coding system version of those | 24962 associated with the Japanese coding system version of those |
24959 subtypes). It is perfectly legal and quite common, in fact, to list the | 24963 subtypes). It is perfectly legal and quite common, in fact, to list the |
24960 same subtype more than once in the priority list with successively | 24964 same subtype more than once in the priority list with successively |
24961 lower requirements. Other facts that can be listed in the priority | 24965 lower requirements. Other facts that can be listed in the priority |
24962 list for a subtype are "reject", meaning that the data should never be | 24966 list for a subtype are ``reject'', meaning that the data should never be |
24963 detected as this subtype, or "ask", meaning that if the data is | 24967 detected as this subtype, or ``ask'', meaning that if the data is |
24964 detected to be this subtype, the user will be asked whether they | 24968 detected to be this subtype, the user will be asked whether they |
24965 actually mean this. This latter property could be used, for example, | 24969 actually mean this. This latter property could be used, for example, |
24966 towards the bottom of the priority list. | 24970 towards the bottom of the priority list. |
24967 | 24971 |
24968 In addition there is a global variable which specifies the minimum | 24972 In addition there is a global variable which specifies the minimum |
24975 system, the subtype, the coding system and the associated level of | 24979 system, the subtype, the coding system and the associated level of |
24976 likelihood will be prominently displayed either in the echo area or in | 24980 likelihood will be prominently displayed either in the echo area or in |
24977 a status box somewhere. | 24981 a status box somewhere. |
24978 | 24982 |
24979 If no positive match is found according to the priority list, or if | 24983 If no positive match is found according to the priority list, or if |
24980 the matches that are found have the "ask" property on them, then the | 24984 the matches that are found have the ``ask'' property on them, then the |
24981 user will be presented with a list of choices of possible encodings | 24985 user will be presented with a list of choices of possible encodings |
24982 and asked to choose one. This list is typically sorted first by level | 24986 and asked to choose one. This list is typically sorted first by level |
24983 of likelihood, and then within this, by the order in which the | 24987 of likelihood, and then within this, by the order in which the |
24984 subtypes appear in the priority list. This list is displayed in a | 24988 subtypes appear in the priority list. This list is displayed in a |
24985 special kind of dialog box or other buffer allowing the user, in | 24989 special kind of dialog box or other buffer allowing the user, in |
24992 will be in the form of errors or warnings of various levels, some of | 24996 will be in the form of errors or warnings of various levels, some of |
24993 which may be severe enough to stop the decoding entirely, and some of | 24997 which may be severe enough to stop the decoding entirely, and some of |
24994 which may either indicate definitely malformed data but from which | 24998 which may either indicate definitely malformed data but from which |
24995 it's possible to recover, or simply data that appears rather | 24999 it's possible to recover, or simply data that appears rather |
24996 questionable. If any of these status values are reported during | 25000 questionable. If any of these status values are reported during |
24997 decoding, the user will be informed of this and asked "are you sure?" | 25001 decoding, the user will be informed of this and asked ``are you sure?'' |
24998 As part of the "are you sure" dialog box or question, the user can | 25002 As part of the ``are you sure'' dialog box or question, the user can |
24999 display the results of the decoding to make sure it's correct. If the | 25003 display the results of the decoding to make sure it's correct. If the |
25000 user says "no, they're not sure," then the same list of choices as | 25004 user says ``no, they're not sure,'' then the same list of choices as |
25001 previously mentioned will be presented. | 25005 previously mentioned will be presented. |
25002 | 25006 |
25003 @subheading RFC: Autodetection | 25007 @subheading RFC: Autodetection |
25004 | 25008 |
25005 Also appeared under heading "Implementation of Coding System Priority | 25009 Also appeared under heading "Implementation of Coding System Priority |
25215 | 25219 |
25216 @enumerate | 25220 @enumerate |
25217 @item | 25221 @item |
25218 Hopefully a system general enough to handle (2)--(4) will | 25222 Hopefully a system general enough to handle (2)--(4) will |
25219 handle these, too, but we should watch out for gotchas like | 25223 handle these, too, but we should watch out for gotchas like |
25220 Unicode "plane 14" tags which (I think _both_ Ben and Olivier | 25224 Unicode ``plane 14'' tags which (I think _both_ Ben and Olivier |
25221 will agree) have no place in the internal representation, and | 25225 will agree) have no place in the internal representation, and |
25222 thus must be treated as out-of-band control sequences. I | 25226 thus must be treated as out-of-band control sequences. I |
25223 don't know if all such gotchas will be as easy to dispose of. | 25227 don't know if all such gotchas will be as easy to dispose of. |
25224 | 25228 |
25225 @item | 25229 @item |
25256 | 25260 |
25257 sly, it can't be perfect if any autodecoding is done; | 25261 sly, it can't be perfect if any autodecoding is done; |
25258 like Hrvoje should have an easily available option to | 25262 like Hrvoje should have an easily available option to |
25259 to this default (or an optimized approximation which | 25263 to this default (or an optimized approximation which |
25260 t actually read the whole file into a buffer) or simply | 25264 t actually read the whole file into a buffer) or simply |
25261 y everything as binary (with the "font" for binary files | 25265 y everything as binary (with the ``font'' for binary files |
25262 a user option). | 25266 a user option). |
25263 | 25267 |
25264 @item | 25268 @item |
25265 This implies that we should be detecting conditions in the | 25269 This implies that we should be detecting conditions in the |
25266 tail of the file which violate the implicit assumptions of the | 25270 tail of the file which violate the implicit assumptions of the |
25365 | 25369 |
25366 Date: 11/1/1999 7:24 AM | 25370 Date: 11/1/1999 7:24 AM |
25367 | 25371 |
25368 Stephen, thank you very much for writing this up. I think it is a good start, | 25372 Stephen, thank you very much for writing this up. I think it is a good start, |
25369 and definitely moving in the direction I would like to see things going: more | 25373 and definitely moving in the direction I would like to see things going: more |
25370 proposals, less arguing. (aka "more light, less heat") However, I have some | 25374 proposals, less arguing. (aka ``more light, less heat'') However, I have some |
25371 suggestions for cleaning this up: | 25375 suggestions for cleaning this up: |
25372 | 25376 |
25373 You should try to make it more layered. For example, you might have one | 25377 You should try to make it more layered. For example, you might have one |
25374 section devoted to the workings of autodetection, which starts out like this | 25378 section devoted to the workings of autodetection, which starts out like this |
25375 (the section numbers below are totally arbitrary): | 25379 (the section numbers below are totally arbitrary): |