comparison man/internals/internals.texi @ 3322:cf02a1da936a

[xemacs-hg @ 2006-03-31 17:51:18 by stephent] Miscellaneous doc cleanup. <87u09eqzja.fsf@tleepslib.sk.tsukuba.ac.jp>
author stephent
date Fri, 31 Mar 2006 17:51:39 +0000
parents 971e3c687f18
children 15fb91e3a115
3321:4309d96fb8b7 3322:cf02a1da936a
474 * Internal Text APIs:: 474 * Internal Text APIs::
475 * Coding for Mule:: 475 * Coding for Mule::
476 * CCL:: 476 * CCL::
477 * Microsoft Windows-Related Multilingual Issues:: 477 * Microsoft Windows-Related Multilingual Issues::
478 * Modules for Internationalization:: 478 * Modules for Internationalization::
479 * The Great Mule Merge of March 2002::
479 480
480 Encodings 481 Encodings
481 482
482 * Japanese EUC (Extended Unix Code):: 483 * Japanese EUC (Extended Unix Code)::
483 * JIS7:: 484 * JIS7::
519 * More about locales:: 520 * More about locales::
520 * Unicode support under Windows:: 521 * Unicode support under Windows::
521 * The golden rules of writing Unicode-safe code:: 522 * The golden rules of writing Unicode-safe code::
522 * The format of the locale in setlocale():: 523 * The format of the locale in setlocale()::
523 * Random other Windows I18N docs:: 524 * Random other Windows I18N docs::
525
526 The Great Mule Merge of March 2002
527
528 * List of changed files in new Mule workspace::
529 * Changes to the MULE subsystems::
530 * Pervasive changes throughout XEmacs sources::
531 * Changes to specific subsystems::
532 * Mule changes by theme::
533 * File-coding rewrite::
534 * General User-Visible Changes::
535 * General Lisp-Visible Changes::
536 * User documentation::
537 * General internal changes::
538 * Ben's TODO list:: Probably obsolete.
539 * Ben's README:: Probably obsolete.
524 540
525 Consoles; Devices; Frames; Windows 541 Consoles; Devices; Frames; Windows
526 542
527 * Introduction to Consoles; Devices; Frames; Windows:: 543 * Introduction to Consoles; Devices; Frames; Windows::
528 * Point:: 544 * Point::
575 * Creating an Lstream:: Creating an lstream object. 591 * Creating an Lstream:: Creating an lstream object.
576 * Lstream Types:: Different sorts of things that are streamed. 592 * Lstream Types:: Different sorts of things that are streamed.
577 * Lstream Functions:: Functions for working with lstreams. 593 * Lstream Functions:: Functions for working with lstreams.
578 * Lstream Methods:: Creating new lstream types. 594 * Lstream Methods:: Creating new lstream types.
579 595
596 Subprocesses
597
598 * Ben's separate stderr notes:: Probably obsolete.
599
580 Interface to MS Windows 600 Interface to MS Windows
581 601
582 * Different kinds of Windows environments:: 602 * Different kinds of Windows environments::
583 * Windows Build Flags:: 603 * Windows Build Flags::
584 * Windows I18N Introduction:: 604 * Windows I18N Introduction::
585 * Modules for Interfacing with MS Windows:: 605 * Modules for Interfacing with MS Windows::
606 * CHANGES from 21.4-windows branch:: Probably obsolete.
586 607
587 Interface to the X Window System 608 Interface to the X Window System
588 609
589 * Lucid Widget Library:: An interface to various widget sets. 610 * Lucid Widget Library:: An interface to various widget sets.
590 * Modules for Interfacing with X Windows:: 611 * Modules for Interfacing with X Windows::
10371 * Internal Text APIs:: 10392 * Internal Text APIs::
10372 * Coding for Mule:: 10393 * Coding for Mule::
10373 * CCL:: 10394 * CCL::
10374 * Microsoft Windows-Related Multilingual Issues:: 10395 * Microsoft Windows-Related Multilingual Issues::
10375 * Modules for Internationalization:: 10396 * Modules for Internationalization::
10397 * The Great Mule Merge of March 2002::
10376 @end menu 10398 @end menu
10377 10399
10378 @node Introduction to Multilingual Issues #1, Introduction to Multilingual Issues #2, Multilingual Support, Multilingual Support 10400 @node Introduction to Multilingual Issues #1, Introduction to Multilingual Issues #2, Multilingual Support, Multilingual Support
10379 @section Introduction to Multilingual Issues #1 10401 @section Introduction to Multilingual Issues #1
10380 @cindex introduction to multilingual issues #1 10402 @cindex introduction to multilingual issues #1
14078 definition with a call to the macro XETEXT. This appropriately makes a 14100 definition with a call to the macro XETEXT. This appropriately makes a
14079 string of either regular or wide chars, which is to say this string may be 14101 string of either regular or wide chars, which is to say this string may be
14080 prepended with an L (causing it to be a wide string) depending on 14102 prepended with an L (causing it to be a wide string) depending on
14081 XEUNICODE_P. 14103 XEUNICODE_P.
14082 14104
14083 @node Modules for Internationalization, , Microsoft Windows-Related Multilingual Issues, Multilingual Support 14105 @node Modules for Internationalization, The Great Mule Merge of March 2002, Microsoft Windows-Related Multilingual Issues, Multilingual Support
14084 @section Modules for Internationalization 14106 @section Modules for Internationalization
14085 @cindex modules for internationalization 14107 @cindex modules for internationalization
14086 @cindex internationalization, modules for 14108 @cindex internationalization, modules for
14087 14109
14088 @example 14110 @example
14157 @file{iso-wide.h} 14179 @file{iso-wide.h}
14158 @end example 14180 @end example
14159 14181
14160 This contains leftover code from an earlier implementation of 14182 This contains leftover code from an earlier implementation of
14161 Asian-language support, and is not currently used. 14183 Asian-language support, and is not currently used.
14184
14185
14186 @c
14187 @c DO NOT CHANGE THE NAME OF THIS NODE; ChangeLogs refer to it.
14188 @c Well, of course you're welcome to seek them out and fix them, too.
14189 @c
14190
14191 @node The Great Mule Merge of March 2002, , Modules for Internationalization, Multilingual Support
14192 @section The Great Mule Merge of March 2002
14193 @cindex The Great Mule Merge
14194 @cindex Mule Merge, The Great
14195
14196 In March 2002, just after the release of XEmacs 21.5 beta 5, Ben Wing
14197 merged what was nominally a very large refactoring of the ``Mule''
14198 multilingual support code into the mainline. This merge added robust
14199 support for Unicode on all platforms, and by providing support for Win32
14200 Unicode APIs made the Mule support on the Windows platform a reality.
14201 This merge also included a large number of other changes and
14202 improvements, not necessarily related to internationalization.
14203
14204 This node basically amounts to the ChangeLog for 2002-03-12.
14205
14206 Some effort has been put into proper markup for code and file names, and
14207 some reorganization according to themes of revision. However, much
14208 remains to be done.
14209
14210 @menu
14211 * List of changed files in new Mule workspace::
14212 * Changes to the MULE subsystems::
14213 * Pervasive changes throughout XEmacs sources::
14214 * Changes to specific subsystems::
14215 * Mule changes by theme::
14216 * File-coding rewrite::
14217 * General User-Visible Changes::
14218 * General Lisp-Visible Changes::
14219 * User documentation::
14220 * General internal changes::
14221 * Ben's TODO list:: Probably obsolete.
14222 * Ben's README:: Probably obsolete.
14223 @end menu
14224
14225
14226 @node List of changed files in new Mule workspace, Changes to the MULE subsystems, , The Great Mule Merge of March 2002
14227 @subsection List of changed files in new Mule workspace
14228
14229 This node lists the files that were touched in the Great Mule Merge.
14230
14231 @heading Deleted files
14232
14233 @example
14234 src/iso-wide.h
14235 src/mule-charset.h
14236 src/mule.c
14237 src/ntheap.h
14238 src/syscommctrl.h
14239 lisp/files-nomule.el
14240 lisp/help-nomule.el
14241 lisp/mule/mule-help.el
14242 lisp/mule/mule-init.el
14243 lisp/mule/mule-misc.el
14244 nt/config.h
14245 @end example
14246
14247 @heading Other deleted files
14248
14249 These files were all zero-length and accidentally present.
14250
14251 @example
14252 src/events-mod.h
14253 tests/Dnd/README.OffiX
14254 tests/Dnd/dragtest.el
14255 netinstall/README.xemacs
14256 lib-src/srcdir-symlink.stamp
14257 @end example
14258
14259 @heading New files
14260
14261 @example
14262 CHANGES-ben-mule
14263 README.ben-mule-21-5
14264 README.ben-separate-stderr
14265 TODO.ben-mule-21-5
14266 etc/TUTORIAL.@{cs,es,nl,sk,sl@}
14267 etc/unicode/*
14268 lib-src/make-mswin-unicode.pl
14269 lisp/code-init.el
14270 lisp/resize-minibuffer.el
14271 lisp/unicode.el
14272 lisp/mule/china-util.el
14273 lisp/mule/cyril-util.el
14274 lisp/mule/devan-util.el
14275 lisp/mule/devanagari.el
14276 lisp/mule/ethio-util.el
14277 lisp/mule/indian.el
14278 lisp/mule/japan-util.el
14279 lisp/mule/korea-util.el
14280 lisp/mule/lao-util.el
14281 lisp/mule/lao.el
14282 lisp/mule/mule-locale.txt
14283 lisp/mule/mule-msw-init.el
14284 lisp/mule/thai-util.el
14285 lisp/mule/thai.el
14286 lisp/mule/tibet-util.el
14287 lisp/mule/tibetan.el
14288 lisp/mule/viet-util.el
14289 src/charset.h
14290 src/intl-auto-encap-win32.c
14291 src/intl-auto-encap-win32.h
14292 src/intl-encap-win32.c
14293 src/intl-win32.c
14294 src/intl-x.c
14295 src/mule-coding.c
14296 src/text.c
14297 src/text.h
14298 src/unicode.c
14299 src/s/win32-common.h
14300 src/s/win32-native.h
14301 @end example
14302
14303 @heading Changed files
14304
14305 ``Too numerous to mention.'' (Ben didn't write that, I did, but it's a
14306 good guess that's the intent....)
14307
14308
14309 @node Changes to the MULE subsystems, Pervasive changes throughout XEmacs sources, List of changed files in new Mule workspace, The Great Mule Merge of March 2002
14310 @subsection Changes to the MULE subsystems
14311
14312 @heading configure changes
14313
14314 @itemize
14315 @item
14316 file-coding always compiled in. eol detection is off by default on
14317 unix, non-mule, but can be enabled with configure option
14318 @code{--with-default-eol-detection} or command-line flag @code{-eol}.
14319
14320 @item
14321 code that selects which files are compiled is mostly moved to
14322 @file{Makefile.in.in}. see comment in @file{Makefile.in.in}.
14323
14324 @item
14325 vestigial i18n3 code deleted.
14326
14327 @item
14328 new cygwin mswin libs imm32 (input methods), mpr (user name
14329 enumeration).
14330
14331 @item
14332 check for @code{link}, @code{symlink}.
14333
14334 @item
14335 @code{vfork}-related code deleted.
14336
14337 @item
14338 fix @file{configure.usage}. (delete @code{--with-file-coding},
14339 @code{--no-doc-file}, add @code{--with-default-eol-detection},
14340 @code{--quick-build}).
14341
14342 @item
14343 @file{nt/config.h} has been eliminated and everything in it merged into
14344 @file{config.h.in} and @file{s/windowsnt.h}. see @file{config.h.in} for
14345 more info.
14346
14347 @item
14348 massive rewrite of @file{s/windowsnt.h}, @file{m/windowsnt.h},
14349 @file{s/cygwin32.h}, @file{s/mingw32.h}. common code moved into
14350 @file{s/win32-common.h}, @file{s/win32-native.h}.
14351
14352 @item
14353 in @file{nt/xemacs.mak}, @file{nt/config.inc.samp}, variable is called
14354 @code{MULE}, not @code{HAVE_MULE}, for consistency with sources.
14355
14356 @item
14357 define @code{TABDLY}, @code{TAB3} in @file{freebsd.h} (#### from where?)
14358 @end itemize
14359
14360
14361 @node Pervasive changes throughout XEmacs sources, Changes to specific subsystems, Changes to the MULE subsystems, The Great Mule Merge of March 2002
14362 @subsection Pervasive changes throughout XEmacs sources
14363
14364 @itemize
14365 @item
14366 all @code{#ifdef FILE_CODING} statements removed from code.
14367 @end itemize
14368
14369 @heading Changes to string processing
14370
14371 @itemize
14372 @item
14373 new @samp{qxe()} string functions that accept @code{Intbyte *} as
14374 arguments. These work exactly like the standard @code{strcmp()},
14375 @code{strcpy()}, @code{sprintf()}, etc. except for the argument
14376 declaration differences. We use these whenever we have @code{Intbyte *}
14377 strings, which is quite often.
14378
14379 @item
14380 new fun @code{build_intstring()} takes an @code{Intbyte *}. also new
14381 funs @code{build_msg_intstring} (like @code{build_intstring()}) and
14382 @code{build_msg_string} (like @code{build_string()}) to do a
14383 @code{GETTEXT()} before building the string. (elimination of old
14384 @code{build_translated_string()}, replaced by
14385 @code{build_msg_string()}).
14386
14387 @item
14388 function @code{intern_int()} for @code{Intbyte *} arguments, like
14389 @code{intern()}.
14390
14391 @item
14392 numerous places throughout code where @code{char *} replaced with
14393 something else, e.g. @code{Char_ASCII *}, @code{Intbyte *},
14394 @code{Char_Binary *}, etc. same with unsigned @code{char *}, going to
14395 @code{UChar_Binary *}, etc.
14396 @end itemize
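
The wrapper idea behind the @samp{qxe()} string functions can be sketched as follows. This is a minimal illustration and not the XEmacs source: the typedef and the `sketch_` names are assumptions made up for this example, and only the standard C library calls are real.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: internal text is stored as unsigned bytes.  The real
   Intbyte typedef lives in the XEmacs sources; this is a stand-in.  */
typedef unsigned char Intbyte;

/* Hypothetical wrapper: behaves like strcmp(), but is declared for
   Intbyte * so that callers need no casts.  */
int
sketch_strcmp (const Intbyte *s1, const Intbyte *s2)
{
  assert (s1 != NULL && s2 != NULL);
  return strcmp ((const char *) s1, (const char *) s2);
}

/* Hypothetical wrapper: behaves like strlen().  */
size_t
sketch_strlen (const Intbyte *s)
{
  assert (s != NULL);
  return strlen ((const char *) s);
}
```

Keeping the casts inside the wrappers means the compiler can still flag a caller that mixes up the different kinds of byte strings.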
14397
14398
14399 @node Changes to specific subsystems, Mule changes by theme, Pervasive changes throughout XEmacs sources, The Great Mule Merge of March 2002
14400 @subsection Changes to specific subsystems
14401
14402 @heading Changes to the init code
14403
14404 @itemize
14405 @item
14406 lots of init code rewritten to be mule-correct.
14407 @end itemize
14408
14409 @heading Changes to processes
14410
14411 @itemize
14412 @item
14413 always call @code{egetenv()}, never @code{getenv()}, for mule
14414 correctness.
14415 @end itemize
14416
14417 @heading command line (@file{startup.el}, @file{emacs.c})
14418
14419 @itemize
14420 @item
14421 new option @code{-eol} to enable auto EOL detection under non-mule unix.
14422
14423 @item
14424 new option @code{-nuni} (@code{--no-unicode-lib-calls}) to force use of
14425 non-Unicode API's under Windows NT, mostly for debugging purposes.
14426 @end itemize
14427
14428
14429 @node Mule changes by theme, File-coding rewrite, Changes to specific subsystems, The Great Mule Merge of March 2002
14430 @subsection Mule changes by theme
14431
14432 @itemize
14433 @item
14434 the code that handles the details of processing multilingual text has
14435 been consolidated to make it easier to extend it. it has been yanked
14436 out of various files (@file{buffer.h}, @file{mule-charset.h},
14437 @file{lisp.h}, @file{insdel.c}, @file{fns.c}, @file{file-coding.c},
14438 etc.) and put into @file{text.c} and @file{text.h}.
14439 @file{mule-charset.h} has also been renamed @file{charset.h}. all long
14440 comments concerning the representations and their processing have been
14441 consolidated into @file{text.c}.
14442
14443 @item
14444 major rewriting of file-coding. it's mostly abstracted into coding
14445 systems that are defined by methods (similar to devices and specifiers),
14446 with the ultimate aim being to allow non-i18n coding systems such as
14447 gzip. there is a ``chain'' coding system that allows multiple coding
14448 systems to be chained together. (it doesn't yet have the concept that
14449 either end of a coding system can be bytes or chars; this needs to be
14450 added.)
14451
14452 @item
14453 large amounts of code throughout the code base have been Mule-ized, not
14454 just Windows code.
14455
14456 @item
14457 total rewriting of OS locale code. it notices your locale at startup
14458 and sets the language environment accordingly, and calls
14459 @code{setlocale()} and sets @code{LANG} when you change the language
14460 environment. new language environment properties @code{locale},
14461 @code{mswindows-locale}, @code{cygwin-locale},
14462 @code{native-coding-system}, to determine langenv from locale and
14463 vice-versa; fix all language environments (lots of language files).
14464 langenv startup code rewritten. many new functions to convert between
14465 locales, language environments, etc.
14466
14467 @item
14468 major overhaul of the way default values for the various coding system
14469 variables are handled. all default values are collected into one
14470 location, a new file @file{code-init.el}, which provides a unified
14471 mechanism for setting and querying what i call ``basic coding system
14472 variables'' (which may be aliases, parts of conses, etc.) and a
14473 mechanism of different configurations (Windows w/Mule, Windows w/o Mule,
14474 Unix w/Mule, Unix w/o Mule, unix w/o Mule but w/auto EOL), each of which
14475 specifies a set of default values. we determine the configuration at
14476 startup and set all the values in one place. (@file{code-init.el},
14477 @file{code-files.el}, @file{coding.el}, ...)
14478
14479 @item
14480 i copied the remaining language-specific files from fsf. i made some
14481 minor changes in certain cases but for the most part the stuff was just
14482 copied and may not work.
14483
14484 @item
14485 ms windows mule support, with full unicode support. required font,
14486 redisplay, event, other changes. ime support from ikeyama.
14487 @end itemize
14488
14489 @heading Lisp-Visible Changes:
14490
14491 @itemize
14492 @item
14493 ensure that @code{escape-quoted} works correctly even without Mule
14494 support and use it for all auto-saves. (@file{auto-save.el},
14495 @file{fileio.c}, @file{coding.el}, @file{files.el})
14496
14497 @item
14498 new var @code{buffer-file-coding-system-when-loaded} specifies the
14499 actual coding system used when the file was loaded
14500 (@code{buffer-file-coding-system} is usually the same, but may be
14501 changed because it controls how the file is written out). use it in
14502 revert-buffer (@file{files.el}, @file{code-files.el}) and in new submenu
14503 File->Revert Buffer with Specified Encoding (@file{menubar-items.el}).
14504
14505 @item
14506 improve docs on how the coding system is determined when a file is read
14507 in; improved docs are in both @code{find-file} and
14508 @code{insert-file-contents} and a reference to where to find them is in
14509 @code{buffer-file-coding-system-for-read}. (@file{files.el},
14510 @file{code-files.el})
14511
14512 @item
14513 new (brain-damaged) FSF way of calling post-read-conversion (only one
14514 arg, not two) is supported, along with our two-argument way, as best we
14515 can. (@file{code-files.el})
14516
14517 @item
14518 add inexplicably missing var @code{default-process-coding-system}. use
14519 it. get rid of former hacked-up way of setting these defaults using
14520 @code{comint-exec-hook}. also fun
14521 @code{set-buffer-process-coding-system}. (@file{code-process.el},
14522 @file{code-cmds.el}, @file{process.c})
14523
14524 @item
14525 remove function @code{set-default-coding-systems}; replace with
14526 @code{set-default-output-coding-systems}, which affects only the output
14527 defaults (@code{buffer-file-coding-system}, output half of
14528 @code{default-process-coding-system}). the input defaults should not be
14529 set by this because they should always remain @code{undecided} in normal
14530 circumstances. fix @code{prefer-coding-system} to use the new function
14531 and correct its docs.
14532
14533 @item
14534 fix bug in @code{coding-system-change-eol-conversion}
14535 (@file{code-cmds.el})
14536
14537 @item
14538 recognize all eol types in @code{prefer-coding-system}
14539 (@file{code-cmds.el})
14540
14541 @item
14542 rewrite @code{coding-system-category} to be correct (@file{coding.el})
14543 @end itemize
14544
14545 @heading Internal Changes
14546
14547 @itemize
14548 @item
14549 major improvements to eistring code, fleshing out of missing funs.
14550 @end itemize
14551
14552 @itemize
14553 @item
14554 Separate encoding and decoding lstreams have been combined into a single
14555 coding lstream. Functions @samp{make_encoding_*_stream} and
14556 @samp{make_decoding_*_stream} have been combined into
14557 @samp{make_coding_*_stream}, which takes an argument specifying whether
14558 encode or decode is wanted.
14559
14560 @item
14561 remove last vestiges of I18N3, I18N4 code.
14562
14563 @item
14564 ascii optimization for strings: we keep track of the number of ascii
14565 chars at the beginning and use this to optimize byte<->char conversion
14566 on strings.
14567
14568 @item
14569 @file{mule-misc.el}, @file{mule-init.el} deleted; code in there either
14570 deleted, rewritten, or moved to another file.
14571
14572 @item
14573 @file{mule.c} deleted.
14574
14575 @item
14576 move non-Mule-specific code out of @file{mule-cmds.el} into
14577 @file{code-cmds.el}. (@code{coding-system-change-text-conversion};
14578 remove duplicate @code{coding-system-change-eol-conversion})
14579
14580 @item
14581 remove duplicate @code{set-buffer-process-coding-system}
14582 (@file{code-cmds.el})
14583
14584 @item
14585 add some commented-out code from FSF @file{mule-cmds.el}
14586 (@code{find-coding-systems-region-subset-p},
14587 @code{find-coding-systems-region}, @code{find-coding-systems-string},
14588 @code{find-coding-systems-for-charsets},
14589 @code{find-multibyte-characters}, @code{last-coding-system-specified},
14590 @code{select-safe-coding-system}, @code{select-message-coding-system})
14591 (@file{code-cmds.el})
14592
14593 @item
14594 remove obsolete alias @code{pathname-coding-system}, function
14595 @code{set-pathname-coding-system} (@file{coding.el})
14596
14597 @item
14598 remove coding-system property @code{doc-string}; split into
14599 @code{description} (short, for menu items) and @code{documentation}
14600 (long); correct coding system defns (@file{coding.el},
14601 @file{file-coding.c}, lots of language files)
14602
14603 @item
14604 move coding-system-base into C and make use of internal info
14605 (@file{coding.el}, @file{file-coding.c})
14606
14607 @item
14608 move @code{undecided} defn into C (@file{coding.el},
14609 @file{file-coding.c})
14610
14611 @item
14612 use @code{define-coding-system-alias}, not @code{copy-coding-system}
14613 (@file{coding.el})
14614
14615 @item
14616 new coding system @code{iso-8859-6} for arabic
14617
14618 @item
14619 delete windows-1251 support from @file{cyrillic.el}; we do it
14620 automatically
14621
14622 @item
14623 remove @samp{setup-*-environment} as per FSF 21
14624
14625 @item
14626 rewrite @file{european.el} with lang envs for each language, so we can
14627 specify the locale
14628
14629 @item
14630 fix corruption in @file{greek.el}
14631
14632 @item
14633 sync @file{japanese.el} with FSF 20.6
14634
14635 @item
14636 fix warnings in @file{mule-ccl.el}
14637
14638 @item
14639 move FSF compat Mule fns from @file{obsolete.el} to
14640 @file{mule-charset.el}
14641
14642 @item
14643 eliminate unused @samp{truncate-string@{-to-width@}}
14644
14645 @item
14646 @code{make-coding-system} accepts (but ignores) the additional
14647 properties present in the fsf version, for compatibility.
14648
14649 @item
14650 i fixed the iso2022 handling so it will correctly read in files
14651 containing unknown charsets, creating a ``temporary'' charset which can
14652 later be overwritten by the real charset when it's defined. this allows
14653 iso2022 elisp files with literals in strange languages to compile
14654 correctly under mule. i also added a hack that will correctly read in
14655 and write out the emacs-specific ``composition'' escape sequences,
14656 i.e. @samp{ESC 0} through @samp{ESC 4}. this means that my workspace
14657 correctly compiles the new file @file{devanagari.el} that i added.
14658
14659 @item
14660 elimination of @code{string-to-char-list} (use @code{string-to-list})
14661
14662 @item
14663 elimination of junky @code{define-charset}
14664 @end itemize
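
The ASCII optimization for strings mentioned in the list above can be sketched roughly as below. This is an illustration under stated assumptions: UTF-8 stands in for the internal text encoding, and the struct and function names are hypothetical, not taken from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string record.  UTF-8 stands in for the internal
   encoding here; that is an assumption of this sketch.  */
struct sketch_string
{
  const unsigned char *data;
  size_t n_bytes;
  size_t ascii_begin;   /* count of leading bytes that are all ASCII */
};

/* Count the leading ASCII bytes once, when the string is created.  */
struct sketch_string
sketch_make_string (const unsigned char *data, size_t n_bytes)
{
  struct sketch_string s = { data, n_bytes, 0 };
  while (s.ascii_begin < n_bytes && data[s.ascii_begin] < 0x80)
    s.ascii_begin++;
  return s;
}

/* Convert a character index to a byte index.  */
size_t
sketch_char_to_byte (const struct sketch_string *s, size_t char_index)
{
  assert (s != NULL);
  /* Fast path: inside the ASCII prefix, one char is one byte.  */
  if (char_index <= s->ascii_begin)
    return char_index;
  /* Slow path: start after the prefix and step over each character,
     skipping its UTF-8 continuation bytes.  */
  size_t byte = s->ascii_begin;
  size_t chars = s->ascii_begin;
  while (byte < s->n_bytes && chars < char_index)
    {
      byte++;
      while (byte < s->n_bytes && (s->data[byte] & 0xC0) == 0x80)
        byte++;
      chars++;
    }
  return byte;
}
```

For a string that is entirely ASCII, every conversion takes the fast path and no scanning is needed.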
14665
14666 @heading Selection
14667
14668 @itemize
14669 @item
14670 fix msw selection code for Mule. proper encoding for
14671 @code{RegisterClipboardFormat}. store selection as
14672 @code{CF_UNICODETEXT}, which will get converted to the other formats.
14673 don't respond to destroy messages from @code{EmptyClipboard()}.
14674 @end itemize
14675
14676 @heading Menubar
14677
14678 @itemize
14679 @item
14680 new items @samp{Open With Specified Encoding},
14681 @samp{Revert Buffer with Specified Encoding}
14682
14683 @item
14684 split Mule menu into @samp{Encoding} (non-Mule-specific; includes new
14685 item to control EOL auto-detection) and @samp{International} submenus on
14686 @samp{Options}, @samp{International} on @samp{Help}
14687
14688 @end itemize
14689
14690 @heading Unicode support:
14691
14692 @itemize
14693 @item
14694 translation tables added in @file{etc/unicode}
14695
14696 @item
14697 new files @file{unicode.c}, @file{unicode.el} containing unicode coding
14698 systems and support; old code ripped out of @file{file-coding.c}
14699
14700 @item
14701 translation tables read in at startup (NEEDS WORK TO MAKE IT MORE
14702 EFFICIENT)
14703
14704 @item
14705 support @code{CF_TEXT}, @code{CF_UNICODETEXT} in @file{select.el}
14706
14707 @item
14708 encapsulation code added so that we can support both Windows 9x and NT
14709 in a single executable, determining at runtime whether to call the
14710 Unicode or non-Unicode API. encapsulated routines in
14711 @file{intl-encap-win32.c} (non-auto-generated) and
14712 @file{intl-auto-encap-win32.[ch]} (auto-generated). code generator in
14713 @file{lib-src/make-mswin-unicode.pl}. changes throughout the code to
14714 use the wide structures (W suffix) and call the encapsulated Win32 API
14715 routines (@samp{qxe} prefix). calling code needs to do proper
14716 conversion of text using new coding systems @code{Qmswindows_tstr},
14717 @code{Qmswindows_unicode}, or @code{Qmswindows_multibyte}. (the first
14718 points to one of the other two.)
14719 @end itemize
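
The run-time choice between the Unicode and non-Unicode API described in the last item can be sketched in portable C. Every name here is hypothetical and no Win32 function is called: `sketch_LengthA` and `sketch_LengthW` are stand-ins for a paired narrow and wide system call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical run-time flag: nonzero when the wide (Unicode) entry
   points are available, as on an NT-class system.  */
int sketch_unicode_api_p = 1;

/* Stand-ins for a paired "A" (narrow) and "W" (wide) system call;
   each returns the length of its argument in characters.  */
size_t sketch_LengthA (const char *s) { return strlen (s); }
size_t sketch_LengthW (const wchar_t *s) { return wcslen (s); }

/* Encapsulated routine: the caller passes text already converted to
   the matching form, and the wrapper picks the entry point at run
   time.  */
size_t
sketch_qxeLength (const void *text)
{
  assert (text != NULL);
  if (sketch_unicode_api_p)
    return sketch_LengthW ((const wchar_t *) text);
  return sketch_LengthA ((const char *) text);
}
```

The point of the design is that callers contain a single code path; only the text conversion before the call depends on which API is active.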
14720
14721
14722 @node File-coding rewrite, General User-Visible Changes, Mule changes by theme, The Great Mule Merge of March 2002
14723 @subsection File-coding rewrite
14724
14725 The coding system code has been substantially rewritten. It is now abstracted into
14726 coding systems that are defined by methods (similar to devices and
14727 specifiers). The types of conversions have also been generalized.
14728 Formerly, decoding always converted bytes to characters and encoding the
14729 reverse (these are now called ``text file converters''), but conversion
14730 can now happen either to or from bytes or characters. This allows
14731 coding systems such as @code{gzip} and @code{base64} to be written.
14732 When specifying such a coding system to an operation that expects a text
14733 file converter (such as reading in or writing out a file), the
14734 appropriate coding systems to convert between bytes and characters are
14735 automatically inserted into the conversion chain as necessary. To
14736 facilitate creating such chains, a special coding system called
14737 ``chain'' has been created, which chains together two or more coding
14738 systems.
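
The chaining idea can be sketched as an array of converters applied in order. This is a deliberately simplified illustration, assuming byte-to-byte converters that rewrite a buffer in place; the type and function names are hypothetical, and the real ``chain'' coding system also has to handle conversions between bytes and characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical converter type: rewrites LEN bytes of BUF in place.  */
typedef void (*sketch_converter) (unsigned char *buf, size_t len);

/* Two toy byte -> byte converters used as chain links.  */
void
sketch_upcase (unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    buf[i] = (unsigned char) toupper (buf[i]);
}

void
sketch_rot13 (unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] >= 'A' && buf[i] <= 'Z')
        buf[i] = (unsigned char) ('A' + (buf[i] - 'A' + 13) % 26);
      else if (buf[i] >= 'a' && buf[i] <= 'z')
        buf[i] = (unsigned char) ('a' + (buf[i] - 'a' + 13) % 26);
    }
}

/* Run every converter in CHAIN, in order, over the same buffer.  */
void
sketch_run_chain (const sketch_converter *chain, size_t n_links,
                  unsigned char *buf, size_t len)
{
  assert (chain != NULL && buf != NULL);
  for (size_t i = 0; i < n_links; i++)
    chain[i] (buf, len);
}
```

Because each link sees only the output of the previous one, a new converter can be added to a chain without changing the others.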
14739
14740 Encoding detection has also been abstracted. Detectors are logically
14741 separate from coding systems, and each detector defines one or more
14742 categories. (For example, the detector for Unicode defines categories
14743 such as UTF-8, UTF-16, UCS-4, and UTF-7.) When a particular detector is
14744 given a piece of text to detect, it determines likelihood values (seven
14745 of them, from 3 [most likely] to -3 [least likely]; specific criteria
14746 are defined for each possible value). All detectors are run in parallel
14747 on a particular piece of text, and the results tabulated together to
14748 determine the actual encoding of the text.
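
One way to picture the tabulation step is sketched below. The text does not say how the results are combined, so the rule used here (take the category with the highest likelihood, earliest entry wins ties) is an assumption, as are all the names.

```c
#include <assert.h>
#include <stddef.h>

/* Likelihood scale from the text: 3 (most likely) to -3 (least).  */
enum { SKETCH_LIKELY_MAX = 3, SKETCH_LIKELY_MIN = -3 };

/* Hypothetical result record: one category and its likelihood.  */
struct sketch_result
{
  const char *category;
  int likelihood;
};

/* Return the index of the highest-likelihood entry.  Ties go to the
   earliest entry; that tie-break is an assumption of this sketch.  */
size_t
sketch_pick_category (const struct sketch_result *results, size_t n)
{
  assert (results != NULL && n > 0);
  size_t best = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (results[i].likelihood >= SKETCH_LIKELY_MIN
              && results[i].likelihood <= SKETCH_LIKELY_MAX);
      if (results[i].likelihood > results[best].likelihood)
        best = i;
    }
  return best;
}
```

In this sketch the results of all detectors are simply gathered into one array before the pick, which mirrors the idea of detectors being separate from the final decision.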
14749
14750 Encoding and decoding are now completely parallel operations, and the
14751 former ``encoding'' and ``decoding'' lstreams have been combined into a
14752 single ``coding'' lstream. Coding system methods that were formerly
14753 split in such a fashion have also been combined.
14754
14755
14756 @node General User-Visible Changes, General Lisp-Visible Changes, File-coding rewrite, The Great Mule Merge of March 2002
14757 @subsection General User-Visible Changes
14758
14759 @heading Search
14760
14761 @itemize
14762 @item
14763 make regex routines reentrant, since they're sometimes called
14764 reentrantly. (see @file{regex.c} for a description of how.) all global
14765 variables used by the regex routines get pushed onto a stack by the
14766 callers before being set, and are restored when finished. redo the
14767 preprocessor flags controlling @code{REL_ALLOC} in conjunction with
14768 this.
14769 @end itemize
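
The save-and-restore discipline described above can be sketched with a single global and a small stack. This is an illustration only: the variable and function names are hypothetical, and the actual mechanism is the one described in @file{regex.c}.

```c
#include <assert.h>

/* Hypothetical stand-in for one piece of global matcher state.  */
int sketch_regex_state = 0;

#define SKETCH_STACK_MAX 16
static int sketch_saved[SKETCH_STACK_MAX];
static int sketch_depth = 0;

/* Save the current global before a caller overwrites it.  */
void
sketch_push_state (void)
{
  assert (sketch_depth < SKETCH_STACK_MAX);
  sketch_saved[sketch_depth++] = sketch_regex_state;
}

/* Restore the most recently saved value.  */
void
sketch_pop_state (void)
{
  assert (sketch_depth > 0);
  sketch_regex_state = sketch_saved[--sketch_depth];
}

/* A routine that may be re-entered: it sets the global for its own
   use and, while DEPTH > 0, calls itself the way a nested search
   would.  */
int
sketch_search (int value, int depth)
{
  sketch_push_state ();
  sketch_regex_state = value;
  if (depth > 0)
    sketch_search (value + 1, depth - 1);
  /* The nested call restored our value before returning.  */
  int seen = sketch_regex_state;
  sketch_pop_state ();
  return seen;
}
```

Each level sees its own value again after a nested call returns, and the outermost caller gets back whatever was in the global before the first call.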
14770
14771 @heading Menubar
14772
14773 @itemize
14774 @item
14775 move menu-splitting code (@code{menu-split-long-menu}, etc.) from
14776 @file{font-menu.el} to @file{menubar-items.el} and redo its algorithm;
14777 use in various items with long generated menus; rename to remove
14778 @samp{font-} from beginning of functions but keep old names as aliases
14779
14780 @item
14781 new fn @code{menu-sort-menu}
14782
14783 @item
14784 redo items @samp{Grep All Files in Current Directory @{and Below@}}
14785 using stuff from sample @file{init.el}
14786
14787 @item
14788 @samp{Debug on Error} and friends now affect current session only; not
14789 saved
14790
14791 @item
14792 @code{maybe-add-init-button} -> @code{init-menubar-at-startup} and call
14793 explicitly from @file{startup.el}
14794
14795 @item
14796 don't use @code{charset-registry} in @file{msw-font-menu.el}; it's only
14797 for X
14798 @end itemize
14799
14800 @heading Changes to key bindings
14801
14802 These changes are primarily found in @file{keymap.c}, @file{keydefs.el},
14803 and @file{help.el}, but are found in many other files.
14804
14805 @itemize
14806 @item
14807 @kbd{M-home}, @kbd{M-end} now move forward and backward in buffers; with
14808 @key{Shift}, stay within current group (e.g. all C files; same grouping
14809 as the gutter tabs). (bindings
14810 @samp{switch-to-@{next/previous@}-buffer[-in-group]} in @file{files.el})
14811
14812 this required moving the code used by these bindings from
14813 @file{gutter-items.el} to @file{buff-menu.el}, since @file{gutter-items.el} is loaded
14814 only when the gutter is active and these bindings (and hence the code)
14815 are no longer gutter-specific.
14816
14817 @item
14818 new global vars global-tty-map and global-window-system-map specify key
14819 bindings for use only on TTY's or window systems, respectively. this is
14820 used to make @kbd{ESC ESC} be keyboard-quit on window systems, but
14821 @kbd{ESC ESC ESC} on TTY's, where @key{Meta + arrow} keys may appear as
14822 @kbd{ESC ESC O A} or whatever. @kbd{C-z} on window systems is now
14823 @code{zap-up-to-char}, and @code{iconify-frame} is moved to @kbd{C-Z}.
14824 @kbd{ESC ESC} is @code{isearch-quit}. (@file{isearch-mode.el})
14825
14826 @item
14827 document @samp{global-@{tty,window-system@}-map} in various places;
14828 display them when you do @kbd{C-h b}.
14829
14830 @item
14831 fix up function documentation in general for keyboard primitives.
14832 e.g. key-bindings now contains a detailed section on the steps prior to
14833 looking up in keymaps, i.e. @code{function-key-map},
14834 @code{keyboard-translate-table}, etc. @code{define-key} and other
14835 obvious starting points indicate where to look for more info.
14836
14837 @item
14838 eliminate use and mention of grody @code{advertised-undo} and
14839 @code{deprecated-help}. (@file{simple.el}, @file{startup.el},
14840 @file{picture.el}, @file{menubar-items.el})
14841 @end itemize
14842
14843
14844 @node General Lisp-Visible Changes, User documentation, General User-Visible Changes, The Great Mule Merge of March 2002
14845 @subsection General Lisp-Visible Changes
14846
14847 @heading gzip support
14848
14849 The gzip protocol is now partially supported as a coding system.
14850
14851 @itemize
14852 @item
14853 new coding system @code{gzip} (bytes -> bytes); unfortunately, not quite
14854 working yet because it handles only the raw zlib format and not the
14855 higher-level gzip format (the zlib library is brain-damaged in that it
14856 provides low-level, stream-oriented API's only for raw zlib, and for
14857 gzip you have only high-level API's, which aren't useful for xemacs).
14858
14859 @item
14860 configure support (@code{--with-zlib}).
14861 @end itemize
14862
14863
14864 @node User documentation, General internal changes, General Lisp-Visible Changes, The Great Mule Merge of March 2002
14865 @subsection User documentation
14866
14867 @heading Tutorial
14868
14869 @itemize
14870 @item
14871 massive rewrite; sync to FSF 21.0.106, switch focus to window systems,
14872 new sections on terminology and multiple frames, lots of fixes for
14873 current xemacs idioms.
14874
14875 @item
14876 german version from Adrian mostly matching my changes.
14877
14878 @item
14879 copy new tutorials from FSF (Spanish, Dutch, Slovak, Slovenian, Czech);
14880 not updated yet though.
14881
14882 @item
14883 eliminate @file{help-nomule.el} and @file{mule-help.el}; merge into one
14884 single tutorial function, fix lots of problems, put back in
14885 @file{help.el} where it belongs. (there was some random junk in
14886 @file{help-nomule.el}: @code{string-width} and @code{make-char}.
14887 @code{string-width} is now in @file{subr.el} with a single definition,
14888 and @code{make-char} in @file{text.c}.)
14889 @end itemize
14890
14891 @heading Sample init file
14892
14893 @itemize
14894 @item
14895 remove forward/backward buffer code, since it's now standard.
14896
14897 @item
14898 when disabling @kbd{C-x C-c}, make it display a message saying how to
14899 exit, not just beep and complain ``undefined''.
14900 @end itemize
14901
14902
14903 @node General internal changes, Ben's TODO list, User documentation, The Great Mule Merge of March 2002
14904 @subsection General internal changes
14905
14906 @heading Changes to gnuclient and gnuserv
14907
14908 @itemize
14909 @item
14910 clean up headers a bit.
14911
14912 @item
14913 use proper ms win idiom for checking for temp directory (@code{TEMP} or
14914 @code{TMP}, not @code{TMPDIR}).
14915 @end itemize
14916
14917 @heading Process changes
14918
14919 @itemize
14920 @item
14921 Move @code{setenv} from packages; synch @code{setenv}/@code{getenv} with
14922 21.0.105
14923 @end itemize
14924
14925 @heading Changes to I/O internals
14926
14927 @itemize
14928 @item
14929 use @code{PATH_MAX} consistently instead of @code{MAXPATHLEN},
14930 @code{MAX_PATH}, etc.
14931
14932 @item
14933 all code that does preprocessor games with C lib I/O functions (open,
14934 read) has been removed. The code has been changed to call the correct
14935 function directly. Functions that accept @code{Intbyte *} arguments for
14936 filenames and such and do automatic conversion to or from external
14937 format will be prefixed @samp{qxe...()}. Functions that are retrying in
14938 case of @code{EINTR} are prefixed @samp{retry_...()}.
14939 @code{DONT_ENCAPSULATE} is long-gone.
14940
14941 @item
14942 never call @code{getcwd()} any more. use our shadowed value always.
14943 @end itemize
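
The @samp{retry_...()} convention described above can be sketched as follows. This is a minimal illustration only: the name @code{retry_read_sketch()} is hypothetical and is not the actual XEmacs function. It simply restarts a @code{read()} call that was interrupted by a signal.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical illustration, not the XEmacs implementation:
   call read() again whenever it fails with EINTR. */
static ssize_t
retry_read_sketch (int fd, void *buf, size_t size)
{
  ssize_t n;

  do
    n = read (fd, buf, size);
  while (n < 0 && errno == EINTR);

  return n;
}
```

Any other error, or a successful (possibly short) read, is returned to the caller unchanged.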
14944
14945 @heading Changes to string processing
14946
14947 @itemize
14948 @item
14949 the @file{doprnt.c} external entry points have been completely rewritten
14950 to be more useful and have more sensible names. We now have, for
14951 example, versions that work exactly like @code{sprintf()} but return a
14952 @code{malloc()}ed string.
14953
14954 @item
14955 code in @file{print.c} that handles @code{stdout}, @code{stderr}
14956 rewritten.
14957
14958 @item
14959 places that print to @code{stderr} directly replaced with
14960 @code{stderr_out()}.
14961
14962 @item
14963 new convenience functions @code{write_fmt_string()},
14964 @code{write_fmt_string_lisp()}, @code{stderr_out_lisp()},
14965 @code{write_string()}.
14966 @end itemize
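
The idea of an @code{sprintf()} variant that returns a @code{malloc()}ed string can be sketched with standard C alone. The helper name @code{malloc_sprintf_sketch()} is an assumption made for this example; it is not one of the @file{doprnt.c} entry points, and it knows nothing about internal-format text or Lisp objects.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not one of the doprnt.c entry points):
   format like sprintf() but return a freshly malloc()ed string.
   Returns NULL on failure; the caller frees the result. */
static char *
malloc_sprintf_sketch (const char *fmt, ...)
{
  va_list args;
  int len;
  char *result;

  va_start (args, fmt);
  len = vsnprintf (NULL, 0, fmt, args);   /* first pass: measure only */
  va_end (args);
  if (len < 0)
    return NULL;

  result = malloc ((size_t) len + 1);
  if (result == NULL)
    return NULL;

  va_start (args, fmt);
  vsnprintf (result, (size_t) len + 1, fmt, args);  /* second pass: format */
  va_end (args);
  return result;
}
```

The two-pass approach relies on the C99 behavior of @code{vsnprintf()}, which reports the length it would have written when given a zero-size buffer.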
14967
14968 @heading Changes to Allocation, Objects, and the Lisp Interpreter
14969
14970 @itemize
14971 @item
14972 automatically use ``managed lcrecord'' code when allocating. any
14973 lcrecord can be put on a free list with @code{free_lcrecord()}.
14974
14975 @item
14976 @code{record_unwind_protect()} returns the old spec depth.
14977
14978 @item
14979 @code{unbind_to()} now takes only one arg. use @code{unbind_to_1()} if
14980 you want the 2-arg version, with GC protection of second arg.
14981
14982 @item
14983 new funs to easily inhibit GC. (@code{@{begin,end@}_gc_forbidden()})
14984 use them in places where gc is currently being inhibited in a more ugly
14985 fashion. also, we disable GC in certain strategic places where string
14986 data is often passed in, e.g. @samp{dfc} functions, @samp{print}
14987 functions.
14988
14989 @item
14990 @code{make_buffer()} -> @code{wrap_buffer()} for consistency with other
14991 objects; same for @code{make_frame()} -> @code{wrap_frame()} and
14992 @code{make_console()} -> @code{wrap_console()}.
14993
14994 @item
14995 better documentation in condition-case.
14996
14997 @item
14998 new convenience funs @code{record_unwind_protect_freeing()} and
14999 @code{record_unwind_protect_freeing_dynarr()} for conveniently setting
15000 up an unwind-protect to @code{xfree()} or @code{Dynarr_free()} a
15001 pointer.
15002 @end itemize
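
The interplay between a @code{record_unwind_protect()} that returns the old depth, a one-argument @code{unbind_to()}, and a ``freeing'' convenience wrapper can be sketched with a miniature cleanup stack. Everything below (the @samp{_sketch} names, the fixed-size array) is invented for illustration and is much simpler than the real specpdl machinery.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a specpdl-style stack; the names and
   layout are assumptions for illustration, not XEmacs code. */
typedef void (*unwind_fn) (void *arg);

struct unwind_entry { unwind_fn fn; void *arg; };

static struct unwind_entry unwind_stack[64];
static int unwind_depth;

/* Push a cleanup and return the depth *before* the push, so the
   caller can hand it straight to unbind_to_sketch(). */
static int
record_unwind_protect_sketch (unwind_fn fn, void *arg)
{
  int old_depth = unwind_depth;

  assert (unwind_depth < 64);
  unwind_stack[unwind_depth].fn = fn;
  unwind_stack[unwind_depth].arg = arg;
  unwind_depth++;
  return old_depth;
}

/* Convenience wrapper: arrange for free() to be called on PTR. */
static int
record_unwind_protect_freeing_sketch (void *ptr)
{
  return record_unwind_protect_sketch (free, ptr);
}

/* Run and pop cleanups, newest first, until DEPTH entries remain. */
static void
unbind_to_sketch (int depth)
{
  while (unwind_depth > depth)
    {
      unwind_depth--;
      unwind_stack[unwind_depth].fn (unwind_stack[unwind_depth].arg);
    }
}
```

Because the recording function returns the previous depth, a caller does not need a separate call to fetch the depth before registering its first cleanup.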
15003
15004 @heading s/m files:
15005
15006 @itemize
15007 @item
15008 removal of unused @code{DATA_END}, @code{TEXT_END},
15009 @code{SYSTEM_PURESIZE_EXTRA}, @code{HAVE_ALLOCA} (automatically
15010 determined)
15011
15012 @item
15013 removal of @code{vfork} references (we no longer use @code{vfork})
15014 @end itemize
15015
15016 @heading @file{make-docfile}:
15017
15018 @itemize
15019 @item
15020 clean up headers a bit.
15021
15022 @item
15023 allow @file{.obj} to mean equivalent @file{.c}, just like for @file{.o}.
15024
15025 @item
15026 allow specification of a ``response file'' (a command-line argument
15027 beginning with @@, specifying a file containing further command-line
15028 arguments) -- a standard mswin idiom to avoid potential command-line
15029 limits and to simplify makefiles. use this in @file{xemacs.mak}.
15030 @end itemize
15031
15032 @heading debug support
15033
15034 @itemize
15035 @item
15036 (@file{cmdloop.el}) new var breakpoint-on-error, which breaks into the C
15037 debugger when an unhandled error occurs noninteractively. useful when
15038 debugging errors coming out of complicated make scripts, e.g. package
15039 compilation, since you can set this through an env var.
15040
15041 @item
15042 (@file{startup.el}) new env var @code{XEMACSDEBUG}, specifying a Lisp
15043 form executed early in the startup process; meant to be used for turning
15044 on debug flags such as @code{breakpoint-on-error} or
15045 @code{stack-trace-on-error}, to track down noninteractive errors.
15046
15047 @item
15048 (@file{cmdloop.el}) removed non-working code in @code{command-error} to
15049 display a backtrace on @code{debug-on-error}. use
15050 @code{stack-trace-on-error} instead to get this.
15051
15052 @item
15053 (@file{process.c}) new var @code{debug-process-io} displays data sent to
15054 and received from a process.
15055
15056 @item
15057 (@file{alloc.c}) staticpros have name stored with them for easier
15058 debugging.
15059
15060 @item
15061 (@file{emacs.c}) code that handles fatal errors consolidated and
15062 rewritten. much more robust and correctly handles all fatal exits on
15063 mswin (e.g. aborts, not previously handled right).
15064 @end itemize
15065
15066 @heading @file{startup.el}
15067
15068 @itemize
15069 @item
15070 move init routines out of @code{before-init-hook} or
15071 @code{after-init-hook}; just call them directly
15072 (@code{init-menubar-at-startup}, @code{init-mule-at-startup}).
15073
15074 @item
15075 help message fixed up (divided into sections), existing problem causing
15076 incomplete output fixed, undocumented options documented.
15077 @end itemize
15078
15079 @heading @file{frame.el}
15080
15081 @itemize
15082 @item
15083 delete old commented-out code.
15084 @end itemize
15085
15086
15087 @node Ben's TODO list, Ben's README, General internal changes, The Great Mule Merge of March 2002
15088 @subsection Ben's TODO list (probably obsolete)
15089
15090 These notes substantially overlap those in @ref{Ben's README}. They
15091 should probably be combined.
15092
15093 @heading April 11, 2002
15094
15095 Priority:
15096
15097 @enumerate
15098 @item
15099 Finish checking in current mule ws.
15100
15101 @item
15102 Start working on bugs reported by others and noticed by me:
15103
15104 @itemize
15105 @item
15106 problems cutting and pasting binary data, e.g. from byte-compiler
15107 instructions
15108
15109 @item
15110 test suite failures
15111
15112 @item
15113 process i/o problems w.r.t. eol: |uniq (e.g.) leaves ^M's at end of
15114 line; running "bash" as shell-file-name doesn't work because it doesn't
15115 like the extra ^M's.
15116 @end itemize
15117 @end enumerate
15118
15119 @heading March 20, 2002
15120
15121 bugs:
15122
15123 @itemize
15124 @item
15125 TTY-mode problem. When you start up in TTY mode, XEmacs goes through
15126 the loadup process and appears to be working -- you see the startup
15127 screen pulsing through the different screens, and it appears to be
15128 listening (hitting a key stops the screen motion), but it's frozen --
15129 the screen won't get off the startup, key commands don't cause anything
15130 to happen. STATUS: In progress.
15131
15132 @item
15133 Memory ballooning in some cases. Not yet understood.
15134
15135 @item
15136 other test suite failures?
15137
15138 @item
15139 need to review the handling of sounds. seems that not everything is
15140 documented, not everything is consistently used where it's supposed to,
15141 some sounds are ugly, etc. add sounds to `completer' as well.
15142
15143 @item
15144 redo with-trapping-errors so that the backtrace is stored away and only
15145 outputted when an error actually occurs (i.e. in the condition-case
15146 handler). test. (use ding of various sorts as a helpful way of checking
15147 out what's going on.)
15148
15149 @item
15150 problems with process input: |uniq (for example) leaves ^M's at end of
15151 line.
15152
15153 @item
15154 carefully review looking up of fonts by charset, esp. wrt the last
15155 element of a font spec.
15156
15157 @item
15158 add package support to ignore certain files -- *-util.el for languages.
15159
15160 @item
15161 review use of escape-quoted in auto_save_1() vs. the buffer's own coding
15162 system.
15163
15164 @item
15165 figure out how to get the total amount of data memory (i.e. everything
15166 but the code, or even including the code if can't distinguish) used by
15167 the process on each different OS, and use it in a new algorithm for
15168 triggering GC: trigger only when a certain % of the data size has been
15169 consed up; in addition, have a minimum.
15170
15171 @item
15172 fixed bugs???
15173
15174 @itemize
15175 @item
15176 Occasional crash when freeing display structures. The problem seems to
15177 be this: A window has a "display line dynarr"; each display line has a
15178 "display block dynarr". Sometimes this display block dynarr is getting
15179 freed twice. It appears from looking at the code that sometimes a
15180 display line from somewhere in the dynarr gets added to the end -- hence
15181 two pointers to the same display block dynarr. need to review this
15182 code.
15183 @end itemize
15184 @end itemize
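
The GC-triggering idea in the list above (trigger only when a certain percentage of the data size has been consed, subject to a minimum) might look roughly like this. The function and parameter names are assumptions, and obtaining the process data size on each OS, which is the hard part of the item, is not addressed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy sketch; the parameter names are assumptions.
   Trigger GC once the bytes consed since the last GC reach PERCENT
   of the process data size, but never below MIN_BYTES. */
static int
gc_should_trigger_sketch (size_t consed_since_gc, size_t data_size,
                          unsigned percent, size_t min_bytes)
{
  /* Divide first to avoid overflow; this rounds the threshold down
     slightly, which is harmless for a heuristic. */
  size_t threshold = data_size / 100 * percent;

  if (threshold < min_bytes)
    threshold = min_bytes;
  return consed_since_gc >= threshold;
}
```

With a small process the minimum dominates, so GC is not triggered after every few kilobytes of consing; with a large process the percentage dominates.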
15185
15186 @heading August 29, 2001
15187
15188 This is the most current list of priorities in `ben-mule-21-5'.
15189 Updated often.
15190
15191 high-priority:
15192
15193 @table @strong
15194
15195 @item [input]
15196
15197 @itemize
15198 @item
15199 support for WM_IME_CHAR. IME input can work under -nuni if we use
15200 WM_IME_CHAR. probably we should always be using this, instead of
15201 snarfing input using WM_IME_COMPOSITION. i'll check this out.
15202
15203 @item
15204 Russian C-x problem. see above.
15205 @end itemize
15206
15207 @item [clean-up]
15208
15209 @itemize
15210 @item
15211 make sure it compiles and runs under non-mule. remember that some
15212 code needs the unicode support, or at least a simple version of it.
15213
15214 @item
15215 make sure it compiles and runs under pdump. see below.
15216
15217 @item
15218 make sure it compiles and runs under cygwin. see below.
15219
15220 @item
15221 clean up mswindows-multibyte, TSTR_TO_C_STRING. expand dfc
15222 optimizations to work across chain.
15223
15224 @item
15225 eliminate last vestiges of codepage<->charset conversion and similar
15226 stuff.
15227 @end itemize
15228
15229 @item [other]
15230
15231 @itemize
15232 @item
15233 test the "file-coding is binary only on Unix, no-Mule" stuff.
15234
15235 @item
15236 test that things work correctly in -nuni if the system environment
15237 is set to e.g. japanese -- i should get japanese menus, japanese
15238 file names, etc. same for russian, hebrew ...
15239
15240 @item
15241 cut and paste. see below.
15242
15243 @item
15244 misc issues with handling lang environments. see also August 25,
15245 "finally: working on the @kbd{C-x} in ...".
15246
15247 @itemize
15248 @item
15249 when switching lang env, needs to set keyboard layout.
15250
15251 @item
15252 user var to control whether, when moving into text of a
15253 particular language, we set the appropriate keyboard layout. we
15254 would need to have a lisp api for retrieving and setting the
15255 keyboard layout, set text properties to indicate the layout of
15256 text, and have a way of dealing with text with no property on
15257 it. (e.g. saved text has no text properties on it.) basically,
15258 we need to get a keyboard layout from a charset; getting a
15259 language would do. Perhaps we need a table that maps charsets
15260 to language environments.
15261
15262 @item
15263 test that the lang env is properly set at startup. test that
15264 switching the lang env properly sets the C locale (call
15265 @code{setlocale()}, set @code{LANG}, etc.) -- a spawned subprogram
15266 should have the new locale in its environment.
15267 @end itemize
15268
15269 @item
15270 look through everything below and see if anything is missed in this
15271 priority list, and if so add it. create a separate file for the
15272 priority list, so it can be updated as appropriate.
15273 @end itemize
15274 @end table
15275
15276 mid-priority:
15277
15278 @itemize
15279 @item
15280 clean up the chain coding system. its list should specify decode
15281 order, not encode; i now think this way is more logical. it should
15282 check the endpoints to make sure they make sense. it should also
15283 allow for the specification of "reverse-direction coding systems":
15284 use the specified coding system, but invert the sense of decode and
15285 encode.
15286
15287 @item
15288 along with that, places that take an arbitrary coding system and
15289 expect the ends to be anything specific need to check this, and add
15290 the appropriate conversions from byte->char or char->byte.
15291
15292 @item
15293 get some support for arabic, thai, vietnamese, japanese jisx 0212:
15294 at least get the unicode information in place and make sure we have
15295 things tied together so that we can display them. worry about r2l
15296 some other time.
15297
15298 @item
15299 check the handling of @kbd{C-c}. can XEmacs itself be interrupted with
15300 @kbd{C-c}? is that impossible now that we are a window, not a console,
15301 app? at least we should work something out with @file{i} so that if it
15302 receives a @kbd{C-c} or @kbd{C-break}, it interrupts XEmacs, too. check
15303 out how process groups work and if they apply only to console apps.
15304 also redo the way that XEmacs sends @kbd{C-c} to other apps. the
15305 business of injecting code should be last resort. we should try
15306 @kbd{C-c} first, and if that doesn't work, then the next time we try to
15307 interrupt the same process, use the injection method.
15308 @end itemize
15309
15310 @node Ben's README, , Ben's TODO list, The Great Mule Merge of March 2002
15311 @subsection Ben's README (probably obsolete)
15312
15313 These notes substantially overlap those in @ref{Ben's TODO list}. They
15314 should probably be combined.
15315
15316 This may be of some historical interest as a record of Ben at work.
15317 There may also be some useful suggestions as yet unimplemented.
15318
15319 @heading oct 27, 2001
15320
15321 -------- proposal for better buffer-switching commands:
15322
15323 implement what VC++ currently has. you have a single "switch" command
15324 like @kbd{CTRL-TAB}, which as long as you hold the @key{CTRL} button
15325 down, brings successive buffers that are "next in line" into the current
15326 position, bumping the rest forward. once you release the @key{CTRL}
15327 key, the chain is broken, and further @kbd{CTRL-TAB}s will start from
15328 the beginning again. this way, frequently used buffers naturally move
15329 toward the front of the chain, and you can switch back and forth between
15330 two buffers using @kbd{CTRL-TAB}. the only thing about @kbd{CTRL-TAB}
15331 is it's a bit awkward. the way to implement is to have modifier-up
15332 strokes fire off a hook, like modifier-up-hook. this is driven by event
15333 dispatch, so there are no synchronization issues. when @kbd{C-tab} is
15334 pressed, the binding function does something like set a one-shot handler
15335 on the modifier-up-hook (perhaps separate hooks for separate
15336 modifiers?).
15337
15338 to do this, we'd also want to change the buffer tabs so that they maintain
15339 their own order. in particular, they start out synched to the regular
15340 order, but as you make changes, you don't want the tabs to change
15341 order. (in fact, they may already do this.) selecting a particular buffer
15342 from the buffer tabs DOES make the buffer go to the head of the line. the
15343 invariant is that if the tabs are displaying X items, those X items are the
15344 first X items in the standard buffer list, but may be in a different
15345 order. (it looks like the tabs may already implement all of this.)
15346
15347 @heading oct 26, 2001
15348
15349 necessary testing/changes:
15350
15351 @itemize
15352 @item
15353 test all eol detection stuff under windows w/ and w/o mule, unix w/ and
15354 w/o mule. (test configure flag, command-line flag, menu option) may need
15355 a way of pretending to be unix under cygwin.
15356
15357 @item
15358 test under windows w/ and w/o mule, cygwin w/ and w/o mule, cygwin x
15359 windows w/ and w/o mule.
15360
15361 @item
15362 test undecided-dos/unix/mac.
15363
15364 @item
15365 check @kbd{ESC ESC} works as @code{isearch-quit} under TTY's.
15366
15367 @item
15368 test @code{coding-system-base} and all its uses (grep for them).
15369
15370 @item
15371 menu item to revert to most recent auto save.
15372
15373 @item
15374 consider renaming @code{build_string} -> @code{build_intstring} and
15375 @code{build_c_string} to @code{build_string}. (consistent with
15376 @code{build_msg_string} et al; many more @code{build_c_string} than
15377 @code{build_string})
15378 @end itemize
15379
15380 @heading oct 20, 2001
15381
15382 fixed problem causing crash due to invalid internal-format data, fixed
15383 an existing bug in @code{valid_char_p}, and added checks to more quickly
15384 catch when invalid chars are generated. still need to investigate why
15385 @code{mswindows-multibyte} is being detected.
15386
15387 i now see why -- we only process 65536 bytes due to a constant
15388 @code{MAX_BYTES_PROCESSED_FOR_DETECTION}. instead, we should have no
15389 limit as long as we have a seekable stream. we also need to write
15390 @code{stderr_out_lisp()}, used in the debug info routines i wrote.
15391
15392 check once more about @code{DEBUG_XEMACS}. i think debugging info
15393 should be ON by default. make sure it is. check that nothing untoward
15394 will result in a production system, e.g. presumably @code{assert()}s
15395 should not really @code{abort()}. (!! Actually, this should be runtime
15396 settable! Use a variable for this, and it can be set using the same
15397 @code{XEMACSDEBUG} method. In fact, now that I think of it, I'm sure
15398 that debugging info should be on always, with runtime ways of turning on
15399 or off any funny behavior.)
15400
15401 @heading oct 19, 2001
15402
15403 fixed various bugs preventing packages from being able to be built.
15404 still another bug, with @file{psgml/etc/cdtd/docbook}, which contains
15405 some strange characters starting around char pos 110,000. It gets
15406 detected as @code{mswindows-multibyte} (wrong! why?) and then invalid
15407 internal-format data is generated. need to fix
15408 @code{mswindows-multibyte} (and possibly add something that signals an
15409 error as well; need to work on this error-signalling mechanism) and
15410 figure out why it's getting detected as such. what i should do is add a
15411 debug var that outputs blow-by-blow info of the detection process.
15412
15413 @heading oct 9, 2001
15414
15415 the stuff with @code{global-window-system-map} doesn't appear to work. in any
15416 case it needs better documentation. [DONE]
15417
15418 @kbd{M-home}, @kbd{M-end} do work, but cause cl-macs to get loaded. why?
15419
15420 @heading oct 8, 2001
15421
15422 finished the coding system changes and they finally work!
15423
15424 need to implement undecided-unix/dos/mac. they should be easy to do; it
15425 should be enough to specify an eol-type but not do-eol, but check this.
15426
15427 consider making the standard naming be foo-lf/crlf/cr, with unix/dos/mac as
15428 aliases.
15429
15430 print methods for coding systems should include some of the generic
15431 properties. (also then fix print_..._within_print_method). [DONE]
15432
15433 in a little while, go back and delete the
15434 @code{text-file-wrapper-coding-system} code. (it'll be in CVS if
15435 necessary to get at it.) [DONE]
15436
15437 need to verify at some point that non-text-file coding systems work
15438 properly when specified. when gzip is working, this would be a good test
15439 case. (and consider creating base64 as well!)
15440
15441 remove extra crap from @code{coding-system-category} that checks for
15442 chain coding systems. [DONE]
15443
15444 perhaps make a primitive that gets at
15445 @code{coding-system-canonical}. [DONE]
15446
15447 need to test cygwin, compiling the mule packages, get unix-eol stuff
15448 working. frank from germany says he doesn't see a lisp backtrace when he
15449 gets an error during temacs? verify that this actually gets outputted.
15450
15451 consider putting the current language on the modeline, mousable so it can
15452 be switched. also consider making the coding system be mousable and the
15453 line number (pick a line) and the percentage (pick a percentage).
15454
15455 @heading oct 6, 2001
15456
15457 added code so that @code{debug_print()} will output a newline to the
15458 mswindows debugging output, not just the console. need to test. [DONE]
15459
15460 working on problem where all files are being detected as binary. the
15461 problem may be that the undecided coding system is getting wrapped with
15462 an auto-eol coding system, which it shouldn't be -- but even in this
15463 situation, we should get the right results! check the
15464 canonicalize-after-coding methods. also,
15465 @code{determine_real_coding_system} appears to be getting called even
15466 when we're not detecting encoding. also, undecided needs a print method
15467 to show its params, and chain needs to be updated to show
15468 @code{canonicalize_after_coding}. check others as well. [DONE]
15469
15470 @heading oct 5, 2001
15471
15472 finished up coding system changes, testing.
15473
15474 errors byte-compiling files in @code{iso-2022-7-bit}. perhaps it's not
15475 correctly detecting the encoding?
15476
15477 noticed a problem in the dfc macros: we call
15478 @code{get_coding_system_for_text_file} with @code{eol_wrap == 1}, to
15479 allow for auto-detection of the eol type; but this defeats the check and
15480 short-circuit for unicode.
15481
15482 still need to implement calling @code{determine_real_coding_system()}
15483 for non-seekable streams. to implement correctly, we need to do our own
15484 buffering. [DONE, BUT WITHOUT BUFFERING]
15485
15486 @heading oct 4, 2001
15487
15488 implemented most stuff below.
15489
15490 need to finish up changes to @code{make_coding_system_1}. (i changed the
15491 way internal coding systems were handled; i need to create subsidiaries
15492 for all types of coding systems, not just text ones.) there's a nasty
15493 @code{xfree()} crash i was hitting; perhaps it'll go away once all stuff
15494 has been rewritten.
15495
15496 check under cygwin to make sure that when an error occurs during loadup, a
15497 backtrace is output.
15498
15499 as soon as andy releases his new setup, we should put it onto various
15500 standard windows software repositories.
15501
15502 @heading oct 3, 2001
15503
15504 added @code{global-tty-map} and @code{global-window-system-map}. add
15505 some stuff to the maps, e.g. @kbd{C-x ESC} for repeat vs. @kbd{C-x ESC
15506 ESC} on TTY's, and of course @kbd{ESC ESC} on window systems
15507 vs. @kbd{ESC ESC ESC} on TTY's. [TEST]
15508
15509 was working on integrating the two @code{help-for-tutorial} versions (mule,
15510 non-mule). [DONE, but test under non-Mule]
15511
15512 was working on the file-coding changes. need to think more about
15513 @code{text-file-wrapper}. conclusion i think is that
15514 @code{get_coding_system_for_text_file} should wrap using a special
15515 coding system type called a @code{text-file-wrapper}, which inherits
15516 from chain, and implements @code{canonicalize-after-decoding} to just
15517 return the unwrapped coding system. We need to implement inheritance of
15518 coding systems, which will certainly come in extremely useful when
15519 coding systems get implemented in Lisp, which should happen at some
15520 point. (see existing docs about this.) essentially, we have a way of
15521 declaring that we inherit from some system, and the appropriate data
15522 structures get created, perhaps just an extra inheritance pointer. but
15523 when we create the coding system, the extra data needs to be a stretchy
15524 array of offsets, pointing to the type-specific data for the coding
15525 system type and all its parents. that means that in the methods
15526 structure for a coding system (which perhaps should be expanded beyond
15527 method, it's just a "class structure") is the index in these arrays of
15528 offsets. @code{CODING_SYSTEM_DATA()} can take any of the coding system
15529 classes (rename type to class!) that make up this class. similarly, a
15530 coding system class inherits its methods from the class above unless
15531 specifying its own method, and can call the superclass method at any
15532 point by either just invoking its name, or conceivably by some macro
15533 like
15534
15535 @samp{CALL_SUPER (method, (args))}
15536
15537 similar mods would have to be made to coding stream structures.
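
The method-inheritance part of this proposal can be sketched in plain C. The struct, the @code{describe} method, and the @samp{CALL_SUPER_DESCRIBE} macro below are all invented for illustration; the stretchy array of per-class data offsets mentioned above is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of single inheritance between coding-system
   classes; struct and method names are invented for illustration. */
struct cs_class_sketch
{
  const char *name;
  const struct cs_class_sketch *super;   /* NULL for a root class */
  const char *(*describe) (void);        /* NULL means "inherit" */
};

/* Walk up the chain until some class supplies the method. */
static const char *
cs_call_describe (const struct cs_class_sketch *cls)
{
  for (; cls != NULL; cls = cls->super)
    if (cls->describe != NULL)
      return cls->describe ();
  return NULL;
}

/* Start the lookup one level up, in the spirit of CALL_SUPER. */
#define CALL_SUPER_DESCRIBE(cls) cs_call_describe ((cls)->super)

static const char *
chain_describe (void)
{
  return "chain";
}

static const struct cs_class_sketch chain_class_sketch =
  { "chain", NULL, chain_describe };

/* Inherits describe() from chain because its own slot is NULL. */
static const struct cs_class_sketch wrapper_class_sketch =
  { "text-file-wrapper", &chain_class_sketch, NULL };
```

A class that leaves a method slot empty picks up the nearest ancestor's implementation, while a class that overrides a method can still reach the inherited one through the macro.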
15538
15539 perhaps for the immediate we can just sort of fake things like we currently
15540 do with undecided calling some stuff from chain.
15541
15542 @heading oct 2, 2001
15543
15544 need to implement support for iso-8859-15, i.e. iso-8859-1 + euro symbol.
15545 figure out how to fall back to iso-8859-1 as necessary.
15546
15547 leave the current bindings the way they are for the moment, but bump off
15548 @kbd{M-home} and @kbd{M-end} (hardly used), and substitute my buffer
15549 movement stuff there. [DONE, but test]
15550
15551 there's something to be said for combining block of 6 and paragraph,
15552 esp. if we make the definition of "paragraph" be so that it skips by 6 when
15553 within code. hmm.
15554
15555 eliminate @code{advertised-undo} crap, and similar hacks. [DONE]
15556
15557 think about obsolete stuff to be eliminated. think about eliminating or
15558 dimming obsolete items from @code{hyper-apropos} and something similar
15559 in completion buffers.
15560
15561 @heading sep 30, 2001
15562
15563 synched up the tutorials with FSF 21.0.105. was rewriting them to favor
15564 the cursor keys over the older @kbd{C-p}, etc. keys.
15565
15566 Got thinking about key bindings again.
15567
15568 @enumerate
15569 @item
15570 I think that @kbd{M-up/down} and @kbd{M-C-up/down} should be reversed. I use
15571 scroll-up/down much more often than motion by paragraph.
15572
15573 @item
15574 Should we eliminate move by block (of 6) and substitute it for paragraph?
15575 This would have the advantage that I could make bindings for buffer
15576 change (forward/back buffer, perhaps @kbd{M-C-up/down}; with shift,
15577 @kbd{M-C-S-up/down} only goes within the same type (C files, etc.)).
15578 alternatively, just bump off @code{beginning-of-defun} from
15579 @kbd{C-M-home}, since it's on @kbd{C-M-a} already.
15580 @end enumerate
15581
15582 need someone to go over the other tutorials (five new ones, from FSF
15583 21.0.105) and fix them up to correspond to the english one.
15584
15585 shouldn't shift-motion work with @kbd{C-a} and such as well as arrows?
15586
15587 @heading sep 29, 2001
15588
15589 @code{charcount_to_bytecount} can also be made to scream -- as can
15590 @code{scan_buffer}, @code{buffer_mule_signal_inserted_region}, others?
15591 we should start profiling though before going too far down this line.
15592
15593 Debug code that causes no slowdown should in general remain in the
15594 executable even in the release version because it may be useful
15595 (e.g. for people to see the event output). so @code{DEBUG_XEMACS}
15596 should be rethought. things like use of @file{msvcrtd.dll} should be
15597 controlled by error_checking on. maybe @code{DEBUG_XEMACS} controls
15598 general debug code (e.g. use of @file{msvcrtd.dll}, asserts abort, error
15599 checking), and the actual debugging code should remain always, or be
15600 conditionalized on something else (e.g. @samp{DEBUGGING_FUNS_PRESENT}).
15601
15602 doc strings in dumped files are displayed with an extra blank line between
15603 each line. presumably this is recent? i assume either the change to
15604 detect-coding-region or the double-wrapping mentioned below.
15605
15606 error with @code{coding-system-property} on @code{iso-2022-jp-dos}.
15607 problem is that that coding system is wrapped, so its type shows up as
15608 @code{chain}, not @code{iso-2022}. this is a general problem, and i
15609 think the way to fix it is to in essence do late canonicalization --
15610 similar in spirit to what was done long ago,
15611 @code{canonicalize_when_code}, except that the new coding system (the
15612 wrapper) is created only once, either when the original cs is created or
15613 when first needed. this way, operations on the coding system work like
15614 expected, and you get the same results as currently when
15615 decoding/encoding. the only thing tricky is handling
15616 @code{canonicalize-after-coding} and the ever-tricky double-wrapping
15617 problem mentioned below. i think the proper solution is to move the
15618 autodetection of eol into the main autodetect type. it can be asked to
15619 autodetect eol, coding, or both. for just coding, it does like it
15620 currently does. for just eol, it does similar to what it currently does
15621 but runs the detection code that @code{convert-eol} currently does, and
15622 selects the appropriate @code{convert-eol} system. when it does both
15623 eol and coding, it does something on the order of creating two more
15624 autodetect coding systems, one for eol only and one for coding only, and
15625 chains them together. when each has detected the appropriate value, the
15626 results are combined. this automatically eliminates the double-wrapping
15627 problem, removes the need for complicated
15628 @code{canonicalize-after-coding} stuff in chain, and fixes the problem
15629 of autodetect not having a seekable stream because hidden inside of a
15630 chain. (we presume that in the both-eol-and-coding case, the various
15631 autodetect coding streams can communicate with each other
15632 appropriately.)
15633
15634 also, we should solve the problem of internal coding systems floating
15635 around and clogging up the list simply by having an "internal" property
15636 on cs's and an internal param to @code{coding-system-list} (optional; if
15637 not given, you don't get the internal ones). [DONE]
15638
15639 we should try to reduce the size of the from-unicode tables (the dominant
15640 memory hog in the tables). one obvious thing is to not store a whole
15641 emchar as the mapped-to value, but a short that encodes the octets. [DONE]
15642
15643 @heading sep 28, 2001
15644
15645 need to merge up to latest in trunk.
15646
15647 add unicode charsets for all non-translatable unicode chars; probably
15648 want to extend the concept of charsets to allow for dimension 3 and
15649 dimension 4 charsets. for the moment we should stick with just
15650 dimension 3 charsets; otherwise we run past the current maximum of 4
15651 bytes per emchar. (most code would work automatically since it
15652 uses @code{MAX_EMCHAR_LEN}; the trickiness is in certain code that has
15653 intimate knowledge of the representation.
15654 e.g. @code{bufpos_to_bytind()} has to multiply or divide by 1, 2, 3, or
15655 4, and has special ways of handling each number. with 5 or 6 bytes per
15656 char, we'd have to change that code in various ways.) 96x96x96 = 884,736,
15657 so with two 96x96x96 charsets, we could tackle all Unicode values
15658 representable by UTF-16 and then some -- and only these codepoints will
15659 ever have assigned chars, as far as we know.
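
The arithmetic in the paragraph above can be checked with a small sketch. The function names and the UTF16_CODE_POINTS macro are invented for this illustration and are not XEmacs identifiers; the only facts assumed are 96 positions per dimension and the UTF-16 range U+0000 through U+10FFFF.

```c
#include <assert.h>

/* Number of code points in a charset of the given dimension,
   with 96 positions per dimension (as in the 96x96x96 figure). */
unsigned long
charset_capacity (int dimension)
{
  unsigned long n = 1;
  assert (dimension >= 1 && dimension <= 4);
  while (dimension-- > 0)
    n *= 96;
  return n;
}

/* Code points reachable through UTF-16: U+0000 through U+10FFFF. */
#define UTF16_CODE_POINTS 0x110000UL

/* Nonzero if COUNT charsets of DIMENSION cover every UTF-16 code point. */
int
covers_utf16 (int count, int dimension)
{
  return (unsigned long) count * charset_capacity (dimension)
         >= UTF16_CODE_POINTS;
}
```

One dimension-3 charset holds 884,736 code points, which is short of the 1,114,112 that UTF-16 can address, so the note's figure of two such charsets is the minimum that covers the range.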
15660
15661 need an easy way of showing the current language environment. some menus
15662 need to have the current one checked or whatever. [DONE]
15663
15664 implement unicode surrogates.
15665
15666 implement @code{buffer-file-coding-system-when-loaded} -- make sure
15667 @code{find-file}, @code{revert-file}, etc. set the coding system [DONE]
15668
15669 verify all the menu stuff [DONE]
15670
15671 implemented the entirely-ascii check in buffers. not sure how much gain
15672 it'll get us as we already have a known range inside of which is
15673 constant time, and with pure-ascii files the known range spans the whole
15674 buffer. improved the comment about how @code{bufpos-to-bytind} and
15675 vice-versa work. [DONE]
15676
15677 fix double-wrapping of @code{convert-eol}: when undecided converts
15678 itself to something with a non-autodetect eol, it needs to tell the
15679 adjacent @code{convert-eol} to reduce itself to nothing.
15680
15681 need menu item for find file with specified encoding. [DONE]
15682
15683 renamed coding systems mswindows-### to windows-### to follow the standard
15684 in rfc1345. [DONE]
15685
15686 implemented @code{coding-system-subsidiary-parent} [DONE]
15687 @code{HAVE_MULE} -> @code{MULE} in files in @file{nt/} so that depend
15688 checking works [DONE]
15689
15690 need to take the smarter @code{search-all-files-in-dir} stuff from my
15691 sample init file and put it on the grep menu [DONE]
15692
15693 added item for revert w/specified encoding; mostly works, but needs
15694 fixes. in particular, you get the correct results, but
15695 @code{buffer-file-coding-system} does not reflect things right. also,
15696 there are too many entries. need to split into submenus. there is
15697 already split code out there; see if it's generalized and if not make it
15698 so. it should only split when there's more than a specified number, and
15699 when splitting, split into groups of a specified size, not into a
15700 specified number of groups. [DONE]
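
The splitting rule described above (split only past a threshold, and into groups of a fixed size rather than a fixed number of groups) might be sketched as follows. The real menu code is Lisp; this C version and its names (submenu_count, submenu_size) are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>

/* Number of submenus needed for N_ITEMS entries.
   Returns 0 when the list is short enough to stay flat. */
int
submenu_count (int n_items, int split_threshold, int group_size)
{
  assert (n_items >= 0 && split_threshold >= 0 && group_size > 0);
  if (n_items <= split_threshold)
    return 0;                                       /* no split */
  return (n_items + group_size - 1) / group_size;   /* ceiling division */
}

/* Number of entries in submenu INDEX (0-based); every group is full
   except possibly the last. */
int
submenu_size (int n_items, int group_size, int index)
{
  int start = index * group_size;
  assert (group_size > 0 && index >= 0 && start < n_items);
  return n_items - start < group_size ? n_items - start : group_size;
}
```

With 45 entries, a threshold of 20, and a group size of 20, this gives three submenus of 20, 20, and 5 entries.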
15701
15702 too many entries in the langenv menus; need to split. [DONE]
15703
15704 @heading sep 27, 2001
15705
15706 NOTE: @kbd{M-x grep} for make-string causes crash now. something
15707 definitely to do with string changes. check very carefully the diffs
15708 and put in those sledgehammer checks. [DONE]
15709
15710 fix font-lock bug i introduced. [DONE]
15711
15712 added optimization to strings (keeps track of # of bytes of ascii at the
15713 beginning of a string). perhaps should also keep an all-ascii flag to deal
15714 with really large (> 2 MB) strings. rewrite code to count ascii-begin to
15715 use the 4-or-8-at-a-time stuff in @code{bytecount_to_charcount}.
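
A rough sketch of counting the ASCII bytes at the start of a string several bytes at a time, in the spirit of the note above. This is not the code in bytecount_to_charcount; the function name count_ascii_begin is invented here, and the sketch always tests eight bytes per step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the ASCII bytes (high bit clear) at the start of DATA. */
size_t
count_ascii_begin (const unsigned char *data, size_t len)
{
  const uint64_t high_bits = UINT64_C (0x8080808080808080);
  size_t i = 0;

  /* Fast path: test eight bytes at once.  memcpy avoids alignment traps. */
  while (len - i >= sizeof (uint64_t))
    {
      uint64_t word;
      memcpy (&word, data + i, sizeof word);
      if (word & high_bits)
        break;                    /* a non-ASCII byte is in this word */
      i += sizeof word;
    }

  /* Slow path: finish (or locate the exact byte) one at a time. */
  while (i < len && data[i] < 0x80)
    i++;
  return i;
}
```

The high-bit mask works regardless of byte order, since it only asks whether any byte in the word has its top bit set; the byte-at-a-time loop then pins down the exact position.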
15716
15717 Error: @kbd{M-q} is causing Invalid Regexp error on the above paragraph.
15718 It's not working. I assume it's a side effect of the string stuff.
15719 VERIFY! Write sledgehammer checks for strings. [DONE]
15720
15721 revamped the locale/init stuff so that it tries much harder to get things
15722 right. should test a bit more. in particular, test out Describe Language
15723 on the various created environments and make sure everything looks right.
15724
15725 should change the menus: move the submenus on @samp{Edit->Mule} directly
15726 under @samp{Edit}. add a menu entry on @samp{File} to say "Reload with
15727 specified encoding ->". [DONE]
15728
15729 also add @samp{Find File} with specified encoding ->. also add an entry to
15730 change the EOL settings for Unix, and implement it.
15731
15732 @code{decode-coding-region} isn't working because it needs to insert a
15733 binary (char->byte) converter. [DONE]
15734
15735 chain should be rearranged to be in decoding order; similar for
15736 source/sink-type, other things?
15737
15738 the detector should check for a magic cookie even without a seekable input.
15739 (currently its input is not seekable, because it's hidden within a chain.
15740 #### See what we can do about this.)
15741
15742 provide a way to display various settings, e.g. the current category
15743 mappings and priority (see mule-diag; get this working so it's in the
15744 path); also a way to print out the likelihood results from a detection,
15745 perhaps a debug flag.
15746
15747 problem with `env', which causes path issues due to `env' in packages.
15748 move env code to process, sync with fsf 21.0.105, check that the autoloads
15749 in `env' don't cause problems. [DONE]
15750
15751 8-bit iso2022 detection appears broken; or at least, mule-canna.c is not so
15752 detected.
15753
15754 @heading sep 25, 2001
15755
15756 something else to do is review the font selection and fix it so that (e.g.)
15757 JISX-0212 can be displayed.
15758
15759 also, text in widgets needs to be drawn by us so that the correct fonts
15760 will be displayed even in multi-lingual text.
15761
15762 @heading sep 24, 2001
15763
15764 the detection system is now properly abstracted. the detectors have been
15765 rewritten to include multiple levels of abstraction. now we just need
15766 detectors for ascii, binary, and latin-x, as well as more sophisticated
15767 detectors in general and further review of the general algorithm for doing
15768 detection. (#### Is this written up anywhere?) after that, consider adding
15769 error-checking to decoding (VERY IMPORTANT) and verifying the binary
15770 correctness of things under unix no-mule.
15771
15772 @heading sep 23, 2001
15773
15774 began to fix the detection system -- adding multiple levels of likelihood
15775 and properly abstracting the detectors. the system is in place except for
15776 the abstraction of the detector-specific data out of the struct
15777 detection_state. we should get things working first before tackling that
15778 (which should not be too hard). i'm rewriting algorithms here rather than
15779 just converting code, so it's harder. mostly done with everything, but i
15780 need to review all detectors except iso2022 and make them properly follow
15781 the new way. also write a no-conversion detector. also need to look into
15782 the `recode' package and see how (if?) they handle detection, and maybe
15783 copy some of the algorithms. also look at recent FSF 21.0 and see if their
15784 algorithms have improved.
15785
15786 @heading sep 22, 2001
15787
15788 @itemize
15789 @item
15790 fixed gc bugs from yesterday.
15791
15792 @item
15793 fixed truename bug.
15794
15795 @item
15796 close/finalize stuff works.
15797
15798 @item
15799 eliminated notyet stuff in syswindows.h.
15800
15801 @item
15802 eliminated special code in tstr_to_c_string.
15803
15804 @item
15805 fixed pdump problems. (many of them, mostly latent bugs, ugh)
15806
15807 @item
15808 fixed cygwin @code{sscanf} problems in
15809 @code{parse-unicode-translation-table}. (NOT a @code{sscanf} bug, but
15810 subtly different behavior w.r.t. whitespace in the format string,
15811 combined with a debugger that sucks ROCKS!! and consistently outputs
15812 garbage for variable values.)
15813 @end itemize
15814
15815 main stuff to test is the handling of EOF recognition vs. binary
15816 (i.e. check what the default settings are under Unix). then we may have
15817 something that WORKS on all platforms!!! (Also need to test Windows
15818 non-Mule)
15819
15820 @heading sep 21, 2001
15821
15822 finished redoing the close/finalize stuff in the lstream code. but i
15823 encountered again the nasty bug mentioned on sep 15 that disappeared on
15824 its own then. the problem seems to be that the finalize method of some
15825 of the lstreams is calling @code{Lstream_delete()}, which calls
15826 @code{free_managed_lcrecord()}, which is a no-no when we're inside of
15827 garbage-collection and the object passed to
15828 @code{free_managed_lcrecord()} is unmarked, and about to be released by
15829 the gc mechanism -- the free lists will end up with @code{xfree()}d
15830 objects on them, which is very bad. we need to modify
15831 @code{free_managed_lcrecord()} to check if we're in gc and the object is
15832 unmarked, and ignore it rather than move it to the free list. [DONE]
15833
15834 (#### What we really need to do is do what Java and C# do w.r.t. their
15835 finalize methods: For objects with finalizers, when they're about to be
15836 freed, leave them marked, run the finalizer, and set another bit on them
15837 indicating that the finalizer has run. Next GC cycle, the objects will
15838 again come up for freeing, and this time the sweeper notices that the
15839 finalize method has already been called, and frees them for good (provided
15840 that a finalize method didn't do something to make the object alive
15841 again).)
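
A toy model of the two-cycle finalization idea described above, as a sketch only. Nothing here is XEmacs code: the struct, toy_sweep, and toy_count_finalizer are invented for illustration, the mark phase is omitted, and instead of leaving the object marked the sweep simply skips freeing it on the first pass, which has the same effect in this model.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object
{
  int marked;                 /* set by the (omitted) mark phase */
  int finalizer_has_run;      /* the extra bit described above */
  int freed;                  /* stands in for returning memory */
  void (*finalize) (struct toy_object *);   /* NULL if none */
};

int toy_finalizer_calls;      /* lets a caller observe finalization */

/* Sample finalizer: only counts how often it is called. */
void
toy_count_finalizer (struct toy_object *o)
{
  (void) o;
  toy_finalizer_calls++;
}

/* One sweep pass over OBJS.  Returns the number of objects freed. */
size_t
toy_sweep (struct toy_object *objs, size_t n)
{
  size_t i, freed = 0;
  for (i = 0; i < n; i++)
    {
      struct toy_object *o = &objs[i];
      if (o->freed)
        continue;
      if (o->marked)
        {
          o->marked = 0;      /* survivor: reset for the next cycle */
          continue;
        }
      if (o->finalize && !o->finalizer_has_run)
        {
          /* First time unreachable: run the finalizer but keep the
             object; it is reclaimed only if still unmarked next cycle. */
          o->finalize (o);
          o->finalizer_has_run = 1;
          continue;
        }
      o->freed = 1;
      freed++;
    }
  return freed;
}
```

Because the finalizer never runs on memory that is released in the same pass, it can safely touch the object; if it makes the object reachable again, the next mark phase sets the mark and the object survives.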
15842
15843 @heading sep 20, 2001
15844
15845 redid the lstream code so there is only one coding stream. combined the
15846 various doubled coding stream methods into one; i'm a little bit unsure
15847 of this last part, though, as the results of combining the two together
15848 seem unclean. got it to compile, but it crashes in loadup. need to go
15849 through and rehash the close vs. finalize stuff, as the problem was
15850 stuff getting freed too quickly, before the canonicalize-after-decoding
15851 was run. should eliminate entirely @code{CODING_STATE_END} and use a
15852 different method (close coding stream). rewrite to use these two. make
15853 sure they're called in the right places. @code{Lstream_close} on a
15854 stream should *NOT* do finalizing. finalize only on delete. [DONE]
15855
15856 in general i'd like to see the flags eliminated and converted to
15857 bit-fields. also, rewriting the methods to take advantage of rejecting
15858 should make it possible to eliminate much of the state in the various
15859 methods, esp. including the flags. need to test this is working, though --
15860 reduce the buffer size down very low and try files with only CRLF's in
15861 them, with one offset by a byte from the other, and see if we correctly
15862 handle rejection.
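
To make the rejection idea concrete, here is a minimal sketch of a CRLF converter that keeps no state between calls and instead rejects a trailing CR. The function crlf_to_lf_chunk and its signature are assumptions made for this example, not the actual coding-system method interface.

```c
#include <assert.h>
#include <stddef.h>

/* Convert CRLF to LF.  Reads up to SRC_LEN bytes from SRC, writes to DST
   (which must hold at least SRC_LEN bytes), stores the output length in
   *DST_LEN, and returns the number of input bytes consumed.  A CR in the
   last position is rejected (not consumed) unless AT_EOF is nonzero,
   because the next chunk decides whether it starts a CRLF pair. */
size_t
crlf_to_lf_chunk (const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t *dst_len, int at_eof)
{
  size_t in = 0, out = 0;
  while (in < src_len)
    {
      if (src[in] == '\r')
        {
          if (in + 1 == src_len && !at_eof)
            break;                        /* reject the trailing CR */
          if (in + 1 < src_len && src[in + 1] == '\n')
            {
              dst[out++] = '\n';
              in += 2;
              continue;
            }
        }
      dst[out++] = src[in++];
    }
  *dst_len = out;
  return in;
}
```

The caller is expected to resubmit the rejected byte at the front of the next chunk. Two test files whose CRLF pairs are offset by one byte guarantee that, with a tiny buffer, at least one of them puts a chunk boundary between a CR and its LF.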
15863
15864 still have the problem with incorrectly truenaming files.
15865
15866
15867 @heading sep 19, 2001
15868
15869 bug reported: crash while closing lstreams.
15870
15871 the lstream/coding system close code needs revamping. we need to document
15872 that order of closing lstreams is very important, and make sure we're
15873 consistent. furthermore, chain and undecided lstreams need to close their
15874 underneath lstreams when they receive the EOF signal (there may be data in
15875 the underneath streams waiting to come out), not when they themselves are
15876 closed. [DONE]
15877
15878 (if only we had proper inheritance. i think in any case we should
15879 simulate it for the chain coding stream -- write things in such a way that
15880 undecided can use the chain coding stream and not have to duplicate
15881 anything itself.)
15882
15883 in general we need to carefully think through the closing process to make
15884 sure everything always works correctly and in the right order. also check
15885 very carefully to make sure there are no dangling pointers to deleted
15886 objects floating around.
15887
15888 move the docs for the lstream functions to the functions themselves, not
15889 the header files. document more carefully what exactly
15890 @code{Lstream_delete()} means and how it's used, what the connections
15891 are between @code{Lstream_close()}, @code{Lstream_delete()},
15892 @code{Lstream_flush()}, @code{lstream_finalize}, etc. [DONE]
15893
15894 additional error-checking: consider deadbeefing the memory in objects
15895 stored in lcrecord free lists; furthermore, consider whether lifo or
15896 fifo is correct; under error-checking, we should perhaps be doing fifo,
15897 and setting a minimum number of objects on the lists that's quite large
15898 so that it's highly likely that any erroneous accesses to freed objects
15899 will go into such deadbeefed memory and cause crashes. also, at the
15900 earliest available opportunity, go through all freed memory and check
15901 for any consistency failures (overwrites of the deadbeef), crashing if
15902 so. perhaps we could have some sort of id for each block, to more easily
15903 trace where the offending block came from. (all of these ideas are
15904 present in the debug system malloc from VC++, plus more stuff.) there's
15905 similar code i wrote sitting somewhere (in @file{free-hook.c}? doesn't
15906 appear so. we need to delete the blocking stuff out of there!). also
15907 look into using the debug system malloc from VC++, which has lots of
15908 cool stuff in it. we even have the sources. that means compiling under
15909 pdump, which would be a good idea anyway. set it as the default. (but
15910 then, we need to remove the requirement that Xpm be a DLL, which is
15911 extremely annoying. look into this.)
15912
15913 test the windows code page coding systems recently created.
15914
15915 problems reading my mail files -- 1personal appears to hang, others come up
15916 with lots of ^M's. investigate.
15917
15918 test the enum functions i just wrote, and finish them.
15919
15920 still pdump problems.
15921
15922 @heading sep 18, 2001
15923
15924 critical-quit broken sometime after aug 25.
15925
15926 @itemize
15927 @item
15928 fixed critical quit.
15929
15930 @item
15931 fixed process problems.
15932
15933 @item
15934 print routines work. (no routine for ccl, though)
15935
15936 @item
15937 can read and write unicode files, and they can still be read by some
15938 other program
15939
15940 @item
15941 defaults should come up correctly -- mswindows-multibyte is general.
15942 @end itemize
15943
15944 still need to test matej's stuff.
15945 seems ok with multibyte stuff but needs more testing.
15946
15947 @heading sep 17, 2001
15948
15949 !!!!! something broken with processes !!!!! cannot send mail anymore. must
15950 investigate.
15951
15952 @heading sep 17, 2001
15953
15954 on mon/wed nights, stop *BEFORE* 11pm. Otherwise i just start getting
15955 woozy and can't concentrate.
15956
15957 just finished getting assorted fixups to the main branch committed, so it
15958 will compile under C++ (Andy committed some code that broke C++ builds).
15959 cup'd the code into the fixtypes workspace, updated the tags appropriately.
15960 i've created the appropriate log message, sitting in fixtypes.txt in
15961 /src/xemacs; perhaps it should go into a README. now i just have to build
15962 on everything (it's currently building), verify it's ok, run patcher-mail,
15963 commit, send.
15964
15965 my mule ws is also very close. need to:
15966
15967 @itemize
15968 @item
15969 test the new print routines.
15970
15971 @item
15972 test it can read and write unicode files, and they can still be read by
15973 some other program.
15974
15975 @item
15976 try to see if unicode can be auto-detected properly.
15977
15978 @item
15979 test it can read and write multibyte files in a few different formats.
15980 currently can't recognize them, but if you set the cs right, it should
15981 work.
15982
15983 @item
15984 examine the test files sent by matej and see if we can handle them.
15985 @end itemize
15986
15987 @heading sep 15, 2001
15988
15989 more eol fixing. this stuff is utter crap.
15990
15991 currently we wrap coding systems with @code{convert-eol-autodetect} when we create
15992 them in @code{make_coding_system_1}. i had a feeling that this would be a
15993 problem, and indeed it is -- when autodetecting with `undecided', for
15994 example, we end up with multiple layers of eol conversion. to avoid this,
15995 we need to do the eol wrapping *ONLY* when we actually retrieve a coding
15996 system in places such as @code{insert-file-contents}. these places are
15997 @code{insert-file-contents}, load, process input, @code{call-process-internal},
15998 @samp{encode/decode/detect-coding-region}, database input, ...
15999
16000 (later) it's fixed, and things basically work. NOTE: for some reason,
16001 adding code to wrap coding systems with @code{convert-eol-lf} when
16002 @code{eol-type == lf} results in crashing during garbage collection in
16003 some pretty obscure place -- an lstream is free when it shouldn't be.
16004 this is a bad sign. i guess something might be getting initialized too
16005 early?
16006
16007 we still need to fix the canonicalization-after-decoding code to avoid
16008 problems with coding systems like `internal-7' showing up. basically,
16009 when @code{eol==lf} is detected, nil should be returned, and the callers
16010 should handle it appropriately, eliding when necessary. chain needs to
16011 recognize when it's got only one (or even 0) items in the chain, and
16012 elide out the chain.
16013
16014 @heading sep 11, 2001: the day that will live in infamy
16015
16016 rewrite of sep 9 entry about formats:
16017
16018 when calling @samp{make-coding-system}, the name can be a cons of @samp{(format1 .
16019 format2)}, specifying that it decodes @samp{format1->format2} and encodes the other
16020 way. if only one name is given, that is assumed to be @samp{format1}, and the
16021 other is either `external' or `internal' depending on the end type.
16022 normally the user when decoding gives the decoding order in formats, but
16023 can leave off the last one, `internal', which is assumed. a multichain
16024 might look like gzip|multibyte|unicode, using the coding systems named
16025 `gzip', `(unicode . multibyte)' and `unicode'. the way this actually works
16026 is by searching for gzip->multibyte; if not found, look for gzip->external
16027 or gzip->internal. (In general we automatically do conversion between
16028 internal and external as necessary: thus gzip|crlf does the expected, and
16029 maps to gzip->external, external->internal, crlf->internal, which when
16030 fully specified would be gzip|external:external|internal:crlf|internal --
16031 see below.) To forcibly fit together two converters that have explicitly
16032 specified and incompatible names (say you have unicode->multibyte and
16033 iso8859-1->ebcdic and you know that the multibyte and iso8859-1 in this
16034 case are compatible), you can force-cast using :, like this:
16035 ebcdic|iso8859-1:multibyte|unicode. (again, if you force-cast between
16036 internal and external formats, the conversion happens automatically.)
16037
16038
16039 @heading sep 10, 2001
16040
16041 moved the autodetection stuff (both codesys and eol) into particular coding
16042 systems -- `undecided' and `convert-eol' (type == `autodetect'). needs
16043 lots of work. still need to search through the rest of the code and find
16044 any remaining auto-detect code and move it into the undecided coding
16045 system. need to modify make-coding-system so that it spits out
16046 auto-detecting versions of all text-file coding systems unless we say not
16047 to. need to eliminate entirely the EOF flag from both the stream info and the
16048 coding system; have only the original-eof flag. in
16049 coding_system_from_mask, need to check that the returned value is not of
16050 type `undecided', falling back to no-conversion if so. also need to make
16051 sure we wrap everything appropriate for text-files -- i removed the
16052 wrapping on set-coding-category-list or whatever (need to check all those
16053 files to make sure all wrapping is removed). need to review carefully the
16054 new code in `undecided' to make sure it works and preserves the same logic
16055 as previously. need to review the closing and rewinding behavior of chain
16056 and undecided (same -- should really consolidate into helper routines, so
16057 that any coding system can embed a chain in it) -- make sure the dynarr's
16058 are getting their data flushed out as necessary, rewound/closed in the
16059 right order, no missing steps, etc.
16060
16061 also split out mule stuff into @file{mule-coding.c}. work done on
16062 @file{configure}/@file{xemacs.mak}/@file{Makefile}s not done yet. work
16063 on @file{emacs.c}/@file{symsinit.h} to interface with the new init
16064 functions not done yet.
16065
16066 also put in a few declarations of the way i think the abstracted detection
16067 stuff ought to go. DON'T WORK ON THIS MORE UNTIL THE REST IS DEALT WITH
16068 AND WE HAVE A WORKING XEMACS AGAIN WITH ALL EOL ISSUES NAILED.
16069
16070 really need a version of @file{cvs-mods} that reports only the current
16071 directory. WRITE THIS! use it to implement a better
16072 @file{cvs-checkin}.
16073
16074 @heading sep 9, 2001
16075
16076 implemented a gzip coding system. unfortunately, doesn't quite work right
16077 because it doesn't handle the gzip headers -- it just reads and writes raw
16078 zlib data. there's no function in the library to skip past the header, but
16079 we do have some code out of the library that we can snarf that implements
16080 header parsing. we need to snarf that, store it, and output it again at
16081 the beginning when encoding. in the process, we should create a "get next
16082 byte" macro that bails out when there are no more. using this, we set up a
16083 nice way of doing most stuff statelessly -- if we have to bail, we reject
16084 everything back to the sync point. also need to fix up the autodetection
16085 of zlib in configure.in.
16086
16087 BIG problems with eol. finished up everything i thought i would need to
16088 get eol stuff working, but no -- when you have mswindows-unicode, with its
16089 eol set to autodetect, the detection routines themselves do the autodetect
16090 (first), and fail (they report CR on CRLF because of the NULL byte between
16091 the CR and the LF) since they're not looking at ascii data. with a chain
16092 it's similarly bad. for mswindows-multibyte, for example, which is a chain
16093 unicode->unicode-to-multibyte, autodetection happens inside of the chain,
16094 both when unicode and unicode-to-multibyte are active. we could twiddle
16095 around with the eol flags to try to deal with this, but it's gonna be a
16096 big mess, which is exactly what we're trying to avoid. what we
16097 basically want is to entirely rip out all EOL settings from either the
16098 coding system or the stream (yes, there are two! one might say
16099 autodetect, and then the stream contains the actual detected value).
16100 instead, we simply create an eol-autodetect coding system -- or rather,
16101 it's part of the convert-eol coding system. convert-eol, type =
16102 autodetect, does autodetection the first time it gets data sent to it to
16103 decode, and thereafter sets a stream parameter indicating the actual eol
16104 type for this stream. this means that all autodetect coding systems, as
16105 created by @code{make-coding-system}, really are chains with a
16106 convert-eol at the beginning. only subsidiary xxx-unix has no wrapping
16107 at all. this should allow eof detection of gzip, unicode, etc. for
16108 that matter, general autodetection should be entirely encapsulated
16109 inside of the `autodetect' coding system, with no eol-autodetection --
16110 the chain becomes convert-eol (autodetect) -> autodetect or perhaps
16111 backwards. the generic autodetect similarly has a coding-system in its
16112 stream methods, and needs somehow or other to insert the detected
16113 coding-system into the chain. either it contains a chain inside of it
16114 (perhaps it *IS* a chain), or there's some magic involving
16115 canonicalization-type switcherooing in the middle of a decode. either
16116 way, once everything is good and done and we want to save the coding
16117 system so it can be used later, we need to do another sort of
16118 canonicalization -- converting auto-detect-type coding systems into the
16119 detected systems. again, a coding-system method, with some magic
16120 currently so that subsidiaries get properly used rather than something
16121 that's new but equivalent to subsidiaries. (#### perhaps we could use a
16122 hash table to avoid recreating coding systems when not necessary. but
16123 that would require that coding systems be immutable from external, and
16124 i'm not sure that's the case.)
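
The failure described above, where byte-level detection reports CR for CRLF text because of the zero byte between the CR and the LF, is easy to reproduce. The detector below is a deliberately naive stand-in written for this example (guess_eol_bytewise and the enum are invented names); it is not the XEmacs detection code.

```c
#include <assert.h>
#include <stddef.h>

enum eol_guess { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR };

/* Byte-oriented EOL guess: classify the first line ending found. */
enum eol_guess
guess_eol_bytewise (const unsigned char *data, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    {
      if (data[i] == '\n')
        return EOL_LF;
      if (data[i] == '\r')
        return (i + 1 < len && data[i + 1] == '\n') ? EOL_CRLF : EOL_CR;
    }
  return EOL_NONE;
}
```

On the ASCII bytes of "a\r\nb" this returns EOL_CRLF, but on the same text in little-endian UTF-16 (each character followed by a zero byte) it returns EOL_CR. That is why the note wants EOL detection to be a separate convert-eol stage that sees the data only after the Unicode stage has decoded it.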
16125
16126 i really think, after all, that i should reverse the naming of everything
16127 in chain and source-sink-type -- they should be decoding-centric. later
16128 on, if/when we come up with the proper way to make it totally symmetrical,
16129 we'll be fine whether before then we were encoding or decoding centric.
16130
16131
16132 @heading sep 9, 2001
16133
16134 investigated eol parameter.
16135
16136 implemented handling in @code{make-coding-system} of @code{eol-cr} and
16137 @code{eol-crlf}. fixed calls everywhere to @code{Fget_coding_system} /
16138 @code{Ffind_coding_system} to reject non-char->byte coding systems.
16139
16140 still need to handle "query eol type using coding-system-property" so it
16141 magically returns the right type by parsing the chain.
16142
16143 no work done on formats, as mentioned below. we should consider using :
16144 instead of || to indicate casting.
16145
16146 @heading early sep 9, 2001
16147
16148 renamed some codesys properties: `list' in chain -> chain; `subtype' in
16149 unicode -> type. everything compiles again and sort of works; some CRLF
16150 problems that may resolve themselves when i finish the convert-eol stuff.
16151 the stuff to create subsidiaries has been rewritten to use chains; but i
16152 still need to investigate how the EOL type parameter is used. also, still
16153 need to implement this: when a coding system is created, and its eol type
16154 is not autodetect or lf, a chain needs to be created and returned. i think
16155 that what needs to happen is that the eol type can only be set to
16156 autodetect or lf; later on this should be changed to simply be either
16157 autodetect or not (but that would require ripping out the eol converting
16158 stuff in the various coding systems), and eventually we will do the work on
16159 the detection mechanism so it can do chain detection; then we won't need an
16160 eol autodetect setting at all. i think there's a way to query the eol type
16161 of a coding system; this should check to see if the coding system is a
16162 chain and there's a convert-eol at the front; if so, the eol type comes
16163 from the type of the convert-eol.
16164
16165 also check out everywhere that @code{Fget_coding_system} or
16166 @code{Ffind_coding_system} is called, and see whether anything but a
16167 char->byte system can be tolerated. create a new function for all the
16168 places that only want char->byte, something like
16169 @samp{get_coding_system_char_to_byte_only}.
16170
16171 think about specifying formats in make-coding-system. perhaps the name can
16172 be a cons of (format1, format2), specifying that it encodes
16173 format1->format2 and decodes the other way. if only one name is given,
16174 that is assumed to be format2, and the other is either `byte' or `char'
16175 depending on the end type. normally the user when decoding gives the
16176 decoding order in formats, but can leave off the last one, `char', which is
16177 assumed. perhaps we should say `internal' instead of `char' and `external'
16178 instead of byte. a multichain might look like gzip|multibyte|unicode,
16179 using the coding systems named `gzip', `(unicode . multibyte)' and
16180 `unicode'. we would have to allow something where one format is given only
16181 as generic byte/char or internal/external to fit with any of the same
16182 byte/char type. when forcibly fitting together two converters that have
16183 explicitly specified and incompatible names (say you have
16184 unicode->multibyte and iso8859-1->ebcdic and you know that the multibyte
16185 and iso8859-1 in this case are compatible), you can force-cast using ||,
16186 like this: ebcdic|iso8859-1||multibyte|unicode. this will also force
16187 external->internal translation as necessary:
16188 unicode|multibyte||crlf|internal does unicode->multibyte,
16189 external->internal, crlf->internal. perhaps you'd need to put in the
16190 internal translation, like this: unicode|multibyte|internal||crlf|internal,
16191 which means unicode->multibyte, external->internal (multibyte is compatible
16192 with external); force-cast to crlf format and convert crlf->internal.
16193
16194 @heading even later: Sep 8, 2001
16195
16196 chain doesn't need to set character mode, that happens automatically when
16197 the coding systems are created. fixed chain to return correct source/sink
16198 type for itself and to check the compatibility of source/sink types in its
16199 chain. fixed decode/encode-coding-region to check the source and sink
16200 types of the coding system performing the conversion and insert appropriate
16201 byte->char/char->byte converters (aka "binary" coding system). fixed
16202 set-coding-category-system to only accept the traditional
16203 encode-char-to-byte types of coding systems.
16204
16205 still need to extend chain to specify the parameters mentioned below,
16206 esp. "reverse". also need to extend the print mechanism for chain so it
16207 prints out the chain. probably this should be general: have a new method
16208 to return all properties, and output those properties. you could also
16209 implement a read syntax for coding systems this way.
16210
16211 still need to implement @code{convert-eol} and finish up the rest of the
16212 eol stuff mentioned below.
16213
16214 @heading later September 7, 2001 (more like Sep 8)
16215
16216 moved many @code{Lisp_Coding_System *} params to @code{Lisp_Object}. In
16217 general this is the way to go, and if we ever implement a copying GC, we
16218 will never want to be passing direct pointers around. With no
16219 error-checking, we lose no cycles using @code{Lisp_Object}s in place of
16220 pointers -- the @code{Lisp_Object} itself is nothing but a pointer, and
16221 so all the casts and "dereferences" boil down to nothing.
16222
16223 Clarified and cleaned up the "character mode" on streams, and documented
16224 who (caller or object itself) has the right to be setting character mode
16225 on a stream, depending on whether it's a read or write stream. changed
16226 @code{conversion_end_type} method and @code{enum source_sink_type} to
16227 return encoding-centric values, rather than decoding-centric. for the
16228 moment, we're going to be entirely encoding-centric in everything; we
16229 can rethink later. fixed coding systems so that the decode and encode
16230 methods are guaranteed to receive only full characters, if that's the
16231 source type of the data, as per conversion_end_type.
16232
16233 still need to fix the chain method so that it correctly sets the
16234 character mode on all the lstreams in it and checks the source/sink
16235 types to be compatible. also fix @code{decode-coding-string} and
16236 friends to put the appropriate byte->character
16237 (i.e. @code{no-conversion}) coding systems on the ends as necessary so
16238 that the final ends are both character. also add to chain a parameter
16239 giving the ability to switch the direction of conversion of any
16240 particular item in the chain (i.e. swap encoding and decoding). i think
16241 what we really want to do is allow for arbitrary parameters to be put
16242 onto a particular coding system in the chain, of which the only one so
16243 far is swap-encode-decode. don't need too much codage here for that,
16244 but make the design extendable.
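
A minimal sketch of the per-link parameter idea described above, not the actual chain implementation. All names here (@code{toy_coding_system}, @code{chain_link}, @code{chain_convert}, @code{shift_cs}) are hypothetical, and the "coding systems" are toy in-place byte transforms. The list is taken to be in decode order; encoding walks it backwards, and a link with the swap flag set runs its opposite method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: a converter rewrites a buffer in place. */
typedef void (*convert_fn)(unsigned char *buf, size_t len);

struct toy_coding_system {
    convert_fn decode;          /* external -> internal */
    convert_fn encode;          /* internal -> external */
};

/* One item in the chain, carrying its own parameters.  The only
   parameter so far is swap_encode_decode; more could be added here. */
struct chain_link {
    const struct toy_coding_system *cs;
    int swap_encode_decode;
};

enum chain_direction { CHAIN_DECODE, CHAIN_ENCODE };

static void shift_up(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + 1);
}

static void shift_down(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] - 1);
}

/* Toy coding system: decoding subtracts 1, encoding adds 1. */
const struct toy_coding_system shift_cs = { shift_down, shift_up };

/* Run the whole chain over buf.  The links are listed in decode order,
   so encoding visits them from last to first. */
void chain_convert(const struct chain_link *links, size_t n,
                   enum chain_direction dir,
                   unsigned char *buf, size_t len)
{
    assert(links != NULL || n == 0);
    for (size_t k = 0; k < n; k++) {
        const struct chain_link *link =
            (dir == CHAIN_DECODE) ? &links[k] : &links[n - 1 - k];
        int do_decode = (dir == CHAIN_DECODE);
        if (link->swap_encode_decode)
            do_decode = !do_decode;   /* invert the sense for this link only */
        (do_decode ? link->cs->decode : link->cs->encode)(buf, len);
    }
}
```

Keeping the flag on the link rather than on the coding system means the same coding system object can appear in both senses in different chains.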
16245
16246
16247
16248 @heading September 7, 2001
16249
16250 just added a return value from the decode and encode methods of a coding
16251 system, so that some of the data can get rejected. fixed the calling
16252 routines to handle this. need to investigate when and whether the coding
16253 lstream is set to character mode, so that the decode/encode methods only
16254 get whole characters. if not, we should do so, according to the source
16255 type of these methods. also need to implement the convert_eol coding
16256 system, and fix the subsidiary coding systems (and in general, any coding
16257 system where the eol type is specified and is not LF) to be chains
16258 involving convert_eol.
16259
16260 after everything is working, need to remove eol handling from encode/decode
16261 methods and eventually consider rewriting (simplifying) them given the
16262 reject ability.
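
A rough sketch of what "rejecting" data through the return value can look like, assuming a toy decoder for big-endian 16-bit units. The names @code{toy_decode_ucs2be}, @code{toy_stream} and @code{toy_stream_feed} are made up for illustration and are not the real lstream or coding-system methods. The decoder returns how many bytes it consumed; the caller keeps whatever was rejected and retries it ahead of the next chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode big-endian 16-bit units from src into dst.  Returns the number
   of bytes consumed; a trailing odd byte is rejected (not consumed). */
size_t toy_decode_ucs2be(const unsigned char *src, size_t n,
                         unsigned short *dst, size_t *ndst)
{
    size_t used = 0;
    *ndst = 0;
    while (n - used >= 2) {
        dst[(*ndst)++] = (unsigned short)((src[used] << 8) | src[used + 1]);
        used += 2;
    }
    return used;
}

/* Hypothetical calling side: remembers the bytes the decoder rejected. */
struct toy_stream {
    unsigned char pending[1];   /* this decoder rejects at most one byte */
    size_t npending;
};

/* Feed one chunk through the decoder.  dst must have room for
   (n + 1) / 2 units.  Returns the number of units written to dst. */
size_t toy_stream_feed(struct toy_stream *s,
                       const unsigned char *chunk, size_t n,
                       unsigned short *dst)
{
    unsigned char buf[64];
    size_t total, used, ndst;

    assert(n + s->npending <= sizeof buf);
    memcpy(buf, s->pending, s->npending);      /* rejected bytes go first */
    memcpy(buf + s->npending, chunk, n);
    total = s->npending + n;

    used = toy_decode_ucs2be(buf, total, dst, &ndst);

    s->npending = total - used;                /* 0 or 1 for this decoder */
    memcpy(s->pending, buf + used, s->npending);
    return ndst;
}
```

With the buffering done once on the calling side, the decode method itself no longer has to carry partial-character state between calls, which is the simplification hinted at above.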
16263
16264 @heading September 5, 2001
16265
16266 @itemize
16267 @item
16268 need to organize this. get everything below into the TODO list.
16269 CVS the TODO list frequently so i can delete old stuff. prioritize
16270 it!!!!!!!!!
16271
16272 @item
16273 move @file{README.ben-mule...} to @file{STATUS.ben-mule...}; use
16274 @file{README} for intro, overview of what's new, what's broken, how to
16275 use the features, etc.
16276
16277 @item
16278 need a global and local @samp{coding-category-precedence} list, which
16279 get merged.
16280
16281 @item
16282 finished the BOM support. also finished something not listed below,
16283 expansion to the auto-generator of Unicode-encapsulation to support
16284 bracketing code with @samp{#if ... #endif}, for Cygwin and MINGW
16285 problems, e.g. This is tested; appears to work.
16286
16287 @item
16288 need to add more multibyte coding systems now that we have various
16289 properties to specify them. need to add DEFUN's for mac-code-page
16290 and ebcdic-code-page for completeness. need to rethink the whole
16291 way that the priority list works. it will continue to be total
16292 junk until multiple levels of likeliness get implemented.
16293
16294 @item
16295 need to finish up the stuff about the various defaults. [need to
16296 investigate more generally where all the different default values
16297 are that control encoding. (there are six places or so.) need to
16298 list them in @code{make-coding-system} docs and put pointers
16299 elsewhere. [[[[#### what interface to specify that this default
16300 should be unicode? a "Unicode" language environment seems too
16301 drastic, as the language environment controls much more.]]]] even
16302 skipping the Unicode stuff here, we need to survey and list the
16303 variables that control coding page behavior and determine how they
16304 need to be set for various possible scenarios:
16305
16306 @itemize
16307 @item
16308 total binary: no detection at all.
16309
16310 @item
16311 raw-text only: wants only autodetection of line endings, nothing else.
16312
16313 @item
16314 "standard Windows environment": tries for Unicode, falls back on
16315 code page encoding.
16316
16317 @item
16318 some sort of East European environment, and Russian.
16319
16320 @item
16321 some sort of standard Japanese Windows environment.
16322
16323 @item
16324 standard Chinese Windows environments (traditional and simplified)
16325
16326 @item
16327 various Unix environments (European, Japanese, Russian, etc.)
16328
16329 @item
16330 Unicode support in all of these when it's reasonable
16331 @end itemize
16332 @end itemize
16333
16334 These really require multiple likelihood levels to be fully
16335 implementable. We should see what can be done ("gracefully fall
16336 back") with single likelihood level. need lots of testing.
16337
16338 @itemize
16339 @item
16340 need to fix the truename problem.
16341
16342 @item
16343 lots of testing: need to test all of the stuff above and below that's
16344 recently been implemented.
16345 @end itemize
16346
16347
16348 @heading September 4, 2001
16349
16350 mostly everything compiles. currently there is a crash in
16351 @code{parse-unicode-translation-table}, and Cygwin/Mule won't run. it
16352 may well be a bug in the @code{sscanf()} in Cygwin.
16353
16354 working on today:
16355
16356 @itemize
16357 @item
16358 adding BOM support for Unicode coding systems. mostly there, but
16359 need to finish adding BOM support to the detection routines. then test.
16360
16361 @item
16362 adding properties to @code{unicode-to-multibyte} to specify the coding
16363 system in various flexible ways, e.g. directly specified code page or
16364 ansi or oem code page of specified locale, current locale, user-default
16365 or system-default locale. need to test.
16366
16367 @item
16368 creating a `multibyte' coding system, with the same parameters as
16369 unicode-to-multibyte and which resolves at coding-system-creation
16370 time to the appropriate chain. creating the underlying mechanism
16371 to allow such behind-the-scenes switcheroo. need to test.
16372
16373 @item
16374 set default-value of @code{buffer-file-coding-system} to
16375 mswindows-multibyte, as Matej said it should be. need to test.
16376 need to investigate more generally where all the different default
16377 values are that control encoding. (there are six places or so.)
16378 need to list them in make-coding-system docs and put pointers
16379 elsewhere. #### what interface to specify that this default should
16380 be unicode? a "Unicode" language environment seems too drastic, as
16381 the language environment controls much more.
16382
16383 @item
16384 thinking about adding multiple levels of certainty to the detection
16385 schemes, instead of just a mask. eventually, we need to totally
16386 abstract things, but that can more easily be done in many steps. (we
16387 need multiple levels of likelihood to more reasonably support a
16388 Windows environment with code-page type files. currently, in order
16389 to get them detected, we have to put them first, because they can
16390 look like lots of other things; but then, other encodings don't get
16391 detected. with multiple levels of likelihood, we still put the
16392 code-page categories first, but they will return low levels of
16393 likelihood. Lower-down encodings may be able to return higher
16394 levels of likelihood, and will get taken preferentially.)
16395
16396 @item
16397 making it so you cannot disable file-coding, but you get an
16398 equivalent default on Unix non-Mule systems where all defaults are
16399 `binary'. need to test!!!!!!!!!
16400 @end itemize
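
The multiple-levels-of-certainty idea from the list above can be sketched roughly as follows. This is only an illustration under assumed names: the @code{likelihood} enum, the detector functions and @code{pick_category} are hypothetical and do not match the existing detection code. Categories stay in precedence order, but each detector reports how sure it is, and the highest level wins; precedence only breaks ties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical likelihood levels, replacing a yes/no bit in a mask. */
enum likelihood { LIKELY_NO = 0, LIKELY_LOW, LIKELY_MEDIUM, LIKELY_HIGH };

typedef enum likelihood (*detect_fn)(const unsigned char *data, size_t len);

struct category {
    const char *name;
    detect_fn detect;
};

/* Code-page text can look like almost anything, so it never claims
   more than a low likelihood. */
enum likelihood detect_code_page(const unsigned char *data, size_t len)
{
    (void)data;
    (void)len;
    return LIKELY_LOW;
}

/* Little-endian UTF-16 starting with a BOM (bytes FF FE) is a strong signal. */
enum likelihood detect_utf16le_bom(const unsigned char *data, size_t len)
{
    return (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        ? LIKELY_HIGH : LIKELY_NO;
}

/* Walk the categories in precedence order and return the one reporting
   the highest likelihood.  On a tie the earlier category wins.
   Returns NULL if every detector says LIKELY_NO. */
const struct category *pick_category(const struct category *cats, size_t ncats,
                                     const unsigned char *data, size_t len)
{
    const struct category *best = NULL;
    enum likelihood best_level = LIKELY_NO;

    assert(cats != NULL || ncats == 0);
    for (size_t i = 0; i < ncats; i++) {
        enum likelihood level = cats[i].detect(data, len);
        if (level > best_level) {
            best = &cats[i];
            best_level = level;
        }
    }
    return best;
}
```

With a plain mask, the code-page category listed first would always win; here it can stay first and still lose to a lower-down detector that is more certain.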
16401
16402 Matej (mostly, + some others) notes the following problems, and here
16403 are possible solutions:
16404
16405 @itemize
16406 @item
16407 he wants the defaults to work right. [figure out what those
16408 defaults are. i presume they are auto-detection of data in current
16409 code page and in unicode, and new files have current code page set
16410 as their output encoding.]
16411
16412 @item
16413 too easy to lose data with incorrect encodings. [need to set up an
16414 error system for encoding/decoding. extremely important but a
16415 little tricky to implement so let's deal with other issues now.]
16416
16417 @item
16418 EOL isn't always detected correctly. [#### ?? need examples]
16419
16420 @item
16421 truename isn't working: @file{c:\t.txt} and @file{c:\tmp.txt} have the
16422 same truename. [should be easy to fix]
16423
16424 @item
16425 unicode files lose the BOM mark. [working on this]
16426
16427 @item
16428 command-line utilities use OEM. [actually it seems more
16429 complicated. it seems they use the codepage of the console. we
16430 may be able to set that, e.g. to UTF8, before we invoke a command.
16431 need to investigate.]
16432
16433 @item
16434 no way to handle unicode characters not recognized as charsets. [we
16435 need to create something like 8 private 2-dimensional charsets to
16436 handle all BMP Unicode chars. Obviously this is a stopgap
16437 solution. Switching to Unicode internal will ultimately make life
16438 far easier and remove the BMP limitation. but for now it will
16439 work. we translate all characters where we have charsets into
16440 chars in those charsets, and the remainder in a unicode charset.
16441 that way we can save them out again and guarantee no data loss with
16442 unicode. this creates font problems, though ...]
16443
16444 @item
16445 problems with xemacs font handling. [xemacs font handling is not
16446 sophisticated enough. it goes on a charset granularity basis and
16447 only looks for a font whose name contains the corresponding windows
16448 charset in it. with unicode this fails in various ways. for one
16449 the granularity needs to be single character, so that those unicode
16450 charsets mentioned above work; and it needs to query the font to
16451 see what unicode ranges it supports, rather than just looking at
16452 the charset ending.]
16453 @end itemize
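
The "something like 8" figure for private charsets can be checked with a little arithmetic. Assuming 96 positions per dimension (an assumption for this sketch; with 94 the count also comes out as 8), one 2-dimensional charset holds 96 * 96 = 9216 characters, and 8 * 9216 = 73728 covers the 65536 BMP code points, while 7 * 9216 = 64512 does not. The mapping below is hypothetical, not the layout actually used:

```c
#include <assert.h>

#define CHARS_PER_DIM 96                                /* assumed charset width */
#define CHARSET_SIZE  (CHARS_PER_DIM * CHARS_PER_DIM)   /* 9216 */

/* Position of a code point inside one of the private charsets. */
struct private_pos {
    int charset;   /* 0..7: which private charset */
    int row;       /* 0..95: first dimension */
    int col;       /* 0..95: second dimension */
};

/* Map a BMP code point to (charset, row, col). */
struct private_pos bmp_to_private(unsigned int cp)
{
    struct private_pos p;
    unsigned int rem;

    assert(cp <= 0xFFFF);
    p.charset = (int)(cp / CHARSET_SIZE);
    rem = cp % CHARSET_SIZE;
    p.row = (int)(rem / CHARS_PER_DIM);
    p.col = (int)(rem % CHARS_PER_DIM);
    return p;
}

/* Inverse mapping, so characters can be written out again without loss. */
unsigned int private_to_bmp(struct private_pos p)
{
    return (unsigned int)p.charset * CHARSET_SIZE
         + (unsigned int)p.row * CHARS_PER_DIM
         + (unsigned int)p.col;
}
```

The highest code point, U+FFFF, lands in charset 7, so exactly eight charsets are touched.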
16454
16455
16456 @heading August 28, 2001
16457
16458 working on getting everything to compile again: Cygwin, non-MULE,
16459 pdump. not there yet.
16460
16461 @code{mswindows-multibyte} is now defined using chain, and works.
16462 removed most vestiges of the @code{mswindows-multibyte} coding system
16463 type.
16464
16465 file-coding is on by default; should default to binary only on Unix.
16466 Need to test. (Needs to compile first :-)
16467
16468 @heading August 26, 2001
16469
16470 I've fixed the issue of inputting non-ASCII text under -nuni, and done
16471 some of the work on the Russian @key{C-x} problem -- we now compute the
16472 other possibilities. We still need to fix the key-lookup code, though,
16473 and that code is unfortunately a bit ugly. the best way, it seems, is
16474 to expand the command-builder structure so you can specify different
16475 interpretations for keys. (if we do find an alternative binding, though,
16476 we need to mess with both the command builder and this-command-keys, as
16477 does the function-key stuff. probably need to abstract that munging
16478 code.)
16479
16480 high-priority:
16481
16482 @table @strong
16483
16484 @item [currently doing]
16485
16486 @itemize
16487 @item
16488 support for @code{WM_IME_CHAR}. IME input can work under @code{-nuni}
16489 if we use @code{WM_IME_CHAR}. probably we should always be using this,
16490 instead of snarfing input using @code{WM_COMPOSITION}. i'll check this
16491 out.
16492
16493 @item
16494 Russian @key{C-x} problem. see above.
16495 @end itemize
16496
16497 @item [clean-up]
16498
16499 @itemize
16500 @item
16501 make sure it compiles and runs under non-mule. remember that some
16502 code needs the unicode support, or at least a simple version of it.
16503
16504 @item
16505 make sure it compiles and runs under pdump. see below.
16506
16507 @item
16508 clean up @code{mswindows-multibyte}, @code{TSTR_TO_C_STRING}. see
16509 below. [DONE]
16510
16511 @item
16512 eliminate last vestiges of codepage<->charset conversion and similar stuff.
16513 @end itemize
16514
16515 @item [other]
16516
16517 @itemize
16518 @item
16519 cut and paste. see below.
16520 @item
16521 misc issues with handling lang environments. see also August 25,
16522 "finally: working on the C-x in ...".
16523 @itemize
16524 @item
16525 when switching lang env, needs to set keyboard layout.
16526 @item
16527 user var to control whether, when moving into text of a
16528 particular language, we set the appropriate keyboard layout. we
16529 would need to have a lisp api for retrieving and setting the
16530 keyboard layout, set text properties to indicate the layout of
16531 text, and have a way of dealing with text with no property on
16532 it. (e.g. saved text has no text properties on it.) basically,
16533 we need to get a keyboard layout from a charset; getting a
16534 language would do. Perhaps we need a table that maps charsets
16535 to language environments.
16536 @item
16537 test that the lang env is properly set at startup. test that
16538 switching the lang env properly sets the C locale (call
16539 setlocale(), set LANG, etc.) -- a spawned subprogram should have
16540 the new locale in its environment.
16541 @end itemize
16542 @item
16543 look through everything below and see if anything is missed in this
16544 priority list, and if so add it. create a separate file for the
16545 priority list, so it can be updated as appropriate.
16546 @end itemize
16547 @end table
16548
16549 mid-priority:
16550
16551 @itemize
16552 @item
16553 clean up the chain coding system. its list should specify decode
16554 order, not encode; i now think this way is more logical. it should
16555 check the endpoints to make sure they make sense. it should also
16556 allow for the specification of "reverse-direction coding systems":
16557 use the specified coding system, but invert the sense of decode and
16558 encode.
16559
16560 @item
16561 along with that, places that take an arbitrary coding system and
16562 expect the ends to be anything specific need to check this, and add
16563 the appropriate conversions from byte->char or char->byte.
16564
16565 @item
16566 get some support for arabic, thai, vietnamese, japanese jisx 0212:
16567 at least get the unicode information in place and make sure we have
16568 things tied together so that we can display them. worry about r2l
16569 some other time.
16570 @end itemize
16571
16572 @heading August 25, 2001
16573
16574 There is actually more non-Unicode-ized stuff, but it's basically
16575 inconsequential. (See previous note.) You can check using the file
16576 nmkun.txt (#### RENAME), which is just a list of all the routines that
16577 have been split. (It was generated from the output of `nmake
16578 unicode-encapsulate', after removing everything from the output but
16579 the function names.) Use something like
16580
16581 @example
16582 fgrep -f ../nmkun.txt -w [a-hj-z]*.[ch] |m
16583 @end example
16584
16585 in the source directory, which does a word match and skips
16586 @file{intl-unicode-win32.[ch]} and @file{intl-win32.[ch]}, which have a
16587 whole lot of references to these, unavoidably. It effectively detects
16588 what needs to be changed because changed versions either begin
16589 @samp{qxe...} or end with A or W, and in each case there's no whole-word
16590 match.
16591
16592 The nasty bug has been fixed below. The @code{-nuni} option now works
16593 -- all specially-written code to handle the encapsulation has been
16594 tested by some operation (fonts by loadup and checking the output of
16595 @code{(list-fonts "")}; devmode by printing; dragdrop tests other
16596 stuff).
16597
16598 NOTE: for @code{-nuni} (Win 95), these areas need work:
16599
16600 @itemize
16601 @item
16602 cut and paste. we should be able to receive Unicode text if it's there,
16603 and we should be able to receive it even in Win 95 or @code{-nuni}. we
16604 should just check in all circumstances. also, under 95, when we put
16605 some text in the clipboard, it may or may not also be automatically
16606 enumerated as unicode. we need to test this out and/or just go ahead
16607 and manually do the unicode enumeration.
16608
16609 @item
16610 receiving keyboard input. we get only a single byte, but we should
16611 be able to correlate the language of the keyboard layout to a
16612 particular code page, so we can then decode it correctly.
16613
16614 @item
16615 @code{mswindows-multibyte}. still implemented as its own thing. should
16616 be done as a chain of (encoding) unicode | unicode-to-multibyte. need
16617 to turn this on, get it working, and look into optimizations in the dfc
16618 stuff. (#### perhaps there's a general way to do these optimizations???
16619 something like having a method on a coding system that can specify
16620 whether a pure-ASCII string gets rendered as pure-ASCII bytes and
16621 vice-versa.)
16622 @end itemize
16623
16624 ALSO:
16625
16626 @itemize
16627 @item
16628 we have special macros @code{TSTR_TO_C_STRING} and such because formerly
16629 the @samp{DFC} macros didn't know about external stuff that was Unicode
16630 encoded and would call @code{strlen()} on them. this is fixed, so now
16631 we should undo the special macros, make them normal, remove the comments
16632 about this, and make sure it works. [DONE]
16633
16634
16635 @item
16636 finally: working on the @kbd{C-x} in Russian key layout problem. in the
16637 process will probably end up doing work on cleaning up the handling
16638 of keyboard layouts, integrating or deleting the FSF stuff, adding
16639 code to change the keyboard layout as we move in and out of text in
16640 different languages (implemented as a post-command-hook; we need
16641 something like internal-post-command-hook if not already there, for
16642 internal stuff that doesn't want to get mixed up with the regular
16643 post-command-hook; similar for pre-command-hook). also, when
16644 langenv changes, ways to set the keyboard layout appropriately.
16645
16646 @item
16647 i think the stuff above is higher priority than the other stuff
16648 mentioned below. what i'm aiming for is to be able to input and
16649 work with multiple languages without weird glitches, both under 95
16650 and NT. the problems above are all basic impediments to such work.
16651 we assume for the moment that the user can make use of the existing
16652 file i/o conversion stuff, and put that lower in priority, after
16653 the basic input is working.
16654
16655 @item
16656 i should get my modem connected and write up what's going on and
16657 send it to the lists; also cvs commit my workspaces and get more
16658 testers.
16659 @end itemize
16660
16660 @heading August 24, 2001
16662
16663 All code has been Unicode-ized except for some stuff in console-msw.c
16664 that deals with console output. Much of the Unicode-encapsulation
16665 stuff, particularly the hand-written stuff, really needs testing. I
16666 added a new command-line option, @code{-nuni}, to force use of all ANSI
16667 calls -- @code{XE_UNICODEP} evaluates to false in this case.
16668
16669 There is a nasty bug that appeared recently, probably when the event
16670 code got Unicode-ized -- bad interactions with OS sticky modifiers.
16671 Hold the shift key down and release it, then instead of affecting the
16672 next char only, it gets permanently stuck on (until you do a regular
16673 shift+char stroke). This needs to be debugged.
16674
16675 Other things on agenda:
16676
16677 @itemize
16678 @item
16679 go through and prioritize what's listed below.
16680
16681 @item
16682 make sure the pdump code can compile and work. for the moment we
16683 just don't try to dump any Unicode tables and load them up each
16684 time. this is certainly fast but ...
16685
16686 @item
16687 there's the problem that XEmacs can't be run in a directory with
16688 non-ASCII/Latin-1 chars in it, since it will be doing Unicode processing
16689 before we've had a chance to load the tables. In fact, even finding the
16690 tables in such a situation is problematic using the normal commands. my
16691 idea is to eventually load the stuff extremely extremely early, at the
16692 same time as the pdump data gets loaded. in fact, the unicode table
16693 data (stored in an efficient binary format) can even be stuck into the
16694 pdump file (which would mean as a resource to the executable, for
16695 windows). we'd need to extend pdump a bit: to allow for attaching extra
16696 data to the pdump file. (something like @code{pdump_attach_extra_data
16697 (addr, length)} returns a number of some sort, an index into the file,
16698 which you can then retrieve with @code{pdump_load_extra_data()}, which
16699 returns an addr (@code{mmap()}ed or loaded), and later you
16700 @code{pdump_unload_extra_data()} when finished. we'd probably also need
16701 @code{pdump_attach_extra_data_append()}, which appends data to the data
16702 just written out with @code{pdump_attach_extra_data()}. this way,
16703 multiple tables in memory can be written out into one contiguous
16704 table. (we'd use the tar-like trick of allowing new blocks to be written
16705 without going back to change the old blocks -- we just rely on the end
16706 of file/end of memory.) this same mechanism could be extracted out of
16707 pdump and used to handle the non-pdump situation (or alternatively, we
16708 could just dump either the memory image of the tables themselves or the
16709 compressed binary version). in the case of extra unicode tables not
16710 known about at compile time that get loaded before dumping, we either
16711 just dump them into the image (pdump and all) or extract them into the
16712 compressed binary format, free the original tables, and treat them like
16713 all other tables.
16714
16715 @item
16716 @kbd{C-x b} when using a Russian keyboard layout. XEmacs currently
16717 tries to interpret @samp{C+cyrillic char}, which causes an error. We
16718 want @kbd{C-x b} to still work even when the keyboard normally generates
16719 Cyrillic. What we should do is expand the keyboard event structure so
16720 that it contains not only the actual char, but what the char would have
16721 been in various other keyboard layouts, and in contexts where only
16722 certain keystrokes make sense (creating control chars, and looking up in
16723 keymaps), we proceed in order, processing each of them until we get
16724 something. order should be something like: current keyboard layout;
16725 layout of the current language environment; layout of the user's default
16726 language; layout of the system default language; layout of US English.
16727
16728 @item
16729 reading and writing Unicode files. multiple problems:
16730
16731 @itemize
16732 @item
16733 EOL's aren't handled right. for the moment, just fix the
16734 Unicode coding systems; later on, create EOL-only coding
16735 systems:
16736
16737 @enumerate
16738 @item
16739 they would be character->character and operate next to the
16740 internal data; this means that coding systems need to be able
16741 to handle ends of lines that are either CR, LF, or CRLF.
16742 usually this isn't a problem, as they are just characters
16743 like any other and get encoded appropriately. however,
16744 coding systems that are line-oriented need to recognize any
16745 of the three as line endings.
16746
16747 @item
16748 we'd also have to complete the stuff that handles coding
16749 systems where either end can be byte or char (four
16750 possibilities total; use a single enum such as
16751 @code{ENCODES_CHAR_TO_BYTE}, @code{ENCODES_BYTE_TO_BYTE}, etc.).
16752
16753 @item
16754 we'd need ways of specifying the chaining of coding systems.
16755 e.g. when reading a coding system, a user can specify more
16756 than one with a | symbol between them. when a context calls
16757 for a coding system and a chain is needed, the `chain' coding
16758 system is useful; but we should really expand the contexts
16759 where a list of coding systems can be given, and whenever
16760 possible try to inline the chain instead of using a
16761 surrounding @code{chain} coding system.
16762
16763 @item
16764 the @code{chain} needs some work so that it passes all sorts of
16765 lstream commands down to the chain inside it -- it should be
16766 entirely transparent and the fact that there's actually a
16767 surrounding coding system should be invisible. more general
16768 coding system methods might need to be created.
16769
16770 @item
16771 important: we need a way of specifying how detecting works
16772 when we have more than one coding system. we might need more
16773 than a single priority list. need to think about this.
16774 @end enumerate
16775
16776 @item
16777 Unicode files beginning with the BOM are not recognized as such.
16778 we need to fix this; but to make things sensible, we really need
16779 to add the idea of different levels of confidence regarding
16780 what's detected. otherwise, Unicode says "yes this is me" but
16781 others higher up do too. in the process we should probably
16782 finish abstracting the detection system and fix up some
16783 stupidities in it.
16784
16785 @item
16786 When writing a file, we need error detection; otherwise somebody
16787 will create a Unicode file without realizing the coding system
16788 of the buffer is Raw, and then lose all the non-ASCII/Latin-1
16789 text when it's written out. We need two levels:
16790
16791 @enumerate
16792 @item
16793 first, a "safe-charset" level that checks before any actual
16794 encoding to see if all characters in the document can safely
16795 be represented using the given coding system. FSF has a
16796 "safe-charset" property of coding systems, but it's stupid
16797 because this information can be automatically derived from
16798 the coding system, at least the vast majority of the time.
16799 What we need is some sort of
16800 alternative-coding-system-precedence-list, langenv-specific,
16801 where everything on it can be checked for safe charsets and
16802 then the user given a list of possibilities. When the user
16803 does "save with specified encoding", they should see the same
16804 precedence list. Again like with other precedence lists,
16805 there's also a global one, and presumably all coding systems
16806 not on the other lists get appended to the end (and perhaps not
16807 checked at all when doing safe-checking?). safe-checking
16808 should work something like this: compile a list of all
16809 charsets used in the buffer, along with a count of chars
16810 used. that way, "slightly unsafe" charsets can perhaps be
16811 presented at the end, which will lose only a few characters
16812 and are perhaps what the users were looking for.
16813
16814 @item
16815 when actually writing out, we need error checking in case an
16816 individual char in a charset can't be written even though the
16817 charsets are safe. again, the user gets the choice of other
16818 reasonable coding systems.
16819
16820 @item
16821 same thing (error checking, list of alternatives, etc.) needs
16822 to happen when reading! all of this will be a lot of work!
16823 @end enumerate
16824 @end itemize
16825 @end itemize
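
For the @kbd{C-x b} item above, the fallback through alternative keyboard layouts might look roughly like this. It is a sketch only: @code{toy_key_event} and the two functions are invented names, not the real event structure, and the example assumes that the key at the QWERTY x position produces Cyrillic U+0447 in the Russian layout.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ALTERNATIVES 5

/* Hypothetical expanded key event: chars[0] is the character in the
   current keyboard layout; later slots hold what the same key would
   produce in the fallback layouts, in the order given in the text
   (language environment, user default, system default, US English).
   A zero slot means "no alternative recorded". */
struct toy_key_event {
    unsigned int chars[MAX_ALTERNATIVES];
};

/* Control characters are only defined here for ASCII '@'..'_' and
   'a'..'z'.  Returns -1 if ch has no control equivalent. */
static int control_char_for(unsigned int ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch -= 'a' - 'A';
    if (ch >= '@' && ch <= '_')
        return (int)(ch & 0x1F);
    return -1;
}

/* Try each interpretation in order until one yields a control char. */
int toy_event_to_control_char(const struct toy_key_event *ev)
{
    assert(ev != NULL);
    for (size_t i = 0; i < MAX_ALTERNATIVES; i++) {
        int c;
        if (ev->chars[i] == 0)
            continue;               /* nothing recorded for this layout */
        c = control_char_for(ev->chars[i]);
        if (c >= 0)
            return c;
    }
    return -1;
}
```

The same walk over alternatives would apply to keymap lookup: try each interpretation in order and stop at the first one that has a binding.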
16826
16827
16828
16829 @heading Announcement, August 20, 2001:
16830
16831 I'm looking for testers. There is a complete and fast implementation
16832 in C of Unicode conversion, translations for almost all of the
16833 standardly-defined charsets that load up automatically and
16834 instantaneously at runtime, coding systems supporting the common
16835 external representations of Unicode [utf-16, ucs-4, utf-8,
16836 little-endian versions of utf-16 and ucs-4; utf-7 is sitting there
16837 with abort[]s where the coding routines should go, just waiting for
16838 somebody to implement], and a nice set of primitives for translating
16839 characters<->codepoints and setting the priority lists used to control
16840 codepoint->char lookup.
16841
16842 It's so far hooked into one place: the Windows IME. Currently I can
16843 select the Japanese IME from the thing on my tray pad in the lower
16844 right corner of the screen, and type Japanese into XEmacs, and you get
16845 Japanese in XEmacs -- regardless of whether you set either your
16846 current or global system locale to Japanese, and regardless of whether
16847 you set your XEmacs lang env as Japanese. This should work for many
16848 other languages, too -- Cyrillic, Chinese either Traditional or
16849 Simplified, and many others, but YMMV. There may be some lurking
16850 bugs (hardly surprising for something so raw).
16851
16852 To get at this, checkout using `ben-mule-21-5', NOT the simpler
16853 `mule-21-5'. For example:
16854
16855 cvs -d :pserver:xemacs@@cvs.xemacs.org:/usr/CVSroot checkout -r ben-mule-21-5 xemacs
16856
16857 or you get the idea. the `-r ben-mule-21-5' is important.
16858
16859 I keep track of my progress in a file called README.ben-mule-21-5 in
16860 the root directory of the source tree.
16861
16862 WARNING: Pdump might not work. Will be fixed rsn.
16863
16864 @heading August 20, 2001
16865
16866 @itemize
16867 @item
16868 still need to sort out demand loading, binary format, etc. figure
16869 out what the goals are and how we're going to achieve them. for
16870 the moment let's just say that running XEmacs in a directory with
16871 Japanese or other weird characters in the name is likely to cause
16872 problems under MS Windows, but once XEmacs is initialized (and
16873 before processing init files), all Unicode support is there.
16874
16875 @item
16876 wrote the size computation routines, although not yet tested.
16877
16878 @item
16879 lots more abstraction of coding systems; almost done.
16880
16881 @item
16882 UNICODE WORKS!!!!!
16883 @end itemize
16884
16885 @heading August 19, 2001
16886
16887 Still needed on the Unicode support:
16888
16889 @itemize
16890 @item
16891 demand loading: load the Unicode table data the first time a
16892 conversion needs to be done.
16893
16894 @item
16895 maybe: table size computation: figure out how big the in-memory
16896 tables actually are.
16897
16898 @item
16899 maybe: create a space-efficient binary format for the data, and a
16900 way to dump out an existing charset's data into this binary format.
16901 it should allow for many such groups of data to be appended
16902 together in one file, such that you can just append the new data
16903 onto the end and not have to go back and modify anything
16904 previously. (like how tar archives work, and how the UDF file system for
16905 CD-R's and CD-RW's works.)
16906
16907 @item
16908 maybe: figure out how to be able to access the Unicode tables at
16909 @code{init_intl()} time, before we know how to get at data-directory;
16910 that way we can handle the need for unicode conversions that come up
16911 very early, for example if XEmacs is run from a directory containing
16912 Japanese in it. Presumably we'd want to generalize the stuff in
16913 @file{pdump.c} that deals with the dumper file, so that it can handle
16914 other files -- putting the file either in the directory of the
16915 executable or in a resource, maybe actually attached to the pdump file
16916 itself -- or maybe we just dump the data into the actual executable.
16917 With pdump we could extend pdump to allow for data that's in the pdump
16918 file but not actually mapped at startup, separate from the data that
16919 does get mapped -- and then at runtime the pointer gets restored not
16920 with a real pointer but an offset into the file; another pdump call and
16921 we get some way to access the data. (tricky because it might be in a
16922 resource, not a file. we might have to just tell pdump to mmap or
16923 whatever the data in, and then tell pdump to release it.)
16924
16925 @item
16926 fix multibyte to use unicode. at first, just reverse
16927 @code{mswindows-multibyte-to-unicode} to be @code{unicode-to-multibyte};
16928 later implement something in chain to allow for reversal, for declaring
16929 the ends of the coding systems, etc.
16930
16931 @item
16932 actually make sure that the IME stuff is working!!!
16933 @end itemize
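
The append-only binary format mentioned in the list above could be laid out along these lines. This is a sketch under assumptions: the 8-byte header (block id plus payload length, both little-endian) and the names @code{blocks_append} and @code{blocks_find} are invented for illustration, and a memory buffer stands in for the file. New blocks are written at the end without touching earlier ones, and a reader simply scans until the end of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEADER_SIZE 8   /* 4-byte id + 4-byte payload length */

static void put_u32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

static unsigned long get_u32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Append one block after the `used` bytes already in buf.  Nothing
   before `used` is modified.  Returns the new end offset, or 0 if the
   block does not fit in the capacity `cap`. */
size_t blocks_append(unsigned char *buf, size_t cap, size_t used,
                     unsigned long id, const void *payload, size_t len)
{
    assert(used <= cap);
    if (len > 0xFFFFFFFFUL
        || cap - used < HEADER_SIZE
        || cap - used - HEADER_SIZE < len)
        return 0;
    put_u32(buf + used, id);
    put_u32(buf + used + 4, (unsigned long)len);
    memcpy(buf + used + HEADER_SIZE, payload, len);
    return used + HEADER_SIZE + len;
}

/* Scan from the start for the first block with the given id.  There is
   no index or block count; the scan relies only on the end of the data.
   Returns a pointer to the payload and stores its length in *len_out,
   or returns NULL if no such block exists. */
const unsigned char *blocks_find(const unsigned char *buf, size_t used,
                                 unsigned long id, size_t *len_out)
{
    size_t pos = 0;
    while (used - pos >= HEADER_SIZE) {
        unsigned long block_id = get_u32(buf + pos);
        size_t len = (size_t)get_u32(buf + pos + 4);
        if (used - pos - HEADER_SIZE < len)
            break;                      /* truncated final block */
        if (block_id == id) {
            *len_out = len;
            return buf + pos + HEADER_SIZE;
        }
        pos += HEADER_SIZE + len;
    }
    return NULL;
}
```

Because no header or index at the front records the number of blocks, adding a table later never requires rewriting what is already there.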
16934
16935 Other things before announcing:
16936
16937 @itemize
16938 @item
16939 change so that the Unicode tables are not pdumped. This means we need
16940 to free any table data out there. Make sure that pdump compiles and try
16941 to finish the pretty-much-already-done stuff with
16942 @code{XD_STRUCT_ARRAY} and dynamic size computation; just need to see
16943 what's going on with @code{LO_LINK}.
16944 @end itemize
16945
16946 @heading August 14, 2001
16947
16948 To do a diff between this workspace and the mainline, use the most recent sync tags, currently:
16949
16950 @example
16951 cvs diff -r main-branch-ben-mule-21-5-aug-11-2001-sync -r ben-mule-21-5-post-aug-11-2001-sync
16952 @end example
16953
16954 Unicode support:
16955
16956 Unicode support is important for supporting many languages under
16957 Windows, such as Cyrillic, without resorting to translation tables for
16958 particular Windows-specific code pages. Internally, all characters in
16959 Windows can be represented in two encodings: code pages and Unicode.
16960 With Unicode support, we can seamlessly support all Windows
16961 characters. Currently, the test of progress toward Unicode support is
16962 whether IME input works properly, since it is being converted from Unicode.
16963
16964 Unicode support also requires that the various Windows API's be
16965 "Unicode-encapsulated", so that they automatically call the ANSI or
16966 Unicode version of the API call appropriately and handle the size
16967 differences in structures. What this means is:
16968
16969 @itemize
16970 @item
16971 first, note that Windows already provides a sort of encapsulation
16972 of all API's that deal with text. All such API's are underlyingly
16973 provided in two versions, with an A or W suffix (ANSI or "wide"
16974 i.e. Unicode), and the compile-time constant UNICODE controls which
16975 is selected by the unsuffixed API. Same thing happens with
16976 structures. Unfortunately, this is compile-time only, not
16977 run-time, so not sufficient. (Creating the necessary run-time
16978 encapsulation is not conceptually difficult, but very time-consuming to
16979 write. It adds no significant overhead, and the only reason it's
16980 not standard in Windows is conscious marketing attempts by
16981 Microsoft to cripple Windows 95. FUCK MICROSOFT! They even
16982 describe in a KnowledgeBase article exactly how to create such an
16983 API [although we don't exactly follow their procedure], and point
16984 out its usefulness; the procedure is also described more generally
16985 in Nadine Kano's book on Win32 internationalization -- written SIX
16986 YEARS AGO! Obviously Microsoft has such an API available
16987 internally.)
16988
16989 @item
16990 what we do is provide an encapsulation of each standard Windows API
16991 call that is split into A and W versions. current theory is to
16992 avoid all preprocessor games; so we name the function with a prefix
16993 -- "qxe" currently -- and require callers to use the prefixed name.
16994 Callers need to explicitly use the W version of all structures, and
16995 convert text themselves using @code{Qmswindows_tstr}. the qxe
16996 encapsulated version will automatically call the appropriate A or W
16997 version depending on whether we're running on 9x or NT, and copy
16998 data between W and A versions of the structures as necessary.
16999
17000 @item
17001 We require the caller to handle the actual translation of text to
17002 avoid possible overflow when dealing with fixed-size Windows
17003 structures. There are no such problems when copying data between
17004 the A and W versions because ANSI text is never larger than its
17005 equivalent Unicode representation.
17006
17007 @item
17008 We allow for incremental creation of the encapsulated routines by using
17009 the coding system @code{Qmswindows_tstr_notyet}. This is an alias for
17010 @code{Qmswindows_multibyte}, i.e. it always converts to ANSI; but it
17011 indicates that it will be changed to @code{Qmswindows_tstr} when we have
17012 a qxe version of the API call that the data is being passed to and
17013 change the code to use the new function.
17014 @end itemize
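
The run-time dispatch described in the list above can be sketched in portable C. This is only an illustration under stated assumptions, not the actual qxe code: every name beginning with @code{sketch_} is hypothetical, the two "API" functions are stand-ins rather than real Windows calls, and the flag replaces whatever 9x-versus-NT check the real code performs. As in the notes, the caller is assumed to have already converted the text to the encoding that matches the running system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the run-time "9x or NT" check. */
static int sketch_running_on_nt = 1;

/* Hypothetical stand-ins for the A and W versions of one API call.
   Each returns the length of its argument in characters. */
static size_t sketch_lengthA(const char *s) { return strlen(s); }
static size_t sketch_lengthW(const wchar_t *s) { return wcslen(s); }

/* Encapsulated entry point.  The caller has already converted the text
   to the encoding that matches the running system, so the wrapper only
   has to choose the A or W entry point at run time. */
static size_t sketch_qxe_length(const void *text)
{
    assert(text != NULL);
    if (sketch_running_on_nt)
        return sketch_lengthW((const wchar_t *) text);
    return sketch_lengthA((const char *) text);
}
```

Because the choice is made inside the wrapper on every call, one binary can serve both kinds of system, which a compile-time constant such as UNICODE cannot do.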
17015
17016 Besides creating the encapsulation, the following needs to be done for
17017 Unicode support:
17018
17019 @itemize
17020 @item
17021 No actual translation tables are fed into XEmacs. We need to
17022 provide glue code to read the tables in @file{etc/unicode}. See
17023 @file{etc/unicode/README} for the interface to implement.
17024
17025 @item
17026 Fix pdump. The translation tables for Unicode characters function as
17027 unions of structures with different numbers of indirection levels, in
17028 order to be efficient. pdump doesn't yet support such unions.
17029 @file{charset.h} has a general description of how the translation tables
17030 work, and the pdump code has constants added for the new required data
17031 types, and descriptions of how these should work.
17032
17033 @item
17034 ultimately, there's no end to additional work (composition, bidi
17035 reordering, glyph shaping/ordering, etc.), but the above is enough
17036 to get basic translation working.
17037 @end itemize
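
As a rough picture of what "unions of structures with different numbers of indirection levels" can mean, here is a minimal C sketch. It is not the layout described in @file{charset.h}; the type names, the 256-entry table size, and the use of 0 for "no mapping" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag saying how many levels of indirection a table has. */
enum sketch_levels { SKETCH_ONE_LEVEL = 1, SKETCH_TWO_LEVEL = 2 };

struct sketch_table {
    enum sketch_levels levels;
    union {
        /* One level: 256 entries indexed directly by the code. */
        const unsigned short *one;
        /* Two levels: 256 row pointers (NULL for an empty row),
           each row holding 256 entries. */
        const unsigned short *const *two;
    } u;
};

/* Look up CODE; 0 means "no mapping" in this sketch. */
static unsigned short sketch_lookup(const struct sketch_table *t,
                                    unsigned int code)
{
    assert(t != NULL && code <= 0xFFFF);
    if (t->levels == SKETCH_ONE_LEVEL)
        return code < 256 ? t->u.one[code] : 0;
    const unsigned short *row = t->u.two[code >> 8];
    return row != NULL ? row[code & 0xFF] : 0;
}
```

A dumper that walks such a structure has to read the tag before it knows which union member holds a valid pointer, which is the kind of support the item above says pdump lacks.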
17038
17039 Merging this workspace into the trunk requires some work. ChangeLogs
17040 have not yet been created. Also, there is a lot of additional code in
17041 this workspace other than just Windows and Unicode stuff. Some of the
17042 changes have been somewhat disruptive to the code base, in particular:
17043
17044 @itemize
17045 @item
17046 the code that handles the details of processing multilingual text has
17047 been consolidated to make it easier to extend it. it has been yanked
17048 out of various files (@file{buffer.h}, @file{mule-charset.h},
17049 @file{lisp.h}, @file{insdel.c}, @file{fns.c}, @file{file-coding.c},
17050 etc.) and put into @file{text.c} and @file{text.h}.
17051 @file{mule-charset.h} has also been renamed @file{charset.h}. all long
17052 comments concerning the representations and their processing have been
17053 consolidated into @file{text.c}.
17054
17055 @item
17056 @file{nt/config.h} has been eliminated and everything in it merged into
17057 @file{config.h.in} and @file{s/windowsnt.h}. see @file{config.h.in} for
17058 more info.
17059
17060 @item
17061 @file{s/windowsnt.h} has been completely rewritten, and
17062 @file{s/cygwin32.h} and @file{s/mingw32.h} have been largely rewritten.
17063 tons of dead weight has been removed, and stuff common to more than one
17064 file has been isolated into @file{s/win32-common.h} and
17065 @file{s/win32-native.h}, similar to what's already done for usg
17066 variants.
17067
17068 @item
17069 large amounts of code throughout the code base have been Mule-ized,
17070 not just Windows code.
17071
17072 @item
17073 @file{file-coding.c/.h} have been largely rewritten (although still
17074 mostly syncable); see below.
17075 @end itemize
17076
17077
17078 @heading June 26, 2001
17079
17080 ben-mule-21-5
17081
17082 this contains all the mule work i've been doing. this includes mostly
17083 work done to get mule working under ms windows, but in the process
17084 i've [of course] fixed a whole lot of other things as well, mostly
17085 mule issues. the specifics:
17086
17087 @itemize
17088 @item
17089 it compiles and runs under windows and should basically work. the
17090 stuff remaining to do is (a) improved unicode support (see below)
17091 and (b) smarter handling of keyboard layouts. in particular, it
17092 should (1) set the right keyboard layout when you change your
17093 language environment; (2) optionally (a user var) set the
17094 appropriate keyboard layout as you move the cursor into text in a
17095 particular language.
17096
17097 @item
17098 i added a bunch of code to better support OS locales. it tries to
17099 notice your locale at startup and set the language environment
17100 accordingly (this more or less works), and call setlocale() and set
17101 LANG when you change the language environment (may or may not work).
17102
17103 @item
17104 major rewriting of file-coding. it's mostly abstracted into coding
17105 systems that are defined by methods (similar to devices and
17106 specifiers), with the ultimate aim being to allow non-i18n coding
17107 systems such as gzip. there is a "chain" coding system that allows
17108 multiple coding systems to be chained together. (it doesn't yet
17109 have the concept that either end of a coding system can be bytes or
17110 chars; this needs to be added.)
17111
17112 @item
17113 unicode support. very raw. a few days ago i wrote a complete and
17114 efficient implementation of unicode translation. it should be very
17115 fast, and fairly memory-efficient in its tables. it allows for
17116 charset priority lists, which should be language-environment
17117 specific (but i haven't yet written the glue code). it works in
17118 preliminary testing, but obviously needs more testing and work.
17119 as of yet there is no translation data added for the standard charsets.
17120 the tables are in etc/unicode, and all we need is a bit of glue code
17121 to process them. see etc/unicode/README for the interface to
17122 implement.
17123
17124 @item
17125 support for unicode in windows is partly there. this will work even
17126 on windows 95. the basic model is implemented but it needs finishing
17127 up.
17128
17129 @item
17130 there is a preliminary implementation of windows ime support courtesy
17131 of ikeyama.
17132
17133 @item
17134 if you want to get cyrillic working under windows (it appears to "work"
17135 but the wrong chars currently appear), the best way is to add unicode
17136 support for iso-8859-5 and use it in redisplay-msw.c. we are already
17137 passing unicode codepoints to the text-draw routine (ExtTextOutW).
17138 (ExtTextOutW and GetTextExtentPoint32W are implemented on both 95 and NT.)
17139
17140 @item
17141 i fixed the iso2022 handling so it will correctly read in files
17142 containing unknown charsets, creating a "temporary" charset which can
17143 later be overwritten by the real charset when it's defined. this allows
17144 iso2022 elisp files with literals in strange languages to compile
17145 correctly under mule. i also added a hack that will correctly read in
17146 and write out the emacs-specific "composition" escape sequences,
17147 i.e. @samp{ESC 0} through @samp{ESC 4}. this means that my workspace correctly
17148 compiles the new file @file{devanagari.el} that i added (see below).
17149
17150 @item
17151 i copied the remaining language-specific files from fsf. i made
17152 some minor changes in certain cases but for the most part the stuff
17153 was just copied and may not work.
17154
17155 @item
17156 i fixed @code{post-read-conversion} in coding systems to follow fsf
17157 conventions. (i also support our convention, for the moment. a
17158 kludge, of course.)
17159
17160 @item
17161 @code{make-coding-system} accepts (but ignores) the additional properties
17162 present in the fsf version, for compatibility.
17163 @end itemize
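
The item above about coding systems "defined by methods" and the "chain" coding system can be illustrated with a small C sketch. This is not the @file{file-coding.c} interface: the struct, the function names, and the two toy conversions are hypothetical, and each stage here is a simple in-place bytes-to-bytes transformation, which sidesteps the bytes-versus-chars question the notes mention.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method table: each coding system supplies its own
   conversion routine, here an in-place byte transformation. */
struct sketch_coding_system {
    const char *name;
    void (*convert)(char *buf, size_t len);
};

/* Toy conversion: upcase every byte. */
static void sketch_upcase(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (char) toupper((unsigned char) buf[i]);
}

/* Toy conversion: reverse the buffer. */
static void sketch_reverse(char *buf, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}

/* A "chain" runs its member coding systems in order over the data. */
static void sketch_chain_convert(const struct sketch_coding_system *chain,
                                 size_t count, char *buf, size_t len)
{
    assert(chain != NULL && buf != NULL);
    for (size_t i = 0; i < count; i++)
        chain[i].convert(buf, len);
}
```

Since each member is reached only through its method pointer, a non-i18n stage (the notes give gzip as the goal) could in principle be slotted into the same chain without the chain code knowing about it.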
17164
14162 17165
14163 17166
14164 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Multilingual Support, Top 17167 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Multilingual Support, Top
14165 @chapter Consoles; Devices; Frames; Windows 17168 @chapter Consoles; Devices; Frames; Windows
14166 @cindex consoles; devices; frames; windows 17169 @cindex consoles; devices; frames; windows
17398 @item tty_name 20401 @item tty_name
17399 The name of the terminal that the subprocess is using, 20402 The name of the terminal that the subprocess is using,
17400 or @code{nil} if it is using pipes. 20403 or @code{nil} if it is using pipes.
17401 @end table 20404 @end table
17402 20405
20406 @menu
20407 * Ben's separate stderr notes:: Probably obsolete.
20408 @end menu
20409
20410
20411 @node Ben's separate stderr notes, , , Subprocesses
20412 @subsection Ben's separate stderr notes (probably obsolete)
20413
20414 This node contains some notes that Ben kept on his separate subprocess
20415 workspace. These notes probably describe changes and features that have
20416 already been included in XEmacs 21.5; somebody should check and/or ask
20417 Ben.
20418
20419 @heading ben-separate-stderr-improved-error-trapping
20420
20421 this is an old workspace, very close to being done, containing
20422
20423 @itemize
20424 @item
20425 subprocess stderr output can be read separately; needed to fully
20426 implement call-process with asynch. subprocesses.
20427
20428 @item
20429 huge improvements to the internal error-trapping routines (i.e. the
20430 routines that call Lisp code and trap errors); Lisp code can now be
20431 called from within redisplay.
20432
20433 @item
20434 cleanup and simplification of C-g handling; some things work now
20435 that never used to.
20436
20437 @item
20438 see the ChangeLogs in the workspace.
20439 @end itemize
20440
20441
17403 @node Interface to MS Windows, Interface to the X Window System, Subprocesses, Top 20442 @node Interface to MS Windows, Interface to the X Window System, Subprocesses, Top
17404 @chapter Interface to MS Windows 20443 @chapter Interface to MS Windows
17405 @cindex MS Windows, interface to 20444 @cindex MS Windows, interface to
17406 @cindex Windows, interface to 20445 @cindex Windows, interface to
17407 20446
17408 @menu 20447 @menu
17409 * Different kinds of Windows environments:: 20448 * Different kinds of Windows environments::
17410 * Windows Build Flags:: 20449 * Windows Build Flags::
17411 * Windows I18N Introduction:: 20450 * Windows I18N Introduction::
17412 * Modules for Interfacing with MS Windows:: 20451 * Modules for Interfacing with MS Windows::
20452 * CHANGES from 21.4-windows branch:: Probably obsolete.
17413 @end menu 20453 @end menu
17414 20454
17415 @node Different kinds of Windows environments, Windows Build Flags, Interface to MS Windows, Interface to MS Windows 20455 @node Different kinds of Windows environments, Windows Build Flags, Interface to MS Windows, Interface to MS Windows
17416 @section Different kinds of Windows environments 20456 @section Different kinds of Windows environments
17417 @cindex different kinds of Windows environments 20457 @cindex different kinds of Windows environments
17873 definition with a call to the macro XETEXT. This appropriately makes a 20913 definition with a call to the macro XETEXT. This appropriately makes a
17874 string of either regular or wide chars, which is to say this string may be 20914 string of either regular or wide chars, which is to say this string may be
17875 prepended with an L (causing it to be a wide string) depending on 20915 prepended with an L (causing it to be a wide string) depending on
17876 XEUNICODE_P. 20916 XEUNICODE_P.
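
A macro of this kind can be sketched as follows. This is not the actual definition of XETEXT: @code{SKETCH_TEXT}, @code{SKETCH_UNICODE_P} and the other @code{sketch_} names are hypothetical stand-ins, and the only technique shown is pasting an @code{L} prefix onto a string literal when a compile-time flag is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* SKETCH_UNICODE_P is a hypothetical stand-in for XEUNICODE_P. */
#ifdef SKETCH_UNICODE_P
typedef wchar_t sketch_tchar;
# define SKETCH_TEXT(s) L##s   /* "abc" becomes L"abc" */
#else
typedef char sketch_tchar;
# define SKETCH_TEXT(s) s      /* "abc" stays "abc" */
#endif

/* Characters in an array built from a literal, excluding the terminator. */
#define SKETCH_LITERAL_LENGTH(lit) (sizeof(lit) / sizeof((lit)[0]) - 1)

static const sketch_tchar sketch_class_name[] = SKETCH_TEXT("SketchFrame");

/* Returns the character count, which is the same in both builds. */
static size_t sketch_class_name_length(void)
{
    size_t n = SKETCH_LITERAL_LENGTH(sketch_class_name);
    assert(sketch_class_name[n] == 0);
    return n;
}
```

Code written against @code{sketch_tchar} and @code{SKETCH_TEXT} compiles unchanged in either mode; only the element size of the literal differs.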
17877 20917
17878 @node Modules for Interfacing with MS Windows, , Windows I18N Introduction, Interface to MS Windows 20918 @node Modules for Interfacing with MS Windows, CHANGES from 21.4-windows branch, Windows I18N Introduction, Interface to MS Windows
17879 @section Modules for Interfacing with MS Windows 20919 @section Modules for Interfacing with MS Windows
17880 @cindex modules for interfacing with MS Windows 20920 @cindex modules for interfacing with MS Windows
17881 @cindex interfacing with MS Windows, modules for 20921 @cindex interfacing with MS Windows, modules for
17882 @cindex MS Windows, modules for interfacing with 20922 @cindex MS Windows, modules for interfacing with
17883 @cindex Windows, modules for interfacing with 20923 @cindex Windows, modules for interfacing with
17934 @item intl-auto-encap-win32.c 20974 @item intl-auto-encap-win32.c
17935 Auto-generated Unicode encapsulation functions 20975 Auto-generated Unicode encapsulation functions
17936 @item intl-auto-encap-win32.h 20976 @item intl-auto-encap-win32.h
17937 Auto-generated Unicode encapsulation headers 20977 Auto-generated Unicode encapsulation headers
17938 @end table 20978 @end table
20979
20980
20981 @node CHANGES from 21.4-windows branch, , Modules for Interfacing with MS Windows, Interface to MS Windows
20982 @section CHANGES from 21.4-windows branch (probably obsolete)
20983
20984 This node contains the @file{CHANGES-msw} log that Andy Piper kept while
20985 he was maintaining the Windows branch of 21.4. These changes have
20986 (presumably) long since been merged to both 21.4 and 21.5, but let's not
20987 throw the list away yet.
20988
20989 @heading CHANGES-msw
20990
20991 This file briefly describes all mswindows-specific changes to XEmacs
20992 in the OXYMORON series of releases. The mswindows release branch
20993 contains additional changes on top of the mainline XEmacs
20994 release. These changes are deemed necessary for XEmacs to be fully
20995 functional under mswindows. It is not intended that these changes
20996 cause problems on UNIX systems, but they have not been tested on UNIX
20997 platforms. Caveat Emptor.
20998
20999 See the file @file{CHANGES-release} for a full list of mainline changes.
21000
21001 @heading to XEmacs 21.4.9 "Informed Management (Windows)"
21002
21003 @itemize
21004 @item
21005 Fix layout of widgets so that the search dialog works.
21006
21007 @item
21008 Fix focus capture of widgets under X.
21009 @end itemize
21010
21011 @heading to XEmacs 21.4.8 "Honest Recruiter (Windows)"
21012
21013 @itemize
21014 @item
21015 All changes from 21.4.6 and 21.4.7.
21016
21017 @item
21018 Make sure revert temporaries are not visiting files. Suggested by
21019 Mike Alexander.
21020
21021 @item
21022 File renaming fix from Mathias Grimmberger.
21023
21024 @item
21025 Fix printer metrics on windows 95 from Jonathan Harris.
21026
21027 @item
21028 Fix layout of widgets so that the search dialog works.
21029
21030 @item
21031 Fix focus capture of widgets under X.
21032
21033 @item
21034 Buffers tab doc fixes from John Palmieri.
21035
21036 @item
21037 Sync with FSF custom @code{:set-after} behavior.
21038
21039 @item
21040 Virtual window manager freeze fix from Rick Rankin.
21041
21042 @item
21043 Fix various printing problems.
21044
21045 @item
21046 Enable windows printing on cygwin.
21047 @end itemize
21048
21049 @heading to XEmacs 21.4.7 "Economic Science (Windows)"
21050
21051 @itemize
21052 @item
21053 All changes from 21.4.6.
21054
21055 @item
21056 Fix problems with auto-revert with noconfirm.
21057
21058 @item
21059 Undo autoconf 2.5x changes.
21060
21061 @item
21062 Undo 21.4.7 process change.
21063 @end itemize
21064
21065 @heading to XEmacs 21.4.6 "Common Lisp (Windows)"
21066
21067 @itemize
21068 @item
21069 Made native registry entries match the installer.
21070
21071 @item
21072 Fixed mousewheel lockups.
21073
21074 @item
21075 Frame iconification fix from Adrian Aichner.
21076
21077 @item
21078 Fixed some printing problems.
21079
21080 @item
21081 Netinstaller updated to support kit revisions.
21082
21083 @item
21084 Fixed customize popup menus.
21085
21086 @item
21087 Fixed problems with too many dialog popups.
21088
21089 @item
21090 Netinstaller fixed to correctly upgrade shortcuts when upgrading
21091 core XEmacs.
21092
21093 @item
21094 Fix for virtual window managers from Adrian Aichner.
21095
21096 @item
21097 Installer registers all C++ file types.
21098
21099 @item
21100 Short-filename fix from Peter Arius.
21101
21102 @item
21103 Fix for GC assertions from Adrian Aichner.
21104
21105 @item
21106 Winclient DDE client from Alastair Houghton.
21107
21108 @item
21109 Fix event assert from Mike Alexander.
21110
21111 @item
21112 Warning removal noticed by Ben Wing.
21113
21114 @item
21115 Redisplay glyph height fix from Ben Wing.
21116
21117 @item
21118 Printer margin fix from Jonathan Harris.
21119
21120 @item
21121 Error dialog fix suggested by Thomas Vogler.
21122
21123 @item
21124 Fixed revert-buffer to not revert in the case that there is
21125 nothing to be done.
21126
21127 @item
21128 Glyph-baseline fix from Nix.
21129
21130 @item
21131 Fixed clipping of wide glyphs in non-zero-length extents.
21132
21133 @item
21134 Windows build fixes.
21135
21136 @item
21137 Fixed @code{:initial-focus} so that it works.
21138 @end itemize
21139
21140 @heading to XEmacs 21.4.5 "Civil Service (Windows)"
21141
21142 @itemize
21143 @item
21144 Fixed a scrollbar problem when selecting the frame with focus.
21145
21146 @item
21147 Fixed @code{mswindows-shell-execute} under cygwin.
21148
21149 @item
21150 Added a new function @code{mswindows-cygwin-to-win32-path} for JDE.
21151
21152 @item
21153 Added support for dialog-based directory selection.
21154
21155 @item
21156 The installer version has been updated to the 21.5 netinstaller. The 21.5
21157 installer now does proper dde file association and adds uninstall
21158 capability.
21159
21160 @item
21161 Handle leak fix from Mike Alexander.
21162
21163 @item
21164 New release build script.
21165 @end itemize
21166
21167
17939 21168
17940 @node Interface to the X Window System, Dumping, Interface to MS Windows, Top 21169 @node Interface to the X Window System, Dumping, Interface to MS Windows, Top
17941 @chapter Interface to the X Window System 21170 @chapter Interface to the X Window System
17942 @cindex X Window System, interface to the 21171 @cindex X Window System, interface to the
17943 21172