List of changes in new Mule workspace:
--------------------------------------

Deleted files:

src/iso-wide.h
src/mule-charset.h
src/mule.c
src/ntheap.h
src/syscommctrl.h
lisp/files-nomule.el
lisp/help-nomule.el
lisp/mule/mule-help.el
lisp/mule/mule-init.el
lisp/mule/mule-misc.el
nt/config.h

Other deleted files, all zero-length and accidentally present:

src/events-mod.h
tests/Dnd/README.OffiX
tests/Dnd/dragtest.el
netinstall/README.xemacs
lib-src/srcdir-symlink.stamp

New files:

CHANGES-ben-mule
README.ben-mule-21-5
README.ben-separate-stderr
TODO.ben-mule-21-5
etc/TUTORIAL.{cs,es,nl,sk,sl}
etc/unicode/*
lib-src/make-mswin-unicode.pl
lisp/code-init.el
lisp/resize-minibuffer.el
lisp/unicode.el
lisp/mule/china-util.el
lisp/mule/cyril-util.el
lisp/mule/devan-util.el
lisp/mule/devanagari.el
lisp/mule/ethio-util.el
lisp/mule/indian.el
lisp/mule/japan-util.el
lisp/mule/korea-util.el
lisp/mule/lao-util.el
lisp/mule/lao.el
lisp/mule/mule-locale.txt
lisp/mule/mule-msw-init.el
lisp/mule/thai-util.el
lisp/mule/thai.el
lisp/mule/tibet-util.el
lisp/mule/tibetan.el
lisp/mule/viet-util.el
src/charset.h
src/intl-auto-encap-win32.c
src/intl-auto-encap-win32.h
src/intl-encap-win32.c
src/intl-win32.c
src/intl-x.c
src/mule-coding.c
src/text.c
src/text.h
src/unicode.c
src/s/win32-common.h
src/s/win32-native.h

gzip support:

-- new coding system `gzip' (bytes -> bytes); unfortunately, not quite
   working yet because it handles only the raw zlib format and not the
   higher-level gzip format (the zlib library is brain-damaged in that it
   provides low-level, stream-oriented APIs only for raw zlib, and for
   gzip you have only high-level APIs, which aren't useful for xemacs).
-- configure support (with-zlib).
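
The difference between the two container formats shows up in the first two
bytes of a stream. The following is a minimal sketch, not code from this
workspace: sketch_classify_stream() and the enum are hypothetical names, and
the header rules are taken from RFC 1950 (zlib) and RFC 1952 (gzip).

```c
#include <stddef.h>

/* Hypothetical names for illustration; not part of XEmacs or zlib. */
enum sketch_stream_kind { SKETCH_UNKNOWN, SKETCH_ZLIB, SKETCH_GZIP };

/* Classify a compressed stream by its first two bytes.
   gzip (RFC 1952) starts with the magic bytes 0x1f 0x8b.
   zlib (RFC 1950) starts with CMF and FLG, where the low nibble of CMF
   is 8 (deflate) and (CMF * 256 + FLG) is a multiple of 31. */
enum sketch_stream_kind
sketch_classify_stream (const unsigned char *data, size_t len)
{
  if (len < 2)
    return SKETCH_UNKNOWN;
  if (data[0] == 0x1f && data[1] == 0x8b)
    return SKETCH_GZIP;
  if ((data[0] & 0x0f) == 8 && ((data[0] << 8) | data[1]) % 31 == 0)
    return SKETCH_ZLIB;
  return SKETCH_UNKNOWN;
}
```

A coding system that understands only the zlib framing would accept a stream
of the second kind and reject one of the first.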

configure changes:

- file-coding always compiled in. eol detection is off by default on unix,
  non-mule, but can be enabled with configure option
  --with-default-eol-detection or command-line flag -eol.
- code that selects which files are compiled is mostly moved to
  Makefile.in.in. see comment in Makefile.in.in.
- vestigial i18n3 code deleted.
- new cygwin mswin libs imm32 (input methods), mpr (user name enumeration).
- check for link, symlink.
- vfork-related code deleted.
- fix configure.usage. (delete --with-file-coding, --no-doc-file, add
  --with-default-eol-detection, --quick-build).
- nt/config.h has been eliminated and everything in it merged into
  config.h.in and s/windowsnt.h. see config.h.in for more info.
- massive rewrite of s/windowsnt.h, m/windowsnt.h, s/cygwin32.h,
  s/mingw32.h. common code moved into s/win32-common.h, s/win32-native.h.
- in nt/xemacs.mak and nt/config.inc.samp, variable is called MULE, not
  HAVE_MULE, for consistency with sources.
- define TABDLY, TAB3 in freebsd.h (#### from where?)

Tutorial:

- massive rewrite; sync to FSF 21.0.106, switch focus to window systems,
  new sections on terminology and multiple frames, lots of fixes for
  current xemacs idioms.
- German version from Adrian mostly matching my changes.
- copy new tutorials from FSF (Spanish, Dutch, Slovak, Slovenian, Czech);
  not updated yet though.
- eliminate help-nomule.el and mule-help.el; merge into one single tutorial
  function, fix lots of problems, put back in help.el where it belongs.
  (there was some random junk in help-nomule -- string-width and make-char.
  string-width is now in subr.el with a single definition, and make-char in
  text.c.)

Sample init file:

- remove forward/backward buffer code, since it's now standard.
- when disabling C-x C-c, make it display a message saying how to exit, not
  just beep and complain "undefined".

Key bindings: (keymap.c, keydefs.el, help.el, etc.)

- M-home, M-end now move forward and backward in buffers; with Shift, stay
  within current group (e.g. all C files; same grouping as the gutter
  tabs). (bindings switch-to-{next/previous}-buffer[-in-group] in files.el)
- needed to move code from gutter-items.el to buff-menu.el that's used by
  these bindings, since gutter-items.el is loaded only when the gutter is
  active and these bindings (and hence the code) are not (any more) gutter
  specific.
- new global vars global-tty-map and global-window-system-map specify key
  bindings for use only on TTYs or window systems, respectively. this is
  used to make ESC ESC be keyboard-quit on window systems, but ESC ESC ESC
  on TTYs, where Meta + arrow keys may appear as ESC ESC O A or whatever.
  C-z on window systems is now zap-up-to-char, and iconify-frame is moved
  to C-Z. ESC ESC is isearch-quit. (isearch-mode.el)
- document global-{tty,window-system}-map in various places; display them
  when you do C-h b.
- fix up function documentation in general for keyboard primitives.
  e.g. key-bindings now contains a detailed section on the steps prior to
  looking up in keymaps, i.e. function-key-map,
  keyboard-translate-table, etc. define-key and other obvious starting
  points indicate where to look for more info.
- eliminate use and mention of grody advertised-undo and
  deprecated-help. (simple.el, startup.el, picture.el, menubar-items.el)

gnuclient, gnuserv:

- clean up headers a bit.
- use proper ms win idiom for checking for temp directory (TEMP or TMP, not
  TMPDIR).

throughout XEmacs sources:

- all #ifdef FILE_CODING statements removed from code.

I/O:

- use PATH_MAX consistently instead of MAXPATHLEN, MAX_PATH, etc.
- all code that does preprocessor games with C lib I/O functions (open,
  read) has been removed. The code has been changed to call the correct
  function directly. Functions that accept Intbyte * arguments for
  filenames and such and do automatic conversion to or from external format
  are prefixed qxe...(). Functions that retry in case of EINTR are
  prefixed retry_...(). DONT_ENCAPSULATE is long gone.
- never call getcwd() any more. use our shadowed value always.
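
The retry-on-EINTR idiom behind the retry_...() prefix can be sketched as
follows. This is an illustration under assumptions, not the workspace's code:
sketch_retry_read() is a hypothetical name, and the real functions may differ
in name and signature.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a retry_...() style wrapper around read(). */
ssize_t
sketch_retry_read (int fd, void *buf, size_t count)
{
  ssize_t n;

  /* Repeat the call for as long as it fails only because a signal
     interrupted it; any other result is passed through unchanged. */
  do
    n = read (fd, buf, count);
  while (n == -1 && errno == EINTR);
  return n;
}
```

Callers then use the wrapper wherever they would have called read()
directly, with no preprocessor redefinition of the library name involved.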

Strings:

- new qxe() string functions that accept Intbyte * as arguments. These
  work exactly like the standard strcmp(), strcpy(), sprintf(), etc. except
  for the argument declaration differences. We use these whenever we have
  Intbyte * strings, which is quite often.
- new fun build_intstring() takes an Intbyte *. also new funs
  build_msg_intstring() (like build_intstring()) and build_msg_string()
  (like build_string()) to do a GETTEXT() before building the
  string. (elimination of old build_translated_string(), replaced by
  build_msg_string()).
- the doprnt.c external entry points have been completely rewritten to be
  more useful and have more sensible names. We now have, for example,
  versions that work exactly like sprintf() but return a malloc()ed string.
- function intern_int() for Intbyte * arguments, like intern().
- numerous places throughout code where char * replaced with something
  else, e.g. Char_ASCII *, Intbyte *, Char_Binary *, etc. same with
  unsigned char *, going to UChar_Binary *, etc.
- code in print.c that handles stdout, stderr rewritten.
- places that print to stderr directly replaced with stderr_out().
- new convenience functions write_fmt_string(), write_fmt_string_lisp(),
  stderr_out_lisp(), write_string().
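
As an illustration of "works like sprintf() but returns a malloc()ed string",
here is a minimal sketch in portable C99. The name sketch_sprintf_malloc() is
hypothetical; the actual doprnt.c entry points are not shown in this file and
may be structured quite differently.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name.  Returns a freshly malloc()ed, formatted string, or
   NULL on failure; the caller frees the result. */
char *
sketch_sprintf_malloc (const char *fmt, ...)
{
  va_list args;
  int needed;
  char *out;

  /* First pass: a NULL buffer of size 0 makes vsnprintf() report the
     length the formatted text would have, excluding the final NUL. */
  va_start (args, fmt);
  needed = vsnprintf (NULL, 0, fmt, args);
  va_end (args);
  if (needed < 0)
    return NULL;

  out = (char *) malloc ((size_t) needed + 1);
  if (out == NULL)
    return NULL;

  /* Second pass: format into the exactly-sized buffer. */
  va_start (args, fmt);
  vsnprintf (out, (size_t) needed + 1, fmt, args);
  va_end (args);
  return out;
}
```

The two-pass approach avoids guessing a buffer size, at the cost of
formatting the arguments twice.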

Allocation, Objects, Lisp Interpreter:

- automatically use "managed lcrecord" code when allocating. any lcrecord
  can be put on a free list with free_lcrecord().
- record_unwind_protect() returns the old spec depth.
- unbind_to() now takes only one arg. use unbind_to_1() if you want the
  2-arg version, with GC protection of second arg.
- new funs to easily inhibit GC. ({begin,end}_gc_forbidden()) use them in
  places where gc is currently being inhibited in a more ugly fashion.
  also, we disable GC in certain strategic places where string data is
  often passed in, e.g. dfc functions, print functions.
- major improvements to eistring code, fleshing out of missing funs.
- make_buffer() -> wrap_buffer() for consistency with other objects; same
  for make_frame() -> wrap_frame() and make_console() -> wrap_console().
- better documentation in condition-case.
- new convenience funs record_unwind_protect_freeing() and
  record_unwind_protect_freeing_dynarr() for conveniently setting up an
  unwind-protect to xfree() or Dynarr_free() a pointer.
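
The point of having record_unwind_protect() return the old depth is that the
caller can pass that value straight to a one-argument unbind_to(). The toy
unwind stack below shows only the shape of that idiom. Every name in it is
hypothetical, and it makes no attempt to model Lisp objects, specbind or GC
protection.

```c
#include <assert.h>
#include <stddef.h>

/* A toy unwind stack; all names are hypothetical. */
typedef void (*sketch_unwind_fn) (void *arg);

struct sketch_unwind_entry { sketch_unwind_fn fn; void *arg; };

#define SKETCH_MAX_DEPTH 64
static struct sketch_unwind_entry sketch_stack[SKETCH_MAX_DEPTH];
static int sketch_depth = 0;

/* Push a cleanup handler and return the depth *before* the push, so the
   caller can later unwind back to exactly that point. */
int
sketch_record_unwind_protect (sketch_unwind_fn fn, void *arg)
{
  int old_depth = sketch_depth;

  assert (sketch_depth < SKETCH_MAX_DEPTH);
  sketch_stack[sketch_depth].fn = fn;
  sketch_stack[sketch_depth].arg = arg;
  sketch_depth++;
  return old_depth;
}

/* Run and pop handlers, most recent first, until the stack is back at
   DEPTH. */
void
sketch_unbind_to (int depth)
{
  assert (depth >= 0 && depth <= sketch_depth);
  while (sketch_depth > depth)
    {
      sketch_depth--;
      sketch_stack[sketch_depth].fn (sketch_stack[sketch_depth].arg);
    }
}

/* Example handler: counts how many times it ran. */
void
sketch_count_cleanup (void *arg)
{
  ++*(int *) arg;
}
```

Typical use is "depth = sketch_record_unwind_protect (fn, arg); ...;
sketch_unbind_to (depth);", which runs every handler recorded since the
first call.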

Init code:

- lots of init code rewritten to be mule-correct.

Processes:

- always call egetenv(), never getenv(), for mule correctness.

s/m files:

- removal of unused DATA_END, TEXT_END, SYSTEM_PURESIZE_EXTRA, HAVE_ALLOCA
  (automatically determined)
- removal of vfork references (we no longer use vfork)


make-docfile:

- clean up headers a bit.
- allow .obj to mean equivalent .c, just like for .o.
- allow specification of a "response file" (a command-line argument
  beginning with @, specifying a file containing further command-line
  arguments) -- a standard mswin idiom to avoid potential command-line
  limits and to simplify makefiles. use this in xemacs.mak.
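
A response file can be handled with very little code. The sketch below is an
assumption about how such a reader might look, not make-docfile's actual
implementation: both function names are hypothetical, and quoting of
arguments that contain spaces is deliberately not handled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return nonzero if ARG names a response file, i.e. begins with '@'
   followed by at least one more character. */
int
sketch_is_response_arg (const char *arg)
{
  return arg[0] == '@' && arg[1] != '\0';
}

/* Read whitespace-separated arguments from an open response file into
   ARGS (at most MAX_ARGS entries, each a malloc()ed copy).  Returns the
   number of arguments stored. */
int
sketch_read_response_file (FILE *fp, char **args, int max_args)
{
  char word[1024];
  int count = 0;

  while (count < max_args && fscanf (fp, "%1023s", word) == 1)
    {
      char *copy = (char *) malloc (strlen (word) + 1);

      if (copy == NULL)
        break;
      strcpy (copy, word);
      args[count++] = copy;
    }
  return count;
}
```

The caller would open the file named by the text after the '@' and splice
the returned arguments into its own argument list.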

debug support:

- (cmdloop.el) new var breakpoint-on-error, which breaks into the C
  debugger when an unhandled error occurs noninteractively. useful when
  debugging errors coming out of complicated make scripts, e.g. package
  compilation, since you can set this through an env var.
- (startup.el) new env var XEMACSDEBUG, specifying a Lisp form executed
  early in the startup process; meant to be used for turning on debug flags
  such as breakpoint-on-error or stack-trace-on-error, to track down
  noninteractive errors.
- (cmdloop.el) removed non-working code in command-error to display a
  backtrace on debug-on-error. use stack-trace-on-error instead to get
  this.
- (process.c) new var debug-process-io displays data sent to and received
  from a process.
- (alloc.c) staticpros have name stored with them for easier debugging.
- (emacs.c) code that handles fatal errors consolidated and rewritten.
  much more robust and correctly handles all fatal exits on mswin
  (e.g. aborts, not previously handled right).

command line (startup.el, emacs.c):

- new option -eol to enable auto EOL detection under non-mule unix.
- new option -nuni (--no-unicode-lib-calls) to force use of non-Unicode
  APIs under Windows NT, mostly for debugging purposes.
- help message fixed up (divided into sections), existing problem causing
  incomplete output fixed, undocumented options documented.

startup.el:

- remove init routines from before-init-hook and after-init-hook; just call
  them directly (init-menubar-at-startup, init-mule-at-startup).

frame.el:

- delete old commented-out code.

Mule changes:

Major:

- the code that handles the details of processing multilingual text has
  been consolidated to make it easier to extend it. it has been yanked out
  of various files (buffer.h, mule-charset.h, lisp.h, insdel.c, fns.c,
  file-coding.c, etc.) and put into text.c and text.h. mule-charset.h has
  also been renamed charset.h. all long comments concerning the
  representations and their processing have been consolidated into text.c.
- major rewriting of file-coding. it's mostly abstracted into coding
  systems that are defined by methods (similar to devices and
  specifiers), with the ultimate aim being to allow non-i18n coding
  systems such as gzip. there is a "chain" coding system that allows
  multiple coding systems to be chained together. (it doesn't yet
  have the concept that either end of a coding system can be bytes or
  chars; this needs to be added.)
- large amounts of code throughout the code base have been Mule-ized,
  not just Windows code.
- total rewriting of OS locale code. it notices your locale at startup and
  sets the language environment accordingly, and calls setlocale() and sets
  LANG when you change the language environment. new language environment
  properties locale, mswindows-locale, cygwin-locale, native-coding-system,
  to determine langenv from locale and vice-versa; fix all language
  environments (lots of language files). langenv startup code rewritten.
  many new functions to convert between locales, language environments,
  etc.
- major overhaul of the way default values for the various coding system
  variables are handled. all default values are collected into one
  location, a new file code-init.el, which provides a unified mechanism for
  setting and querying what i call "basic coding system variables" (which
  may be aliases, parts of conses, etc.) and a mechanism of different
  configurations (Windows w/Mule, Windows w/o Mule, Unix w/Mule, Unix w/o
  Mule, Unix w/o Mule but w/auto EOL), each of which specifies a set of
  default values. we determine the configuration at startup and set all
  the values in one place. (code-init.el, code-files.el, coding.el, ...)
- i copied the remaining language-specific files from fsf. i made
  some minor changes in certain cases but for the most part the stuff
  was just copied and may not work.
- ms windows mule support, with full unicode support. required font,
  redisplay, event, other changes. ime support from ikeyama.

User-Visible Changes:

Lisp-Visible Changes:

- ensure that `escape-quoted' works correctly even without Mule support and
  use it for all auto-saves. (auto-save.el, fileio.c, coding.el, files.el)
- new var buffer-file-coding-system-when-loaded specifies the actual coding
  system used when the file was loaded (buffer-file-coding-system is
  usually the same, but may be changed because it controls how the file is
  written out). use it in revert-buffer (files.el, code-files.el) and in
  new submenu File->Revert Buffer with Specified Encoding
  (menubar-items.el).
- improve docs on how the coding system is determined when a file is read
  in; improved docs are in both find-file and insert-file-contents and a
  reference to where to find them is in
  buffer-file-coding-system-for-read. (files.el, code-files.el)
- new (brain-damaged) FSF way of calling post-read-conversion (only one
  arg, not two) is supported, along with our two-argument way, as best we
  can. (code-files.el)
- add inexplicably missing var default-process-coding-system. use it. get
  rid of former hacked-up way of setting these defaults using
  comint-exec-hook. also fun set-buffer-process-coding-system.
  (code-process.el, code-cmds.el, process.c)
- remove function set-default-coding-systems; replace with
  set-default-output-coding-systems, which affects only the output defaults
  (buffer-file-coding-system, output half of
  default-process-coding-system). the input defaults should not be set by
  this because they should always remain `undecided' in normal
  circumstances. fix prefer-coding-system to use the new function and
  correct its docs.
- fix bug in coding-system-change-eol-conversion (code-cmds.el)
- recognize all eol types in prefer-coding-system (code-cmds.el)
- rewrite coding-system-category to be correct (coding.el)

Internal Changes:

- Separate encoding and decoding lstreams have been combined into a single
  coding lstream. Functions make_encoding_*_stream and
  make_decoding_*_stream have been combined into make_coding_*_stream,
  which takes an argument specifying whether encode or decode is wanted.
- remove last vestiges of I18N3, I18N4 code.
- ascii optimization for strings: we keep track of the number of ascii
  chars at the beginning and use this to optimize byte<->char conversion on
  strings.
- mule-misc.el, mule-init.el deleted; code in there either deleted,
  rewritten, or moved to another file.
- mule.c deleted.
- move non-Mule-specific code out of mule-cmds.el into code-cmds.el.
  (coding-system-change-text-conversion; remove duplicate
  coding-system-change-eol-conversion)
- remove duplicate set-buffer-process-coding-system (code-cmds.el)
- add some commented-out code from FSF mule-cmds.el
  (find-coding-systems-region-subset-p, find-coding-systems-region,
  find-coding-systems-string, find-coding-systems-for-charsets,
  find-multibyte-characters, last-coding-system-specified,
  select-safe-coding-system, select-message-coding-system) (code-cmds.el)
- remove obsolete alias pathname-coding-system, function
  set-pathname-coding-system (coding.el)
- remove coding-system property doc-string; split into `description'
  (short, for menu items) and `documentation' (long); correct coding system
  defns (coding.el, file-coding.c, lots of language files)
- move coding-system-base into C and make use of internal info (coding.el,
  file-coding.c)
- move undecided defn into C (coding.el, file-coding.c)
- use define-coding-system-alias, not copy-coding-system (coding.el)
- new coding system iso-8859-6 for arabic
- delete windows-1251 support from cyrillic.el; we do it automatically
- remove setup-*-environment as per FSF 21
- rewrite european.el with lang envs for each language, so we can specify
  the locale
- fix corruption in greek.el
- sync japanese.el with FSF 20.6
- fix warnings in mule-ccl.el
- move FSF compat Mule fns from obsolete.el to mule-charset.el
- eliminate unused truncate-string{-to-width}
- make-coding-system accepts (but ignores) the additional properties
  present in the fsf version, for compatibility.
- i fixed the iso2022 handling so it will correctly read in files
  containing unknown charsets, creating a "temporary" charset which
  can later be overwritten by the real charset when it's defined.
  this allows iso2022 elisp files with literals in strange languages
  to compile correctly under mule. i also added a hack that will
  correctly read in and write out the emacs-specific "composition"
  escape sequences, i.e. ESC 0 through ESC 4. this means that my
  workspace correctly compiles the new file devanagari.el that i added.
- elimination of string-to-char-list (use string-to-list)
- elimination of junky define-charset
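
The ascii optimization for strings listed above can be sketched as follows.
This is only an illustration of the idea: the struct and function names are
hypothetical, and UTF-8 stands in for the internal text representation, which
is an assumption made for the sake of a self-contained example.

```c
#include <stddef.h>

/* Hypothetical string record; UTF-8 is assumed only for this sketch. */
struct sketch_string
{
  const unsigned char *data;
  size_t byte_len;
  size_t ascii_prefix;   /* number of leading bytes that are all ASCII */
};

/* Count the leading ASCII bytes once, when the string is created. */
void
sketch_string_init (struct sketch_string *s, const unsigned char *data,
                    size_t byte_len)
{
  size_t i = 0;

  while (i < byte_len && data[i] < 0x80)
    i++;
  s->data = data;
  s->byte_len = byte_len;
  s->ascii_prefix = i;
}

/* Convert a character index to a byte index.  Inside the ASCII prefix
   the two are equal, so no scan is needed; past it, scan forward from
   the end of the prefix, skipping UTF-8 continuation bytes (10xxxxxx). */
size_t
sketch_char_to_byte (const struct sketch_string *s, size_t char_idx)
{
  size_t byte_idx, chars;

  if (char_idx <= s->ascii_prefix)
    return char_idx;

  byte_idx = s->ascii_prefix;
  chars = s->ascii_prefix;
  while (chars < char_idx && byte_idx < s->byte_len)
    {
      byte_idx++;
      while (byte_idx < s->byte_len && (s->data[byte_idx] & 0xc0) == 0x80)
        byte_idx++;
      chars++;
    }
  return byte_idx;
}
```

For a string that is entirely ASCII the conversion is constant-time, which is
the common case the optimization targets.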

Search:

- make regex routines reentrant, since they're sometimes called
  reentrantly. (see regex.c for a description of how.) all global variables
  used by the regex routines get pushed onto a stack by the callers before
  being set, and are restored when finished. redo the preprocessor flags
  controlling REL_ALLOC in conjunction with this.

Selection:

- fix msw selection code for Mule. proper encoding for
  RegisterClipboardFormat. store selection as CF_UNICODETEXT, which will
  get converted to the other formats. don't respond to destroy messages
  from EmptyClipboard().

Menubar:

- move menu-splitting code (menu-split-long-menu, etc.) from font-menu.el
  to menubar-items.el and redo its algorithm; use in various items with
  long generated menus; rename to remove `font-' from beginning of
  functions but keep old names as aliases
- new fn menu-sort-menu
- new items Open With Specified Encoding, Revert Buffer with Specified
  Encoding
- split Mule menu into Encoding (non-Mule-specific; includes new item to
  control EOL auto-detection) and International submenus on Options,
  International on Help
- redo items Grep All Files in Current Directory {and Below} using stuff
  from sample init.el
- Debug on Error and friends now affect current session only; not saved
- maybe-add-init-button -> init-menubar-at-startup and call explicitly from
  startup.el
- don't use charset-registry in msw-font-menu.el; it's only for X

Process:

- Move setenv from packages; sync setenv/getenv with 21.0.105

Unicode support:

- translation tables added in etc/unicode
- new files unicode.c, unicode.el containing unicode coding systems and
  support; old code ripped out of file-coding.c
- translation tables read in at startup (NEEDS WORK TO MAKE IT MORE
  EFFICIENT)
- support CF_TEXT, CF_UNICODETEXT in select.el
- encapsulation code added so that we can support both Windows 9x and NT in
  a single executable, determining at runtime whether to call the Unicode
  or non-Unicode API. encapsulated routines in intl-encap-win32.c
  (non-auto-generated) and intl-auto-encap-win32.[ch] (auto-generated).
  code generator in lib-src/make-mswin-unicode.pl. changes throughout the
  code to use the wide structures (W suffix) and call the encapsulated
  Win32 API routines (qxe prefix). calling code needs to do proper
  conversion of text using new coding systems Qmswindows_tstr,
  Qmswindows_unicode, or Qmswindows_multibyte. (the first points to one of
  the other two.)


File-coding rewrite:

The coding system code has been substantially rewritten. It's abstracted
into coding systems that are defined by methods (similar to devices and
specifiers). The types of conversions have also been
generalized. Formerly, decoding always converted bytes to characters and
encoding the reverse (these are now called "text file converters"), but
conversion can now happen either to or from bytes or characters. This
allows coding systems such as `gzip' and `base64' to be written. When
specifying such a coding system to an operation that expects a text file
converter (such as reading in or writing out a file), the appropriate
coding systems to convert between bytes and characters are automatically
inserted into the conversion chain as necessary. To facilitate creating
such chains, a special coding system called "chain" has been created, which
chains together two or more coding systems.
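
The idea of a chain can be sketched with two toy byte-to-byte converters.
Everything here is hypothetical and heavily simplified: real coding systems
are stream-based and may change the length of the data, and the rule that
decoding runs the chain in reverse order is an assumption of this sketch,
chosen so that decoding undoes encoding.

```c
#include <stddef.h>

enum sketch_direction { SKETCH_ENCODE, SKETCH_DECODE };

/* A converter transforms LEN bytes in place in the given direction. */
typedef void (*sketch_converter) (unsigned char *buf, size_t len,
                                  enum sketch_direction dir);

/* Toy converter 1: add 1 to every byte when encoding, subtract 1 when
   decoding. */
void
sketch_shift (unsigned char *buf, size_t len, enum sketch_direction dir)
{
  size_t i;

  for (i = 0; i < len; i++)
    buf[i] = (unsigned char) (dir == SKETCH_ENCODE ? buf[i] + 1 : buf[i] - 1);
}

/* Toy converter 2: XOR with a fixed key; it is its own inverse. */
void
sketch_mask (unsigned char *buf, size_t len, enum sketch_direction dir)
{
  size_t i;

  (void) dir;
  for (i = 0; i < len; i++)
    buf[i] ^= 0x5a;
}

/* The chain: run the converters first-to-last when encoding and
   last-to-first when decoding. */
void
sketch_chain (const sketch_converter *chain, size_t n_converters,
              unsigned char *buf, size_t len, enum sketch_direction dir)
{
  size_t i;

  for (i = 0; i < n_converters; i++)
    {
      size_t k = (dir == SKETCH_ENCODE) ? i : n_converters - 1 - i;

      chain[k] (buf, len, dir);
    }
}
```

With a chain of { sketch_shift, sketch_mask }, encoding and then decoding the
same buffer returns the original bytes, while swapping the order of the two
converters produces different encoded output.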

Encoding detection has also been abstracted. Detectors are logically
separate from coding systems, and each detector defines one or more
categories. (For example, the detector for Unicode defines categories such
as UTF-8, UTF-16, UCS-4, and UTF-7.) When a particular detector is given a
piece of text to detect, it determines likeliness values (seven of them,
from 3 [most likely] to -3 [least likely]; specific criteria are defined
for each possible value). All detectors are run in parallel on a
particular piece of text, and the results tabulated together to determine
the actual encoding of the text.
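
One way the tabulation step could look is sketched below. The names are
hypothetical, and the selection rule (highest likeliness wins, earlier
entries win ties) is an assumption of the sketch; this file does not say how
the results are actually combined.

```c
#include <assert.h>
#include <stddef.h>

/* One detector's verdict for one category.  The range 3 (most likely)
   to -3 (least likely) is taken from the description above. */
struct sketch_detection
{
  const char *category;
  int likeliness;
};

/* Tabulate verdicts from all detectors and return the category with the
   highest likeliness, or NULL for an empty table. */
const char *
sketch_best_category (const struct sketch_detection *results, size_t n)
{
  const char *best = NULL;
  int best_value = -4;   /* below the lowest legal value */
  size_t i;

  for (i = 0; i < n; i++)
    {
      assert (results[i].likeliness >= -3 && results[i].likeliness <= 3);
      if (results[i].likeliness > best_value)
        {
          best_value = results[i].likeliness;
          best = results[i].category;
        }
    }
  return best;
}
```

Because every detector reports on the same seven-point scale, verdicts from
unrelated detectors can be compared in a single table.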

Encoding and decoding are now completely parallel operations, and the
former "encoding" and "decoding" lstreams have been combined into a single
"coding" lstream. Coding system methods that were formerly split in such a
fashion have also been combined.