diff src/ChangeLog @ 4096:1abf84db2c7f
[xemacs-hg @ 2007-08-04 20:00:10 by aidan]
Preserve invalid UTF-8, UTF-16 sequences on encoding, decoding.
author | aidan |
---|---|
date | Sat, 04 Aug 2007 20:00:24 +0000 |
parents | 6f05405e63fc |
children | d1cf2b9c4dfd |
--- a/src/ChangeLog Fri Aug 03 21:51:12 2007 +0000
+++ b/src/ChangeLog Sat Aug 04 20:00:24 2007 +0000
@@ -1,3 +1,50 @@
+2007-08-04  Aidan Kehoe  <kehoea@parhasard.net>
+
+        * charset.h:
+        * charset.h (enum unicode_type):
+        Add UNICODE_UTF_32.
+        * lisp.h:
+        Add Qutf_32.
+        * lread.c (read_unicode_escape):
+        Error on an invalid Unicode escape; error on no mapping, as GNU does.
+
+        * mule-coding.c:
+        * mule-coding.c (dynarr_add_2022_one_dimension):
+        * mule-coding.c (dynarr_add_2022_two_dimensions):
+        * mule-coding.c (struct iso2022_coding_stream):
+        * mule-coding.c (decode_unicode_char):
+        * mule-coding.c (indicate_invalid_utf_8):
+        * mule-coding.c (iso2022_decode):
+        * unicode.c:
+        * unicode.c (struct unicode_coding_stream):
+        * unicode.c (decode_unicode_char):
+        * unicode.c (DECODE_ERROR_OCTET):
+        * unicode.c (indicate_invalid_utf_8):
+        * unicode.c (encode_unicode_char_1):
+        * unicode.c (encode_unicode_char):
+        * unicode.c (unicode_convert):
+        * unicode.c (unicode_putprop):
+        * unicode.c (unicode_getprop):
+        * unicode.c (syms_of_unicode):
+        Make UTF-8 and UTF-16 handling more robust; indicate error
+        sequences when decoding, passing the octets as distinct from the
+        corresponding ISO8859-1 characters, and (by default) writing them
+        to disk on encoding. Don't accept over-long UTF-8 sequences, codes
+        >= #x110000, or UTF-16 surrogates on reading in the utf-8 coding
+        system; represent them as error sequences.
+
+        Do accept code points above #x110000 in the ISO IR 196 handling,
+        since we decode Unicode error sequences to "Unicode" code points
+        starting at 0x200000, and will need to save them as such in
+        escape-quoted. Do not accept over-long UTF-8 sequences or UTF-16
+        surrogates in escape-quoted.
+
+        This change means that when a non-UTF-8 file is opened as UTF-8,
+        one change made, and immediately saved, the non-ASCII characters
+        are not corrupted. In Europe, this is a distinct win.
+
+        Add UCS-4, UTF-32 as coding systems.
+
 2007-07-26  Aidan Kehoe  <kehoea@parhasard.net>
 
         * mule-ccl.c (ccl_driver):
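The decoding and encoding behaviour described in the 2007-08-04 entry can be sketched roughly as follows. This is not the code from unicode.c: the names `decode_one`, `encode_one`, and `ERROR_BASE` are hypothetical, and the mapping of an invalid octet to `0x200000 + octet` is an assumption (the entry only says error sequences decode to code points "starting at 0x200000"). The sketch rejects over-long sequences, code points >= 0x110000, and UTF-16 surrogates, and writes error octets back unchanged on encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: an invalid octet B is represented as ERROR_BASE + B. */
#define ERROR_BASE 0x200000u

/* Decode one unit from s[0..len). Stores either a valid code point or an
   error code point in *cp and returns the number of octets consumed. */
static size_t decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    unsigned char b;
    size_t need, i;
    uint32_t c, min;

    assert(s != NULL && cp != NULL && len > 0);
    b = s[0];

    if (b < 0x80) { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { need = 1; c = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { need = 2; c = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { need = 3; c = b & 0x07; min = 0x10000; }
    else goto invalid;                 /* stray continuation, or 0xF8..0xFF */

    if (len < need + 1) goto invalid;  /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) goto invalid;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min) goto invalid;                     /* over-long form */
    if (c >= 0x110000u) goto invalid;              /* beyond Unicode range */
    if (c >= 0xD800u && c <= 0xDFFFu) goto invalid; /* UTF-16 surrogate */
    *cp = c;
    return need + 1;

invalid:
    /* Consume only the lead octet; following octets are retried on their own. */
    *cp = ERROR_BASE + b;
    return 1;
}

/* Encode cp into out (room for 4 octets). Error code points are written back
   as the original raw octet. Returns octets written, or 0 if not encodable. */
static size_t encode_one(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp >= ERROR_BASE && cp <= ERROR_BASE + 0xFFu) {
        out[0] = (unsigned char)(cp - ERROR_BASE);
        return 1;
    }
    if (cp < 0x80u) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800u && cp <= 0xDFFFu) return 0;
    if (cp < 0x10000u) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000u) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Under these assumptions, a Latin-1 octet such as 0xE9 at the end of a buffer decodes to a code point distinct from U+00E9 and encodes back to the single octet 0xE9, which is the round-trip property the entry describes for non-UTF-8 files opened as UTF-8.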