comparison src/ChangeLog @ 4096:1abf84db2c7f
[xemacs-hg @ 2007-08-04 20:00:10 by aidan]
Preserve invalid UTF-8, UTF-16 sequences on encoding, decoding.
| author | aidan |
|---|---|
| date | Sat, 04 Aug 2007 20:00:24 +0000 |
| parents | 6f05405e63fc |
| children | d1cf2b9c4dfd |
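
The ChangeLog entry in this changeset describes rejecting over-long UTF-8 sequences, UTF-16 surrogates and codes >= #x110000 on decoding, representing the offending octets as error code points starting at 0x200000, and writing those octets back unchanged on encoding. A minimal sketch of that idea follows. It is not the code in unicode.c: the names `decode_one`, `encode_one`, `roundtrip` and `ERROR_OCTET_BASE` are hypothetical, and mapping each bad octet to `0x200000 + octet` is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The ChangeLog says error sequences decode to code points starting at
   0x200000; adding the raw octet to that base is an assumption here. */
#define ERROR_OCTET_BASE 0x200000u

/* Decode one unit from src[0..len), len >= 1. Stores the code point in *cp
   and returns the number of octets consumed. An invalid lead octet is
   consumed alone and stored as ERROR_OCTET_BASE + octet. */
static size_t decode_one(const unsigned char *src, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 2, 3 or 4 octets. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned char b0 = src[0];
    size_t need, i;
    uint32_t c;

    if (b0 < 0x80) { *cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { need = 2; c = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { need = 3; c = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { need = 4; c = b0 & 0x07; }
    else goto invalid;                        /* stray continuation, 0xF8..0xFF */

    if (need > len) goto invalid;             /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((src[i] & 0xC0) != 0x80) goto invalid;
        c = (c << 6) | (src[i] & 0x3F);
    }
    if (c < min_for_len[need]) goto invalid;      /* over-long form */
    if (c >= 0xD800 && c <= 0xDFFF) goto invalid; /* UTF-16 surrogate */
    if (c >= 0x110000) goto invalid;              /* beyond Unicode */
    *cp = c;
    return need;

invalid:
    *cp = ERROR_OCTET_BASE + b0;
    return 1;
}

/* Encode one code point into dst (room for 4 octets). Error code points are
   written back as the original octet. Returns octets written, 0 if the
   code point cannot be encoded. */
static size_t encode_one(uint32_t cp, unsigned char *dst)
{
    if (cp >= ERROR_OCTET_BASE && cp <= ERROR_OCTET_BASE + 0xFF) {
        dst[0] = (unsigned char)(cp - ERROR_OCTET_BASE);
        return 1;
    }
    if (cp < 0x80) { dst[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode then re-encode a buffer; out must hold at least len octets. */
static size_t roundtrip(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < len) {
        uint32_t cp;
        i += decode_one(in + i, len - i, &cp);
        o += encode_one(cp, out + o);
    }
    return o;
}
```

With this scheme a Latin-1 octet such as 0xE9, read as UTF-8, becomes an error code point instead of an ISO8859-1 character, so `roundtrip` returns the input octets unchanged. That is the behaviour the entry describes for a non-UTF-8 file opened as UTF-8 and saved again.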
| 4095:bff7e065cfdc | 4096:1abf84db2c7f |
|---|---|
| | 1 2007-08-04 Aidan Kehoe <kehoea@parhasard.net> |
| | 2 |
| | 3 * charset.h: |
| | 4 * charset.h (enum unicode_type): |
| | 5 Add UNICODE_UTF_32. |
| | 6 * lisp.h: |
| | 7 Add Qutf_32. |
| | 8 * lread.c (read_unicode_escape): |
| | 9 Error on an invalid Unicode escape; error on no mapping, as GNU does. |
| | 10 |
| | 11 * mule-coding.c: |
| | 12 * mule-coding.c (dynarr_add_2022_one_dimension): |
| | 13 * mule-coding.c (dynarr_add_2022_two_dimensions): |
| | 14 * mule-coding.c (struct iso2022_coding_stream): |
| | 15 * mule-coding.c (decode_unicode_char): |
| | 16 * mule-coding.c (indicate_invalid_utf_8): |
| | 17 * mule-coding.c (iso2022_decode): |
| | 18 * unicode.c: |
| | 19 * unicode.c (struct unicode_coding_stream): |
| | 20 * unicode.c (decode_unicode_char): |
| | 21 * unicode.c (DECODE_ERROR_OCTET): |
| | 22 * unicode.c (indicate_invalid_utf_8): |
| | 23 * unicode.c (encode_unicode_char_1): |
| | 24 * unicode.c (encode_unicode_char): |
| | 25 * unicode.c (unicode_convert): |
| | 26 * unicode.c (unicode_putprop): |
| | 27 * unicode.c (unicode_getprop): |
| | 28 * unicode.c (syms_of_unicode): |
| | 29 Make UTF-8 and UTF-16 handling more robust; indicate error |
| | 30 sequences when decoding, passing the octets as distinct from the |
| | 31 corresponding ISO8859-1 characters, and (by default) writing them |
| | 32 to disk on encoding. Don't accept over-long UTF-8 sequences, codes |
| | 33 >= #x110000, or UTF-16 surrogates on reading in the utf-8 coding |
| | 34 system; represent them as error sequences. |
| | 35 |
| | 36 Do accept code points above #x110000 in the ISO IR 196 handling, |
| | 37 since we decode Unicode error sequences to "Unicode" code points |
| | 38 starting at 0x200000, and will need to save them as such in |
| | 39 escape-quoted. Do not accept over-long UTF-8 sequences or UTF-16 |
| | 40 surrogates in escape-quoted. |
| | 41 |
| | 42 This change means that when a non-UTF-8 file is opened as UTF-8, |
| | 43 one change made, and immediately saved, the non-ASCII characters |
| | 44 are not corrupted. In Europe, this is a distinct win. |
| | 45 |
| | 46 Add UCS-4, UTF-32 as coding systems. |
| | 47 |
| 1 2007-07-26 Aidan Kehoe <kehoea@parhasard.net> | 48 2007-07-26 Aidan Kehoe <kehoea@parhasard.net> |
| 2 | 49 |
| 3 * mule-ccl.c (ccl_driver): | 50 * mule-ccl.c (ccl_driver): |
| 4 op is an integer, not a Lisp_Object; don't use it to temporarily | 51 op is an integer, not a Lisp_Object; don't use it to temporarily |
| 5 store a Lisp_Object. This change fixes the union build; thank you | 52 store a Lisp_Object. This change fixes the union build; thank you |
