comparison src/ChangeLog @ 4096:1abf84db2c7f (xemacs-beta Mercurial repository)
[xemacs-hg @ 2007-08-04 20:00:10 by aidan]
Preserve invalid UTF-8, UTF-16 sequences on encoding, decoding.
author | aidan |
---|---|
date | Sat, 04 Aug 2007 20:00:24 +0000 |
parents | 6f05405e63fc |
children | d1cf2b9c4dfd |
4095:bff7e065cfdc | 4096:1abf84db2c7f |
---|---|
| | 1 2007-08-04 Aidan Kehoe <kehoea@parhasard.net> |
| | 2 |
| | 3 * charset.h: |
| | 4 * charset.h (enum unicode_type): |
| | 5 Add UNICODE_UTF_32. |
| | 6 * lisp.h: |
| | 7 Add Qutf_32. |
| | 8 * lread.c (read_unicode_escape): |
| | 9 Error on an invalid Unicode escape; error on no mapping, as GNU does. |
| | 10 |
| | 11 * mule-coding.c: |
| | 12 * mule-coding.c (dynarr_add_2022_one_dimension): |
| | 13 * mule-coding.c (dynarr_add_2022_two_dimensions): |
| | 14 * mule-coding.c (struct iso2022_coding_stream): |
| | 15 * mule-coding.c (decode_unicode_char): |
| | 16 * mule-coding.c (indicate_invalid_utf_8): |
| | 17 * mule-coding.c (iso2022_decode): |
| | 18 * unicode.c: |
| | 19 * unicode.c (struct unicode_coding_stream): |
| | 20 * unicode.c (decode_unicode_char): |
| | 21 * unicode.c (DECODE_ERROR_OCTET): |
| | 22 * unicode.c (indicate_invalid_utf_8): |
| | 23 * unicode.c (encode_unicode_char_1): |
| | 24 * unicode.c (encode_unicode_char): |
| | 25 * unicode.c (unicode_convert): |
| | 26 * unicode.c (unicode_putprop): |
| | 27 * unicode.c (unicode_getprop): |
| | 28 * unicode.c (syms_of_unicode): |
| | 29 Make UTF-8 and UTF-16 handling more robust; indicate error |
| | 30 sequences when decoding, passing the octets as distinct from the |
| | 31 corresponding ISO 8859-1 characters, and (by default) writing them |
| | 32 to disk on encoding. Don't accept over-long UTF-8 sequences, codes |
| | 33 >= #x110000, or UTF-16 surrogates on reading in the utf-8 coding |
| | 34 system; represent them as error sequences. |
| | 35 |
| | 36 Do accept code points above #x110000 in the ISO IR 196 handling, |
| | 37 since we decode Unicode error sequences to "Unicode" code points |
| | 38 starting at #x200000, and will need to save them as such in |
| | 39 escape-quoted. Do not accept over-long UTF-8 sequences or UTF-16 |
| | 40 surrogates in escape-quoted. |
| | 41 |
| | 42 This change means that when a non-UTF-8 file is opened as UTF-8, |
| | 43 a single change is made, and the file is immediately saved, the |
| | 44 non-ASCII characters are not corrupted. In Europe, this is a distinct win. |
| | 45 |
| | 46 Add UCS-4 and UTF-32 as coding systems. |
| | 47 |
1 2007-07-26 Aidan Kehoe <kehoea@parhasard.net> | 48 2007-07-26 Aidan Kehoe <kehoea@parhasard.net> |
2 | 49 |
3 * mule-ccl.c (ccl_driver): | 50 * mule-ccl.c (ccl_driver): |
4 op is an integer, not a Lisp_Object; don't use it to temporarily | 51 op is an integer, not a Lisp_Object; don't use it to temporarily |
5 store a Lisp_Object. This change fixes the union build; thank you | 52 store a Lisp_Object. This change fixes the union build; thank you |