comparison src/ChangeLog @ 4096:1abf84db2c7f

[xemacs-hg @ 2007-08-04 20:00:10 by aidan] Preserve invalid UTF-8, UTF-16 sequences on encoding, decoding.
author aidan
date Sat, 04 Aug 2007 20:00:24 +0000
parents 6f05405e63fc
children d1cf2b9c4dfd
4095:bff7e065cfdc 4096:1abf84db2c7f
1 2007-08-04 Aidan Kehoe <kehoea@parhasard.net>
2
3 * charset.h:
4 * charset.h (enum unicode_type):
5 Add UNICODE_UTF_32.
6 * lisp.h:
7 Add Qutf_32.
8 * lread.c (read_unicode_escape):
9 Error on an invalid Unicode escape; error on no mapping, as GNU does.
10
11 * mule-coding.c:
12 * mule-coding.c (dynarr_add_2022_one_dimension):
13 * mule-coding.c (dynarr_add_2022_two_dimensions):
14 * mule-coding.c (struct iso2022_coding_stream):
15 * mule-coding.c (decode_unicode_char):
16 * mule-coding.c (indicate_invalid_utf_8):
17 * mule-coding.c (iso2022_decode):
18 * unicode.c:
19 * unicode.c (struct unicode_coding_stream):
20 * unicode.c (decode_unicode_char):
21 * unicode.c (DECODE_ERROR_OCTET):
22 * unicode.c (indicate_invalid_utf_8):
23 * unicode.c (encode_unicode_char_1):
24 * unicode.c (encode_unicode_char):
25 * unicode.c (unicode_convert):
26 * unicode.c (unicode_putprop):
27 * unicode.c (unicode_getprop):
28 * unicode.c (syms_of_unicode):
29 Make UTF-8 and UTF-16 handling more robust; indicate error
30 sequences when decoding, passing the octets as distinct from the
31 corresponding ISO8859-1 characters, and (by default) writing them
32 to disk on encoding. Don't accept over-long UTF-8 sequences, codes
33 >= #x110000, or UTF-16 surrogates on reading in the utf-8 coding
34 system; represent them as error sequences.
35
36 Do accept code points above #x110000 in the ISO IR 196 handling,
37 since we decode Unicode error sequences to "Unicode" code points
38 starting at 0x200000, and will need to save them as such in
39 escape-quoted. Do not accept over-long UTF-8 sequences or UTF-16
40 surrogates in escape-quoted.
41
42 This change means that when a non-UTF-8 file is opened as UTF-8,
43 one change is made, and the file is immediately saved, the non-ASCII
44 characters are not corrupted. In Europe, this is a distinct win.
45
46 Add UCS-4, UTF-32 as coding systems.
47
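The decoding rules described in the 2007-08-04 entry can be illustrated with a minimal sketch. This is not the code in unicode.c or mule-coding.c: the names `decode_utf8_strict`, `error_octet_of` and `ERROR_BASE` are hypothetical, and the mapping of each invalid octet to `0x200000 + octet` is an assumption (the entry only says that error sequences decode to code points starting at 0x200000).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: an invalid octet B is represented as ERROR_BASE + B.  The
   ChangeLog only states that error code points start at 0x200000. */
#define ERROR_BASE 0x200000u

/* Decode one item from buf[0..len).  Stores a code point in *out and
   returns the number of octets consumed (always at least 1).  Over-long
   forms, surrogates, values >= 0x110000 and malformed or truncated
   sequences yield an error code point for the lead octet alone. */
size_t
decode_utf8_strict (const unsigned char *buf, size_t len, uint32_t *out)
{
  /* Smallest code point that legitimately needs 2, 3 or 4 octets. */
  static const uint32_t min_for_length[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  unsigned char lead;
  size_t need, i;
  uint32_t cp;

  assert (len > 0);
  lead = buf[0];

  if (lead < 0x80)
    {
      *out = lead;              /* plain ASCII */
      return 1;
    }
  else if ((lead & 0xE0) == 0xC0) { need = 2; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { need = 3; cp = lead & 0x0F; }
  else if ((lead & 0xF8) == 0xF0) { need = 4; cp = lead & 0x07; }
  else
    goto invalid;               /* stray continuation octet or 0xF8..0xFF */

  if (len < need)
    goto invalid;               /* truncated sequence */
  for (i = 1; i < need; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        goto invalid;           /* not a continuation octet */
      cp = (cp << 6) | (uint32_t) (buf[i] & 0x3F);
    }
  if (cp < min_for_length[need])
    goto invalid;               /* over-long encoding */
  if (cp >= 0x110000)
    goto invalid;               /* beyond the Unicode range */
  if (cp >= 0xD800 && cp <= 0xDFFF)
    goto invalid;               /* UTF-16 surrogate */
  *out = cp;
  return need;

invalid:
  *out = ERROR_BASE + lead;
  return 1;
}

/* Inverse for error code points: returns 1 and stores the original octet
   when cp lies in the assumed error range, otherwise returns 0. */
int
error_octet_of (uint32_t cp, unsigned char *octet)
{
  if (cp >= ERROR_BASE && cp <= ERROR_BASE + 0xFFu)
    {
      *octet = (unsigned char) (cp - ERROR_BASE);
      return 1;
    }
  return 0;
}
```

Under these assumptions, a lone Latin-1 octet such as 0xE9 decodes to an error code point rather than to U+00E9, and an encoder can call `error_octet_of` to write the original octet back. That is the round trip that keeps a mis-identified file uncorrupted.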
48 2007-07-26 Aidan Kehoe <kehoea@parhasard.net>
49
50 * mule-ccl.c (ccl_driver):
51 op is an integer, not a Lisp_Object; don't use it to temporarily
52 store a Lisp_Object. This change fixes the union build; thank you