comparison src/ChangeLog @ 5648:3f4a234f4672
Support non-ASCII correctly in character classes, test this.
src/ChangeLog addition:
2012-04-21 Aidan Kehoe <kehoea@parhasard.net>
Support non-ASCII correctly in character classes ([:alnum:] and
friends).
* regex.c:
* regex.c (ISBLANK, ISUNIBYTE): New. Make these and friends
independent of the locale, since we want them to be consistent in
XEmacs.
* regex.c (print_partial_compiled_pattern): Print the flags for
charset_mule; don't print non-ASCII as the character values in
ranges, this breaks with locales.
* regex.c (enum):
Define various flags the charset_mule and charset_mule_not opcodes
can now take.
* regex.c (CHAR_CLASS_MAX_LENGTH): Update this.
* regex.c (re_iswctype, re_wctype): New, from GNU.
* regex.c (re_wctype_can_match_non_ascii): New; used when deciding
on whether to use charset_mule or the ASCII-only regex character
set opcode.
* regex.c (regex_compile):
Error correctly on long, non-existent character class names.
Break out the handling of charsets that can match non-ASCII into a
separate clause. Use compile_char_class when compiling character
classes.
* regex.c (compile_char_class): New. Used in regex_compile when
compiling character sets that may match non-ASCII.
* regex.c (re_compile_fastmap):
If there are flags set for charset_mule or charset_mule_not, we
can't use the fastmap (since we need to check syntax table values
that aren't available there).
* regex.c (re_match_2_internal):
Check the new flags passed to the charset_mule{,_not} opcode,
observe them if appropriate.
* regex.h:
* regex.h (enum):
Expose re_wctype_t here, imported from GNU.
tests/ChangeLog addition:
2012-04-21 Aidan Kehoe <kehoea@parhasard.net>
* automated/regexp-tests.el:
* automated/regexp-tests.el (Assert-char-class):
Check that #'string-match errors correctly with an over-long
character class name.
Add tests for character class functionality that supports
non-ASCII characters. These tests expose bugs in GNU Emacs
24.0.94.2, but pass under current XEmacs.
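Two ideas from the entry above, locale-independent character tests and an error on over-long or unknown character class names, can be illustrated with a minimal C sketch. This is not the code in regex.c: every identifier below (`sk_class`, `sk_lookup`, `SK_CLASS_MAX_LENGTH`, and the `sk_is_*` helpers) is hypothetical, the class table is deliberately tiny, and the sketch assumes an ASCII-compatible execution character set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; these are NOT the identifiers used in
   regex.c or regex.h. */
enum sk_class { SK_ERROR = 0, SK_ALNUM, SK_ALPHA, SK_BLANK, SK_DIGIT };

/* Length of the longest name in the illustrative table below. */
#define SK_CLASS_MAX_LENGTH 5

/* Locale-independent tests: nothing here calls into <ctype.h>, so the
   answers cannot change with setlocale().  Assumes ASCII-compatible
   character codes. */
static int sk_is_unibyte(unsigned int c) { return c < 0x80; }
static int sk_is_digit(unsigned int c)   { return c >= '0' && c <= '9'; }
static int sk_is_blank(unsigned int c)   { return c == ' ' || c == '\t'; }
static int sk_is_alpha(unsigned int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Map a class name such as "alnum" (given without the surrounding
   "[:" and ":]") to an enum value.  Over-long and unknown names both
   yield SK_ERROR, so the caller can signal an error instead of
   silently accepting them. */
static enum sk_class sk_lookup(const char *name, size_t len)
{
  static const struct { const char *name; enum sk_class cls; } table[] = {
    { "alnum", SK_ALNUM }, { "alpha", SK_ALPHA },
    { "blank", SK_BLANK }, { "digit", SK_DIGIT },
  };
  size_t i;

  if (len > SK_CLASS_MAX_LENGTH)
    return SK_ERROR;                      /* over-long name */

  for (i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strlen(table[i].name) == len
        && memcmp(table[i].name, name, len) == 0)
      return table[i].cls;

  return SK_ERROR;                        /* unknown name */
}
```

In this sketch, `sk_lookup("alnum", 5)` yields `SK_ALNUM`, while a 15-character name is rejected before the table is consulted. The `sk_is_*` helpers classify only ASCII code points, so a value such as 0xE9 is neither unibyte nor alphabetic here; how the actual change classifies non-ASCII characters is described only at the level of the ChangeLog entry.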
| author | Aidan Kehoe <kehoea@parhasard.net> |
|---|---|
| date | Sat, 21 Apr 2012 18:58:28 +0100 |
| parents | 1d9f603e9125 |
| children | d026b665014f |
| 5647:1d9f603e9125 | 5648:3f4a234f4672 |
|---|---|
| 1 2012-04-21 Aidan Kehoe <kehoea@parhasard.net> | |
| 2 | |
| 3 Support non-ASCII correctly in character classes ([:alnum:] and | |
| 4 friends). | |
| 5 | |
| 6 * regex.c: | |
| 7 * regex.c (ISBLANK, ISUNIBYTE): New. Make these and friends | |
| 8 independent of the locale, since we want them to be consistent in | |
| 9 XEmacs. | |
| 10 * regex.c (print_partial_compiled_pattern): Print the flags for | |
| 11 charset_mule; don't print non-ASCII as the character values in | |
| 12 ranges, this breaks with locales. | |
| 13 * regex.c (enum): | |
| 14 Define various flags the charset_mule and charset_mule_not opcodes | |
| 15 can now take. | |
| 16 * regex.c (CHAR_CLASS_MAX_LENGTH): Update this. | |
| 17 * regex.c (re_iswctype, re_wctype): New, from GNU. | |
| 18 * regex.c (re_wctype_can_match_non_ascii): New; used when deciding | |
| 19 on whether to use charset_mule or the ASCII-only regex character | |
| 20 set opcode. | |
| 21 * regex.c (regex_compile): | |
| 22 Error correctly on long, non-existent character class names. | |
| 23 Break out the handling of charsets that can match non-ASCII into a | |
| 24 separate clause. Use compile_char_class when compiling character | |
| 25 classes. | |
| 26 * regex.c (compile_char_class): New. Used in regex_compile when | |
| 27 compiling character sets that may match non-ASCII. | |
| 28 * regex.c (re_compile_fastmap): | |
| 29 If there are flags set for charset_mule or charset_mule_not, we | |
| 30 can't use the fastmap (since we need to check syntax table values | |
| 31 that aren't available there). | |
| 32 * regex.c (re_match_2_internal): | |
| 33 Check the new flags passed to the charset_mule{,_not} opcode, | |
| 34 observe them if appropriate. | |
| 35 * regex.h: | |
| 36 * regex.h (enum): | |
| 37 Expose re_wctype_t here, imported from GNU. | |
| 38 | |
| 1 2012-04-21 Aidan Kehoe <kehoea@parhasard.net> | 39 2012-04-21 Aidan Kehoe <kehoea@parhasard.net> |
| 2 | 40 |
| 3 * regex.h (RE_SYNTAX_EMACS): | 41 * regex.h (RE_SYNTAX_EMACS): |
| 4 Turn on character classes ([:alnum:] and friends) by default. This | 42 Turn on character classes ([:alnum:] and friends) by default. This |
| 5 implementation is incomplete, am working on a version that handles | 43 implementation is incomplete, am working on a version that handles |
