xemacs-beta: comparison src/regex.h @ 5648:3f4a234f4672
Support non-ASCII correctly in character classes, test this.
src/ChangeLog addition:
2012-04-21 Aidan Kehoe <kehoea@parhasard.net>
Support non-ASCII correctly in character classes ([:alnum:] and
friends).
* regex.c:
* regex.c (ISBLANK, ISUNIBYTE): New. Make these and friends
independent of the locale, since we want them to be consistent in
XEmacs.
* regex.c (print_partial_compiled_pattern): Print the flags for
charset_mule; don't print non-ASCII characters in ranges as raw
character values, since this breaks with locales.
* regex.c (enum):
Define various flags the charset_mule and charset_mule_not opcodes
can now take.
* regex.c (CHAR_CLASS_MAX_LENGTH): Update this.
* regex.c (re_iswctype, re_wctype): New, from GNU.
* regex.c (re_wctype_can_match_non_ascii): New; used when deciding
on whether to use charset_mule or the ASCII-only regex character
set opcode.
* regex.c (regex_compile):
Error correctly on long, non-existent character class names.
Break out the handling of charsets that can match non-ASCII into a
separate clause. Use compile_char_class when compiling character
classes.
* regex.c (compile_char_class): New. Used in regex_compile when
compiling character sets that may match non-ASCII.
* regex.c (re_compile_fastmap):
If there are flags set for charset_mule or charset_mule_not, we
can't use the fastmap (since we need to check syntax table values
that aren't available there).
* regex.c (re_match_2_internal):
Check the new flags passed to the charset_mule{,_not} opcodes, and
observe them if appropriate.
* regex.h:
* regex.h (enum):
Expose re_wctype_t here, imported from GNU.
tests/ChangeLog addition:
2012-04-21 Aidan Kehoe <kehoea@parhasard.net>
* automated/regexp-tests.el:
* automated/regexp-tests.el (Assert-char-class):
Check that #'string-match errors correctly with an over-long
character class name.
Add tests for character class functionality that supports
non-ASCII characters. These tests expose bugs in GNU Emacs
24.0.94.2, but pass under current XEmacs.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 21 Apr 2012 18:58:28 +0100 |
parents | 1d9f603e9125 |
children | 3df910176b6a |
5647:1d9f603e9125 | 5648:3f4a234f4672 |
---|---|
544 RE_DEBUG_MATCHING = 1 << 2, | 544 RE_DEBUG_MATCHING = 1 << 2, |
545 }; | 545 }; |
546 | 546 |
547 extern int debug_regexps; | 547 extern int debug_regexps; |
548 | 548 |
549 typedef enum | |
550 { | |
551 RECC_ERROR = 0, | |
552 RECC_ALNUM, RECC_ALPHA, RECC_WORD, | |
553 RECC_GRAPH, RECC_PRINT, | |
554 RECC_LOWER, RECC_UPPER, | |
555 RECC_PUNCT, RECC_CNTRL, | |
556 RECC_DIGIT, RECC_XDIGIT, | |
557 RECC_BLANK, RECC_SPACE, | |
558 RECC_MULTIBYTE, RECC_NONASCII, | |
559 RECC_ASCII, RECC_UNIBYTE | |
560 } re_wctype_t; | |
561 | |
549 END_C_DECLS | 562 END_C_DECLS |
550 | 563 |
551 #endif /* INCLUDED_regex_h_ */ | 564 #endif /* INCLUDED_regex_h_ */ |
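
The ChangeLog describes re_wctype as mapping a character class name to a re_wctype_t value, with regex_compile erroring on long or non-existent names. The real implementation lives in regex.c and is not shown in this comparison, so the following is only a minimal sketch of how such a lookup could work with the enum added above. The function name `lookup_char_class`, the table `class_names`, and the exact name strings are assumptions for illustration, not the XEmacs code.

```c
#include <stddef.h>
#include <string.h>

/* Copied from the regex.h hunk in this changeset. */
typedef enum
{
  RECC_ERROR = 0,
  RECC_ALNUM, RECC_ALPHA, RECC_WORD,
  RECC_GRAPH, RECC_PRINT,
  RECC_LOWER, RECC_UPPER,
  RECC_PUNCT, RECC_CNTRL,
  RECC_DIGIT, RECC_XDIGIT,
  RECC_BLANK, RECC_SPACE,
  RECC_MULTIBYTE, RECC_NONASCII,
  RECC_ASCII, RECC_UNIBYTE
} re_wctype_t;

/* Hypothetical table: the name strings are assumed to be the lower-case
   forms of the enumerator suffixes, as in "[:alnum:]".  */
static const struct
{
  const char *name;
  re_wctype_t type;
} class_names[] = {
  { "alnum", RECC_ALNUM },         { "alpha", RECC_ALPHA },
  { "word", RECC_WORD },           { "graph", RECC_GRAPH },
  { "print", RECC_PRINT },         { "lower", RECC_LOWER },
  { "upper", RECC_UPPER },         { "punct", RECC_PUNCT },
  { "cntrl", RECC_CNTRL },         { "digit", RECC_DIGIT },
  { "xdigit", RECC_XDIGIT },       { "blank", RECC_BLANK },
  { "space", RECC_SPACE },         { "multibyte", RECC_MULTIBYTE },
  { "nonascii", RECC_NONASCII },   { "ascii", RECC_ASCII },
  { "unibyte", RECC_UNIBYTE }
};

/* Hypothetical helper, not the real re_wctype: return the class for NAME
   (given without the surrounding "[:" and ":]"), or RECC_ERROR if NAME
   is not a known class name.  */
re_wctype_t
lookup_char_class (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof class_names / sizeof class_names[0]; i++)
    if (strcmp (name, class_names[i].name) == 0)
      return class_names[i].type;

  return RECC_ERROR;
}
```

Because RECC_ERROR is 0, a caller can test the result directly, for example `if (!lookup_char_class (name))`, and signal a compile error for an unknown or over-long class name.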