comparison man/lispref/searching.texi @ 5653:3df910176b6a

Support predefined character classes in #'skip-chars-{forward,backward}, too

src/ChangeLog addition:

2012-05-04 Aidan Kehoe <kehoea@parhasard.net>

    * regex.c: Move various #defines and enums to regex.h, since we
    need them when implementing #'skip-chars-{backward,forward}.
    * regex.c (re_wctype):
    * regex.c (re_iswctype):
    Be more robust about case insensitivity here.
    * regex.c (regex_compile):
    * regex.h:
    * regex.h (RE_ISWCTYPE_ARG_DECL):
    * regex.h (CHAR_CLASS_MAX_LENGTH):
    * search.c (skip_chars):
    Implement support for the predefined character classes in this
    function.

tests/ChangeLog addition:

2012-05-04 Aidan Kehoe <kehoea@parhasard.net>

    * automated/regexp-tests.el (equal):
    * automated/regexp-tests.el (Assert-char-class):
    Correct a stray parenthesis; add tests for the predefined
    character classes with #'skip-chars-{forward,backward}; update
    the tests to reflect some changed design decisions on my part.

man/ChangeLog addition:

2012-05-04 Aidan Kehoe <kehoea@parhasard.net>

    * lispref/searching.texi (Regular Expressions):
    * lispref/searching.texi (Syntax of Regexps):
    * lispref/searching.texi (Char Classes):
    * lispref/searching.texi (Regexp Example):
    Document the predefined character classes in this file.
author Aidan Kehoe <kehoea@parhasard.net>
date Fri, 04 May 2012 21:12:02 +0100
parents a46c5c8d6564
children 9fae6227ede5
5649:d026b665014f 5653:3df910176b6a
178 extended with some Perl--like capabilities, described in the next 178 extended with some Perl--like capabilities, described in the next
179 section. 179 section.
180 180
181 @menu 181 @menu
182 * Syntax of Regexps:: Rules for writing regular expressions. 182 * Syntax of Regexps:: Rules for writing regular expressions.
183 * Char Classes:: Predefined character classes for searching.
183 * Regexp Example:: Illustrates regular expression syntax. 184 * Regexp Example:: Illustrates regular expression syntax.
184 @end menu 185 @end menu
185 186
186 @node Syntax of Regexps 187 @node Syntax of Regexps
187 @subsection Syntax of Regular Expressions 188 @subsection Syntax of Regular Expressions
333 @samp{]}. 334 @samp{]}.
334 335
335 To include @samp{^} in a set, put it anywhere but at the beginning of 336 To include @samp{^} in a set, put it anywhere but at the beginning of
336 the set. 337 the set.
337 338
339 It is also possible to specify named character classes as part of your
340 character set; for example, @samp{[:xdigit:]} matches hexadecimal
341 digits, and @samp{[:nonascii:]} matches characters outside the basic
342 ASCII set. @xref{Char Classes}, for the full list.
343
338 @item [^ @dots{} ] 344 @item [^ @dots{} ]
339 @cindex @samp{^} in regexp 345 @cindex @samp{^} in regexp
340 @samp{[^} begins a @dfn{complement character set}, which matches any 346 @samp{[^} begins a @dfn{complement character set}, which matches any
341 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} 347 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
342 matches all characters @emph{except} letters and digits.@refill 348 matches all characters @emph{except} letters and digits.@refill
601 (re-search-forward 607 (re-search-forward
602 (concat "\\s-" (regexp-quote string) "\\s-")) 608 (concat "\\s-" (regexp-quote string) "\\s-"))
603 @end group 609 @end group
604 @end example 610 @end example
605 @end defun 611 @end defun
612
613 @node Char Classes
614 @subsection Char Classes
615
616 These are the predefined character classes available within regular
617 expression character sets, and within @samp{skip-chars-forward} and
618 @samp{skip-chars-backward} (@pxref{Skipping Characters}).
619
620 @table @samp
621 @item [:alnum:]
622 This matches any ASCII letter or digit, or any non-ASCII character
623 with word syntax.
624 @item [:alpha:]
625 This matches any ASCII letter, or any non-ASCII character with word syntax.
626 @item [:ascii:]
627 This matches any character with a numeric value below @samp{?\x80}.
628 @item [:blank:]
629 This matches space or tab.
630 @item [:cntrl:]
631 This matches any character with a numeric value below @samp{?\x20},
632 the code for space; these are the ASCII control characters.
633 @item [:digit:]
634 This matches the characters @samp{?0} to @samp{?9}, inclusive.
635 @item [:graph:]
636 This matches ``graphic'' characters, with numeric values greater than
637 @samp{?\x20}, exclusive of @samp{?\x7f}, the delete character.
638 @item [:lower:]
639 This matches minuscule characters, or any character with case
640 information if @samp{case-fold-search} is non-nil.
641 @item [:multibyte:]
642 This matches non-ASCII characters, that is, any character with a
643 numeric value above @samp{?\x7f}.
644 @item [:nonascii:]
645 This is equivalent to @samp{[:multibyte:]}.
646 @item [:print:]
647 This matches everything that @samp{[:graph:]} matches, and also the
648 space character, @samp{?\x20}.
649 @item [:punct:]
650 This matches non-control, non-alphanumeric ASCII characters, or any
651 non-ASCII character without word syntax.
652 @item [:space:]
653 This matches any character with whitespace syntax.
654 @item [:unibyte:]
655 This is a GNU Emacs extension; in XEmacs it is equivalent to
656 @samp{[:ascii:]}. Note that this means it is not equivalent to
657 @samp{"\x00-\xff"}, which one might have assumed to be the case.
658 @item [:upper:]
659 This matches majuscule characters, or any character with case
660 information if @samp{case-fold-search} is non-nil.
661 @item [:word:]
662 This matches any character with word syntax.
663 @item [:xdigit:]
664 This matches hexadecimal digits, that is, the decimal digits @samp{0-9}
665 and the letters @samp{a-f} and @samp{A-F}.
666 @end table
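
As a minimal sketch of how these classes might be used: in a regular
expression the class name sits inside a character set, which gives the
doubled brackets below.  The second form assumes that
@samp{skip-chars-forward} takes the class name as it would appear inside
a character set.  The sample strings and the helper
@samp{count-leading-digits} are hypothetical, chosen only for
illustration.

@example
@group
;; In a regexp, the class appears inside a character set.
(let ((string "color: #1a2B3c;"))
  (and (string-match "#[[:xdigit:]]+" string)
       (match-string 0 string)))
     @result{} "#1a2B3c"
@end group

@group
;; Hypothetical helper: count the decimal digits at the start of
;; STRING, using the distance that skip-chars-forward returns.
(defun count-leading-digits (string)
  "Return the number of decimal digits at the start of STRING."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (skip-chars-forward "[:digit:]")))

(count-leading-digits "2012-05-04")
     @result{} 4
@end group
@end example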
606 667
607 @node Regexp Example 668 @node Regexp Example
608 @subsection Complex Regexp Example 669 @subsection Complex Regexp Example
609 670
610 Here is a complicated regexp, used by XEmacs to recognize the end of a 671 Here is a complicated regexp, used by XEmacs to recognize the end of a