comparison man/lispref/searching.texi @ 280:7df0dd720c89 r21-0b38
Import from CVS: tag r21-0b38
author | cvs |
---|---|
date | Mon, 13 Aug 2007 10:32:22 +0200 |
parents | 084402c475ba |
children | c9fe270a4101 |
comparison
279:c20b2fb5bb0a | 280:7df0dd720c89 |
---|---|
171 also make good use of regular expressions. | 171 also make good use of regular expressions. |
172 | 172 |
173 The XEmacs regular expression syntax most closely resembles that of | 173 The XEmacs regular expression syntax most closely resembles that of |
174 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU | 174 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU |
175 @cite{regex} library. XEmacs' version of @cite{regex} has recently been | 175 @cite{regex} library. XEmacs' version of @cite{regex} has recently been |
176 extended with some perl--like capabilities, described in the next | 176 extended with some Perl--like capabilities, described in the next |
177 section. | 177 section. |
178 | 178 |
179 @menu | 179 @menu |
180 * Syntax of Regexps:: Rules for writing regular expressions. | 180 * Syntax of Regexps:: Rules for writing regular expressions. |
181 * Regexp Example:: Illustrates regular expression syntax. | 181 * Regexp Example:: Illustrates regular expression syntax. |
267 else. | 267 else. |
268 | 268 |
269 @item *? | 269 @item *? |
270 @cindex @samp{*?} in regexp | 270 @cindex @samp{*?} in regexp |
271 works just like @samp{*}, except that rather than matching the longest | 271 works just like @samp{*}, except that rather than matching the longest |
272 match, it matches the shortest match. This is known as a "non-greedy" | 272 match, it matches the shortest match. @samp{*?} is known as a |
273 quantifier. It is a syntax that comes to us from perl. It is very | 273 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl. |
274 useful for situations where you want to match the text inside a pair of | |
275 delimiters. | |
276 @c Did perl get this from somewhere? What's the real history of *? ? | 274 @c Did perl get this from somewhere? What's the real history of *? ? |
277 | 275 |
278 @lisp | 276 This construct is very useful when you want to match the text inside a |
279 @group | 277 pair of delimiters. For instance, @samp{/\*.*?\*/} will match C |
280 (setq s "/ blah / / blah2 /") | 278 comments in a string. This could not be achieved without the use of a |
281 @result{} "/ blah / / blah2 /" | 279 non-greedy quantifier. |
282 (string-match "/.*/" s) | 280 |
283 @result{} 0 | 281 This construct has not been available prior to XEmacs 20.4. It is not |
284 (match-string 0 s) | 282 available in FSF Emacs. |
285 @result{} "/ blah / / blah2 /" | |
286 (string-match "/.*?/" s) | |
287 @result{} 0 | |
288 (match-string 0 s) | |
289 @result{} "/ blah /" | |
290 @end group | |
291 @end lisp | |
292 | 283 |
293 @item +? | 284 @item +? |
294 @cindex @samp{+?} in regexp | 285 @cindex @samp{+?} in regexp |
295 is the @samp{+} analog to @samp{*?}. | 286 is the @samp{+} analog to @samp{*?}. |
296 | 287 |
297 @item \@{n,m\@} | 288 @item \@{n,m\@} |
298 @c Note the spacing after the close brace is deliberate. | 289 @c Note the spacing after the close brace is deliberate. |
299 @cindex @samp{\@{n,m\@} }in regexp | 290 @cindex @samp{\@{n,m\@} }in regexp |
300 this is an interval quantifier, which is analogous to @samp{*} or | 291 serves as an interval quantifier, analogous to @samp{*} or @samp{+}, but |
301 @samp{+}, but specifies that the expression must match at least @samp{n} | 292 specifies that the expression must match at least @var{n} times, but no |
302 times, but no more than @samp{m} times. This syntax comes to us from | 293 more than @var{m} times. This syntax is supported by most Unix regexp |
303 @cite{ed}, @cite{grep}, and @cite{perl}. The @cite{etags} utility also | 294 utilities, and was introduced in XEmacs version 20.3. |
304 supports it. | |
305 | |
306 @lisp | |
307 @group | |
308 (setq s "12 123 1234 12345") | |
309 @result{} "12 123 1234 12345" | |
310 (string-match "[0-9]\\@{2,4\\@}" s) | |
311 @result{} 0 | |
312 (match-string 0 s) | |
313 @result{} "12" | |
314 (string-match "[0-9]\\@{3,4\\@}" s) | |
315 @result{} 3 | |
316 (match-string 0 s) | |
317 @result{} "123" | |
318 @end group | |
319 @end lisp | |
320 | 295 |
321 @item [ @dots{} ] | 296 @item [ @dots{} ] |
322 @cindex character set (in regexp) | 297 @cindex character set (in regexp) |
323 @cindex @samp{[} in regexp | 298 @cindex @samp{[} in regexp |
324 @cindex @samp{]} in regexp | 299 @cindex @samp{]} in regexp |
480 the same exact text. | 455 the same exact text. |
481 | 456 |
482 @item \(?: @dots{} \) | 457 @item \(?: @dots{} \) |
483 @cindex @samp{(?:} in regex | 458 @cindex @samp{(?:} in regex |
484 @cindex regexp grouping | 459 @cindex regexp grouping |
485 is called a "shy" grouping operator, and it is used just like @samp{\( | 460 is called a @dfn{shy} grouping operator, and it is used just like |
486 @dots{} \)}, except that it does not cause the matched substring to be | 461 @samp{\( @dots{} \)}, except that it does not cause the matched |
487 recorded for future reference. This can be useful at times when a | 462 substring to be recorded for future reference. |
488 program wants to refer to a specific @samp{\( @dots{} \)} group's number | 463 |
489 (eg. in a @code{match-string} or @code{match-beginning} function | 464 This is useful when you need a lot of grouping @samp{\( @dots{} \)} |
490 application) and you need to use grouping constructs for an alternation | 465 constructs, but only want to remember one or two. Then you can use |
491 or multi--character repetition inside a regular expression string that | 466 shy groups for the ones you do not want to remember for later use with @code{match-string}. |
492 can change each time the code is run, but you don't want those groups | |
493 counting because they'd change the reference number of the group you | |
494 want to refer to that is inside the static part of your generated | |
495 regular expression. | |
496 | |
497 @lisp | |
498 ;; @r{Here `dynamic-regex' might contain shy groups.} | |
499 (re-search-forward | |
500 (concat "\\(" dynamic-regex "\\)\\(-?[0-9]\\@{2,4\\@}\\)")) | |
501 ;; @r{and this `match-string' will still refer to the integer} | |
502 ;; @r{captured by the second group in the `concat' string.} | |
503 (match-string 2) | |
504 @end lisp | |
505 | 467 |
506 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you | 468 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you |
507 don't need the captured substrings ought to speed up your programs some, | 469 don't need the captured substrings ought to speed up your programs some, |
508 since it shortens the code path followed by the regular expression | 470 since it shortens the code path followed by the regular expression |
509 engine, as well as the amount of memory allocation and string copying it | 471 engine, as well as the amount of memory allocation and string copying it |
510 must do. The actual performance gain to be observed has not been | 472 must do. The actual performance gain to be observed has not been |
511 measured or quantified as of this writing. | 473 measured or quantified as of this writing. |
512 @c This is used to good advantage by the font-locking code, and by `regexp-opt.el'. | 474 @c This is used to good advantage by the font-locking code, and by |
513 @c ... It will be. It's not yet, but will be. | 475 @c `regexp-opt.el'. ... It will be. It's not yet, but will be. |
476 | |
477 The shy grouping operator has been borrowed from Perl, and has not been | |
478 available prior to XEmacs 20.3, nor is it available in FSF Emacs. | |
514 | 479 |
515 @item \w | 480 @item \w |
516 @cindex @samp{\w} in regexp | 481 @cindex @samp{\w} in regexp |
517 matches any word-constituent character. The editor syntax table | 482 matches any word-constituent character. The editor syntax table |
518 determines which characters these are. @xref{Syntax Tables}. | 483 determines which characters these are. @xref{Syntax Tables}. |
786 @end group | 751 @end group |
787 | 752 |
788 @group | 753 @group |
789 (match-end 0) | 754 (match-end 0) |
790 @result{} 32 | 755 @result{} 32 |
756 @end group | |
757 @end example | |
758 @end defun | |
759 | |
760 @defun split-string string &optional pattern | |
761 This function splits @var{string} into substrings delimited by | |
762 @var{pattern}, and returns a list of substrings. If @var{pattern} is | |
763 omitted, it defaults to @samp{[ \f\t\n\r\v]+}, which means that it | |
764 splits @var{string} by white--space. | |
765 | |
766 @example | |
767 @group | |
768 (split-string "foo bar") | |
769 @result{} ("foo" "bar") | |
770 @end group | |
771 | |
772 @group | |
773 (split-string "something") | |
774 @result{} ("something") | |
775 @end group | |
776 | |
777 @group | |
778 (split-string "a:b:c" ":") | |
779 @result{} ("a" "b" "c") | |
780 @end group | |
781 | |
782 @group | |
783 (split-string ":a::b:c" ":") | |
784 @result{} ("" "a" "" "b" "c") | |
791 @end group | 785 @end group |
792 @end example | 786 @end example |
793 @end defun | 787 @end defun |
794 | 788 |
795 @defun looking-at regexp | 789 @defun looking-at regexp |