comparison man/xemacs/search.texi @ 442:abe6d1db359e r21-2-36

Import from CVS: tag r21-2-36
author cvs
date Mon, 13 Aug 2007 11:35:02 +0200
parents 376386a54a3c
children
441:72a7cfa4a488 442:abe6d1db359e
75 only if the next command you want to type is a printing character, 75 only if the next command you want to type is a printing character,
76 @key{DEL}, @key{ESC}, or another control character that is special 76 @key{DEL}, @key{ESC}, or another control character that is special
77 within searches (@kbd{C-q}, @kbd{C-w}, @kbd{C-r}, @kbd{C-s}, or @kbd{C-y}). 77 within searches (@kbd{C-q}, @kbd{C-w}, @kbd{C-r}, @kbd{C-s}, or @kbd{C-y}).
78 78
79 Sometimes you search for @samp{FOO} and find it, but were actually 79 Sometimes you search for @samp{FOO} and find it, but were actually
80 looking for a different occurance of it. To move to the next occurrence 80 looking for a different occurrence of it. To move to the next occurrence
81 of the search string, type another @kbd{C-s}. Do this as often as 81 of the search string, type another @kbd{C-s}. Do this as often as
82 necessary. If you overshoot, you can cancel some @kbd{C-s} 82 necessary. If you overshoot, you can cancel some @kbd{C-s}
83 characters with @key{DEL}. 83 characters with @key{DEL}.
84 84
85 After you exit a search, you can search for the same string again by 85 After you exit a search, you can search for the same string again by
328 @section Regular Expression Search 328 @section Regular Expression Search
329 @cindex regular expression 329 @cindex regular expression
330 @cindex regexp 330 @cindex regexp
331 331
332 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that 332 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
333 denotes a set of strings, possibly an infinite set. Searching for matches 333 denotes a (possibly infinite) set of strings. Searching for matches
334 for a regexp is a powerful operation that editors on Unix systems have 334 for a regexp is a powerful operation that editors on Unix systems have
335 traditionally offered. In XEmacs, you can search for the next match for 335 traditionally offered.
336 a regexp either incrementally or not. 336
337 To gain a thorough understanding of regular expressions and how to use
338 them to best advantage, we recommend that you study @cite{Mastering
339 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
340 1997}. (It's known as the "Hip Owls" book, because of the picture on its
341 cover.) You might also read the manuals to @ref{(gawk)Top},
342 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top},
343 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}, which
344 also make good use of regular expressions.
345
346 The XEmacs regular expression syntax most closely resembles that of
347 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU
348 @cite{regex} library. XEmacs' version of @cite{regex} has recently been
349 extended with some Perl-like capabilities, described in the next
350 section.
351
352 In XEmacs, you can search for the next match for a regexp either
353 incrementally or not.
337 354
338 @kindex M-C-s 355 @kindex M-C-s
356 @kindex M-C-r
339 @findex isearch-forward-regexp 357 @findex isearch-forward-regexp
340 @findex isearch-backward-regexp 358 @findex isearch-backward-regexp
341 Incremental search for a regexp is done by typing @kbd{M-C-s} 359 Incremental search for a regexp is done by typing @kbd{M-C-s}
342 (@code{isearch-forward-regexp}). This command reads a search string 360 (@code{isearch-forward-regexp}). This command reads a search string
343 incrementally just like @kbd{C-s}, but it treats the search string as a 361 incrementally just like @kbd{C-s}, but it treats the search string as a
344 regexp rather than looking for an exact match against the text in the 362 regexp rather than looking for an exact match against the text in the
345 buffer. Each time you add text to the search string, you make the regexp 363 buffer. Each time you add text to the search string, you make the regexp
346 longer, and the new regexp is searched for. A reverse regexp search command 364 longer, and the new regexp is searched for. A reverse regexp search command
347 @code{isearch-backward-regexp} also exists, but no key runs it. 365 @code{isearch-backward-regexp} also exists, bound to @kbd{M-C-r}.
348 366
349 All of the control characters that do special things within an ordinary 367 All of the control characters that do special things within an ordinary
350 incremental search have the same functionality in incremental regexp search. 368 incremental search have the same functionality in incremental regexp search.
351 Typing @kbd{C-s} or @kbd{C-r} immediately after starting a search 369 Typing @kbd{C-s} or @kbd{C-r} immediately after starting a search
352 retrieves the last incremental search regexp used: 370 retrieves the last incremental search regexp used:
356 @findex re-search-backward 374 @findex re-search-backward
357 Non-incremental search for a regexp is done by the functions 375 Non-incremental search for a regexp is done by the functions
358 @code{re-search-forward} and @code{re-search-backward}. You can invoke 376 @code{re-search-forward} and @code{re-search-backward}. You can invoke
359 them with @kbd{M-x} or bind them to keys. You can also call 377 them with @kbd{M-x} or bind them to keys. You can also call
360 @code{re-search-forward} by way of incremental regexp search with 378 @code{re-search-forward} by way of incremental regexp search with
361 @kbd{M-C-s @key{RET}}. 379 @kbd{M-C-s @key{RET}}; similarly for @code{re-search-backward} with
380 @kbd{M-C-r @key{RET}}.
362 381
363 @node Regexps, Search Case, Regexp Search, Search 382 @node Regexps, Search Case, Regexp Search, Search
364 @section Syntax of Regular Expressions 383 @section Syntax of Regular Expressions
365 384
366 Regular expressions have a syntax in which a few characters are special 385 Regular expressions have a syntax in which a few characters are
367 constructs and the rest are @dfn{ordinary}. An ordinary character is a 386 special constructs and the rest are @dfn{ordinary}. An ordinary
368 simple regular expression which matches that character and nothing else. 387 character is a simple regular expression that matches that character and
369 The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*}, 388 nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
370 @samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}; no new special 389 @samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
371 characters will be defined. Any other character appearing in a regular 390 special characters will be defined in the future. Any other character
372 expression is ordinary, unless a @samp{\} precedes it.@refill 391 appearing in a regular expression is ordinary, unless a @samp{\}
392 precedes it.
373 393
374 For example, @samp{f} is not a special character, so it is ordinary, and 394 For example, @samp{f} is not a special character, so it is ordinary, and
375 therefore @samp{f} is a regular expression that matches the string @samp{f} 395 therefore @samp{f} is a regular expression that matches the string
376 and no other string. (It does @i{not} match the string @samp{ff}.) Likewise, 396 @samp{f} and no other string. (It does @emph{not} match the string
377 @samp{o} is a regular expression that matches only @samp{o}.@refill 397 @samp{ff}.) Likewise, @samp{o} is a regular expression that matches
398 only @samp{o}.@refill
378 399
379 Any two regular expressions @var{a} and @var{b} can be concatenated. The 400 Any two regular expressions @var{a} and @var{b} can be concatenated. The
380 result is a regular expression which matches a string if @var{a} matches 401 result is a regular expression that matches a string if @var{a} matches
381 some amount of the beginning of that string and @var{b} matches the rest of 402 some amount of the beginning of that string and @var{b} matches the rest of
382 the string.@refill 403 the string.@refill
383 404
384 As a simple example, you can concatenate the regular expressions @samp{f} 405 As a simple example, we can concatenate the regular expressions @samp{f}
385 and @samp{o} to get the regular expression @samp{fo}, which matches only 406 and @samp{o} to get the regular expression @samp{fo}, which matches only
386 the string @samp{fo}. To do something nontrivial, you 407 the string @samp{fo}. Still trivial. To do something more powerful, you
387 need to use one of the following special characters: 408 need to use one of the special characters. Here is a list of them:
388 409
410 @need 1200
389 @table @kbd 411 @table @kbd
390 @item .@: @r{(Period)} 412 @item .@: @r{(Period)}
413 @cindex @samp{.} in regexp
391 is a special character that matches any single character except a newline. 414 is a special character that matches any single character except a newline.
392 Using concatenation, you can make regular expressions like @samp{a.b}, which 415 Using concatenation, we can make regular expressions like @samp{a.b}, which
393 matches any three-character string which begins with @samp{a} and ends with 416 matches any three-character string that begins with @samp{a} and ends with
394 @samp{b}.@refill 417 @samp{b}.@refill
395 418
396 @item * 419 @item *
397 is not a construct by itself; it is a suffix, which means the 420 @cindex @samp{*} in regexp
398 preceding regular expression is to be repeated as many times as 421 is not a construct by itself; it is a quantifying suffix operator that
422 means to repeat the preceding regular expression as many times as
399 possible. In @samp{fo*}, the @samp{*} applies to the @samp{o}, so 423 possible. In @samp{fo*}, the @samp{*} applies to the @samp{o}, so
400 @samp{fo*} matches one @samp{f} followed by any number of @samp{o}s. 424 @samp{fo*} matches one @samp{f} followed by any number of @samp{o}s.
401 The case of zero @samp{o}s is allowed: @samp{fo*} does match 425 The case of zero @samp{o}s is allowed: @samp{fo*} does match
402 @samp{f}.@refill 426 @samp{f}.@refill
403 427
404 @samp{*} always applies to the @i{smallest} possible preceding 428 @samp{*} always applies to the @emph{smallest} possible preceding
405 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a 429 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
406 repeating @samp{fo}.@refill 430 repeating @samp{fo}.@refill
407 431
408 The matcher processes a @samp{*} construct by immediately matching 432 The matcher processes a @samp{*} construct by matching, immediately, as
409 as many repetitions as it can find. Then it continues with the rest 433 many repetitions as can be found; it is "greedy". Then it continues
410 of the pattern. If that fails, backtracking occurs, discarding some 434 with the rest of the pattern. If that fails, backtracking occurs,
411 of the matches of the @samp{*}-modified construct in case that makes 435 discarding some of the matches of the @samp{*}-modified construct in
412 it possible to match the rest of the pattern. For example, matching 436 case that makes it possible to match the rest of the pattern. For
413 @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first 437 example, in matching @samp{ca*ar} against the string @samp{caaar}, the
414 tries to match all three @samp{a}s; but the rest of the pattern is 438 @samp{a*} first tries to match all three @samp{a}s; but the rest of the
415 @samp{ar} and there is only @samp{r} left to match, so this try fails. 439 pattern is @samp{ar} and there is only @samp{r} left to match, so this
416 The next alternative is for @samp{a*} to match only two @samp{a}s. 440 try fails. The next alternative is for @samp{a*} to match only two
417 With this choice, the rest of the regexp matches successfully.@refill 441 @samp{a}s. With this choice, the rest of the regexp matches
442 successfully.@refill
443
444 Nested repetition operators can be extremely slow if they specify
445 backtracking loops. For example, it could take hours for the regular
446 expression @samp{\(x+y*\)*a} to match the sequence
447 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}. The slowness is because
448 Emacs must try each imaginable way of grouping the 35 @samp{x}'s before
449 concluding that none of them can work. To make sure your regular
450 expressions run fast, check nested repetitions carefully.
418 451
419 @item + 452 @item +
420 is a suffix character similar to @samp{*} except that it requires that 453 @cindex @samp{+} in regexp
421 the preceding expression be matched at least once. For example, 454 is a quantifying suffix operator similar to @samp{*} except that the
422 @samp{ca+r} will match the strings @samp{car} and @samp{caaaar} 455 preceding expression must match at least once. It is also "greedy".
423 but not the string @samp{cr}, whereas @samp{ca*r} would match all 456 So, for example, @samp{ca+r} matches the strings @samp{car} and
424 three strings.@refill 457 @samp{caaaar} but not the string @samp{cr}, whereas @samp{ca*r} matches
458 all three strings.
425 459
426 @item ? 460 @item ?
427 is a suffix character similar to @samp{*} except that it can match the 461 @cindex @samp{?} in regexp
428 preceding expression either once or not at all. For example, 462 is a quantifying suffix operator similar to @samp{*}, except that the
429 @samp{ca?r} will match @samp{car} or @samp{cr}; nothing else. 463 preceding expression can match either once or not at all. For example,
464 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
465 else.
466
467 @item *?
468 @cindex @samp{*?} in regexp
469 works just like @samp{*}, except that rather than matching the longest
470 match, it matches the shortest match. @samp{*?} is known as a
471 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl.
472 @c Did perl get this from somewhere? What's the real history of *? ?
473
474 This construct is very useful for when you want to match the text inside
475 a pair of delimiters. For instance, @samp{/\*.*?\*/} will match C
476 comments in a string. This could not easily be achieved without the use
477 of a non-greedy quantifier.
478
479 This construct has not been available prior to XEmacs 20.4. It is not
480 available in FSF Emacs.
481
482 @item +?
483 @cindex @samp{+?} in regexp
484 is the non-greedy version of @samp{+}.
485
486 @item ??
487 @cindex @samp{??} in regexp
488 is the non-greedy version of @samp{?}.
489
490 @item \@{n,m\@}
491 @c Note the spacing after the close brace is deliberate.
492 @cindex @samp{\@{n,m\@} }in regexp
493 serves as an interval quantifier, analogous to @samp{*} or @samp{+}, but
494 specifies that the expression must match at least @var{n} times, but no
495 more than @var{m} times. This syntax is supported by most Unix regexp
496 utilities, and has been introduced to XEmacs for the version 20.3.
497
498 Unfortunately, the non-greedy version of this quantifier does not exist
499 currently, although it does in Perl.
430 500
431 @item [ @dots{} ] 501 @item [ @dots{} ]
502 @cindex character set (in regexp)
503 @cindex @samp{[} in regexp
504 @cindex @samp{]} in regexp
432 @samp{[} begins a @dfn{character set}, which is terminated by a 505 @samp{[} begins a @dfn{character set}, which is terminated by a
433 @samp{]}. In the simplest case, the characters between the two form 506 @samp{]}. In the simplest case, the characters between the two brackets
434 the set. Thus, @samp{[ad]} matches either one @samp{a} or one 507 form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
435 @samp{d}, and @samp{[ad]*} matches any string composed of just 508 @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
436 @samp{a}s and @samp{d}s (including the empty string), from which it 509 and @samp{d}s (including the empty string), from which it follows that
437 follows that @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr}, 510 @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
438 @samp{caddaar}, etc.@refill 511 @samp{caddaar}, etc.@refill
439 512
440 You can include character ranges in a character set by writing two 513 The usual regular expression special characters are not special inside a
514 character set. A completely different set of special characters exists
515 inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
516
517 @samp{-} is used for ranges of characters. To write a range, write two
441 characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any 518 characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
442 lower-case letter. Ranges may be intermixed freely with individual 519 lower case letter. Ranges may be intermixed freely with individual
443 characters, as in @samp{[a-z$%.]}, which matches any lower-case letter 520 characters, as in @samp{[a-z$%.]}, which matches any lower case letter
444 or @samp{$}, @samp{%}, or period. 521 or @samp{$}, @samp{%}, or a period.@refill
445 @refill 522
446 523 To include a @samp{]} in a character set, make it the first character.
447 Note that inside a character set the usual special characters are not 524 For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a
448 special any more. A completely different set of special characters 525 @samp{-}, write @samp{-} as the first character in the set, or put it
449 exists inside character sets: @samp{]}, @samp{-}, and @samp{^}.@refill 526 immediately after a range. (You can replace one individual character
450 527 @var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
451 To include a @samp{]} in a character set, you must make it the first 528 @samp{-}.) There is no way to write a set containing just @samp{-} and
452 character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To 529 @samp{]}.
453 include a @samp{-}, write @samp{---}, which is a range containing only 530
454 @samp{-}. To include @samp{^}, make it other than the first character 531 To include @samp{^} in a set, put it anywhere but at the beginning of
455 in the set.@refill 532 the set.
456 533
457 @item [^ @dots{} ] 534 @item [^ @dots{} ]
535 @cindex @samp{^} in regexp
458 @samp{[^} begins a @dfn{complement character set}, which matches any 536 @samp{[^} begins a @dfn{complement character set}, which matches any
459 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} 537 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
460 matches all characters @i{except} letters and digits.@refill 538 matches all characters @emph{except} letters and digits.@refill
461 539
462 @samp{^} is not special in a character set unless it is the first 540 @samp{^} is not special in a character set unless it is the first
463 character. The character following the @samp{^} is treated as if it 541 character. The character following the @samp{^} is treated as if it
464 were first (@samp{-} and @samp{]} are not special there). 542 were first (thus, @samp{-} and @samp{]} are not special there).
465 543
466 Note that a complement character set can match a newline, unless 544 Note that a complement character set can match a newline, unless
467 newline is mentioned as one of the characters not to match. 545 newline is mentioned as one of the characters not to match.
468 546
469 @item ^ 547 @item ^
470 is a special character that matches the empty string, but only if at 548 @cindex @samp{^} in regexp
471 the beginning of a line in the text being matched. Otherwise, it fails 549 @cindex beginning of line in regexp
472 to match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs 550 is a special character that matches the empty string, but only at the
473 at the beginning of a line. 551 beginning of a line in the text being matched. Otherwise it fails to
552 match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
553 the beginning of a line.
554
555 When matching a string instead of a buffer, @samp{^} matches at the
556 beginning of the string or after a newline character @samp{\n}.
474 557
475 @item $ 558 @item $
559 @cindex @samp{$} in regexp
476 is similar to @samp{^} but matches only at the end of a line. Thus, 560 is similar to @samp{^} but matches only at the end of a line. Thus,
477 @samp{xx*$} matches a string of one @samp{x} or more at the end of a line. 561 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
562
563 When matching a string instead of a buffer, @samp{$} matches at the end
564 of the string or before a newline character @samp{\n}.
478 565
479 @item \ 566 @item \
480 does two things: it quotes the special characters (including 567 @cindex @samp{\} in regexp
568 has two functions: it quotes the special characters (including
481 @samp{\}), and it introduces additional special constructs. 569 @samp{\}), and it introduces additional special constructs.
482 570
483 Because @samp{\} quotes special characters, @samp{\$} is a regular 571 Because @samp{\} quotes special characters, @samp{\$} is a regular
484 expression that matches only @samp{$}, and @samp{\[} is a regular 572 expression that matches only @samp{$}, and @samp{\[} is a regular
485 expression that matches only @samp{[}, and so on.@refill 573 expression that matches only @samp{[}, and so on.
574
575 @c Removed a paragraph here in lispref about doubling backslashes inside
576 @c of Lisp strings.
577
486 @end table 578 @end table
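
The quantifier and character-set rules in the table above can be checked with a short script. This is only a sketch in Python, not XEmacs code: it assumes Python's @code{re} module behaves like the XEmacs matcher for these particular constructs, and it uses Python's spelling, in which the interval quantifier is written @samp{@{n,m@}} without backslashes. All variable names are made up for illustration.

```python
import re

# Greedy '*' with backtracking: 'a*' first grabs all three a's, then gives
# one back so that the trailing 'ar' can still match.
backtracked = re.fullmatch(r"c(a*)ar", "caaar").group(1)

# '+' needs at least one repetition; '*' and '?' also accept zero.
words = ("car", "caaaar", "cr")
plus_matches = [w for w in words if re.fullmatch(r"ca+r", w)]
star_matches = [w for w in words if re.fullmatch(r"ca*r", w)]
opt_matches = [w for w in words if re.fullmatch(r"ca?r", w)]

# Interval quantifier (Emacs writes this as ca\{2,3\}r).
interval_matches = [w for w in ("car", "caar", "caaar", "caaaar")
                    if re.fullmatch(r"ca{2,3}r", w)]

# Greedy versus non-greedy matching of C comments.
source = "/* one */ x = 1; /* two */"
greedy = re.findall(r"/\*.*\*/", source)       # one long match
non_greedy = re.findall(r"/\*.*?\*/", source)  # one match per comment

# Character sets: [ad]* repeats over a's and d's; ']' first is literal.
set_matches = [w for w in ("cr", "car", "cdr", "caddaar", "cbr")
               if re.fullmatch(r"c[ad]*r", w)]
bracket_first = re.findall(r"[]a]", "a]b")
```

The greedy pattern swallows everything from the first @samp{/*} to the last @samp{*/}, which is why the non-greedy form is the one to reach for when matching delimited text.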
487 579
488 Note: for historical compatibility, special characters are treated as 580 @strong{Please note:} For historical compatibility, special characters
489 ordinary ones if they are in contexts where their special meanings make no 581 are treated as ordinary ones if they are in contexts where their special
490 sense. For example, @samp{*foo} treats @samp{*} as ordinary since there is 582 meanings make no sense. For example, @samp{*foo} treats @samp{*} as
491 no preceding expression on which the @samp{*} can act. It is poor practice 583 ordinary since there is no preceding expression on which the @samp{*}
492 to depend on this behavior; better to quote the special character anyway, 584 can act. It is poor practice to depend on this behavior; quote the
493 regardless of where is appears.@refill 585 special character anyway, regardless of where it appears.@refill
494 586
495 Usually, @samp{\} followed by any character matches only 587 For the most part, @samp{\} followed by any character matches only
496 that character. However, there are several exceptions: characters 588 that character. However, there are several exceptions: characters
497 which, when preceded by @samp{\}, are special constructs. Such 589 that, when preceded by @samp{\}, are special constructs. Such
498 characters are always ordinary when encountered on their own. Here 590 characters are always ordinary when encountered on their own. Here
499 is a table of @samp{\} constructs. 591 is a table of @samp{\} constructs:
500 592
501 @table @kbd 593 @table @kbd
502 @item \| 594 @item \|
595 @cindex @samp{|} in regexp
596 @cindex regexp alternative
503 specifies an alternative. 597 specifies an alternative.
504 Two regular expressions @var{a} and @var{b} with @samp{\|} in 598 Two regular expressions @var{a} and @var{b} with @samp{\|} in
505 between form an expression that matches anything @var{a} or 599 between form an expression that matches anything that either @var{a} or
506 @var{b} matches.@refill 600 @var{b} matches.@refill
507 601
508 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar} 602 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
509 but no other string.@refill 603 but no other string.@refill
510 604
513 @samp{\|}.@refill 607 @samp{\|}.@refill
514 608
515 Full backtracking capability exists to handle multiple uses of @samp{\|}. 609 Full backtracking capability exists to handle multiple uses of @samp{\|}.
516 610
517 @item \( @dots{} \) 611 @item \( @dots{} \)
612 @cindex @samp{(} in regexp
613 @cindex @samp{)} in regexp
614 @cindex regexp grouping
518 is a grouping construct that serves three purposes: 615 is a grouping construct that serves three purposes:
519 616
520 @enumerate 617 @enumerate
521 @item 618 @item
522 To enclose a set of @samp{\|} alternatives for other operations. 619 To enclose a set of @samp{\|} alternatives for other operations.
523 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}. 620 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
524 621
525 @item 622 @item
526 To enclose a complicated expression for the postfix @samp{*} to operate on. 623 To enclose an expression for a suffix operator such as @samp{*} to act
527 Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any (zero or 624 on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
528 more) number of @samp{na} strings.@refill 625 (zero or more) number of @samp{na} strings.@refill
529 626
530 @item 627 @item
531 To mark a matched substring for future reference. 628 To record a matched substring for future reference.
532
533 @end enumerate 629 @end enumerate
534 630
535 This last application is not a consequence of the idea of a 631 This last application is not a consequence of the idea of a
536 parenthetical grouping; it is a separate feature which happens to be 632 parenthetical grouping; it is a separate feature that happens to be
537 assigned as a second meaning to the same @samp{\( @dots{} \)} construct 633 assigned as a second meaning to the same @samp{\( @dots{} \)} construct
538 because in practice there is no conflict between the two meanings. 634 because there is no conflict in practice between the two meanings.
539 Here is an explanation: 635 Here is an explanation of this feature:
540 636
541 @item \@var{digit} 637 @item \@var{digit}
542 after the end of a @samp{\( @dots{} \)} construct, the matcher remembers the 638 matches the same text that matched the @var{digit}th occurrence of a
543 beginning and end of the text matched by that construct. Then, later on
544 in the regular expression, you can use @samp{\} followed by @var{digit}
545 to mean ``match the same text matched the @var{digit}'th time by the
546 @samp{\( @dots{} \)} construct.''@refill
547
548 The strings matching the first nine @samp{\( @dots{} \)} constructs appearing
549 in a regular expression are assigned numbers 1 through 9 in order that the
550 open-parentheses appear in the regular expression. @samp{\1} through
551 @samp{\9} may be used to refer to the text matched by the corresponding
552 @samp{\( @dots{} \)} construct. 639 @samp{\( @dots{} \)} construct.
640
641 In other words, after the end of a @samp{\( @dots{} \)} construct, the
642 matcher remembers the beginning and end of the text matched by that
643 construct. Then, later on in the regular expression, you can use
644 @samp{\} followed by @var{digit} to match that same text, whatever it
645 may have been.
646
647 The strings matching the first nine @samp{\( @dots{} \)} constructs
648 appearing in a regular expression are assigned numbers 1 through 9 in
649 the order that the open parentheses appear in the regular expression.
650 So you can use @samp{\1} through @samp{\9} to refer to the text matched
651 by the corresponding @samp{\( @dots{} \)} constructs.
553 652
554 For example, @samp{\(.*\)\1} matches any newline-free string that is 653 For example, @samp{\(.*\)\1} matches any newline-free string that is
555 composed of two identical halves. The @samp{\(.*\)} matches the first 654 composed of two identical halves. The @samp{\(.*\)} matches the first
556 half, which may be anything, but the @samp{\1} that follows must match 655 half, which may be anything, but the @samp{\1} that follows must match
557 the same exact text. 656 the same exact text.
558 657
658 @item \(?: @dots{} \)
659 @cindex @samp{\(?:} in regexp
660 @cindex regexp grouping
661 is called a @dfn{shy} grouping operator, and it is used just like
662 @samp{\( @dots{} \)}, except that it does not cause the matched
663 substring to be recorded for future reference.
664
665 This is useful when you need a lot of grouping @samp{\( @dots{} \)}
666 constructs, but only want to remember one or two -- or if you have
667 more than nine groupings and need to use backreferences to refer to
668 the groupings at the end.
669
670 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you
671 don't need the captured substrings ought to speed up your programs some,
672 since it shortens the code path followed by the regular expression
673 engine, as well as the amount of memory allocation and string copying it
674 must do. The actual performance gain to be observed has not been
675 measured or quantified as of this writing.
676 @c This is used to good advantage by the font-locking code, and by
677 @c `regexp-opt.el'.
678
679 The shy grouping operator has been borrowed from Perl, and has not been
680 available prior to XEmacs 20.3, nor is it available in FSF Emacs.
681
682 @item \w
683 @cindex @samp{\w} in regexp
684 matches any word-constituent character. The editor syntax table
685 determines which characters these are. @xref{Syntax}.
686
687 @item \W
688 @cindex @samp{\W} in regexp
689 matches any character that is not a word constituent.
690
691 @item \s@var{code}
692 @cindex @samp{\s} in regexp
693 matches any character whose syntax is @var{code}. Here @var{code} is a
694 character that represents a syntax code: thus, @samp{w} for word
695 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
696 etc. @xref{Syntax}, for a list of syntax codes and the characters that
697 stand for them.
698
699 @item \S@var{code}
700 @cindex @samp{\S} in regexp
701 matches any character whose syntax is not @var{code}.
702 @end table
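
Alternation, grouping, backreferences and shy groups from the table above can be tried out as follows. Again this is a hedged sketch in Python rather than XEmacs itself: Python writes these constructs without the backslashes (@samp{(foo|bar)x} where XEmacs has @samp{\(foo\|bar\)x}), and the variable names are invented for the example.

```python
import re

# Grouped alternation: Emacs \(foo\|bar\)x
alt_matches = [w for w in ("foox", "barx", "foo", "bazx")
               if re.fullmatch(r"(foo|bar)x", w)]

# A group as the operand of '*': Emacs ba\(na\)*
banana_matches = [w for w in ("ba", "bana", "bananana", "ban")
                  if re.fullmatch(r"ba(na)*", w)]

# Backreference: Emacs \(.*\)\1 matches two identical halves.
halves = [w for w in ("abab", "abcabc", "abca")
          if re.fullmatch(r"(.*)\1", w)]

# Shy group: (?: ... ) groups without recording, so only the x's are
# captured and they are numbered 1.
shy = re.compile(r"(?:foo|bar)(x+)")
recording = re.compile(r"(foo|bar)(x+)")
shy_groups = shy.fullmatch("fooxx").groups()
group_counts = (shy.groups, recording.groups)
```

With the shy form the interesting text stays in group 1 no matter how many purely structural groups precede it, which is the situation described above for patterns that would otherwise run past nine groupings.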
703
704 The following regular expression constructs match the empty string---that is,
705 they don't use up any characters---but whether they match depends on the
706 context.
707
708 @table @kbd
559 @item \` 709 @item \`
560 matches the empty string, provided it is at the beginning 710 @cindex @samp{\`} in regexp
561 of the buffer. 711 matches the empty string, but only at the beginning
712 of the buffer or string being matched against.
562 713
563 @item \' 714 @item \'
564 matches the empty string, provided it is at the end of 715 @cindex @samp{\'} in regexp
565 the buffer. 716 matches the empty string, but only at the end of
717 the buffer or string being matched against.
718
719 @item \=
720 @cindex @samp{\=} in regexp
721 matches the empty string, but only at point.
722 (This construct is not defined when matching against a string.)
566 723
567 @item \b 724 @item \b
568 matches the empty string, provided it is at the beginning or 725 @cindex @samp{\b} in regexp
726 matches the empty string, but only at the beginning or
569 end of a word. Thus, @samp{\bfoo\b} matches any occurrence of 727 end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
570 @samp{foo} as a separate word. @samp{\bballs?\b} matches 728 @samp{foo} as a separate word. @samp{\bballs?\b} matches
571 @samp{ball} or @samp{balls} as a separate word.@refill 729 @samp{ball} or @samp{balls} as a separate word.@refill
572 730
573 @item \B 731 @item \B
574 matches the empty string, provided it is @i{not} at the beginning or 732 @cindex @samp{\B} in regexp
733 matches the empty string, but @emph{not} at the beginning or
575 end of a word. 734 end of a word.
576 735
577 @item \< 736 @item \<
578 matches the empty string, provided it is at the beginning of a word. 737 @cindex @samp{\<} in regexp
738 matches the empty string, but only at the beginning of a word.
579 739
580 @item \> 740 @item \>
581 matches the empty string, provided it is at the end of a word. 741 @cindex @samp{\>} in regexp
582 742 matches the empty string, but only at the end of a word.
583 @item \w
584 matches any word-constituent character. The editor syntax table
585 determines which characters these are.
586
587 @item \W
588 matches any character that is not a word-constituent.
589
590 @item \s@var{code}
591 matches any character whose syntax is @var{code}. @var{code} is a
592 character which represents a syntax code: thus, @samp{w} for word
593 constituent, @samp{-} for whitespace, @samp{(} for open-parenthesis,
594 etc. @xref{Syntax}.@refill
595
596 @item \S@var{code}
597 matches any character whose syntax is not @var{code}.
598 @end table 743 @end table
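
The empty-string constructs can be illustrated the same way. This sketch is Python, not XEmacs, and the correspondence is only approximate: Python has @samp{\b} and @samp{\B}, spells @samp{\`} as @samp{\A}, needs the @code{re.MULTILINE} flag before @samp{^} and @samp{$} match at newlines inside a string, and has no equivalent of @samp{\<}, @samp{\>} or @samp{\=}. Word boundaries here follow Python's notion of a word character rather than an editor syntax table.

```python
import re

text = "foo food ball balls football"

# \b: only 'foo' standing as a separate word, not the 'foo' in 'food'.
whole_foo = re.findall(r"\bfoo\b", text)

# \bballs?\b: 'ball' or 'balls' as separate words, not inside 'football'.
ball_words = re.findall(r"\bballs?\b", text)

# \B: 'ball' only where it does NOT start a word (inside 'football').
inner_ball = [m.start() for m in re.finditer(r"\Bball", text)]

# Start of string versus start of line when matching a string.
two_lines = "foo\nfoo"
at_string_start = len(re.findall(r"\Afoo", two_lines))
at_line_starts = len(re.findall(r"^foo", two_lines, re.MULTILINE))

# '$' at the end of each line: Emacs x+$
line_end_runs = re.findall(r"x+$", "axx\nbx", re.MULTILINE)
```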
599 744
600 Here is a complicated regexp used by Emacs to recognize the end of a 745 Here is a complicated regexp used by Emacs to recognize the end of a
601 sentence together with any whitespace that follows. It is given in Lisp 746 sentence together with any whitespace that follows. It is given in Lisp
602 syntax to enable you to distinguish the spaces from the tab characters. In 747 syntax to enable you to distinguish the spaces from the tab characters. In