comparison man/lispref/searching.texi @ 371:cc15677e0335 r21-2b1

Import from CVS: tag r21-2b1
author cvs
date Mon, 13 Aug 2007 11:03:08 +0200
parents a4f53d9b3154
children 6240c7796c7a
370:bd866891f083 371:cc15677e0335
157 @cindex regexp 157 @cindex regexp
158 158
159 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that 159 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
160 denotes a (possibly infinite) set of strings. Searching for matches for 160 denotes a (possibly infinite) set of strings. Searching for matches for
161 a regexp is a very powerful operation. This section explains how to write 161 a regexp is a very powerful operation. This section explains how to write
162 regexps; the following section says how to search using them. 162 regexps; the following section says how to search for them.
163 163
164 To gain a thorough understanding of regular expressions and how to use 164 To gain a thorough understanding of regular expressions and how to use
165 them to best advantage, we recommend that you study @cite{Mastering 165 them to best advantage, we recommend that you study @cite{Mastering
166 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates, 166 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
167 1997}. (It's known as the "Hip Owls" book, because of the picture on its 167 1997}. (It's known as the "Hip Owls" book, because of the picture on its
168 cover.) You might also read the manuals to @ref{(gawk)Top}, 168 cover.) You might also read the manuals to @ref{(gawk)Top},
169 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top}, 169 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top},
170 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}. All 170 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}, which
171 of these programs and libraries make effective use of regular 171 also make good use of regular expressions.
172 expressions.
173 172
174 The XEmacs regular expression syntax most closely resembles that of 173 The XEmacs regular expression syntax most closely resembles that of
175 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU 174 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU
176 @cite{regex} library. XEmacs' version of @cite{regex} has recently been 175 @cite{regex} library. XEmacs' version of @cite{regex} has recently been
177 extended with some Perl--like capabilities, which are described in the 176 extended with some Perl--like capabilities, described in the next
178 next section. 177 section.
179 178
180 @menu 179 @menu
181 * Syntax of Regexps:: Rules for writing regular expressions. 180 * Syntax of Regexps:: Rules for writing regular expressions.
182 * Regexp Example:: Illustrates regular expression syntax. 181 * Regexp Example:: Illustrates regular expression syntax.
183 @end menu 182 @end menu
262 261
263 @item ? 262 @item ?
264 @cindex @samp{?} in regexp 263 @cindex @samp{?} in regexp
265 is a quantifying suffix operator similar to @samp{*}, except that the 264 is a quantifying suffix operator similar to @samp{*}, except that the
266 preceding expression can match either once or not at all. For example, 265 preceding expression can match either once or not at all. For example,
267 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything 266 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
268 else. 267 else.
269 268
270 @item *? 269 @item *?
271 @cindex @samp{*?} in regexp 270 @cindex @samp{*?} in regexp
272 works just like @samp{*}, except that rather than matching the longest 271 works just like @samp{*}, except that rather than matching the longest
273 match, it matches the shortest match. @samp{*?} is known as a 272 match, it matches the shortest match. @samp{*?} is known as a
274 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl. 273 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl.
275 @c Did perl get this from somewhere? What's the real history of *? ? 274 @c Did perl get this from somewhere? What's the real history of *? ?
276 275
277 This construct is very useful for when you want to match the text inside 276 This construct is very useful for when you want to match the text inside a
278 a pair of delimiters. For instance, @samp{/\*.*?\*/} will match C 277 pair of delimiters. For instance, @samp{/\*.*?\*/} will match C
279 comments in a string. This could not be so elegantly achieved without 278 comments in a string. This could not be achieved without the use of
280 the use of a non-greedy quantifier. 279 a non-greedy quantifier.
281 280
282 This construct has not been available prior to XEmacs 20.4. It is not 281 This construct has not been available prior to XEmacs 20.4. It is not
283 available in FSF Emacs. 282 available in FSF Emacs.
284 283
285 @item +? 284 @item +?
454 composed of two identical halves. The @samp{\(.*\)} matches the first 453 composed of two identical halves. The @samp{\(.*\)} matches the first
455 half, which may be anything, but the @samp{\1} that follows must match 454 half, which may be anything, but the @samp{\1} that follows must match
456 the same exact text. 455 the same exact text.
457 456
458 @item \(?: @dots{} \) 457 @item \(?: @dots{} \)
459 @cindex @samp{(?:} in regexp 458 @cindex @samp{\(?:} in regexp
460 @cindex regexp grouping 459 @cindex regexp grouping
461 is called a @dfn{shy} grouping operator, and it is used just like 460 is called a @dfn{shy} grouping operator, and it is used just like
462 @samp{\( @dots{} \)}, except that it does not cause the match 461 @samp{\( @dots{} \)}, except that it does not cause the matched
463 substring to be recorded for future reference. 462 substring to be recorded for future reference.
464 463
465 This is useful when you need to use a lot of nested grouping @samp{\( 464 This is useful when you need a lot of grouping @samp{\( @dots{} \)}
466 @dots{} \)} constructs to express complex alternation, but only want to 465 constructs, but only want to remember one or two. Then you can use
467 memoize, or capture, one or two of the subexpression matches. Since 466 @samp{\(?: @dots{} \)} for the groups you do not want to remember for later use with @code{match-string}.
468 @samp{\(?: @dots{} \)} doesn't capture a sub-match, it also doesn't need 467
469 to be counted when you count @samp{\( @dots{} \)} groups to figure the 468 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you
470 @samp{match-string} index. That turns out to be a very convenient 469 don't need the captured substrings ought to speed up your programs some,
471 characteristic. 470 since it shortens the code path followed by the regular expression
472 471 engine, as well as the amount of memory allocation and string copying it
473 This situation occurs where parts of a regular expression have been 472 must do. The actual performance gain to be observed has not been
474 automatically generated by a program that builds them from lists of 473 measured or quantified as of this writing.
475 strings, and the static code following the matching operation must 474 @c This is used to good advantage by the font-locking code, and by
476 access a specific match number. Here's an example that shows this. 475 @c `regexp-opt.el'. ... It will be. It's not yet, but will be.
477
478 We will assume that @code{(require 'regexp-opt)} has been executed
479 already, to ensure that @file{regexp-opt.el}, which is part of the
480 @code{xemacs-devel} package, is loaded.
481 @ifinfo
482 Please evaluate that @code{require} expression now, using @kbd{C-x C-e},
483 if you intend to try the following example.
484 @end ifinfo
485 In a real program, let's pretend that @var{varnames} would be a list of
486 strings holding the names of some variables extracted somehow from the
487 text of a program source you are editing and running this function on.
488 For the purposes of this illustration, we can just bind it in the
489 @code{let*} expression.
490
491 @example
492 @group
493 (let* ((varnames '("k" "n" "i" "j" "varname"))
494        (keys-regexp (regexp-opt
495                      (mapcar #'symbol-name
496                              '(if then else elif
497                                case in of do while
498                                with for next unless
499                                cond begin end))))
500        (varname-regexp (regexp-opt varnames))
501        (contrived-regexp (concat "\\(" keys-regexp "\\)"
502                                  "\\s-(\\s-\\("
503                                  varname-regexp
504                                  "\\)\\s-)"))
505        (keyname "")
506        (varname ""))
507   ;; @r{In the body of this particular defun, we:}
508   (re-search-forward contrived-regexp nil t)
509   ;; @r{@dots{} and it finds a match. Now we want to extract the}
510   ;; @r{text that it matched on, and save it into @code{keyname}}
511   ;; @r{and @code{varname}.}
512   (setq keyname (match-string 1)
513         varname (match-string 2))
514   ;; @r{@dots{} and then do something with those values.}
515   (list keyname varname))
516
517 ;; @r{Here's something for it to match, so you can try it with}
518 ;; @kbd{C-x C-e}
519 ;; while ( j ) do ...
520 @end group
521 @end example
522
523 Here you should see that if the regular expression returned by
524 @code{regexp-opt} did not use @samp{\(?: @dots{} \)} for grouping, and
525 instead used @samp{\( @dots{} \)}, it would be necessary to count the
526 number of opening parentheses in the @code{keys-regexp} and to use that
527 figure to calculate which match number is matched by the
528 @code{varname-regexp}. It is much more convenient to be able to just
529 ask for the second match string.
530
531 @c This is used to good advantage by the font-locking code....
532 @c ... It will be. It's not yet, but will be.
533 476
534 The shy grouping operator has been borrowed from Perl, and has not been 477 The shy grouping operator has been borrowed from Perl, and has not been
535 available prior to XEmacs 20.3, nor is it available in FSF Emacs. 478 available prior to XEmacs 20.3, nor is it available in FSF Emacs.
536 479
537 @item \w 480 @item \w
697 @cindex regexp searching 640 @cindex regexp searching
698 @cindex searching for regexp 641 @cindex searching for regexp
699 642
700 In XEmacs, you can search for the next match for a regexp either 643 In XEmacs, you can search for the next match for a regexp either
701 incrementally or not. Incremental search commands are described in the 644 incrementally or not. Incremental search commands are described in the
702 @cite{The XEmacs Lisp Reference Manual}. @xref{Regexp Search, , Regular 645 @cite{The XEmacs Reference Manual}. @xref{Regexp Search, , Regular Expression
703 Expression Search, xemacs, The XEmacs Lisp Reference Manual}. Here we 646 Search, emacs, The XEmacs Reference Manual}. Here we describe only the search
704 describe only the search functions useful in programs. The principal 647 functions useful in programs. The principal one is
705 one is @code{re-search-forward}. 648 @code{re-search-forward}.
706 649
707 @deffn Command re-search-forward regexp &optional limit noerror repeat 650 @deffn Command re-search-forward regexp &optional limit noerror repeat
708 This function searches forward in the current buffer for a string of 651 This function searches forward in the current buffer for a string of
709 text that is matched by the regular expression @var{regexp}. The 652 text that is matched by the regular expression @var{regexp}. The
710 function skips over any amount of text that is not matched by 653 function skips over any amount of text that is not matched by
1151 This function returns the position of the start of text matched by the 1094 This function returns the position of the start of text matched by the
1152 last regular expression searched for, or a subexpression of it. 1095 last regular expression searched for, or a subexpression of it.
1153 1096
1154 If @var{count} is zero, then the value is the position of the start of 1097 If @var{count} is zero, then the value is the position of the start of
1155 the entire match. Otherwise, @var{count} specifies a subexpression in 1098 the entire match. Otherwise, @var{count} specifies a subexpression in
1156 the regular expression, and the value of the function is the starting 1099 the regular expression, and the value of the function is the starting
1157 position of the match for that subexpression. 1100 position of the match for that subexpression.
1158 1101
1159 The value is @code{nil} for a subexpression inside a @samp{\|} 1102 The value is @code{nil} for a subexpression inside a @samp{\|}
1160 alternative that wasn't used in the match. 1103 alternative that wasn't used in the match.
1161 @end defun 1104 @end defun