comparison man/lispref/searching.texi @ 371:cc15677e0335 r21-2b1
Import from CVS: tag r21-2b1
author | cvs |
---|---|
date | Mon, 13 Aug 2007 11:03:08 +0200 |
parents | a4f53d9b3154 |
children | 6240c7796c7a |
370:bd866891f083 | 371:cc15677e0335 |
---|---|
157 @cindex regexp | 157 @cindex regexp |
158 | 158 |
159 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that | 159 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that |
160 denotes a (possibly infinite) set of strings. Searching for matches for | 160 denotes a (possibly infinite) set of strings. Searching for matches for |
161 a regexp is a very powerful operation. This section explains how to write | 161 a regexp is a very powerful operation. This section explains how to write |
162 regexps; the following section says how to search using them. | 162 regexps; the following section says how to search for them. |
163 | 163 |
164 To gain a thorough understanding of regular expressions and how to use | 164 To gain a thorough understanding of regular expressions and how to use |
165 them to best advantage, we recommend that you study @cite{Mastering | 165 them to best advantage, we recommend that you study @cite{Mastering |
166 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates, | 166 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates, |
167 1997}. (It's known as the "Hip Owls" book, because of the picture on its | 167 1997}. (It's known as the "Hip Owls" book, because of the picture on its |
168 cover.) You might also read the manuals to @ref{(gawk)Top}, | 168 cover.) You might also read the manuals to @ref{(gawk)Top}, |
169 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top}, | 169 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top}, |
170 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}. All | 170 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}, which |
171 of these programs and libraries make effective use of regular | 171 also make good use of regular expressions. |
172 expressions. | |
173 | 172 |
174 The XEmacs regular expression syntax most closely resembles that of | 173 The XEmacs regular expression syntax most closely resembles that of |
175 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU | 174 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU |
176 @cite{regex} library. XEmacs' version of @cite{regex} has recently been | 175 @cite{regex} library. XEmacs' version of @cite{regex} has recently been |
177 extended with some Perl--like capabilities, which are described in the | 176 extended with some Perl--like capabilities, described in the next |
178 next section. | 177 section. |
179 | 178 |
180 @menu | 179 @menu |
181 * Syntax of Regexps:: Rules for writing regular expressions. | 180 * Syntax of Regexps:: Rules for writing regular expressions. |
182 * Regexp Example:: Illustrates regular expression syntax. | 181 * Regexp Example:: Illustrates regular expression syntax. |
183 @end menu | 182 @end menu |
262 | 261 |
263 @item ? | 262 @item ? |
264 @cindex @samp{?} in regexp | 263 @cindex @samp{?} in regexp |
265 is a quantifying suffix operator similar to @samp{*}, except that the | 264 is a quantifying suffix operator similar to @samp{*}, except that the |
266 preceding expression can match either once or not at all. For example, | 265 preceding expression can match either once or not at all. For example, |
267 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything | 266 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anyhing |
268 else. | 267 else. |
269 | 268 |
270 @item *? | 269 @item *? |
271 @cindex @samp{*?} in regexp | 270 @cindex @samp{*?} in regexp |
272 works just like @samp{*}, except that rather than matching the longest | 271 works just like @samp{*}, except that rather than matching the longest |
273 match, it matches the shortest match. @samp{*?} is known as a | 272 match, it matches the shortest match. @samp{*?} is known as a |
274 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl. | 273 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl. |
275 @c Did perl get this from somewhere? What's the real history of *? ? | 274 @c Did perl get this from somewhere? What's the real history of *? ? |
276 | 275 |
277 This construct is very useful for when you want to match the text inside | 276 This construct very useful for when you want to match the text inside a |
278 a pair of delimiters. For instance, @samp{/\*.*?\*/} will match C | 277 pair of delimiters. For instance, @samp{/\*.*?\*/} will match C |
279 comments in a string. This could not be so elegantly achieved without | 278 comments in a string. This could not be achieved without the use of |
280 the use of a non-greedy quantifier. | 279 greedy quantifier. |
281 | 280 |
282 This construct has not been available prior to XEmacs 20.4. It is not | 281 This construct has not been available prior to XEmacs 20.4. It is not |
283 available in FSF Emacs. | 282 available in FSF Emacs. |
284 | 283 |
285 @item +? | 284 @item +? |
454 composed of two identical halves. The @samp{\(.*\)} matches the first | 453 composed of two identical halves. The @samp{\(.*\)} matches the first |
455 half, which may be anything, but the @samp{\1} that follows must match | 454 half, which may be anything, but the @samp{\1} that follows must match |
456 the same exact text. | 455 the same exact text. |
457 | 456 |
458 @item \(?: @dots{} \) | 457 @item \(?: @dots{} \) |
459 @cindex @samp{(?:} in regexp | 458 @cindex @samp{\(?:} in regexp |
460 @cindex regexp grouping | 459 @cindex regexp grouping |
461 is called a @dfn{shy} grouping operator, and it is used just like | 460 is called a @dfn{shy} grouping operator, and it is used just like |
462 @samp{\( @dots{} \)}, except that it does not cause the match | 461 @samp{\( @dots{} \)}, except that it does not cause the matched |
463 substring to be recorded for future reference. | 462 substring to be recorded for future reference. |
464 | 463 |
465 This is useful when you need to use a lot of nested grouping @samp{\( | 464 This is useful when you need a lot of grouping @samp{\( @dots{} \)} |
466 @dots{} \)} constructs to express complex alternation, but only want to | 465 constructs, but only want to remember one or two. Then you can use |
467 memoize, or capture, one or two of the subexpression matches. Since | 466 not want to remember them for later use with @code{match-string}. |
468 @samp{\(?: @dots{} \)} doesn't capture a sub-match, it also doesn't need | 467 |
469 to be counted when you count @samp{\( @dots{} \)} groups to figure the | 468 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you |
470 @samp{match-string} index. That turns out to be a very convenient | 469 don't need the captured substrings ought to speed up your programs some, |
471 characteristic. | 470 since it shortens the code path followed by the regular expression |
472 | 471 engine, as well as the amount of memory allocation and string copying it |
473 This situation occurs where parts of a regular expression have been | 472 must do. The actual performance gain to be observed has not been |
474 automaticly generated by a program that builds them from lists of | 473 measured or quantified as of this writing. |
475 strings, and the static code following the matching operation must | 474 @c This is used to good advantage by the font-locking code, and by |
476 access a specific match number. Here's an example that shows this. | 475 @c `regexp-opt.el'. ... It will be. It's not yet, but will be. |
477 | |
478 We will assume that @code{(require 'regexp-opt)} has been executed | |
479 already, to ensure that @file{regexp-opt.el}, which is part of the | |
480 @code{xemacs-devel} package, is loaded. | |
481 @ifinfo | |
482 Please evaluate that @code{require} expression now, using @kbd{C-x C-e}, | |
483 if you intend to try the following example. | |
484 @end ifinfo | |
485 In a real program, lets pretend that @var{varnames} would be a list of | |
486 strings holding the names of some variables extracted somehow from the | |
487 text of a program source you are editing and running this function on. | |
488 For the purposes of this illustration, we can just bind it in the | |
489 @code{let*} expression. | |
490 | |
491 @example | |
492 @group | |
493 (let* ((varnames '("k" "n" "i" "j" "varname")) | |
494 (keys-regexp (regexp-opt | |
495 (mapcar #'symbol-name | |
496 '(if then else elif | |
497 case in of do while | |
498 with for next unless | |
499 cond begin end)))) | |
500 (varname-regexp (regexp-opt varnames)) | |
501 (contrived-regexp (concat "\\(" keys-regexp "\\)" | |
502 "\\s-(\\s-\\(" | |
503 varname-regexp | |
504 "\\)\\s-)")) | |
505 (keyname "") | |
506 (varname "")) | |
507 ;; @r{In the body of this particular defun, we:} | |
508 (re-search-forward contrived-regexp nil t) | |
509 ;; @r{@dots{} and it finds a match. Now we want to extract the} | |
510 ;; @r{text that it matched on, and save it into @code{keyname}} | |
511 ;; @r{and @code{varname}.} | |
512 (setq keyname (match-string 1) | |
513 varname (match-string 2)) | |
514 ;; @r{@dots{} and then do something with those values.} | |
515 (list keyname varname)) | |
516 | |
517 ;; @r{Here's something for it to match, so you can try it with} | |
518 ;; @kbd{C-x C-e} | |
519 ;; while ( j ) do ... | |
520 @end group | |
521 @end example | |
522 | |
523 Here you should see that if the regular expression returned by | |
524 @code{regexp-opt} did not use @samp{\(?: @dots{} \)} for grouping, and | |
525 instead used @samp{\( @dots{} \)}, it would be necessary to count the | |
526 number of opening parentheses in the @code{keys-regexp} and to use that | |
527 figure to calculate which match number is matched by the | |
528 @code{varname-regexp}. It is much more convenient to be able to just | |
529 ask for the second match string. | |
530 | |
531 @c This is used to good advantage by the font-locking code.... | |
532 @c ... It will be. It's not yet, but will be. | |
533 | 476 |
534 The shy grouping operator has been borrowed from Perl, and has not been | 477 The shy grouping operator has been borrowed from Perl, and has not been |
535 available prior to XEmacs 20.3, nor is it available in FSF Emacs. | 478 available prior to XEmacs 20.3, nor is it available in FSF Emacs. |
536 | 479 |
537 @item \w | 480 @item \w |
697 @cindex regexp searching | 640 @cindex regexp searching |
698 @cindex searching for regexp | 641 @cindex searching for regexp |
699 | 642 |
700 In XEmacs, you can search for the next match for a regexp either | 643 In XEmacs, you can search for the next match for a regexp either |
701 incrementally or not. Incremental search commands are described in the | 644 incrementally or not. Incremental search commands are described in the |
702 @cite{The XEmacs Lisp Reference Manual}. @xref{Regexp Search, , Regular | 645 @cite{The XEmacs Reference Manual}. @xref{Regexp Search, , Regular Expression |
703 Expression Search, xemacs, The XEmacs Lisp Reference Manual}. Here we | 646 Search, emacs, The XEmacs Reference Manual}. Here we describe only the search |
704 describe only the search functions useful in programs. The principal | 647 functions useful in programs. The principal one is |
705 one is @code{re-search-forward}. | 648 @code{re-search-forward}. |
706 | 649 |
707 @deffn Command re-search-forward regexp &optional limit noerror repeat | 650 @deffn Command re-search-forward regexp &optional limit noerror repeat |
708 This function searches forward in the current buffer for a string of | 651 This function searches forward in the current buffer for a string of |
709 text that is matched by the regular expression @var{regexp}. The | 652 text that is matched by the regular expression @var{regexp}. The |
710 function skips over any amount of text that is not matched by | 653 function skips over any amount of text that is not matched by |
1151 This function returns the position of the start of text matched by the | 1094 This function returns the position of the start of text matched by the |
1152 last regular expression searched for, or a subexpression of it. | 1095 last regular expression searched for, or a subexpression of it. |
1153 | 1096 |
1154 If @var{count} is zero, then the value is the position of the start of | 1097 If @var{count} is zero, then the value is the position of the start of |
1155 the entire match. Otherwise, @var{count} specifies a subexpression in | 1098 the entire match. Otherwise, @var{count} specifies a subexpression in |
1156 the regular expression, and the value of the function is the starting | 1099 the regular expresion, and the value of the function is the starting |
1157 position of the match for that subexpression. | 1100 position of the match for that subexpression. |
1158 | 1101 |
1159 The value is @code{nil} for a subexpression inside a @samp{\|} | 1102 The value is @code{nil} for a subexpression inside a @samp{\|} |
1160 alternative that wasn't used in the match. | 1103 alternative that wasn't used in the match. |
1161 @end defun | 1104 @end defun |
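
The two Perl-style constructs whose descriptions change in this comparison, the non-greedy `*?` quantifier and the shy group `\(?: ... \)`, can be tried out with a minimal Emacs Lisp sketch like the one below. The function names are hypothetical and exist only for this illustration; they are not part of XEmacs or of either revision. The sketch assumes a regexp engine that supports both constructs (the text above gives XEmacs 20.4 and 20.3 as the respective minimum versions).

```elisp
;; Sketch only: these helper names are hypothetical, not XEmacs APIs.

(defun sketch-first-c-comment (text &optional greedy)
  "Return the first C comment found in TEXT, or nil if there is none.
With GREEDY non-nil, use `.*' instead of the non-greedy `.*?'."
  (let ((regexp (if greedy "/\\*.*\\*/" "/\\*.*?\\*/")))
    (when (string-match regexp text)
      (match-string 0 text))))

(defun sketch-keyword-variable (text)
  "Return the name in parentheses after `if' or `while' in TEXT, or nil."
  ;; The keyword alternation uses \(?: ... \), so it records nothing
  ;; and the name is subexpression 1.
  (when (string-match "\\(?:if\\|while\\)[ \t]*([ \t]*\\([a-z]+\\)" text)
    (match-string 1 text)))

(defun sketch-keyword-variable-capturing (text)
  "Like `sketch-keyword-variable', but with an ordinary \\( ... \\) group.
Return a list of the keyword and the name, or nil."
  ;; Here the keyword group is recorded as subexpression 1, which
  ;; pushes the name to subexpression 2.
  (when (string-match "\\(if\\|while\\)[ \t]*([ \t]*\\([a-z]+\\)" text)
    (list (match-string 1 text) (match-string 2 text))))
```

With the non-greedy form, `sketch-first-c-comment` stops at the first `*/`; with GREEDY set, the match runs on to the last `*/` on the line. The two `sketch-keyword-variable` functions differ only in the kind of group around the keywords, which is enough to shift the `match-string` index of the name from 1 to 2.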