comparison man/lispref/searching.texi @ 314:341dac730539 r21-0b55
Import from CVS: tag r21-0b55
author | cvs |
---|---|
date | Mon, 13 Aug 2007 10:44:22 +0200 |
parents | 70ad99077275 |
children | 512e409c26a2 |
313:2905de29931f | 314:341dac730539 |
---|---|
157 @cindex regexp | 157 @cindex regexp |
158 | 158 |
159 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that | 159 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that |
160 denotes a (possibly infinite) set of strings. Searching for matches for | 160 denotes a (possibly infinite) set of strings. Searching for matches for |
161 a regexp is a very powerful operation. This section explains how to write | 161 a regexp is a very powerful operation. This section explains how to write |
162 regexps; the following section says how to search for them. | 162 regexps; the following section says how to search using them. |
163 | 163 |
164 To gain a thorough understanding of regular expressions and how to use | 164 To gain a thorough understanding of regular expressions and how to use |
165 them to best advantage, we recommend that you study @cite{Mastering | 165 them to best advantage, we recommend that you study @cite{Mastering |
166 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates, | 166 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates, |
167 1997}. (It's known as the "Hip Owls" book, because of the picture on its | 167 1997}. (It's known as the "Hip Owls" book, because of the picture on its |
168 cover.) You might also read the manuals to @ref{(gawk)Top}, | 168 cover.) You might also read the manuals to @ref{(gawk)Top}, |
169 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top}, | 169 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top}, |
170 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}, which | 170 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}. All |
171 also make good use of regular expressions. | 171 of these programs and libraries make effective use of regular |
172 expressions. | |
172 | 173 |
173 The XEmacs regular expression syntax most closely resembles that of | 174 The XEmacs regular expression syntax most closely resembles that of |
174 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU | 175 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU |
175 @cite{regex} library. XEmacs' version of @cite{regex} has recently been | 176 @cite{regex} library. XEmacs' version of @cite{regex} has recently been |
176 extended with some Perl--like capabilities, described in the next | 177 extended with some Perl--like capabilities, which are described in the |
177 section. | 178 next section. |
178 | 179 |
179 @menu | 180 @menu |
180 * Syntax of Regexps:: Rules for writing regular expressions. | 181 * Syntax of Regexps:: Rules for writing regular expressions. |
181 * Regexp Example:: Illustrates regular expression syntax. | 182 * Regexp Example:: Illustrates regular expression syntax. |
182 @end menu | 183 @end menu |
261 | 262 |
262 @item ? | 263 @item ? |
263 @cindex @samp{?} in regexp | 264 @cindex @samp{?} in regexp |
264 is a quantifying suffix operator similar to @samp{*}, except that the | 265 is a quantifying suffix operator similar to @samp{*}, except that the |
265 preceding expression can match either once or not at all. For example, | 266 preceding expression can match either once or not at all. For example, |
266 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anyhing | 267 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything |
267 else. | 268 else. |
268 | 269 |
269 @item *? | 270 @item *? |
270 @cindex @samp{*?} in regexp | 271 @cindex @samp{*?} in regexp |
271 works just like @samp{*}, except that rather than matching the longest | 272 works just like @samp{*}, except that rather than matching the longest |
272 match, it matches the shortest match. @samp{*?} is known as a | 273 match, it matches the shortest match. @samp{*?} is known as a |
273 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl. | 274 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl. |
274 @c Did perl get this from somewhere? What's the real history of *? ? | 275 @c Did perl get this from somewhere? What's the real history of *? ? |
275 | 276 |
276 This construct very useful for when you want to match the text inside a | 277 This construct is very useful for when you want to match the text inside |
277 pair of delimiters. For instance, @samp{/\*.*?\*/} will match C | 278 a pair of delimiters. For instance, @samp{/\*.*?\*/} will match C |
278 comments in a string. This could not be achieved without the use of | 279 comments in a string. This could not be so elegantly achieved without |
279 greedy quantifier. | 280 the use of a nongreedy quantifier. |
280 | 281 |
281 This construct has not been available prior to XEmacs 20.4. It is not | 282 This construct has not been available prior to XEmacs 20.4. It is not |
282 available in FSF Emacs. | 283 available in FSF Emacs. |
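
Editor's illustration of the greedy versus non-greedy behaviour described in the new text. This is a minimal sketch, assuming an Emacs build that supports @samp{*?}; the function name `my-comment-matches` and the sample string are hypothetical and are not part of the changeset. Since @samp{.} does not match a newline, the sketch only covers comments that fit on one line.

```elisp
;; Hypothetical helper (not from the manual): match the same comment
;; regexp twice, once greedy and once non-greedy, and return both
;; matched substrings for comparison.
(defun my-comment-matches (text)
  "Return a list (GREEDY NON-GREEDY) of C comment matches in TEXT."
  (list
   ;; Greedy: `.*' extends to the last "*/" on the line.
   (progn (string-match "/\\*.*\\*/" text)
          (match-string 0 text))
   ;; Non-greedy: `.*?' stops at the first "*/".
   (progn (string-match "/\\*.*?\\*/" text)
          (match-string 0 text))))

(my-comment-matches "a = 1; /* first */ b = 2; /* second */")
;; => ("/* first */ b = 2; /* second */" "/* first */")
```

With the greedy form, two comments on one line are swallowed as a single match, which is the case the revised sentence about elegance has in mind.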
283 | 284 |
284 @item +? | 285 @item +? |
453 composed of two identical halves. The @samp{\(.*\)} matches the first | 454 composed of two identical halves. The @samp{\(.*\)} matches the first |
454 half, which may be anything, but the @samp{\1} that follows must match | 455 half, which may be anything, but the @samp{\1} that follows must match |
455 the same exact text. | 456 the same exact text. |
456 | 457 |
457 @item \(?: @dots{} \) | 458 @item \(?: @dots{} \) |
458 @cindex @samp{\(?:} in regexp | 459 @cindex @samp{(?:} in regexp |
459 @cindex regexp grouping | 460 @cindex regexp grouping |
460 is called a @dfn{shy} grouping operator, and it is used just like | 461 is called a @dfn{shy} grouping operator, and it is used just like |
461 @samp{\( @dots{} \)}, except that it does not cause the matched | 462 @samp{\( @dots{} \)}, except that it does not cause the match |
462 substring to be recorded for future reference. | 463 substring to be recorded for future reference. |
463 | 464 |
464 This is useful when you need a lot of grouping @samp{\( @dots{} \)} | 465 This is useful when you need to use a lot of nested grouping @samp{\( |
465 constructs, but only want to remember one or two. Then you can use | 466 @dots{} \)} constructs to express complex alternation, but only want to |
466 not want to remember them for later use with @code{match-string}. | 467 memoize, or capture, one or two of the subexpression matches. Since |
467 | 468 @samp{\(?: @dots{} \)} doesn't capture a submatch, it also doesn't need |
468 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you | 469 to be counted when you count @samp{\( @dots{} \)} groups to figure the |
469 don't need the captured substrings ought to speed up your programs some, | 470 @samp{match-string} index. That turns out to be a very convenient |
470 since it shortens the code path followed by the regular expression | 471 characteristic. |
471 engine, as well as the amount of memory allocation and string copying it | 472 |
472 must do. The actual performance gain to be observed has not been | 473 This situation occurs where parts of a regular expression have been |
473 measured or quantified as of this writing. | 474 automatically generated by a program that builds them from lists of |
474 @c This is used to good advantage by the font-locking code, and by | 475 strings, and the static code following the matching operation must |
475 @c `regexp-opt.el'. ... It will be. It's not yet, but will be. | 476 access a specific match number. Here's an example that shows this: |
477 | |
478 @example | |
479 @group | |
480 ;; Assume that: | |
481 (require 'regexp-opt) ;; gets executed at toplevel | |
482 ;;; `regexp-opt.el' is part of the "xemacs-devel" package. | |
483 ;; ... and that VARNAMES is a list of strings holding the name of some | |
484 ;; variables extracted from the program source you are editing and | |
485 ;; running this function on. For this example, it will just be bound | |
486 ;; in the let* expression. | |
487 (let* ((varnames '("k" "n" "i" "j" "varname")) | |
488 (keys-regexp (regexp-opt | |
489 (mapcar #'symbol-name | |
490 '(if then else elif | |
491 case in of do while | |
492 with for next unless | |
493 cond begin end)))) | |
494 (varname-regexp (regexp-opt varnames)) | |
495 (contrived-regexp (concat "\\(" keys-regexp "\\)" | |
496 "\\s-(\\s-\\(" | |
497 varname-regexp | |
498 "\\)\\s-)")) | |
499 (keyname "") | |
500 (varname "")) | |
501 ;; In the body of this particular defun, we: | |
502 (re-search-forward contrived-regexp nil t) | |
503 ;; ... and it finds a match. Now we want to extract the text that | |
504 ;; it matched on, and save it into KEYNAME and VARNAME. | |
505 (setq keyname (match-string 1) | |
506 varname (match-string 2)) | |
507 ;; ... and then do something with those values. | |
508 (list keyname varname)) | |
509 | |
510 ;; Here's something for it to match, so you can try it with `C-x C-e'. | |
511 ;; while ( j ) do ... | |
512 @end group | |
513 @end example | |
514 | |
515 Here you can see that if the regular expression returned by | |
516 @samp{regexp-opt} did not use @samp{\(?: @dots{} \)} for grouping, and | |
517 instead used @samp{\( @dots{} \)}, it would be necessary to count the | |
518 number of opening parentheses in the @samp{keys-regexp} and to use that | |
519 figure to calculate which match number is matched by the | |
520 @code{varname-regexp}. It is much more convenient to be able to just | |
521 ask for the second match string. | |
522 | |
523 @c This is used to good advantage by the font-locking code.... | |
524 @c ... It will be. It's not yet, but will be. | |
476 | 525 |
477 The shy grouping operator has been borrowed from Perl, and has not been | 526 The shy grouping operator has been borrowed from Perl, and has not been |
478 available prior to XEmacs 20.3, nor is it available in FSF Emacs. | 527 available prior to XEmacs 20.3, nor is it available in FSF Emacs. |
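
Editor's illustration of the group-numbering point made in the new text. This is a minimal sketch, assuming an Emacs build that supports @samp{\(?: @dots{} \)}; the function `my-second-group` and the hand-written fragments passed to it are hypothetical stand-ins for a machine-generated regexp, not output of @samp{regexp-opt}.

```elisp
;; Hypothetical helper (not from the manual): wrap GENERATED in group 1,
;; follow it with a captured lowercase name, and return whatever ends up
;; in group 2.
(defun my-second-group (generated text)
  "Match GENERATED, spaces, then a name in TEXT; return match group 2."
  (let ((regexp (concat "\\(" generated "\\) +\\([a-z]+\\)")))
    (when (string-match regexp text)
      (match-string 2 text))))

;; The generated part uses a capturing group, so it takes number 2 and
;; pushes the name to group 3.
(my-second-group "wh\\(ile\\|en\\)" "while j")
;; => "ile"

;; The generated part uses a shy group, so the name stays in group 2.
(my-second-group "wh\\(?:ile\\|en\\)" "while j")
;; => "j"
```

The calling code is identical in both cases; only the grouping style inside the generated fragment decides whether the fixed index 2 still points at the name.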
479 | 528 |
480 @item \w | 529 @item \w |