comparison man/lispref/searching.texi @ 280:7df0dd720c89 r21-0b38

Import from CVS: tag r21-0b38
author cvs
date Mon, 13 Aug 2007 10:32:22 +0200
parents 084402c475ba
children c9fe270a4101
279:c20b2fb5bb0a 280:7df0dd720c89
171 also make good use of regular expressions. 171 also make good use of regular expressions.
172 172
173 The XEmacs regular expression syntax most closely resembles that of 173 The XEmacs regular expression syntax most closely resembles that of
174 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU 174 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU
175 @cite{regex} library. XEmacs' version of @cite{regex} has recently been 175 @cite{regex} library. XEmacs' version of @cite{regex} has recently been
176 extended with some perl--like capabilities, described in the next 176 extended with some Perl--like capabilities, described in the next
177 section. 177 section.
178 178
179 @menu 179 @menu
180 * Syntax of Regexps:: Rules for writing regular expressions. 180 * Syntax of Regexps:: Rules for writing regular expressions.
181 * Regexp Example:: Illustrates regular expression syntax. 181 * Regexp Example:: Illustrates regular expression syntax.
267 else. 267 else.
268 268
269 @item *? 269 @item *?
270 @cindex @samp{*?} in regexp 270 @cindex @samp{*?} in regexp
271 works just like @samp{*}, except that rather than matching the longest 271 works just like @samp{*}, except that rather than matching the longest
272 match, it matches the shortest match. This is known as a "non-greedy" 272 match, it matches the shortest match. @samp{*?} is known as a
273 quantifier. It is a syntax that comes to us from perl. It is very 273 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl.
274 useful for situations where you want to match the text inside a pair of
275 delimiters.
276 @c Did perl get this from somewhere? What's the real history of *? ? 274 @c Did perl get this from somewhere? What's the real history of *? ?
277 275
278 @lisp 276 This construct is very useful when you want to match the text inside a
279 @group 277 pair of delimiters. For instance, @samp{/\*.*?\*/} will match C
280 (setq s "/ blah / / blah2 /") 278 comments in a string. This could not be achieved without the use of
281 @result{} "/ blah / / blah2 /" 279 a non-greedy quantifier.
282 (string-match "/.*/" s) 280
283 @result{} 0 281 This construct was not available prior to XEmacs 20.4. It is not
284 (match-string 0 s) 282 available in FSF Emacs.
285 @result{} "/ blah / / blah2 /"
286 (string-match "/.*?/" s)
287 @result{} 0
288 (match-string 0 s)
289 @result{} "/ blah /"
290 @end group
291 @end lisp
292 283
293 @item +? 284 @item +?
294 @cindex @samp{+?} in regexp 285 @cindex @samp{+?} in regexp
295 is the @samp{+} analog to @samp{*?}. 286 is the @samp{+} analog to @samp{*?}.
296 287
297 @item \@{n,m\@} 288 @item \@{n,m\@}
298 @c Note the spacing after the close brace is deliberate. 289 @c Note the spacing after the close brace is deliberate.
299 @cindex @samp{\@{n,m\@} }in regexp 290 @cindex @samp{\@{n,m\@} }in regexp
300 this is an interval quantifier, which is analogous to @samp{*} or 291 serves as an interval quantifier, analogous to @samp{*} or @samp{+}, but
301 @samp{+}, but specifies that the expression must match at least @samp{n} 292 specifies that the expression must match at least @var{n} times, but no
302 times, but no more than @samp{m} times. This syntax comes to us from 293 more than @var{m} times. This syntax is supported by most Unix regexp
303 @cite{ed}, @cite{grep}, and @cite{perl}. The @cite{etags} utility also 294 utilities, and was introduced in XEmacs 20.3.
304 supports it.
305
306 @lisp
307 @group
308 (setq s "12 123 1234 12345")
309 @result{} "12 123 1234 12345"
310 (string-match "[0-9]\\@{2,4\\@}" s)
311 @result{} 0
312 (match-string 0 s)
313 @result{} "12"
314 (string-match "[0-9]\\@{3,4\\@}" s)
315 @result{} 3
316 (match-string 0 s)
317 @result{} "123"
318 @end group
319 @end lisp
320 295
321 @item [ @dots{} ] 296 @item [ @dots{} ]
322 @cindex character set (in regexp) 297 @cindex character set (in regexp)
323 @cindex @samp{[} in regexp 298 @cindex @samp{[} in regexp
324 @cindex @samp{]} in regexp 299 @cindex @samp{]} in regexp
480 the same exact text. 455 the same exact text.
481 456
482 @item \(?: @dots{} \) 457 @item \(?: @dots{} \)
483 @cindex @samp{(?:} in regex 458 @cindex @samp{(?:} in regex
484 @cindex regexp grouping 459 @cindex regexp grouping
485 is called a "shy" grouping operator, and it is used just like @samp{\( 460 is called a @dfn{shy} grouping operator, and it is used just like
486 @dots{} \)}, except that it does not cause the matched substring to be 461 @samp{\( @dots{} \)}, except that it does not cause the matched
487 recorded for future reference. This can be useful at times when a 462 substring to be recorded for future reference.
488 program wants to refer to a specific @samp{\( @dots{} \)} group's number 463
489 (eg. in a @code{match-string} or @code{match-beginning} function 464 This is useful when you need a lot of grouping @samp{\( @dots{} \)}
490 application) and you need to use grouping constructs for an alternation 465 constructs, but only want to remember one or two. Then you can use
491 or multi--character repetition inside a regular expression string that 466 shy groups for the ones you do not want to remember for later use with @code{match-string}.
492 can change each time the code is run, but you don't want those groups
493 counting because they'd change the reference number of the group you
494 want to refer to that is inside the static part of your generated
495 regular expression.
496
497 @lisp
498 ;; @r{Here `dynamic-regex' might contain shy groups.}
499 (re-search-forward
500 (concat "\\(" dynamic-regex "\\)\\(-?[0-9]\\@{2,4\\@}\\)"))
501 ;; @r{and this `match-string' will still refer to the integer}
502 ;; @r{captured by the second group in the `concat' string.}
503 (match-string 2)
504 @end lisp
505 467
506 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you 468 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you
507 don't need the captured substrings ought to speed up your programs some, 469 don't need the captured substrings ought to speed up your programs some,
508 since it shortens the code path followed by the regular expression 470 since it shortens the code path followed by the regular expression
509 engine, as well as the amount of memory allocation and string copying it 471 engine, as well as the amount of memory allocation and string copying it
510 must do. The actual performance gain to be observed has not been 472 must do. The actual performance gain to be observed has not been
511 measured or quantified as of this writing. 473 measured or quantified as of this writing.
512 @c This is used to good advantage by the font-locking code, and by `regexp-opt.el'. 474 @c This is used to good advantage by the font-locking code, and by
513 @c ... It will be. It's not yet, but will be. 475 @c `regexp-opt.el'. ... It will be. It's not yet, but will be.
476
477 The shy grouping operator was borrowed from Perl, and was not
478 available prior to XEmacs 20.3, nor is it available in FSF Emacs.
514 479
515 @item \w 480 @item \w
516 @cindex @samp{\w} in regexp 481 @cindex @samp{\w} in regexp
517 matches any word-constituent character. The editor syntax table 482 matches any word-constituent character. The editor syntax table
518 determines which characters these are. @xref{Syntax Tables}. 483 determines which characters these are. @xref{Syntax Tables}.
786 @end group 751 @end group
787 752
788 @group 753 @group
789 (match-end 0) 754 (match-end 0)
790 @result{} 32 755 @result{} 32
756 @end group
757 @end example
758 @end defun
759
760 @defun split-string string &optional pattern
761 This function splits @var{string} into substrings delimited by
762 @var{pattern}, and returns a list of substrings. If @var{pattern} is
763 omitted, it defaults to @samp{[ \f\t\n\r\v]+}, which means that it
764 splits @var{string} by whitespace.
765
766 @example
767 @group
768 (split-string "foo bar")
769 @result{} ("foo" "bar")
770 @end group
771
772 @group
773 (split-string "something")
774 @result{} ("something")
775 @end group
776
777 @group
778 (split-string "a:b:c" ":")
779 @result{} ("a" "b" "c")
780 @end group
781
782 @group
783 (split-string ":a::b:c" ":")
784 @result{} ("" "a" "" "b" "c")
791 @end group 785 @end group
792 @end example 786 @end example
793 @end defun 787 @end defun
794 788
795 @defun looking-at regexp 789 @defun looking-at regexp