diff man/lispref/searching.texi @ 314:341dac730539 r21-0b55

Import from CVS: tag r21-0b55
author cvs
date Mon, 13 Aug 2007 10:44:22 +0200
parents 70ad99077275
children 512e409c26a2
--- a/man/lispref/searching.texi	Mon Aug 13 10:43:56 2007 +0200
+++ b/man/lispref/searching.texi	Mon Aug 13 10:44:22 2007 +0200
@@ -159,7 +159,7 @@
   A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
 denotes a (possibly infinite) set of strings.  Searching for matches for
 a regexp is a very powerful operation.  This section explains how to write
-regexps; the following section says how to search for them.
+regexps; the following section says how to search using them.
 
  To gain a thorough understanding of regular expressions and how to use
 them to best advantage, we recommend that you study @cite{Mastering
@@ -167,14 +167,15 @@
 1997}. (It's known as the "Hip Owls" book, because of the picture on its
 cover.)  You might also read the manuals to @ref{(gawk)Top},
 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top},
-@ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}, which
-also make good use of regular expressions.
+@ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}.  All
+of these programs and libraries make effective use of regular
+expressions.
 
  The XEmacs regular expression syntax most closely resembles that of
 @cite{ed}, or @cite{grep}, the GNU versions of which all utilize the GNU
 @cite{regex} library.  XEmacs' version of @cite{regex} has recently been
-extended with some Perl--like capabilities, described in the next
-section.
+extended with some Perl-like capabilities, which are described in the
+next section.
 
 @menu
 * Syntax of Regexps::       Rules for writing regular expressions.
@@ -263,7 +264,7 @@
 @cindex @samp{?} in regexp
 is a quantifying suffix operator similar to @samp{*}, except that the
 preceding expression can match either once or not at all.  For example,
-@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anyhing
+@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
 else.
 
 @item *?
@@ -273,10 +274,10 @@
 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl.
 @c Did perl get this from somewhere?  What's the real history of *? ?
 
-This construct very useful for when you want to match the text inside a
-pair of delimiters.  For instance, @samp{/\*.*?\*/} will match C
-comments in a string.  This could not be achieved without the use of
-greedy quantifier.
+This construct is very useful when you want to match the text inside a
+pair of delimiters.  For instance, @samp{/\*.*?\*/} will match C
+comments in a string.  This could not be so elegantly achieved without
+the use of a non-greedy quantifier.
 
 This construct has not been available prior to XEmacs 20.4.  It is not
 available in FSF Emacs.
@@ -455,24 +456,72 @@
 the same exact text.
 
 @item \(?: @dots{} \)
-@cindex @samp{\(?:} in regexp
+@cindex @samp{(?:} in regexp
 @cindex regexp grouping
 is called a @dfn{shy} grouping operator, and it is used just like
-@samp{\( @dots{} \)}, except that it does not cause the matched
+@samp{\( @dots{} \)}, except that it does not cause the match
 substring to be recorded for future reference.
 
-This is useful when you need a lot of grouping @samp{\( @dots{} \)}
-constructs, but only want to remember one or two.  Then you can use 
-not want to remember them for later use with @code{match-string}.
+This is useful when you need to use a lot of nested grouping @samp{\(
+@dots{} \)} constructs to express complex alternation, but only want to
+remember, or capture, one or two of the subexpression matches.  Since
+@samp{\(?: @dots{} \)} doesn't capture a submatch, it also doesn't need
+to be counted when you count @samp{\( @dots{} \)} groups to work out the
+@code{match-string} index.  That turns out to be a very convenient
+characteristic.
+
+This situation occurs when parts of a regular expression have been
+automatically generated by a program that builds them from lists of
+strings, and the static code following the matching operation must
+access a specific match number.  Here's an example that shows this:
 
-Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} when you
-don't need the captured substrings ought to speed up your programs some,
-since it shortens the code path followed by the regular expression
-engine, as well as the amount of memory allocation and string copying it
-must do.  The actual performance gain to be observed has not been
-measured or quantified as of this writing.
-@c This is used to good advantage by the font-locking code, and by
-@c `regexp-opt.el'.  ... It will be.  It's not yet, but will be.
+@example
+@group
+;; Assume that:
+(require 'regexp-opt) ;; gets executed at toplevel
+;;; `regexp-opt.el' is part of the "xemacs-devel" package.
+;; ... and that VARNAMES is a list of strings holding the names of some
+;; variables extracted from the program source you are editing and
+;; running this function on.  For this example, it will just be bound
+;; in the let* expression.
+(let* ((varnames '("k" "n" "i" "j" "varname"))
+       (keys-regexp (regexp-opt
+		     (mapcar #'symbol-name
+			     '(if then else elif
+			       case in of do while
+			       with for next unless
+			       cond begin end))))
+       (varname-regexp (regexp-opt varnames))
+       (contrived-regexp (concat "\\(" keys-regexp "\\)"
+                                 "\\s-(\\s-\\("
+                                 varname-regexp
+                                 "\\)\\s-)"))
+       (keyname "")
+       (varname ""))
+  ;; In the body of this particular defun, we:
+  (re-search-forward contrived-regexp nil t)
+  ;; ... and it finds a match.  Now we want to extract the text that
+  ;; it matched on, and save it into KEYNAME and VARNAME.
+  (setq keyname (match-string 1)
+	varname (match-string 2))
+  ;; ... and then do something with those values.
+  (list keyname varname))
+
+;; Here's something for it to match, so you can try it with `C-x C-e'.
+;; while ( j ) do ...
+@end group
+@end example
+
+Here you can see that if the regular expression returned by
+@code{regexp-opt} did not use @samp{\(?: @dots{} \)} for grouping, and
+instead used @samp{\( @dots{} \)}, it would be necessary to count the
+number of opening parentheses in the @code{keys-regexp} and to use that
+figure to calculate which match number corresponds to the
+@code{varname-regexp}.  It is much more convenient to be able to just
+ask for the second match string.
+
+@c This is used to good advantage by the font-locking code....
+@c ... It will be.  It's not yet, but will be.
 
 The shy grouping operator has been borrowed from Perl, and has not been
 available prior to XEmacs 20.3, nor is it available in FSF Emacs.