changeset 4199:3660d327399f

[xemacs-hg @ 2007-10-01 08:07:39 by stephent] Implement subexpression replacement in replace-match. <87ejgf6yy9.fsf@uwakimon.sk.tsukuba.ac.jp>
author stephent
date Mon, 01 Oct 2007 08:07:57 +0000
parents fb83e69ce80a
children c22bb3b4b5a3
files lisp/ChangeLog lisp/subr.el man/ChangeLog man/lispref/searching.texi src/ChangeLog src/search.c tests/ChangeLog tests/automated/os-tests.el tests/automated/regexp-tests.el tests/automated/test-harness.el
diffstat 10 files changed, 273 insertions(+), 62 deletions(-) [+]
line wrap: on
line diff
--- a/lisp/ChangeLog	Sun Sep 30 21:50:52 2007 +0000
+++ b/lisp/ChangeLog	Mon Oct 01 08:07:57 2007 +0000
@@ -1,3 +1,7 @@
+2007-09-23  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* subr.el (replace-regexp-in-string): Handle SUBEXP arg properly.
+
 2007-09-30  Aidan Kehoe  <kehoea@parhasard.net>
 
 	* x-faces.el:
--- a/lisp/subr.el	Sun Sep 30 21:50:52 2007 +0000
+++ b/lisp/subr.el	Mon Oct 01 08:07:57 2007 +0000
@@ -791,20 +791,33 @@
 
 Return a new string containing the replacements.
 
-Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
-arguments with the same names of function `replace-match'.  If START
-is non-nil, start replacements at that index in STRING.
+Optional arguments FIXEDCASE and LITERAL are like the arguments with
+the same names of function `replace-match'.  If START is non-nil,
+start replacements at that index in STRING.
+
+For compatibility with old XEmacs code and with recent GNU Emacs, the
+interpretation of SUBEXP is somewhat complicated.  If SUBEXP is a
+buffer, it is interpreted as the buffer which provides syntax tables
+and case tables for the match and replacement.  If it is not a buffer,
+the current buffer is used.  If SUBEXP is an integer, it is the index
+of the subexpression of REGEXP which is to be replaced.
 
 REP is either a string used as the NEWTEXT arg of `replace-match' or a
 function.  If it is a function it is applied to each match to generate
 the replacement passed to `replace-match'; the match-data at this
-point are such that match 0 is the function's argument.
+point are such that `(match-string SUBEXP STRING)' is the function's
+argument if SUBEXP is an integer \(otherwise the whole match is passed
+and replaced).
 
 To replace only the first match (if any), make REGEXP match up to \\'
 and replace a sub-expression, e.g.
   (replace-regexp-in-string \"\\(foo\\).*\\'\" \"bar\" \" foo foo\" nil nil 1)
     => \" bar foo\"
-"
+
+Signals `invalid-argument' if SUBEXP is not an integer, buffer, or nil;
+or is an integer, but the indicated subexpression was not matched.
+Signals `invalid-argument' if STRING is nil but the last text matched was a string,
+or if STRING is a string but the last text matched was a buffer."
 
   ;; To avoid excessive consing from multiple matches in long strings,
   ;; don't just call `replace-match' continually.  Walk down the
@@ -817,6 +830,7 @@
   ;; might be reasonable to do so for long enough STRING.]
   (let ((l (length string))
 	(start (or start 0))
+	(expndx (if (integerp subexp) subexp 0))
 	matches str mb me)
     (save-match-data
       (while (and (< start l) (string-match regexp string start))
@@ -833,7 +847,8 @@
 	(setq matches
 	      (cons (replace-match (if (stringp rep)
 				       rep
-				     (funcall rep (match-string 0 str)))
+				     (funcall rep (match-string expndx str)))
+				   ;; no, this subexp shouldn't be expndx
 				   fixedcase literal str subexp)
 		    (cons (substring string start mb) ; unmatched prefix
 			  matches)))
--- a/man/ChangeLog	Sun Sep 30 21:50:52 2007 +0000
+++ b/man/ChangeLog	Mon Oct 01 08:07:57 2007 +0000
@@ -1,3 +1,9 @@
+2007-09-22  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* lispref/searching.texi (Replacing Match): Document the escapes
+	for changing case in `replace-match'.  Document the change to
+	STRBUFFER to permit subexpressions in string replacement.
+
 2007-09-30  Adrian Aichner  <adrian@xemacs.org>
 
 	* Makefile (TEXI2HTML_NOSPLIT): New.
--- a/man/lispref/searching.texi	Sun Sep 30 21:50:52 2007 +0000
+++ b/man/lispref/searching.texi	Mon Oct 01 08:07:57 2007 +0000
@@ -1282,19 +1282,14 @@
 @var{replacement}.
 
 If you did the last search in a buffer, you should specify @code{nil}
-for @var{string}.  Then @code{replace-match} does the replacement by
-editing the buffer; it leaves point at the end of the replacement text,
-and returns @code{t}.
+for @var{string}.  (An error will be signaled if you don't.)  Then
+@code{replace-match} does the replacement by editing the buffer; it
+leaves point at the end of the replacement text, and returns @code{t}.
 
 If you did the search in a string, pass the same string as @var{string}.
-Then @code{replace-match} does the replacement by constructing and
-returning a new string.
-
-If the fourth argument @var{string} is a string, fifth argument
-@var{strbuffer} specifies the buffer to be used for syntax-table and
-case-table lookup and defaults to the current buffer.  When @var{string}
-is not a string, the buffer that the match occurred in has automatically
-been remembered and you do not need to specify it.
+(An error will be signaled if you specify nil.)  Then
+@code{replace-match} does the replacement by constructing and returning
+a new string.
 
 If @var{fixedcase} is non-@code{nil}, then the case of the replacement
 text is not changed; otherwise, the replacement text is converted to a
@@ -1317,21 +1312,65 @@
 
 @table @asis
 @item @samp{\&}
-@cindex @samp{&} in replacement
+@cindex @samp{\&} in replacement
 @samp{\&} stands for the entire text being replaced.
 
 @item @samp{\@var{n}}
 @cindex @samp{\@var{n}} in replacement
+@cindex @samp{\@var{digit}} in replacement
 @samp{\@var{n}}, where @var{n} is a digit, stands for the text that
 matched the @var{n}th subexpression in the original regexp.
 Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
 
 @item @samp{\\}
-@cindex @samp{\} in replacement
+@cindex @samp{\\} in replacement
 @samp{\\} stands for a single @samp{\} in the replacement text.
+
+@item @samp{\u}
+@cindex @samp{\u} in replacement
+@samp{\u} means upcase the next character.
+
+@item @samp{\l}
+@cindex @samp{\l} in replacement
+@samp{\l} means downcase the next character.
+
+@item @samp{\U}
+@cindex @samp{\U} in replacement
+@samp{\U} means begin upcasing all following characters.
+
+@item @samp{\L}
+@cindex @samp{\L} in replacement
+@samp{\L} means begin downcasing all following characters.
+
+@item @samp{\E}
+@cindex @samp{\E} in replacement
+@samp{\E} means terminate the effect of any @samp{\U} or @samp{\L}.
 @end table
+
+Case changes made with @samp{\u}, @samp{\l}, @samp{\U}, and @samp{\L}
+override all other case changes that may be made in the replaced text.
+
+The fifth argument @var{strbuffer} may be a buffer to be used for
+syntax-table and case-table lookup.  If @var{strbuffer} is not a buffer,
+the current buffer is used.  When @var{string} is not a string, the
+buffer that the match occurred in has automatically been remembered and
+you do not need to specify it.  @var{string} may also be an integer,
+specifying the index of the subexpression to match.  When @var{string}
+is not an integer, the ``subexpression'' is 0, @emph{i.e.}, the whole
+match.  An @code{invalid-argument} error will be signaled if you specify
+a buffer when @var{string} is nil, or specify a subexpression which was
+not matched.
+
+It is not possible to specify both a buffer and a subexpression, but the
+idiom
+@example
+(with-current-buffer @var{buffer} (replace-match ... @var{integer}))
+@end example
+may be used.
+
 @end defun
 
+
 @node Entire Match Data
 @subsection Accessing the Entire Match Data
 
--- a/src/ChangeLog	Sun Sep 30 21:50:52 2007 +0000
+++ b/src/ChangeLog	Mon Oct 01 08:07:57 2007 +0000
@@ -1,3 +1,12 @@
+2007-09-30  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* search.c (Freplace_match): Improve range-checking on STRBUFFER.
+
+2007-09-22  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* search.c (Freplace_match): Allow STRBUFFER to specify a
+	subexpression when the source is a string.
+
 2007-09-30  Aidan Kehoe  <kehoea@parhasard.net>
 
 	* mule-ccl.c:
--- a/src/search.c	Sun Sep 30 21:50:52 2007 +0000
+++ b/src/search.c	Mon Oct 01 08:07:57 2007 +0000
@@ -2364,15 +2364,22 @@
 
 DEFUN ("replace-match", Freplace_match, 1, 5, 0, /*
 Replace text matched by last search with REPLACEMENT.
-If second arg FIXEDCASE is non-nil, do not alter case of replacement text.
+Leaves point at end of replacement text.
+Optional boolean FIXEDCASE inhibits matching case of REPLACEMENT to source.
+Optional boolean LITERAL inhibits interpretation of escape sequences.
+Optional STRING provides the source text to replace.
+Optional STRBUFFER may be a buffer, providing match context, or an integer
+ specifying the subexpression to replace.
+
+If FIXEDCASE is non-nil, do not alter case of replacement text.
 Otherwise maybe capitalize the whole text, or maybe just word initials,
 based on the replaced text.
-If the replaced text has only capital letters
-and has at least one multiletter word, convert REPLACEMENT to all caps.
+If the replaced text has only capital letters and has at least one
+multiletter word, convert REPLACEMENT to all caps.
 If the replaced text has at least one word starting with a capital letter,
 then capitalize each word in REPLACEMENT.
 
-If third arg LITERAL is non-nil, insert REPLACEMENT literally.
+If LITERAL is non-nil, insert REPLACEMENT literally.
 Otherwise treat `\\' as special:
   `\\&' in REPLACEMENT means substitute original matched text.
   `\\N' means substitute what matched the Nth `\\(...\\)'.
@@ -2385,24 +2392,31 @@
   `\\E' means terminate the effect of any `\\U' or `\\L'.
   Case changes made with `\\u', `\\l', `\\U', and `\\L' override
   all other case changes that may be made in the replaced text.
-FIXEDCASE and LITERAL are optional arguments.
-Leaves point at end of replacement text.
-
-The optional fourth argument STRING can be a string to modify.
-In that case, this function creates and returns a new string
-which is made by replacing the part of STRING that was matched.
-When fourth argument is a string, fifth argument STRBUFFER specifies
-the buffer to be used for syntax-table and case-table lookup and
-defaults to the current buffer.  When fourth argument is not a string,
-the buffer that the match occurred in has automatically been remembered
-and you do not need to specify it.
-
-When fourth argument is nil, STRBUFFER specifies a subexpression of
-the match.  It says to replace just that subexpression instead of the
-whole match.  This is useful only after a regular expression search or
-match since only regular expressions have distinguished subexpressions.
-
-If no match (including searches) has been conducted or the requested
+
+If non-nil, STRING is the source string, and a new string with the specified
+replacements is created and returned.  Otherwise the current buffer is the
+source text.
+
+If non-nil, STRBUFFER may be an integer, interpreted as the index of the
+subexpression to replace in the source text, or a buffer to provide the
+syntax table and case table.  If nil, then the \"subexpression\" is 0, i.e.,
+the whole match, and the current buffer provides the syntax and case tables.
+If STRING is nil, STRBUFFER must be nil or an integer.
+
+Specifying a subexpression is only useful after a regular expression match,
+since a fixed string search has no non-trivial subexpressions.
+
+It is not possible to specify both a buffer and a subexpression.  If that is
+desired, the idiom `(with-current-buffer BUFFER (replace-match ... INTEGER))'
+may be appropriate.
+
+If STRING is nil but the last thing matched (or searched) was a string, or
+STRING is a string but the last thing matched was a buffer, an
+`invalid-argument' error will be signaled.  (XEmacs does not check that the
+last thing searched is the source string, but it is not useful to use a
+different string as source.)
+
+If no match (including searches) has been successful or the requested
 subexpression was not matched, an `args-out-of-range' error will be
 signaled.  (If no match has ever been conducted in this instance of
 XEmacs, an `invalid-operation' error will be signaled.  This is very
@@ -2430,31 +2444,59 @@
 
   CHECK_STRING (replacement);
 
+  /* Because GNU decided to be incompatible here, we support the following
+     baroque and bogus API for the STRING and STRBUFFER arguments:
+          types            interpretations
+     STRING   STRBUFFER   STRING   STRBUFFER
+     nil      nil         none     0 = index of subexpression to replace
+     nil      integer     none     index of subexpression to replace
+     nil      other       ***** error *****
+     string   nil         source   current buffer provides syntax table
+                                   subexpression = 0 (whole match)
+     string   buffer      source   buffer providing syntax table
+                                   subexpression = 0 (whole match)
+     string   integer     source   current buffer provides syntax table
+                                   subexpression = STRBUFFER
+     string   other       ***** error *****
+  */
+
+  /* Do STRBUFFER first; if STRING is nil, we'll overwrite BUF and BUFFER. */
+
+  /* If the match data were abstracted into a special "match data" type
+     instead of the typical half-assed "let the implementation be visible"
+     form it's in, we could extend it to include the last string matched
+     and the buffer used for that matching.  But of course we can't change
+     it as it is.
+  */
+  if (NILP (strbuffer) || BUFFERP (strbuffer))
+    {
+      buf = decode_buffer (strbuffer, 0);
+    }
+  else if (!NILP (strbuffer))
+    {
+      CHECK_INT (strbuffer);
+      sub = XINT (strbuffer);
+      if (sub < 0 || sub >= (int) search_regs.num_regs)
+	invalid_argument ("match data register invalid", strbuffer);
+      if (search_regs.start[sub] < 0)
+	invalid_argument ("match data register not set", strbuffer);
+      buf = current_buffer;
+    }
+  else
+    invalid_argument ("STRBUFFER must be nil, a buffer, or an integer",
+		      strbuffer);
+  buffer = wrap_buffer (buf);
+
   if (! NILP (string))
     {
       CHECK_STRING (string);
       if (!EQ (last_thing_searched, Qt))
- invalid_argument ("last thing matched was not a string", Qunbound);
-      /* If the match data
-	 were abstracted into a special "match data" type instead
-	 of the typical half-assed "let the implementation be
-	 visible" form it's in, we could extend it to include
-	 the last string matched and the buffer used for that
-	 matching.  But of course we can't change it as it is. */
-      buf = decode_buffer (strbuffer, 0);
-      buffer = wrap_buffer (buf);
+	invalid_argument ("last thing matched was not a string", Qunbound);
     }
   else
     {
-      if (!NILP (strbuffer))
-	{
-	  CHECK_INT (strbuffer);
-	  sub = XINT (strbuffer);
-	  if (sub < 0 || sub >= (int) search_regs.num_regs)
-	    args_out_of_range (strbuffer, make_int (search_regs.num_regs));
-	}
       if (!BUFFERP (last_thing_searched))
- invalid_argument ("last thing matched was not a buffer", Qunbound);
+	invalid_argument ("last thing matched was not a buffer", Qunbound);
       buffer = last_thing_searched;
       buf = XBUFFER (buffer);
     }
@@ -2557,8 +2599,8 @@
       Lisp_Object before, after;
 
       speccount = specpdl_depth ();
-      before = Fsubstring (string, Qzero, make_int (search_regs.start[0]));
-      after = Fsubstring (string, make_int (search_regs.end[0]), Qnil);
+      before = Fsubstring (string, Qzero, make_int (search_regs.start[sub]));
+      after = Fsubstring (string, make_int (search_regs.end[sub]), Qnil);
 
       /* Do case substitution into REPLACEMENT if desired.  */
       if (NILP (literal))
@@ -2600,6 +2642,8 @@
 		      substart = search_regs.start[0];
 		      subend = search_regs.end[0];
 		    }
+		  /* #### This logic is totally broken,
+		     since we can have backrefs like "\99", right? */
 		  else if (c >= '1' && c <= '9' &&
 			   c <= search_regs.num_regs + '0')
 		    {
@@ -2759,6 +2803,8 @@
                   (buffer,
                    make_int (search_regs.start[0] + offset),
                    make_int (search_regs.end[0] + offset));
+	      /* #### This logic is totally broken,
+		 since we can have backrefs like "\99", right? */
 	      else if (c >= '1' && c <= '9' &&
 		       c <= search_regs.num_regs + '0')
 		{
--- a/tests/ChangeLog	Sun Sep 30 21:50:52 2007 +0000
+++ b/tests/ChangeLog	Mon Oct 01 08:07:57 2007 +0000
@@ -1,3 +1,19 @@
+2007-09-30  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* automated/os-tests.el: Suppress `executable-find' lossage.
+
+	* automated/regexp-tests.el (replace-regexp-in-string):
+	New tests for error conditions.
+
+	* automated/test-harness.el (test-harness-from-buffer): Comment
+	about inability to check the FROB in invalid_argument.
+
+2007-09-23  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* automated/regexp-tests.el (replace-regexp-in-string):
+	New tests for correct operation.  Comment need for `replace-match'
+	tests.
+
 2007-08-21  Aidan Kehoe  <kehoea@parhasard.net>
 
 	* automated/mule-tests.el (test-chars):
--- a/tests/automated/os-tests.el	Sun Sep 30 21:50:52 2007 +0000
+++ b/tests/automated/os-tests.el	Mon Oct 01 08:07:57 2007 +0000
@@ -36,15 +36,18 @@
 ;; in <b9yoeipvwn0.fsf@jpl.org>.
 
 ;; tac works by lines, unfortunately
+;; #### The contortions around `executable-find' gag me, but I don't have time
+;; to deal today.  If we have `executable-find', we should use its value!
 (let* ((original-string "a\nb\nc\nd\n")
-       (tac-cases (if (executable-find "tac")
+       ;; `executable-find' is in a package and may be unavailable.
+       (tac-cases (if (and (fboundp 'executable-find) (executable-find "tac"))
 		      '((1 . "c\nb\na\nd\n")
 			(3 . "a\nc\nb\nd\n")
 			(5 . "a\nc\nb\nd\n")
 			(7 . "a\nc\nb\nd\n")
 			(9 . "a\nd\nc\nb\n"))
 		    nil))
-       (cat-cases (if (executable-find "cat")
+       (cat-cases (if (and (fboundp 'executable-find) (executable-find "cat"))
 		      '((1 . "b\nc\na\nd\n")
 			(3 . "a\nb\nc\nd\n")
 			(5 . "a\nb\nc\nd\n")
--- a/tests/automated/regexp-tests.el	Sun Sep 30 21:50:52 2007 +0000
+++ b/tests/automated/regexp-tests.el	Mon Oct 01 08:07:57 2007 +0000
@@ -459,3 +459,72 @@
   (Assert (null (match-string 2 text2)))
 )
 
+;; replace-regexp-in-string (regexp rep source
+;;                           fixedcase literal buf-or-subexp start)
+
+;; Currently we test the following cases:
+;; where `cbuf' and `bar-or-empty' are bound below.
+
+;; #### Tests for the various functional features (fixedcase, literal, start)
+;; should be added.
+
+(with-temp-buffer
+  (flet ((bar-or-empty (subexp) (if (string= subexp "foo") "bar" "")))
+    (let ((cbuf (current-buffer)))
+      (dolist (test-case
+               ;; REP           BUF-OR-SUBEXP   EXPECTED RESULT
+               `(("bar"         nil             " bar")
+                 ("bar"         ,cbuf           " bar")
+                 ("bar"         0               " bar")
+                 ("bar"         1               " bar foo")
+                 (bar-or-empty  nil             " ")
+                 (bar-or-empty  ,cbuf           " ")
+                 (bar-or-empty  0               " ")
+                 (bar-or-empty  1               " bar foo")))
+        (Assert
+         (string=
+          (nth 2 test-case)
+          (replace-regexp-in-string "\\(foo\\).*\\'" (nth 0 test-case)
+                                    " foo foo" nil nil (nth 1 test-case)))))
+      ;; #### Why doesn't this loop work right?
+;       (dolist (test-case
+;                ;; REP   BUF-OR-SUBEXP   EXPECTED ERROR		EXPECTED MESSAGE
+;                `(;; expected message was "bufferp, symbol" up to 21.5.28
+; 		 ("bar"     'symbol     wrong-type-argument	"integerp, symbol")
+;                  ("bar"     -1          invalid-argument
+; 						 "match data register invalid, -1")
+;                  ("bar"     2           invalid-argument
+; 						  "match data register not set, 2")
+; 		 ))
+;         (eval
+; 	 `(Check-Error-Message ,(nth 2 test-case) ,(nth 3 test-case)
+; 	    (replace-regexp-in-string "\\(foo\\).*\\'" ,(nth 0 test-case)
+; 				      " foo foo" nil nil ,(nth 1 test-case)))))
+      ;; #### Can't test the message with w-t-a, see test-harness.el.
+      (Check-Error wrong-type-argument
+		   (replace-regexp-in-string "\\(foo\\).*\\'"
+					     "bar"
+					     " foo foo" nil nil
+					     'symbol))
+      ;; #### Can't test the FROB (-1), see test-harness.el.
+      (Check-Error-Message invalid-argument
+			   "match data register invalid"
+			   (replace-regexp-in-string "\\(foo\\).*\\'"
+						     "bar"
+						     " foo foo" nil nil
+						     -1))
+      ;; #### Can't test the FROB (-1), see test-harness.el.
+      (Check-Error-Message invalid-argument
+			   "match data register not set"
+			   (replace-regexp-in-string "\\(foo\\).*\\'"
+						     "bar"
+						     " foo foo" nil nil
+						     2))
+      )))
+
+;; replace-match (REPLACEMENT &optional FIXEDCASE LITERAL STRING STRBUFFER)
+
+;; #### Write some tests!  Much functionality is implicitly tested above
+;; via `replace-regexp-in-string', but we should specifically test bogus
+;; combinations of STRING and STRBUFFER.
+
--- a/tests/automated/test-harness.el	Sun Sep 30 21:50:52 2007 +0000
+++ b/tests/automated/test-harness.el	Mon Oct 01 08:07:57 2007 +0000
@@ -285,6 +285,10 @@
 				,quoted-body ',expected-error)
 		 (incf no-error-failures))
 	     (,expected-error
+	      ;; #### Damn, this binding doesn't capture frobs, eg, for
+	      ;; invalid_argument() ... you only get the REASON.  And for
+	      ;; wrong_type_argument(), there's no reason only FROBs.
+	      ;; If this gets fixed, fix tests in regexp-tests.el.
 	      (let ((error-message (second error-info)))
 		(if (string-match ,expected-error-regexp error-message)
 		    (progn