Mercurial > hg > xemacs-beta
changeset 4199:3660d327399f
[xemacs-hg @ 2007-10-01 08:07:39 by stephent]
Implement subexpression replacement in replace-match. <87ejgf6yy9.fsf@uwakimon.sk.tsukuba.ac.jp>
author | stephent |
---|---|
date | Mon, 01 Oct 2007 08:07:57 +0000 |
parents | fb83e69ce80a |
children | c22bb3b4b5a3 |
files | lisp/ChangeLog lisp/subr.el man/ChangeLog man/lispref/searching.texi src/ChangeLog src/search.c tests/ChangeLog tests/automated/os-tests.el tests/automated/regexp-tests.el tests/automated/test-harness.el |
diffstat | 10 files changed, 273 insertions(+), 62 deletions(-) [+] |
line wrap: on
line diff
--- a/lisp/ChangeLog Sun Sep 30 21:50:52 2007 +0000 +++ b/lisp/ChangeLog Mon Oct 01 08:07:57 2007 +0000 @@ -1,3 +1,7 @@ +2007-09-23 Stephen J. Turnbull <stephen@xemacs.org> + + * subr.el (replace-regexp-in-string): Handle SUBEXP arg properly. + 2007-09-30 Aidan Kehoe <kehoea@parhasard.net> * x-faces.el:
--- a/lisp/subr.el Sun Sep 30 21:50:52 2007 +0000 +++ b/lisp/subr.el Mon Oct 01 08:07:57 2007 +0000 @@ -791,20 +791,33 @@ Return a new string containing the replacements. -Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the -arguments with the same names of function `replace-match'. If START -is non-nil, start replacements at that index in STRING. +Optional arguments FIXEDCASE and LITERAL are like the arguments with +the same names of function `replace-match'. If START is non-nil, +start replacements at that index in STRING. + +For compatibility with old XEmacs code and with recent GNU Emacs, the +interpretation of SUBEXP is somewhat complicated. If SUBEXP is a +buffer, it is interpreted as the buffer which provides syntax tables +and case tables for the match and replacement. If it is not a buffer, +the current buffer is used. If SUBEXP is an integer, it is the index +of the subexpression of REGEXP which is to be replaced. REP is either a string used as the NEWTEXT arg of `replace-match' or a function. If it is a function it is applied to each match to generate the replacement passed to `replace-match'; the match-data at this -point are such that match 0 is the function's argument. +point are such that `(match-string SUBEXP STRING)' is the function's +argument if SUBEXP is an integer \(otherwise the whole match is passed +and replaced). To replace only the first match (if any), make REGEXP match up to \\' and replace a sub-expression, e.g. (replace-regexp-in-string \"\\(foo\\).*\\'\" \"bar\" \" foo foo\" nil nil 1) => \" bar foo\" -" + +Signals `invalid-argument' if SUBEXP is not an integer, buffer, or nil; +or is an integer, but the indicated subexpression was not matched. +Signals `invalid-argument' if STRING is nil but the last text matched was a string, +or if STRING is a string but the last text matched was a buffer." ;; To avoid excessive consing from multiple matches in long strings, ;; don't just call `replace-match' continually. Walk down the @@ -817,6 +830,7 @@ ;; might be reasonable to do so for long enough STRING.] (let ((l (length string)) (start (or start 0)) + (expndx (if (integerp subexp) subexp 0)) matches str mb me) (save-match-data (while (and (< start l) (string-match regexp string start)) @@ -833,7 +847,8 @@ (setq matches (cons (replace-match (if (stringp rep) rep - (funcall rep (match-string 0 str))) + (funcall rep (match-string expndx str))) + ;; no, this subexp shouldn't be expndx fixedcase literal str subexp) (cons (substring string start mb) ; unmatched prefix matches)))
--- a/man/ChangeLog Sun Sep 30 21:50:52 2007 +0000 +++ b/man/ChangeLog Mon Oct 01 08:07:57 2007 +0000 @@ -1,3 +1,9 @@ +2007-09-22 Stephen J. Turnbull <stephen@xemacs.org> + + * lispref/searching.texi (Replacing Match): Document the escapes + for changing case in `replace-match'. Document the change to + STRBUFFER to permit subexpressions in string replacement. + 2007-09-30 Adrian Aichner <adrian@xemacs.org> * Makefile (TEXI2HTML_NOSPLIT): New.
--- a/man/lispref/searching.texi Sun Sep 30 21:50:52 2007 +0000 +++ b/man/lispref/searching.texi Mon Oct 01 08:07:57 2007 +0000 @@ -1282,19 +1282,14 @@ @var{replacement}. If you did the last search in a buffer, you should specify @code{nil} -for @var{string}. Then @code{replace-match} does the replacement by -editing the buffer; it leaves point at the end of the replacement text, -and returns @code{t}. +for @var{string}. (An error will be signaled if you don't.) Then +@code{replace-match} does the replacement by editing the buffer; it +leaves point at the end of the replacement text, and returns @code{t}. If you did the search in a string, pass the same string as @var{string}. -Then @code{replace-match} does the replacement by constructing and -returning a new string. - -If the fourth argument @var{string} is a string, fifth argument -@var{strbuffer} specifies the buffer to be used for syntax-table and -case-table lookup and defaults to the current buffer. When @var{string} -is not a string, the buffer that the match occurred in has automatically -been remembered and you do not need to specify it. +(An error will be signaled if you specify nil.) Then +@code{replace-match} does the replacement by constructing and returning +a new string. If @var{fixedcase} is non-@code{nil}, then the case of the replacement text is not changed; otherwise, the replacement text is converted to a @@ -1317,21 +1312,65 @@ @table @asis @item @samp{\&} -@cindex @samp{&} in replacement +@cindex @samp{\&} in replacement @samp{\&} stands for the entire text being replaced. @item @samp{\@var{n}} @cindex @samp{\@var{n}} in replacement +@cindex @samp{\@var{digit}} in replacement @samp{\@var{n}}, where @var{n} is a digit, stands for the text that matched the @var{n}th subexpression in the original regexp. Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}. @item @samp{\\} -@cindex @samp{\} in replacement +@cindex @samp{\\} in replacement @samp{\\} stands for a single @samp{\} in the replacement text. + +@item @samp{\u} +@cindex @samp{\u} in replacement +@samp{\u} means upcase the next character. + +@item @samp{\l} +@cindex @samp{\l} in replacement +@samp{\l} means downcase the next character. + +@item @samp{\U} +@cindex @samp{\U} in replacement +@samp{\U} means begin upcasing all following characters. + +@item @samp{\L} +@cindex @samp{\L} in replacement +@samp{\L} means begin downcasing all following characters. + +@item @samp{\E} +@cindex @samp{\E} in replacement +@samp{\E} means terminate the effect of any @samp{\U} or @samp{\L}. @end table + +Case changes made with @samp{\u}, @samp{\l}, @samp{\U}, and @samp{\L} +override all other case changes that may be made in the replaced text. + +The fifth argument @var{strbuffer} may be a buffer to be used for +syntax-table and case-table lookup. If @var{strbuffer} is not a buffer, +the current buffer is used. When @var{string} is not a string, the +buffer that the match occurred in has automatically been remembered and +you do not need to specify it. @var{string} may also be an integer, +specifying the index of the subexpression to match. When @var{string} +is not an integer, the ``subexpression'' is 0, @emph{i.e.}, the whole +match. An @code{invalid-argument} error will be signaled if you specify +a buffer when @var{string} is nil, or specify a subexpression which was +not matched. + +It is not possible to specify both a buffer and a subexpression, but the +idiom +@example +(with-current-buffer @var{buffer} (replace-match ... @var{integer})) +@end example +may be used. + @end defun + @node Entire Match Data @subsection Accessing the Entire Match Data
--- a/src/ChangeLog Sun Sep 30 21:50:52 2007 +0000 +++ b/src/ChangeLog Mon Oct 01 08:07:57 2007 +0000 @@ -1,3 +1,12 @@ +2007-09-30 Stephen J. Turnbull <stephen@xemacs.org> + + * search.c (Freplace_match): Improve range-checking on STRBUFFER. + +2007-09-22 Stephen J. Turnbull <stephen@xemacs.org> + + * search.c (Freplace_match): Allow STRBUFFER to specify a + subexpression when the source is a string. + 2007-09-30 Aidan Kehoe <kehoea@parhasard.net> * mule-ccl.c:
--- a/src/search.c Sun Sep 30 21:50:52 2007 +0000 +++ b/src/search.c Mon Oct 01 08:07:57 2007 +0000 @@ -2364,15 +2364,22 @@ DEFUN ("replace-match", Freplace_match, 1, 5, 0, /* Replace text matched by last search with REPLACEMENT. -If second arg FIXEDCASE is non-nil, do not alter case of replacement text. +Leaves point at end of replacement text. +Optional boolean FIXEDCASE inhibits matching case of REPLACEMENT to source. +Optional boolean LITERAL inhibits interpretation of escape sequences. +Optional STRING provides the source text to replace. +Optional STRBUFFER may be a buffer, providing match context, or an integer + specifying the subexpression to replace. + +If FIXEDCASE is non-nil, do not alter case of replacement text. Otherwise maybe capitalize the whole text, or maybe just word initials, based on the replaced text. -If the replaced text has only capital letters -and has at least one multiletter word, convert REPLACEMENT to all caps. +If the replaced text has only capital letters and has at least one +multiletter word, convert REPLACEMENT to all caps. If the replaced text has at least one word starting with a capital letter, then capitalize each word in REPLACEMENT. -If third arg LITERAL is non-nil, insert REPLACEMENT literally. +If LITERAL is non-nil, insert REPLACEMENT literally. Otherwise treat `\\' as special: `\\&' in REPLACEMENT means substitute original matched text. `\\N' means substitute what matched the Nth `\\(...\\)'. @@ -2385,24 +2392,31 @@ `\\E' means terminate the effect of any `\\U' or `\\L'. Case changes made with `\\u', `\\l', `\\U', and `\\L' override all other case changes that may be made in the replaced text. -FIXEDCASE and LITERAL are optional arguments. -Leaves point at end of replacement text. - -The optional fourth argument STRING can be a string to modify. -In that case, this function creates and returns a new string -which is made by replacing the part of STRING that was matched. -When fourth argument is a string, fifth argument STRBUFFER specifies -the buffer to be used for syntax-table and case-table lookup and -defaults to the current buffer. When fourth argument is not a string, -the buffer that the match occurred in has automatically been remembered -and you do not need to specify it. - -When fourth argument is nil, STRBUFFER specifies a subexpression of -the match. It says to replace just that subexpression instead of the -whole match. This is useful only after a regular expression search or -match since only regular expressions have distinguished subexpressions. - -If no match (including searches) has been conducted or the requested + +If non-nil, STRING is the source string, and a new string with the specified +replacements is created and returned. Otherwise the current buffer is the +source text. + +If non-nil, STRBUFFER may be an integer, interpreted as the index of the +subexpression to replace in the source text, or a buffer to provide the +syntax table and case table. If nil, then the \"subexpression\" is 0, i.e., +the whole match, and the current buffer provides the syntax and case tables. +If STRING is nil, STRBUFFER must be nil or an integer. + +Specifying a subexpression is only useful after a regular expression match, +since a fixed string search has no non-trivial subexpressions. + +It is not possible to specify both a buffer and a subexpression. If that is +desired, the idiom `(with-current-buffer BUFFER (replace-match ... INTEGER))' +may be appropriate. + +If STRING is nil but the last thing matched (or searched) was a string, or +STRING is a string but the last thing matched was a buffer, an +`invalid-argument' error will be signaled. (XEmacs does not check that the +last thing searched is the source string, but it is not useful to use a +different string as source.) + +If no match (including searches) has been successful or the requested subexpression was not matched, an `args-out-of-range' error will be signaled. (If no match has ever been conducted in this instance of XEmacs, an `invalid-operation' error will be signaled. This is very @@ -2430,31 +2444,59 @@ CHECK_STRING (replacement); + /* Because GNU decided to be incompatible here, we support the following + baroque and bogus API for the STRING and STRBUFFER arguments: + types interpretations + STRING STRBUFFER STRING STRBUFFER + nil nil none 0 = index of subexpression to replace + nil integer none index of subexpression to replace + nil other ***** error ***** + string nil source current buffer provides syntax table + subexpression = 0 (whole match) + string buffer source buffer providing syntax table + subexpression = 0 (whole match) + string integer source current buffer provides syntax table + subexpression = STRBUFFER + string other ***** error ***** + */ + + /* Do STRBUFFER first; if STRING is nil, we'll overwrite BUF and BUFFER. */ + + /* If the match data were abstracted into a special "match data" type + instead of the typical half-assed "let the implementation be visible" + form it's in, we could extend it to include the last string matched + and the buffer used for that matching. But of course we can't change + it as it is. + */ + if (NILP (strbuffer) || BUFFERP (strbuffer)) + { + buf = decode_buffer (strbuffer, 0); + } + else if (!NILP (strbuffer)) + { + CHECK_INT (strbuffer); + sub = XINT (strbuffer); + if (sub < 0 || sub >= (int) search_regs.num_regs) + invalid_argument ("match data register invalid", strbuffer); + if (search_regs.start[sub] < 0) + invalid_argument ("match data register not set", strbuffer); + buf = current_buffer; + } + else + invalid_argument ("STRBUFFER must be nil, a buffer, or an integer", + strbuffer); + buffer = wrap_buffer (buf); + if (! NILP (string)) { CHECK_STRING (string); if (!EQ (last_thing_searched, Qt)) - invalid_argument ("last thing matched was not a string", Qunbound); - /* If the match data - were abstracted into a special "match data" type instead - of the typical half-assed "let the implementation be - visible" form it's in, we could extend it to include - the last string matched and the buffer used for that - matching. But of course we can't change it as it is. */ - buf = decode_buffer (strbuffer, 0); - buffer = wrap_buffer (buf); + invalid_argument ("last thing matched was not a string", Qunbound); } else { - if (!NILP (strbuffer)) - { - CHECK_INT (strbuffer); - sub = XINT (strbuffer); - if (sub < 0 || sub >= (int) search_regs.num_regs) - args_out_of_range (strbuffer, make_int (search_regs.num_regs)); - } if (!BUFFERP (last_thing_searched)) - invalid_argument ("last thing matched was not a buffer", Qunbound); + invalid_argument ("last thing matched was not a buffer", Qunbound); buffer = last_thing_searched; buf = XBUFFER (buffer); } @@ -2557,8 +2599,8 @@ Lisp_Object before, after; speccount = specpdl_depth (); - before = Fsubstring (string, Qzero, make_int (search_regs.start[0])); - after = Fsubstring (string, make_int (search_regs.end[0]), Qnil); + before = Fsubstring (string, Qzero, make_int (search_regs.start[sub])); + after = Fsubstring (string, make_int (search_regs.end[sub]), Qnil); /* Do case substitution into REPLACEMENT if desired. */ if (NILP (literal)) @@ -2600,6 +2642,8 @@ substart = search_regs.start[0]; subend = search_regs.end[0]; } + /* #### This logic is totally broken, + since we can have backrefs like "\99", right? */ else if (c >= '1' && c <= '9' && c <= search_regs.num_regs + '0') { @@ -2759,6 +2803,8 @@ (buffer, make_int (search_regs.start[0] + offset), make_int (search_regs.end[0] + offset)); + /* #### This logic is totally broken, + since we can have backrefs like "\99", right? */ else if (c >= '1' && c <= '9' && c <= search_regs.num_regs + '0') {
--- a/tests/ChangeLog Sun Sep 30 21:50:52 2007 +0000 +++ b/tests/ChangeLog Mon Oct 01 08:07:57 2007 +0000 @@ -1,3 +1,19 @@ +2007-09-30 Stephen J. Turnbull <stephen@xemacs.org> + + * automated/os-tests.el: Suppress `executable-find' lossage. + + * automated/regexp-tests.el (replace-regexp-in-string): + New tests for error conditions. + + * automated/test-harness.el (test-harness-from-buffer): Comment + about inability to check the FROB in invalid_argument. + +2007-09-23 Stephen J. Turnbull <stephen@xemacs.org> + + * automated/regexp-tests.el (replace-regexp-in-string): + New tests for correct operation. Comment need for `replace-match' + tests. + 2007-08-21 Aidan Kehoe <kehoea@parhasard.net> * automated/mule-tests.el (test-chars):
--- a/tests/automated/os-tests.el Sun Sep 30 21:50:52 2007 +0000 +++ b/tests/automated/os-tests.el Mon Oct 01 08:07:57 2007 +0000 @@ -36,15 +36,18 @@ ;; in <b9yoeipvwn0.fsf@jpl.org>. ;; tac works by lines, unfortunately +;; #### The contortions around `executable-find' gag me, but I don't have time +;; to deal today. If we have `executable-find', we should use its value! (let* ((original-string "a\nb\nc\nd\n") - (tac-cases (if (executable-find "tac") + ;; `executable-find' is in a package and may be unavailable. + (tac-cases (if (and (fboundp 'executable-find) (executable-find "tac")) '((1 . "c\nb\na\nd\n") (3 . "a\nc\nb\nd\n") (5 . "a\nc\nb\nd\n") (7 . "a\nc\nb\nd\n") (9 . "a\nd\nc\nb\n")) nil)) - (cat-cases (if (executable-find "cat") + (cat-cases (if (and (fboundp 'executable-find) (executable-find "cat")) '((1 . "b\nc\na\nd\n") (3 . "a\nb\nc\nd\n") (5 . "a\nb\nc\nd\n")
--- a/tests/automated/regexp-tests.el Sun Sep 30 21:50:52 2007 +0000 +++ b/tests/automated/regexp-tests.el Mon Oct 01 08:07:57 2007 +0000 @@ -459,3 +459,72 @@ (Assert (null (match-string 2 text2))) ) +;; replace-regexp-in-string (regexp rep source +;; fixedcase literal buf-or-subexp start) + +;; Currently we test the following cases: +;; where `cbuf' and `bar-or-empty' are bound below. + +;; #### Tests for the various functional features (fixedcase, literal, start) +;; should be added. + +(with-temp-buffer + (flet ((bar-or-empty (subexp) (if (string= subexp "foo") "bar" ""))) + (let ((cbuf (current-buffer))) + (dolist (test-case + ;; REP BUF-OR-SUBEXP EXPECTED RESULT + `(("bar" nil " bar") + ("bar" ,cbuf " bar") + ("bar" 0 " bar") + ("bar" 1 " bar foo") + (bar-or-empty nil " ") + (bar-or-empty ,cbuf " ") + (bar-or-empty 0 " ") + (bar-or-empty 1 " bar foo"))) + (Assert + (string= + (nth 2 test-case) + (replace-regexp-in-string "\\(foo\\).*\\'" (nth 0 test-case) + " foo foo" nil nil (nth 1 test-case))))) + ;; #### Why doesn't this loop work right? +; (dolist (test-case +; ;; REP BUF-OR-SUBEXP EXPECTED ERROR EXPECTED MESSAGE +; `(;; expected message was "bufferp, symbol" up to 21.5.28 +; ("bar" 'symbol wrong-type-argument "integerp, symbol") +; ("bar" -1 invalid-argument +; "match data register invalid, -1") +; ("bar" 2 invalid-argument +; "match data register not set, 2") +; )) +; (eval +; `(Check-Error-Message ,(nth 2 test-case) ,(nth 3 test-case) +; (replace-regexp-in-string "\\(foo\\).*\\'" ,(nth 0 test-case) +; " foo foo" nil nil ,(nth 1 test-case))))) + ;; #### Can't test the message with w-t-a, see test-harness.el. + (Check-Error wrong-type-argument + (replace-regexp-in-string "\\(foo\\).*\\'" + "bar" + " foo foo" nil nil + 'symbol)) + ;; #### Can't test the FROB (-1), see test-harness.el. + (Check-Error-Message invalid-argument + "match data register invalid" + (replace-regexp-in-string "\\(foo\\).*\\'" + "bar" + " foo foo" nil nil + -1)) + ;; #### Can't test the FROB (-1), see test-harness.el. + (Check-Error-Message invalid-argument + "match data register not set" + (replace-regexp-in-string "\\(foo\\).*\\'" + "bar" + " foo foo" nil nil + 2)) + ))) + +;; replace-match (REPLACEMENT &optional FIXEDCASE LITERAL STRING STRBUFFER) + +;; #### Write some tests! Much functionality is implicitly tested above +;; via `replace-regexp-in-string', but we should specifically test bogus +;; combinations of STRING and STRBUFFER. +
--- a/tests/automated/test-harness.el Sun Sep 30 21:50:52 2007 +0000 +++ b/tests/automated/test-harness.el Mon Oct 01 08:07:57 2007 +0000 @@ -285,6 +285,10 @@ ,quoted-body ',expected-error) (incf no-error-failures)) (,expected-error + ;; #### Damn, this binding doesn't capture frobs, eg, for + ;; invalid_argument() ... you only get the REASON. And for + ;; wrong_type_argument(), there's no reason only FROBs. + ;; If this gets fixed, fix tests in regexp-tests.el. (let ((error-message (second error-info))) (if (string-match ,expected-error-regexp error-message) (progn