+ − 1 @c -*-texinfo-*-
+ − 2 @c This is part of the XEmacs Lisp Reference Manual.
+ − 3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
+ − 4 @c See the file lispref.texi for copying conditions.
+ − 5 @setfilename ../../info/searching.info
+ − 6 @node Searching and Matching, Syntax Tables, Text, Top
+ − 7 @chapter Searching and Matching
+ − 8 @cindex searching
+ − 9
+ − 10 XEmacs provides two ways to search through a buffer for specified
+ − 11 text: exact string searches and regular expression searches. After a
+ − 12 regular expression search, you can examine the @dfn{match data} to
+ − 13 determine which text matched the whole regular expression or various
+ − 14 portions of it.
+ − 15
+ − 16 @menu
+ − 17 * String Search:: Search for an exact match.
+ − 18 * Regular Expressions:: Describing classes of strings.
+ − 19 * Regexp Search:: Searching for a match for a regexp.
+ − 20 * POSIX Regexps:: Searching POSIX-style for the longest match.
+ − 21 * Search and Replace:: Internals of @code{query-replace}.
+ − 22 * Match Data:: Finding out which part of the text matched
+ − 23 various parts of a regexp, after regexp search.
+ − 24 * Searching and Case:: Case-independent or case-significant searching.
+ − 25 * Standard Regexps:: Useful regexps for finding sentences, pages,...
+ − 26 @end menu
+ − 27
+ − 28 The @samp{skip-chars@dots{}} functions also perform a kind of searching.
+ − 29 @xref{Skipping Characters}.
+ − 30
+ − 31 @node String Search
+ − 32 @section Searching for Strings
+ − 33 @cindex string search
+ − 34
+ − 35 These are the primitive functions for searching through the text in a
+ − 36 buffer. They are meant for use in programs, but you may call them
+ − 37 interactively. If you do so, they prompt for the search string;
+ − 38 @var{limit} and @var{noerror} are set to @code{nil}, and @var{count}
+ − 39 is set to 1.
+ − 40
+ − 41 @deffn Command search-forward string &optional limit noerror count buffer
+ − 42 This function searches forward from point for an exact match for
+ − 43 @var{string}. If successful, it sets point to the end of the occurrence
+ − 44 found, and returns the new value of point. If no match is found, the
+ − 45 value and side effects depend on @var{noerror} (see below).
+ − 46
+ − 47 In the following example, point is initially at the beginning of the
+ − 48 line. Then @code{(search-forward "fox")} moves point after the last
+ − 49 letter of @samp{fox}:
+ − 50
+ − 51 @example
+ − 52 @group
+ − 53 ---------- Buffer: foo ----------
+ − 54 @point{}The quick brown fox jumped over the lazy dog.
+ − 55 ---------- Buffer: foo ----------
+ − 56 @end group
+ − 57
+ − 58 @group
+ − 59 (search-forward "fox")
+ − 60 @result{} 20
+ − 61
+ − 62 ---------- Buffer: foo ----------
+ − 63 The quick brown fox@point{} jumped over the lazy dog.
+ − 64 ---------- Buffer: foo ----------
+ − 65 @end group
+ − 66 @end example
+ − 67
+ − 68 The argument @var{limit} specifies the upper bound to the search. (It
+ − 69 must be a position in the current buffer.) No match extending after
+ − 70 that position is accepted. If @var{limit} is omitted or @code{nil}, it
+ − 71 defaults to the end of the accessible portion of the buffer.
+ − 72
+ − 73 @kindex search-failed
+ − 74 What happens when the search fails depends on the value of
+ − 75 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
+ − 76 error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
+ − 77 returns @code{nil} and does nothing. If @var{noerror} is neither
+ − 78 @code{nil} nor @code{t}, then @code{search-forward} moves point to the
+ − 79 upper bound and returns @code{nil}. (It would be more consistent now
+ − 80 to return the new position of point in that case, but some programs
+ − 81 may depend on a value of @code{nil}.)
+ − 82
+ − 83 If @var{count} is supplied (it must be an integer), then the search is
+ − 84 repeated that many times (each time starting at the end of the previous
+ − 85 time's match). If @var{count} is negative, the search direction is
+ − 86 backward. If the successive searches succeed, the function succeeds,
+ − 87 moving point and returning its new value. Otherwise the search fails.
+ − 88
+ − 89 @var{buffer} is the buffer to search in, and defaults to the current buffer.
+ − 90 @end deffn
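
The interaction of @var{limit}, @var{noerror} and @var{count} can be
sketched as follows. This is only an illustration, not part of XEmacs:
the helper name @code{my-count-string} is hypothetical, and the sketch
assumes that @code{with-temp-buffer} is available and that the search
string is non-empty.

@example
@group
(defun my-count-string (string text)
  "Return the number of non-overlapping occurrences of STRING in TEXT.
Hypothetical helper for illustration; STRING must be non-empty."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((n 0))
      ;; NOERROR is t, so a failed search returns nil
      ;; instead of signaling `search-failed'.
      (while (search-forward string nil t)
        (setq n (1+ n)))
      n)))
@end group

@group
(my-count-string "fox" "fox, fox and fox")
     @result{} 3
@end group

@group
(with-temp-buffer
  (insert "fox, fox and fox")
  (goto-char (point-min))
  (list
   ;; COUNT is 2: point ends up after the second occurrence.
   (search-forward "fox" nil t 2)
   ;; NOERROR is neither nil nor t: on failure, point moves
   ;; to LIMIT (here 12) and the value is nil.
   (search-forward "cat" 12 'move)
   (point)))
     @result{} (9 nil 12)
@end group
@end example

Passing @code{t} for @var{noerror} is what makes the @code{while} loop
terminate cleanly when no further occurrence exists.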
+ − 91
+ − 92 @deffn Command search-backward string &optional limit noerror count buffer
+ − 93 This function searches backward from point for @var{string}. It is
+ − 94 just like @code{search-forward} except that it searches backwards and
+ − 95 leaves point at the beginning of the match.
+ − 96 @end deffn
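
As a small sketch of the difference from @code{search-forward}, the
following searches backward from the end of a temporary buffer. It
assumes that @code{with-temp-buffer} is available; the sample text is
the one used in the example above.

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  ;; After `insert', point is at the end of the buffer.
  (list (search-backward "fox")
        ;; Point is now before the `f', so the match follows point.
        (looking-at "fox")))
     @result{} (17 t)
@end group
@end example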
+ − 97
+ − 98 @deffn Command word-search-forward string &optional limit noerror count buffer
+ − 99 @cindex word search
+ − 100 This function searches forward from point for a ``word'' match for
+ − 101 @var{string}. If it finds a match, it sets point to the end of the
+ − 102 match found, and returns the new value of point.
+ − 103
+ − 104 Word matching regards @var{string} as a sequence of words, disregarding
+ − 105 punctuation that separates them. It searches the buffer for the same
+ − 106 sequence of words. Each word must be distinct in the buffer (searching
+ − 107 for the word @samp{ball} does not match the word @samp{balls}), but the
+ − 108 details of punctuation and spacing are ignored (searching for @samp{ball
+ − 109 boy} does match @samp{ball. Boy!}).
+ − 110
+ − 111 In this example, point is initially at the beginning of the buffer; the
+ − 112 search leaves it between the @samp{y} and the @samp{!}.
+ − 113
+ − 114 @example
+ − 115 @group
+ − 116 ---------- Buffer: foo ----------
+ − 117 @point{}He said "Please! Find
+ − 118 the ball boy!"
+ − 119 ---------- Buffer: foo ----------
+ − 120 @end group
+ − 121
+ − 122 @group
+ − 123 (word-search-forward "Please find the ball, boy.")
+ − 124 @result{} 35
+ − 125
+ − 126 ---------- Buffer: foo ----------
+ − 127 He said "Please! Find
+ − 128 the ball boy@point{}!"
+ − 129 ---------- Buffer: foo ----------
+ − 130 @end group
+ − 131 @end example
+ − 132
+ − 133 If @var{limit} is non-@code{nil} (it must be a position in the current
+ − 134 buffer), then it is the upper bound to the search. The match found must
+ − 135 not extend after that position.
+ − 136
+ − 137 If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
+ − 138 an error if the search fails. If @var{noerror} is @code{t}, then it
+ − 139 returns @code{nil} instead of signaling an error. If @var{noerror} is
+ − 140 neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
+ − 141 end of the buffer) and returns @code{nil}.
+ − 142
+ − 143 If @var{count} is non-@code{nil}, then the search is repeated that many
+ − 144 times. Point is positioned at the end of the last match.
+ − 145
+ − 146 @var{buffer} is the buffer to search in, and defaults to the current buffer.
+ − 147 @end deffn
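
The word-matching rules described above can be checked with a short
sketch. It assumes that @code{with-temp-buffer} is available and that
the temporary buffer uses the standard syntax table; the sample text is
arbitrary.

@example
@group
(with-temp-buffer
  (insert "The balls.  Then the ball, boy!")
  (goto-char (point-min))
  ;; "balls" is not the word "ball", so that candidate is skipped;
  ;; the comma and space before "boy" are ignored.
  (word-search-forward "ball boy" nil t)
  (buffer-substring (match-beginning 0) (match-end 0)))
     @result{} "ball, boy"
@end group
@end example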
+ − 148
+ − 149 @deffn Command word-search-backward string &optional limit noerror count buffer
+ − 150 This function searches backward from point for a word match to
+ − 151 @var{string}. This function is just like @code{word-search-forward}
+ − 152 except that it searches backward and normally leaves point at the
+ − 153 beginning of the match.
+ − 154 @end deffn
+ − 155
+ − 156 @node Regular Expressions
+ − 157 @section Regular Expressions
+ − 158 @cindex regular expression
+ − 159 @cindex regexp
+ − 160
+ − 161 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
+ − 162 denotes a (possibly infinite) set of strings. Searching for matches for
+ − 163 a regexp is a very powerful operation. This section explains how to write
+ − 164 regexps; the following section says how to search for them.
+ − 165
+ − 166 To gain a thorough understanding of regular expressions and how to use
+ − 167 them to best advantage, we recommend that you study @cite{Mastering
+ − 168 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
1997}. (It's known as the ``Hip Owls'' book, because of the picture on its
+ − 170 cover.) You might also read the manuals to @ref{(gawk)Top},
+ − 171 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top},
+ − 172 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}, which
+ − 173 also make good use of regular expressions.
+ − 174
The XEmacs regular expression syntax most closely resembles that of
@cite{ed} or @cite{grep}, the GNU versions of which both use the GNU
@cite{regex} library. XEmacs' version of @cite{regex} has recently been
extended with some Perl-like capabilities, described in the next
section.
+ − 180
+ − 181 @menu
+ − 182 * Syntax of Regexps:: Rules for writing regular expressions.
+ − 183 * Regexp Example:: Illustrates regular expression syntax.
+ − 184 @end menu
+ − 185
+ − 186 @node Syntax of Regexps
+ − 187 @subsection Syntax of Regular Expressions
+ − 188
+ − 189 Regular expressions have a syntax in which a few characters are
+ − 190 special constructs and the rest are @dfn{ordinary}. An ordinary
+ − 191 character is a simple regular expression that matches that character and
+ − 192 nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
+ − 193 @samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
+ − 194 special characters will be defined in the future. Any other character
+ − 195 appearing in a regular expression is ordinary, unless a @samp{\}
+ − 196 precedes it.
+ − 197
+ − 198 For example, @samp{f} is not a special character, so it is ordinary, and
+ − 199 therefore @samp{f} is a regular expression that matches the string
+ − 200 @samp{f} and no other string. (It does @emph{not} match the string
+ − 201 @samp{ff}.) Likewise, @samp{o} is a regular expression that matches
+ − 202 only @samp{o}.@refill
+ − 203
+ − 204 Any two regular expressions @var{a} and @var{b} can be concatenated. The
+ − 205 result is a regular expression that matches a string if @var{a} matches
+ − 206 some amount of the beginning of that string and @var{b} matches the rest of
+ − 207 the string.@refill
+ − 208
+ − 209 As a simple example, we can concatenate the regular expressions @samp{f}
+ − 210 and @samp{o} to get the regular expression @samp{fo}, which matches only
+ − 211 the string @samp{fo}. Still trivial. To do something more powerful, you
+ − 212 need to use one of the special characters. Here is a list of them:
+ − 213
+ − 214 @need 1200
+ − 215 @table @kbd
+ − 216 @item .@: @r{(Period)}
+ − 217 @cindex @samp{.} in regexp
+ − 218 is a special character that matches any single character except a newline.
+ − 219 Using concatenation, we can make regular expressions like @samp{a.b}, which
+ − 220 matches any three-character string that begins with @samp{a} and ends with
+ − 221 @samp{b}.@refill
+ − 222
+ − 223 @item *
+ − 224 @cindex @samp{*} in regexp
+ − 225 is not a construct by itself; it is a quantifying suffix operator that
+ − 226 means to repeat the preceding regular expression as many times as
+ − 227 possible. In @samp{fo*}, the @samp{*} applies to the @samp{o}, so
+ − 228 @samp{fo*} matches one @samp{f} followed by any number of @samp{o}s.
+ − 229 The case of zero @samp{o}s is allowed: @samp{fo*} does match
+ − 230 @samp{f}.@refill
+ − 231
+ − 232 @samp{*} always applies to the @emph{smallest} possible preceding
+ − 233 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
+ − 234 repeating @samp{fo}.@refill
+ − 235
+ − 236 The matcher processes a @samp{*} construct by matching, immediately, as
many repetitions as can be found; it is ``greedy''. Then it continues
+ − 238 with the rest of the pattern. If that fails, backtracking occurs,
+ − 239 discarding some of the matches of the @samp{*}-modified construct in
+ − 240 case that makes it possible to match the rest of the pattern. For
+ − 241 example, in matching @samp{ca*ar} against the string @samp{caaar}, the
+ − 242 @samp{a*} first tries to match all three @samp{a}s; but the rest of the
+ − 243 pattern is @samp{ar} and there is only @samp{r} left to match, so this
+ − 244 try fails. The next alternative is for @samp{a*} to match only two
+ − 245 @samp{a}s. With this choice, the rest of the regexp matches
+ − 246 successfully.@refill
+ − 247
+ − 248 Nested repetition operators can be extremely slow if they specify
+ − 249 backtracking loops. For example, it could take hours for the regular
+ − 250 expression @samp{\(x+y*\)*a} to match the sequence
+ − 251 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}. The slowness is because
XEmacs must try each imaginable way of grouping the 35 @samp{x}s before
+ − 253 concluding that none of them can work. To make sure your regular
+ − 254 expressions run fast, check nested repetitions carefully.
+ − 255
+ − 256 @item +
+ − 257 @cindex @samp{+} in regexp
+ − 258 is a quantifying suffix operator similar to @samp{*} except that the
preceding expression must match at least once. It is also ``greedy''.
+ − 260 So, for example, @samp{ca+r} matches the strings @samp{car} and
+ − 261 @samp{caaaar} but not the string @samp{cr}, whereas @samp{ca*r} matches
+ − 262 all three strings.
+ − 263
+ − 264 @item ?
+ − 265 @cindex @samp{?} in regexp
+ − 266 is a quantifying suffix operator similar to @samp{*}, except that the
+ − 267 preceding expression can match either once or not at all. For example,
+ − 268 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
+ − 269 else.
+ − 270
+ − 271 @item *?
+ − 272 @cindex @samp{*?} in regexp
+ − 273 works just like @samp{*}, except that rather than matching the longest
+ − 274 match, it matches the shortest match. @samp{*?} is known as a
+ − 275 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl.
+ − 276 @c Did perl get this from somewhere? What's the real history of *? ?
+ − 277
This construct is useful when you want to match the text inside a pair
of delimiters. For instance, @samp{/\*.*?\*/} will match a C comment
that lies on a single line (recall that @samp{.} does not match a
newline). This could not easily be achieved without the use of a
non-greedy quantifier.

This construct was not available prior to XEmacs 20.4. It is not
available in FSF Emacs.
+ − 285
+ − 286 @item +?
+ − 287 @cindex @samp{+?} in regexp
+ − 288 is the non-greedy version of @samp{+}.
+ − 289
+ − 290 @item ??
+ − 291 @cindex @samp{??} in regexp
+ − 292 is the non-greedy version of @samp{?}.
+ − 293
+ − 294 @item \@{n,m\@}
+ − 295 @c Note the spacing after the close brace is deliberate.
+ − 296 @cindex @samp{\@{n,m\@} }in regexp
serves as an interval quantifier, analogous to @samp{*} or @samp{+}, but
specifies that the expression must match at least @var{n} times, but no
more than @var{m} times. This syntax is supported by most Unix regexp
utilities, and was introduced in XEmacs 20.3.

Unfortunately, the non-greedy version of this quantifier does not
currently exist, although it does in Perl.
+ − 304
+ − 305 @item [ @dots{} ]
+ − 306 @cindex character set (in regexp)
+ − 307 @cindex @samp{[} in regexp
+ − 308 @cindex @samp{]} in regexp
+ − 309 @samp{[} begins a @dfn{character set}, which is terminated by a
+ − 310 @samp{]}. In the simplest case, the characters between the two brackets
+ − 311 form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
+ − 312 @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
+ − 313 and @samp{d}s (including the empty string), from which it follows that
+ − 314 @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
+ − 315 @samp{caddaar}, etc.@refill
+ − 316
+ − 317 The usual regular expression special characters are not special inside a
+ − 318 character set. A completely different set of special characters exists
+ − 319 inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
+ − 320
+ − 321 @samp{-} is used for ranges of characters. To write a range, write two
+ − 322 characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
+ − 323 lower case letter. Ranges may be intermixed freely with individual
+ − 324 characters, as in @samp{[a-z$%.]}, which matches any lower case letter
+ − 325 or @samp{$}, @samp{%}, or a period.@refill
+ − 326
+ − 327 To include a @samp{]} in a character set, make it the first character.
+ − 328 For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a
+ − 329 @samp{-}, write @samp{-} as the first character in the set, or put it
+ − 330 immediately after a range. (You can replace one individual character
+ − 331 @var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
+ − 332 @samp{-}.) There is no way to write a set containing just @samp{-} and
+ − 333 @samp{]}.
+ − 334
+ − 335 To include @samp{^} in a set, put it anywhere but at the beginning of
+ − 336 the set.
+ − 337
+ − 338 @item [^ @dots{} ]
+ − 339 @cindex @samp{^} in regexp
+ − 340 @samp{[^} begins a @dfn{complement character set}, which matches any
+ − 341 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
+ − 342 matches all characters @emph{except} letters and digits.@refill
+ − 343
+ − 344 @samp{^} is not special in a character set unless it is the first
+ − 345 character. The character following the @samp{^} is treated as if it
+ − 346 were first (thus, @samp{-} and @samp{]} are not special there).
+ − 347
+ − 348 Note that a complement character set can match a newline, unless
+ − 349 newline is mentioned as one of the characters not to match.
+ − 350
+ − 351 @item ^
+ − 352 @cindex @samp{^} in regexp
+ − 353 @cindex beginning of line in regexp
+ − 354 is a special character that matches the empty string, but only at the
+ − 355 beginning of a line in the text being matched. Otherwise it fails to
+ − 356 match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
+ − 357 the beginning of a line.
+ − 358
+ − 359 When matching a string instead of a buffer, @samp{^} matches at the
+ − 360 beginning of the string or after a newline character @samp{\n}.
+ − 361
+ − 362 @item $
+ − 363 @cindex @samp{$} in regexp
+ − 364 is similar to @samp{^} but matches only at the end of a line. Thus,
+ − 365 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
+ − 366
+ − 367 When matching a string instead of a buffer, @samp{$} matches at the end
+ − 368 of the string or before a newline character @samp{\n}.
+ − 369
+ − 370 @item \
+ − 371 @cindex @samp{\} in regexp
+ − 372 has two functions: it quotes the special characters (including
+ − 373 @samp{\}), and it introduces additional special constructs.
+ − 374
+ − 375 Because @samp{\} quotes special characters, @samp{\$} is a regular
+ − 376 expression that matches only @samp{$}, and @samp{\[} is a regular
+ − 377 expression that matches only @samp{[}, and so on.
+ − 378
+ − 379 Note that @samp{\} also has special meaning in the read syntax of Lisp
+ − 380 strings (@pxref{String Type}), and must be quoted with @samp{\}. For
+ − 381 example, the regular expression that matches the @samp{\} character is
+ − 382 @samp{\\}. To write a Lisp string that contains the characters
+ − 383 @samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
+ − 384 @samp{\}. Therefore, the read syntax for a regular expression matching
+ − 385 @samp{\} is @code{"\\\\"}.@refill
+ − 386 @end table
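
A few of the constructs in the table above can be compared side by side
with @code{string-match}. The following is only a sketch: the helper
name @code{my-first-match} is hypothetical, and the non-greedy and
interval examples assume a version of XEmacs that supports those
constructs, as noted above.

@example
@group
(defun my-first-match (regexp string)
  "Return the text of the first match for REGEXP in STRING, or nil.
Hypothetical helper for illustration."
  (if (string-match regexp string)
      (match-string 0 string)))
@end group

@group
;; Greedy `*' backtracks until the rest of the pattern fits.
(my-first-match "ca*ar" "caaar")
     @result{} "caaar"
;; Greedy versus non-greedy between delimiters.
(my-first-match "<.*>" "<a> and <b>")
     @result{} "<a> and <b>"
(my-first-match "<.*?>" "<a> and <b>")
     @result{} "<a>"
@end group

@group
;; Interval: at least two and at most three `a's.
(my-first-match "ca\\@{2,3\\@}r" "car caar")
     @result{} "caar"
;; Character set and complement character set.
(my-first-match "c[ad]*r" "caddaar")
     @result{} "caddaar"
(my-first-match "[^a-z]+" "abc123def")
     @result{} "123"
@end group
@end example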
+ − 387
+ − 388 @strong{Please note:} For historical compatibility, special characters
+ − 389 are treated as ordinary ones if they are in contexts where their special
+ − 390 meanings make no sense. For example, @samp{*foo} treats @samp{*} as
+ − 391 ordinary since there is no preceding expression on which the @samp{*}
+ − 392 can act. It is poor practice to depend on this behavior; quote the
+ − 393 special character anyway, regardless of where it appears.@refill
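
To illustrate both the advice above and the double quoting required by
Lisp read syntax, here is a small sketch. The strings are arbitrary
examples, and each value is the index at which the match begins.

@example
@group
;; The regexp \*foo matches a literal `*'; as a Lisp string,
;; its backslash must itself be doubled.
(string-match "\\*foo" "a*foo")
     @result{} 1
;; The regexp \\ matches one backslash; as a Lisp string,
;; that is four backslashes.  The subject string is a\b.
(string-match "\\\\" "a\\b")
     @result{} 1
@end group
@end example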
+ − 394
+ − 395 For the most part, @samp{\} followed by any character matches only
+ − 396 that character. However, there are several exceptions: characters
+ − 397 that, when preceded by @samp{\}, are special constructs. Such
+ − 398 characters are always ordinary when encountered on their own. Here
+ − 399 is a table of @samp{\} constructs:
+ − 400
+ − 401 @table @kbd
+ − 402 @item \|
+ − 403 @cindex @samp{|} in regexp
+ − 404 @cindex regexp alternative
+ − 405 specifies an alternative.
+ − 406 Two regular expressions @var{a} and @var{b} with @samp{\|} in
+ − 407 between form an expression that matches anything that either @var{a} or
+ − 408 @var{b} matches.@refill
+ − 409
+ − 410 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
+ − 411 but no other string.@refill
+ − 412
+ − 413 @samp{\|} applies to the largest possible surrounding expressions. Only a
+ − 414 surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
+ − 415 @samp{\|}.@refill
+ − 416
+ − 417 Full backtracking capability exists to handle multiple uses of @samp{\|}.
+ − 418
+ − 419 @item \( @dots{} \)
+ − 420 @cindex @samp{(} in regexp
+ − 421 @cindex @samp{)} in regexp
+ − 422 @cindex regexp grouping
+ − 423 is a grouping construct that serves three purposes:
+ − 424
+ − 425 @enumerate
+ − 426 @item
+ − 427 To enclose a set of @samp{\|} alternatives for other operations.
+ − 428 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
+ − 429
+ − 430 @item
+ − 431 To enclose an expression for a suffix operator such as @samp{*} to act
+ − 432 on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
+ − 433 (zero or more) number of @samp{na} strings.@refill
+ − 434
+ − 435 @item
+ − 436 To record a matched substring for future reference.
+ − 437 @end enumerate
+ − 438
+ − 439 This last application is not a consequence of the idea of a
+ − 440 parenthetical grouping; it is a separate feature that happens to be
+ − 441 assigned as a second meaning to the same @samp{\( @dots{} \)} construct
+ − 442 because there is no conflict in practice between the two meanings.
+ − 443 Here is an explanation of this feature:
+ − 444
+ − 445 @item \@var{digit}
+ − 446 matches the same text that matched the @var{digit}th occurrence of a
+ − 447 @samp{\( @dots{} \)} construct.
+ − 448
+ − 449 In other words, after the end of a @samp{\( @dots{} \)} construct, the
+ − 450 matcher remembers the beginning and end of the text matched by that
+ − 451 construct. Then, later on in the regular expression, you can use
+ − 452 @samp{\} followed by @var{digit} to match that same text, whatever it
+ − 453 may have been.
+ − 454
+ − 455 The strings matching the first nine @samp{\( @dots{} \)} constructs
+ − 456 appearing in a regular expression are assigned numbers 1 through 9 in
+ − 457 the order that the open parentheses appear in the regular expression.
+ − 458 So you can use @samp{\1} through @samp{\9} to refer to the text matched
+ − 459 by the corresponding @samp{\( @dots{} \)} constructs.
+ − 460
+ − 461 For example, @samp{\(.*\)\1} matches any newline-free string that is
+ − 462 composed of two identical halves. The @samp{\(.*\)} matches the first
+ − 463 half, which may be anything, but the @samp{\1} that follows must match
+ − 464 the same exact text.
+ − 465
+ − 466 @item \(?: @dots{} \)
+ − 467 @cindex @samp{\(?:} in regexp
+ − 468 @cindex regexp grouping
+ − 469 is called a @dfn{shy} grouping operator, and it is used just like
+ − 470 @samp{\( @dots{} \)}, except that it does not cause the matched
+ − 471 substring to be recorded for future reference.
+ − 472
This is useful when you need a lot of grouping @samp{\( @dots{} \)}
constructs but only want to remember one or two, or if you have
more than nine groupings and need to use backreferences to refer to
the groupings at the end. It also allows construction of regular
expressions from variable subexpressions that contain varying numbers of
non-capturing subexpressions, without disturbing the group counts for
the main expression. For example:
+ − 480
+ − 481 @example
(let ((sre (if foo "\\(?:bar\\|baz\\)" "quux")))
  (re-search-forward (format "a\\(b+ %s c+\\) d" sre) nil t)
  (match-string 1))
+ − 485 @end example
+ − 486
+ − 487 It is very tedious to write this kind of code without shy groups, even
+ − 488 if you know what all the alternative subexpressions will look like.
+ − 489
+ − 490 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} should
+ − 491 give little performance gain, as the start of each group must be
+ − 492 recorded for the purpose of back-tracking in any case, and no string
+ − 493 copying is done until @code{match-string} is called.
+ − 494
The shy grouping operator was borrowed from Perl. It was not available
prior to XEmacs 20.3, and has only been available in GNU Emacs since
version 21.
+ − 498
+ − 499 @item \w
+ − 500 @cindex @samp{\w} in regexp
+ − 501 matches any word-constituent character. The editor syntax table
+ − 502 determines which characters these are. @xref{Syntax Tables}.
+ − 503
+ − 504 @item \W
+ − 505 @cindex @samp{\W} in regexp
+ − 506 matches any character that is not a word constituent.
+ − 507
+ − 508 @item \s@var{code}
+ − 509 @cindex @samp{\s} in regexp
+ − 510 matches any character whose syntax is @var{code}. Here @var{code} is a
+ − 511 character that represents a syntax code: thus, @samp{w} for word
+ − 512 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
+ − 513 etc. @xref{Syntax Tables}, for a list of syntax codes and the
+ − 514 characters that stand for them.
+ − 515
+ − 516 @item \S@var{code}
+ − 517 @cindex @samp{\S} in regexp
+ − 518 matches any character whose syntax is not @var{code}.
+ − 519
+ − 520 @item \c@var{category}
+ − 521 @cindex @samp{\c} in regexp
matches any character in @var{category}. This construct is only
available under Mule. Categories and category tables are further
described in @ref{Category Tables}. They are a mechanism for
constructing classes of characters that can be local to a buffer, and
that do not require complicated @samp{[]} expressions every time they
are referenced.
+ − 527
+ − 528 @item \C@var{category}
+ − 529 @cindex @samp{\C} in regexp
+ − 530 matches any character outside @var{category}. @xref{Category Tables},
+ − 531 again, and note that this is only available under Mule.
+ − 532 @end table
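
Several of the @samp{\} constructs in the table above can be exercised
with @code{string-match}. This is only a sketch: the sample strings are
arbitrary, the standard syntax table is assumed to be in effect, and the
shy group requires a version of XEmacs that supports it, as noted above.

@example
@group
(let ((s "ab ab cd"))
  ;; \1 must match the same text that group 1 matched.
  (string-match "\\(\\w+\\) \\1" s)
  (list (match-string 0 s) (match-string 1 s)))
     @result{} ("ab ab" "ab")
@end group

@group
(let ((s "foobar barx"))
  ;; The shy group limits the scope of \| but is not numbered,
  ;; so group 1 is the explicit \(x\) group.
  (string-match "\\(?:foo\\|bar\\)\\(x\\)" s)
  (list (match-beginning 0) (match-string 1 s)))
     @result{} (7 "x")
@end group

@group
;; \s- matches one character with whitespace syntax.
(list (string-match "\\s-+" "one   two") (match-end 0))
     @result{} (3 6)
@end group
@end example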
+ − 533
+ − 534 The following regular expression constructs match the empty string---that is,
+ − 535 they don't use up any characters---but whether they match depends on the
+ − 536 context.
+ − 537
+ − 538 @table @kbd
+ − 539 @item \`
+ − 540 @cindex @samp{\`} in regexp
+ − 541 matches the empty string, but only at the beginning
+ − 542 of the buffer or string being matched against.
+ − 543
+ − 544 @item \'
+ − 545 @cindex @samp{\'} in regexp
+ − 546 matches the empty string, but only at the end of
+ − 547 the buffer or string being matched against.
+ − 548
+ − 549 @item \=
+ − 550 @cindex @samp{\=} in regexp
+ − 551 matches the empty string, but only at point.
+ − 552 (This construct is not defined when matching against a string.)
+ − 553
+ − 554 @item \b
+ − 555 @cindex @samp{\b} in regexp
+ − 556 matches the empty string, but only at the beginning or
+ − 557 end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
+ − 558 @samp{foo} as a separate word. @samp{\bballs?\b} matches
+ − 559 @samp{ball} or @samp{balls} as a separate word.@refill
+ − 560
+ − 561 @item \B
+ − 562 @cindex @samp{\B} in regexp
+ − 563 matches the empty string, but @emph{not} at the beginning or
+ − 564 end of a word.
+ − 565
+ − 566 @item \<
+ − 567 @cindex @samp{\<} in regexp
+ − 568 matches the empty string, but only at the beginning of a word.
+ − 569
+ − 570 @item \>
+ − 571 @cindex @samp{\>} in regexp
+ − 572 matches the empty string, but only at the end of a word.
+ − 573 @end table
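
The context-dependent constructs in the table above can be tried out on
strings. In this sketch the sample strings are arbitrary and the
standard syntax table is assumed; each value is the index at which the
match begins, or @code{nil} if there is no match.

@example
@group
;; \b: `foo' only as a separate word.
(string-match "\\bfoo\\b" "foobar foo")
     @result{} 7
;; \<: only at the beginning of a word.
(string-match "\\<bar" "foobar bar")
     @result{} 7
@end group

@group
;; \` matches only at the start of the string, whereas
;; ^ also matches after a newline.
(string-match "\\`b" "a\nb")
     @result{} nil
(string-match "^b" "a\nb")
     @result{} 2
@end group
@end example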
+ − 574
+ − 575 @kindex invalid-regexp
+ − 576 Not every string is a valid regular expression. For example, a string
+ − 577 with unbalanced square brackets is invalid (with a few exceptions, such
+ − 578 as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
+ − 579 an invalid regular expression is passed to any of the search functions,
+ − 580 an @code{invalid-regexp} error is signaled.
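
A program that accepts regexps from the user may want to catch this
error rather than let it propagate. One possible sketch follows; the
helper name @code{my-valid-regexp-p} is hypothetical, and the check
works simply by attempting a match against the empty string.

@example
@group
(defun my-valid-regexp-p (regexp)
  "Return t if REGEXP can be used for matching, nil if it is invalid.
Hypothetical helper for illustration."
  (condition-case nil
      (progn (string-match regexp "") t)
    ;; The handler runs only for the `invalid-regexp' error.
    (invalid-regexp nil)))
@end group

@group
(my-valid-regexp-p "[a-z")
     @result{} nil
(my-valid-regexp-p "[]]")
     @result{} t
(my-valid-regexp-p "foo\\")
     @result{} nil
@end group
@end example

Note that calling this helper changes the match data, since it performs
a real match attempt.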
+ − 581
+ − 582 @defun regexp-quote string
+ − 583 This function returns a regular expression string that matches exactly
+ − 584 @var{string} and nothing else. This allows you to request an exact
+ − 585 string match when calling a function that wants a regular expression.
+ − 586
+ − 587 @example
+ − 588 @group
+ − 589 (regexp-quote "^The cat$")
+ − 590 @result{} "\\^The cat\\$"
+ − 591 @end group
+ − 592 @end example
+ − 593
+ − 594 One use of @code{regexp-quote} is to combine an exact string match with
+ − 595 context described as a regular expression. For example, this searches
+ − 596 for the string that is the value of @code{string}, surrounded by
+ − 597 whitespace:
+ − 598
+ − 599 @example
+ − 600 @group
+ − 601 (re-search-forward
+ − 602 (concat "\\s-" (regexp-quote string) "\\s-"))
+ − 603 @end group
+ − 604 @end example
+ − 605 @end defun
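
The effect of quoting can be seen by matching the same text with and
without @code{regexp-quote}. This is a minimal sketch; the variable name
@code{needle} and the sample strings are arbitrary.

@example
@group
(let ((needle "a.c"))
  (list
   ;; Unquoted, `.' matches any character, so "abc" matches.
   (string-match needle "abc a.c")
   ;; Quoted, only the literal text "a.c" matches.
   (string-match (regexp-quote needle) "abc a.c")))
     @result{} (0 4)
@end group
@end example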
+ − 606
+ − 607 @node Regexp Example
+ − 608 @subsection Complex Regexp Example
+ − 609
+ − 610 Here is a complicated regexp, used by XEmacs to recognize the end of a
+ − 611 sentence together with any whitespace that follows. It is the value of
+ − 612 the variable @code{sentence-end}.
+ − 613
+ − 614 First, we show the regexp as a string in Lisp syntax to distinguish
+ − 615 spaces from tab characters. The string constant begins and ends with a
+ − 616 double-quote. @samp{\"} stands for a double-quote as part of the
+ − 617 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
+ − 618 tab and @samp{\n} for a newline.
+ − 619
+ − 620 @example
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
+ − 622 @end example
+ − 623
+ − 624 In contrast, if you evaluate the variable @code{sentence-end}, you
+ − 625 will see the following:
+ − 626
+ − 627 @example
+ − 628 @group
+ − 629 sentence-end
+ − 630 @result{}
"[.?!][]\"')@}]*\\($\\| $\\|	\\|  \\)[ 	
]*"
+ − 633 @end group
+ − 634 @end example
+ − 635
+ − 636 @noindent
+ − 637 In this output, tab and newline appear as themselves.
+ − 638
+ − 639 This regular expression contains four parts in succession and can be
+ − 640 deciphered as follows:
+ − 641
+ − 642 @table @code
+ − 643 @item [.?!]
+ − 644 The first part of the pattern is a character set that matches any one of
+ − 645 three characters: period, question mark, and exclamation mark. The
+ − 646 match must begin with one of these three characters.
+ − 647
+ − 648 @item []\"')@}]*
+ − 649 The second part of the pattern matches any closing braces and quotation
+ − 650 marks, zero or more of them, that may follow the period, question mark
+ − 651 or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
+ − 652 a string. The @samp{*} at the end indicates that the immediately
+ − 653 preceding regular expression (a character set, in this case) may be
+ − 654 repeated zero or more times.
+ − 655
+ − 656 @item \\($\\|@ $\\|\t\\|@ @ \\)
+ − 657 The third part of the pattern matches the whitespace that follows the
+ − 658 end of a sentence: the end of a line, or a tab, or two spaces. The
+ − 659 double backslashes mark the parentheses and vertical bars as regular
+ − 660 expression syntax; the parentheses delimit a group and the vertical bars
+ − 661 separate alternatives. The dollar sign is used to match the end of a
+ − 662 line.
+ − 663
+ − 664 @item [ \t\n]*
+ − 665 Finally, the last part of the pattern matches any additional whitespace
+ − 666 beyond the minimum needed to end a sentence.
+ − 667 @end table
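
To see the four parts work together, the regexp can be applied to a
short string. This sketch copies the string constant shown above rather
than reading the variable @code{sentence-end}, whose value may differ in
a given session; the sample text is arbitrary.

@example
@group
(let ((re "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*")
      (text "He said \"Stop!\"  Then he left."))
  (string-match re text)
  ;; The match covers the `!', the closing quote and both spaces.
  (list (match-beginning 0) (match-end 0)))
     @result{} (13 17)
@end group
@end example

The final period is not the first match because the search stops at the
earliest position where the whole pattern succeeds.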
+ − 668
+ − 669 @node Regexp Search
+ − 670 @section Regular Expression Searching
+ − 671 @cindex regular expression searching
+ − 672 @cindex regexp searching
+ − 673 @cindex searching for regexp
+ − 674
+ − 675 In XEmacs, you can search for the next match for a regexp either
+ − 676 incrementally or not. Incremental search commands are described in
446
+ − 677 @cite{The XEmacs User's Manual}. @xref{Regexp Search, , Regular Expression
+ − 678 Search, xemacs, The XEmacs User's Manual}. Here we describe only the search
428
+ − 679 functions useful in programs. The principal one is
+ − 680 @code{re-search-forward}.
+ − 681
444
+ − 682 @deffn Command re-search-forward regexp &optional limit noerror count buffer
428
+ − 683 This function searches forward in the current buffer for a string of
+ − 684 text that is matched by the regular expression @var{regexp}. The
+ − 685 function skips over any amount of text that is not matched by
+ − 686 @var{regexp}, and leaves point at the end of the first match found.
+ − 687 It returns the new value of point.
+ − 688
+ − 689 If @var{limit} is non-@code{nil} (it must be a position in the current
+ − 690 buffer), then it is the upper bound to the search. No match extending
+ − 691 after that position is accepted.
+ − 692
+ − 693 What happens when the search fails depends on the value of
+ − 694 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
+ − 695 error is signaled. If @var{noerror} is @code{t},
+ − 696 @code{re-search-forward} does nothing and returns @code{nil}. If
+ − 697 @var{noerror} is neither @code{nil} nor @code{t}, then
+ − 698 @code{re-search-forward} moves point to @var{limit} (or the end of the
+ − 699 buffer) and returns @code{nil}.
+ − 700
444
+ − 701 If @var{count} is supplied (it must be a positive number), then the
428
+ − 702 search is repeated that many times (each time starting at the end of the
+ − 703 previous time's match). If these successive searches succeed, the
+ − 704 function succeeds, moving point and returning its new value. Otherwise
+ − 705 the search fails.
+ − 706
+ − 707 In the following example, point is initially before the @samp{T}.
+ − 708 Evaluating the search call moves point to the end of that line (between
+ − 709 the @samp{t} of @samp{hat} and the newline).
+ − 710
+ − 711 @example
+ − 712 @group
+ − 713 ---------- Buffer: foo ----------
+ − 714 I read "@point{}The cat in the hat
+ − 715 comes back" twice.
+ − 716 ---------- Buffer: foo ----------
+ − 717 @end group
+ − 718
+ − 719 @group
+ − 720 (re-search-forward "[a-z]+" nil t 5)
+ − 721 @result{} 27
+ − 722
+ − 723 ---------- Buffer: foo ----------
+ − 724 I read "The cat in the hat@point{}
+ − 725 comes back" twice.
+ − 726 ---------- Buffer: foo ----------
+ − 727 @end group
+ − 728 @end example
+ − 729 @end deffn
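
A common use of @var{limit} and @var{noerror} is a loop that visits
every match in turn.  The following is a minimal sketch, not part of
XEmacs: the name @code{my-collect-matches} is hypothetical, and the
loop assumes that @var{regexp} never matches the empty string (an empty
match would not advance point, so the loop would not terminate).

@example
@group
(defun my-collect-matches (regexp &optional limit)
  "Return a list of the matches for REGEXP between point and LIMIT."
  (save-excursion
    (let ((matches nil))
      ;; A NOERROR of t makes a failed search return nil
      ;; instead of signaling `search-failed'.
      (while (re-search-forward regexp limit t)
        (setq matches (cons (match-string 0) matches)))
      (nreverse matches))))
@end group
@end example

@noindent
Because the loop runs inside @code{save-excursion}, point is left
where it was when the function was called.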
+ − 730
444
+ − 731 @deffn Command re-search-backward regexp &optional limit noerror count buffer
428
+ − 732 This function searches backward in the current buffer for a string of
+ − 733 text that is matched by the regular expression @var{regexp}, leaving
+ − 734 point at the beginning of the first match found.
+ − 735
+ − 736 This function is analogous to @code{re-search-forward}, but they are not
+ − 737 simple mirror images. @code{re-search-forward} finds the match whose
+ − 738 beginning is as close as possible to the starting point. If
+ − 739 @code{re-search-backward} were a perfect mirror image, it would find the
+ − 740 match whose end is as close as possible. However, in fact it finds the
+ − 741 match whose beginning is as close as possible. The reason is that
+ − 742 matching a regular expression at a given spot always works from
+ − 743 beginning to end, and starts at a specified beginning position.
+ − 744
+ − 745 A true mirror-image of @code{re-search-forward} would require a special
+ − 746 feature for matching regexps from end to beginning. It's not worth the
+ − 747 trouble of implementing that.
+ − 748 @end deffn
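
The asymmetry described above can be seen in a small experiment.  This
sketch assumes that @code{with-temp-buffer} is available; the buffer
contents are made up for the example.

@example
@group
(with-temp-buffer
  (insert "xaaa")
  ;; Point is now at the end of the buffer, after the last "a".
  (list (re-search-backward "a+")
        (match-string 0)))
     @result{} (4 "a")
@end group
@end example

@noindent
The match begins as close to the starting point as possible, so it
covers only the final @samp{a}, even though @samp{a+} could match all
three when searching forward from the beginning of the buffer.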
+ − 749
444
+ − 750 @defun string-match regexp string &optional start buffer
428
+ − 751 This function returns the index of the start of the first match for
+ − 752 the regular expression @var{regexp} in @var{string}, or @code{nil} if
+ − 753 there is no match. If @var{start} is non-@code{nil}, the search starts
+ − 754 at that index in @var{string}.
+ − 755
444
+ − 756
+ − 757 Optional arg @var{buffer} controls how case folding is done (according
+ − 758 to the value of @code{case-fold-search} in @var{buffer} and
+ − 759 @var{buffer}'s case tables) and defaults to the current buffer.
+ − 760
428
+ − 761 For example,
+ − 762
+ − 763 @example
+ − 764 @group
+ − 765 (string-match
+ − 766 "quick" "The quick brown fox jumped quickly.")
+ − 767 @result{} 4
+ − 768 @end group
+ − 769 @group
+ − 770 (string-match
+ − 771 "quick" "The quick brown fox jumped quickly." 8)
+ − 772 @result{} 27
+ − 773 @end group
+ − 774 @end example
+ − 775
+ − 776 @noindent
+ − 777 The index of the first character of the
+ − 778 string is 0, the index of the second character is 1, and so on.
+ − 779
+ − 780 After this function returns, the index of the first character beyond
+ − 781 the match is available as @code{(match-end 0)}. @xref{Match Data}.
+ − 782
+ − 783 @example
+ − 784 @group
+ − 785 (string-match
+ − 786 "quick" "The quick brown fox jumped quickly." 8)
+ − 787 @result{} 27
+ − 788 @end group
+ − 789
+ − 790 @group
+ − 791 (match-end 0)
+ − 792 @result{} 32
+ − 793 @end group
+ − 794 @end example
+ − 795 @end defun
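
The @var{start} argument and @code{(match-end 0)} combine naturally
into a loop over all the matches in a string.  Here is a minimal
sketch; the function name @code{my-string-match-positions} is
hypothetical and is used only for illustration.

@example
@group
(defun my-string-match-positions (regexp string)
  "Return the start indices of the successive matches for REGEXP."
  (let ((start 0)
        (positions nil))
    (while (and (<= start (length string))
                (string-match regexp string start))
      (setq positions (cons (match-beginning 0) positions))
      ;; Resume after this match; step one further after an
      ;; empty match so that the loop always makes progress.
      (setq start (if (= (match-end 0) (match-beginning 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse positions)))
@end group

@group
(my-string-match-positions
 "quick" "The quick brown fox jumped quickly.")
     @result{} (4 27)
@end group
@end example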
+ − 796
1495
+ − 797 The function @code{split-string} can be used to parse a string into
+ − 798 components delimited by text matching a regular expression.
+ − 799
+ − 800 @defvar split-string-default-separators
+ − 801 The default value of @var{separators} for @code{split-string}, initially
+ − 802 @samp{"[ \f\t\n\r\v]+"}.
+ − 803 @end defvar
+ − 804
+ − 805 @defun split-string string &optional separators omit-nulls
+ − 806 This function splits @var{string} into substrings delimited by matches
+ − 807 for the regular expression @var{separators}. Each match for
+ − 808 @var{separators} defines a splitting point; the substrings between the
+ − 809 splitting points are made into a list, which is the value returned by
+ − 810 @code{split-string}. If @var{omit-nulls} is @code{t}, null strings will
+ − 811 be removed from the result list. Otherwise, null strings are left in
+ − 812 the result. If @var{separators} is @code{nil} (or omitted), the default
+ − 813 is the value of @code{split-string-default-separators}.
+ − 814
+ − 815 As a special case, when @var{separators} is @code{nil} (or omitted),
+ − 816 null strings are always omitted from the result. Thus:
+ − 817
+ − 818 @example
+ − 819 (split-string " two words ")
+ − 820 @result{} ("two" "words")
+ − 821 @end example
+ − 822
+ − 823 The result is not @samp{("" "two" "words" "")}, which would rarely be
+ − 824 useful. If you need such a result, use an explicit value for
+ − 825 @var{separators}:
+ − 826
+ − 827 @example
+ − 828 (split-string " two words " split-string-default-separators)
+ − 829 @result{} ("" "two" "words" "")
+ − 830 @end example
+ − 831
+ − 832 A few examples (there are more in the regression tests):
428
+ − 833
+ − 834 @example
+ − 835 @group
1495
+ − 836 (split-string "foo" "")
+ − 837 @result{} ("" "f" "o" "o" "")
+ − 838 @end group
+ − 839 @group
+ − 840 (split-string "foo" "^")
+ − 841 @result{} ("" "foo")
+ − 842 @end group
+ − 843 @group
+ − 844 (split-string "foo" "$")
+ − 845 @result{} ("foo" "")
+ − 846 @end group
+ − 847 @group
+ − 848 (split-string "foo,bar" ",")
428
+ − 849 @result{} ("foo" "bar")
+ − 850 @end group
+ − 851 @group
1495
+ − 852 (split-string ",foo,bar," ",")
+ − 853 @result{} ("" "foo" "bar" "")
428
+ − 854 @end group
+ − 855 @group
1495
+ − 856 (split-string ",foo,bar," "^,")
+ − 857 @result{} ("" "foo,bar,")
428
+ − 858 @end group
+ − 859 @group
1495
+ − 860 (split-string "foo,bar" "," t)
+ − 861 @result{} ("foo" "bar")
+ − 862 @end group
+ − 863 @group
+ − 864 (split-string ",foo,bar," "," t)
+ − 865 @result{} ("foo" "bar")
428
+ − 866 @end group
+ − 867 @end example
+ − 868 @end defun
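
Since @var{separators} is a regular expression, a separator can absorb
the whitespace around it.  The following sketch uses a made-up input
string to show one way of splitting a comma-separated list while
discarding the blanks next to each comma.

@example
@group
(split-string "alpha , beta,gamma" "[ \t]*,[ \t]*")
     @result{} ("alpha" "beta" "gamma")
@end group
@end example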
+ − 869
+ − 870 @defun split-path path
+ − 871 This function splits a search path into a list of strings. The path
+ − 872 components are separated with the characters specified with
+ − 873 @code{path-separator}. Under Unix, @code{path-separator} will normally
+ − 874 be @samp{:}, while under Windows, it will be @samp{;}.
+ − 875 @end defun
+ − 876
444
+ − 877 @defun looking-at regexp &optional buffer
428
+ − 878 This function determines whether the text in the current buffer directly
+ − 879 following point matches the regular expression @var{regexp}. ``Directly
+ − 880 following'' means precisely that: the search is ``anchored'' and it can
+ − 881 succeed only starting with the first character following point. The
+ − 882 result is @code{t} if so, @code{nil} otherwise.
+ − 883
+ − 884 This function does not move point, but it updates the match data, which
+ − 885 you can access using @code{match-beginning} and @code{match-end}.
+ − 886 @xref{Match Data}.
+ − 887
+ − 888 In this example, point is located directly before the @samp{T}. If it
+ − 889 were anywhere else, the result would be @code{nil}.
+ − 890
+ − 891 @example
+ − 892 @group
+ − 893 ---------- Buffer: foo ----------
+ − 894 I read "@point{}The cat in the hat
+ − 895 comes back" twice.
+ − 896 ---------- Buffer: foo ----------
+ − 897
+ − 898 (looking-at "The cat in the hat$")
+ − 899 @result{} t
+ − 900 @end group
+ − 901 @end example
+ − 902 @end defun
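
Because @code{looking-at} is anchored and does not move point, it is
convenient for classifying the text at a known position.  The following
is a sketch only: the function name @code{my-line-kind} and the three
categories are hypothetical, chosen for illustration.

@example
@group
(defun my-line-kind ()
  "Classify the current line as `blank', `comment', or `code'."
  (save-excursion
    (beginning-of-line)
    (cond ((looking-at "[ \t]*$") 'blank)
          ((looking-at "[ \t]*;") 'comment)
          (t 'code))))
@end group
@end example

@noindent
Each call to @code{looking-at} tests the text at the beginning of the
line; the first pattern that matches there determines the result.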
+ − 903
+ − 904 @node POSIX Regexps
+ − 905 @section POSIX Regular Expression Searching
+ − 906
+ − 907 The usual regular expression functions do backtracking when necessary
+ − 908 to handle the @samp{\|} and repetition constructs, but they continue
+ − 909 this only until they find @emph{some} match. Then they succeed and
+ − 910 report the first match found.
+ − 911
+ − 912 This section describes alternative search functions which perform the
+ − 913 full backtracking specified by the POSIX standard for regular expression
+ − 914 matching. They continue backtracking until they have tried all
+ − 915 possibilities and found all matches, so they can report the longest
+ − 916 match, as required by POSIX. This is much slower, so use these
+ − 917 functions only when you really need the longest match.
+ − 918
+ − 919 In Emacs versions prior to 19.29, these functions did not exist, and
+ − 920 the functions described above implemented full POSIX backtracking.
+ − 921
444
+ − 922 @deffn Command posix-search-forward regexp &optional limit noerror count buffer
428
+ − 923 This is like @code{re-search-forward} except that it performs the full
+ − 924 backtracking specified by the POSIX standard for regular expression
+ − 925 matching.
444
+ − 926 @end deffn
428
+ − 927
444
+ − 928 @deffn Command posix-search-backward regexp &optional limit noerror count buffer
428
+ − 929 This is like @code{re-search-backward} except that it performs the full
+ − 930 backtracking specified by the POSIX standard for regular expression
+ − 931 matching.
444
+ − 932 @end deffn
428
+ − 933
444
+ − 934 @defun posix-looking-at regexp &optional buffer
428
+ − 935 This is like @code{looking-at} except that it performs the full
+ − 936 backtracking specified by the POSIX standard for regular expression
+ − 937 matching.
+ − 938 @end defun
+ − 939
444
+ − 940 @defun posix-string-match regexp string &optional start buffer
428
+ − 941 This is like @code{string-match} except that it performs the full
+ − 942 backtracking specified by the POSIX standard for regular expression
+ − 943 matching.
444
+ − 944
+ − 945 Optional arg @var{buffer} controls how case folding is done (according
+ − 946 to the value of @code{case-fold-search} in @var{buffer} and
+ − 947 @var{buffer}'s case tables) and defaults to the current buffer.
428
+ − 948 @end defun
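
The difference between the two families of functions shows up when an
earlier alternative matches a prefix of a later one.  The following
sketch, using a made-up pattern and string, compares the end of the
match reported by each function.

@example
@group
;; The usual function stops at the first alternative that works.
(progn (string-match "a\\|ab" "abc")
       (match-end 0))
     @result{} 1
@end group

@group
;; The POSIX function keeps trying until it has the longest match.
(progn (posix-string-match "a\\|ab" "abc")
       (match-end 0))
     @result{} 2
@end group
@end example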
+ − 949
+ − 950 @ignore
+ − 951 @deffn Command delete-matching-lines regexp
+ − 952 This function is identical to @code{delete-non-matching-lines}, save
+ − 953 that it deletes what @code{delete-non-matching-lines} keeps.
+ − 954
+ − 955 In the example below, point is located on the first line of text.
+ − 956
+ − 957 @example
+ − 958 @group
+ − 959 ---------- Buffer: foo ----------
+ − 960 We hold these truths
+ − 961 to be self-evident,
+ − 962 that all men are created
+ − 963 equal, and that they are
+ − 964 ---------- Buffer: foo ----------
+ − 965 @end group
+ − 966
+ − 967 @group
+ − 968 (delete-matching-lines "the")
+ − 969 @result{} nil
+ − 970
+ − 971 ---------- Buffer: foo ----------
+ − 972 to be self-evident,
+ − 973 that all men are created
+ − 974 ---------- Buffer: foo ----------
+ − 975 @end group
+ − 976 @end example
+ − 977 @end deffn
+ − 978
+ − 979 @deffn Command flush-lines regexp
444
+ − 980 This function is an alias of @code{delete-matching-lines}.
428
+ − 981 @end deffn
+ − 982
444
+ − 983 @deffn Command delete-non-matching-lines regexp
428
+ − 984 This function deletes all lines following point which don't
+ − 985 contain a match for the regular expression @var{regexp}.
444
+ − 986 @end deffn
428
+ − 987
+ − 988 @deffn Command keep-lines regexp
+ − 989 This function is the same as @code{delete-non-matching-lines}.
+ − 990 @end deffn
+ − 991
444
+ − 992 @deffn Command count-matches regexp
428
+ − 993 This function counts the matches for @var{regexp} in
+ − 994 the current buffer following point. It prints this number in
+ − 995 the echo area, returning the string printed.
+ − 996 @end deffn
+ − 997
444
+ − 998 @deffn Command how-many regexp
+ − 999 This function is an alias of @code{count-matches}.
428
+ − 1000 @end deffn
+ − 1001
444
+ − 1002 @deffn Command list-matching-lines regexp &optional nlines
428
+ − 1003 This function is a synonym of @code{occur}.
+ − 1004 Show all lines following point containing a match for @var{regexp}.
+ − 1005 Display each line with @var{nlines} lines before and after,
+ − 1006 or @code{-}@var{nlines} before if @var{nlines} is negative.
+ − 1007 @var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
+ − 1008 Interactively it is the prefix arg.
+ − 1009
+ − 1010 The lines are shown in a buffer named @samp{*Occur*}.
+ − 1011 It serves as a menu to find any of the occurrences in this buffer.
+ − 1012 @kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
+ − 1013 @end deffn
+ − 1014
+ − 1015 @defopt list-matching-lines-default-context-lines
+ − 1016 Default value is 0.
+ − 1017 Default number of context lines to include around a @code{list-matching-lines}
+ − 1018 match. A negative number means to include that many lines before the match.
+ − 1019 A positive number means to include that many lines both before and after.
+ − 1020 @end defopt
+ − 1021 @end ignore
+ − 1022
+ − 1023 @node Search and Replace
+ − 1024 @section Search and Replace
+ − 1025 @cindex replacement
+ − 1026
+ − 1027 @defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map
+ − 1028 This function is the guts of @code{query-replace} and related commands.
+ − 1029 It searches for occurrences of @var{from-string} and replaces some or
+ − 1030 all of them. If @var{query-flag} is @code{nil}, it replaces all
+ − 1031 occurrences; otherwise, it asks the user what to do about each one.
+ − 1032
+ − 1033 If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
+ − 1034 considered a regular expression; otherwise, it must match literally. If
+ − 1035 @var{delimited-flag} is non-@code{nil}, then only occurrences
+ − 1036 surrounded by word boundaries are considered.
+ − 1037
+ − 1038 The argument @var{replacements} specifies what to replace occurrences
+ − 1039 with. If it is a string, that string is used. It can also be a list of
+ − 1040 strings, to be used in cyclic order.
+ − 1041
+ − 1042 If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
+ − 1043 it specifies how many times to use each of the strings in the
+ − 1044 @var{replacements} list before advancing cyclically to the next one.
+ − 1045
+ − 1046 Normally, the keymap @code{query-replace-map} defines the possible user
+ − 1047 responses for queries. The argument @var{map}, if non-@code{nil}, is a
+ − 1048 keymap to use instead of @code{query-replace-map}.
+ − 1049 @end defun
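
As an illustration of the argument order, here is a minimal sketch of
a command that replaces without querying.  The command name
@code{my-replace-all-foo} and the strings @samp{foo} and @samp{bar} are
hypothetical; the sketch assumes that replacement proceeds from point
toward the end of the buffer, as it does for @code{query-replace}.

@example
@group
(defun my-replace-all-foo ()
  "Replace every literal `foo' after point with `bar'."
  (interactive)
  (save-excursion
    ;; QUERY-FLAG nil: do not ask about each occurrence.
    ;; REGEXP-FLAG nil: treat "foo" as a literal string.
    ;; DELIMITED-FLAG nil: do not require word boundaries.
    (perform-replace "foo" "bar" nil nil nil)))
@end group
@end example

@noindent
Passing @code{t} for @var{query-flag} instead would turn the same call
into an interactive, @code{query-replace}-style session.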
+ − 1050
+ − 1051 @defvar query-replace-map
+ − 1052 This variable holds a special keymap that defines the valid user
+ − 1053 responses for @code{query-replace} and related functions, as well as
+ − 1054 @code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways:
+ − 1055
+ − 1056 @itemize @bullet
+ − 1057 @item
+ − 1058 The ``key bindings'' are not commands, just symbols that are meaningful
+ − 1059 to the functions that use this map.
+ − 1060
+ − 1061 @item
+ − 1062 Prefix keys are not supported; each key binding must be for a single-event
+ − 1063 key sequence. This is because the functions don't use @code{read-key-sequence} to
+ − 1064 get the input; instead, they read a single event and look it up ``by hand.''
+ − 1065 @end itemize
+ − 1066 @end defvar
+ − 1067
+ − 1068 Here are the meaningful ``bindings'' for @code{query-replace-map}.
+ − 1069 Several of them are meaningful only for @code{query-replace} and
+ − 1070 friends.
+ − 1071
+ − 1072 @table @code
+ − 1073 @item act
+ − 1074 Do take the action being considered---in other words, ``yes.''
+ − 1075
+ − 1076 @item skip
+ − 1077 Do not take action for this question---in other words, ``no.''
+ − 1078
+ − 1079 @item exit
+ − 1080 Answer this question ``no,'' and give up on the entire series of
+ − 1081 questions, assuming that the remaining answers will be ``no.''
+ − 1082
+ − 1083 @item act-and-exit
+ − 1084 Answer this question ``yes,'' and give up on the entire series of
+ − 1085 questions, assuming that subsequent answers will be ``no.''
+ − 1086
+ − 1087 @item act-and-show
+ − 1088 Answer this question ``yes,'' but show the results---don't advance yet
+ − 1089 to the next question.
+ − 1090
+ − 1091 @item automatic
+ − 1092 Answer this question and all subsequent questions in the series with
+ − 1093 ``yes,'' without further user interaction.
+ − 1094
+ − 1095 @item backup
+ − 1096 Move back to the previous place that a question was asked about.
+ − 1097
+ − 1098 @item edit
+ − 1099 Enter a recursive edit to deal with this question---instead of any
+ − 1100 other action that would normally be taken.
+ − 1101
+ − 1102 @item delete-and-edit
+ − 1103 Delete the text being considered, then enter a recursive edit to replace
+ − 1104 it.
+ − 1105
+ − 1106 @item recenter
+ − 1107 Redisplay and center the window, then ask the same question again.
+ − 1108
+ − 1109 @item quit
+ − 1110 Perform a quit right away. Only @code{y-or-n-p} and related functions
+ − 1111 use this answer.
+ − 1112
+ − 1113 @item help
+ − 1114 Display some help, then ask again.
+ − 1115 @end table
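
Since the ``bindings'' are plain symbols, a program can look up the
meaning of a key directly.  The following is a sketch under the
assumption that @code{lookup-key} accepts this keymap like any other;
the function name @code{my-query-replace-action} is hypothetical.

@example
@group
(defun my-query-replace-action (key)
  "Return the symbol `query-replace-map' gives for KEY, or nil."
  (let ((binding (lookup-key query-replace-map key)))
    ;; Anything that is not a symbol is not a valid response.
    (if (symbolp binding) binding nil)))
@end group

@group
;; Which response does typing "y" stand for?
(my-query-replace-action "y")
@end group
@end example

@noindent
The value is one of the symbols listed in the table above, or
@code{nil} if the key has no meaning in the map.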
+ − 1116
+ − 1117 @node Match Data
+ − 1118 @section The Match Data
+ − 1119 @cindex match data
+ − 1120
+ − 1121 XEmacs keeps track of the positions of the start and end of segments of
+ − 1122 text found during a regular expression search. This means, for example,
+ − 1123 that you can search for a complex pattern, such as a date in an Rmail
+ − 1124 message, and then extract parts of the match under control of the
+ − 1125 pattern.
+ − 1126
1468
+ − 1127 Because the match data normally describe the most recent successful
+ − 1128 search only, you must be careful not to do another search inadvertently
+ − 1129 between the search you wish to refer back to and the use of the match
+ − 1130 data. If you can't avoid another intervening search, you must save and
+ − 1131 restore the match data around it, to prevent it from being overwritten.
+ − 1132
+ − 1133 To make it possible to write iterative or recursive code that repeatedly
+ − 1134 searches, and uses the data from the last successful search when no more
+ − 1135 matches can be found, a search or match which fails will preserve the
+ − 1136 match data from the last successful search. (You must not depend on
+ − 1137 match data being preserved in case the search or match signals an
+ − 1138 error.) If for some reason you need to clear the match data, you may
+ − 1139 use @code{(store-match-data nil)}.
428
+ − 1140
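The following sketch relies on the behavior just described: after the
loop's final, failing search, the match data still describe the last
successful one.  The function name @code{my-last-match-start} is
hypothetical, and the loop assumes that @var{regexp} never matches the
empty string.

@example
@group
(defun my-last-match-start (regexp)
  "Return the start of the last match for REGEXP after point, or nil."
  (save-excursion
    (let ((found nil))
      (while (re-search-forward regexp nil t)
        (setq found t))
      ;; The failed search that ended the loop did not
      ;; disturb the data from the last successful search.
      (and found (match-beginning 0)))))
@end group
@end example
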
+ − 1141 @menu
+ − 1142 * Simple Match Data:: Accessing single items of match data,
+ − 1143 such as where a particular subexpression started.
+ − 1144 * Replacing Match:: Replacing a substring that was matched.
+ − 1145 * Entire Match Data:: Accessing the entire match data at once, as a list.
+ − 1146 * Saving Match Data:: Saving and restoring the match data.
+ − 1147 @end menu
+ − 1148
+ − 1149 @node Simple Match Data
+ − 1150 @subsection Simple Match Data Access
+ − 1151
+ − 1152 This section explains how to use the match data to find out what was
+ − 1153 matched by the last search or match operation.
+ − 1154
+ − 1155 You can ask about the entire matching text, or about a particular
+ − 1156 parenthetical subexpression of a regular expression. The @var{count}
+ − 1157 argument in the functions below specifies which. If @var{count} is
+ − 1158 zero, you are asking about the entire match. If @var{count} is
+ − 1159 positive, it specifies which subexpression you want.
+ − 1160
+ − 1161 Recall that the subexpressions of a regular expression are those
+ − 1162 expressions grouped with escaped parentheses, @samp{\(@dots{}\)}. The
+ − 1163 @var{count}th subexpression is found by counting occurrences of
+ − 1164 @samp{\(} from the beginning of the whole regular expression. The first
+ − 1165 subexpression is numbered 1, the second 2, and so on. Only regular
+ − 1166 expressions can have subexpressions---after a simple string search, the
+ − 1167 only information available is about the entire match.
+ − 1168
+ − 1169 @defun match-string count &optional in-string
+ − 1170 This function returns, as a string, the text matched in the last search
+ − 1171 or match operation. It returns the entire text if @var{count} is zero,
+ − 1172 or just the portion corresponding to the @var{count}th parenthetical
+ − 1173 subexpression, if @var{count} is positive. If @var{count} is out of
+ − 1174 range, or if that subexpression didn't match anything, the value is
+ − 1175 @code{nil}.
+ − 1176
+ − 1177 If the last such operation was done against a string with
+ − 1178 @code{string-match}, then you should pass the same string as the
+ − 1179 argument @var{in-string}. Otherwise, after a buffer search or match,
+ − 1180 you should omit @var{in-string} or pass @code{nil} for it; but you
+ − 1181 should make sure that the current buffer when you call
+ − 1182 @code{match-string} is the one in which you did the searching or
+ − 1183 matching.
+ − 1184 @end defun
+ − 1185
+ − 1186 @defun match-beginning count
+ − 1187 This function returns the position of the start of text matched by the
+ − 1188 last regular expression searched for, or a subexpression of it.
+ − 1189
+ − 1190 If @var{count} is zero, then the value is the position of the start of
+ − 1191 the entire match. Otherwise, @var{count} specifies a subexpression in
+ − 1192 the regular expression, and the value of the function is the starting
+ − 1193 position of the match for that subexpression.
+ − 1194
+ − 1195 The value is @code{nil} for a subexpression inside a @samp{\|}
+ − 1196 alternative that wasn't used in the match.
+ − 1197 @end defun
+ − 1198
+ − 1199 @defun match-end count
+ − 1200 This function is like @code{match-beginning} except that it returns the
+ − 1201 position of the end of the match, rather than the position of the
+ − 1202 beginning.
+ − 1203 @end defun
+ − 1204
+ − 1205 Here is an example of using the match data, with a comment showing the
+ − 1206 positions within the text:
+ − 1207
+ − 1208 @example
+ − 1209 @group
+ − 1210 (string-match "\\(qu\\)\\(ick\\)"
+ − 1211 "The quick fox jumped quickly.")
444
+ − 1212 ;0123456789
428
+ − 1213 @result{} 4
+ − 1214 @end group
+ − 1215
+ − 1216 @group
+ − 1217 (match-string 0 "The quick fox jumped quickly.")
+ − 1218 @result{} "quick"
+ − 1219 (match-string 1 "The quick fox jumped quickly.")
+ − 1220 @result{} "qu"
+ − 1221 (match-string 2 "The quick fox jumped quickly.")
+ − 1222 @result{} "ick"
+ − 1223 @end group
+ − 1224
+ − 1225 @group
+ − 1226 (match-beginning 1) ; @r{The beginning of the match}
+ − 1227 @result{} 4 ; @r{with @samp{qu} is at index 4.}
+ − 1228 @end group
+ − 1229
+ − 1230 @group
+ − 1231 (match-beginning 2) ; @r{The beginning of the match}
+ − 1232 @result{} 6 ; @r{with @samp{ick} is at index 6.}
+ − 1233 @end group
+ − 1234
+ − 1235 @group
+ − 1236 (match-end 1) ; @r{The end of the match}
+ − 1237 @result{} 6 ; @r{with @samp{qu} is at index 6.}
+ − 1238
+ − 1239 (match-end 2) ; @r{The end of the match}
+ − 1240 @result{} 9 ; @r{with @samp{ick} is at index 9.}
+ − 1241 @end group
+ − 1242 @end example
+ − 1243
+ − 1244 Here is another example. Point is initially located at the beginning
+ − 1245 of the line. Searching moves point to between the space and the word
+ − 1246 @samp{in}. The beginning of the entire match is at the 9th character of
+ − 1247 the buffer (@samp{T}), and the beginning of the match for the first
+ − 1248 subexpression is at the 13th character (@samp{c}).
+ − 1249
+ − 1250 @example
+ − 1251 @group
+ − 1252 (list
+ − 1253 (re-search-forward "The \\(cat \\)")
+ − 1254 (match-beginning 0)
+ − 1255 (match-beginning 1))
+ − 1256 @result{} (17 9 13)
+ − 1257 @end group
+ − 1258
+ − 1259 @group
+ − 1260 ---------- Buffer: foo ----------
+ − 1261 I read "The cat @point{}in the hat comes back" twice.
+ − 1262 ^ ^
+ − 1263 9 13
+ − 1264 ---------- Buffer: foo ----------
+ − 1265 @end group
+ − 1266 @end example
+ − 1267
+ − 1268 @noindent
+ − 1269 (In this case, the index returned is a buffer position; the first
+ − 1270 character of the buffer counts as 1.)
+ − 1271
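To illustrate extracting parts of a match under control of the
pattern, here is a minimal sketch that pulls the three numeric fields
out of a date.  The date format and the sample string are assumptions
made for the example.

@example
@group
(let ((date "1994-07-04"))
  (if (string-match "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)" date)
      ;; Pass the same string to `match-string' each time.
      (list (string-to-number (match-string 1 date))
            (string-to-number (match-string 2 date))
            (string-to-number (match-string 3 date)))))
     @result{} (1994 7 4)
@end group
@end example
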
+ − 1272 @node Replacing Match
+ − 1273 @subsection Replacing the Text That Matched
+ − 1274
+ − 1275 The function @code{replace-match} replaces the text matched by the last search with
+ − 1276 @var{replacement}.
+ − 1277
+ − 1278 @cindex case in replacements
444
+ − 1279 @defun replace-match replacement &optional fixedcase literal string strbuffer
428
+ − 1280 This function replaces the text in the buffer (or in @var{string}) that
+ − 1281 was matched by the last search. It replaces that text with
+ − 1282 @var{replacement}.
+ − 1283
+ − 1284 If you did the last search in a buffer, you should specify @code{nil}
4199
+ − 1285 for @var{string}. (An error will be signaled if you don't.) Then
+ − 1286 @code{replace-match} does the replacement by editing the buffer; it
+ − 1287 leaves point at the end of the replacement text, and returns @code{t}.
428
+ − 1288
+ − 1289 If you did the search in a string, pass the same string as @var{string}.
4199
+ − 1290 (An error will be signaled if you specify @code{nil}.) Then
+ − 1291 @code{replace-match} does the replacement by constructing and returning
+ − 1292 a new string.
444
+ − 1293
428
+ − 1294 If @var{fixedcase} is non-@code{nil}, then the case of the replacement
+ − 1295 text is not changed; otherwise, the replacement text is converted to a
+ − 1296 different case depending upon the capitalization of the text to be
+ − 1297 replaced. If the original text is all upper case, the replacement text
+ − 1298 is converted to upper case. If the first word of the original text is
+ − 1299 capitalized, then the first word of the replacement text is capitalized.
+ − 1300 If the original text contains just one word, and that word is a single capital
+ − 1301 letter, @code{replace-match} considers this a capitalized first word
+ − 1302 rather than all upper case.
+ − 1303
+ − 1304 If @code{case-replace} is @code{nil}, then case conversion is not done,
444
+ − 1305 regardless of the value of @var{fixedcase}. @xref{Searching and Case}.
428
+ − 1306
+ − 1307 If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
+ − 1308 exactly as it is, the only alterations being case changes as needed.
+ − 1309 If it is @code{nil} (the default), then the character @samp{\} is treated
+ − 1310 specially. If a @samp{\} appears in @var{replacement}, then it must be
+ − 1311 part of one of the following sequences:
+ − 1312
+ − 1313 @table @asis
+ − 1314 @item @samp{\&}
4199
+ − 1315 @cindex @samp{\&} in replacement
428
+ − 1316 @samp{\&} stands for the entire text being replaced.
+ − 1317
+ − 1318 @item @samp{\@var{n}}
+ − 1319 @cindex @samp{\@var{n}} in replacement
4199
+ − 1320 @cindex @samp{\@var{digit}} in replacement
428
+ − 1321 @samp{\@var{n}}, where @var{n} is a digit, stands for the text that
+ − 1322 matched the @var{n}th subexpression in the original regexp.
+ − 1323 Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
+ − 1324
+ − 1325 @item @samp{\\}
4199
+ − 1326 @cindex @samp{\\} in replacement
428
+ − 1327 @samp{\\} stands for a single @samp{\} in the replacement text.
4199
+ − 1328
+ − 1329 @item @samp{\u}
+ − 1330 @cindex @samp{\u} in replacement
+ − 1331 @samp{\u} means upcase the next character.
+ − 1332
+ − 1333 @item @samp{\l}
+ − 1334 @cindex @samp{\l} in replacement
+ − 1335 @samp{\l} means downcase the next character.
+ − 1336
+ − 1337 @item @samp{\U}
+ − 1338 @cindex @samp{\U} in replacement
+ − 1339 @samp{\U} means begin upcasing all following characters.
+ − 1340
+ − 1341 @item @samp{\L}
+ − 1342 @cindex @samp{\L} in replacement
+ − 1343 @samp{\L} means begin downcasing all following characters.
+ − 1344
+ − 1345 @item @samp{\E}
+ − 1346 @cindex @samp{\E} in replacement
+ − 1347 @samp{\E} means terminate the effect of any @samp{\U} or @samp{\L}.
428
+ − 1348 @end table
4199
+ − 1349
+ − 1350 Case changes made with @samp{\u}, @samp{\l}, @samp{\U}, and @samp{\L}
+ − 1351 override all other case changes that may be made in the replaced text.
+ − 1352
+ − 1353 The fifth argument @var{strbuffer} may be a buffer to be used for
+ − 1354 syntax-table and case-table lookup. If @var{strbuffer} is not a buffer,
+ − 1355 the current buffer is used. When @var{string} is not a string, the
+ − 1356 buffer that the match occurred in has automatically been remembered and
+ − 1357 you do not need to specify it. @var{string} may also be an integer,
+ − 1358 specifying the index of the subexpression to replace. When @var{string}
+ − 1359 is not an integer, the ``subexpression'' is 0, @emph{i.e.}, the whole
+ − 1360 match. An @code{invalid-argument} error will be signaled if you specify
+ − 1361 a buffer when @var{string} is @code{nil}, or specify a subexpression which was
+ − 1362 not matched.
+ − 1363
+ − 1364 It is not possible to specify both a buffer and a subexpression, but the
+ − 1365 idiom
+ − 1366 @example
+ − 1367 (with-current-buffer @var{buffer} (replace-match ... @var{integer}))
+ − 1368 @end example
+ − 1369 may be used.
+ − 1370
428
+ − 1371 @end defun
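
The two modes of operation can be sketched as follows.  The strings,
the patterns, and the function name @code{my-swap-pairs} are
hypothetical; @var{fixedcase} is @code{t} in both calls so that no case
conversion interferes with the result.

@example
@group
;; Replacement in a string: a new string is returned.
(let ((str "The quick fox"))
  (if (string-match "\\(qu\\)\\(ick\\)" str)
      (replace-match "\\2\\1" t nil str)))
     @result{} "The ickqu fox"
@end group

@group
;; Replacement in a buffer: the buffer is edited in place.
(defun my-swap-pairs ()
  "Rewrite every `key=value' after point as `value=key'."
  (save-excursion
    (while (re-search-forward "\\([a-z]+\\)=\\([a-z]+\\)" nil t)
      (replace-match "\\2=\\1" t))))
@end group
@end example

@noindent
In the buffer case, @code{replace-match} leaves point at the end of the
replacement text, so the next search in the loop resumes after it.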
+ − 1372
4199
+ − 1373
428
+ − 1374 @node Entire Match Data
+ − 1375 @subsection Accessing the Entire Match Data
+ − 1376
+ − 1377 The functions @code{match-data} and @code{set-match-data} read or
+ − 1378 write the entire match data, all at once.
+ − 1379
@defun match-data &optional integers reuse
This function returns a newly constructed list containing all the
information on what text the last search matched.  Element zero is the
position of the beginning of the match for the whole expression; element
one is the position of the end of the match for the expression.  The
next two elements are the positions of the beginning and end of the
match for the first subexpression, and so on.  In general, element
@ifinfo
number 2@var{n}
@end ifinfo
@tex
number {\mathsurround=0pt $2n$}
@end tex
corresponds to @code{(match-beginning @var{n})}; and
element
@ifinfo
number 2@var{n} + 1
@end ifinfo
@tex
number {\mathsurround=0pt $2n+1$}
@end tex
corresponds to @code{(match-end @var{n})}.

All the elements are markers or @code{nil} if matching was done on a
buffer, and all are integers or @code{nil} if matching was done on a
string with @code{string-match}.  However, if the optional first
argument @var{integers} is non-@code{nil}, the function always uses
integers (rather than markers) to represent buffer positions.

If the optional second argument @var{reuse} is a list, the function
reuses it as part of the value.  If @var{reuse} is long enough to hold
all the values, and if @var{integers} is non-@code{nil}, no new Lisp
objects are created.

As always, there must be no possibility of intervening searches between
the call to a search function and the call to @code{match-data} that is
intended to access the match data for that search.

@example
@group
(match-data)
     @result{} (#<marker at 9 in foo>
         #<marker at 17 in foo>
         #<marker at 13 in foo>
         #<marker at 17 in foo>)
@end group
@end example
@end defun
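
As a rough illustration of the element layout, the following sketch
matches against a string, so every element is an integer.  The pattern
and the sample string are made up for this example; the whole
computation is wrapped in one form so that no other search can
intervene.

@example
@group
(let ((case-fold-search nil))
  (string-match "\\(qu\\)\\(ick\\)" "The quick fox")
  (let ((data (match-data))
        (n 2))
    ;; @r{Element 2n is the start of group n; element 2n+1 is its end.}
    (list data
          (nth (* 2 n) data)      (match-beginning n)
          (nth (1+ (* 2 n)) data) (match-end n))))
     @result{} ((4 9 4 6 6 9) 6 6 9 9)
@end group
@end example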

@defun set-match-data match-list
This function sets the match data from the elements of @var{match-list},
which should be a list that was the value of a previous call to
@code{match-data}.

If @var{match-list} refers to a buffer that doesn't exist, no error is
signaled; the match data is set in a meaningless but harmless way.

@findex store-match-data
@code{store-match-data} is an alias for @code{set-match-data}.
@end defun
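
A minimal sketch of the round trip through @code{match-data} and
@code{set-match-data}, assuming string searches so that the positions
are plain integers.  The patterns and strings are invented for
illustration.

@example
@group
(progn
  (string-match "b+" "aabbbcc")      ; @r{Whole match spans 2 to 5.}
  (let ((saved (match-data)))
    (string-match "c" "abc")         ; @r{Clobbers the match data.}
    (set-match-data saved)           ; @r{Puts the earlier data back.}
    (list (match-beginning 0) (match-end 0))))
     @result{} (2 5)
@end group
@end example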

@node Saving Match Data
@subsection Saving and Restoring the Match Data

When you call a function that may do a search, you may need to save
and restore the match data around that call, if you want to preserve the
match data from an earlier search for later use.  Here is an example
that shows the problem that arises if you fail to save the match data:

@example
@group
(re-search-forward "The \\(cat \\)")
     @result{} 48
(foo)                   ; @r{Perhaps @code{foo} does}
                        ;   @r{more searching.}
(match-end 0)
     @result{} 61              ; @r{Unexpected result---not 48!}
@end group
@end example

You can save and restore the match data with @code{save-match-data}:

@defspec save-match-data body@dots{}
This special form executes @var{body}, saving and restoring the match
data around it.
@end defspec
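
The following sketch shows one way to protect an earlier match from a
helper that searches on its own.  @code{my-noisy-helper} is a
hypothetical function written only for this example, and string
searches are used so that the result does not depend on any buffer.

@example
@group
(defun my-noisy-helper ()
  "Hypothetical helper that performs a search of its own."
  (string-match "x" "xyz"))
@end group

@group
(progn
  (string-match "The \\(cat \\)" "The cat sat")
  (save-match-data
    (my-noisy-helper))      ; @r{Its match data is discarded on exit.}
  (match-end 0))
     @result{} 8
@end group
@end example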

You can use @code{set-match-data} together with @code{match-data} to
imitate the effect of the special form @code{save-match-data}.  This is
useful for writing code that can run in Emacs 18.  Here is how:

@example
@group
(let ((data (match-data)))
  (unwind-protect
      @dots{}   ; @r{May change the original match data.}
    (set-match-data data)))
@end group
@end example

Emacs automatically saves and restores the match data when it runs
process filter functions (@pxref{Filter Functions}) and process
sentinels (@pxref{Sentinels}).

@ignore
Here is a function which restores the match data provided the buffer
associated with it still exists.

@smallexample
@group
(defun restore-match-data (data)
@c It is incorrect to split the first line of a doc string.
@c If there's a problem here, it should be solved in some other way.
  "Restore the match data DATA unless the buffer is missing."
  (catch 'foo
    (let ((d data))
@end group
      (while d
        (and (car d)
             (null (marker-buffer (car d)))
@group
             ;; @file{match-data} @r{buffer is deleted.}
             (throw 'foo nil))
        (setq d (cdr d)))
      (set-match-data data))))
@end group
@end smallexample
@end ignore

@node Searching and Case
@section Searching and Case
@cindex searching and case

By default, searches in Emacs ignore the case of the text they are
searching through; if you specify searching for @samp{FOO}, then
@samp{Foo} or @samp{foo} is also considered a match.  Regexps, and in
particular character sets, are included: thus, @samp{[aB]} would match
@samp{a} or @samp{A} or @samp{b} or @samp{B}.

If you do not want this feature, set the variable
@code{case-fold-search} to @code{nil}.  Then all letters must match
exactly, including case.  This is a buffer-local variable; altering the
variable affects only the current buffer.  (@xref{Intro to
Buffer-Local}.)  Alternatively, you may change the value of
@code{default-case-fold-search}, which is the default value of
@code{case-fold-search} for buffers that do not override it.
+ − 1526 Note that the user-level incremental search feature handles case
+ − 1527 distinctions differently. When given a lower case letter, it looks for
+ − 1528 a match of either case, but when given an upper case letter, it looks
+ − 1529 for an upper case letter only. But this has nothing to do with the
+ − 1530 searching functions Lisp functions use.

@defopt case-replace
This variable determines whether the replacement functions should
preserve case.  If the variable is @code{nil}, that means to use the
replacement text verbatim.  A non-@code{nil} value means to convert the
case of the replacement text according to the text being replaced.

The function @code{replace-match} is where this variable actually has
its effect.  @xref{Replacing Match}.
@end defopt

@defopt case-fold-search
This buffer-local variable determines whether searches should ignore
case.  If the variable is @code{nil}, they do not ignore case; otherwise
they do ignore case.
@end defopt

@defvar default-case-fold-search
The value of this variable is the default value for
@code{case-fold-search} in buffers that do not override it.  This is the
same as @code{(default-value 'case-fold-search)}.
@end defvar

@node Standard Regexps
@section Standard Regular Expressions Used in Editing
@cindex regexps used standardly in editing
@cindex standard regexps used in editing

This section describes some variables that hold regular expressions
used for certain purposes in editing:

@defvar page-delimiter
This is the regexp describing line-beginnings that separate pages.  The
default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"});
this matches a line that starts with a formfeed character.
@end defvar
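
As an illustration of how a program might use @code{page-delimiter},
here is a small sketch that counts pages by counting delimiter matches.
@code{my-count-pages} is a hypothetical helper, not a standard function,
and it assumes that @code{page-delimiter} never matches the empty
string (true of the default value).

@example
@group
(defun my-count-pages ()
  "Return one more than the number of page delimiters found.
Hypothetical helper; searches the accessible part of the buffer."
  (save-excursion
    (goto-char (point-min))
    (let ((pages 1))
      ;; @r{Each successful search moves point past one delimiter.}
      (while (re-search-forward page-delimiter nil t)
        (setq pages (1+ pages)))
      pages)))
@end group
@end example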

The following two regular expressions should @emph{not} assume the
match always starts at the beginning of a line; they should not use
@samp{^} to anchor the match.  Most often, the paragraph commands do
check for a match only at the beginning of a line, which means that
@samp{^} would be superfluous.  When there is a nonzero left margin,
they accept matches that start after the left margin.  In that case, a
@samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
where a left margin is never used.

@defvar paragraph-separate
This is the regular expression for recognizing the beginning of a line
that separates paragraphs.  (If you change this, you may have to
change @code{paragraph-start} also.)  The default value is
@w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
spaces, tabs, and form feeds (after its left margin).
@end defvar

@defvar paragraph-start
This is the regular expression for recognizing the beginning of a line
that starts @emph{or} separates paragraphs.  The default value is
@w{@code{"[@ \t\n\f]"}}, which matches a line starting with a space, tab,
newline, or form feed (after its left margin).
@end defvar
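
A minimal sketch of testing the current line against
@code{paragraph-separate}.  The helper name is hypothetical, and for
simplicity it tries the match at column zero only, so it ignores any
left margin; the paragraph commands themselves are more careful.

@example
@group
(defun my-line-separates-paragraphs-p ()
  "Return non-nil if the current line matches `paragraph-separate'.
Hypothetical helper; the match is tried at column zero only."
  (save-excursion
    (beginning-of-line)
    (looking-at paragraph-separate)))
@end group
@end example

Because @code{looking-at} anchors the match at point, no @samp{^} is
needed in the regexp, which is consistent with the advice above.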

@defvar sentence-end
This is the regular expression describing the end of a sentence.  (All
paragraph boundaries also end sentences, regardless.)  The default value
is:

@example
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
@end example

This means a period, question mark or exclamation mark, followed
optionally by closing parenthetical characters, followed by tabs,
spaces or newlines.

For a detailed explanation of this regular expression, see @ref{Regexp
Example}.
@end defvar
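
The following sketch shows one way a program might use
@code{sentence-end} to step over a sentence.  It assumes that
@code{sentence-end} holds a regexp string such as the default shown
above; the sample text is invented.

@example
@group
(with-temp-buffer
  (insert "First sentence.  Second sentence.")
  (goto-char (point-min))
  ;; @r{Move point past the end of the first sentence.}
  (when (re-search-forward sentence-end nil t)
    (buffer-substring (point) (point-max))))
@end group
@end example

With the default value, the search should stop after the two spaces
that follow the first period, so the form returns the text of the
second sentence.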