comparison src/search.c @ 665:fdefd0186b75
[xemacs-hg @ 2001-09-20 06:28:42 by ben]
The great integral types renaming.
The purpose of this is to rationalize the names used for various
integral types, so that they match their intended uses and follow
consistent conventions, and eliminate types that were not semantically
different from each other.
The conventions are:
-- All integral types that measure quantities of anything are
signed. Some people disagree vociferously with this, but their
arguments are mostly theoretical, and are vastly outweighed by
the practical headaches of mixing signed and unsigned values,
and more importantly by the far increased likelihood of
inadvertent bugs: Because of the broken "viral" nature of
unsigned quantities in C (operations involving mixed
signed/unsigned are done unsigned, when exactly the opposite is
nearly always wanted), even a single error in declaring a
quantity unsigned that should be signed, or the more
subtle error of comparing signed and unsigned values and
forgetting the necessary cast, can be catastrophic, as
comparisons will yield wrong results. -Wsign-compare is turned
on specifically to catch this, but this tends to result in a
great number of warnings when mixing signed and unsigned, and
the casts are annoying. More has been written on this
elsewhere.
-- All such quantity types just mentioned boil down to EMACS_INT,
which is 32 bits on 32-bit machines and 64 bits on 64-bit
machines. This is guaranteed to be the same size as Lisp
objects of type `int', and (as far as I can tell) of size_t
(unsigned!) and ssize_t. The only type below that is not an
EMACS_INT is Hashcode, which is an unsigned value of the same
size as EMACS_INT.
-- Type names should be relatively short (no more than 10
characters or so), with the first letter capitalized and no
underscores if they can at all be avoided.
-- "count" == a zero-based measurement of some quantity. Includes
sizes, offsets, and indexes.
-- "bpos" == a one-based measurement of a position in a buffer.
"Charbpos" and "Bytebpos" count text in the buffer, rather than
bytes in memory; thus Bytebpos does not directly correspond to
the memory representation. Use "Membpos" for this.
-- "Char" refers to internal-format characters, not to the C type
"char", which is really a byte.
-- For the actual name changes, see the script below.
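The "viral" behavior described in the first convention can be seen in a minimal sketch. The function names here are hypothetical and not part of XEmacs; plain `int' and `unsigned int' are used so that the conversion is the same on every platform.

```c
/* Mixed comparison: AVAIL is implicitly converted to unsigned int
   before the comparison, so a negative value such as -1 turns into
   UINT_MAX and the test passes when it should fail.  This is what
   -Wsign-compare warns about. */
int
has_room_mixed (int avail, unsigned int needed)
{
  return avail >= needed;
}

/* All-signed comparison: a negative AVAIL stays negative, and the
   test gives the answer the programmer almost certainly intended. */
int
has_room_signed (int avail, int needed)
{
  return avail >= needed;
}
```

With a negative `avail', the mixed version reports that there is room and the signed version reports that there is not. Declaring every quantity type as the signed EMACS_INT avoids the first form.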
I ran the following script to do the conversion. (NOTE: This script
is idempotent. You can safely run it multiple times and it will
not screw up previous results -- in fact, it will do nothing if
nothing has changed. Thus, it can be run repeatedly as necessary
to handle patches coming in from old workspaces, or old branches.)
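As far as the pairs listed in the script go, the idempotence appears to follow from a simple property: no new name contains the old name it replaces (or any other old name), so a second pass finds nothing to match. A rough sketch of that property in C, using literal string matching rather than the Perl regular expressions the real script uses (`replace_all' is a hypothetical helper, not part of the tools below):

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'ed copy of SRC with every occurrence of FROM
   replaced by TO, or NULL if allocation fails.  FROM must be
   non-empty.  The caller frees the result. */
char *
replace_all (const char *src, const char *from, const char *to)
{
  size_t from_len = strlen (from);
  size_t to_len = strlen (to);
  size_t count = 0;
  const char *p;
  char *out, *q;

  /* First pass: count the matches so the output size is known. */
  for (p = src; (p = strstr (p, from)) != NULL; p += from_len)
    count++;

  out = malloc (strlen (src) + count * to_len - count * from_len + 1);
  if (out == NULL)
    return NULL;

  /* Second pass: copy the text between matches, substituting TO. */
  q = out;
  while ((p = strstr (src, from)) != NULL)
    {
      size_t prefix = (size_t) (p - src);
      memcpy (q, src, prefix);
      q += prefix;
      memcpy (q, to, to_len);
      q += to_len;
      src = p + from_len;
    }
  strcpy (q, src);
  return out;
}
```

Applying `replace_all (text, "Bufpos", "Charbpos")' to its own output returns an identical string, since "Charbpos" does not contain "Bufpos".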
There are two tags, just before and just after the change:
`pre-integral-type-rename' and `post-integral-type-rename'. When
merging code from the main trunk into a branch, the best thing to
do is first merge up to `pre-integral-type-rename', then apply the
script and associated changes, then merge from
`post-integral-type-rename' to the present. (Alternatively, just do
the merging in one operation; but you may then have a lot of
conflicts needing to be resolved by hand.)
Script `fixtypes.sh' follows:
----------------------------------- cut ------------------------------------
files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
gr Memory_Count Bytecount $files
gr Lstream_Data_Count Bytecount $files
gr Element_Count Elemcount $files
gr Hash_Code Hashcode $files
gr extcount bytecount $files
gr bufpos charbpos $files
gr bytind bytebpos $files
gr memind membpos $files
gr bufbyte intbyte $files
gr Extcount Bytecount $files
gr Bufpos Charbpos $files
gr Bytind Bytebpos $files
gr Memind Membpos $files
gr Bufbyte Intbyte $files
gr EXTCOUNT BYTECOUNT $files
gr BUFPOS CHARBPOS $files
gr BYTIND BYTEBPOS $files
gr MEMIND MEMBPOS $files
gr BUFBYTE INTBYTE $files
gr MEMORY_COUNT BYTECOUNT $files
gr LSTREAM_DATA_COUNT BYTECOUNT $files
gr ELEMENT_COUNT ELEMCOUNT $files
gr HASH_CODE HASHCODE $files
----------------------------------- cut ------------------------------------
`fixtypes.sh' is a Bourne-shell script; it uses 'gr':
----------------------------------- cut ------------------------------------
#!/bin/sh
# Usage is like this:
# gr FROM TO FILES ...
# globally replace FROM with TO in FILES. FROM and TO are regular expressions.
# backup files are stored in the `backup' directory.
from="$1"
to="$2"
shift 2
echo ${1+"$@"} | xargs global-replace "s/$from/$to/g"
----------------------------------- cut ------------------------------------
`gr' in turn uses a Perl script to do its real work,
`global-replace', which follows:
----------------------------------- cut ------------------------------------
: #-*- Perl -*-
### global-modify --- modify the contents of a file by a Perl expression
## Copyright (C) 1999 Martin Buchholz.
## Copyright (C) 2001 Ben Wing.
## Authors: Martin Buchholz <martin@xemacs.org>, Ben Wing <ben@xemacs.org>
## Maintainer: Ben Wing <ben@xemacs.org>
## Current Version: 1.0, May 5, 2001
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with XEmacs; see the file COPYING. If not, write to the Free
# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
# 02111-1307, USA.
eval 'exec perl -w -S $0 ${1+"$@"}'
if 0;
use strict;
use FileHandle;
use Carp;
use Getopt::Long;
use File::Basename;
(my $myName = $0) =~ s@.*/@@; my $usage="
Usage: $myName [--help] [--backup-dir=DIR] [--line-mode] [--hunk-mode]
PERLEXPR FILE ...
Globally modify a file, either line by line or in one big hunk.
Typical usage is like this:
[with GNU print, GNU xargs: guaranteed to handle spaces, quotes, etc.
in file names]
find . -name '*.[ch]' -print0 | xargs -0 $0 's/\bCONST\b/const/g'\n
[with non-GNU print, xargs]
find . -name '*.[ch]' -print | xargs $0 's/\bCONST\b/const/g'\n
The file is read in, either line by line (with --line-mode specified)
or in one big hunk (with --hunk-mode specified; it's the default), and
the Perl expression is then evalled with \$_ set to the line or hunk of
text, including the terminating newline if there is one. It should
destructively modify the value there, storing the changed result in \$_.
Files in which any modifications are made are backed up to the directory
specified using --backup-dir, or to `backup' by default. To disable this,
use --backup-dir= with no argument.
Hunk mode is the default because it is MUCH MUCH faster than line-by-line.
Use line-by-line only when it matters, e.g. you want to do a replacement
only once per line (the default without the `g' argument). Conversely,
when using hunk mode, *ALWAYS* use `g'; otherwise, you will only make one
replacement in the entire file!
";
my %options = ();
$Getopt::Long::ignorecase = 0;
&GetOptions (
\%options,
'help', 'backup-dir=s', 'line-mode', 'hunk-mode',
);
die $usage if $options{"help"} or @ARGV <= 1;
my $code = shift;
die $usage if grep (-d || ! -w, @ARGV);
sub SafeOpen {
my $fh = new FileHandle;
open ($fh, $_[0]) or confess "Can't open $_[0]: $!";
return $fh;
}
sub SafeClose {
close $_[0] or confess "Can't close $_[0]: $!";
}
sub FileContents {
my $fh = SafeOpen ("< $_[0]");
my $olddollarslash = $/;
local $/ = undef;
my $contents = <$fh>;
$/ = $olddollarslash;
return $contents;
}
sub WriteStringToFile {
my $fh = SafeOpen ("> $_[0]");
binmode $fh;
print $fh $_[1] or confess "$_[0]: $!\n";
SafeClose $fh;
}
foreach my $file (@ARGV) {
my $changed_p = 0;
my $new_contents = "";
if ($options{"line-mode"}) {
my $fh = SafeOpen $file;
while (<$fh>) {
my $save_line = $_;
eval $code;
$changed_p = 1 if $save_line ne $_;
$new_contents .= $_;
}
} else {
my $orig_contents = $_ = FileContents $file;
eval $code;
if ($_ ne $orig_contents) {
$changed_p = 1;
$new_contents = $_;
}
}
if ($changed_p) {
my $backdir = $options{"backup-dir"};
$backdir = "backup" if !defined ($backdir);
if ($backdir) {
my ($name, $path, $suffix) = fileparse ($file, "");
my $backfulldir = $path . $backdir;
my $backfile = "$backfulldir/$name";
mkdir $backfulldir, 0755 unless -d $backfulldir;
print "modifying $file (original saved in $backfile)\n";
rename $file, $backfile;
}
WriteStringToFile ($file, $new_contents);
}
}
----------------------------------- cut ------------------------------------
In addition to those programs, I needed to fix up a few other
things, particularly relating to the duplicate definitions of
types, now that some types merged with others. Specifically:
1. in lisp.h, removed duplicate declarations of Bytecount. The
changed code should now look like this: (In each code snippet
below, the first and last lines are the same as the original, as
are all lines outside of those lines. That allows you to locate
the section to be replaced, and replace the stuff in that
section, verifying that there isn't anything new added that
would need to be kept.)
--------------------------------- snip -------------------------------------
/* Counts of bytes or chars */
typedef EMACS_INT Bytecount;
typedef EMACS_INT Charcount;
/* Counts of elements */
typedef EMACS_INT Elemcount;
/* Hash codes */
typedef unsigned long Hashcode;
/* ------------------------ dynamic arrays ------------------- */
--------------------------------- snip -------------------------------------
2. in lstream.h, removed duplicate declaration of Bytecount.
Rewrote the comment about this type. The changed code should
now look like this:
--------------------------------- snip -------------------------------------
#endif
/* There have been some arguments over what the type should be that
specifies a count of bytes in a data block to be written out or read in,
using Lstream_read(), Lstream_write(), and related functions.
Originally it was long, which worked fine; Martin "corrected" these to
size_t and ssize_t on the grounds that this is theoretically cleaner and
is in keeping with the C standards. Unfortunately, this practice is
horribly error-prone due to design flaws in the way that mixed
signed/unsigned arithmetic happens. In fact, by doing this change,
Martin introduced a subtle but fatal error that caused the operation of
sending large mail messages to the SMTP server under Windows to fail.
By putting all values back to be signed, avoiding any signed/unsigned
mixing, the bug immediately went away. The type then in use was
Lstream_Data_Count, so that it could be reverted cleanly if a vote came to
that. Now it is Bytecount.
Some earlier comments about why the type must be signed: This MUST BE
SIGNED, since it also is used in functions that return the number of
bytes actually read to or written from in an operation, and these
functions can return -1 to signal error.
Note that the standard Unix read() and write() functions define the
count going in as a size_t, which is UNSIGNED, and the count going
out as an ssize_t, which is SIGNED. This is a horrible design
flaw. Not only is it highly likely to lead to logic errors when a
-1 gets interpreted as a large positive number, but operations are
bound to fail in all sorts of horrible ways when a number in the
upper-half of the size_t range is passed in -- this number is
unrepresentable as an ssize_t, so code that checks to see how many
bytes are actually written (which is mandatory if you are dealing
with certain types of devices) will get completely screwed up.
--ben
*/
typedef enum lstream_buffering
--------------------------------- snip -------------------------------------
3. in dumper.c, there are four places, all inside of switch()
statements, where XD_BYTECOUNT appears twice as a case tag. In
each case, the two case blocks contain identical code, and you
should *REMOVE THE SECOND* and leave the first.
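The point made in the lstream.h comment in item 2, that a byte count must be signed because the same type also carries the -1 error return, can be sketched as follows. Everything here is a hypothetical illustration, not the Lstream API: `Bytecount' is typedef'ed to `long' as a stand-in for EMACS_INT, and `writer_fn', `write_fully', `struct sink', `chunked_writer' and `failing_writer' are invented names.

```c
#include <string.h>

typedef long Bytecount;		/* stand-in for the real EMACS_INT typedef */

/* Hypothetical writer callback: returns the number of bytes written
   (possibly fewer than SIZE), or -1 on error. */
typedef Bytecount (*writer_fn) (const unsigned char *data, Bytecount size,
				void *closure);

/* Call WRITER until all SIZE bytes are written.  Returns the number of
   bytes written, or -1 on error.  Because every quantity is signed,
   the error check is a plain `n < 0' with no casts. */
Bytecount
write_fully (writer_fn writer, const unsigned char *data, Bytecount size,
	     void *closure)
{
  Bytecount done = 0;

  while (done < size)
    {
      Bytecount n = writer (data + done, size - done, closure);
      if (n < 0)
	return -1;
      if (n == 0)
	break;			/* no progress; stop rather than spin */
      done += n;
    }
  return done;
}

/* Sample sink that accepts at most 4 bytes per call. */
struct sink
{
  unsigned char buf[64];
  Bytecount len;
};

Bytecount
chunked_writer (const unsigned char *data, Bytecount size, void *closure)
{
  struct sink *s = (struct sink *) closure;
  Bytecount n = size < 4 ? size : 4;

  /* sizeof yields an unsigned size_t; cast it once, here, so the
     comparison is done signed. */
  if (s->len + n > (Bytecount) sizeof (s->buf))
    return -1;
  memcpy (s->buf + s->len, data, (size_t) n);
  s->len += n;
  return n;
}

/* Sample writer that always fails. */
Bytecount
failing_writer (const unsigned char *data, Bytecount size, void *closure)
{
  (void) data; (void) size; (void) closure;
  return -1;
}
```

Had `n' been declared as an unsigned type, the `n < 0' test could never succeed and the -1 would be added to `done' as a very large count, which is the kind of failure the comment describes.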
author | ben |
---|---|
date | Thu, 20 Sep 2001 06:31:11 +0000 |
parents | b39c14581166 |
children | a307f9a2021d |
664:6e99cc8c6ca5 | 665:fdefd0186b75 |
---|---|
82 to call re_set_registers after compiling a new pattern or after | 82 to call re_set_registers after compiling a new pattern or after |
83 setting the match registers, so that the regex functions will be | 83 setting the match registers, so that the regex functions will be |
84 able to free or re-allocate it properly. */ | 84 able to free or re-allocate it properly. */ |
85 | 85 |
86 /* Note: things get trickier under Mule because the values returned from | 86 /* Note: things get trickier under Mule because the values returned from |
87 the regexp routines are in Bytinds but we need them to be in Bufpos's. | 87 the regexp routines are in Bytebposs but we need them to be in Charbpos's. |
88 We take the easy way out for the moment and just convert them immediately. | 88 We take the easy way out for the moment and just convert them immediately. |
89 We could be more clever by not converting them until necessary, but | 89 We could be more clever by not converting them until necessary, but |
90 that gets real ugly real fast since the buffer might have changed and | 90 that gets real ugly real fast since the buffer might have changed and |
91 the positions might be out of sync or out of range. | 91 the positions might be out of sync or out of range. |
92 */ | 92 */ |
107 Fixnum warn_about_possibly_incompatible_back_references; | 107 Fixnum warn_about_possibly_incompatible_back_references; |
108 | 108 |
109 /* range table for use with skip_chars. Only needed for Mule. */ | 109 /* range table for use with skip_chars. Only needed for Mule. */ |
110 Lisp_Object Vskip_chars_range_table; | 110 Lisp_Object Vskip_chars_range_table; |
111 | 111 |
112 static void set_search_regs (struct buffer *buf, Bufpos beg, Charcount len); | 112 static void set_search_regs (struct buffer *buf, Charbpos beg, Charcount len); |
113 static void save_search_regs (void); | 113 static void save_search_regs (void); |
114 static Bufpos simple_search (struct buffer *buf, Bufbyte *base_pat, | 114 static Charbpos simple_search (struct buffer *buf, Intbyte *base_pat, |
115 Bytecount len, Bytind pos, Bytind lim, | 115 Bytecount len, Bytebpos pos, Bytebpos lim, |
116 EMACS_INT n, Lisp_Object trt); | 116 EMACS_INT n, Lisp_Object trt); |
117 static Bufpos boyer_moore (struct buffer *buf, Bufbyte *base_pat, | 117 static Charbpos boyer_moore (struct buffer *buf, Intbyte *base_pat, |
118 Bytecount len, Bytind pos, Bytind lim, | 118 Bytecount len, Bytebpos pos, Bytebpos lim, |
119 EMACS_INT n, Lisp_Object trt, | 119 EMACS_INT n, Lisp_Object trt, |
120 Lisp_Object inverse_trt, int charset_base); | 120 Lisp_Object inverse_trt, int charset_base); |
121 static Bufpos search_buffer (struct buffer *buf, Lisp_Object str, | 121 static Charbpos search_buffer (struct buffer *buf, Lisp_Object str, |
122 Bufpos bufpos, Bufpos buflim, EMACS_INT n, int RE, | 122 Charbpos charbpos, Charbpos buflim, EMACS_INT n, int RE, |
123 Lisp_Object trt, Lisp_Object inverse_trt, | 123 Lisp_Object trt, Lisp_Object inverse_trt, |
124 int posix); | 124 int posix); |
125 | 125 |
126 static void | 126 static void |
127 matcher_overflow (void) | 127 matcher_overflow (void) |
227 for (;;) | 227 for (;;) |
228 Fsignal (Qsearch_failed, list1 (arg)); | 228 Fsignal (Qsearch_failed, list1 (arg)); |
229 return Qnil; /* Not reached. */ | 229 return Qnil; /* Not reached. */ |
230 } | 230 } |
231 | 231 |
232 /* Convert the search registers from Bytinds to Bufpos's. Needs to be | 232 /* Convert the search registers from Bytebposs to Charbpos's. Needs to be |
233 done after each regexp match that uses the search regs. | 233 done after each regexp match that uses the search regs. |
234 | 234 |
235 We could get a potential speedup by not converting the search registers | 235 We could get a potential speedup by not converting the search registers |
236 until it's really necessary, e.g. when match-data or replace-match is | 236 until it's really necessary, e.g. when match-data or replace-match is |
237 called. However, this complexifies the code a lot (e.g. the buffer | 237 called. However, this complexifies the code a lot (e.g. the buffer |
238 could have changed and the Bytinds stored might be invalid) and is | 238 could have changed and the Bytebposs stored might be invalid) and is |
239 probably not a great time-saver. */ | 239 probably not a great time-saver. */ |
240 | 240 |
241 static void | 241 static void |
242 fixup_search_regs_for_buffer (struct buffer *buf) | 242 fixup_search_regs_for_buffer (struct buffer *buf) |
243 { | 243 { |
245 int num_regs = search_regs.num_regs; | 245 int num_regs = search_regs.num_regs; |
246 | 246 |
247 for (i = 0; i < num_regs; i++) | 247 for (i = 0; i < num_regs; i++) |
248 { | 248 { |
249 if (search_regs.start[i] >= 0) | 249 if (search_regs.start[i] >= 0) |
250 search_regs.start[i] = bytind_to_bufpos (buf, search_regs.start[i]); | 250 search_regs.start[i] = bytebpos_to_charbpos (buf, search_regs.start[i]); |
251 if (search_regs.end[i] >= 0) | 251 if (search_regs.end[i] >= 0) |
252 search_regs.end[i] = bytind_to_bufpos (buf, search_regs.end[i]); | 252 search_regs.end[i] = bytebpos_to_charbpos (buf, search_regs.end[i]); |
253 } | 253 } |
254 } | 254 } |
255 | 255 |
256 /* Similar but for strings. */ | 256 /* Similar but for strings. */ |
257 static void | 257 static void |
288 static Lisp_Object | 288 static Lisp_Object |
289 looking_at_1 (Lisp_Object string, struct buffer *buf, int posix) | 289 looking_at_1 (Lisp_Object string, struct buffer *buf, int posix) |
290 { | 290 { |
291 /* This function has been Mule-ized, except for the trt table handling. */ | 291 /* This function has been Mule-ized, except for the trt table handling. */ |
292 Lisp_Object val; | 292 Lisp_Object val; |
293 Bytind p1, p2; | 293 Bytebpos p1, p2; |
294 Bytecount s1, s2; | 294 Bytecount s1, s2; |
295 REGISTER int i; | 295 REGISTER int i; |
296 struct re_pattern_buffer *bufp; | 296 struct re_pattern_buffer *bufp; |
297 | 297 |
298 if (running_asynch_code) | 298 if (running_asynch_code) |
456 /* Match REGEXP against STRING, searching all of STRING, | 456 /* Match REGEXP against STRING, searching all of STRING, |
457 and return the index of the match, or negative on failure. | 457 and return the index of the match, or negative on failure. |
458 This does not clobber the match data. */ | 458 This does not clobber the match data. */ |
459 | 459 |
460 Bytecount | 460 Bytecount |
461 fast_string_match (Lisp_Object regexp, const Bufbyte *nonreloc, | 461 fast_string_match (Lisp_Object regexp, const Intbyte *nonreloc, |
462 Lisp_Object reloc, Bytecount offset, | 462 Lisp_Object reloc, Bytecount offset, |
463 Bytecount length, int case_fold_search, | 463 Bytecount length, int case_fold_search, |
464 Error_Behavior errb, int no_quit) | 464 Error_Behavior errb, int no_quit) |
465 { | 465 { |
466 /* This function has been Mule-ized, except for the trt table handling. */ | 466 /* This function has been Mule-ized, except for the trt table handling. */ |
467 Bytecount val; | 467 Bytecount val; |
468 Bufbyte *newnonreloc = (Bufbyte *) nonreloc; | 468 Intbyte *newnonreloc = (Intbyte *) nonreloc; |
469 struct re_pattern_buffer *bufp; | 469 struct re_pattern_buffer *bufp; |
470 | 470 |
471 bufp = compile_pattern (regexp, 0, | 471 bufp = compile_pattern (regexp, 0, |
472 (case_fold_search | 472 (case_fold_search |
473 ? XCASE_TABLE_DOWNCASE (current_buffer->case_table) | 473 ? XCASE_TABLE_DOWNCASE (current_buffer->case_table) |
489 else | 489 else |
490 { | 490 { |
491 /* QUIT could relocate RELOC. Therefore we must alloca() | 491 /* QUIT could relocate RELOC. Therefore we must alloca() |
492 and copy. No way around this except some serious | 492 and copy. No way around this except some serious |
493 rewriting of re_search(). */ | 493 rewriting of re_search(). */ |
494 newnonreloc = (Bufbyte *) alloca (length); | 494 newnonreloc = (Intbyte *) alloca (length); |
495 memcpy (newnonreloc, XSTRING_DATA (reloc), length); | 495 memcpy (newnonreloc, XSTRING_DATA (reloc), length); |
496 } | 496 } |
497 } | 497 } |
498 | 498 |
499 /* #### evil current-buffer dependency */ | 499 /* #### evil current-buffer dependency */ |
559 If we don't find COUNT instances before reaching END, set *SHORTAGE | 559 If we don't find COUNT instances before reaching END, set *SHORTAGE |
560 to the number of TARGETs left unfound, and return END. | 560 to the number of TARGETs left unfound, and return END. |
561 | 561 |
562 If ALLOW_QUIT is non-zero, call QUIT periodically. */ | 562 If ALLOW_QUIT is non-zero, call QUIT periodically. */ |
563 | 563 |
564 static Bytind | 564 static Bytebpos |
565 bi_scan_buffer (struct buffer *buf, Emchar target, Bytind st, Bytind en, | 565 bi_scan_buffer (struct buffer *buf, Emchar target, Bytebpos st, Bytebpos en, |
566 EMACS_INT count, EMACS_INT *shortage, int allow_quit) | 566 EMACS_INT count, EMACS_INT *shortage, int allow_quit) |
567 { | 567 { |
568 /* This function has been Mule-ized. */ | 568 /* This function has been Mule-ized. */ |
569 Bytind lim = en > 0 ? en : | 569 Bytebpos lim = en > 0 ? en : |
570 ((count > 0) ? BI_BUF_ZV (buf) : BI_BUF_BEGV (buf)); | 570 ((count > 0) ? BI_BUF_ZV (buf) : BI_BUF_BEGV (buf)); |
571 | 571 |
572 /* #### newline cache stuff in this function not yet ported */ | 572 /* #### newline cache stuff in this function not yet ported */ |
573 | 573 |
574 assert (count != 0); | 574 assert (count != 0); |
588 { | 588 { |
589 while (st < lim && count > 0) | 589 while (st < lim && count > 0) |
590 { | 590 { |
591 if (BI_BUF_FETCH_CHAR (buf, st) == target) | 591 if (BI_BUF_FETCH_CHAR (buf, st) == target) |
592 count--; | 592 count--; |
593 INC_BYTIND (buf, st); | 593 INC_BYTEBPOS (buf, st); |
594 } | 594 } |
595 } | 595 } |
596 else | 596 else |
597 #endif | 597 #endif |
598 { | 598 { |
599 while (st < lim && count > 0) | 599 while (st < lim && count > 0) |
600 { | 600 { |
601 Bytind ceil; | 601 Bytebpos ceil; |
602 Bufbyte *bufptr; | 602 Intbyte *bufptr; |
603 | 603 |
604 ceil = BI_BUF_CEILING_OF (buf, st); | 604 ceil = BI_BUF_CEILING_OF (buf, st); |
605 ceil = min (lim, ceil); | 605 ceil = min (lim, ceil); |
606 bufptr = (Bufbyte *) memchr (BI_BUF_BYTE_ADDRESS (buf, st), | 606 bufptr = (Intbyte *) memchr (BI_BUF_BYTE_ADDRESS (buf, st), |
607 (int) target, ceil - st); | 607 (int) target, ceil - st); |
608 if (bufptr) | 608 if (bufptr) |
609 { | 609 { |
610 count--; | 610 count--; |
611 st = BI_BUF_PTR_BYTE_POS (buf, bufptr) + 1; | 611 st = BI_BUF_PTR_BYTE_POS (buf, bufptr) + 1; |
626 #ifdef MULE | 626 #ifdef MULE |
627 if (target >= 0200) | 627 if (target >= 0200) |
628 { | 628 { |
629 while (st > lim && count < 0) | 629 while (st > lim && count < 0) |
630 { | 630 { |
631 DEC_BYTIND (buf, st); | 631 DEC_BYTEBPOS (buf, st); |
632 if (BI_BUF_FETCH_CHAR (buf, st) == target) | 632 if (BI_BUF_FETCH_CHAR (buf, st) == target) |
633 count++; | 633 count++; |
634 } | 634 } |
635 } | 635 } |
636 else | 636 else |
637 #endif | 637 #endif |
638 { | 638 { |
639 while (st > lim && count < 0) | 639 while (st > lim && count < 0) |
640 { | 640 { |
641 Bytind floor; | 641 Bytebpos floor; |
642 Bufbyte *bufptr; | 642 Intbyte *bufptr; |
643 Bufbyte *floorptr; | 643 Intbyte *floorptr; |
644 | 644 |
645 floor = BI_BUF_FLOOR_OF (buf, st); | 645 floor = BI_BUF_FLOOR_OF (buf, st); |
646 floor = max (lim, floor); | 646 floor = max (lim, floor); |
647 /* No memrchr() ... */ | 647 /* No memrchr() ... */ |
648 bufptr = BI_BUF_BYTE_ADDRESS_BEFORE (buf, st); | 648 bufptr = BI_BUF_BYTE_ADDRESS_BEFORE (buf, st); |
672 else | 672 else |
673 { | 673 { |
674 /* We found the character we were looking for; we have to return | 674 /* We found the character we were looking for; we have to return |
675 the position *after* it due to the strange way that the return | 675 the position *after* it due to the strange way that the return |
676 value is defined. */ | 676 value is defined. */ |
677 INC_BYTIND (buf, st); | 677 INC_BYTEBPOS (buf, st); |
678 return st; | 678 return st; |
679 } | 679 } |
680 } | 680 } |
681 } | 681 } |
682 | 682 |
683 Bufpos | 683 Charbpos |
684 scan_buffer (struct buffer *buf, Emchar target, Bufpos start, Bufpos end, | 684 scan_buffer (struct buffer *buf, Emchar target, Charbpos start, Charbpos end, |
685 EMACS_INT count, EMACS_INT *shortage, int allow_quit) | 685 EMACS_INT count, EMACS_INT *shortage, int allow_quit) |
686 { | 686 { |
687 Bytind bi_retval; | 687 Bytebpos bi_retval; |
688 Bytind bi_start, bi_end; | 688 Bytebpos bi_start, bi_end; |
689 | 689 |
690 bi_start = bufpos_to_bytind (buf, start); | 690 bi_start = charbpos_to_bytebpos (buf, start); |
691 if (end) | 691 if (end) |
692 bi_end = bufpos_to_bytind (buf, end); | 692 bi_end = charbpos_to_bytebpos (buf, end); |
693 else | 693 else |
694 bi_end = 0; | 694 bi_end = 0; |
695 bi_retval = bi_scan_buffer (buf, target, bi_start, bi_end, count, | 695 bi_retval = bi_scan_buffer (buf, target, bi_start, bi_end, count, |
696 shortage, allow_quit); | 696 shortage, allow_quit); |
697 return bytind_to_bufpos (buf, bi_retval); | 697 return bytebpos_to_charbpos (buf, bi_retval); |
698 } | 698 } |
699 | 699 |
700 Bytind | 700 Bytebpos |
701 bi_find_next_newline_no_quit (struct buffer *buf, Bytind from, int count) | 701 bi_find_next_newline_no_quit (struct buffer *buf, Bytebpos from, int count) |
702 { | 702 { |
703 return bi_scan_buffer (buf, '\n', from, 0, count, 0, 0); | 703 return bi_scan_buffer (buf, '\n', from, 0, count, 0, 0); |
704 } | 704 } |
705 | 705 |
706 Bufpos | 706 Charbpos |
707 find_next_newline_no_quit (struct buffer *buf, Bufpos from, int count) | 707 find_next_newline_no_quit (struct buffer *buf, Charbpos from, int count) |
708 { | 708 { |
709 return scan_buffer (buf, '\n', from, 0, count, 0, 0); | 709 return scan_buffer (buf, '\n', from, 0, count, 0, 0); |
710 } | 710 } |
711 | 711 |
712 Bufpos | 712 Charbpos |
713 find_next_newline (struct buffer *buf, Bufpos from, int count) | 713 find_next_newline (struct buffer *buf, Charbpos from, int count) |
714 { | 714 { |
715 return scan_buffer (buf, '\n', from, 0, count, 0, 1); | 715 return scan_buffer (buf, '\n', from, 0, count, 0, 1); |
716 } | 716 } |
717 | 717 |
718 Bytind | 718 Bytebpos |
719 bi_find_next_emchar_in_string (Lisp_String* str, Emchar target, Bytind st, | 719 bi_find_next_emchar_in_string (Lisp_String* str, Emchar target, Bytebpos st, |
720 EMACS_INT count) | 720 EMACS_INT count) |
721 { | 721 { |
722 /* This function has been Mule-ized. */ | 722 /* This function has been Mule-ized. */ |
723 Bytind lim = string_length (str) -1; | 723 Bytebpos lim = string_length (str) -1; |
724 Bufbyte* s = string_data (str); | 724 Intbyte* s = string_data (str); |
725 | 725 |
726 assert (count >= 0); | 726 assert (count >= 0); |
727 | 727 |
728 #ifdef MULE | 728 #ifdef MULE |
729 /* Due to the Mule representation of characters in a buffer, | 729 /* Due to the Mule representation of characters in a buffer, |
735 { | 735 { |
736 while (st < lim && count > 0) | 736 while (st < lim && count > 0) |
737 { | 737 { |
738 if (string_char (str, st) == target) | 738 if (string_char (str, st) == target) |
739 count--; | 739 count--; |
740 INC_CHARBYTIND (s, st); | 740 INC_CHARBYTEBPOS (s, st); |
741 } | 741 } |
742 } | 742 } |
743 else | 743 else |
744 #endif | 744 #endif |
745 { | 745 { |
746 while (st < lim && count > 0) | 746 while (st < lim && count > 0) |
747 { | 747 { |
748 Bufbyte *bufptr = (Bufbyte *) memchr (charptr_n_addr (s, st), | 748 Intbyte *bufptr = (Intbyte *) memchr (charptr_n_addr (s, st), |
749 (int) target, lim - st); | 749 (int) target, lim - st); |
750 if (bufptr) | 750 if (bufptr) |
751 { | 751 { |
752 count--; | 752 count--; |
753 st = (Bytind)(bufptr - s) + 1; | 753 st = (Bytebpos)(bufptr - s) + 1; |
754 } | 754 } |
755 else | 755 else |
756 st = lim; | 756 st = lim; |
757 } | 757 } |
758 } | 758 } |
760 } | 760 } |
761 | 761 |
762 /* Like find_next_newline, but returns position before the newline, | 762 /* Like find_next_newline, but returns position before the newline, |
763 not after, and only search up to TO. This isn't just | 763 not after, and only search up to TO. This isn't just |
764 find_next_newline (...)-1, because you might hit TO. */ | 764 find_next_newline (...)-1, because you might hit TO. */ |
765 Bufpos | 765 Charbpos |
766 find_before_next_newline (struct buffer *buf, Bufpos from, Bufpos to, int count) | 766 find_before_next_newline (struct buffer *buf, Charbpos from, Charbpos to, int count) |
767 { | 767 { |
768 EMACS_INT shortage; | 768 EMACS_INT shortage; |
769 Bufpos pos = scan_buffer (buf, '\n', from, to, count, &shortage, 1); | 769 Charbpos pos = scan_buffer (buf, '\n', from, to, count, &shortage, 1); |
770 | 770 |
771 if (shortage == 0) | 771 if (shortage == 0) |
772 pos--; | 772 pos--; |
773 | 773 |
774 return pos; | 774 return pos; |
777 static Lisp_Object | 777 static Lisp_Object |
778 skip_chars (struct buffer *buf, int forwardp, int syntaxp, | 778 skip_chars (struct buffer *buf, int forwardp, int syntaxp, |
779 Lisp_Object string, Lisp_Object lim) | 779 Lisp_Object string, Lisp_Object lim) |
780 { | 780 { |
781 /* This function has been Mule-ized. */ | 781 /* This function has been Mule-ized. */ |
782 REGISTER Bufbyte *p, *pend; | 782 REGISTER Intbyte *p, *pend; |
783 REGISTER Emchar c; | 783 REGISTER Emchar c; |
784 /* We store the first 256 chars in an array here and the rest in | 784 /* We store the first 256 chars in an array here and the rest in |
785 a range table. */ | 785 a range table. */ |
786 unsigned char fastmap[0400]; | 786 unsigned char fastmap[0400]; |
787 int negate = 0; | 787 int negate = 0; |
788 REGISTER int i; | 788 REGISTER int i; |
789 #ifndef emacs | 789 #ifndef emacs |
790 Lisp_Char_Table *syntax_table = XCHAR_TABLE (buf->mirror_syntax_table); | 790 Lisp_Char_Table *syntax_table = XCHAR_TABLE (buf->mirror_syntax_table); |
791 #endif | 791 #endif |
792 Bufpos limit; | 792 Charbpos limit; |
793 | 793 |
794 if (NILP (lim)) | 794 if (NILP (lim)) |
795 limit = forwardp ? BUF_ZV (buf) : BUF_BEGV (buf); | 795 limit = forwardp ? BUF_ZV (buf) : BUF_BEGV (buf); |
796 else | 796 else |
797 { | 797 { |
878 if (negate) | 878 if (negate) |
879 for (i = 0; i < (int) (sizeof (fastmap)); i++) | 879 for (i = 0; i < (int) (sizeof (fastmap)); i++) |
880 fastmap[i] ^= 1; | 880 fastmap[i] ^= 1; |
881 | 881 |
882 { | 882 { |
883 Bufpos start_point = BUF_PT (buf); | 883 Charbpos start_point = BUF_PT (buf); |
884 | 884 |
885 if (syntaxp) | 885 if (syntaxp) |
886 { | 886 { |
887 SETUP_SYNTAX_CACHE_FOR_BUFFER (buf, BUF_PT (buf), forwardp ? 1 : -1); | 887 SETUP_SYNTAX_CACHE_FOR_BUFFER (buf, BUF_PT (buf), forwardp ? 1 : -1); |
888 /* All syntax designators are normal chars so nothing strange | 888 /* All syntax designators are normal chars so nothing strange |
1015 search_command (Lisp_Object string, Lisp_Object limit, Lisp_Object noerror, | 1015 search_command (Lisp_Object string, Lisp_Object limit, Lisp_Object noerror, |
1016 Lisp_Object count, Lisp_Object buffer, int direction, | 1016 Lisp_Object count, Lisp_Object buffer, int direction, |
1017 int RE, int posix) | 1017 int RE, int posix) |
1018 { | 1018 { |
1019 /* This function has been Mule-ized, except for the trt table handling. */ | 1019 /* This function has been Mule-ized, except for the trt table handling. */ |
1020 REGISTER Bufpos np; | 1020 REGISTER Charbpos np; |
1021 Bufpos lim; | 1021 Charbpos lim; |
1022 EMACS_INT n = direction; | 1022 EMACS_INT n = direction; |
1023 struct buffer *buf; | 1023 struct buffer *buf; |
1024 | 1024 |
1025 if (!NILP (count)) | 1025 if (!NILP (count)) |
1026 { | 1026 { |
1083 static int | 1083 static int |
1084 trivial_regexp_p (Lisp_Object regexp) | 1084 trivial_regexp_p (Lisp_Object regexp) |
1085 { | 1085 { |
1086 /* This function has been Mule-ized. */ | 1086 /* This function has been Mule-ized. */ |
1087 Bytecount len = XSTRING_LENGTH (regexp); | 1087 Bytecount len = XSTRING_LENGTH (regexp); |
1088 Bufbyte *s = XSTRING_DATA (regexp); | 1088 Intbyte *s = XSTRING_DATA (regexp); |
1089 while (--len >= 0) | 1089 while (--len >= 0) |
1090 { | 1090 { |
1091 switch (*s++) | 1091 switch (*s++) |
1092 { | 1092 { |
1093 case '.': case '*': case '+': case '?': case '[': case '^': case '$': | 1093 case '.': case '*': case '+': case '?': case '[': case '^': case '$': |
1112 } | 1112 } |
1113 return 1; | 1113 return 1; |
1114 } | 1114 } |
1115 | 1115 |
1116 /* Search for the n'th occurrence of STRING in BUF, | 1116 /* Search for the n'th occurrence of STRING in BUF, |
1117 starting at position BUFPOS and stopping at position BUFLIM, | 1117 starting at position CHARBPOS and stopping at position BUFLIM, |
1118 treating PAT as a literal string if RE is false or as | 1118 treating PAT as a literal string if RE is false or as |
1119 a regular expression if RE is true. | 1119 a regular expression if RE is true. |
1120 | 1120 |
1121 If N is positive, searching is forward and BUFLIM must be greater | 1121 If N is positive, searching is forward and BUFLIM must be greater |
1122 than BUFPOS. | 1122 than CHARBPOS. |
1123 If N is negative, searching is backward and BUFLIM must be less | 1123 If N is negative, searching is backward and BUFLIM must be less |
1124 than BUFPOS. | 1124 than CHARBPOS. |
1125 | 1125 |
1126 Returns -x if only N-x occurrences found (x > 0), | 1126 Returns -x if only N-x occurrences found (x > 0), |
1127 or else the position at the beginning of the Nth occurrence | 1127 or else the position at the beginning of the Nth occurrence |
1128 (if searching backward) or the end (if searching forward). | 1128 (if searching backward) or the end (if searching forward). |
1129 | 1129 |
1130 POSIX is nonzero if we want full backtracking (POSIX style) | 1130 POSIX is nonzero if we want full backtracking (POSIX style) |
1131 for this pattern. 0 means backtrack only enough to get a valid match. */ | 1131 for this pattern. 0 means backtrack only enough to get a valid match. */ |
1132 static Bufpos | 1132 static Charbpos |
1133 search_buffer (struct buffer *buf, Lisp_Object string, Bufpos bufpos, | 1133 search_buffer (struct buffer *buf, Lisp_Object string, Charbpos charbpos, |
1134 Bufpos buflim, EMACS_INT n, int RE, Lisp_Object trt, | 1134 Charbpos buflim, EMACS_INT n, int RE, Lisp_Object trt, |
1135 Lisp_Object inverse_trt, int posix) | 1135 Lisp_Object inverse_trt, int posix) |
1136 { | 1136 { |
1137 /* This function has been Mule-ized, except for the trt table handling. */ | 1137 /* This function has been Mule-ized, except for the trt table handling. */ |
1138 Bytecount len = XSTRING_LENGTH (string); | 1138 Bytecount len = XSTRING_LENGTH (string); |
1139 Bufbyte *base_pat = XSTRING_DATA (string); | 1139 Intbyte *base_pat = XSTRING_DATA (string); |
1140 REGISTER EMACS_INT i, j; | 1140 REGISTER EMACS_INT i, j; |
1141 Bytind p1, p2; | 1141 Bytebpos p1, p2; |
1142 Bytecount s1, s2; | 1142 Bytecount s1, s2; |
1143 Bytind pos, lim; | 1143 Bytebpos pos, lim; |
1144 | 1144 |
1145 if (running_asynch_code) | 1145 if (running_asynch_code) |
1146 save_search_regs (); | 1146 save_search_regs (); |
1147 | 1147 |
1148 /* Null string is found at starting position. */ | 1148 /* Null string is found at starting position. */ |
1149 if (len == 0) | 1149 if (len == 0) |
1150 { | 1150 { |
1151 set_search_regs (buf, bufpos, 0); | 1151 set_search_regs (buf, charbpos, 0); |
1152 return bufpos; | 1152 return charbpos; |
1153 } | 1153 } |
1154 | 1154 |
1155 /* Searching 0 times means don't move. */ | 1155 /* Searching 0 times means don't move. */ |
1156 if (n == 0) | 1156 if (n == 0) |
1157 return bufpos; | 1157 return charbpos; |
1158 | 1158 |
1159 pos = bufpos_to_bytind (buf, bufpos); | 1159 pos = charbpos_to_bytebpos (buf, charbpos); |
1160 lim = bufpos_to_bytind (buf, buflim); | 1160 lim = charbpos_to_bytebpos (buf, buflim); |
1161 if (RE && !trivial_regexp_p (string)) | 1161 if (RE && !trivial_regexp_p (string)) |
1162 { | 1162 { |
1163 struct re_pattern_buffer *bufp; | 1163 struct re_pattern_buffer *bufp; |
1164 | 1164 |
1165 bufp = compile_pattern (string, &search_regs, trt, posix, | 1165 bufp = compile_pattern (string, &search_regs, trt, posix, |
1201 } | 1201 } |
1202 XSETBUFFER (last_thing_searched, buf); | 1202 XSETBUFFER (last_thing_searched, buf); |
1203 /* Set pos to the new position. */ | 1203 /* Set pos to the new position. */ |
1204 pos = search_regs.start[0]; | 1204 pos = search_regs.start[0]; |
1205 fixup_search_regs_for_buffer (buf); | 1205 fixup_search_regs_for_buffer (buf); |
1206 /* And bufpos too. */ | 1206 /* And charbpos too. */ |
1207 bufpos = search_regs.start[0]; | 1207 charbpos = search_regs.start[0]; |
1208 } | 1208 } |
1209 else | 1209 else |
1210 { | 1210 { |
1211 return n; | 1211 return n; |
1212 } | 1212 } |
1238 } | 1238 } |
1239 XSETBUFFER (last_thing_searched, buf); | 1239 XSETBUFFER (last_thing_searched, buf); |
1240 /* Set pos to the new position. */ | 1240 /* Set pos to the new position. */ |
1241 pos = search_regs.end[0]; | 1241 pos = search_regs.end[0]; |
1242 fixup_search_regs_for_buffer (buf); | 1242 fixup_search_regs_for_buffer (buf); |
1243 /* And bufpos too. */ | 1243 /* And charbpos too. */ |
1244 bufpos = search_regs.end[0]; | 1244 charbpos = search_regs.end[0]; |
1245 } | 1245 } |
1246 else | 1246 else |
1247 { | 1247 { |
1248 return 0 - n; | 1248 return 0 - n; |
1249 } | 1249 } |
1250 n--; | 1250 n--; |
1251 } | 1251 } |
1252 return bufpos; | 1252 return charbpos; |
1253 } | 1253 } |
1254 else /* non-RE case */ | 1254 else /* non-RE case */ |
1255 { | 1255 { |
1256 int charset_base = -1; | 1256 int charset_base = -1; |
1257 int boyer_moore_ok = 1; | 1257 int boyer_moore_ok = 1; |
1258 Bufbyte *pat = 0; | 1258 Intbyte *pat = 0; |
1259 Bufbyte *patbuf = alloca_array (Bufbyte, len * MAX_EMCHAR_LEN); | 1259 Intbyte *patbuf = alloca_array (Intbyte, len * MAX_EMCHAR_LEN); |
1260 pat = patbuf; | 1260 pat = patbuf; |
1261 #ifdef MULE | 1261 #ifdef MULE |
1262 while (len > 0) | 1262 while (len > 0) |
1263 { | 1263 { |
1264 Bufbyte tmp_str[MAX_EMCHAR_LEN]; | 1264 Intbyte tmp_str[MAX_EMCHAR_LEN]; |
1265 Emchar c, translated, inverse; | 1265 Emchar c, translated, inverse; |
1266 Bytecount orig_bytelen, new_bytelen, inv_bytelen; | 1266 Bytecount orig_bytelen, new_bytelen, inv_bytelen; |
1267 | 1267 |
1268 /* If we got here and the RE flag is set, it's because | 1268 /* If we got here and the RE flag is set, it's because |
1269 we're dealing with a regexp known to be trivial, so the | 1269 we're dealing with a regexp known to be trivial, so the |
1335 | 1335 |
1336 This kind of search works regardless of what is in PAT and | 1336 This kind of search works regardless of what is in PAT and |
1337 regardless of what is in TRT. It is used in cases where | 1337 regardless of what is in TRT. It is used in cases where |
1338 boyer_moore cannot work. */ | 1338 boyer_moore cannot work. */ |
1339 | 1339 |
1340 static Bufpos | 1340 static Charbpos |
1341 simple_search (struct buffer *buf, Bufbyte *base_pat, Bytecount len_byte, | 1341 simple_search (struct buffer *buf, Intbyte *base_pat, Bytecount len_byte, |
1342 Bytind idx, Bytind lim, EMACS_INT n, Lisp_Object trt) | 1342 Bytebpos idx, Bytebpos lim, EMACS_INT n, Lisp_Object trt) |
1343 { | 1343 { |
1344 int forward = n > 0; | 1344 int forward = n > 0; |
1345 Bytecount buf_len = 0; /* Shut up compiler. */ | 1345 Bytecount buf_len = 0; /* Shut up compiler. */ |
1346 | 1346 |
1347 if (lim > idx) | 1347 if (lim > idx) |
1348 while (n > 0) | 1348 while (n > 0) |
1349 { | 1349 { |
1350 while (1) | 1350 while (1) |
1351 { | 1351 { |
1352 Bytecount this_len = len_byte; | 1352 Bytecount this_len = len_byte; |
1353 Bytind this_idx = idx; | 1353 Bytebpos this_idx = idx; |
1354 Bufbyte *p = base_pat; | 1354 Intbyte *p = base_pat; |
1355 if (idx >= lim) | 1355 if (idx >= lim) |
1356 goto stop; | 1356 goto stop; |
1357 | 1357 |
1358 while (this_len > 0) | 1358 while (this_len > 0) |
1359 { | 1359 { |
1369 break; | 1369 break; |
1370 | 1370 |
1371 pat_len = charcount_to_bytecount (p, 1); | 1371 pat_len = charcount_to_bytecount (p, 1); |
1372 p += pat_len; | 1372 p += pat_len; |
1373 this_len -= pat_len; | 1373 this_len -= pat_len; |
1374 INC_BYTIND (buf, this_idx); | 1374 INC_BYTEBPOS (buf, this_idx); |
1375 } | 1375 } |
1376 if (this_len == 0) | 1376 if (this_len == 0) |
1377 { | 1377 { |
1378 buf_len = this_idx - idx; | 1378 buf_len = this_idx - idx; |
1379 idx = this_idx; | 1379 idx = this_idx; |
1380 break; | 1380 break; |
1381 } | 1381 } |
1382 INC_BYTIND (buf, idx); | 1382 INC_BYTEBPOS (buf, idx); |
1383 } | 1383 } |
1384 n--; | 1384 n--; |
1385 } | 1385 } |
1386 else | 1386 else |
1387 while (n < 0) | 1387 while (n < 0) |
1388 { | 1388 { |
1389 while (1) | 1389 while (1) |
1390 { | 1390 { |
1391 Bytecount this_len = len_byte; | 1391 Bytecount this_len = len_byte; |
1392 Bytind this_idx = idx; | 1392 Bytebpos this_idx = idx; |
1393 Bufbyte *p; | 1393 Intbyte *p; |
1394 if (idx <= lim) | 1394 if (idx <= lim) |
1395 goto stop; | 1395 goto stop; |
1396 p = base_pat + len_byte; | 1396 p = base_pat + len_byte; |
1397 | 1397 |
1398 while (this_len > 0) | 1398 while (this_len > 0) |
1399 { | 1399 { |
1400 Emchar pat_ch, buf_ch; | 1400 Emchar pat_ch, buf_ch; |
1401 | 1401 |
1402 DEC_CHARPTR (p); | 1402 DEC_CHARPTR (p); |
1403 DEC_BYTIND (buf, this_idx); | 1403 DEC_BYTEBPOS (buf, this_idx); |
1404 pat_ch = charptr_emchar (p); | 1404 pat_ch = charptr_emchar (p); |
1405 buf_ch = BI_BUF_FETCH_CHAR (buf, this_idx); | 1405 buf_ch = BI_BUF_FETCH_CHAR (buf, this_idx); |
1406 | 1406 |
1407 buf_ch = TRANSLATE (trt, buf_ch); | 1407 buf_ch = TRANSLATE (trt, buf_ch); |
1408 | 1408 |
1415 { | 1415 { |
1416 buf_len = idx - this_idx; | 1416 buf_len = idx - this_idx; |
1417 idx = this_idx; | 1417 idx = this_idx; |
1418 break; | 1418 break; |
1419 } | 1419 } |
1420 DEC_BYTIND (buf, idx); | 1420 DEC_BYTEBPOS (buf, idx); |
1421 } | 1421 } |
1422 n++; | 1422 n++; |
1423 } | 1423 } |
1424 stop: | 1424 stop: |
1425 if (n == 0) | 1425 if (n == 0) |
1426 { | 1426 { |
1427 Bufpos beg, end, retval; | 1427 Charbpos beg, end, retval; |
1428 if (forward) | 1428 if (forward) |
1429 { | 1429 { |
1430 beg = bytind_to_bufpos (buf, idx - buf_len); | 1430 beg = bytebpos_to_charbpos (buf, idx - buf_len); |
1431 retval = end = bytind_to_bufpos (buf, idx); | 1431 retval = end = bytebpos_to_charbpos (buf, idx); |
1432 } | 1432 } |
1433 else | 1433 else |
1434 { | 1434 { |
1435 retval = beg = bytind_to_bufpos (buf, idx); | 1435 retval = beg = bytebpos_to_charbpos (buf, idx); |
1436 end = bytind_to_bufpos (buf, idx + buf_len); | 1436 end = bytebpos_to_charbpos (buf, idx + buf_len); |
1437 } | 1437 } |
1438 set_search_regs (buf, beg, end - beg); | 1438 set_search_regs (buf, beg, end - beg); |
1439 | 1439 |
1440 return retval; | 1440 return retval; |
1441 } | 1441 } |
1456 makes it possible to translate just the last byte of a character, | 1456 makes it possible to translate just the last byte of a character, |
1457 and do so after just a simple test of the context. | 1457 and do so after just a simple test of the context. |
1458 | 1458 |
1459 If that criterion is not satisfied, do not call this function. */ | 1459 If that criterion is not satisfied, do not call this function. */ |
1460 | 1460 |
1461 static Bufpos | 1461 static Charbpos |
1462 boyer_moore (struct buffer *buf, Bufbyte *base_pat, Bytecount len, | 1462 boyer_moore (struct buffer *buf, Intbyte *base_pat, Bytecount len, |
1463 Bytind pos, Bytind lim, EMACS_INT n, Lisp_Object trt, | 1463 Bytebpos pos, Bytebpos lim, EMACS_INT n, Lisp_Object trt, |
1464 Lisp_Object inverse_trt, int charset_base) | 1464 Lisp_Object inverse_trt, int charset_base) |
1465 { | 1465 { |
1466 /* #### Someone really really really needs to comment the workings | 1466 /* #### Someone really really really needs to comment the workings |
1467 of this junk somewhat better. | 1467 of this junk somewhat better. |
1468 | 1468 |
1494 is what the BM_tab holds. */ | 1494 is what the BM_tab holds. */ |
1495 REGISTER EMACS_INT *BM_tab; | 1495 REGISTER EMACS_INT *BM_tab; |
1496 EMACS_INT *BM_tab_base; | 1496 EMACS_INT *BM_tab_base; |
1497 REGISTER Bytecount dirlen; | 1497 REGISTER Bytecount dirlen; |
1498 EMACS_INT infinity; | 1498 EMACS_INT infinity; |
1499 Bytind limit; | 1499 Bytebpos limit; |
1500 Bytecount stride_for_teases = 0; | 1500 Bytecount stride_for_teases = 0; |
1501 REGISTER EMACS_INT i, j; | 1501 REGISTER EMACS_INT i, j; |
1502 Bufbyte *pat, *pat_end; | 1502 Intbyte *pat, *pat_end; |
1503 REGISTER Bufbyte *cursor, *p_limit, *ptr2; | 1503 REGISTER Intbyte *cursor, *p_limit, *ptr2; |
1504 Bufbyte simple_translate[0400]; | 1504 Intbyte simple_translate[0400]; |
1505 REGISTER int direction = ((n > 0) ? 1 : -1); | 1505 REGISTER int direction = ((n > 0) ? 1 : -1); |
1506 #ifdef MULE | 1506 #ifdef MULE |
1507 Bufbyte translate_prev_byte = 0; | 1507 Intbyte translate_prev_byte = 0; |
1508 Bufbyte translate_anteprev_byte = 0; | 1508 Intbyte translate_anteprev_byte = 0; |
1509 #endif | 1509 #endif |
1510 #ifdef C_ALLOCA | 1510 #ifdef C_ALLOCA |
1511 EMACS_INT BM_tab_space[0400]; | 1511 EMACS_INT BM_tab_space[0400]; |
1512 BM_tab = &BM_tab_space[0]; | 1512 BM_tab = &BM_tab_space[0]; |
1513 #else | 1513 #else |
1564 /* We use this for translation, instead of TRT itself. We | 1564 /* We use this for translation, instead of TRT itself. We |
1565 fill this in to handle the characters that actually occur | 1565 fill this in to handle the characters that actually occur |
1566 in the pattern. Others don't matter anyway! */ | 1566 in the pattern. Others don't matter anyway! */ |
1567 xzero (simple_translate); | 1567 xzero (simple_translate); |
1568 for (i = 0; i < 0400; i++) | 1568 for (i = 0; i < 0400; i++) |
1569 simple_translate[i] = (Bufbyte) i; | 1569 simple_translate[i] = (Intbyte) i; |
1570 i = 0; | 1570 i = 0; |
1571 while (i != infinity) | 1571 while (i != infinity) |
1572 { | 1572 { |
1573 Bufbyte *ptr = base_pat + i; | 1573 Intbyte *ptr = base_pat + i; |
1574 i += direction; | 1574 i += direction; |
1575 if (i == dirlen) | 1575 if (i == dirlen) |
1576 i = infinity; | 1576 i = infinity; |
1577 if (!NILP (trt)) | 1577 if (!NILP (trt)) |
1578 { | 1578 { |
1579 #ifdef MULE | 1579 #ifdef MULE |
1580 Emchar ch, untranslated; | 1580 Emchar ch, untranslated; |
1581 int this_translated = 1; | 1581 int this_translated = 1; |
1582 | 1582 |
1583 /* Is *PTR the last byte of a character? */ | 1583 /* Is *PTR the last byte of a character? */ |
1584 if (pat_end - ptr == 1 || BUFBYTE_FIRST_BYTE_P (ptr[1])) | 1584 if (pat_end - ptr == 1 || INTBYTE_FIRST_BYTE_P (ptr[1])) |
1585 { | 1585 { |
1586 Bufbyte *charstart = ptr; | 1586 Intbyte *charstart = ptr; |
1587 while (!BUFBYTE_FIRST_BYTE_P (*charstart)) | 1587 while (!INTBYTE_FIRST_BYTE_P (*charstart)) |
1588 charstart--; | 1588 charstart--; |
1589 untranslated = charptr_emchar (charstart); | 1589 untranslated = charptr_emchar (charstart); |
1590 if (charset_base == (untranslated & ~CHAR_FIELD3_MASK)) | 1590 if (charset_base == (untranslated & ~CHAR_FIELD3_MASK)) |
1591 { | 1591 { |
1592 ch = TRANSLATE (trt, untranslated); | 1592 ch = TRANSLATE (trt, untranslated); |
1593 if (!BUFBYTE_FIRST_BYTE_P (*ptr)) | 1593 if (!INTBYTE_FIRST_BYTE_P (*ptr)) |
1594 { | 1594 { |
1595 translate_prev_byte = ptr[-1]; | 1595 translate_prev_byte = ptr[-1]; |
1596 if (!BUFBYTE_FIRST_BYTE_P (translate_prev_byte)) | 1596 if (!INTBYTE_FIRST_BYTE_P (translate_prev_byte)) |
1597 translate_anteprev_byte = ptr[-2]; | 1597 translate_anteprev_byte = ptr[-2]; |
1598 } | 1598 } |
1599 } | 1599 } |
1600 else | 1600 else |
1601 { | 1601 { |
1649 /* A translation table is accompanied by its inverse -- | 1649 /* A translation table is accompanied by its inverse -- |
1650 see comment following downcase_table for details */ | 1650 see comment following downcase_table for details */ |
1651 | 1651 |
1652 while ((j = TRANSLATE (inverse_trt, j)) != k) | 1652 while ((j = TRANSLATE (inverse_trt, j)) != k) |
1653 { | 1653 { |
1654 simple_translate[j] = (Bufbyte) k; | 1654 simple_translate[j] = (Intbyte) k; |
1655 BM_tab[j] = dirlen - i; | 1655 BM_tab[j] = dirlen - i; |
1656 } | 1656 } |
1657 #endif | 1657 #endif |
1658 } | 1658 } |
1659 else | 1659 else |
1674 pos += dirlen - ((direction > 0) ? direction : 0); | 1674 pos += dirlen - ((direction > 0) ? direction : 0); |
1675 /* loop invariant - pos points at where last char (first char if | 1675 /* loop invariant - pos points at where last char (first char if |
1676 reverse) of pattern would align in a possible match. */ | 1676 reverse) of pattern would align in a possible match. */ |
1677 while (n != 0) | 1677 while (n != 0) |
1678 { | 1678 { |
1679 Bytind tail_end; | 1679 Bytebpos tail_end; |
1680 Bufbyte *tail_end_ptr; | 1680 Intbyte *tail_end_ptr; |
1681 /* It's been reported that some (broken) compiler thinks | 1681 /* It's been reported that some (broken) compiler thinks |
1682 that Boolean expressions in an arithmetic context are | 1682 that Boolean expressions in an arithmetic context are |
1683 unsigned. Using an explicit ?1:0 prevents this. */ | 1683 unsigned. Using an explicit ?1:0 prevents this. */ |
1684 if ((lim - pos - ((direction > 0) ? 1 : 0)) * direction < 0) | 1684 if ((lim - pos - ((direction > 0) ? 1 : 0)) * direction < 0) |
1685 return n * (0 - direction); | 1685 return n * (0 - direction); |
1760 #ifdef MULE | 1760 #ifdef MULE |
1761 Emchar ch; | 1761 Emchar ch; |
1762 cursor -= direction; | 1762 cursor -= direction; |
1763 /* Translate only the last byte of a character. */ | 1763 /* Translate only the last byte of a character. */ |
1764 if ((cursor == tail_end_ptr | 1764 if ((cursor == tail_end_ptr |
1765 || BUFBYTE_FIRST_BYTE_P (cursor[1])) | 1765 || INTBYTE_FIRST_BYTE_P (cursor[1])) |
1766 && (BUFBYTE_FIRST_BYTE_P (cursor[0]) | 1766 && (INTBYTE_FIRST_BYTE_P (cursor[0]) |
1767 || (translate_prev_byte == cursor[-1] | 1767 || (translate_prev_byte == cursor[-1] |
1768 && (BUFBYTE_FIRST_BYTE_P (translate_prev_byte) | 1768 && (INTBYTE_FIRST_BYTE_P (translate_prev_byte) |
1769 || translate_anteprev_byte == cursor[-2])))) | 1769 || translate_anteprev_byte == cursor[-2])))) |
1770 ch = simple_translate[*cursor]; | 1770 ch = simple_translate[*cursor]; |
1771 else | 1771 else |
1772 ch = *cursor; | 1772 ch = *cursor; |
1773 if (pat[i] != ch) | 1773 if (pat[i] != ch) |
1788 if (i + direction == 0) | 1788 if (i + direction == 0) |
1789 { | 1789 { |
1790 cursor -= direction; | 1790 cursor -= direction; |
1791 | 1791 |
1792 { | 1792 { |
1793 Bytind bytstart = (pos + cursor - ptr2 + | 1793 Bytebpos bytstart = (pos + cursor - ptr2 + |
1794 ((direction > 0) | 1794 ((direction > 0) |
1795 ? 1 - len : 0)); | 1795 ? 1 - len : 0)); |
1796 Bufpos bufstart = bytind_to_bufpos (buf, bytstart); | 1796 Charbpos bufstart = bytebpos_to_charbpos (buf, bytstart); |
1797 Bufpos bufend = bytind_to_bufpos (buf, bytstart + len); | 1797 Charbpos bufend = bytebpos_to_charbpos (buf, bytstart + len); |
1798 | 1798 |
1799 set_search_regs (buf, bufstart, bufend - bufstart); | 1799 set_search_regs (buf, bufstart, bufend - bufstart); |
1800 } | 1800 } |
1801 | 1801 |
1802 if ((n -= direction) != 0) | 1802 if ((n -= direction) != 0) |
1844 i = dirlen - direction; | 1844 i = dirlen - direction; |
1845 while ((i -= direction) + direction != 0) | 1845 while ((i -= direction) + direction != 0) |
1846 { | 1846 { |
1847 #ifdef MULE | 1847 #ifdef MULE |
1848 Emchar ch; | 1848 Emchar ch; |
1849 Bufbyte *ptr; | 1849 Intbyte *ptr; |
1850 #endif | 1850 #endif |
1851 pos -= direction; | 1851 pos -= direction; |
1852 #ifdef MULE | 1852 #ifdef MULE |
1853 ptr = BI_BUF_BYTE_ADDRESS (buf, pos); | 1853 ptr = BI_BUF_BYTE_ADDRESS (buf, pos); |
1854 if ((ptr == tail_end_ptr | 1854 if ((ptr == tail_end_ptr |
1855 || BUFBYTE_FIRST_BYTE_P (ptr[1])) | 1855 || INTBYTE_FIRST_BYTE_P (ptr[1])) |
1856 && (BUFBYTE_FIRST_BYTE_P (ptr[0]) | 1856 && (INTBYTE_FIRST_BYTE_P (ptr[0]) |
1857 || (translate_prev_byte == ptr[-1] | 1857 || (translate_prev_byte == ptr[-1] |
1858 && (BUFBYTE_FIRST_BYTE_P (translate_prev_byte) | 1858 && (INTBYTE_FIRST_BYTE_P (translate_prev_byte) |
1859 || translate_anteprev_byte == ptr[-2])))) | 1859 || translate_anteprev_byte == ptr[-2])))) |
1860 ch = simple_translate[*ptr]; | 1860 ch = simple_translate[*ptr]; |
1861 else | 1861 else |
1862 ch = *ptr; | 1862 ch = *ptr; |
1863 if (pat[i] != ch) | 1863 if (pat[i] != ch) |
1877 if (i + direction == 0) | 1877 if (i + direction == 0) |
1878 { | 1878 { |
1879 pos -= direction; | 1879 pos -= direction; |
1880 | 1880 |
1881 { | 1881 { |
1882 Bytind bytstart = (pos + | 1882 Bytebpos bytstart = (pos + |
1883 ((direction > 0) | 1883 ((direction > 0) |
1884 ? 1 - len : 0)); | 1884 ? 1 - len : 0)); |
1885 Bufpos bufstart = bytind_to_bufpos (buf, bytstart); | 1885 Charbpos bufstart = bytebpos_to_charbpos (buf, bytstart); |
1886 Bufpos bufend = bytind_to_bufpos (buf, bytstart + len); | 1886 Charbpos bufend = bytebpos_to_charbpos (buf, bytstart + len); |
1887 | 1887 |
1888 set_search_regs (buf, bufstart, bufend - bufstart); | 1888 set_search_regs (buf, bufstart, bufend - bufstart); |
1889 } | 1889 } |
1890 | 1890 |
1891 if ((n -= direction) != 0) | 1891 if ((n -= direction) != 0) |
1900 } | 1900 } |
1901 /* We have done one clump. Can we continue? */ | 1901 /* We have done one clump. Can we continue? */ |
1902 if ((lim - pos) * direction < 0) | 1902 if ((lim - pos) * direction < 0) |
1903 return (0 - n) * direction; | 1903 return (0 - n) * direction; |
1904 } | 1904 } |
1905 return bytind_to_bufpos (buf, pos); | 1905 return bytebpos_to_charbpos (buf, pos); |
1906 } | 1906 } |
1907 | 1907 |
1908 /* Record beginning BEG and end BEG + LEN | 1908 /* Record beginning BEG and end BEG + LEN |
1909 for a match just found in the current buffer. */ | 1909 for a match just found in the current buffer. */ |
1910 | 1910 |
1911 static void | 1911 static void |
1912 set_search_regs (struct buffer *buf, Bufpos beg, Charcount len) | 1912 set_search_regs (struct buffer *buf, Charbpos beg, Charcount len) |
1913 { | 1913 { |
1914 /* This function has been Mule-ized. */ | 1914 /* This function has been Mule-ized. */ |
1915 /* Make sure we have registers in which to store | 1915 /* Make sure we have registers in which to store |
1916 the match position. */ | 1916 the match position. */ |
1917 if (search_regs.num_regs == 0) | 1917 if (search_regs.num_regs == 0) |
1955 if (!word_count) return build_string (""); | 1955 if (!word_count) return build_string (""); |
1956 | 1956 |
1957 { | 1957 { |
1958 /* The following value is an upper bound on the amount of storage we | 1958 /* The following value is an upper bound on the amount of storage we |
1959 need. In non-Mule, it is exact. */ | 1959 need. In non-Mule, it is exact. */ |
1960 Bufbyte *storage = | 1960 Intbyte *storage = |
1961 (Bufbyte *) alloca (XSTRING_LENGTH (string) - punct_count + | 1961 (Intbyte *) alloca (XSTRING_LENGTH (string) - punct_count + |
1962 5 * (word_count - 1) + 4); | 1962 5 * (word_count - 1) + 4); |
1963 Bufbyte *o = storage; | 1963 Intbyte *o = storage; |
1964 | 1964 |
1965 *o++ = '\\'; | 1965 *o++ = '\\'; |
1966 *o++ = 'b'; | 1966 *o++ = 'b'; |
1967 | 1967 |
1968 for (i = 0; i < len; i++) | 1968 for (i = 0; i < len; i++) |
2257 (replacement, fixedcase, literal, string, strbuffer)) | 2257 (replacement, fixedcase, literal, string, strbuffer)) |
2258 { | 2258 { |
2259 /* This function has been Mule-ized. */ | 2259 /* This function has been Mule-ized. */ |
2260 /* This function can GC */ | 2260 /* This function can GC */ |
2261 enum { nochange, all_caps, cap_initial } case_action; | 2261 enum { nochange, all_caps, cap_initial } case_action; |
2262 Bufpos pos, last; | 2262 Charbpos pos, last; |
2263 int some_multiletter_word; | 2263 int some_multiletter_word; |
2264 int some_lowercase; | 2264 int some_lowercase; |
2265 int some_uppercase; | 2265 int some_uppercase; |
2266 int some_nonuppercase_initial; | 2266 int some_nonuppercase_initial; |
2267 Emchar c, prevc; | 2267 Emchar c, prevc; |
2646 | 2646 |
2647 /* Now go through and make all the case changes that were requested | 2647 /* Now go through and make all the case changes that were requested |
2648 in the replacement string. */ | 2648 in the replacement string. */ |
2649 if (ul_pos_dynarr) | 2649 if (ul_pos_dynarr) |
2650 { | 2650 { |
2651 Bufpos eend = BUF_PT (buf); | 2651 Charbpos eend = BUF_PT (buf); |
2652 int i = 0; | 2652 int i = 0; |
2653 int cur_action = 'E'; | 2653 int cur_action = 'E'; |
2654 | 2654 |
2655 for (pos = BUF_PT (buf) - inslen; pos < eend; pos++) | 2655 for (pos = BUF_PT (buf) - inslen; pos < eend; pos++) |
2656 { | 2656 { |
2754 data = alloca_array (Lisp_Object, 2 * search_regs.num_regs); | 2754 data = alloca_array (Lisp_Object, 2 * search_regs.num_regs); |
2755 | 2755 |
2756 len = -1; | 2756 len = -1; |
2757 for (i = 0; i < search_regs.num_regs; i++) | 2757 for (i = 0; i < search_regs.num_regs; i++) |
2758 { | 2758 { |
2759 Bufpos start = search_regs.start[i]; | 2759 Charbpos start = search_regs.start[i]; |
2760 if (start >= 0) | 2760 if (start >= 0) |
2761 { | 2761 { |
2762 if (EQ (last_thing_searched, Qt) | 2762 if (EQ (last_thing_searched, Qt) |
2763 || !NILP (integers)) | 2763 || !NILP (integers)) |
2764 { | 2764 { |
2931 DEFUN ("regexp-quote", Fregexp_quote, 1, 1, 0, /* | 2931 DEFUN ("regexp-quote", Fregexp_quote, 1, 1, 0, /* |
2932 Return a regexp string which matches exactly STRING and nothing else. | 2932 Return a regexp string which matches exactly STRING and nothing else. |
2933 */ | 2933 */ |
2934 (string)) | 2934 (string)) |
2935 { | 2935 { |
2936 REGISTER Bufbyte *in, *out, *end; | 2936 REGISTER Intbyte *in, *out, *end; |
2937 REGISTER Bufbyte *temp; | 2937 REGISTER Intbyte *temp; |
2938 | 2938 |
2939 CHECK_STRING (string); | 2939 CHECK_STRING (string); |
2940 | 2940 |
2941 temp = (Bufbyte *) alloca (XSTRING_LENGTH (string) * 2); | 2941 temp = (Intbyte *) alloca (XSTRING_LENGTH (string) * 2); |
2942 | 2942 |
2943 /* Now copy the data into the new string, inserting escapes. */ | 2943 /* Now copy the data into the new string, inserting escapes. */ |
2944 | 2944 |
2945 in = XSTRING_DATA (string); | 2945 in = XSTRING_DATA (string); |
2946 end = in + XSTRING_LENGTH (string); | 2946 end = in + XSTRING_LENGTH (string); |