comparison src/regex.c @ 826:6728e641994e
[xemacs-hg @ 2002-05-05 11:30:15 by ben]
syntax cache, 8-bit-format, lots of code cleanup
README.packages: Update info about --package-path.
i.c: Create an inheritable event and pass it on to XEmacs, so that ^C
can be handled properly. Intercept ^C and signal the event.
"Stop Build" in VC++ now works.
bytecomp-runtime.el: Doc string changes.
compat.el: Some attempts to redo this to
make it truly useful and fix the "multiple versions interacting
with each other" problem. Not yet done. Currently doesn't work.
files.el: Use with-obsolete-variable to avoid warnings in new revert-buffer code.
xemacs.mak: Split up CFLAGS into a version without flags specifying the C
library. The problem seems to be that minitar depends on zlib,
which depends specifically on libc.lib, not on any of the other C
libraries. Unless you compile with libc.lib, you get errors --
specifically, no _errno in the other libraries, which must make it
something other than an int. (#### But this doesn't seem to obtain
in XEmacs, which also uses zlib, and can be linked with any of the
C libraries. Maybe zlib is used differently and doesn't need
errno, or maybe XEmacs provides an int errno; ... I don't
understand.)
Makefile.in.in: Fix so that packages are around when testing.
abbrev.c, alloc.c, buffer.c, buffer.h, bytecode.c, callint.c, casefiddle.c, casetab.c, casetab.h, charset.h, chartab.c, chartab.h, cmds.c, console-msw.h, console-stream.c, console-x.c, console.c, console.h, data.c, device-msw.c, device.c, device.h, dialog-msw.c, dialog-x.c, dired-msw.c, dired.c, doc.c, doprnt.c, dumper.c, editfns.c, elhash.c, emacs.c, eval.c, event-Xt.c, event-gtk.c, event-msw.c, event-stream.c, events.c, events.h, extents.c, extents.h, faces.c, file-coding.c, file-coding.h, fileio.c, fns.c, font-lock.c, frame-gtk.c, frame-msw.c, frame-x.c, frame.c, frame.h, glade.c, glyphs-gtk.c, glyphs-msw.c, glyphs-msw.h, glyphs-x.c, glyphs.c, glyphs.h, gui-msw.c, gui-x.c, gui.h, gutter.h, hash.h, indent.c, insdel.c, intl-win32.c, intl.c, keymap.c, lisp-disunion.h, lisp-union.h, lisp.h, lread.c, lrecord.h, lstream.c, lstream.h, marker.c, menubar-gtk.c, menubar-msw.c, menubar-x.c, menubar.c, minibuf.c, mule-ccl.c, mule-charset.c, mule-coding.c, mule-wnnfns.c, nas.c, objects-msw.c, objects-x.c, opaque.c, postgresql.c, print.c, process-nt.c, process-unix.c, process.c, process.h, profile.c, rangetab.c, redisplay-gtk.c, redisplay-msw.c, redisplay-output.c, redisplay-x.c, redisplay.c, redisplay.h, regex.c, regex.h, scrollbar-msw.c, search.c, select-x.c, specifier.c, specifier.h, symbols.c, symsinit.h, syntax.c, syntax.h, syswindows.h, tests.c, text.c, text.h, tooltalk.c, ui-byhand.c, ui-gtk.c, unicode.c, win32.c, window.c: Another big Ben patch.
-- FUNCTIONALITY CHANGES:
add partial support for 8-bit-fixed, 16-bit-fixed, and
32-bit-fixed formats. not quite done yet. (in particular, needs
functions to actually convert the buffer.) NOTE: lots of changes
to regex.c here. also, many new *_fmt() inline funs that take an
Internal_Format argument.
redo syntax cache code. make the cache per-buffer; keep the cache
valid across calls to functions that use it. also keep it valid
across insertions/deletions and extent changes, as much as is
possible. eliminate the junky regex-reentrancy code by passing in
the relevant lisp info to the regex routines as local vars.
add general mechanism in extents code for signalling extent changes.
fix numerous problems with the case-table implementation; yoshiki
never properly transferred many algorithms from old-style to
new-style case tables.
redo char tables to support a default argument, so that mapping
only occurs over changed args. change many chartab functions to
accept Lisp_Object instead of Lisp_Char_Table *.
comment out the code in font-lock.c by default, because
font-lock.el no longer uses it. we should consider eliminating it
entirely.
Don't output bell as ^G in console-stream when not a TTY.
add -mswindows-termination-handle to interface with i.c, so we can
properly kill a build.
add more error-checking to buffer/string macros.
add some additional buffer_or_string_() funs.
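
The regex-reentrancy change above (passing the relevant Lisp info to the regex routines as local variables rather than keeping it in globals) can be illustrated with a minimal sketch. Everything in it is hypothetical: `match_context`, `count_matches`, and `reenter` are illustrative names, not the XEmacs interface. The diff itself uses `RE_LISP_CONTEXT_ARGS_DECL`, whose expansion is not shown on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-call context; stands in for the Lisp object and
   related info that the real routines receive as arguments. */
struct match_context
{
  const char *text;   /* text being searched */
  int depth;          /* nesting depth of this call */
};

/* Count occurrences of C in ctx->text.  All state lives in CTX or in
   locals, so a nested call made from ON_HIT cannot clobber the outer
   call's state the way a shared global could. */
static int
count_matches (const struct match_context *ctx, char c,
               void (*on_hit) (const struct match_context *))
{
  int hits = 0;
  size_t i, len;

  assert (ctx != NULL && ctx->text != NULL);
  len = strlen (ctx->text);
  for (i = 0; i < len; i++)
    if (ctx->text[i] == c)
      {
        hits++;
        if (on_hit)
          on_hit (ctx);
      }
  return hits;
}

/* Demo-only accumulator for the hits found by nested calls. */
int nested_hits;

/* Re-enters count_matches with its own context, mimicking a filter
   that runs a search while an outer search is still in progress. */
static void
reenter (const struct match_context *outer)
{
  struct match_context inner = { "banana", outer->depth + 1 };
  nested_hits += count_matches (&inner, 'n', NULL);
}

/* Outer search for 'a' in "abracadabra"; every hit triggers an inner
   search for 'n' in "banana". */
int
outer_hits_with_reentry (void)
{
  struct match_context outer = { "abracadabra", 0 };
  nested_hits = 0;
  return count_matches (&outer, 'a', reenter);
}
```

Because the inner call gets its own `match_context`, the outer loop resumes with its text and counters intact; with a single global context, the inner search would have overwritten them.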
-- INTERFACE CHANGES AFFECTING MORE CODE:
switch the arguments of write_c_string and friends to be
consistent with write_fmt_string, which must have printcharfun
first.
change BI_* macros to BYTE_* for increased clarity; similarly for
bi_* local vars.
change VOID_TO_LISP to be a one-argument function. eliminate
no-longer-needed CVOID_TO_LISP.
-- char/string macro changes:
rename MAKE_CHAR() to make_emchar() for slightly less confusion
with make_char(). (The former generates an Emchar, the latter a
Lisp object. Conceivably we should rename make_char() -> wrap_char()
and similarly for make_int(), make_float().)
Similar changes for other *CHAR* macros -- we now consistently use
names with `emchar' whenever we are working with Emchars. Any
remaining name with just `char' always refers to a Lisp object.
rename macros with XSTRING_* to string_* except for those that
reference actual fields in the Lisp_String object, following
conventions used elsewhere.
rename set_string_{data,length} macros (the only ones to work with
a Lisp_String_* instead of a Lisp_Object) to set_lispstringp_*
to make the difference clear.
try to be consistent about caps vs. lowercase in macro/inline-fun
names for chars and such, which wasn't the case before. we now
reserve caps either for XFOO_ macros that reference object fields
(e.g. XSTRING_DATA) or for things that have non-function semantics,
e.g. directly modifying an arg (BREAKUP_EMCHAR) or evaluating an
arg (any arg) more than once. otherwise, use lowercase.
here is a summary of most of the macros/inline funs changed by all
of the above changes:
BYTE_*_P -> byte_*_p
XSTRING_BYTE -> string_byte
set_string_data/length -> set_lispstringp_data/length
XSTRING_CHAR_LENGTH -> string_char_length
XSTRING_CHAR -> string_emchar
INTBYTE_FIRST_BYTE_P -> intbyte_first_byte_p
INTBYTE_LEADING_BYTE_P -> intbyte_leading_byte_p
charptr_copy_char -> charptr_copy_emchar
LEADING_BYTE_* -> leading_byte_*
CHAR_* -> EMCHAR_*
*_CHAR_* -> *_EMCHAR_*
*_CHAR -> *_EMCHAR
CHARSET_BY_* -> charset_by_*
BYTE_SHIFT_JIS* -> byte_shift_jis*
BYTE_BIG5* -> byte_big5*
REP_BYTES_BY_FIRST_BYTE -> rep_bytes_by_first_byte
char_to_unicode -> emchar_to_unicode
valid_char_p -> valid_emchar_p
Change intbyte_strcmp -> qxestrcmp_c (duplicated functionality).
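
The caps-versus-lowercase convention described above reserves caps for things with non-function semantics, such as evaluating an argument more than once. A small sketch of why that distinction matters, using hypothetical names (`TWICE_UNSAFE`, `twice`) that are not taken from the XEmacs sources:

```c
#include <assert.h>

/* Caps: expands its argument twice, so it lacks function semantics. */
#define TWICE_UNSAFE(x) ((x) + (x))

/* Lowercase: an inline function evaluates its argument exactly once. */
static inline int
twice (int x)
{
  return x + x;
}

static int calls;   /* number of times next_value has run */

static int
next_value (void)
{
  return ++calls;
}

/* How many times the argument expression runs when passed to the macro. */
int
evaluations_via_macro (void)
{
  calls = 0;
  (void) TWICE_UNSAFE (next_value ());
  return calls;
}

/* How many times it runs when passed to the inline function. */
int
evaluations_via_inline (void)
{
  calls = 0;
  (void) twice (next_value ());
  return calls;
}

/* Self-check of the two claims made in the comments above. */
void
check_evaluation_counts (void)
{
  assert (evaluations_via_macro () == 2);
  assert (evaluations_via_inline () == 1);
}
```

A caller who sees an all-caps name is thereby warned not to pass an argument with side effects.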
-- INTERFACE CHANGES AFFECTING LESS CODE:
use DECLARE_INLINE_HEADER in various places.
remove '#ifdef emacs' from XEmacs-only files.
eliminate CHAR_TABLE_VALUE(), which duplicated the functionality
of get_char_table().
add BUFFER_TEXT_LOOP to simplify iterations over buffer text.
define typedefs for signed and unsigned types of fixed sizes
(INT_32_BIT, UINT_32_BIT, etc.).
create ALIGN_FOR_TYPE as a higher-level interface onto ALIGN_SIZE;
fix code to use it.
add charptr_emchar_len to return the text length of the character
pointed to by a ptr; use it in place of
charcount_to_bytecount(..., 1). add emchar_len to return the text
length of a given character.
add types Bytexpos and Charxpos to generalize Bytebpos/Bytecount
and Charbpos/Charcount, in code (particularly, the extents code
and redisplay code) that works with either kind of index. rename
redisplay struct params with names such as `charbpos' to
e.g. `charpos' when they are e.g. a Charxpos, not a Charbpos.
eliminate xxDEFUN in favor of DEFUN; no longer necessary with
changes a while back to doc.c.
split up big ugly combined list of EXFUNs in lisp.h on a
file-by-file basis, since other prototypes are similarly split.
rewrite some "*_UNSAFE" macros as inline funs and eliminate the
_UNSAFE suffix.
move most string code from lisp.h to text.h; the string code and
text.h code are now intertwined in such a fashion that they need
to be in the same place and partially interleaved. (you can't
create forward references for inline funs)
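
ALIGN_FOR_TYPE is described above as a higher-level interface onto ALIGN_SIZE. A rough sketch of that layering follows. It assumes that ALIGN_SIZE rounds a length up to a multiple of a unit and that a type's alignment can be taken from C11 `_Alignof`. The `DEMO_`-prefixed macros and `payload_offset` are hypothetical, and the real definitions may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Round LEN up to the next multiple of UNIT (UNIT must be nonzero). */
#define DEMO_ALIGN_SIZE(len, unit) \
  ((((len) + (unit) - 1) / (unit)) * (unit))

/* Round LEN up so that an object of TYPE can be placed at that offset. */
#define DEMO_ALIGN_FOR_TYPE(len, type) \
  DEMO_ALIGN_SIZE (len, _Alignof (type))

/* Offset at which a double can be stored after HEADER_BYTES of header. */
size_t
payload_offset (size_t header_bytes)
{
  size_t off = DEMO_ALIGN_FOR_TYPE (header_bytes, double);
  assert (off % _Alignof (double) == 0);
  return off;
}
```

Callers then name the type they intend to store instead of hard-coding its alignment.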
automated/lisp-tests.el, automated/symbol-tests.el, automated/test-harness.el: Fix test harness to output FAIL messages to stderr when in
batch mode.
Fix up some problems in lisp-tests/symbol-tests that were
causing spurious failures.
author | ben |
---|---|
date | Sun, 05 May 2002 11:33:57 +0000 |
parents | a634e3b7acc8 |
children | 804517e16990 |
825:eb3bc15a6e0f | 826:6728e641994e |
---|---|
22 the Free Software Foundation, Inc., 59 Temple Place - Suite 330, | 22 the Free Software Foundation, Inc., 59 Temple Place - Suite 330, |
23 Boston, MA 02111-1307, USA. */ | 23 Boston, MA 02111-1307, USA. */ |
24 | 24 |
25 /* Synched up with: FSF 19.29. */ | 25 /* Synched up with: FSF 19.29. */ |
26 | 26 |
27 /* Changes made for XEmacs: | |
28 | |
29 (1) the REGEX_BEGLINE_CHECK code from the XEmacs v18 regex routines | |
30 was added. This causes a huge speedup in font-locking. | |
31 (2) Rel-alloc is disabled when the MMAP version of rel-alloc is | |
32 being used, because it's too slow -- all those calls to mmap() | |
33 add humongous overhead. | |
34 (3) Lots and lots of changes for Mule. They are bracketed by | |
35 `#ifdef MULE' or with comments that have `XEmacs' in them. | |
36 */ | |
37 | |
38 #ifdef HAVE_CONFIG_H | 27 #ifdef HAVE_CONFIG_H |
39 #include <config.h> | 28 #include <config.h> |
40 #endif | 29 #endif |
41 | 30 |
42 #ifndef REGISTER /* Rigidly enforced as of 20.3 */ | 31 #ifndef REGISTER /* Rigidly enforced as of 20.3 */ |
43 #define REGISTER | 32 #define REGISTER |
44 #endif | 33 #endif |
45 | 34 |
46 #ifndef _GNU_SOURCE | 35 #ifndef _GNU_SOURCE |
47 #define _GNU_SOURCE 1 | 36 #define _GNU_SOURCE 1 |
48 #endif | |
49 | |
50 #ifdef emacs | |
51 /* Converts the pointer to the char to BEG-based offset from the start. */ | |
52 #define PTR_TO_OFFSET(d) (MATCHING_IN_FIRST_STRING \ | |
53 ? (d) - string1 : (d) - (string2 - size1)) | |
54 #else | |
55 #define PTR_TO_OFFSET(d) 0 | |
56 #endif | 37 #endif |
57 | 38 |
58 /* We assume non-Mule if emacs isn't defined. */ | 39 /* We assume non-Mule if emacs isn't defined. */ |
59 #ifndef emacs | 40 #ifndef emacs |
60 #undef MULE | 41 #undef MULE |
118 { | 99 { |
119 } | 100 } |
120 | 101 |
121 #endif /* MULE */ | 102 #endif /* MULE */ |
122 | 103 |
123 #define RE_TRANSLATE(ch) TRT_TABLE_OF (translate, (Emchar) ch) | 104 #define RE_TRANSLATE_1(ch) TRT_TABLE_OF (translate, (Emchar) ch) |
124 #define TRANSLATE_P(tr) (!NILP (tr)) | 105 #define TRANSLATE_P(tr) (!NILP (tr)) |
106 | |
107 /* Converts the pointer to the char to BEG-based offset from the start. */ | |
108 #define PTR_TO_OFFSET(d) (MATCHING_IN_FIRST_STRING \ | |
109 ? (d) - string1 : (d) - (string2 - size1)) | |
110 | |
111 /* Convert an offset from the start of the logical text string formed by | |
112 concatenating the two strings together into a character position in the | |
113 Lisp buffer or string that the text represents. Knows that | |
114 when handling buffer text, the "string" we're passed in is always | |
115 BEGV - ZV. */ | |
116 | |
117 static Charxpos | |
118 offset_to_charxpos (Lisp_Object lispobj, int off) | |
119 { | |
120 if (STRINGP (lispobj)) | |
121 return string_index_byte_to_char (lispobj, off); | |
122 else if (BUFFERP (lispobj)) | |
123 return bytebpos_to_charbpos (XBUFFER (lispobj), | |
124 off + BYTE_BUF_BEGV (XBUFFER (lispobj))); | |
125 else | |
126 return 0; | |
127 } | |
125 | 128 |
126 #else /* not emacs */ | 129 #else /* not emacs */ |
127 | 130 |
128 /* If we are not linking with Emacs proper, | 131 /* If we are not linking with Emacs proper, |
129 we can't use the relocating allocator | 132 we can't use the relocating allocator |
137 #endif | 140 #endif |
138 #endif | 141 #endif |
139 | 142 |
140 #include <stdlib.h> | 143 #include <stdlib.h> |
141 | 144 |
142 #define charptr_emchar(str) ((Emchar) (str)[0]) | 145 #define charptr_emchar(str) ((Emchar) (str)[0]) |
146 #define charptr_emchar_fmt(str, fmt, object) ((Emchar) (str)[0]) | |
147 #define charptr_emchar_ascii_fmt(str, fmt, object) ((Emchar) (str)[0]) | |
143 | 148 |
144 #if (LONGBITS > INTBITS) | 149 #if (LONGBITS > INTBITS) |
145 # define EMACS_INT long | 150 # define EMACS_INT long |
146 #else | 151 #else |
147 # define EMACS_INT int | 152 # define EMACS_INT int |
148 #endif | 153 #endif |
149 | 154 |
150 typedef int Emchar; | 155 typedef int Emchar; |
151 | 156 |
152 #define INC_CHARPTR(p) ((p)++) | 157 #define INC_CHARPTR(p) ((p)++) |
158 #define INC_CHARPTR_FMT(p, fmt) ((p)++) | |
153 #define DEC_CHARPTR(p) ((p)--) | 159 #define DEC_CHARPTR(p) ((p)--) |
160 #define DEC_CHARPTR_FMT(p, fmt) ((p)--) | |
161 #define charptr_emchar_len(ptr) 1 | |
162 #define charptr_emchar_len_fmt(ptr, fmt) 1 | |
154 | 163 |
155 #include <string.h> | 164 #include <string.h> |
156 | 165 |
157 /* Define the syntax stuff for \<, \>, etc. */ | 166 /* Define the syntax stuff for \<, \>, etc. */ |
158 | 167 |
192 } | 201 } |
193 } | 202 } |
194 | 203 |
195 #endif /* SYNTAX_TABLE */ | 204 #endif /* SYNTAX_TABLE */ |
196 | 205 |
197 #define SYNTAX_UNSAFE(ignored, c) re_syntax_table[c] | 206 #define SYNTAX(ignored, c) re_syntax_table[c] |
198 #undef SYNTAX_FROM_CACHE | 207 #undef SYNTAX_FROM_CACHE |
199 #define SYNTAX_FROM_CACHE SYNTAX_UNSAFE | 208 #define SYNTAX_FROM_CACHE SYNTAX |
200 | 209 |
201 #define RE_TRANSLATE(c) translate[(unsigned char) (c)] | 210 #define RE_TRANSLATE_1(c) translate[(unsigned char) (c)] |
202 #define TRANSLATE_P(tr) tr | 211 #define TRANSLATE_P(tr) tr |
203 | 212 |
204 #endif /* emacs */ | 213 #endif /* emacs */ |
205 | 214 |
206 /* Under XEmacs, this is needed because we don't define it elsewhere. */ | 215 /* Under XEmacs, this is needed because we don't define it elsewhere. */ |
1138 matching routines; then we don't notice interrupts when they come | 1147 matching routines; then we don't notice interrupts when they come |
1139 in. So, Emacs blocks input around all regexp calls except the | 1148 in. So, Emacs blocks input around all regexp calls except the |
1140 matching calls, which it leaves unprotected, in the faith that they | 1149 matching calls, which it leaves unprotected, in the faith that they |
1141 will not malloc.]] This previous paragraph is irrelevant. | 1150 will not malloc.]] This previous paragraph is irrelevant. |
1142 | 1151 |
1143 XEmacs: We *do not* do anything so stupid as process input from | 1152 XEmacs: We *do not* do anything so stupid as process input from within a |
1144 within a signal handler. However, the regexp routines may get | 1153 signal handler. However, the regexp routines may get called reentrantly |
1145 called reentrantly as a result of QUIT processing (e.g. under | 1154 as a result of QUIT processing (e.g. under Windows: re_match -> QUIT -> |
1146 Windows: re_match -> QUIT -> quit_p -> drain events -> process | 1155 quit_p -> drain events -> process WM_INITMENU -> call filter -> |
1147 WM_INITMENU -> call filter -> re_match), so we cannot have any | 1156 re_match; see stack trace in signal.c), so we cannot have any global |
1148 global variables (unless we do lots of trickiness including some | 1157 variables (unless we do lots of trickiness including some |
1149 unwind-protects, which isn't worth it at this point). The first | 1158 unwind-protects, which isn't worth it at this point). The first |
1150 paragraph appears utterly garbled to me -- shouldn't *ANY* use of | 1159 paragraph appears utterly garbled to me -- shouldn't *ANY* use of |
1151 rel-alloc to different potentially cause buffer data to be | 1160 rel-alloc to different potentially cause buffer data to be relocated? I |
1152 relocated? I must be missing something, though -- perhaps the | 1161 must be missing something, though -- perhaps the writer above is |
1153 writer above is assuming that the failure stack(s) will always be | 1162 assuming that the failure stack(s) will always be allocated after the |
1154 allocated after the buffer data, and thus reallocating them with | 1163 buffer data, and thus reallocating them with rel-alloc won't move buffer |
1155 rel-alloc won't move buffer data. --ben */ | 1164 data. --ben */ |
1156 | 1165 |
1157 /* Normally, this is fine. */ | 1166 /* Normally, this is fine. */ |
1158 #define MATCH_MAY_ALLOCATE | 1167 #define MATCH_MAY_ALLOCATE |
1159 | 1168 |
1160 /* When using GNU C, we are not REALLY using the C alloca, no matter | 1169 /* When using GNU C, we are not REALLY using the C alloca, no matter |
1572 #define REG_UNSET(e) ((e) == REG_UNSET_VALUE) | 1581 #define REG_UNSET(e) ((e) == REG_UNSET_VALUE) |
1573 | 1582 |
1574 /* Subroutine declarations and macros for regex_compile. */ | 1583 /* Subroutine declarations and macros for regex_compile. */ |
1575 | 1584 |
1576 /* Fetch the next character in the uncompiled pattern---translating it | 1585 /* Fetch the next character in the uncompiled pattern---translating it |
1577 if necessary. Also cast from a signed character in the constant | 1586 if necessary. */ |
1578 string passed to us by the user to an unsigned char that we can use | |
1579 as an array index (in, e.g., `translate'). */ | |
1580 #define PATFETCH(c) \ | 1587 #define PATFETCH(c) \ |
1581 do { \ | 1588 do { \ |
1582 PATFETCH_RAW (c); \ | 1589 PATFETCH_RAW (c); \ |
1583 c = TRANSLATE (c); \ | 1590 c = RE_TRANSLATE (c); \ |
1584 } while (0) | 1591 } while (0) |
1585 | 1592 |
1586 /* Fetch the next character in the uncompiled pattern, with no | 1593 /* Fetch the next character in the uncompiled pattern, with no |
1587 translation. */ | 1594 translation. */ |
1588 #define PATFETCH_RAW(c) \ | 1595 #define PATFETCH_RAW(c) \ |
1593 } while (0) | 1600 } while (0) |
1594 | 1601 |
1595 /* Go backwards one character in the pattern. */ | 1602 /* Go backwards one character in the pattern. */ |
1596 #define PATUNFETCH DEC_CHARPTR (p) | 1603 #define PATUNFETCH DEC_CHARPTR (p) |
1597 | 1604 |
1598 #ifdef MULE | |
1599 | |
1600 #define PATFETCH_EXTENDED(emch) \ | |
1601 do {if (p == pend) return REG_EEND; \ | |
1602 assert (p < pend); \ | |
1603 emch = charptr_emchar ((const Intbyte *) p); \ | |
1604 INC_CHARPTR (p); \ | |
1605 if (TRANSLATE_P (translate) && emch < 0x80) \ | |
1606 emch = (Emchar) (unsigned char) RE_TRANSLATE (emch); \ | |
1607 } while (0) | |
1608 | |
1609 #define PATFETCH_RAW_EXTENDED(emch) \ | |
1610 do {if (p == pend) return REG_EEND; \ | |
1611 assert (p < pend); \ | |
1612 emch = charptr_emchar ((const Intbyte *) p); \ | |
1613 INC_CHARPTR (p); \ | |
1614 } while (0) | |
1615 | |
1616 #define PATUNFETCH_EXTENDED DEC_CHARPTR (p) | |
1617 | |
1618 #define PATFETCH_EITHER(emch) \ | |
1619 do { \ | |
1620 if (has_extended_chars) \ | |
1621 PATFETCH_EXTENDED (emch); \ | |
1622 else \ | |
1623 PATFETCH (emch); \ | |
1624 } while (0) | |
1625 | |
1626 #define PATFETCH_RAW_EITHER(emch) \ | |
1627 do { \ | |
1628 if (has_extended_chars) \ | |
1629 PATFETCH_RAW_EXTENDED (emch); \ | |
1630 else \ | |
1631 PATFETCH_RAW (emch); \ | |
1632 } while (0) | |
1633 | |
1634 #define PATUNFETCH_EITHER \ | |
1635 do { \ | |
1636 if (has_extended_chars) \ | |
1637 PATUNFETCH_EXTENDED (emch); \ | |
1638 else \ | |
1639 PATUNFETCH (emch); \ | |
1640 } while (0) | |
1641 | |
1642 #else /* not MULE */ | |
1643 | |
1644 #define PATFETCH_EITHER(emch) PATFETCH (emch) | |
1645 #define PATFETCH_RAW_EITHER(emch) PATFETCH_RAW (emch) | |
1646 #define PATUNFETCH_EITHER PATUNFETCH | |
1647 | |
1648 #endif /* MULE */ | |
1649 | |
1650 /* If `translate' is non-null, return translate[D], else just D. We | 1605 /* If `translate' is non-null, return translate[D], else just D. We |
1651 cast the subscript to translate because some data is declared as | 1606 cast the subscript to translate because some data is declared as |
1652 `char *', to avoid warnings when a string constant is passed. But | 1607 `char *', to avoid warnings when a string constant is passed. But |
1653 when we use a character as a subscript we must make it unsigned. */ | 1608 when we use a character as a subscript we must make it unsigned. */ |
1654 #define TRANSLATE(d) (TRANSLATE_P (translate) ? RE_TRANSLATE (d) : (d)) | 1609 #define RE_TRANSLATE(d) \ |
1655 | 1610 (TRANSLATE_P (translate) ? RE_TRANSLATE_1 (d) : (d)) |
1656 #ifdef MULE | |
1657 | |
1658 #define TRANSLATE_EXTENDED_UNSAFE(emch) \ | |
1659 (TRANSLATE_P (translate) && emch < 0x80 ? RE_TRANSLATE (emch) : (emch)) | |
1660 | |
1661 #endif | |
1662 | 1611 |
1663 /* Macros for outputting the compiled pattern into `buffer'. */ | 1612 /* Macros for outputting the compiled pattern into `buffer'. */ |
1664 | 1613 |
1665 /* If the buffer isn't allocated when it comes in, use this. */ | 1614 /* If the buffer isn't allocated when it comes in, use this. */ |
1666 #define INIT_BUF_SIZE 32 | 1615 #define INIT_BUF_SIZE 32 |
1727 reset the pointers that pointed into the old block to point to the | 1676 reset the pointers that pointed into the old block to point to the |
1728 correct places in the new one. If extending the buffer results in it | 1677 correct places in the new one. If extending the buffer results in it |
1729 being larger than MAX_BUF_SIZE, then flag memory exhausted. */ | 1678 being larger than MAX_BUF_SIZE, then flag memory exhausted. */ |
1730 #define EXTEND_BUFFER() \ | 1679 #define EXTEND_BUFFER() \ |
1731 do { \ | 1680 do { \ |
1732 re_char *old_buffer = bufp->buffer; \ | 1681 re_char *old_buffer = bufp->buffer; \ |
1733 if (bufp->allocated == MAX_BUF_SIZE) \ | 1682 if (bufp->allocated == MAX_BUF_SIZE) \ |
1734 return REG_ESIZE; \ | 1683 return REG_ESIZE; \ |
1735 bufp->allocated <<= 1; \ | 1684 bufp->allocated <<= 1; \ |
1736 if (bufp->allocated > MAX_BUF_SIZE) \ | 1685 if (bufp->allocated > MAX_BUF_SIZE) \ |
1737 bufp->allocated = MAX_BUF_SIZE; \ | 1686 bufp->allocated = MAX_BUF_SIZE; \ |
1881 static re_bool alt_match_null_string_p (unsigned char *p, unsigned char *end, | 1830 static re_bool alt_match_null_string_p (unsigned char *p, unsigned char *end, |
1882 register_info_type *reg_info); | 1831 register_info_type *reg_info); |
1883 static re_bool common_op_match_null_string_p (unsigned char **p, | 1832 static re_bool common_op_match_null_string_p (unsigned char **p, |
1884 unsigned char *end, | 1833 unsigned char *end, |
1885 register_info_type *reg_info); | 1834 register_info_type *reg_info); |
1886 static int bcmp_translate (const unsigned char *s1, const unsigned char *s2, | 1835 static int bcmp_translate (re_char *s1, re_char *s2, |
1887 REGISTER int len, RE_TRANSLATE_TYPE translate); | 1836 REGISTER int len, RE_TRANSLATE_TYPE translate |
1837 #ifdef emacs | |
1838 , Internal_Format fmt, Lisp_Object lispobj | |
1839 #endif | |
1840 ); | |
1888 static int re_match_2_internal (struct re_pattern_buffer *bufp, | 1841 static int re_match_2_internal (struct re_pattern_buffer *bufp, |
1889 re_char *string1, int size1, | 1842 re_char *string1, int size1, |
1890 re_char *string2, int size2, int pos, | 1843 re_char *string2, int size2, int pos, |
1891 struct re_registers *regs, int stop); | 1844 struct re_registers *regs, int stop |
1845 RE_LISP_CONTEXT_ARGS_DECL); | |
1892 | 1846 |
1893 #ifndef MATCH_MAY_ALLOCATE | 1847 #ifndef MATCH_MAY_ALLOCATE |
1894 | 1848 |
1895 /* If we cannot allocate large objects within re_match_2_internal, | 1849 /* If we cannot allocate large objects within re_match_2_internal, |
1896 we make the fail stack and register vectors global. | 1850 we make the fail stack and register vectors global. |
3180 default: | 3134 default: |
3181 normal_backslash: | 3135 normal_backslash: |
3182 /* You might think it would be useful for \ to mean | 3136 /* You might think it would be useful for \ to mean |
3183 not to translate; but if we don't translate it, | 3137 not to translate; but if we don't translate it, |
3184 it will never match anything. */ | 3138 it will never match anything. */ |
3185 c = TRANSLATE (c); | 3139 c = RE_TRANSLATE (c); |
3186 goto normal_char; | 3140 goto normal_char; |
3187 } | 3141 } |
3188 break; | 3142 break; |
3189 | 3143 |
3190 | 3144 |
3437 ending characters (inclusive) in the compiled pattern B. | 3391 ending characters (inclusive) in the compiled pattern B. |
3438 | 3392 |
3439 Return an error code. | 3393 Return an error code. |
3440 | 3394 |
3441 We use these short variable names so we can use the same macros as | 3395 We use these short variable names so we can use the same macros as |
3442 `regex_compile' itself. */ | 3396 `regex_compile' itself. |
3397 | |
3398 Under Mule, this is only called when both chars of the range are | |
3399 ASCII. */ | |
3443 | 3400 |
3444 static reg_errcode_t | 3401 static reg_errcode_t |
3445 compile_range (re_char **p_ptr, re_char *pend, RE_TRANSLATE_TYPE translate, | 3402 compile_range (re_char **p_ptr, re_char *pend, RE_TRANSLATE_TYPE translate, |
3446 reg_syntax_t syntax, unsigned char *buf_end) | 3403 reg_syntax_t syntax, unsigned char *buf_end) |
3447 { | 3404 { |
3476 char' -- the range is inclusive, so if `range_end' == 0xff | 3433 char' -- the range is inclusive, so if `range_end' == 0xff |
3477 (assuming 8-bit characters), we would otherwise go into an infinite | 3434 (assuming 8-bit characters), we would otherwise go into an infinite |
3478 loop, since all characters <= 0xff. */ | 3435 loop, since all characters <= 0xff. */ |
3479 for (this_char = range_start; this_char <= range_end; this_char++) | 3436 for (this_char = range_start; this_char <= range_end; this_char++) |
3480 { | 3437 { |
3481 SET_LIST_BIT (TRANSLATE (this_char)); | 3438 SET_LIST_BIT (RE_TRANSLATE (this_char)); |
3482 } | 3439 } |
3483 | 3440 |
3484 return REG_NOERROR; | 3441 return REG_NOERROR; |
3485 } | 3442 } |
3486 | 3443 |
3512 | 3469 |
3513 /* Can't have ranges spanning different charsets, except maybe for | 3470 /* Can't have ranges spanning different charsets, except maybe for |
3514 ranges entirely within the first 256 chars. */ | 3471 ranges entirely within the first 256 chars. */ |
3515 | 3472 |
3516 if ((range_start >= 0x100 || range_end >= 0x100) | 3473 if ((range_start >= 0x100 || range_end >= 0x100) |
3517 && CHAR_LEADING_BYTE (range_start) != | 3474 && emchar_leading_byte (range_start) != |
3518 CHAR_LEADING_BYTE (range_end)) | 3475 emchar_leading_byte (range_end)) |
3519 return REG_ERANGESPAN; | 3476 return REG_ERANGESPAN; |
3520 | 3477 |
3521 /* As advertised, translations only work over the 0 - 0x7F range. | 3478 /* #### This might be way inefficient if the range encompasses 10,000 |
3522 Making this kind of stuff work generally is much harder. | 3479 chars or something. To be efficient, you'd have to do something like |
3523 Iterating over the whole range like this would be way efficient | 3480 this: |
3524 if the range encompasses 10,000 chars or something. You'd have | |
3525 to do something like this: | |
3526 | 3481 |
3527 range_table a; | 3482 range_table a; |
3528 range_table b; | 3483 range_table b; |
3529 map over translation table in [range_start, range_end] of | 3484 map over translation table in [range_start, range_end] of |
3530 (put the mapped range in a; | 3485 (put the mapped range in a; |
3531 put the translation in b) | 3486 put the translation in b) |
3532 invert the range in a and truncate to [range_start, range_end] | 3487 invert the range in a and truncate to [range_start, range_end] |
3533 compute the union of a, b | 3488 compute the union of a, b |
3534 union the result into rtab | 3489 union the result into rtab |
3535 */ | 3490 */ |
3536 for (this_char = range_start; | 3491 for (this_char = range_start; this_char <= range_end; this_char++) |
3537 this_char <= range_end && this_char < 0x80; this_char++) | |
3538 { | 3492 { |
3539 SET_RANGETAB_BIT (TRANSLATE (this_char)); | 3493 SET_RANGETAB_BIT (RE_TRANSLATE (this_char)); |
3540 } | 3494 } |
3541 | 3495 |
3542 if (this_char <= range_end) | 3496 if (this_char <= range_end) |
3543 put_range_table (rtab, this_char, range_end, Qt); | 3497 put_range_table (rtab, this_char, range_end, Qt); |
3544 | 3498 |
3559 the pattern buffer. | 3513 the pattern buffer. |
3560 | 3514 |
3561 Returns 0 if we succeed, -2 if an internal error. */ | 3515 Returns 0 if we succeed, -2 if an internal error. */ |
3562 | 3516 |
3563 int | 3517 int |
3564 re_compile_fastmap (struct re_pattern_buffer *bufp) | 3518 re_compile_fastmap (struct re_pattern_buffer *bufp |
3519 RE_LISP_SHORT_CONTEXT_ARGS_DECL) | |
3565 { | 3520 { |
3566 int j, k; | 3521 int j, k; |
3567 #ifdef MATCH_MAY_ALLOCATE | 3522 #ifdef MATCH_MAY_ALLOCATE |
3568 fail_stack_type fail_stack; | 3523 fail_stack_type fail_stack; |
3569 #endif | 3524 #endif |
3570 DECLARE_DESTINATION; | 3525 DECLARE_DESTINATION; |
3571 /* We don't push any register information onto the failure stack. */ | 3526 /* We don't push any register information onto the failure stack. */ |
3572 | 3527 |
3528 /* &&#### this should be changed for 8-bit-fixed, for efficiency. see | |
3529 comment marked with &&#### in re_search_2. */ | |
3530 | |
3573 REGISTER char *fastmap = bufp->fastmap; | 3531 REGISTER char *fastmap = bufp->fastmap; |
3574 unsigned char *pattern = bufp->buffer; | 3532 unsigned char *pattern = bufp->buffer; |
3575 long size = bufp->used; | 3533 long size = bufp->used; |
3576 unsigned char *p = pattern; | 3534 unsigned char *p = pattern; |
3577 REGISTER unsigned char *pend = pattern + size; | 3535 REGISTER unsigned char *pend = pattern + size; |
3730 } | 3688 } |
3731 break; | 3689 break; |
3732 #endif /* MULE */ | 3690 #endif /* MULE */ |
3733 | 3691 |
3734 | 3692 |
3735 case wordchar: | |
3736 #ifdef emacs | |
3737 k = (int) Sword; | |
3738 goto matchsyntax; | |
3739 #else | |
3740 for (j = 0; j < (1 << BYTEWIDTH); j++) | |
3741 if (SYNTAX_UNSAFE | |
3742 (XCHAR_TABLE | |
3743 (regex_emacs_buffer->mirror_syntax_table), j) == Sword) | |
3744 fastmap[j] = 1; | |
3745 break; | |
3746 #endif | |
3747 | |
3748 | |
3749 case notwordchar: | |
3750 #ifdef emacs | |
3751 k = (int) Sword; | |
3752 goto matchnotsyntax; | |
3753 #else | |
3754 for (j = 0; j < (1 << BYTEWIDTH); j++) | |
3755 if (SYNTAX_UNSAFE | |
3756 (XCHAR_TABLE | |
3757 (regex_emacs_buffer->mirror_syntax_table), j) != Sword) | |
3758 fastmap[j] = 1; | |
3759 break; | |
3760 #endif | |
3761 | |
3762 | |
3763 case anychar: | 3693 case anychar: |
3764 { | 3694 { |
3765 int fastmap_newline = fastmap['\n']; | 3695 int fastmap_newline = fastmap['\n']; |
3766 | 3696 |
3767 /* `.' matches anything ... */ | 3697 /* `.' matches anything ... */ |
3786 | 3716 |
3787 /* Otherwise, have to check alternative paths. */ | 3717 /* Otherwise, have to check alternative paths. */ |
3788 break; | 3718 break; |
3789 } | 3719 } |
3790 | 3720 |
3791 #ifdef emacs | 3721 #ifndef emacs |
3722 case wordchar: | |
3723 for (j = 0; j < (1 << BYTEWIDTH); j++) | |
3724 if (SYNTAX (ignored, j) == Sword) | |
3725 fastmap[j] = 1; | |
3726 break; | |
3727 | |
3728 case notwordchar: | |
3729 for (j = 0; j < (1 << BYTEWIDTH); j++) | |
3730 if (SYNTAX (ignored, j) != Sword) | |
3731 fastmap[j] = 1; | |
3732 break; | |
3733 #else /* emacs */ | |
3734 case wordchar: | |
3735 case notwordchar: | |
3792 case wordbound: | 3736 case wordbound: |
3793 case notwordbound: | 3737 case notwordbound: |
3794 case wordbeg: | 3738 case wordbeg: |
3795 case wordend: | 3739 case wordend: |
3796 case notsyntaxspec: | 3740 case notsyntaxspec: |
3797 case syntaxspec: | 3741 case syntaxspec: |
3798 /* This match depends on text properties. These end with | 3742 /* This match depends on text properties. These end with |
3799 aborting optimizations. */ | 3743 aborting optimizations. */ |
3800 bufp->can_be_null = 1; | 3744 bufp->can_be_null = 1; |
3801 goto done; | 3745 goto done; |
3802 | 3746 #if 0 /* all of the following code is unused now that the `syntax-table' |
3803 #ifdef emacs | 3747 property exists -- it's trickier to do this than just look in |
3804 #if 0 /* Removed during syntax-table properties patch -- 2000/12/07 mct */ | 3748 the buffer. &&#### but we could just use the syntax-cache stuff |
3749 instead; why don't we? --ben */ | |
3750 case wordchar: | |
3751 k = (int) Sword; | |
3752 goto matchsyntax; | |
3753 | |
3754 case notwordchar: | |
3755 k = (int) Sword; | |
3756 goto matchnotsyntax; | |
3757 | |
3805 case syntaxspec: | 3758 case syntaxspec: |
3806 k = *p++; | 3759 k = *p++; |
3807 #endif | 3760 matchsyntax: |
3808 matchsyntax: | |
3809 #ifdef MULE | 3761 #ifdef MULE |
3810 for (j = 0; j < 0x80; j++) | 3762 for (j = 0; j < 0x80; j++) |
3811 if (SYNTAX_UNSAFE | 3763 if (SYNTAX |
3812 (XCHAR_TABLE | 3764 (XCHAR_TABLE (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf)), j) == |
3813 (regex_emacs_buffer->mirror_syntax_table), j) == | |
3814 (enum syntaxcode) k) | 3765 (enum syntaxcode) k) |
3815 fastmap[j] = 1; | 3766 fastmap[j] = 1; |
3816 for (j = 0x80; j < 0xA0; j++) | 3767 for (j = 0x80; j < 0xA0; j++) |
3817 { | 3768 { |
3818 if (LEADING_BYTE_PREFIX_P((unsigned char) j)) | 3769 if (leading_byte_prefix_p ((unsigned char) j)) |
3819 /* too complicated to calculate this right */ | 3770 /* too complicated to calculate this right */ |
3820 fastmap[j] = 1; | 3771 fastmap[j] = 1; |
3821 else | 3772 else |
3822 { | 3773 { |
3823 int multi_p; | 3774 int multi_p; |
3824 Lisp_Object cset; | 3775 Lisp_Object cset; |
3825 | 3776 |
3826 cset = CHARSET_BY_LEADING_BYTE (j); | 3777 cset = charset_by_leading_byte (j); |
3827 if (CHARSETP (cset)) | 3778 if (CHARSETP (cset)) |
3828 { | 3779 { |
3829 if (charset_syntax (regex_emacs_buffer, cset, | 3780 if (charset_syntax (lispbuf, cset, &multi_p) |
3830 &multi_p) | |
3831 == Sword || multi_p) | 3781 == Sword || multi_p) |
3832 fastmap[j] = 1; | 3782 fastmap[j] = 1; |
3833 } | 3783 } |
3834 } | 3784 } |
3835 } | 3785 } |
3836 #else /* not MULE */ | 3786 #else /* not MULE */ |
3837 for (j = 0; j < (1 << BYTEWIDTH); j++) | 3787 for (j = 0; j < (1 << BYTEWIDTH); j++) |
3838 if (SYNTAX_UNSAFE | 3788 if (SYNTAX |
3839 (XCHAR_TABLE | 3789 (XCHAR_TABLE (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf)), j) == |
3840 (regex_emacs_buffer->mirror_syntax_table), j) == | |
3841 (enum syntaxcode) k) | 3790 (enum syntaxcode) k) |
3842 fastmap[j] = 1; | 3791 fastmap[j] = 1; |
3843 #endif /* MULE */ | 3792 #endif /* MULE */ |
3844 break; | 3793 break; |
3845 | 3794 |
3846 | 3795 |
3847 #if 0 /* Removed during syntax-table properties patch -- 2000/12/07 mct */ | |
3848 case notsyntaxspec: | 3796 case notsyntaxspec: |
3849 k = *p++; | 3797 k = *p++; |
3850 #endif | 3798 matchnotsyntax: |
3851 matchnotsyntax: | |
3852 #ifdef MULE | 3799 #ifdef MULE |
3853 for (j = 0; j < 0x80; j++) | 3800 for (j = 0; j < 0x80; j++) |
3854 if (SYNTAX_UNSAFE | 3801 if (SYNTAX |
3855 (XCHAR_TABLE | 3802 (XCHAR_TABLE |
3856 (regex_emacs_buffer->mirror_syntax_table), j) != | 3803 (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf)), j) != |
3857 (enum syntaxcode) k) | 3804 (enum syntaxcode) k) |
3858 fastmap[j] = 1; | 3805 fastmap[j] = 1; |
3859 for (j = 0x80; j < 0xA0; j++) | 3806 for (j = 0x80; j < 0xA0; j++) |
3860 { | 3807 { |
3861 if (LEADING_BYTE_PREFIX_P((unsigned char) j)) | 3808 if (leading_byte_prefix_p ((unsigned char) j)) |
3862 /* too complicated to calculate this right */ | 3809 /* too complicated to calculate this right */ |
3863 fastmap[j] = 1; | 3810 fastmap[j] = 1; |
3864 else | 3811 else |
3865 { | 3812 { |
3866 int multi_p; | 3813 int multi_p; |
3867 Lisp_Object cset; | 3814 Lisp_Object cset; |
3868 | 3815 |
3869 cset = CHARSET_BY_LEADING_BYTE (j); | 3816 cset = charset_by_leading_byte (j); |
3870 if (CHARSETP (cset)) | 3817 if (CHARSETP (cset)) |
3871 { | 3818 { |
3872 if (charset_syntax (regex_emacs_buffer, cset, | 3819 if (charset_syntax (lispbuf, cset, &multi_p) |
3873 &multi_p) | |
3874 != Sword || multi_p) | 3820 != Sword || multi_p) |
3875 fastmap[j] = 1; | 3821 fastmap[j] = 1; |
3876 } | 3822 } |
3877 } | 3823 } |
3878 } | 3824 } |
3879 #else /* not MULE */ | 3825 #else /* not MULE */ |
3880 for (j = 0; j < (1 << BYTEWIDTH); j++) | 3826 for (j = 0; j < (1 << BYTEWIDTH); j++) |
3881 if (SYNTAX_UNSAFE | 3827 if (SYNTAX |
3882 (XCHAR_TABLE | 3828 (XCHAR_TABLE |
3883 (regex_emacs_buffer->mirror_syntax_table), j) != | 3829 (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf)), j) != |
3884 (enum syntaxcode) k) | 3830 (enum syntaxcode) k) |
3885 fastmap[j] = 1; | 3831 fastmap[j] = 1; |
3886 #endif /* MULE */ | 3832 #endif /* MULE */ |
3887 break; | 3833 break; |
3888 #endif /* emacs */ | 3834 #endif /* 0 */ |
3889 | 3835 |
3890 #ifdef MULE | 3836 #ifdef MULE |
3891 /* 97/2/17 jhod category patch */ | 3837 /* 97/2/17 jhod category patch */ |
3892 case categoryspec: | 3838 case categoryspec: |
3893 case notcategoryspec: | 3839 case notcategoryspec: |
3896 /* end of category patch */ | 3842 /* end of category patch */ |
3897 #endif /* MULE */ | 3843 #endif /* MULE */ |
3898 | 3844 |
3899 /* All cases after this match the empty string. These end with | 3845 /* All cases after this match the empty string. These end with |
3900 `continue'. */ | 3846 `continue'. */ |
3901 | |
3902 | |
3903 case before_dot: | 3847 case before_dot: |
3904 case at_dot: | 3848 case at_dot: |
3905 case after_dot: | 3849 case after_dot: |
3906 continue; | 3850 continue; |
3907 #endif /* not emacs */ | 3851 #endif /* emacs */ |
3908 | 3852 |
3909 | 3853 |
3910 case no_op: | 3854 case no_op: |
3911 case begline: | 3855 case begline: |
3912 case endline: | 3856 case endline: |
4072 /* Like re_search_2, below, but only one string is specified, and | 4016 /* Like re_search_2, below, but only one string is specified, and |
4073 doesn't let you say where to stop matching. */ | 4017 doesn't let you say where to stop matching. */ |
4074 | 4018 |
4075 int | 4019 int |
4076 re_search (struct re_pattern_buffer *bufp, const char *string, int size, | 4020 re_search (struct re_pattern_buffer *bufp, const char *string, int size, |
4077 int startpos, int range, struct re_registers *regs) | 4021 int startpos, int range, struct re_registers *regs |
4022 RE_LISP_CONTEXT_ARGS_DECL) | |
4078 { | 4023 { |
4079 return re_search_2 (bufp, NULL, 0, string, size, startpos, range, | 4024 return re_search_2 (bufp, NULL, 0, string, size, startpos, range, |
4080 regs, size); | 4025 regs, size RE_LISP_CONTEXT_ARGS); |
4081 } | 4026 } |
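The RANGE convention that re_search shares with re_search_2 (RANGE = 0 tries only STARTPOS; otherwise the last start tried is STARTPOS + RANGE, clipped to the text) can be sketched in isolation. This is a minimal illustration, not code from regex.c: the helper name `clamp_search_range` is hypothetical, and it assumes plain byte positions with no Mule character boundaries.

```c
#include <assert.h>

/* Hypothetical helper (not part of regex.c): clip *RANGE so that every
   start position between STARTPOS and STARTPOS + *RANGE lies inside a
   text of TOTAL_SIZE bytes.  Returns 0 on success, or -1 when STARTPOS
   itself is out of range.  A negative *RANGE means a backward search. */
static int
clamp_search_range (int startpos, int total_size, int *range)
{
  int endpos;

  /* Check for out-of-range STARTPOS. */
  if (startpos < 0 || startpos > total_size)
    return -1;

  endpos = startpos + *range;
  if (endpos < 0)
    *range = 0 - startpos;           /* backward search stops at offset 0 */
  else if (endpos > total_size)
    *range = total_size - startpos;  /* forward search stops at the end */
  return 0;
}
```

With a 10-byte text and STARTPOS 3, a RANGE of 100 is clipped to 7 and a RANGE of -100 is clipped to -3.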
4082 | |
4083 #ifndef emacs | |
4084 /* Snarfed from src/lisp.h, needed for compiling [ce]tags. */ | |
4085 # define bytecount_to_charcount(ptr, len) (len) | |
4086 # define charcount_to_bytecount(ptr, len) (len) | |
4087 typedef int Charcount; | |
4088 #endif | |
4089 | 4027 |
4090 /* Using the compiled pattern in BUFP->buffer, first tries to match the | 4028 /* Using the compiled pattern in BUFP->buffer, first tries to match the |
4091 virtual concatenation of STRING1 and STRING2, starting first at index | 4029 virtual concatenation of STRING1 and STRING2, starting first at index |
4092 STARTPOS, then at STARTPOS + 1, and so on. | 4030 STARTPOS, then at STARTPOS + 1, and so on. |
4093 | 4031 |
4094 With MULE, STARTPOS is a byte position, not a char position. And the | |
4095 search will increment STARTPOS by the width of the current leading | |
4096 character. | |
4097 | |
4098 STRING1 and STRING2 have length SIZE1 and SIZE2, respectively. | 4032 STRING1 and STRING2 have length SIZE1 and SIZE2, respectively. |
4099 | 4033 |
4100 RANGE is how far to scan while trying to match. RANGE = 0 means try | 4034 RANGE is how far to scan while trying to match. RANGE = 0 means try |
4101 only at STARTPOS; in general, the last start tried is STARTPOS + | 4035 only at STARTPOS; in general, the last start tried is STARTPOS + |
4102 RANGE. | 4036 RANGE. |
4103 | 4037 |
4038 All sizes and positions refer to bytes (not chars); under Mule, the code | |
4039 knows about the format of the text and will only check at positions | |
4040 where a character starts. | |
4041 | |
4104 With MULE, RANGE is a byte position, not a char position. The last | 4042 With MULE, RANGE is a byte position, not a char position. The last |
4105 start tried is the character starting <= STARTPOS + RANGE. | 4043 start tried is the character starting <= STARTPOS + RANGE. |
4106 | 4044 |
4107 In REGS, return the indices of the virtual concatenation of STRING1 | 4045 In REGS, return the indices of the virtual concatenation of STRING1 |
4108 and STRING2 that matched the entire BUFP->buffer and its contained | 4046 and STRING2 that matched the entire BUFP->buffer and its contained |
4116 stack overflow). */ | 4054 stack overflow). */ |
4117 | 4055 |
4118 int | 4056 int |
4119 re_search_2 (struct re_pattern_buffer *bufp, const char *str1, | 4057 re_search_2 (struct re_pattern_buffer *bufp, const char *str1, |
4120 int size1, const char *str2, int size2, int startpos, | 4058 int size1, const char *str2, int size2, int startpos, |
4121 int range, struct re_registers *regs, int stop) | 4059 int range, struct re_registers *regs, int stop |
4060 RE_LISP_CONTEXT_ARGS_DECL) | |
4122 { | 4061 { |
4123 int val; | 4062 int val; |
4124 re_char *string1 = (re_char *) str1; | 4063 re_char *string1 = (re_char *) str1; |
4125 re_char *string2 = (re_char *) str2; | 4064 re_char *string2 = (re_char *) str2; |
4126 REGISTER char *fastmap = bufp->fastmap; | 4065 REGISTER char *fastmap = bufp->fastmap; |
4129 int endpos = startpos + range; | 4068 int endpos = startpos + range; |
4130 #ifdef REGEX_BEGLINE_CHECK | 4069 #ifdef REGEX_BEGLINE_CHECK |
4131 int anchored_at_begline = 0; | 4070 int anchored_at_begline = 0; |
4132 #endif | 4071 #endif |
4133 re_char *d; | 4072 re_char *d; |
4134 Charcount d_size; | 4073 #ifdef emacs |
4074 Internal_Format fmt = buffer_or_other_internal_format (lispobj); | |
4075 #endif /* emacs */ | |
4076 #if 1 | |
4077 int forward_search_p; | |
4078 #endif | |
4135 | 4079 |
4136 /* Check for out-of-range STARTPOS. */ | 4080 /* Check for out-of-range STARTPOS. */ |
4137 if (startpos < 0 || startpos > total_size) | 4081 if (startpos < 0 || startpos > total_size) |
4138 return -1; | 4082 return -1; |
4139 | 4083 |
4141 the virtual concatenation of STRING1 and STRING2. */ | 4085 the virtual concatenation of STRING1 and STRING2. */ |
4142 if (endpos < 0) | 4086 if (endpos < 0) |
4143 range = 0 - startpos; | 4087 range = 0 - startpos; |
4144 else if (endpos > total_size) | 4088 else if (endpos > total_size) |
4145 range = total_size - startpos; | 4089 range = total_size - startpos; |
4090 | |
4091 #if 1 | |
4092 forward_search_p = range > 0; | |
4093 #endif | |
4146 | 4094 |
4147 /* If the search isn't to be a backwards one, don't waste time in a | 4095 /* If the search isn't to be a backwards one, don't waste time in a |
4148 search for a pattern that must be anchored. */ | 4096 search for a pattern that must be anchored. */ |
4149 if (bufp->used > 0 && (re_opcode_t) bufp->buffer[0] == begbuf && range > 0) | 4097 if (bufp->used > 0 && (re_opcode_t) bufp->buffer[0] == begbuf && range > 0) |
4150 { | 4098 { |
4152 return -1; | 4100 return -1; |
4153 else | 4101 else |
4154 { | 4102 { |
4155 d = ((const unsigned char *) | 4103 d = ((const unsigned char *) |
4156 (startpos >= size1 ? string2 - size1 : string1) + startpos); | 4104 (startpos >= size1 ? string2 - size1 : string1) + startpos); |
4157 range = charcount_to_bytecount (d, 1); | 4105 range = charptr_emchar_len_fmt (d, fmt); |
4158 } | 4106 } |
4159 } | 4107 } |
4160 | 4108 |
4161 #ifdef emacs | 4109 #ifdef emacs |
4162 /* In a forward search for something that starts with \=, | 4110 /* In a forward search for something that starts with \=, |
4163 don't keep searching past point. */ | 4111 don't keep searching past point. */ |
4164 if (bufp->used > 0 && (re_opcode_t) bufp->buffer[0] == at_dot && range > 0) | 4112 if (bufp->used > 0 && (re_opcode_t) bufp->buffer[0] == at_dot && range > 0) |
4165 { | 4113 { |
4166 range = BUF_PT (regex_emacs_buffer) - BUF_BEGV (regex_emacs_buffer) | 4114 if (!BUFFERP (lispobj)) |
4167 - startpos; | 4115 return -1; |
4116 range = (BUF_PT (XBUFFER (lispobj)) - BUF_BEGV (XBUFFER (lispobj)) | |
4117 - startpos); | |
4168 if (range < 0) | 4118 if (range < 0) |
4169 return -1; | 4119 return -1; |
4170 } | 4120 } |
4171 #endif /* emacs */ | 4121 #endif /* emacs */ |
4172 | 4122 |
4173 /* Update the fastmap now if not correct already. */ | 4123 /* Update the fastmap now if not correct already. */ |
4174 if (fastmap && !bufp->fastmap_accurate) | 4124 if (fastmap && !bufp->fastmap_accurate) |
4175 if (re_compile_fastmap (bufp) == -2) | 4125 if (re_compile_fastmap (bufp RE_LISP_SHORT_CONTEXT_ARGS) == -2) |
4176 return -2; | 4126 return -2; |
4177 | 4127 |
4178 #ifdef REGEX_BEGLINE_CHECK | 4128 #ifdef REGEX_BEGLINE_CHECK |
4179 { | 4129 { |
4180 long i = 0; | 4130 long i = 0; |
4190 anchored_at_begline = i < bufp->used && bufp->buffer[i] == begline; | 4140 anchored_at_begline = i < bufp->used && bufp->buffer[i] == begline; |
4191 } | 4141 } |
4192 #endif | 4142 #endif |
4193 | 4143 |
4194 #ifdef emacs | 4144 #ifdef emacs |
4195 SETUP_SYNTAX_CACHE_FOR_OBJECT (regex_match_object, | 4145 scache = setup_syntax_cache (scache, lispobj, lispbuf, |
4196 regex_emacs_buffer, | 4146 offset_to_charxpos (lispobj, startpos), |
4197 SYNTAX_CACHE_OBJECT_BYTE_TO_CHAR (regex_match_object, | 4147 1); |
4198 regex_emacs_buffer, | |
4199 startpos), | |
4200 1); | |
4201 #endif | 4148 #endif |
4202 | 4149 |
4203 /* Loop through the string, looking for a place to start matching. */ | 4150 /* Loop through the string, looking for a place to start matching. */ |
4204 for (;;) | 4151 for (;;) |
4205 { | 4152 { |
4206 #ifdef REGEX_BEGLINE_CHECK | 4153 #ifdef REGEX_BEGLINE_CHECK |
4207 /* If the regex is anchored at the beginning of a line (i.e. with a ^), | 4154 /* If the regex is anchored at the beginning of a line (i.e. with a |
4208 then we can speed things up by skipping to the next beginning-of- | 4155 ^), then we can speed things up by skipping to the next |
4209 line. */ | 4156 beginning-of-line. However, to determine "beginning of line" we |
4210 if (anchored_at_begline && startpos > 0 && startpos != size1 && | 4157 need to look at the previous char, so can't do this check if at |
4211 range > 0) | 4158 beginning of either string. (Well, we could if at the beginning of |
4159 the second string, but it would require additional code, and this | |
4160 is just an optimization.) */ | |
4161 if (anchored_at_begline && startpos > 0 && startpos != size1) | |
4212 { | 4162 { |
4213 /* whose stupid idea was it anyway to make this | 4163 if (range > 0) |
4214 function take two strings to match?? */ | 4164 { |
4215 int lim = 0; | 4165 /* whose stupid idea was it anyway to make this |
4216 int irange = range; | 4166 function take two strings to match?? */ |
4217 | 4167 int lim = 0; |
4218 if (startpos < size1 && startpos + range >= size1) | 4168 re_char *orig_d; |
4219 lim = range - (size1 - startpos); | 4169 re_char *stop_d; |
4220 | 4170 |
4221 d = ((const unsigned char *) | 4171 /* Compute limit as below in fastmap code, so we are guaranteed |
4222 (startpos >= size1 ? string2 - size1 : string1) + startpos); | 4172 to remain within a single string. */ |
4223 DEC_CHARPTR(d); /* Ok, since startpos != size1. */ | 4173 if (startpos < size1 && startpos + range >= size1) |
4224 d_size = charcount_to_bytecount (d, 1); | 4174 lim = range - (size1 - startpos); |
4225 | 4175 |
4226 if (TRANSLATE_P (translate)) | 4176 d = ((const unsigned char *) |
4227 while (range > lim && *d != '\n') | 4177 (startpos >= size1 ? string2 - size1 : string1) + startpos); |
4228 { | 4178 orig_d = d; |
4229 d += d_size; /* Speedier INC_CHARPTR(d) */ | 4179 stop_d = d + range - lim; |
4230 d_size = charcount_to_bytecount (d, 1); | 4180 |
4231 range -= d_size; | 4181 /* We want to find the next location (including the current |
4232 } | 4182 one) where the previous char is a newline, so back up one |
4233 else | 4183 and search forward for a newline. */ |
4234 while (range > lim && *d != '\n') | 4184 DEC_CHARPTR_FMT (d, fmt); /* Ok, since startpos != size1. */ |
4235 { | 4185 |
4236 d += d_size; /* Speedier INC_CHARPTR(d) */ | 4186 /* Written out as an if-else to avoid testing `translate' |
4237 d_size = charcount_to_bytecount (d, 1); | 4187 inside the loop. */ |
4238 range -= d_size; | 4188 if (TRANSLATE_P (translate)) |
4239 } | 4189 while (d < stop_d && |
4240 | 4190 RE_TRANSLATE_1 (charptr_emchar_fmt (d, fmt, lispobj)) |
4241 startpos += irange - range; | 4191 != '\n') |
4192 INC_CHARPTR_FMT (d, fmt); | |
4193 else | |
4194 while (d < stop_d && | |
4195 charptr_emchar_ascii_fmt (d, fmt, lispobj) != '\n') | |
4196 INC_CHARPTR_FMT (d, fmt); | |
4197 | |
4198 /* If we were stopped by a newline, skip forward over it. | |
4199 Otherwise we will get in an infloop when our start position | |
4200 was at begline. */ | |
4201 if (d < stop_d) | |
4202 INC_CHARPTR_FMT (d, fmt); | |
4203 range -= d - orig_d; | |
4204 startpos += d - orig_d; | |
4205 #if 1 | |
4206 assert (!forward_search_p || range >= 0); | |
4207 #endif | |
4208 } | |
4209 else if (range < 0) | |
4210 { | |
4211 /* We're lazy, like in the fastmap code below */ | |
4212 Emchar c; | |
4213 | |
4214 d = ((const unsigned char *) | |
4215 (startpos >= size1 ? string2 - size1 : string1) + startpos); | |
4216 DEC_CHARPTR_FMT (d, fmt); | |
4217 c = charptr_emchar_fmt (d, fmt, lispobj); | |
4218 c = RE_TRANSLATE (c); | |
4219 if (c != '\n') | |
4220 goto advance; | |
4221 } | |
4242 } | 4222 } |
4243 #endif /* REGEX_BEGLINE_CHECK */ | 4223 #endif /* REGEX_BEGLINE_CHECK */ |
4244 | 4224 |
4245 /* If a fastmap is supplied, skip quickly over characters that | 4225 /* If a fastmap is supplied, skip quickly over characters that |
4246 cannot be the start of a match. If the pattern can match the | 4226 cannot be the start of a match. If the pattern can match the |
4247 null string, however, we don't need to skip characters; we want | 4227 null string, however, we don't need to skip characters; we want |
4248 the first null string. */ | 4228 the first null string. */ |
4249 if (fastmap && startpos < total_size && !bufp->can_be_null) | 4229 if (fastmap && startpos < total_size && !bufp->can_be_null) |
4250 { | 4230 { |
4231 /* For the moment, fastmap always works as if buffer | |
4232 is in default format, so convert chars in the search strings | |
4233 into default format as we go along, if necessary. | |
4234 | |
4235 &&#### fastmap needs rethinking for 8-bit-fixed so | |
4236 it's faster. We need it to reflect the raw | |
4237 8-bit-fixed values. That isn't so hard if we assume | |
4238 that the top 96 bytes represent a single 1-byte | |
4239 charset. For 16-bit/32-bit stuff it's probably not | |
4240 worth it to make the fastmap represent the raw, due to | |
4241 its nature -- we'd have to use the LSB for the | |
4242 fastmap, and that causes lots of problems with Mule | |
4243 chars, where it essentially wipes out the usefulness | |
4244 of the fastmap entirely. */ | |
4251 if (range > 0) /* Searching forwards. */ | 4245 if (range > 0) /* Searching forwards. */ |
4252 { | 4246 { |
4253 int lim = 0; | 4247 int lim = 0; |
4254 int irange = range; | 4248 int irange = range; |
4255 | 4249 |
4260 (startpos >= size1 ? string2 - size1 : string1) + startpos); | 4254 (startpos >= size1 ? string2 - size1 : string1) + startpos); |
4261 | 4255 |
4262 /* Written out as an if-else to avoid testing `translate' | 4256 /* Written out as an if-else to avoid testing `translate' |
4263 inside the loop. */ | 4257 inside the loop. */ |
4264 if (TRANSLATE_P (translate)) | 4258 if (TRANSLATE_P (translate)) |
4265 while (range > lim) | 4259 { |
4266 { | 4260 while (range > lim) |
4261 { | |
4262 re_char *old_d = d; | |
4267 #ifdef MULE | 4263 #ifdef MULE |
4268 Emchar buf_ch; | 4264 Intbyte tempch[MAX_EMCHAR_LEN]; |
4269 | 4265 Emchar buf_ch = |
4270 buf_ch = charptr_emchar (d); | 4266 RE_TRANSLATE_1 (charptr_emchar_fmt (d, fmt, lispobj)); |
4271 buf_ch = RE_TRANSLATE (buf_ch); | 4267 set_charptr_emchar (tempch, buf_ch); |
4272 if (buf_ch >= 0200 || fastmap[(unsigned char) buf_ch]) | 4268 if (fastmap[*tempch]) |
4273 break; | 4269 break; |
4274 #else | 4270 #else |
4275 if (fastmap[(unsigned char)RE_TRANSLATE (*d)]) | 4271 if (fastmap[(unsigned char) RE_TRANSLATE_1 (*d)]) |
4276 break; | 4272 break; |
4277 #endif /* MULE */ | 4273 #endif /* MULE */ |
4278 d_size = charcount_to_bytecount (d, 1); | 4274 INC_CHARPTR_FMT (d, fmt); |
4279 range -= d_size; | 4275 range -= (d - old_d); |
4280 d += d_size; /* Speedier INC_CHARPTR(d) */ | 4276 #if 1 |
4281 } | 4277 assert (!forward_search_p || range >= 0); |
4278 #endif | |
4279 } | |
4280 } | |
4281 #ifdef MULE | |
4282 else if (fmt != FORMAT_DEFAULT) | |
4283 { | |
4284 while (range > lim) | |
4285 { | |
4286 re_char *old_d = d; | |
4287 Intbyte tempch[MAX_EMCHAR_LEN]; | |
4288 Emchar buf_ch = charptr_emchar_fmt (d, fmt, lispobj); | |
4289 set_charptr_emchar (tempch, buf_ch); | |
4290 if (fastmap[*tempch]) | |
4291 break; | |
4292 INC_CHARPTR_FMT (d, fmt); | |
4293 range -= (d - old_d); | |
4294 #if 1 | |
4295 assert (!forward_search_p || range >= 0); | |
4296 #endif | |
4297 } | |
4298 } | |
4299 #endif /* MULE */ | |
4282 else | 4300 else |
4283 while (range > lim && !fastmap[*d]) | 4301 { |
4284 { | 4302 while (range > lim && !fastmap[*d]) |
4285 d_size = charcount_to_bytecount (d, 1); | 4303 { |
4286 range -= d_size; | 4304 re_char *old_d = d; |
4287 d += d_size; /* Speedier INC_CHARPTR(d) */ | 4305 INC_CHARPTR (d); |
4288 } | 4306 range -= (d - old_d); |
4307 #if 1 | |
4308 assert (!forward_search_p || range >= 0); | |
4309 #endif | |
4310 } | |
4311 } | |
4289 | 4312 |
4290 startpos += irange - range; | 4313 startpos += irange - range; |
4291 } | 4314 } |
4292 else /* Searching backwards. */ | 4315 else /* Searching backwards. */ |
4293 { | 4316 { |
4294 Emchar c = (size1 == 0 || startpos >= size1 | 4317 /* #### It's not clear why we don't just write a loop, like |
4295 ? charptr_emchar (string2 + startpos - size1) | 4318 for the moving-forward case. Perhaps the writer got lazy, |
4296 : charptr_emchar (string1 + startpos)); | 4319 since backward searches aren't so common. */ |
4297 c = TRANSLATE (c); | 4320 d = ((const unsigned char *) |
4321 (startpos >= size1 ? string2 - size1 : string1) + startpos); | |
4298 #ifdef MULE | 4322 #ifdef MULE |
4299 if (!(c >= 0200 || fastmap[(unsigned char) c])) | 4323 { |
4324 Intbyte tempch[MAX_EMCHAR_LEN]; | |
4325 Emchar buf_ch = | |
4326 RE_TRANSLATE (charptr_emchar_fmt (d, fmt, lispobj)); | |
4327 set_charptr_emchar (tempch, buf_ch); | |
4328 if (!fastmap[*tempch]) | |
4329 goto advance; | |
4330 } | |
4331 #else | |
4332 if (!fastmap[(unsigned char) RE_TRANSLATE (*d)]) | |
4300 goto advance; | 4333 goto advance; |
4301 #else | 4334 #endif /* MULE */ |
4302 if (!fastmap[(unsigned char) c]) | |
4303 goto advance; | |
4304 #endif | |
4305 } | 4335 } |
4306 } | 4336 } |
4307 | 4337 |
4308 /* If can't match the null string, and that's all we have left, fail. */ | 4338 /* If can't match the null string, and that's all we have left, fail. */ |
4309 if (range >= 0 && startpos == total_size && fastmap | 4339 if (range >= 0 && startpos == total_size && fastmap |
4313 #ifdef emacs /* XEmacs added, w/removal of immediate_quit */ | 4343 #ifdef emacs /* XEmacs added, w/removal of immediate_quit */ |
4314 if (!no_quit_in_re_search) | 4344 if (!no_quit_in_re_search) |
4315 QUIT; | 4345 QUIT; |
4316 #endif | 4346 #endif |
4317 val = re_match_2_internal (bufp, string1, size1, string2, size2, | 4347 val = re_match_2_internal (bufp, string1, size1, string2, size2, |
4318 startpos, regs, stop); | 4348 startpos, regs, stop |
4349 RE_LISP_CONTEXT_ARGS); | |
4319 #ifndef REGEX_MALLOC | 4350 #ifndef REGEX_MALLOC |
4320 #ifdef C_ALLOCA | 4351 #ifdef C_ALLOCA |
4321 alloca (0); | 4352 alloca (0); |
4322 #endif | 4353 #endif |
4323 #endif | 4354 #endif |
4331 advance: | 4362 advance: |
4332 if (!range) | 4363 if (!range) |
4333 break; | 4364 break; |
4334 else if (range > 0) | 4365 else if (range > 0) |
4335 { | 4366 { |
4367 Bytecount d_size; | |
4336 d = ((const unsigned char *) | 4368 d = ((const unsigned char *) |
4337 (startpos >= size1 ? string2 - size1 : string1) + startpos); | 4369 (startpos >= size1 ? string2 - size1 : string1) + startpos); |
4338 d_size = charcount_to_bytecount (d, 1); | 4370 d_size = charptr_emchar_len_fmt (d, fmt); |
4339 range -= d_size; | 4371 range -= d_size; |
4372 #if 1 | |
4373 assert (!forward_search_p || range >= 0); | |
4374 #endif | |
4340 startpos += d_size; | 4375 startpos += d_size; |
4341 } | 4376 } |
4342 else | 4377 else |
4343 { | 4378 { |
4379 Bytecount d_size; | |
4344 /* Note startpos > size1 not >=. If we are on the | 4380 /* Note startpos > size1 not >=. If we are on the |
4345 string1/string2 boundary, we want to back up into string1. */ | 4381 string1/string2 boundary, we want to back up into string1. */ |
4346 d = ((const unsigned char *) | 4382 d = ((const unsigned char *) |
4347 (startpos > size1 ? string2 - size1 : string1) + startpos); | 4383 (startpos > size1 ? string2 - size1 : string1) + startpos); |
4348 DEC_CHARPTR(d); | 4384 DEC_CHARPTR_FMT (d, fmt); |
4349 d_size = charcount_to_bytecount (d, 1); | 4385 d_size = charptr_emchar_len_fmt (d, fmt); |
4350 range += d_size; | 4386 range += d_size; |
4387 #if 1 | |
4388 assert (!forward_search_p || range >= 0); | |
4389 #endif | |
4351 startpos -= d_size; | 4390 startpos -= d_size; |
4352 } | 4391 } |
4353 } | 4392 } |
4354 return -1; | 4393 return -1; |
4355 } /* re_search_2 */ | 4394 } /* re_search_2 */ |
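The forward fastmap skip in re_search_2 can be shown in a reduced form. The sketch below is an illustration under simplifying assumptions, not the function's actual code: it handles single-byte text only (no Mule formats, no translate table), the names `virtual_byte_at` and `fastmap_skip_forward` are hypothetical, and it indexes across both strings rather than limiting each pass to one string as the real loop does.

```c
#include <assert.h>

/* Hypothetical helper: the byte at offset POS in the virtual
   concatenation of S1 (SIZE1 bytes) and S2. */
static unsigned char
virtual_byte_at (const unsigned char *s1, int size1,
                 const unsigned char *s2, int pos)
{
  return pos < size1 ? s1[pos] : s2[pos - size1];
}

/* Starting at STARTPOS, step forward over at most RANGE bytes whose
   fastmap entry is zero, i.e. bytes that cannot begin a match.
   Returns the first position worth handing to the matcher, or the
   position where the range (or the text) ran out. */
static int
fastmap_skip_forward (const unsigned char *s1, int size1,
                      const unsigned char *s2, int size2,
                      int startpos, int range, const char *fastmap)
{
  int total_size = size1 + size2;

  assert (startpos >= 0 && startpos <= total_size);
  while (range > 0 && startpos < total_size
         && !fastmap[virtual_byte_at (s1, size1, s2, startpos)])
    {
      startpos++;
      range--;
    }
  return startpos;
}
```

The design point is that the full matcher is only tried at positions the fastmap allows, so the common case of a non-matching byte costs a single table lookup.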
4395 | |
4356 | 4396 |
4357 /* Declarations and macros for re_match_2. */ | 4397 /* Declarations and macros for re_match_2. */ |
4358 | 4398 |
4359 /* This converts PTR, a pointer into one of the search strings `string1' | 4399 /* This converts PTR, a pointer into one of the search strings `string1' |
4360 and `string2' into an offset from the beginning of that string. */ | 4400 and `string2' into an offset from the beginning of that string. */ |
4367 | 4407 |
4368 #define MATCHING_IN_FIRST_STRING (dend == end_match_1) | 4408 #define MATCHING_IN_FIRST_STRING (dend == end_match_1) |
4369 | 4409 |
4370 /* Call before fetching a character with *d. This switches over to | 4410 /* Call before fetching a character with *d. This switches over to |
4371 string2 if necessary. */ | 4411 string2 if necessary. */ |
4372 #define REGEX_PREFETCH() \ | 4412 #define REGEX_PREFETCH() \ |
4373 while (d == dend) \ | 4413 while (d == dend) \ |
4374 { \ | 4414 { \ |
4375 /* End of string2 => fail. */ \ | 4415 /* End of string2 => fail. */ \ |
4376 if (dend == end_match_2) \ | 4416 if (dend == end_match_2) \ |
4377 goto fail; \ | 4417 goto fail; \ |
4392 return the same position. */ | 4432 return the same position. */ |
4393 #define POS_BEFORE_GAP_UNSAFE(d) ((d) == string2 ? end1 : (d)) | 4433 #define POS_BEFORE_GAP_UNSAFE(d) ((d) == string2 ? end1 : (d)) |
4394 #define POS_AFTER_GAP_UNSAFE(d) ((d) == end1 ? string2 : (d)) | 4434 #define POS_AFTER_GAP_UNSAFE(d) ((d) == end1 ? string2 : (d)) |
4395 | 4435 |
4396 /* Test if CH is a word-constituent character. (XEmacs change) */ | 4436 /* Test if CH is a word-constituent character. (XEmacs change) */ |
4397 #define WORDCHAR_P_UNSAFE(ch) \ | 4437 #define WORDCHAR_P(ch) \ |
4398 (SYNTAX_UNSAFE (XCHAR_TABLE (regex_emacs_buffer->mirror_syntax_table), \ | 4438 (SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), ch) == Sword) |
4399 ch) == Sword) | |
4400 | 4439 |
4401 /* Free everything we malloc. */ | 4440 /* Free everything we malloc. */ |
4402 #ifdef MATCH_MAY_ALLOCATE | 4441 #ifdef MATCH_MAY_ALLOCATE |
4403 #define FREE_VAR(var) if (var) REGEX_FREE (var); var = NULL | 4442 #define FREE_VAR(var) if (var) REGEX_FREE (var); var = NULL |
4404 #define FREE_VARIABLES() \ | 4443 #define FREE_VARIABLES() \ |
4428 #define NO_HIGHEST_ACTIVE_REG (1 << BYTEWIDTH) | 4467 #define NO_HIGHEST_ACTIVE_REG (1 << BYTEWIDTH) |
4429 #define NO_LOWEST_ACTIVE_REG (NO_HIGHEST_ACTIVE_REG + 1) | 4468 #define NO_LOWEST_ACTIVE_REG (NO_HIGHEST_ACTIVE_REG + 1) |
4430 | 4469 |
4431 /* Matching routines. */ | 4470 /* Matching routines. */ |
4432 | 4471 |
4433 #ifndef emacs /* Emacs never uses this. */ | 4472 #ifndef emacs /* XEmacs never uses this. */ |
4434 /* re_match is like re_match_2 except it takes only a single string. */ | 4473 /* re_match is like re_match_2 except it takes only a single string. */ |
4435 | 4474 |
4436 int | 4475 int |
4437 re_match (struct re_pattern_buffer *bufp, const char *string, int size, | 4476 re_match (struct re_pattern_buffer *bufp, const char *string, int size, |
4438 int pos, struct re_registers *regs) | 4477 int pos, struct re_registers *regs |
4478 RE_LISP_CONTEXT_ARGS_DECL) | |
4439 { | 4479 { |
4440 int result = re_match_2_internal (bufp, NULL, 0, (re_char *) string, size, | 4480 int result = re_match_2_internal (bufp, NULL, 0, (re_char *) string, size, |
4441 pos, regs, size); | 4481 pos, regs, size |
4482 RE_LISP_CONTEXT_ARGS); | |
4442 alloca (0); | 4483 alloca (0); |
4443 return result; | 4484 return result; |
4444 } | 4485 } |
4445 #endif /* not emacs */ | 4486 #endif /* not emacs */ |
4446 | 4487 |
4447 | |
4448 /* re_match_2 matches the compiled pattern in BUFP against the | 4488 /* re_match_2 matches the compiled pattern in BUFP against the |
4449 (virtual) concatenation of STRING1 and STRING2 (of length SIZE1 and | 4489 (virtual) concatenation of STRING1 and STRING2 (of length SIZE1 and |
4450 SIZE2, respectively). We start matching at POS, and stop matching | 4490 SIZE2, respectively). We start matching at POS, and stop matching |
4451 at STOP. | 4491 at STOP. |
4452 | 4492 |
4459 matched substring. */ | 4499 matched substring. */ |
4460 | 4500 |
4461 int | 4501 int |
4462 re_match_2 (struct re_pattern_buffer *bufp, const char *string1, | 4502 re_match_2 (struct re_pattern_buffer *bufp, const char *string1, |
4463 int size1, const char *string2, int size2, int pos, | 4503 int size1, const char *string2, int size2, int pos, |
4464 struct re_registers *regs, int stop) | 4504 struct re_registers *regs, int stop |
4505 RE_LISP_CONTEXT_ARGS_DECL) | |
4465 { | 4506 { |
4466 int result; | 4507 int result; |
4467 | 4508 |
4468 #ifdef emacs | 4509 #ifdef emacs |
4469 SETUP_SYNTAX_CACHE_FOR_OBJECT (regex_match_object, | 4510 scache = setup_syntax_cache (scache, lispobj, lispbuf, |
4470 regex_emacs_buffer, | 4511 offset_to_charxpos (lispobj, pos), |
4471 SYNTAX_CACHE_OBJECT_BYTE_TO_CHAR (regex_match_object, | 4512 1); |
4472 regex_emacs_buffer, | |
4473 pos), | |
4474 1); | |
4475 #endif | 4513 #endif |
4476 | 4514 |
4477 result = re_match_2_internal (bufp, (re_char *) string1, size1, | 4515 result = re_match_2_internal (bufp, (re_char *) string1, size1, |
4478 (re_char *) string2, size2, | 4516 (re_char *) string2, size2, |
4479 pos, regs, stop); | 4517 pos, regs, stop |
4518 RE_LISP_CONTEXT_ARGS); | |
4480 | 4519 |
4481 alloca (0); | 4520 alloca (0); |
4482 return result; | 4521 return result; |
4483 } | 4522 } |
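This change appears to replace the globals regex_emacs_buffer and regex_match_object with context passed as arguments (the RE_LISP_CONTEXT_ARGS macros), which is presumably why the old reentrancy guard could be dropped. The following sketch illustrates only the general idea, with hypothetical names (`search_context`, `count_char`, `count_char_reentrant`); it does not reproduce how the macros expand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-call context (the real code passes
   a Lisp object and buffer; here it is just a piece of text). */
struct search_context
{
  const char *text;
  size_t length;
};

/* Count occurrences of CH in the text described by CTX. */
static size_t
count_char (const struct search_context *ctx, char ch)
{
  size_t i, n = 0;

  for (i = 0; i < ctx->length; i++)
    if (ctx->text[i] == ch)
      n++;
  return n;
}

/* Scan OUTER while making a nested call on INNER partway through,
   imitating a recursive matcher invocation.  Because each call gets
   its context as an argument, the nested call cannot redirect the
   outer scan to the wrong text, as a shared global could. */
static size_t
count_char_reentrant (const struct search_context *outer,
                      const struct search_context *inner, char ch,
                      size_t *inner_result)
{
  size_t i, n = 0;

  for (i = 0; i < outer->length; i++)
    {
      if (i == 0)
        *inner_result = count_char (inner, ch);  /* nested call mid-scan */
      if (outer->text[i] == ch)
        n++;
    }
  return n;
}
```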
4484 | |
4485 #if defined (ERROR_CHECK_TEXT) && defined (emacs) | |
4486 int in_re_match_2_internal; | |
4487 | |
4488 /* #### I am seeing an error (once) where regex_match_object gets set | |
4489 to a string while matching on a buffer. The only way this seems | |
4490 possible is recursive invocation of re_match_2_internal(). */ | |
4491 static Lisp_Object | |
4492 restore_in_re_match_2_internal (Lisp_Object val) | |
4493 { | |
4494 in_re_match_2_internal = 0; | |
4495 return Qnil; | |
4496 } | |
4497 | |
4498 #define RESTORE_IN_MATCH_FLAG unbind_to (speccount) | |
4499 | |
4500 #else | |
4501 | |
4502 #define RESTORE_IN_MATCH_FLAG do {} while (0) | |
4503 | |
4504 #endif /* defined (ERROR_CHECK_TEXT) && defined (emacs) */ | |
4505 | |
4506 | 4523 |
4507 /* This is a separate function so that we can force an alloca cleanup | 4524 /* This is a separate function so that we can force an alloca cleanup |
4508 afterwards. */ | 4525 afterwards. */ |
4509 static int | 4526 static int |
4510 re_match_2_internal (struct re_pattern_buffer *bufp, re_char *string1, | 4527 re_match_2_internal (struct re_pattern_buffer *bufp, re_char *string1, |
4511 int size1, re_char *string2, int size2, int pos, | 4528 int size1, re_char *string2, int size2, int pos, |
4512 struct re_registers *regs, int stop) | 4529 struct re_registers *regs, int stop |
4530 RE_LISP_CONTEXT_ARGS_DECL) | |
4513 { | 4531 { |
4514 /* General temporaries. */ | 4532 /* General temporaries. */ |
4515 int mcnt; | 4533 int mcnt; |
4516 unsigned char *p1; | 4534 unsigned char *p1; |
4517 int should_succeed; /* XEmacs change */ | 4535 int should_succeed; /* XEmacs change */ |
4637 re_bool same_str_p; | 4655 re_bool same_str_p; |
4638 | 4656 |
4639 /* 1 if this match is the best seen so far. */ | 4657 /* 1 if this match is the best seen so far. */ |
4640 re_bool best_match_p; | 4658 re_bool best_match_p; |
4641 | 4659 |
4642 #if defined (ERROR_CHECK_TEXT) && defined (emacs) | 4660 #ifdef emacs |
4643 int speccount = specpdl_depth (); | 4661 Internal_Format fmt = buffer_or_other_internal_format (lispobj); |
4644 | 4662 #endif /* emacs */ |
4645 #if 0 | |
4646 /* we've hopefully fixed the reentrancy problem. */ | |
4647 assert (!in_re_match_2_internal); | |
4648 #endif | |
4649 in_re_match_2_internal = 1; | |
4650 record_unwind_protect (restore_in_re_match_2_internal, Qnil); | |
4651 #endif /* defined (ERROR_CHECK_TEXT) && defined (emacs) */ | |
4652 | 4663 |
4653 DEBUG_PRINT1 ("\n\nEntering re_match_2.\n"); | 4664 DEBUG_PRINT1 ("\n\nEntering re_match_2.\n"); |
4654 | 4665 |
4655 INIT_FAIL_STACK (); | 4666 INIT_FAIL_STACK (); |
4656 | 4667 |
4674 | 4685 |
4675 if (!(regstart && regend && old_regstart && old_regend && reg_info | 4686 if (!(regstart && regend && old_regstart && old_regend && reg_info |
4676 && best_regstart && best_regend && reg_dummy && reg_info_dummy)) | 4687 && best_regstart && best_regend && reg_dummy && reg_info_dummy)) |
4677 { | 4688 { |
4678 FREE_VARIABLES (); | 4689 FREE_VARIABLES (); |
4679 RESTORE_IN_MATCH_FLAG; | |
4680 return -2; | 4690 return -2; |
4681 } | 4691 } |
4682 } | 4692 } |
4683 else | 4693 else |
4684 { | 4694 { |
4692 | 4702 |
4693 /* The starting position is bogus. */ | 4703 /* The starting position is bogus. */ |
4694 if (pos < 0 || pos > size1 + size2) | 4704 if (pos < 0 || pos > size1 + size2) |
4695 { | 4705 { |
4696 FREE_VARIABLES (); | 4706 FREE_VARIABLES (); |
4697 RESTORE_IN_MATCH_FLAG; | |
4698 return -1; | 4707 return -1; |
4699 } | 4708 } |
4700 | 4709 |
4701 /* Initialize subexpression text positions to -1 to mark ones that no | 4710 /* Initialize subexpression text positions to -1 to mark ones that no |
4702 start_memory/stop_memory has been seen for. Also initialize the | 4711 start_memory/stop_memory has been seen for. Also initialize the |
4850 regs->start = TALLOC (regs->num_regs, regoff_t); | 4859 regs->start = TALLOC (regs->num_regs, regoff_t); |
4851 regs->end = TALLOC (regs->num_regs, regoff_t); | 4860 regs->end = TALLOC (regs->num_regs, regoff_t); |
4852 if (regs->start == NULL || regs->end == NULL) | 4861 if (regs->start == NULL || regs->end == NULL) |
4853 { | 4862 { |
4854 FREE_VARIABLES (); | 4863 FREE_VARIABLES (); |
4855 RESTORE_IN_MATCH_FLAG; | |
4856 return -2; | 4864 return -2; |
4857 } | 4865 } |
4858 bufp->regs_allocated = REGS_REALLOCATE; | 4866 bufp->regs_allocated = REGS_REALLOCATE; |
4859 } | 4867 } |
4860 else if (bufp->regs_allocated == REGS_REALLOCATE) | 4868 else if (bufp->regs_allocated == REGS_REALLOCATE) |
4867 RETALLOC (regs->start, regs->num_regs, regoff_t); | 4875 RETALLOC (regs->start, regs->num_regs, regoff_t); |
4868 RETALLOC (regs->end, regs->num_regs, regoff_t); | 4876 RETALLOC (regs->end, regs->num_regs, regoff_t); |
4869 if (regs->start == NULL || regs->end == NULL) | 4877 if (regs->start == NULL || regs->end == NULL) |
4870 { | 4878 { |
4871 FREE_VARIABLES (); | 4879 FREE_VARIABLES (); |
4872 RESTORE_IN_MATCH_FLAG; | |
4873 return -2; | 4880 return -2; |
4874 } | 4881 } |
4875 } | 4882 } |
4876 } | 4883 } |
4877 else | 4884 else |
4929 : string2 - size1); | 4936 : string2 - size1); |
4930 | 4937 |
4931 DEBUG_PRINT2 ("Returning %d from re_match_2.\n", mcnt); | 4938 DEBUG_PRINT2 ("Returning %d from re_match_2.\n", mcnt); |
4932 | 4939 |
4933 FREE_VARIABLES (); | 4940 FREE_VARIABLES (); |
4934 RESTORE_IN_MATCH_FLAG; | |
4935 return mcnt; | 4941 return mcnt; |
4936 } | 4942 } |
4937 | 4943 |
4938 /* Otherwise match next pattern command. */ | 4944 /* Otherwise match next pattern command. */ |
4939 switch (SWITCH_ENUM_CAST ((re_opcode_t) *p++)) | 4945 switch (SWITCH_ENUM_CAST ((re_opcode_t) *p++)) |
4946 | 4952 |
4947 case succeed: | 4953 case succeed: |
4948 DEBUG_PRINT1 ("EXECUTING succeed.\n"); | 4954 DEBUG_PRINT1 ("EXECUTING succeed.\n"); |
4949 goto succeed_label; | 4955 goto succeed_label; |
4950 | 4956 |
4951 /* Match the next n pattern characters exactly. The following | 4957 /* Match exactly a string of length n in the pattern. The |
4952 byte in the pattern defines n, and the n bytes after that | 4958 following byte in the pattern defines n, and the n bytes after |
4953 are the characters to match. */ | 4959 that make up the string to match. (Under Mule, this will be in |
4960 the default internal format.) */ | |
4954 case exactn: | 4961 case exactn: |
4955 mcnt = *p++; | 4962 mcnt = *p++; |
4956 DEBUG_PRINT2 ("EXECUTING exactn %d.\n", mcnt); | 4963 DEBUG_PRINT2 ("EXECUTING exactn %d.\n", mcnt); |
4957 | 4964 |
4958 /* This is written out as an if-else so we don't waste time | 4965 /* This is written out as an if-else so we don't waste time |
4960 if (TRANSLATE_P (translate)) | 4967 if (TRANSLATE_P (translate)) |
4961 { | 4968 { |
4962 do | 4969 do |
4963 { | 4970 { |
4964 #ifdef MULE | 4971 #ifdef MULE |
4965 Emchar pat_ch, buf_ch; | |
4966 Bytecount pat_len; | 4972 Bytecount pat_len; |
4967 | 4973 |
4968 REGEX_PREFETCH (); | 4974 REGEX_PREFETCH (); |
4969 pat_ch = charptr_emchar (p); | 4975 if (RE_TRANSLATE_1 (charptr_emchar_fmt (d, fmt, lispobj)) |
4970 buf_ch = charptr_emchar (d); | 4976 != charptr_emchar (p)) |
4971 if (RE_TRANSLATE (buf_ch) != pat_ch) | |
4972 goto fail; | 4977 goto fail; |
4973 | 4978 |
4974 pat_len = charcount_to_bytecount (p, 1); | 4979 pat_len = charptr_emchar_len (p); |
4975 p += pat_len; | 4980 p += pat_len; |
4976 INC_CHARPTR (d); | 4981 INC_CHARPTR_FMT (d, fmt); |
4977 | 4982 |
4978 mcnt -= pat_len; | 4983 mcnt -= pat_len; |
4979 #else /* not MULE */ | 4984 #else /* not MULE */ |
4980 REGEX_PREFETCH (); | 4985 REGEX_PREFETCH (); |
4981 if ((unsigned char) RE_TRANSLATE (*d++) != *p++) | 4986 if ((unsigned char) RE_TRANSLATE_1 (*d++) != *p++) |
4982 goto fail; | 4987 goto fail; |
4983 mcnt--; | 4988 mcnt--; |
4984 #endif | 4989 #endif |
4985 } | 4990 } |
4986 while (mcnt > 0); | 4991 while (mcnt > 0); |
4987 } | 4992 } |
4988 else | 4993 else |
4989 { | 4994 { |
4990 do | 4995 #ifdef MULE |
4996 /* If buffer format is default, then we can shortcut and just | |
4997 compare the text directly, byte by byte. Otherwise, we | |
4998 need to go character by character. */ | |
4999 if (fmt != FORMAT_DEFAULT) | |
4991 { | 5000 { |
4992 REGEX_PREFETCH (); | 5001 do |
4993 if (*d++ != *p++) goto fail; | 5002 { |
5003 Bytecount pat_len; | |
5004 | |
5005 REGEX_PREFETCH (); | |
5006 if (charptr_emchar_fmt (d, fmt, lispobj) != | |
5007 charptr_emchar (p)) | |
5008 goto fail; | |
5009 | |
5010 pat_len = charptr_emchar_len (p); | |
5011 p += pat_len; | |
5012 INC_CHARPTR_FMT (d, fmt); | |
5013 | |
5014 mcnt -= pat_len; | |
5015 } | |
5016 while (mcnt > 0); | |
4994 } | 5017 } |
4995 while (--mcnt); | 5018 else |
5019 #endif | |
5020 { | |
5021 do | |
5022 { | |
5023 REGEX_PREFETCH (); | |
5024 if (*d++ != *p++) goto fail; | |
5025 mcnt--; | |
5026 } | |
5027 while (mcnt > 0); | |
5028 } | |
4996 } | 5029 } |
4997 SET_REGS_MATCHED (); | 5030 SET_REGS_MATCHED (); |
4998 break; | 5031 break; |
4999 | 5032 |
5000 | 5033 |
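Note on the exactn hunk above: the comment says the byte after the opcode defines n and the next n bytes are the literal to match. The untranslated, default-format branch of that loop can be sketched as below. This is only an illustration: `match_exactn` is a hypothetical helper that is not part of regex.c, the end-of-text check stands in for REGEX_PREFETCH (), and a plain `while` is used so that n == 0 is harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of regex.c: match one exactn operand.
   P points at the count byte n; the n bytes after it are the literal.
   D..DEND is the text.  Returns the number of text bytes consumed,
   or -1 on a mismatch or if the text runs out first.  */
static int
match_exactn (const unsigned char *p, const unsigned char *d,
              const unsigned char *dend)
{
  const unsigned char *start = d;
  int mcnt;

  assert (p != NULL && d != NULL && dend >= d);

  mcnt = *p++;                  /* the count byte defines n */
  while (mcnt > 0)
    {
      if (d == dend)            /* stands in for REGEX_PREFETCH () */
        return -1;
      if (*d++ != *p++)         /* byte-by-byte comparison */
        return -1;
      mcnt--;
    }
  return (int) (d - start);
}
```

With a non-default buffer format the new code cannot compare bytes directly, which is why the revision adds a character-by-character branch that decrements mcnt by the pattern character's byte length instead of by one.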
5002 case anychar: | 5035 case anychar: |
5003 DEBUG_PRINT1 ("EXECUTING anychar.\n"); | 5036 DEBUG_PRINT1 ("EXECUTING anychar.\n"); |
5004 | 5037 |
5005 REGEX_PREFETCH (); | 5038 REGEX_PREFETCH (); |
5006 | 5039 |
5007 if ((!(bufp->syntax & RE_DOT_NEWLINE) && TRANSLATE (*d) == '\n') | 5040 if ((!(bufp->syntax & RE_DOT_NEWLINE) && |
5008 || (bufp->syntax & RE_DOT_NOT_NULL && TRANSLATE (*d) == '\000')) | 5041 RE_TRANSLATE (charptr_emchar_fmt (d, fmt, lispobj)) == '\n') |
5042 || (bufp->syntax & RE_DOT_NOT_NULL && | |
5043 RE_TRANSLATE (charptr_emchar_fmt (d, fmt, lispobj)) == | |
5044 '\000')) | |
5009 goto fail; | 5045 goto fail; |
5010 | 5046 |
5011 SET_REGS_MATCHED (); | 5047 SET_REGS_MATCHED (); |
5012 DEBUG_PRINT2 (" Matched `%d'.\n", *d); | 5048 DEBUG_PRINT2 (" Matched `%d'.\n", *d); |
5013 INC_CHARPTR (d); /* XEmacs change */ | 5049 INC_CHARPTR_FMT (d, fmt); /* XEmacs change */ |
5014 break; | 5050 break; |
5015 | 5051 |
5016 | 5052 |
5017 case charset: | 5053 case charset: |
5018 case charset_not: | 5054 case charset_not: |
5021 re_bool not_p = (re_opcode_t) *(p - 1) == charset_not; | 5057 re_bool not_p = (re_opcode_t) *(p - 1) == charset_not; |
5022 | 5058 |
5023 DEBUG_PRINT2 ("EXECUTING charset%s.\n", not_p ? "_not" : ""); | 5059 DEBUG_PRINT2 ("EXECUTING charset%s.\n", not_p ? "_not" : ""); |
5024 | 5060 |
5025 REGEX_PREFETCH (); | 5061 REGEX_PREFETCH (); |
5026 c = TRANSLATE (*d); /* The character to match. */ | 5062 c = charptr_emchar_fmt (d, fmt, lispobj); |
5063 c = RE_TRANSLATE (c); /* The character to match. */ | |
5027 | 5064 |
5028 /* Cast to `unsigned int' instead of `unsigned char' in case the | 5065 /* Cast to `unsigned int' instead of `unsigned char' in case the |
5029 bit list is a full 32 bytes long. */ | 5066 bit list is a full 32 bytes long. */ |
5030 if (c < (unsigned int) (*p * BYTEWIDTH) | 5067 if (c < (unsigned int) (*p * BYTEWIDTH) |
5031 && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))) | 5068 && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))) |
5034 p += 1 + *p; | 5071 p += 1 + *p; |
5035 | 5072 |
5036 if (!not_p) goto fail; | 5073 if (!not_p) goto fail; |
5037 | 5074 |
5038 SET_REGS_MATCHED (); | 5075 SET_REGS_MATCHED (); |
5039 INC_CHARPTR (d); /* XEmacs change */ | 5076 INC_CHARPTR_FMT (d, fmt); /* XEmacs change */ |
5040 break; | 5077 break; |
5041 } | 5078 } |
5042 | 5079 |
5043 #ifdef MULE | 5080 #ifdef MULE |
5044 case charset_mule: | 5081 case charset_mule: |
5048 re_bool not_p = (re_opcode_t) *(p - 1) == charset_mule_not; | 5085 re_bool not_p = (re_opcode_t) *(p - 1) == charset_mule_not; |
5049 | 5086 |
5050 DEBUG_PRINT2 ("EXECUTING charset_mule%s.\n", not_p ? "_not" : ""); | 5087 DEBUG_PRINT2 ("EXECUTING charset_mule%s.\n", not_p ? "_not" : ""); |
5051 | 5088 |
5052 REGEX_PREFETCH (); | 5089 REGEX_PREFETCH (); |
5053 c = charptr_emchar ((const Intbyte *) d); | 5090 c = charptr_emchar_fmt (d, fmt, lispobj); |
5054 c = TRANSLATE_EXTENDED_UNSAFE (c); /* The character to match. */ | 5091 c = RE_TRANSLATE (c); /* The character to match. */ |
5055 | 5092 |
5056 if (EQ (Qt, unified_range_table_lookup (p, c, Qnil))) | 5093 if (EQ (Qt, unified_range_table_lookup (p, c, Qnil))) |
5057 not_p = !not_p; | 5094 not_p = !not_p; |
5058 | 5095 |
5059 p += unified_range_table_bytes_used (p); | 5096 p += unified_range_table_bytes_used (p); |
5060 | 5097 |
5061 if (!not_p) goto fail; | 5098 if (!not_p) goto fail; |
5062 | 5099 |
5063 SET_REGS_MATCHED (); | 5100 SET_REGS_MATCHED (); |
5064 INC_CHARPTR (d); | 5101 INC_CHARPTR_FMT (d, fmt); |
5065 break; | 5102 break; |
5066 } | 5103 } |
5067 #endif /* MULE */ | 5104 #endif /* MULE */ |
5068 | 5105 |
5069 | 5106 |
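Note on the charset hunk above: the operand is a length byte followed by that many bitmap bytes, and the test `c < *p * BYTEWIDTH && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))` checks one bit of that bitmap. A minimal sketch of the test and of the charset / charset_not sense flip follows. The helper names `charset_bit_set` and `charset_matches` are hypothetical, and BYTEWIDTH is assumed to be 8.

```c
#include <assert.h>
#include <stddef.h>

#define BYTEWIDTH 8             /* assumed value for this sketch */

/* Hypothetical helper: P points at the length byte of a charset
   operand, followed by that many bitmap bytes.  Returns 1 if the
   bit for character C is present and set, 0 otherwise.  */
static int
charset_bit_set (const unsigned char *p, unsigned int c)
{
  assert (p != NULL);
  return c < (unsigned int) (*p * BYTEWIDTH)
    && (p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))) != 0;
}

/* Hypothetical helper: NOT_P is 1 for charset_not, 0 for charset.
   A set bit flips the sense; the opcode matches when the final
   value is nonzero (the matcher does "if (!not_p) goto fail").  */
static int
charset_matches (const unsigned char *p, unsigned int c, int not_p)
{
  if (charset_bit_set (p, c))
    not_p = !not_p;
  return not_p;
}
```

Characters beyond the end of the bitmap are treated as not in the set, so they can only match under charset_not.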
5316 mcnt = dend2 - d2; | 5353 mcnt = dend2 - d2; |
5317 | 5354 |
5318 /* Compare that many; failure if mismatch, else move | 5355 /* Compare that many; failure if mismatch, else move |
5319 past them. */ | 5356 past them. */ |
5320 if (TRANSLATE_P (translate) | 5357 if (TRANSLATE_P (translate) |
5321 ? bcmp_translate ((unsigned char *) d, | 5358 ? bcmp_translate (d, d2, mcnt, translate |
5322 (unsigned char *) d2, mcnt, translate) | 5359 #ifdef emacs |
5360 , fmt, lispobj | |
5361 #endif | |
5362 ) | |
5323 : memcmp (d, d2, mcnt)) | 5363 : memcmp (d, d2, mcnt)) |
5324 goto fail; | 5364 goto fail; |
5325 d += mcnt, d2 += mcnt; | 5365 d += mcnt, d2 += mcnt; |
5326 | 5366 |
5327 /* Do this because we've matched some characters. */ | 5367 /* Do this because we've matched some characters. */ |
5339 | 5379 |
5340 if (AT_STRINGS_BEG (d)) | 5380 if (AT_STRINGS_BEG (d)) |
5341 { | 5381 { |
5342 if (!bufp->not_bol) break; | 5382 if (!bufp->not_bol) break; |
5343 } | 5383 } |
5344 else if (d[-1] == '\n' && bufp->newline_anchor) | 5384 else |
5345 { | 5385 { |
5346 break; | 5386 re_char *d2 = d; |
5347 } | 5387 DEC_CHARPTR (d2); |
5388 if (charptr_emchar_ascii_fmt (d2, fmt, lispobj) == '\n' && | |
5389 bufp->newline_anchor) | |
5390 break; | |
5391 } | |
5348 /* In all other cases, we fail. */ | 5392 /* In all other cases, we fail. */ |
5349 goto fail; | 5393 goto fail; |
5350 | 5394 |
5351 | 5395 |
5352 /* endline is the dual of begline. */ | 5396 /* endline is the dual of begline. */ |
5357 { | 5401 { |
5358 if (!bufp->not_eol) break; | 5402 if (!bufp->not_eol) break; |
5359 } | 5403 } |
5360 | 5404 |
5361 /* We have to ``prefetch'' the next character. */ | 5405 /* We have to ``prefetch'' the next character. */ |
5362 else if ((d == end1 ? *string2 : *d) == '\n' | 5406 else if ((d == end1 ? |
5407 charptr_emchar_ascii_fmt (string2, fmt, lispobj) : | |
5408 charptr_emchar_ascii_fmt (d, fmt, lispobj)) == '\n' | |
5363 && bufp->newline_anchor) | 5409 && bufp->newline_anchor) |
5364 { | 5410 { |
5365 break; | 5411 break; |
5366 } | 5412 } |
5367 goto fail; | 5413 goto fail; |
5742 else | 5788 else |
5743 { | 5789 { |
5744 re_char *d_before = POS_BEFORE_GAP_UNSAFE (d); | 5790 re_char *d_before = POS_BEFORE_GAP_UNSAFE (d); |
5745 re_char *d_after = POS_AFTER_GAP_UNSAFE (d); | 5791 re_char *d_after = POS_AFTER_GAP_UNSAFE (d); |
5746 | 5792 |
5747 /* emch1 is the character before d, syn1 is the syntax of emch1, | 5793 /* emch1 is the character before d, syn1 is the syntax of |
5748 emch2 is the character at d, and syn2 is the syntax of emch2. */ | 5794 emch1, emch2 is the character at d, and syn2 is the |
5795 syntax of emch2. */ | |
5749 Emchar emch1, emch2; | 5796 Emchar emch1, emch2; |
5750 int syn1, syn2; | 5797 int syn1, syn2; |
5751 #ifdef emacs | 5798 #ifdef emacs |
5752 int pos_before; | 5799 Charxpos pos_before; |
5753 #endif | 5800 #endif |
5754 | 5801 |
5755 DEC_CHARPTR (d_before); | 5802 DEC_CHARPTR_FMT (d_before, fmt); |
5756 emch1 = charptr_emchar (d_before); | 5803 emch1 = charptr_emchar_fmt (d_before, fmt, lispobj); |
5757 emch2 = charptr_emchar (d_after); | 5804 emch2 = charptr_emchar_fmt (d_after, fmt, lispobj); |
5758 | 5805 |
5759 #ifdef emacs | 5806 #ifdef emacs |
5760 pos_before = SYNTAX_CACHE_BYTE_TO_CHAR (PTR_TO_OFFSET (d)) - 1; | 5807 pos_before = |
5761 UPDATE_SYNTAX_CACHE (pos_before); | 5808 offset_to_charxpos (lispobj, PTR_TO_OFFSET (d)) - 1; |
5762 #endif | 5809 UPDATE_SYNTAX_CACHE (scache, pos_before); |
5763 syn1 = SYNTAX_FROM_CACHE (XCHAR_TABLE (regex_emacs_buffer->mirror_syntax_table), | 5810 #endif |
5764 emch1); | 5811 syn1 = SYNTAX_FROM_CACHE (scache, emch1); |
5765 #ifdef emacs | 5812 #ifdef emacs |
5766 UPDATE_SYNTAX_CACHE_FORWARD (pos_before + 1); | 5813 UPDATE_SYNTAX_CACHE_FORWARD (scache, pos_before + 1); |
5767 #endif | 5814 #endif |
5768 syn2 = SYNTAX_FROM_CACHE (XCHAR_TABLE (regex_emacs_buffer->mirror_syntax_table), | 5815 syn2 = SYNTAX_FROM_CACHE (scache, emch2); |
5769 emch2); | |
5770 | 5816 |
5771 result = ((syn1 == Sword) != (syn2 == Sword)); | 5817 result = ((syn1 == Sword) != (syn2 == Sword)); |
5772 } | 5818 } |
5773 if (result == should_succeed) | 5819 if (result == should_succeed) |
5774 break; | 5820 break; |
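Note on the word-boundary hunk above: once both syntax codes are known, the test reduces to `(syn1 == Sword) != (syn2 == Sword)`, that is, exactly one of the two neighbouring characters is a word constituent. A minimal sketch for interior positions only is given below. `is_word` is a hypothetical stand-in for the syntax cache lookup that treats ASCII alphanumerics and '_' as word constituents, and the handling of the string ends is left out because it is not shown in this hunk.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for SYNTAX_FROM_CACHE (...) == Sword:
   ASCII alphanumerics and '_' count as word constituents.  */
static int
is_word (int ch)
{
  return isalnum ((unsigned char) ch) || ch == '_';
}

/* Returns 1 if there is a word boundary between text[pos - 1] and
   text[pos].  Only interior positions (0 < pos < len) are handled.  */
static int
at_word_boundary (const char *text, int len, int pos)
{
  int syn1, syn2;

  assert (text != NULL && pos > 0 && pos < len);

  syn1 = is_word (text[pos - 1]);   /* character before the position */
  syn2 = is_word (text[pos]);       /* character at the position */
  return syn1 != syn2;
}
```

The matcher then compares this result with should_succeed, so the same computation serves both the boundary and the not-boundary opcode.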
5790 if (WORDCHAR_P (d) && (AT_STRINGS_BEG (d) || !WORDCHAR_P (d - 1))) | 5836 if (WORDCHAR_P (d) && (AT_STRINGS_BEG (d) || !WORDCHAR_P (d - 1))) |
5791 break; | 5837 break; |
5792 | 5838 |
5793 */ | 5839 */ |
5794 re_char *dtmp = POS_AFTER_GAP_UNSAFE (d); | 5840 re_char *dtmp = POS_AFTER_GAP_UNSAFE (d); |
5795 Emchar emch = charptr_emchar (dtmp); | 5841 Emchar emch = charptr_emchar_fmt (dtmp, fmt, lispobj); |
5796 #ifdef emacs | 5842 #ifdef emacs |
5797 int charpos = SYNTAX_CACHE_BYTE_TO_CHAR (PTR_TO_OFFSET (d)); | 5843 Charxpos charpos = offset_to_charxpos (lispobj, PTR_TO_OFFSET (d)); |
5798 UPDATE_SYNTAX_CACHE (charpos); | 5844 UPDATE_SYNTAX_CACHE (scache, charpos); |
5799 #endif | 5845 #endif |
5800 if (SYNTAX_FROM_CACHE (XCHAR_TABLE (regex_emacs_buffer->mirror_syntax_table), | 5846 if (SYNTAX_FROM_CACHE (scache, emch) != Sword) |
5801 emch) != Sword) | |
5802 goto fail; | 5847 goto fail; |
5803 if (AT_STRINGS_BEG (d)) | 5848 if (AT_STRINGS_BEG (d)) |
5804 break; | 5849 break; |
5805 dtmp = POS_BEFORE_GAP_UNSAFE (d); | 5850 dtmp = POS_BEFORE_GAP_UNSAFE (d); |
5806 DEC_CHARPTR (dtmp); | 5851 DEC_CHARPTR_FMT (dtmp, fmt); |
5807 emch = charptr_emchar (dtmp); | 5852 emch = charptr_emchar_fmt (dtmp, fmt, lispobj); |
5808 #ifdef emacs | 5853 #ifdef emacs |
5809 UPDATE_SYNTAX_CACHE_BACKWARD (charpos - 1); | 5854 UPDATE_SYNTAX_CACHE_BACKWARD (scache, charpos - 1); |
5810 #endif | 5855 #endif |
5811 if (SYNTAX_FROM_CACHE (XCHAR_TABLE (regex_emacs_buffer->mirror_syntax_table), | 5856 if (SYNTAX_FROM_CACHE (scache, emch) != Sword) |
5812 emch) != Sword) | |
5813 break; | 5857 break; |
5814 goto fail; | 5858 goto fail; |
5815 } | 5859 } |
5816 | 5860 |
5817 case wordend: | 5861 case wordend: |
5828 The or condition is incorrect (reversed). | 5872 The or condition is incorrect (reversed). |
5829 */ | 5873 */ |
5830 re_char *dtmp; | 5874 re_char *dtmp; |
5831 Emchar emch; | 5875 Emchar emch; |
5832 #ifdef emacs | 5876 #ifdef emacs |
5833 int charpos = SYNTAX_CACHE_BYTE_TO_CHAR (PTR_TO_OFFSET (d)) - 1; | 5877 Charxpos charpos = offset_to_charxpos (lispobj, PTR_TO_OFFSET (d)); |
5834 UPDATE_SYNTAX_CACHE (charpos); | 5878 UPDATE_SYNTAX_CACHE (scache, charpos); |
5835 #endif | 5879 #endif |
5836 dtmp = POS_BEFORE_GAP_UNSAFE (d); | 5880 dtmp = POS_BEFORE_GAP_UNSAFE (d); |
5837 DEC_CHARPTR (dtmp); | 5881 DEC_CHARPTR_FMT (dtmp, fmt); |
5838 emch = charptr_emchar (dtmp); | 5882 emch = charptr_emchar_fmt (dtmp, fmt, lispobj); |
5839 if (SYNTAX_FROM_CACHE (XCHAR_TABLE (regex_emacs_buffer->mirror_syntax_table), | 5883 if (SYNTAX_FROM_CACHE (scache, emch) != Sword) |
5840 emch) != Sword) | |
5841 goto fail; | 5884 goto fail; |
5842 if (AT_STRINGS_END (d)) | 5885 if (AT_STRINGS_END (d)) |
5843 break; | 5886 break; |
5844 dtmp = POS_AFTER_GAP_UNSAFE (d); | 5887 dtmp = POS_AFTER_GAP_UNSAFE (d); |
5845 emch = charptr_emchar (dtmp); | 5888 emch = charptr_emchar_fmt (dtmp, fmt, lispobj); |
5846 #ifdef emacs | 5889 #ifdef emacs |
5847 UPDATE_SYNTAX_CACHE_FORWARD (charpos + 1); | 5890 UPDATE_SYNTAX_CACHE_FORWARD (scache, charpos + 1); |
5848 #endif | 5891 #endif |
5849 if (SYNTAX_FROM_CACHE (XCHAR_TABLE (regex_emacs_buffer->mirror_syntax_table), | 5892 if (SYNTAX_FROM_CACHE (scache, emch) != Sword) |
5850 emch) != Sword) | |
5851 break; | 5893 break; |
5852 goto fail; | 5894 goto fail; |
5853 } | 5895 } |
5854 | 5896 |
5855 #ifdef emacs | 5897 #ifdef emacs |
5856 case before_dot: | 5898 case before_dot: |
5857 DEBUG_PRINT1 ("EXECUTING before_dot.\n"); | 5899 DEBUG_PRINT1 ("EXECUTING before_dot.\n"); |
5858 if (! (NILP (regex_match_object) || BUFFERP (regex_match_object)) | 5900 if (!BUFFERP (lispobj) |
5859 || (BUF_PTR_BYTE_POS (regex_emacs_buffer, (unsigned char *) d) | 5901 || (BUF_PTR_BYTE_POS (XBUFFER (lispobj), (unsigned char *) d) |
5860 >= BUF_PT (regex_emacs_buffer))) | 5902 >= BUF_PT (XBUFFER (lispobj)))) |
5861 goto fail; | 5903 goto fail; |
5862 break; | 5904 break; |
5863 | 5905 |
5864 case at_dot: | 5906 case at_dot: |
5865 DEBUG_PRINT1 ("EXECUTING at_dot.\n"); | 5907 DEBUG_PRINT1 ("EXECUTING at_dot.\n"); |
5866 if (! (NILP (regex_match_object) || BUFFERP (regex_match_object)) | 5908 if (!BUFFERP (lispobj) |
5867 || (BUF_PTR_BYTE_POS (regex_emacs_buffer, (unsigned char *) d) | 5909 || (BUF_PTR_BYTE_POS (XBUFFER (lispobj), (unsigned char *) d) |
5868 != BUF_PT (regex_emacs_buffer))) | 5910 != BUF_PT (XBUFFER (lispobj)))) |
5869 goto fail; | 5911 goto fail; |
5870 break; | 5912 break; |
5871 | 5913 |
5872 case after_dot: | 5914 case after_dot: |
5873 DEBUG_PRINT1 ("EXECUTING after_dot.\n"); | 5915 DEBUG_PRINT1 ("EXECUTING after_dot.\n"); |
5874 if (! (NILP (regex_match_object) || BUFFERP (regex_match_object)) | 5916 if (!BUFFERP (lispobj) |
5875 || (BUF_PTR_BYTE_POS (regex_emacs_buffer, (unsigned char *) d) | 5917 || (BUF_PTR_BYTE_POS (XBUFFER (lispobj), (unsigned char *) d) |
5876 <= BUF_PT (regex_emacs_buffer))) | 5918 <= BUF_PT (XBUFFER (lispobj)))) |
5877 goto fail; | 5919 goto fail; |
5878 break; | 5920 break; |
5879 #if 0 /* not emacs19 */ | |
5880 case at_dot: | |
5881 DEBUG_PRINT1 ("EXECUTING at_dot.\n"); | |
5882 if (BUF_PTR_BYTE_POS (regex_emacs_buffer, (unsigned char *) d) + 1 | |
5883 != BUF_PT (regex_emacs_buffer)) | |
5884 goto fail; | |
5885 break; | |
5886 #endif /* not emacs19 */ | |
5887 | 5921 |
5888 case syntaxspec: | 5922 case syntaxspec: |
5889 DEBUG_PRINT2 ("EXECUTING syntaxspec %d.\n", mcnt); | 5923 DEBUG_PRINT2 ("EXECUTING syntaxspec %d.\n", mcnt); |
5890 mcnt = *p++; | 5924 mcnt = *p++; |
5891 goto matchsyntax; | 5925 goto matchsyntax; |
5899 { | 5933 { |
5900 int matches; | 5934 int matches; |
5901 Emchar emch; | 5935 Emchar emch; |
5902 | 5936 |
5903 REGEX_PREFETCH (); | 5937 REGEX_PREFETCH (); |
5904 #ifdef emacs | 5938 UPDATE_SYNTAX_CACHE |
5905 { | 5939 (scache, offset_to_charxpos (lispobj, PTR_TO_OFFSET (d))); |
5906 int charpos = SYNTAX_CACHE_BYTE_TO_CHAR (PTR_TO_OFFSET (d)); | 5940 |
5907 UPDATE_SYNTAX_CACHE (charpos); | 5941 emch = charptr_emchar_fmt (d, fmt, lispobj); |
5908 } | 5942 matches = (SYNTAX_FROM_CACHE (scache, emch) == |
5909 #endif | 5943 (enum syntaxcode) mcnt); |
5910 | 5944 INC_CHARPTR_FMT (d, fmt); |
5911 emch = charptr_emchar ((const Intbyte *) d); | |
5912 matches = (SYNTAX_FROM_CACHE (regex_emacs_buffer->mirror_syntax_table, | |
5913 emch) == (enum syntaxcode) mcnt); | |
5914 INC_CHARPTR (d); | |
5915 if (matches != should_succeed) | 5945 if (matches != should_succeed) |
5916 goto fail; | 5946 goto fail; |
5917 SET_REGS_MATCHED (); | 5947 SET_REGS_MATCHED (); |
5918 } | 5948 } |
5919 break; | 5949 break; |
5938 { | 5968 { |
5939 Emchar emch; | 5969 Emchar emch; |
5940 | 5970 |
5941 mcnt = *p++; | 5971 mcnt = *p++; |
5942 REGEX_PREFETCH (); | 5972 REGEX_PREFETCH (); |
5943 emch = charptr_emchar ((const Intbyte *) d); | 5973 emch = charptr_emchar_fmt (d, fmt, lispobj); |
5944 INC_CHARPTR (d); | 5974 INC_CHARPTR_FMT (d, fmt); |
5945 if (check_category_char(emch, regex_emacs_buffer->category_table, | 5975 if (check_category_char (emch, BUFFER_CATEGORY_TABLE (lispbuf), |
5946 mcnt, should_succeed)) | 5976 mcnt, should_succeed)) |
5947 goto fail; | 5977 goto fail; |
5948 SET_REGS_MATCHED (); | 5978 SET_REGS_MATCHED (); |
5949 } | 5979 } |
5950 break; | 5980 break; |
5951 | 5981 |
5956 #endif /* MULE */ | 5986 #endif /* MULE */ |
5957 #else /* not emacs */ | 5987 #else /* not emacs */ |
5958 case wordchar: | 5988 case wordchar: |
5959 DEBUG_PRINT1 ("EXECUTING non-Emacs wordchar.\n"); | 5989 DEBUG_PRINT1 ("EXECUTING non-Emacs wordchar.\n"); |
5960 REGEX_PREFETCH (); | 5990 REGEX_PREFETCH (); |
5961 if (!WORDCHAR_P_UNSAFE ((int) (*d))) | 5991 if (!WORDCHAR_P ((int) (*d))) |
5962 goto fail; | 5992 goto fail; |
5963 SET_REGS_MATCHED (); | 5993 SET_REGS_MATCHED (); |
5964 d++; | 5994 d++; |
5965 break; | 5995 break; |
5966 | 5996 |
5967 case notwordchar: | 5997 case notwordchar: |
5968 DEBUG_PRINT1 ("EXECUTING non-Emacs notwordchar.\n"); | 5998 DEBUG_PRINT1 ("EXECUTING non-Emacs notwordchar.\n"); |
5969 REGEX_PREFETCH (); | 5999 REGEX_PREFETCH (); |
5970 if (!WORDCHAR_P_UNSAFE ((int) (*d))) | 6000 if (!WORDCHAR_P ((int) (*d))) |
5971 goto fail; | 6001 goto fail; |
5972 SET_REGS_MATCHED (); | 6002 SET_REGS_MATCHED (); |
5973 d++; | 6003 d++; |
5974 break; | 6004 break; |
5975 #endif /* emacs */ | 6005 #endif /* emacs */ |
6032 if (best_regs_set) | 6062 if (best_regs_set) |
6033 goto restore_best_regs; | 6063 goto restore_best_regs; |
6034 | 6064 |
6035 FREE_VARIABLES (); | 6065 FREE_VARIABLES (); |
6036 | 6066 |
6037 RESTORE_IN_MATCH_FLAG; | |
6038 return -1; /* Failure to match. */ | 6067 return -1; /* Failure to match. */ |
6039 } /* re_match_2 */ | 6068 } /* re_match_2 */ |
6040 | 6069 |
6041 /* Subroutine definitions for re_match_2. */ | 6070 /* Subroutine definitions for re_match_2. */ |
6042 | 6071 |
6282 /* Return zero if TRANSLATE[S1] and TRANSLATE[S2] are identical for LEN | 6311 /* Return zero if TRANSLATE[S1] and TRANSLATE[S2] are identical for LEN |
6283 bytes; nonzero otherwise. */ | 6312 bytes; nonzero otherwise. */ |
6284 | 6313 |
6285 static int | 6314 static int |
6286 bcmp_translate (re_char *s1, re_char *s2, | 6315 bcmp_translate (re_char *s1, re_char *s2, |
6287 REGISTER int len, RE_TRANSLATE_TYPE translate) | 6316 REGISTER int len, RE_TRANSLATE_TYPE translate |
6317 #ifdef emacs | |
6318 , Internal_Format fmt, Lisp_Object lispobj | |
6319 #endif | |
6320 ) | |
6288 { | 6321 { |
6289 REGISTER const unsigned char *p1 = s1, *p2 = s2; | 6322 REGISTER re_char *p1 = s1, *p2 = s2; |
6290 #ifdef MULE | 6323 #ifdef MULE |
6291 const unsigned char *p1_end = s1 + len; | 6324 re_char *p1_end = s1 + len; |
6292 const unsigned char *p2_end = s2 + len; | 6325 re_char *p2_end = s2 + len; |
6293 | 6326 |
6294 while (p1 != p1_end && p2 != p2_end) | 6327 while (p1 != p1_end && p2 != p2_end) |
6295 { | 6328 { |
6296 Emchar p1_ch, p2_ch; | 6329 Emchar p1_ch, p2_ch; |
6297 | 6330 |
6298 p1_ch = charptr_emchar (p1); | 6331 p1_ch = charptr_emchar_fmt (p1, fmt, lispobj); |
6299 p2_ch = charptr_emchar (p2); | 6332 p2_ch = charptr_emchar_fmt (p2, fmt, lispobj); |
6300 | 6333 |
6301 if (RE_TRANSLATE (p1_ch) | 6334 if (RE_TRANSLATE_1 (p1_ch) |
6302 != RE_TRANSLATE (p2_ch)) | 6335 != RE_TRANSLATE_1 (p2_ch)) |
6303 return 1; | 6336 return 1; |
6304 INC_CHARPTR (p1); | 6337 INC_CHARPTR_FMT (p1, fmt); |
6305 INC_CHARPTR (p2); | 6338 INC_CHARPTR_FMT (p2, fmt); |
6306 } | 6339 } |
6307 #else /* not MULE */ | 6340 #else /* not MULE */ |
6308 while (len) | 6341 while (len) |
6309 { | 6342 { |
6310 if (RE_TRANSLATE (*p1++) != RE_TRANSLATE (*p2++)) return 1; | 6343 if (RE_TRANSLATE_1 (*p1++) != RE_TRANSLATE_1 (*p2++)) return 1; |
6311 len--; | 6344 len--; |
6312 } | 6345 } |
6313 #endif /* MULE */ | 6346 #endif /* MULE */ |
6314 return 0; | 6347 return 0; |
6315 } | 6348 } |
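Note on the bcmp_translate hunk above: the function returns zero when the two spans are identical after translation and nonzero otherwise. The non-MULE branch can be sketched as follows, assuming the translation is a plain 256-entry byte table. The names `bcmp_translate_sketch` and `fill_downcase_table` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Sketch of the non-MULE loop: compare LEN bytes of S1 and S2 after
   passing each byte through TRANSLATE.  Returns 0 if all translated
   bytes agree, 1 at the first difference.  */
static int
bcmp_translate_sketch (const unsigned char *s1, const unsigned char *s2,
                       int len, const unsigned char *translate)
{
  assert (s1 != NULL && s2 != NULL && translate != NULL && len >= 0);

  while (len)
    {
      if (translate[*s1++] != translate[*s2++])
        return 1;
      len--;
    }
  return 0;
}

/* Hypothetical helper: build a case-folding table for the sketch.  */
static void
fill_downcase_table (unsigned char table[256])
{
  int i;

  for (i = 0; i < 256; i++)
    table[i] = (unsigned char) tolower (i);
}
```

The MULE branch in the new revision does the same comparison per character rather than per byte, decoding each side with the buffer's internal format before translating.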
6341 bufp->no_sub = 0; | 6374 bufp->no_sub = 0; |
6342 | 6375 |
6343 /* Match anchors at newline. */ | 6376 /* Match anchors at newline. */ |
6344 bufp->newline_anchor = 1; | 6377 bufp->newline_anchor = 1; |
6345 | 6378 |
6346 ret = regex_compile ((unsigned char *) pattern, length, re_syntax_options, bufp); | 6379 ret = regex_compile ((unsigned char *) pattern, length, re_syntax_options, |
6380 bufp); | |
6347 | 6381 |
6348 if (!ret) | 6382 if (!ret) |
6349 return NULL; | 6383 return NULL; |
6350 return gettext (re_error_msgid[(int) ret]); | 6384 return gettext (re_error_msgid[(int) ret]); |
6351 } | 6385 } |
6386 don't need to initialize the pattern buffer fields which affect it. */ | 6420 don't need to initialize the pattern buffer fields which affect it. */ |
6387 | 6421 |
6388 /* Match anchors at newlines. */ | 6422 /* Match anchors at newlines. */ |
6389 re_comp_buf.newline_anchor = 1; | 6423 re_comp_buf.newline_anchor = 1; |
6390 | 6424 |
6391 ret = regex_compile ((unsigned char *)s, strlen (s), re_syntax_options, &re_comp_buf); | 6425 ret = regex_compile ((unsigned char *)s, strlen (s), re_syntax_options, |
6426 &re_comp_buf); | |
6392 | 6427 |
6393 if (!ret) | 6428 if (!ret) |
6394 return NULL; | 6429 return NULL; |
6395 | 6430 |
6396 /* Yes, we're discarding `const' here if !HAVE_LIBINTL. */ | 6431 /* Yes, we're discarding `const' here if !HAVE_LIBINTL. */ |
6638 preg->translate = NULL; | 6673 preg->translate = NULL; |
6639 } | 6674 } |
6640 | 6675 |
6641 #endif /* not emacs */ | 6676 #endif /* not emacs */ |
6642 | 6677 |
6643 /* | |
6644 Local variables: | |
6645 make-backup-files: t | |
6646 version-control: t | |
6647 trim-versions-without-asking: nil | |
6648 End: | |
6649 */ |