diff src/regex.c @ 502:7039e6323819
[xemacs-hg @ 2001-05-04 22:41:46 by ben]
----------------------- byte-comp warning fixes -----------------
New functions for cleanly eliminating byte-compiler warnings.
Their definitions require no changes at all in bytecomp.el,
meaning that any package that wants to use them and be compatible
with older versions of XEmacs need only copy the code and rename
the functions (i.e. prefix them with the package name).
Eliminate byte-compiler warnings using the new functions in
bytecomp-runtime.el.
Move coding-system-put,get,category, since they're not
Mule-specific and are used in prefer-coding-system.
font.el was incredibly ugly. Clean it up. Avoid using defsubst
for any exported functions, to avoid possible compatibility
problems if we later change the internal interface. (It happened
before, with face accessors, between 19.8 and 19.9). Fix tons
of warnings.
Clean up (new function gpm-is-supported-p eliminates duplicate
code in gpm-create/delete-device-hook) and eliminate warnings.
---------- make byte-recompile-directory work in the ---------
core `lisp' dir, even in the absence of
a Mule XEmacs (i.e. make it skip the Mule
files rather than trying to compile them).
now you should be able to do `touch *.el'
in the `lisp' dir, then
M-x byte-recompile-directory, and get no
warnings.
Avoid trying to compile Mule files in byte-recompile-directory
when we're not in a Mule XEmacs, since we're highly likely to get
syntax errors.
Add a coding-system cookie to all Mule files so that
byte-recompile-directory ignores them.
Magic cookie function moved to files.el from code-files.el (for
use by bytecomp even in a non-coding-system XEmacs), and changed
names and semantics for use by bytecomp. NOTE: IMO this is an
internal function that we can change as we like (and there is
absolutely no code anywhere else using the function).
---------------- GUI improvements: menus, help -------------------
Rearrange order of keymap declarations to be alphabetical.
Improve help on help to include all bindings, and group by
category. Add bindings for new Info commands. Remove
warnings. Use command-hyper-apropos in place of command-apropos.
Add a function to do the equivalent of command-apropos.
Evals its help-text argument so you can put expressions there.
Used now by help-for-help.
Add binding to continue text searches. Expand index searches to
work over multiple info documents. Add commands to search
text/index in User and Lispref.
Add new entry, "Uncomment Region" (parallels "Comment Out Region").
Redo Help menu; add bindings for new Info commands to search the
index or text of the User and Lispref manuals. Add command for
mark-paragraph, activate-region. Make Edit->R accelerator be
rectangle, not register (more commonly used), and put rectangle
first. Fix the Edit Init File entry to never load the .elc file.
Simplify the default-popup-menu. Add Cmds->Tabs menu.
Use kp-left not kp_left, etc.
---------------- Miscellaneous bug fixes/cleanup -------------------
byte-compiler-options: Correct doc string.
easy-menu-do-define: fix extra quote.
fill-paragraph-or-region: Rewrite to be more correct -- use
call-interactively so that we always get exactly the same
behavior as if the functions were called directly.
No need to fiddle with zmacs-region-stays, now that bogus
clearing of it (2001-04-28 src/ChangeLog) is removed.
Put dialog titles back in -- this time correctly. Fix various
other problems with leaks and such.
key-sequence-list-description:
Clean up fun to always correctly canonicalize.
Clean up Kinsoku comments, synch comment-region with FSF 20.7.
* simple.el (region-exists-p):
* simple.el (region-active-p):
Add comment about which one is correct to use in menu specs.
* sound.el (load-sound-file):
Minor code clean up.
* startup.el:
* startup.el (command-line-early):
* startup.el (initial-scratch-message):
Comment changes. Add info about sample.init.el to splash screen.
Improve initial-scratch-message and clarify purpose of Scratch
buffer. Fix byte-compile warning.
------------------------ Added features -------------------------
Add new variable to control whether etags checks all parent
directories for tag files. (On by default.)
* hash-table.el: New file, useful utility functions.
* dumped-lisp.el (preloaded-file-list): Dump hash-table.el.
------------ notable bug fix: Windows event code --------------
Get critical quit working.
------------ notable bug fix and new feature: regex code --------------
Shy groups were implemented in a horrible, half-assed way that
would cause them to screw up regex searching in most cases.
Fixed to work correctly.
Also extended back-reference syntax past 9. A two-digit reference
is recognized as such only if there are at least that many non-shy
groups; optionally, such uses trigger a warning, to catch old code
that might be using them differently. (Added a variable to control
this in search.c, `warn-about-possibly-incompatible-back-references',
on by default for the moment. Declared in lisp.h.)
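
The two-digit rule described above can be sketched as follows. This is only an illustration, not the code in regex.c: `parse_backref` and its parameters are hypothetical names, and the real compiler works on its own pattern cursor rather than a pointer and length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the XEmacs code: parse the digits that follow
   a backslash.  S points at the first digit, LEN is the number of
   characters left, and N_NONSHY is the number of non-shy groups opened
   so far.  Stores the number of characters consumed in *CONSUMED and
   returns the back-reference number, or -1 if no such group exists. */
static int
parse_backref (const char *s, size_t len, int n_nonshy, size_t *consumed)
{
  int reg;

  assert (len >= 1 && s[0] >= '1' && s[0] <= '9');
  reg = s[0] - '0';
  *consumed = 1;

  /* Take a second digit only if the two-digit number names an existing
     non-shy group; otherwise leave that digit for the caller to treat
     as an ordinary character. */
  if (len >= 2 && s[1] >= '0' && s[1] <= '9')
    {
      int two = reg * 10 + (s[1] - '0');
      if (two <= n_nonshy)
        {
          reg = two;
          *consumed = 2;
        }
    }

  return reg <= n_nonshy ? reg : -1;
}
```

With twelve non-shy groups, `\12` is read as group 12; with only five, the same text is read as group 1 followed by a literal `2`, which is the older interpretation the warning is meant to flag.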
---------------- process/SIGIO improvements -------------------
define USE_GETADDRINFO to replace more complex conditional,
and use it. the code conditionalized on this in
unix_open_network_stream had *serious* problems handling errors.
it's now fixed, and major amounts of duplicate code between
the two versions were combined.
don't disable SIGIO and other interrupts unless
CONNECT_NEEDS_SLOWED_INTERRUPTS is defined -- don't penalize OS's
without bugs. similarly for a freebsd bug that was affecting all
OS's.
* s\ultrix.h:
define CONNECT_NEEDS_SLOWED_INTERRUPTS, since that's the OS
mentioned as having a kernel bug.
* sysdep.c (request_sigio_on_device):
* sysdep.c (unrequest_sigio_on_device):
fix SIGIO problems on Linux. add check for O_ASYNC in case it's
defined and FASYNC isn't. add comment about other ways to do
SIGIO on Linux.
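
As a rough sketch of the O_ASYNC/FASYNC check mentioned above (not the actual sysdep.c code; `ASYNC_FLAG` and `set_async` are hypothetical names), one can select whichever flag the platform defines and toggle it with fcntl:

```c
#include <fcntl.h>
#include <unistd.h>

/* Use whichever name this platform provides for the async-I/O flag. */
#if defined (FASYNC)
# define ASYNC_FLAG FASYNC
#elif defined (O_ASYNC)
# define ASYNC_FLAG O_ASYNC
#endif

/* Hypothetical helper: turn SIGIO delivery for FD on or off.
   Returns 0 on success and -1 on failure or if unsupported. */
static int
set_async (int fd, int on)
{
#ifdef ASYNC_FLAG
  int flags = fcntl (fd, F_GETFL, 0);

  if (flags == -1)
    return -1;
  if (on)
    {
#ifdef F_SETOWN
      /* Direct SIGIO to this process before enabling it. */
      if (fcntl (fd, F_SETOWN, getpid ()) == -1)
        return -1;
#endif
      flags |= ASYNC_FLAG;
    }
  else
    flags &= ~ASYNC_FLAG;
  return fcntl (fd, F_SETFL, flags) == -1 ? -1 : 0;
#else
  (void) fd;
  (void) on;
  return -1;
#endif
}
```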
* callproc.c (Fold_call_process_internal):
* process.c (Fstart_process_internal):
Deal with the possibility that `default-directory' doesn't
have terminating slash. Correct comments about vfork.
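
A minimal sketch of the kind of normalization this implies, assuming a plain C string rather than a Lisp object (`dir_with_slash` is a hypothetical name, not a function in callproc.c or process.c):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of DIR that ends in '/'.
   An empty DIR is copied unchanged.  The caller frees the result.
   Returns NULL if allocation fails. */
static char *
dir_with_slash (const char *dir)
{
  size_t len = strlen (dir);
  int need = (len > 0 && dir[len - 1] != '/');
  char *out = malloc (len + need + 1);

  if (out == NULL)
    return NULL;
  memcpy (out, dir, len);
  if (need)
    out[len++] = '/';
  out[len] = '\0';
  return out;
}
```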
---------------- Miscellaneous bug fixes/cleanup -------------------
* callint.c (Finteractive):
Add lots of documentation -- exactly what the Lisp equivalents of
all the interactive specs are.
* console.h (struct console): change type of quit_char to Emchar.
* event-msw.c (lstream_type_create_mswindows_selectable): spacing
change.
Eliminate events-mod.h and combine into events.h.
* emacs.c:
* emacs.c (make_arg_list_1):
* emacs.c (main_1):
A couple of char->Extbyte changes, add a comment.
* glyphs-msw.c:
Correct indentation of function defns to not exceed 80 cols.
Try (sort of) to fix some code that sets the colors of the
progress gauge. (Commented out)
* keymap.c (syms_of_keymap):
use DEFSYMBOL.
* process.c (read_process_output):
No need to fiddle with zmacs_region_stays, now that bogus
clearing of it (see below) is removed.
* search.c (Freplace_match): warning fix.
author | ben |
---|---|
date | Fri, 04 May 2001 22:42:35 +0000 |
parents | 223736d75acb |
children | cd662ad69f40 |
--- a/src/regex.c	Thu May 03 21:08:39 2001 +0000
+++ b/src/regex.c	Fri May 04 22:42:35 2001 +0000
@@ -415,7 +415,7 @@

   /* Start remembering the text that is matched, for storing in a
      register. Followed by one byte with the register number, in
-     the range 0 to one less than the pattern buffer's re_nsub
+     the range 1 to the pattern buffer's re_ngroups
      field. Then followed by one byte with the number of groups
      inner to this one. (This last has to be part of the
      start_memory only because we need it in the on_failure_jump
@@ -424,7 +424,7 @@

   /* Stop remembering the text that is matched and store it in a
      memory register. Followed by one byte with the register
-     number, in the range 0 to one less than `re_nsub' in the
+     number, in the range 1 to `re_ngroups' in the
      pattern buffer, and one byte with the number of inner groups,
      just like `start_memory'. (We need the number of inner
      groups here because we don't have any easy way of finding the
@@ -971,6 +971,7 @@
     }

   printf ("re_nsub: %ld\t", (long)bufp->re_nsub);
+  printf ("re_ngroups: %ld\t", (long)bufp->re_ngroups);
   printf ("regs_alloc: %d\t", bufp->regs_allocated);
   printf ("can_be_null: %d\t", bufp->can_be_null);
   printf ("newline_anchor: %d\n", bufp->newline_anchor);
@@ -980,6 +981,20 @@
   printf ("syntax: %d\n", bufp->syntax);
   /* Perhaps we should print the translate table? */
   /* and maybe the category table? */
+
+  if (bufp->external_to_internal_register)
+    {
+      int i;
+
+      printf ("external_to_internal_register:\n");
+      for (i = 0; i <= bufp->re_nsub; i++)
+        {
+          if (i > 0)
+            printf (", ");
+          printf ("%d -> %d", i, bufp->external_to_internal_register[i]);
+        }
+      printf ("\n");
+    }
 }


@@ -1694,9 +1709,13 @@
 #define MAX_REGNUM 255

 /* But patterns can have more than `MAX_REGNUM' registers. We just
-   ignore the excess. */
+   ignore the excess.
+   #### not true! groups past this will fail in lots of ways, if we
+   ever have to backtrack.
+  */
 typedef unsigned regnum_t;

+#define INIT_REG_TRANSLATE_SIZE 5

 /* Macros for the compile stack. */

@@ -1880,7 +1899,9 @@
      `syntax' is set to SYNTAX;
      `used' is set to the length of the compiled pattern;
      `fastmap_accurate' is zero;
-     `re_nsub' is the number of subexpressions in PATTERN;
+     `re_ngroups' is the number of groups/subexpressions (including shy
+        groups) in PATTERN;
+     `re_nsub' is the number of non-shy groups in PATTERN;
      `not_bol' and `not_eol' are zero;

    The `fastmap' and `newline_anchor' fields are neither
@@ -1978,6 +1999,25 @@

   /* Always count groups, whether or not bufp->no_sub is set. */
   bufp->re_nsub = 0;
+  bufp->re_ngroups = 0;
+
+  bufp->warned_about_incompatible_back_references = 0;
+
+  if (bufp->external_to_internal_register == 0)
+    {
+      bufp->external_to_internal_register_size = INIT_REG_TRANSLATE_SIZE;
+      RETALLOC (bufp->external_to_internal_register,
+                bufp->external_to_internal_register_size,
+                int);
+    }
+
+  {
+    int i;
+
+    bufp->external_to_internal_register[0] = 0;
+    for (i = 1; i < bufp->external_to_internal_register_size; i++)
+      bufp->external_to_internal_register[i] = (int) 0xDEADBEEF;
+  }

 #if !defined (emacs) && !defined (SYNTAX_TABLE)
   /* Initialize the syntax table. */
@@ -2560,6 +2600,7 @@
         handle_open:
           {
             regnum_t r;
+            int shy = 0;

             if (!(syntax & RE_NO_SHY_GROUPS)
                 && p != pend
@@ -2570,7 +2611,7 @@
                 switch (c)
                   {
                   case ':': /* shy groups */
-                    r = MAX_REGNUM + 1;
+                    shy = 1;
                     break;

                   /* All others are reserved for future constructs. */
@@ -2578,11 +2619,32 @@
                     FREE_STACK_RETURN (REG_BADPAT);
                   }
               }
-            else
-              {
-                bufp->re_nsub++;
-                r = ++regnum;
-              }
+
+            r = ++regnum;
+            bufp->re_ngroups++;
+            if (!shy)
+              {
+                bufp->re_nsub++;
+                while (bufp->external_to_internal_register_size <=
+                       bufp->re_nsub)
+                  {
+                    int i;
+                    int old_size =
+                      bufp->external_to_internal_register_size;
+                    bufp->external_to_internal_register_size += 5;
+                    RETALLOC (bufp->external_to_internal_register,
+                              bufp->external_to_internal_register_size,
+                              int);
+                    /* debugging */
+                    for (i = old_size;
+                         i < bufp->external_to_internal_register_size; i++)
+                      bufp->external_to_internal_register[i] =
+                        (int) 0xDEADBEEF;
+                  }
+
+                bufp->external_to_internal_register[bufp->re_nsub] =
+                  bufp->re_ngroups;
+              }

             if (COMPILE_STACK_FULL)
               {
@@ -2606,7 +2668,10 @@
             /* We will eventually replace the 0 with the number of
                groups inner to this one. But do not push a
                start_memory for groups beyond the last one we can
-               represent in the compiled pattern. */
+               represent in the compiled pattern.
+               #### bad bad bad. this will fail in lots of ways, if we
+               ever have to backtrack for these groups.
+            */
             if (r <= MAX_REGNUM)
               {
                 COMPILE_STACK_TOP.inner_group_offset
@@ -2996,21 +3061,59 @@
         case '1': case '2': case '3': case '4': case '5':
         case '6': case '7': case '8': case '9':
           {
-            regnum_t reg;
+            regnum_t reg, regint;
+            int may_need_to_unfetch = 0;
             if (syntax & RE_NO_BK_REFS)
               goto normal_char;

+            /* This only goes up to 99. It could be extended to work
+               up to 255 (the maximum number of registers that can be
+               handled by the current regexp engine, because it stores
+               its register numbers in the compiled pattern as one byte,
+               ugh). Doing that's a bit trickier, because you might
+               have the case where \25 a back-ref but \255 is not, ... */
             reg = c - '0';
-
-            if (reg > regnum)
+            if (p < pend)
+              {
+                PATFETCH (c);
+                if (c >= '0' && c <= '9')
+                  {
+                    regnum_t new_reg = reg * 10 + c - '0';
+                    if (new_reg <= bufp->re_nsub)
+                      {
+                        reg = new_reg;
+                        may_need_to_unfetch = 1;
+                      }
+                    else
+                      PATUNFETCH;
+                  }
+              }
+
+            if (reg > bufp->re_nsub)
               FREE_STACK_RETURN (REG_ESUBREG);

+            regint = bufp->external_to_internal_register[reg];
             /* Can't back reference to a subexpression if inside of it. */
-            if (group_in_compile_stack (compile_stack, reg))
-              goto normal_char;
+            if (group_in_compile_stack (compile_stack, regint))
+              {
+                if (may_need_to_unfetch)
+                  PATUNFETCH;
+                goto normal_char;
+              }
+
+#ifdef emacs
+            if (reg > 9 &&
+                bufp->warned_about_incompatible_back_references == 0)
+              {
+                bufp->warned_about_incompatible_back_references = 1;
+                warn_when_safe (intern ("regex"), Qinfo,
+                                "Back reference \\%d now has new "
+                                "semantics in %s", reg, pattern);
+              }
+#endif

             laststart = buf_end;
-            BUF_PUSH_2 (duplicate, reg);
+            BUF_PUSH_2 (duplicate, regint);
           }
           break;

@@ -3125,7 +3228,7 @@
      isn't necessary unless we're trying to avoid calling alloca in
      the search and match routines. */
   {
-    int num_regs = bufp->re_nsub + 1;
+    int num_regs = bufp->re_ngroups + 1;

     /* Since DOUBLE_FAIL_STACK refuses to double only if the current size
        is strictly greater than re_max_failures, the largest possible stack
@@ -4386,7 +4489,7 @@
   /* We fill all the registers internally, independent of what we
      return, for use in backreferences. The number here includes
      an element for register zero. */
-  unsigned num_regs = bufp->re_nsub + 1;
+  unsigned num_regs = bufp->re_ngroups + 1;

   /* The currently active registers. */
   unsigned lowest_active_reg = NO_LOWEST_ACTIVE_REG;
@@ -4472,7 +4575,7 @@
      there are groups, we include space for register 0 (the whole
      pattern), even though we never use it, since it simplifies the
      array indexing. We should fix this. */
-  if (bufp->re_nsub)
+  if (bufp->re_ngroups)
     {
       regstart = REGEX_TALLOC (num_regs, re_char *);
       regend = REGEX_TALLOC (num_regs, re_char *);
@@ -4650,12 +4753,13 @@
           /* If caller wants register contents data back, do it. */
           if (regs && !bufp->no_sub)
             {
+              int num_nonshy_regs = bufp->re_nsub + 1;
               /* Have the register data arrays been allocated? */
               if (bufp->regs_allocated == REGS_UNALLOCATED)
                 { /* No. So allocate them with malloc. We need one
                      extra element beyond `num_regs' for the `-1' marker
                      GNU code uses. */
-                  regs->num_regs = MAX (RE_NREGS, num_regs + 1);
+                  regs->num_regs = MAX (RE_NREGS, num_nonshy_regs + 1);
                   regs->start = TALLOC (regs->num_regs, regoff_t);
                   regs->end = TALLOC (regs->num_regs, regoff_t);
                   if (regs->start == NULL || regs->end == NULL)
@@ -4669,9 +4773,9 @@
                 { /* Yes. If we need more elements than were already
                      allocated, reallocate them. If we need fewer, just
                      leave it alone. */
-                  if (regs->num_regs < num_regs + 1)
+                  if (regs->num_regs < num_nonshy_regs + 1)
                     {
-                      regs->num_regs = num_regs + 1;
+                      regs->num_regs = num_nonshy_regs + 1;
                       RETALLOC (regs->start, regs->num_regs, regoff_t);
                       RETALLOC (regs->end, regs->num_regs, regoff_t);
                       if (regs->start == NULL || regs->end == NULL)
@@ -4701,16 +4805,19 @@

               /* Go through the first `min (num_regs, regs->num_regs)'
                  registers, since that is all we initialized. */
-              for (mcnt = 1; mcnt < MIN (num_regs, regs->num_regs); mcnt++)
+              for (mcnt = 1; mcnt < MIN (num_nonshy_regs, regs->num_regs);
+                   mcnt++)
                 {
-                  if (REG_UNSET (regstart[mcnt]) || REG_UNSET (regend[mcnt]))
+                  int internal_reg = bufp->external_to_internal_register[mcnt];
+                  if (REG_UNSET (regstart[internal_reg]) ||
+                      REG_UNSET (regend[internal_reg]))
                     regs->start[mcnt] = regs->end[mcnt] = -1;
                   else
                     {
-                      regs->start[mcnt]
-                        = (regoff_t) POINTER_TO_OFFSET (regstart[mcnt]);
-                      regs->end[mcnt]
-                        = (regoff_t) POINTER_TO_OFFSET (regend[mcnt]);
+                      regs->start[mcnt] =
+                        (regoff_t) POINTER_TO_OFFSET (regstart[internal_reg]);
+                      regs->end[mcnt] =
+                        (regoff_t) POINTER_TO_OFFSET (regend[internal_reg]);
                     }
                 }

@@ -4719,7 +4826,7 @@
                  we (re)allocated the registers, this is the case,
                  because we always allocate enough to have at least one
                  -1 at the end. */
-              for (mcnt = num_regs; mcnt < regs->num_regs; mcnt++)
+              for (mcnt = num_nonshy_regs; mcnt < regs->num_regs; mcnt++)
                 regs->start[mcnt] = regs->end[mcnt] = -1;
             } /* regs && !bufp->no_sub */

@@ -5065,11 +5172,15 @@


         /* \<digit> has been turned into a `duplicate' command which is
-           followed by the numeric value of <digit> as the register number. */
+           followed by the numeric value of <digit> as the register number.
+           (Already passed through external-to-internal-register mapping,
+           so it refers to the actual group number, not the non-shy-only
+           numbering used in the external world.) */
         case duplicate:
           {
             REGISTER re_char *d2, *dend2;
-            int regno = *p++; /* Get which register to match against. */
+            /* Get which register to match against. */
+            int regno = *p++;
             DEBUG_PRINT2 ("EXECUTING duplicate %d.\n", regno);

             /* Can't back reference a group which we've never matched. */
@@ -6222,6 +6333,8 @@
      `newline_anchor' to REG_NEWLINE being set in CFLAGS;
      `fastmap' and `fastmap_accurate' to zero;
      `re_nsub' to the number of subexpressions in PATTERN.
+     (non-shy of course. POSIX probably doesn't know about
+     shy ones, and in any case they should be invisible.)

    PATTERN is the address of the pattern string.
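
The central idea of this change is that every group, shy or not, gets an internal register, while only non-shy groups get an external number. A standalone sketch of that bookkeeping follows. It is an assumption-laden illustration rather than the regex.c code: `struct group_map` and the `group_map_*` functions are hypothetical stand-ins for the `re_ngroups`, `re_nsub` and `external_to_internal_register` fields, and it uses plain `realloc` instead of `RETALLOC`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant pattern-buffer fields. */
struct group_map
{
  int ngroups;      /* all groups, shy or not (cf. re_ngroups) */
  int nsub;         /* non-shy groups only (cf. re_nsub) */
  int size;         /* allocated length of ext_to_int */
  int *ext_to_int;  /* external number -> internal register */
};

static void
group_map_init (struct group_map *m)
{
  m->ngroups = 0;
  m->nsub = 0;
  m->size = 0;
  m->ext_to_int = NULL;
}

/* Record one opened group.  Returns its internal register number,
   or -1 if memory could not be allocated. */
static int
group_map_open (struct group_map *m, int shy)
{
  int internal = ++m->ngroups;

  if (!shy)
    {
      if (m->nsub + 1 >= m->size)
        {
          /* Grow in steps of 5, as the diff above does. */
          int new_size = m->size + 5;
          int *p = realloc (m->ext_to_int, new_size * sizeof (int));

          if (p == NULL)
            {
              m->ngroups--;
              return -1;
            }
          if (m->size == 0)
            p[0] = 0;   /* register 0 is the whole match */
          m->ext_to_int = p;
          m->size = new_size;
        }
      m->ext_to_int[++m->nsub] = internal;
    }
  return internal;
}

/* Translate an external (non-shy-only) number to an internal one. */
static int
group_map_lookup (const struct group_map *m, int external)
{
  assert (external >= 0 && external <= m->nsub);
  return external == 0 ? 0 : m->ext_to_int[external];
}

static void
group_map_free (struct group_map *m)
{
  free (m->ext_to_int);
  group_map_init (m);
}
```

For a pattern shaped like `\(?:a\)\(b\)\(?:c\)\(d\)`, the four groups take internal registers 1 to 4, and external references 1 and 2 translate to internal registers 2 and 4.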