comparison src/casetab.c @ 4407:4ee73bbe4f8e
Always use boyer_moore in ASCII or Latin-1 buffers with ASCII search strings.
2007-12-26 Aidan Kehoe <kehoea@parhasard.net>
* casetab.c:
Extend and correct some case table documentation.
* search.c (search_buffer):
Correct a bug where only the first entry for a character in the
case equivalence table was examined in determining if the
Boyer-Moore search algorithm is appropriate.
If there are case mappings outside of the charset and row of the
characters specified in the search string, those case mappings can
be safely ignored (and Boyer-Moore search can be used) if we know
from the buffer statistics that the corresponding characters cannot
occur.
* search.c (boyer_moore):
Assert that we haven't been passed a string with varying
character sets or rows within character sets. That's what
simple_search is for.
In the very rare event that a character in the search string has a
canonical case mapping that is not in the same character set and
row, don't try to search for the canonical character, search for
some other character that is in the desired character set and
row. Assert that the case table isn't corrupt.
Do not search for any character case mappings that cannot possibly
occur in the buffer, given the buffer metadata about its
contents.
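
The check described for search_buffer above can be sketched in a toy model. This is not the code from search.c: it assumes 256 one-byte characters split into two "rows" of 128, whereas XEmacs groups multi-byte characters by charset and row. The names `eqv`, `ROW`, `build_toy_eqv`, `class_members` and `boyer_moore_ok` are hypothetical, and the `buffer_has_high_row` flag merely stands in for the buffer statistics mentioned in the log. The byte 0xE1 is spliced into the class of `k` arbitrarily so that one class spans two rows.

```c
#include <assert.h>

/* Toy model only: 256 one-byte characters in two "rows" of 128.
   All names here are hypothetical, not XEmacs internals. */
#define NCHARS 256
#define ROW(c) ((c) >> 7)

/* eqv[c] is the next member of c's equivalence class; following it
   repeatedly leads back to c. */
static unsigned char eqv[NCHARS];

/* Start with singleton cycles, pair up the ASCII letters, then splice
   an arbitrary high-row byte into the class of 'k':
   k -> K -> 0xE1 -> k. */
static void build_toy_eqv(void)
{
  int c;
  for (c = 0; c < NCHARS; c++)
    eqv[c] = (unsigned char) c;
  for (c = 'a'; c <= 'z'; c++)
    {
      eqv[c] = (unsigned char) (c - 'a' + 'A');
      eqv[c - 'a' + 'A'] = (unsigned char) c;
    }
  eqv['K'] = 0xE1;
  eqv[0xE1] = 'k';
}

/* Store every member of c's class in out (capacity NCHARS), starting
   with c itself, and return the number of members. */
static int class_members(unsigned char c, unsigned char *out)
{
  int n = 0;
  unsigned char cur = c;
  do
    {
      assert(n < NCHARS); /* a longer walk means the cycle is corrupt */
      out[n++] = cur;
      cur = eqv[cur];
    }
  while (cur != c);
  return n;
}

/* Return 1 if every other member of c's class either shares c's row
   or cannot occur in the buffer, 0 otherwise. */
static int boyer_moore_ok(unsigned char c, int buffer_has_high_row)
{
  unsigned char members[NCHARS];
  int i, n = class_members(c, members);
  for (i = 1; i < n; i++)
    {
      unsigned char m = members[i];
      if (ROW(m) == ROW(c))
        continue; /* same row: no problem for this model */
      if (ROW(m) == 1 && !buffer_has_high_row)
        continue; /* cannot occur in this buffer: safe to ignore */
      return 0;
    }
  return 1;
}
```

After calling build_toy_eqv(), a check that looked only at the first entry for `k` would see `K` and stop; walking the whole cycle also reaches 0xE1, so boyer_moore_ok('k', 1) rejects the character, while boyer_moore_ok('k', 0) accepts it because the out-of-row member cannot appear in the buffer.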
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Wed, 26 Dec 2007 17:30:16 +0100 |
parents | 1e7cc382eb16 |
children | a98ca4640147 e0db3c197671 |
4356:cc293ef846d2 | 4407:4ee73bbe4f8e |
---|---|
46 (3) `canon' maps each character to a "canonical" lowercase, such that if | 46 (3) `canon' maps each character to a "canonical" lowercase, such that if |
47 two different uppercase characters map to the same lowercase character, | 47 two different uppercase characters map to the same lowercase character, |
48 or vice versa, both characters will have the same entry in the canon | 48 or vice versa, both characters will have the same entry in the canon |
49 table. | 49 table. |
50 | 50 |
51 (4) `equiv' lists the "equivalence classes" defined by `canon'. Imagine | 51 (4) `eqv' lists the "equivalence classes" defined by `canon'. Imagine |
52 that all characters are divided into groups having the same `canon' | 52 that all characters are divided into groups having the same `canon' |
53 entry; these groups are called "equivalence classes" and `equiv' lists | 53 entry; these groups are called "equivalence classes" and `eqv' lists them |
54 them by linking the characters in each equivalence class together in a | 54 by linking the characters in each equivalence class together in a |
55 circular list. | 55 circular list. That is, to find all the members of a given char's |
56 | 56 equivalence class, you need something like the following code: |
57 `canon' is used when doing case-insensitive comparisons. `equiv' is | 57 |
| 58 (let* ((char ?i) |
| 59 (original-char char) |
| 60 (standard-case-eqv (case-table-eqv (standard-case-table)))) |
| 61 (loop |
| 62 with res = (list char) |
| 63 until (eq (setq char (get-char-table char standard-case-eqv)) |
| 64 original-char) |
| 65 do (push char res) |
| 66 finally return res)) |
| 67 |
| 68 (Where #'case-table-eqv doesn't yet exist, and probably never will, given |
| 69 that the C code needs to keep it in a consistent state so Lisp can't mess |
| 70 around with it.) |
| 71 |
| 72 `canon' is used when doing case-insensitive comparisons. `eqv' is |
58 used in the Boyer-Moore search code. | 73 used in the Boyer-Moore search code. |
59 */ | 74 */ |
60 | 75 |
61 #include <config.h> | 76 #include <config.h> |
62 #include "lisp.h" | 77 #include "lisp.h" |