annotate lisp/regexp-opt.el @ 4792:95b04754ea8c

Make #'equalp more compatible with CL; add a compiler macro, test & doc it. lisp/ChangeLog addition: 2009-11-08 Aidan Kehoe <kehoea@parhasard.net> * cl-extra.el (cl-string-vector-equalp) (cl-bit-vector-vector-equalp, cl-vector-array-equalp) (cl-hash-table-contents-equalp): New functions, to implement equalp treating arrays with identical contents as equivalent, as specified by Common Lisp. (equalp): Revise this function to implement array equivalence, and the hash-table equalp behaviour specified by CL. * cl-macs.el (equalp): Add a compiler macro for this function, used when one of the arguments is constant, and as such, its type is known at compile time. man/ChangeLog addition: 2009-11-08 Aidan Kehoe <kehoea@parhasard.net> * lispref/objects.texi (Equality Predicates): Document #'equalp here, as well as #'equal and #'eq. tests/ChangeLog addition: 2009-12-31 Aidan Kehoe <kehoea@parhasard.net> * automated/lisp-tests.el: Test much of the functionality of equalp; add a pointer to Paul Dietz' ANSI test suite for this function, converted to Emacs Lisp. Not including the tests themselves in XEmacs because who owns the copyright on the files is unclear and the GCL people didn't respond to my queries.
author Aidan Kehoe <kehoea@parhasard.net>
date Thu, 31 Dec 2009 15:09:41 +0000
parents 317f30471f4e
children f00192e1cd49 308d34e9f07d
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
1 ;;; regexp-opt.el --- generate efficient regexps to match strings
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
2
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
3 ;; Copyright (C) 1994,95,96,97,98,99,2000 Free Software Foundation, Inc.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
4
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
5 ;; Author: Simon Marshall <simon@gnu.org>
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
6 ;; Maintainer: FSF
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
7 ;; Keywords: strings, regexps, extensions
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
8
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
9 ;; This file is part of XEmacs.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
10
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
11 ;; XEmacs is free software; you can redistribute it and/or modify
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
12 ;; it under the terms of the GNU General Public License as published by
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
13 ;; the Free Software Foundation; either version 2, or (at your option)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
14 ;; any later version.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
15
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
16 ;; XEmacs is distributed in the hope that it will be useful,
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
17 ;; but WITHOUT ANY WARRANTY; without even the implied warranty of
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
18 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
19 ;; GNU General Public License for more details.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
20
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
21 ;; You should have received a copy of the GNU General Public License
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
22 ;; along with XEmacs; see the file COPYING. If not, write to the
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
23 ;; Free Software Foundation, Inc., 59 Temple Place - Suite 330,
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
24 ;; Boston, MA 02111-1307, USA.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
25
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
26 ;;; Synched up with: GNU Emacs 21.3 + paren-in-char-set fix from CVS
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
27 ;;; revision 1.25. Some implementation differences in
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
28 ;;; regexp-opt-group and regexp-opt-charset but the APIs
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
29 ;;; are compatible and should return compatible (if not
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
30 ;;; exactly the same) regexps.
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
31
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
32 ;;; Commentary:
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
33
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
34 ;; The "opt" in "regexp-opt" stands for "optim\\(?:al\\|i\\(?:se\\|ze\\)\\)".
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
35 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
36 ;; This package generates a regexp from a given list of strings (which matches
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
37 ;; one of those strings) so that the regexp generated by:
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
38 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
39 ;; (regexp-opt strings)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
40 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
41 ;; is equivalent to, but more efficient than, the regexp generated by:
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
42 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
43 ;; (mapconcat 'regexp-quote strings "\\|")
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
44 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
45 ;; For example:
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
46 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
47 ;; (let ((strings '("cond" "if" "when" "unless" "while"
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
48 ;; "let" "let*" "progn" "prog1" "prog2"
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
49 ;; "save-restriction" "save-excursion" "save-window-excursion"
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
50 ;; "save-current-buffer" "save-match-data"
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
51 ;; "catch" "throw" "unwind-protect" "condition-case")))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
52 ;; (concat "(" (regexp-opt strings t) "\\>"))
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
53 ;; => "(\\(c\\(?:atch\\|ond\\(?:ition-case\\)?\\)\\|if\\|let\\*?\\|prog[12n]\\|save-\\(?:current-buffer\\|excursion\\|match-data\\|restriction\\|window-excursion\\)\\|throw\\|un\\(?:less\\|wind-protect\\)\\|wh\\(?:en\\|ile\\)\\)\\>"
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
54 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
55 ;; Searching using the above example `regexp-opt' regexp takes approximately
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
56 ;; two-thirds of the time taken using the equivalent `mapconcat' regexp.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
57
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
58 ;; Since this package was written to produce efficient regexps, not regexps
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
59 ;; efficiently, it is probably not a good idea to in-line too many calls in
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
60 ;; your code, unless you use the following trick with `eval-when-compile':
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
61 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
62 ;; (defvar definition-regexp
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
63 ;; (eval-when-compile
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
64 ;; (concat "^("
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
65 ;; (regexp-opt '("defun" "defsubst" "defmacro" "defalias"
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
66 ;; "defvar" "defconst") t)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
67 ;; "\\>")))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
68 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
69 ;; The `byte-compile' code will be as if you had defined the variable thus:
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
70 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
71 ;; (defvar definition-regexp
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
72 ;; "^(\\(def\\(alias\\|const\\|macro\\|subst\\|un\\|var\\)\\)\\>")
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
73 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
74 ;; Note that if you use this trick for all instances of `regexp-opt' and
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
75 ;; `regexp-opt-depth' in your code, regexp-opt.el would only have to be loaded
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
76 ;; at compile time. But note also that using this trick means that should
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
77 ;; regexp-opt.el be changed, perhaps to fix a bug or to add a feature to
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
78 ;; improve the efficiency of `regexp-opt' regexps, you would have to recompile
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
79 ;; your code for such changes to have effect in your code.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
80
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
81 ;; Originally written for font-lock.el, from an idea from Stig's hl319.el, with
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
82 ;; thanks for ideas also to Michael Ernst, Bob Glickstein, Dan Nicolaescu and
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
83 ;; Stefan Monnier.
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
84 ;; No doubt `regexp-opt' doesn't always produce optimal regexps, so code, ideas
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
85 ;; or any other information to improve things are welcome.
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
86 ;;
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
87 ;; One possible improvement would be to compile '("aa" "ab" "ba" "bb")
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
88 ;; into "[ab][ab]" rather than "a[ab]\\|b[ab]". I'm not sure it's worth
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
89 ;; it but if someone knows how to do it without going through too many
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
90 ;; contortions, I'm all ears.
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
91
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
92 ;;; Code:
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
93
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
94 ;;;###autoload
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
95 (defun regexp-opt (strings &optional paren)
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
96 "Return a regexp to match a string in STRINGS.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
97 Each string should be unique in STRINGS and should not contain any regexps,
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
98 quoted or not. If optional PAREN is non-nil, ensure that the returned regexp
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
99 is enclosed by at least one regexp grouping construct.
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
100 The returned regexp is typically more efficient than the equivalent regexp:
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
101
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
102 (let ((open (if PAREN \"\\\\(\" \"\")) (close (if PAREN \"\\\\)\" \"\")))
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
103 (concat open (mapconcat 'regexp-quote STRINGS \"\\\\|\") close))
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
104
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
105 If PAREN is `words', then the resulting regexp is additionally surrounded
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
106 by \\=\\< and \\>."
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
107 (save-match-data
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
108 ;; Recurse on the sorted list.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
109 (let* ((max-lisp-eval-depth (* 1024 1024))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
110 (completion-ignore-case nil)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
111 (words (eq paren 'words))
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
112 (open (cond ((stringp paren) paren) (paren "\\(")))
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
113 (sorted-strings (sort (copy-sequence strings) 'string-lessp))
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
114 (re (regexp-opt-group sorted-strings open)))
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
115 (if words (concat "\\<" re "\\>") re))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
116
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
117 (defconst regexp-opt-not-groupie*-re
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
118 (let* ((harmless-ch "[^\\\\[]")
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
119 (esc-pair-not-lp "\\\\[^(]")
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
120 (class-harmless-ch "[^][]")
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
121 (class-lb-harmless "[^]:]")
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
122 (class-lb-colon-maybe-charclass ":\\([a-z]+:]\\)?")
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
123 (class-lb (concat "\\[\\(" class-lb-harmless
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
124 "\\|" class-lb-colon-maybe-charclass "\\)"))
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
125 (class
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
126 (concat "\\[^?]?"
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
127 "\\(" class-harmless-ch
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
128 "\\|" class-lb "\\)*"
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
129 "\\[?]")) ; special handling for bare [ at end of re
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
130 (shy-lp "\\\\(\\?:"))
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
131 (concat "\\(" harmless-ch "\\|" esc-pair-not-lp
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
132 "\\|" class "\\|" shy-lp "\\)*"))
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
133 "Matches any part of a regular expression EXCEPT for non-shy \"\\\\(\"s")
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
134
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
135 ;;;###autoload
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
136 (defun regexp-opt-depth (regexp)
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
137 "Return the depth of REGEXP.
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
138 This means the number of regexp grouping constructs (parenthesised expressions)
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
139 in REGEXP."
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
140 (save-match-data
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
141 ;; Hack to signal an error if REGEXP does not have balanced parentheses.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
142 (string-match regexp "")
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
143 ;; Count the number of open parentheses in REGEXP.
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
144 (let ((count 0) start)
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
145 (while
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
146 (progn
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
147 (string-match regexp-opt-not-groupie*-re regexp start)
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
148 (setq start ( + (match-end 0) 2)) ; +2 for "\\(" after match-end.
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
149 (<= start (length regexp)))
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
150 (setq count (1+ count)))
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
151 count)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
152
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
153 ;;; Workhorse functions.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
154
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
155 (eval-when-compile
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
156 (require 'cl))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
157
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
158 (defun regexp-opt-group (strings &optional paren lax)
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
159 "Return a regexp to match a string in STRINGS.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
160 If PAREN non-nil, output regexp parentheses around returned regexp.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
161 If LAX non-nil, don't output parentheses if it doesn't require them.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
162 Merges keywords to avoid backtracking in Emacs' regexp matcher.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
163
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
164 The basic idea is to find the shortest common prefix or suffix, remove it
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
165 and recurse. If there is no prefix, we divide the list into two so that
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
166 \(at least) one half will have at least a one-character common prefix.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
167
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
168 Also we delay the addition of grouping parenthesis as long as possible
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
169 until we're sure we need them, and try to remove one-character sequences
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
170 so we can use character sets rather than grouping parenthesis."
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
171 (let* ((open-group (cond ((stringp paren) paren) (paren "\\(?:") (t "")))
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
172 (close-group (if paren "\\)" ""))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
173 (open-charset (if lax "" open-group))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
174 (close-charset (if lax "" close-group)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
175 (cond
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
176 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
177 ;; If there are no strings, just return the empty string.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
178 ((= (length strings) 0)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
179 "")
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
180 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
181 ;; If there is only one string, just return it.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
182 ((= (length strings) 1)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
183 (if (= (length (car strings)) 1)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
184 (concat open-charset (regexp-quote (car strings)) close-charset)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
185 (concat open-group (regexp-quote (car strings)) close-group)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
186 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
187 ;; If there is an empty string, remove it and recurse on the rest.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
188 ((= (length (car strings)) 0)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
189 (concat open-charset
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
190 (regexp-opt-group (cdr strings) t t) "?"
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
191 close-charset))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
192 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
193 ;; If all are one-character strings, just return a character set.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
194 ((= (length strings) (apply '+ (mapcar 'length strings)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
195 (concat open-charset
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
196 (regexp-opt-charset strings)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
197 close-charset))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
198 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
199 ;; We have a list of different length strings.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
200 (t
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
201 (let ((prefix (try-completion "" (mapcar 'list strings)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
202 (letters (let ((completion-regexp-list '("^.$")))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
203 (all-completions "" (mapcar 'list strings)))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
204 (cond
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
205 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
206 ;; If there is a common prefix, remove it and recurse on the suffixes.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
207 ((> (length prefix) 0)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
208 (let* ((length (length prefix))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
209 (suffixes (mapcar (lambda (s) (substring s length)) strings)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
210 (concat open-group
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
211 (regexp-quote prefix) (regexp-opt-group suffixes t t)
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
212 close-group)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
213 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
214 ;; If there are several one-character strings, remove them and recurse
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
215 ;; on the rest (first so the final regexp finds the longest match).
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
216 ((> (length letters) 1)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
217 (let ((rest (let ((completion-regexp-list '("^..+$")))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
218 (all-completions "" (mapcar 'list strings)))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
219 (concat open-group
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
220 (regexp-opt-group rest) "\\|" (regexp-opt-charset letters)
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
221 close-group)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
222 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
223 ;; Otherwise, divide the list into those that start with a particular
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
224 ;; letter and those that do not, and recurse on them.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
225 (t
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
226 (let* ((char (substring (car strings) 0 1))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
227 (half1 (all-completions char (mapcar 'list strings)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
228 (half2 (nthcdr (length half1) strings)))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
229 (concat open-group
2550
317f30471f4e [xemacs-hg @ 2005-02-03 07:30:21 by ben]
ben
parents: 2548
diff changeset
230 (regexp-opt-group half1) "\\|" (regexp-opt-group half2)
2548
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
231 close-group)))))))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
232
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
233 (defun regexp-opt-charset (chars)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
234 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
235 ;; Return a regexp to match a character in CHARS.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
236 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
237 ;; The basic idea is to find character ranges. Also we take care in the
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
238 ;; position of character set meta characters in the character set regexp.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
239 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
240 (let* ((charwidth 256) ; Yeah, right.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
241 ;; XEmacs: use bit-vectors instead of bool-vectors
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
242 (charmap (make-bit-vector charwidth 0))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
243 (charset "")
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
244 (bracket "") (dash "") (caret ""))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
245 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
246 ;; Make a character map but extract character set meta characters.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
247 (dolist (char (mapcar 'string-to-char chars))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
248 (case char
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
249 (?\]
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
250 (setq bracket "]"))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
251 (?^
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
252 (setq caret "^"))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
253 (?-
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
254 (setq dash "-"))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
255 (otherwise
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
256 ;; XEmacs: 1
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
257 (aset charmap char 1))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
258 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
259 ;; Make a character set from the map using ranges where applicable.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
260 (dotimes (char charwidth)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
261 (let ((start char))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
262 (while (and (< char charwidth)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
263 ;; XEmacs: (not (zerop ...))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
264 (not (zerop (aref charmap char))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
265 (incf char))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
266 (cond ((> char (+ start 3))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
267 (setq charset (format "%s%c-%c" charset start (1- char))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
268 ((> char start)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
269 (setq charset (format "%s%c" charset (setq char start)))))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
270 ;;
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
271 ;; Make sure a caret is not first and a dash is first or last.
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
272 (if (and (string-equal charset "") (string-equal bracket ""))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
273 (concat "[" dash caret "]")
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
274 (concat "[" bracket charset caret dash "]"))))
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
275
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
276 (provide 'regexp-opt)
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
277
c4c8a36043be [xemacs-hg @ 2005-02-03 07:11:19 by ben]
ben
parents:
diff changeset
278 ;;; regexp-opt.el ends here