Mercurial > hg > xemacs-beta
diff lisp/coding.el @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we need such
functionality if we're going to have a reliable and portable
#'query-coding-region implementation. However, this change doesn't yet
provide #'query-coding-region for the mswindow-multibyte coding systems,
there should be no functional differences between an XEmacs with this change
and one without it.
src/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we
need such functionality if we're going to have a reliable and
portable #'query-coding-region implementation. However, this
change doesn't yet provide #'query-coding-region for the
mswindow-multibyte coding systems, there should be no functional
differences between an XEmacs with this change and one without it.
* mule-coding.c (struct fixed_width_coding_system):
Add a new coding system type, fixed_width, and implement it. It
uses the CCL infrastructure but has a much simpler creation API,
and its own query_method, formerly in lisp/mule/mule-coding.el.
* unicode.c:
Move the Unicode query method implementation here from
unicode.el.
* lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table
here.
* intl-win32.c (complex_vars_of_intl_win32):
Use Fmake_coding_system_internal, not Fmake_coding_system.
* general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence
here.
* file-coding.h (enum coding_system_variant):
Add fixed_width_coding_system here.
(struct coding_system_methods):
Add query_method and query_lstream_method to the coding system
methods.
Provide flags for the query methods.
Declare the default query method; initialise it correctly in
INITIALIZE_CODING_SYSTEM_TYPE.
* file-coding.c (default_query_method):
New function, the default query method for coding systems that do
not set it. Moved from coding.el.
(make_coding_system_1):
Accept new elements in PROPS in #'make-coding-system; aliases, a
list of aliases; safe-chars and safe-charsets (these were
previously accepted but not saved); and category.
(Fmake_coding_system_internal):
New function, what used to be #'make-coding-system--on Mule
builds, we've now moved some of the functionality of this to
Lisp.
(Fcoding_system_canonical_name_p):
Move this earlier in the file, since it's now called from within
make_coding_system_1.
(Fquery_coding_region):
Move the implementation of this here, from coding.el.
(complex_vars_of_file_coding):
Call Fmake_coding_system_internal, not Fmake_coding_system;
specify safe-charsets properties when we're a mule build.
* extents.h (mouse_highlight_priority, Fset_extent_priority,
Fset_extent_face, Fmap_extents):
Make these available to other C files.
lisp/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
* coding.el:
Consolidate code that depends on the presence or absence of Mule
at the end of this file.
(default-query-coding-region, query-coding-region):
Move these functions to C.
(default-query-coding-region-safe-charset-skip-chars-map):
Remove this variable, the corresponding C variable is
Vdefault_query_coding_region_chartab_cache in file-coding.c.
(query-coding-string): Update docstring to reflect actual multiple
values, be more careful about not modifying a range table that
we're currently mapping over.
(encode-coding-char): Make the implementation of this simpler.
(featurep 'mule): Autoload #'make-coding-system from
mule/make-coding-system.el if we're a mule build; provide an
appropriate compiler macro.
Do various non-mule compatibility things if we're not a mule
build.
* update-elc.el (additional-dump-dependencies):
Add mule/make-coding-system as a dump time dependency if we're a
mule build.
* unicode.el (ccl-encode-to-ucs-2):
(decode-char):
(encode-char):
Move these earlier in the file, for the sake of some byte compile
warnings.
(unicode-query-coding-region):
Move this to unicode.c
* mule/make-coding-system.el:
New file, not dumped. Contains the functionality to rework the
arguments necessary for fixed-width coding systems, and contains
the implementation of #'make-coding-system, which now calls
#'make-coding-system-internal.
* mule/vietnamese.el (viscii):
* mule/latin.el (iso-8859-2):
(windows-1250):
(iso-8859-3):
(iso-8859-4):
(iso-8859-14):
(iso-8859-15):
(iso-8859-16):
(iso-8859-9):
(macintosh):
(windows-1252):
* mule/hebrew.el (iso-8859-8):
* mule/greek.el (iso-8859-7):
(windows-1253):
* mule/cyrillic.el (iso-8859-5):
(koi8-r):
(koi8-u):
(windows-1251):
(alternativnyj):
(koi8-ru):
(koi8-t):
(koi8-c):
(koi8-o):
* mule/arabic.el (iso-8859-6):
(windows-1256):
Move all these coding systems to being of type fixed-width, not of
type CCL. This allows the distinct query-coding-region for them to
be in C, something which will eventually allow us to implement
query-coding-region for the mswindows-multibyte coding systems.
* mule/general-late.el (posix-charset-to-coding-system-hash):
Document why we're pre-emptively persuading the byte compiler that
the ELC for this file needs to be written using escape-quoted.
Call #'set-unicode-query-skip-chars-args, now the Unicode
query-coding-region implementation is in C.
* mule/thai-xtis.el (tis-620):
Don't bother checking whether we're XEmacs or not here.
* mule/mule-coding.el:
Move the eight bit fixed-width functionality from this file to
make-coding-system.el.
tests/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
* automated/mule-tests.el:
Check a coding system's type, not an 8-bit-fixed property, for
whether that coding system should be treated as a fixed-width
coding system.
* automated/query-coding-tests.el:
Don't test the query coding functionality for mswindows-multibyte
coding systems, it's not yet implemented.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 19 Sep 2009 22:53:13 +0100 |
parents | 8cbca852bcd4 |
children | c673987f5f3d |
line wrap: on
line diff
--- a/lisp/coding.el Sat Sep 19 17:56:23 2009 +0200 +++ b/lisp/coding.el Sat Sep 19 22:53:13 2009 +0100 @@ -270,25 +270,12 @@ (terminal terminal-coding-system) (keyboard keyboard-coding-system))) -(when (not (featurep 'mule)) - (define-coding-system-alias 'escape-quoted 'binary) - ;; these are so that gnus and friends work when not mule - (define-coding-system-alias 'iso-8859-1 'raw-text) - ;; We're misrepresenting ourselves to the gnus code by saying we support - ;; both. - ; (define-coding-system-alias 'iso-8859-2 'raw-text) - (define-coding-system-alias 'ctext 'raw-text)) - (make-compatible-variable 'enable-multibyte-characters "Unimplemented") ;; Sure would be nice to be able to use defface here. (copy-face 'highlight 'query-coding-warning-face) -(defvar default-query-coding-region-safe-charset-skip-chars-map - #s(hash-table test equal data ()) - "A map from list of charsets to `skip-chars-forward' arguments for them.") - -(defsubst query-coding-clear-highlights (begin end &optional buffer-or-string) +(defun query-coding-clear-highlights (begin end &optional buffer-or-string) "Remove extent faces added by `query-coding-region' between BEGIN and END. Optional argument BUFFER-OR-STRING is the buffer or string to use, and @@ -302,170 +289,6 @@ (extent-face extent)) (delete-extent extent))) buffer-or-string begin end)) -(defun* default-query-coding-region (begin end coding-system - &optional buffer ignore-invalid-sequencesp - errorp highlightp) - "The default `query-coding-region' implementation. - -Uses the `safe-charsets' and `safe-chars' coding system properties. -The former is a list of XEmacs character sets that can be safely -encoded by CODING-SYSTEM; the latter a char table describing, in -addition, characters that can be safely encoded by CODING-SYSTEM. - -Does not support IGNORE-INVALID-SEQUENCESP." - (check-argument-type #'coding-system-p - (setq coding-system (find-coding-system coding-system))) - (check-argument-type #'integer-or-marker-p begin) - (check-argument-type #'integer-or-marker-p end) - (let* ((safe-charsets - (or (coding-system-get coding-system 'safe-charsets) - (coding-system-get (coding-system-base coding-system) - 'safe-charsets))) - (safe-chars - (or (coding-system-get coding-system 'safe-chars) - (coding-system-get (coding-system-base coding-system) - 'safe-chars))) - (skip-chars-arg - (gethash safe-charsets - default-query-coding-region-safe-charset-skip-chars-map)) - (ranges (make-range-table)) - (case-fold-search nil) - fail-range-start fail-range-end char-after - looking-at-arg failed extent) - ;; Coding systems with a value of t for safe-charsets support everything. - (when (eq t safe-charsets) - (return-from default-query-coding-region (values t nil))) - (unless skip-chars-arg - (setq skip-chars-arg - (puthash safe-charsets - (mapconcat #'charset-skip-chars-string - safe-charsets "") - default-query-coding-region-safe-charset-skip-chars-map))) - (when highlightp - (query-coding-clear-highlights begin end buffer)) - (if (and (zerop (length skip-chars-arg)) (null safe-chars)) - (progn - ;; Uh-oh, nothing known about this coding system. Fail. - (when errorp - (error 'text-conversion-error - "Coding system doesn't say what it can encode" - (coding-system-name coding-system))) - (put-range-table begin end t ranges) - (when highlightp - (setq extent (make-extent begin end buffer)) - (set-extent-priority extent (+ mouse-highlight-priority 2)) - (set-extent-face extent 'query-coding-warning-face)) - (values nil ranges)) - (setq looking-at-arg (if (equal "" skip-chars-arg) - ;; Regexp that will never match. - #r".\{0,0\}" - (concat "[" skip-chars-arg "]"))) - (save-excursion - (goto-char begin buffer) - (skip-chars-forward skip-chars-arg end buffer) - (while (< (point buffer) end) - ; (message - ; "fail-range-start is %S, point is %S, end is %S" - ; fail-range-start (point buffer) end) - (setq char-after (char-after (point buffer) buffer) - fail-range-start (point buffer)) - (while (and - (< (point buffer) end) - (not (looking-at looking-at-arg)) - (or (not safe-chars) - (not (get-char-table char-after safe-chars)))) - (forward-char 1 buffer) - (setq char-after (char-after (point buffer) buffer) - failed t)) - (if (= fail-range-start (point buffer)) - ;; The character can actually be encoded by the coding - ;; system; check the characters past it. - (forward-char 1 buffer) - ;; Can't be encoded; note this. - (when errorp - (error 'text-conversion-error - (format "Cannot encode %s using coding system" - (buffer-substring fail-range-start (point buffer) - buffer)) - (coding-system-name coding-system))) - (put-range-table fail-range-start - ;; If char-after is non-nil, we're not at - ;; the end of the buffer. - (setq fail-range-end (if char-after - (point buffer) - (point-max buffer))) - t ranges) - (when highlightp - (setq extent (make-extent fail-range-start fail-range-end buffer)) - (set-extent-priority extent (+ mouse-highlight-priority 2)) - (set-extent-face extent 'query-coding-warning-face))) - (skip-chars-forward skip-chars-arg end buffer)) - (if failed - (values nil ranges) - (values t nil)))))) - -(defun query-coding-region (start end coding-system &optional buffer - ignore-invalid-sequencesp errorp highlight) - "Work out whether CODING-SYSTEM can losslessly encode a region. - -START and END are the beginning and end of the region to check. -CODING-SYSTEM is the coding system to try. - -Optional argument BUFFER is the buffer to check, and defaults to the current -buffer. - -IGNORE-INVALID-SEQUENCESP, also an optional argument, says to treat XEmacs -characters which have an unambiguous encoded representation, despite being -undefined in what they represent, as encodable. These chiefly arise with -variable-length encodings like UTF-8 and UTF-16, where an invalid sequence -is passed through to XEmacs as a sequence of characters with a defined -correspondence to the octets on disk, but no non-error semantics; see the -`invalid-sequence-coding-system' argument to `set-language-info'. - -They can also arise with fixed-length encodings like ISO 8859-7, where -certain octets on disk have undefined values, and treating them as -corresponding to the ISO 8859-1 characters with the same numerical values -may lead to data that is not understood by other applications. - -Optional argument ERRORP says to signal a `text-conversion-error' if some -character in the region cannot be encoded, and defaults to nil. - -Optional argument HIGHLIGHT says to display unencodable characters in the -region using `query-coding-warning-face'. It defaults to nil. - -This function returns a list; the intention is that callers use -`multiple-value-bind' or the related CL multiple value functions to deal -with it. The first element is `t' if the region can be encoded using -CODING-SYSTEM, or `nil' if not. The second element is `nil' if the region -can be encoded using CODING-SYSTEM; otherwise, it is a range table -describing the positions of the unencodable characters. Ranges that -describe characters that would be ignored were IGNORE-INVALID-SEQUENCESP -non-nil map to the symbol `invalid-sequence'; other ranges map to the symbol -`unencodable'. If IGNORE-INVALID-SEQUENCESP is non-nil, all ranges will map -to the symbol `unencodable'. See `make-range-table' for more details of -range tables." - (funcall (or (coding-system-get coding-system 'query-coding-function) - #'default-query-coding-region) - start end coding-system buffer ignore-invalid-sequencesp errorp - highlight)) - -(define-compiler-macro query-coding-region (start end coding-system - &optional buffer - ignore-invalid-sequencesp - errorp highlight) - `(funcall (or (coding-system-get ,coding-system 'query-coding-function) - #'default-query-coding-region) - ,start ,end ,coding-system ,@(append (when (or buffer - ignore-invalid-sequencesp - errorp highlight) - (list buffer)) - (when (or ignore-invalid-sequencesp - errorp highlight) - (list ignore-invalid-sequencesp)) - (when (or errorp highlight) - (list errorp)) - (when highlight (list highlight))))) - (defun query-coding-string (string coding-system &optional ignore-invalid-sequencesp errorp highlight) "Work out whether CODING-SYSTEM can losslessly encode STRING. @@ -482,7 +305,7 @@ They can also arise with fixed-length encodings like ISO 8859-7, where certain octets on disk have undefined values, and treating them as corresponding to the ISO 8859-1 characters with the same numerical values -may lead to data that is not understood by other applications. +may lead to data that are not understood by other applications. Optional argument ERRORP says to signal a `text-conversion-error' if some character in the region cannot be encoded, and defaults to nil. @@ -490,39 +313,42 @@ Optional argument HIGHLIGHT says to display unencodable characters in the region using `query-coding-warning-face'. It defaults to nil. -This function returns a list; the intention is that callers use +This function can return multiple values; the intention is that callers use `multiple-value-bind' or the related CL multiple value functions to deal -with it. The first element is `t' if the region can be encoded using -CODING-SYSTEM, or `nil' if not. The second element is `nil' if the region -can be encoded using CODING-SYSTEM; otherwise, it is a range table -describing the positions of the unencodable characters. Ranges that -describe characters that would be ignored were IGNORE-INVALID-SEQUENCESP -non-nil map to the symbol `invalid-sequence'; other ranges map to the symbol -`unencodable'. If IGNORE-INVALID-SEQUENCESP is non-nil, all ranges will map -to the symbol `unencodable'. See `make-range-table' for more details of -range tables." +with it. The first result is `t' if the region can be encoded using +CODING-SYSTEM, or `nil' if not. If the region cannot be encoded using +CODING-SYSTEM, the second result is a range table describing the positions +of the unencodable characters. + +Ranges that describe characters that would be ignored were +IGNORE-INVALID-SEQUENCESP non-nil map to the symbol `invalid-sequence'; +other ranges map to the symbol `unencodable'. If IGNORE-INVALID-SEQUENCESP +is non-nil, all ranges will map to the symbol `unencodable'. See +`make-range-table' for more details of range tables." (with-temp-buffer (when highlight (query-coding-clear-highlights 0 (length string) string)) (insert string) - (multiple-value-bind (result ranges extent) + (multiple-value-bind (result ranges) (query-coding-region (point-min) (point-max) coding-system (current-buffer) ignore-invalid-sequencesp errorp) - (unless result - (map-range-table - #'(lambda (begin end value) - ;; Sigh, string indices are zero-based, buffer offsets are - ;; one-based. - (remove-range-table begin end ranges) - (put-range-table (decf begin) (decf end) value ranges) - (when highlight - (setq extent (make-extent begin end string)) - (set-extent-priority extent (+ mouse-highlight-priority 2)) - (set-extent-property extent 'duplicable t) - (set-extent-face extent 'query-coding-warning-face))) - ranges)) - (values result ranges)))) + (unless result + (let ((original-ranges ranges) + extent) + (setq ranges (make-range-table)) + (map-range-table + #'(lambda (begin end value) + ;; Sigh, string indices are zero-based, buffer offsets are + ;; one-based. + (put-range-table (decf begin) (decf end) value ranges) + (when highlight + (setq extent (make-extent begin end string)) + (set-extent-priority extent (+ mouse-highlight-priority 2)) + (set-extent-property extent 'duplicable t) + (set-extent-face extent 'query-coding-warning-face))) + original-ranges))) + (if result result (values result ranges))))) ;; Function docstring and API are taken from GNU coding.c version 1.353, GPLv2. (defun unencodable-char-position (start end coding-system @@ -615,7 +441,8 @@ (multiple-value-bind (encoded ranges) (query-coding-region begin end coding-system) (unless encoded - (setq intermediate (list (coding-system-name coding-system))) + (setq intermediate + (list (coding-system-name coding-system))) (map-range-table range-lambda ranges) (push (nreverse intermediate) result))) finally return result)))) @@ -634,18 +461,36 @@ If CODING-SYSTEM can't safely encode CHAR, return nil. The optional third argument CHARSET is, for the moment, ignored." (check-argument-type #'characterp char) - (multiple-value-bind (succeededp) - (query-coding-string char coding-system) - (when succeededp - (encode-coding-string char coding-system)))) + (and (query-coding-string char coding-system) + (encode-coding-string char coding-system))) + +(if (featurep 'mule) + (progn + ;; Under Mule, we do much of the complicated coding system creation in + ;; Lisp and especially at compile time. We need some function + ;; definition for this function to be created in this file, but we can + ;; leave assigning the docstring to the autoload cookie + ;; handling later. Thankfully; that docstring is big. + (autoload 'make-coding-system "mule/make-coding-system") -(unless (featurep 'mule) - ;; If we're under non-Mule, every XEmacs character can be encoded - ;; with every XEmacs coding system. - (fset #'default-query-coding-region - #'(lambda (&rest ignored) - "Stub `query-coding-region' implementation. Always succeeds." - (values t nil))) - (unintern 'default-query-coding-region-safe-charset-skip-chars-map)) + ;; (During byte-compile before dumping, make-coding-system may already + ;; have been loaded, make sure not to overwrite the correct compiler + ;; macro:) + (when (eq 'autoload (car (symbol-function 'make-coding-system))) + ;; Make sure to pick up the correct compiler macro when compiling + ;; files: + (define-compiler-macro make-coding-system (&whole form name type + &optional description props) + (load (second (symbol-function 'make-coding-system))) + (funcall (get 'make-coding-system 'cl-compiler-macro) + form name type description props)))) + + ;; Mule's not available; + (fset 'make-coding-system (symbol-function 'make-coding-system-internal)) + (define-coding-system-alias 'escape-quoted 'binary) + + ;; These are so that gnus and friends work when not mule: + (define-coding-system-alias 'iso-8859-1 'raw-text) + (define-coding-system-alias 'ctext 'raw-text)) ;;; coding.el ends here