Mercurial > hg > xemacs-beta
view lisp/coding.el @ 4568:1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
lisp/ChangeLog addition:
2008-12-28 Aidan Kehoe <kehoea@parhasard.net>
* coding.el (default-query-coding-region):
Declare using defun*, so we can #'return-from to it on
encountering a safe-charsets value of t. Comment out a few
debug messages.
(query-coding-region):
Correct the docstring, it deals with a region, not a string.
(unencodable-char-position):
Correct the implementation for non-nil COUNT, special-case a zero
value for count, treat it as one. Don't rely on dynamic scope when
calling the main lambda.
* unicode.el (unicode-query-coding-region):
Comment out some debug messages here.
* mule/mule-coding.el (8-bit-fixed-query-coding-region):
Comment out some debug messages here.
* code-init.el (raw-text):
Add a safe-charsets property to this coding system.
* mule/korean.el (iso-2022-int-1):
* mule/korean.el (euc-kr):
* mule/korean.el (iso-2022-kr):
Add safe-charsets properties for these coding systems.
* mule/japanese.el (iso-2022-jp):
* mule/japanese.el (jis7):
* mule/japanese.el (jis8):
* mule/japanese.el (shift-jis):
* mule/japanese.el (iso-2022-jp-1978-irv):
* mule/japanese.el (euc-jp):
Add safe-charsets properties for all these coding systems.
* mule/iso-with-esc.el:
Add safe-charsets properties to all the coding systems in
here. Comment on the downside of a safe-charsets value of t for
iso-latin-1-with-esc.
* mule/hebrew.el (ctext-hebrew):
Add a safe-charsets property for this coding system.
* mule/devanagari.el (in-is13194-devanagari):
Add a safe-charsets property for this coding system.
* mule/chinese.el (cn-gb-2312):
* mule/chinese.el (hz-gb-2312):
* mule/chinese.el (big5):
Add safe-charsets properties for these coding systems.
* mule/latin.el (iso-8859-14):
Add an implementation for this, using #'make-8-bit-coding-system.
* mule/mule-coding.el (ctext):
* mule/mule-coding.el (iso-2022-8bit-ss2):
* mule/mule-coding.el (iso-2022-7bit-ss2):
* mule/mule-coding.el (iso-2022-jp-2):
* mule/mule-coding.el (iso-2022-7bit):
* mule/mule-coding.el (iso-2022-8):
* mule/mule-coding.el (escape-quoted):
* mule/mule-coding.el (iso-2022-lock):
Add safe-charsets properties for all these coding systems.
src/ChangeLog addition:
2008-12-28 Aidan Kehoe <kehoea@parhasard.net>
* file-coding.c (Fmake_coding_system):
Document our use of the safe-chars and safe-charsets properties,
and the differences compared to GNU.
(make_coding_system_1): Don't drop the safe-chars and
safe-charsets properties.
(Fcoding_system_property): Return the safe-chars and safe-charsets
properties when asked for them.
* file-coding.h (CODING_SYSTEM_SAFE_CHARSETS):
* coding-system-slots.h:
Make the safe-chars and safe-charsets slots available in these
headers.
tests/ChangeLog addition:
2008-12-28 Aidan Kehoe <kehoea@parhasard.net>
* automated/query-coding-tests.el:
New file, testing the functionality of #'query-coding-region and
#'query-coding-string.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sun, 28 Dec 2008 14:46:24 +0000 |
parents | 46ddeaa7c738 |
children | e6a7054a9c30 |
line wrap: on
line source
;;; coding.el --- Coding-system functions for XEmacs. ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN. ;; Licensed to the Free Software Foundation. ;; Copyright (C) 1995 Amdahl Corporation. ;; Copyright (C) 1995 Sun Microsystems. ;; Copyright (C) 1997 MORIOKA Tomohiko ;; Copyright (C) 2000, 2001, 2002 Ben Wing. ;; This file is part of XEmacs. ;; XEmacs is free software; you can redistribute it and/or modify it ;; under the terms of the GNU General Public License as published by ;; the Free Software Foundation; either version 2, or (at your option) ;; any later version. ;; XEmacs is distributed in the hope that it will be useful, but ;; WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ;; General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with XEmacs; see the file COPYING. If not, write to the ;; Free Software Foundation, Inc., 59 Temple Place - Suite 330, ;; Boston, MA 02111-1307, USA. ;;; Commentary: ;;; split off of mule.el. ;;; Code: (globally-declare-fboundp '(coding-system-lock-shift coding-system-seven coding-system-charset charset-dimension)) (defalias 'check-coding-system 'get-coding-system) (defun modify-coding-system-alist (target-type regexp coding-system) "Modify one of look up tables for finding a coding system on I/O operation. There are three of such tables, `file-coding-system-alist', `process-coding-system-alist', and `network-coding-system-alist'. TARGET-TYPE specifies which of them to modify. If it is `file', it affects `file-coding-system-alist' (which see). If it is `process', it affects `process-coding-system-alist' (which see). If it is `network', it affects `network-coding-system-alist' (which see). REGEXP is a regular expression matching a target of I/O operation. The target is a file name if TARGET-TYPE is `file', a program name if TARGET-TYPE is `process', or a network service name or a port number to connect to if TARGET-TYPE is `network'. CODING-SYSTEM is a coding system to perform code conversion on the I/O operation, or a cons cell (DECODING . ENCODING) specifying the coding systems for decoding and encoding respectively, or a function symbol which, when called, returns such a cons cell." (or (memq target-type '(file process network)) (error "Invalid target type: %s" target-type)) (or (stringp regexp) (and (eq target-type 'network) (integerp regexp)) (error "Invalid regular expression: %s" regexp)) (if (symbolp coding-system) (if (not (fboundp coding-system)) (progn (check-coding-system coding-system) (setq coding-system (cons coding-system coding-system)))) (check-coding-system (car coding-system)) (check-coding-system (cdr coding-system))) (cond ((eq target-type 'file) (let ((slot (assoc regexp file-coding-system-alist))) (if slot (setcdr slot coding-system) (setq file-coding-system-alist (cons (cons regexp coding-system) file-coding-system-alist))))) ((eq target-type 'process) (let ((slot (assoc regexp process-coding-system-alist))) (if slot (setcdr slot coding-system) (setq process-coding-system-alist (cons (cons regexp coding-system) process-coding-system-alist))))) (t (let ((slot (assoc regexp network-coding-system-alist))) (if slot (setcdr slot coding-system) (setq network-coding-system-alist (cons (cons regexp coding-system) network-coding-system-alist))))))) (defsubst keyboard-coding-system () "Return coding-system of what is sent from terminal keyboard." keyboard-coding-system) (defun set-keyboard-coding-system (coding-system) "Set the coding system used for TTY keyboard input. Currently broken." (interactive "zkeyboard-coding-system: ") (get-coding-system coding-system) ; correctness check (setq keyboard-coding-system coding-system) (if (eq (device-type) 'tty) (declare-fboundp (set-console-tty-input-coding-system (device-console) keyboard-coding-system))) (redraw-modeline t)) (defsubst terminal-coding-system () "Return coding-system of your terminal." terminal-coding-system) (defun set-terminal-coding-system (coding-system) "Set the coding system used for TTY display output." (interactive "zterminal-coding-system: ") (get-coding-system coding-system) ; correctness check (setq terminal-coding-system coding-system) ; #### should this affect all current tty consoles ? (if (eq (device-type) 'tty) (declare-fboundp (set-console-tty-output-coding-system (device-console) terminal-coding-system))) (redraw-modeline t)) (defun what-coding-system (start end &optional arg) "Show the encoding of text in the region. This function is meant to be called interactively; from a Lisp program, use `detect-coding-region' instead." (interactive "r\nP") (princ (detect-coding-region start end))) (defun decode-coding-string (str coding-system &optional nocopy) "Decode the string STR which is encoded in CODING-SYSTEM. Normally does not modify STR. Returns the decoded string on successful conversion. Optional argument NOCOPY says that modifying STR and returning it is allowed." (with-string-as-buffer-contents str (decode-coding-region (point-min) (point-max) coding-system))) (defun encode-coding-string (str coding-system &optional nocopy) "Encode the string STR using CODING-SYSTEM. Does not modify STR. Returns the encoded string on successful conversion. Optional argument NOCOPY says that the original string may be returned if does not differ from the encoded string. " (with-string-as-buffer-contents str (encode-coding-region (point-min) (point-max) coding-system))) ;;;; Coding system accessors (defun coding-system-mnemonic (coding-system) "Return the 'mnemonic property of CODING-SYSTEM." (coding-system-property coding-system 'mnemonic)) (defun coding-system-documentation (coding-system) "Return the 'documentation property of CODING-SYSTEM." (coding-system-property coding-system 'documentation)) (define-obsolete-function-alias 'coding-system-doc-string 'coding-system-description) (defun coding-system-eol-type (coding-system) "Return the 'eol-type property of CODING-SYSTEM." (coding-system-property coding-system 'eol-type)) (defun coding-system-eol-lf (coding-system) "Return the 'eol-lf property of CODING-SYSTEM." (coding-system-property coding-system 'eol-lf)) (defun coding-system-eol-crlf (coding-system) "Return the 'eol-crlf property of CODING-SYSTEM." (coding-system-property coding-system 'eol-crlf)) (defun coding-system-eol-cr (coding-system) "Return the 'eol-cr property of CODING-SYSTEM." (coding-system-property coding-system 'eol-cr)) (defun coding-system-post-read-conversion (coding-system) "Return the 'post-read-conversion property of CODING-SYSTEM." (coding-system-property coding-system 'post-read-conversion)) (defun coding-system-pre-write-conversion (coding-system) "Return the 'pre-write-conversion property of CODING-SYSTEM." (coding-system-property coding-system 'pre-write-conversion)) ;;; #### bleagh!!!!!!! (defun coding-system-get (coding-system prop) "Extract a value from CODING-SYSTEM's property list for property PROP." (or (plist-get (get (coding-system-name coding-system) 'coding-system-property) prop) (condition-case nil (coding-system-property coding-system prop) (error nil)))) (defun coding-system-put (coding-system prop value) "Change value in CODING-SYSTEM's property list PROP to VALUE." (put (coding-system-name coding-system) 'coding-system-property (plist-put (get (coding-system-name coding-system) 'coding-system-property) prop value))) (defun coding-system-category (coding-system) "Return the coding category of CODING-SYSTEM." (or (coding-system-get coding-system 'category) (case (coding-system-type coding-system) (no-conversion 'no-conversion) (shift-jis 'shift-jis) (unicode (case (coding-system-property coding-system 'unicode-type) (utf-8 (let ((bom (coding-system-property coding-system 'need-bom))) (cond (bom 'utf-8-bom) ((not bom) 'utf-8)))) (ucs-4 'ucs-4) (utf-16 (let ((bom (coding-system-property coding-system 'need-bom)) (le (coding-system-property coding-system 'little-endian))) (cond ((and bom le) 'utf-16-little-endian-bom) ((and bom (not le) 'utf-16-bom)) ((and (not bom) le) 'utf-16-little-endian) ((and (not bom) (not le) 'utf-16))))))) (big5 'big5) (iso2022 (cond ((coding-system-lock-shift coding-system) 'iso-lock-shift) ((coding-system-seven coding-system) 'iso-7) (t (let ((dim 0) ccs (i 0)) (while (< i 4) (setq ccs (declare-fboundp (coding-system-iso2022-charset coding-system i))) (if (and ccs (> (charset-dimension ccs) dim)) (setq dim (charset-dimension ccs)) ) (setq i (1+ i))) (cond ((= dim 1) 'iso-8-1) ((= dim 2) 'iso-8-2) (t 'iso-8-designate)))))) ))) ;;; Make certain variables equivalent to coding-system aliases (defun dontusethis-set-value-file-name-coding-system-handler (sym args fun harg handlers) (define-coding-system-alias 'file-name (or (car args) 'binary))) (dontusethis-set-symbol-value-handler 'file-name-coding-system 'set-value 'dontusethis-set-value-file-name-coding-system-handler) (defun dontusethis-set-value-terminal-coding-system-handler (sym args fun harg handlers) (define-coding-system-alias 'terminal (or (car args) 'binary))) (dontusethis-set-symbol-value-handler 'terminal-coding-system 'set-value 'dontusethis-set-value-terminal-coding-system-handler) (defun dontusethis-set-value-keyboard-coding-system-handler (sym args fun harg handlers) (define-coding-system-alias 'keyboard (or (car args) 'binary))) (dontusethis-set-symbol-value-handler 'keyboard-coding-system 'set-value 'dontusethis-set-value-keyboard-coding-system-handler) (when (not (featurep 'mule)) (define-coding-system-alias 'escape-quoted 'binary) ;; these are so that gnus and friends work when not mule (define-coding-system-alias 'iso-8859-1 'raw-text) ;; We're misrepresenting ourselves to the gnus code by saying we support ;; both. ; (define-coding-system-alias 'iso-8859-2 'raw-text) (define-coding-system-alias 'ctext 'raw-text)) (make-compatible-variable 'enable-multibyte-characters "Unimplemented") ;; Sure would be nice to be able to use defface here. (copy-face 'highlight 'query-coding-warning-face) (defvar default-query-coding-region-safe-charset-skip-chars-map #s(hash-table test equal data ()) "A map from list of charsets to `skip-chars-forward' arguments for them.") (defsubst query-coding-clear-highlights (begin end &optional buffer) "Remove extent faces added by `query-coding-region' between BEGIN and END. Optional argument BUFFER is the buffer to use, and defaults to the current buffer. The HIGHLIGHTP argument to `query-coding-region' indicates that it should display unencodable characters using `query-coding-warning-face'. After this function has been called, this will no longer be the case. " (map-extents #'(lambda (extent ignored-arg) (when (eq 'query-coding-warning-face (extent-face extent)) (delete-extent extent))) buffer begin end)) (defun* default-query-coding-region (begin end coding-system &optional buffer errorp highlightp) "The default `query-coding-region' implementation. Uses the `safe-charsets' and `safe-chars' coding system properties. The former is a list of XEmacs character sets that can be safely encoded by CODING-SYSTEM; the latter a char table describing, in addition, characters that can be safely encoded by CODING-SYSTEM." (check-argument-type #'coding-system-p (setq coding-system (find-coding-system coding-system))) (check-argument-type #'integer-or-marker-p begin) (check-argument-type #'integer-or-marker-p end) (let* ((safe-charsets (or (coding-system-get coding-system 'safe-charsets) (coding-system-get (coding-system-base coding-system) 'safe-charsets))) (safe-chars (or (coding-system-get coding-system 'safe-chars) (coding-system-get (coding-system-base coding-system) 'safe-chars))) (skip-chars-arg (gethash safe-charsets default-query-coding-region-safe-charset-skip-chars-map)) (ranges (make-range-table)) fail-range-start fail-range-end char-after looking-at-arg failed extent) ;; Coding systems with a value of t for safe-charsets support everything. (when (eq t safe-charsets) (return-from default-query-coding-region (values t nil))) (unless skip-chars-arg (setq skip-chars-arg (puthash safe-charsets (mapconcat #'charset-skip-chars-string safe-charsets "") default-query-coding-region-safe-charset-skip-chars-map))) (when highlightp (query-coding-clear-highlights begin end buffer)) (if (and (zerop (length skip-chars-arg)) (null safe-chars)) (progn ;; Uh-oh, nothing known about this coding system. Fail. (when errorp (error 'text-conversion-error "Coding system doesn't say what it can encode" (coding-system-name coding-system))) (put-range-table begin end t ranges) (when highlightp (setq extent (make-extent begin end buffer)) (set-extent-priority extent (+ mouse-highlight-priority 2)) (set-extent-face extent 'query-coding-warning-face)) (values nil ranges)) (setq looking-at-arg (if (equal "" skip-chars-arg) ;; Regexp that will never match. #r".\{0,0\}" (concat "[" skip-chars-arg "]"))) (save-excursion (goto-char begin buffer) (skip-chars-forward skip-chars-arg end buffer) (while (< (point buffer) end) ; (message ; "fail-range-start is %S, point is %S, end is %S" ; fail-range-start (point buffer) end) (setq char-after (char-after (point buffer) buffer) fail-range-start (point buffer)) (while (and (< (point buffer) end) (not (looking-at looking-at-arg)) (or (not safe-chars) (not (get-char-table char-after safe-chars)))) (forward-char 1 buffer) (setq char-after (char-after (point buffer) buffer) failed t)) (if (= fail-range-start (point buffer)) ;; The character can actually be encoded by the coding ;; system; check the characters past it. (forward-char 1 buffer) ;; Can't be encoded; note this. (when errorp (error 'text-conversion-error (format "Cannot encode %s using coding system" (buffer-substring fail-range-start (point buffer) buffer)) (coding-system-name coding-system))) (put-range-table fail-range-start ;; If char-after is non-nil, we're not at ;; the end of the buffer. (setq fail-range-end (if char-after (point buffer) (point-max buffer))) t ranges) (when highlightp (setq extent (make-extent fail-range-start fail-range-end buffer)) (set-extent-priority extent (+ mouse-highlight-priority 2)) (set-extent-face extent 'query-coding-warning-face))) (skip-chars-forward skip-chars-arg end buffer)) (if failed (values nil ranges) (values t nil)))))) (defun query-coding-region (start end coding-system &optional buffer errorp highlight) "Work out whether CODING-SYSTEM can losslessly encode a region. START and END are the beginning and end of the region to check. CODING-SYSTEM is the coding system to try. Optional argument BUFFER is the buffer to check, and defaults to the current buffer. Optional argument ERRORP says to signal a `text-conversion-error' if some character in the region cannot be encoded, and defaults to nil. Optional argument HIGHLIGHT says to display unencodable characters in the region using `query-coding-warning-face'. It defaults to nil. This function returns a list; the intention is that callers use `multiple-value-bind' or the related CL multiple value functions to deal with it. The first element is `t' if the region can be encoded using CODING-SYSTEM, or `nil' if not. The second element is `nil' if the region can be encoded using CODING-SYSTEM; otherwise, it is a range table describing the positions of the unencodable characters. See `make-range-table'." (funcall (or (coding-system-get coding-system 'query-coding-function) #'default-query-coding-region) start end coding-system buffer errorp highlight)) (defun query-coding-string (string coding-system &optional errorp highlight) "Work out whether CODING-SYSTEM can losslessly encode STRING. CODING-SYSTEM is the coding system to check. Optional argument ERRORP says to signal a `text-conversion-error' if some character in the region cannot be encoded, and defaults to nil. Optional argument HIGHLIGHT says to display unencodable characters in the region using `query-coding-warning-face'. It defaults to nil. This function returns a list; the intention is that callers use use `multiple-value-bind' or the related CL multiple value functions to deal with it. The first element is `t' if the string can be encoded using CODING-SYSTEM, or `nil' if not. The second element is `nil' if the string can be encoded using CODING-SYSTEM; otherwise, it is a range table describing the positions of the unencodable characters. See `make-range-table'." (with-temp-buffer (insert string) (query-coding-region (point-min) (point-max) coding-system (current-buffer) ;; ### Will highlight work here? errorp highlight))) (defun unencodable-char-position (start end coding-system &optional count string) "Return position of first un-encodable character in a region. START and END specify the region and CODING-SYSTEM specifies the encoding to check. Return nil if CODING-SYSTEM does encode the region. If optional 4th argument COUNT is non-nil, it specifies at most how many un-encodable characters to search. In this case, the value is a list of positions. If optional 5th argument STRING is non-nil, it is a string to search for un-encodable characters. In that case, START and END are indexes in the string." (let ((thunk #'(lambda (start end coding-system &optional count) (multiple-value-bind (result ranges) (query-coding-region start end coding-system) (if result nil (block worked-it-all-out (if count (map-range-table #'(lambda (begin end value) (while (and (< begin end) (< (length result) count)) (push begin result) (incf begin)) (when (= (length result) count) (return-from worked-it-all-out result))) ranges) (map-range-table #'(lambda (begin end value) (return-from worked-it-all-out begin)) ranges)) (assert (not (null count)) t "We should never reach this point with null COUNT.") result)))))) (check-argument-type #'integer-or-marker-p start) (check-argument-type #'integer-or-marker-p end) (check-coding-system coding-system) (and count (check-argument-type #'natnump count) ;; Special-case zero, sigh. (if (zerop count) (setq count 1))) (and string (check-argument-type #'stringp string)) (if string (with-temp-buffer (insert string) (funcall thunk start end coding-system count)) (funcall thunk start end coding-system count)))) (defun encode-coding-char (char coding-system) "Encode CHAR by CODING-SYSTEM and return the resulting string. If CODING-SYSTEM can't safely encode CHAR, return nil." (check-argument-type #'characterp char) (multiple-value-bind (succeededp) (query-coding-string char coding-system) (when succeededp (encode-coding-string char coding-system)))) (unless (featurep 'mule) ;; If we're under non-Mule, every XEmacs character can be encoded ;; with every XEmacs coding system. (fset #'default-query-coding-region #'(lambda (&rest ignored) (values t nil))) (unintern 'default-query-coding-region-safe-charset-skip-chars-map)) ;;; coding.el ends here