xemacs-beta: lisp/mule/mule-cmds.el comparison

comparison lisp/mule/mule-cmds.el @ 4604:e0a8715fdb1f

Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.

lisp/ChangeLog addition:

2009-02-07  Aidan Kehoe  <kehoea@parhasard.net>

* coding.el (query-coding-clear-highlights):
  Rename the BUFFER argument to BUFFER-OR-STRING, describe it as possibly being a string in its documentation.
  (default-query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document that this function does not support it. Bind case-fold-search to nil, we don't want this to influence what the function thinks is encodable or not.
  (query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document what it does; reflect this new argument in the associated compiler macro.
  (query-coding-string):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document what it does. Support the HIGHLIGHT argument correctly.
* unicode.el (unicode-query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document what it does, implement this. Document a potential problem. Use #'query-coding-clear-highlights instead of reimplementing it ourselves. Remove some debugging messages.
* mule/arabic.el (iso-8859-6):
* mule/cyrillic.el (iso-8859-5):
* mule/greek.el (iso-8859-7):
* mule/hebrew.el (iso-8859-8):
* mule/latin.el (iso-8859-2):
* mule/latin.el (iso-8859-3):
* mule/latin.el (iso-8859-4):
* mule/latin.el (iso-8859-14):
* mule/latin.el (iso-8859-15):
* mule/latin.el (iso-8859-16):
* mule/latin.el (iso-8859-9):
* mule/latin.el (windows-1252):
* mule/mule-coding.el (iso-8859-1):
  Avoid the assumption that characters not given an explicit mapping in these coding systems map to the ISO 8859-1 characters corresponding to the octets on disk; this makes it much more reasonable to implement the IGNORE-INVALID-SEQUENCESP argument to query-coding-region.
* mule/mule-cmds.el (set-language-info):
  Correct the docstring.
* mule/mule-cmds.el (finish-set-language-environment):
  Treat invalid Unicode sequences produced from invalid-sequence-coding-system and corresponding to control characters the same as control characters in redisplay.
* mule/mule-cmds.el:
  Document that encode-coding-char is available in coding.el.
* mule/mule-coding.el (make-8-bit-generate-helper):
  Change to return both the encode-program generated and the relevant non-ASCII charset; update the docstring to reflect this.
* mule/mule-coding.el (make-8-bit-generate-encode-program-and-skip-chars-strings):
  Rename this function; have it return skip-chars-strings as well as the encode program. Have these skip-chars-strings use ranges for charsets, where possible.
* mule/mule-coding.el (make-8-bit-create-decode-encode-tables):
  Revise this to allow people to specify explicitly characters that should be undefined (= corresponding to keys in unicode-error-default-translation-table), and to treat unspecified octets above #x7f as undefined by default.
* mule/mule-coding.el (8-bit-fixed-query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, implement support for it using the 8-bit-fixed-invalid-sequences-skip-chars coding system property; remove some debugging messages.
* mule/mule-coding.el (make-8-bit-coding-system):
  This function is dumped, autoloading it makes no sense. Document what happens when characters above #x7f are not specified, implement this.
* mule/vietnamese.el:
  Correct spelling.

tests/ChangeLog addition:

2009-02-07  Aidan Kehoe  <kehoea@parhasard.net>

* automated/query-coding-tests.el:
  Add FAILING-CASE arguments to the Assert calls, making #'q-c-debug mostly unnecessary. Remove #'q-c-debug. Add new tests that use the IGNORE-INVALID-SEQUENCESP argument to #'query-coding-region; rework the existing ones to respect it.
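
The finish-set-language-environment change displays control characters in caret notation by adding the code to ?@ and formatting the result with "^%c". A minimal sketch of that arithmetic follows; the helper name my-caret-notation is hypothetical and does not appear in mule-cmds.el, and the sketch covers only the C0 range (0 to 31), not the display-table handling in the diff.

```elisp
;; Hypothetical helper for illustration only; not part of mule-cmds.el.
;; It mirrors the (format "^%c" (+ ?@ ...)) expression used in the diff.
(defun my-caret-notation (code)
  "Return caret notation for the C0 control code CODE (0 to 31)."
  ;; ?@ is character code 64, so code 1 maps to ?A, code 27 to ?[.
  (format "^%c" (+ ?@ code)))

(my-caret-notation 1)   ; "^A"
(my-caret-notation 27)  ; "^["
```

In the diff, the same expression is also applied to the value looked up in unicode-error-default-translation-table, so that invalid-sequence characters corresponding to control characters get the same caret form.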

author	Aidan Kehoe <kehoea@parhasard.net>
date	Sat, 07 Feb 2009 17:13:37 +0000
parents	c83cab5a4f04
children	ba06a6cae484

-:202cb69c4d87
+:e0a8715fdb1f
 invalid-sequence-coding-system
 VALUE is a fixed-width 8-bit coding system used to
 display Unicode error sequences (using a face to make
 it clear that the data is invalid).  In Western Europe
-this is normally windows-1252; in the Russia and the
+and the Americas this is normally windows-1252; in
-former Soviet Union koi8-ru or windows-1251 makes more
+Russia and the former Soviet Union koi8-ru or
-sense."
+windows-1251 makes more sense."
 (if (symbolp lang-env)
 (setq lang-env (symbol-name lang-env)))
 (let (lang-slot prop-slot)
 (setq lang-slot (assoc lang-env language-info-alist))
 (if (null lang-slot)		; If no slot for the language, add it.
 	(funcall func)))
 (let ((invalid-sequence-coding-system
 (get-language-info language-name 'invalid-sequence-coding-system))
 (disp-table (specifier-instance current-display-table))
-glyph string)
+glyph string unicode-error-lookup)
 (when (consp invalid-sequence-coding-system)
 (setq invalid-sequence-coding-system
 (car invalid-sequence-coding-system)))
 (map-char-table
 #'(lambda (key entry)
 (setq string (decode-coding-string (string entry)
 invalid-sequence-coding-system))
-;; Treat control characters specially:
+(when (= 1 (length string))
-(when (string-match "^[\x00-\x1f\x80-\x9f]$" string)
+;; Treat control characters specially:
-(setq string (format "^%c" (+ ?@ (aref string 0)))))
+(cond
+((string-match "^[\x00-\x1f\x80-\x9f]$" string)
+(setq string (format "^%c" (+ ?@ (aref string 0)))))
+((setq unicode-error-lookup
+(get-char-table (aref string 0)
+unicode-error-default-translation-table))
+(setq string (format "^%c" (+ ?@ unicode-error-lookup))))))
 (setq glyph (make-glyph (vector 'string :data string)))
 (set-glyph-face glyph 'unicode-invalid-sequence-warning-face)
 (put-char-table key glyph disp-table)
 nil)
 unicode-error-default-translation-table))
 (?\x8f . "SS3")
 (?\x9b . "CSI")))
 (defun encoded-string-description (str coding-system)
 "Return a pretty description of STR that is encoded by CODING-SYSTEM."
-;  (setq str (string-as-unibyte str))
+;; XEmacs; no transformation to unibyte.
 (mapconcat
 (if (and coding-system (eq (coding-system-type coding-system) 'iso2022))
 ;; Try to get a pretty description for ISO 2022 escape sequences.
 (function (lambda (x) (or (cdr (assq x iso-2022-control-alist))
 				 (format "#x%02X" x))))
 (function (lambda (x) (format "#x%02X" x))))
 str " "))
-;; (defun encode-coding-char (char coding-system)
+;; XEmacs;
-;;   "Encode CHAR by CODING-SYSTEM and return the resulting string.
+;; (defun encode-coding-char (char coding-system) in coding.el.
-;; If CODING-SYSTEM can't safely encode CHAR, return nil."
-;;   (if (cmpcharp char)
-;;       (setq char (car (decompose-composite-char char 'list))))
-;;   (let ((str1 (char-to-string char))
-;;         (str2 (make-string 2 char))
-;;         (safe-charsets (and coding-system
-;;                             (coding-system-get coding-system 'safe-charsets)))
-;;         enc1 enc2 i1 i2)
-;;     (when (or (eq safe-charsets t)
-;;               (memq (char-charset char) safe-charsets))
-;;       ;; We must find the encoded string of CHAR.  But, just encoding
-;;       ;; CHAR will put extra control sequences (usually to designate
-;;       ;; ASCII charset) at the tail if type of CODING is ISO 2022.
-;;       ;; To exclude such tailing bytes, we at first encode one-char
-;;       ;; string and two-char string, then check how many bytes at the
-;;       ;; tail of both encoded strings are the same.
-;;
-;;       (setq enc1 (string-as-unibyte (encode-coding-string str1 coding-system))
-;;             i1 (length enc1)
-;;             enc2 (string-as-unibyte (encode-coding-string str2 coding-system))
-;;             i2 (length enc2))
-;;       (while (and (> i1 0) (= (aref enc1 (1- i1)) (aref enc2 (1- i2))))
-;;         (setq i1 (1- i1) i2 (1- i2)))
-;;
-;;       ;; Now (substring enc1 i1) and (substring enc2 i2) are the same,
-;;       ;; and they are the extra control sequences at the tail to
-;;       ;; exclude.
-;;       (substring enc2 0 i2))))
 ;; #### The following section is utter junk from mule-misc.el.
 ;; I've deleted everything that's not referenced in mule-packages and
 ;; not in FSF 20.6; there's no point in keeping old namespace-polluting
