comparison lisp/mule/mule-cmds.el @ 4604:e0a8715fdb1f

Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.

lisp/ChangeLog addition:

2009-02-07  Aidan Kehoe  <kehoea@parhasard.net>

  * coding.el (query-coding-clear-highlights):
  Rename the BUFFER argument to BUFFER-OR-STRING, describe it as possibly being a string in its documentation.
  (default-query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document that this function does not support it. Bind case-fold-search to nil, we don't want this to influence what the function thinks is encodable or not.
  (query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document what it does; reflect this new argument in the associated compiler macro.
  (query-coding-string):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document what it does. Support the HIGHLIGHT argument correctly.
  * unicode.el (unicode-query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, document what it does, implement this. Document a potential problem. Use #'query-coding-clear-highlights instead of reimplementing it ourselves. Remove some debugging messages.
  * mule/arabic.el (iso-8859-6):
  * mule/cyrillic.el (iso-8859-5):
  * mule/greek.el (iso-8859-7):
  * mule/hebrew.el (iso-8859-8):
  * mule/latin.el (iso-8859-2):
  * mule/latin.el (iso-8859-3):
  * mule/latin.el (iso-8859-4):
  * mule/latin.el (iso-8859-14):
  * mule/latin.el (iso-8859-15):
  * mule/latin.el (iso-8859-16):
  * mule/latin.el (iso-8859-9):
  * mule/latin.el (windows-1252):
  * mule/mule-coding.el (iso-8859-1):
  Avoid the assumption that characters not given an explicit mapping in these coding systems map to the ISO 8859-1 characters corresponding to the octets on disk; this makes it much more reasonable to implement the IGNORE-INVALID-SEQUENCESP argument to query-coding-region.
  * mule/mule-cmds.el (set-language-info):
  Correct the docstring.
  * mule/mule-cmds.el (finish-set-language-environment):
  Treat invalid Unicode sequences produced from invalid-sequence-coding-system and corresponding to control characters the same as control characters in redisplay.
  * mule/mule-cmds.el:
  Document that encode-coding-char is available in coding.el.
  * mule/mule-coding.el (make-8-bit-generate-helper):
  Change to return both the encode-program generated and the relevant non-ASCII charset; update the docstring to reflect this.
  * mule/mule-coding.el (make-8-bit-generate-encode-program-and-skip-chars-strings):
  Rename this function; have it return skip-chars-strings as well as the encode program. Have these skip-chars-strings use ranges for charsets, where possible.
  * mule/mule-coding.el (make-8-bit-create-decode-encode-tables):
  Revise this to allow people to specify explicitly characters that should be undefined (= corresponding to keys in unicode-error-default-translation-table), and to treat unspecified octets above #x7f as undefined by default.
  * mule/mule-coding.el (8-bit-fixed-query-coding-region):
  Add a new IGNORE-INVALID-SEQUENCESP argument, implement support for it using the 8-bit-fixed-invalid-sequences-skip-chars coding system property; remove some debugging messages.
  * mule/mule-coding.el (make-8-bit-coding-system):
  This function is dumped, autoloading it makes no sense. Document what happens when characters above #x7f are not specified, implement this.
  * mule/vietnamese.el:
  Correct spelling.

tests/ChangeLog addition:

2009-02-07  Aidan Kehoe  <kehoea@parhasard.net>

  * automated/query-coding-tests.el:
  Add FAILING-CASE arguments to the Assert calls, making #'q-c-debug mostly unnecessary. Remove #'q-c-debug. Add new tests that use the IGNORE-INVALID-SEQUENCESP argument to #'query-coding-region; rework the existing ones to respect it.
author Aidan Kehoe <kehoea@parhasard.net>
date Sat, 07 Feb 2009 17:13:37 +0000
parents c83cab5a4f04
children ba06a6cae484
4603:202cb69c4d87 4604:e0a8715fdb1f
229 229
230 invalid-sequence-coding-system 230 invalid-sequence-coding-system
231 VALUE is a fixed-width 8-bit coding system used to 231 VALUE is a fixed-width 8-bit coding system used to
232 display Unicode error sequences (using a face to make 232 display Unicode error sequences (using a face to make
233 it clear that the data is invalid). In Western Europe 233 it clear that the data is invalid). In Western Europe
234 this is normally windows-1252; in the Russia and the 234 and the Americas this is normally windows-1252; in
235 former Soviet Union koi8-ru or windows-1251 makes more 235 Russia and the former Soviet Union koi8-ru or
236 sense." 236 windows-1251 makes more sense."
237 (if (symbolp lang-env) 237 (if (symbolp lang-env)
238 (setq lang-env (symbol-name lang-env))) 238 (setq lang-env (symbol-name lang-env)))
239 (let (lang-slot prop-slot) 239 (let (lang-slot prop-slot)
240 (setq lang-slot (assoc lang-env language-info-alist)) 240 (setq lang-slot (assoc lang-env language-info-alist))
241 (if (null lang-slot) ; If no slot for the language, add it. 241 (if (null lang-slot) ; If no slot for the language, add it.
769 (funcall func))) 769 (funcall func)))
770 770
771 (let ((invalid-sequence-coding-system 771 (let ((invalid-sequence-coding-system
772 (get-language-info language-name 'invalid-sequence-coding-system)) 772 (get-language-info language-name 'invalid-sequence-coding-system))
773 (disp-table (specifier-instance current-display-table)) 773 (disp-table (specifier-instance current-display-table))
774 glyph string) 774 glyph string unicode-error-lookup)
775 (when (consp invalid-sequence-coding-system) 775 (when (consp invalid-sequence-coding-system)
776 (setq invalid-sequence-coding-system 776 (setq invalid-sequence-coding-system
777 (car invalid-sequence-coding-system))) 777 (car invalid-sequence-coding-system)))
778 (map-char-table 778 (map-char-table
779 #'(lambda (key entry) 779 #'(lambda (key entry)
780 (setq string (decode-coding-string (string entry) 780 (setq string (decode-coding-string (string entry)
781 invalid-sequence-coding-system)) 781 invalid-sequence-coding-system))
782 ;; Treat control characters specially: 782 (when (= 1 (length string))
783 (when (string-match "^[\x00-\x1f\x80-\x9f]$" string) 783 ;; Treat control characters specially:
784 (setq string (format "^%c" (+ ?@ (aref string 0))))) 784 (cond
785 ((string-match "^[\x00-\x1f\x80-\x9f]$" string)
786 (setq string (format "^%c" (+ ?@ (aref string 0)))))
787 ((setq unicode-error-lookup
788 (get-char-table (aref string 0)
789 unicode-error-default-translation-table))
790 (setq string (format "^%c" (+ ?@ unicode-error-lookup))))))
785 (setq glyph (make-glyph (vector 'string :data string))) 791 (setq glyph (make-glyph (vector 'string :data string)))
786 (set-glyph-face glyph 'unicode-invalid-sequence-warning-face) 792 (set-glyph-face glyph 'unicode-invalid-sequence-warning-face)
787 (put-char-table key glyph disp-table) 793 (put-char-table key glyph disp-table)
788 nil) 794 nil)
789 unicode-error-default-translation-table)) 795 unicode-error-default-translation-table))
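
Note on the hunk above: both branches of the new `cond` build the display string with `(format "^%c" (+ ?@ ...))`, which is ordinary caret notation. A minimal sketch of that arithmetic for C0 control codes follows; `my-caret-notation` is a hypothetical helper name used only for illustration and is not part of mule-cmds.el.

```elisp
;; Hypothetical helper, not part of mule-cmds.el: it only isolates the
;; arithmetic used in the hunk above.
(defun my-caret-notation (code)
  "Return the caret-notation string for the C0 control code CODE (0 to 31)."
  ;; ?@ is character code 64, so adding a control code in the range
  ;; 0..31 yields the printable character conventionally shown after `^'.
  (format "^%c" (+ ?@ code)))

(my-caret-notation 1)   ; => "^A"
(my-caret-notation 27)  ; => "^["
```

In the second branch of the hunk the value added to `?@` is not the character itself but whatever `unicode-error-default-translation-table` stores for that character, so an invalid sequence that stands for a control octet ends up displayed like the control character it corresponds to.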
937 (?\x8f . "SS3") 943 (?\x8f . "SS3")
938 (?\x9b . "CSI"))) 944 (?\x9b . "CSI")))
939 945
940 (defun encoded-string-description (str coding-system) 946 (defun encoded-string-description (str coding-system)
941 "Return a pretty description of STR that is encoded by CODING-SYSTEM." 947 "Return a pretty description of STR that is encoded by CODING-SYSTEM."
942 ; (setq str (string-as-unibyte str)) 948 ;; XEmacs; no transformation to unibyte.
943 (mapconcat 949 (mapconcat
944 (if (and coding-system (eq (coding-system-type coding-system) 'iso2022)) 950 (if (and coding-system (eq (coding-system-type coding-system) 'iso2022))
945 ;; Try to get a pretty description for ISO 2022 escape sequences. 951 ;; Try to get a pretty description for ISO 2022 escape sequences.
946 (function (lambda (x) (or (cdr (assq x iso-2022-control-alist)) 952 (function (lambda (x) (or (cdr (assq x iso-2022-control-alist))
947 (format "#x%02X" x)))) 953 (format "#x%02X" x))))
948 (function (lambda (x) (format "#x%02X" x)))) 954 (function (lambda (x) (format "#x%02X" x))))
949 str " ")) 955 str " "))
950 956
951 ;; (defun encode-coding-char (char coding-system) 957 ;; XEmacs;
952 ;; "Encode CHAR by CODING-SYSTEM and return the resulting string. 958 ;; (defun encode-coding-char (char coding-system) in coding.el.
953 ;; If CODING-SYSTEM can't safely encode CHAR, return nil."
954 ;; (if (cmpcharp char)
955 ;; (setq char (car (decompose-composite-char char 'list))))
956 ;; (let ((str1 (char-to-string char))
957 ;; (str2 (make-string 2 char))
958 ;; (safe-charsets (and coding-system
959 ;; (coding-system-get coding-system 'safe-charsets)))
960 ;; enc1 enc2 i1 i2)
961 ;; (when (or (eq safe-charsets t)
962 ;; (memq (char-charset char) safe-charsets))
963 ;; ;; We must find the encoded string of CHAR. But, just encoding
964 ;; ;; CHAR will put extra control sequences (usually to designate
965 ;; ;; ASCII charset) at the tail if type of CODING is ISO 2022.
966 ;; ;; To exclude such tailing bytes, we at first encode one-char
967 ;; ;; string and two-char string, then check how many bytes at the
968 ;; ;; tail of both encoded strings are the same.
969 ;;
970 ;; (setq enc1 (string-as-unibyte (encode-coding-string str1 coding-system))
971 ;; i1 (length enc1)
972 ;; enc2 (string-as-unibyte (encode-coding-string str2 coding-system))
973 ;; i2 (length enc2))
974 ;; (while (and (> i1 0) (= (aref enc1 (1- i1)) (aref enc2 (1- i2))))
975 ;; (setq i1 (1- i1) i2 (1- i2)))
976 ;;
977 ;; ;; Now (substring enc1 i1) and (substring enc2 i2) are the same,
978 ;; ;; and they are the extra control sequences at the tail to
979 ;; ;; exclude.
980 ;; (substring enc2 0 i2))))
981 959
982 960
983 ;; #### The following section is utter junk from mule-misc.el. 961 ;; #### The following section is utter junk from mule-misc.el.
984 ;; I've deleted everything that's not referenced in mule-packages and 962 ;; I've deleted everything that's not referenced in mule-packages and
985 ;; not in FSF 20.6; there's no point in keeping old namespace-polluting 963 ;; not in FSF 20.6; there's no point in keeping old namespace-polluting