comparison lisp/mule/mule-cmds.el @ 4604:e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
lisp/ChangeLog addition:
2009-02-07 Aidan Kehoe <kehoea@parhasard.net>
* coding.el (query-coding-clear-highlights):
Rename the BUFFER argument to BUFFER-OR-STRING, describe it as
possibly being a string in its documentation.
(default-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document that this
function does not support it.
Bind case-fold-search to nil; we don't want this to influence what the
function thinks is encodable or not.
(query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does; reflect this new argument in the associated compiler macro.
(query-coding-string):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does. Support the HIGHLIGHT argument correctly.
* unicode.el (unicode-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does, implement this. Document a potential problem.
Use #'query-coding-clear-highlights instead of reimplementing it
ourselves.
Remove some debugging messages.
* mule/arabic.el (iso-8859-6):
* mule/cyrillic.el (iso-8859-5):
* mule/greek.el (iso-8859-7):
* mule/hebrew.el (iso-8859-8):
* mule/latin.el (iso-8859-2):
* mule/latin.el (iso-8859-3):
* mule/latin.el (iso-8859-4):
* mule/latin.el (iso-8859-14):
* mule/latin.el (iso-8859-15):
* mule/latin.el (iso-8859-16):
* mule/latin.el (iso-8859-9):
* mule/latin.el (windows-1252):
* mule/mule-coding.el (iso-8859-1):
Avoid the assumption that characters not given an explicit mapping
in these coding systems map to the ISO 8859-1 characters
corresponding to the octets on disk; this makes it much more
reasonable to implement the IGNORE-INVALID-SEQUENCESP argument to
query-coding-region.
* mule/mule-cmds.el (set-language-info):
Correct the docstring.
* mule/mule-cmds.el (finish-set-language-environment):
Treat invalid Unicode sequences produced from
invalid-sequence-coding-system and corresponding to control
characters the same as control characters in redisplay.
* mule/mule-cmds.el:
Document that encode-coding-char is available in coding.el
* mule/mule-coding.el (make-8-bit-generate-helper):
Change to return both the encode-program generated and the
relevant non-ASCII charset; update the docstring to reflect this.
* mule/mule-coding.el
(make-8-bit-generate-encode-program-and-skip-chars-strings):
Rename this function; have it return skip-chars-strings as well as
the encode program. Have these skip-chars-strings use ranges for
charsets, where possible.
* mule/mule-coding.el (make-8-bit-create-decode-encode-tables):
Revise this to allow people to specify explicitly characters that
should be undefined (= corresponding to keys in
unicode-error-default-translation-table), and to treat unspecified
octets above #x7f as undefined by default.
* mule/mule-coding.el (8-bit-fixed-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, implement support
for it using the 8-bit-fixed-invalid-sequences-skip-chars coding
system property; remove some debugging messages.
* mule/mule-coding.el (make-8-bit-coding-system):
This function is dumped, autoloading it makes no sense.
Document what happens when characters above #x7f are not
specified, implement this.
* mule/vietnamese.el:
Correct spelling.
tests/ChangeLog addition:
2009-02-07 Aidan Kehoe <kehoea@parhasard.net>
* automated/query-coding-tests.el:
Add FAILING-CASE arguments to the Assert calls, making #'q-c-debug
mostly unnecessary. Remove #'q-c-debug.
Add new tests that use the IGNORE-INVALID-SEQUENCESP argument to
#'query-coding-region; rework the existing ones to respect it.
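The finish-set-language-environment entry above says that invalid sequences corresponding to control characters are displayed the same way as control characters. As a rough illustration only, the caret notation in question can be sketched in isolation as below. The helper name `caret-notation-sketch` is hypothetical and does not exist in mule-cmds.el; its body simply mirrors the `(format "^%c" (+ ?@ ...))` expression visible in the comparison further down.

```elisp
;; Hypothetical helper, not part of mule-cmds.el.  It isolates the
;; caret-notation step: adding the code of ?@ (64) to a control
;; character's code gives the character shown after the caret.
(defun caret-notation-sketch (code)
  "Return caret notation for the control character with code CODE."
  (format "^%c" (+ ?@ code)))

(caret-notation-sketch #x01)   ; "^A"
(caret-notation-sketch #x1b)   ; "^["
```

In the changeset itself the resulting string is then wrapped in a glyph and stored in the display table; that part is left out here.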
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 07 Feb 2009 17:13:37 +0000 |
parents | c83cab5a4f04 |
children | ba06a6cae484 |
comparison
4603:202cb69c4d87 | 4604:e0a8715fdb1f |
---|---|
229 | 229 |
230 invalid-sequence-coding-system | 230 invalid-sequence-coding-system |
231 VALUE is a fixed-width 8-bit coding system used to | 231 VALUE is a fixed-width 8-bit coding system used to |
232 display Unicode error sequences (using a face to make | 232 display Unicode error sequences (using a face to make |
233 it clear that the data is invalid). In Western Europe | 233 it clear that the data is invalid). In Western Europe |
234 this is normally windows-1252; in the Russia and the | 234 and the Americas this is normally windows-1252; in |
235 former Soviet Union koi8-ru or windows-1251 makes more | 235 Russia and the former Soviet Union koi8-ru or |
236 sense." | 236 windows-1251 makes more sense." |
237 (if (symbolp lang-env) | 237 (if (symbolp lang-env) |
238 (setq lang-env (symbol-name lang-env))) | 238 (setq lang-env (symbol-name lang-env))) |
239 (let (lang-slot prop-slot) | 239 (let (lang-slot prop-slot) |
240 (setq lang-slot (assoc lang-env language-info-alist)) | 240 (setq lang-slot (assoc lang-env language-info-alist)) |
241 (if (null lang-slot) ; If no slot for the language, add it. | 241 (if (null lang-slot) ; If no slot for the language, add it. |
769 (funcall func))) | 769 (funcall func))) |
770 | 770 |
771 (let ((invalid-sequence-coding-system | 771 (let ((invalid-sequence-coding-system |
772 (get-language-info language-name 'invalid-sequence-coding-system)) | 772 (get-language-info language-name 'invalid-sequence-coding-system)) |
773 (disp-table (specifier-instance current-display-table)) | 773 (disp-table (specifier-instance current-display-table)) |
774 glyph string) | 774 glyph string unicode-error-lookup) |
775 (when (consp invalid-sequence-coding-system) | 775 (when (consp invalid-sequence-coding-system) |
776 (setq invalid-sequence-coding-system | 776 (setq invalid-sequence-coding-system |
777 (car invalid-sequence-coding-system))) | 777 (car invalid-sequence-coding-system))) |
778 (map-char-table | 778 (map-char-table |
779 #'(lambda (key entry) | 779 #'(lambda (key entry) |
780 (setq string (decode-coding-string (string entry) | 780 (setq string (decode-coding-string (string entry) |
781 invalid-sequence-coding-system)) | 781 invalid-sequence-coding-system)) |
782 ;; Treat control characters specially: | 782 (when (= 1 (length string)) |
783 (when (string-match "^[\x00-\x1f\x80-\x9f]$" string) | 783 ;; Treat control characters specially: |
784 (setq string (format "^%c" (+ ?@ (aref string 0))))) | 784 (cond |
785 ((string-match "^[\x00-\x1f\x80-\x9f]$" string) | |
786 (setq string (format "^%c" (+ ?@ (aref string 0))))) | |
787 ((setq unicode-error-lookup | |
788 (get-char-table (aref string 0) | |
789 unicode-error-default-translation-table)) | |
790 (setq string (format "^%c" (+ ?@ unicode-error-lookup)))))) | |
785 (setq glyph (make-glyph (vector 'string :data string))) | 791 (setq glyph (make-glyph (vector 'string :data string))) |
786 (set-glyph-face glyph 'unicode-invalid-sequence-warning-face) | 792 (set-glyph-face glyph 'unicode-invalid-sequence-warning-face) |
787 (put-char-table key glyph disp-table) | 793 (put-char-table key glyph disp-table) |
788 nil) | 794 nil) |
789 unicode-error-default-translation-table)) | 795 unicode-error-default-translation-table)) |
937 (?\x8f . "SS3") | 943 (?\x8f . "SS3") |
938 (?\x9b . "CSI"))) | 944 (?\x9b . "CSI"))) |
939 | 945 |
940 (defun encoded-string-description (str coding-system) | 946 (defun encoded-string-description (str coding-system) |
941 "Return a pretty description of STR that is encoded by CODING-SYSTEM." | 947 "Return a pretty description of STR that is encoded by CODING-SYSTEM." |
942 ; (setq str (string-as-unibyte str)) | 948 ;; XEmacs; no transformation to unibyte. |
943 (mapconcat | 949 (mapconcat |
944 (if (and coding-system (eq (coding-system-type coding-system) 'iso2022)) | 950 (if (and coding-system (eq (coding-system-type coding-system) 'iso2022)) |
945 ;; Try to get a pretty description for ISO 2022 escape sequences. | 951 ;; Try to get a pretty description for ISO 2022 escape sequences. |
946 (function (lambda (x) (or (cdr (assq x iso-2022-control-alist)) | 952 (function (lambda (x) (or (cdr (assq x iso-2022-control-alist)) |
947 (format "#x%02X" x)))) | 953 (format "#x%02X" x)))) |
948 (function (lambda (x) (format "#x%02X" x)))) | 954 (function (lambda (x) (format "#x%02X" x)))) |
949 str " ")) | 955 str " ")) |
950 | 956 |
951 ;; (defun encode-coding-char (char coding-system) | 957 ;; XEmacs; |
952 ;; "Encode CHAR by CODING-SYSTEM and return the resulting string. | 958 ;; (defun encode-coding-char (char coding-system) in coding.el. |
953 ;; If CODING-SYSTEM can't safely encode CHAR, return nil." | |
954 ;; (if (cmpcharp char) | |
955 ;; (setq char (car (decompose-composite-char char 'list)))) | |
956 ;; (let ((str1 (char-to-string char)) | |
957 ;; (str2 (make-string 2 char)) | |
958 ;; (safe-charsets (and coding-system | |
959 ;; (coding-system-get coding-system 'safe-charsets))) | |
960 ;; enc1 enc2 i1 i2) | |
961 ;; (when (or (eq safe-charsets t) | |
962 ;; (memq (char-charset char) safe-charsets)) | |
963 ;; ;; We must find the encoded string of CHAR. But, just encoding | |
964 ;; ;; CHAR will put extra control sequences (usually to designate | |
965 ;; ;; ASCII charset) at the tail if type of CODING is ISO 2022. | |
966 ;; ;; To exclude such tailing bytes, we at first encode one-char | |
967 ;; ;; string and two-char string, then check how many bytes at the | |
968 ;; ;; tail of both encoded strings are the same. | |
969 ;; | |
970 ;; (setq enc1 (string-as-unibyte (encode-coding-string str1 coding-system)) | |
971 ;; i1 (length enc1) | |
972 ;; enc2 (string-as-unibyte (encode-coding-string str2 coding-system)) | |
973 ;; i2 (length enc2)) | |
974 ;; (while (and (> i1 0) (= (aref enc1 (1- i1)) (aref enc2 (1- i2)))) | |
975 ;; (setq i1 (1- i1) i2 (1- i2))) | |
976 ;; | |
977 ;; ;; Now (substring enc1 i1) and (substring enc2 i2) are the same, | |
978 ;; ;; and they are the extra control sequences at the tail to | |
979 ;; ;; exclude. | |
980 ;; (substring enc2 0 i2)))) | |
981 | 959 |
982 | 960 |
983 ;; #### The following section is utter junk from mule-misc.el. | 961 ;; #### The following section is utter junk from mule-misc.el. |
984 ;; I've deleted everything that's not referenced in mule-packages and | 962 ;; I've deleted everything that's not referenced in mule-packages and |
985 ;; not in FSF 20.6; there's no point in keeping old namespace-polluting | 963 ;; not in FSF 20.6; there's no point in keeping old namespace-polluting |