Mercurial > hg > xemacs-beta
diff tests/automated/query-coding-tests.el @ 4604:e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
lisp/ChangeLog addition:
2009-02-07 Aidan Kehoe <kehoea@parhasard.net>
* coding.el (query-coding-clear-highlights):
Rename the BUFFER argument to BUFFER-OR-STRING, describe it as
possibly being a string in its documentation.
(default-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document that this
function does not support it.
Bind case-fold-search to nil, we don't want this to influence what the
function thinks is encodable or not.
(query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does; reflect this new argument in the associated compiler macro.
(query-coding-string):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does. Support the HIGHLIGHT argument correctly.
* unicode.el (unicode-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does, implement this. Document a potential problem.
Use #'query-coding-clear-highlights instead of reimplementing it
ourselves.
Remove some debugging messages.
* mule/arabic.el (iso-8859-6):
* mule/cyrillic.el (iso-8859-5):
* mule/greek.el (iso-8859-7):
* mule/hebrew.el (iso-8859-8):
* mule/latin.el (iso-8859-2):
* mule/latin.el (iso-8859-3):
* mule/latin.el (iso-8859-4):
* mule/latin.el (iso-8859-14):
* mule/latin.el (iso-8859-15):
* mule/latin.el (iso-8859-16):
* mule/latin.el (iso-8859-9):
* mule/latin.el (windows-1252):
* mule/mule-coding.el (iso-8859-1):
Avoid the assumption that characters not given an explicit mapping
in these coding systems map to the ISO 8859-1 characters
corresponding to the octets on disk; this makes it much more
reasonable to implement the IGNORE-INVALID-SEQUENCESP argument to
query-coding-region.
* mule/mule-cmds.el (set-language-info):
Correct the docstring.
* mule/mule-cmds.el (finish-set-language-environment):
Treat invalid Unicode sequences produced from
invalid-sequence-coding-system and corresponding to control
characters the same as control characters in redisplay.
* mule/mule-cmds.el:
Document that encode-coding-char is available in coding.el
* mule/mule-coding.el (make-8-bit-generate-helper):
Change to return the both the encode-program generated and the
relevant non-ASCII charset; update the docstring to reflect this.
* mule/mule-coding.el
(make-8-bit-generate-encode-program-and-skip-chars-strings):
Rename this function; have it return skip-chars-strings as well as
the encode program. Have these skip-chars-strings use ranges for
charsets, where possible.
* mule/mule-coding.el (make-8-bit-create-decode-encode-tables):
Revise this to allow people to specify explicitly characters that
should be undefined (= corresponding to keys in
unicode-error-default-translation-table), and treating unspecified
octets above #x7f as undefined by default.
* mule/mule-coding.el (8-bit-fixed-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, implement support
for it using the 8-bit-fixed-invalid-sequences-skip-chars coding
system property; remove some debugging messages.
* mule/mule-coding.el (make-8-bit-coding-system):
This function is dumped, autoloading it makes no sense.
Document what happens when characters above #x7f are not
specified, implement this.
* mule/vietnamese.el:
Correct spelling.
tests/ChangeLog addition:
2009-02-07 Aidan Kehoe <kehoea@parhasard.net>
* automated/query-coding-tests.el:
Add FAILING-CASE arguments to the Assert calls, making #'q-c-debug
mostly unnecessary. Remove #'q-c-debug.
Add new tests that use the IGNORE-INVALID-SEQUENCESP argument to
#'query-coding-region; rework the existing ones to respect it.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 07 Feb 2009 17:13:37 +0000 |
parents | e6a7054a9c30 |
children | 8cbca852bcd4 |
line wrap: on
line diff
--- a/tests/automated/query-coding-tests.el Thu Feb 05 21:18:37 2009 -0500 +++ b/tests/automated/query-coding-tests.el Sat Feb 07 17:13:37 2009 +0000 @@ -31,28 +31,6 @@ (require 'bytecomp) -(defun q-c-debug (&rest aerger) - (let ((standard-output (get-buffer-create "query-coding-debug")) - (fmt (condition-case nil - (and (stringp (first aerger)) - (apply #'format aerger)) - (error nil)))) - (if fmt - (progn - (princ (apply #'format aerger)) - (terpri)) - (princ "--> ") - (let ((i 1)) - (dolist (sgra aerger) - (if (> i 1) (princ " ")) - (princ (format "%d. " i)) - (prin1 sgra) - (incf i)) - (terpri))))) - -;; Comment this out if debugging: -(defalias 'q-c-debug #'ignore) - (when (featurep 'mule) (let ((ascii-chars-string (apply #'string (loop for i from #x0 to #x7f @@ -64,7 +42,7 @@ (with-temp-buffer (insert ascii-chars-string) ;; First, check all the coding systems that are ASCII-transparent for - ;; ASCII-transparency in the check. + ;; ASCII-transparency in query-coding-region. (dolist (coding-system (delete-duplicates (mapcar #'(lambda (coding-system) @@ -87,76 +65,142 @@ unix-coding-system))) (coding-system-list nil)) :test #'eq)) - (q-c-debug "looking at coding system %S" (coding-system-name - coding-system)) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + (format "checking query-coding-region ASCII-transparency, %s" + coding-system)) + (Assert (null query-coding-table) + (format "checking query-coding-region ASCII-transparency, %s" + coding-system))) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-string ascii-chars-string coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table)))) + (Assert (eq t query-coding-succeeded) + (format "checking query-coding-string ASCII-transparency, %s" + coding-system)) + (Assert (null query-coding-table) + (format "checking query-coding-string ASCII-transparency, %s" + coding-system)))) (delete-region (point-min) (point-max)) ;; Check for success from the two Latin-1 coding systems (insert latin-1-chars-string) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'iso-8859-1-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking query-coding-region iso-8859-1-transparency") + (Assert (null query-coding-table) + "checking query-coding-region iso-8859-1-transparency")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-string (buffer-string) 'iso-8859-1-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking query-coding-string iso-8859-1-transparency") + (Assert (null query-coding-table) + "checking query-coding-string iso-8859-1-transparency")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-string (buffer-string) 'iso-latin-1-with-esc-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (eq t query-coding-succeeded) + "checking query-coding-region iso-latin-1-with-esc-transparency") + (Assert + (null query-coding-table) + "checking query-coding-region iso-latin-1-with-esc-transparency")) ;; Make it fail, check that it fails correctly (insert (decode-char 'ucs #x20AC)) ;; EURO SIGN (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'iso-8859-1-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open data - ((257 258) t))))) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, U+20AC, iso-8859-1") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open data + ((257 258) unencodable))) + "checking query-coding-region fails correctly, U+20AC, iso-8859-1")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'iso-latin-1-with-esc-unix) ;; Stupidly, this succeeds. The behaviour is compatible with ;; GNU, though, and we encourage people not to use ;; iso-latin-1-with-esc-unix anyway: - (Assert query-coding-succeeded) - (Assert (null query-coding-table))) + (Assert + query-coding-succeeded + "checking that query-coding-region succeeds, U+20AC, \ +iso-latin-with-esc-unix-1") + (Assert + (null query-coding-table) + "checking that query-coding-region succeeds, U+20AC, \ +iso-latin-with-esc-unix-1")) ;; Check that it errors correctly. (setq text-conversion-error-signalled nil) (condition-case nil - (query-coding-region (point-min) (point-max) 'iso-8859-1-unix nil t) + (query-coding-region (point-min) (point-max) 'iso-8859-1-unix + (current-buffer) nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert text-conversion-error-signalled) + (Assert + text-conversion-error-signalled + "checking query-coding-region signals text-conversion-error correctly") (setq text-conversion-error-signalled nil) (condition-case nil (query-coding-region (point-min) (point-max) - 'iso-latin-1-with-esc-unix nil t) + 'iso-latin-1-with-esc-unix nil nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert (null text-conversion-error-signalled)) + (Assert + (null text-conversion-error-signalled) + "checking query-coding-region doesn't signal text-conversion-error") (delete-region (point-min) (point-max)) (insert latin-1-chars-string) (decode-coding-region (point-min) (point-max) 'windows-1252-unix) (goto-char (point-max)) ;; #'decode-coding-region just messed up point. (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'windows-1252-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (null query-coding-succeeded) + "check query-coding-region fails, windows-1252, invalid-sequences") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((130 131) invalid-sequence + (142 143) invalid-sequence + (144 146) invalid-sequence + (158 159) invalid-sequence))) + "check query-coding-region fails, windows-1252, invalid-sequences")) + (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) 'windows-1252-unix + (current-buffer) t) + (Assert + (eq t query-coding-succeeded) + "checking that query-coding-region succeeds, U+20AC, windows-1252") + (Assert + (null query-coding-table) + "checking that query-coding-region succeeds, U+20AC, windows-1252")) (insert ?\x80) (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) 'windows-1252-unix + (current-buffer) t) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, U+0080, windows-1252") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open data + ((257 258) unencodable))) + "checking that query-coding-region fails, U+0080, windows-1252")) + (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'windows-1252-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open data - ((257 258) t))))) + (Assert + (null query-coding-succeeded) + "check query-coding-region fails, U+0080, invalid-sequence, cp1252") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((130 131) invalid-sequence + (142 143) invalid-sequence + (144 146) invalid-sequence + (158 159) invalid-sequence + (257 258) unencodable))) + "check query-coding-region fails, U+0080, invalid-sequence, cp1252")) ;; Try a similar approach with koi8-o, the koi8 variant with ;; support for Old Church Slavonic. (delete-region (point-min) (point-max)) @@ -164,29 +208,53 @@ (decode-coding-region (point-min) (point-max) 'koi8-o-unix) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'koi8-o-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (eq t query-coding-succeeded) + "checking that query-coding-region succeeds, koi8-o-unix") + (Assert + (null query-coding-table) + "checking that query-coding-region succeeds, koi8-o-unix")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'escape-quoted) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking that query-coding-region succeeds, escape-quoted") + (Assert (null query-coding-table) + "checking that query-coding-region succeeds, escape-quoted")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'windows-1252-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open - data ((129 131) t (132 133) t (139 140) t - (141 146) t (155 156) t (157 161) t - (162 170) t (173 176) t (178 187) t - (189 192) t (193 257) t))))) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, windows-1252 and Cyrillic") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((129 131) unencodable + (132 133) unencodable + (139 140) unencodable + (141 146) unencodable + (155 156) unencodable + (157 161) unencodable + (162 170) unencodable + (173 176) unencodable + (178 187) unencodable + (189 192) unencodable + (193 257) unencodable))) + "checking that query-coding-region fails, windows-1252 and Cyrillic")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'koi8-r-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open - data ((129 154) t (155 161) t (162 164) t - (165 177) t (178 180) t - (181 192) t))))) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, koi8-r and OCS characters") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((129 154) unencodable + (155 161) unencodable + (162 164) unencodable + (165 177) unencodable + (178 180) unencodable + (181 192) unencodable))) + "checking that query-coding-region fails, koi8-r and OCS characters")) ;; Check that the Unicode coding systems handle characters ;; without Unicode mappings. (delete-region (point-min) (point-max)) @@ -210,19 +278,29 @@ utf-16-little-endian-bom)) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) coding-system) - (Assert (null query-coding-succeeded)) + (Assert (null query-coding-succeeded) + "checking unicode coding systems fail with unmapped chars") (Assert (equal query-coding-table #s(range-table type start-closed-end-open data - ((173 174) t (209 210) t - (254 255) t))))) + ((173 174) unencodable + (209 210) unencodable + (254 255) unencodable))) + "checking unicode coding systems fail with unmapped chars")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) 173 coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking unicode coding systems succeed sans unmapped chars") + (Assert + (null query-coding-table) + "checking unicode coding systems succeed sans unmapped chars")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region 174 209 coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (eq t query-coding-succeeded) + "checking unicode coding systems succeed sans unmapped chars, again") + (Assert + (null query-coding-table) + "checking unicode coding systems succeed sans unmapped chars again")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region 210 254 coding-system) (Assert (eq t query-coding-succeeded)) @@ -230,77 +308,143 @@ ;; Check that it errors correctly. (setq text-conversion-error-signalled nil) (condition-case nil - (query-coding-region (point-min) (point-max) coding-system nil t) + (query-coding-region (point-min) (point-max) coding-system + (current-buffer) nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert text-conversion-error-signalled) + (Assert text-conversion-error-signalled + "checking that unicode coding systems error correctly") (setq text-conversion-error-signalled nil) (condition-case nil - (query-coding-region (point-min) 173 coding-system nil t) + (query-coding-region (point-min) 173 coding-system + (current-buffer) + nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert (null text-conversion-error-signalled))) + (Assert + (null text-conversion-error-signalled) + "checking that unicode coding systems do not error when unnecessary")) + (delete-region (point-min) (point-max)) + (insert (decode-coding-string "\xff\xff\xff\xff" + 'greek-iso-8bit-with-esc)) + (insert (decode-coding-string "\xff\xff\xff\xff" 'utf-8)) + (insert (decode-coding-string "\xff\xff\xff\xff" + 'greek-iso-8bit-with-esc)) + (dolist (coding-system '(utf-8 utf-16 utf-16-little-endian + utf-32 utf-32-little-endian)) + (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) coding-system) + (Assert (null query-coding-succeeded) + (format + "checking %s fails with unmapped chars and invalid seqs" + coding-system)) + (Assert (equal query-coding-table + #s(range-table type start-closed-end-open + data ((1 5) unencodable + (5 9) invalid-sequence + (9 13) unencodable))) + (format + "checking %s fails with unmapped chars and invalid seqs" + coding-system))) + (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) coding-system + (current-buffer) t) + (Assert (null query-coding-succeeded) + (format + "checking %s fails with unmapped chars sans invalid seqs" + coding-system)) + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((1 5) unencodable + (9 13) unencodable))) + (format + "checking %s fails correctly, unmapped chars sans invalid seqs" + coding-system)))) ;; Now to test #'encode-coding-char. Most of the functionality was ;; tested in the query-coding-region tests above, so we don't go into ;; as much detail here. - (Assert (null (encode-coding-char - (decode-char 'ucs #x20ac) 'iso-8859-1))) - (Assert (equal "\x80" (encode-coding-char - (decode-char 'ucs #x20ac) 'windows-1252))) + (Assert + (null (encode-coding-char + (decode-char 'ucs #x20ac) 'iso-8859-1)) + "check #'encode-coding-char doesn't think iso-8859-1 handles U+20AC") + (Assert + (equal "\x80" (encode-coding-char + (decode-char 'ucs #x20ac) 'windows-1252)) + "check #'encode-coding-char doesn't think windows-1252 handles U+0080") (delete-region (point-min) (point-max)) ;; And #'unencodable-char-position. (insert latin-1-chars-string) (insert (decode-char 'ucs #x20ac)) - (Assert (= 257 (unencodable-char-position (point-min) (point-max) - 'iso-8859-1))) - (Assert (equal '(257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 1))) + (Assert + (= 257 (unencodable-char-position (point-min) (point-max) + 'iso-8859-1)) + "check #'unencodable-char-position doesn't think latin-1 encodes U+20AC") + (Assert + (equal '(257) (unencodable-char-position (point-min) (point-max) + 'iso-8859-1 1)) + "check #'unencodable-char-position doesn't think latin-1 encodes U+20AC") ;; Compatiblity, sigh: - (Assert (equal '(257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 0))) + (Assert + (equal '(257) (unencodable-char-position (point-min) (point-max) + 'iso-8859-1 0)) + "check #'unencodable-char-position has some borked GNU semantics") (dotimes (i 6) (insert (decode-char 'ucs #x20ac))) ;; Check if it stops at one: (Assert (equal '(257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 1))) + 'iso-8859-1 1)) + "check #'unencodable-char-position stops at 1 when asked to") ;; Check if it stops at four: (Assert (equal '(260 259 258 257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 4))) + 'iso-8859-1 4)) + "check #'unencodable-char-position stops at 4 when asked to") ;; Check whether it stops at seven: (Assert (equal '(263 262 261 260 259 258 257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 7))) + 'iso-8859-1 7)) + "check #'unencodable-char-position stops at 7 when asked to") ;; Check that it still stops at seven: (Assert (equal '(263 262 261 260 259 258 257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 2000))) + 'iso-8859-1 2000)) + "check #'unencodable-char-position stops at 7 if 2000 asked for") ;; Now, #'check-coding-systems-region. ;; UTF-8 should certainly be able to encode these characters: (Assert (eq t (check-coding-systems-region (point-min) (point-max) - '(utf-8)))) - (Assert (equal '((iso-8859-1 257 258 259 260 261 262 263) - (windows-1252 129 131 132 133 134 135 136 137 138 139 - 140 141 143 146 147 148 149 150 151 152 - 153 154 155 156 157 159 160)) - (sort - (check-coding-systems-region (point-min) (point-max) - '(utf-8 iso-8859-1 - windows-1252)) - ;; (The sort is to make the algorithm irrelevant.) - #'(lambda (left right) - (string< (car left) (car right)))))) + '(utf-8))) + "check #'check-coding-systems-region gives t if encoding works") + (Assert + (equal '((iso-8859-1 257 258 259 260 261 262 263) + (windows-1252 129 130 131 132 133 134 135 136 + 137 138 139 140 141 142 143 144 + 145 146 147 148 149 150 151 152 + 153 154 155 156 157 158 159 160)) + (sort + (check-coding-systems-region (point-min) (point-max) + '(utf-8 iso-8859-1 + windows-1252)) + ;; (The sort is to make the algorithm irrelevant.) + #'(lambda (left right) + (string< (car left) (car right))))) + "check #'check-coding-systems-region behaves well given a list") ;; Ensure that the indices are all decreased by one when passed a ;; string: - (Assert (equal '((iso-8859-1 256 257 258 259 260 261 262) - (windows-1252 128 130 131 132 133 134 135 136 137 138 - 139 140 142 145 146 147 148 149 150 151 - 152 153 154 155 156 158 159)) - (sort - (check-coding-systems-region (buffer-string) nil - '(utf-8 iso-8859-1 - windows-1252)) - #'(lambda (left right) - (string< (car left) (car right))))))))) + (Assert + (equal '((iso-8859-1 256 257 258 259 260 261 262) + (windows-1252 128 129 130 131 132 133 134 135 + 136 137 138 139 140 141 142 143 + 144 145 146 147 148 149 150 151 + 152 153 154 155 156 157 158 159)) + (sort + (check-coding-systems-region (buffer-string) nil + '(utf-8 iso-8859-1 + windows-1252)) + #'(lambda (left right) + (string< (car left) (car right))))) + "check #'check-coding-systems-region behaves given a string and list")))) + +