# HG changeset patch
# User Aidan Kehoe
# Date 1234026817 0
# Node ID e0a8715fdb1fbeeee7cad2fe6d5d44fe1e44455f
# Parent  202cb69c4d87c4d04b06676366e8d57a60149691
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.

lisp/ChangeLog addition:

2009-02-07  Aidan Kehoe

	* coding.el (query-coding-clear-highlights):
	Rename the BUFFER argument to BUFFER-OR-STRING, describe it as
	possibly being a string in its documentation.
	(default-query-coding-region):
	Add a new IGNORE-INVALID-SEQUENCESP argument, document that this
	function does not support it.
	Bind case-fold-search to nil; we don't want this to influence what the
	function thinks is encodable or not.
	(query-coding-region):
	Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
	does; reflect this new argument in the associated compiler macro.
	(query-coding-string):
	Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
	does. Support the HIGHLIGHT argument correctly.
	* unicode.el (unicode-query-coding-region):
	Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
	does, implement this. Document a potential problem.
	Use #'query-coding-clear-highlights instead of reimplementing it
	ourselves.
	Remove some debugging messages.
	* mule/arabic.el (iso-8859-6):
	* mule/cyrillic.el (iso-8859-5):
	* mule/greek.el (iso-8859-7):
	* mule/hebrew.el (iso-8859-8):
	* mule/latin.el (iso-8859-2):
	* mule/latin.el (iso-8859-3):
	* mule/latin.el (iso-8859-4):
	* mule/latin.el (iso-8859-14):
	* mule/latin.el (iso-8859-15):
	* mule/latin.el (iso-8859-16):
	* mule/latin.el (iso-8859-9):
	* mule/latin.el (windows-1252):
	* mule/mule-coding.el (iso-8859-1):
	Avoid the assumption that characters not given an explicit mapping
	in these coding systems map to the ISO 8859-1 characters
	corresponding to the octets on disk; this makes it much more
	reasonable to implement the IGNORE-INVALID-SEQUENCESP argument to
	query-coding-region.
	* mule/mule-cmds.el (set-language-info):
	Correct the docstring.
	* mule/mule-cmds.el (finish-set-language-environment):
	Treat invalid Unicode sequences produced from
	invalid-sequence-coding-system and corresponding to control
	characters the same as control characters in redisplay.
	* mule/mule-cmds.el:
	Document that encode-coding-char is available in coding.el.
	* mule/mule-coding.el (make-8-bit-generate-helper):
	Change to return both the encode program generated and the
	relevant non-ASCII charset; update the docstring to reflect this.
	* mule/mule-coding.el
	(make-8-bit-generate-encode-program-and-skip-chars-strings):
	Rename this function; have it return skip-chars-strings as well as
	the encode program. Have these skip-chars-strings use ranges for
	charsets, where possible.
	* mule/mule-coding.el (make-8-bit-create-decode-encode-tables):
	Revise this to allow people to specify explicitly characters that
	should be undefined (= corresponding to keys in
	unicode-error-default-translation-table), and to treat unspecified
	octets above #x7f as undefined by default.
	* mule/mule-coding.el (8-bit-fixed-query-coding-region):
	Add a new IGNORE-INVALID-SEQUENCESP argument, implement support
	for it using the 8-bit-fixed-invalid-sequences-skip-chars coding
	system property; remove some debugging messages.
	* mule/mule-coding.el (make-8-bit-coding-system):
	This function is dumped; autoloading it makes no sense.
	Document what happens when characters above #x7f are not
	specified, implement this.
	* mule/vietnamese.el:
	Correct spelling.

tests/ChangeLog addition:

2009-02-07  Aidan Kehoe

	* automated/query-coding-tests.el:
	Add FAILING-CASE arguments to the Assert calls, making #'q-c-debug
	mostly unnecessary. Remove #'q-c-debug.
	Add new tests that use the IGNORE-INVALID-SEQUENCESP argument to
	#'query-coding-region; rework the existing ones to respect it.
diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/ChangeLog
--- a/lisp/ChangeLog	Thu Feb 05 21:18:37 2009 -0500
+++ b/lisp/ChangeLog	Sat Feb 07 17:13:37 2009 +0000
@@ -1,3 +1,75 @@
+2009-02-07  Aidan Kehoe
+
+	* coding.el (query-coding-clear-highlights):
+	Rename the BUFFER argument to BUFFER-OR-STRING, describe it as
+	possibly being a string in its documentation.
+	(default-query-coding-region):
+	Add a new IGNORE-INVALID-SEQUENCESP argument, document that this
+	function does not support it.
+	Bind case-fold-search to nil; we don't want this to influence what the
+	function thinks is encodable or not.
+	(query-coding-region):
+	Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
+	does; reflect this new argument in the associated compiler macro.
+	(query-coding-string):
+	Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
+	does. Support the HIGHLIGHT argument correctly.
+	* unicode.el (unicode-query-coding-region):
+	Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
+	does, implement this. Document a potential problem.
+	Use #'query-coding-clear-highlights instead of reimplementing it
+	ourselves.
+	Remove some debugging messages.
+	* mule/arabic.el (iso-8859-6):
+	* mule/cyrillic.el (iso-8859-5):
+	* mule/greek.el (iso-8859-7):
+	* mule/hebrew.el (iso-8859-8):
+	* mule/latin.el (iso-8859-2):
+	* mule/latin.el (iso-8859-3):
+	* mule/latin.el (iso-8859-4):
+	* mule/latin.el (iso-8859-14):
+	* mule/latin.el (iso-8859-15):
+	* mule/latin.el (iso-8859-16):
+	* mule/latin.el (iso-8859-9):
+	* mule/latin.el (windows-1252):
+	* mule/mule-coding.el (iso-8859-1):
+	Avoid the assumption that characters not given an explicit mapping
+	in these coding systems map to the ISO 8859-1 characters
+	corresponding to the octets on disk; this makes it much more
+	reasonable to implement the IGNORE-INVALID-SEQUENCESP argument to
+	query-coding-region.
+	* mule/mule-cmds.el (set-language-info):
+	Correct the docstring.
+	* mule/mule-cmds.el (finish-set-language-environment):
+	Treat invalid Unicode sequences produced from
+	invalid-sequence-coding-system and corresponding to control
+	characters the same as control characters in redisplay.
+	* mule/mule-cmds.el:
+	Document that encode-coding-char is available in coding.el.
+	* mule/mule-coding.el (make-8-bit-generate-helper):
+	Change to return both the encode program generated and the
+	relevant non-ASCII charset; update the docstring to reflect this.
+	* mule/mule-coding.el
+	(make-8-bit-generate-encode-program-and-skip-chars-strings):
+	Rename this function; have it return skip-chars-strings as well as
+	the encode program. Have these skip-chars-strings use ranges for
+	charsets, where possible.
+	* mule/mule-coding.el (make-8-bit-create-decode-encode-tables):
+	Revise this to allow people to specify explicitly characters that
+	should be undefined (= corresponding to keys in
+	unicode-error-default-translation-table), and to treat unspecified
+	octets above #x7f as undefined by default.
+	* mule/mule-coding.el (8-bit-fixed-query-coding-region):
+	Add a new IGNORE-INVALID-SEQUENCESP argument, implement support
+	for it using the 8-bit-fixed-invalid-sequences-skip-chars coding
+	system property; remove some debugging messages.
+	* mule/mule-coding.el (make-8-bit-coding-system):
+	This function is dumped; autoloading it makes no sense.
+	Document what happens when characters above #x7f are not
+	specified, implement this.
+	* mule/vietnamese.el:
+	Correct spelling.
+
 2009-02-04  Aidan Kehoe
 
 	* help.el:
diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/coding.el
--- a/lisp/coding.el	Thu Feb 05 21:18:37 2009 -0500
+++ b/lisp/coding.el	Sat Feb 07 17:13:37 2009 +0000
@@ -288,11 +288,11 @@
   #s(hash-table test equal data ())
   "A map from list of charsets to `skip-chars-forward' arguments for them.")
 
-(defsubst query-coding-clear-highlights (begin end &optional buffer)
+(defsubst query-coding-clear-highlights (begin end &optional buffer-or-string)
   "Remove extent faces added by `query-coding-region' between BEGIN and END.
 
-Optional argument BUFFER is the buffer to use, and defaults to the current
-buffer.
+Optional argument BUFFER-OR-STRING is the buffer or string to use, and
+defaults to the current buffer.
 
 The HIGHLIGHTP argument to `query-coding-region' indicates that it should
 display unencodable characters using `query-coding-warning-face'. After
@@ -300,16 +300,19 @@
   (map-extents #'(lambda (extent ignored-arg)
                    (when (eq 'query-coding-warning-face
                              (extent-face extent))
-                     (delete-extent extent))) buffer begin end))
+                     (delete-extent extent))) buffer-or-string begin end))
 
 (defun* default-query-coding-region (begin end coding-system
-                                     &optional buffer errorp highlightp)
+                                     &optional buffer ignore-invalid-sequencesp
+                                     errorp highlightp)
   "The default `query-coding-region' implementation.
 
 Uses the `safe-charsets' and `safe-chars' coding system properties.
 The former is a list of XEmacs character sets that can be safely
 encoded by CODING-SYSTEM; the latter a char table describing, in
-addition, characters that can be safely encoded by CODING-SYSTEM."
+addition, characters that can be safely encoded by CODING-SYSTEM.
+
+Does not support IGNORE-INVALID-SEQUENCESP."
   (check-argument-type #'coding-system-p
                        (setq coding-system (find-coding-system coding-system)))
   (check-argument-type #'integer-or-marker-p begin)
@@ -326,6 +329,7 @@
                 (gethash safe-charsets
                          default-query-coding-region-safe-charset-skip-chars-map))
                (ranges (make-range-table))
+               (case-fold-search nil)
                fail-range-start fail-range-end char-after
                looking-at-arg failed extent)
           ;; Coding systems with a value of t for safe-charsets support everything.
@@ -401,41 +405,27 @@
             (values t nil))))))
 
 (defun query-coding-region (start end coding-system &optional buffer
-                            errorp highlight)
+                            ignore-invalid-sequencesp errorp highlight)
   "Work out whether CODING-SYSTEM can losslessly encode a region.
 
 START and END are the beginning and end of the region to check.
 CODING-SYSTEM is the coding system to try.
 
 Optional argument BUFFER is the buffer to check, and defaults to the current
-buffer. Optional argument ERRORP says to signal a `text-conversion-error'
-if some character in the region cannot be encoded, and defaults to nil.
-
-Optional argument HIGHLIGHT says to display unencodable characters in the
-region using `query-coding-warning-face'. It defaults to nil.
+buffer.
 
-This function returns a list; the intention is that callers use
-`multiple-value-bind' or the related CL multiple value functions to deal
-with it. The first element is `t' if the region can be encoded using
-CODING-SYSTEM, or `nil' if not. The second element is `nil' if the region
-can be encoded using CODING-SYSTEM; otherwise, it is a range table
-describing the positions of the unencodable characters. See
-`make-range-table'."
-  (funcall (or (coding-system-get coding-system 'query-coding-function)
-               #'default-query-coding-region)
-           start end coding-system buffer errorp highlight))
+IGNORE-INVALID-SEQUENCESP, also an optional argument, says to treat XEmacs
+characters which have an unambiguous encoded representation, despite being
+undefined in what they represent, as encodable. These chiefly arise with
+variable-length encodings like UTF-8 and UTF-16, where an invalid sequence
+is passed through to XEmacs as a sequence of characters with a defined
+correspondence to the octets on disk, but no non-error semantics; see the
+`invalid-sequence-coding-system' argument to `set-language-info'.
 
-(define-compiler-macro query-coding-region (start end coding-system
-                                           &optional buffer errorp highlight)
-  `(funcall (or (coding-system-get ,coding-system 'query-coding-function)
-                #'default-query-coding-region)
-            ,start ,end ,coding-system ,@(append (if buffer (list buffer))
-                                                  (if errorp (list errorp))
-                                                  (if highlight (list highlight)))))
-
-(defun query-coding-string (string coding-system &optional errorp highlight)
-  "Work out whether CODING-SYSTEM can losslessly encode STRING.
-CODING-SYSTEM is the coding system to check.
+They can also arise with fixed-length encodings like ISO 8859-7, where
+certain octets on disk have undefined values, and treating them as
+corresponding to the ISO 8859-1 characters with the same numerical values
+may lead to data that is not understood by other applications.
 
 Optional argument ERRORP says to signal a `text-conversion-error' if some
 character in the region cannot be encoded, and defaults to nil.
@@ -443,28 +433,94 @@
 
 Optional argument HIGHLIGHT says to display unencodable characters in the
 region using `query-coding-warning-face'. It defaults to nil.
 
-This function returns a list; the intention is that callers use use
+This function returns a list; the intention is that callers use
 `multiple-value-bind' or the related CL multiple value functions to deal
-with it. The first element is `t' if the string can be encoded using
-CODING-SYSTEM, or `nil' if not. The second element is `nil' if the string
+with it. The first element is `t' if the region can be encoded using
+CODING-SYSTEM, or `nil' if not. The second element is `nil' if the region
 can be encoded using CODING-SYSTEM; otherwise, it is a range table
-describing the positions of the unencodable characters. See
-`make-range-table'."
+describing the positions of the unencodable characters. Ranges that
+describe characters that would be ignored were IGNORE-INVALID-SEQUENCESP
+non-nil map to the symbol `invalid-sequence'; other ranges map to the symbol
+`unencodable'. If IGNORE-INVALID-SEQUENCESP is non-nil, all ranges will map
+to the symbol `unencodable'. See `make-range-table' for more details of
+range tables."
+  (funcall (or (coding-system-get coding-system 'query-coding-function)
+               #'default-query-coding-region)
+           start end coding-system buffer ignore-invalid-sequencesp errorp
+           highlight))
+
+(define-compiler-macro query-coding-region (start end coding-system
+                                           &optional buffer
+                                           ignore-invalid-sequencesp
+                                           errorp highlight)
+  `(funcall (or (coding-system-get ,coding-system 'query-coding-function)
+                #'default-query-coding-region)
+            ,start ,end ,coding-system ,@(append (when (or buffer
+                                                           ignore-invalid-sequencesp
+                                                           errorp highlight)
+                                                   (list buffer))
+                                                 (when (or ignore-invalid-sequencesp
+                                                           errorp highlight)
+                                                   (list ignore-invalid-sequencesp))
+                                                 (when (or errorp highlight)
+                                                   (list errorp))
+                                                 (when highlight (list highlight)))))
+
+(defun query-coding-string (string coding-system &optional
+                            ignore-invalid-sequencesp errorp highlight)
+  "Work out whether CODING-SYSTEM can losslessly encode STRING.
+CODING-SYSTEM is the coding system to check.
+
+IGNORE-INVALID-SEQUENCESP, an optional argument, says to treat XEmacs
+characters which have an unambiguous encoded representation, despite being
+undefined in what they represent, as encodable. These chiefly arise with
+variable-length encodings like UTF-8 and UTF-16, where an invalid sequence
+is passed through to XEmacs as a sequence of characters with a defined
+correspondence to the octets on disk, but no non-error semantics; see the
+`invalid-sequence-coding-system' argument to `set-language-info'.
+
+They can also arise with fixed-length encodings like ISO 8859-7, where
+certain octets on disk have undefined values, and treating them as
+corresponding to the ISO 8859-1 characters with the same numerical values
+may lead to data that is not understood by other applications.
+
+Optional argument ERRORP says to signal a `text-conversion-error' if some
+character in the string cannot be encoded, and defaults to nil.
+
+Optional argument HIGHLIGHT says to display unencodable characters in the
+string using `query-coding-warning-face'. It defaults to nil.
+
+This function returns a list; the intention is that callers use
+`multiple-value-bind' or the related CL multiple value functions to deal
+with it. The first element is `t' if the string can be encoded using
+CODING-SYSTEM, or `nil' if not. The second element is `nil' if the string
+can be encoded using CODING-SYSTEM; otherwise, it is a range table
+describing the positions of the unencodable characters. Ranges that
+describe characters that would be ignored were IGNORE-INVALID-SEQUENCESP
+non-nil map to the symbol `invalid-sequence'; other ranges map to the symbol
+`unencodable'. If IGNORE-INVALID-SEQUENCESP is non-nil, all ranges will map
+to the symbol `unencodable'. See `make-range-table' for more details of
+range tables."
   (with-temp-buffer
+    (when highlight
+      (query-coding-clear-highlights 0 (length string) string))
     (insert string)
-    (multiple-value-bind (result ranges)
+    (multiple-value-bind (result ranges extent)
         (query-coding-region (point-min) (point-max) coding-system
-                             (current-buffer) errorp
-                             ;; #### Highlight won't work here,
-                             ;; query-coding-region may need to be modified.
-                             highlight)
+                             (current-buffer) ignore-invalid-sequencesp
+                             errorp)
       (unless result
-        ;; Sigh, string indices are zero-based, buffer offsets are
-        ;; one-based.
         (map-range-table
          #'(lambda (begin end value)
+             ;; Sigh, string indices are zero-based, buffer offsets are
+             ;; one-based.
              (remove-range-table begin end ranges)
-             (put-range-table (1- begin) (1- end) value ranges))
+             (put-range-table (decf begin) (decf end) value ranges)
+             (when highlight
+               (setq extent (make-extent begin end string))
+               (set-extent-priority extent (+ mouse-highlight-priority 2))
+               (set-extent-property extent 'duplicable t)
+               (set-extent-face extent 'query-coding-warning-face)))
          ranges))
       (values result ranges))))
diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/arabic.el
--- a/lisp/mule/arabic.el	Thu Feb 05 21:18:37 2009 -0500
+++ b/lisp/mule/arabic.el	Sat Feb 07 17:13:37 2009 +0000
@@ -33,7 +33,39 @@
 
 (make-8-bit-coding-system
  'iso-8859-6
- '((#xA0 ?\u00A0) ;; NO-BREAK SPACE
+ '((#x80 ?\u0080) ;;
+   (#x81 ?\u0081) ;;
+   (#x82 ?\u0082) ;;
+   (#x83 ?\u0083) ;;
+   (#x84 ?\u0084) ;;
+   (#x85 ?\u0085) ;;
+   (#x86 ?\u0086) ;;
+   (#x87 ?\u0087) ;;
+   (#x88 ?\u0088) ;;
+   (#x89 ?\u0089) ;;
+   (#x8A ?\u008A) ;;
+   (#x8B ?\u008B) ;;
+   (#x8C ?\u008C) ;;
+   (#x8D ?\u008D) ;;
+   (#x8E ?\u008E) ;;
+   (#x8F ?\u008F) ;;
+   (#x90 ?\u0090) ;;
+   (#x91 ?\u0091) ;;
+   (#x92 ?\u0092) ;;
+   (#x93 ?\u0093) ;;
+   (#x94 ?\u0094) ;;
+   (#x95 ?\u0095) ;;
+   (#x96 ?\u0096) ;;
+   (#x97 ?\u0097) ;;
+   (#x98 ?\u0098) ;;
+   (#x99 ?\u0099) ;;
+   (#x9A ?\u009A) ;;
+   (#x9B ?\u009B) ;;
+   (#x9C ?\u009C) ;;
+   (#x9D ?\u009D) ;;
+   (#x9E ?\u009E) ;;
+   (#x9F ?\u009F) ;;
+   (#xA0 ?\u00A0) ;; NO-BREAK SPACE
    (#xA4 ?\u00A4) ;; CURRENCY SIGN
    (#xAC ?\u060C) ;; ARABIC COMMA
    (#xAD ?\u00AD) ;; SOFT HYPHEN
diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/cyrillic.el
--- a/lisp/mule/cyrillic.el	Thu Feb 05 21:18:37 2009 -0500
+++ b/lisp/mule/cyrillic.el	Sat Feb 07 17:13:37 2009 +0000
@@ -108,7 +108,40 @@
 ;; And create the coding system.
 (make-8-bit-coding-system
  'iso-8859-5
- '((#xA1 ?\u0401) ;; CYRILLIC CAPITAL LETTER IO
+ '((#x80 ?\u0080) ;;
+   (#x81 ?\u0081) ;;
+   (#x82 ?\u0082) ;;
+   (#x83 ?\u0083) ;;
+   (#x84 ?\u0084) ;;
+   (#x85 ?\u0085) ;;
+   (#x86 ?\u0086) ;;
+   (#x87 ?\u0087) ;;
+   (#x88 ?\u0088) ;;
+   (#x89 ?\u0089) ;;
+   (#x8A ?\u008A) ;;
+   (#x8B ?\u008B) ;;
+   (#x8C ?\u008C) ;;
+   (#x8D ?\u008D) ;;
+   (#x8E ?\u008E) ;;
+   (#x8F ?\u008F) ;;
+   (#x90 ?\u0090) ;;
+   (#x91 ?\u0091) ;;
+   (#x92 ?\u0092) ;;
+   (#x93 ?\u0093) ;;
+   (#x94 ?\u0094) ;;
+   (#x95 ?\u0095) ;;
+   (#x96 ?\u0096) ;;
+   (#x97 ?\u0097) ;;
+   (#x98 ?\u0098) ;;
+   (#x99 ?\u0099) ;;
+   (#x9A ?\u009A) ;;
+   (#x9B ?\u009B) ;;
+   (#x9C ?\u009C) ;;
+   (#x9D ?\u009D) ;;
+   (#x9E ?\u009E) ;;
+   (#x9F ?\u009F) ;;
+   (#xA0 ?\u00A0) ;; NO-BREAK SPACE
+   (#xA1 ?\u0401) ;; CYRILLIC CAPITAL LETTER IO
    (#xA2 ?\u0402) ;; CYRILLIC CAPITAL LETTER DJE
    (#xA3 ?\u0403) ;; CYRILLIC CAPITAL LETTER GJE
    (#xA4 ?\u0404) ;; CYRILLIC CAPITAL LETTER UKRAINIAN IE
@@ -120,6 +153,7 @@
    (#xAA ?\u040A) ;; CYRILLIC CAPITAL LETTER NJE
    (#xAB ?\u040B) ;; CYRILLIC CAPITAL LETTER TSHE
    (#xAC ?\u040C) ;; CYRILLIC CAPITAL LETTER KJE
+   (#xAD ?\u00AD) ;; SOFT HYPHEN
    (#xAE ?\u040E) ;; CYRILLIC CAPITAL LETTER SHORT U
    (#xAF ?\u040F) ;; CYRILLIC CAPITAL LETTER DZHE
    (#xB0 ?\u0410) ;; CYRILLIC CAPITAL LETTER A
@@ -205,7 +239,7 @@
  "ISO-8859-5 (Cyrillic)"
  '(mnemonic "ISO8/Cyr"
    documentation "The ISO standard for encoding Cyrillic. Not used in practice.
-See `koi8-r' and `windows-1250'. "
+See `koi8-r' and `windows-1251'. "
    aliases (cyrillic-iso-8bit)))
 
 ;; Provide this locale; but don't allow it to be picked up from the Unix
diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/greek.el
--- a/lisp/mule/greek.el	Thu Feb 05 21:18:37 2009 -0500
+++ b/lisp/mule/greek.el	Sat Feb 07 17:13:37 2009 +0000
@@ -120,19 +120,67 @@
 
 (make-8-bit-coding-system
  'iso-8859-7
- '((#xA1 ?\u2018) ;; LEFT SINGLE QUOTATION MARK
+ '((#x80 ?\u0080) ;;
+   (#x81 ?\u0081) ;;
+   (#x82 ?\u0082) ;;
+   (#x83 ?\u0083) ;;
+   (#x84 ?\u0084) ;;
+   (#x85 ?\u0085) ;;
+   (#x86 ?\u0086) ;;
+   (#x87 ?\u0087) ;;
+   (#x88 ?\u0088) ;;
+   (#x89 ?\u0089) ;;
+   (#x8A ?\u008A) ;;
+   (#x8B ?\u008B) ;;
+   (#x8C ?\u008C) ;;
+   (#x8D ?\u008D) ;;
+   (#x8E ?\u008E) ;;
+   (#x8F ?\u008F) ;;
+   (#x90 ?\u0090) ;;
+   (#x91 ?\u0091) ;;
+   (#x92 ?\u0092) ;;
+   (#x93 ?\u0093) ;;
+   (#x94 ?\u0094) ;;
+   (#x95 ?\u0095) ;;
+   (#x96 ?\u0096) ;;
+   (#x97 ?\u0097) ;;
+   (#x98 ?\u0098) ;;
+   (#x99 ?\u0099) ;;
+   (#x9A ?\u009A) ;;
+   (#x9B ?\u009B) ;;
+   (#x9C ?\u009C) ;;
+   (#x9D ?\u009D) ;;
+   (#x9E ?\u009E) ;;
+   (#x9F ?\u009F) ;;
+   (#xA0 ?\u00A0) ;; NO-BREAK SPACE
+   (#xA1 ?\u2018) ;; LEFT SINGLE QUOTATION MARK
    (#xA2 ?\u2019) ;; RIGHT SINGLE QUOTATION MARK
+   (#xA3 ?\u00A3) ;; POUND SIGN
    (#xA4 ?\u20AC) ;; EURO SIGN
    (#xA5 ?\u20AF) ;; DRACHMA SIGN
+   (#xA6 ?\u00A6) ;; BROKEN BAR
+   (#xA7 ?\u00A7) ;; SECTION SIGN
+   (#xA8 ?\u00A8) ;; DIAERESIS
+   (#xA9 ?\u00A9) ;; COPYRIGHT SIGN
    (#xAA ?\u037A) ;; GREEK YPOGEGRAMMENI
+   (#xAB ?\u00AB) ;; LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+   (#xAC ?\u00AC) ;; NOT SIGN
+   (#xAD ?\u00AD) ;; SOFT HYPHEN
    (#xAF ?\u2015) ;; HORIZONTAL BAR
+   (#xB0 ?\u00B0) ;; DEGREE SIGN
+   (#xB1 ?\u00B1) ;; PLUS-MINUS SIGN
+   (#xB2 ?\u00B2) ;; SUPERSCRIPT TWO
+   (#xB3 ?\u00B3) ;; SUPERSCRIPT THREE
    (#xB4 ?\u0384) ;; GREEK TONOS
    (#xB5 ?\u0385) ;; GREEK DIALYTIKA TONOS
    (#xB6 ?\u0386) ;; GREEK CAPITAL LETTER ALPHA WITH TONOS
+   (#xB7 ?\u00B7) ;; MIDDLE DOT
    (#xB8 ?\u0388) ;; GREEK CAPITAL LETTER EPSILON WITH TONOS
    (#xB9 ?\u0389) ;; GREEK CAPITAL LETTER ETA WITH TONOS
    (#xBA ?\u038A) ;; GREEK CAPITAL LETTER IOTA WITH TONOS
+   (#xBB ?\u00BB) ;; RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    (#xBC ?\u038C) ;; GREEK CAPITAL LETTER OMICRON WITH TONOS
+   (#xBD ?\u00BD) ;; VULGAR FRACTION ONE HALF
    (#xBE ?\u038E) ;; GREEK CAPITAL LETTER UPSILON WITH TONOS
    (#xBF ?\u038F) ;; GREEK CAPITAL LETTER OMEGA WITH TONOS
    (#xC0 ?\u0390) ;; GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
@@ -196,7 +244,7 @@
    (#xFB ?\u03CB) ;; GREEK SMALL LETTER UPSILON WITH DIALYTIKA
    (#xFC ?\u03CC) ;; GREEK SMALL LETTER OMICRON WITH TONOS
    (#xFD ?\u03CD) ;; GREEK SMALL LETTER UPSILON WITH TONOS
-   (#xFE ?\u03CE)) ;; GREEK SMALL LETTER OMEGA WITH TONOS
+   (#xFE ?\u03CE));; GREEK SMALL LETTER OMEGA WITH TONOS
  "ISO-8859-7 (Greek)"
  '(mnemonic "Grk"
    aliases (greek-iso-8bit)))
diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/hebrew.el
--- a/lisp/mule/hebrew.el	Thu Feb 05 21:18:37 2009 -0500
+++ b/lisp/mule/hebrew.el	Sat Feb 07 17:13:37 2009 +0000
@@ -50,8 +50,68 @@
 
 (make-8-bit-coding-system
  'iso-8859-8
- '((#xAA ?\u00D7) ;; MULTIPLICATION SIGN
+ '((#x80 ?\u0080) ;;
+   (#x81 ?\u0081) ;;
+   (#x82 ?\u0082) ;;
+   (#x83 ?\u0083) ;;
+   (#x84 ?\u0084) ;;
+   (#x85 ?\u0085) ;;
+   (#x86 ?\u0086) ;;
+   (#x87 ?\u0087) ;;
+   (#x88 ?\u0088) ;;
+   (#x89 ?\u0089) ;;
+   (#x8A ?\u008A) ;;
+   (#x8B ?\u008B) ;;
+   (#x8C ?\u008C) ;;
+   (#x8D ?\u008D) ;;
+   (#x8E ?\u008E) ;;
+   (#x8F ?\u008F) ;;
+   (#x90 ?\u0090) ;;
+   (#x91 ?\u0091) ;;
+   (#x92 ?\u0092) ;;
+   (#x93 ?\u0093) ;;
+   (#x94 ?\u0094) ;;
+   (#x95 ?\u0095) ;;
+   (#x96 ?\u0096) ;;
+   (#x97 ?\u0097) ;;
+   (#x98 ?\u0098) ;;
+   (#x99 ?\u0099) ;;
+   (#x9A ?\u009A) ;;
+   (#x9B ?\u009B) ;;
+   (#x9C ?\u009C) ;;
+   (#x9D ?\u009D) ;;
+   (#x9E ?\u009E) ;;
+   (#x9F ?\u009F) ;;
+   (#xA0 ?\u00A0) ;; NO-BREAK SPACE
+   (#xA2 ?\u00A2) ;; CENT SIGN
+   (#xA3 ?\u00A3) ;; POUND SIGN
+   (#xA4 ?\u00A4) ;; CURRENCY SIGN
+   (#xA5 ?\u00A5) ;; YEN SIGN
+   (#xA6 ?\u00A6) ;; BROKEN BAR
+   (#xA7 ?\u00A7) ;; SECTION SIGN
+   (#xA8 ?\u00A8) ;; DIAERESIS
+   (#xA9 ?\u00A9) ;; COPYRIGHT SIGN
+   (#xAA ?\u00D7) ;; MULTIPLICATION SIGN
+   (#xAB ?\u00AB) ;; LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+   (#xAC ?\u00AC) ;; NOT SIGN
+   (#xAD ?\u00AD) ;; SOFT HYPHEN
+   (#xAE ?\u00AE) ;; REGISTERED SIGN
+   (#xAF ?\u00AF) ;; MACRON
+   (#xB0 ?\u00B0) ;; DEGREE SIGN
+   (#xB1 ?\u00B1) ;; PLUS-MINUS SIGN
+   (#xB2 ?\u00B2) ;; SUPERSCRIPT TWO
+   (#xB3 ?\u00B3) ;; SUPERSCRIPT THREE
+   (#xB4 ?\u00B4) ;; ACUTE ACCENT
+   (#xB5 ?\u00B5) ;; MICRO SIGN
+   (#xB6 ?\u00B6) ;; PILCROW SIGN
+   (#xB7 ?\u00B7) ;; MIDDLE DOT
+   (#xB8 ?\u00B8) ;; CEDILLA
+   (#xB9 ?\u00B9) ;; SUPERSCRIPT ONE
    (#xBA ?\u00F7) ;; DIVISION SIGN
+   (#xBB ?\u00BB) ;; RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+   (#xBC ?\u00BC) ;; VULGAR FRACTION ONE QUARTER
+   (#xBD ?\u00BD) ;; VULGAR FRACTION ONE HALF
+   (#xBE ?\u00BE) ;; VULGAR FRACTION THREE QUARTERS
    (#xDF ?\u2017) ;; DOUBLE LOW LINE
    (#xE0 ?\u05D0) ;; HEBREW LETTER ALEF
    (#xE1 ?\u05D1) ;; HEBREW LETTER BET
diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/latin.el
--- a/lisp/mule/latin.el	Thu Feb 05 21:18:37 2009 -0500
+++ b/lisp/mule/latin.el	Sat Feb 07 17:13:37 2009 +0000
@@ -126,23 +126,63 @@
 
 (make-8-bit-coding-system
  'iso-8859-2
- '((#xA1 ?\u0104) ;; LATIN CAPITAL LETTER A WITH OGONEK
+ '((#x80 ?\u0080) ;;
+   (#x81 ?\u0081) ;;
+   (#x82 ?\u0082) ;;
+   (#x83 ?\u0083) ;;
+   (#x84 ?\u0084) ;;
+   (#x85 ?\u0085) ;;
+   (#x86 ?\u0086) ;;
+   (#x87 ?\u0087) ;;
+   (#x88 ?\u0088) ;;
+   (#x89 ?\u0089) ;;
+   (#x8A ?\u008A) ;;
+   (#x8B ?\u008B) ;;
+   (#x8C ?\u008C) ;;
+   (#x8D ?\u008D) ;;
+   (#x8E ?\u008E) ;;
+   (#x8F ?\u008F) ;;
+   (#x90 ?\u0090) ;;
+   (#x91 ?\u0091) ;;
+   (#x92 ?\u0092) ;;
+   (#x93 ?\u0093) ;;
+   (#x94 ?\u0094) ;;
+   (#x95 ?\u0095) ;;
+   (#x96 ?\u0096) ;;
+   (#x97 ?\u0097) ;;
+   (#x98 ?\u0098) ;;
+   (#x99 ?\u0099) ;;
+   (#x9A ?\u009A) ;;
+   (#x9B ?\u009B) ;;
+   (#x9C ?\u009C) ;;
+   (#x9D ?\u009D) ;;
+   (#x9E ?\u009E) ;;
+   (#x9F ?\u009F) ;;
+   (#xA0 ?\u00A0) ;; NO-BREAK SPACE
+   (#xA1 ?\u0104) ;; LATIN CAPITAL LETTER A WITH OGONEK
    (#xA2 ?\u02D8) ;; BREVE
    (#xA3 ?\u0141) ;; LATIN CAPITAL LETTER L WITH STROKE
+   (#xA4 ?\u00A4) ;; CURRENCY SIGN
    (#xA5 ?\u013D) ;; LATIN CAPITAL LETTER L WITH CARON
    (#xA6 ?\u015A) ;; LATIN CAPITAL LETTER S WITH ACUTE
+   (#xA7 ?\u00A7) ;; SECTION SIGN
+   (#xA8 ?\u00A8) ;; DIAERESIS
    (#xA9 ?\u0160) ;; LATIN CAPITAL LETTER S WITH CARON
    (#xAA ?\u015E) ;; LATIN CAPITAL LETTER S WITH CEDILLA
    (#xAB ?\u0164) ;; LATIN CAPITAL LETTER T WITH CARON
    (#xAC ?\u0179) ;; LATIN CAPITAL LETTER Z WITH ACUTE
+   (#xAD ?\u00AD) ;; SOFT HYPHEN
    (#xAE ?\u017D) ;; LATIN CAPITAL LETTER Z WITH CARON
    (#xAF ?\u017B) ;; LATIN CAPITAL LETTER Z WITH DOT ABOVE
+   (#xB0 ?\u00B0) ;; DEGREE SIGN
    (#xB1 ?\u0105) ;; LATIN SMALL LETTER A WITH OGONEK
    (#xB2 ?\u02DB) ;; OGONEK
    (#xB3 ?\u0142) ;; LATIN SMALL LETTER L WITH STROKE
+   (#xB4 ?\u00B4) ;; ACUTE ACCENT
    (#xB5 ?\u013E) ;; LATIN SMALL LETTER L WITH CARON
    (#xB6 ?\u015B) ;; LATIN SMALL LETTER S WITH ACUTE
    (#xB7 ?\u02C7) ;; CARON
+   (#xB8 ?\u00B8) ;; CEDILLA
    (#xB9 ?\u0161) ;; LATIN SMALL LETTER S WITH CARON
    (#xBA ?\u015F) ;; LATIN SMALL LETTER S WITH CEDILLA
    (#xBB ?\u0165) ;; LATIN SMALL LETTER T WITH CARON
@@ -151,39 +191,70 @@
    (#xBE ?\u017E) ;; LATIN SMALL LETTER Z WITH CARON
    (#xBF ?\u017C) ;; LATIN SMALL LETTER Z WITH DOT ABOVE
    (#xC0 ?\u0154) ;; LATIN CAPITAL LETTER R WITH ACUTE
+   (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE
+   (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    (#xC3 ?\u0102) ;; LATIN CAPITAL LETTER A WITH BREVE
+   (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS
    (#xC5 ?\u0139) ;; LATIN CAPITAL LETTER L WITH ACUTE
    (#xC6 ?\u0106) ;; LATIN CAPITAL LETTER C WITH ACUTE
+   (#xC7 ?\u00C7) ;; LATIN CAPITAL LETTER C WITH CEDILLA
    (#xC8 ?\u010C) ;; LATIN CAPITAL LETTER C WITH CARON
+   (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE
    (#xCA ?\u0118) ;; LATIN CAPITAL LETTER E WITH OGONEK
+   (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS
    (#xCC ?\u011A) ;; LATIN CAPITAL LETTER E WITH CARON
+   (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE
+   (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX
    (#xCF ?\u010E) ;; LATIN CAPITAL LETTER D WITH CARON
    (#xD0 ?\u0110) ;; LATIN CAPITAL LETTER D WITH STROKE
    (#xD1 ?\u0143) ;; LATIN CAPITAL LETTER N WITH ACUTE
    (#xD2 ?\u0147) ;; LATIN CAPITAL LETTER N WITH CARON
+   (#xD3 ?\u00D3) ;; LATIN CAPITAL LETTER O WITH ACUTE
+   (#xD4 ?\u00D4) ;; LATIN CAPITAL LETTER O WITH CIRCUMFLEX
    (#xD5 ?\u0150) ;; LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
+   (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS
+   (#xD7 ?\u00D7) ;; MULTIPLICATION SIGN
    (#xD8 ?\u0158) ;; LATIN CAPITAL LETTER R WITH CARON
    (#xD9 ?\u016E) ;; LATIN CAPITAL LETTER U WITH RING ABOVE
+   (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE
    (#xDB ?\u0170) ;; LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
+   (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS
+   (#xDD ?\u00DD) ;; LATIN CAPITAL LETTER Y WITH ACUTE
    (#xDE ?\u0162) ;; LATIN CAPITAL LETTER T WITH CEDILLA
+   (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP S
    (#xE0 ?\u0155) ;; LATIN SMALL LETTER R WITH ACUTE
+   (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE
+   (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A WITH CIRCUMFLEX
    (#xE3 ?\u0103) ;; LATIN SMALL LETTER A WITH BREVE
+   (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS
    (#xE5 ?\u013A) ;; LATIN SMALL LETTER L WITH ACUTE
    (#xE6 ?\u0107) ;; LATIN SMALL LETTER C WITH ACUTE
+   (#xE7 ?\u00E7) ;; LATIN SMALL LETTER C WITH CEDILLA
    (#xE8 ?\u010D) ;; LATIN SMALL LETTER C WITH CARON
+   (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE
    (#xEA ?\u0119) ;; LATIN SMALL LETTER E WITH OGONEK
+   (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS
    (#xEC ?\u011B) ;; LATIN SMALL LETTER E WITH CARON
+   (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE
+   (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX
    (#xEF ?\u010F) ;; LATIN SMALL LETTER D WITH CARON
    (#xF0 ?\u0111) ;; LATIN SMALL LETTER D WITH STROKE
    (#xF1 ?\u0144) ;; LATIN SMALL LETTER N WITH ACUTE
    (#xF2 ?\u0148) ;; LATIN SMALL LETTER N WITH CARON
+   (#xF3 ?\u00F3) ;; LATIN SMALL LETTER O WITH ACUTE
+   (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX
    (#xF5 ?\u0151) ;; LATIN SMALL LETTER O WITH DOUBLE ACUTE
+   (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS
+   (#xF7 ?\u00F7) ;; DIVISION SIGN
    (#xF8 ?\u0159) ;; LATIN SMALL LETTER R WITH CARON
    (#xF9 ?\u016F) ;; LATIN SMALL LETTER U WITH RING ABOVE
+   (#xFA ?\u00FA) ;; LATIN SMALL LETTER U WITH ACUTE
    (#xFB ?\u0171) ;; LATIN SMALL LETTER U WITH DOUBLE ACUTE
+   (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS
+   (#xFD ?\u00FD) ;; LATIN SMALL LETTER Y WITH ACUTE
    (#xFE ?\u0163) ;; LATIN SMALL LETTER T WITH CEDILLA
-   (#xFF ?\u02D9));; DOT ABOVE
- "ISO-8859-2 (Latin-2) for Central Europe. 
+   (#xFF ?\u02D9)) ;; DOT ABOVE
+ "ISO-8859-2 (Latin-2) for Central Europe.
 See also `windows-1250', and `iso-8859-1', which is compatible with Latin 2
 when used to write German (or English, of course). "
  '(mnemonic "Latin 2"
@@ -391,31 +462,124 @@
 
 (make-8-bit-coding-system
  'iso-8859-3
- '((#xA1 ?\u0126) ;; LATIN CAPITAL LETTER H WITH STROKE
+ '((#x80 ?\u0080) ;;
+   (#x81 ?\u0081) ;;
+   (#x82 ?\u0082) ;;
+   (#x83 ?\u0083) ;;
+   (#x84 ?\u0084) ;;
+   (#x85 ?\u0085) ;;
+   (#x86 ?\u0086) ;;
+   (#x87 ?\u0087) ;;
+   (#x88 ?\u0088) ;;
+   (#x89 ?\u0089) ;;
+   (#x8A ?\u008A) ;;
+   (#x8B ?\u008B) ;;
+   (#x8C ?\u008C) ;;
+   (#x8D ?\u008D) ;;
+   (#x8E ?\u008E) ;;
+   (#x8F ?\u008F) ;;
+   (#x90 ?\u0090) ;;
+   (#x91 ?\u0091) ;;
+   (#x92 ?\u0092) ;;
+   (#x93 ?\u0093) ;;
+   (#x94 ?\u0094) ;;
+   (#x95 ?\u0095) ;;
+   (#x96 ?\u0096) ;;
+   (#x97 ?\u0097) ;;
+   (#x98 ?\u0098) ;;
+   (#x99 ?\u0099) ;;
+   (#x9A ?\u009A) ;;
+   (#x9B ?\u009B) ;;
+   (#x9C ?\u009C) ;;
+   (#x9D ?\u009D) ;;
+   (#x9E ?\u009E) ;;
+   (#x9F ?\u009F) ;;
+   (#xA0 ?\u00A0) ;; NO-BREAK SPACE
+   (#xA1 ?\u0126) ;; LATIN CAPITAL LETTER H WITH STROKE
    (#xA2 ?\u02D8) ;; BREVE
+   (#xA3 ?\u00A3) ;; POUND SIGN
+   (#xA4 ?\u00A4) ;; CURRENCY SIGN
    (#xA6 ?\u0124) ;; LATIN CAPITAL LETTER H WITH CIRCUMFLEX
+   (#xA7 ?\u00A7) ;; SECTION SIGN
+   (#xA8 ?\u00A8) ;; DIAERESIS
    (#xA9 ?\u0130) ;; LATIN CAPITAL LETTER I WITH DOT ABOVE
    (#xAA ?\u015E) ;; LATIN CAPITAL LETTER S WITH CEDILLA
    (#xAB ?\u011E) ;; LATIN CAPITAL LETTER G WITH BREVE
    (#xAC ?\u0134) ;; LATIN CAPITAL LETTER J WITH CIRCUMFLEX
+   (#xAD ?\u00AD) ;; SOFT HYPHEN
    (#xAF ?\u017B) ;; LATIN CAPITAL LETTER Z WITH DOT ABOVE
+   (#xB0 ?\u00B0) ;; DEGREE SIGN
    (#xB1 ?\u0127) ;; LATIN SMALL LETTER H WITH STROKE
+   (#xB2 ?\u00B2) ;; SUPERSCRIPT TWO
+   (#xB3 ?\u00B3) ;; SUPERSCRIPT THREE
+   (#xB4 ?\u00B4) ;; ACUTE ACCENT
+   (#xB5 ?\u00B5) ;; MICRO SIGN
    (#xB6 ?\u0125) ;; LATIN SMALL LETTER H WITH CIRCUMFLEX
+   (#xB7 ?\u00B7) ;; MIDDLE DOT
+   (#xB8 ?\u00B8) ;; CEDILLA
    (#xB9 ?\u0131) ;; LATIN SMALL LETTER DOTLESS I
    (#xBA ?\u015F) ;; LATIN SMALL LETTER S WITH CEDILLA
    (#xBB ?\u011F) ;; LATIN SMALL LETTER G WITH BREVE
    (#xBC ?\u0135) ;; LATIN SMALL LETTER J WITH CIRCUMFLEX
+   (#xBD ?\u00BD) ;; VULGAR FRACTION ONE HALF
    (#xBF ?\u017C) ;; LATIN SMALL LETTER Z WITH DOT ABOVE
+   (#xC0 ?\u00C0) ;; LATIN CAPITAL LETTER A WITH GRAVE
+   (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE
+   (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+   (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS
    (#xC5 ?\u010A) ;; LATIN CAPITAL LETTER C WITH DOT ABOVE
    (#xC6 ?\u0108) ;; LATIN CAPITAL LETTER C WITH CIRCUMFLEX
+   (#xC7 ?\u00C7) ;; LATIN CAPITAL LETTER C WITH CEDILLA
+   (#xC8 ?\u00C8) ;; LATIN CAPITAL LETTER E WITH GRAVE
+   (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE
+   (#xCA ?\u00CA) ;; LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+   (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS
+   (#xCC ?\u00CC) ;; LATIN CAPITAL LETTER I WITH GRAVE
+   (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE
+   (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+   (#xCF ?\u00CF) ;; LATIN CAPITAL LETTER I WITH DIAERESIS
+   (#xD1 ?\u00D1) ;; LATIN CAPITAL LETTER N WITH TILDE
+   (#xD2 ?\u00D2) ;; LATIN CAPITAL LETTER O WITH GRAVE
+   (#xD3 ?\u00D3) ;; LATIN CAPITAL LETTER O WITH ACUTE
+   (#xD4 ?\u00D4) ;; LATIN CAPITAL LETTER O WITH CIRCUMFLEX
    (#xD5 ?\u0120) ;; LATIN CAPITAL LETTER G WITH DOT ABOVE
+   (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS
+   (#xD7 ?\u00D7) ;; MULTIPLICATION SIGN
    (#xD8 ?\u011C) ;; LATIN CAPITAL LETTER G WITH CIRCUMFLEX
+   (#xD9 ?\u00D9) ;; LATIN CAPITAL LETTER U WITH GRAVE
+   (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE
+   (#xDB ?\u00DB) ;; LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+   (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS
    (#xDD ?\u016C) ;; LATIN CAPITAL LETTER U WITH BREVE
    (#xDE ?\u015C) ;; LATIN CAPITAL LETTER S WITH CIRCUMFLEX
+   (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP S
+   (#xE0 ?\u00E0) ;; LATIN SMALL LETTER A WITH GRAVE
+   (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE
+   (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A WITH CIRCUMFLEX
+   (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS
    (#xE5 ?\u010B) ;; LATIN SMALL LETTER C WITH DOT ABOVE
    (#xE6 ?\u0109) ;; LATIN SMALL LETTER C WITH CIRCUMFLEX
+   (#xE7 ?\u00E7) ;; LATIN SMALL LETTER C WITH CEDILLA
+   (#xE8 ?\u00E8) ;; LATIN SMALL LETTER E WITH GRAVE
+   (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE
+   (#xEA ?\u00EA) ;; LATIN SMALL LETTER E WITH CIRCUMFLEX
+   (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS
+   (#xEC ?\u00EC) ;; LATIN SMALL LETTER I WITH GRAVE
+   (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE
+   (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX
+   (#xEF ?\u00EF) ;; LATIN SMALL LETTER I WITH DIAERESIS
+   (#xF1 ?\u00F1) ;; LATIN SMALL LETTER N WITH TILDE
+   (#xF2 ?\u00F2) ;; LATIN SMALL LETTER O WITH GRAVE
+   (#xF3 ?\u00F3) ;; LATIN SMALL LETTER O WITH ACUTE
+   (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX
    (#xF5 ?\u0121) ;; LATIN SMALL LETTER G WITH DOT ABOVE
+   (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS
+   (#xF7 ?\u00F7) ;; DIVISION SIGN
    (#xF8 ?\u011D) ;; LATIN SMALL LETTER G WITH CIRCUMFLEX
+   (#xF9 ?\u00F9) ;; LATIN SMALL LETTER U WITH GRAVE
+   (#xFA ?\u00FA) ;; LATIN SMALL LETTER U WITH ACUTE
+   (#xFB ?\u00FB) ;; LATIN SMALL LETTER U WITH CIRCUMFLEX
+   (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS
    (#xFD ?\u016D) ;; LATIN SMALL LETTER U WITH BREVE
    (#xFE ?\u015D) ;; LATIN SMALL LETTER S WITH CIRCUMFLEX
    (#xFF ?\u02D9)) ;; DOT ABOVE
@@ -498,22 +662,63 @@
 
 (make-8-bit-coding-system
  'iso-8859-4
- '((#xA1 ?\u0104) ;; LATIN CAPITAL LETTER A WITH OGONEK
+ '((#x80 ?\u0080) ;;
+   (#x81 ?\u0081) ;;
+   (#x82 ?\u0082) ;;
+   (#x83 ?\u0083) ;;
+   (#x84 ?\u0084) ;;
+   (#x85 ?\u0085) ;;
+   (#x86 ?\u0086) ;;
+   (#x87 ?\u0087) ;;
+   (#x88 ?\u0088) ;;
+   (#x89 ?\u0089) ;;
+   (#x8A ?\u008A) ;;
+   (#x8B ?\u008B) ;;
+   (#x8C ?\u008C) ;;
+   (#x8D ?\u008D) ;;
+   (#x8E ?\u008E) ;;
+   (#x8F ?\u008F) ;;
+   (#x90 ?\u0090) ;;
+   (#x91 ?\u0091) ;;
+   (#x92 ?\u0092) ;;
+   (#x93 ?\u0093) ;;
+   (#x94 ?\u0094) ;;
+   (#x95 ?\u0095) ;;
+   (#x96 ?\u0096) ;;
+   (#x97 ?\u0097) ;;
+   (#x98 ?\u0098) ;;
+   (#x99 ?\u0099) ;;
+   (#x9A ?\u009A) ;;
+   (#x9B ?\u009B) ;;
+   (#x9C ?\u009C) ;;
+   (#x9D ?\u009D) ;;
+   (#x9E ?\u009E) ;;
+   (#x9F ?\u009F) ;;
+   (#xA0 ?\u00A0) ;; NO-BREAK SPACE
+   (#xA1 ?\u0104) ;; LATIN CAPITAL LETTER A WITH OGONEK
    (#xA2 ?\u0138) ;; LATIN SMALL LETTER KRA
    (#xA3 ?\u0156) ;; LATIN CAPITAL LETTER R WITH CEDILLA
+   (#xA4 ?\u00A4) ;; CURRENCY SIGN
    (#xA5 ?\u0128) ;; LATIN CAPITAL LETTER I WITH TILDE
    (#xA6 ?\u013B) ;; LATIN CAPITAL LETTER L WITH CEDILLA
+   (#xA7 ?\u00A7) ;; SECTION SIGN
+   (#xA8 ?\u00A8) ;; DIAERESIS
    (#xA9 ?\u0160) ;; LATIN CAPITAL LETTER S WITH CARON
    (#xAA ?\u0112) ;; LATIN CAPITAL LETTER E WITH MACRON
    (#xAB ?\u0122) ;; LATIN CAPITAL LETTER G WITH CEDILLA
    (#xAC ?\u0166) ;; LATIN CAPITAL LETTER T WITH STROKE
+   (#xAD ?\u00AD) ;; SOFT HYPHEN
    (#xAE ?\u017D) ;; LATIN CAPITAL LETTER Z WITH CARON
+   (#xAF ?\u00AF) ;; MACRON
+   (#xB0 ?\u00B0) ;; DEGREE SIGN
    (#xB1 ?\u0105) ;; LATIN SMALL LETTER A WITH OGONEK
    (#xB2 ?\u02DB) ;; OGONEK
    (#xB3 ?\u0157) ;; LATIN SMALL LETTER R WITH CEDILLA
+   (#xB4 ?\u00B4) ;; ACUTE ACCENT
    (#xB5 ?\u0129) ;; LATIN SMALL LETTER I WITH TILDE
    (#xB6 ?\u013C) ;; LATIN SMALL LETTER L WITH CEDILLA
    (#xB7 ?\u02C7) ;; CARON
+   (#xB8 ?\u00B8) ;; CEDILLA
    (#xB9 ?\u0161) ;; LATIN SMALL LETTER S WITH CARON
    (#xBA ?\u0113) ;; LATIN SMALL LETTER E WITH MACRON
    (#xBB ?\u0123) ;; LATIN SMALL LETTER G
WITH CEDILLA @@ -522,29 +727,66 @@ (#xBE ?\u017E) ;; LATIN SMALL LETTER Z WITH CARON (#xBF ?\u014B) ;; LATIN SMALL LETTER ENG (#xC0 ?\u0100) ;; LATIN CAPITAL LETTER A WITH MACRON + (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE + (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX + (#xC3 ?\u00C3) ;; LATIN CAPITAL LETTER A WITH TILDE + (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS + (#xC5 ?\u00C5) ;; LATIN CAPITAL LETTER A WITH RING ABOVE + (#xC6 ?\u00C6) ;; LATIN CAPITAL LETTER AE (#xC7 ?\u012E) ;; LATIN CAPITAL LETTER I WITH OGONEK (#xC8 ?\u010C) ;; LATIN CAPITAL LETTER C WITH CARON + (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE (#xCA ?\u0118) ;; LATIN CAPITAL LETTER E WITH OGONEK + (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS (#xCC ?\u0116) ;; LATIN CAPITAL LETTER E WITH DOT ABOVE + (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE + (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX (#xCF ?\u012A) ;; LATIN CAPITAL LETTER I WITH MACRON (#xD0 ?\u0110) ;; LATIN CAPITAL LETTER D WITH STROKE (#xD1 ?\u0145) ;; LATIN CAPITAL LETTER N WITH CEDILLA (#xD2 ?\u014C) ;; LATIN CAPITAL LETTER O WITH MACRON (#xD3 ?\u0136) ;; LATIN CAPITAL LETTER K WITH CEDILLA + (#xD4 ?\u00D4) ;; LATIN CAPITAL LETTER O WITH CIRCUMFLEX + (#xD5 ?\u00D5) ;; LATIN CAPITAL LETTER O WITH TILDE + (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS + (#xD7 ?\u00D7) ;; MULTIPLICATION SIGN + (#xD8 ?\u00D8) ;; LATIN CAPITAL LETTER O WITH STROKE (#xD9 ?\u0172) ;; LATIN CAPITAL LETTER U WITH OGONEK + (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE + (#xDB ?\u00DB) ;; LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS (#xDD ?\u0168) ;; LATIN CAPITAL LETTER U WITH TILDE (#xDE ?\u016A) ;; LATIN CAPITAL LETTER U WITH MACRON + (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP S (#xE0 ?\u0101) ;; LATIN SMALL LETTER A WITH MACRON + (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE + (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A 
WITH CIRCUMFLEX + (#xE3 ?\u00E3) ;; LATIN SMALL LETTER A WITH TILDE + (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS + (#xE5 ?\u00E5) ;; LATIN SMALL LETTER A WITH RING ABOVE + (#xE6 ?\u00E6) ;; LATIN SMALL LETTER AE (#xE7 ?\u012F) ;; LATIN SMALL LETTER I WITH OGONEK (#xE8 ?\u010D) ;; LATIN SMALL LETTER C WITH CARON + (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE (#xEA ?\u0119) ;; LATIN SMALL LETTER E WITH OGONEK + (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS (#xEC ?\u0117) ;; LATIN SMALL LETTER E WITH DOT ABOVE + (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE + (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX (#xEF ?\u012B) ;; LATIN SMALL LETTER I WITH MACRON (#xF0 ?\u0111) ;; LATIN SMALL LETTER D WITH STROKE (#xF1 ?\u0146) ;; LATIN SMALL LETTER N WITH CEDILLA (#xF2 ?\u014D) ;; LATIN SMALL LETTER O WITH MACRON (#xF3 ?\u0137) ;; LATIN SMALL LETTER K WITH CEDILLA + (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX + (#xF5 ?\u00F5) ;; LATIN SMALL LETTER O WITH TILDE + (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS + (#xF7 ?\u00F7) ;; DIVISION SIGN + (#xF8 ?\u00F8) ;; LATIN SMALL LETTER O WITH STROKE (#xF9 ?\u0173) ;; LATIN SMALL LETTER U WITH OGONEK + (#xFA ?\u00FA) ;; LATIN SMALL LETTER U WITH ACUTE + (#xFB ?\u00FB) ;; LATIN SMALL LETTER U WITH CIRCUMFLEX + (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS (#xFD ?\u0169) ;; LATIN SMALL LETTER U WITH TILDE (#xFE ?\u016B) ;; LATIN SMALL LETTER U WITH MACRON (#xFF ?\u02D9));; DOT ABOVE @@ -633,15 +875,53 @@ (make-8-bit-coding-system 'iso-8859-14 - '((#xA1 ?\u1E02) ;; LATIN CAPITAL LETTER B WITH DOT ABOVE + '((#x80 ?\u0080) ;; + (#x81 ?\u0081) ;; + (#x82 ?\u0082) ;; + (#x83 ?\u0083) ;; + (#x84 ?\u0084) ;; + (#x85 ?\u0085) ;; + (#x86 ?\u0086) ;; + (#x87 ?\u0087) ;; + (#x88 ?\u0088) ;; + (#x89 ?\u0089) ;; + (#x8A ?\u008A) ;; + (#x8B ?\u008B) ;; + (#x8C ?\u008C) ;; + (#x8D ?\u008D) ;; + (#x8E ?\u008E) ;; + (#x8F ?\u008F) ;; + (#x90 ?\u0090) ;; + (#x91 ?\u0091) ;; + (#x92 ?\u0092) 
;; + (#x93 ?\u0093) ;; + (#x94 ?\u0094) ;; + (#x95 ?\u0095) ;; + (#x96 ?\u0096) ;; + (#x97 ?\u0097) ;; + (#x98 ?\u0098) ;; + (#x99 ?\u0099) ;; + (#x9A ?\u009A) ;; + (#x9B ?\u009B) ;; + (#x9C ?\u009C) ;; + (#x9D ?\u009D) ;; + (#x9E ?\u009E) ;; + (#x9F ?\u009F) ;; + (#xA0 ?\u00A0) ;; NO-BREAK SPACE + (#xA1 ?\u1E02) ;; LATIN CAPITAL LETTER B WITH DOT ABOVE (#xA2 ?\u1E03) ;; LATIN SMALL LETTER B WITH DOT ABOVE + (#xA3 ?\u00A3) ;; POUND SIGN (#xA4 ?\u010A) ;; LATIN CAPITAL LETTER C WITH DOT ABOVE (#xA5 ?\u010B) ;; LATIN SMALL LETTER C WITH DOT ABOVE (#xA6 ?\u1E0A) ;; LATIN CAPITAL LETTER D WITH DOT ABOVE + (#xA7 ?\u00A7) ;; SECTION SIGN (#xA8 ?\u1E80) ;; LATIN CAPITAL LETTER W WITH GRAVE + (#xA9 ?\u00A9) ;; COPYRIGHT SIGN (#xAA ?\u1E82) ;; LATIN CAPITAL LETTER W WITH ACUTE (#xAB ?\u1E0B) ;; LATIN SMALL LETTER D WITH DOT ABOVE (#xAC ?\u1EF2) ;; LATIN CAPITAL LETTER Y WITH GRAVE + (#xAD ?\u00AD) ;; SOFT HYPHEN + (#xAE ?\u00AE) ;; REGISTERED SIGN (#xAF ?\u0178) ;; LATIN CAPITAL LETTER Y WITH DIAERESIS (#xB0 ?\u1E1E) ;; LATIN CAPITAL LETTER F WITH DOT ABOVE (#xB1 ?\u1E1F) ;; LATIN SMALL LETTER F WITH DOT ABOVE @@ -649,6 +929,7 @@ (#xB3 ?\u0121) ;; LATIN SMALL LETTER G WITH DOT ABOVE (#xB4 ?\u1E40) ;; LATIN CAPITAL LETTER M WITH DOT ABOVE (#xB5 ?\u1E41) ;; LATIN SMALL LETTER M WITH DOT ABOVE + (#xB6 ?\u00B6) ;; PILCROW SIGN (#xB7 ?\u1E56) ;; LATIN CAPITAL LETTER P WITH DOT ABOVE (#xB8 ?\u1E81) ;; LATIN SMALL LETTER W WITH GRAVE (#xB9 ?\u1E57) ;; LATIN SMALL LETTER P WITH DOT ABOVE @@ -658,12 +939,70 @@ (#xBD ?\u1E84) ;; LATIN CAPITAL LETTER W WITH DIAERESIS (#xBE ?\u1E85) ;; LATIN SMALL LETTER W WITH DIAERESIS (#xBF ?\u1E61) ;; LATIN SMALL LETTER S WITH DOT ABOVE + (#xC0 ?\u00C0) ;; LATIN CAPITAL LETTER A WITH GRAVE + (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE + (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX + (#xC3 ?\u00C3) ;; LATIN CAPITAL LETTER A WITH TILDE + (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS + (#xC5 ?\u00C5) ;; LATIN CAPITAL 
LETTER A WITH RING ABOVE + (#xC6 ?\u00C6) ;; LATIN CAPITAL LETTER AE + (#xC7 ?\u00C7) ;; LATIN CAPITAL LETTER C WITH CEDILLA + (#xC8 ?\u00C8) ;; LATIN CAPITAL LETTER E WITH GRAVE + (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE + (#xCA ?\u00CA) ;; LATIN CAPITAL LETTER E WITH CIRCUMFLEX + (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS + (#xCC ?\u00CC) ;; LATIN CAPITAL LETTER I WITH GRAVE + (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE + (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX + (#xCF ?\u00CF) ;; LATIN CAPITAL LETTER I WITH DIAERESIS (#xD0 ?\u0174) ;; LATIN CAPITAL LETTER W WITH CIRCUMFLEX + (#xD1 ?\u00D1) ;; LATIN CAPITAL LETTER N WITH TILDE + (#xD2 ?\u00D2) ;; LATIN CAPITAL LETTER O WITH GRAVE + (#xD3 ?\u00D3) ;; LATIN CAPITAL LETTER O WITH ACUTE + (#xD4 ?\u00D4) ;; LATIN CAPITAL LETTER O WITH CIRCUMFLEX + (#xD5 ?\u00D5) ;; LATIN CAPITAL LETTER O WITH TILDE + (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS (#xD7 ?\u1E6A) ;; LATIN CAPITAL LETTER T WITH DOT ABOVE + (#xD8 ?\u00D8) ;; LATIN CAPITAL LETTER O WITH STROKE + (#xD9 ?\u00D9) ;; LATIN CAPITAL LETTER U WITH GRAVE + (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE + (#xDB ?\u00DB) ;; LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS + (#xDD ?\u00DD) ;; LATIN CAPITAL LETTER Y WITH ACUTE (#xDE ?\u0176) ;; LATIN CAPITAL LETTER Y WITH CIRCUMFLEX + (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP S + (#xE0 ?\u00E0) ;; LATIN SMALL LETTER A WITH GRAVE + (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE + (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A WITH CIRCUMFLEX + (#xE3 ?\u00E3) ;; LATIN SMALL LETTER A WITH TILDE + (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS + (#xE5 ?\u00E5) ;; LATIN SMALL LETTER A WITH RING ABOVE + (#xE6 ?\u00E6) ;; LATIN SMALL LETTER AE + (#xE7 ?\u00E7) ;; LATIN SMALL LETTER C WITH CEDILLA + (#xE8 ?\u00E8) ;; LATIN SMALL LETTER E WITH GRAVE + (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE + (#xEA 
?\u00EA) ;; LATIN SMALL LETTER E WITH CIRCUMFLEX + (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS + (#xEC ?\u00EC) ;; LATIN SMALL LETTER I WITH GRAVE + (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE + (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX + (#xEF ?\u00EF) ;; LATIN SMALL LETTER I WITH DIAERESIS (#xF0 ?\u0175) ;; LATIN SMALL LETTER W WITH CIRCUMFLEX + (#xF1 ?\u00F1) ;; LATIN SMALL LETTER N WITH TILDE + (#xF2 ?\u00F2) ;; LATIN SMALL LETTER O WITH GRAVE + (#xF3 ?\u00F3) ;; LATIN SMALL LETTER O WITH ACUTE + (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX + (#xF5 ?\u00F5) ;; LATIN SMALL LETTER O WITH TILDE + (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS (#xF7 ?\u1E6B) ;; LATIN SMALL LETTER T WITH DOT ABOVE - (#xFE ?\u0177)) ;; LATIN SMALL LETTER Y WITH CIRCUMFLEX + (#xF8 ?\u00F8) ;; LATIN SMALL LETTER O WITH STROKE + (#xF9 ?\u00F9) ;; LATIN SMALL LETTER U WITH GRAVE + (#xFA ?\u00FA) ;; LATIN SMALL LETTER U WITH ACUTE + (#xFB ?\u00FB) ;; LATIN SMALL LETTER U WITH CIRCUMFLEX + (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS + (#xFD ?\u00FD) ;; LATIN SMALL LETTER Y WITH ACUTE + (#xFE ?\u0177) ;; LATIN SMALL LETTER Y WITH CIRCUMFLEX + (#xFF ?\u00FF)) ;; LATIN SMALL LETTER Y WITH DIAERESIS "ISO-8859-14 (Latin-8)" '(mnemonic "Latin 8" aliases (iso-latin-8 latin-8))) @@ -742,14 +1081,134 @@ (make-8-bit-coding-system 'iso-8859-15 - '((#xA4 ?\u20AC) ;; EURO SIGN + '((#x80 ?\u0080) ;; + (#x81 ?\u0081) ;; + (#x82 ?\u0082) ;; + (#x83 ?\u0083) ;; + (#x84 ?\u0084) ;; + (#x85 ?\u0085) ;; + (#x86 ?\u0086) ;; + (#x87 ?\u0087) ;; + (#x88 ?\u0088) ;; + (#x89 ?\u0089) ;; + (#x8A ?\u008A) ;; + (#x8B ?\u008B) ;; + (#x8C ?\u008C) ;; + (#x8D ?\u008D) ;; + (#x8E ?\u008E) ;; + (#x8F ?\u008F) ;; + (#x90 ?\u0090) ;; + (#x91 ?\u0091) ;; + (#x92 ?\u0092) ;; + (#x93 ?\u0093) ;; + (#x94 ?\u0094) ;; + (#x95 ?\u0095) ;; + (#x96 ?\u0096) ;; + (#x97 ?\u0097) ;; + (#x98 ?\u0098) ;; + (#x99 ?\u0099) ;; + (#x9A ?\u009A) ;; + (#x9B ?\u009B) ;; + (#x9C ?\u009C) ;; 
+ (#x9D ?\u009D) ;; + (#x9E ?\u009E) ;; + (#x9F ?\u009F) ;; + (#xA0 ?\u00A0) ;; NO-BREAK SPACE + (#xA1 ?\u00A1) ;; INVERTED EXCLAMATION MARK + (#xA2 ?\u00A2) ;; CENT SIGN + (#xA3 ?\u00A3) ;; POUND SIGN + (#xA4 ?\u20AC) ;; EURO SIGN + (#xA5 ?\u00A5) ;; YEN SIGN (#xA6 ?\u0160) ;; LATIN CAPITAL LETTER S WITH CARON + (#xA7 ?\u00A7) ;; SECTION SIGN (#xA8 ?\u0161) ;; LATIN SMALL LETTER S WITH CARON + (#xA9 ?\u00A9) ;; COPYRIGHT SIGN + (#xAA ?\u00AA) ;; FEMININE ORDINAL INDICATOR + (#xAB ?\u00AB) ;; LEFT-POINTING DOUBLE ANGLE QUOTATION MARK + (#xAC ?\u00AC) ;; NOT SIGN + (#xAD ?\u00AD) ;; SOFT HYPHEN + (#xAE ?\u00AE) ;; REGISTERED SIGN + (#xAF ?\u00AF) ;; MACRON + (#xB0 ?\u00B0) ;; DEGREE SIGN + (#xB1 ?\u00B1) ;; PLUS-MINUS SIGN + (#xB2 ?\u00B2) ;; SUPERSCRIPT TWO + (#xB3 ?\u00B3) ;; SUPERSCRIPT THREE (#xB4 ?\u017D) ;; LATIN CAPITAL LETTER Z WITH CARON + (#xB5 ?\u00B5) ;; MICRO SIGN + (#xB6 ?\u00B6) ;; PILCROW SIGN + (#xB7 ?\u00B7) ;; MIDDLE DOT (#xB8 ?\u017E) ;; LATIN SMALL LETTER Z WITH CARON + (#xB9 ?\u00B9) ;; SUPERSCRIPT ONE + (#xBA ?\u00BA) ;; MASCULINE ORDINAL INDICATOR + (#xBB ?\u00BB) ;; RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (#xBC ?\u0152) ;; LATIN CAPITAL LIGATURE OE (#xBD ?\u0153) ;; LATIN SMALL LIGATURE OE - (#xBE ?\u0178)) ;; LATIN CAPITAL LETTER Y WITH DIAERESIS + (#xBE ?\u0178) ;; LATIN CAPITAL LETTER Y WITH DIAERESIS + (#xBF ?\u00BF) ;; INVERTED QUESTION MARK + (#xC0 ?\u00C0) ;; LATIN CAPITAL LETTER A WITH GRAVE + (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE + (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX + (#xC3 ?\u00C3) ;; LATIN CAPITAL LETTER A WITH TILDE + (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS + (#xC5 ?\u00C5) ;; LATIN CAPITAL LETTER A WITH RING ABOVE + (#xC6 ?\u00C6) ;; LATIN CAPITAL LETTER AE + (#xC7 ?\u00C7) ;; LATIN CAPITAL LETTER C WITH CEDILLA + (#xC8 ?\u00C8) ;; LATIN CAPITAL LETTER E WITH GRAVE + (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE + (#xCA ?\u00CA) ;; LATIN CAPITAL LETTER E WITH 
CIRCUMFLEX + (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS + (#xCC ?\u00CC) ;; LATIN CAPITAL LETTER I WITH GRAVE + (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE + (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX + (#xCF ?\u00CF) ;; LATIN CAPITAL LETTER I WITH DIAERESIS + (#xD0 ?\u00D0) ;; LATIN CAPITAL LETTER ETH + (#xD1 ?\u00D1) ;; LATIN CAPITAL LETTER N WITH TILDE + (#xD2 ?\u00D2) ;; LATIN CAPITAL LETTER O WITH GRAVE + (#xD3 ?\u00D3) ;; LATIN CAPITAL LETTER O WITH ACUTE + (#xD4 ?\u00D4) ;; LATIN CAPITAL LETTER O WITH CIRCUMFLEX + (#xD5 ?\u00D5) ;; LATIN CAPITAL LETTER O WITH TILDE + (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS + (#xD7 ?\u00D7) ;; MULTIPLICATION SIGN + (#xD8 ?\u00D8) ;; LATIN CAPITAL LETTER O WITH STROKE + (#xD9 ?\u00D9) ;; LATIN CAPITAL LETTER U WITH GRAVE + (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE + (#xDB ?\u00DB) ;; LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS + (#xDD ?\u00DD) ;; LATIN CAPITAL LETTER Y WITH ACUTE + (#xDE ?\u00DE) ;; LATIN CAPITAL LETTER THORN + (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP S + (#xE0 ?\u00E0) ;; LATIN SMALL LETTER A WITH GRAVE + (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE + (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A WITH CIRCUMFLEX + (#xE3 ?\u00E3) ;; LATIN SMALL LETTER A WITH TILDE + (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS + (#xE5 ?\u00E5) ;; LATIN SMALL LETTER A WITH RING ABOVE + (#xE6 ?\u00E6) ;; LATIN SMALL LETTER AE + (#xE7 ?\u00E7) ;; LATIN SMALL LETTER C WITH CEDILLA + (#xE8 ?\u00E8) ;; LATIN SMALL LETTER E WITH GRAVE + (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE + (#xEA ?\u00EA) ;; LATIN SMALL LETTER E WITH CIRCUMFLEX + (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS + (#xEC ?\u00EC) ;; LATIN SMALL LETTER I WITH GRAVE + (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE + (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX + (#xEF ?\u00EF) ;; LATIN SMALL LETTER I WITH DIAERESIS + 
(#xF0 ?\u00F0) ;; LATIN SMALL LETTER ETH + (#xF1 ?\u00F1) ;; LATIN SMALL LETTER N WITH TILDE + (#xF2 ?\u00F2) ;; LATIN SMALL LETTER O WITH GRAVE + (#xF3 ?\u00F3) ;; LATIN SMALL LETTER O WITH ACUTE + (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX + (#xF5 ?\u00F5) ;; LATIN SMALL LETTER O WITH TILDE + (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS + (#xF7 ?\u00F7) ;; DIVISION SIGN + (#xF8 ?\u00F8) ;; LATIN SMALL LETTER O WITH STROKE + (#xF9 ?\u00F9) ;; LATIN SMALL LETTER U WITH GRAVE + (#xFA ?\u00FA) ;; LATIN SMALL LETTER U WITH ACUTE + (#xFB ?\u00FB) ;; LATIN SMALL LETTER U WITH CIRCUMFLEX + (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS + (#xFD ?\u00FD) ;; LATIN SMALL LETTER Y WITH ACUTE + (#xFE ?\u00FE) ;; LATIN SMALL LETTER THORN + (#xFF ?\u00FF)) ;; LATIN SMALL LETTER Y WITH DIAERESIS "ISO 4873 conforming 8-bit code (ASCII + Latin 9; aka Latin-1 with Euro)" '(mnemonic "Latin 9" aliases (iso-latin-9 latin-9 latin-0))) @@ -852,46 +1311,134 @@ ;; Add a coding system for ISO 8859-16. 
(make-8-bit-coding-system 'iso-8859-16 - '((#xA1 ?\u0104) ;; LATIN CAPITAL LETTER A WITH OGONEK + '((#x80 ?\u0080) ;; + (#x81 ?\u0081) ;; + (#x82 ?\u0082) ;; + (#x83 ?\u0083) ;; + (#x84 ?\u0084) ;; + (#x85 ?\u0085) ;; + (#x86 ?\u0086) ;; + (#x87 ?\u0087) ;; + (#x88 ?\u0088) ;; + (#x89 ?\u0089) ;; + (#x8A ?\u008A) ;; + (#x8B ?\u008B) ;; + (#x8C ?\u008C) ;; + (#x8D ?\u008D) ;; + (#x8E ?\u008E) ;; + (#x8F ?\u008F) ;; + (#x90 ?\u0090) ;; + (#x91 ?\u0091) ;; + (#x92 ?\u0092) ;; + (#x93 ?\u0093) ;; + (#x94 ?\u0094) ;; + (#x95 ?\u0095) ;; + (#x96 ?\u0096) ;; + (#x97 ?\u0097) ;; + (#x98 ?\u0098) ;; + (#x99 ?\u0099) ;; + (#x9A ?\u009A) ;; + (#x9B ?\u009B) ;; + (#x9C ?\u009C) ;; + (#x9D ?\u009D) ;; + (#x9E ?\u009E) ;; + (#x9F ?\u009F) ;; + (#xA0 ?\u00A0) ;; NO-BREAK SPACE + (#xA1 ?\u0104) ;; LATIN CAPITAL LETTER A WITH OGONEK (#xA2 ?\u0105) ;; LATIN SMALL LETTER A WITH OGONEK (#xA3 ?\u0141) ;; LATIN CAPITAL LETTER L WITH STROKE (#xA4 ?\u20AC) ;; EURO SIGN (#xA5 ?\u201E) ;; DOUBLE LOW-9 QUOTATION MARK (#xA6 ?\u0160) ;; LATIN CAPITAL LETTER S WITH CARON + (#xA7 ?\u00A7) ;; SECTION SIGN (#xA8 ?\u0161) ;; LATIN SMALL LETTER S WITH CARON + (#xA9 ?\u00A9) ;; COPYRIGHT SIGN (#xAA ?\u0218) ;; LATIN CAPITAL LETTER S WITH COMMA BELOW + (#xAB ?\u00AB) ;; LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (#xAC ?\u0179) ;; LATIN CAPITAL LETTER Z WITH ACUTE + (#xAD ?\u00AD) ;; SOFT HYPHEN (#xAE ?\u017A) ;; LATIN SMALL LETTER Z WITH ACUTE (#xAF ?\u017B) ;; LATIN CAPITAL LETTER Z WITH DOT ABOVE + (#xB0 ?\u00B0) ;; DEGREE SIGN + (#xB1 ?\u00B1) ;; PLUS-MINUS SIGN (#xB2 ?\u010C) ;; LATIN CAPITAL LETTER C WITH CARON (#xB3 ?\u0142) ;; LATIN SMALL LETTER L WITH STROKE (#xB4 ?\u017D) ;; LATIN CAPITAL LETTER Z WITH CARON (#xB5 ?\u201D) ;; RIGHT DOUBLE QUOTATION MARK + (#xB6 ?\u00B6) ;; PILCROW SIGN + (#xB7 ?\u00B7) ;; MIDDLE DOT (#xB8 ?\u017E) ;; LATIN SMALL LETTER Z WITH CARON (#xB9 ?\u010D) ;; LATIN SMALL LETTER C WITH CARON (#xBA ?\u0219) ;; LATIN SMALL LETTER S WITH COMMA BELOW + (#xBB ?\u00BB) ;; 
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (#xBC ?\u0152) ;; LATIN CAPITAL LIGATURE OE (#xBD ?\u0153) ;; LATIN SMALL LIGATURE OE (#xBE ?\u0178) ;; LATIN CAPITAL LETTER Y WITH DIAERESIS (#xBF ?\u017C) ;; LATIN SMALL LETTER Z WITH DOT ABOVE + (#xC0 ?\u00C0) ;; LATIN CAPITAL LETTER A WITH GRAVE + (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE + (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX (#xC3 ?\u0102) ;; LATIN CAPITAL LETTER A WITH BREVE + (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS (#xC5 ?\u0106) ;; LATIN CAPITAL LETTER C WITH ACUTE + (#xC6 ?\u00C6) ;; LATIN CAPITAL LETTER AE + (#xC7 ?\u00C7) ;; LATIN CAPITAL LETTER C WITH CEDILLA + (#xC8 ?\u00C8) ;; LATIN CAPITAL LETTER E WITH GRAVE + (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE + (#xCA ?\u00CA) ;; LATIN CAPITAL LETTER E WITH CIRCUMFLEX + (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS + (#xCC ?\u00CC) ;; LATIN CAPITAL LETTER I WITH GRAVE + (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE + (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX + (#xCF ?\u00CF) ;; LATIN CAPITAL LETTER I WITH DIAERESIS (#xD0 ?\u0110) ;; LATIN CAPITAL LETTER D WITH STROKE (#xD1 ?\u0143) ;; LATIN CAPITAL LETTER N WITH ACUTE + (#xD2 ?\u00D2) ;; LATIN CAPITAL LETTER O WITH GRAVE + (#xD3 ?\u00D3) ;; LATIN CAPITAL LETTER O WITH ACUTE + (#xD4 ?\u00D4) ;; LATIN CAPITAL LETTER O WITH CIRCUMFLEX (#xD5 ?\u0150) ;; LATIN CAPITAL LETTER O WITH DOUBLE ACUTE + (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS (#xD7 ?\u015A) ;; LATIN CAPITAL LETTER S WITH ACUTE (#xD8 ?\u0170) ;; LATIN CAPITAL LETTER U WITH DOUBLE ACUTE + (#xD9 ?\u00D9) ;; LATIN CAPITAL LETTER U WITH GRAVE + (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE + (#xDB ?\u00DB) ;; LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS (#xDD ?\u0118) ;; LATIN CAPITAL LETTER E WITH OGONEK (#xDE ?\u021A) ;; LATIN CAPITAL LETTER T WITH COMMA BELOW + (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP 
S + (#xE0 ?\u00E0) ;; LATIN SMALL LETTER A WITH GRAVE + (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE + (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A WITH CIRCUMFLEX (#xE3 ?\u0103) ;; LATIN SMALL LETTER A WITH BREVE + (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS (#xE5 ?\u0107) ;; LATIN SMALL LETTER C WITH ACUTE + (#xE6 ?\u00E6) ;; LATIN SMALL LETTER AE + (#xE7 ?\u00E7) ;; LATIN SMALL LETTER C WITH CEDILLA + (#xE8 ?\u00E8) ;; LATIN SMALL LETTER E WITH GRAVE + (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE + (#xEA ?\u00EA) ;; LATIN SMALL LETTER E WITH CIRCUMFLEX + (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS + (#xEC ?\u00EC) ;; LATIN SMALL LETTER I WITH GRAVE + (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE + (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX + (#xEF ?\u00EF) ;; LATIN SMALL LETTER I WITH DIAERESIS (#xF0 ?\u0111) ;; LATIN SMALL LETTER D WITH STROKE (#xF1 ?\u0144) ;; LATIN SMALL LETTER N WITH ACUTE + (#xF2 ?\u00F2) ;; LATIN SMALL LETTER O WITH GRAVE + (#xF3 ?\u00F3) ;; LATIN SMALL LETTER O WITH ACUTE + (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX (#xF5 ?\u0151) ;; LATIN SMALL LETTER O WITH DOUBLE ACUTE + (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS (#xF7 ?\u015B) ;; LATIN SMALL LETTER S WITH ACUTE (#xF8 ?\u0171) ;; LATIN SMALL LETTER U WITH DOUBLE ACUTE + (#xF9 ?\u00F9) ;; LATIN SMALL LETTER U WITH GRAVE + (#xFA ?\u00FA) ;; LATIN SMALL LETTER U WITH ACUTE + (#xFB ?\u00FB) ;; LATIN SMALL LETTER U WITH CIRCUMFLEX + (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS (#xFD ?\u0119) ;; LATIN SMALL LETTER E WITH OGONEK - (#xFE ?\u021B)) ;; LATIN SMALL LETTER T WITH COMMA BELOW + (#xFE ?\u021B) ;; LATIN SMALL LETTER T WITH COMMA BELOW + (#xFF ?\u00FF)) ;; LATIN SMALL LETTER Y WITH DIAERESIS "ISO-8859-16 (Latin-10)" '(mnemonic "Latin 10" aliases (iso-latin-10))) @@ -972,12 +1519,134 @@ (make-8-bit-coding-system 'iso-8859-9 - '((#xD0 ?\u011E) ;; LATIN CAPITAL LETTER G WITH BREVE + '((#x80 ?\u0080) ;; + (#x81 
?\u0081) ;; + (#x82 ?\u0082) ;; + (#x83 ?\u0083) ;; + (#x84 ?\u0084) ;; + (#x85 ?\u0085) ;; + (#x86 ?\u0086) ;; + (#x87 ?\u0087) ;; + (#x88 ?\u0088) ;; + (#x89 ?\u0089) ;; + (#x8A ?\u008A) ;; + (#x8B ?\u008B) ;; + (#x8C ?\u008C) ;; + (#x8D ?\u008D) ;; + (#x8E ?\u008E) ;; + (#x8F ?\u008F) ;; + (#x90 ?\u0090) ;; + (#x91 ?\u0091) ;; + (#x92 ?\u0092) ;; + (#x93 ?\u0093) ;; + (#x94 ?\u0094) ;; + (#x95 ?\u0095) ;; + (#x96 ?\u0096) ;; + (#x97 ?\u0097) ;; + (#x98 ?\u0098) ;; + (#x99 ?\u0099) ;; + (#x9A ?\u009A) ;; + (#x9B ?\u009B) ;; + (#x9C ?\u009C) ;; + (#x9D ?\u009D) ;; + (#x9E ?\u009E) ;; + (#x9F ?\u009F) ;; + (#xA0 ?\u00A0) ;; NO-BREAK SPACE + (#xA1 ?\u00A1) ;; INVERTED EXCLAMATION MARK + (#xA2 ?\u00A2) ;; CENT SIGN + (#xA3 ?\u00A3) ;; POUND SIGN + (#xA4 ?\u00A4) ;; CURRENCY SIGN + (#xA5 ?\u00A5) ;; YEN SIGN + (#xA6 ?\u00A6) ;; BROKEN BAR + (#xA7 ?\u00A7) ;; SECTION SIGN + (#xA8 ?\u00A8) ;; DIAERESIS + (#xA9 ?\u00A9) ;; COPYRIGHT SIGN + (#xAA ?\u00AA) ;; FEMININE ORDINAL INDICATOR + (#xAB ?\u00AB) ;; LEFT-POINTING DOUBLE ANGLE QUOTATION MARK + (#xAC ?\u00AC) ;; NOT SIGN + (#xAD ?\u00AD) ;; SOFT HYPHEN + (#xAE ?\u00AE) ;; REGISTERED SIGN + (#xAF ?\u00AF) ;; MACRON + (#xB0 ?\u00B0) ;; DEGREE SIGN + (#xB1 ?\u00B1) ;; PLUS-MINUS SIGN + (#xB2 ?\u00B2) ;; SUPERSCRIPT TWO + (#xB3 ?\u00B3) ;; SUPERSCRIPT THREE + (#xB4 ?\u00B4) ;; ACUTE ACCENT + (#xB5 ?\u00B5) ;; MICRO SIGN + (#xB6 ?\u00B6) ;; PILCROW SIGN + (#xB7 ?\u00B7) ;; MIDDLE DOT + (#xB8 ?\u00B8) ;; CEDILLA + (#xB9 ?\u00B9) ;; SUPERSCRIPT ONE + (#xBA ?\u00BA) ;; MASCULINE ORDINAL INDICATOR + (#xBB ?\u00BB) ;; RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK + (#xBC ?\u00BC) ;; VULGAR FRACTION ONE QUARTER + (#xBD ?\u00BD) ;; VULGAR FRACTION ONE HALF + (#xBE ?\u00BE) ;; VULGAR FRACTION THREE QUARTERS + (#xBF ?\u00BF) ;; INVERTED QUESTION MARK + (#xC0 ?\u00C0) ;; LATIN CAPITAL LETTER A WITH GRAVE + (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE + (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX + (#xC3 ?\u00C3) ;; 
LATIN CAPITAL LETTER A WITH TILDE + (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS + (#xC5 ?\u00C5) ;; LATIN CAPITAL LETTER A WITH RING ABOVE + (#xC6 ?\u00C6) ;; LATIN CAPITAL LETTER AE + (#xC7 ?\u00C7) ;; LATIN CAPITAL LETTER C WITH CEDILLA + (#xC8 ?\u00C8) ;; LATIN CAPITAL LETTER E WITH GRAVE + (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE + (#xCA ?\u00CA) ;; LATIN CAPITAL LETTER E WITH CIRCUMFLEX + (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS + (#xCC ?\u00CC) ;; LATIN CAPITAL LETTER I WITH GRAVE + (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE + (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX + (#xCF ?\u00CF) ;; LATIN CAPITAL LETTER I WITH DIAERESIS + (#xD0 ?\u011E) ;; LATIN CAPITAL LETTER G WITH BREVE + (#xD1 ?\u00D1) ;; LATIN CAPITAL LETTER N WITH TILDE + (#xD2 ?\u00D2) ;; LATIN CAPITAL LETTER O WITH GRAVE + (#xD3 ?\u00D3) ;; LATIN CAPITAL LETTER O WITH ACUTE + (#xD4 ?\u00D4) ;; LATIN CAPITAL LETTER O WITH CIRCUMFLEX + (#xD5 ?\u00D5) ;; LATIN CAPITAL LETTER O WITH TILDE + (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS + (#xD7 ?\u00D7) ;; MULTIPLICATION SIGN + (#xD8 ?\u00D8) ;; LATIN CAPITAL LETTER O WITH STROKE + (#xD9 ?\u00D9) ;; LATIN CAPITAL LETTER U WITH GRAVE + (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE + (#xDB ?\u00DB) ;; LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS (#xDD ?\u0130) ;; LATIN CAPITAL LETTER I WITH DOT ABOVE (#xDE ?\u015E) ;; LATIN CAPITAL LETTER S WITH CEDILLA + (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP S + (#xE0 ?\u00E0) ;; LATIN SMALL LETTER A WITH GRAVE + (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE + (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A WITH CIRCUMFLEX + (#xE3 ?\u00E3) ;; LATIN SMALL LETTER A WITH TILDE + (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS + (#xE5 ?\u00E5) ;; LATIN SMALL LETTER A WITH RING ABOVE + (#xE6 ?\u00E6) ;; LATIN SMALL LETTER AE + (#xE7 ?\u00E7) ;; LATIN SMALL LETTER C WITH CEDILLA + (#xE8 
?\u00E8) ;; LATIN SMALL LETTER E WITH GRAVE + (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE + (#xEA ?\u00EA) ;; LATIN SMALL LETTER E WITH CIRCUMFLEX + (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS + (#xEC ?\u00EC) ;; LATIN SMALL LETTER I WITH GRAVE + (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE + (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX + (#xEF ?\u00EF) ;; LATIN SMALL LETTER I WITH DIAERESIS (#xF0 ?\u011F) ;; LATIN SMALL LETTER G WITH BREVE + (#xF1 ?\u00F1) ;; LATIN SMALL LETTER N WITH TILDE + (#xF2 ?\u00F2) ;; LATIN SMALL LETTER O WITH GRAVE + (#xF3 ?\u00F3) ;; LATIN SMALL LETTER O WITH ACUTE + (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX + (#xF5 ?\u00F5) ;; LATIN SMALL LETTER O WITH TILDE + (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS + (#xF7 ?\u00F7) ;; DIVISION SIGN + (#xF8 ?\u00F8) ;; LATIN SMALL LETTER O WITH STROKE + (#xF9 ?\u00F9) ;; LATIN SMALL LETTER U WITH GRAVE + (#xFA ?\u00FA) ;; LATIN SMALL LETTER U WITH ACUTE + (#xFB ?\u00FB) ;; LATIN SMALL LETTER U WITH CIRCUMFLEX + (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS (#xFD ?\u0131) ;; LATIN SMALL LETTER DOTLESS I - (#xFE ?\u015F)) ;; LATIN SMALL LETTER S WITH CEDILLA + (#xFE ?\u015F) ;; LATIN SMALL LETTER S WITH CEDILLA + (#xFF ?\u00FF)) ;; LATIN SMALL LETTER Y WITH DIAERESIS "ISO-8859-9 (Latin-5)" '(mnemonic "Latin 5" aliases (iso-latin-5 latin-5))) @@ -1270,7 +1939,103 @@ (#x9B ?\u203A) ;; SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (#x9C ?\u0153) ;; LATIN SMALL LIGATURE OE (#x9E ?\u017E) ;; LATIN SMALL LETTER Z WITH CARON - (#x9F ?\u0178));; LATIN CAPITAL LETTER Y WITH DIAERESIS + (#x9F ?\u0178) ;; LATIN CAPITAL LETTER Y WITH DIAERESIS + (#xA0 ?\u00A0) ;; NO-BREAK SPACE + (#xA1 ?\u00A1) ;; INVERTED EXCLAMATION MARK + (#xA2 ?\u00A2) ;; CENT SIGN + (#xA3 ?\u00A3) ;; POUND SIGN + (#xA4 ?\u00A4) ;; CURRENCY SIGN + (#xA5 ?\u00A5) ;; YEN SIGN + (#xA6 ?\u00A6) ;; BROKEN BAR + (#xA7 ?\u00A7) ;; SECTION SIGN + (#xA8 ?\u00A8) ;; DIAERESIS + (#xA9 ?\u00A9) 
;; COPYRIGHT SIGN + (#xAA ?\u00AA) ;; FEMININE ORDINAL INDICATOR + (#xAB ?\u00AB) ;; LEFT-POINTING DOUBLE ANGLE QUOTATION MARK + (#xAC ?\u00AC) ;; NOT SIGN + (#xAD ?\u00AD) ;; SOFT HYPHEN + (#xAE ?\u00AE) ;; REGISTERED SIGN + (#xAF ?\u00AF) ;; MACRON + (#xB0 ?\u00B0) ;; DEGREE SIGN + (#xB1 ?\u00B1) ;; PLUS-MINUS SIGN + (#xB2 ?\u00B2) ;; SUPERSCRIPT TWO + (#xB3 ?\u00B3) ;; SUPERSCRIPT THREE + (#xB4 ?\u00B4) ;; ACUTE ACCENT + (#xB5 ?\u00B5) ;; MICRO SIGN + (#xB6 ?\u00B6) ;; PILCROW SIGN + (#xB7 ?\u00B7) ;; MIDDLE DOT + (#xB8 ?\u00B8) ;; CEDILLA + (#xB9 ?\u00B9) ;; SUPERSCRIPT ONE + (#xBA ?\u00BA) ;; MASCULINE ORDINAL INDICATOR + (#xBB ?\u00BB) ;; RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK + (#xBC ?\u00BC) ;; VULGAR FRACTION ONE QUARTER + (#xBD ?\u00BD) ;; VULGAR FRACTION ONE HALF + (#xBE ?\u00BE) ;; VULGAR FRACTION THREE QUARTERS + (#xBF ?\u00BF) ;; INVERTED QUESTION MARK + (#xC0 ?\u00C0) ;; LATIN CAPITAL LETTER A WITH GRAVE + (#xC1 ?\u00C1) ;; LATIN CAPITAL LETTER A WITH ACUTE + (#xC2 ?\u00C2) ;; LATIN CAPITAL LETTER A WITH CIRCUMFLEX + (#xC3 ?\u00C3) ;; LATIN CAPITAL LETTER A WITH TILDE + (#xC4 ?\u00C4) ;; LATIN CAPITAL LETTER A WITH DIAERESIS + (#xC5 ?\u00C5) ;; LATIN CAPITAL LETTER A WITH RING ABOVE + (#xC6 ?\u00C6) ;; LATIN CAPITAL LETTER AE + (#xC7 ?\u00C7) ;; LATIN CAPITAL LETTER C WITH CEDILLA + (#xC8 ?\u00C8) ;; LATIN CAPITAL LETTER E WITH GRAVE + (#xC9 ?\u00C9) ;; LATIN CAPITAL LETTER E WITH ACUTE + (#xCA ?\u00CA) ;; LATIN CAPITAL LETTER E WITH CIRCUMFLEX + (#xCB ?\u00CB) ;; LATIN CAPITAL LETTER E WITH DIAERESIS + (#xCC ?\u00CC) ;; LATIN CAPITAL LETTER I WITH GRAVE + (#xCD ?\u00CD) ;; LATIN CAPITAL LETTER I WITH ACUTE + (#xCE ?\u00CE) ;; LATIN CAPITAL LETTER I WITH CIRCUMFLEX + (#xCF ?\u00CF) ;; LATIN CAPITAL LETTER I WITH DIAERESIS + (#xD0 ?\u00D0) ;; LATIN CAPITAL LETTER ETH + (#xD1 ?\u00D1) ;; LATIN CAPITAL LETTER N WITH TILDE + (#xD2 ?\u00D2) ;; LATIN CAPITAL LETTER O WITH GRAVE + (#xD3 ?\u00D3) ;; LATIN CAPITAL LETTER O WITH ACUTE + (#xD4 ?\u00D4) ;; 
LATIN CAPITAL LETTER O WITH CIRCUMFLEX + (#xD5 ?\u00D5) ;; LATIN CAPITAL LETTER O WITH TILDE + (#xD6 ?\u00D6) ;; LATIN CAPITAL LETTER O WITH DIAERESIS + (#xD7 ?\u00D7) ;; MULTIPLICATION SIGN + (#xD8 ?\u00D8) ;; LATIN CAPITAL LETTER O WITH STROKE + (#xD9 ?\u00D9) ;; LATIN CAPITAL LETTER U WITH GRAVE + (#xDA ?\u00DA) ;; LATIN CAPITAL LETTER U WITH ACUTE + (#xDB ?\u00DB) ;; LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (#xDC ?\u00DC) ;; LATIN CAPITAL LETTER U WITH DIAERESIS + (#xDD ?\u00DD) ;; LATIN CAPITAL LETTER Y WITH ACUTE + (#xDE ?\u00DE) ;; LATIN CAPITAL LETTER THORN + (#xDF ?\u00DF) ;; LATIN SMALL LETTER SHARP S + (#xE0 ?\u00E0) ;; LATIN SMALL LETTER A WITH GRAVE + (#xE1 ?\u00E1) ;; LATIN SMALL LETTER A WITH ACUTE + (#xE2 ?\u00E2) ;; LATIN SMALL LETTER A WITH CIRCUMFLEX + (#xE3 ?\u00E3) ;; LATIN SMALL LETTER A WITH TILDE + (#xE4 ?\u00E4) ;; LATIN SMALL LETTER A WITH DIAERESIS + (#xE5 ?\u00E5) ;; LATIN SMALL LETTER A WITH RING ABOVE + (#xE6 ?\u00E6) ;; LATIN SMALL LETTER AE + (#xE7 ?\u00E7) ;; LATIN SMALL LETTER C WITH CEDILLA + (#xE8 ?\u00E8) ;; LATIN SMALL LETTER E WITH GRAVE + (#xE9 ?\u00E9) ;; LATIN SMALL LETTER E WITH ACUTE + (#xEA ?\u00EA) ;; LATIN SMALL LETTER E WITH CIRCUMFLEX + (#xEB ?\u00EB) ;; LATIN SMALL LETTER E WITH DIAERESIS + (#xEC ?\u00EC) ;; LATIN SMALL LETTER I WITH GRAVE + (#xED ?\u00ED) ;; LATIN SMALL LETTER I WITH ACUTE + (#xEE ?\u00EE) ;; LATIN SMALL LETTER I WITH CIRCUMFLEX + (#xEF ?\u00EF) ;; LATIN SMALL LETTER I WITH DIAERESIS + (#xF0 ?\u00F0) ;; LATIN SMALL LETTER ETH + (#xF1 ?\u00F1) ;; LATIN SMALL LETTER N WITH TILDE + (#xF2 ?\u00F2) ;; LATIN SMALL LETTER O WITH GRAVE + (#xF3 ?\u00F3) ;; LATIN SMALL LETTER O WITH ACUTE + (#xF4 ?\u00F4) ;; LATIN SMALL LETTER O WITH CIRCUMFLEX + (#xF5 ?\u00F5) ;; LATIN SMALL LETTER O WITH TILDE + (#xF6 ?\u00F6) ;; LATIN SMALL LETTER O WITH DIAERESIS + (#xF7 ?\u00F7) ;; DIVISION SIGN + (#xF8 ?\u00F8) ;; LATIN SMALL LETTER O WITH STROKE + (#xF9 ?\u00F9) ;; LATIN SMALL LETTER U WITH GRAVE + (#xFA ?\u00FA) ;; 
LATIN SMALL LETTER U WITH ACUTE + (#xFB ?\u00FB) ;; LATIN SMALL LETTER U WITH CIRCUMFLEX + (#xFC ?\u00FC) ;; LATIN SMALL LETTER U WITH DIAERESIS + (#xFD ?\u00FD) ;; LATIN SMALL LETTER Y WITH ACUTE + (#xFE ?\u00FE) ;; LATIN SMALL LETTER THORN + (#xFF ?\u00FF));; LATIN SMALL LETTER Y WITH DIAERESIS "Microsoft's extension of iso-8859-1 for Western Europe and the Americas. " '(mnemonic "cp1252" aliases (cp1252))) diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/mule-cmds.el --- a/lisp/mule/mule-cmds.el Thu Feb 05 21:18:37 2009 -0500 +++ b/lisp/mule/mule-cmds.el Sat Feb 07 17:13:37 2009 +0000 @@ -231,9 +231,9 @@ VALUE is a fixed-width 8-bit coding system used to display Unicode error sequences (using a face to make it clear that the data is invalid). In Western Europe - this is normally windows-1252; in the Russia and the - former Soviet Union koi8-ru or windows-1251 makes more - sense." + and the Americas this is normally windows-1252; in + Russia and the former Soviet Union koi8-ru or + windows-1251 makes more sense." 
(if (symbolp lang-env) (setq lang-env (symbol-name lang-env))) (let (lang-slot prop-slot) @@ -771,7 +771,7 @@ (let ((invalid-sequence-coding-system (get-language-info language-name 'invalid-sequence-coding-system)) (disp-table (specifier-instance current-display-table)) - glyph string) + glyph string unicode-error-lookup) (when (consp invalid-sequence-coding-system) (setq invalid-sequence-coding-system (car invalid-sequence-coding-system))) @@ -779,9 +779,15 @@ #'(lambda (key entry) (setq string (decode-coding-string (string entry) invalid-sequence-coding-system)) - ;; Treat control characters specially: - (when (string-match "^[\x00-\x1f\x80-\x9f]$" string) - (setq string (format "^%c" (+ ?@ (aref string 0))))) + (when (= 1 (length string)) + ;; Treat control characters specially: + (cond + ((string-match "^[\x00-\x1f\x80-\x9f]$" string) + (setq string (format "^%c" (+ ?@ (aref string 0))))) + ((setq unicode-error-lookup + (get-char-table (aref string 0) + unicode-error-default-translation-table)) + (setq string (format "^%c" (+ ?@ unicode-error-lookup)))))) (setq glyph (make-glyph (vector 'string :data string))) (set-glyph-face glyph 'unicode-invalid-sequence-warning-face) (put-char-table key glyph disp-table) @@ -939,7 +945,7 @@ (defun encoded-string-description (str coding-system) "Return a pretty description of STR that is encoded by CODING-SYSTEM." -; (setq str (string-as-unibyte str)) + ;; XEmacs; no transformation to unibyte. (mapconcat (if (and coding-system (eq (coding-system-type coding-system) 'iso2022)) ;; Try to get a pretty description for ISO 2022 escape sequences. @@ -948,36 +954,8 @@ (function (lambda (x) (format "#x%02X" x)))) str " ")) -;; (defun encode-coding-char (char coding-system) -;; "Encode CHAR by CODING-SYSTEM and return the resulting string. -;; If CODING-SYSTEM can't safely encode CHAR, return nil." 
-;; (if (cmpcharp char) -;; (setq char (car (decompose-composite-char char 'list)))) -;; (let ((str1 (char-to-string char)) -;; (str2 (make-string 2 char)) -;; (safe-charsets (and coding-system -;; (coding-system-get coding-system 'safe-charsets))) -;; enc1 enc2 i1 i2) -;; (when (or (eq safe-charsets t) -;; (memq (char-charset char) safe-charsets)) -;; ;; We must find the encoded string of CHAR. But, just encoding -;; ;; CHAR will put extra control sequences (usually to designate -;; ;; ASCII charset) at the tail if type of CODING is ISO 2022. -;; ;; To exclude such tailing bytes, we at first encode one-char -;; ;; string and two-char string, then check how many bytes at the -;; ;; tail of both encoded strings are the same. -;; -;; (setq enc1 (string-as-unibyte (encode-coding-string str1 coding-system)) -;; i1 (length enc1) -;; enc2 (string-as-unibyte (encode-coding-string str2 coding-system)) -;; i2 (length enc2)) -;; (while (and (> i1 0) (= (aref enc1 (1- i1)) (aref enc2 (1- i2)))) -;; (setq i1 (1- i1) i2 (1- i2))) -;; -;; ;; Now (substring enc1 i1) and (substring enc2 i2) are the same, -;; ;; and they are the extra control sequences at the tail to -;; ;; exclude. -;; (substring enc2 0 i2)))) +;; XEmacs; +;; (defun encode-coding-char (char coding-system) in coding.el. ;; #### The following section is utter junk from mule-misc.el. diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/mule-coding.el --- a/lisp/mule/mule-coding.el Thu Feb 05 21:18:37 2009 -0500 +++ b/lisp/mule/mule-coding.el Sat Feb 07 17:13:37 2009 +0000 @@ -231,11 +231,13 @@ (defun make-8-bit-generate-helper (decode-table encode-table encode-failure-octet) - "Helper function for `make-8-bit-generate-encode-program', which see. + "Helper function, `make-8-bit-generate-encode-program-and-skip-chars-strings', +which see. 
Deals with the case where ASCII and another character set can both be encoded unambiguously and completely into the coding-system; if this is so, -returns a list corresponding to such a ccl-program. If not, it returns nil. " +returns a list of such a ccl-program and the character set in +question. If not, it returns a list with both entries nil." (let ((tentative-encode-program-parts (eval-when-compile (let* ((vec-len 128) @@ -337,11 +339,11 @@ (append other-charset-vector nil) (copy-tree (second tentative-encode-program-parts)))))) - encode-program)) + (values encode-program worth-trying))) -(defun make-8-bit-generate-encode-program (decode-table encode-table - encode-failure-octet) - "Generate a CCL program to decode a 8-bit fixed-width charset. +(defun make-8-bit-generate-encode-program-and-skip-chars-strings + (decode-table encode-table encode-failure-octet) + "Generate a CCL program to encode an 8-bit fixed-width charset. DECODE-TABLE must have 256 non-cons entries, and will be regarded as describing a map from the octet corresponding to an offset in the @@ -399,7 +401,13 @@ in compiled CCL code.\nIf that is not the case, and it appears not to be--that's why you're getting this message--it will not work. ") prog))) - (ascii-encodes-as-itself nil)) + (ascii-encodes-as-itself nil) + (control-1-encodes-as-itself t) + (invalid-sequence-code-point-start + (eval-when-compile + (char-to-unicode + (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) + further-char-set skip-chars invalid-sequences-skip-chars) ;; Is this coding system ASCII-compatible? If so, we can avoid the hash ;; table lookup for those characters. @@ -418,17 +426,18 @@ ;; slow, a hash table lookup + mule-unicode conversion is done ;; for every character encoding.
(setq encode-program general-encode-program) - (setq encode-program - ;; Encode program with ascii-ascii mapping (based on a - ;; character's mule character set), and one other mule - ;; character set using table-based encoding, other - ;; character sets using hash table lookups. - ;; make-8-bit-non-ascii-completely-coveredp only returns - ;; such a mapping if some non-ASCII charset with - ;; characters in decode-table is entirely covered by - ;; encode-table. - (make-8-bit-generate-helper decode-table encode-table - encode-failure-octet)) + (multiple-value-setq + (encode-program further-char-set) + ;; Encode program with ascii-ascii mapping (based on a + ;; character's mule character set), and one other mule + ;; character set using table-based encoding, other + ;; character sets using hash table lookups. + ;; make-8-bit-non-ascii-completely-coveredp only returns + ;; such a mapping if some non-ASCII charset with + ;; characters in decode-table is entirely covered by + ;; encode-table. 
+ (make-8-bit-generate-helper decode-table encode-table + encode-failure-octet)) (unless encode-program ;; If make-8-bit-non-ascii-completely-coveredp returned nil, ;; but ASCII still encodes as itself, do one-to-one mapping @@ -441,7 +450,66 @@ (logior (lsh encode-failure-octet 8) #x14))) (copy-tree encode-program))) - encode-program)) + (loop + for i from #x80 to #x9f + do (unless (= i (aref decode-table i)) + (setq control-1-encodes-as-itself nil) + (return))) + (loop + for i from #x00 to #xFF + initially (setq skip-chars + (cond + ((and ascii-encodes-as-itself + control-1-encodes-as-itself further-char-set) + (concat "\x00-\x9f" (charset-skip-chars-string + further-char-set))) + ((and ascii-encodes-as-itself + control-1-encodes-as-itself) + "\x00-\x9f") + ((null ascii-encodes-as-itself) + (skip-chars-quote (apply #'string + (append decode-table nil)))) + (further-char-set + (concat (charset-skip-chars-string 'ascii) + (charset-skip-chars-string further-char-set))) + (t + (charset-skip-chars-string 'ascii))) + invalid-sequences-skip-chars "") + with decoded-ucs = nil + with decoded = nil + with no-ascii-transparency-skip-chars-list = + (unless ascii-encodes-as-itself (append decode-table nil)) + ;; Can't use #'match-string here, see: + ;; http://mid.gmane.org/18829.34118.709782.704574@parhasard.net + with skip-chars-test = + #'(lambda (skip-chars-string testing) + (with-temp-buffer + (insert testing) + (goto-char (point-min)) + (skip-chars-forward skip-chars-string) + (= (point) (point-max)))) + do + (setq decoded (aref decode-table i) + decoded-ucs (char-to-unicode decoded)) + (cond + ((<= invalid-sequence-code-point-start decoded-ucs + (+ invalid-sequence-code-point-start #xFF)) + (setq invalid-sequences-skip-chars + (concat (string decoded) + invalid-sequences-skip-chars)) + (assert (not (funcall skip-chars-test skip-chars decoded)) + "This char should only be skipped with \ +`invalid-sequences-skip-chars', not by `skip-chars'")) + ((not (funcall 
skip-chars-test skip-chars decoded)) + (if ascii-encodes-as-itself + (setq skip-chars (concat skip-chars (string decoded))) + (push decoded no-ascii-transparency-skip-chars-list)))) + finally (unless ascii-encodes-as-itself + (setq skip-chars + (skip-chars-quote + (apply #'string + no-ascii-transparency-skip-chars-list))))) + (values encode-program skip-chars invalid-sequences-skip-chars))) (defun make-8-bit-create-decode-encode-tables (unicode-map) "Return a list \(DECODE-TABLE ENCODE-TABLE) given UNICODE-MAP. @@ -453,7 +521,11 @@ (let ((decode-table (make-vector 256 nil)) (encode-table (make-hash-table :size 256)) (private-use-start (encode-char make-8-bit-private-use-start 'ucs)) - desired-ucs) + (invalid-sequence-code-point-start + (eval-when-compile + (char-to-unicode + (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) + desired-ucs decode-table-entry) (loop for (external internal) in unicode-map @@ -475,24 +547,51 @@ (int-to-char external) encode-table)) - ;; Now, go through the decode table looking at the characters that - ;; remain nil. If the XEmacs character with that integer is already in - ;; the encode table, map the on-disk octet to a Unicode private use - ;; character. Otherwise map the on-disk octet to the XEmacs character - ;; with that numeric value, to make it clearer what it is. + ;; Now, go through the decode table. For octet values above #x7f, if the + ;; decode table entry is nil, this means that they have an undefined + ;; mapping (= they map to XEmacs characters with keys in + ;; unicode-error-default-translation-table); for octet values below or + ;; equal to #x7f, it means that they map to ASCII. + + ;; If any entry (whether below or above #x7f) in the decode-table + ;; already maps to some character with a key in + ;; unicode-error-default-translation-table, it is treated as an + ;; undefined octet by `query-coding-region'. That is, it is not + ;; necessary for an octet value to be above #x7f for this to happen. 
+ (dotimes (i 256) - (when (null (aref decode-table i)) - ;; Find a free code point. - (setq desired-ucs i) - (while (gethash desired-ucs encode-table) - ;; In the normal case, the code point chosen will be U+E0XY, where - ;; XY is the hexadecimal octet on disk. In pathological cases - ;; it'll be something else. - (setq desired-ucs (+ private-use-start desired-ucs) - private-use-start (+ private-use-start 1))) - (puthash desired-ucs (int-to-char i) encode-table) + (setq decode-table-entry (aref decode-table i)) + (if decode-table-entry + (when (get-char-table + decode-table-entry + unicode-error-default-translation-table) + ;; The caller is explicitly specifying that this octet + ;; corresponds to an invalid sequence on disk: + (assert (= (get-char-table + decode-table-entry + unicode-error-default-translation-table) i) + "Bad argument to `make-8-bit-coding-system'. +If you're going to designate an octet with value below #x80 as invalid +for this coding system, make sure to map it to the invalid sequence +character corresponding to its octet value on disk. ")) + + ;; decode-table-entry is nil; either the octet is to be treated as + ;; contributing to an error sequence (when (> i #x7f)), or it should + ;; be treated as ASCII-equivalent. + (setq desired-ucs (or (and (< i #x80) i) + (+ invalid-sequence-code-point-start i))) + (while (gethash desired-ucs encode-table) + (assert (not (< i #x80)) + "UCS code point should not already be in encode-table!" + ;; There is one invalid sequence char per octet value; + ;; with eight-bit-fixed coding systems, it makes no sense + ;; for us to be multiply allocating them.
+ (gethash desired-ucs encode-table)) + (setq desired-ucs (+ private-use-start desired-ucs) + private-use-start (+ private-use-start 1))) + (puthash desired-ucs (int-to-char i) encode-table) (setq desired-ucs (if (> desired-ucs #xFF) - (decode-char 'ucs desired-ucs) + (unicode-to-char desired-ucs) ;; So we get Latin-1 when run at dump time, ;; instead of JIT-allocated characters. (int-to-char desired-ucs))) @@ -546,8 +645,9 @@ (return-from category 'no-conversion)) finally return 'iso-8-1)) -(defun 8-bit-fixed-query-coding-region (begin end coding-system - &optional buffer errorp highlightp) +(defun 8-bit-fixed-query-coding-region (begin end coding-system &optional + buffer ignore-invalid-sequencesp + errorp highlightp) "The `query-coding-region' implementation for 8-bit-fixed coding systems. Uses the `8-bit-fixed-query-from-unicode' and `8-bit-fixed-query-skip-chars' @@ -570,65 +670,79 @@ (or (coding-system-get coding-system '8-bit-fixed-query-skip-chars) (coding-system-get (coding-system-base coding-system) '8-bit-fixed-query-skip-chars))) + (invalid-sequences-skip-chars + (or (coding-system-get coding-system + '8-bit-fixed-invalid-sequences-skip-chars) + (coding-system-get (coding-system-base coding-system) + '8-bit-fixed-invalid-sequences-skip-chars))) (ranges (make-range-table)) + (case-fold-search nil) char-after fail-range-start fail-range-end previous-fail extent - failed) + failed invalid-sequences-looking-at failed-reason + previous-failed-reason) (check-type from-unicode hash-table) (check-type skip-chars-arg string) + (check-type invalid-sequences-skip-chars string) + (setq invalid-sequences-looking-at + (if (equal "" invalid-sequences-skip-chars) + ;; Regexp that will never match. 
+ #r".\{0,0\}" + (concat "[" invalid-sequences-skip-chars "]"))) + (when ignore-invalid-sequencesp + (setq skip-chars-arg + (concat skip-chars-arg invalid-sequences-skip-chars))) (save-excursion (when highlightp - (map-extents #'(lambda (extent ignored-arg) - (when (eq 'query-coding-warning-face - (extent-face extent)) - (delete-extent extent))) buffer begin end)) + (query-coding-clear-highlights begin end buffer)) (goto-char begin buffer) (skip-chars-forward skip-chars-arg end buffer) (while (< (point buffer) end) - ; (message - ; "fail-range-start is %S, previous-fail %S, point is %S, end is %S" - ; fail-range-start previous-fail (point buffer) end) (setq char-after (char-after (point buffer) buffer) fail-range-start (point buffer)) - ; (message "arguments are %S %S" - ; (< (point buffer) end) - ; (not (gethash (encode-char char-after 'ucs) from-unicode))) (while (and (< (point buffer) end) - (not (gethash (encode-char char-after 'ucs) from-unicode))) + (or (and + (not (gethash (encode-char char-after 'ucs) from-unicode)) + (setq failed-reason 'unencodable)) + (and (not ignore-invalid-sequencesp) + (looking-at invalid-sequences-looking-at buffer) + (setq failed-reason 'invalid-sequence))) + (or (null previous-failed-reason) + (eq previous-failed-reason failed-reason))) (forward-char 1 buffer) (setq char-after (char-after (point buffer) buffer) - failed t)) + failed t + previous-failed-reason failed-reason)) (if (= fail-range-start (point buffer)) ;; The character can actually be encoded by the coding ;; system; check the characters past it. (forward-char 1 buffer) ;; The character actually failed. 
- ; (message "past the move through, point now %S" (point buffer)) (when errorp (error 'text-conversion-error (format "Cannot encode %s using coding system" (buffer-substring fail-range-start (point buffer) buffer)) (coding-system-name coding-system))) + (assert (not (null previous-failed-reason)) t + "previous-failed-reason should always be non-nil here") (put-range-table fail-range-start ;; If char-after is non-nil, we're not at ;; the end of the buffer. (setq fail-range-end (if char-after (point buffer) (point-max buffer))) - t ranges) + previous-failed-reason ranges) + (setq previous-failed-reason nil) (when highlightp - ; (message "highlighting") (setq extent (make-extent fail-range-start fail-range-end buffer)) (set-extent-priority extent (+ mouse-highlight-priority 2)) (set-extent-face extent 'query-coding-warning-face)) (skip-chars-forward skip-chars-arg end buffer))) - ; (message "about to give the result, ranges %S" ranges) (if failed (values nil ranges) (values t nil))))) -;;;###autoload (defun make-8-bit-coding-system (name unicode-map &optional description props) "Make and return a fixed-width 8-bit CCL coding system named NAME. NAME must be a symbol, and UNICODE-MAP a list. @@ -644,12 +758,20 @@ character sets will not be distinct when written to disk, which is less often what is intended. -Any octets not mapped will be decoded into the ISO 8859-1 characters with -the corresponding numeric value; unless another octet maps to that -character, in which case the Unicode private use area will be used. This -avoids spurious changes to files on disk when they contain octets that would -be otherwise remapped to the canonical values for the corresponding -characters in the coding system. +Any octets not mapped, and with values above #x7f, will be decoded into +XEmacs characters that reflect that their values are undefined. These +characters will be displayed in a language-environment-specific way. 
See +`unicode-error-default-translation-table' and the +`invalid-sequence-coding-system' argument to `set-language-info'. + +These characters will normally be treated as invalid when checking whether +text can be encoded with `query-coding-region'--see the +IGNORE-INVALID-SEQUENCESP argument to that function to avoid this. It is +possible to specify that octets with values less than #x80 (or indeed +greater than it) be treated in this way, by specifying explicitly that they +correspond to the character mapping to that octet in +`unicode-error-default-translation-table'. Far fewer coding systems +override the ASCII mapping, though, so this is not the default. DESCRIPTION and PROPS are as in `make-coding-system', which see. This function also accepts two additional (optional) properties in PROPS; @@ -668,7 +790,8 @@ (char-to-int ?~))) (aliases (plist-get props 'aliases)) (hash-table-sym (gentemp (format "%s-encode-table" name))) - encode-program decode-program result decode-table encode-table) + encode-program decode-program result decode-table encode-table + skip-chars invalid-sequences-skip-chars) ;; Some more sanity checking. (check-argument-range encode-failure-octet 0 #xFF) @@ -685,10 +808,13 @@ ;; Register the decode-table. (define-translation-hash-table hash-table-sym encode-table) - ;; Generate the programs. - (setq decode-program (make-8-bit-generate-decode-program decode-table) - encode-program (make-8-bit-generate-encode-program - decode-table encode-table encode-failure-octet)) + ;; Generate the programs and skip-chars strings. 
+ (setq decode-program (make-8-bit-generate-decode-program decode-table)) + (multiple-value-setq + (encode-program skip-chars invalid-sequences-skip-chars) + (make-8-bit-generate-encode-program-and-skip-chars-strings + decode-table encode-table encode-failure-octet)) + (unless (vectorp encode-program) (setq encode-program (apply #'vector @@ -709,10 +835,10 @@ (coding-system-put name 'category (make-8-bit-choose-category decode-table)) (coding-system-put name '8-bit-fixed-query-skip-chars - (skip-chars-quote - (apply #'string (append decode-table nil)))) + skip-chars) + (coding-system-put name '8-bit-fixed-invalid-sequences-skip-chars + invalid-sequences-skip-chars) (coding-system-put name '8-bit-fixed-query-from-unicode encode-table) - (coding-system-put name 'query-coding-function #'8-bit-fixed-query-coding-region) (coding-system-put (intern (format "%s-unix" name)) @@ -751,7 +877,8 @@ (or (plist-get props 'encode-failure-octet) (char-to-int ?~))) (aliases (plist-get props 'aliases)) encode-program decode-program - decode-table encode-table) + decode-table encode-table + skip-chars invalid-sequences-skip-chars) ;; Some sanity checking. (check-argument-range encode-failure-octet 0 #xFF) @@ -761,25 +888,21 @@ (setq props (plist-remprop props 'encode-failure-octet) props (plist-remprop props 'aliases)) - ;; Work out encode-table and decode-table. + ;; Work out encode-table and decode-table (multiple-value-setq - (decode-table encode-table) - (make-8-bit-create-decode-encode-tables unicode-map)) + (decode-table encode-table) + (make-8-bit-create-decode-encode-tables unicode-map)) - ;; Generate the decode and encode programs. - (setq decode-program (make-8-bit-generate-decode-program decode-table) - encode-program (make-8-bit-generate-encode-program - decode-table encode-table encode-failure-octet)) + ;; Generate the decode and encode programs, and the skip-chars + ;; arguments. 
+ (setq decode-program (make-8-bit-generate-decode-program decode-table)) + (multiple-value-setq + (encode-program skip-chars invalid-sequences-skip-chars) + (make-8-bit-generate-encode-program-and-skip-chars-strings + decode-table encode-table encode-failure-octet)) ;; And return the generated code. `(let ((encode-table-sym (gentemp (format "%s-encode-table" ',name))) - ;; The case-fold-search bind shouldn't be necessary. If I take - ;; it, out, though, I get: - ;; - ;; (invalid-read-syntax "Multiply defined symbol label" 1) - ;; - ;; when the file is byte compiled. - (case-fold-search t) (encode-table ,encode-table)) (define-translation-hash-table encode-table-sym encode-table) (make-coding-system @@ -797,8 +920,9 @@ (coding-system-put ',name 'category ',(make-8-bit-choose-category decode-table)) (coding-system-put ',name '8-bit-fixed-query-skip-chars - ',(skip-chars-quote - (apply #'string (append decode-table nil)))) + ,skip-chars) + (coding-system-put ',name '8-bit-fixed-invalid-sequences-skip-chars + ,invalid-sequences-skip-chars) (coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table) (coding-system-put ',name 'query-coding-function #'8-bit-fixed-query-coding-region) @@ -819,7 +943,9 @@ ;; Ideally this would be in latin.el, but code-init.el uses it. (make-8-bit-coding-system 'iso-8859-1 - '() ;; No differences from Latin 1. + (loop + for i from #x80 to #xff + collect (list i (int-char i))) ;; Identical to Latin-1. "ISO-8859-1 (Latin-1)" '(mnemonic "Latin 1" documentation "The most used encoding of Western Europe and the Americas." diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/mule/vietnamese.el --- a/lisp/mule/vietnamese.el Thu Feb 05 21:18:37 2009 -0500 +++ b/lisp/mule/vietnamese.el Sat Feb 07 17:13:37 2009 +0000 @@ -26,7 +26,7 @@ ;;; Commentary: -;; For Vietnames, the character sets VISCII and VSCII are supported. +;; For Vietnamese, the character sets VISCII and VSCII are supported. 
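The revised `make-8-bit-create-decode-encode-tables' logic above gives every octet that UNICODE-MAP leaves unmapped a default: octets below #x80 decode as ASCII, and octets from #x80 up decode to the invalid-sequence character for that octet. A minimal sketch of just that defaulting step, in portable Emacs Lisp, follows. `q-c-sketch-fill-decode-table' is a hypothetical helper, not part of XEmacs; it works on integer code points rather than XEmacs characters, takes the start of the invalid-sequence range as a parameter instead of deriving it from `utf-16-be' decoding, and omits the private-use fallback the real code uses on collisions.

```elisp
;; Hypothetical helper, not part of XEmacs: sketches how unmapped
;; octets get default code points in the revised
;; `make-8-bit-create-decode-encode-tables'.  Works on integer code
;; points and ignores the private-use fallback used on collisions.
(defun q-c-sketch-fill-decode-table (unicode-map invalid-sequence-start)
  "Return a 256-element vector mapping octets to code points.
UNICODE-MAP is a list of (OCTET CODE-POINT) entries.  Octets it does
not mention map to themselves when below #x80 (ASCII), and to
INVALID-SEQUENCE-START plus the octet value otherwise."
  (let ((table (make-vector 256 nil)))
    ;; Explicit mappings first.
    (dolist (entry unicode-map)
      (aset table (car entry) (cadr entry)))
    ;; Then give every still-unmapped octet its default.
    (dotimes (octet 256)
      (unless (aref table octet)
        (aset table octet (if (< octet #x80)
                              octet
                            (+ invalid-sequence-start octet)))))
    table))

;; Example: only #x80 is mapped (to U+20AC EURO SIGN); #x200000 is an
;; arbitrary stand-in for the start of the invalid-sequence range.
(let ((table (q-c-sketch-fill-decode-table '((#x80 #x20AC)) #x200000)))
  (list (aref table #x41) (aref table #x80) (aref table #x81)))
;; => (65 8364 2097281)
```

Keeping ASCII as the default below #x80 mirrors the docstring's point that far fewer coding systems override the ASCII mapping; a caller who wants a low octet treated as invalid must say so explicitly.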
;;; Code: diff -r 202cb69c4d87 -r e0a8715fdb1f lisp/unicode.el --- a/lisp/unicode.el Thu Feb 05 21:18:37 2009 -0500 +++ b/lisp/unicode.el Sat Feb 07 17:13:37 2009 +0000 @@ -617,38 +617,69 @@ "Used by `unicode-query-coding-region' to skip chars with known mappings.") (defun unicode-query-coding-region (begin end coding-system - &optional buffer errorp highlightp) - "The `query-coding-region' implementation for Unicode coding systems." + &optional buffer ignore-invalid-sequencesp + errorp highlightp) + "The `query-coding-region' implementation for Unicode coding systems. + +Supports IGNORE-INVALID-SEQUENCESP: XEmacs characters that reflect invalid +octets on disk are treated as encodable if this argument is non-nil, and +as not encodable if it is nil." + + ;; Potential problem here: the characters that correspond to octets from + ;; #x00 to #x7f on disk will be treated by utf-8 and utf-7 as invalid + ;; sequences, and thus, in theory, encodable when + ;; IGNORE-INVALID-SEQUENCESP is non-nil. + (check-argument-type #'coding-system-p (setq coding-system (find-coding-system coding-system))) (check-argument-type #'integer-or-marker-p begin) (check-argument-type #'integer-or-marker-p end) - (let* ((skip-chars-arg unicode-query-coding-skip-chars-arg) + (let* ((skip-chars-arg (concat unicode-query-coding-skip-chars-arg + (if ignore-invalid-sequencesp + unicode-invalid-sequence-regexp-range + ""))) (ranges (make-range-table)) (looking-at-arg (concat "[" skip-chars-arg "]")) + (case-fold-search nil) fail-range-start fail-range-end char-after failed - extent) + extent char-unicode invalid-sequence-p failed-reason + previous-failed-reason) (save-excursion (when highlightp - (map-extents #'(lambda (extent ignored-arg) - (when (eq 'query-coding-warning-face - (extent-face extent)) - (delete-extent extent))) buffer begin end)) + (query-coding-clear-highlights begin end buffer)) (goto-char begin buffer) (skip-chars-forward skip-chars-arg end buffer) (while (< (point buffer) end) -; (message -;
"fail-range-start is %S, point is %S, end is %S" -; fail-range-start (point buffer) end) (setq char-after (char-after (point buffer) buffer) fail-range-start (point buffer)) (while (and (< (point buffer) end) (not (looking-at looking-at-arg)) - (= -1 (char-to-unicode char-after))) + (or (and + (= -1 (setq char-unicode (char-to-unicode char-after))) + (setq failed-reason 'unencodable)) + (and (not ignore-invalid-sequencesp) + ;; The default case, with ignore-invalid-sequencesp + ;; not specified: + ;; If the character is in the Unicode range that + ;; corresponds to an invalid octet, we want to + ;; treat it as unencodable. + (<= (eval-when-compile + (char-to-unicode + (aref (decode-coding-string "\xd8\x00\x00\x00" + 'utf-16-be) 3))) + char-unicode) + (<= char-unicode + (eval-when-compile + (char-to-unicode + (aref (decode-coding-string "\xd8\x00\x00\xFF" + 'utf-16-be) 3)))) + (setq failed-reason 'invalid-sequence))) + (or (null previous-failed-reason) + (eq previous-failed-reason failed-reason))) (forward-char 1 buffer) (setq char-after (char-after (point buffer) buffer) - failed t)) + failed t + previous-failed-reason failed-reason)) (if (= fail-range-start (point buffer)) ;; The character can actually be encoded by the coding ;; system; check the characters past it. @@ -660,13 +691,17 @@ (buffer-substring fail-range-start (point buffer) buffer)) (coding-system-name coding-system))) + (assert + (not (null previous-failed-reason)) t + "If we've got here, previous-failed-reason should be non-nil.") (put-range-table fail-range-start ;; If char-after is non-nil, we're not at ;; the end of the buffer. 
(setq fail-range-end (if char-after (point buffer) (point-max buffer))) - t ranges) + previous-failed-reason ranges) + (setq previous-failed-reason nil) (when highlightp (setq extent (make-extent fail-range-start fail-range-end buffer)) (set-extent-priority extent (+ mouse-highlight-priority 2)) diff -r 202cb69c4d87 -r e0a8715fdb1f tests/ChangeLog --- a/tests/ChangeLog Thu Feb 05 21:18:37 2009 -0500 +++ b/tests/ChangeLog Sat Feb 07 17:13:37 2009 +0000 @@ -1,3 +1,11 @@ +2009-02-07 Aidan Kehoe + + * automated/query-coding-tests.el: + Add FAILING-CASE arguments to the Assert calls, making #'q-c-debug + mostly unnecessary. Remove #'q-c-debug. + Add new tests that use the IGNORE-INVALID-SEQUENCESP argument to + #'query-coding-region; rework the existing ones to respect it. + 2009-01-31 Aidan Kehoe * automated/mule-tests.el: diff -r 202cb69c4d87 -r e0a8715fdb1f tests/automated/query-coding-tests.el --- a/tests/automated/query-coding-tests.el Thu Feb 05 21:18:37 2009 -0500 +++ b/tests/automated/query-coding-tests.el Sat Feb 07 17:13:37 2009 +0000 @@ -31,28 +31,6 @@ (require 'bytecomp) -(defun q-c-debug (&rest aerger) - (let ((standard-output (get-buffer-create "query-coding-debug")) - (fmt (condition-case nil - (and (stringp (first aerger)) - (apply #'format aerger)) - (error nil)))) - (if fmt - (progn - (princ (apply #'format aerger)) - (terpri)) - (princ "--> ") - (let ((i 1)) - (dolist (sgra aerger) - (if (> i 1) (princ " ")) - (princ (format "%d. " i)) - (prin1 sgra) - (incf i)) - (terpri))))) - -;; Comment this out if debugging: -(defalias 'q-c-debug #'ignore) - (when (featurep 'mule) (let ((ascii-chars-string (apply #'string (loop for i from #x0 to #x7f @@ -64,7 +42,7 @@ (with-temp-buffer (insert ascii-chars-string) ;; First, check all the coding systems that are ASCII-transparent for - ;; ASCII-transparency in the check. + ;; ASCII-transparency in query-coding-region. 
(dolist (coding-system (delete-duplicates (mapcar #'(lambda (coding-system) @@ -87,76 +65,142 @@ unix-coding-system))) (coding-system-list nil)) :test #'eq)) - (q-c-debug "looking at coding system %S" (coding-system-name - coding-system)) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + (format "checking query-coding-region ASCII-transparency, %s" + coding-system)) + (Assert (null query-coding-table) + (format "checking query-coding-region ASCII-transparency, %s" + coding-system))) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-string ascii-chars-string coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table)))) + (Assert (eq t query-coding-succeeded) + (format "checking query-coding-string ASCII-transparency, %s" + coding-system)) + (Assert (null query-coding-table) + (format "checking query-coding-string ASCII-transparency, %s" + coding-system)))) (delete-region (point-min) (point-max)) ;; Check for success from the two Latin-1 coding systems (insert latin-1-chars-string) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'iso-8859-1-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking query-coding-region iso-8859-1-transparency") + (Assert (null query-coding-table) + "checking query-coding-region iso-8859-1-transparency")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-string (buffer-string) 'iso-8859-1-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking query-coding-string iso-8859-1-transparency") + (Assert (null query-coding-table) + "checking 
query-coding-string iso-8859-1-transparency")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-string (buffer-string) 'iso-latin-1-with-esc-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (eq t query-coding-succeeded) + "checking query-coding-string iso-latin-1-with-esc-transparency") + (Assert + (null query-coding-table) + "checking query-coding-string iso-latin-1-with-esc-transparency")) ;; Make it fail, check that it fails correctly (insert (decode-char 'ucs #x20AC)) ;; EURO SIGN (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'iso-8859-1-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open data - ((257 258) t))))) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, U+20AC, iso-8859-1") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open data + ((257 258) unencodable))) + "checking query-coding-region fails correctly, U+20AC, iso-8859-1")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'iso-latin-1-with-esc-unix) ;; Stupidly, this succeeds. The behaviour is compatible with ;; GNU, though, and we encourage people not to use ;; iso-latin-1-with-esc-unix anyway: - (Assert query-coding-succeeded) - (Assert (null query-coding-table))) + (Assert + query-coding-succeeded + "checking that query-coding-region succeeds, U+20AC, \ +iso-latin-1-with-esc-unix") + (Assert + (null query-coding-table) + "checking that query-coding-region succeeds, U+20AC, \ +iso-latin-1-with-esc-unix")) ;; Check that it errors correctly.
(setq text-conversion-error-signalled nil) (condition-case nil - (query-coding-region (point-min) (point-max) 'iso-8859-1-unix nil t) + (query-coding-region (point-min) (point-max) 'iso-8859-1-unix + (current-buffer) nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert text-conversion-error-signalled) + (Assert + text-conversion-error-signalled + "checking query-coding-region signals text-conversion-error correctly") (setq text-conversion-error-signalled nil) (condition-case nil (query-coding-region (point-min) (point-max) - 'iso-latin-1-with-esc-unix nil t) + 'iso-latin-1-with-esc-unix nil nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert (null text-conversion-error-signalled)) + (Assert + (null text-conversion-error-signalled) + "checking query-coding-region doesn't signal text-conversion-error") (delete-region (point-min) (point-max)) (insert latin-1-chars-string) (decode-coding-region (point-min) (point-max) 'windows-1252-unix) (goto-char (point-max)) ;; #'decode-coding-region just messed up point. 
(multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'windows-1252-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (null query-coding-succeeded) + "check query-coding-region fails, windows-1252, invalid-sequences") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((130 131) invalid-sequence + (142 143) invalid-sequence + (144 146) invalid-sequence + (158 159) invalid-sequence))) + "check query-coding-region fails, windows-1252, invalid-sequences")) + (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) 'windows-1252-unix + (current-buffer) t) + (Assert + (eq t query-coding-succeeded) + "checking that query-coding-region succeeds, U+20AC, windows-1252") + (Assert + (null query-coding-table) + "checking that query-coding-region succeeds, U+20AC, windows-1252")) (insert ?\x80) (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) 'windows-1252-unix + (current-buffer) t) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, U+0080, windows-1252") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open data + ((257 258) unencodable))) + "checking that query-coding-region fails, U+0080, windows-1252")) + (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'windows-1252-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open data - ((257 258) t))))) + (Assert + (null query-coding-succeeded) + "check query-coding-region fails, U+0080, invalid-sequence, cp1252") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((130 131) invalid-sequence + (142 143) invalid-sequence + (144 146) 
invalid-sequence + (158 159) invalid-sequence + (257 258) unencodable))) + "check query-coding-region fails, U+0080, invalid-sequence, cp1252")) ;; Try a similar approach with koi8-o, the koi8 variant with ;; support for Old Church Slavonic. (delete-region (point-min) (point-max)) @@ -164,29 +208,53 @@ (decode-coding-region (point-min) (point-max) 'koi8-o-unix) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'koi8-o-unix) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (eq t query-coding-succeeded) + "checking that query-coding-region succeeds, koi8-o-unix") + (Assert + (null query-coding-table) + "checking that query-coding-region succeeds, koi8-o-unix")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'escape-quoted) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking that query-coding-region succeeds, escape-quoted") + (Assert (null query-coding-table) + "checking that query-coding-region succeeds, escape-quoted")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'windows-1252-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open - data ((129 131) t (132 133) t (139 140) t - (141 146) t (155 156) t (157 161) t - (162 170) t (173 176) t (178 187) t - (189 192) t (193 257) t))))) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, windows-1252 and Cyrillic") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((129 131) unencodable + (132 133) unencodable + (139 140) unencodable + (141 146) unencodable + (155 156) unencodable + (157 161) unencodable + (162 170) unencodable + (173 176) unencodable + (178 187) unencodable + 
(189 192) unencodable + (193 257) unencodable))) + "checking that query-coding-region fails, windows-1252 and Cyrillic")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) 'koi8-r-unix) - (Assert (null query-coding-succeeded)) - (Assert (equal query-coding-table - #s(range-table type start-closed-end-open - data ((129 154) t (155 161) t (162 164) t - (165 177) t (178 180) t - (181 192) t))))) + (Assert + (null query-coding-succeeded) + "checking that query-coding-region fails, koi8-r and OCS characters") + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((129 154) unencodable + (155 161) unencodable + (162 164) unencodable + (165 177) unencodable + (178 180) unencodable + (181 192) unencodable))) + "checking that query-coding-region fails, koi8-r and OCS characters")) ;; Check that the Unicode coding systems handle characters ;; without Unicode mappings. (delete-region (point-min) (point-max)) @@ -210,19 +278,29 @@ utf-16-little-endian-bom)) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) (point-max) coding-system) - (Assert (null query-coding-succeeded)) + (Assert (null query-coding-succeeded) + "checking unicode coding systems fail with unmapped chars") (Assert (equal query-coding-table #s(range-table type start-closed-end-open data - ((173 174) t (209 210) t - (254 255) t))))) + ((173 174) unencodable + (209 210) unencodable + (254 255) unencodable))) + "checking unicode coding systems fail with unmapped chars")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region (point-min) 173 coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert (eq t query-coding-succeeded) + "checking unicode coding systems succeed sans unmapped chars") + (Assert + (null query-coding-table) + "checking unicode coding systems succeed sans unmapped chars")) 
(multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region 174 209 coding-system) - (Assert (eq t query-coding-succeeded)) - (Assert (null query-coding-table))) + (Assert + (eq t query-coding-succeeded) + "checking unicode coding systems succeed sans unmapped chars, again") + (Assert + (null query-coding-table) + "checking unicode coding systems succeed sans unmapped chars, again")) (multiple-value-bind (query-coding-succeeded query-coding-table) (query-coding-region 210 254 coding-system) (Assert (eq t query-coding-succeeded)) @@ -230,77 +308,143 @@ ;; Check that it errors correctly. (setq text-conversion-error-signalled nil) (condition-case nil - (query-coding-region (point-min) (point-max) coding-system nil t) + (query-coding-region (point-min) (point-max) coding-system + (current-buffer) nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert text-conversion-error-signalled) + (Assert text-conversion-error-signalled + "checking that unicode coding systems error correctly") (setq text-conversion-error-signalled nil) (condition-case nil - (query-coding-region (point-min) 173 coding-system nil t) + (query-coding-region (point-min) 173 coding-system + (current-buffer) + nil t) (text-conversion-error (setq text-conversion-error-signalled t))) - (Assert (null text-conversion-error-signalled))) + (Assert + (null text-conversion-error-signalled) + "checking that unicode coding systems do not error when unnecessary")) + (delete-region (point-min) (point-max)) + (insert (decode-coding-string "\xff\xff\xff\xff" + 'greek-iso-8bit-with-esc)) + (insert (decode-coding-string "\xff\xff\xff\xff" 'utf-8)) + (insert (decode-coding-string "\xff\xff\xff\xff" + 'greek-iso-8bit-with-esc)) + (dolist (coding-system '(utf-8 utf-16 utf-16-little-endian + utf-32 utf-32-little-endian)) + (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) coding-system) + (Assert (null
query-coding-succeeded) + (format + "checking %s fails with unmapped chars and invalid seqs" + coding-system)) + (Assert (equal query-coding-table + #s(range-table type start-closed-end-open + data ((1 5) unencodable + (5 9) invalid-sequence + (9 13) unencodable))) + (format + "checking %s fails with unmapped chars and invalid seqs" + coding-system))) + (multiple-value-bind (query-coding-succeeded query-coding-table) + (query-coding-region (point-min) (point-max) coding-system + (current-buffer) t) + (Assert (null query-coding-succeeded) + (format + "checking %s fails with unmapped chars sans invalid seqs" + coding-system)) + (Assert + (equal query-coding-table + #s(range-table type start-closed-end-open + data ((1 5) unencodable + (9 13) unencodable))) + (format + "checking %s fails correctly, unmapped chars sans invalid seqs" + coding-system)))) ;; Now to test #'encode-coding-char. Most of the functionality was ;; tested in the query-coding-region tests above, so we don't go into ;; as much detail here. - (Assert (null (encode-coding-char - (decode-char 'ucs #x20ac) 'iso-8859-1))) - (Assert (equal "\x80" (encode-coding-char - (decode-char 'ucs #x20ac) 'windows-1252))) + (Assert + (null (encode-coding-char + (decode-char 'ucs #x20ac) 'iso-8859-1)) + "check #'encode-coding-char doesn't think iso-8859-1 handles U+20AC") + (Assert + (equal "\x80" (encode-coding-char + (decode-char 'ucs #x20ac) 'windows-1252)) + "check #'encode-coding-char thinks windows-1252 handles U+20AC") (delete-region (point-min) (point-max)) ;; And #'unencodable-char-position.
(insert latin-1-chars-string) (insert (decode-char 'ucs #x20ac)) - (Assert (= 257 (unencodable-char-position (point-min) (point-max) - 'iso-8859-1))) - (Assert (equal '(257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 1))) + (Assert + (= 257 (unencodable-char-position (point-min) (point-max) + 'iso-8859-1)) + "check #'unencodable-char-position doesn't think latin-1 encodes U+20AC") + (Assert + (equal '(257) (unencodable-char-position (point-min) (point-max) + 'iso-8859-1 1)) + "check #'unencodable-char-position doesn't think latin-1 encodes U+20AC") ;; Compatiblity, sigh: - (Assert (equal '(257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 0))) + (Assert + (equal '(257) (unencodable-char-position (point-min) (point-max) + 'iso-8859-1 0)) + "check #'unencodable-char-position has some borked GNU semantics") (dotimes (i 6) (insert (decode-char 'ucs #x20ac))) ;; Check if it stops at one: (Assert (equal '(257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 1))) + 'iso-8859-1 1)) + "check #'unencodable-char-position stops at 1 when asked to") ;; Check if it stops at four: (Assert (equal '(260 259 258 257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 4))) + 'iso-8859-1 4)) + "check #'unencodable-char-position stops at 4 when asked to") ;; Check whether it stops at seven: (Assert (equal '(263 262 261 260 259 258 257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 7))) + 'iso-8859-1 7)) + "check #'unencodable-char-position stops at 7 when asked to") ;; Check that it still stops at seven: (Assert (equal '(263 262 261 260 259 258 257) (unencodable-char-position (point-min) (point-max) - 'iso-8859-1 2000))) + 'iso-8859-1 2000)) + "check #'unencodable-char-position stops at 7 if 2000 asked for") ;; Now, #'check-coding-systems-region. 
;; UTF-8 should certainly be able to encode these characters: (Assert (eq t (check-coding-systems-region (point-min) (point-max) - '(utf-8)))) - (Assert (equal '((iso-8859-1 257 258 259 260 261 262 263) - (windows-1252 129 131 132 133 134 135 136 137 138 139 - 140 141 143 146 147 148 149 150 151 152 - 153 154 155 156 157 159 160)) - (sort - (check-coding-systems-region (point-min) (point-max) - '(utf-8 iso-8859-1 - windows-1252)) - ;; (The sort is to make the algorithm irrelevant.) - #'(lambda (left right) - (string< (car left) (car right)))))) + '(utf-8))) + "check #'check-coding-systems-region gives t if encoding works") + (Assert + (equal '((iso-8859-1 257 258 259 260 261 262 263) + (windows-1252 129 130 131 132 133 134 135 136 + 137 138 139 140 141 142 143 144 + 145 146 147 148 149 150 151 152 + 153 154 155 156 157 158 159 160)) + (sort + (check-coding-systems-region (point-min) (point-max) + '(utf-8 iso-8859-1 + windows-1252)) + ;; (The sort is to make the algorithm irrelevant.) + #'(lambda (left right) + (string< (car left) (car right))))) + "check #'check-coding-systems-region behaves well given a list") ;; Ensure that the indices are all decreased by one when passed a ;; string: - (Assert (equal '((iso-8859-1 256 257 258 259 260 261 262) - (windows-1252 128 130 131 132 133 134 135 136 137 138 - 139 140 142 145 146 147 148 149 150 151 - 152 153 154 155 156 158 159)) - (sort - (check-coding-systems-region (buffer-string) nil - '(utf-8 iso-8859-1 - windows-1252)) - #'(lambda (left right) - (string< (car left) (car right))))))))) + (Assert + (equal '((iso-8859-1 256 257 258 259 260 261 262) + (windows-1252 128 129 130 131 132 133 134 135 + 136 137 138 139 140 141 142 143 + 144 145 146 147 148 149 150 151 + 152 153 154 155 156 157 158 159)) + (sort + (check-coding-systems-region (buffer-string) nil + '(utf-8 iso-8859-1 + windows-1252)) + #'(lambda (left right) + (string< (car left) (car right))))) + "check #'check-coding-systems-region behaves given a string and 
list")))) + +