Mercurial > hg > xemacs-beta
diff lisp/mule/mule-coding.el @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we need such
functionality if we're going to have a reliable and portable
#'query-coding-region implementation. However, this change doesn't yet
provide #'query-coding-region for the mswindow-multibyte coding systems,
there should be no functional differences between an XEmacs with this change
and one without it.
src/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we
need such functionality if we're going to have a reliable and
portable #'query-coding-region implementation. However, this
change doesn't yet provide #'query-coding-region for the
mswindow-multibyte coding systems, there should be no functional
differences between an XEmacs with this change and one without it.
* mule-coding.c (struct fixed_width_coding_system):
Add a new coding system type, fixed_width, and implement it. It
uses the CCL infrastructure but has a much simpler creation API,
and its own query_method, formerly in lisp/mule/mule-coding.el.
* unicode.c:
Move the Unicode query method implementation here from
unicode.el.
* lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table
here.
* intl-win32.c (complex_vars_of_intl_win32):
Use Fmake_coding_system_internal, not Fmake_coding_system.
* general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence
here.
* file-coding.h (enum coding_system_variant):
Add fixed_width_coding_system here.
(struct coding_system_methods):
Add query_method and query_lstream_method to the coding system
methods.
Provide flags for the query methods.
Declare the default query method; initialise it correctly in
INITIALIZE_CODING_SYSTEM_TYPE.
* file-coding.c (default_query_method):
New function, the default query method for coding systems that do
not set it. Moved from coding.el.
(make_coding_system_1):
Accept new elements in PROPS in #'make-coding-system; aliases, a
list of aliases; safe-chars and safe-charsets (these were
previously accepted but not saved); and category.
(Fmake_coding_system_internal):
New function, what used to be #'make-coding-system--on Mule
builds, we've now moved some of the functionality of this to
Lisp.
(Fcoding_system_canonical_name_p):
Move this earlier in the file, since it's now called from within
make_coding_system_1.
(Fquery_coding_region):
Move the implementation of this here, from coding.el.
(complex_vars_of_file_coding):
Call Fmake_coding_system_internal, not Fmake_coding_system;
specify safe-charsets properties when we're a mule build.
* extents.h (mouse_highlight_priority, Fset_extent_priority,
Fset_extent_face, Fmap_extents):
Make these available to other C files.
lisp/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
* coding.el:
Consolidate code that depends on the presence or absence of Mule
at the end of this file.
(default-query-coding-region, query-coding-region):
Move these functions to C.
(default-query-coding-region-safe-charset-skip-chars-map):
Remove this variable, the corresponding C variable is
Vdefault_query_coding_region_chartab_cache in file-coding.c.
(query-coding-string): Update docstring to reflect actual multiple
values, be more careful about not modifying a range table that
we're currently mapping over.
(encode-coding-char): Make the implementation of this simpler.
(featurep 'mule): Autoload #'make-coding-system from
mule/make-coding-system.el if we're a mule build; provide an
appropriate compiler macro.
Do various non-mule compatibility things if we're not a mule
build.
* update-elc.el (additional-dump-dependencies):
Add mule/make-coding-system as a dump time dependency if we're a
mule build.
* unicode.el (ccl-encode-to-ucs-2):
(decode-char):
(encode-char):
Move these earlier in the file, for the sake of some byte compile
warnings.
(unicode-query-coding-region):
Move this to unicode.c
* mule/make-coding-system.el:
New file, not dumped. Contains the functionality to rework the
arguments necessary for fixed-width coding systems, and contains
the implementation of #'make-coding-system, which now calls
#'make-coding-system-internal.
* mule/vietnamese.el (viscii):
* mule/latin.el (iso-8859-2):
(windows-1250):
(iso-8859-3):
(iso-8859-4):
(iso-8859-14):
(iso-8859-15):
(iso-8859-16):
(iso-8859-9):
(macintosh):
(windows-1252):
* mule/hebrew.el (iso-8859-8):
* mule/greek.el (iso-8859-7):
(windows-1253):
* mule/cyrillic.el (iso-8859-5):
(koi8-r):
(koi8-u):
(windows-1251):
(alternativnyj):
(koi8-ru):
(koi8-t):
(koi8-c):
(koi8-o):
* mule/arabic.el (iso-8859-6):
(windows-1256):
Move all these coding systems to being of type fixed-width, not of
type CCL. This allows the distinct query-coding-region for them to
be in C, something which will eventually allow us to implement
query-coding-region for the mswindows-multibyte coding systems.
* mule/general-late.el (posix-charset-to-coding-system-hash):
Document why we're pre-emptively persuading the byte compiler that
the ELC for this file needs to be written using escape-quoted.
Call #'set-unicode-query-skip-chars-args, now the Unicode
query-coding-region implementation is in C.
* mule/thai-xtis.el (tis-620):
Don't bother checking whether we're XEmacs or not here.
* mule/mule-coding.el:
Move the eight bit fixed-width functionality from this file to
make-coding-system.el.
tests/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
* automated/mule-tests.el:
Check a coding system's type, not an 8-bit-fixed property, for
whether that coding system should be treated as a fixed-width
coding system.
* automated/query-coding-tests.el:
Don't test the query coding functionality for mswindows-multibyte
coding systems, it's not yet implemented.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 19 Sep 2009 22:53:13 +0100 |
parents | c786c3fd0740 |
children | 308d34e9f07d |
line wrap: on
line diff
--- a/lisp/mule/mule-coding.el Sat Sep 19 17:56:23 2009 +0200 +++ b/lisp/mule/mule-coding.el Sat Sep 19 22:53:13 2009 +0100 @@ -28,9 +28,6 @@ ;;; split off of mule.el and mostly moved to coding.el -;; Needed for make-8-bit-coding-system. -(eval-when-compile (require 'ccl)) - ;;; Code: (defun coding-system-force-on-output (coding-system register) @@ -222,730 +219,16 @@ (put symbol 'translation-hash-table-id id) id)) -(defvar make-8-bit-private-use-start (decode-char 'ucs #xE000) - "Start of a 256 code private use area for make-8-bit-coding-system. - -This is used to ensure that distinct octets on disk for a given coding -system map to distinct XEmacs characters, preventing a spurious changes when -a file is read, not changed, and then written. ") - -(defun make-8-bit-generate-helper (decode-table encode-table - encode-failure-octet) - "Helper function, `make-8-bit-generate-encode-program-and-skip-chars-strings', -which see. - -Deals with the case where ASCII and another character set can both be -encoded unambiguously and completely into the coding-system; if this is so, -returns a list comprised of such a ccl-program and the character set in -question. If not, it returns a list with both entries nil." - (let ((tentative-encode-program-parts - (eval-when-compile - (let* ((vec-len 128) - (compiled - (append - (ccl-compile - `(1 - (loop - (read-multibyte-character r0 r1) - (if (r0 == ,(charset-id 'ascii)) - (write r1) - ((if (r0 == #xABAB) - ;; #xBFFE is a sentinel in the compiled - ;; program. - ((r0 = r1 & #x7F) - (write r0 ,(make-vector vec-len #xBFFE))) - ((mule-to-unicode r0 r1) - (if (r0 == #xFFFD) - (write #xBEEF) - ((lookup-integer encode-table-sym r0 r3) - (if r7 - (write-multibyte-character r0 r3) - (write #xBEEF)))))))) - (repeat)))) nil)) - (first-part compiled) - (last-part - (member-if-not (lambda (entr) (eq #xBFFE entr)) - (member-if - (lambda (entr) (eq #xBFFE entr)) - first-part)))) - (while compiled - (when (eq #xBFFE (cadr compiled)) - (assert (= vec-len (search '(#xBFFE) (cdr compiled) - :test #'/=)) nil - "Strange ccl vector length") - (setcdr compiled nil)) - (setq compiled (cdr compiled))) - ;; Is the generated code as we expect it to be? - (assert (and (memq #xABAB first-part) - (memq #xBEEF14 last-part)) - nil - "This code assumes that the constant #xBEEF is #xBEEF14 in \ -compiled CCL code,\nand that the constant #xABAB is #xABAB. If that is -not the case, and it appears not to be--that's why you're getting this -message--it will not work. ") - (list first-part last-part vec-len)))) - (charset-lower -1) - (charset-upper -1) - worth-trying known-charsets encode-program - other-charset-vector ucs) - - (loop for char across decode-table - do (pushnew (char-charset char) known-charsets)) - (setq known-charsets (delq 'ascii known-charsets)) - - (loop for known-charset in known-charsets - do - ;; This is not possible for two dimensional charsets. - (when (eq 1 (charset-dimension known-charset)) - (if (eq 'control-1 known-charset) - (setq charset-lower 0 - charset-upper 31) - ;; There should be a nicer way to get the limits here. - (condition-case args-out-of-range - (make-char known-charset #x100) - (args-out-of-range - (setq charset-lower (third args-out-of-range) - charset-upper (fourth args-out-of-range))))) - (loop - for i from charset-lower to charset-upper - always (and (setq ucs - (encode-char (make-char known-charset i) 'ucs)) - (gethash ucs encode-table)) - finally (setq worth-trying known-charset)) - - ;; Only trying this for one charset at a time, the first find. - (when worth-trying (return)) - - ;; Okay, this charset is not worth trying, Try the next. - (setq charset-lower -1 - charset-upper -1 - worth-trying nil))) - - (when worth-trying - (setq other-charset-vector - (make-vector (third tentative-encode-program-parts) - encode-failure-octet)) - (loop for i from charset-lower to charset-upper - do (aset other-charset-vector i - (gethash (encode-char (make-char worth-trying i) - 'ucs) encode-table))) - (setq encode-program - (nsublis - (list (cons #xABAB (charset-id worth-trying))) - (nconc - (copy-list (first - tentative-encode-program-parts)) - (append other-charset-vector nil) - (copy-tree (second - tentative-encode-program-parts)))))) - (values encode-program worth-trying))) - -(defun make-8-bit-generate-encode-program-and-skip-chars-strings - (decode-table encode-table encode-failure-octet) - "Generate a CCL program to encode a 8-bit fixed-width charset. - -DECODE-TABLE must have 256 non-cons entries, and will be regarded as -describing a map from the octet corresponding to an offset in the -table to the that entry in the table. ENCODE-TABLE is a hash table -map from unicode values to characters in the range [0,255]. -ENCODE-FAILURE-OCTET describes an integer between 0 and 255 -\(inclusive) to write in the event that a character cannot be encoded. " - (check-argument-type #'vectorp decode-table) - (check-argument-range (length decode-table) #x100 #x100) - (check-argument-type #'hash-table-p encode-table) - (check-argument-type #'integerp encode-failure-octet) - (check-argument-range encode-failure-octet #x00 #xFF) - (let ((encode-program nil) - (general-encode-program - (eval-when-compile - (let ((prog (append - (ccl-compile - `(1 - (loop - (read-multibyte-character r0 r1) - (mule-to-unicode r0 r1) - (if (r0 == #xFFFD) - (write #xBEEF) - ((lookup-integer encode-table-sym r0 r3) - (if r7 - (write-multibyte-character r0 r3) - (write #xBEEF)))) - (repeat)))) nil))) - (assert (memq #xBEEF14 prog) - nil - "This code assumes that the constant #xBEEF is #xBEEF14 \ -in compiled CCL code.\nIf that is not the case, and it appears not to -be--that's why you're getting this message--it will not work. ") - prog))) - (encode-program-with-ascii-optimisation - (eval-when-compile - (let ((prog (append - (ccl-compile - `(1 - (loop - (read-multibyte-character r0 r1) - (if (r0 == ,(charset-id 'ascii)) - (write r1) - ((mule-to-unicode r0 r1) - (if (r0 == #xFFFD) - (write #xBEEF) - ((lookup-integer encode-table-sym r0 r3) - (if r7 - (write-multibyte-character r0 r3) - (write #xBEEF)))))) - (repeat)))) nil))) - (assert (memq #xBEEF14 prog) - nil - "This code assumes that the constant #xBEEF is #xBEEF14 \ -in compiled CCL code.\nIf that is not the case, and it appears not to -be--that's why you're getting this message--it will not work. ") - prog))) - (ascii-encodes-as-itself nil) - (control-1-encodes-as-itself t) - (invalid-sequence-code-point-start - (eval-when-compile - (char-to-unicode - (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) - further-char-set skip-chars invalid-sequences-skip-chars) - - ;; Is this coding system ASCII-compatible? If so, we can avoid the hash - ;; table lookup for those characters. - (loop - for i from #x00 to #x7f - always (eq (int-to-char i) (gethash i encode-table)) - finally (setq ascii-encodes-as-itself t)) - - ;; Note that this logic handles EBCDIC badly. For example, CP037, - ;; MIME name ebcdic-na, has the entire repertoire of ASCII and - ;; Latin 1, and thus a more optimal ccl encode program would check - ;; for those character sets and use tables. But for now, we do a - ;; hash table lookup for every character. - (if (null ascii-encodes-as-itself) - ;; General encode program. Pros; general and correct. Cons; - ;; slow, a hash table lookup + mule-unicode conversion is done - ;; for every character encoding. - (setq encode-program general-encode-program) - (multiple-value-setq - (encode-program further-char-set) - ;; Encode program with ascii-ascii mapping (based on a - ;; character's mule character set), and one other mule - ;; character set using table-based encoding, other - ;; character sets using hash table lookups. - ;; make-8-bit-non-ascii-completely-coveredp only returns - ;; such a mapping if some non-ASCII charset with - ;; characters in decode-table is entirely covered by - ;; encode-table. - (make-8-bit-generate-helper decode-table encode-table - encode-failure-octet)) - (unless encode-program - ;; If make-8-bit-non-ascii-completely-coveredp returned nil, - ;; but ASCII still encodes as itself, do one-to-one mapping - ;; for ASCII, and a hash table lookup for everything else. - (setq encode-program encode-program-with-ascii-optimisation))) - - (setq encode-program - (nsublis - (list (cons #xBEEF14 - (logior (lsh encode-failure-octet 8) - #x14))) - (copy-tree encode-program))) - (loop - for i from #x80 to #x9f - do (unless (= i (aref decode-table i)) - (setq control-1-encodes-as-itself nil) - (return))) - (loop - for i from #x00 to #xFF - initially (setq skip-chars - (cond - ((and ascii-encodes-as-itself - control-1-encodes-as-itself further-char-set) - (concat "\x00-\x9f" (charset-skip-chars-string - further-char-set))) - ((and ascii-encodes-as-itself - control-1-encodes-as-itself) - "\x00-\x9f") - ((null ascii-encodes-as-itself) - (skip-chars-quote (apply #'string - (append decode-table nil)))) - (further-char-set - (concat (charset-skip-chars-string 'ascii) - (charset-skip-chars-string further-char-set))) - (t - (charset-skip-chars-string 'ascii))) - invalid-sequences-skip-chars "") - with decoded-ucs = nil - with decoded = nil - with no-ascii-transparency-skip-chars-list = - (unless ascii-encodes-as-itself (append decode-table nil)) - ;; Can't use #'match-string here, see: - ;; http://mid.gmane.org/18829.34118.709782.704574@parhasard.net - with skip-chars-test = - #'(lambda (skip-chars-string testing) - (with-temp-buffer - (insert testing) - (goto-char (point-min)) - (skip-chars-forward skip-chars-string) - (= (point) (point-max)))) - do - (setq decoded (aref decode-table i) - decoded-ucs (char-to-unicode decoded)) - (cond - ((<= invalid-sequence-code-point-start decoded-ucs - (+ invalid-sequence-code-point-start #xFF)) - (setq invalid-sequences-skip-chars - (concat (string decoded) - invalid-sequences-skip-chars)) - (assert (not (funcall skip-chars-test skip-chars decoded)) - "This char should only be skipped with \ -`invalid-sequences-skip-chars', not by `skip-chars'")) - ((not (funcall skip-chars-test skip-chars decoded)) - (if ascii-encodes-as-itself - (setq skip-chars (concat skip-chars (string decoded))) - (push decoded no-ascii-transparency-skip-chars-list)))) - finally (unless ascii-encodes-as-itself - (setq skip-chars - (skip-chars-quote - (apply #'string - no-ascii-transparency-skip-chars-list))))) - (values encode-program skip-chars invalid-sequences-skip-chars))) - -(defun make-8-bit-create-decode-encode-tables (unicode-map) - "Return a list \(DECODE-TABLE ENCODE-TABLE) given UNICODE-MAP. -UNICODE-MAP should be an alist mapping from integer octet values to -characters with UCS code points; DECODE-TABLE will be a 256-element -vector, and ENCODE-TABLE will be a hash table mapping from 256 numbers -to 256 distinct characters. " - (check-argument-type #'listp unicode-map) - (let ((decode-table (make-vector 256 nil)) - (encode-table (make-hash-table :size 256)) - (private-use-start (encode-char make-8-bit-private-use-start 'ucs)) - (invalid-sequence-code-point-start - (eval-when-compile - (char-to-unicode - (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) - desired-ucs decode-table-entry) - - (loop for (external internal) - in unicode-map - do - (aset decode-table external internal) - (assert (not (eq (encode-char internal 'ucs) -1)) - nil - "Looks like you're calling make-8-bit-coding-system in a \ -dumped file, \nand you're either not providing a literal UNICODE-MAP -or PROPS. Don't do that; make-8-bit-coding-system relies on sensible -Unicode mappings being available, which they are at compile time for -dumped files (but this requires the mentioned literals), but not, for -most of them, at run time. ") - - (puthash (encode-char internal 'ucs) - ;; This is semantically an integer, but Dave Love's design - ;; for lookup-integer in CCL means we need to store it as a - ;; character. - (int-to-char external) - encode-table)) - - ;; Now, go through the decode table. For octet values above #x7f, if the - ;; decode table entry is nil, this means that they have an undefined - ;; mapping (= they map to XEmacs characters with keys in - ;; unicode-error-default-translation-table); for octet values below or - ;; equal to #x7f, it means that they map to ASCII. - - ;; If any entry (whether below or above #x7f) in the decode-table - ;; already maps to some character with a key in - ;; unicode-error-default-translation-table, it is treated as an - ;; undefined octet by `query-coding-region'. That is, it is not - ;; necessary for an octet value to be above #x7f for this to happen. - - (dotimes (i 256) - (setq decode-table-entry (aref decode-table i)) - (if decode-table-entry - (when (get-char-table - decode-table-entry - unicode-error-default-translation-table) - ;; The caller is explicitly specifying that this octet - ;; corresponds to an invalid sequence on disk: - (assert (= (get-char-table - decode-table-entry - unicode-error-default-translation-table) i) - "Bad argument to `make-8-bit-coding-system'. -If you're going to designate an octet with value below #x80 as invalid -for this coding system, make sure to map it to the invalid sequence -character corresponding to its octet value on disk. ")) - - ;; decode-table-entry is nil; either the octet is to be treated as - ;; contributing to an error sequence (when (> #x7f i)), or it should - ;; be attempted to treat it as ASCII-equivalent. - (setq desired-ucs (or (and (< i #x80) i) - (+ invalid-sequence-code-point-start i))) - (while (gethash desired-ucs encode-table) - (assert (not (< i #x80)) - "UCS code point should not already be in encode-table!" - ;; There is one invalid sequence char per octet value; - ;; with eight-bit-fixed coding systems, it makes no sense - ;; for us to be multiply allocating them. - (gethash desired-ucs encode-table)) - (setq desired-ucs (+ private-use-start desired-ucs) - private-use-start (+ private-use-start 1))) - (puthash desired-ucs (int-to-char i) encode-table) - (setq desired-ucs (if (> desired-ucs #xFF) - (unicode-to-char desired-ucs) - ;; So we get Latin-1 when run at dump time, - ;; instead of JIT-allocated characters. - (int-to-char desired-ucs))) - (aset decode-table i desired-ucs))) - (values decode-table encode-table))) - -(defun make-8-bit-generate-decode-program (decode-table) - "Given DECODE-TABLE, generate a CCL program to decode an 8-bit charset. -DECODE-TABLE must have 256 non-cons entries, and will be regarded as -describing a map from the octet corresponding to an offset in the -table to the that entry in the table. " - (check-argument-type #'vectorp decode-table) - (check-argument-range (length decode-table) #x100 #x100) - (let ((decode-program-parts - (eval-when-compile - (let* ((compiled - (append - (ccl-compile - `(3 - ((read r0) - (loop - (write-read-repeat r0 ,(make-vector - 256 'sentinel)))))) nil)) - (first-part compiled) - (last-part - (member-if-not #'symbolp - (member-if-not #'integerp first-part)))) - ;; Chop off the sentinel sentinel sentinel [..] part. - (while compiled - (if (symbolp (cadr compiled)) - (setcdr compiled nil)) - (setq compiled (cdr compiled))) - (list first-part last-part))))) - (nconc - ;; copy-list needed, because the structure of the literal provided - ;; by our eval-when-compile hangs around. - (copy-list (first decode-program-parts)) - (append decode-table nil) - (second decode-program-parts)))) - -(defun make-8-bit-choose-category (decode-table) - "Given DECODE-TABLE, return an appropriate coding category. -DECODE-TABLE is a 256-entry vector describing the mapping from octets on -disk to XEmacs characters for some fixed-width 8-bit coding system. " - (check-argument-type #'vectorp decode-table) - (check-argument-range (length decode-table) #x100 #x100) - (loop - named category - for i from #x80 to #x9F - do (unless (= i (aref decode-table i)) - (return-from category 'no-conversion)) - finally return 'iso-8-1)) - -(defun 8-bit-fixed-query-coding-region (begin end coding-system &optional - buffer ignore-invalid-sequencesp - errorp highlightp) - "The `query-coding-region' implementation for 8-bit-fixed coding systems. - -Uses the `8-bit-fixed-query-from-unicode' and `8-bit-fixed-query-skip-chars' -coding system properties. The former is a hash table mapping from valid -Unicode code points to on-disk octets in the coding system; the latter a set -of characters as used by `skip-chars-forward'. Both of these properties are -generated automatically by `make-8-bit-coding-system'. - -See that the documentation of `query-coding-region'; see also -`make-8-bit-coding-system'. " - (check-argument-type #'coding-system-p - (setq coding-system (find-coding-system coding-system))) - (check-argument-type #'integer-or-marker-p begin) - (check-argument-type #'integer-or-marker-p end) - (let ((from-unicode - (or (coding-system-get coding-system '8-bit-fixed-query-from-unicode) - (coding-system-get (coding-system-base coding-system) - '8-bit-fixed-query-from-unicode))) - (skip-chars-arg - (or (coding-system-get coding-system '8-bit-fixed-query-skip-chars) - (coding-system-get (coding-system-base coding-system) - '8-bit-fixed-query-skip-chars))) - (invalid-sequences-skip-chars - (or (coding-system-get coding-system - '8-bit-fixed-invalid-sequences-skip-chars) - (coding-system-get (coding-system-base coding-system) - '8-bit-fixed-invalid-sequences-skip-chars))) - (ranges (make-range-table)) - (case-fold-search nil) - char-after fail-range-start fail-range-end extent - failed invalid-sequences-looking-at failed-reason - previous-failed-reason) - (check-type from-unicode hash-table) - (check-type skip-chars-arg string) - (check-type invalid-sequences-skip-chars string) - (setq invalid-sequences-looking-at - (if (equal "" invalid-sequences-skip-chars) - ;; Regexp that will never match. - #r".\{0,0\}" - (concat "[" invalid-sequences-skip-chars "]"))) - (when ignore-invalid-sequencesp - (setq skip-chars-arg - (concat skip-chars-arg invalid-sequences-skip-chars))) - (save-excursion - (when highlightp - (query-coding-clear-highlights begin end buffer)) - (goto-char begin buffer) - (skip-chars-forward skip-chars-arg end buffer) - (while (< (point buffer) end) - (setq char-after (char-after (point buffer) buffer) - fail-range-start (point buffer)) - (while (and - (< (point buffer) end) - (or (and - (not (gethash (encode-char char-after 'ucs) from-unicode)) - (setq failed-reason 'unencodable)) - (and (not ignore-invalid-sequencesp) - (looking-at invalid-sequences-looking-at buffer) - (setq failed-reason 'invalid-sequence))) - (or (null previous-failed-reason) - (eq previous-failed-reason failed-reason))) - (forward-char 1 buffer) - (setq char-after (char-after (point buffer) buffer) - failed t - previous-failed-reason failed-reason)) - (if (= fail-range-start (point buffer)) - ;; The character can actually be encoded by the coding - ;; system; check the characters past it. - (forward-char 1 buffer) - ;; The character actually failed. - (when errorp - (error 'text-conversion-error - (format "Cannot encode %s using coding system" - (buffer-substring fail-range-start (point buffer) - buffer)) - (coding-system-name coding-system))) - (assert (not (null previous-failed-reason)) t - "previous-failed-reason should always be non-nil here") - (put-range-table fail-range-start - ;; If char-after is non-nil, we're not at - ;; the end of the buffer. - (setq fail-range-end (if char-after - (point buffer) - (point-max buffer))) - previous-failed-reason ranges) - (setq previous-failed-reason nil) - (when highlightp - (setq extent (make-extent fail-range-start fail-range-end buffer)) - (set-extent-priority extent (+ mouse-highlight-priority 2)) - (set-extent-face extent 'query-coding-warning-face)) - (skip-chars-forward skip-chars-arg end buffer))) - (if failed - (values nil ranges) - (values t nil))))) - -(defun make-8-bit-coding-system (name unicode-map &optional description props) - "Make and return a fixed-width 8-bit CCL coding system named NAME. -NAME must be a symbol, and UNICODE-MAP a list. - -UNICODE-MAP is a plist describing a map from octets in the coding -system NAME (as integers) to XEmacs characters. Those XEmacs -characters will be used explicitly on decoding, but for encoding (most -relevantly, on writing to disk) XEmacs characters that map to the same -Unicode code point will be unified. This means that the ISO-8859-? -characters that map to the same Unicode code point will not be -distinct when written to disk, which is normally what is intended; it -also means that East Asian Han characters from different XEmacs -character sets will not be distinct when written to disk, which is -less often what is intended. - -Any octets not mapped, and with values above #x7f, will be decoded into -XEmacs characters that reflect that their values are undefined. These -characters will be displayed in a language-environment-specific way. See -`unicode-error-default-translation-table' and the -`invalid-sequence-coding-system' argument to `set-language-info'. - -These characters will normally be treated as invalid when checking whether -text can be encoded with `query-coding-region'--see the -IGNORE-INVALID-SEQUENCESP argument to that function to avoid this. It is -possible to specify that octets with values less than #x80 (or indeed -greater than it) be treated in this way, by specifying explicitly that they -correspond to the character mapping to that octet in -`unicode-error-default-translation-table'. Far fewer coding systems -override the ASCII mapping, though, so this is not the default. - -DESCRIPTION and PROPS are as in `make-coding-system', which see. This -function also accepts two additional (optional) properties in PROPS; -`aliases', giving a list of aliases to be initialized for this -coding-system, and `encode-failure-octet', an integer between 0 and 256 to -write in place of XEmacs characters that cannot be encoded, defaulting to -the code for tilde `~'. " - (check-argument-type #'symbolp name) - (check-argument-type #'listp unicode-map) - (check-argument-type #'stringp - (or description - (setq description - (format "Coding system used for %s." name)))) - (check-valid-plist props) - (let ((encode-failure-octet (or (plist-get props 'encode-failure-octet) - (char-to-int ?~))) - (aliases (plist-get props 'aliases)) - (hash-table-sym (gentemp (format "%s-encode-table" name))) - encode-program decode-program result decode-table encode-table - skip-chars invalid-sequences-skip-chars) - - ;; Some more sanity checking. - (check-argument-range encode-failure-octet 0 #xFF) - (check-argument-type #'listp aliases) - - ;; Don't pass on our extra data to make-coding-system. - (setq props (plist-remprop props 'encode-failure-octet) - props (plist-remprop props 'aliases)) - - (multiple-value-setq - (decode-table encode-table) - (make-8-bit-create-decode-encode-tables unicode-map)) - - ;; Register the decode-table. - (define-translation-hash-table hash-table-sym encode-table) - - ;; Generate the programs and skip-chars strings. - (setq decode-program (make-8-bit-generate-decode-program decode-table)) - (multiple-value-setq - (encode-program skip-chars invalid-sequences-skip-chars) - (make-8-bit-generate-encode-program-and-skip-chars-strings - decode-table encode-table encode-failure-octet)) - - (unless (vectorp encode-program) - (setq encode-program - (apply #'vector - (nsublis (list (cons 'encode-table-sym hash-table-sym)) - (copy-tree encode-program))))) - (unless (vectorp decode-program) - (setq decode-program - (apply #'vector decode-program))) - - ;; And now generate the actual coding system. - (setq result - (make-coding-system - name 'ccl - description - (plist-put (plist-put props 'decode decode-program) - 'encode encode-program))) - (coding-system-put name '8-bit-fixed t) - (coding-system-put name 'category - (make-8-bit-choose-category decode-table)) - (coding-system-put name '8-bit-fixed-query-skip-chars - skip-chars) - (coding-system-put name '8-bit-fixed-invalid-sequences-skip-chars - invalid-sequences-skip-chars) - (coding-system-put name '8-bit-fixed-query-from-unicode encode-table) - (coding-system-put name 'query-coding-function - #'8-bit-fixed-query-coding-region) - (coding-system-put (intern (format "%s-unix" name)) - 'query-coding-function - #'8-bit-fixed-query-coding-region) - (coding-system-put (intern (format "%s-dos" name)) - 'query-coding-function - #'8-bit-fixed-query-coding-region) - (coding-system-put (intern (format "%s-mac" name)) - 'query-coding-function - #'8-bit-fixed-query-coding-region) - (loop for alias in aliases - do (define-coding-system-alias alias name)) - result)) - -(define-compiler-macro make-8-bit-coding-system (&whole form name unicode-map - &optional description props) - ;; We provide the compiler macro (= macro that is expanded only on - ;; compilation, and that can punt to a runtime version of the - ;; associate function if necessary) not for reasons of speed, though - ;; it does speed up things at runtime a little, but because the - ;; Unicode mappings are available at compile time in the dumped - ;; files, but they are not available at run time for the vast - ;; majority of them. - - (if (not (and (and (consp name) (eq (car name) 'quote)) - (and (consp unicode-map) (eq (car unicode-map) 'quote)) - (and (or (and (consp props) (eq (car props) 'quote)) - (null props))))) - ;; The call does not use literals; do it at runtime. - form - (setq name (cadr name) - unicode-map (cadr unicode-map) - props (if props (cadr props))) - (let ((encode-failure-octet - (or (plist-get props 'encode-failure-octet) (char-to-int ?~))) - (aliases (plist-get props 'aliases)) - encode-program decode-program - decode-table encode-table - skip-chars invalid-sequences-skip-chars) - - ;; Some sanity checking. - (check-argument-range encode-failure-octet 0 #xFF) - (check-argument-type #'listp aliases) - - ;; Don't pass on our extra data to make-coding-system. - (setq props (plist-remprop props 'encode-failure-octet) - props (plist-remprop props 'aliases)) - - ;; Work out encode-table and decode-table - (multiple-value-setq - (decode-table encode-table) - (make-8-bit-create-decode-encode-tables unicode-map)) - - ;; Generate the decode and encode programs, and the skip-chars - ;; arguments. - (setq decode-program (make-8-bit-generate-decode-program decode-table)) - (multiple-value-setq - (encode-program skip-chars invalid-sequences-skip-chars) - (make-8-bit-generate-encode-program-and-skip-chars-strings - decode-table encode-table encode-failure-octet)) - - ;; And return the generated code. - `(let ((encode-table-sym (gentemp (format "%s-encode-table" ',name))) - (encode-table ,encode-table)) - (define-translation-hash-table encode-table-sym encode-table) - (make-coding-system - ',name 'ccl ,description - (plist-put (plist-put ',props 'decode - ,(apply #'vector decode-program)) - 'encode - (apply #'vector - (nsublis - (list (cons - 'encode-table-sym - (symbol-value 'encode-table-sym))) - ',encode-program)))) - (coding-system-put ',name '8-bit-fixed t) - (coding-system-put ',name 'category - ',(make-8-bit-choose-category decode-table)) - (coding-system-put ',name '8-bit-fixed-query-skip-chars - ,skip-chars) - (coding-system-put ',name '8-bit-fixed-invalid-sequences-skip-chars - ,invalid-sequences-skip-chars) - (coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table) - (coding-system-put ',name 'query-coding-function - #'8-bit-fixed-query-coding-region) - (coding-system-put ',(intern (format "%s-unix" name)) - 'query-coding-function - #'8-bit-fixed-query-coding-region) - (coding-system-put ',(intern (format "%s-dos" name)) - 'query-coding-function - #'8-bit-fixed-query-coding-region) - (coding-system-put ',(intern (format "%s-mac" name)) - 'query-coding-function - #'8-bit-fixed-query-coding-region) - ,(macroexpand `(loop for alias in ',aliases - do (define-coding-system-alias alias - ',name))) - (find-coding-system ',name))))) - ;; Ideally this would be in latin.el, but code-init.el uses it. -(make-8-bit-coding-system +(make-coding-system 'iso-8859-1 - (loop - for i from #x80 to #xff - collect (list i (int-char i))) ;; Identical to Latin-1. + 'fixed-width "ISO-8859-1 (Latin-1)" - '(mnemonic "Latin 1" - documentation "The most used encoding of Western Europe and the Americas." - aliases (iso-latin-1 latin-1))) + (eval-when-compile + `(unicode-map + ,(loop + for i from #x80 to #xff + collect (list i (int-char i))) ;; Identical to Latin-1. + mnemonic "Latin 1" + documentation "The most used encoding of Western Europe and the Americas." + aliases (iso-latin-1 latin-1))))