Mercurial > hg > xemacs-beta

diff lisp/mule/mule-coding.el @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. src/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. * mule-coding.c (struct fixed_width_coding_system): Add a new coding system type, fixed_width, and implement it. It uses the CCL infrastructure but has a much simpler creation API, and its own query_method, formerly in lisp/mule/mule-coding.el. * unicode.c: Move the Unicode query method implementation here from unicode.el. * lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table here. * intl-win32.c (complex_vars_of_intl_win32): Use Fmake_coding_system_internal, not Fmake_coding_system. * general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence here. * file-coding.h (enum coding_system_variant): Add fixed_width_coding_system here. (struct coding_system_methods): Add query_method and query_lstream_method to the coding system methods. Provide flags for the query methods. Declare the default query method; initialise it correctly in INITIALIZE_CODING_SYSTEM_TYPE. * file-coding.c (default_query_method): New function, the default query method for coding systems that do not set it. Moved from coding.el. (make_coding_system_1): Accept new elements in PROPS in #'make-coding-system; aliases, a list of aliases; safe-chars and safe-charsets (these were previously accepted but not saved); and category. (Fmake_coding_system_internal): New function, what used to be #'make-coding-system--on Mule builds, we've now moved some of the functionality of this to Lisp. (Fcoding_system_canonical_name_p): Move this earlier in the file, since it's now called from within make_coding_system_1. (Fquery_coding_region): Move the implementation of this here, from coding.el. (complex_vars_of_file_coding): Call Fmake_coding_system_internal, not Fmake_coding_system; specify safe-charsets properties when we're a mule build. * extents.h (mouse_highlight_priority, Fset_extent_priority, Fset_extent_face, Fmap_extents): Make these available to other C files. lisp/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. * coding.el: Consolidate code that depends on the presence or absence of Mule at the end of this file. (default-query-coding-region, query-coding-region): Move these functions to C. (default-query-coding-region-safe-charset-skip-chars-map): Remove this variable, the corresponding C variable is Vdefault_query_coding_region_chartab_cache in file-coding.c. (query-coding-string): Update docstring to reflect actual multiple values, be more careful about not modifying a range table that we're currently mapping over. (encode-coding-char): Make the implementation of this simpler. (featurep 'mule): Autoload #'make-coding-system from mule/make-coding-system.el if we're a mule build; provide an appropriate compiler macro. Do various non-mule compatibility things if we're not a mule build. * update-elc.el (additional-dump-dependencies): Add mule/make-coding-system as a dump time dependency if we're a mule build. * unicode.el (ccl-encode-to-ucs-2): (decode-char): (encode-char): Move these earlier in the file, for the sake of some byte compile warnings. (unicode-query-coding-region): Move this to unicode.c * mule/make-coding-system.el: New file, not dumped. Contains the functionality to rework the arguments necessary for fixed-width coding systems, and contains the implementation of #'make-coding-system, which now calls #'make-coding-system-internal. * mule/vietnamese.el (viscii): * mule/latin.el (iso-8859-2): (windows-1250): (iso-8859-3): (iso-8859-4): (iso-8859-14): (iso-8859-15): (iso-8859-16): (iso-8859-9): (macintosh): (windows-1252): * mule/hebrew.el (iso-8859-8): * mule/greek.el (iso-8859-7): (windows-1253): * mule/cyrillic.el (iso-8859-5): (koi8-r): (koi8-u): (windows-1251): (alternativnyj): (koi8-ru): (koi8-t): (koi8-c): (koi8-o): * mule/arabic.el (iso-8859-6): (windows-1256): Move all these coding systems to being of type fixed-width, not of type CCL. This allows the distinct query-coding-region for them to be in C, something which will eventually allow us to implement query-coding-region for the mswindows-multibyte coding systems. * mule/general-late.el (posix-charset-to-coding-system-hash): Document why we're pre-emptively persuading the byte compiler that the ELC for this file needs to be written using escape-quoted. Call #'set-unicode-query-skip-chars-args, now the Unicode query-coding-region implementation is in C. * mule/thai-xtis.el (tis-620): Don't bother checking whether we're XEmacs or not here. * mule/mule-coding.el: Move the eight bit fixed-width functionality from this file to make-coding-system.el. tests/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> * automated/mule-tests.el: Check a coding system's type, not an 8-bit-fixed property, for whether that coding system should be treated as a fixed-width coding system. * automated/query-coding-tests.el: Don't test the query coding functionality for mswindows-multibyte coding systems, it's not yet implemented.
author: Aidan Kehoe <kehoea@parhasard.net>
date: Sat, 19 Sep 2009 22:53:13 +0100
parents: c786c3fd0740
children: 308d34e9f07d
--- a/lisp/mule/mule-coding.el	Sat Sep 19 17:56:23 2009 +0200
+++ b/lisp/mule/mule-coding.el	Sat Sep 19 22:53:13 2009 +0100
@@ -28,9 +28,6 @@
 
 ;;; split off of mule.el and mostly moved to coding.el
 
-;; Needed for make-8-bit-coding-system. 
-(eval-when-compile (require 'ccl))
-
 ;;; Code:
 
 (defun coding-system-force-on-output (coding-system register)
@@ -222,730 +219,16 @@
     (put symbol 'translation-hash-table-id id)
     id))
 
-(defvar make-8-bit-private-use-start (decode-char 'ucs #xE000)
-  "Start of a 256 code private use area for make-8-bit-coding-system.
-
-This is used to ensure that distinct octets on disk for a given coding
-system map to distinct XEmacs characters, preventing a spurious changes when
-a file is read, not changed, and then written.  ")
-
-(defun make-8-bit-generate-helper (decode-table encode-table
-				   encode-failure-octet)
-  "Helper function, `make-8-bit-generate-encode-program-and-skip-chars-strings',
-which see.
-
-Deals with the case where ASCII and another character set can both be
-encoded unambiguously and completely into the coding-system; if this is so,
-returns a list comprised of such a ccl-program and the character set in
-question.  If not, it returns a list with both entries nil."
-  (let ((tentative-encode-program-parts
-	 (eval-when-compile 
-	   (let* ((vec-len 128)
-		  (compiled 
-		   (append
-                    (ccl-compile
-                     `(1
-                       (loop
-                         (read-multibyte-character r0 r1)
-                         (if (r0 == ,(charset-id 'ascii))
-                             (write r1)
-                           ((if (r0 == #xABAB)
-                                ;; #xBFFE is a sentinel in the compiled
-                                ;; program.
-				((r0 = r1 & #x7F)
-				 (write r0 ,(make-vector vec-len #xBFFE)))
-                              ((mule-to-unicode r0 r1)
-                               (if (r0 == #xFFFD)
-                                   (write #xBEEF)
-                                 ((lookup-integer encode-table-sym r0 r3)
-                                  (if r7
-                                      (write-multibyte-character r0 r3)
-                                    (write #xBEEF))))))))
-                         (repeat)))) nil))
-		  (first-part compiled)
-		  (last-part
-		   (member-if-not (lambda (entr) (eq #xBFFE entr))
-				  (member-if
-                                   (lambda (entr) (eq #xBFFE entr))
-                                   first-part))))
-	     (while compiled
-	       (when (eq #xBFFE (cadr compiled))
-		 (assert (= vec-len (search '(#xBFFE) (cdr compiled)
-					    :test #'/=)) nil
-					    "Strange ccl vector length")
-		 (setcdr compiled nil))
-	       (setq compiled (cdr compiled)))
-             ;; Is the generated code as we expect it to be?
-	     (assert (and (memq #xABAB first-part)
-			  (memq #xBEEF14 last-part))
-	    nil
-	    "This code assumes that the constant #xBEEF is #xBEEF14 in \
-compiled CCL code,\nand that the constant #xABAB is #xABAB. If that is
-not the case, and it appears not to be--that's why you're getting this
-message--it will not work.  ")
-	     (list first-part last-part vec-len))))
-	(charset-lower -1)
-	(charset-upper -1)
-	worth-trying known-charsets encode-program
-	other-charset-vector ucs)
-
-    (loop for char across decode-table
-      do (pushnew (char-charset char) known-charsets))
-    (setq known-charsets (delq 'ascii known-charsets))
-
-    (loop for known-charset in known-charsets 
-      do
-      ;; This is not possible for two dimensional charsets. 
-      (when (eq 1 (charset-dimension known-charset))
-        (if (eq 'control-1 known-charset)
-            (setq charset-lower 0
-                  charset-upper 31)
-	  ;; There should be a nicer way to get the limits here.
-          (condition-case args-out-of-range
-              (make-char known-charset #x100)
-            (args-out-of-range 
-             (setq charset-lower (third args-out-of-range)
-                   charset-upper (fourth args-out-of-range)))))
-	(loop
-	  for i from charset-lower to charset-upper
-	  always (and (setq ucs 
-			    (encode-char (make-char known-charset i) 'ucs))
-		      (gethash ucs encode-table))
-	  finally (setq worth-trying known-charset))
-
-	;; Only trying this for one charset at a time, the first find.
-	(when worth-trying (return))
-
-	;; Okay, this charset is not worth trying, Try the next.
-	(setq charset-lower -1
-	      charset-upper -1
-	      worth-trying nil)))
-
-    (when worth-trying
-      (setq other-charset-vector
-	    (make-vector (third tentative-encode-program-parts)
-			 encode-failure-octet))
-      (loop for i from charset-lower to charset-upper
-        do (aset other-charset-vector i
-		 (gethash (encode-char (make-char worth-trying i)
-				       'ucs) encode-table)))
-      (setq encode-program
-            (nsublis
-             (list (cons #xABAB (charset-id worth-trying)))
-             (nconc
-              (copy-list (first 
-                          tentative-encode-program-parts))
-              (append other-charset-vector nil)
-              (copy-tree (second 
-                          tentative-encode-program-parts))))))
-    (values encode-program worth-trying)))
-
-(defun make-8-bit-generate-encode-program-and-skip-chars-strings
-  (decode-table encode-table encode-failure-octet)
-  "Generate a CCL program to encode a 8-bit fixed-width charset.
-
-DECODE-TABLE must have 256 non-cons entries, and will be regarded as
-describing a map from the octet corresponding to an offset in the
-table to the that entry in the table.  ENCODE-TABLE is a hash table
-map from unicode values to characters in the range [0,255].
-ENCODE-FAILURE-OCTET describes an integer between 0 and 255
-\(inclusive) to write in the event that a character cannot be encoded.  "
-  (check-argument-type #'vectorp decode-table)
-  (check-argument-range (length decode-table) #x100 #x100)
-  (check-argument-type #'hash-table-p encode-table)
-  (check-argument-type #'integerp encode-failure-octet)
-  (check-argument-range encode-failure-octet #x00 #xFF)
-  (let ((encode-program nil)
-	(general-encode-program
-	 (eval-when-compile
-	   (let ((prog (append 
-			(ccl-compile
-			 `(1
-			   (loop
-			     (read-multibyte-character r0 r1)
-			     (mule-to-unicode r0 r1)
-			     (if (r0 == #xFFFD)
-				 (write #xBEEF)
-			       ((lookup-integer encode-table-sym r0 r3)
-				(if r7
-				    (write-multibyte-character r0 r3)
-				  (write #xBEEF))))
-			     (repeat)))) nil)))
-	     (assert (memq #xBEEF14 prog)
-		     nil
-		     "This code assumes that the constant #xBEEF is #xBEEF14 \
-in compiled CCL code.\nIf that is not the case, and it appears not to
-be--that's why you're getting this message--it will not work.  ")
-	     prog)))
-	(encode-program-with-ascii-optimisation
-	 (eval-when-compile 
-	   (let ((prog (append
-			(ccl-compile
-			 `(1
-			   (loop
-			     (read-multibyte-character r0 r1)
-			     (if (r0 == ,(charset-id 'ascii))
-				 (write r1)
-			       ((mule-to-unicode r0 r1)
-				(if (r0 == #xFFFD)
-				    (write #xBEEF)
-				  ((lookup-integer encode-table-sym r0 r3)
-				   (if r7
-				       (write-multibyte-character r0 r3)
-				     (write #xBEEF))))))
-			     (repeat)))) nil)))
-	     (assert (memq #xBEEF14 prog)
-		     nil
-		     "This code assumes that the constant #xBEEF is #xBEEF14 \
-in compiled CCL code.\nIf that is not the case, and it appears not to
-be--that's why you're getting this message--it will not work.  ")
-	     prog)))
-         (ascii-encodes-as-itself nil)
-         (control-1-encodes-as-itself t)
-         (invalid-sequence-code-point-start
-          (eval-when-compile
-            (char-to-unicode
-             (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3))))
-         further-char-set skip-chars invalid-sequences-skip-chars)
-
-    ;; Is this coding system ASCII-compatible? If so, we can avoid the hash
-    ;; table lookup for those characters. 
-    (loop
-      for i from #x00 to #x7f
-      always (eq (int-to-char i) (gethash i encode-table))
-      finally (setq ascii-encodes-as-itself t))
-
-    ;; Note that this logic handles EBCDIC badly. For example, CP037,
-    ;; MIME name ebcdic-na, has the entire repertoire of ASCII and
-    ;; Latin 1, and thus a more optimal ccl encode program would check
-    ;; for those character sets and use tables. But for now, we do a
-    ;; hash table lookup for every character.
-    (if (null ascii-encodes-as-itself)
-	;; General encode program. Pros; general and correct. Cons;
-	;; slow, a hash table lookup + mule-unicode conversion is done
-	;; for every character encoding. 
-	(setq encode-program general-encode-program)
-      (multiple-value-setq
-          (encode-program further-char-set)
-        ;; Encode program with ascii-ascii mapping (based on a
-        ;; character's mule character set), and one other mule
-        ;; character set using table-based encoding, other
-        ;; character sets using hash table lookups.
-        ;; make-8-bit-non-ascii-completely-coveredp only returns
-        ;; such a mapping if some non-ASCII charset with
-        ;; characters in decode-table is entirely covered by
-        ;; encode-table.
-        (make-8-bit-generate-helper decode-table encode-table
-                                    encode-failure-octet))
-      (unless encode-program
-	;; If make-8-bit-non-ascii-completely-coveredp returned nil,
-	;; but ASCII still encodes as itself, do one-to-one mapping
-	;; for ASCII, and a hash table lookup for everything else.
-	(setq encode-program encode-program-with-ascii-optimisation)))
-
-    (setq encode-program
-          (nsublis
-           (list (cons #xBEEF14 
-                       (logior (lsh encode-failure-octet 8)
-                               #x14)))
-           (copy-tree encode-program)))
-    (loop
-      for i from #x80 to #x9f
-      do (unless (= i (aref decode-table i))
-           (setq control-1-encodes-as-itself nil)
-           (return)))
-    (loop
-      for i from #x00 to #xFF
-      initially (setq skip-chars
-                      (cond
-                       ((and ascii-encodes-as-itself
-                             control-1-encodes-as-itself further-char-set)
-                        (concat "\x00-\x9f" (charset-skip-chars-string
-                                             further-char-set)))
-                       ((and ascii-encodes-as-itself
-                             control-1-encodes-as-itself)
-                        "\x00-\x9f")
-                       ((null ascii-encodes-as-itself)
-                        (skip-chars-quote (apply #'string
-                                                 (append decode-table nil))))
-                       (further-char-set
-                        (concat (charset-skip-chars-string 'ascii)
-                                (charset-skip-chars-string further-char-set)))
-                       (t 
-                        (charset-skip-chars-string 'ascii)))
-                      invalid-sequences-skip-chars "")
-      with decoded-ucs = nil
-      with decoded = nil
-      with no-ascii-transparency-skip-chars-list = 
-           (unless ascii-encodes-as-itself (append decode-table nil))
-      ;; Can't use #'match-string here, see:
-      ;; http://mid.gmane.org/18829.34118.709782.704574@parhasard.net
-      with skip-chars-test = 
-           #'(lambda (skip-chars-string testing)
-               (with-temp-buffer
-                 (insert testing)
-                 (goto-char (point-min))
-                 (skip-chars-forward skip-chars-string)
-                 (= (point) (point-max))))
-      do
-      (setq decoded (aref decode-table i)
-            decoded-ucs (char-to-unicode decoded))
-      (cond
-       ((<= invalid-sequence-code-point-start decoded-ucs
-            (+ invalid-sequence-code-point-start #xFF))
-        (setq invalid-sequences-skip-chars
-              (concat (string decoded)
-                      invalid-sequences-skip-chars))
-        (assert (not (funcall skip-chars-test skip-chars decoded))
-                "This char should only be skipped with \
-`invalid-sequences-skip-chars', not by `skip-chars'"))
-       ((not (funcall skip-chars-test skip-chars decoded))
-        (if ascii-encodes-as-itself
-            (setq skip-chars (concat skip-chars (string decoded)))
-          (push decoded no-ascii-transparency-skip-chars-list))))
-      finally (unless ascii-encodes-as-itself
-                (setq skip-chars
-                      (skip-chars-quote
-                       (apply #'string
-                              no-ascii-transparency-skip-chars-list)))))
-    (values encode-program skip-chars invalid-sequences-skip-chars)))
-
-(defun make-8-bit-create-decode-encode-tables (unicode-map)
-  "Return a list \(DECODE-TABLE ENCODE-TABLE) given UNICODE-MAP. 
-UNICODE-MAP should be an alist mapping from integer octet values to
-characters with UCS code points; DECODE-TABLE will be a 256-element
-vector, and ENCODE-TABLE will be a hash table mapping from 256 numbers
-to 256 distinct characters.  "
-  (check-argument-type #'listp unicode-map)
-  (let ((decode-table (make-vector 256 nil))
-        (encode-table (make-hash-table :size 256))
-	(private-use-start (encode-char make-8-bit-private-use-start 'ucs))
-        (invalid-sequence-code-point-start
-         (eval-when-compile
-           (char-to-unicode
-            (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3))))
-	desired-ucs decode-table-entry)
-
-    (loop for (external internal)
-      in unicode-map
-      do
-      (aset decode-table external internal)
-      (assert (not (eq (encode-char internal 'ucs) -1))
-	      nil
-	      "Looks like you're calling make-8-bit-coding-system in a \
-dumped file, \nand you're either not providing a literal UNICODE-MAP
-or PROPS. Don't do that; make-8-bit-coding-system relies on sensible
-Unicode mappings being available, which they are at compile time for
-dumped files (but this requires the mentioned literals), but not, for
-most of them, at run time.  ")
-
-      (puthash (encode-char internal 'ucs)
-	       ;; This is semantically an integer, but Dave Love's design
-	       ;; for lookup-integer in CCL means we need to store it as a
-	       ;; character.
-	       (int-to-char external)
-	       encode-table))
-
-    ;; Now, go through the decode table. For octet values above #x7f, if the
-    ;; decode table entry is nil, this means that they have an undefined
-    ;; mapping (= they map to XEmacs characters with keys in
-    ;; unicode-error-default-translation-table); for octet values below or
-    ;; equal to #x7f, it means that they map to ASCII.
-
-    ;; If any entry (whether below or above #x7f) in the decode-table
-    ;; already maps to some character with a key in
-    ;; unicode-error-default-translation-table, it is treated as an
-    ;; undefined octet by `query-coding-region'. That is, it is not
-    ;; necessary for an octet value to be above #x7f for this to happen.
-
-    (dotimes (i 256)
-      (setq decode-table-entry (aref decode-table i))
-      (if decode-table-entry
-          (when (get-char-table
-                 decode-table-entry
-                 unicode-error-default-translation-table)
-            ;; The caller is explicitly specifying that this octet
-            ;; corresponds to an invalid sequence on disk:
-            (assert (= (get-char-table
-                        decode-table-entry
-                        unicode-error-default-translation-table) i)
-                    "Bad argument to `make-8-bit-coding-system'.
-If you're going to designate an octet with value below #x80 as invalid
-for this coding system, make sure to map it to the invalid sequence
-character corresponding to its octet value on disk. "))
-
-        ;; decode-table-entry is nil; either the octet is to be treated as
-        ;; contributing to an error sequence (when (> #x7f i)), or it should
-        ;; be attempted to treat it as ASCII-equivalent. 
-        (setq desired-ucs (or (and (< i #x80) i)
-                              (+ invalid-sequence-code-point-start i)))
-        (while (gethash desired-ucs encode-table)
-          (assert (not (< i #x80))
-                  "UCS code point should not already be in encode-table!"
-                  ;; There is one invalid sequence char per octet value;
-                  ;; with eight-bit-fixed coding systems, it makes no sense
-                  ;; for us to be multiply allocating them.
-                  (gethash desired-ucs encode-table))
-          (setq desired-ucs (+ private-use-start desired-ucs)
-                private-use-start (+ private-use-start 1)))
-        (puthash desired-ucs (int-to-char i) encode-table)
-        (setq desired-ucs (if (> desired-ucs #xFF)
-                              (unicode-to-char desired-ucs)
-                            ;; So we get Latin-1 when run at dump time,
-                            ;; instead of JIT-allocated characters.
-                            (int-to-char desired-ucs)))
-        (aset decode-table i desired-ucs)))
-    (values decode-table encode-table)))
-
-(defun make-8-bit-generate-decode-program (decode-table)
-  "Given DECODE-TABLE, generate a CCL program to decode an 8-bit charset.
-DECODE-TABLE must have 256 non-cons entries, and will be regarded as
-describing a map from the octet corresponding to an offset in the
-table to the that entry in the table.  "
-  (check-argument-type #'vectorp decode-table)
-  (check-argument-range (length decode-table) #x100 #x100)
-  (let ((decode-program-parts
-	 (eval-when-compile
-	   (let* ((compiled
-		   (append 
-		    (ccl-compile 
-		     `(3
-		       ((read r0)
-                        (loop
-			 (write-read-repeat r0 ,(make-vector
-						 256 'sentinel)))))) nil))
-		  (first-part compiled)
-		  (last-part
-		   (member-if-not #'symbolp
-				  (member-if-not #'integerp first-part))))
-	     ;; Chop off the sentinel sentinel sentinel [..] part. 
-	     (while compiled
-	       (if (symbolp (cadr compiled))
-		   (setcdr compiled nil))
-	       (setq compiled (cdr compiled)))
-	     (list first-part last-part)))))
-   (nconc
-    ;; copy-list needed, because the structure of the literal provided
-    ;; by our eval-when-compile hangs around.
-    (copy-list (first decode-program-parts))
-    (append decode-table nil)
-    (second decode-program-parts))))
-
-(defun make-8-bit-choose-category (decode-table)
-  "Given DECODE-TABLE, return an appropriate coding category.
-DECODE-TABLE is a 256-entry vector describing the mapping from octets on
-disk to XEmacs characters for some fixed-width 8-bit coding system.  "
-  (check-argument-type #'vectorp decode-table)
-  (check-argument-range (length decode-table) #x100 #x100)
-  (loop
-    named category
-    for i from #x80 to #x9F
-    do (unless (= i (aref decode-table i))
-	 (return-from category 'no-conversion))
-    finally return 'iso-8-1))
-
-(defun 8-bit-fixed-query-coding-region (begin end coding-system &optional
-                                        buffer ignore-invalid-sequencesp
-                                        errorp highlightp)
-  "The `query-coding-region' implementation for 8-bit-fixed coding systems.
-
-Uses the `8-bit-fixed-query-from-unicode' and `8-bit-fixed-query-skip-chars'
-coding system properties.  The former is a hash table mapping from valid
-Unicode code points to on-disk octets in the coding system; the latter a set
-of characters as used by `skip-chars-forward'.  Both of these properties are
-generated automatically by `make-8-bit-coding-system'.
-
-See that the documentation of `query-coding-region'; see also
-`make-8-bit-coding-system'. "
-  (check-argument-type #'coding-system-p
-                       (setq coding-system (find-coding-system coding-system)))
-  (check-argument-type #'integer-or-marker-p begin)
-  (check-argument-type #'integer-or-marker-p end)
-  (let ((from-unicode
-         (or (coding-system-get coding-system '8-bit-fixed-query-from-unicode)
-	     (coding-system-get (coding-system-base coding-system)
-				'8-bit-fixed-query-from-unicode)))
-        (skip-chars-arg
-         (or (coding-system-get coding-system '8-bit-fixed-query-skip-chars)
-	     (coding-system-get (coding-system-base coding-system)
-				'8-bit-fixed-query-skip-chars)))
-	(invalid-sequences-skip-chars
-	 (or (coding-system-get coding-system
-				'8-bit-fixed-invalid-sequences-skip-chars)
-	     (coding-system-get (coding-system-base coding-system)
-				'8-bit-fixed-invalid-sequences-skip-chars)))
-	(ranges (make-range-table))
-        (case-fold-search nil)
-        char-after fail-range-start fail-range-end extent
-	failed invalid-sequences-looking-at failed-reason
-        previous-failed-reason)
-    (check-type from-unicode hash-table)
-    (check-type skip-chars-arg string)
-    (check-type invalid-sequences-skip-chars string)
-    (setq invalid-sequences-looking-at
-	  (if (equal "" invalid-sequences-skip-chars)
-	      ;; Regexp that will never match.
-	      #r".\{0,0\}" 
-	      (concat "[" invalid-sequences-skip-chars "]")))
-    (when ignore-invalid-sequencesp
-      (setq skip-chars-arg
-	    (concat skip-chars-arg invalid-sequences-skip-chars)))
-    (save-excursion
-      (when highlightp
-        (query-coding-clear-highlights begin end buffer))
-      (goto-char begin buffer)
-      (skip-chars-forward skip-chars-arg end buffer)
-      (while (< (point buffer) end)
-	(setq char-after (char-after (point buffer) buffer)
-	      fail-range-start (point buffer))
-	(while (and
-		(< (point buffer) end)
-		(or (and 
-                     (not (gethash (encode-char char-after 'ucs) from-unicode))
-                     (setq failed-reason 'unencodable))
-                    (and (not ignore-invalid-sequencesp)
-                         (looking-at invalid-sequences-looking-at buffer)
-                         (setq failed-reason 'invalid-sequence)))
-                (or (null previous-failed-reason)
-                    (eq previous-failed-reason failed-reason)))
-	  (forward-char 1 buffer)
-	  (setq char-after (char-after (point buffer) buffer)
-		failed t
-                previous-failed-reason failed-reason))
-	(if (= fail-range-start (point buffer))
-	    ;; The character can actually be encoded by the coding
-	    ;; system; check the characters past it.
-	    (forward-char 1 buffer)
-	  ;; The character actually failed. 
-	  (when errorp 
-	    (error 'text-conversion-error
-		   (format "Cannot encode %s using coding system"
-			   (buffer-substring fail-range-start (point buffer)
-					     buffer))
-		   (coding-system-name coding-system)))
-          (assert (not (null previous-failed-reason)) t
-                  "previous-failed-reason should always be non-nil here")
-	  (put-range-table fail-range-start
-			   ;; If char-after is non-nil, we're not at
-			   ;; the end of the buffer.
-			   (setq fail-range-end (if char-after
-						    (point buffer)
-						  (point-max buffer)))
-			   previous-failed-reason ranges)
-          (setq previous-failed-reason nil)
-	  (when highlightp
-	    (setq extent (make-extent fail-range-start fail-range-end buffer))
-	    (set-extent-priority extent (+ mouse-highlight-priority 2))
-	    (set-extent-face extent 'query-coding-warning-face))
-	  (skip-chars-forward skip-chars-arg end buffer)))
-      (if failed 
-	  (values nil ranges)
-	(values t nil)))))
-
-(defun make-8-bit-coding-system (name unicode-map &optional description props)
-  "Make and return a fixed-width 8-bit CCL coding system named NAME.
-NAME must be a symbol, and UNICODE-MAP a list. 
-
-UNICODE-MAP is a plist describing a map from octets in the coding
-system NAME (as integers) to XEmacs characters.  Those XEmacs
-characters will be used explicitly on decoding, but for encoding (most
-relevantly, on writing to disk) XEmacs characters that map to the same
-Unicode code point will be unified.  This means that the ISO-8859-? 
-characters that map to the same Unicode code point will not be
-distinct when written to disk, which is normally what is intended; it
-also means that East Asian Han characters from different XEmacs
-character sets will not be distinct when written to disk, which is
-less often what is intended.
-
-Any octets not mapped, and with values above #x7f, will be decoded into
-XEmacs characters that reflect that their values are undefined.  These
-characters will be displayed in a language-environment-specific way. See
-`unicode-error-default-translation-table' and the
-`invalid-sequence-coding-system' argument to `set-language-info'.
-
-These characters will normally be treated as invalid when checking whether
-text can be encoded with `query-coding-region'--see the
-IGNORE-INVALID-SEQUENCESP argument to that function to avoid this.  It is
-possible to specify that octets with values less than #x80 (or indeed
-greater than it) be treated in this way, by specifying explicitly that they
-correspond to the character mapping to that octet in
-`unicode-error-default-translation-table'.  Far fewer coding systems
-override the ASCII mapping, though, so this is not the default.
-
-DESCRIPTION and PROPS are as in `make-coding-system', which see.  This
-function also accepts two additional (optional) properties in PROPS;
-`aliases', giving a list of aliases to be initialized for this
-coding-system, and `encode-failure-octet', an integer between 0 and 256 to
-write in place of XEmacs characters that cannot be encoded, defaulting to
-the code for tilde `~'.  "
-  (check-argument-type #'symbolp name)
-  (check-argument-type #'listp unicode-map)
-  (check-argument-type #'stringp
-		       (or description 
-			   (setq description
-				 (format "Coding system used for %s." name))))
-  (check-valid-plist props)
-  (let  ((encode-failure-octet (or (plist-get props 'encode-failure-octet)
-				   (char-to-int ?~)))
-	 (aliases (plist-get props 'aliases))
-	 (hash-table-sym (gentemp (format "%s-encode-table" name)))
-	 encode-program decode-program result decode-table encode-table
-         skip-chars invalid-sequences-skip-chars)
-
-    ;; Some more sanity checking. 
-    (check-argument-range encode-failure-octet 0 #xFF)
-    (check-argument-type #'listp aliases)
-
-    ;; Don't pass on our extra data to make-coding-system.
-    (setq props (plist-remprop props 'encode-failure-octet)
-	  props (plist-remprop props 'aliases))
-
-    (multiple-value-setq
-	(decode-table encode-table)
-      (make-8-bit-create-decode-encode-tables unicode-map))
-
-    ;; Register the decode-table. 
-    (define-translation-hash-table hash-table-sym encode-table)
-
-    ;; Generate the programs and skip-chars strings. 
-    (setq decode-program (make-8-bit-generate-decode-program decode-table))
-    (multiple-value-setq
-        (encode-program skip-chars invalid-sequences-skip-chars)
-      (make-8-bit-generate-encode-program-and-skip-chars-strings
-       decode-table encode-table encode-failure-octet))
-
-    (unless (vectorp encode-program) 
-      (setq encode-program 
-	    (apply #'vector
-		   (nsublis (list (cons 'encode-table-sym hash-table-sym))
-			    (copy-tree encode-program)))))
-    (unless (vectorp decode-program)
-      (setq decode-program
-	    (apply #'vector decode-program)))
-
-    ;; And now generate the actual coding system.
-    (setq result 
-	  (make-coding-system 
-           name  'ccl
-           description 
-           (plist-put (plist-put props 'decode decode-program)
-                      'encode encode-program)))
-    (coding-system-put name '8-bit-fixed t)
-    (coding-system-put name 'category 
-                       (make-8-bit-choose-category decode-table))
-    (coding-system-put name '8-bit-fixed-query-skip-chars
-                       skip-chars)
-    (coding-system-put name '8-bit-fixed-invalid-sequences-skip-chars
-                       invalid-sequences-skip-chars) 
-    (coding-system-put name '8-bit-fixed-query-from-unicode encode-table)
-    (coding-system-put name 'query-coding-function
-                       #'8-bit-fixed-query-coding-region)
-    (coding-system-put (intern (format "%s-unix" name))
-		       'query-coding-function
-                       #'8-bit-fixed-query-coding-region)
-    (coding-system-put (intern (format "%s-dos" name))
-		       'query-coding-function
-                       #'8-bit-fixed-query-coding-region)
-    (coding-system-put (intern (format "%s-mac" name))
-		       'query-coding-function
-                       #'8-bit-fixed-query-coding-region)
-    (loop for alias in aliases
-      do (define-coding-system-alias alias name))
-    result))
-
-(define-compiler-macro make-8-bit-coding-system (&whole form name unicode-map
-						 &optional description props)
-  ;; We provide the compiler macro (= macro that is expanded only on
-  ;; compilation, and that can punt to a runtime version of the
-  ;; associate function if necessary) not for reasons of speed, though
-  ;; it does speed up things at runtime a little, but because the
-  ;; Unicode mappings are available at compile time in the dumped
-  ;; files, but they are not available at run time for the vast
-  ;; majority of them.
-
-  (if (not (and (and (consp name) (eq (car name) 'quote))
-		(and (consp unicode-map) (eq (car unicode-map) 'quote))
-		(and (or (and (consp props) (eq (car props) 'quote))
-			 (null props)))))
-      ;; The call does not use literals; do it at runtime.
-      form
-    (setq name (cadr name)
-	  unicode-map (cadr unicode-map)
-	  props (if props (cadr props)))
-    (let  ((encode-failure-octet
-	    (or (plist-get props 'encode-failure-octet) (char-to-int ?~)))
-	   (aliases (plist-get props 'aliases))
-	   encode-program decode-program
-	   decode-table encode-table
-           skip-chars invalid-sequences-skip-chars)
-
-      ;; Some sanity checking. 
-      (check-argument-range encode-failure-octet 0 #xFF)
-      (check-argument-type #'listp aliases)
-
-      ;; Don't pass on our extra data to make-coding-system.
-      (setq props (plist-remprop props 'encode-failure-octet)
-	    props (plist-remprop props 'aliases))
-
-      ;; Work out encode-table and decode-table
-      (multiple-value-setq
-          (decode-table encode-table)
-        (make-8-bit-create-decode-encode-tables unicode-map))
-
-      ;; Generate the decode and encode programs, and the skip-chars
-      ;; arguments.
-      (setq decode-program (make-8-bit-generate-decode-program decode-table))
-      (multiple-value-setq
-          (encode-program skip-chars invalid-sequences-skip-chars)
-        (make-8-bit-generate-encode-program-and-skip-chars-strings
-         decode-table encode-table encode-failure-octet))
-
-      ;; And return the generated code. 
-      `(let ((encode-table-sym (gentemp (format "%s-encode-table" ',name)))
-             (encode-table ,encode-table))
-        (define-translation-hash-table encode-table-sym encode-table)
-        (make-coding-system 
-         ',name 'ccl ,description
-         (plist-put (plist-put ',props 'decode 
-                               ,(apply #'vector decode-program))
-                    'encode
-                    (apply #'vector
-                           (nsublis
-                            (list (cons
-                                   'encode-table-sym
-                                   (symbol-value 'encode-table-sym)))
-                            ',encode-program))))
-	(coding-system-put ',name '8-bit-fixed t)
-        (coding-system-put ',name 'category 
-                           ',(make-8-bit-choose-category decode-table))
-        (coding-system-put ',name '8-bit-fixed-query-skip-chars
-                           ,skip-chars)
-        (coding-system-put ',name '8-bit-fixed-invalid-sequences-skip-chars
-                           ,invalid-sequences-skip-chars) 
-        (coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table)
-        (coding-system-put ',name 'query-coding-function
-                           #'8-bit-fixed-query-coding-region)
-	(coding-system-put ',(intern (format "%s-unix" name))
-			   'query-coding-function
-			   #'8-bit-fixed-query-coding-region)
-	(coding-system-put ',(intern (format "%s-dos" name))
-			   'query-coding-function
-			   #'8-bit-fixed-query-coding-region)
-	(coding-system-put ',(intern (format "%s-mac" name))
-			   'query-coding-function
-			   #'8-bit-fixed-query-coding-region)
-        ,(macroexpand `(loop for alias in ',aliases
-                        do (define-coding-system-alias alias
-                             ',name)))
-        (find-coding-system ',name)))))
-
 ;; Ideally this would be in latin.el, but code-init.el uses it.
-(make-8-bit-coding-system
+(make-coding-system
  'iso-8859-1 
- (loop
-   for i from #x80 to #xff
-   collect (list i (int-char i))) ;; Identical to Latin-1.
+ 'fixed-width
  "ISO-8859-1 (Latin-1)"
- '(mnemonic "Latin 1"
-   documentation "The most used encoding of Western Europe and the Americas."
-   aliases (iso-latin-1 latin-1)))
+ (eval-when-compile
+   `(unicode-map 
+     ,(loop
+        for i from #x80 to #xff
+        collect (list i (int-char i))) ;; Identical to Latin-1.
+     mnemonic "Latin 1"
+     documentation "The most used encoding of Western Europe and the Americas."
+     aliases (iso-latin-1 latin-1))))
author	Aidan Kehoe <kehoea@parhasard.net>
date	Sat, 19 Sep 2009 22:53:13 +0100
parents	c786c3fd0740
children	308d34e9f07d