xemacs-beta: lisp/mule/mule-coding.el comparison

comparison lisp/mule/mule-coding.el @ 4690:257b468bf2ca

Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. src/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. * mule-coding.c (struct fixed_width_coding_system): Add a new coding system type, fixed_width, and implement it. It uses the CCL infrastructure but has a much simpler creation API, and its own query_method, formerly in lisp/mule/mule-coding.el. * unicode.c: Move the Unicode query method implementation here from unicode.el. * lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table here. * intl-win32.c (complex_vars_of_intl_win32): Use Fmake_coding_system_internal, not Fmake_coding_system. * general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence here. * file-coding.h (enum coding_system_variant): Add fixed_width_coding_system here. (struct coding_system_methods): Add query_method and query_lstream_method to the coding system methods. Provide flags for the query methods. Declare the default query method; initialise it correctly in INITIALIZE_CODING_SYSTEM_TYPE. * file-coding.c (default_query_method): New function, the default query method for coding systems that do not set it. Moved from coding.el. (make_coding_system_1): Accept new elements in PROPS in #'make-coding-system; aliases, a list of aliases; safe-chars and safe-charsets (these were previously accepted but not saved); and category. (Fmake_coding_system_internal): New function, what used to be #'make-coding-system--on Mule builds, we've now moved some of the functionality of this to Lisp. (Fcoding_system_canonical_name_p): Move this earlier in the file, since it's now called from within make_coding_system_1. (Fquery_coding_region): Move the implementation of this here, from coding.el. (complex_vars_of_file_coding): Call Fmake_coding_system_internal, not Fmake_coding_system; specify safe-charsets properties when we're a mule build. * extents.h (mouse_highlight_priority, Fset_extent_priority, Fset_extent_face, Fmap_extents): Make these available to other C files. lisp/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. * coding.el: Consolidate code that depends on the presence or absence of Mule at the end of this file. (default-query-coding-region, query-coding-region): Move these functions to C. (default-query-coding-region-safe-charset-skip-chars-map): Remove this variable, the corresponding C variable is Vdefault_query_coding_region_chartab_cache in file-coding.c. (query-coding-string): Update docstring to reflect actual multiple values, be more careful about not modifying a range table that we're currently mapping over. (encode-coding-char): Make the implementation of this simpler. (featurep 'mule): Autoload #'make-coding-system from mule/make-coding-system.el if we're a mule build; provide an appropriate compiler macro. Do various non-mule compatibility things if we're not a mule build. * update-elc.el (additional-dump-dependencies): Add mule/make-coding-system as a dump time dependency if we're a mule build. * unicode.el (ccl-encode-to-ucs-2): (decode-char): (encode-char): Move these earlier in the file, for the sake of some byte compile warnings. (unicode-query-coding-region): Move this to unicode.c * mule/make-coding-system.el: New file, not dumped. Contains the functionality to rework the arguments necessary for fixed-width coding systems, and contains the implementation of #'make-coding-system, which now calls #'make-coding-system-internal. * mule/vietnamese.el (viscii): * mule/latin.el (iso-8859-2): (windows-1250): (iso-8859-3): (iso-8859-4): (iso-8859-14): (iso-8859-15): (iso-8859-16): (iso-8859-9): (macintosh): (windows-1252): * mule/hebrew.el (iso-8859-8): * mule/greek.el (iso-8859-7): (windows-1253): * mule/cyrillic.el (iso-8859-5): (koi8-r): (koi8-u): (windows-1251): (alternativnyj): (koi8-ru): (koi8-t): (koi8-c): (koi8-o): * mule/arabic.el (iso-8859-6): (windows-1256): Move all these coding systems to being of type fixed-width, not of type CCL. This allows the distinct query-coding-region for them to be in C, something which will eventually allow us to implement query-coding-region for the mswindows-multibyte coding systems. * mule/general-late.el (posix-charset-to-coding-system-hash): Document why we're pre-emptively persuading the byte compiler that the ELC for this file needs to be written using escape-quoted. Call #'set-unicode-query-skip-chars-args, now the Unicode query-coding-region implementation is in C. * mule/thai-xtis.el (tis-620): Don't bother checking whether we're XEmacs or not here. * mule/mule-coding.el: Move the eight bit fixed-width functionality from this file to make-coding-system.el. tests/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> * automated/mule-tests.el: Check a coding system's type, not an 8-bit-fixed property, for whether that coding system should be treated as a fixed-width coding system. * automated/query-coding-tests.el: Don't test the query coding functionality for mswindows-multibyte coding systems, it's not yet implemented.

author	Aidan Kehoe <kehoea@parhasard.net>
date	Sat, 19 Sep 2009 22:53:13 +0100
parents	c786c3fd0740
children	308d34e9f07d

comparison

equal deleted inserted replaced

-:0636c6ccb430
+:257b468bf2ca
 ;; Boston, MA 02111-1307, USA.
 ;;; Commentary:
 ;;; split off of mule.el and mostly moved to coding.el
-;; Needed for make-8-bit-coding-system.
-(eval-when-compile (require 'ccl))
 ;;; Code:
 (defun coding-system-force-on-output (coding-system register)
 "Return the 'force-on-output property of CODING-SYSTEM for the specified REGISTER."
 	      (setq done t))
 	  (setq id (1+ id)))))
 (put symbol 'translation-hash-table-id id)
 id))
-(defvar make-8-bit-private-use-start (decode-char 'ucs #xE000)
-"Start of a 256 code private use area for make-8-bit-coding-system.
-This is used to ensure that distinct octets on disk for a given coding
-system map to distinct XEmacs characters, preventing a spurious changes when
-a file is read, not changed, and then written.  ")
-(defun make-8-bit-generate-helper (decode-table encode-table
-				   encode-failure-octet)
-"Helper function, `make-8-bit-generate-encode-program-and-skip-chars-strings',
-which see.
-Deals with the case where ASCII and another character set can both be
-encoded unambiguously and completely into the coding-system; if this is so,
-returns a list comprised of such a ccl-program and the character set in
-question.  If not, it returns a list with both entries nil."
-(let ((tentative-encode-program-parts
-	 (eval-when-compile
-	   (let* ((vec-len 128)
-		  (compiled
-		   (append
-(ccl-compile
-`(1
-(loop
-(read-multibyte-character r0 r1)
-(if (r0 == ,(charset-id 'ascii))
-(write r1)
-((if (r0 == #xABAB)
-;; #xBFFE is a sentinel in the compiled
-;; program.
-				((r0 = r1 & #x7F)
-				 (write r0 ,(make-vector vec-len #xBFFE)))
-((mule-to-unicode r0 r1)
-(if (r0 == #xFFFD)
-(write #xBEEF)
-((lookup-integer encode-table-sym r0 r3)
-(if r7
-(write-multibyte-character r0 r3)
-(write #xBEEF))))))))
-(repeat)))) nil))
-		  (first-part compiled)
-		  (last-part
-		   (member-if-not (lambda (entr) (eq #xBFFE entr))
-				  (member-if
-(lambda (entr) (eq #xBFFE entr))
-first-part))))
-	     (while compiled
-	       (when (eq #xBFFE (cadr compiled))
-		 (assert (= vec-len (search '(#xBFFE) (cdr compiled)
-					    :test #'/=)) nil
-					    "Strange ccl vector length")
-		 (setcdr compiled nil))
-	       (setq compiled (cdr compiled)))
-;; Is the generated code as we expect it to be?
-	     (assert (and (memq #xABAB first-part)
-			  (memq #xBEEF14 last-part))
-	    nil
-	    "This code assumes that the constant #xBEEF is #xBEEF14 in \
-compiled CCL code,\nand that the constant #xABAB is #xABAB. If that is
-not the case, and it appears not to be--that's why you're getting this
-message--it will not work.  ")
-	     (list first-part last-part vec-len))))
-	(charset-lower -1)
-	(charset-upper -1)
-	worth-trying known-charsets encode-program
-	other-charset-vector ucs)
-(loop for char across decode-table
-do (pushnew (char-charset char) known-charsets))
-(setq known-charsets (delq 'ascii known-charsets))
-(loop for known-charset in known-charsets
-do
-;; This is not possible for two dimensional charsets.
-(when (eq 1 (charset-dimension known-charset))
-(if (eq 'control-1 known-charset)
-(setq charset-lower 0
-charset-upper 31)
-	  ;; There should be a nicer way to get the limits here.
-(condition-case args-out-of-range
-(make-char known-charset #x100)
-(args-out-of-range
-(setq charset-lower (third args-out-of-range)
-charset-upper (fourth args-out-of-range)))))
-	(loop
-	  for i from charset-lower to charset-upper
-	  always (and (setq ucs
-			    (encode-char (make-char known-charset i) 'ucs))
-		      (gethash ucs encode-table))
-	  finally (setq worth-trying known-charset))
-	;; Only trying this for one charset at a time, the first find.
-	(when worth-trying (return))
-	;; Okay, this charset is not worth trying, Try the next.
-	(setq charset-lower -1
-	      charset-upper -1
-	      worth-trying nil)))
-(when worth-trying
-(setq other-charset-vector
-	    (make-vector (third tentative-encode-program-parts)
-			 encode-failure-octet))
-(loop for i from charset-lower to charset-upper
-do (aset other-charset-vector i
-		 (gethash (encode-char (make-char worth-trying i)
-				       'ucs) encode-table)))
-(setq encode-program
-(nsublis
-(list (cons #xABAB (charset-id worth-trying)))
-(nconc
-(copy-list (first
-tentative-encode-program-parts))
-(append other-charset-vector nil)
-(copy-tree (second
-tentative-encode-program-parts))))))
-(values encode-program worth-trying)))
-(defun make-8-bit-generate-encode-program-and-skip-chars-strings
-(decode-table encode-table encode-failure-octet)
-"Generate a CCL program to encode a 8-bit fixed-width charset.
-DECODE-TABLE must have 256 non-cons entries, and will be regarded as
-describing a map from the octet corresponding to an offset in the
-table to the that entry in the table.  ENCODE-TABLE is a hash table
-map from unicode values to characters in the range [0,255].
-ENCODE-FAILURE-OCTET describes an integer between 0 and 255
-\(inclusive) to write in the event that a character cannot be encoded.  "
-(check-argument-type #'vectorp decode-table)
-(check-argument-range (length decode-table) #x100 #x100)
-(check-argument-type #'hash-table-p encode-table)
-(check-argument-type #'integerp encode-failure-octet)
-(check-argument-range encode-failure-octet #x00 #xFF)
-(let ((encode-program nil)
-	(general-encode-program
-	 (eval-when-compile
-	   (let ((prog (append
-			(ccl-compile
-			 `(1
-			   (loop
-			     (read-multibyte-character r0 r1)
-			     (mule-to-unicode r0 r1)
-			     (if (r0 == #xFFFD)
-				 (write #xBEEF)
-			       ((lookup-integer encode-table-sym r0 r3)
-				(if r7
-				    (write-multibyte-character r0 r3)
-				  (write #xBEEF))))
-			     (repeat)))) nil)))
-	     (assert (memq #xBEEF14 prog)
-		     nil
-		     "This code assumes that the constant #xBEEF is #xBEEF14 \
-in compiled CCL code.\nIf that is not the case, and it appears not to
-be--that's why you're getting this message--it will not work.  ")
-	     prog)))
-	(encode-program-with-ascii-optimisation
-	 (eval-when-compile
-	   (let ((prog (append
-			(ccl-compile
-			 `(1
-			   (loop
-			     (read-multibyte-character r0 r1)
-			     (if (r0 == ,(charset-id 'ascii))
-				 (write r1)
-			       ((mule-to-unicode r0 r1)
-				(if (r0 == #xFFFD)
-				    (write #xBEEF)
-				  ((lookup-integer encode-table-sym r0 r3)
-				   (if r7
-				       (write-multibyte-character r0 r3)
-				     (write #xBEEF))))))
-			     (repeat)))) nil)))
-	     (assert (memq #xBEEF14 prog)
-		     nil
-		     "This code assumes that the constant #xBEEF is #xBEEF14 \
-in compiled CCL code.\nIf that is not the case, and it appears not to
-be--that's why you're getting this message--it will not work.  ")
-	     prog)))
-(ascii-encodes-as-itself nil)
-(control-1-encodes-as-itself t)
-(invalid-sequence-code-point-start
-(eval-when-compile
-(char-to-unicode
-(aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3))))
-further-char-set skip-chars invalid-sequences-skip-chars)
-;; Is this coding system ASCII-compatible? If so, we can avoid the hash
-;; table lookup for those characters.
-(loop
-for i from #x00 to #x7f
-always (eq (int-to-char i) (gethash i encode-table))
-finally (setq ascii-encodes-as-itself t))
-;; Note that this logic handles EBCDIC badly. For example, CP037,
-;; MIME name ebcdic-na, has the entire repertoire of ASCII and
-;; Latin 1, and thus a more optimal ccl encode program would check
-;; for those character sets and use tables. But for now, we do a
-;; hash table lookup for every character.
-(if (null ascii-encodes-as-itself)
-	;; General encode program. Pros; general and correct. Cons;
-	;; slow, a hash table lookup + mule-unicode conversion is done
-	;; for every character encoding.
-	(setq encode-program general-encode-program)
-(multiple-value-setq
-(encode-program further-char-set)
-;; Encode program with ascii-ascii mapping (based on a
-;; character's mule character set), and one other mule
-;; character set using table-based encoding, other
-;; character sets using hash table lookups.
-;; make-8-bit-non-ascii-completely-coveredp only returns
-;; such a mapping if some non-ASCII charset with
-;; characters in decode-table is entirely covered by
-;; encode-table.
-(make-8-bit-generate-helper decode-table encode-table
-encode-failure-octet))
-(unless encode-program
-	;; If make-8-bit-non-ascii-completely-coveredp returned nil,
-	;; but ASCII still encodes as itself, do one-to-one mapping
-	;; for ASCII, and a hash table lookup for everything else.
-	(setq encode-program encode-program-with-ascii-optimisation)))
-(setq encode-program
-(nsublis
-(list (cons #xBEEF14
-(logior (lsh encode-failure-octet 8)
-#x14)))
-(copy-tree encode-program)))
-(loop
-for i from #x80 to #x9f
-do (unless (= i (aref decode-table i))
-(setq control-1-encodes-as-itself nil)
-(return)))
-(loop
-for i from #x00 to #xFF
-initially (setq skip-chars
-(cond
-((and ascii-encodes-as-itself
-control-1-encodes-as-itself further-char-set)
-(concat "\x00-\x9f" (charset-skip-chars-string
-further-char-set)))
-((and ascii-encodes-as-itself
-control-1-encodes-as-itself)
-"\x00-\x9f")
-((null ascii-encodes-as-itself)
-(skip-chars-quote (apply #'string
-(append decode-table nil))))
-(further-char-set
-(concat (charset-skip-chars-string 'ascii)
-(charset-skip-chars-string further-char-set)))
-(t
-(charset-skip-chars-string 'ascii)))
-invalid-sequences-skip-chars "")
-with decoded-ucs = nil
-with decoded = nil
-with no-ascii-transparency-skip-chars-list =
-(unless ascii-encodes-as-itself (append decode-table nil))
-;; Can't use #'match-string here, see:
-;; http://mid.gmane.org/18829.34118.709782.704574@parhasard.net
-with skip-chars-test =
-#'(lambda (skip-chars-string testing)
-(with-temp-buffer
-(insert testing)
-(goto-char (point-min))
-(skip-chars-forward skip-chars-string)
-(= (point) (point-max))))
-do
-(setq decoded (aref decode-table i)
-decoded-ucs (char-to-unicode decoded))
-(cond
-((<= invalid-sequence-code-point-start decoded-ucs
-(+ invalid-sequence-code-point-start #xFF))
-(setq invalid-sequences-skip-chars
-(concat (string decoded)
-invalid-sequences-skip-chars))
-(assert (not (funcall skip-chars-test skip-chars decoded))
-"This char should only be skipped with \
-`invalid-sequences-skip-chars', not by `skip-chars'"))
-((not (funcall skip-chars-test skip-chars decoded))
-(if ascii-encodes-as-itself
-(setq skip-chars (concat skip-chars (string decoded)))
-(push decoded no-ascii-transparency-skip-chars-list))))
-finally (unless ascii-encodes-as-itself
-(setq skip-chars
-(skip-chars-quote
-(apply #'string
-no-ascii-transparency-skip-chars-list)))))
-(values encode-program skip-chars invalid-sequences-skip-chars)))
-(defun make-8-bit-create-decode-encode-tables (unicode-map)
-"Return a list \(DECODE-TABLE ENCODE-TABLE) given UNICODE-MAP.
-UNICODE-MAP should be an alist mapping from integer octet values to
-characters with UCS code points; DECODE-TABLE will be a 256-element
-vector, and ENCODE-TABLE will be a hash table mapping from 256 numbers
-to 256 distinct characters.  "
-(check-argument-type #'listp unicode-map)
-(let ((decode-table (make-vector 256 nil))
-(encode-table (make-hash-table :size 256))
-	(private-use-start (encode-char make-8-bit-private-use-start 'ucs))
-(invalid-sequence-code-point-start
-(eval-when-compile
-(char-to-unicode
-(aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3))))
-	desired-ucs decode-table-entry)
-(loop for (external internal)
-in unicode-map
-do
-(aset decode-table external internal)
-(assert (not (eq (encode-char internal 'ucs) -1))
-	      nil
-	      "Looks like you're calling make-8-bit-coding-system in a \
-dumped file, \nand you're either not providing a literal UNICODE-MAP
-or PROPS. Don't do that; make-8-bit-coding-system relies on sensible
-Unicode mappings being available, which they are at compile time for
-dumped files (but this requires the mentioned literals), but not, for
-most of them, at run time.  ")
-(puthash (encode-char internal 'ucs)
-	       ;; This is semantically an integer, but Dave Love's design
-	       ;; for lookup-integer in CCL means we need to store it as a
-	       ;; character.
-	       (int-to-char external)
-	       encode-table))
-;; Now, go through the decode table. For octet values above #x7f, if the
-;; decode table entry is nil, this means that they have an undefined
-;; mapping (= they map to XEmacs characters with keys in
-;; unicode-error-default-translation-table); for octet values below or
-;; equal to #x7f, it means that they map to ASCII.
-;; If any entry (whether below or above #x7f) in the decode-table
-;; already maps to some character with a key in
-;; unicode-error-default-translation-table, it is treated as an
-;; undefined octet by `query-coding-region'. That is, it is not
-;; necessary for an octet value to be above #x7f for this to happen.
-(dotimes (i 256)
-(setq decode-table-entry (aref decode-table i))
-(if decode-table-entry
-(when (get-char-table
-decode-table-entry
-unicode-error-default-translation-table)
-;; The caller is explicitly specifying that this octet
-;; corresponds to an invalid sequence on disk:
-(assert (= (get-char-table
-decode-table-entry
-unicode-error-default-translation-table) i)
-"Bad argument to `make-8-bit-coding-system'.
-If you're going to designate an octet with value below #x80 as invalid
-for this coding system, make sure to map it to the invalid sequence
-character corresponding to its octet value on disk. "))
-;; decode-table-entry is nil; either the octet is to be treated as
-;; contributing to an error sequence (when (> #x7f i)), or it should
-;; be attempted to treat it as ASCII-equivalent.
-(setq desired-ucs (or (and (< i #x80) i)
-(+ invalid-sequence-code-point-start i)))
-(while (gethash desired-ucs encode-table)
-(assert (not (< i #x80))
-"UCS code point should not already be in encode-table!"
-;; There is one invalid sequence char per octet value;
-;; with eight-bit-fixed coding systems, it makes no sense
-;; for us to be multiply allocating them.
-(gethash desired-ucs encode-table))
-(setq desired-ucs (+ private-use-start desired-ucs)
-private-use-start (+ private-use-start 1)))
-(puthash desired-ucs (int-to-char i) encode-table)
-(setq desired-ucs (if (> desired-ucs #xFF)
-(unicode-to-char desired-ucs)
-;; So we get Latin-1 when run at dump time,
-;; instead of JIT-allocated characters.
-(int-to-char desired-ucs)))
-(aset decode-table i desired-ucs)))
-(values decode-table encode-table)))
-(defun make-8-bit-generate-decode-program (decode-table)
-"Given DECODE-TABLE, generate a CCL program to decode an 8-bit charset.
-DECODE-TABLE must have 256 non-cons entries, and will be regarded as
-describing a map from the octet corresponding to an offset in the
-table to the that entry in the table.  "
-(check-argument-type #'vectorp decode-table)
-(check-argument-range (length decode-table) #x100 #x100)
-(let ((decode-program-parts
-	 (eval-when-compile
-	   (let* ((compiled
-		   (append
-		    (ccl-compile
-		     `(3
-		       ((read r0)
-(loop
-			 (write-read-repeat r0 ,(make-vector
-						 256 'sentinel)))))) nil))
-		  (first-part compiled)
-		  (last-part
-		   (member-if-not #'symbolp
-				  (member-if-not #'integerp first-part))))
-	     ;; Chop off the sentinel sentinel sentinel [..] part.
-	     (while compiled
-	       (if (symbolp (cadr compiled))
-		   (setcdr compiled nil))
-	       (setq compiled (cdr compiled)))
-	     (list first-part last-part)))))
-(nconc
-;; copy-list needed, because the structure of the literal provided
-;; by our eval-when-compile hangs around.
-(copy-list (first decode-program-parts))
-(append decode-table nil)
-(second decode-program-parts))))
-(defun make-8-bit-choose-category (decode-table)
-"Given DECODE-TABLE, return an appropriate coding category.
-DECODE-TABLE is a 256-entry vector describing the mapping from octets on
-disk to XEmacs characters for some fixed-width 8-bit coding system.  "
-(check-argument-type #'vectorp decode-table)
-(check-argument-range (length decode-table) #x100 #x100)
-(loop
-named category
-for i from #x80 to #x9F
-do (unless (= i (aref decode-table i))
-	 (return-from category 'no-conversion))
-finally return 'iso-8-1))
-(defun 8-bit-fixed-query-coding-region (begin end coding-system &optional
-buffer ignore-invalid-sequencesp
-errorp highlightp)
-"The `query-coding-region' implementation for 8-bit-fixed coding systems.
-Uses the `8-bit-fixed-query-from-unicode' and `8-bit-fixed-query-skip-chars'
-coding system properties.  The former is a hash table mapping from valid
-Unicode code points to on-disk octets in the coding system; the latter a set
-of characters as used by `skip-chars-forward'.  Both of these properties are
-generated automatically by `make-8-bit-coding-system'.
-See that the documentation of `query-coding-region'; see also
-`make-8-bit-coding-system'. "
-(check-argument-type #'coding-system-p
-(setq coding-system (find-coding-system coding-system)))
-(check-argument-type #'integer-or-marker-p begin)
-(check-argument-type #'integer-or-marker-p end)
-(let ((from-unicode
-(or (coding-system-get coding-system '8-bit-fixed-query-from-unicode)
-	     (coding-system-get (coding-system-base coding-system)
-				'8-bit-fixed-query-from-unicode)))
-(skip-chars-arg
-(or (coding-system-get coding-system '8-bit-fixed-query-skip-chars)
-	     (coding-system-get (coding-system-base coding-system)
-				'8-bit-fixed-query-skip-chars)))
-	(invalid-sequences-skip-chars
-	 (or (coding-system-get coding-system
-				'8-bit-fixed-invalid-sequences-skip-chars)
-	     (coding-system-get (coding-system-base coding-system)
-				'8-bit-fixed-invalid-sequences-skip-chars)))
-	(ranges (make-range-table))
-(case-fold-search nil)
-char-after fail-range-start fail-range-end extent
-	failed invalid-sequences-looking-at failed-reason
-previous-failed-reason)
-(check-type from-unicode hash-table)
-(check-type skip-chars-arg string)
-(check-type invalid-sequences-skip-chars string)
-(setq invalid-sequences-looking-at
-	  (if (equal "" invalid-sequences-skip-chars)
-	      ;; Regexp that will never match.
-	      #r".\{0,0\}"
-	      (concat "[" invalid-sequences-skip-chars "]")))
-(when ignore-invalid-sequencesp
-(setq skip-chars-arg
-	    (concat skip-chars-arg invalid-sequences-skip-chars)))
-(save-excursion
-(when highlightp
-(query-coding-clear-highlights begin end buffer))
-(goto-char begin buffer)
-(skip-chars-forward skip-chars-arg end buffer)
-(while (< (point buffer) end)
-	(setq char-after (char-after (point buffer) buffer)
-	      fail-range-start (point buffer))
-	(while (and
-		(< (point buffer) end)
-		(or (and
-(not (gethash (encode-char char-after 'ucs) from-unicode))
-(setq failed-reason 'unencodable))
-(and (not ignore-invalid-sequencesp)
-(looking-at invalid-sequences-looking-at buffer)
-(setq failed-reason 'invalid-sequence)))
-(or (null previous-failed-reason)
-(eq previous-failed-reason failed-reason)))
-	  (forward-char 1 buffer)
-	  (setq char-after (char-after (point buffer) buffer)
-		failed t
-previous-failed-reason failed-reason))
-	(if (= fail-range-start (point buffer))
-	    ;; The character can actually be encoded by the coding
-	    ;; system; check the characters past it.
-	    (forward-char 1 buffer)
-	  ;; The character actually failed.
-	  (when errorp
-	    (error 'text-conversion-error
-		   (format "Cannot encode %s using coding system"
-			   (buffer-substring fail-range-start (point buffer)
-					     buffer))
-		   (coding-system-name coding-system)))
-(assert (not (null previous-failed-reason)) t
-"previous-failed-reason should always be non-nil here")
-	  (put-range-table fail-range-start
-			   ;; If char-after is non-nil, we're not at
-			   ;; the end of the buffer.
-			   (setq fail-range-end (if char-after
-						    (point buffer)
-						  (point-max buffer)))
-			   previous-failed-reason ranges)
-(setq previous-failed-reason nil)
-	  (when highlightp
-	    (setq extent (make-extent fail-range-start fail-range-end buffer))
-	    (set-extent-priority extent (+ mouse-highlight-priority 2))
-	    (set-extent-face extent 'query-coding-warning-face))
-	  (skip-chars-forward skip-chars-arg end buffer)))
-(if failed
-	  (values nil ranges)
-	(values t nil)))))
-(defun make-8-bit-coding-system (name unicode-map &optional description props)
-"Make and return a fixed-width 8-bit CCL coding system named NAME.
-NAME must be a symbol, and UNICODE-MAP a list.
-UNICODE-MAP is a plist describing a map from octets in the coding
-system NAME (as integers) to XEmacs characters.  Those XEmacs
-characters will be used explicitly on decoding, but for encoding (most
-relevantly, on writing to disk) XEmacs characters that map to the same
-Unicode code point will be unified.  This means that the ISO-8859-?
-characters that map to the same Unicode code point will not be
-distinct when written to disk, which is normally what is intended; it
-also means that East Asian Han characters from different XEmacs
-character sets will not be distinct when written to disk, which is
-less often what is intended.
-Any octets not mapped, and with values above #x7f, will be decoded into
-XEmacs characters that reflect that their values are undefined.  These
-characters will be displayed in a language-environment-specific way. See
-`unicode-error-default-translation-table' and the
-`invalid-sequence-coding-system' argument to `set-language-info'.
-These characters will normally be treated as invalid when checking whether
-text can be encoded with `query-coding-region'--see the
-IGNORE-INVALID-SEQUENCESP argument to that function to avoid this.  It is
-possible to specify that octets with values less than #x80 (or indeed
-greater than it) be treated in this way, by specifying explicitly that they
-correspond to the character mapping to that octet in
-`unicode-error-default-translation-table'.  Far fewer coding systems
-override the ASCII mapping, though, so this is not the default.
-DESCRIPTION and PROPS are as in `make-coding-system', which see.  This
-function also accepts two additional (optional) properties in PROPS;
-`aliases', giving a list of aliases to be initialized for this
-coding-system, and `encode-failure-octet', an integer between 0 and 256 to
-write in place of XEmacs characters that cannot be encoded, defaulting to
-the code for tilde `~'.  "
-(check-argument-type #'symbolp name)
-(check-argument-type #'listp unicode-map)
-(check-argument-type #'stringp
-		       (or description
-			   (setq description
-				 (format "Coding system used for %s." name))))
-(check-valid-plist props)
-(let  ((encode-failure-octet (or (plist-get props 'encode-failure-octet)
-				   (char-to-int ?~)))
-	 (aliases (plist-get props 'aliases))
-	 (hash-table-sym (gentemp (format "%s-encode-table" name)))
-	 encode-program decode-program result decode-table encode-table
-skip-chars invalid-sequences-skip-chars)
-;; Some more sanity checking.
-(check-argument-range encode-failure-octet 0 #xFF)
-(check-argument-type #'listp aliases)
-;; Don't pass on our extra data to make-coding-system.
-(setq props (plist-remprop props 'encode-failure-octet)
-	  props (plist-remprop props 'aliases))
-(multiple-value-setq
-	(decode-table encode-table)
-(make-8-bit-create-decode-encode-tables unicode-map))
-;; Register the decode-table.
-(define-translation-hash-table hash-table-sym encode-table)
-;; Generate the programs and skip-chars strings.
-(setq decode-program (make-8-bit-generate-decode-program decode-table))
-(multiple-value-setq
-(encode-program skip-chars invalid-sequences-skip-chars)
-(make-8-bit-generate-encode-program-and-skip-chars-strings
-decode-table encode-table encode-failure-octet))
-(unless (vectorp encode-program)
-(setq encode-program
-	    (apply #'vector
-		   (nsublis (list (cons 'encode-table-sym hash-table-sym))
-			    (copy-tree encode-program)))))
-(unless (vectorp decode-program)
-(setq decode-program
-	    (apply #'vector decode-program)))
-;; And now generate the actual coding system.
-(setq result
-	  (make-coding-system
-name  'ccl
-description
-(plist-put (plist-put props 'decode decode-program)
-'encode encode-program)))
-(coding-system-put name '8-bit-fixed t)
-(coding-system-put name 'category
-(make-8-bit-choose-category decode-table))
-(coding-system-put name '8-bit-fixed-query-skip-chars
-skip-chars)
-(coding-system-put name '8-bit-fixed-invalid-sequences-skip-chars
-invalid-sequences-skip-chars)
-(coding-system-put name '8-bit-fixed-query-from-unicode encode-table)
-(coding-system-put name 'query-coding-function
-#'8-bit-fixed-query-coding-region)
-(coding-system-put (intern (format "%s-unix" name))
-		       'query-coding-function
-#'8-bit-fixed-query-coding-region)
-(coding-system-put (intern (format "%s-dos" name))
-		       'query-coding-function
-#'8-bit-fixed-query-coding-region)
-(coding-system-put (intern (format "%s-mac" name))
-		       'query-coding-function
-#'8-bit-fixed-query-coding-region)
-(loop for alias in aliases
-do (define-coding-system-alias alias name))
-result))
-(define-compiler-macro make-8-bit-coding-system (&whole form name unicode-map
-						 &optional description props)
-;; We provide the compiler macro (= macro that is expanded only on
-;; compilation, and that can punt to a runtime version of the
-;; associate function if necessary) not for reasons of speed, though
-;; it does speed up things at runtime a little, but because the
-;; Unicode mappings are available at compile time in the dumped
-;; files, but they are not available at run time for the vast
-;; majority of them.
-(if (not (and (and (consp name) (eq (car name) 'quote))
-		(and (consp unicode-map) (eq (car unicode-map) 'quote))
-		(and (or (and (consp props) (eq (car props) 'quote))
-			 (null props)))))
-;; The call does not use literals; do it at runtime.
-form
-(setq name (cadr name)
-	  unicode-map (cadr unicode-map)
-	  props (if props (cadr props)))
-(let  ((encode-failure-octet
-	    (or (plist-get props 'encode-failure-octet) (char-to-int ?~)))
-	   (aliases (plist-get props 'aliases))
-	   encode-program decode-program
-	   decode-table encode-table
-skip-chars invalid-sequences-skip-chars)
-;; Some sanity checking.
-(check-argument-range encode-failure-octet 0 #xFF)
-(check-argument-type #'listp aliases)
-;; Don't pass on our extra data to make-coding-system.
-(setq props (plist-remprop props 'encode-failure-octet)
-	    props (plist-remprop props 'aliases))
-;; Work out encode-table and decode-table
-(multiple-value-setq
-(decode-table encode-table)
-(make-8-bit-create-decode-encode-tables unicode-map))
-;; Generate the decode and encode programs, and the skip-chars
-;; arguments.
-(setq decode-program (make-8-bit-generate-decode-program decode-table))
-(multiple-value-setq
-(encode-program skip-chars invalid-sequences-skip-chars)
-(make-8-bit-generate-encode-program-and-skip-chars-strings
-decode-table encode-table encode-failure-octet))
-;; And return the generated code.
-`(let ((encode-table-sym (gentemp (format "%s-encode-table" ',name)))
-(encode-table ,encode-table))
-(define-translation-hash-table encode-table-sym encode-table)
-(make-coding-system
-',name 'ccl ,description
-(plist-put (plist-put ',props 'decode
-,(apply #'vector decode-program))
-'encode
-(apply #'vector
-(nsublis
-(list (cons
-'encode-table-sym
-(symbol-value 'encode-table-sym)))
-',encode-program))))
-	(coding-system-put ',name '8-bit-fixed t)
-(coding-system-put ',name 'category
-',(make-8-bit-choose-category decode-table))
-(coding-system-put ',name '8-bit-fixed-query-skip-chars
-,skip-chars)
-(coding-system-put ',name '8-bit-fixed-invalid-sequences-skip-chars
-,invalid-sequences-skip-chars)
-(coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table)
-(coding-system-put ',name 'query-coding-function
-#'8-bit-fixed-query-coding-region)
-	(coding-system-put ',(intern (format "%s-unix" name))
-			   'query-coding-function
-			   #'8-bit-fixed-query-coding-region)
-	(coding-system-put ',(intern (format "%s-dos" name))
-			   'query-coding-function
-			   #'8-bit-fixed-query-coding-region)
-	(coding-system-put ',(intern (format "%s-mac" name))
-			   'query-coding-function
-			   #'8-bit-fixed-query-coding-region)
-,(macroexpand `(loop for alias in ',aliases
-do (define-coding-system-alias alias
-',name)))
-(find-coding-system ',name)))))
 ;; Ideally this would be in latin.el, but code-init.el uses it.
-(make-8-bit-coding-system
+(make-coding-system
 'iso-8859-1
-(loop
+'fixed-width
-for i from #x80 to #xff
-collect (list i (int-char i))) ;; Identical to Latin-1.
 "ISO-8859-1 (Latin-1)"
-'(mnemonic "Latin 1"
+(eval-when-compile
-documentation "The most used encoding of Western Europe and the Americas."
+`(unicode-map
-aliases (iso-latin-1 latin-1)))
+,(loop
+for i from #x80 to #xff
+collect (list i (int-char i))) ;; Identical to Latin-1.
+mnemonic "Latin 1"
+documentation "The most used encoding of Western Europe and the Americas."
+aliases (iso-latin-1 latin-1))))

Mercurial > hg > xemacs-beta

comparison lisp/mule/mule-coding.el @ 4690:257b468bf2ca